OCminer has won 574 of the last 1026 blocks, which is about 56%. Will this change if Vorksholk comes out with the new miner?
Does anyone know the cost of the type of computer OCminer is using, or of one that would win blocks? Thanks
It certainly will. No promises, but you'll probably see a card like the RX 480 hit 500+ MH/s, and "ocminer" probably has ~10,000 MH/s, or ~20 RX 480s (or a more optimized/faster miner with fewer/different cards, of course). An optimized kernel with BFI and an even more intelligent midstate would certainly do even better, but just grabbing my CUDA code, shoving it into a .CL file, changing the method signature/params, and then (the time-consuming part) building the host code should already give a pretty significant improvement over the current CUDA miner. Also, for anyone wondering: the OpenCL implementation won't provide any benefit for NVidia cards.
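To give a sense of what that port looks like, here's a minimal sketch of the kind of signature change involved. This is illustrative only: the real kernel, its name, and its parameters are different, and I'm just assuming a typical one-nonce-per-thread search kernel.

    #include <stdint.h>

    // Illustrative CUDA skeleton (not the actual miner kernel).
    __global__ void sha256_search(const uint32_t *midstate, uint32_t start_nonce,
                                  uint32_t target, uint32_t *result)
    {
        // One candidate nonce per thread.
        uint32_t nonce = start_nonce + blockIdx.x * blockDim.x + threadIdx.x;
        // ... finish the hash from the midstate, compare against the target,
        //     and write a winning nonce into *result ...
    }

    // The OpenCL version in the .CL file keeps the body nearly identical; the
    // signature becomes __kernel void sha256_search(__global const uint *midstate, ...),
    // the thread index comes from get_global_id(0), and the kernel launch/setup
    // moves into the host code (the time-consuming part).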
For anyone who's of the technical persuasion wondering why AMD GPUs are so much faster than NVidia GPUs for SHA-256 and similar hashing functions:
NVidia focuses more on architectural complexity, while AMD focuses more on raw compute performance.
SHA-256 leans heavily on 32-bit integer rotation. AMD GPUs have a single instruction for this; NVidia GPUs have to do (a << r) | (a >> (32 - r)), which is two shifts and an 'or' instruction. So, as an oversimplification: AMD chips are about 3 times better at 32-bit integer rotation.
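To make that concrete, here's a minimal sketch in CUDA; the AMD-side behavior in the comments is my understanding of how OpenCL's rotate() builtin (or amd_bitalign) compiles on GCN hardware.

    #include <stdint.h>

    // 32-bit rotation, which SHA-256 uses all over the place (assumes 0 < r < 32).
    // On NVidia this is a shift, another shift, and an OR; on AMD, the OpenCL
    // rotate() builtin (or amd_bitalign) maps to a single instruction.
    __device__ __forceinline__ uint32_t rotl32(uint32_t a, uint32_t r)
    {
        return (a << r) | (a >> (32 - r));
    }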
As we saw with Siacoin (Blake2b algo), NVidia GPUs performed amazingly well (still unable to match performance-per-dollar with AMD, but they were still incredibly competitive). Why was this? I was able to implement three of the four rotations used by Blake2b as byte_perm, since their amounts were divisible by 8 (so the rotations could be realized as selecting certain bytes in a certain order rather than actually shifting bits; see the sketch below). In SHA-256, this is not the case (the rotation/shift amounts are 2, 3, 6, 7, 10, 11, 13, 17, 18, 19, 22, and 25, none of which are divisible by 8...). Also, Blake2b uses 64-bit rotations, which AMD doesn't have a single instruction for (although their 32-bit instructions still offer some advantage).
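For the curious, here's roughly what that byte_perm trick looks like, sketched for the rotate-right-by-24 case (the 32 and 16 cases are the same idea with different selectors); the function name and exact formulation are just illustrative, not lifted from the miner.

    #include <stdint.h>

    // Blake2b's rotations are right-rotates by 32, 24, and 16 (plus one by 63).
    // The first three move whole bytes, so each 32-bit half of the result can be
    // built with one __byte_perm (byte selection) instead of a shift/shift/or chain.
    __device__ __forceinline__ uint64_t ror64_24(uint64_t x)
    {
        uint32_t lo = (uint32_t)x;
        uint32_t hi = (uint32_t)(x >> 32);
        // __byte_perm(lo, hi, sel) picks bytes out of the 8-byte value {hi, lo};
        // these selectors just reorder those bytes by three positions.
        uint32_t lo_out = __byte_perm(lo, hi, 0x6543);
        uint32_t hi_out = __byte_perm(lo, hi, 0x2107);
        return ((uint64_t)hi_out << 32) | lo_out;
    }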
GFLOPS isn't really a useful metric here, since we're not necessarily concerned with floating-point performance. GFLOPS is a nice tool for comparing the relative performance of multiple chips of the same architecture, but between architectures, for the purposes of mining, it isn't terribly useful.
Also, the RX 480 arrived, so fingers crossed that Saturday gives me enough time to throw this together.