bytemaster (OP)
October 19, 2013, 03:58:11 PM
> > Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper. You save on memory, but the cost is far greater. Memory and CPU power must always be paired proportionally or you will have idle memory or wasted CPU cycles.
> I don't want to pollute your thread with many small replies. If you say what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is a 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.
There is a trade-off between latency and the rate of finding a collision. I am actually using something closer to 36 bits; beyond 36 bits, the time it takes a Core i7 to find a single collision would be too long. Remember that the collision rate is only the rate at which lottery tickets are found - difficulty is the rate at which blocks could be found. So this algorithm could be tuned to require no more than 4 GB of RAM, after which it produces a collision for every single BirthdayHash... however, there are still benefits to storing and comparing all matches rather than just single collisions, because every potential match is a lottery ticket. Beyond a certain point, therefore, the number of bits in the collision is almost irrelevant.
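The 77,000 figure is the standard birthday bound. As a rough illustration (a minimal sketch assuming a uniformly distributed b-bit output, not the actual BirthdayHash implementation):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Number of stored hashes needed for a ~50% chance of at least one
    // collision in a b-bit output space: n ~ sqrt(2 * ln 2 * 2^b).
    const int bits[] = {32, 36};
    for (int b : bits) {
        double space = std::ldexp(1.0, b);                      // 2^b
        double n = std::sqrt(2.0 * std::log(2.0) * space);
        std::printf("%d-bit output: ~%.0f stored hashes for a 50%% collision chance\n", b, n);
    }
    return 0;
}
```

This gives roughly 77,000 entries at 32 bits and roughly 309,000 at 36 bits, consistent with the figures discussed above.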
bytemaster (OP)
October 19, 2013, 04:00:10 PM
> > Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper. You save on memory, but the cost is far greater. Memory and CPU power must always be paired proportionally or you will have idle memory or wasted CPU cycles.
> I don't want to pollute your thread with many small replies. If you say what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is a 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.
My Scrypt-like hashing function runs on a single core at about 1,000 hashes per second with relatively few optimizations.
bytemaster (OP)
October 19, 2013, 04:02:21 PM
FYI... I do not consider the algorithm broken if software enhancements can improve performance. I fully expect that, with all of the smart minds out there, algorithmic performance on commodity hardware will improve by several orders of magnitude.
PassTheAmmo
Newbie
Offline
Activity: 23
Merit: 0
October 19, 2013, 04:11:54 PM
> My Scrypt-like hashing function runs on a single core at about 1,000 hashes per second with relatively few optimizations.
Ok. At that speed, nodes in a DHT would only need a few kb/s of internet bandwidth to reach the full potential of the DHT. How much gain that would give would depend on the difficulty of finding a match. I think it would be cool if you made it even slower (if only by repeating it enough times) to make it suitable for one huge DHT. I don't know whether that's possible, though for other reasons than the ones you brought up. Anyway, good luck - I think you could have a great proof-of-work function with some modifications. I'm outta here!
bytemaster (OP)
October 19, 2013, 04:28:43 PM
> > My Scrypt-like hashing function runs on a single core at about 1,000 hashes per second with relatively few optimizations.
> Ok. At that speed, nodes in a DHT would only need a few kb/s of internet bandwidth to reach the full potential of the DHT. How much gain that would give would depend on the difficulty of finding a match. I think it would be cool if you made it even slower (if only by repeating it enough times) to make it suitable for one huge DHT. I don't know whether that's possible, though for other reasons than the ones you brought up. Anyway, good luck - I think you could have a great proof-of-work function with some modifications. I'm outta here!
It all comes down to a tradeoff with validation time. A DHT like KAD would require much more bandwidth than the 'raw data' being published.
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 19, 2013, 04:54:36 PM
> The "+" operator is concatenation rather than addition, so the math you are doing is not valid.
Ah - of course! Thank you.
> BirthdayHash is only memory-intensive enough to limit the cache available to the GPU. Scrypt-like hash functions have some benefit at fighting ASIC designs as long as their validation time does not exceed 1 ms. By combining Scrypt-like functions with Momentum you get the best of all currently known techniques.
So if we assume 1 ms to generate a BirthdayHash, this gives us 1000 x 60 x 5 hashes for a single core during the target time for a block - 300,000 hashes - and very little memory required. The CPU and memory will be idling, waiting for more completed hashes from the GPU - you want it the other way round: a super-fast hash to fill up the memory fast.
bytemaster (OP)
October 19, 2013, 05:19:59 PM
I added some more instrumentation; the speed is closer to 50,000 hashes per second for the BirthdayHash algorithm.
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 19, 2013, 05:39:34 PM
> I added some more instrumentation; the speed is closer to 50,000 hashes per second for the BirthdayHash algorithm.
Ok, so 50,000 x 60 x 5 x 36 bits = 540,000,000 bits - so writing roughly 64 MB to memory over the course of 5 minutes... not so taxing. I think we'd want to be maxing out the memory bus by creating hash-collision candidates such that we're writing in the region of 1 GB/sec to memory.
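A quick check of that arithmetic (a sketch only; the 36 bits per candidate is the assumption in the post above, not the actual storage format used):

```cpp
#include <cstdio>

int main() {
    const double hashes_per_sec = 50000.0;     // instrumented BirthdayHash rate
    const double seconds        = 60.0 * 5.0;  // 5-minute block target
    const double bits_per_entry = 36.0;        // storing only the 36-bit birthday
    const double total_bits     = hashes_per_sec * seconds * bits_per_entry;
    std::printf("%.0f bits = %.1f MB written over 5 minutes\n",
                total_bits, total_bits / 8.0 / (1024.0 * 1024.0));
    // prints: 540000000 bits = 64.4 MB written over 5 minutes
    return 0;
}
```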
bytemaster (OP)
October 19, 2013, 05:43:23 PM
> > I added some more instrumentation; the speed is closer to 50,000 hashes per second for the BirthdayHash algorithm.
> Ok, so 50,000 x 60 x 5 x 36 bits = 540,000,000 bits - so writing roughly 64 MB to memory over the course of 5 minutes... not so taxing. I think we'd want to be maxing out the memory bus by creating hash-collision candidates such that we're writing in the region of 1 GB/sec to memory.
In my test benchmark, running on a single CPU core, I use about 2 GB in 5 minutes... though I am not using the most efficient storage possible. For starters, I am using a hash table with a 64-bit key (birthday) and a 64-bit value (nonce), so that is already about 4x as much memory as you calculated. I suspect that some of that 2 GB is from the hash table not being 'full'. There is obviously a memory/CPU/synchronization tradeoff that can be made with the hash table. Going to 8 cores on an i7 will probably use at least 8 GB of RAM, which is consumer grade right now.
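To see where the roughly 4x comes from, here is a sketch of the per-entry cost under the layout described above (64-bit birthday key, 64-bit nonce value); the container choice and the overhead remark are illustrative assumptions, not the actual implementation:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>

int main() {
    // One candidate per hash: 64-bit birthday -> 64-bit nonce.
    std::unordered_multimap<uint64_t, uint64_t> candidates;
    candidates.emplace(0x123456789abcULL, 42ULL);      // example entry

    const double entries = 50000.0 * 300.0;            // ~15 million hashes in 5 minutes
    const double payload = entries * (sizeof(uint64_t) * 2);
    std::printf("raw key/value payload: ~%.0f MB\n", payload / (1024.0 * 1024.0));
    // ~229 MB of raw payload (vs ~64 MB for bare 36-bit birthdays); node pointers,
    // bucket arrays and a partially filled table push the real footprint well past that.
    return 0;
}
```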
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 19, 2013, 06:02:32 PM
> about 2 GB in 5 minutes...
Hmm - I think you're going to want something that uses 1 GB in 1 second, run 300 times. If you're not keeping the memory bus or hash table running at maximum capacity, you can just add extra cores until it is up to full capacity - and that is surely what we're trying to avoid.
bytemaster (OP)
October 19, 2013, 06:07:11 PM
> > about 2 GB in 5 minutes...
> Hmm - I think you're going to want something that uses 1 GB in 1 second, run 300 times. If you're not keeping the memory bus or hash table running at maximum capacity, you can just add extra cores until it is up to full capacity - and that is surely what we're trying to avoid.
Good luck filling 1 GB with cryptographically secure data in 1 second. In debug mode I cannot even memset 1 GB in 1 second.
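A minimal way to measure the point being made here (a sketch; it assumes a machine with over 1 GB of free RAM, and a release build will normally be far faster than a debug build):

```cpp
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const std::size_t one_gb = std::size_t(1) << 30;
    std::vector<char> buf(one_gb);                    // 1 GB working buffer

    const auto start = std::chrono::steady_clock::now();
    std::memset(buf.data(), 0xAB, one_gb);            // a plain fill, no crypto at all
    const auto stop  = std::chrono::steady_clock::now();

    const double secs = std::chrono::duration<double>(stop - start).count();
    std::printf("memset of 1 GB: %.3f s (%.2f GB/s)\n", secs, 1.0 / secs);
    return 0;
}
```

Filling the same buffer with cryptographically secure pseudorandom output has to do real work per byte on top of the write itself, which is the gap the next two posts argue about.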
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 19, 2013, 06:22:22 PM
> Good luck filling 1 GB with cryptographically secure data in 1 second. In debug mode I cannot even memset 1 GB in 1 second.
Okay, we'd better inform the author of the white paper, who wrote: "For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform." Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.
bytemaster (OP)
October 19, 2013, 06:42:06 PM
> > Good luck filling 1 GB with cryptographically secure data in 1 second. In debug mode I cannot even memset 1 GB in 1 second.
> Okay, we'd better inform the author of the white paper, who wrote: "For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform." Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.
LOL, what was the author who wrote that thinking? My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length. Unpredictable run length means an unpredictable number of iterations of some fast algorithm. The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache. The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
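One way to picture "unpredictable run length" (a toy sketch only, not the actual hash used here): keep re-mixing until the output hits a data-dependent stopping condition, so the iteration count is geometrically distributed and neighbouring nonces finish at different times - the behaviour that leaves lock-stepped GPU threads idling.

```cpp
#include <cstdint>
#include <cstdio>

// Toy mixer (splitmix64 finalizer) standing in for a real cryptographic hash.
static uint64_t toy_mix(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Re-mix until the low 4 bits are zero: 16 iterations on average, but any
// particular nonce may take 1 round or dozens, so the work per nonce is
// unpredictable even though each round is fast.
static uint64_t variable_length_hash(uint64_t nonce, int &rounds) {
    uint64_t h = toy_mix(nonce);
    rounds = 1;
    while ((h & 0xF) != 0) {
        h = toy_mix(h);
        ++rounds;
    }
    return h;
}

int main() {
    for (uint64_t nonce = 0; nonce < 8; ++nonce) {
        int rounds = 0;
        const uint64_t h = variable_length_hash(nonce, rounds);
        std::printf("nonce %llu: %2d rounds, result %016llx\n",
                    (unsigned long long)nonce, rounds, (unsigned long long)h);
    }
    return 0;
}
```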
prophetx
Legendary
Offline
Activity: 1666
Merit: 1010
he who has the gold makes the rules
October 20, 2013, 02:53:29 AM
> > > Good luck filling 1 GB with cryptographically secure data in 1 second. In debug mode I cannot even memset 1 GB in 1 second.
> > Okay, we'd better inform the author of the white paper, who wrote: "For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform." Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.
> LOL, what was the author who wrote that thinking? My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length. Unpredictable run length means an unpredictable number of iterations of some fast algorithm. The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache. The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
You may wish to reconsider foiling efforts that create the most efficient market outcomes and use of resources. Imagine a world where all transactions and communications are conducted through p2p networks. This world needs a massive amount of computing power, and networks which are costlier to operate lose out to networks which can be operated cost-efficiently. Also keep in mind that GPU farms keep out botnets.
bytemaster (OP)
October 20, 2013, 03:09:18 AM
> > > > Good luck filling 1 GB with cryptographically secure data in 1 second. In debug mode I cannot even memset 1 GB in 1 second.
> > > Okay, we'd better inform the author of the white paper, who wrote: "For example, simply populating 1 Gigabyte of memory with cryptographically secure pseudorandom data can take a second to perform." Anyway, the exact speed is not important - the point is still the same: the slower the hash, the more room for parallelization.
> > LOL, what was the author who wrote that thinking? My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length. Unpredictable run length means an unpredictable number of iterations of some fast algorithm. The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache. The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
> You may wish to reconsider foiling efforts that create the most efficient market outcomes and use of resources. Imagine a world where all transactions and communications are conducted through p2p networks. This world needs a massive amount of computing power, and networks which are costlier to operate lose out to networks which can be operated cost-efficiently. Also keep in mind that GPU farms keep out botnets.
Efficiency and decentralization... are generally at odds with each other until we can come up with an alternative to proof of work. Ripple and its consensus approach seem like a promising alternative.
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 20, 2013, 01:09:44 PM
> My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length. Unpredictable run length means an unpredictable number of iterations of some fast algorithm.
Ok, and why is that important to you? I'm probably thinking in narrower terms (just for an altcoin), so I don't see the virtue in this - can you explain more, please?
> The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache. The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
It sounds like you're employing two methods to make the coin GPU-resistant: contention for access to the memory/hash table, and requiring GPU cache. I think this actually doubles the 'attack' space... I suggest focusing on one, and I recommend the first (y'know, your new brilliant idea), as the second has already seen a lot of effort.
bytemaster (OP)
October 20, 2013, 02:22:04 PM
> > My aim was to create the fastest hash function I could that met one key criterion... unpredictable run length. Unpredictable run length means an unpredictable number of iterations of some fast algorithm.
> Ok, and why is that important to you? I'm probably thinking in narrower terms (just for an altcoin), so I don't see the virtue in this - can you explain more, please?
> > The other criterion was to use just enough memory to foil GPU parallelization by consuming a lot of GPU cache. The point of Scrypt was to make parallel operation difficult, and its primary weakness is that validation time suffers as you scale it.
> It sounds like you're employing two methods to make the coin GPU-resistant: contention for access to the memory/hash table, and requiring GPU cache. I think this actually doubles the 'attack' space... I suggest focusing on one, and I recommend the first (y'know, your new brilliant idea), as the second has already seen a lot of effort.
Unpredictable run length foils GPU threads by having many of the data-parallel threads idle while waiting on the longer-running threads. Fortunately I defined the white paper in generic terms, so we can use whatever hash method you like. I will point out, though, that there are means of avoiding synchronization on the hash table through pipelining: the GPU can produce blocks of 100K nonce/hash pairs which a CPU then pushes into the hash table at 100K/sec, and the result would be a significant GPU advantage.
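A rough sketch of that pipelining shape (the batch size, queue, and hand-off are illustrative assumptions; only the CPU side is shown, with a stand-in producer in place of a real GPU):

```cpp
#include <cstdint>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <unordered_map>
#include <vector>

struct Candidate { uint64_t birthday; uint64_t nonce; };
using Batch = std::vector<Candidate>;                  // e.g. 100K pairs per block

std::queue<Batch> pending;                             // producer -> consumer hand-off
std::mutex mtx;
std::condition_variable cv;
bool done = false;

// CPU side: drain whole batches into the collision table, no per-hash locking.
void consume(std::unordered_multimap<uint64_t, uint64_t> &table) {
    for (;;) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [] { return !pending.empty() || done; });
        if (pending.empty() && done) return;
        Batch batch = std::move(pending.front());
        pending.pop();
        lock.unlock();                                 // insert outside the lock
        for (const Candidate &c : batch)
            table.emplace(c.birthday, c.nonce);        // matches found later via equal_range
    }
}

int main() {
    std::unordered_multimap<uint64_t, uint64_t> table;
    std::thread cpu(consume, std::ref(table));

    // Stand-in for the GPU: one fake batch of 100K birthday/nonce pairs.
    Batch fake(100000);
    for (uint64_t i = 0; i < fake.size(); ++i)
        fake[i] = Candidate{ i % 4096, i };            // artificial collisions for the demo
    {
        std::lock_guard<std::mutex> lock(mtx);
        pending.push(std::move(fake));
        done = true;                                   // no more batches after this one
    }
    cv.notify_one();
    cpu.join();
    return 0;
}
```

The PCIe-bandwidth concern raised in the next post is about how fast those batches can be moved off the GPU in the first place.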
smolen
October 20, 2013, 02:49:10 PM
> The GPU can produce blocks of 100K nonce/hash pairs which a CPU then pushes into the hash table at 100K/sec, and the result would be a significant GPU advantage.
Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that would require a really fast hash function...
bytemaster (OP)
October 20, 2013, 02:51:10 PM
> > The GPU can produce blocks of 100K nonce/hash pairs which a CPU then pushes into the hash table at 100K/sec, and the result would be a significant GPU advantage.
> Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that would require a really fast hash function...
This is why I hobble the GPU with AES and unpredictable run time. It destroys the massive parallelism by forcing most of the threads to idle.
FreeTrade
Legendary
Offline
Activity: 1470
Merit: 1030
October 20, 2013, 04:54:36 PM
> Just a quick thought: you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that would require a really fast hash function...
So... some numbers... let's say SHA-256, because a lot of numbers are readily available - that's a 256-bit hash result, or 32 bytes. If we wanted 1 GB/sec from that, we'd need 1024 x 1024 x 1024 / 32 hashes, or about 32 Mhash per second. There are faster hashes, and actually 0.5 GB per second would probably more than suffice.
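The same arithmetic in general form (a sketch; SHA-256's 32-byte digest is the example above, and the 16-byte line assumes the 64-bit birthday plus 64-bit nonce pairs mentioned earlier in the thread):

```cpp
#include <cstdio>

int main() {
    const double target = 1024.0 * 1024.0 * 1024.0;    // 1 GiB of output per second
    const double output_bytes[] = {32.0, 16.0};        // SHA-256 digest; birthday+nonce pair
    for (double b : output_bytes) {
        const double hps = target / b;
        std::printf("%2.0f-byte output: %.0f hashes/sec (~%.0f Mhash/s) for 1 GiB/s\n",
                    b, hps, hps / (1024.0 * 1024.0));
    }
    // 32-byte output -> ~32 Mhash/s; 16-byte output -> ~64 Mhash/s
    return 0;
}
```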