bytemaster


October 21, 2013, 03:08:41 AM 

Once the RAM is full, everything is a read, so it'll be perfectly parallel with some negligible constant-time overhead.
Improvements over scrypt for the BirthdayHash will help a bit, but like you said, any tricks generally translate fine into ASICs.
For your example, since the Mac Pro has 16 times more RAM and hashing power, it will be exactly 16^2 or 256 times better. Well worth the 50x cost. Since this is quadratic, it will just get worse when you get into 128GB servers.
I like the elegance of having the RAM be the main resource for hashing, but I think we have a ways to go before making it practical. I'll keep thinking about it.
Nathaniel
The problem with memory is that setups will then turn to server motherboards, some of which can hold an insane amount of RAM. http://www.newegg.com/Product/Product.aspx?Item=N82E16813182346 This motherboard has 24 RAM slots for $490.00. For best value, we can go with 16GB memory sticks at $150.00 each, also available on Newegg: http://www.newegg.com/Product/Product.aspx?Item=N82E16820226412 With 24 of these, you would have 384GB of RAM. Granted, your cost would be ~$6000 when you finally put the system together. However, with 384GB of RAM, this system is 144 times as good as your Mac Pro for around 1.2x the cost. People would just have systems like this lying around. And that doesn't even count one of Intel's server motherboards, which can theoretically hold up to 1.5TB of RAM (48 slots at 32GB each), though at current prices you're looking at 30 grand just in RAM for that. That being said, I would support a memory-based coin. I kind of like the idea of having my house covered with servers each having hundreds of GB of RAM versus the idea of my house covered with ASICs.

We have solved this problem by limiting the nonce search space so that the maximum amount of useful memory for gaining nonlinear performance increases is around 4 GB. Assuming you have exhausted all possible nonces, you would have to clear the memory and start over. Now it is a race to see how fast you can fill 4 GB of memory. While an ASIC could certainly do it quickly, memory bus speeds would still be the limiting factor.
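The quadratic advantage described above, and the way a capped nonce search space removes it, can be sketched numerically. This is not code from the thread; it just models expected birthday matches as C(n, 2) divided by the size of the birthday space, with illustrative parameters.

```python
# Sketch (illustrative, not from the thread's code): expected birthday
# matches grow with the square of the number of stored hashes, until
# the nonce search space is capped.

def expected_matches(num_hashes, space_size):
    # Expected colliding pairs among num_hashes uniform draws from
    # space_size possible values: C(n, 2) / space_size
    return num_hashes * (num_hashes - 1) / 2 / space_size

SPACE = 2 ** 50            # illustrative 50-bit birthday space
CAP = 2 ** 26              # capped nonce search space

small = expected_matches(2 ** 26, SPACE)          # baseline rig
big = expected_matches(16 * 2 ** 26, SPACE)       # 16x RAM and hash power
print(round(big / small))                         # 256: the 16^2 advantage

# With the nonce space capped, hashes beyond CAP are useless, so the
# superlinear advantage collapses back to raw filling speed.
capped_big = expected_matches(min(16 * 2 ** 26, CAP), SPACE)
print(round(capped_big / small))                  # 1: no gain past the cap
```

This matches the 256x figure for 16x the resources, and shows why the 4 GB cap turns the race into one of memory fill rate rather than total memory.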











td services


October 21, 2013, 05:43:15 AM 

So I think the fundamental idea is sound; I'm not making arguments against it, and I don't see convincing ones here. We do differ in the details.
My current thoughts for my preferred implementation are as follows:
1. Require between 256MB and 1GB of memory, to allow for use on very low-end equipment.
2. Limit A and B such that all possible matches are exhausted in less than 1 second; H would need to change every second and could include new transactions. This is so that faster computers are not disproportionately advantaged over slower ones.
3. Change Hash(H + A + B) < TargetDifficulty to Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < TargetDifficulty, to prevent GPUs from evaluating this step before adding B to memory.
I agree with this configuration. It encourages the use of low-power, low-cost, fanless boards for always-on use, which are also useful for personal clouds, self-hosting, and community networks. I have had a hard time finding low-cost ARM boards with more than 0.5 GB per core. The 4 GB configuration favors desktops and laptops, which use a lot more power and are prone to cooling problems over time due to dust accumulation from always-on use. Failed power supplies from overheating with dust-clogged vents are by far the most common hardware failure I've seen in 12 years of servicing computers as a business.




FreeTrade
Legendary
Offline
Activity: 1064


October 21, 2013, 10:37:14 AM 

The problem with your p(n) and q(n) curves is that they measure the probability of at least one match. What we really want to measure is the expected number of matches, since with a difficulty setting, having more than one match is important. This is what gives a relationship of number of hashes/second * amount of RAM. Even if we only increase the hashes/second by a factor of two, we still get twice as many matches, giving linear scaling in exactly the same way as just using whatever BirthdayHash algorithm you choose by itself.
Actually I've given it some more thought, and now I'm finding this very convincing. Say there are 365 possible birthdays, and a CPU can generate 10 per second and process a maximum of 10 per second into memory. Add a GPU that can generate 100 per second: reduce the search space by 90%, discard the 90 birthdays that aren't in the smaller search space, and send the remaining 10 to the CPU to process into a search space of 36 days. So the q(n) might only be half as effective as the p(n), but hashing power can be added ad infinitum and we're getting linear scaling. We need to prevent easy reduction of the search space.




bytemaster


October 21, 2013, 02:09:52 PM 

The problem with your p(n) and q(n) curves is that they measure the probability of at least one match. What we really want to measure is the expected number of matches, since with a difficulty setting, having more than one match is important. This is what gives a relationship of number of hashes/second * amount of RAM. Even if we only increase the hashes/second by a factor of two, we still get twice as many matches, giving linear scaling in exactly the same way as just using whatever BirthdayHash algorithm you choose by itself.
Actually I've given it some more thought, and now I'm finding this very convincing. Say there are 365 possible birthdays, and a CPU can generate 10 per second and process a maximum of 10 per second into memory. Add a GPU that can generate 100 per second: reduce the search space by 90%, discard the 90 birthdays that aren't in the smaller search space, and send the remaining 10 to the CPU to process into a search space of 36 days. So the q(n) might only be half as effective as the p(n), but hashing power can be added ad infinitum and we're getting linear scaling. We need to prevent easy reduction of the search space.

You want hash speed to be linear once the target memory has been consumed. As the number of items in memory grows, the performance of the algorithm approaches the linear speed of the birthday hash alone; however, it does not eliminate the need for the memory to be present. Putting the momentum property aside, having a linear hash function that requires multiple GB of RAM to perform at peak means that the cost of going parallel with the hash function is an order of magnitude higher.

But let's see if we can keep things nonlinear. Suppose your BirthdayHash has 32 bits, but your nonce search space only has 16 bits. You will run out of nonces long before you are able to generate enough birthdays to get very high on the q(n) curve. I would guess you could tune the parameters in the following way: limit the nonce search space such that the expected number of birthday matches is 1. If this were the actual birthday problem, we would set the nonce search space to ~50 and the number of days to 365. 99% of the time you would find 1 match, though sometimes you might find 2 or 0. Once all nonces have been searched, you have to start over again. Using these parameters, you would require 50x as much memory to find the target as to verify it.
If you were to grow both the nonce search space and the bits of the BirthdayHash, you would maintain the same principle while requiring any amount of RAM. The performance of the hash function would now be a matter of how quickly one could fill the RAM, which is of course linear with respect to processing power. The goal is to require as much RAM as possible per unit of processing power. All we are really achieving with this hash function is to increase the number and type of transistors necessary. One goal: use the fastest possible BirthdayHash algorithm, because then a CPU generates data at a rate that nears the memory throughput. Any ASIC that could perform the hash faster would be stuck waiting on the memory bus. Once the memory bus becomes the bottleneck in performance, we have won.
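The tuning rule above (size the nonce space so the expected number of matches is about 1) can be checked against the classic birthday numbers. A small sketch, using the C(n, 2)/days approximation: the expected pair count crosses 1 near 28 draws, while the oft-quoted 50 draws makes at least one match about 97% likely, which may be what the ~50 figure refers to.

```python
def expected_pairs(n, days):
    # Expected number of colliding pairs: C(n, 2) / days
    return n * (n - 1) / 2 / days

def prob_at_least_one(n, days):
    # Classic birthday problem: 1 - P(all n draws distinct)
    p_none = 1.0
    for i in range(n):
        p_none *= (days - i) / days
    return 1 - p_none

print(expected_pairs(28, 365))      # ~1.04: expectation crosses 1 near n = 28
print(prob_at_least_one(50, 365))   # ~0.97: a match is near-certain at n = 50
```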




FreeTrade
Legendary
Offline
Activity: 1064


October 21, 2013, 04:31:54 PM 

But let's see if we can keep things nonlinear. Suppose your BirthdayHash has 32 bits, but your nonce search space only has 16 bits. You will run out of nonces long before you are able to generate enough birthdays to get very high on the q(n) curve. I would guess you could tune the parameters in the following way:
Limit the nonce search space such that the expected number of birthday matches is 1. If this were the actual birthday problem, we would set the nonce search space to ~50 and the number of days to 365. 99% of the time you would find 1 match, though sometimes you might find 2 or 0.
Once all nonces have been searched, you have to start over again.
Okay, yes: by limiting the nonce search space and keeping the memory bus full, I don't think adding hashing power will be helpful. Thanks for the explanation.




bytemaster


October 21, 2013, 04:59:54 PM 

The question becomes: what is the fastest possible cryptographically secure hash that can be performed by a CPU? It doesn't have to be the most secure of cryptographic hash functions... I am thinking that there could probably be some combination of a cryptographic hash function extended by a non-cryptographically-secure hashing algorithm such as CityHash.
For example: You could generate a large number of birthdays per nonce like so:
P = sha512( H + nonce )
CITY256( P + 1 ) + CITY256( P + 2 ) + ...
If you generated 16KB of nonces per sha512()... you would get a massive speed-up, and I suspect you would suffer little if at all from potential attacks on CITY.
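The seed-then-expand idea might look like the sketch below: one expensive SHA-512 seeds a whole batch of cheap per-nonce birthdays. CityHash is not in the Python standard library, so `zlib.crc32` stands in here as a placeholder fast non-cryptographic hash; the byte layout and batch size are illustrative assumptions, not the thread's actual parameters.

```python
import hashlib
import zlib

def birthdays_from_seed(header: bytes, base_nonce: int, count: int):
    # One expensive SHA-512 seeds a whole batch of cheap birthdays.
    # zlib.crc32 is a stand-in for CITY256 (CityHash is not in the stdlib).
    seed = hashlib.sha512(header + base_nonce.to_bytes(8, 'little')).digest()
    return [zlib.crc32(seed + i.to_bytes(4, 'little')) for i in range(count)]

batch = birthdays_from_seed(b'block-header', 0, 16)
print(len(batch))   # 16 birthdays for the price of a single SHA-512
```

The fast inner hash only needs to resist shaping of its output well enough that an attacker cannot steer birthdays; a real implementation would want something stronger than CRC32.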




Etlase2


October 22, 2013, 12:15:41 AM 

We have solved this problem by limiting the nonce search space so that the maximum amount of useful memory for gaining nonlinear performance increases is around 4 GB. Assuming you have exhausted all possible nonces you would have to clear the memory and start over. Now it is a race to see how fast you can fill 4 GB of memory. While an ASIC could certainly do it quickly, memory bus speeds would still be the limiting factor.
Doesn't this mean that the PoW loses its "momentum" property then, though? I think it's probably for the better anyway, as relying on an unusual mechanic of the PoW algorithm to reduce some gameable property of bitShares is probably a shortsighted idea. Or am I mistaken about the momentum?




bytemaster


October 22, 2013, 04:57:39 AM 

#define MAX_NONCE (1<<26)

std::vector< std::pair<uint32_t,uint32_t> > momentum_search( pow_seed_type head )
{
   std::unordered_map<uint64_t,uint32_t> found;
   found.reserve( MAX_NONCE );
   std::vector< std::pair<uint32_t,uint32_t> > results;

   for( uint32_t i = 0; i < MAX_NONCE; )
   {
      fc::sha512::encoder enc;
      enc.write( (char*)&i, sizeof(i) );
      enc.write( (char*)&head, sizeof(head) );
      auto result = enc.result();

      for( uint32_t x = 0; x < 8; ++x )
      {
         uint64_t birthday = result._hash[x] >> 14;
         uint32_t nonce    = i + x;
         auto itr = found.find( birthday );
         if( itr != found.end() )
            results.push_back( std::make_pair( itr->second, nonce ) );
         else
            found[birthday] = nonce;
      }
      i += 8;
   }
   return results;
}

bool momentum_verify( pow_seed_type head, uint32_t a, uint32_t b )
{
   uint32_t ia = (a / 8) * 8;
   fc::sha512::encoder enca;
   enca.write( (char*)&ia, sizeof(ia) );
   enca.write( (char*)&head, sizeof(head) );
   auto ar = enca.result();

   uint32_t ib = (b / 8) * 8;
   fc::sha512::encoder encb;
   encb.write( (char*)&ib, sizeof(ib) );
   encb.write( (char*)&head, sizeof(head) );
   auto br = encb.result();

   return (ar._hash[a%8] >> 14) == (br._hash[b%8] >> 14);
}
I have put together this code, which is not optimized in the least, but I estimate that once fully optimized it will require 2^26 * 12 bytes = 768 MB of RAM minimum and will run in one or two seconds on a Core i7. The nonce space is limited to 2^26 items and the collision difficulty is 50 bits, which on average finds 23 pairs per run. This would require sustained memory bandwidth of 512 MB/sec. Even assuming you built an ASIC for SHA-512, it would be at most 100x faster given the highest memory bus speeds available. To integrate this with a new blockchain, you simply add 2 nonces to the block header, call momentum_verify, and then use the existing SHA-256 difficulty from Bitcoin to adjust the difficulty.
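A scaled-down Python model of the search and verify routines above may make the structure easier to follow. The nonce space and collision width are shrunk to toy values so a run finishes instantly, and `hashlib.sha512` stands in for `fc::sha512`; this is a sketch of the algorithm, not a drop-in reimplementation.

```python
import hashlib

MAX_NONCE = 1 << 12        # toy nonce space (the C++ uses 1 << 26)
BIRTHDAY_BITS = 16         # toy collision width (the C++ uses 50)

def _hash_words(head: bytes, i: int):
    # One SHA-512 yields eight 64-bit words, i.e. eight birthdays per call
    d = hashlib.sha512(i.to_bytes(4, 'little') + head).digest()
    return [int.from_bytes(d[k:k + 8], 'little') for k in range(0, 64, 8)]

def momentum_search(head: bytes):
    found, results = {}, []
    for i in range(0, MAX_NONCE, 8):
        for x, word in enumerate(_hash_words(head, i)):
            birthday = word >> (64 - BIRTHDAY_BITS)
            nonce = i + x
            if birthday in found:
                results.append((found[birthday], nonce))
            else:
                found[birthday] = nonce
    return results

def momentum_verify(head: bytes, a: int, b: int) -> bool:
    # Reject the trivial a == b pair and out-of-range nonces
    if a == b or not (0 <= a < MAX_NONCE and 0 <= b < MAX_NONCE):
        return False
    wa = _hash_words(head, (a // 8) * 8)[a % 8]
    wb = _hash_words(head, (b // 8) * 8)[b % 8]
    return (wa >> (64 - BIRTHDAY_BITS)) == (wb >> (64 - BIRTHDAY_BITS))

pairs = momentum_search(b'example-header')
print(len(pairs) > 0 and
      all(momentum_verify(b'example-header', a, b) for a, b in pairs))
# prints True
```

With 2^12 nonces in a 16-bit birthday space, a run finds roughly C(4096, 2)/65536 ≈ 128 pairs, all of which verify.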





smolen


October 22, 2013, 10:10:49 AM 

use the existing sha256 difficulty from bitcoin to adjust the difficulty.
Could existing sha256 ASICs be used to precompute the nonce field in the block header?




bytemaster


October 22, 2013, 02:22:18 PM 

use the existing sha256 difficulty from bitcoin to adjust the difficulty.
Could existing sha256 ASICs be used to precompute nonce filed in block header? No. Finding the collision depends upon the block header and everything is based on sha512 for finding collisions.




dna_gym


October 24, 2013, 08:10:52 AM 

I think that Momentum also has a time-memory trade-off (TMTO). There are birthday attack algorithms, one particular example being Floyd's cycle-finding algorithm, which can find hash collisions using O(1) memory.
Fortunately, because the nonce space is limited to 2^26 values and the collision difficulty is 50 bits, the "memoryless" attacker is forced to work about 16 million (2^(50-26)) times harder to find one 50-bit collision than if he were given a full nonce space of 50 bits.
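The 16-million figure is just the gap between the collision width and the nonce space; a quick check:

```python
COLLISION_BITS = 50   # collision difficulty from the post above
NONCE_BITS = 26       # log2 of the restricted nonce space

# A memoryless birthday search (e.g. Floyd cycle-finding) sized for the
# full 50-bit space could range over ~2^50 distinct inputs; Momentum only
# accepts 2^26 of them, so the attacker pays roughly this penalty factor:
penalty = 2 ** (COLLISION_BITS - NONCE_BITS)
print(penalty)   # 16777216, the "about 16 million" in the post
```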




FreeTrade
Legendary
Offline
Activity: 1064


October 24, 2013, 10:45:48 AM 

The BirthdayHash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used. Saturating the PCI bus is not the best approach here.
Or you might use the hash to describe 8 32-bit birthdays, or 7 36-bit birthdays, modifying it as you wanted to adjust the difficulty. Not a problem. Hmmm, I'm reconsidering this. While the 256-bit hash is cryptographically secure, can we say the same thing about just the first 32 bits of it? Could an attacker shape the data to generate a hash where he controlled the first 32 bits?




webr3


October 27, 2013, 02:38:22 AM 

So currently you have:
1) Given a block of data D, calculate H = Hash(D).
2) Find nonce A and nonce B such that BirthdayHash(A + H) == BirthdayHash(B + H).
3) If Hash(H + A + B) < TargetDifficulty then a result has been found, otherwise keep searching.
Step 2 never needs to be run; BirthdayHash is never executed. All you do is increment A and run Hash(H + A + A) until a result has been found that is under the TargetDifficulty. As soon as you've found this, then B == A and of course BirthdayHash(A+H) == BirthdayHash(B+H). To fix this, all you need to do is create a rule such that A and B must be different, rather than just assuming A != B without hard-coding it or building it into the algorithm. That seems quite a critical thing to miss from a white paper. You'll also note that you didn't check for this in your own validation function:

bool momentum_verify( pow_seed_type head, uint32_t a, uint32_t b )
{
   uint32_t ia = (a / 8) * 8;
   fc::sha512::encoder enca;
   enca.write( (char*)&ia, sizeof(ia) );
   enca.write( (char*)&head, sizeof(head) );
   auto ar = enca.result();

   uint32_t ib = (b / 8) * 8;
   fc::sha512::encoder encb;
   encb.write( (char*)&ib, sizeof(ib) );
   encb.write( (char*)&head, sizeof(head) );
   auto br = encb.result();

   return (ar._hash[a%8] >> 14) == (br._hash[b%8] >> 14);
}
Very surprised to see it missing given your comment to me that: What you have just stipulated was implied by using variables A and B and simply pointing out that A cannot be equal to B is IMPLIED and also OBVIOUS. It does not change the algorithm or the idea. So obvious in fact that I suspect everyone else reading it also assumed that requirement.
Sorry, you don't get 30 BTC because I am doing NOTHING DIFFERENT and your feedback provided 0 value.
Hopefully it is of some value, and you'll correct the code and add the check to the paper; otherwise, of course, the whole approach is pointless if A can equal B.
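The degenerate A == B case is easy to demonstrate in a toy model. A sketch with a simplified BirthdayHash (`hashlib.sha512` standing in for the paper's function, with an illustrative 50-bit truncation): a verify routine without the inequality check accepts any nonce paired with itself, skipping the search entirely.

```python
import hashlib

BIRTHDAY_BITS = 50

def birthday(head: bytes, nonce: int) -> int:
    # Simplified BirthdayHash: top 50 bits of SHA-512(nonce || head)
    d = hashlib.sha512(nonce.to_bytes(4, 'little') + head).digest()
    return int.from_bytes(d[:8], 'little') >> (64 - BIRTHDAY_BITS)

def verify_broken(head: bytes, a: int, b: int) -> bool:
    return birthday(head, a) == birthday(head, b)   # no a != b check

def verify_fixed(head: bytes, a: int, b: int) -> bool:
    return a != b and birthday(head, a) == birthday(head, b)

head = b'block-header'
print(verify_broken(head, 42, 42))   # True: B == A trivially "collides"
print(verify_fixed(head, 42, 42))    # False: the degenerate pair is rejected
```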




bytemaster


October 27, 2013, 02:45:00 AM 

Webr3, thank you for the feedback. It would certainly add clarity that such a thing must be checked. Other things we are checking that are not referenced in the white paper are that A and B are both below some MAX_NONCE value. I will update the white paper with additional clarity and enhancements regarding the ideal parameters for various goals.




webr3


October 27, 2013, 02:46:22 AM 

Webr3, thank you for the feedback. It would certainly add clarity that such a thing must be checked. Other things we are checking that are not referenced in the white paper are that A and B are both below some MAX_NONCE value. I will update the white paper with additional clarity and enhancements regarding the ideal parameters for various goals.
Yes, do remember to update that verify function too; sometimes it's the most obvious things that catch you out further down the line.




bytemaster


October 27, 2013, 02:57:52 AM 

Yep... there are many checks I left out of the verify method: 1) I need to check that nonce < MAX, 2) check that a != b. Your feedback is useful from an open-source development point of view, and of course I cannot ever prove that these checks were on my TODO list. I am glad you brought it up, because we should never assume others didn't miss something. Look at what recently happened to Namecoin! Anyway, my plan is to offer smaller bounties for implementation bugs once I have completed my implementation. I will send you a tip for pointing it out, because after thinking about it, your feedback did provide nonzero value. Thanks for looking at things.




webr3


October 27, 2013, 03:01:22 AM 

Yep... there are many checks I left out of the verify method: 1) I need to check that nonce < MAX, 2) check that a != b.
Aye, you've got me wishing I'd just kept my mouth shut and waited to see if it was in the live code; could have mined a bounty out if not. Regardless, it's a good PoW system and I'll consider using it, or a variation of it with a customized Salsa20, in the future.




bytemaster


October 27, 2013, 03:05:52 AM 

Note: I added to my last reply. I certainly don't want you wishing you didn't reply.




FreeTrade
Legendary
Offline
Activity: 1064


October 27, 2013, 11:22:01 AM 

This assumes ruling out the trivial A=B, of course.
Yeah, I think we're on top of that one




