Bitcoin Forum
May 06, 2024, 12:41:51 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
Pages: « 1 2 [3] 4 5 »  All
Author Topic: Momentum Proof-of-Work Bounty - 30 BTC [PAID]  (Read 14104 times)
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 05:17:05 PM
 #41

Just a quick thought, you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...

So . . . some numbers . . . let's say SHA-256, because a lot of numbers are readily available - that's a 256-bit hash result, 32 bytes. If we wanted a GB/sec from that, we want 1024x1024x1024/32 hashes . . . or 32M hashes per second. There are faster hashes, and actually 0.5 GB per second would probably more than suffice.

The Birthday Hash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used.   Saturating the PCIe bus is not the best approach here.
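The truncated-birthday search being discussed can be sketched roughly as follows (names and parameters are illustrative, not from the paper): hash each nonce, keep only the low k bits as its "birthday", and remember which nonce produced each birthday until two collide.

```python
import hashlib

def birthday_bits(header: bytes, nonce: int, k: int) -> int:
    """Low k bits of SHA-256(header || nonce): the nonce's 'birthday'."""
    digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
    return int.from_bytes(digest, "little") & ((1 << k) - 1)

def find_collision(header: bytes, k: int, max_nonce: int):
    """Scan nonces until two share the same k-bit birthday; return the pair."""
    seen = {}  # birthday -> first nonce that produced it
    for nonce in range(max_nonce):
        b = birthday_bits(header, nonce, k)
        if b in seen:
            return seen[b], nonce  # the colliding pair (A, B)
        seen[b] = nonce
    return None
```

With a small k this finds a pair after roughly sqrt(2^k) hashes, which is the memory-for-speed trade the thread is arguing about.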

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 05:20:42 PM
 #42

The Birthday Hash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used.   Saturating the PCIe bus is not the best approach here.

Or you might use the hash to describe eight 32-bit birthdays, or seven 36-bit birthdays . . . varying it as you want to adjust the difficulty. Not a problem.
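One way to read "eight 32-bit birthdays" (a sketch of the idea, not code from the thread) is to slice a single 256-bit digest into several smaller birthdays, so one hash invocation feeds multiple table insertions:

```python
import hashlib

def birthdays_from_digest(data: bytes, bits: int = 32):
    """Slice a 256-bit SHA-256 digest into floor(256/bits) birthdays.

    bits=32 yields 8 birthdays per hash; bits=36 yields 7 (4 bits unused),
    which is one knob for tuning difficulty, as the post suggests.
    """
    value = int.from_bytes(hashlib.sha256(data).digest(), "big")
    count = 256 // bits
    mask = (1 << bits) - 1
    return [(value >> (i * bits)) & mask for i in range(count)]
```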

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 05:26:24 PM
 #43

The Birthday Hash will never find a 256-bit collision, so you would truncate it to 64 bits, of which perhaps 34-38 bits will be used.   Saturating the PCIe bus is not the best approach here.

Or you might use the hash to describe eight 32-bit birthdays, or seven 36-bit birthdays . . . varying it as you want to adjust the difficulty. Not a problem.

I suppose if your goal was to accelerate birthday generation then this would be a very GPU friendly POW.

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 05:35:25 PM
 #44

I suppose if your goal was to accelerate birthday generation then this would be a very GPU friendly POW.

Oh I disagree.

With traditional proof-of-work systems like SHA-256 or Scrypt it was possible to gain
performance through parallelism alone. However, regardless of how efficiently an ASIC
can run the individual birthday hash steps in parallel, your use of memory must scale to
store the result of every parallel run, or you will lose an algorithmic advantage that
cannot be overcome by increasing levels of parallelism.


Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
Nathaniel B
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
October 20, 2013, 05:41:55 PM
 #45


I had considered the potential for such a design.  A DHT will not perform well enough; you would need 100BASE-T Ethernet or perhaps gigabit before the hash table could be filled quickly enough.  From what I can tell a SUPER COMPUTER with a ton of RAM and high-speed interconnects would give you the best performance.   Here is the difference: such a super computer could be built from commodity parts.


Well, if the hash is so fast you could just collaborate by splitting the birthday space into N ranges, with each computer throwing away any result that isn't in its range. This trades hashing speed for memory, so it works with rather than against your PoW.

What range do you suggest for N and how fast is the hash?
This should be possible to test.

Any time you throw away hashes because they fall outside of your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper.  You save on memory but the cost is far greater.   Memory and CPU power must always be paired proportionally, or you will have idle memory or wasted CPU cycles.



The problem with your p(n) and q(n) curves is that they measure the probability of at least one match. What we really want to measure is the expected number of matches, since, due to the difficulty setting, having more than one match is important. This is what gives a relationship of (number of hashes/second) * (amount of RAM). Even if we only increase the hashes/second by a factor of two, we still get twice as many matches, giving linear scaling in exactly the same way as just using whatever BirthdayHash algorithm you choose by itself.
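The distinction being drawn here - probability of at least one match versus expected number of matches - can be made concrete. Assuming uniformly random k-bit birthdays (a simplification of the thread's model), the expected number of colliding pairs among n hashes is C(n,2)/2^k, and once the table is populated the ongoing match rate is proportional to (hashes/sec) x (entries stored):

```python
from math import comb

def expected_matches(n_hashes: int, day_bits: int) -> float:
    """Expected number of colliding pairs among n uniform k-bit birthdays."""
    return comb(n_hashes, 2) / 2 ** day_bits

def match_rate(hashes_per_sec: float, stored: int, day_bits: int) -> float:
    """With `stored` entries in the table, each new hash matches an existing
    entry with probability stored / 2**day_bits, so the match rate is
    proportional to (hashes/sec) * (entries in RAM)."""
    return hashes_per_sec * stored / 2 ** day_bits
```

Doubling either factor doubles the match rate, which is the linear-scaling point the post is making.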
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 05:49:17 PM
 #46

I suppose if your goal was to accelerate birthday generation then this would be a very GPU friendly POW.

Oh I disagree.

With traditional proof-of-work systems like SHA-256 or Scrypt it was possible to gain
performance through parallelism alone. However, regardless of how efficiently an ASIC
can run the individual birthday hash steps in parallel, your use of memory must scale to
store the result of every parallel run, or you will lose an algorithmic advantage that
cannot be overcome by increasing levels of parallelism.


Yes, that is true.  All it means is that the ideal mining rig is now outfitted with MAX RAM and the highest-end GPU that is able to fill it.  

It would significantly increase the memory requirements for ASIC designs, that is certainly true.    OK, I think you might be on to something here.  The limit on performance will now be based upon how much RAM you can put on a given machine.    I would like to point out that once you run out of RAM you still gain performance by adding more processing power, it just grows linearly instead of non-linearly. 


https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 05:57:01 PM
 #47


Yes, that is true.  All it means is that the ideal mining rig is now outfitted with MAX RAM and the highest-end GPU that is able to fill it.  

It would significantly increase the memory requirements for ASIC designs, that is certainly true.    OK, I think you might be on to something here.  The limit on performance will now be based upon how much RAM you can put on a given machine.    I would like to point out that once you run out of RAM you still gain performance by adding more processing power, it just grows linearly instead of non-linearly. 


So there's always going to be an ideal mining rig but the best we can hope for is that its cost scales linearly with its performance. The memory bus is the bottleneck that is hardest to expand - that's why we want to keep it full as much as possible.

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 06:00:31 PM
 #48

Quote
The problem with your p(n) and q(n) curves are that they measure the probability of at least one match. What we really want to measure is the expected number of matches since due to a difficulty setting having more than one match is important. This is what gives a relationship of number of hashes/second * amount of RAM. Even if we only increase the hashes/second by a factor of two we still get twice as many matches giving linear scaling in exactly the same way as just using whatever birthdayhash algorithm you choose by itself.

This is a very insightful comment.  I would like to point out that these curves show the efficiency of the algorithm as memory increases.  As the number of stored birthdays approaches the number of days per year, the efficiency approaches that of the Birthday Hash alone, which if it were SHA-256 would be little different from Bitcoin.  

What this also implies is that if you use a 32-bit birthday hash with SHA256() it would be nearly identical to Bitcoin, except that it would require a constant 4 GB of memory to be filled before reaching full speed, and you would have extra overhead associated with accessing that memory.   Because 4 GB of memory is cheap, this would not be enough to stop ASICs or effect any real change, aside from some slight momentum required to get up to speed after changing the block.

If you use a 36-bit birthday hash then it would require 256 GB of constant memory to get up to speed.  
If you use a 256-bit birthday hash then you may never find a solution.
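A rough check of the memory figures above; the bytes-per-entry values are assumptions, since the post does not specify how each stored nonce is encoded, and that assumption is what determines the totals:

```python
def table_bytes(day_bits: int, bytes_per_entry: int) -> int:
    """Bytes needed for one slot per possible birthday.

    bytes_per_entry is an assumption, not something fixed by the scheme.
    """
    return (1 << day_bits) * bytes_per_entry

# A 32-bit birthday space with 1-byte slots comes to 4 GB:
assert table_bytes(32, 1) == 4 * 2**30
# A 36-bit space with 4-byte slots comes to 256 GB:
assert table_bytes(36, 4) == 256 * 2**30
```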

Conclusion:  
ASIC or not, the momentum POW does make it more difficult to game markets based on block chains.  
The number of bits in the birthday hash may want to be one factor in adjusting difficulty.

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 06:05:14 PM
 #49

Quote
As the number of stored birthdays approaches the number of days per year, the efficiency approaches that of the Birthday Hash alone, which if it were SHA-256 would be little different from Bitcoin. 

I am actually going to back off on this a bit; it would still be much better than Bitcoin, because checking the RAM is a point of centralization that would significantly complicate the degree to which the algorithm could be made parallel. 

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 06:17:17 PM
 #50

Your idea is useless because there can always be ASICs that are better than generic hardware at specific tasks. Which is why they are called ASICs.

You might be confused about the state of the art of certain classes of integrated circuits. I'll try to keep this simple, as it relates to cryptography.

Speeding up the processing of a memory-hard algorithm requires two things: (1) faster memory (clock speed), and (2) more memory (parallelism and minimization of data-set movement). Consumer RAM ("memory" as it were) is already nearly AT the state of the art in terms of speed (DDR3 and soon DDR4). Adding MORE of it simply means adding more sticks of RAM. From a customized-machine standpoint, this means adding more of the DDR SDRAM ICs (the constituent chips that make up a PC's RAM modules) and optimizing their data transports.

The argument that I recall the Scrypt paper making (and that I see the Momentum paper makes [2nd paragraph of introduction]) is that it becomes economically infeasible to design a system that outperforms a standard PC by such a margin that makes it worthwhile to develop said system. In other words, it costs too much to design a specialized system for a memory-hard algorithm when desktop PCs typically already perform really well with readily-available, relatively cheap hardware.

Q. Is someone going to develop an ASIC to handle the specific functions that the CPU in a desktop PC handles? Are they going to be able to do it faster and cheaper than Intel's latest tech?
A. Probably not, developing for the latest transistor size would be difficult and immensely costly.

Q. Is someone going to develop a memory ASIC that performs better than consumer RAM?
A. Probably not, consumer RAM is already near the state of the art.

Q. If someone DID actually build the above two ASICs and also built an efficient platform around them on such a scale that the (alt-)coin mining market would actually support, would it be economically feasible (that is, would you get at least 100% ROI)?
A. Most certainly not. The licensing alone would kill the hopes for feasibility.

I hope this makes sense. If I have misspoken or misinformed anywhere in the above, please do not hesitate to correct me.

The argument that I don't see made often enough is when economic feasibility isn't the goal, what happens? For instance, if a government wanted to build a system to take over and crumble a cryptocurrency network and [fiat] money was no object -- then yes, they could probably develop such a system. But any economist who knows about the engineering involved (quite a rare subset I would imagine!) would just tell this government to buy as much consumer PC hardware as they could rather than develop new tech. "Why re-invent the wheel?"

p.s. BombaUcigasa, you're coming off very trollish. You ask good questions, but try to lighten up a bit on your accusatory tone and people might take you more seriously. Thanks.

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 06:59:52 PM
 #51

So I think the fundamental idea is sound; I make no arguments against it, and I don't see convincing ones here. We do differ in the details.

My current thoughts for my preferred implementation are as follows -

1. Require between 256 MB and 1 GB of memory
->To allow for use on very low-end equipment

2. Limit A and B such that all possible matches are exhausted in less than 1 second - H would need to change every second and could include new transactions.
->So that faster computers are not disproportionately advantaged over slower ones.

3. Change
Hash(H + A + B) < TargetDifficulty
to
Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < TargetDifficulty
->To prevent GPUs from evaluating this step before adding B to memory.
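The modified check in item 3 can be sketched as follows. This is a rough reading of the post, not its actual implementation; `MemoryHash` is left abstract (any callable), since the post does not define it beyond requiring that it depend on the table:

```python
import hashlib

def H256(*parts: bytes) -> int:
    """SHA-256 over the concatenated parts, as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(b"".join(parts)).digest(), "big")

def passes_target(H: bytes, A: int, B: int, memory_hash, target: int) -> bool:
    """FreeTrade's modified rule: the final hash also commits to
    MemoryHash(A) and MemoryHash(B), so a miner cannot test the target
    condition before B's entry has actually been written to the table."""
    a, b = A.to_bytes(4, "big"), B.to_bytes(4, "big")
    digest = H256(H, a, b, memory_hash(A), memory_hash(B))
    return digest < target
```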

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 07:17:14 PM
 #52

So I think the fundamental idea is sound; I make no arguments against it, and I don't see convincing ones here. We do differ in the details.

My current thoughts for my preferred implementation are as follows -

1. Require between 256 MB and 1 GB of memory
->To allow for use on very low-end equipment

2. Limit A and B such that all possible matches are exhausted in less than 1 second - H would need to change every second and could include new transactions.
->So that faster computers are not disproportionately advantaged over slower ones.

3. Change
Hash(H + A + B) < TargetDifficulty
to
Hash(H + A + B + MemoryHash(A) + MemoryHash(B)) < TargetDifficulty
->To prevent GPUs from evaluating this step before adding B to memory.

This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

By limiting the nonce search space you keep super computers from dominating (good) while still making the cost of an ASIC dominated by RAM (good).  

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware.  I would suggest using 32-bit nonces and letting the time be what it will.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-accelerated on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.
 

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 07:21:06 PM
 #53

By targeting the memory and CPU abilities of the common computer as the means of reaching maximum performance, you have eliminated ASICs and super computers.   Due to Moore's law, the memory requirements should be targeted 1-2 years in the future and then have some means to adjust over time.

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 07:38:53 PM
 #54

This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

That's a feature for you. It's a bug for me.

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware.  I would suggest using 32-bit nonces and letting the time be what it will.

For an unlimited search on a single H, faster computers have a disproportionate advantage because they climb the probability graph much more quickly. If the time allowed on a single H is limited, slower computers spend the same proportion of time in the higher part of the probability graph as faster computers do.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-accelerated on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.

Don't know much about that - pretty sure I wouldn't be using SHA-256 though.

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 20, 2013, 07:46:33 PM
 #55

This is relatively good, except that I want the momentum property to prevent gaming of the BitShares market.  

That's a feature for you. It's a bug for me.

I think specifying the time to exhaust all nonces is pointless because it depends upon implementation and hardware.  I would suggest using 32-bit nonces and letting the time be what it will.

For an unlimited search on a single H, faster computers have a disproportionate advantage because they climb the probability graph much more quickly. If the time allowed on a single H is limited, slower computers spend the same proportion of time in the higher part of the probability graph as faster computers do.

I would make sure that the BirthdayHash(X) method used AES instructions, which are hardware-accelerated on x86 and ARM, and thus leave as little room as possible for ASIC enhancement.

Don't know much about that - pretty sure I wouldn't be using SHA-256 though.

I think we are in basic agreement... I didn't say unlimited search, I said a 32-bit limited search... though I suppose you could tie the bit limit to the memory limit, so I would recommend 29 bits minimum.   That would allow you to fill 4 GB of memory with unique nonces.   Within a year or two every cell phone will have 4 GB of memory and 64-bit quad-core processors with graphics units able to fill it.
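The 29-bit figure checks out if each stored nonce occupies about 8 bytes - an assumption, e.g. a 4-byte truncated birthday key plus the 4-byte nonce itself:

```python
# 2^29 nonces at roughly 8 bytes per table entry fills 4 GB exactly.
entries = 1 << 29
bytes_per_entry = 8  # assumption: 4-byte birthday + 4-byte nonce
assert entries * bytes_per_entry == 4 * 2**30
```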


https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
October 20, 2013, 08:11:30 PM
 #56

Due to Moore's law, the memory requirements should be targeted 1-2 years in the future and then have some means to adjust over time.

I'm thinking it's better to set the requirements low for a single process, and more capable computers can just run multiple processes . . . it'll scale for powerful computers (linearly with the cost of hardware and energy, hopefully) but it won't scale for GPUs, as they'll quickly run out of both memory and memory-bus access.

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
October 20, 2013, 10:13:29 PM
 #57

The GPU can produce blocks of 100K nonce/hash pairs which a CPU will then push into the hash table at 100K/sec, and the result would be a significant GPU advantage. 
Just a quick thought, you'd better oversaturate PCIe 3.0 x16 bandwidth to be sure. But that will require a really fast hash function...
This is why I hobble the GPU with AES and unpredictable run time.  It destroys the massive parallelism by forcing most of the threads to idle.
Sorry for some confusion; I'll clarify. Your idea is to make the GPU useless by using a GPU-hard hash function: all hashes can be uploaded from the GPU to the host, but they are produced very slowly, to give the GPU miner no (economic) advantage. Mine is to make the hash function so fast that the produced results cannot be uploaded to the host in reasonable time. So the GPU will have the ability to produce hashes, but not enough memory to find a collision, and it would be cheaper to calculate hashes on the CPU than to upload them from the GPU.

Of course I gave you bad advice. Good one is way out of your price range.
BitshireHashaway
Full Member
***
Offline Offline

Activity: 182
Merit: 100



View Profile WWW
October 21, 2013, 12:53:47 AM
 #58

Once the RAM is full everything is a read, so it'll be perfectly parallel with some negligible constant-time overhead.

Improvements against scrypt for the BirthdayHash will help a bit, but like you said, any tricks generally translate fine into ASICs.

For your example, since the Mac Pro has 16 times more RAM and hashing power, it will be exactly 16^2 or 256 times better. Well worth the 50x cost. Since this is quadratic, it will just get worse when you get into 128 GB servers.

I like the elegance of having the RAM be the main resource for hashing, but I think we have a ways to go before making it practical. I'll keep thinking about it.

Nathaniel

The problem with memory is that setups will then turn to server motherboards, some of which can hold an insane amount of RAM. http://www.newegg.com/Product/Product.aspx?Item=N82E16813182346 This motherboard has 24 RAM slots for $490.00.

For best value, we can go with 16 GB memory sticks at $150.00 each, also available on Newegg. http://www.newegg.com/Product/Product.aspx?Item=N82E16820226412

With 24 of these, you would have 384 GB of RAM. Granted, your cost would be ~$6000 when you finally put the system together. However, with 384 GB of RAM, this system is 144 times as good as your Mac Pro for around 1.2x the cost. People would just have systems like this lying around.

Also, that doesn't even count one of Intel's server motherboards, which can theoretically hold up to 1.5 TB of RAM (48 slots at 32 GB each). However, at current prices you're looking at 30 grand just in RAM for that.

That being said, I would support a memory-based coin. I kind of like the idea of having my house covered with servers each holding hundreds of GB of RAM versus the idea of my house covered with ASICs :)
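The quadratic-advantage arithmetic in this post follows from the match-rate relationship discussed earlier in the thread (rate proportional to hashes/sec times entries in RAM). Note the 32 GB Mac Pro baseline below is inferred from the 144x figure, not stated in the post:

```python
def relative_advantage(ram_ratio: float, hash_ratio: float) -> float:
    """Match-rate advantage of one rig over another, using
    rate ~ (hashes/sec) * (entries in RAM)."""
    return ram_ratio * hash_ratio

# 16x the RAM and 16x the hashing power -> 256x the matches:
assert relative_advantage(16, 16) == 256

# 384 GB server vs. an assumed 32 GB Mac Pro, with hashing power
# scaling alongside RAM, reproduces the post's 144x figure:
assert relative_advantage(384 / 32, 384 / 32) == 144.0
```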
Sergio_Demian_Lerner
Hero Member
*****
Offline Offline

Activity: 552
Merit: 622


View Profile WWW
October 21, 2013, 02:20:06 AM
 #59

bytemaster: Your idea of using hash collisions is good, but it is not new!

I proposed it in my paper MAVEPAY, April 16, 2012 (and even then I thought it wasn't something new).

Here is the part of the paper where the Collision POW (as I called it) is proposed, available at http://bitslog.files.wordpress.com/2012/04/mavepay1.pdf

The Collision POW:
The collision POW is based on the difficulty of finding partial preimage collisions of two messages. The packet broadcast has the following properties:
•packet=< command, nonce1, nonce2, bn >
•tmp=Hash(command||bn).
•pow1=Hash(tmp||nonce1).
•pow2=Hash(tmp||nonce2).
•pow1 and pow2 must be prefixed by some predefined number of zero bits and share some number of following bits (the remaining bits can differ).
The related attack is the partial collision attack.

Birthday attacks require large amounts of memory and so preclude the use of GPUs and other massively parallel computer architectures. An attacker is therefore less capable of using these technologies to mount a command flooding attack.
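The quoted verification rule can be sketched as follows. This is a rough reading of the excerpt, with SHA-256 standing in for whatever hash MAVEPAY actually specifies; zero_bits and shared_bits are the "predefined" parameters:

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """SHA-256 digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def verify_collision_pow(command: bytes, bn: bytes,
                         nonce1: bytes, nonce2: bytes,
                         zero_bits: int, shared_bits: int) -> bool:
    """Check the quoted rule: pow1 and pow2 must both start with
    `zero_bits` zero bits and agree on the next `shared_bits` bits."""
    tmp = hashlib.sha256(command + bn).digest()
    pow1 = sha256_int(tmp + nonce1)
    pow2 = sha256_int(tmp + nonce2)
    # Top `zero_bits` bits must be zero in both proofs of work.
    if (pow1 >> (256 - zero_bits)) or (pow2 >> (256 - zero_bits)):
        return False
    # The next `shared_bits` bits must match (the rest may differ).
    shift = 256 - zero_bits - shared_bits
    mask = (1 << shared_bits) - 1
    return ((pow1 >> shift) & mask) == ((pow2 >> shift) & mask)
```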

Could you please add a reference in your paper to the Collision POW of MAVEPAY?

Best regards, Sergio.
bytemaster (OP)
Hero Member
*****
Offline Offline

Activity: 770
Merit: 566

fractally


View Profile WWW
October 21, 2013, 03:05:02 AM
 #60

Thanks for the reference, I will gladly give credit where it is due.

https://fractally.com - the next generation of decentralized autonomous organizations (DAOs).