Topic: Looking for PoW recommendations.
Cryddit (OP)
March 28, 2015, 01:37:33 AM  #1

I'm looking for a set of ~10 or so hashing algorithms suitable for securing a block chain, and if at all possible I'd like them to cater to machines with strengths as *varied* as possible.  Ideally, people with different machines should have different inflection points, along the scale of hashes-per-minute versus current-difficulty-for-each-algorithm, for switching from one algorithm to another; and in the "steady state", where each algorithm is getting the same number of blocks and difficulty remains constant, different types of machine would be working on each algorithm and nobody would have a reason to switch.

In the long run, the difficulty adjusting semi-independently for each algorithm would mean that each gets an approximately equal number of blocks.

So I'm looking for hashing algorithms...
one more efficient for CPUs with server-sized memory,
one more efficient for CPUs on high-end home machines,
one more efficient for ASICs and small-memory devices
    (this one is easy -- SHA256D),
two more efficient for GPUs (because there are two main
    families of GPUs out there with different strengths),
one more efficient for single-threaded processors (like ARM),

...etc.

As miners I figure you guys probably know a bit about this stuff, so I would like to ask for your recommendations, if you have any.

I don't really believe there's much of anything that a high-end home machine is actually more efficient at doing than a server-class machine, but my intent is that the server-class machines should usually have a different option available that saves them the effort of competing with the home machines.  That way the home machines have at least one algorithm pretty much to themselves most of the time, and only deal with competition from the server-class crew when that algorithm's difficulty dips extra-low.

"Efficiency" here is therefore sort of game-theoretic; it would mean that anybody who can do what you can better, *usually* has something more suited to their own machine's strengths that they can be doing instead.  More capable machines (well, other than ASICs) give you more options for switching between algorithms as their respective difficulties vary, but each type should have a "niche" where everybody else usually has something better to do.

Anyway, I think this is a worthwhile thing because it would keep a good variety in what kind of machines (and therefore what socio-economic classes of people) can get blocks.

Thanks for any response you have; even if you don't have specific recommendations I'd like to hear your thoughts.
smolen
March 28, 2015, 10:27:57 AM (last edit: March 28, 2015, 10:41:22 AM)  #2

Let's see. There is the Momentum function, based on the birthday paradox, with an interesting property: hashing speed increases during the nonce range scan as memory gets filled. The parameters chosen for the Protoshares coin were too weak to keep GPUs away Wink There is the FPGA-friendly Blake-256. And there is a good proposal to use a hard problem from graph theory, Cuckoo Cycle.

Also
Quote
Hi again, SolidCoin developers!

After dinner I came up with another two ideas:

1) Instead of evaluating a total recursive function over a commutative ring of integers, you could try a simpler thing: require evaluating a specific value of a primitive recursive function over the field of reals that has fractional dimension. A good starting example:

http://en.wikipedia.org/wiki/Logistic_map

just pick the value of r that is in the chaotic region.

Implement the reals as long double, which is 80-bit (10 bytes) on Intel/AMD CPUs. Not all C++ compilers really support long double, but the logistic map function is so trivial that you can include inline assembly, which will be on the order of 10 lines.

2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

Again, good luck!
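(For concreteness, a minimal sketch of the logistic-map iteration from idea (1), in Python with ordinary 64-bit doubles rather than the 80-bit long double suggested above; the seeding, the value of r, and the iteration count are made up for illustration.)
Code:
import hashlib

R = 3.99           # a value of r inside the chaotic region of the logistic map
ITERATIONS = 10_000

def logistic_pow(header: bytes, nonce: int) -> float:
    # Seed x0 in (0, 1) from a hash of the header and nonce.
    seed = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    x = int.from_bytes(seed, "big") / 2**256
    for _ in range(ITERATIONS):
        x = R * x * (1.0 - x)   # chaotic iteration; sensitive to rounding behavior
    return x

# "Work" is accepted if the final value falls below some target threshold.
print(logistic_pow(b"example-header", 12345) < 0.001)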


EDIT: Oops, forgot about this discussion
Quote
2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

I'm pretty sure this isn't true; cl_double is used in the example source code bundled with AMD's APP SDK

The difference in IEEE 754 implementation between nVidia and AMD GPUs is that nVidia tends to be congruent with the results that x86/x64 processors spit out, while AMD's are not, because their floating-point results are less precise.

But as far as I know, if you can do it in the FPU and it's invertible, you can do it in integer logic too.  Whether the FPU operations would hold an advantage in that case, I do not know.

Indeed, it seems AMD has caught up on double precision.
If there is some math operation whose output is bit-for-bit the same across all NVIDIA hardware but differs in the least significant bits from x86 and AMD, such an operation could be used for an NVIDIA-only coin. Simulating those least significant bits would be possible, but expensive. Perhaps IEEE 754 floating-point math is deterministic and the same across different hardware, but in fast-math mode NVIDIA and AMD could differ.


Bombadil
March 28, 2015, 01:15:24 PM (last edit: March 28, 2015, 01:35:48 PM)  #3

If you want to make or find separate Nvidia-friendly and AMD-friendly algos, you'll need to make sure every algo is 100% optimized before you'll see the differences.
But >90% optimized algos are hard to find; the last one was probably Scrypt. Many new algos keep popping up these days, so no algo performs at max speed, and there's always room for new improvements.
So the key would be to pick heavily optimized algos and compare them once that has happened. Older and popular algos are more likely to be optimized (X11, Quark, etc.).

I can already tell you that the Whirlpool algo performs pretty well on older (<= compute 3.5, up to the 780 Ti) Nvidia cards, while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that much of a difference. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too Cheesy Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do when programmed in CUDA.
Some people also think only AMDs are worth it for mining, while my records show otherwise.

Don't forget about Scrypt for ASICs too Wink

There's also this peculiar algo that requires at least 15GB of RAM, called Ramhog.

Maybe add PoS too, as it enlarges the mining scene to those who don't have mining equipment. A low % reward (1-10) or even a fixed reward might be the right way. High % PoS is only interesting for pure PoS coins.

And what about systems like Curecoin? They take 45% of a mined block and reward that to folders (Folding@home contributors), off-chain. They also take 10% and use that for the back-end of that system. The latter is an argument for why it might be less popular, but it's an interesting coin and distribution system nonetheless.
Cryddit (OP)
March 28, 2015, 06:22:29 PM  #4

My current plan is to release with a Linux daemon and a Linux Qt client with a built-in CPU miner only.  The source code will be available (except for the genesis block nonce) four weeks before launch.  After launch, there'll be a 4-week period of 1/10x normal rewards, then ten weeks of ramping up to normal reward levels at ten percent per week.  I *think* that's enough time for somebody who has a Windows machine to port things to Windows, somebody who has a Mac to port things to whatever they're calling their operating system these days (are they still on cats?), somebody who has cudaminer to write the appropriate cudaminer plugins, etc., before they miss out on much.  Anyway, after that ramp-up period, the proof-of-work award will be a constant amount per block forever, so it's not like somebody's going to be missing out on a huge instamine if they don't have their mining client ready.

Proof-of-stake awards will exist, but they'll be mostly for the users rather than the miners, and will only start to matter very, very, slowly.  When a transaction is made, the input coins will get ten percent stake award per year for up to two years since the tx that created them.  Most of that will go to the people making the transaction, but a tiny fraction will go (along with tx fees) to the miners.  Under certain circumstances (txIn unspent for over a year) the miner would get up to half of the stake award for a particular txIn, but I think that'll be rare.  People will usually make a tx to themselves, rather than enter a no-interest (for them) period a year in.
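To make that accrual rule concrete, here's a minimal sketch of the arithmetic as described (ten percent per year of coin age, capped at two years); the linear accrual and the exact size of the miner's "tiny fraction" are assumptions for illustration, not the actual spec.
Code:
# Hypothetical sketch of the stake-award rule described above:
# 10% per year of coin age, capped at two years, mostly paid to the spender.
def stake_award(amount: float, age_years: float,
                miner_fraction: float = 0.05) -> tuple[float, float]:
    accrued = amount * 0.10 * min(age_years, 2.0)   # assumed linear accrual
    miner_share = accrued * miner_fraction          # assumed 5% "tiny fraction" to the miner
    return accrued - miner_share, miner_share

spender, miner = stake_award(amount=100.0, age_years=1.5)
print(spender, miner)   # 14.25 to the spender, 0.75 to the miner under these assumptions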

Right now my plan is that blocks will always be formed on the basis of proof-of-work.  It'll take about eight years before there are as many coins created by proof-of-stake (and mostly awarded to those who make transactions) as there are formed by proof-of-work (and awarded to miners).  You can think of that, if you like, as an eight year reward halving interval for the miners.

Both proof-of-stake (in the form of transactions) and proof-of-work will matter for purposes of resolving forks, though, so if you want to come out on top in a fork, you include as many recent transactions as possible.

Also, because the block header is changed in both length and format (I wanted to move the extranonce and current-difficulty to the header where they belong, among other things), there aren't any presently extant ASICs that will mine it.  So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.

For the memory-hard hashes I have been considering momentum, cuckoo cycle and scrypt, all with some tougher parameters than usually found.  

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm I hope.  If I pop it up to 40Gbytes and 60Gbytes, those can be for the server-class-machines (for now - home machines in a few more years).  

At 20Gbytes it's good for a custom-built high-end home machine - GPUs might be as fast there, but GPUs have easier things to do like concentrate on the SHA256D stuff.  

I don't care as much for cuckoo cycle, because it doesn't approach anything near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from Graph theory was a great idea though; it inspires me to look into solvable instances of other classic problems like factoring, etc.  

Factoring, and modular factoring in particular, are very interesting problems that I'd really like to see more serious public research on and publicly developed ASICs for.  The *public* development of ASICs for that problem would force everybody to use better cryptographic keys, which they need to do anyway because I'm pretty sure that *secret* ASIC development has enabled certain parties to read their private communications for a long time now given the size keys some of them are using.

I'm less knowledgeable about the general capabilities of GPUs; I haven't done much programming on them.  The idea of using floating point for anything like consensus sort of gives me hives, and Intel's 80-bit floats are notorious among cryptographers and hard-math people for their inconsistent results on CPUs.  They mysteriously get the bottom 16 bits of mantissa zeroed if the compiler decides to do a register spill to memory - which optimizing compilers do sometimes and don't sometimes.  And if somebody comes out with another FDIV bug or something, it could mean a chain-fork from hell.  Still, if *checking* the PoW is done with some very explicit integer math that has well-defined behavior, it could work.


Cryddit (OP)
March 28, 2015, 10:19:34 PM  #5

Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for Memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out what hashes are better for which families of graphics cards. 

Anyway, GPUs are going to own a lot of the hashing power on this thing for quite a while; aside from the seriously memory-intensive things I'm putting together, GPUs are going to be getting both their own share and the ASIC-able share for the foreseeable future.   And because an equal number of coins is awarded per algorithm, developing an ASIC will only net people 1/(number of algorithms) of the coins anyway; I'm hoping that the prospect of getting 1/10 (or less) of the coins for the effort of developing ASICs will make them a poor business proposition for a long-ish time. But hopefully not forever.   Grin

It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.

Bombadil
March 28, 2015, 10:54:47 PM  #6

Quote from: Cryddit
Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for Memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out what hashes are better for which families of graphics cards.

Well, you could make a new algo altogether per family and make sure it exploits the strengths of that family. Starting from scratch with that in mind will also make it A LOT easier to fully optimize the miner.
It will also generate attention; miners will like such algos, and they will probably be used in future coins too, but then you can claim you were the first.
smolen
March 28, 2015, 11:36:59 PM  #7

Quote from: Bombadil
I can already tell you that the Whirlpool algo performs pretty well on older (<= compute 3.5, up to the 780 Ti) Nvidia cards, while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that much of a difference. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too Cheesy Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do when programmed in CUDA.
Some people also think only AMDs are worth it for mining, while my records show otherwise.
Actually, I did WhirlpoolX for AMD, which is just Whirlpool, and slaughtered high-end Maxwell.  Grin
The NVidia camp is defenseless - sp_ is on vacation, djm34 was busy with Ziftr, the Christians are rather silent. Anyway, I think we are getting close to the GCN hardware limits. The straightforward 8-bit table approach appears to be limited to something like 5 instructions per s-box; wide-block multiplication could save ~25%; my current kernel is slightly above 540 MH/s on a 280x@1150, which equals ~9 ops per s-box. I don't yet have a good representation of the Whirlpool s-box; the best known AES s-box takes 114 AND/XOR/XNOR gates, which with 32-bit registers is ~3.6 ops per s-box, and the linear part of the circuit is probably going to be about the same size, so expect ~7 ops per s-box for bitslicing.

Quote from: Bombadil
And what about systems like Curecoin? They take 45% of a mined block and reward that to folders (Folding@home contributors), off-chain. They also take 10% and use that for the back-end of that system. The latter is an argument for why it might be less popular, but it's an interesting coin and distribution system nonetheless.
The idea of a scientific coin is great, but a good implementation is yet to come. Curecoin performs some calculation by an undisclosed algorithm; we can only hope that it is for good. On the bright side, anonymous microbiology research could become profitable with North Korean or Iranian grants Grin Grin Grin

smolen
March 29, 2015, 12:04:31 AM  #8

Quote from: Cryddit
So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.
Bitcoin almost turned FPGAs into consumer-grade devices and I was too lazy to grab one. Maybe there will be another chance with your coin Smiley

Quote from: Cryddit
For the memory-hard hashes I have been considering momentum, cuckoo cycle and scrypt, all with some tougher parameters than usually found.

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm I hope.  If I pop it up to 40Gbytes and 60Gbytes, those can be for the server-class-machines (for now - home machines in a few more years).  

At 20Gbytes it's good for a custom-built high-end home machine - GPUs might be as fast there, but GPUs have easier things to do like concentrate on the SHA256D stuff.  
Choosing a good base hash function is a tough task. If it is too slow, hash calculation and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Cisco routers would do a bucket sort, distributing hash images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.

Quote from: Cryddit
I don't care as much for cuckoo cycle, because it doesn't approach anything near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from Graph theory was a great idea though; it inspires me to look into solvable instances of other classic problems like factoring, etc.
The nearest vector problem looks like a good candidate, with a decent cryptographic background.

smolen
March 29, 2015, 12:12:48 AM  #9

Quote from: Cryddit
It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.
Dr. Nicolas Courtois talking about "rolling curves"

Cryddit (OP)
March 29, 2015, 02:15:45 AM  #10


Quote from: smolen
Choosing a good base hash function is a tough task. If it is too slow, hash calculation and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Cisco routers would do a bucket sort, distributing hash images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.


I've discovered a couple ways to "tweak" momentum hash.

I can force contention for the table (1 read and 1 write at an unpredicted location, per hash performed) by using a representation trick, so GPU hashing parallelism gets crushed by a bandwidth-to-main-memory requirement.   So I can prevent pipelined hashing from being done without a need to read and write the table.

The representation trick is rather simple, really:  the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2^n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.  This also reduces the effectiveness of the Bloom filter implementation by at least 50%; Bloom filter users will have to throw away their filter/tables every so often, losing their momentum, and build new ones.

Cryddit (OP)
March 29, 2015, 02:29:53 AM  #11

Dr. Courtois has a point about secp256k1.  It's more serious than the choice of one of multiple hash functions, which will just get kicked to the curb by difficulty adjustments if it gets broken: secp256k1 is what actually secures people's coins in Bitcoin and every altcoin I've ever heard of.  There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.

I haven't done a damn thing about it yet in the code I'm working on, but that's a pointed reminder that I really need to before I release.  

I'm nervous about it though; as a rule of thumb the more things you change the more chances you're taking on breaking it.  And I'm changing a lot of stuff.

smolen
March 29, 2015, 11:00:55 AM  #12

Quote from: Cryddit
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2^n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Probably I didn't understand your trick; how will the PoW be verified on small nodes?

Quote from: Cryddit
There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.
Another potential point of failure is the hash function used in the Merkle tree.

MaxDZ8
March 29, 2015, 01:01:09 PM  #13

No such thing as "efficient for home CPUs": CPUs are not intensive compute devices. As long as the algo needs compute, it will be blasted by a proper compute-oriented architecture.

The only thing that makes sense for CPUs is yescrypt; I'm surprised nobody has brought that up. Yescrypt is not compute-oriented and therefore cannot be accelerated by fast compute-oriented architectures. It is surprisingly scalable down to cheap CPUs (that does not imply mining with them is profitable).
Cryddit (OP)
March 29, 2015, 06:03:17 PM  #14

It is not necessary that something should actually be more efficient for home CPUs; in fact, if something is truly efficient for ordinary run-of-the-mill home CPUs, that's like a pledge to feed all the poor starving botnet operators.

And botnet operators are the scum of the earth and I would rather kick them in the face than feed them, so I really and truly don't want anything that wouldn't enable a better class of machine to absolutely run circles around truly ordinary home machines.

What meets my criterion is that different classes of machines, when ranking the various hashing algorithms in order of expected revenue per minute of hashing, should all end up with a different sequence - and preferably with a different *first* algorithm in the sequence.
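To illustrate the ranking criterion, here is a small sketch with made-up hashrates, made-up per-algorithm difficulties, and a Bitcoin-style difficulty-to-hashes conversion; none of these numbers or algorithm labels come from the actual design.
Code:
# Sketch of the ranking criterion: each machine orders the algorithms by
# expected revenue per minute of hashing. All values below are illustrative.
def revenue_per_minute(hashrate_hs: float, difficulty: float, reward: float) -> float:
    # Expected blocks/min = hashrate / (difficulty * 2^32), times the reward per block
    # (assumes Bitcoin-style difficulty units).
    return reward * hashrate_hs / (difficulty * 2**32) * 60

difficulties = {"sha256d": 5e10, "momentum20g": 10.0, "whirlpool": 4e5}
reward = 50.0

machines = {                       # hypothetical per-algorithm hashrates (hashes/s)
    "asic":   {"sha256d": 5e12, "momentum20g": 0.0,  "whirlpool": 0.0},
    "gpu":    {"sha256d": 1e9,  "momentum20g": 10.0, "whirlpool": 5e7},
    "server": {"sha256d": 2e7,  "momentum20g": 80.0, "whirlpool": 1e6},
}

for name, rates in machines.items():
    ranking = sorted(difficulties,
                     key=lambda a: revenue_per_minute(rates[a], difficulties[a], reward),
                     reverse=True)
    print(name, ranking)   # ideally each machine type sees a different first choice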
Cryddit (OP)
March 29, 2015, 06:38:16 PM  #15

Quote from: Cryddit
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2^n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Quote from: smolen
Probably I didn't understand your trick; how will the PoW be verified on small nodes?

Okay...  What's visible in the block is the nonce presented in the block header, which, when the block header is hashed, yields a hash that meets the difficulty target.  You take *that* nonce and deconstruct it into a (64-bit) base nonce and two (smaller, depending on table size) offsets.  The 64-bit base nonce is one of the three momentum nonces; the base nonce minus the first of the two offsets is the second momentum nonce; and the base nonce plus the second of the two offsets is the third momentum nonce (and yes, I have made the nonce field in the header much wider to accommodate real memory-hard algorithms).

Next you take the header and hash it three times - once with each of the three momentum nonces, instead of the original presented nonce.  If the resulting hashes have the required degree of collision, then the presented nonce is valid.  Meaning, it is one of the very few possible presentable nonces among billions which satisfies the triplet property required by momentum hashing.

Nothing that's hard for a small node to check; it has to do a hash, check the difficulty, separate the fields of the presented nonce, do one addition and one subtraction, then do three more hashes and compare the results for collision.
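A minimal sketch of that check, with SHA-256 standing in for the real hash and made-up offset and collision widths; the bit layout of the presented nonce is assumed for illustration, not taken from the actual design.
Code:
import hashlib

OFFSET_BITS = 26                      # hypothetical width of each offset field
COLLISION_BITS = 32                   # hypothetical required collision width

def h(header: bytes, nonce: int) -> int:
    data = header + nonce.to_bytes(16, "little", signed=True)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def verify(header: bytes, presented_nonce: int, target: int) -> bool:
    # 1. The header hashed with the presented nonce must meet the difficulty target.
    if h(header, presented_nonce) > target:
        return False
    # 2. Split the presented nonce into a base nonce and two offsets (assumed layout).
    off2 = presented_nonce & ((1 << OFFSET_BITS) - 1)
    off1 = (presented_nonce >> OFFSET_BITS) & ((1 << OFFSET_BITS) - 1)
    base = presented_nonce >> (2 * OFFSET_BITS)
    n1, n2, n3 = base, base - off1, base + off2
    # 3. The three momentum hashes must collide on the required number of top bits.
    prefixes = {h(header, n) >> (256 - COLLISION_BITS) for n in (n1, n2, n3)}
    return len(prefixes) == 1

# Arbitrary made-up values: almost certainly False; a miner's job is to find
# a presented nonce for which this returns True.
demo_nonce = (123456 << (2 * OFFSET_BITS)) | (7 << OFFSET_BITS) | 5
print(verify(b"example-header", demo_nonce, 1 << 255))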

The trick is that because the two offsets are limited in width, the three momentum nonces must all be found in a narrow subrange of the 64-bit search space.  That means that, once you have populated your table with two collisions per hash  target, you can't just start hashing at top speed presenting whatever's in the table along with the new value as a triplet; because if you do, your hashing will rapidly overshoot the subrange and then you won't be able to form triplets that can be encoded that way because you'd have to use offsets that are too big to fit in the limited offset bit width.  

Instead, you have to keep updating your table.  Every time you find a new nonce, you have to evict the oldest nonce found in its table location (the one nearest, or past, the trailing edge of the subrange you're currently searching) and replace it with the new nonce (which is on the absolute leading edge of the subrange you're currently searching).  That way, when you find the next nonce whose hash matches that target, you don't have the too-old entry in your table (now past the trailing edge of the subrange) preventing you from forming a valid presentable hash.  And this means that you have to do a table read and a table write with every new nonce, instead of just a table read.  Writes are much slower, and mean that hashing can't be parallelized much because of table contention.

Quote from: smolen
Another potential point of failure is the hash function used in the Merkle tree.

Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  

smolen
March 29, 2015, 08:56:46 PM (last edit: March 30, 2015, 01:26:32 AM)  #16

Quote from: smolen
Another potential point of failure is the hash function used in the Merkle tree.
Quote from: Cryddit
Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?
At the time of writing I meant a possible full solution of SHA-256 in the future, but your response reminded me about old concerns with the block header and the inner structure of SHA-256 - and guess who started the discussion here on bitcointalk? Well, SHA-256 splits the input message into 512-bit (64-byte) blocks and processes them in iterations (64 rounds each); the last block is padded with zeroes and the message length. When Bitcoin computes the Merkle root, it mostly (except for the transaction data itself) applies double SHA-256 to 64-byte messages. So the first iteration of the first hash processes a fixed state (the initialization vector) and a variable schedule (two 256-bit child hashes); the second iteration processes a variable 256-bit state (the result of the previous iteration) and a fully fixed schedule (the padding block); the only iteration of the second hash gets the fixed initialization vector and a schedule with 256 variable and 256 fixed bits. Clearly there is some interference between the SHA-256 design and the way it is used in Bitcoin. I see no way to exploit it; on the other hand, double SHA-256 as used in block header hashing seems safe to me too, but Sergio_Demian_Lerner talks about possible attacks! (OK, there is a way to boost mining by throwing away ~40000 binary operations out of ~120000; unfortunately, the days when it could be used with dynamically reconfigured FPGAs are gone.) The proposed new header structure hints that Sergio considers the first 512-bit block in the header too fixed, but I fail to grasp the nature of his concerns.
Speculating about a total solution of SHA-256 (quite unlikely while the NP barrier stands; even Keccak seems more vulnerable despite its bigger inner state, because of its lower height), it could potentially be possible to isolate new nodes by network manipulation, give them a rewritten history with the same hashes, and cause a fork or perform a double-spend. This rather fantastic scenario could be performed only once, given that Bitcoin developers watch the blockchain.
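To make the block structure above concrete, here is what "double SHA-256 of a 64-byte message" looks like for one Merkle node, using plain hashlib and made-up child hashes rather than Bitcoin's actual serialization.
Code:
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Two made-up 32-byte child hashes; their 64-byte concatenation is exactly one
# SHA-256 message block, so the first hash runs two compressions (data + padding)
# and the second hash runs one (32 bytes of data plus padding in a single block).
left  = hashlib.sha256(b"tx-a").digest()
right = hashlib.sha256(b"tx-b").digest()
parent = sha256d(left + right)
print(parent.hex())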

EDIT: Still trying to understand what attack this header change protects against. Sergio dragged as much noise as possible from the transaction data into the schedule part. So maybe the attack works by fixing the schedule (nonce and time in the header) and searching in the state. Or there are some bad schedules that make the state hashing trivial. But there is the Merkle tree; even if degenerate, it's at least one double SHA-256. But there is the obligatory block height in the coinbase, so we cannot precompute more than for the current block. Maybe there is some way to lower the complexity by neutralizing one part of the schedule with a properly calculated rest? Then the nonce plus free time bits would lower the hardness of header hashing, and in the coinbase we have plenty of free variables. Or the ability to do length extension. But Satoshi chose DOUBLE hashing, so enough noise goes through full hashing; how to jump over it? No, the SHA-2 designers chose a fixed IV (even for short messages) and were aware of differentials and linear approximations. A good low-degree approximation? Unbelievable, and that would be a sensation. Mystery. Oh, sorry for the braindump Smiley

Cryddit (OP)
March 30, 2015, 05:46:32 AM (last edit: March 30, 2015, 06:43:41 AM)  #17

I can answer some of that.  

Double SHA256 was chosen over single SHA256 to thwart block boundary extension attacks.  

With single SHA256, the mid-state (the basis, along with the last block, on which the final hash is calculated) would depend only on the data in the first block.  If this were the case, an adversary finding a collision for just the final block of the header could substitute bogus data for the last block.  This would result in a fake header having the same hash (although the second block is different).  Finding a one-block collision, in this case, isn't as difficult as you'd hope, because the Bitcoin header varies only 12 bytes of the final block.

By using SHA256D, the final mid state (AND the final block which is hashed starting from that mid-state) depends on every bit of the header, including the bits in the last block.

This makes finding a collision on the whole header much, much harder.

That's why Satoshi picked SHA256D rather than plain SHA256.  

And from that fact alone I knew he must be a crypto pro.  Most beginners, or even talented, interested programmers who haven't done a lot of crypto already, would never have considered the possibility of an extension attack and modified the application of SHA256 to counter it.  The usual thing for everybody except old hands is that people think of SHA256 as a bulletproof primitive.  But it isn't.  It's sequential, and its state when it comes to any block depends only on the earlier blocks.
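That sequential-state property is easy to see with plain hashlib (a small illustration of the point, not Bitcoin's actual header hashing): once the first 64-byte block has been absorbed, the hash of anything that starts with that block can be finished from the saved state alone.
Code:
import hashlib

block1 = b"A" * 64                      # exactly one SHA-256 message block
tail_a = b"real header tail"
tail_b = b"bogus header tail"

# Absorb block1 once, then reuse the saved state for two different tails.
mid = hashlib.sha256(block1)
h_a = mid.copy(); h_a.update(tail_a)
h_b = mid.copy(); h_b.update(tail_b)

# Same results as hashing the full messages from scratch: the state after
# block1 depends only on block1, which is the property an extension attack
# exploits and which the doubled hash (SHA256D) is meant to blunt.
assert h_a.digest() == hashlib.sha256(block1 + tail_a).digest()
assert h_b.digest() == hashlib.sha256(block1 + tail_b).digest()
print(h_a.hexdigest(), h_b.hexdigest())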

That said, I have absolutely no idea why Satoshi picked secp256k1 for the elliptic-curve stuff.  

And I have no idea what attack Sergio is talking about.  I was pretty sure that with the doubled construction to stop extension attacks, bitcoin's application of SHA256 was flawless.  

Cryddit (OP)
March 30, 2015, 06:18:02 AM (last edit: March 30, 2015, 06:39:34 AM)  #18

Quote from: smolen
Another potential point of failure is the hash function used in the Merkle tree.
Quote from: Cryddit
Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?

Aw crap, you're absolutely right.  Bitcoin (and pretty much every alt) is using a non-standard variation of Merkle trees which is subject to malleability.  Under some circumstances sequences of transactions could be repeated in a block without changing the Merkle root.  If there are clients that respond differently (reprocess, ignore, or reject the block) to the repeated sequence of tx, and somebody puts an attack block out there that contains such a repeated sequence, that could cause a chain fork.
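The duplication is easy to demonstrate with a toy version of Bitcoin's odd-count rule (pair the last hash with itself when a level has an odd number of entries); this sketch ignores Bitcoin's exact byte ordering and serialization.
Code:
import hashlib

def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Bitcoin-style rule: on an odd count, the last hash is paired with itself.
    level = [sha256d(tx) for tx in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

txs = [b"tx-a", b"tx-b", b"tx-c"]
# Repeating the last transaction yields the same root, the malleability described above.
assert merkle_root(txs) == merkle_root(txs + [b"tx-c"])
print(merkle_root(txs).hex())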
smolen
March 30, 2015, 12:23:53 PM  #19

I was a bit confused by the term 'collision' in the description of modified Momentum - was it for all 3 out of 3, or any 2 out of 3? Anyway, both lead to nice puzzles Smiley
A GPU could calculate all the hashes, discard all images except those in a small range that fits in GPU memory, and do the collision search there. So the hash should be slow enough, or the search space big enough, to prevent that. Separate PCs or GPUs could calculate ranges around adjacent base nonces (separated by two offset widths), perform the collision search, and then exchange data layer by layer instead of dropping a layer and calculating the next. So the hash should be fast enough to saturate PCI-E and LAN bandwidth.

New ARM cores (some or all, not sure) have hardware SHA-1, a chance for mobiles to have a stake in mining.

Burst coin employs a novel way to waste resources, HDD mining. Perhaps they failed to address the time-memory tradeoff.

Cryddit (OP)
March 30, 2015, 08:12:29 PM  #20

One of the ways to do momentum hashing is to search for a triplet birthday; that is, find three nonces such that when the block is hashed with each of them, the resulting hashes match (to a certain width). 

There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).  But the idea is that in order to form any kind of valid *whole* momentum nonce you have to find some number of sub-nonces that have some particular mathematical relationship, which you can only find by comparing their results, meaning you have to keep an indexed table of the intermediate results.  The birthday paradox says you can find pairs with some related property fairly quickly, but efficiently finding triplets requires either lotsa luck or a pretty full table.
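For reference, the classic pair-collision version looks roughly like this; a toy sketch with SHA-256 standing in for the real hash and a small collision width, not tuned to any of the parameters discussed above.
Code:
import hashlib

COLLISION_BITS = 24   # toy width; the real scheme would use a much larger table

def momentum_hash(header: bytes, nonce: int) -> int:
    digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big") >> (256 - COLLISION_BITS)

def find_birthday_pair(header: bytes) -> tuple[int, int]:
    # The indexed table of intermediate results: collision prefix -> nonce.
    table: dict[int, int] = {}
    nonce = 0
    while True:
        prefix = momentum_hash(header, nonce)
        if prefix in table:
            return table[prefix], nonce   # two nonces whose hashes match to COLLISION_BITS
        table[prefix] = nonce
        nonce += 1

print(find_birthday_pair(b"example-header"))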
