Bitcoin Forum

Alternate cryptocurrencies => Mining (Altcoins) => Topic started by: Cryddit on March 28, 2015, 01:37:33 AM



Title: Looking for PoW recommendations.
Post by: Cryddit on March 28, 2015, 01:37:33 AM
I'm looking for a set of ~10 or so hashing algorithms suitable for securing a block chain, and if at all possible I'd like them to cater to machines with strengths as *varied* as possible.  Ideally, people with different machines should have different inflection points, on the scale of hashes-per-minute versus current-difficulty-for-each-algorithm, for switching from one algorithm to another - so that in the "steady state", where each algorithm is getting the same number of blocks and difficulty remains constant, different types of machine would be working on each algorithm and nobody would have a reason to switch.

In the long run, the difficulty adjusting semi-independently for each algorithm would mean that each gets an approximately equal number of blocks.

So I'm looking for hashing algorithms...
one more efficient for CPUs with server-sized memory,
one more efficient for CPUs on high-end home machines,
one more efficient for ASICs and small-memory devices
    (this is easy -- SHA256D)
two more efficient for GPUs (because there are two main
    families of GPUs out there with different strengths),
one more efficient for single-threaded (like ARM) processors

...etc.

As miners I figure you guys probably know a bit about this stuff, so I would like to ask for your recommendations, if you have any.

I don't really believe there's much of anything that a high-end home machine is actually more efficient at than a server-class machine, but my intent is that the server-class machines should usually have a different option available that saves them the effort of competing with the home machines.  The home machines should therefore have at least one algorithm pretty much to themselves most of the time, and only deal with competition from the server-class crew when their algorithm's difficulty dips extra-low.

"Efficiency" here is therefore sort of game-theoretic; it would mean that anybody who can do what you can better, *usually* has something more suited to their own machine's strengths that they can be doing instead.  More capable machines (well, other than ASICs) give you more options for switching between algorithms as their respective difficulties vary, but each type should have a "niche" where everybody else usually has something better to do.

Anyway, I think this is a worthwhile thing because it would keep a good variety in what kind of machines (and therefore what socio-economic classes of people) can get blocks.

Thanks for any response you have; even if you don't have specific recommendations I'd like to hear your thoughts.


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 28, 2015, 10:27:57 AM
Let's see. There is the Momentum (https://bitcointalk.org/index.php?topic=313479.0;all) function, based on the birthday paradox, with an interesting idea that hashing speed increases during the nonce range scan as memory gets filled. The parameters chosen for the Protoshares coin were too weak to keep GPUs away ;) There is the FPGA-friendly Blake-256 (https://bitcointalk.org/index.php?topic=306894.0). And there is a good proposal to use a hard problem from graph theory, Cuckoo Cycle (https://bitcointalk.org/index.php?topic=451177).

Also
Hi again, SolidCoin developers!

After dinner I came up with another two ideas:

1) Instead of evaluating a total recursive function over a commutative ring of integers, you could try a simpler thing: require evaluating a specific value of a primitive recursive function over the field of reals that has a fractional dimension. A good starting example:

http://en.wikipedia.org/wiki/Logistic_map

just pick a value of r that is in the chaotic region.

Implement the reals as long double, which is 80 bits (10 bytes) on Intel/AMD CPUs. Not all C++ compilers really support long double, but the logistic map function is so trivial that you can include inline assembly, which will be on the order of 10 lines.
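For concreteness, here is a minimal sketch of idea (1) - an illustration only, not SolidCoin code; the seed would really be derived from the block header, and r = 3.99 is just one value in the chaotic region:

#include <cstdio>
#include <cstring>

int main() {
    const long double r = 3.99L;  // inside the chaotic region of the logistic map
    long double x = 0.37L;        // stand-in seed; a PoW would derive this from the header
    for (int i = 0; i < 1000000; ++i)
        x = r * x * (1.0L - x);   // x_{n+1} = r * x_n * (1 - x_n)
    unsigned char raw[10] = {0};  // the x87 80-bit value occupies the low 10 bytes
    std::memcpy(raw, &x, sizeof raw);
    for (unsigned i = 0; i < sizeof raw; ++i)
        std::printf("%02x", raw[i]);  // a PoW would compare these bytes against a target
    std::printf("\n");
    return 0;
}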

2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

Again, good luck!


EDIT: Oops, forgot about this discussion
Quote
2) This is a variant of the above that embraces the enemy instead of fighting it. Implement (1) with reals as cl_double, which is supported in OpenCL by NVidia cards but not by AMD cards. Then short AMD stock and go long NVDA for additional gains.

I'm pretty sure this isn't true; cl_double is used in the example source code bundled with AMD's APP SDK.

The difference in IEEE 754 implementation between nVidia and AMD GPUs is that nVidia tends to be congruent with the results that x86/x64 processors spit out, while AMD's results are not, because its floating point is less precise.

But as far as I know if you can do it in the FPUs and it's invertible, you can do it in integer logic too.  Whether the FPU operations would hold an advantage in that case, I do not know.
Indeed, it seems AMD has caught up (http://stackoverflow.com/questions/9114039/differences-between-cl-khr-fp64-and-cl-amd-fp64) on double precision.
If there is some math operation whose output is bit-for-bit identical across all NVIDIA hardware but differs in its least significant bits from x86 and AMD, such an operation could be used for an NVIDIA-only coin. Simulating those least significant bits would be possible, but expensive. IEEE 754 floating-point math is perhaps deterministic and the same across different hardware, but in fast-math mode NVIDIA and AMD could differ.



Title: Re: Looking for PoW recommendations.
Post by: Bombadil on March 28, 2015, 01:15:24 PM
If you want to make/find separate Nvidia- and AMD-friendly algos, you'll need to make sure every algo is 100% optimized before you'll see the differences.
But >90%-optimized algos are hard to find; the last one was probably Scrypt. Many new algos keep popping up these days, so no algo performs at max speed - there's always room for new improvements.
So the key would be to pick heavily optimized algos and compare them once that has happened. Older and more popular algos are more likely to be optimized (X11, Quark, etc.).

I can already tell you that the Whirlpool algo (https://bitcointalk.org/index.php?topic=679134.0) performs pretty well on older (<= compute 3.5, <= 780 Ti) Nvidia cards, while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that much of a margin. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too :D Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do when programmed in CUDA.
Some people also think only AMDs are worth mining with, while my records show otherwise.

Don't forget about Scrypt for ASICs too ;)

There's also this peculiar algo that requires at least 15GB of RAM, called Ramhog (https://bitcointalk.org/index.php?topic=655789.0).

Maybe add PoS too, as it opens mining up to those who don't have mining equipment. A low % reward (1-10) or even a fixed reward might be the right way. High-% PoS is only interesting for pure PoS coins.

And what about systems like Curecoin (https://bitcointalk.org/index.php?topic=330685.0)? They take 45% of a mined block and reward that to folders, off-chain. They also take 10% and use it for the back-end of that system. The latter is an argument for why it might be less popular, but it's an interesting coin and distribution system nonetheless.


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 28, 2015, 06:22:29 PM
My current plan is to release with a Linux daemon and Linux Qt client with built-in CPU miner only.  The source code will be available (except for the genesis block nonce) four weeks before launch. After launch, there'll be a 4-week period of 1/10x normal rewards, then ten weeks of ramping up to normal reward levels at ten percent per week.  I *think* that's enough time for somebody who has a Windows machine to port things to Windows, somebody who has a Mac to port things to whatever they're calling their operating system these days (are they still on cats?), somebody who has cudaminer to write the appropriate cudaminer plugins, etc., before they miss out on much.  Anyway, after that ramp-up period, the proof-of-work award will be a constant amount per block forever, so it's not like somebody's going to be missing out on a huge instamine if they don't have their mining client ready.

Proof-of-stake awards will exist, but they'll be mostly for the users rather than the miners, and will only start to matter very, very, slowly.  When a transaction is made, the input coins will get ten percent stake award per year for up to two years since the tx that created them.  Most of that will go to the people making the transaction, but a tiny fraction will go (along with tx fees) to the miners.  Under certain circumstances (txIn unspent for over a year) the miner would get up to half of the stake award for a particular txIn, but I think that'll be rare.  People will usually make a tx to themselves, rather than enter a no-interest (for them) period a year in.

Right now my plan is that blocks will always be formed on the basis of proof-of-work.  It'll take about eight years before there are as many coins created by proof-of-stake (and mostly awarded to those who make transactions) as there are formed by proof-of-work (and awarded to miners).  You can think of that, if you like, as an eight year reward halving interval for the miners.

Both (transactions as) proof-of-stake and proof-of-work will matter for purposes of resolving forks, though, so if you want to come out on top in a fork, you include as many recent transactions as possible.

Also, because the block header is changed in both length and format (I wanted to move the extranonce and current-difficulty to the header where they belong, among other things), there aren't any presently extant ASICs that will mine it.  So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.

For the memory-hard hashes I have been considering momentum, cuckoo cycle and scrypt, all with some tougher parameters than usually found.  

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm I hope.  If I pop it up to 40Gbytes and 60Gbytes, those can be for the server-class machines (for now - home machines in a few more years).  

At 20Gbytes it's good for a custom-built high-end home machine - GPUs might be as fast there, but GPUs have easier things to do like concentrate on the SHA256D stuff.  
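For readers following along, here is a toy sketch of the basic (unmodified) momentum birthday search - an illustration with stand-in parameters; toy_hash stands in for the real cryptographic hash of (header, nonce) (Protoshares used SHA-512), and k is the collision width:

#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Stand-in for the real hash of (header || nonce); an actual coin would use a
// cryptographic hash here.
static uint64_t toy_hash(uint64_t header, uint64_t nonce) {
    uint64_t x = header ^ (nonce * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL;
    x ^= x >> 33; x *= 0xC4CEB9FE1A85EC53ULL;
    return x ^ (x >> 33);
}

int main() {
    const int k = 32;                                // collision width in bits
    const uint64_t header = 0x1234567890ABCDEFULL;   // stand-in header digest
    std::unordered_map<uint64_t, uint64_t> table;    // k-bit prefix -> nonce
    for (uint64_t nonce = 0;; ++nonce) {
        uint64_t prefix = toy_hash(header, nonce) >> (64 - k);
        auto it = table.find(prefix);
        if (it != table.end()) {                     // birthday collision found
            std::printf("collision: nonces %llu and %llu\n",
                        (unsigned long long)it->second, (unsigned long long)nonce);
            return 0;
        }
        table.emplace(prefix, nonce);                // the table is the memory-hardness
    }
}

The table is why memory matters: once it is nearly full, almost every new hash completes a pair, which is the asymptotic efficiency described above.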

I don't care as much for cuckoo cycle, because it doesn't approach anything near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from graph theory was a great idea though; it inspires me to look into solvable instances of other classic problems like factoring, etc.  

Factoring, and modular factoring in particular, are very interesting problems that I'd really like to see more serious public research on, and publicly developed ASICs for.  The *public* development of ASICs for that problem would force everybody to use better cryptographic keys, which they need to do anyway, because I'm pretty sure that *secret* ASIC development has enabled certain parties to read their private communications for a long time now, given the key sizes some of them are using.

I'm less knowledgeable about the general capabilities of GPUs; I haven't done much programming on them.  The idea of using floating point for anything like consensus sort of gives me hives, and Intel's 80-bit floats are notorious among cryptographers and hard-math people for their inconsistent results on CPUs.  They mysteriously get the bottom 16 bits of mantissa zeroed if the compiler decides to do a register spill to memory - which optimizing compilers sometimes do and sometimes don't.  And if somebody comes out with another FDIV bug or something, it could mean a chain-fork from hell.  Still, if *checking* the PoW is done with some very explicit integer math that has well-defined behavior, it could work.  




Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 28, 2015, 10:19:34 PM
Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out which hashes are better for which families of graphics cards. 

Anyway, GPUs are going to own a lot of the hashing power on this thing for quite a while; aside from the seriously memory-intensive things I'm putting together, GPUs are going to be getting both their own share and the ASIC-able share for the foreseeable future.   And because an equal number of coins is awarded per algorithm, developing an ASIC will only net people 1/(number of algorithms) of the coins anyway; I'm hoping that the prospect of getting 1/10 (or less) of the coins for the effort of developing ASICs will make them a poor business proposition for a long-ish time. But hopefully not forever.   ;D

It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.



Title: Re: Looking for PoW recommendations.
Post by: Bombadil on March 28, 2015, 10:54:47 PM
Thanks for the pointers to Ramhog and Whirlpool.  It's fairly easy to target several memory sizes for memory-intensive and CPU algorithms, and the ASIC-able algorithms are also easy to find.  But I'm really sort of at a loss to figure out which hashes are better for which families of graphics cards. 

Well, you could make a new algo altogether per family and make sure it exploits the strengths of that family. Starting from scratch with that in mind will also make it A LOT easier to fully optimize the miner.
It will also generate attention; miners will like such algos, and they'll probably be used in future coins too - and then you can claim you were the first.


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 28, 2015, 11:36:59 PM
I can already tell you that the Whirlpool algo (https://bitcointalk.org/index.php?topic=679134.0) performs pretty well on older (<= compute 3.5, <= 780 Ti) Nvidia cards, while the newer Nvidia cards and most AMDs are in the same region. Quark does great on recent Nvidias, but not by that much of a margin. It all depends on optimization levels, I suppose.
I'm glad you're thinking about us cudaminers too :D Many devs release new algos with just an OpenCL miner. It works on Nvidias, but mostly at 1/3 of what it could do when programmed in CUDA.
Some people also think only AMDs are worth mining with, while my records show otherwise.
Actually, I did WhirlpoolX for AMD, which is just Whirlpool, and slaughtered high-end Maxwell.  ;D
The NVidia camp is defenseless - sp_ is on vacation, djm34 was busy with Ziftr, the Christians are rather silent. Anyway, I think we are getting close to the GCN hardware limits. The straightforward 8-bit table approach appears to be limited by something like 5 instructions per s-box (https://bitcointalk.org/index.php?topic=977245.msg10875244#msg10875244), wide-block multiplication could save ~25% (https://bitcointalk.org/index.php?topic=826901.msg10735642#msg10735642), and my current kernel is slightly above 540 MH/s on a 280x@1150, which equals ~9 ops/s-box. I don't yet have a good representation of the Whirlpool s-box; the best known for AES takes 114 AND/XOR/NXOR gates, which with 32-bit registers is ~3.6 ops/s-box, and the linear part of the circuit is probably going to be of the same size, so expect ~7 ops/s-box for bitslicing.

And what about systems like Curecoin (https://bitcointalk.org/index.php?topic=330685.0)? They take 45% of a mined block and reward that to folders, off-chain. They also take 10% and use it for the back-end of that system. The latter is an argument for why it might be less popular, but it's an interesting coin and distribution system nonetheless.
The idea of a scientific coin is great, but a good implementation is yet to come. Curecoin performs some calculation by an undisclosed algorithm; we can only hope that it is for good. On the bright side, anonymous microbiology research could become profitable with North Korean or Iranian grants ;D ;D ;D


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 29, 2015, 12:04:31 AM
So there'll be a long period of GPU and CPU co-dominance before any ASICs exist (if mining the thing ever becomes valuable enough to justify ASIC development at all).  Still, I want to have at least one hashing algo that will be obvious "low hanging fruit" for the ASIC developers so that if and when they get there they can leave the other algos alone.
Bitcoin almost turned FPGAs into consumer-grade devices, and I was too lazy to grab one. Maybe there will be another chance with your coin :)

For the memory-hard hashes I have been considering momentum, cuckoo cycle and scrypt, all with some tougher parameters than usually found.  

I particularly like momentum hash, because somebody who can build a complete table should asymptotically approach one momentum hash per colliding-hash performed.  This makes it very efficient for a particular amount of memory, with no advantage for greater amounts, so it seems like a good way to tune for machines with different levels of memory.  I've seen how it broke for Protoshares, though, and that's a pretty general break.  It means GPU memory counts for about 10x main memory in terms of hashing speed, so if the complete momentum table is less than 20Gbytes, it'll be faster on GPU.  That's a pretty tough spec for a high-end home machine, but if GPUs have no advantage there, the GPU miners will be working on a different algorithm I hope.  If I pop it up to 40Gbytes and 60Gbytes, those can be for the server-class machines (for now - home machines in a few more years).  

At 20Gbytes it's good for a custom-built high-end home machine - GPUs might be as fast there, but GPUs have easier things to do like concentrate on the SHA256D stuff.  
Choosing a good base hash function is a tough task. If it is too slow, hash calculations and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Ciscos would do the bucket sort, distributing images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.

I don't care as much for cuckoo cycle, because it doesn't approach anything near the asymptotic hashing efficiency for large-memory machines that momentum does.  Using a classic example from graph theory was a great idea though; it inspires me to look into solvable instances of other classic problems like factoring, etc. 
The nearest vector problem (https://en.wikipedia.org/wiki/Lattice_problem) looks like a good candidate with a decent cryptographic background.


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 29, 2015, 12:12:48 AM
It's weird to think that if some hash function gets completely broken, the difficulty adjustments will immediately kick it to the curb and after a few quick blocks are awarded to it, life will continue as though it hadn't been broken at all.
Dr. Nicolas Courtois talking about "rolling curves" (http://blog.bettercrypto.com/?p=1004)


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 29, 2015, 02:15:45 AM

Choosing a good base hash function is a tough task. If it is too slow, hash calculations and collision search could be separated: GPUs would serve as hashing co-processors for workstations with loads of memory, or Ciscos would do the bucket sort, distributing images between small PCs in a cluster. If it is too small, the algorithm could be vulnerable to SAT solvers or algebraic attacks.


I've discovered a couple ways to "tweak" momentum hash.

I can force contention for the table (1 read and 1 write at an unpredicted location per hash performed) by using a representation trick, so GPU hashing parallelism gets crushed by a bandwidth-to-main-memory requirement.   That way I can prevent pipelined hashing from being done without a need to read and write the table.

The representation trick is rather simple, really:  the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2^n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.  This also reduces the effectiveness of the Bloom filter implementation by at least 50%; Bloom filter users will have to throw away their filter/tables every so often, losing their momentum, and build new ones.



Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 29, 2015, 02:29:53 AM
Dr. Courtois has a point about secp256k1.  It's more serious than the choice of one of multiple hash functions, which will just get kicked to the curb by difficulty adjustments if it gets broken: secp256k1 is what actually secures people's coins in Bitcoin and every altcoin I've ever heard of.  There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.

I haven't done a damn thing about it yet in the code I'm working on, but that's a pointed reminder that I really need to before I release.  

I'm nervous about it though; as a rule of thumb the more things you change the more chances you're taking on breaking it.  And I'm changing a lot of stuff.



Title: Re: Looking for PoW recommendations.
Post by: smolen on March 29, 2015, 11:00:55 AM
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Probably I didn't understand your trick, how PoW will be verified on small nodes?

There is no known attack, but it isn't as well studied and well vetted as some other curves, and at least one theoretical weakness (which could eventually turn into an attack) has been found.  It doesn't affect mining/block chain security, but if there actually is a viable attack it HORRIBLY affects the security of the coins in the block chain against theft.
Another potential point of failure is the hash function used in the Merkle tree.


Title: Re: Looking for PoW recommendations.
Post by: MaxDZ8 on March 29, 2015, 01:01:09 PM
No such thing as "efficient for home CPUs": CPUs are not intensive compute devices. As long as the algo needs compute, it will be blasted by a proper compute-oriented architecture.

The only thing that makes sense for CPUs is yescrypt; I'm surprised nobody has pulled that out. Yescrypt is not compute-oriented and therefore cannot be accelerated by fast compute-oriented architectures. It is surprisingly scalable down to cheap CPUs (which does not imply mining with them is profitable).


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 29, 2015, 06:03:17 PM
It is not necessary that something should actually be more efficient for home CPUs; in fact, if something is truly efficient for ordinary run-of-the-mill home CPUs, that's like a pledge to feed all the poor starving botnet operators. 

And botnet operators are the scum of the earth and I would rather kick them in the face than feed them, so I really and truly don't want anything that wouldn't enable a better class of machine to absolutely run circles around truly ordinary home machines.

What meets my criterion is that different classes of machines, when arranging the various hashing algorithms in order of expected revenue per minute of hashing, should all end up with a different sequence - and preferably with a different *first* algorithm in the sequence.


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 29, 2015, 06:38:16 PM
the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2n.  If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
Probably I didn't understand your trick, how PoW will be verified on small nodes?

Okay...  What's visible in the block is the nonce presented in the block header, which, when the block header is hashed, yields a hash that meets the difficulty target.  You take *that* nonce and deconstruct it into a (64-bit) base nonce and two smaller offsets (whose width depends on table size).  The 64-bit base nonce is the first of the three momentum nonces; the base nonce minus the first offset is the second momentum nonce; and the base nonce plus the second offset is the third momentum nonce.  (And yes, I have made the nonce field in the header much wider to accommodate real memory-hard algorithms.)

Next you take the header and hash it three times - once with each of the three momentum nonces in place of the originally presented nonce.  If the resulting hashes have the required degree of collision, then the presented nonce is valid - meaning it is one of the very few presentable nonces among billions which satisfies the triplet property required by momentum hashing.

Nothing that's hard for a small node to check; it has to do a hash, check the difficulty, separate the fields of the presented nonce, do one addition and one subtraction, then do three more hashes and compare the results for collision.
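A sketch of that verification in C++ (an illustration only; the widths are stand-ins, and the two helper functions stand for the coin's real base hash and difficulty test):

#include <cstdint>

struct Header;  // whatever the real header type is

// Assumed helpers, not defined in this sketch:
uint64_t hash_header(const Header& h, uint64_t momentum_nonce);  // base hash with substituted nonce
bool     meets_target(const Header& h);                          // difficulty check on presented nonce

const int kOffsetBits    = 20;  // stand-in offset field width
const int kCollisionBits = 50;  // required collision width

bool verify_pow(const Header& h, uint64_t base, uint64_t off1, uint64_t off2) {
    if ((off1 >> kOffsetBits) || (off2 >> kOffsetBits)) return false;  // malformed offsets
    if (!meets_target(h)) return false;        // one hash plus the difficulty comparison
    uint64_t a = hash_header(h, base);         // three more hashes: the base nonce,
    uint64_t b = hash_header(h, base - off1);  // the base minus the first offset,
    uint64_t c = hash_header(h, base + off2);  // and the base plus the second offset
    uint64_t mask = ~0ULL << (64 - kCollisionBits);
    return (a & mask) == (b & mask) && (b & mask) == (c & mask);  // triplet collision
}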

The trick is that because the two offsets are limited in width, the three momentum nonces must all be found within a narrow subrange of the 64-bit search space.  That means that once you have populated your table with two collisions per hash target, you can't just start hashing at top speed, presenting whatever's in the table along with the new value as a triplet; if you do, your hashing will rapidly overshoot the subrange, and then you won't be able to form triplets that can be encoded that way, because you'd have to use offsets that are too big to fit in the limited offset bit width.  

Instead, you have to keep updating your table.  Every time you find a new nonce, you have to evict the oldest nonce found in its table location (the one nearest, or past, the trailing edge of the subrange you're currently searching) and replace it with the new nonce (which is on the absolute leading edge of the subrange you're currently searching).  That way, when you find the next nonce whose hash matches that target, you don't have the too-old entry in your table (now past the trailing edge of the subrange) preventing you from forming a valid presentable hash.  And this means that you have to do a table read and a table write with every new nonce, instead of just a table read.  Writes are much slower, and mean that hashing can't be parallelized much, because of table contention.
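And a sketch of that miner-side table discipline (same caveats; the two-nonce slot layout is an assumption, the caller sizes the table to 2^k slots, and spurious early candidates from the zero-initialized slots are assumed to be filtered inside try_present):

#include <cstdint>
#include <vector>

uint64_t hash_base(uint64_t header_digest, uint64_t nonce);               // assumed base hash
void     try_present(uint64_t oldest, uint64_t middle, uint64_t newest);  // assumed: encode offsets, test difficulty

struct Slot { uint64_t nonce[2]; };  // the two most recent nonces seen for this hash target

void mine(uint64_t header_digest, std::vector<Slot>& table, int k, uint64_t max_offset) {
    for (uint64_t nonce = 1;; ++nonce) {
        uint64_t h = hash_base(header_digest, nonce);
        Slot& s = table[h >> (64 - k)];   // the unpredicted table READ
        // A triplet is only encodable while both stored nonces remain inside
        // the offset window behind the current, leading-edge nonce.
        if (nonce - s.nonce[0] <= max_offset && nonce - s.nonce[1] <= max_offset)
            try_present(s.nonce[0], s.nonce[1], nonce);
        s.nonce[0] = s.nonce[1];          // evict the oldest entry...
        s.nonce[1] = nonce;               // ...the unpredicted table WRITE
    }
}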

Another potential point of failure is the hash function used in the Merkle tree.

Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  



Title: Re: Looking for PoW recommendations.
Post by: smolen on March 29, 2015, 08:56:46 PM
Another potential point of failure is the hash function used in the Merkle tree.
Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  
 At the time of writing I meant a possible full solution of SHA-256 in the future, but your response reminded me of old concerns with the block header and the inner structure of SHA-256 - and guess who started a discussion about it here on bitcointalk (https://bitcointalk.org/index.php?topic=626377.0)? Well, SHA-256 splits the input message into 512-bit (64-byte) blocks and processes them in iterations (64 rounds each); the last block is padded with zeroes and the message length. When Bitcoin computes a Merkle root, it mostly (except for the transaction data itself) applies double SHA-256 to 64-byte messages. So the first iteration of the first hash processes a fixed state (the initialization vector) and a variable schedule (two 256-bit child hashes); the second iteration processes a variable 256-bit state (the result of the previous iteration) and a fully fixed schedule (the padding block); and the only iteration of the second hash gets a fixed initialization vector and a schedule with 256 variable and 256 fixed bits. Clearly there is some interference between the SHA-256 design and the way it is used in Bitcoin. I see no way to exploit it - on the other hand, double SHA-256 as used in block header hashing seems safe to me too - but Sergio_Demian_Lerner (https://bitcointalk.org/index.php?action=profile;u=24826) talks about possible attacks! (OK, there is a way to boost mining by throwing away ~40000 binary operations out of ~120000; unfortunately, the days when that could be used with dynamically reconfigured FPGAs are gone.) The proposed new header structure hints that Sergio considers the first 512-bit block of the header too fixed, but I fail to grasp the nature of his concerns.
 Speculating about a total solution of SHA-256 (quite unlikely while the NP barrier stands; even Keccak seems more vulnerable, despite its bigger inner state, because of its lower height), it could potentially be possible to isolate new nodes by network manipulation, give them a rewritten history with the same hashes, and cause a fork or perform a double-spend. This rather fantastic scenario could be performed only once, given that the Bitcoin developers watch the blockchain.

EDIT: Still trying to understand what attack this header change protects from. Sergio dragged as much noise as possible from the transaction data into the schedule part. So maybe the attack works by fixing the schedule (nonce and time in the header) and searching in the state. Or there are some bad schedules that make state hashing trivial. But there is the Merkle tree; even if degenerate, it's at least one double SHA-256. And there is the obligatory block height in the coinbase, so we cannot precompute more than the current block. Maybe there is some way to lower the complexity by neutralizing one part of the schedule with a properly calculated rest? Then nonce + free time bits would lower the hardness of header hashing, and in the coinbase we have plenty of free variables. Or the ability to do length extension. But Satoshi chose DOUBLE hashing, so enough noise will go through full hashing - how to jump over it? No, the SHA-2 developers chose a fixed IV (even for short messages) and were aware of differentials and linear approximations. A good low-degree approximation? Unbelievable, and that would be a sensation. Mystery. Oh, sorry for the braindump :)


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 30, 2015, 05:46:32 AM
I can answer some of that.  

Double SHA256 was chosen over single SHA256 to thwart block boundary extension attacks.  

With single SHA256, the mid-state (the basis, along with the last block, on which the final hash is calculated) would depend only on the data in the first block.  If that were the case, an adversary finding a collision for just the final block of the header could substitute bogus data for the last block.  This would result in a fake header having the same hash (although the second block is different).  Finding a one-block collision in this case isn't as difficult as you'd hope, because the Bitcoin header varies only 12 bytes of the final block.  

By using SHA256D, the final mid state (AND the final block which is hashed starting from that mid-state) depends on every bit of the header, including the bits in the last block.

This makes finding a collision on the whole header much, much harder.
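In code, the construction is just this (a sketch using OpenSSL's one-shot SHA256(); any SHA-256 implementation would do):

#include <openssl/sha.h>
#include <cstddef>
#include <cstdint>

// SHA256D: hash the header, then hash the 32-byte digest. The outer hash's
// input depends on every bit of the header, so a collision confined to the
// header's last 64-byte block no longer gives a whole-header collision.
void sha256d(const uint8_t* data, size_t len, uint8_t out[SHA256_DIGEST_LENGTH]) {
    uint8_t inner[SHA256_DIGEST_LENGTH];
    SHA256(data, len, inner);
    SHA256(inner, sizeof inner, out);
}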

That's why Satoshi picked SHA256D rather than plain SHA256.  

And from that fact alone I knew he must be a crypto pro.  Most beginners, or even talented, interested programmers who haven't done a lot of crypto, would never have considered the possibility of an extension attack and modified the application of SHA256 to counter it.  The usual thing, for everybody except old hands, is to think of SHA256 as a bulletproof primitive.  But it isn't.  It's sequential, and its state when it comes to any block depends only on the earlier blocks.  

That said, I have absolutely no idea why Satoshi picked secp256k1 for the elliptic-curve stuff.  

And I have no idea what attack Sergio is talking about.  I was pretty sure that with the doubled construction to stop extension attacks, bitcoin's application of SHA256 was flawless.  



Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 30, 2015, 06:18:02 AM
Another potential point of failure is the hash function used in the Merkle tree.

Oh?  I haven't heard of anything dodgy or suspect about the Merkle tree implementation.  What's better about something else?  

Aw crap, you're absolutely right.  Bitcoin (and pretty much every alt) is using a non-standard variation of Merkle trees which is subject to malleability.   Under some circumstances, sequences of transactions could be repeated in a block without changing the Merkle root.  If there are clients that respond differently (reprocess, ignore, or reject the block) to the repeated sequence of tx, and somebody puts an attack block out there that contains such a repeated sequence, that could cause a chain fork.  
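A sketch of the flaw being described (an illustration; double SHA-256 pairing as in Bitcoin): because an odd-width level duplicates its last entry, the leaf lists [a, b, c] and [a, b, c, c] produce the same root.

#include <openssl/sha.h>
#include <array>
#include <cstring>
#include <vector>

using Hash = std::array<unsigned char, 32>;

static Hash hash_pair(const Hash& l, const Hash& r) {
    unsigned char buf[64], inner[32];
    std::memcpy(buf, l.data(), 32);
    std::memcpy(buf + 32, r.data(), 32);
    SHA256(buf, sizeof buf, inner);        // double SHA-256, as Bitcoin does
    Hash out;
    SHA256(inner, sizeof inner, out.data());
    return out;
}

// Bitcoin-style Merkle root: an odd-width level duplicates its last hash.
Hash merkle_root(std::vector<Hash> level) {
    while (level.size() > 1) {
        if (level.size() % 2) level.push_back(level.back());  // the culprit
        std::vector<Hash> next;
        for (size_t i = 0; i < level.size(); i += 2)
            next.push_back(hash_pair(level[i], level[i + 1]));
        level = next;
    }
    return level[0];
}
// Consequence: merkle_root({a,b,c}) == merkle_root({a,b,c,c}) even though the
// transaction lists differ - the malleability described above.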


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 30, 2015, 12:23:53 PM
I was a bit confused by the term 'collision' in the description of modified Momentum - was it for all 3 out of 3, or any 2 out of 3? Anyway, both lead to nice puzzles :)
A GPU could calculate all the hashes, discard all images except those in a small range that fits in GPU memory, and do the collision search there. So the hash should be slow enough, or the search space big enough, to prevent that. Separate PCs or GPUs could calculate ranges around adjacent base nonces (separated by two offset widths), perform the collision search, and then exchange data layer by layer instead of dropping a layer and calculating the next. So the hash should be fast enough to saturate PCI-E and LAN bandwidth.

New ARM cores (some or all, not sure) have hardware SHA1 - a chance for mobiles to have a stake in mining.

Burst coin employs a novel way to waste resources: HDD mining. Perhaps they failed to address the time-memory tradeoff.


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 30, 2015, 08:12:29 PM
One of the ways to do momentum hashing is to search for a triplet birthday; that is, find three nonces such that when the block is hashed with each of them, the resulting hashes match (to a certain width). 

There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).  But the idea is that in order to form any kind of valid *whole* momentum nonce, you have to find some number of sub-nonces that have some particular mathematical relationship, which you can only find by comparing their results - meaning you have to keep an indexed table of the intermediate results.  The birthday paradox says you can find pairs with some related property fairly quickly, but efficiently finding triplets requires either lotsa luck or a pretty full table. 



Title: Re: Looking for PoW recommendations.
Post by: djm34 on March 30, 2015, 09:50:33 PM
One of the ways to do momentum hashing is to search for a triplet birthday; that is, find three nonces such that when the block is hashed with each of them, the resulting hashes match (to a certain width). 

There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).  But the idea is that in order to form any kind of valid *whole* momentum nonce, you have to find some number of sub-nonces that have some particular mathematical relationship, which you can only find by comparing their results - meaning you have to keep an indexed table of the intermediate results.  The birthday paradox says you can find pairs with some related property fairly quickly, but efficiently finding triplets requires either lotsa luck or a pretty full table. 



could be interesting... (haven't had time yet to read through the whole discussion)


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 31, 2015, 02:28:29 AM
There are variations on the theme, some of which work by making it deliberately harder to construct an efficiently searchable compact index (forcing more table searching for each sub-nonce considered).
It provokes one to create an efficiently searchable and updatable index - that's good for a database but, I think, wrong for mining. The scarcest resource is going to be memory bandwidth; we can sacrifice memory and then CPU for it. The miner does not have to be deterministic - some nonces could be thrown away. I'd start with a set of buckets with a ring buffer inside each, advance through nonces in big chunks; newly calculated hashes are first uploaded (from cache to main memory) into buckets, and then, when an entire new chunk has been distributed, each bucket is downloaded back to fast cache and searched for triplets with a hash table. Memory access is going to be smooth; maybe even some HDD bandwidth could be used :)


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 31, 2015, 02:45:03 AM
More fun around the Bitcoin block structure - the order of transactions in a block is almost unrestricted, so a couple of KB of hidden data could be embedded into the blockchain per 1MB block. If updating nonce2 becomes the bottleneck for mining and transaction fees become significant, a miner could iterate through transaction orders by swapping children in nodes; the optimal block would contain a power-of-two number of transactions - 64, 128, 256, etc.
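(Sanity-checking that estimate with rough numbers of my own: a 1MB block holds on the order of 2000 typical transactions, and an ordering of n items encodes log2(n!) bits; by Stirling's approximation, log2(2000!) ~ 2000 * log2(2000/e) ~ 19000 bits ~ 2.3 KB, consistent with the couple-of-KB figure.)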


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 31, 2015, 03:13:06 AM
I figured out the solution to another problem; unfortunately this solution will completely screw with fine distinctions among hashing algorithms.  Pretty much the only meaningful parameter remaining to tweak is the amount of memory it'll take, because no matter what happens, mining will be CPU-bound by the rate at which miners can calculate signatures.

And yes, that can be parallelized, and GPU'd, and ASIC'd.  I can screw with momentum hash badly enough to make it remain memory size bound, and maybe even somewhat memory bandwidth bound.  But it's gonna be ugly.

You guys want a laugh?  Here's the current block header structure. See if you can spot the ringer.  Hint:  I regard mining pools as an existential threat to the stability of any cryptocurrency.


offset   bytes    block
0        4        0   block version
4        4        0   block height
12       8        0   UNIX timestamp
20       4        0   current target (of this algm, compressed form)
24       4        0   current hash algm
28       4        0   zero, do not use (future update space)
32      32        0   merkle root of current tx tree

0       32        1   hash of prev block
32      32        1   merkle root of current unspent txout set

0       20        2   current nonce (yes damnit, that's huge)
20      32        2   sig of all the above (including nonce) with privkey of coinbase txOut
52      12        2   reserved for SHA2 endbit & length field (it actually uses 8 bytes)
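The same layout as a packed struct, for anyone who wants to poke at it (field names are mine; the table above leaves offsets 8-11 of block 0 unlisted, kept here as explicit padding):

#include <cstdint>

#pragma pack(push, 1)
struct BlockHeader {                 // three 64-byte SHA-256 blocks, 192 bytes total
    // --- block 0 ---
    uint32_t version;                // offset 0: block version
    uint32_t height;                 // offset 4: block height
    uint8_t  unlisted[4];            // offsets 8-11: not listed in the posted table
    uint64_t timestamp;              // offset 12: UNIX timestamp
    uint32_t target;                 // offset 20: current target, compressed form
    uint32_t hash_algo;              // offset 24: current hash algorithm
    uint32_t reserved;               // offset 28: zero, future update space
    uint8_t  tx_merkle_root[32];     // offset 32: merkle root of current tx tree
    // --- block 1 ---
    uint8_t  prev_block_hash[32];    // hash of prev block
    uint8_t  utxo_merkle_root[32];   // merkle root of current unspent txOut set
    // --- block 2 ---
    uint8_t  nonce[20];              // current nonce (wide, for memory-hard PoW)
    uint8_t  coinbase_sig[32];       // sig of all the above with coinbase txOut privkey
    uint8_t  sha2_pad[12];           // reserved for SHA-2 end bit & length field
};
#pragma pack(pop)
static_assert(sizeof(BlockHeader) == 192, "three 64-byte SHA-256 blocks");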



Title: Re: Looking for PoW recommendations.
Post by: smolen on March 31, 2015, 03:24:09 AM
And yes, that can be parallelized, and GPU'd, and ASIC'd.
Momentum as in Protoshares? If yes, I have a nice puzzle to solve!

20      32        2   sig of all the above (including nonce) with privkey of coinbase txOut
Useless. EDIT: even if the coinbase is restricted to a single txOut.


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 31, 2015, 03:32:14 AM
I'd start with a set of buckets with a ring buffer inside each, advance through nonces in big chunks; newly calculated hashes are first uploaded (from cache to main memory) into buckets, and then, when an entire new chunk has been distributed, each bucket is downloaded back to fast cache and searched for triplets with a hash table. Memory access is going to be smooth; maybe even some HDD bandwidth could be used :)

Consider how you'd index a search for "triplets" with a hash collision in 35 bits -- except that any 4 of those 35 bits MUST be different between hash A and hash B, and a different 4 bits MUST be different between hash B and hash C.  

Suddenly, you are either trying to write nonces into many buckets of your table (using too much memory) or trying to read from about ten thousand different unpredicted places in a simple hash-indexed table (cache abuse).  Of course, you may be able to do that in the time it takes to calculate another signature over the next subnonce....  


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 31, 2015, 03:34:00 AM
And yes, that can be parallelized, and GPU'd, and ASIC'd.
Momentum as in Protoshares? If yes, I have a nice puzzle to solve!

20      32        2   sig of all the above (including nonce) with privkey of coinbase txOut
Useless. EDIT: even if the coinbase is restricted to a single txOut.


How so?  It is my objective to ensure that the person who finds the winning hash is able to spend the txOut.  What better way than ensuring that he needs the Privkey in the first place to form the block that must be hashed? 


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 31, 2015, 03:39:49 AM
And yes, that can be parallelized, and GPU'd, and ASIC'd.
Momentum as in Protoshares? If yes, I have a nice puzzle to solve!

20      32        2   sig of all the above (including nonce) with privkey of coinbase txOut
Useless. EDIT: even if the coinbase is restricted to a single txOut.


How so?  It is my objective to ensure that the person who finds the winning hash is able to spend the txOut.  What better way than ensuring that he needs the Privkey in the first place to form the block that must be hashed?  
I've seen code that checks only the first txOut of the coinbase; it could be 1 satoshi of dust, with the rest transferred to an undisclosed pubkeyhash.
If a block can contain a transaction that spends its coinbase txOut (almost sure it can), the privkey of the coinbase txOut could be fully disclosed to everyone, and a second transaction would transfer the funds to an undisclosed pubkey.
EDIT: Wait, I was too fast. It's useless for a protected closed-source miner, such as my Smelter.


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 31, 2015, 03:50:17 AM

If a block can contain a transaction that spends its coinbase txOut (almost sure it can), the privkey of the coinbase txOut could be fully disclosed to everyone, and a second transaction would transfer the funds to an undisclosed pubkey.

Coinbase txOut is not spendable until more than 100 blocks have passed since it was mined.  Ah, I think I get the attack.  You're saying that *NEXT* time the miner gets a block, he can spend the 1-satoshi txOut from the current coinbase and claim the rest as 'mining fee?'  

Good point.  I'll go make sure it can't happen: the coinbase can only have one txOut, and it must be for the full amount.

EDIT:  I'm editing this thing a lot....

Actually, nope.  It suffices to ensure that the coinbase has only one txOut.  If the miner doesn't make it for the full amount he's due, that's tough toenails; the rest isn't going to appear out of nowhere when he spends his undersized txOut.


Title: Re: Looking for PoW recommendations.
Post by: smolen on March 31, 2015, 04:17:35 AM
Coinbase txOut is not spendable until more than 100 blocks have passed since it was mined.
Yes, I was wrong about that. I thought it was a wallet restriction :(

It suffices to ensure that the coinbase has only one txOut.
Seems so. Now I have to invent a new trick for my miner :)

And what kind of 'ringer' did you mean initially? Something with the hash?

EDIT: Maybe the purpose of this header structure is to ensure that miners will do a hash+signature per nonce, and to encourage developing open-source ECC math for GPUs?


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on March 31, 2015, 10:25:56 PM
Actually, open-source ECC code for the GPU would be pretty nice, and possibly worthwhile.

But mostly, what was going on with that was that I was trying to ensure that nobody who didn't have the ability to spend the txOut would be finding the hash.

Pool mining is an existential threat to the future of cryptocurrency.  I understand why people do it, but putting that much control (more specifically putting everyone that much closer to a 51% attack) into the hands of a pool admin is dangerous.

The other 'ringer' is the root of the Merkle tree of unspent txOuts.  That allows "rolling root" block chain compression, where the oldest blocks can be allowed to fall off the end of the block chain after long enough.



Title: Re: Looking for PoW recommendations.
Post by: smolen on April 01, 2015, 12:22:04 AM
But mostly, what was going on with that was that I was trying to ensure that nobody who didn't have the ability to spend the txOut would be finding the hash.
Maybe there is a way to provide the miner with only partial knowledge of the key - say, the miner would be able to sign only messages with a fixed prefix. Unlikely, but be careful with the hash used in the signing algorithm.
With ECDSA ASICs, fast cheap internet, and a slow PoW, pools will continue business as usual.
Homomorphic encryption (https://en.wikipedia.org/wiki/Homomorphic_encryption). Seems not ready yet.
A pool could prevent a miner from spending the block reward by ceasing his pending payments and broadcasting the key. So the pool gets nothing (as in a block withholding attack), but with high probability the uncooperative miner takes a loss. Variant: an association of miners cooperates, puts some funds under escrow, and establishes a shared secret (private key); the reward is split in p2pool fashion. If an uncooperative spend is detected, the association waives the reward by broadcasting the key and confiscates the defector's funds (or burns all the funds, if there is no way to determine the rogue member). I guess we are on the territory of reverse game theory here.


Title: Re: Looking for PoW recommendations.
Post by: tromp on April 01, 2015, 02:57:55 AM
But mostly, what was going on with that was that I was trying to ensure that nobody who didn't have the ability to spend the txOut would be finding the hash.

The pool operator can give each participating miner pre-signed headers to work on.

You can only prevent this by having PoW attempts be so cheap that the pool operator (who can likely do hardware-accelerated signing) could not hand out signed headers fast enough to keep all the miners busy.

But your desire for a memory-hard PoW suggests rather slow PoW attempts, in which case the pool operator can easily keep up...


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on April 01, 2015, 03:45:37 AM

The pool operator can give each participating miner pre-signed headers to work on.


What good is a pre-signed header when each signature is only good for one nonce? 


Title: Re: Looking for PoW recommendations.
Post by: smolen on April 01, 2015, 11:59:40 AM
Actually, open-source ECC code for the GPU would be pretty nice, and possibly worthwhile.
Long residue math (https://en.wikipedia.org/wiki/Residue_number_system) would be fun to do in OpenCL, but is any coin ready for bulk signature verification?


Title: Re: Looking for PoW recommendations.
Post by: tromp on April 01, 2015, 02:39:17 PM
The pool operator can give each participating miner pre-signed headers to work on.
What good is a pre-signed header when each signature is only good for one nonce? 

I thought your nonce field defined the 2^64 search space. Re-reading your earlier post, I now see that it defines the triple-nonce solution within the search space as well.

But if every element in your search space requires a new signature then your PoW will be very compute intensive, your search space rather limited in size, and thus not particularly memory hard.


Title: Re: Looking for PoW recommendations.
Post by: djm34 on April 01, 2015, 03:09:06 PM
The pool operator can give each participating miner pre-signed headers to work on.
What good is a pre-signed header when each signature is only good for one nonce?  

I thought your nonce field defined the 2^64 search space. Re-reading your earlier post, I now see that it defines the triple-nonce solution within the search space as well.

But if every element in your search space requires a new signature then your PoW will be very compute intensive, your search space rather limited in size, and thus not particularly memory hard.
Actually, it depends a bit on how the 3 nonces are searched (it would become mem-hard if it implies storing the hashes generated by previous nonces).
But yes, it would probably be possible to search for the 3 nonces in one pass... could take quite some time...
But I am wondering whether it would be possible to search for the 3 nonces in parallel, or is it F1(nonce1) some_operation F2(nonce2) some_operation F3(nonce3)?


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on April 01, 2015, 05:57:04 PM
As I said, I haven't done much programming on GPUs so I don't know in great detail yet what their capabilities are.  Right now I'm just working from the conventional knowledge that they're highly parallel and have 2Gbytes or so of memory - and memory bandwidth that's about 10x as fast as CPU memory bandwidth. 

Right now the "Big Honkin' Server" task looks like searching for triplet subnonces whose hashes match - EXCEPT for one bit - for the first fifty bits.  Because of the way the triple is represented, the nonces have to be within a 45-bit range of each other, but that's not really an issue because you can spend all day searching inside a 45-bit range. 

You can compute them in parallel, but you're going to need a lot of data structure to store them in. For every one you compute you have fifty different places to look for a matching nonce.  The more previous results you can store, the better the odds that you'll have a couple of matching nonces in that set of places and be able to make a triplet.  In fact if you find more than two you can make more than one triplet.

That's a pretty brutal task.  I have been trying to think of every possible way for a GPU program to "cheat" at it using its 10x memory bandwidth advantage in a 2Gbyte data structure - and then  applying the same compression techniques to a 64G data structure to make the task even more brutal.  This one is supposed to be for big-memory servers only, and GPUs will find a far more profitable use of their time chewing on the SHA256D stuff that the big-memory servers (which don't usually have any use for high-end graphics cards and aren't built with them) can't do nearly as fast as they can.

It's actually kind of fun - I haven't done this kind of "use every trick in the book to absolutely maximize performance given a particular set of resources" programming in years. 


Title: Re: Looking for PoW recommendations.
Post by: smolen on April 01, 2015, 07:47:28 PM
Right now the "Big Honkin' Server" task looks like searching for triplet subnonces whose hashes match - EXCEPT for one bit - for the first fifty bits.  Because of the way the triple is represented, the nonces have to be within a 45-bit range of each other, but that's not really an issue because you can spend all day searching inside a 45-bit range. 
Draft idea: two values that differ in only one bit will be separated by a Hamming distance of 1. The miner could bucket hashes by Hamming weight, possibly discarding hashes with very low and very high weights, then generate a Bloom filter for the even buckets and perform the search for the odd ones in the two neighboring (even, +-1) buckets.
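A sketch of the weight-bucketing core of that idea (an illustration; it does direct scans of the three relevant buckets rather than the Bloom-filter refinement, and kBits matches the fifty-bit width discussed in this thread):

#include <cstdint>
#include <array>
#include <utility>
#include <vector>

constexpr int kBits = 50;                        // collision width from the thread
constexpr uint64_t kMask = (1ULL << kBits) - 1;

// buckets[w] holds (prefix, nonce) pairs whose 50-bit prefix has Hamming weight w.
std::array<std::vector<std::pair<uint64_t, uint64_t>>, kBits + 1> buckets;

void insert(uint64_t hash, uint64_t nonce) {
    buckets[__builtin_popcountll(hash & kMask)].push_back({hash & kMask, nonce});
}

// A near-collision partner (equal except in at most one bit) has a Hamming
// weight within +-1 of ours, so only three buckets ever need scanning.
std::vector<uint64_t> candidates(uint64_t hash) {
    std::vector<uint64_t> out;
    uint64_t p = hash & kMask;
    int w = __builtin_popcountll(p);
    for (int dw = -1; dw <= 1; ++dw) {
        if (w + dw < 0 || w + dw > kBits) continue;
        for (const auto& e : buckets[w + dw])
            if (__builtin_popcountll(e.first ^ p) <= 1)
                out.push_back(e.second);
    }
    return out;
}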


Title: Re: Looking for PoW recommendations.
Post by: AliassailA on April 02, 2015, 12:50:56 AM
It might work on a quad-channel LGA 2011... I think you would exceed bandwidth here under normal circumstances. If I am wrong, it is because I'm stupid.


Title: Re: Looking for PoW recommendations.
Post by: smolen on April 02, 2015, 03:25:49 PM
It might work on a quad-channel LGA 2011... I think you would exceed bandwidth here under normal circumstances. If I am wrong, it is because I'm stupid.
No idea how multiple memory buses are handled by OSes, or how to explicitly balance a load between them; so far no coin developer has rolled out such a task ;)


Title: Re: Looking for PoW recommendations.
Post by: Cryddit on April 02, 2015, 04:14:52 PM

Draft idea: two values that differ in only one bit will be separated by a Hamming distance of 1. The miner could bucket hashes by Hamming weight, possibly discarding hashes with very low and very high weights, then generate a Bloom filter for the even buckets and perform the search for the odd ones in the two neighboring (even, +-1) buckets.


Aaaand, props to smolen for coming up with the idea that would in fact put that task within reach of a regular GPU. 

If you start with a given exemplar string of bits, make a hash table, and then every time you have a collision evict whichever entry is furthest in Hamming distance from your exemplar, then after a while you will collect a bunch of strings clustered very closely in Hamming space and stop updating your table very often - erasing the memory contention constraint and allowing all those parallel processors to hunt, without even needing to consult the table, for strings within a given Hamming distance of the original exemplar.  And when they find one, they light it up; the table of things clustered in Hamming space around that original exemplar string will yield dozens, or even hundreds, of near-triplets that include the new string.

With the parallelism and memory bandwidth advantages, they'd rapidly reach a state where they're able to find near-triplets very efficiently indeed. 

If you do the same trick with the 64G memory structure, you'll be able to collect a much larger Hamming radius and light it up with a much greater fraction of new inputs, but you can't parallelize the search beyond the 8 cores or whatever on your CPU - and because the searching threads don't even need to consult the table when they're looking for things within a small Hamming radius of an exemplar, the GPU can parallelize it much further.

So now I get to try to come up with some weird variation on the Cuckoo cycle. 


Title: Re: Looking for PoW recommendations.
Post by: smolen on April 02, 2015, 07:51:10 PM
Scientific calculations, floating point, and limited-range search just cry out to be combined. Maybe there are real-world problems of some value to science or business that can be turned into a good PoW? Protein folding (by an undisclosed algorithm) is not PoW; searching for prime number sequences has more value in the software developed than in the calculations done. What else could be used?


Title: Re: Looking for PoW recommendations.
Post by: djm34 on April 03, 2015, 01:35:59 AM
Scientific calculations, floating point, and limited-range search just cry out to be combined. Maybe there are real-world problems of some value to science or business that can be turned into a good PoW? Protein folding (by an undisclosed algorithm) is not PoW; searching for prime number sequences has more value in the software developed than in the calculations done. What else could be used?
Feynman diagrams... (well  ;D from my point of view a little less stupid than prime numbers, but not necessarily by far though...)

ps: a miner is just a big basic Monte Carlo generator, so any kind of simulation could be used... but it is unclear to me whether there is a benefit for crypto or for science (which already has such a tool at its disposal...).

The problem with implementing stuff other than cryptographic functions is that it may be breakable, and it may be possible to cheat by using some approximation or other (the danger in using floating-point calculations and complex mathematical formulas).