Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: Anders on July 17, 2014, 01:11:13 PM



Title: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 01:11:13 PM
"Rule 30 is a one-dimensional binary cellular automaton rule introduced by Stephen Wolfram in 1983.[2] Using Wolfram's classification scheme, Rule 30 is a Class III rule, displaying aperiodic, chaotic behaviour." -- http://en.wikipedia.org/wiki/Rule_30

I read somewhere that when taking every other center bit of the Rule 30 cellular automaton, it's very difficult to determine from what initial condition the resulting bit string was generated. That can be used as a cryptographic hash function.

Indeed I found that this has already been done, such as:

A New Cryptographic Hash Function Based on Cellular Automata Rules 30, 134 and Omega-Flip Network -- http://www.ipcsit.com/vol27/32-ICICN2012-N10006.pdf

A new cryptographic hash function based on the Cellular Automaton Rule 30 -- http://www.zimuel.it/talks/Nks2006.pdf

I haven't read much about it yet, but my idea is to start with the bits of the message as the initial condition for the Rule 30 cellular automaton, then run the automaton a fixed number of steps so that the cells have highly random values. After that, every second bit is taken from the middle column of the automaton over 2n further iterations, where n is the number of bits of the hash (digest) value.
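
As a minimal sketch of that scheme (function names, the zero boundary condition, and the sampling details are my assumptions, not taken from the linked papers):

```javascript
// One Rule 30 step: new cell = left XOR (center OR right).
// The row grows by one cell on each side per generation.
function rule30Step(cells) {
  const n = cells.length;
  const next = new Array(n + 2).fill(0);
  for (let i = 0; i < n + 2; i++) {
    const left   = cells[i - 2] || 0;   // out-of-range reads count as 0
    const center = cells[i - 1] || 0;
    const right  = cells[i]     || 0;
    next[i] = left ^ (center | right);
  }
  return next;
}

// Run the automaton from the message bits, skip `warmup` rows, then
// collect every second center bit as the digest.
function rule30Hash(bits, digestBits, warmup) {
  let row = bits.slice();
  let offset = 0;   // the original center index shifts by 1 per step
  for (let s = 0; s < warmup; s++) { row = rule30Step(row); offset++; }
  const digest = [];
  while (digest.length < digestBits) {
    digest.push(row[offset + Math.floor(bits.length / 2)]); // center bit
    row = rule30Step(row); offset++;   // skip one row...
    row = rule30Step(row); offset++;   // ...so only every other bit is used
  }
  return digest;
}
```

Because a row is skipped between sampled bits, roughly 2n iterations contribute to an n-bit digest, matching the description above.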


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 01:52:04 PM
Here is the reason why every other cell in the center column should be used instead of each consecutive cell/row:

"So can such rules be used for cryptography? I strongly suspect that they can, and that in fact they allow one to construct systems that are at least as secure to cryptanalysis as any that are known. ... The picture below shows evidence for this. The cells marked by dots have colors that are taken as given, and then the colors of other cells are filled in according to the average that is obtained by starting from all possible initial conditions." -- Stephen Wolfram, A New Kind of Science, pp. 603, 605.

http://s27.postimg.org/j1a9xk3yb/rule_30.png

Even fewer cells could be used; according to the image caption, that would make cryptanalysis even harder.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 03:56:50 PM
Here is a JavaScript implementation of my Rule 30 hash function: http://jsfiddle.net/d2g2R/


Title: Re: Rule 30 automaton as hash function
Post by: dentldir on July 17, 2014, 04:37:46 PM
A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Staffelbach

It doesn't cover every other center bit but does address the CA as a whole.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 05:10:59 PM
A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Staffelbach

It doesn't cover every other center bit but does address the CA as a whole.


It seems that using cellular automata for hash functions is a viable option according to that paper: http://download.springer.com/static/pdf/386/chp%253A10.1007%252F3-540-46416-6_17.pdf?auth66=1405789057_31fc1522b59f8110ad57d060b99b9ccc&ext=.pdf

I wonder how well my Rule 30 hash function performs. I have no clue how to test it. It would be good to have a general test that can be fed a bulk of sample hashes. I found this: http://hashtest.sourceforge.net but I can't determine whether the test is useful or not. I guess I need to learn more about cryptography.


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 17, 2014, 07:18:53 PM
I expect Rule 30 would be cryptographically as strong as (or stronger than) SHA2.  (Side note: Rule 30 is used by the Mathematica function Random[])

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions).  If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.  


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 07:47:30 PM
Here is an improved version (live demo): http://jsfiddle.net/L2eNR/

In the first version, a message of, for example, 160 'a' characters would generate the same hash as 160 'b' characters (the initial condition became all zeros). Not good. I simply wrapped the message modulo 80 with xor. In the new version there is an additional xor that depends on the characters in the message. And more rows are skipped initially in the new version to allow the bits to blend from left to right across the automaton.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 07:53:10 PM
I expect Rule 30 would be cryptographically as strong as (or stronger than) SHA2.  (Side note: Rule 30 is used by the Mathematica function Random[])

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions).  If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.  

A 1-dimensional automaton only grows by 2 cells per generation (iteration). And if every other row is skipped (for stronger cryptography), then the automaton grows by 4 cells per output bit.

In my hash function I first reduce the message to n characters (80 in my JavaScript example).


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 08:27:32 PM
If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.  

I forgot to comment on that. Yes, the bits in the original message must be blended like that. That's why I start with only 80 characters (640 bits) as the initial condition of the automaton. And it seems that more than 640 iterations are needed initially. I think that's because the influence needs to propagate from left to right and then back from right to left again.

Edit: But the initial "blending" can be calculated in a 1-dimensional array, because the bits for the hash are taken by continuing to expand the cellular automaton in one dimension.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 09:12:23 PM
The reduction of the message into the 640 bits I use as the initial condition of the cellular automaton is itself a hash function. And if that hash sucks in terms of collisions, then those collisions will remain even after the Rule 30 scrambling. The simple xor wrapping could be good enough, though. If for example a hash is calculated for a 1 megabyte file, then the initial condition will contain 640 bits of complete mess (in a good sense) because of the xor of all the bytes in the file. And even for a very short message the Rule 30 scrambling will turn the hash into a highly random sequence of 256 bits. So the 256-bit hash in my JavaScript example should possibly work well for messages of all types and lengths.
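
The "xor wrap" reduction might look something like this minimal sketch (my reading of the description; the actual jsfiddle versions add more mixing):

```javascript
// Fold the message bytes into a fixed 80-byte (640-bit) initial
// condition by xor, wrapping the message index modulo 80.
function crunchTo640Bits(messageBytes) {
  const state = new Array(80).fill(0);
  for (let i = 0; i < messageBytes.length; i++) {
    state[i % 80] ^= messageBytes[i];  // wrap the message modulo 80 with xor
  }
  return state;  // used as the automaton's initial row
}
```

Note that this bare xor wrap is exactly what produced the repeated-character collision mentioned earlier: any message whose bytes cancel pairwise under the wrap folds to the all-zero state.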


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 17, 2014, 10:32:35 PM
Same hash function, just corrected array size: http://jsfiddle.net/n9Ssu/


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 18, 2014, 09:04:30 AM
Here is new version with a random generator added to the initial condition: http://jsfiddle.net/Nem6N/

The rationale for that is that messages with repeated patterns can cause collisions when those patterns match the modulo wrap. Now different lengths of messages will produce different initial conditions regardless of how similar the messages are and regardless of what repeated patterns they contain.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 18, 2014, 03:46:34 PM
Improved version without the random generator: http://jsfiddle.net/ALV5X/

Very fast for huge messages.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 18, 2014, 04:36:51 PM
... I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions). ...

Yes, if the full message were used as the initial condition, the algorithm would be O(n^2) and require storage of several times the message length (in a 1-dimensional array). What I do is crunch the message down to an initial condition of at most 640 bits (or whatever fixed value is chosen). This makes my hash function O(n) and fixes the storage requirement at a small value.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 19, 2014, 07:51:59 AM
Ouch. Professional cryptographers/cryptanalysts, the article below seems to say, usually only have the time and interest to look at new algorithms if they have been published in the mainstream professional literature or developed by known cryptographers. >:(:D It's understandable, since it probably takes a lot of time and effort to look at even simple cryptographic algorithms. Stephen Wolfram's quote I posted earlier can perhaps help a bit, since even though he's not a cryptographer he is a top expert in mathematics. (A cipher is different from a hash function, but I think the article implicitly describes cryptography in general.)

"Memo to the Amateur Cipher Designer

Congratulations. You've just invented this great new cipher, and you want to do something with it. You're new in the field; no one's heard of you, and you don't have any credentials as a cryptanalyst. You want to get well-known cryptographers to look at your work. What can you do?

Unfortunately, you have a tough road ahead of you. I see about two new cipher designs from amateur cryptographers every week. The odds of any of these ciphers being secure are slim. The odds of any of them being both secure and efficient are negligible. The odds of any of them being worth actual money are virtually non-existent.

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis. And the only way to prove that is to subject the algorithm to years of analysis by the best cryptographers around.

"The best cryptographers around" break a lot of ciphers. The academic literature is littered with the carcasses of ciphers broken by their analyses. But they're a busy bunch; they don't have time to break everything. How do they decide what to look at?

Ideally, cryptographers should only look at ciphers that have a reasonable chance of being secure. And since anyone can create a cipher that he believes to be secure, this means that cryptographers should only look at ciphers created by people whose opinions are worth something. No one is impressed if a random person creates a cipher he can't break; but if one of the world's best cryptographers creates a cipher he can't break, now that's worth looking at." -- https://www.schneier.com/crypto-gram-9810.html#cipherdesign


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 19, 2014, 04:52:55 PM
"The notion of Computational Irreducibility (CIR) seems to have been first put forward by Wolfram. Given a physical system whose behavior can be calculated by simulating explicitly each step of its evolution, is it always possible to predict the outcome without tracing each step? Is there always a shortcut to go directly to the nth step? Wolfram conjectured [16, 17, 18] that in most cases the answer is no. While many computations admit shortcuts that allow them to be performed more rapidly, others cannot be sped up. Computations that cannot be sped up by means of any shortcut are called computationally irreducible. ...

In this context, Israeli and Goldenfeld in [9] have shown that some automata that are apparently computationally irreducible have nevertheless properties that are predictable. But these properties are obtained by coarse graining and don’t account for small scales details. Moreover some automata (rule 30 for example) seem to be impossible to coarse grain." -- http://arxiv.org/ftp/arxiv/papers/1304/1304.5247.pdf

The comment that rule 30 "seems to be" computationally irreducible is actually stronger than it sounds. It's often difficult to give definite answers, since some future method may break the assumption. However, the same is true of provably secure hash functions:

"A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.

It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time." -- http://en.wikipedia.org/wiki/Provably_secure_cryptographic_hash_function#Provably_secure_hash_functions

Notice the description "widely supposed". Again, no definite answer even when the hash function is provably secure.


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 19, 2014, 06:41:13 PM
I don't think this thread is generating much interest because your proposal doesn't seem to be very useful.  Even if it worked, why would anyone choose this over a widely-analyzed standard like SHA2?  And the fact that you are hashing the message (you called it "crunching to 640 bits") before you hash the message (with Rule 30) suggests that perhaps you are choosing the wrong tool for the job.   

That being said, I do find Wolfram's theory of computational irreducibility very interesting.  I expect that he is right, and I think it would imply that there exists a certain group of hash functions that are fundamentally impenetrable by cryptanalysis.  In essence, they are already fully dense in their complexity.  As far as I know, no one has shown that Rule 30 has any predictability at all--and I know a lot of people would love to prove Stephen Wolfram wrong.  But even if Rule 30 were computationally irreducible, it's still not necessarily useful as a hash function, because it is at least O(n^2) with message length to use it in "raw form" (without your pre-hashing).  And then any further changes you make to improve its memory and speed performance may affect its security--which means a lot more analysis.  So I think this is really just of theoretical interest at this point.


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on July 19, 2014, 06:56:23 PM
I have to agree with Peter.  

Quote
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Schneier's law

A cryptographic primitive becomes more valuable the more it is used (which creates a self-reinforcing cycle).  Better known and deployed algorithms are more likely to be researched than esoteric ones.  The longer the algorithm is around without any published breaks, the higher the confidence that the algorithm can't be broken at the current time.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 19, 2014, 07:21:20 PM
I don't think this thread is generating much interest because your proposal doesn't seem to be very useful.  Even if it worked, why would anyone choose this over a widely-analyzed standard like SHA2?  And the fact that you are hashing the message (you called it "crunching to 640 bits") before you hash the message (with Rule 30) suggests that perhaps you are choosing the wrong tool for the job.    

That being said, I do find Wolfram's theory of computational irreducibility very interesting.  I expect that he is right, and I think it would imply that there exists a certain group of hash functions that are fundamentally impenetrable by cryptanalysis.  In essence, they are already fully dense in their complexity.  As far as I know, no one has shown that Rule 30 has any predictability at all--and I know a lot of people would love to prove Stephen Wolfram wrong.  But even if Rule 30 were computationally irreducible, it's still not necessarily useful as a hash function, because it is at least O(n^2) with message length to use it in "raw form" (without your pre-hashing).  And then any further changes you make to improve its memory and speed performance may affect its security--which means a lot more analysis.  So I think this is really just of theoretical interest at this point.

In Bitcoin, SHA-256 is used twice iirc: first a hash is generated, then that hash is hashed again. From a speculative conspiratorial view, the NSA may have given the public community the SHA family of hash functions because the NSA can break them, for national security reasons. So from that perspective a rule 30 hash function may be useful for privacy reasons.

For example, instead of starting with my simple initial condition, the initial condition could for example be a SHA-256 or SHA-512 hash. My simple initial condition may be enough though since the rule 30 scrambling generates a highly random hash even from plain text as the initial condition.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 19, 2014, 07:44:12 PM
It's easy to generate a collision for my hash function because the initial condition is generated with such a simple algorithm. Here is one collision: the byte (0=0x00, 1=0x01) sequence 1000000000000000000000000000000000000000000000000000000000000000000000000000000000 will generate the same hash as 0100000000000000000000000000000000000000000000000000000000000000000000000000000001.

Yet I believe that's just the pigeonhole principle: collisions will always occur when a longer message is converted into a shorter hash value.

"In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.[1] ...  For example, collisions are inevitable in a hash table because the number of possible keys exceeds the number of indices in the array. A hashing algorithm, no matter how clever, cannot avoid these collisions." -- http://en.wikipedia.org/wiki/Pigeonhole_principle

For fun, try to crack this hash:
Code:
6ab4a25ef4c85a3197674b2796c40c71ccc81a8f82297a97020ad8ffc391ca4b
http://jsfiddle.net/ALV5X/

Hint: the message is 7 characters long and starts with 'bitcoi'. ;D

When you have cracked the message you can try to find another different message that generates the same hash.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 19, 2014, 08:04:12 PM
Here is a harder problem: Try to find a message that produces the following hash:

Code:
b27ac6cabaede78ee58ef73c6099e4e8dd8f1491ababe9b3f858f3d1644a51dd
http://jsfiddle.net/ALV5X/


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 20, 2014, 07:03:34 AM
Wolfram's computational irreducibility, if it exists, defines the perfect proof of work, so the discussion is on topic here, or better in an alt-coin forum.

I think that a simple POW algorithm eliminates the risk of possible shortcuts that might exist in complex algorithms like the SHA group.




Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 08:00:28 AM
Wolfram's computational irreducibility, if it exists, defines the perfect proof of work, so the discussion is on topic here, or better in an alt-coin forum.

I think that a simple POW algorithm eliminates the risk of possible shortcuts that might exist in complex algorithms like the SHA group.




Yes, if computational irreducibility can be proved, or at least assumed to be true according to experts, then it by definition means no shortcuts, no matter what fancy mathematics is used.

SHA-1 is considered broken if I understood it correctly. In that case the same may happen to SHA-2.

If SHA-2 becomes broken then Bitcoin will maybe need to replace the SHA-256 hash function, or add another hash on top of SHA-256.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 12:06:02 PM
Maybe I should explain how my hash function works. Here is an illustration with a smaller version:

Starting with a sample message: 0123456789

For each byte in the message, the initial condition is set to the byte value xor the previous wrapped initial-condition value, plus the sum of all previous bytes in the message, plus a random value, summed together modulo 256. In this simple illustration modulo 10 is used for the values and modulo 4 for the length of the initial condition, resulting in something like 5396.

The initial condition 5396 is then used in the rule 30 cellular automaton by setting the cell values equal to the bit values of the initial condition. Then a fixed number of rows (cellular automaton generations) is skipped to allow all cells to influence each other left to right and back again. After that, a hash is generated from the value of every other cell in the expansion of the center column of the cellular automaton. Other values than the 640-bit initial condition and 256-bit hash in my JavaScript example (http://jsfiddle.net/ALV5X/) can be used.
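
In code, my reading of that construction could look like the following sketch (the function name, the fixed seed, and the use of a deterministic LCG in place of "a random value" are my assumptions; a truly random value would make the hash non-reproducible):

```javascript
// Build the automaton's initial condition from the message bytes:
// byte XOR previous wrapped value, plus the running sum of previous
// bytes, plus a pseudo-random value, all reduced modulo 256.
function buildInitialCondition(messageBytes, size) {
  const init = new Array(size).fill(0);
  let runningSum = 0;
  let prng = 12345;  // hypothetical fixed seed, so the hash is deterministic
  for (let i = 0; i < messageBytes.length; i++) {
    prng = (prng * 16807) % 2147483647;  // Park-Miller LCG as the "random value"
    const mixed = (init[i % size] ^ messageBytes[i]) + runningSum + (prng & 0xff);
    init[i % size] = mixed % 256;
    runningSum = (runningSum + messageBytes[i]) % 256;
  }
  return init;
}
```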

Now that I have published it here, no one else can turn this into a patent. 8)


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 07:02:07 PM
Exact same hash function with fewer calculations: http://jsfiddle.net/7TuBy/

The previous version calculated a lot of unnecessary cells. The cellular automaton expands like this:

http://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Rule30-first-500-rows.png/500px-Rule30-first-500-rows.png

Only the cells represented by the triangle need to be calculated.
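
A sketch of that optimization (my own formulation, not the fiddle's code): to know the center column for a given number of generations, generation g only needs the cells within (steps - g) of the center, so the band of updated cells shrinks by one cell per side as the triangle narrows toward the sampled column.

```javascript
// Compute the Rule 30 center column, updating only the cells that can
// still influence the center within the remaining generations.
function rule30CenterColumn(steps) {
  // Single 1 cell in the middle, padded wide enough for the whole cone.
  let row = new Array(2 * steps + 1).fill(0);
  row[steps] = 1;
  const column = [row[steps]];
  for (let g = 1; g <= steps; g++) {
    const next = row.slice();
    for (let i = g; i <= 2 * steps - g; i++) {
      // Rule 30: new cell = left XOR (center OR right)
      next[i] = row[i - 1] ^ (row[i] | row[i + 1]);
    }
    row = next;
    column.push(row[steps]);
  }
  return column;
}
```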


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 08:48:15 PM
I have to agree with Peter.  

Quote
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Schneier's law

A cryptographic primitive becomes more valuable the more it is used (which creates a self-reinforcing cycle).  Better known and deployed algorithms are more likely to be researched than esoteric ones.  The longer the algorithm is around without any published breaks, the higher the confidence that the algorithm can't be broken at the current time.

Yes, the cryptographic algorithms need to be heavily battle tested in real applications over a long period of time. The scary thing however is that many cryptographic algorithms can become broken overnight, even after several years of widespread use.

To use a rule 30 hash for Bitcoin today would probably be too shaky. Instead a rule 30 hash can be used in a new altcoin as a pilot project to test the concept in a real application with real attacks and attempts to break the algorithm.

Another scary thing I came to think of: what if some Bitcoin miner already has discovered a way to break SHA-256? And they keep that knowledge secret for their own benefit.


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on July 20, 2014, 09:45:38 PM
Yes, the cryptographic algorithms need to be heavily battle tested in real applications over a long period of time. The scary thing however is that many cryptographic algorithms can become broken overnight, even after several years of widespread use.

That is very rarely true.   It depends on what you mean by broken.  Faster than brute force, no matter how much time or energy that would take (or even if the human race could generate energy on that scale)? Sure.   Going from no known flaws to exploitable in a real-world scenario? That hasn't happened to any major cryptographic hashing function in the last 30 years.  Take SHA-1 for example.  Collision attacks were found as early as 2005, yet almost a decade later there isn't a single known preimage attack.  Even MD5, which wouldn't be recommended for any new application, is still resistant to first and second preimage attacks.

Quote
Another scary thing I came to think of: what if some Bitcoin miner already has discovered a way to break SHA-256? And they keep that knowledge secret for their own benefit.

If you could break SHA-2 there are a lot more valuable things you could do with it.  Hell, even if you didn't know what to do with it you could sell it to a three-letter agency for seven figures easy.  Still, a flaw in SHA-2 is unlikely to be useful to a miner.   A miner isn't looking for a single hash but rather one of trillions of quadrillions of possible valid hashes.  Mining a block is only complexity 2^60.  If you found a flaw in SHA-2 which allowed a preimage attack on the order of 2^80 it would be huge, but it would still be a million times slower than just mining by brute force.


Title: Re: Rule 30 automaton as hash function
Post by: smoothie on July 20, 2014, 09:51:57 PM
I have to agree with Peter.  

Quote
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Schneier's law

A cryptographic primitive becomes more valuable the more it is used (which creates a self-reinforcing cycle).  Better known and deployed algorithms are more likely to be researched than esoteric ones.  The longer the algorithm is around without any published breaks, the higher the confidence that the algorithm can't be broken at the current time.

In other words, trust comes through time. And a particular algorithm, or cryptocoin for that matter, also needs to stand that test of time...

Many of these new altcoins have not stood that test of time.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 09:57:50 PM
Computational irreducibility, in the case of a hash function, only protects against shortcuts when calculating the hash. It doesn't protect against going backwards from the hash to a matching message. Fortunately, in the case of rule 30, determining previous values quickly becomes increasingly difficult, as Stephen Wolfram pointed out.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 10:10:23 PM
... Going from no known flaws to exploitable in a real-world scenario? That hasn't happened to any major cryptographic hashing function in the last 30 years.  Take SHA-1 for example.  Collision attacks were found as early as 2005, yet almost a decade later there isn't a single known preimage attack.  Even MD5, which wouldn't be recommended for any new application, is still resistant to first and second preimage attacks.

Or successful attacks have happened that people keep silent about! Who in the professional world would admit that their company/service got its cryptographic hash broken?

I read that SHA-1 was considered broken just because someone figured out how to find matches some three orders of magnitude faster than brute force. In practice this may mean that SHA-1 is still secure enough, and that it's not recommended out of fear that even more efficient ways of finding matches will be exposed.

I myself wouldn't fully trust either SHA-2 or a rule 30 hash function. Who knows what clever techniques the NSA and other agencies like that have. And big companies like IBM may also have methods they keep hidden from the public community.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 20, 2014, 10:29:09 PM
If you could break SHA-2 there are a lot more valuable things you could do with it.  Hell, even if you didn't know what to do with it you could sell it to a three-letter agency for seven figures easy.  Still, a flaw in SHA-2 is unlikely to be useful to a miner.   A miner isn't looking for a single hash but rather one of trillions of quadrillions of possible valid hashes.  Mining a block is only complexity 2^60.  If you found a flaw in SHA-2 which allowed a preimage attack on the order of 2^80 it would be huge, but it would still be a million times slower than just mining by brute force.

But what if a Bitcoin miner has figured out how to select the bits in the block to get the required number of leading zeros? A direct method that bypasses the entire trillion tries brute force approach.


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on July 20, 2014, 10:47:42 PM
But what if a Bitcoin miner has figured out how to select the bits in the block to get the required number of leading zeros? A direct method that bypasses the entire trillion tries brute force approach.

Then Bitcoin would fail.   There is no evidence anything like that is possible.   What if Earth collided with a black hole tomorrow?


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 21, 2014, 07:32:36 AM
I wonder if a rule 30 hash function would be quantum-safe. Here is an article about SHA-256 not being quantum-safe:

"Bitcoin Is Not Quantum-Safe, And How We Can Fix It When Needed

... Quantum computers have two major tools that make them superior to classical computers in breaking cryptography: Shor’s algorithm and Grover’s algorithm. ...  A modified version of Shor’s algorithm can crack elliptic curve cryptography as well, and Grover’s algorithm attacks basically anything, including SHA256 and RIPEMD-160." -- http://bitcoinmagazine.com/6021/bitcoin-is-not-quantum-safe-and-how-we-can-fix/


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 21, 2014, 11:30:12 AM
If I have a hash with 16 bits, then in theory it should take 2^(N/2) = 2^8 = 256 attempts on average to find a collision.

"As an example, if a 64-bit hash is used, there are approximately 1.8 × 10^19 different outputs. If these are all equally probable (the best case), then it would take 'only' approximately 5 billion attempts (5.1 × 10^9) to generate a collision using brute force. This value is called birthday bound[2] and for n-bit codes it could be computed as 2^(n/2).[3]" -- http://en.wikipedia.org/wiki/Birthday_attack#Mathematics

When I tested my hash function I got an average of around 350, not 256:

Attempt 1: 552
Attempt 2: 100
Attempt 3: 452
Attempt 4: 360
Attempt 5: 273
Attempt 6: 336
Attempt 7: 337
Attempt 8: 117
Attempt 9: 419
Attempt 10: 742
Attempt 11: 313
Attempt 12: 408
Attempt 13: 200
Attempt 14: 383
Attempt 15: 385
Attempt 16: 185
Attempt 17: 286
Attempt 18: 265
Attempt 19: 398
Attempt 20: 566
----------------
Average: 354

Source: http://jsfiddle.net/wsEU7/

Is there an error in my code, or is Math.random() in JavaScript a poor random generator, or is the theoretical value 256 only an approximation? ???
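
One possible answer (my addition, standard birthday-problem math): 2^(n/2) is only the order of magnitude of the birthday bound, not the expected number of attempts. For M equally likely hash values, the expected number of random draws before the first collision is roughly sqrt(pi*M/2), about 1.25 * 2^(n/2), which for a 16-bit hash is around 321. A quick check, with my own function names:

```javascript
// Expected number of draws until the first collision among 2^bits
// equally likely outputs: approximately sqrt(pi * M / 2).
function expectedFirstCollision(bits) {
  return Math.sqrt(Math.PI * Math.pow(2, bits) / 2);
}

// Simulation with an ideal 16-bit "hash": uniform random values
// drawn until one repeats.
function simulateFirstCollision(bits) {
  const seen = new Set();
  let draws = 0;
  for (;;) {
    draws++;
    const v = Math.floor(Math.random() * Math.pow(2, bits));
    if (seen.has(v)) return draws;
    seen.add(v);
  }
}
```

expectedFirstCollision(16) returns about 320.8, so measured averages in the 350-400 range are roughly what an ideal hash would produce, rather than evidence of a bias.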


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 21, 2014, 11:48:54 AM
I changed the collision test to use different lengths of the messages tested (1 - 160 characters). Then the average value was around 400, even higher than with fixed length test messages:

Attempt 1: 41
Attempt 2: 487
Attempt 3: 122
Attempt 4: 584
Attempt 5: 771
Attempt 6: 474
Attempt 7: 458
Attempt 8: 378
Attempt 9: 109
Attempt 10: 312
Attempt 11: 585
Attempt 12: 558
Attempt 13: 227
Attempt 14: 515
Attempt 15: 266
Attempt 16: 423
Attempt 17: 462
Attempt 18: 305
Attempt 19: 808
Attempt 20: 296
----------------
Average: 409

Source: http://jsfiddle.net/4nFz8/


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 21, 2014, 01:13:56 PM
I tested with messages of up to 10,000 characters and got an average of about 350, the same as for fixed messages of length 4.

Attempt 1: 419
Attempt 2: 263
Attempt 3: 185
Attempt 4: 372
Attempt 5: 351
Attempt 6: 434
Attempt 7: 155
Attempt 8: 619
Attempt 9: 319
Attempt 10: 336
Attempt 11: 333
Attempt 12: 442
Attempt 13: 95
Attempt 14: 639
Attempt 15: 290
Attempt 16: 284
Attempt 17: 435
Attempt 18: 514
Attempt 19: 98
Attempt 20: 352
----------------
Average: 347

Source: http://jsfiddle.net/My76F/

So on average it seems that around 350 attempts are needed instead of the theoretical 256 attempts. It also shows that my wrapping method for the initial condition seems to work. And the collision rate looks good, unless I have messed something up in my test.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 21, 2014, 08:20:06 PM
My current hash function is only potentially good when the message is unknown. When the message is known it's easy to construct collisions, such as manipulating the content in a document so that it gets the same hash value. Not good.

One solution to make the hash function more general, other than using the entire message as initial condition (which will become computationally very demanding for long messages), is, as I described earlier, to use another known hash function such as SHA-512 to generate the initial condition.

It's a bit boring to have to rely on another already existing hash function. I will try to figure out a way to make the rule 30 hash function more general without being dependent on other known hash function standards.

Also, the often used Merkle–Damgård construction has problems:

"Unfortunately, this construction also has several undesirable properties:
  • Length extension — once an attacker has one collision, he can find more very cheaply.
  • Second preimage attacks against long messages are always much more efficient than brute force.
  • Multicollisions (many messages with the same hash) can be found with only a little more work than collisions.[5]
  • "Herding attacks" (first committing to an output h, then mapping messages with arbitrary starting values to h) are possible for more work than finding a collision, but much less than would be expected to do this for a random oracle.[6][7]
  • "Extension attacks": Given the hash H(X) of an unknown input X, it is easy to find the value of H(pad(X) || Y), where pad is the padding function of the hash. That is, it is possible to find hashes of inputs related to X even though X remains unknown.[8] A random oracle would not have this property, and this may lead to simple attacks even for natural schemes proven secure in the random oracle model.[9] Length extension attack was actually used to attack a number of commercial web message authentication schemes such as one used by Flickr.[10]
" -- http://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Security_characteristics


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on July 21, 2014, 10:23:16 PM
Quote
One solution to make the hash function more general, other than using the entire message as initial condition (which will become computationally very demanding for long messages), is, as I described earlier, to use another known hash function such as SHA-512 to generate the initial condition.

Then the function is no more secure than SHA-2 so why not just use SHA-2 if the goal is security?  

It may still be useful for a PoW by moving the nonce outside of the blockheader.

R30(nonce + H(blockheader)) < target

The security of the PoW still relies on the preimage resistance of H; however, if R30 is irreducible then it would prevent more efficient work in the PoW.  This has the advantage of making mining hardware highly commoditized, which means lower margins (anyone can do it and the hardware performs about the same), which is optimal from a security point of view.

Quote
Also, the often used Merkle–Damgård construction has problems: <snipped>

These are weaknesses known in M-D and they are what cryptographers target when attempting to "break" the hashing function.  To date nobody has shown that a preimage attack on SHA-1, or the more complex SHA-2, is possible.

The issue of length extension doesn't apply to PoW as the header has a fixed length and ordering.  Even if you could perform a preimage attack on an existing block via length extension, the resulting block would be invalid regardless of the block hash, because Bitcoin block headers must be exactly 80 bytes with the fields in a fixed order.

In applications where the hash will protect variable length data, using an HMAC over the pure hashing function is preferable.  HMACs don't suffer from length extension attacks and they make collision attacks less effective.  Still, this is academic at this point as most hashing functions are still secure against preimage attacks (even the ancient MD5).  A major goal of the SHA-3 competition was to bypass some of the weaknesses of the M-D construction, and as such there are no length extension attacks on SHA-3. Still, time trumps all: SHA-2 has been vetted more than SHA-3, at least as of today.  Maybe in a decade or so, but SHA-3 is a little ahead of its time, as SHA-2 held up better than NIST expected it to.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 22, 2014, 06:50:11 AM
Quote
Then the function is no more secure than SHA-2 so why not just use SHA-2 if the goal is security?

Hmm... Yes, the combination would perhaps not be more secure than SHA-2 alone except as a way of obfuscating the initial hash.

Quote
It may still be useful for a PoW by moving the nonce outside of the blockheader.

R30(nonce + H(blockheader)) < target

The security of the PoW still relies on the preimage resistance of H; however, if R30 is irreducible then it would prevent more efficient work in the PoW.  This has the advantage of making mining hardware highly commoditized, which means lower margins (anyone can do it and the hardware performs about the same), which is optimal from a security point of view.

Concatenating the Bitcoin nonce and the hash of the blockheader and using the result as an initial condition for an R30 hash would perhaps work. I don't know enough about the Bitcoin protocol to know for sure. As a proof of work to produce a value less than the target, the work required can easily be made harder by increasing the number of rows (generations) that must be calculated before the hash value is extracted.

In the standard rule 30 cellular automaton the initial condition is always only one bit set to 1. The trick I have done is to replace that initial condition with a whole sequence of bits. And the other part of the trick is to skip enough generations in the cellular automaton so that the initial bits influence each other left to right and back again. This ensures that the initial condition is distributed in a highly random way that is unique for each message, when the hash is long enough. The hash value is taken from every other cell in the center column of the cellular automaton after the initial (skipped) rows have been calculated.
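In code the trick is tiny. A minimal sketch of the idea (illustrative function names, not my exact jsfiddle code):

```javascript
// One Rule 30 step: new cell = left XOR (center OR right).
// The row grows by one cell on each side per step, so the edges
// never interfere with the evolution.
function rule30Step(row) {
  const next = [];
  for (let i = -1; i <= row.length; i++) {
    const left = row[i - 1] || 0;
    const center = row[i] || 0;
    const right = row[i + 1] || 0;
    next.push(left ^ (center | right));
  }
  return next;
}

// Evolve from an initial bit pattern, skip `skip` rows, then take
// every other center-column bit until `digestBits` bits are collected.
function r30Digest(initialBits, skip, digestBits) {
  const centerStart = initialBits.length >> 1;
  let row = initialBits.slice();
  const out = [];
  for (let step = 1; out.length < digestBits; step++) {
    row = rule30Step(row);
    // after `step` steps the original center sits at index centerStart + step
    if (step > skip && (step - skip) % 2 === 1) {
      out.push(row[centerStart + step]);
    }
  }
  return out;
}
```

With the classic single-cell seed `[1]`, the center column starts 1, 1, 0, 1, 1, 1, ... as in Wolfram's pictures, which is a quick sanity check for the step function.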


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 22, 2014, 08:56:07 AM
Here is a version of my hash function using a kind of Merkle–Damgård construction: http://jsfiddle.net/YkPA3/

Length extension attacks are protected against by concatenating the initial condition with the bits of the length of the message. Any change in the length of the message changes that bit pattern, making the cellular automaton expand with different cell values.

The compression function is a simple cipher with a key that depends on the previous block (of 640 bits).


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 22, 2014, 03:45:11 PM
"The Davies–Meyer single-block-length compression function feeds each block of the message (m_i) as the key to a block cipher. It feeds the previous hash value (H_{i-1}) as the plaintext to be encrypted. The output ciphertext is then also XORed (⊕) with the previous hash value (H_{i-1}) to produce the next hash value (H_i)." -- http://en.wikipedia.org/wiki/One-way_compression_function#Davies.E2.80.93Meyer

http://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Davies-Meyer_hash.svg/230px-Davies-Meyer_hash.svg.png

The trick with the Davies–Meyer compression function seems to be the xor of the previous hash value with the ciphertext.

I made a very simple version based on this: http://jsfiddle.net/2mK9R/
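Here is roughly what the construction looks like, with a deliberately trivial toy cipher standing in for a real block cipher just to show the H_i = E(m_i, H_{i-1}) XOR H_{i-1} structure (all names are mine, this is not my jsfiddle code):

```javascript
// Toy 32-bit "block cipher": a few rounds of add/rotate/xor keyed by
// the message block. NOT secure -- it only stands in for E here.
function toyCipher(key, plaintext) {
  let x = plaintext >>> 0;
  for (let r = 0; r < 8; r++) {
    x = (x + ((key + r) >>> 0)) >>> 0;
    x = ((x << 7) | (x >>> 25)) >>> 0; // rotate left by 7
    x = (x ^ key) >>> 0;
  }
  return x;
}

// Davies–Meyer chaining: H_i = E(m_i, H_{i-1}) XOR H_{i-1}
function daviesMeyer(blocks, iv) {
  let h = iv >>> 0;
  for (const m of blocks) {
    h = (toyCipher(m >>> 0, h) ^ h) >>> 0;
  }
  return h;
}
```

The final XOR with H_{i-1} is the part that makes the compression function hard to run backwards even when the underlying cipher is invertible.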


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 22, 2014, 07:27:03 PM
The initial conditions I have developed are way too amateurish for any serious application. Anyway, it's fun to learn about hash functions, and hopefully others new to this stuff can learn some basics too.

Length extension attacks are a serious threat even to SHA-256. In Bitcoin SHA-256 is used twice in a row. I learned that this technique is actually called SHA-256d:

"You should also consider using a generic length-extension defense such as the "SHA-256d" design by Ferguson and Schneier. SHA256d(x) = SHA256(SHA256(x))." -- http://crypto.stackexchange.com/questions/893/at-the-current-time-is-sha256-the-de-facto-standard-for-strong-cryptographic-ha

With rule 30, length extension defense can be done like in one of my previous posts. As an initial condition, start with the SHA-256 hash and concatenate it with the bits for the length of the message. And then run the rule 30 automaton as I have described earlier. That could be a serious use of the rule 30 hash function.

"Distinguishing H2 from a random oracle (essentially an ideal hash) is much cheaper than it should be, namely 2^64 for SHA-256d. This doesn't lead to any practical attacks, but it hurts security proofs relying on indistinguishability. It is easy to avoid this problem by using distinct prefixes for the inner and outer hash, so I see little reason to use H2 in practice." -- http://crypto.stackexchange.com/questions/7895/weaknesses-in-sha-256d


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 22, 2014, 10:14:27 PM
There is a risk that any form of reducing the length of the original message will affect the security of the rule 30 hash function (as Peter R already pointed out earlier).

Therefore I made a version that uses the entire message as initial condition: http://jsfiddle.net/8aR3W/

It's very computationally demanding for long messages. On the other hand, computing performance is progressing exponentially year by year, and many practical cryptographic applications deal with short messages.


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 23, 2014, 05:59:24 AM
I agree with DeathAndTaxes and Grau that Rule 30 is probably most interesting as a proof of work.  Using it as such, and applying D&T's suggestion

   R30(nonce + H(blockheader)) < target

also means you don't have to worry about arbitrary-length messages. 

Have you considered what the minimum requirement is for the number of cells in width needed to apply Rule 30 to these fixed-length initial conditions?  Can the cell width simply be equal to the bit-length of the initial conditions, or do you need additional width to ensure irreversibility (I wouldn't think so)?  Have you given any thought to what a Rule 30 PoW ASIC would look like?  I suspect it might be simpler and cheaper than a SHA256 ASIC (although I understand the SHA256 ASICs are fairly simple too).


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 10:00:07 AM
Quote
I agree with DeathAndTaxes and Grau that Rule 30 is probably most interesting as a proof of work.  Using it as such, and applying D&T's suggestion

   R30(nonce + H(blockheader)) < target

also means you don't have to worry about arbitrary-length messages.  

Have you considered what the minimum requirement is for the number of cells in width needed to apply Rule 30 to these fixed-length initial conditions?

I did a quick test now, and skipping a number of rows equal to 1.5 times the number of bits in the initial condition seems enough: http://jsfiddle.net/r3qAd/

And that makes theoretical sense since that's the point where all the bits have influenced each other and reached back to the center column.

Quote
Can the cell width simply be equal to the bit-length of the initial conditions, or do you need additional width to ensure irreversibility (I wouldn't think so)?

I think skipping rows 1.5 times the number of bits in the initial condition is enough. The initial condition is the number of bits in the message plus 32 bits for the concatenation of the length bits. The length bits need to be added or else there would be no difference for different length messages containing only zeros. The cellular automaton in addition must be large enough to contain the additional rows for calculating the hash value, i.e. taking the bit value of every other cell in the center column.

Quote
Have you given any thought to what a Rule 30 PoW ASIC would look like?  I suspect it might be simpler and cheaper than a SHA256 ASIC (although I understand the SHA256 ASICs are fairly simple too).

I guess a hardware implementation of the automaton would be very fast. I have also seen the term "quantum-dot cellular automaton".


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 10:32:08 AM
"In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that query is submitted." -- http://en.wikipedia.org/wiki/Random_oracle

Perhaps the Rule 30 hash function comes close to behaving like a random oracle.

"In cryptography and computer security, length extension attacks are a type of attack on certain types of hashes which allow inclusion of extra information." -- http://en.wikipedia.org/wiki/Length_extension_attack

The Rule 30 hash function is automatically protected against length extension attacks since the bits for the length of the message are concatenated with the bits of the full message, making the cellular automaton expand with different values for every different message length.
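Concretely, building the initial condition is nothing more than the message bits followed by the message's bit length as a 32-bit big-endian field (a sketch; the helper name is mine):

```javascript
// Build the initial condition: the message bits followed by the
// message length (in bits) as a 32-bit big-endian integer.
function initialCondition(messageBits) {
  const bits = messageBits.slice();
  for (let i = 31; i >= 0; i--) {
    bits.push((messageBits.length >>> i) & 1);
  }
  return bits;
}
```

This is what guarantees that, say, two all-zero messages of different lengths still start the automaton from different initial conditions.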


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 11:18:38 AM
Here is a version of the Rule 30 hash function that skips rows 2.5 times the number of bits in the initial condition: http://jsfiddle.net/d3NS6/

The reason is to ensure that the bits become influenced enough as a protection against cryptanalysis. The image shows that 2.5 times the message length in bits is enough to make the bits to the far left and right influence the cells back into the center column after N rows.

http://s28.postimg.org/rajasb799/rule_30_skip_rows.png

The next image shows again how protection against cryptanalysis is achieved:

http://s27.postimg.org/j1a9xk3yb/rule_30.png



Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 11:57:07 AM
The image below shows how the bits for the hash value are taken from the further expansion of the cellular automaton after 2.5 N rows have been calculated. If the bit values were taken from the top of the automaton, the result would be far from random for similar messages.

http://s9.postimg.org/q10f7a4cf/rule_30_hash_function.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 01:17:31 PM
Here is the Rule 30 hash function with the same algorithm and with fewer calculations: http://jsfiddle.net/GwLp4/

Since values are taken only from the center column, fewer cells need to be calculated in the bottom half of the pyramid.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 23, 2014, 02:40:20 PM
I think the full message version of my Rule 30 hash function has at least the following cryptographic benefits:

  • Resistant against preimage attacks.
  • Resistant against second-preimage attacks.
  • Resistant against collision attacks.
  • Resistant against chosen-prefix collision attacks.
  • Resistant against length extension attacks.
  • Resistant against brute-force attacks.
  • Resistant against distinguishing attacks.
  • Resistant against cryptanalysis.
  • Close approximation to a random oracle.

"preimage resistance: for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x such that h(x) = y when given any y for which a corresponding input is not known.[1]

second-preimage resistance: it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a second preimage x′ ≠ x such that h(x) = h(x′).[1]" -- http://en.wikipedia.org/wiki/Preimage_attack

"Collision attack
    Find two different messages m1 and m2 such that hash(m1) = hash(m2).

Chosen-prefix collision attack
    Given two different prefixes p1, p2 find two appendages m1 and m2 such that hash(p1 ‖ m1) = hash(p2 ‖ m2) (where ‖ is the concatenation operation)." -- http://en.wikipedia.org/wiki/Collision_attack

"In cryptography and computer security, length extension attacks are a type of attack on certain types of hashes which allow inclusion of extra information." -- http://en.wikipedia.org/wiki/Length_extension_attack

"In cryptography, a brute-force attack, or exhaustive key search, is a cryptanalytic attack that can, in theory, be used against any encrypted data[1] (except for data encrypted in an information-theoretically secure manner)." -- http://en.wikipedia.org/wiki/Brute_force_attack

"In cryptography, a distinguishing attack is any form of cryptanalysis on data encrypted by a cipher that allows an attacker to distinguish the encrypted data from random data.[1]" -- http://en.wikipedia.org/wiki/Distinguishing_attack

"In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that query is submitted." -- http://en.wikipedia.org/wiki/Random_oracle


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 24, 2014, 11:21:16 AM
I created a Google Code project called R30 hash function: https://code.google.com/p/r30-hash-function/


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 25, 2014, 12:47:23 AM
It would probably be too risky to use R30 in Bitcoin today. The algorithm is way too untested. And even if experts found it likely to be strong, the algorithm would have to be used in real applications and be exposed to real and heavy attacks. Anyway, R30 could be a good backup in case SHA-256 is ever seriously broken.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 25, 2014, 03:33:00 PM
Another potential problem with R30, other than it being untested, is that it may turn out to be too good! ;D Imagine criminal organizations including terrorists being able to communicate in a way that is hidden even from intelligence agencies like the NSA. I don't like Orwellian stuff, but I trust the NSA more than I would trust secret communication between criminal elements.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 26, 2014, 07:28:52 PM
I think Satoshi's solution to the Byzantine Generals' Problem is the first real-world application of Wolfram's computational irreducibility, since doing an irreducible computation is the ideal proof of work.

My intuition says that known attacks on hash functions are only special cases of reducibility checks, and a computationally irreducible function must be resistant to them by definition.

Maybe SHA256 is also computationally irreducible; it is, however, more likely that irreducibility will be rigorously proven for an R30 hash than for SHA256.

It is desirable to use the least complex irreducible computation for proof of work, as that leaves least room for implementation differences. If R30 is irreducible then it is also the least complex of such.

Note that computation of the below area is sufficient.

https://i.imgur.com/qG79PEz.jpg


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 26, 2014, 08:19:23 PM
Here is a version of the Rule 30 hash function that skips rows 2.5 times the number of bits in the initial condition: http://jsfiddle.net/d3NS6/

The reason is to ensure that the bits become influenced enough as a protection against cryptanalysis.

This sounds weak. This is the path to perceived security through perceived complexity, the kind of thinking by which SHA was probably constructed.

If there is really something with R30, then below should be sufficient.

https://i.imgur.com/UMOxwJO.png


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 26, 2014, 08:42:13 PM
Note that R30 has an apparent asymmetry in its "chaos".

The below pictures hint that left and right side of an input do not receive the same "kind" of hashing. Likely irrelevant, but puzzling.

https://i.imgur.com/7AqZnAA.png
https://i.imgur.com/p7pYxoQ.png


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 26, 2014, 09:04:30 PM
Does not "seem" to hurt.

https://i.imgur.com/g9yiNyc.png


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 27, 2014, 12:46:22 AM
I mined my name  :D

https://i.imgur.com/wgF3ufJ.png

I noticed that the nonce needs to be on the left side of the message, as the hashing seems much more effective moving to the right than moving to the left (as Grau observed too).  So, I think this phenomenon needs to be better understood before you can decide what column to take your hash from and how many rows to calculate.  



Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 03:48:45 AM

Note that computation of the below area is sufficient.

https://i.imgur.com/qG79PEz.jpg

Yes, that area is what is calculated in my Java (and latest JavaScript) version of R30.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 03:59:03 AM
Quote
I mined my name  :D

https://i.imgur.com/wgF3ufJ.png

I noticed that the nonce needs to be on the left side of the message, as the hashing seems much more effective moving to the right than moving to the left (as Grau observed too).  So, I think this phenomenon needs to be better understood before you can decide what column to take your hash from and how many rows to calculate.  



Nice. So for that smaller hash at least it shows that leading zeros can be achieved with R30. Yes, the right side of the Rule 30 automaton seems more chaotic. But did you skip at least 1.5 times the number of bits in the initial condition? That's the smallest number of generations needed for both left and right side to become "blended" back into the center column.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 04:23:29 AM
The right side of the automaton is where the "randomness" is so the right value in blue needs to be mixed back into the center column:

http://s2.postimg.org/f9ktf7qqx/30_hash.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 04:45:58 AM
The same with the leftmost value in the initial condition. It needs to become "randomized" in the right side of the automaton and then brought into the center column. Otherwise the hash values will be similar for similar messages. This means 1.5 N rows where N is the number of bits in the initial condition.

http://s15.postimg.org/53nnooiqj/30_hash.png


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 04:46:40 AM
Quote
The right side of the automaton is where the "randomness" is so the right value in blue needs to be mixed back into the center column:

http://s2.postimg.org/f9ktf7qqx/30_hash.png

Excuse me, but that sounds arbitrary to me.
I am looking for the least complex but irreducible hash, and am not convinced that "mixing back" would be needed.

https://i.imgur.com/g9yiNyc.png


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:07:42 AM
You see below that the hash is already part of the right side of an R30 evolution and that every bit of it has input from every bit of the value to be hashed.

https://i.imgur.com/WnafL3z.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 05:11:09 AM
Quote
The right side of the automaton is where the "randomness" is so the right value in blue needs to be mixed back into the center column:

Quote
Excuse me, but that sounds arbitrary to me.
I am looking for the least complex but irreducible hash, and am not convinced that "mixing back" would be needed.


I added this to the R30 Wiki page:

"The right side of the Rule 30 cellular automaton has chaotic behavior while the left side shows regularities. This means that the leftmost bit value in the initial condition needs to reach the right side and then back into the center column, which requires at least 1.5 N rows calculated before the hash value is extracted, where N is the number of bits in the initial condition." -- https://code.google.com/p/r30-hash-function/wiki/R30HashFunction

You can try with 0.5 N, which is the value needed for both left and right side of the initial condition to reach the center column. The hash values then become poor, with similar messages producing similar hashes, without enough randomness for a good approximation of a random oracle.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:17:04 AM
Quote
You can try with 0.5 N, which is the value needed for both left and right side of the initial condition to reach the center column. The hash values then become poor, with similar messages producing similar hashes, without enough randomness for a good approximation of a random oracle.

I have not seen the proof for above, yet. It is only your intuition against mine.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:32:04 AM
It appears that neither of us had the right intuition. I plotted the effect of a bit on the outer right side. The question is: What is the slope of the yellow line?

https://i.imgur.com/7sKpFDi.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 05:36:51 AM
Quote
You can try with 0.5 N, which is the value needed for both left and right side of the initial condition to reach the center column. The hash values then become poor, with similar messages producing similar hashes, without enough randomness for a good approximation of a random oracle.

Quote
I have not seen the proof for above, yet. It is only your intuition against mine.

Here is a hash function with 0.5 N: http://jsfiddle.net/UQ6B7/

Examples of poor hash values with 0.5 N:

Message             Hash value
aaaaaaaaaaaaaaaaaa  f257274bd5c24f00b46718fd1ca4eba7b4f9cb6e2ad6ad1e37052faef1bbd9c8
aaaaaaaaaaaaaaaaab  f257274bd5e2439806662c8974ba0ce0466179da04755ec3e18cc8fea4202282
aaaaaaaaaaaaaaaaac  f257274bd5c24f00b467187d070bf196aecffa11f0d27bceb9bbf637b3cfe038
aaaaaaaaaaaaaaaaad  f257274bd5c24f00b467187d070bf196aecffa0df0d27b98feb474ede682559e
aaaaaaaaaaaaaaaaae  f257274bd5e24300b467187d070bf196aecffa0df0d27b980eaf332cf1222f82

Compare with R30:

Message             Hash value
aaaaaaaaaaaaaaaaaa  1e37052faef1bbd9c866c69a246ee30536f9b68133e5101ea7f290b630303138
aaaaaaaaaaaaaaaaab  c3e18cc8fea420228257228e7ef0a0addc4554130b343d5f43d5e1dcb47dff0e
aaaaaaaaaaaaaaaaac  ceb9bbf637b3cfe0383a2b2c6c67cc0174946e5387a6fa136290dac8f71dfa68
aaaaaaaaaaaaaaaaad  98feb474ede682559e459ceaae5518563fabdb3b06da4a488f46069750730877
aaaaaaaaaaaaaaaaae  980eaf332cf1222f82052e23d535a551793971725362acb7a192e0afa9243957

Yikes! It seems that 2.5 N is too small a value, too. Hmm... The chaotic mixing of Rule 30 needs to be better understood.



Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:41:56 AM
Quote
Yikes! It seems that 2.5 N is too small a value, too. Hmm... The chaotic mixing of Rule 30 needs to be better understood.

Exactly. See the slope question I raised before.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 05:42:07 AM
Quote
It appears that neither of us had the right intuition. I plotted the effect of a bit on the outer right side. The question is: What is the slope of the yellow line?

https://i.imgur.com/7sKpFDi.png

Interesting. The area on the right seems to be the chaotic part, and the slope is at a steeper angle than 45 degrees. See my previous post, where it seems that I was wrong about 2.5 N.
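The slope can also be probed directly in code: flip one bit of the initial condition and track where the two evolutions differ. One thing that follows from the rule itself (new = left XOR (center OR right)) is that a difference in the left neighbour always flips the new cell, so the right edge of the difference cone moves right at exactly one cell per row. The left edge, the yellow-line slope, has no such fixed speed, which is why it looks so irregular. A sketch (my own illustrative code):

```javascript
// Rule 30 on a fixed-width row with zero boundaries.
function step(row) {
  return row.map((cell, i) => {
    const l = row[i - 1] || 0;
    const r = row[i + 1] || 0;
    return l ^ (cell | r);
  });
}

// Flip one bit of a random initial condition and record the
// rightmost cell where the two evolutions differ at each row.
// The XOR argument above says this front advances exactly one
// cell per row, regardless of the initial condition.
const width = 64, pos = 10, rows = 20;
let a = Array.from({ length: width }, () => (Math.random() < 0.5 ? 1 : 0));
let b = a.slice();
b[pos] ^= 1;

const fronts = [];
for (let t = 1; t <= rows; t++) {
  a = step(a);
  b = step(b);
  let rightmost = -1;
  for (let i = 0; i < width; i++) if (a[i] !== b[i]) rightmost = i;
  fronts.push(rightmost);
}
console.log(fronts.join(" ")); // 11 12 13 ... 30
```

Tracking the leftmost differing cell the same way gives the irregular left-hand slope we are both puzzling over.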


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:58:25 AM
Here you are from Wolfram, "Sensitivity to initial conditions"

https://www.wolframscience.com/nksonline/page-251?firstview=1


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 06:25:17 AM
I ran quite a few initial conditions to see that the slope is far from trivial; see the example:

I see no safe lower bound for the slope yet.

https://i.imgur.com/ODjKhqE.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 07:00:07 AM
Quote
I ran quite a few initial conditions to see that the slope is far from trivial; see the example:

I see no safe lower bound for the slope yet.

https://i.imgur.com/ODjKhqE.png


Wow. That's pretty nonlinear. Maybe my initial value of 4N is needed after all. Then the hash value will be taken from below where the slope crosses the center column. Unless the slope can bend nonlinearly the other way in some extreme ways. Tricky.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 08:54:01 AM
The influence of the right side bit may look nonlinear, yet the mixing of cell values is actually a linear process. The bits have fully influenced each other first at the blue lines. And then those values need to reach the center column, which means 2.5 N.

http://s9.postimg.org/q10f7a4cf/frule_30_hash_function.png

In my previous version I started the automaton at the message bits. It should start with the whole initial condition which includes the concatenated length bits, like this: http://jsfiddle.net/74TNa/

This version with 2.5 N produces better hash values:

Message             Hash value
aaaaaaaaaaaaaaaaaa  e43391dfc7da619e4639a6f0b46559772f39ac7b4d258dd60f5652226a550822
aaaaaaaaaaaaaaaaab  42bec61717e9f484b29653b5da5e0060f00bf41516e007440644e8df1d9aec94
aaaaaaaaaaaaaaaaac  5282f5d32d100117b0c5a05a8ae5994e6d36d32e7445003aca97830516b9177c
aaaaaaaaaaaaaaaaad  2f11a9e1094d1b5982f97405b4aa1c015dce940fc25fd81935bee5c6ec7ad7ae
aaaaaaaaaaaaaaaaae  375bc9e8757b2b1840408532b992bde636272025e43d7a1e95cf07b0512747c5


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 09:10:21 AM
Quote
Wow. That's pretty nonlinear. Maybe my initial value of 4N is needed after all. Then the hash value will be taken from below where the slope crosses the center column. Unless the slope can bend nonlinearly the other way in some extreme ways. Tricky.

In degenerate cases you would even need 5N, see below for 010101... initial conditions
https://i.imgur.com/lLMYnZg.png


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 09:16:38 AM
It should start with the whole initial condition which includes the concatenated length bits

Sounds like alchemy. You just hope that the length in binary representation is not a specific pattern like the one above.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 09:44:55 AM
It should start with the whole initial condition which includes the concatenated length bits

Sounds like alchemy. You just hope that the length in binary representation is not a specific pattern like the one above.

The length bits are only there to distinguish messages that would otherwise produce identical initial conditions, such as messages consisting only of zeros, or messages that end with zeros. So the length bits must be included, and the initial condition includes them. In my previous version I added the length bits without including them in the number of bits of the initial condition.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 09:55:48 AM
The length bits are only there to distinguish messages that would otherwise produce identical initial conditions, such as messages consisting only of zeros, or messages that end with zeros. So the length bits must be included, and the initial condition includes them. In my previous version I added the length bits without including them in the number of bits of the initial condition.

I do not care about variable-length messages; I am looking for the least complex irreducible-computation hash for a block header. Message wrapping should be in another logical layer anyway.

Here is what I get for a special-pattern 80-byte input, attempting to produce 32 bytes of hash after 5N generations. It still fails.

https://i.imgur.com/rrlmICP.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 10:07:28 AM
The length bits are only there to distinguish messages that would otherwise produce identical initial conditions, such as messages consisting only of zeros, or messages that end with zeros. So the length bits must be included, and the initial condition includes them. In my previous version I added the length bits without including them in the number of bits of the initial condition.

I do not care about variable-length messages; I am looking for the least complex irreducible-computation hash for a block header. Message wrapping should be in another logical layer anyway.

Here is what I get for a special-pattern 80-byte input, attempting to produce 32 bytes of hash after 5N generations. It still fails.

https://i.imgur.com/rrlmICP.png

Ok, good point. The worst case pattern needs to be identified, and then the number of rows needed for that pattern. I tested with messages UUUUUU... for which I needed 7N before the hash values started to become different! (U = ASCII 0x55 = 01010101)


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 10:10:00 AM
It seems 7 is the bare minimum for a setup like in Bitcoin 80 bytes -> 32 bytes hash.

https://i.imgur.com/5Yjh7ir.png


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 10:14:20 AM
It seems 7 is the bare minimum for a setup like in Bitcoin 80 bytes -> 32 bytes hash.

I got 7 too by trial and error. But is that the worst case pattern? And what about the leftmost value?


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 10:26:10 AM
It seems 7 is the bare minimum for a setup like in Bitcoin 80 bytes -> 32 bytes hash.

I got 7 too by trial and error. But is that the worst case pattern? And what about the leftmost value?

That's the beauty of the minimal design: one can enumerate all patterns to find the worst case. Since the inputs are only 3 bits wide, there are only 8 possible patterns.

The leftmost value must be fine.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 10:40:50 AM
It seems 7 is the bare minimum for a setup like in Bitcoin 80 bytes -> 32 bytes hash.

I got 7 too by trial and error. But is that the worst case pattern? And what about the leftmost value?

That's the beauty of the minimal design: one can enumerate all patterns to find the worst case. Since the inputs are only 3 bits wide, there are only 8 possible patterns.

The leftmost value must be fine.

Actually there are two 2-bit patterns in addition, 0,1,0,1... and 1,0,1,0..., so we have 10 cases.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 10:52:10 AM
It seems the 0,1,0,1 ... and 1,0,1,0 ... two bit patterns are the worst case that need >7N evolutions.

And there is a corner case of 0,0,0... that leads to a hash of 0,0,0....


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 11:12:21 AM
With such a hash function:
- the work needed is exactly quantified by the number of logical operations; no implementation-specific gimmicks
- the units implementing rule 30 are identical and extremely simple
- all units can run in parallel within a single layer
- it has a constant memory requirement: just two arrays, one for the current and one for the next generation of the evolution
- irreducibility of the computation could be rigorously proven

Its single parameter is the number of evolutions before emitting the hash.
This parameter should be set >= 7 times the input width.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 11:28:55 AM
I have no idea about hardware design, but I guess the whole net could be laid out as-is without memory; it would just need exact clocking.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 01:03:45 PM
It seems the 0,1,0,1 ... and 1,0,1,0 ... two bit patterns are the worst case that need >7N evolutions.

And there is a corner case of 0,0,0... that leads to a hash of 0,0,0....

I think 7N is enough. I tried up to 15,360 bits and it works, with different hash results for UUUU...U and UUUU...V. It also works when changing the start of the message to VUUU...U.

R30 with 7N: http://jsfiddle.net/2Mxt2/

The case with all zeros is taken care of by the extra bits for the length of the message. It's only the empty message '' that results in hash value zero.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 01:09:22 PM
- it has a constant memory requirement: just two arrays, one for the current and one for the next generation of the evolution

Actually, I use only one array for the cellular automaton in my implementation. I save a single value in a temporary variable, then write the result of the next generation into the array, and for the next calculation I use the temporary variable instead of the value in the array, iterated from left to right for all the cells.
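For illustration, here is a minimal sketch of that one-array trick (class and method names are just illustrative, not taken from the actual implementation): the old value of the left neighbor is kept in a temporary while cells are overwritten left to right, with zero cells assumed beyond both boundaries.

```java
public class InPlace {
    // One in-place rule 30 generation: keep the previous value of the
    // current cell in a temporary so the left neighbor read for cell i+1
    // is the old value, not the freshly written one.
    static void step(int[] cells) {
        int left = 0;                                     // value beyond the left boundary
        for (int i = 0; i < cells.length; i++) {
            int cur = cells[i];
            int right = i + 1 < cells.length ? cells[i + 1] : 0;
            cells[i] = left ^ (cur | right);              // rule 30: l xor (c or r)
            left = cur;                                   // old value, not the overwritten one
        }
    }

    public static void main(String[] args) {
        int[] c = {0, 0, 1, 0, 0};
        step(c);
        // classic rule 30 first step from a single 1: the 111 triangle
        System.out.println(java.util.Arrays.toString(c)); // [0, 1, 1, 1, 0]
    }
}
```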


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 01:21:46 PM
- it has a constant memory requirement: just two arrays, one for the current and one for the next generation of the evolution

Actually, I use only one array for the cellular automaton in my implementation. I save a single value in a temporary variable, then write the result of the next generation into the array, and for the next calculation I use the temporary variable instead of the value in the array, iterated from left to right for all the cells.

I do not think such an algorithm should be optimised for space, but rather for time. Yours is rather sub-optimal: it does everything sequentially although the substitutions are independent, and it does not exploit the fact that, even in software, operating on bit arrays is faster than operating on individual bits. I think I will write one that beats this by a few orders of magnitude, for fun :)

Later today .. now I rather play with the kids.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 02:54:00 PM
Fortunately it seems that 7N is enough. I tested with messages of length 100,000 bits:

The R30 hash for "UUU...U" is: e4ebe82d717cf60eb6b7d6473f647d733fb0efbfe930bd3048bb14b0cbe7e97f
The R30 hash for "UUU...V" is: 0f821effa4773fdbbe6d7a8c50bd2146c0fe75e577fab0d59dd57c8aa582ba78

But how to prove that 7N is enough mathematically? ???


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 03:38:45 PM
- it has a constant memory requirement: just two arrays, one for the current and one for the next generation of the evolution

Actually, I use only one array for the cellular automaton in my implementation. I save a single value in a temporary variable, then write the result of the next generation into the array, and for the next calculation I use the temporary variable instead of the value in the array, iterated from left to right for all the cells.

I do not think such an algorithm should be optimised for space, but rather for time. Yours is rather sub-optimal: it does everything sequentially although the substitutions are independent, and it does not exploit the fact that, even in software, operating on bit arrays is faster than operating on individual bits. I think I will write one that beats this by a few orders of magnitude, for fun :)

Later today .. now I rather play with the kids.

Yes, definitely. The memory requirement is tiny, so two arrays may well be faster. And also calculating with, for example, 16-bit words instead of 1 bit at a time as I do now.

If you want to compare the hash values, use my latest version 7N and with reversed bits (the least significant bit for each byte in the message now starts from the left in the automaton): http://jsfiddle.net/MA3vS/


Title: Re: Rule 30 automaton as hash function
Post by: justusranvier on July 27, 2014, 04:25:36 PM
Tell me I'm not the only one who initially misread this headline and imagined something completely different...


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 05:19:33 PM
But how to prove that 7N is enough mathematically? ???

We enumerated all 3-bit patterns, and 7N is enough to propagate any change to the middle column even in the worst-case pattern. A pattern with a repetition cycle of more than 3 bits, or a random pattern, must start local triangles that propagate at a higher slope than the worst-case two-bit pattern 01010...

Is this enough?

Well, enough for what? I think we should first be clear about the properties needed for the purpose of block header hashing.
My take is:
- no known inverse - Wolfram demonstrated nicely that the middle column gives no clue about the input after a few generations.
- full propagation - the nonce should be able to flip any bit in the hash - we think this is given with 7N generations.
- proof of work - means no shortcuts, irreducible; this is given more clearly than in SHA256.
- able to meet any difficulty target - that's a hard question, but did anyone prove it for SHA256d? I think not. If there were an attempt to prove this, it would be easier for this one.

Did I miss any?


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 27, 2014, 05:38:14 PM
But how to prove that 7N is enough mathematically? ???

We enumerated all 3-bit patterns, and 7N is enough to propagate any change to the middle column even in the worst-case pattern. A pattern with a repetition cycle of more than 3 bits, or a random pattern, must start local triangles that propagate at a higher slope than the worst-case two-bit pattern 01010...

Is this enough?

Well, enough for what? I think we should first be clear about the properties needed for the purpose of block header hashing.
My take is:
- no known inverse - Wolfram demonstrated nicely that the middle column gives no clue about the input after a few generations.
- full propagation - the nonce should be able to flip any bit in the hash - we think this is given with 7N generations.
- proof of work - means no shortcuts, irreducible; this is given more clearly than in SHA256.
- able to meet any difficulty target - that's a hard question, but did anyone prove it for SHA256d? I think not. If there were an attempt to prove this, it would be easier for this one.

Did I miss any?

I don't believe it's a requirement to select the hash bits from the central column (although I realize that this is what Mathematica does with Random[]).  Given the fact that the influence of the nonce propagates better moving right than moving left, I think it makes sense to select a column to the right of center.  If your example image (nice graphics BTW) shows the worst case propagation, then perhaps this is sufficient:

https://i.imgur.com/pXs8W7q.png
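For what it's worth, the left/right asymmetry is provable on the right side: since x'(i) = x(i-1) xor (x(i) or x(i+1)), flipping x(i-1) always flips x'(i), so a one-bit difference travels rightward at exactly one cell per generation regardless of the background, while the leftward speed depends on the background. A small sketch (class and variable names are illustrative; zero cells are assumed beyond the boundaries):

```java
import java.util.Random;

public class Cone {
    // One rule 30 generation with fixed-zero boundaries.
    static int[] step(int[] c) {
        int[] n = new int[c.length];
        for (int i = 0; i < c.length; i++) {
            int l = i > 0 ? c[i - 1] : 0;
            int r = i < c.length - 1 ? c[i + 1] : 0;
            n[i] = l ^ (c[i] | r);
        }
        return n;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int w = 501, mid = w / 2, gens = 100;
        int[] a = new int[w];
        for (int i = 0; i < w; i++) a[i] = rnd.nextInt(2);
        int[] b = a.clone();
        b[mid] ^= 1;                            // one-bit difference in the middle
        for (int g = 1; g <= gens; g++) {
            a = step(a);
            b = step(b);
            // The right edge of the difference cone sits at exactly mid + g,
            // because the XOR on the left neighbor copies any flip rightward.
            if (a[mid + g] == b[mid + g]) throw new AssertionError("right edge lagged at " + g);
        }
        System.out.println("right edge moved at speed 1 for " + gens + " generations");
    }
}
```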


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 06:08:50 PM
Fewest calculations so far and with 7N: http://jsfiddle.net/34FyH/


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 06:11:51 PM
But how to prove that 7N is enough mathematically? ???

We enumerated all 3-bit patterns, and 7N is enough to propagate any change to the middle column even in the worst-case pattern. A pattern with a repetition cycle of more than 3 bits, or a random pattern, must start local triangles that propagate at a higher slope than the worst-case two-bit pattern 01010...

Is this enough?

Well, enough for what? I think we should first be clear about the properties needed for the purpose of block header hashing.
My take is:
- no known inverse - Wolfram demonstrated nicely that the middle column gives no clue about the input after a few generations.
- full propagation - the nonce should be able to flip any bit in the hash - we think this is given with 7N generations.
- proof of work - means no shortcuts, irreducible; this is given more clearly than in SHA256.
- able to meet any difficulty target - that's a hard question, but did anyone prove it for SHA256d? I think not. If there were an attempt to prove this, it would be easier for this one.

Did I miss any?

Proved for Bitcoin, yes. But what about longer messages? The slope seems to converge to 6-point-something for large messages (I have tested up to 100,000 bits). 7N seems enough, but I don't know for sure whether it will work for, say, a 1 megabyte file as the message.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 06:26:07 PM
Here is a more efficient implementation, not yet parallel, but already illustrates the power of simple design for implementation.

Hashing the argument in hex (assumed block header) to 32 bytes of output after 7 * 80 * 8 rounds with rule 30.

Code:
package com.bitsofproof.r30hash;

import java.math.BigInteger;

public class Main
{
    public static void main(String[] args)
    {
        BigInteger input = new BigInteger (args[0], 16);
        BigInteger result = BigInteger.ZERO;
        int rounds = (80 * 7 + 32) * 8;
        for ( int i = 0; i < rounds; ++i )
        {
            BigInteger q = input.shiftLeft (1);
            BigInteger r = input.shiftLeft (2);
            input = input.xor (q.or (r));
            if ( i >= 80 * 7 * 8 )
            {
                if ( input.testBit (i + 1) )
                {
                    result = result.setBit (0);
                }
                result = result.shiftLeft (1);
            }
        }
        System.out.println (result.toString (16));
    }
}


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 06:40:06 PM
I don't believe it's a requirement to select the hash bits from the central column ...
It is apparent that chaos converges to order toward both left and right, so intuition says to stay in the middle if you want maximum entropy.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 07:15:17 PM
Here is a more efficient implementation, not yet parallel, but already illustrates the power of simple design for implementation.

Hashing the argument in hex (assumed block header) to 32 bytes of output after 7 * 80 * 8 rounds with rule 30.

Code:
package com.bitsofproof.r30hash;

import java.math.BigInteger;

public class Main
{
    public static void main(String[] args)
    {
        BigInteger input = new BigInteger (args[0], 16);
        BigInteger result = BigInteger.ZERO;
        int rounds = (80 * 7 + 32) * 8;
        for ( int i = 0; i < rounds; ++i )
        {
            BigInteger q = input.shiftLeft (1);
            BigInteger r = input.shiftLeft (2);
            input = input.xor (q.or (r));
            if ( i >= 80 * 7 * 8 )
            {
                if ( input.testBit (i + 1) )
                {
                    result = result.setBit (0);
                }
                result = result.shiftLeft (1);
            }
        }
        System.out.println (result.toString (16));
    }
}

Neat. I noticed in this paper that Rule 30 can indeed be defined as x'(i) = x(i-1) xor (x(i) or x(i+1)):

Cryptography with Cellular Automata -- http://www.stephenwolfram.com/publications/academic/cryptography-cellular-automata.pdf

I wonder about the performance of BigInteger though.
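As a quick sanity check (illustrative only, not from the thread), the analytic form can be verified against Rule 30's defining truth table, i.e. the bits of the number 30:

```java
public class Rule30Check {
    public static void main(String[] args) {
        // For each neighborhood (l,c,r), rule 30 outputs bit (l<<2 | c<<1 | r)
        // of the number 30 (binary 00011110). The analytic form must agree.
        for (int n = 0; n < 8; n++) {
            int l = (n >> 2) & 1, c = (n >> 1) & 1, r = n & 1;
            int analytic = l ^ (c | r);   // x'(i) = x(i-1) xor (x(i) or x(i+1))
            int table = (30 >> n) & 1;    // bit n of 30
            if (analytic != table) throw new AssertionError("mismatch at neighborhood " + n);
        }
        System.out.println("analytic form matches the rule 30 truth table");
    }
}
```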


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 08:11:37 PM
Using the Java long type (64 bits) would make the algorithm fast. Tedious to implement, however, with all the padding, boundary matching, shifts of bits between the longs, etc. I'm too lazy for that at the moment. Plus BigInteger is perhaps fast enough to match the use of longs for an implementation of R30.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 08:25:18 PM
If I were to use this for real (or had more time for fun) I'd rewrite it using a vector of longs and also parallelise it, since the computation on (groups of) longs can be executed by CPU cores in parallel, synchronising only at the change of generations.

I am sure the BigInteger version with the analytic form of rule 30 already beats your JavaScript by two orders of magnitude :P


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 27, 2014, 08:59:24 PM
I think with machine code the shifts of the 64 bit registers can automatically be chained. Maybe assembler code in C/C++ would be efficient. Of course, a hardware implementation could be made even much faster than that.

In reality only Bitcoin miners who would use dedicated R30 hardware would be competitive.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 27, 2014, 10:36:35 PM
I think with machine code the shifts of the 64 bit registers can automatically be chained. Maybe assembler code in C/C++ would be efficient. Of course, a hardware implementation could be made even much faster than that.

In reality only Bitcoin miners who would use dedicated R30 hardware would be competitive.
Sure. Writing software for this is just for research and fun.

The best-in-class code in whatever software layer will still be many orders of magnitude slower than an ASIC for R30, since it is extremely simple, homogeneous and parallelizable at the finest scale, with constant (if any) memory need.

This efficiency combined with irreducibility of computation, no inverse and ability to satisfy any difficulty would make it the perfect proof of work.

The ability to satisfy any difficulty is something that still bugs me. I wish I could convince myself that it has this property within the limited run of 7N generations.

Since there are 80 bytes of input to determine 32 bytes of output, there is plenty of freedom to create whatever pattern, though.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 28, 2014, 08:59:31 AM
If I were to use this for real (or had more time for fun) I'd rewrite it using a vector of longs and also parallelise it, since the computation on (groups of) longs can be executed by CPU cores in parallel, synchronising only at the change of generations.

I am sure the BigInteger version with the analytic form of rule 30 already beats your JavaScript by two orders of magnitude :P

It turns out that the performance of the code below is pretty dismal compared with SHA-256 as implemented by the standard Java runtime libraries. My guess is that the memory allocation and shuffling for BigIntegers of increasing length dominates it. This needs a pre-allocated store of the right size.

Code:
private static BigInteger r30hash (BigInteger input)
{
    BigInteger result = BigInteger.ZERO;
    for ( int i = 0; i < (80 * 7 + 32) * 8; ++i )
    {
        input = input.xor (input.shiftLeft (1).or (input.shiftLeft (2)));
        if ( i >= 80 * 7 * 8 && input.testBit (i + 1) )
        {
            result = result.setBit (0);
        }
        result = result.shiftLeft (1);
    }
    return result;
}


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 09:23:05 AM
Fast Java implementation of R30 using 64-bit longs:

Code:
public final byte[] digest(byte[] message) {
    int maxMessageBytes = message.length;
    int maxKeyBytes = maxMessageBytes + 4;
    int maxKeyBits = maxKeyBytes * 8;
    byte[] hash = new byte[MAX_HASH_BYTES];
    byte[] key = new byte[maxKeyBytes];
    for (int i = 0; i < maxMessageBytes; i++) {
        key[i + 2] = message[i];
    }
    key[0] = (byte) (maxMessageBytes >>> 24);
    key[1] = (byte) (maxMessageBytes >>> 16);
    key[maxKeyBytes - 2] = (byte) (maxMessageBytes >>> 8);
    key[maxKeyBytes - 1] = (byte) maxMessageBytes;
    int maxHashBits = MAX_HASH_BYTES * 8;
    int skipRows = maxKeyBits * 7;
    int maxCells = 2;
    maxCells += maxKeyBits;
    maxCells += skipRows;
    maxCells += maxHashBits * 2;
    int maxLongs = (maxCells + 63) >>> 6;
    maxCells = maxLongs << 6;
    int cellsMid = maxCells / 2;
    long[] cells = new long[maxLongs];
    int keyStart = (maxCells - maxKeyBits) / 2;
    for (int i = 0; i < key.length; i++) {
        int keyChar = key[i];
        int bitPos = 0x80;
        for (int j = 0; j < 8; j++) {
            long b = (keyChar & bitPos) >>> (7 - j);
            int bitIndex = keyStart + i * 8 + j;
            cells[bitIndex >>> 6] |= b << (63 - (bitIndex % 64));
            bitPos >>>= 1;
        }
    }
    int bitCount = 0;
    int mid = 0;
    int longMid = maxLongs / 2;
    int longMidShift = longMid * 2 == maxLongs ? 63 : 31;
    int maxRow = skipRows + maxHashBits * 2;
    for (int row = 0; row < maxRow; row++) {
        int doubleRow = row * 2;
        int calcWidth = doubleRow;
        if (calcWidth > maxRow - 2) {
            calcWidth = maxRow - ((doubleRow) % maxRow) + 2;
        } else {
            calcWidth += maxKeyBits;
        }
        int halfWidth = calcWidth / 2 + 2;
        int start = (cellsMid - halfWidth) >>> 6;
        int end = (cellsMid + halfWidth + 63) >>> 6;
        mid = (int) ((cells[longMid] >>> longMidShift) & 0x01);
        long carryLeft = 0L;
        for (int i = start; i < end; i++) {
            long l = cells[i];
            long carryRight = i < maxLongs - 1 ? cells[i + 1] >>> 63 : 0;
            long cellRight = (l << 1) | carryRight;
            long cellLeft = (l >>> 1) | carryLeft;
            carryLeft = l << 63;
            cells[i] = cellLeft ^ (l | cellRight);
        }
        if (row < skipRows) {
            continue;
        }
        if (row % 2 == 1) {
            if (mid == 1) {
                int bufPos = bitCount >>> 3;
                hash[bufPos] ^= 1 << (bitCount % 8);
            }
            bitCount++;
        }
    }
    return hash;
}

Slower JavaScript alternative: http://jsfiddle.net/7DV7Z/


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 28, 2014, 12:11:37 PM
Remove message length encoding. That's a foreign concept to a hash function.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 12:25:41 PM
I did a worst-case test with one-million-bit messages of 010101010... and it works:

The R30 hash for UUU...U (1000000 bits) is: 10ea3662363b4940ed208b2a70960f7520bd5309e3f059ea5d106cdde9f2d7a1
The R30 hash for UUU...V (1000000 bits) is: 8903d17a9ec35cce2f028907e0fef7ccabbbd12d8f4a9f635e1d0503f395b1e9


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 12:28:28 PM
Remove message length encoding. That's a foreign concept to a hash function.

The message length bits are needed. Otherwise for example messages 000 and 000000 would result in the same initial condition and have the same hash value.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 12:39:53 PM
Aha. Here is how the length bits are handled in SHA-256:

"Pre-processing:
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
    length (modulo 512 in bits) is 448.
append length of message (without the '1' bit or padding), in bits, as 64-bit big-endian integer
    (this will make the entire post-processed length a multiple of 512 bits)" -- http://en.wikipedia.org/wiki/SHA-2#Pseudocode

That's actually similar to how I do it in my current R30 version.


Title: Re: Rule 30 automaton as hash function
Post by: grau on July 28, 2014, 12:48:24 PM
The hash function should be defined for a certain block size.

Padding is an optional preprocessing step for the case where the block size is not met exactly by the input, and it operates on bits, not bytes.
I would leave that aside until the algorithm is settled.



Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 01:07:54 PM
The hash function should be defined for a certain block size.

Padding is an optional preprocessing step for the case where the block size is not met exactly by the input, and it operates on bits, not bytes.
I would leave that aside until the algorithm is settled.


I think the fixed block size used in many hash functions today is simply a consequence (limitation) of using the Davies–Meyer compression function.

Treating the whole message as a single unit could be more robust.

Instead of adding length bits in R30, I could simply add, for example, a 1 bit before and after the message as preprocessing. That would work too, since then even messages containing only zeros would differ when they have different lengths.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 02:47:02 PM
Maybe not for Bitcoin, but the O(n^2) cost of R30 makes it very computationally intensive for long messages.

One way of reducing the calculations for long messages is to use the Merkle–Damgård construction together with Davies–Meyer single-block-length compression function.

With a message of 1,000,000 bits, taking the square of that results in 10^12. By dividing that message into fixed-size blocks of 1000 bits, the result is 1000^2 * 1000 = 10^9. A three orders of magnitude improvement.

A similar technique is used in SHA-2 and other standard hash functions.
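The arithmetic above can be sketched directly (the numbers are the ones from the post; the class name is illustrative):

```java
public class Cost {
    public static void main(String[] args) {
        double n = 1_000_000;                  // message length in bits
        double b = 1_000;                      // block size in bits
        double direct = n * n;                 // O(n^2) over the whole message: 10^12
        double blocked = (n / b) * b * b;      // n/b blocks, each costing b^2: 10^9
        System.out.println(direct / blocked);  // 1000.0: three orders of magnitude
    }
}
```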


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on July 28, 2014, 03:02:39 PM
Maybe not for Bitcoin, but the O(n^2) cost of R30 makes it very computationally intensive for long messages.

One way of reducing the calculations for long messages is to use the Merkle–Damgård construction together with Davies–Meyer single-block-length compression function.

With a message of 1,000,000 bits, taking the square of that results in 10^12. By dividing that message into fixed-size blocks of 1000 bits, the result is 1000^2 * 1000 = 10^9. A three orders of magnitude improvement.

A similar technique is used in SHA-2 and other standard hash functions.

I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories, but it is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash, but it would allow one to exploit the Bitcoin network for cheap.

This is why I indicated using something like:
Code:
R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")





Title: Re: Rule 30 automaton as hash function
Post by: grau on July 28, 2014, 03:13:15 PM
I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories, but it is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash, but it would allow one to exploit the Bitcoin network for cheap.

This is why I indicated using something like:
Code:
R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")

Agree, R30 is interesting as perfect POW not as much as cryptographic hash.

As a side note: I am not sure Satoshi made a mistake here. Maybe he wanted the block header to be updated with new transactions instead of rolling a nonce until a block is found.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 03:21:22 PM
I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories, but it is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash, but it would allow one to exploit the Bitcoin network for cheap.

This is why I indicated using something like:
Code:
R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")


I believe R30 is an excellent hash function for cryptography. Although that's a risky assumption, since I assume not even Stephen Wolfram has a definite proof of it. My aim is for R30 to be a general hash function. Using it for Bitcoin proof of work would be an interesting proof of concept.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 28, 2014, 04:26:17 PM
Code:
R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")


I don't know much about Bitcoin, but the nonce could start at 64 bits and then be expanded in the future if needed! R30 would work with the total length of nonce + H(blockheader) being variable.


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on July 28, 2014, 07:26:49 PM
I don't believe it's a requirement to select the hash bits from the central column ...
It is apparent that chaos converges to order toward both left and right, so intuition says to stay in the middle if you want maximum entropy.

But if it weren't a maximal-entropy process, then it should be possible to peel out the predictable part in order to short-cut the computation. In other words, the computation wouldn't be irreducible.

For irreducibility to be a useful concept, doesn't it have to be binary: either it's reducible or it's not? The left side of R30 is reducible, along with the far right edge (I think), and the output is reducible for a certain number of rows for an enumerable set of initial conditions; but excluding these, I think the theory is that the rest is irreducible. So I think if you can prove that the column to the right of center has less entropy than the center column, then you've also proven that R30 is not irreducible (something that, AFAIK, no one has been able to do).


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 29, 2014, 02:47:37 AM
Remove message length encoding. That's a foreign concept to a hash function.

I came to think of a way to remove the length bits: limit the initial condition to a fixed length of 256 bits, and if the message is longer than 255 bits, use the Merkle–Damgård construction by splitting the rest of the message into 256-bit blocks. The preprocessing is then only to append the bit value 1 and pad the message with zeros to the nearest 256-bit boundary. If a message is shorter than 255 bits, it gets padded with the one bit and zeros into a single 256-bit block.

Instead of dealing with bit resolution, byte resolution can be used for convenience: the byte 0x01 is appended to the message, followed by zero bytes 0x00 to pad each block to a total of 32 bytes.
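A minimal sketch of that byte-level preprocessing, assuming a hypothetical helper named `pad` (illustrative only, not code from this thread):

```java
import java.util.Arrays;

public class R30Padding {
    static final int BLOCK_SIZE_BYTES = 32;

    // Append the 0x01 marker byte, then zero-pad to the next 32-byte boundary.
    static byte[] pad(byte[] message) {
        int padded = ((message.length + 1 + BLOCK_SIZE_BYTES - 1)
                / BLOCK_SIZE_BYTES) * BLOCK_SIZE_BYTES;
        byte[] out = Arrays.copyOf(message, padded); // trailing bytes are 0x00
        out[message.length] = 0x01;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pad(new byte[5]).length);  // fits in one 32-byte block
        System.out.println(pad(new byte[32]).length); // marker forces a second block
    }
}
```

Note that a 32-byte message grows to two blocks, because the 0x01 marker no longer fits in the first one.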


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 29, 2014, 01:21:50 PM
R30 with the Merkle–Damgård construction:

Code:
private final static int MAX_HASH_BYTES = 32;
private final static int BLOCK_SIZE_BYTES = MAX_HASH_BYTES;

public final byte[] digest(InputStream is) throws IOException {
    byte[] digest = null;
    byte[] block = new byte[BLOCK_SIZE_BYTES];
    int bytesRead = 0;
    long totalBytesRead = 0;
    while (bytesRead != -1) {
        bytesRead = 0;
        int blockBytesRead = 0;
        // Fill one 32-byte block from the stream.
        while (blockBytesRead < BLOCK_SIZE_BYTES && bytesRead != -1) {
            bytesRead = is.read(block, blockBytesRead,
                    BLOCK_SIZE_BYTES - blockBytesRead);
            if (bytesRead > 0) {
                blockBytesRead += bytesRead;
            }
        }
        totalBytesRead += blockBytesRead;
        if (blockBytesRead < BLOCK_SIZE_BYTES) {
            // Zero-pad the final block and append the 64-bit message length,
            // or force one extra block if the length no longer fits.
            for (int i = blockBytesRead; i < BLOCK_SIZE_BYTES; i++) {
                block[i] = 0;
            }
            if (BLOCK_SIZE_BYTES - blockBytesRead >= 8) {
                for (int i = 1, j = 0; i < 9; i++, j += 8) {
                    block[BLOCK_SIZE_BYTES - i] =
                            (byte) (totalBytesRead >>> j);
                }
            } else {
                bytesRead = 0;
            }
        }
        // Chaining: XOR the previous digest into the block ...
        if (digest != null) {
            for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
                block[i] ^= digest[i];
            }
        } else {
            digest = new byte[BLOCK_SIZE_BYTES];
        }
        // ... then XOR the block digest back into the chaining value.
        byte[] nextDigest = digestBlock(block);
        for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
            digest[i] ^= nextDigest[i];
        }
    }
    return digest;
}

private final byte[] digestBlock(byte[] block) {
    int maxKeyBytes = BLOCK_SIZE_BYTES + 1;
    int maxKeyBits = maxKeyBytes * 8;
    byte[] hash = new byte[MAX_HASH_BYTES];
    byte[] key = new byte[maxKeyBytes];
    for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
        key[i] = block[i];
    }
    key[BLOCK_SIZE_BYTES] = 0x01; // rightmost bit of the initial condition must be 1
    int maxHashBits = MAX_HASH_BYTES * 8;
    int skipRows = maxKeyBits * 7; // skip 7N rows before sampling the center column
    int maxCells = 2;
    maxCells += maxKeyBits;
    maxCells += skipRows;
    maxCells += maxHashBits * 2;
    int maxLongs = (maxCells + 63) >>> 6;
    maxCells = maxLongs << 6;
    int cellsMid = maxCells / 2;
    long[] cells = new long[maxLongs];
    // Center the key bits in the cell array as the initial condition.
    int keyStart = (maxCells - maxKeyBits) / 2;
    for (int i = 0; i < key.length; i++) {
        int keyChar = key[i];
        int bitPos = 0x80;
        for (int j = 0; j < 8; j++) {
            long b = (keyChar & bitPos) >>> (7 - j);
            int bitIndex = keyStart + i * 8 + j;
            cells[bitIndex >>> 6] |= b << (63 - (bitIndex % 64));
            bitPos >>>= 1;
        }
    }
    int bitCount = 0;
    int mid = 0;
    int longMid = maxLongs / 2;
    int longMidShift = longMid * 2 == maxLongs ? 63 : 31;
    int maxRow = skipRows + maxHashBits * 2;
    for (int row = 0; row < maxRow; row++) {
        // Only compute the part of the row the light cone can have reached.
        int doubleRow = row * 2;
        int calcWidth = doubleRow;
        if (calcWidth > maxRow - 2) {
            calcWidth = maxRow - (doubleRow % maxRow) + 2;
        } else {
            calcWidth += maxKeyBits;
        }
        int halfWidth = calcWidth / 2 + 2;
        int start = (cellsMid - halfWidth) >>> 6;
        int end = (cellsMid + halfWidth + 63) >>> 6;
        mid = (int) ((cells[longMid] >>> longMidShift) & 0x01);
        long carryLeft = 0L;
        for (int i = start; i < end; i++) {
            long l = cells[i];
            long carryRight = i < maxLongs - 1 ?
                    cells[i + 1] >>> 63 : 0;
            long cellRight = (l << 1) | carryRight;
            long cellLeft = (l >>> 1) | carryLeft;
            carryLeft = l << 63;
            // Rule 30: new cell = left XOR (center OR right), 64 cells at a time.
            cells[i] = cellLeft ^ (l | cellRight);
        }
        if (row < skipRows) {
            continue;
        }
        // Sample every other row of the center column.
        if (row % 2 == 1) {
            if (mid == 1) {
                int bufPos = bitCount >>> 3;
                hash[bufPos] |= 1 << (7 - (bitCount % 8));
            }
            bitCount++;
        }
    }
    return hash;
}


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 29, 2014, 03:44:21 PM
I noticed that the rightmost bit in the initial condition needs to be 1. If all the bits on the right side of the initial condition are zero, the hash values become nonrandom, with collisions for similar messages. The whole left side of the Rule 30 cellular automaton seems to be nonrandom.


Title: Re: Rule 30 automaton as hash function
Post by: dentldir on July 29, 2014, 04:49:54 PM
I noticed that the rightmost bit in the initial condition needs to be 1. If all the bits on the right side of the initial condition are zero, the hash values become nonrandom, with collisions for similar messages. The whole left side of the Rule 30 cellular automaton seems to be nonrandom.

This observation is a key part of the paper I linked on the first page.  You can use that non-randomness to reverse the "right adjacent" states effectively (i.e. taking 2^17 possibilities down to 32 or so).  After messing with it for some time, you can see the importance of the structure left in the CA diagrams as they iterate.  You ideally don't want any recognizable continuities.

One of the papers you linked (with the omega-flip network) overcomes this by alternating with a different rule and performing a permutation between subsequent runs.  That paper also links to the correct testing framework you'd want to use if you intend to go further (NIST's STS, the strict avalanche criterion, etc.).

I'm a huge fan of CAs and Wolfram's work from the 80's but I was disappointed at the non-peer reviewed approach he took with NKS.  This is one area where his intuition is superseded by decades of rigorous work by cryptography experts.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 29, 2014, 05:17:54 PM
I noticed that the rightmost bit in the initial condition needs to be 1. If all the bits on the right side of the initial condition are zero, the hash values become nonrandom, with collisions for similar messages. The whole left side of the Rule 30 cellular automaton seems to be nonrandom.

This observation is a key part of the paper I linked on the first page.  You can use that non-randomness to reverse the "right adjacent" states effectively (i.e. taking 2^17 possibilities down to 32 or so).  After messing with it for some time, you can see the importance of the structure left in the CA diagrams as they iterate.  You ideally don't want any recognizable continuities.

One of the papers you linked (with the omega-flip network) overcomes this by alternating with a different rule and performing a permutation between subsequent runs.  That paper also links to the correct testing framework you'd want to use if you intend to go further (NIST's STS, the strict avalanche criterion, etc.).

I'm a huge fan of CAs and Wolfram's work from the 80's but I was disappointed at the non-peer reviewed approach he took with NKS.  This is one area where his intuition is superseded by decades of rigorous work by cryptography experts.


This image shows how the left side is easy to determine when the bits are sampled too tightly:

http://s27.postimg.org/j1a9xk3yb/rule_30.png

Notice that when every other row of the center column is sampled, the left side becomes difficult to determine. That's the sample method I use in my hash function.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on July 30, 2014, 05:50:57 PM
I created a GitHub project for R30: https://github.com/AndersLindman/R30-hash-function


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on August 05, 2014, 11:29:51 AM
Interesting thread, I love cellular automata (well, most bio-inspired algorithms).
Sorry if I missed something because I haven't fully read it yet.

First, SHA256 + R30 (on the resulting 256 bits) should remove some of Peter R's performance concerns, but not all.

Here is a version of the Rule 30 hash function that skips rows 2.5 times the number of bits in the initial condition: http://jsfiddle.net/d3NS6/

The reason is to ensure that the bits influence each other enough, as a protection against cryptanalysis. The image shows that skipping 2.5 times the message length in bits is enough for the bits at the far left and right to influence the center column.

http://s28.postimg.org/rajasb799/rule_30_skip_rows.png

The next image shows again how protection against cryptanalysis is achieved:

http://s27.postimg.org/j1a9xk3yb/rule_30.png

Why don't you just calculate the 256-bit column in the middle?
Cell 0 can be the cell to the right of cell 255 (and therefore cell 255 is the one to the left of cell 0).
I don't think this would remove randomness and if it does it can probably be compensated with more loops.
But having a constant-size cell array would make it much simpler to optimize.
For example, using SSE2 you can probably implement this with shift and bitwise logic operations alone, using all the 256 bits of the XMM registers with each instruction!

So yeah, maybe SHA256 + R30 could become a hardfork alternative to SHA256d in case it somehow gets broken.
Or even an anti-centralization hardfork, in the event that a single entity is stupid enough to consistently control 30%+ of the mining hardware or something like that.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 05, 2014, 03:43:07 PM
Interesting thread, I love cellular automata (well, most bio-inspired algorithms).
Sorry if I missed something because I haven't fully read it yet.

First, SHA256 + R30 (on the resulting 256 bits) should remove some of Peter R's performance concerns, but not all.

Why don't you just calculate the 256-bit column in the middle?
Cell 0 can be the cell to the right of cell 255 (and therefore cell 255 is the one to the left of cell 0).
I don't think this would remove randomness and if it does it can probably be compensated with more loops.
But having a constant-size cell array would make it much simpler to optimize.
For example, using SSE2 you can probably implement this with shift and bitwise logic operations alone, using all the 256 bits of the XMM registers with each instruction!

So yeah, maybe SHA256 + R30 could become a hardfork alternative to SHA256d in case it somehow gets broken.
Or even an anti-centralization hardfork, in the event that a single entity is stupid enough to consistently control 30%+ of the mining hardware or something like that.


I was wrong about the 2.5N value and grau measured that at least 7N is needed. See for example: https://bitcointalk.org/index.php?topic=698460.msg8047993#msg8047993

The values in the center column depend on all the values in the initial condition after 7N generations where N is the length of the initial condition in number of bits. It turns out that the rightmost value (bit) of a worst case initial condition takes a nonlinear walk first to the right and then to the left in the cellular automaton. And after 7N steps (generations) of the cellular automaton the rightmost value has reached the center column.

Without the 7N initial steps the hash values would be far from random for similar messages.
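That 7N behaviour can be probed empirically. The sketch below (all names hypothetical; it uses a wide zero-padded array rather than grau's exact setup) evolves two copies of an N-bit initial condition that differ only in the rightmost bit and reports the first generation at which the center cell differs:

```java
public class R30Influence {
    // One Rule 30 step over a fixed-width boolean array; cells beyond the edges are 0.
    static boolean[] step(boolean[] c) {
        boolean[] n = new boolean[c.length];
        for (int i = 0; i < c.length; i++) {
            boolean left = i > 0 && c[i - 1];
            boolean right = i < c.length - 1 && c[i + 1];
            n[i] = left ^ (c[i] | right);
        }
        return n;
    }

    // First generation at which flipping the rightmost bit of an nBits-wide
    // initial condition (centered in the array) changes the center cell.
    static int firstDivergence(int nBits, int width, int maxGen) {
        boolean[] a = new boolean[width], b = new boolean[width];
        int start = (width - nBits) / 2;
        a[start] = b[start] = true;      // an arbitrary shared initial bit
        b[start + nBits - 1] = true;     // b differs only in the rightmost bit
        int mid = width / 2;
        for (int gen = 1; gen <= maxGen; gen++) {
            a = step(a);
            b = step(b);
            if (a[mid] != b[mid]) return gen;
        }
        return -1; // no divergence seen within maxGen generations
    }

    public static void main(String[] args) {
        System.out.println(firstDivergence(32, 4096, 10000));
    }
}
```

This measures only how fast a single-bit difference propagates to the center column, not the full mixing grau measured, so the numbers are not directly comparable to 7N.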


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on August 05, 2014, 05:51:29 PM
Reading your answer and more of the thread, I'm not sure what you're doing anymore. I thought the initial state was the data...

My recipe is the following:

Apply SHA256 once, you get 256 bits.
Those 256 bits obtained are the initial state (the first row).
You apply R30 N times (N generations) using a constant row size of 256 cells per generation, treating cell 0 and cell 255 as neighbors (so no pyramid here, just a 256 cells-wide column all the time).
With 128 generations all cells should have influenced all other cells.
So this is constant space and can be optimized to constant time per generation, thus constant time per N generations.
If we assume that at least N = S/2 generations are required, where S is the size of the cell array, then it is O(S^2) time and O(S) space, but here S = 256 is constant.

Can any of you prove that my approach doesn't produce enough randomness?
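One wrapped generation of this 256-cell recipe can be sketched with four 64-bit words (a minimal illustration; the class and method names are hypothetical, with bit 63 of word 0 as cell 0 and cell 255 adjacent to cell 0):

```java
public class CircularR30 {
    // One Rule 30 generation on a 256-cell ring stored as four 64-bit words.
    static long[] step(long[] c) {
        long[] n = new long[4];
        for (int i = 0; i < 4; i++) {
            // Left/right neighbor values, wrapping across word and ring boundaries.
            long left = (c[i] >>> 1) | (c[(i + 3) % 4] << 63);
            long right = (c[i] << 1) | (c[(i + 1) % 4] >>> 63);
            n[i] = left ^ (c[i] | right); // Rule 30
        }
        return n;
    }

    public static void main(String[] args) {
        long[] state = { 1L << 63, 0L, 0L, 0L }; // a single 1 at cell 0
        for (int gen = 0; gen < 128; gen++) {
            state = step(state);
        }
        for (long word : state) {
            System.out.printf("%016x%n", word);
        }
    }
}
```

The wrap-around shows up after a single step: a 1 at cell 0 turns on cells 255, 0 and 1.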

You see, Wolfram always starts with a single black cell as the initial condition and allows infinite space for simplicity, but that's not a requirement for cellular automata.
It doesn't even need to be a 2D universe (1D space + time): you could perfectly well initialize a 16x16 binary toroid (a 2D space where opposite edges are neighbors) with your 256 bits and use a 3D rule with far fewer generations. But first you have to find out which rule is equivalent in randomness to Rule 30, and here instead of 2^(2^3) = 256 possible automata there are 2^(2^9) to analyze!


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 05, 2014, 06:21:15 PM
Reading your answer and more of the thread, I'm not sure what you're doing anymore. I thought the initial state was the data...

My recipe is the following:

Apply SHA256 once, you get 256 bits.
Those 256 bits obtained are the initial state (the first row).
You apply R30 N times (N generations) using a constant row size of 256 cells per generation, treating cell 0 and cell 255 as neighbors (so no pyramid here, just a 256 cells-wide column all the time).
With 128 generations all cells should have influenced all other cells.
So this is constant space and can be optimized to constant time per generation, thus constant time per N generations.
If we assume that at least N = S/2 generations are required, where S is the size of the cell array, then it is O(S^2) time and O(S) space, but here S = 256 is constant.

Can any of you prove that my approach doesn't produce enough randomness?

You see, Wolfram always starts with a single black cell as the initial condition and allows infinite space for simplicity, but that's not a requirement for cellular automata.
It doesn't even need to be a 2D universe (1D space + time): you could perfectly well initialize a 16x16 binary toroid (a 2D space where opposite edges are neighbors) with your 256 bits and use a 3D rule with far fewer generations. But first you have to find out which rule is equivalent in randomness to Rule 30, and here instead of 2^(2^3) = 256 possible automata there are 2^(2^9) to analyze!


Sure, you could start with a SHA-256 value as initial condition instead of the full plain text message. I even proposed that somewhere earlier in this thread. Then N would be 256 bits. With 128 generations (0.5N) you would get something like this: https://bitcointalk.org/index.php?topic=698460.msg8045418#msg8045418

Notice how similar the hash values are.

Compare 0.5 N: http://jsfiddle.net/UQ6B7/

With 7N: http://jsfiddle.net/596XN/

I have experimented with 2D cellular automata and they seem to be good at producing random results. The reason for why I chose Rule 30 is that Stephen Wolfram has done extensive research on that and showed that it produces good random numbers and that it's likely that cryptanalysis will be difficult when every other row in the center column is sampled. 2D automata may be at least as good, but that would require a lot of work to find out. It's easier to build on Wolfram's research.


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on August 05, 2014, 06:34:33 PM
Sure, you could start with a SHA-256 value as initial condition instead of the full plain text message. I even proposed that somewhere earlier in this thread.

Yes, you answered Peter R when he pointed out that your message-to-256-bits mechanism was too expensive.
 
Then N would be 256 bits. With 128 generations (0.5N) you would get something like this: https://bitcointalk.org/index.php?topic=698460.msg8045418#msg8045418

Notice how similar the hash values are.

Compare 0.5 N: http://jsfiddle.net/UQ6B7/

With 7N: http://jsfiddle.net/596XN/

I have experimented with 2D cellular automata and they seem to be good at producing random results. The reason for why I chose Rule 30 is that Stephen Wolfram has done extensive research on that and showed that it produces good random numbers and that it's likely that cryptanalysis will be difficult when every other row in the center column is sampled. 2D automata may be at least as good, but that would require a lot of work to find out. It's easier to build on Wolfram's research.

But that's with your infinite space approach, I think you're missing my main point:

Those 256 bits obtained are the initial state (the first row).
You apply R30 N times (N generations) using a constant row size of 256 cells per generation, treating cell 0 and cell 255 as neighbors (so no pyramid here, just a 256 cells-wide column all the time).
With 128 generations all cells should have influenced all other cells.

Maybe 0.5N is still not enough, but my main point is that you don't need to calculate the state of more than 256 cells per generation.
My prediction is that (for the same number of generations) with fixed size it will not only be faster to calculate, but you'll even get more randomness than with your infinite space approach.




Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 05, 2014, 07:00:00 PM

Maybe 0.5N is still not enough, but my main point is that you don't need to calculate the state of more than 256 cells per generation.
My prediction is that (for the same number of generations) with fixed size it will not only be faster to calculate, but you'll even get more randomness than with your infinite space approach.


You mean that the cellular automaton should be truncated to a width of 256 cells? I think much of the randomness will be lost then. My guess is that it would produce a lot of hash collisions.


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on August 05, 2014, 07:06:33 PM
You mean that the cellular automaton should be truncated to a width of 256 cells?

Yes, but it is important that 0 and 255 are neighbors, instead of 0 always having white on its left and 255 on its right.

I think much of the randomness will be lost then. My guess is that it would produce a lot of hash collisions.

What is clear is that it would be much more efficient and easier to optimize.
There's only one way to find out who's right about randomness though...
As I said, I think it will be as good as the infinite-space one or even better. But, please, prove me wrong.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 05, 2014, 07:13:29 PM
You mean that the cellular automaton should be truncated to a width of 256 cells?

Yes, but it is important that 0 and 255 are neighbors, instead of 0 always having white on its left and 255 on its right.

I think much of the randomness will be lost then. My guess is that it would produce a lot of hash collisions.

What is clear is that it would be much more efficient and easier to optimize.
There's only one way to find out who's right about randomness though...
As I said, I think it will be as good as the infinite-space one or even better. But, please, prove me wrong.


Ok, the automaton is wrapped around for width N cells. It might work. I read somewhere that the random generator in Mathematica uses Rule 30 with a kind of fixed width size like that. I'm worried however that cryptanalysis would become easier with a truncated cellular automaton. The image I posted from the book A New Kind of Science is about the full cellular automaton, not a truncated one I assume.


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on August 05, 2014, 09:58:14 PM
Ok, the automaton is wrapped around for width N cells. It might work. I read somewhere that the random generator in Mathematica uses Rule 30 with a kind of fixed width size like that.

Yes, I think I remember the book saying that some of the rules appeared to be random but weren't under certain initial conditions, or under setups different from the "canonical" one, but that some (or one?) were resistant, and I assume that's R30.
It's been almost four years since I read the book, so I could be wrong.

I'm worried however that cryptanalysis would become easier with a truncated cellular automaton. The image I posted from the book A New Kind of Science is about the full cellular automaton, not a truncated one I assume.

Yeah, he always uses the "canonical" setup except for showing exceptions, or just other ways in which you can organize the different algorithms.
But I don't understand your concern. There are many other models in the book besides cellular automata, and his point is that all of them have some equivalent of R30.
I thought you, like Wolfram, had chosen cellular automata precisely because they are simple to understand and analyze.

My hope was that maybe this could serve to have a "provably secure hashing function", instead of just "nobody has broken it".
Maybe to prove that "provably secure hashing functions" (PSHF ?) are an impossibility, who knows.

IMO the most interesting lesson about the book is that unpredictable behavior (in the sense that it can only be predicted by simulating the whole process) can emerge from apparently simple deterministic rules.
Have you played go (http://en.wikipedia.org/wiki/Go_(game))? Very simple rules, but no machine can beat professional players, who rely on a strong intuition grown through experience.
I believe these two facts are profoundly related.

That's why I've been planning for some time to evolve a neural network to play go and become 9 dan by playing against versions of itself while(success || eternity).
But I got an 8-to-6 job and learned about Bitcoin, so it ended up playing against the machine I wrote in my first year of computer science to play reversi/othello, showing graphs of how much it kicked its ass with generation on the Y coordinate (the "dumb" machine kicked my own ass without any problem or second thought).

Sorry for the long story; that's also why I love your cellular automata hashing function (CAHF?) idea.
I hope my little contribution is useful, and I wish I had time to help with the code, but it doesn't look like you'll have problems making 0 and 255 neighbors and trying another version.

To finish, I also like the fact that you're interested in security for your custom hash function and not "anti-ASIC" nonsense.
 


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 05, 2014, 10:42:56 PM
Ok, the automaton is wrapped around for width N cells. It might work. I read somewhere that the random generator in Mathematica uses Rule 30 with a kind of fixed width size like that.

Yes, I think I remember the book saying that some of the rules appeared to be random but weren't under certain initial conditions, or under setups different from the "canonical" one, but that some (or one?) were resistant, and I assume that's R30.
It's been almost four years since I read the book, so I could be wrong.

I'm worried however that cryptanalysis would become easier with a truncated cellular automaton. The image I posted from the book A New Kind of Science is about the full cellular automaton, not a truncated one I assume.

Yeah, he always uses the "canonical" setup except for showing exceptions, or just other ways in which you can organize the different algorithms.
But I don't understand your concern. There are many other models in the book besides cellular automata, and his point is that all of them have some equivalent of R30.
I thought you, like Wolfram, had chosen cellular automata precisely because they are simple to understand and analyze.

My hope was that maybe this could serve to have a "provably secure hashing function", instead of just "nobody has broken it".
Maybe to prove that "provably secure hashing functions" (PSHF ?) are an impossibility, who knows.

IMO the most interesting lesson about the book is that unpredictable behavior (in the sense that it can only be predicted by simulating the whole process) can emerge from apparently simple deterministic rules.
Have you played go (http://en.wikipedia.org/wiki/Go_(game))? Very simple rules, but no machine can beat professional players, who rely on a strong intuition grown through experience.
I believe these two facts are profoundly related.

That's why I've been planning for some time to evolve a neural network to play go and become 9 dan by playing against versions of itself while(success || eternity).
But I got an 8-to-6 job and learned about Bitcoin, so it ended up playing against the machine I wrote in my first year of computer science to play reversi/othello, showing graphs of how much it kicked its ass with generation on the Y coordinate (the "dumb" machine kicked my own ass without any problem or second thought).

Sorry for the long story; that's also why I love your cellular automata hashing function (CAHF?) idea.
I hope my little contribution is useful, and I wish I had time to help with the code, but it doesn't look like you'll have problems making 0 and 255 neighbors and trying another version.

To finish, I also like the fact that you're interested in security for your custom hash function and not "anti-ASIC" nonsense.
 


Cellular automata may actually be difficult to analyze mathematically. Maybe like the game go you mentioned. And proving randomness is really tricky. My guess is that the R30 hash function acts well as a random oracle.

"In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated it responds the same way every time that query is submitted." -- http://en.wikipedia.org/wiki/Random_oracle


Title: Re: Rule 30 automaton as hash function
Post by: gmaxwell on August 06, 2014, 09:27:50 AM
Cellular automata may actually be difficult to analyze mathematically. Maybe like the game go you mentioned. And proving randomness is really tricky. My guess is that the R30 hash function acts well as a random oracle.
Except you already proved it bad by conventional cryptographic standards in https://bitcointalk.org/index.php?topic=698460.msg7927510#msg7927510 where you find a collision. A good hash-function is not efficiently distinguishable from a random oracle (other than in the trivial sense where it has a compact implementation).

I've struggled a bit watching along, worrying that this thread is a train-wreck of everything wrong with amateur cryptography: there are many reasonably well studied and designed functions in this space already (hash functions, and hashcash as proof of work) that benefit from decades of knowledge and experience in building constructs which are actually irreducible for the purposes they're designed for... This is precisely the kind of effort people are counseled against, and for good reason.

Being difficult to analyze mathematically is pretty much the polar opposite what we want in a cryptographic building block— there is just an extreme danger of a complete break lurking right behind an intellectual speed-bump.  It's often said that anyone can come up with a scheme that they can't crack themselves, but it's perhaps more interesting to note that almost anyone can come up with a scheme with a trivial weakness but which is hard enough to analyze that no one will find it until someone's money (or life!) is on the line.

OTOH, I'm happy people have been having fun; and most of the other POW navel gazing I've seen is even more worthless... I've drafted and canned a couple messages for this thread, struggling with the difficulty in conveying a kind of underlying understanding of what responsible work looks like for cryptographic primitives— telling you to time travel back to 1994 and read sci.crypt with me is not likely to be helpful....

—  but when you get to the point of making claims that the function might actually be useful in cryptographic applications, I feel the need to speak up before another one of these incidents (http://sourceforge.net/p/bitcoin/mailman/bitcoin-development/thread/5325B5BC.3030501@gmx.de/) (which was caused by this thread (https://bitcointalk.org/index.php?topic=196378.0), which went without adequate criticism) happens (read all the messages, not just my first response).

Have fun, by all means— but cryptography is a subtle and difficult science and art. Building a good system is an engineering discipline and having any confidence of security requires formalizing your work and putting in effort which is orders of magnitude more difficult than the basic implementation tinkering. If you don't find that kind of hard core analysis interesting yourself, then sadly it's likely never going to be done for your function. I'd say you were on the right path when you attacked and found a severe cryptographic flaw in your approach, but then something went wrong when you discarded it and continued like it never happened...  If BCT were a forum that I expected competent cryptographic review in, I might also say that something has gone wrong that it's taken this long to get a less than supportive message in this thread.

You were on to something with your attack— why not dig into it more and instead of adding complexity (which might just be obfuscating weaknesses instead of removing them), just assume this is a fun learning experience and see how many other ways you can break the original construct or what other kinds of seemingly-okay-but-really-broken functions you can build in this space?  I think more than anything else doing cryptanalysis and finding holes in functions has increased my appreciation for the enormous challenge that doing good work in this space involves. (I suppose I could say that in one sense there is no science of cryptography except cryptanalysis.)

But this is just my curmudgeonly view, offered for your consideration.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 06, 2014, 11:27:38 AM
Cellular automata may actually be difficult to analyze mathematically. Maybe like the game go you mentioned. And proving randomness is really tricky. My guess is that the R30 hash function acts well as a random oracle.
Except you already proved it bad by conventional cryptographic standards in https://bitcointalk.org/index.php?topic=698460.msg7927510#msg7927510 where you find a collision. A good hash-function is not efficiently distinguishable from a random oracle (other than in the trivial sense where it has a compact implementation).

No! That's an old pre-hash version that absolutely sucks. It was before I had learned proper ways to do it. Here is the current version that uses a Merkle–Damgård construction and the Davies–Meyer compression function that, if I have done the implementation correctly, should be the real deal: http://jsfiddle.net/596XN/

Quote
Being difficult to analyze mathematically is pretty much the polar opposite what we want in a cryptographic building block— there is just an extreme danger of a complete break lurking right behind an intellectual speed-bump.  It's often said that anyone can come up with a scheme that they can't crack themselves, but it's perhaps more interesting to note that almost anyone can come up with a scheme with a trivial weakness but which is hard enough to analyze that no one will find it until someone's money (or life!) is on the line.

Being easy to analyze mathematically is no guarantee that it will be secure. Take SHA-2, for example, which seems to be based on pretty straightforward plain math, yet someone may discover a technique to break it. And Stephen Wolfram has pointed out that the math used in science today is only a tiny subset of the universe of possible constructs, a result more of tradition and legacy than of deliberate design.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 06, 2014, 04:23:39 PM
The reason for using a Merkle–Damgård construction is that the version without it becomes very slow for long messages. As a Davies–Meyer compression function I simply take B xor Hi-1 and then feed that into the Rule 30 hash function and then xor that result with Hi-1, where B is a 256-bit block of the message and H is the 256-bit hash value.

Notice the trick with xor of Hi-1 after the same Hi-1 has been encrypted with a message block (mi) as key:

http://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Davies-Meyer_hash.svg/230px-Davies-Meyer_hash.svg.png

Source: http://en.wikipedia.org/wiki/One-way_compression_function#Davies.E2.80.93Meyer

This may make the hash function a bit weaker but hopefully only insignificantly weaker.

For Bitcoin proof of work, the Merkle–Damgård construction can be removed. It's only useful for long messages, except perhaps that the 256-bit limit makes it easier to prove that all cells are influencing each other to produce a random value. That could be difficult to prove for infinitely long (or arbitrarily long) messages.
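The chaining step described above, Hi = R30(Bi xor Hi-1) xor Hi-1, can be sketched as follows (hypothetical names; the R30 block digest is passed in as a function, and a trivial identity stand-in is used for the demonstration, so this illustrates only the XOR wiring, not the real hash). Note that textbook Davies–Meyer keys the cipher with the message block, E(mi, Hi-1) xor Hi-1, so this is a slight variant:

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class R30Chain {
    // Hi = E(Bi XOR Hi-1) XOR Hi-1, with the R30 block digest as E.
    static byte[] compress(UnaryOperator<byte[]> e, byte[] prev, byte[] block) {
        byte[] in = new byte[32];
        for (int i = 0; i < 32; i++) {
            in[i] = (byte) (block[i] ^ prev[i]);
        }
        byte[] out = e.apply(in);
        byte[] next = new byte[32];
        for (int i = 0; i < 32; i++) {
            next[i] = (byte) (out[i] ^ prev[i]);
        }
        return next;
    }

    public static void main(String[] args) {
        // With an identity stand-in for E, the two XORs of Hi-1 cancel out.
        UnaryOperator<byte[]> identity = b -> Arrays.copyOf(b, 32);
        byte[] h = compress(identity, new byte[32], new byte[32]);
        System.out.println(Arrays.equals(h, new byte[32]));
    }
}
```

With the identity stand-in the construction collapses to the message block itself, which also shows why E must be a strong one-way function for the chaining to add anything.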


Title: Re: Rule 30 automaton as hash function
Post by: 21M Bitcoin on August 06, 2014, 05:45:42 PM
Sorry for the off-topic question, but I want to ask how to become a crypto master like all of you. What should I learn first? I have the strong mindset needed to learn.


Title: Re: Rule 30 automaton as hash function
Post by: rarkenin on August 09, 2014, 06:18:46 PM
Sorry for the off-topic question, but I want to ask how to become a crypto master like all of you. What should I learn first? I have the strong mindset needed to learn.

You know it's off-topic, so why mention it here?


Title: Re: Rule 30 automaton as hash function
Post by: grau on August 11, 2014, 10:23:19 PM
@gmaxwell: Thanks for your view, which I was looking forward to in this thread.

Yes, it's amateur cryptography here, but knowingly so.

I respectfully question whether your assumption below is the path to the ideal PoW.

Being difficult to analyze mathematically is pretty much the polar opposite of what we want in a cryptographic building block -- there is just an extreme danger of a complete break lurking right behind an intellectual speed-bump.

What I like most in Wolfram's book is his systematic exploration of functions of utmost simplicity. Some of them are irreducible by our standard means of computation. They might be poor primitives for other cryptographic uses, but ideal for PoW.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 13, 2014, 03:13:09 PM
I learned that Bitcoin has a big problem with what is called variance. Transaction times are on average 10 minutes, yet they can vary from a few minutes to an hour. That indicates that the search space for the Bitcoin miners is non-randomly distributed. Perhaps Rule 30 could give much less variance so that the transaction times become more even.


Title: Re: Rule 30 automaton as hash function
Post by: DeathAndTaxes on August 13, 2014, 03:19:28 PM
I learned that Bitcoin has a big problem with what is called variance. Transaction times are on average 10 minutes, yet they can vary from a few minutes to an hour. That indicates that the search space for the Bitcoin miners is non-randomly distributed.

That is a false conclusion.  Flip four coins at once and check if they are all heads.  What is the probability of that happening?  1:16 (2^4) right?  Will you get a set of all heads exactly every sixteen attempts?  Does that mean the outcome of a fair coin flip is not randomly distributed?  Of course not.

The expected time between blocks will be 10 minutes (if the current hashrate exactly matches the estimated hashrate), but the actual times between blocks will follow an exponential distribution, since block discovery is a Poisson process.  This is true even if there is a perfectly random distribution of hash outputs.  Block times of 1/5th to 5x the expected value are not rare in such a process.  Mathematically prove that Bitcoin block times don't follow this distribution to some level of confidence and you might be able to say that the hash outputs are not randomly distributed.  All the evidence to date suggests they are.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 13, 2014, 04:09:18 PM
I learned that Bitcoin has a big problem with what is called variance. Transaction times are on average 10 minutes, yet they can vary from a few minutes to an hour. That indicates that the search space for the Bitcoin miners is non-randomly distributed.

That is a false conclusion.  Flip four coins at once and check if they are all heads.  What is the probability of that happening?  1:16 (2^4) right?  Will you get a set of all heads exactly every sixteen attempts?  Does that mean the outcome of a fair coin flip is not randomly distributed?  Of course not.

The expected time between blocks will be 10 minutes (if the current hashrate exactly matches the estimated hashrate), but the actual times between blocks will follow an exponential distribution, since block discovery is a Poisson process.  This is true even if there is a perfectly random distribution of hash outputs.  Block times of 1/5th to 5x the expected value are not rare in such a process.  Mathematically prove that Bitcoin block times don't follow this distribution to some level of confidence and you might be able to say that the hash outputs are not randomly distributed.  All the evidence to date suggests they are.

But with four coins, aren't on average 16 attempts needed to get four heads? Oh, wait, I see now. There is no upper limit on how many attempts may be needed. Scary! But of course in practice the probability of very long intervals is very small.

Yes, it seems to be a classic Poisson process:

"In probability theory and statistics, the Poisson distribution (French pronunciation [pwasɔ̃]; in English usually /ˈpwɑːsɒn/), named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[1]" -- http://en.wikipedia.org/wiki/Poisson_distribution
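The coin-flip analogy above is easy to simulate: each attempt succeeds with probability 1/16, like a miner whose every hash independently meets the target. The mean waiting time converges to 1/p = 16 attempts, while individual waits vary wildly and have no upper bound. (A quick sketch, not part of any Bitcoin code.)

```python
import random

# Each attempt succeeds with p = 1/16, mirroring a miner whose
# every hash independently has a fixed chance of meeting the target.
random.seed(42)
p = 1.0 / 16

def attempts_until_success() -> int:
    # Count attempts until the first success (geometric distribution).
    n = 1
    while random.random() >= p:
        n += 1
    return n

waits = [attempts_until_success() for _ in range(100_000)]
mean_wait = sum(waits) / len(waits)
# The mean converges to 1/p = 16, but the longest waits are far larger.
print(round(mean_wait, 1), max(waits))
```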


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 13, 2014, 04:16:12 PM
For fun I did a boolean algebra calculation for Rule 30:

abc
000 - 0
001 - 1
010 - 1
011 - 1
100 - 1
101 - 0
110 - 0
111 - 0

a'b'c + a'bc' + a'bc + ab'c' = a'(b'c + bc' + bc) + ab'c' =
a'(b'c + b(c' + c)) + ab'c' = a'(b'c + b) + ab'c' =
a'(c + b) + ab'c' = a'(c + b) + a(c + b)' =
a ⊕ (b + c)

Or, with a = a(i-1), b = a(i), c = a(i+1):

a(i-1) XOR (a(i) OR a(i+1))

The same as in: Cryptography with Cellular Automata -- http://www.stephenwolfram.com/publications/academic/cryptography-cellular-automata.pdf
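The algebra above can be double-checked mechanically: the rule number 30 encodes the truth table (the output for neighborhood (a,b,c) is bit 4a+2b+c of 30, in Wolfram's convention), and every row should match a XOR (b OR c):

```python
# Verify that a XOR (b OR c) reproduces Rule 30's full truth table.

def rule30_table(a: int, b: int, c: int) -> int:
    index = (a << 2) | (b << 1) | c   # neighborhood as a number 0..7
    return (30 >> index) & 1          # Wolfram rule-number convention

def rule30_formula(a: int, b: int, c: int) -> int:
    return a ^ (b | c)

assert all(
    rule30_table(a, b, c) == rule30_formula(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
print("all 8 rows match")
```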


Title: Re: Rule 30 automaton as hash function
Post by: Peter R on August 13, 2014, 08:46:03 PM
I learned that Bitcoin has a big problem with what is called variance. Transaction times are on average 10 minutes, yet they can vary from a few minutes to an hour.

It's not a problem.  It's a requirement. 

The variance in the block times is a mathematical consequence of the fact that the hash of any blockheader/nonce combo is equally likely to meet the difficulty target.  This property is critical for decentralized mining--the probability that a miner solves a block is linearly proportional to the amount of work that miner did.  If Miner A has 10 times more hashpower than Miner B, he should solve exactly 10 times more blocks on average.  Any scheme that reduces block-time variance will not have this property, instead favouring big miners disproportionately to small miners.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 13, 2014, 09:05:35 PM
I learned that Bitcoin has a big problem with what is called variance. Transaction times are on average 10 minutes, yet they can vary from a few minutes to an hour.

It's not a problem.  It's a requirement. 

The variance in the block times is a mathematical consequence of the fact that the hash of any blockheader/nonce combo is equally likely to meet the difficulty target.  This property is critical for decentralized mining--the probability that a miner solves a block is linearly proportional to the amount of work that miner did.  If Miner A has 10 times more hashpower than Miner B, he should solve exactly 10 times more blocks on average.  Any scheme that reduces block-time variance will not have this property, instead favouring big miners disproportionately to small miners.

Not to go too much off topic, but after 10 minutes the difficulty could be reduced in real-time. Then the long transaction times would be reduced. I don't know if that would be possible to implement in practice though. :D


Title: Re: Rule 30 automaton as hash function
Post by: leotheminer on August 22, 2014, 02:11:18 PM
I've read this whole thread with interest and even dug out my copy of A New Kind of Science for it. Anders, have you learned anything else in the last few weeks that would make you think Rule 30 would be a good or not good PoW function?


Title: Re: Rule 30 automaton as hash function
Post by: Anders on August 24, 2014, 05:06:00 PM
I've read this whole thread with interest and even dug out my copy of A New Kind of Science for it. Anders, have you learned anything else in the last few weeks that would make you think Rule 30 would be a good or not good PoW function?

I was thinking about whether Rule 30 could make ASIC implementations difficult, but I guess ASICs could easily do the same thing as CPUs/GPUs. One idea is to make the algorithm require large amounts of memory, but perhaps even that could be done with ASICs.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on September 16, 2014, 05:50:46 AM
Oh, now I understand why people here have talked about using the Rule 30 cellular automaton for Bitcoin proof of work. I read this paper: ASICs and Decentralization FAQ -- https://download.wpsoftware.net/bitcoin/asic-faq.pdf

Here is my implementation of Rule 30 as a hash function, called R30: https://github.com/AndersLindman/R30-hash-function

R30 matches the desired requirements for a proof of work algorithm. Such as:

"In order that all actors can reach consensus, the proof-of-work must require enough time that previously proven transaction history can propagate and be verified before the history is extended." -- R30 can, just like SHA-256, be used with a target that can be adjusted to compensate for increased computing power.

"... the proof-of-work must be completely parallelizable and require no state. That is, one actor with 2N hashing power should have the same return as two actors each with N hashing power." -- R30 is completely parallelizable and has no state to preserve.

"... the computation of proof-of-work must be progress free, that is, the proof-of-work calculation at time T should not depend on any part of a calculation at time T' < T." -- Each hash value in R30 is calculated independently of the others.

"... poisson process ... the probability of a proof-of-work being calculated in a time interval [t0,t1] is proportional to (t1 - t0) but independent of t0 and t1." -- Assuming that R30 is a close approximation of a random oracle, block discovery will follow a Poisson process.

"The proof-of-work must be optimization free; that is, there should not be any algorithmic speedups which would give an advantage over the standard algorithm." -- R30 directly uses Rule 30 (http://mathworld.wolfram.com/Rule30.html) which is a minimal one-dimensional cellular automaton that is considered to be computationally irreducible (http://mathworld.wolfram.com/ComputationalIrreducibility.html).

"The proof-of-work must be approximation free; that is, there should not be a defective variant of the proof-of-work which achieves a speedup in excess of its defect rate." -- R30 directly calculates a certain number of generations of Rule 30 without any possibility for approximations.

"... ASIC’s are still inevitable; all ASIC resistance does is increase the startup capital required and therefore increase centralization of manufacturing." -- R30 is a simple algorithm with low memory requirements that is easy to implement in ASICs.

"... increasing the complexity of proof generation often means also increasing the complexity of proof validation, often disproportionately slow." -- Rule 30 is a one-dimensional binary cellular automaton, which makes for a very simple algorithm and fast proof validation.

"Memory hardness has the effect of increasing ASIC board footprint, weakening the heat-dissipation decentralization provided by the thermodynamic limit." -- R30 has low memory requirements: the cellular automaton is computed in a one-dimensional array which, together with buffers, totals less than a kilobyte.
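The overall scheme from this thread (message bits as the initial condition, a warm-up run, then every other center-column bit) can be sketched in a few lines. The width, warm-up count, bit-folding, and the non-zero-state tweak below are all illustrative choices, not the parameters of the actual R30 repository:

```python
# Minimal sketch of the Rule 30 hashing scheme described in this thread.
# WIDTH and WARMUP are illustrative parameters, not those of the R30 repo.

WIDTH = 256
WARMUP = 256

def step(cells: list) -> list:
    # One Rule 30 generation on a circular row: a XOR (b OR c).
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n])
            for i in range(n)]

def r30_sketch(message: bytes, nbits: int = 256) -> int:
    # Initial condition: message bits folded into WIDTH cells.
    cells = [0] * WIDTH
    for i, byte in enumerate(message):
        for j in range(8):
            cells[(i * 8 + j) % WIDTH] ^= (byte >> j) & 1
    cells[0] ^= 1  # illustrative tweak: non-zero state for empty input
    for _ in range(WARMUP):
        cells = step(cells)
    digest = 0
    for k in range(2 * nbits):     # 2n generations, keep every other one
        cells = step(cells)
        if k % 2 == 0:
            digest = (digest << 1) | cells[WIDTH // 2]
    return digest

h = r30_sketch(b"hello")
```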


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on September 16, 2014, 09:59:22 PM
I was thinking about whether Rule 30 could make ASIC implementations difficult, but I guess ASICs could easily do the same thing as CPUs/GPUs. One idea is to make the algorithm require large amounts of memory, but perhaps even that could be done with ASICs.

Of course, anything can be done in an ASIC. ASICs are just specialized hardware. Anyone talking about "anti-ASIC" is either a fraudster or doesn't know what he's talking about (cough, cough, intheoreum).
Many people still think that Artforz already had a GPGPU implementation when he launched "GPU resistant" scrypt and that he used it to mine litecoins while everybody else was basically burning money (yet litecoin was supposed to offer "fair distribution").

That's why I liked this proposal: you weren't talking about that kind of stupid thing (and, as said, I also love cellular automata).
If some properties could be proven mathematically for R30 that cannot be proven for sha256d, it would make it a great candidate for a potential PoW hardfork, in case mining companies are stupid enough to consistently control, say, more than 30% of the hashrate, forcing the community to deploy a hardfork that makes their huge investments worthless.
Otherwise I think some slight modification of sha256d or some other well-known and well-tested hash function would be preferred for that case.

But yeah, R30 seems to have the same desirable properties as sha256d; it's just that it hasn't been reviewed by many cryptanalysts, and it would be kind of scary to deploy it on Bitcoin.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on September 17, 2014, 04:25:05 AM
... it would make it a great candidate for a potential PoW hardfork, in case mining companies are stupid enough to consistently control, say, more than 30% of the hashrate, forcing the community to deploy a hardfork that makes their huge investments worthless.

The same problem will remain with R30. A hardfork like that would only buy some time. Soon the same situation can emerge for R30 as for SHA-256. I was thinking of R30 as a proof of work option in case a switch is needed. Just one candidate among many.

Or do you mean that SHA-256 has some particular property that makes it prone to centralization of mining power?


Title: Re: Rule 30 automaton as hash function
Post by: Anders on September 17, 2014, 05:46:20 AM
Can R30 be calculated fast with ASICs? Yes, very fast.

"A High Performance ASIC for Cellular Automata (CA) Applications

"CA are useful tools in modeling and simulation. However, the more complex a CA is, the longer it takes to run in typical environments. A dedicated CA machine solves this problem by allowing all cells to be computed in parallel. In this paper we present simple yet useful hardware that can perform the required computations in constant time complexity, O(1)." -- http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=4273215&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4273215

:D
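The all-cells-at-once parallelism that makes CA hardware fast can even be mimicked in software: pack a circular row of cells into one machine word and update every cell in a single bitwise expression. A sketch (the 64-cell width is an arbitrary choice for illustration):

```python
# Update all cells of a Rule 30 generation at once with word-wide
# bitwise operations, mimicking the parallel hardware described above.

WIDTH = 64
MASK = (1 << WIDTH) - 1

def rotl(x: int, n: int) -> int:
    # Rotate a WIDTH-bit word left by n positions.
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def rule30_step(state: int) -> int:
    # left XOR (center OR right), applied to every cell simultaneously;
    # bit i holds cell i, and the row wraps around (circular boundary).
    return rotl(state, 1) ^ (state | rotl(state, WIDTH - 1))

s = 1 << (WIDTH // 2)   # single live cell in the middle
for _ in range(10):
    s = rule30_step(s)
```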


Title: Re: Rule 30 automaton as hash function
Post by: jtimon on September 17, 2014, 07:02:10 PM
The same problem will remain with R30. A hardfork like that would only buy some time. Soon the same situation can emerge for R30 as for SHA-256.

Well, the hope is that after losing millions, or seeing another company lose millions, these people will realize that holding more than 30% of the hashing hardware is stupid and will stop doing it. I think they do it because they believe such a hardfork will never happen.

I was thinking of R30 as a proof of work option in case a switch is needed. Just one candidate among many.

Sure, agreed.

Or do you mean that SHA-256 has some particular property that makes it prone to centralization of mining power?

No, PoWs that are easy to make ASICs for, like sha256d and R30, are GOOD for preventing manufacturing centralization because the barrier to entry is lower. Memory-hard functions will have fewer ASIC manufacturers.
 


Title: Re: Rule 30 automaton as hash function
Post by: Anders on September 17, 2014, 07:37:09 PM
The same problem will remain with R30. A hardfork like that would only buy some time. Soon the same situation can emerge for R30 as for SHA-256.

Well, the hope is that after losing millions, or seeing another company lose millions, these people will realize that holding more than 30% of the hashing hardware is stupid and will stop doing it. I think they do it because they believe such a hardfork will never happen.

I was thinking of R30 as a proof of work option in case a switch is needed. Just one candidate among many.

Sure, agreed.

Or do you mean that SHA-256 has some particular property that makes it prone to centralization of mining power?

No, PoWs that are easy to make ASICs for, like sha256d and R30, are GOOD for preventing manufacturing centralization because the barrier to entry is lower. Memory-hard functions will have fewer ASIC manufacturers.
 

OK, the switch from SHA-256 to another PoW algorithm would teach the big mining pools a lesson and deter future mining operations from becoming too dominant. Could be, although it's no guarantee. And yes, good point that ASICs with lots of memory would lead to fewer manufacturers.


Title: Re: Rule 30 automaton as hash function
Post by: vinda on September 18, 2014, 08:19:50 AM
"Rule 30 is a one-dimensional binary cellular automaton rule introduced by Stephen Wolfram in 1983.[2] Using Wolfram's classification scheme, Rule 30 is a Class III rule, displaying aperiodic, chaotic behaviour." -- http://en.wikipedia.org/wiki/Rule_30

I read somewhere that when taking every other center bit of the Rule 30 cellular automaton, it's very difficult to determine from what initial condition the resulting bit string was generated. That can be used as a cryptographic hash function.

Indeed I found that this has already been done, such as:

A New Cryptographic Hash Function Based on Cellular Automata Rules 30, 134 and Omega-Flip Network -- http://www.ipcsit.com/vol27/32-ICICN2012-N10006.pdf

A new cryptographic hash function based on the Cellular Automaton Rule 30 -- http://www.zimuel.it/talks/Nks2006.pdf

I haven't read much about it yet but my idea is to start with the bits of the message as the initial condition for the Rule 30 cellular automaton. And then run the automaton a fixed number of steps so that the cells in the automaton have highly random values. Then after that every second bit is taken from the middle column of the automaton with 2n iterations, where n is the number of bits of the hash (digest) value.

Interesting idea; looking forward to its completion.


Title: Re: Rule 30 automaton as hash function
Post by: Anders on September 18, 2014, 03:04:56 PM
Will PoW with R30 consume a lot of electricity? Yes, about the same as SHA-256, I'm afraid. But it may actually be a justified use of energy. Even if a proof-of-stake (PoS) system could be safe enough and consume much less energy, the idea of PoS makes me think of concentration of power in the hands of a small number of people. That seems like a step backwards in human history to me. We should aim at decentralization, not only in terms of technology but also in terms of preventing power from being concentrated in a small elite.