Author Topic: Rule 30 automaton as hash function  (Read 10575 times)
Anders (OP), Full Member
July 17, 2014, 01:11:13 PM  #1

"Rule 30 is a one-dimensional binary cellular automaton rule introduced by Stephen Wolfram in 1983.[2] Using Wolfram's classification scheme, Rule 30 is a Class III rule, displaying aperiodic, chaotic behaviour." -- http://en.wikipedia.org/wiki/Rule_30

I read somewhere that when you take every other center bit of the Rule 30 cellular automaton, it's very difficult to determine from which initial condition the resulting bit string was generated. That property can be used for a cryptographic hash function.

Indeed I found that this has already been done, such as:

A New Cryptographic Hash Function Based on Cellular Automata Rules 30, 134 and Omega-Flip Network -- http://www.ipcsit.com/vol27/32-ICICN2012-N10006.pdf

A new cryptographic hash function based on the Cellular Automaton Rule 30 -- http://www.zimuel.it/talks/Nks2006.pdf

I haven't read much about it yet, but my idea is to start with the bits of the message as the initial condition for the Rule 30 cellular automaton, and then run the automaton a fixed number of steps so that the cells have highly random values. After that, every second bit is taken from the middle column of the automaton over 2n further iterations, where n is the number of bits of the hash (digest) value.
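Here is a rough sketch in JavaScript of the construction I have in mind. The function names, the fixed-width wrap-around row, the warm-up step count and the parameters are just illustrative choices of mine (a real Rule 30 automaton grows instead of wrapping around), not anything from the papers above:

Code:
// Rule 30 "middle column" hash, minimal illustrative sketch (not vetted).
// The message bits form the initial row; warmupSteps lets the bits blend
// before any output is taken; n is the number of digest bits wanted.
function rule30Step(row) {
  var next = new Array(row.length);
  for (var i = 0; i < row.length; i++) {
    var left   = row[(i - 1 + row.length) % row.length]; // wrap around at the edges
    var center = row[i];
    var right  = row[(i + 1) % row.length];
    next[i] = left ^ (center | right);                   // Rule 30: left XOR (center OR right)
  }
  return next;
}

function rule30Hash(messageBits, n, warmupSteps) {
  var row = messageBits.slice();
  for (var s = 0; s < warmupSteps; s++) row = rule30Step(row);
  var mid = Math.floor(row.length / 2);
  var digest = [];
  for (var t = 0; t < 2 * n; t++) {                      // 2n iterations, keep every second center bit
    row = rule30Step(row);
    if (t % 2 === 1) digest.push(row[mid]);
  }
  return digest;
}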
Anders (OP), Full Member
July 17, 2014, 01:52:04 PM  #2

Here is the reason why every other cell in the center column should be used instead of each consecutive cell/row:

"So can such rules be used for cryptography? I strongly suspect that they can, and that in fact they allow one to construct systems that are at least as secure to cryptanalysis as any that are known. ... The picture below shows evidence for this. The cells marked by dots have colors that are taken as given, and then the colors of other cells are filled in according to the average that is obtained by starting from all possible initial conditions." -- Stephen Wolfram, A New Kind of Science, pp. 603, 605.



Even fewer cells could be used, since the image caption says that would make the cryptanalysis even harder.
Anders (OP), Full Member
July 17, 2014, 03:56:50 PM  #3

Here is a JavaScript implementation of my Rule 30 hash function: http://jsfiddle.net/d2g2R/
dentldir, Sr. Member
July 17, 2014, 04:37:46 PM  #4

A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Staffelbach

It doesn't cover every other center bit but does address the CA as a whole.

1DentLdiRMv3dpmpmqWsQev8BUaty9vN3v
Anders (OP), Full Member
July 17, 2014, 05:10:59 PM  #5

Quote from: dentldir on July 17, 2014, 04:37:46 PM
A relevant cryptanalysis can be found in:

Analysis of Pseudo Random Sequences Generated by Cellular Automata
Willi Meier, Othmar Staffelbach

It doesn't cover every other center bit but does address the CA as a whole.


It seems that using cellular automata for hash functions is a viable option according to that paper: http://download.springer.com/static/pdf/386/chp%253A10.1007%252F3-540-46416-6_17.pdf?auth66=1405789057_31fc1522b59f8110ad57d060b99b9ccc&ext=.pdf

I wonder how well my Rule 30 hash function performs. I have no clue how to test it. It would be good to have a general test that can be fed a bulk of sample hashes. I found this: http://hashtest.sourceforge.net but I can't determine whether the test is useful or not. I guess I need to learn more about cryptography.
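One simple property that can be checked without special tools is avalanche behaviour: flip a single input bit and count how many digest bits change (on average close to half of them is what you would want). Here is a rough sketch, reusing the illustrative rule30Hash function from my first post (again, names and parameters are just my own placeholders, and this is only a sanity check, not real cryptanalysis):

Code:
// Toy avalanche check. Assumes the illustrative
// rule30Hash(messageBits, n, warmupSteps) sketch from earlier.
function avalanche(messageBits, n, warmupSteps) {
  var base = rule30Hash(messageBits, n, warmupSteps);
  var total = 0;
  for (var i = 0; i < messageBits.length; i++) {
    var flipped = messageBits.slice();
    flipped[i] ^= 1;                                  // flip one input bit
    var digest = rule30Hash(flipped, n, warmupSteps);
    var diff = 0;
    for (var j = 0; j < n; j++) diff += base[j] ^ digest[j];
    total += diff;
  }
  return total / (messageBits.length * n);            // average fraction of digest bits that flipped
}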
Peter R, Legendary
July 17, 2014, 07:18:53 PM  #6

I expect Rule 30 would be cryptographically as strong as (or stronger than) SHA2.  (Side note: Rule 30 is used by the Mathematica function Random[].)

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions).  If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.  
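To put rough numbers on that (my own back-of-the-envelope figures, assuming influence spreads about one cell per neighbour per step):

Code:
// Back-of-the-envelope cost of hashing a 10 kbyte message with "raw" Rule 30.
// Assumes roughly one step per cell is needed before the edges can influence each other.
var messageBytes = 10000;               // ~10 kbyte message
var cells = messageBytes * 8;           // 80,000 cells in the initial row
var steps = cells;                      // ~80,000 iterations for full left/right mixing
var cellUpdates = cells * steps;        // 6.4e9 cell updates, i.e. O(n^2) in the message length
console.log(cells, steps, cellUpdates);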

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
Anders (OP), Full Member
July 17, 2014, 07:47:30 PM  #7

Here is an improved version (live demo): http://jsfiddle.net/L2eNR/

In the first version, a message of, for example, 160 'a' characters would generate the same hash as 160 'b' characters (actually all zeros). Not good. I simply wrapped the message modulo 80 with xor. In the new version there is an additional xor that depends on the characters in the message, and more rows are skipped initially to allow the bits to blend from left to right across the automaton.
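To spell out what I mean by wrapping the message modulo 80 with xor, here is a rough sketch of the basic fold (illustrative only; the live demo also adds the message-dependent xor on top of this, which is not shown here, and the function name is just mine):

Code:
// Fold an arbitrary-length message into a fixed-width initial condition by
// xor-ing bytes modulo the width (80 bytes = 640 bits in the examples above).
// Illustrative sketch; the jsfiddle versions differ in details.
function foldMessage(messageBytes, width) {
  var state = new Array(width).fill(0);
  for (var i = 0; i < messageBytes.length; i++) {
    state[i % width] ^= messageBytes[i];   // wrap the message modulo `width` with xor
  }
  return state;                             // later expanded into the automaton's initial cells
}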
Anders (OP), Full Member
July 17, 2014, 07:53:10 PM  #8

Quote from: Peter R on July 17, 2014, 07:18:53 PM
I expect Rule 30 would be cryptographically as strong as (or stronger than) SHA2.  (Side note: Rule 30 is used by the Mathematica function Random[].)

Off the top of my head (and I could be very wrong here), I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions).  If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.

The 1-dimensional automaton only grows by 2 cells per generation (iteration). And if every other row is skipped (for stronger cryptography), the automaton grows by 4 cells for each bit that is used.

In my hash function I first reduce the message to n characters (80 in my JavaScript example).
Anders (OP), Full Member
July 17, 2014, 08:27:32 PM (last edit: July 17, 2014, 08:51:41 PM)  #9

Quote from: Peter R on July 17, 2014, 07:18:53 PM
If you wanted to hash a 10 kbyte message with Rule 30, I think your initial conditions would have to be an array of 80,000 squares (encoding the bits in the message).  Then I think you'd need to step through at least 80,000 iterations to ensure that the bits on the far left have had a chance to influence the resultant bits on the far right (and vice versa).  Your "hash value" would be the N central-most bits.

I forgot to comment on that. Yes, the bits in the original message must be blended like that. That's why I start with only 80 characters (640 bits) as the initial condition of the automaton. And it seems that more than 640 iterations are needed initially; I think that's because the cells need to be influenced left to right and then back right to left again.

Edit: But the initial "blending" can be calculated in a 1-dimensional array, because the bits for the hash are taken by continuing to expand the cellular automaton in one dimension.
Anders (OP), Full Member
July 17, 2014, 09:12:23 PM (last edit: July 17, 2014, 09:34:24 PM)  #10

The reduction of the message into the 640 bits I use as the initial condition of the cellular automaton is itself a hash function. And if that hash sucks in terms of collisions, then those collisions will remain even after the Rule 30 scrambling. The simple xor wrapping could be good enough, though. If, for example, a hash is calculated for a 1 megabyte file, then the initial condition will contain 640 bits of complete mess (in a good sense) because of the xor of all the bytes in the file. And even for a very short message the Rule 30 scrambling will turn the hash into a highly random sequence of 256 bits. So the 256-bit hash in my JavaScript example should possibly work well for messages of all types and lengths.
Anders (OP), Full Member
July 17, 2014, 10:32:35 PM  #11

Same hash function, just corrected array size: http://jsfiddle.net/n9Ssu/
Anders (OP), Full Member
July 18, 2014, 09:04:30 AM (last edit: July 18, 2014, 03:30:50 PM)  #12

Here is new version with a random generator added to the initial condition: http://jsfiddle.net/Nem6N/

The rationale for that is that messages with repeated patterns can cause collisions when those patterns match the modulo wrap. Now different lengths of messages will produce different initial conditions regardless of how similar the messages are and regardless of what repeated patterns they contain.
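One possible way to do that (my reading of the idea, not necessarily exactly what the jsfiddle does) is to seed a simple pseudo-random generator with the message length and xor its stream into the folded initial condition:

Code:
// Hypothetical sketch: mix a length-seeded pseudo-random stream into the folded
// state, so messages of different lengths get different initial conditions even
// when their xor-folds happen to be identical.
function mixLength(state, messageLength) {
  var x = (messageLength + 1) >>> 0;                    // seed derived from the length
  for (var i = 0; i < state.length; i++) {
    x = (Math.imul(1103515245, x) + 12345) >>> 0;       // simple LCG step (illustrative)
    state[i] ^= x & 0xff;
  }
  return state;
}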
Anders (OP), Full Member
July 18, 2014, 03:46:34 PM (last edit: July 19, 2014, 07:00:29 AM)  #13

Improved version, without the random generator: http://jsfiddle.net/ALV5X/

Very fast for huge messages.
Anders (OP), Full Member
July 18, 2014, 04:36:51 PM  #14

Quote from: Peter R on July 17, 2014, 07:18:53 PM
... I think Rule 30 would require a lot more memory and that hashing times would be proportional to the square of the message length (instead of linear with the message length with standard hash functions). ...

Yes, if the full message were used as the initial condition, the algorithm would be O(n^2) and would require several times the message length in storage (in a 1-dimensional array). What I do is crunch the message down to an initial condition of at most 640 bits (or whatever fixed value is chosen). This makes my hash function O(n) and fixes the storage requirement at a small value.
Anders (OP), Full Member
July 19, 2014, 07:51:59 AM  #15

Ouch. Professional cryptographers/cryptanalysts, the article below seems to say, usually only have the time and interest to look at new algorithms if they have been published in the mainstream professional literature or were developed by known cryptographers. It's understandable, since it probably takes a lot of time and effort to analyze even simple cryptographic algorithms. The Stephen Wolfram quote I posted earlier can perhaps help a bit, since even though he's not a cryptographer he is a top expert in mathematics. (A cipher is different from a hash function, but I think the article implicitly describes cryptography in general.)

"Memo to the Amateur Cipher Designer

Congratulations. You've just invented this great new cipher, and you want to do something with it. You're new in the field; no one's heard of you, and you don't have any credentials as a cryptanalyst. You want to get well-known cryptographers to look at your work. What can you do?

Unfortunately, you have a tough road ahead of you. I see about two new cipher designs from amateur cryptographers every week. The odds of any of these ciphers being secure are slim. The odds of any of them being both secure and efficient are negligible. The odds of any of them being worth actual money are virtually non-existent.

Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis. And the only way to prove that is to subject the algorithm to years of analysis by the best cryptographers around.

"The best cryptographers around" break a lot of ciphers. The academic literature is littered with the carcasses of ciphers broken by their analyses. But they're a busy bunch; they don't have time to break everything. How do they decide what to look at?

Ideally, cryptographers should only look at ciphers that have a reasonable chance of being secure. And since anyone can create a cipher that he believes to be secure, this means that cryptographers should only look at ciphers created by people whose opinions are worth something. No one is impressed if a random person creates a cipher he can't break; but if one of the world's best cryptographers creates a cipher he can't break, now that's worth looking at." -- https://www.schneier.com/crypto-gram-9810.html#cipherdesign
Anders (OP), Full Member
July 19, 2014, 04:52:55 PM  #16

"The notion of Computational Irreducibility (CIR) seems to have been first put forward by Wolfram. Given a physical system whose behavior can be calculated by simulating explicitly each step of its evolution, is it always possible to predict the outcome without tracing each step? Is there always a shortcut to go directly to the nth step? Wolfram conjectured [16, 17, 18] that in most cases the answer is no. While many computations admit shortcuts that allow them to be performed more rapidly, others cannot be sped up. Computations that cannot be sped up by means of any shortcut are called computationally irreducible. ...

In this context, Israeli and Goldenfeld in [9] have shown that some automata that are apparently computationally irreducible have nevertheless properties that are predictable. But these properties are obtained by coarse graining and don’t account for small scales details. Moreover some automata (rule 30 for example) seem to be impossible to coarse grain." -- http://arxiv.org/ftp/arxiv/papers/1304/1304.5247.pdf

The comment that rule 30 "seems to be" computationally irreducible is actually stronger than it sounds. It's often difficult to give definite answers, since some future method may break the assumption. However, the same is the case with provably secure hash functions:

"A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.

It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time." -- http://en.wikipedia.org/wiki/Provably_secure_cryptographic_hash_function#Provably_secure_hash_functions

Notice the description "widely supposed". Again, no definite answer even when the hash function is provably secure.
Peter R, Legendary
July 19, 2014, 06:41:13 PM  #17

I don't think this thread is generating much interest because your proposal doesn't seem to be very useful.  Even if it worked, why would anyone choose this over a widely-analyzed standard like SHA2?  And the fact that you are hashing the message (you called it "crunching to 640 bits") before you hash the message (with Rule 30) suggests that perhaps you are choosing the wrong tool for the job.   

That being said, I do find Wolfram's theory of computational irreducibility very interesting.  I expect that he is right, and I think it would imply that there exists a certain group of hash functions that are fundamentally impenetrable by cryptanalysis.  In essence, they are already fully dense in their complexity.  As far as I know, no one has shown that Rule 30 has any predictability at all--and I know a lot of people would love to prove Stephen Wolfram wrong.  But even if Rule 30 were computationally irreducible, it's still not necessarily useful as a hash function because it is at least O(n^2) with message length to use it in "raw form" (without your pre-hashing).  And then any further changes you make to improve its memory and speed performance may affect its security--which means a lot more analysis.  So I think this is really just of theoretical interest at this point.

Run Bitcoin Unlimited (www.bitcoinunlimited.info)
DeathAndTaxes (Gerald Davis), Donator, Legendary
July 19, 2014, 06:56:23 PM (last edit: July 20, 2014, 07:03:34 PM)  #18

I have to agree with Peter.  

Quote
Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break.
Schneier's law

A cryptographic primitive becomes more valuable the more it is used (which creates a self-reinforcing cycle).  Better-known and more widely deployed algorithms are more likely to be researched than esoteric ones.  The longer an algorithm is around without any published breaks, the higher the confidence that it can't be broken at the current time.
Anders (OP), Full Member
July 19, 2014, 07:21:20 PM  #19

Quote from: Peter R on July 19, 2014, 06:41:13 PM
I don't think this thread is generating much interest because your proposal doesn't seem to be very useful.  Even if it worked, why would anyone choose this over a widely-analyzed standard like SHA2?  And the fact that you are hashing the message (you called it "crunching to 640 bits") before you hash the message (with Rule 30) suggests that perhaps you are choosing the wrong tool for the job.

That being said, I do find Wolfram's theory of computational irreducibility very interesting.  I expect that he is right, and I think it would imply that there exists a certain group of hash functions that are fundamentally impenetrable by cryptanalysis.  In essence, they are already fully dense in their complexity.  As far as I know, no one has shown that Rule 30 has any predictability at all--and I know a lot of people would love to prove Stephen Wolfram wrong.  But even if Rule 30 were computationally irreducible, it's still not necessarily useful as a hash function because it is at least O(n^2) with message length to use it in "raw form" (without your pre-hashing).  And then any further changes you make to improve its memory and speed performance may affect its security--which means a lot more analysis.  So I think this is really just of theoretical interest at this point.

In Bitcoin, SHA-256 is used twice iirc: first a hash is generated, then that hash is hashed again. From a speculative, conspiratorial view, the NSA may have given the public community the SHA family of hash functions because the NSA can break them, for national security reasons. So from that perspective a rule 30 hash function may be useful for privacy reasons.

For example, instead of starting with my simple initial condition, the initial condition could be a SHA-256 or SHA-512 hash. My simple initial condition may be enough, though, since the rule 30 scrambling generates a highly random hash even from plain text as the initial condition.
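A rough sketch of that variant in browser JavaScript, using the standard Web Crypto API for the SHA-256 step and the illustrative rule30Hash sketch from earlier in the thread (the function name and parameters are mine, not from the jsfiddle code):

Code:
// Sketch: use a SHA-256 digest of the message as the Rule 30 initial condition.
async function rule30OverSha256(message, n, warmupSteps) {
  var data = new TextEncoder().encode(message);
  var digest = new Uint8Array(await crypto.subtle.digest('SHA-256', data));
  var bits = [];
  for (var i = 0; i < digest.length; i++) {
    for (var b = 7; b >= 0; b--) bits.push((digest[i] >> b) & 1);  // 256 initial cells
  }
  return rule30Hash(bits, n, warmupSteps);  // illustrative sketch from earlier in the thread
}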
Anders (OP), Full Member
July 19, 2014, 07:44:12 PM (last edit: July 20, 2014, 12:31:02 AM)  #20

It's easy to generate a collision for my hash function because the initial condition is generated with such a simple algorithm. Here is one collision: the byte (0=0x00, 1=0x01) sequence 100000000000000000000000000000000000000000000000000000000000000000000000000000000 will generate the same hash as 010000000000000000000000000000000000000000000000000000000000000000000000000000001.

Yet I believe that's just the pigeonhole principle at work: collisions will always occur when a longer message is converted into a shorter hash value.

"In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item.[1] ...  For example, collisions are inevitable in a hash table because the number of possible keys exceeds the number of indices in the array. A hashing algorithm, no matter how clever, cannot avoid these collisions." -- http://en.wikipedia.org/wiki/Pigeonhole_principle

For fun, try to crack this hash:
Code:
6ab4a25ef4c85a3197674b2796c40c71ccc81a8f82297a97020ad8ffc391ca4b
http://jsfiddle.net/ALV5X/

Hint: the message is 7 characters long and starts with 'bitcoi'.

When you have cracked the message, you can try to find a different message that generates the same hash.