Rule 30 automaton as hash function

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 27, 2014, 08:25:18 PM

#101

If I were to use this for real (or had more time for fun), I'd rewrite it using a vector of longs and also parallelise it, since the computation on (groups of) longs can be executed on CPU cores in parallel, synchronising only at each change of generation.

I am sure the BigInteger version with the analytic form of rule 30 already beats your JavaScript by two orders of magnitude :P
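For readers unfamiliar with the "analytic form": a minimal sketch, assuming it means next = left XOR (center OR right), which reproduces the truth table encoded by the rule number 30. The class and method names are made up for illustration and are not taken from anyone's code in this thread.

```java
final class Rule30Form {
    // Wolfram rule number 30 = 0b00011110: bit (4*l + 2*c + r) holds the next state.
    static int tableLookup(int l, int c, int r) {
        return (30 >>> (4 * l + 2 * c + r)) & 1;
    }

    // Analytic form: next = left XOR (center OR right).
    static int analytic(int l, int c, int r) {
        return l ^ (c | r);
    }

    // One generation on 64 cells packed into a long (bit 63 = leftmost cell).
    // Cells beyond either end of the word are treated as 0.
    static long step(long cells) {
        long left = cells >>> 1;  // aligns each cell's left neighbour with it
        long right = cells << 1;  // aligns each cell's right neighbour with it
        return left ^ (cells | right);
    }

    public static void main(String[] args) {
        for (int n = 0; n < 8; n++) {
            int l = (n >> 2) & 1, c = (n >> 1) & 1, r = n & 1;
            System.out.println("" + l + c + r + " -> " + analytic(l, c, r));
        }
    }
}
```

The point of the long version is that 64 cells are updated with two shifts, one OR and one XOR, instead of 64 table lookups.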

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 27, 2014, 08:59:24 PM

#102

I think with machine code the shifts of the 64-bit registers can automatically be chained. Maybe inline assembler in C/C++ would be efficient. Of course, a hardware implementation could be made much faster still.

In reality only Bitcoin miners who would use dedicated R30 hardware would be competitive.

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 27, 2014, 10:36:35 PM
Last edit: July 27, 2014, 10:53:03 PM by grau

#103

Quote from: Anders on July 27, 2014, 08:59:24 PM

I think with machine code the shifts of the 64-bit registers can automatically be chained. Maybe inline assembler in C/C++ would be efficient. Of course, a hardware implementation could be made much faster still.

In reality only Bitcoin miners who would use dedicated R30 hardware would be competitive.

Sure. Writing software for this is just for research and fun.

Best-in-class code at any software layer will still be many orders of magnitude slower than an ASIC for R30, since the rule is extremely simple, homogeneous, and parallelizable at the finest scale, with a constant memory requirement (if any).

This efficiency, combined with irreducibility of computation, the lack of an inverse, and the ability to satisfy any difficulty, would make it the perfect proof of work.

The ability to satisfy any difficulty is something that still bugs me. I wish I could convince myself that it has this property within the limited 7N generations.

Since 80 bytes of input determine 32 bytes of output, there is plenty of freedom to produce whatever pattern is needed, though.

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 28, 2014, 08:59:31 AM

#104

Quote from: grau on July 27, 2014, 08:25:18 PM

If I were to use this for real (or had more time for fun), I'd rewrite it using a vector of longs and also parallelise it, since the computation on (groups of) longs can be executed on CPU cores in parallel, synchronising only at each change of generation.

I am sure the BigInteger version with the analytic form of rule 30 already beats your JavaScript by two orders of magnitude :P

It turns out that the performance of the code below is pretty dismal compared with SHA-256 as implemented by the standard Java runtime libraries. My guess is that memory allocation and shuffling for the ever-growing BigIntegers dominates it. This needs a pre-allocated store of the right size.

Code:

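	private static BigInteger r30hash (BigInteger input)
	{
		// Skip 7 generations per input bit (80-byte input), then collect 32 bytes of output.
		final int skip = 80 * 7 * 8;
		BigInteger result = BigInteger.ZERO;
		for ( int i = 0; i < skip + 32 * 8; ++i )
		{
			// Rule 30 in analytic form: new bit k = old bit k XOR (old bit k-1 OR old bit k-2),
			// so the pattern drifts one bit position upward per generation.
			input = input.xor (input.shiftLeft (1).or (input.shiftLeft (2)));
			if ( i >= skip )
			{
				// Shift before setting the bit, so that the last output bit is not followed
				// by an extra zero and the result stays within 256 bits.
				result = result.shiftLeft (1);
				if ( input.testBit (i + 1) )
				{
					result = result.setBit (0);
				}
			}
		}
		return result;
	}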
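A rough sketch of what a "pre-allocated store of the right size" could look like: a fixed long[] updated in place, with the edge bits carried between neighbouring words. This assumes bit 63 of each long is its leftmost cell and that cells outside the array are 0. The class name and the centerColumn helper are invented for this illustration and are not a drop-in replacement for the method above.

```java
final class Rule30Row {
    private final long[] cells; // bit 63 of cells[0] is the leftmost cell

    Rule30Row(int words) {
        cells = new long[words]; // allocated once, never resized
    }

    void set(int index) {
        cells[index >>> 6] |= 1L << (63 - (index & 63));
    }

    boolean get(int index) {
        return ((cells[index >>> 6] >>> (63 - (index & 63))) & 1L) != 0;
    }

    // Advances one generation in place: next = left XOR (center OR right).
    void step() {
        long carryLeft = 0L; // lowest bit of the previous (old) word, moved to bit 63
        for (int i = 0; i < cells.length; i++) {
            long cur = cells[i];
            // cells[i + 1] has not been overwritten yet, so this is still the old value
            long carryRight = i < cells.length - 1 ? cells[i + 1] >>> 63 : 0L;
            long left = (cur >>> 1) | carryLeft;
            long right = (cur << 1) | carryRight;
            carryLeft = cur << 63;
            cells[i] = left ^ (cur | right);
        }
    }

    // Center-column bits for a single 1 cell placed in the middle of the row.
    static String centerColumn(int words, int generations) {
        Rule30Row row = new Rule30Row(words);
        int center = words * 32;
        row.set(center);
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g < generations; g++) {
            sb.append(row.get(center) ? '1' : '0');
            row.step();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(centerColumn(2, 8));
    }
}
```

With two words the starting cell sits right at a word boundary, so the carries are exercised from the first generation on.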

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 09:23:05 AM

#105

Fast Java implementation of R30 using 64-bit longs:

Code:

	public final byte[] digest(byte[] message) {
		int maxMessageBytes = message.length;
		int maxKeyBytes = maxMessageBytes + 4;
		int maxKeyBits = maxKeyBytes * 8;
		byte[] hash = new byte[MAX_HASH_BYTES];
		byte[] key = new byte[maxKeyBytes];
		for (int i = 0; i < maxMessageBytes; i++) {
			key[i + 2] = message[i];
		}
		key[0] = (byte) (maxMessageBytes >>> 24);
		key[1] = (byte) (maxMessageBytes >>> 16);
		key[maxKeyBytes - 2] = (byte) (maxMessageBytes >>> 8);
		key[maxKeyBytes - 1] = (byte) maxMessageBytes;
		int maxHashBits = MAX_HASH_BYTES * 8;
		int skipRows = maxKeyBits * 7;
		int maxCells = 2;
		maxCells += maxKeyBits;
		maxCells += skipRows;
		maxCells += maxHashBits * 2;
		int maxLongs = (maxCells + 63) >>> 6;
		maxCells = maxLongs << 6;
		int cellsMid = maxCells / 2;		
		long[] cells = new long[maxLongs];
		int keyStart = (maxCells - maxKeyBits) / 2;
		for (int i = 0; i < key.length; i++) {
			int keyChar = key[i];
			int bitPos = 0x80;
			for (int j = 0; j < 8; j++) {
				long b = (keyChar & bitPos) >>> (7 - j);
				int bitIndex = keyStart + i * 8 + j;
				cells[bitIndex >>> 6] |= b << (63 - (bitIndex % 64));
				bitPos >>>= 1;
			}
		}
		int bitCount = 0;
		int mid = 0;
		int longMid = maxLongs / 2;
		int longMidShift = longMid * 2 == maxLongs ? 63 : 31;
		int maxRow = skipRows + maxHashBits * 2;
		for (int row = 0; row < maxRow; row++) {
			int doubleRow = row * 2;
			int calcWidth = doubleRow;
			if (calcWidth > maxRow - 2) {
				calcWidth = maxRow - ((doubleRow) % maxRow) + 2;
			} else {
				calcWidth += maxKeyBits;
			}
			int halfWidth = calcWidth / 2 + 2;
			int start = (cellsMid - halfWidth) >>> 6;
			int end = (cellsMid + halfWidth + 63) >>> 6;
			mid = (int) ((cells[longMid] >>> longMidShift) & 0x01);
			long carryLeft = 0L;
			for (int i = start; i < end; i++) {
				long l = cells[i];
				long carryRight = i < maxLongs - 1 ? cells[i + 1] >>> 63 : 0;
				long cellRight = (l << 1) | carryRight;
				long cellLeft = (l >>> 1) | carryLeft;
				carryLeft = l << 63;
				cells[i] = cellLeft ^ (l | cellRight);
			}
			if (row < skipRows) {
				continue;
			}
			if (row % 2 == 1) {
				if (mid == 1) {
					int bufPos = bitCount >>> 3;
					hash[bufPos] ^= 1 << (bitCount % 8);
				}
				bitCount++;
			}
		}
		return hash;
	}

Slower JavaScript alternative: http://jsfiddle.net/7DV7Z/

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 28, 2014, 12:11:37 PM

#106

Remove message length encoding. That's a foreign concept to a hash function.

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 12:25:41 PM

#107

I ran a worst-case test with one-million-bit messages of the form 010101010... and it works:

The R30 hash for UUU...U (1000000 bits) is: 10ea3662363b4940ed208b2a70960f7520bd5309e3f059ea5d106cdde9f2d7a1
The R30 hash for UUU...V (1000000 bits) is: 8903d17a9ec35cce2f028907e0fef7ccabbbd12d8f4a9f635e1d0503f395b1e9

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 12:28:28 PM

#108

Quote from: grau on July 28, 2014, 12:11:37 PM

Remove message length encoding. That's a foreign concept to a hash function.

The message length bits are needed. Otherwise for example messages 000 and 000000 would result in the same initial condition and have the same hash value.

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 12:39:53 PM

#109

Aha. Here is how the length bits are handled in SHA-256:

"Pre-processing:
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
length (modulo 512 in bits) is 448.
append length of message (without the '1' bit or padding), in bits, as 64-bit big-endian integer
(this will make the entire post-processed length a multiple of 512 bits)" -- http://en.wikipedia.org/wiki/SHA-2#Pseudocode

That's actually similar to how I do it in my current R30 version.
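For comparison, a small sketch of the quoted SHA-256 pre-processing at byte granularity (0x80 is the '1' bit followed by seven '0' bits). This only illustrates the padding rule; it is not the R30 code, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;

final class Sha256Padding {
    // Returns message || 0x80 || zero bytes || 64-bit big-endian bit length,
    // with the total length a multiple of 64 bytes (512 bits).
    static byte[] pad(byte[] message) {
        long bitLength = (long) message.length * 8;
        // room for the 0x80 byte and the 8 length bytes, rounded up to a full block
        int paddedLength = ((message.length + 1 + 8 + 63) / 64) * 64;
        ByteBuffer buf = ByteBuffer.allocate(paddedLength); // zero-filled, big-endian by default
        buf.put(message);
        buf.put((byte) 0x80);
        buf.putLong(paddedLength - 8, bitLength); // absolute write into the last 8 bytes
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(pad(new byte[3]).length);
    }
}
```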

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 28, 2014, 12:48:24 PM

#110

The hash function should be defined for a certain block size.

Padding is an optional preprocessing step for the case where the input does not exactly fill the block size, and it then operates on bits, not bytes.
I would leave that aside until the algorithm is settled.

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 01:07:54 PM

#111

Quote from: grau on July 28, 2014, 12:48:24 PM

The hash function should be defined for a certain block size.

Padding is an optional preprocessing step for the case where the input does not exactly fill the block size, and it then operates on bits, not bytes.
I would leave that aside until the algorithm is settled.

I think the fixed block size used in many hash functions today is simply a consequence (limitation) of using the Davies–Meyer compression function.

Treating the whole message as a single unit could be more robust.

Instead of adding length bits in R30, I could simply add, for example, a 1 bit before and after the message as preprocessing. That would work too, since even messages containing only zeros would then differ when they have different lengths.
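A quick sketch of why the framing helps, assuming the cells outside the message are all 0 so that leading and trailing zeros of the message blend into the background. The helper names are hypothetical.

```java
import java.util.Arrays;

final class OneBitFraming {
    // What the automaton effectively "sees": the message with the surrounding zeros removed.
    static boolean[] trimZeros(boolean[] bits) {
        int start = 0;
        int end = bits.length;
        while (start < end && !bits[start]) start++;
        while (end > start && !bits[end - 1]) end--;
        return Arrays.copyOfRange(bits, start, end);
    }

    // Adds a 1 bit before and after the message.
    static boolean[] frame(boolean[] bits) {
        boolean[] framed = new boolean[bits.length + 2];
        framed[0] = true;
        System.arraycopy(bits, 0, framed, 1, bits.length);
        framed[framed.length - 1] = true;
        return framed;
    }

    public static void main(String[] args) {
        boolean[] a = new boolean[3]; // message 000
        boolean[] b = new boolean[6]; // message 000000
        System.out.println(Arrays.equals(trimZeros(a), trimZeros(b)));
        System.out.println(Arrays.equals(trimZeros(frame(a)), trimZeros(frame(b))));
    }
}
```

Without framing, 000 and 000000 both collapse to an empty pattern; with the two 1 bits they become 10001 and 10000001, which differ.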

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 02:47:02 PM

#112

This may not matter for Bitcoin, but the O(n²) cost of R30 makes it very computationally intensive for long messages.

One way of reducing the calculations for long messages is to use the Merkle–Damgård construction together with the Davies–Meyer single-block-length compression function.

With a message of 1,000,000 bits, taking the square results in 10¹². Dividing that message into fixed-size blocks of 1000 bits instead gives 1000 blocks at 1000² each: 1000² * 1000 = 10⁹. That is a three-orders-of-magnitude improvement.

A similar technique is used in SHA-2 and other standard hash functions.
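The arithmetic above as a tiny sketch, under the simplifying assumption that the cost is exactly the square of the number of bits fed into one R30 run (constant factors ignored). The names are illustrative only.

```java
final class R30Cost {
    // Assumed cost model: one run over n bits costs n * n cell updates.
    static long wholeMessageCost(long bits) {
        return bits * bits;
    }

    // Cost when the message is split into blocks that are hashed one after another.
    static long blockedCost(long bits, long blockBits) {
        long blocks = (bits + blockBits - 1) / blockBits; // round up
        return blocks * blockBits * blockBits;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(wholeMessageCost(n) / blockedCost(n, 1000L));
    }
}
```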

DeathAndTaxes

Donator
Legendary

Offline

Activity: 1218
Merit: 1079

Gerald Davis

Re: Rule 30 automaton as hash function

July 28, 2014, 03:02:39 PM

#113

Quote from: Anders on July 28, 2014, 02:47:02 PM

This may not matter for Bitcoin, but the O(n²) cost of R30 makes it very computationally intensive for long messages.

One way of reducing the calculations for long messages is to use the Merkle–Damgård construction together with the Davies–Meyer single-block-length compression function.

With a message of 1,000,000 bits, taking the square results in 10¹². Dividing that message into fixed-size blocks of 1000 bits instead gives 1000 blocks at 1000² each: 1000² * 1000 = 10⁹. That is a three-orders-of-magnitude improvement.

A similar technique is used in SHA-2 and other standard hash functions.

I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories but is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash but it would allow one to exploit the bitcoin network for cheap.

This is why I indicated using something like:

Code:

R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")
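A minimal sketch of that check, assuming "+" means concatenation, the nonce is encoded as 8 big-endian bytes, and the R30 output is read as an unsigned big-endian integer. None of these details are fixed anywhere in this thread. The r30 parameter is a placeholder for whatever R30 implementation gets plugged in.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.UnaryOperator;

final class PowCheck {
    // H: SHA-256 from the standard library.
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    // R30(nonce + H(blockheader)) < target, with r30 supplied by the caller.
    static boolean meetsTarget(UnaryOperator<byte[]> r30, long nonce, byte[] headerHash, BigInteger target) {
        byte[] input = ByteBuffer.allocate(8 + headerHash.length).putLong(nonce).put(headerHash).array();
        BigInteger value = new BigInteger(1, r30.apply(input)); // signum 1 = unsigned
        return value.compareTo(target) < 0;
    }

    // Tries nonces 0..maxTries-1 and returns the first one that works, or -1 if none does.
    static long mine(UnaryOperator<byte[]> r30, byte[] header, BigInteger target, long maxTries) {
        byte[] headerHash = sha256(header); // hashed once, only the nonce changes
        for (long nonce = 0; nonce < maxTries; nonce++) {
            if (meetsTarget(r30, nonce, headerHash, target)) {
                return nonce;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // SHA-256 is used here only as a stand-in so the sketch runs without an R30 class.
        BigInteger target = BigInteger.ONE.shiftLeft(248);
        System.out.println(mine(PowCheck::sha256, "example header".getBytes(), target, 1_000_000L));
    }
}
```

Since the header is hashed only once, the arbitrary-length header costs the same no matter how many nonces are tried.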

grau

Hero Member

Offline

Activity: 836
Merit: 1021

bits of proof

Re: Rule 30 automaton as hash function

July 28, 2014, 03:13:15 PM

#114

Quote from: DeathAndTaxes on July 28, 2014, 03:02:39 PM

I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories but is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash but it would allow one to exploit the bitcoin network for cheap.

This is why I indicated using something like:

Code:

R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")

Agreed. R30 is interesting as a perfect PoW, not so much as a cryptographic hash.

As a side note: I am not sure Satoshi made a mistake here. Maybe he wanted the block header to be updated with new transactions, instead of a nonce being rolled until a block is found.

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 03:21:22 PM

#115

Quote from: DeathAndTaxes on July 28, 2014, 03:02:39 PM

I think you have to consider what your intention is for using R30. As a general purpose non-cryptographic hash? As a general purpose cryptographic hash? Or as a PoW function?

Irreducibility of computation isn't really that important for the first two categories but is (in theory) pretty important for a PoW. Say someone found a way to produce SHA-256 hashes 10,000x faster for a given amount of die space. It wouldn't do anything to compromise the security of SHA-2 as a cryptographic hash but it would allow one to exploit the bitcoin network for cheap.

This is why I indicated using something like:

Code:

R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")

I believe R30 is an excellent hash function for cryptography, although that's a risky assumption, since I assume not even Stephen Wolfram has a definite proof of it. My aim is for R30 to be a general hash function. Using it for Bitcoin proof of work would be an interesting test of the concept.

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 28, 2014, 04:26:17 PM

#116

Quote from: DeathAndTaxes on July 28, 2014, 03:02:39 PM

Code:

R30(nonce + H(blockheader)) < target

Where:
H= a secure cryptographic hashing function (i.e. SHA-2)
Blockheader = arbitrary length blockheader (excluding the nonce)
Nonce = 64 bit (don't make Satoshi's mistake of using a nonce "too small")

I don't know much about Bitcoin, but the nonce could start at 64 bits and then be expanded in the future if needed! R30 would work with the total length of nonce + H(blockheader) being variable.

Peter R

Legendary

Offline

Activity: 1162
Merit: 1007

Re: Rule 30 automaton as hash function

July 28, 2014, 07:26:49 PM
Last edit: July 29, 2014, 04:12:47 AM by Peter R

#117

Quote from: grau on July 27, 2014, 06:40:06 PM

Quote from: Peter R on July 27, 2014, 05:38:14 PM

I don't believe it's a requirement to select the hash bits from the central column ...

It is apparent that chaos converges to order toward both left and right, so intuition says to stay in the middle if you want maximum entropy.

But if it wasn't a maximal entropy process, then it should be possible to peel out the predictable part in order to short-cut the computation. In other words, the computation wouldn't be irreducible.

For irreducibility to be a useful concept, doesn't it have to be binary: either it's reducible or it's not? The left side of R30 is reducible, along with the far right edge (I think), and the output is reducible for a certain number of rows for an enumerable set of initial conditions, but excluding these, I think the theory is that the rest is irreducible. So I think if you can prove that the column to the right of center has less entropy than the center column, then you've also proven that R30 is not irreducible (something that AFAIK no one has been able to do).

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 29, 2014, 02:47:37 AM
Last edit: July 29, 2014, 03:40:09 AM by Anders

#118

Quote from: grau on July 28, 2014, 12:11:37 PM

Remove message length encoding. That's a foreign concept to a hash function.

I came to think of a way to remove the length bits: limit the initial condition to a fixed length of 256 bits. If the message is longer than 255 bits, the Merkle–Damgård construction is used by turning the rest of the message into 256-bit blocks. The preprocessing is then only to append a 1 bit and pad the message with zeros to the nearest 256-bit boundary. If a message is 255 bits or shorter, it gets padded with the 1 bit and zeros into a single 256-bit block.

And instead of working at bit resolution, byte resolution can be used for convenience. Then the byte 0x01 is appended to the message, followed by zero bytes (0x00), to a total of 32 bytes per block.
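A sketch of the byte-resolution padding just described, assuming the 0x01 byte is always appended (so a message that is an exact multiple of 32 bytes gets one extra block). The class name is hypothetical and this is only the preprocessing step, not the hash itself.

```java
final class BlockPadding {
    static final int BLOCK_SIZE_BYTES = 32;

    // Returns message || 0x01 || 0x00..., padded up to the next multiple of 32 bytes.
    static byte[] pad(byte[] message) {
        int paddedLength = (message.length / BLOCK_SIZE_BYTES + 1) * BLOCK_SIZE_BYTES;
        byte[] out = new byte[paddedLength]; // new arrays are zero-filled
        System.arraycopy(message, 0, out, 0, message.length);
        out[message.length] = 0x01;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pad(new byte[31]).length); // fits a single block
        System.out.println(pad(new byte[32]).length); // needs a second block for the 0x01
    }
}
```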

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 29, 2014, 01:21:50 PM
Last edit: July 30, 2014, 02:23:53 PM by Anders

#119

R30 with the Merkle–Damgård construction:

Code:

	private final static int MAX_HASH_BYTES = 32;
	private final static int BLOCK_SIZE_BYTES = MAX_HASH_BYTES;

	public final byte[] digest(InputStream is) throws IOException {
		byte[] digest = null;
		byte[] block = new byte[BLOCK_SIZE_BYTES];
		int bytesRead = 0;
		long totalBytesRead = 0;
		while (bytesRead != -1) {
			bytesRead = 0;
			int blockBytesRead = 0;
			while (blockBytesRead < BLOCK_SIZE_BYTES && bytesRead != -1) {
				bytesRead = is.read(block, blockBytesRead, 
						BLOCK_SIZE_BYTES - blockBytesRead);
				if (bytesRead > 0) {
					blockBytesRead += bytesRead;
				}
			}
			totalBytesRead += blockBytesRead;
			if (blockBytesRead < BLOCK_SIZE_BYTES) {
				for (int i = blockBytesRead; i < BLOCK_SIZE_BYTES; i++) {
					block[i] = 0;
				}
				if (BLOCK_SIZE_BYTES - blockBytesRead >= 8) {
					for (int i = 1, j = 0; i < 9; i++, j += 8) {
						block[BLOCK_SIZE_BYTES - i] =
								(byte) (totalBytesRead >>> j);
					}
				} else {
					bytesRead = 0;
				}
			}
			if (digest != null) {
				for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
					block[i] ^= digest[i];
				}
			} else {
				digest = new byte[BLOCK_SIZE_BYTES];
			}
			byte[] nextDigest = digestBlock(block);
			for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
				digest[i] ^= nextDigest[i];
			}
		}
		return digest;
	}
	
	private final byte[] digestBlock(byte[] block) {
		int maxKeyBytes = BLOCK_SIZE_BYTES + 1;
		int maxKeyBits = maxKeyBytes * 8;
		byte[] hash = new byte[MAX_HASH_BYTES];
		byte[] key = new byte[maxKeyBytes];
		for (int i = 0; i < BLOCK_SIZE_BYTES; i++) {
			key[i] = block[i];
		}
		key[BLOCK_SIZE_BYTES] = 0x01;
		int maxHashBits = MAX_HASH_BYTES * 8;
		int skipRows = maxKeyBits * 7;
		int maxCells = 2;
		maxCells += maxKeyBits;
		maxCells += skipRows;
		maxCells += maxHashBits * 2;
		int maxLongs = (maxCells + 63) >>> 6;
		maxCells = maxLongs << 6;
		int cellsMid = maxCells / 2;		
		long[] cells = new long[maxLongs];
		int keyStart = (maxCells - maxKeyBits) / 2;
		for (int i = 0; i < key.length; i++) {
			int keyChar = key[i];
			int bitPos = 0x80;
			for (int j = 0; j < 8; j++) {
				long b = (keyChar & bitPos) >>> (7 - j);
				int bitIndex = keyStart + i * 8 + j;
				cells[bitIndex >>> 6] |= b << (63 - (bitIndex % 64));
				bitPos >>>= 1;
			}
		}
		int bitCount = 0;
		int mid = 0;
		int longMid = maxLongs / 2;
		int longMidShift = longMid * 2 == maxLongs ? 63 : 31;
		int maxRow = skipRows + maxHashBits * 2;
		for (int row = 0; row < maxRow; row++) {
			int doubleRow = row * 2;
			int calcWidth = doubleRow;
			if (calcWidth > maxRow - 2) {
				calcWidth = maxRow - ((doubleRow) % maxRow) + 2;
			} else {
				calcWidth += maxKeyBits;
			}
			int halfWidth = calcWidth / 2 + 2;
			int start = (cellsMid - halfWidth) >>> 6;
			int end = (cellsMid + halfWidth + 63) >>> 6;
			mid = (int) ((cells[longMid] >>> longMidShift) & 0x01);
			long carryLeft = 0L;
			for (int i = start; i < end; i++) {
				long l = cells[i];
				long carryRight = i < maxLongs - 1 ?
						cells[i + 1] >>> 63 : 0;
				long cellRight = (l << 1) | carryRight;
				long cellLeft = (l >>> 1) | carryLeft;
				carryLeft = l << 63;
				cells[i] = cellLeft ^ (l | cellRight);
			}
			if (row < skipRows) {
				continue;
			}
			if (row % 2 == 1) {
				if (mid == 1) {
					int bufPos = bitCount >>> 3;
					hash[bufPos] |= 1 << (7 - (bitCount % 8));
				}
				bitCount++;
			}
		}
		return hash;
	}

Anders (OP)

Full Member

Offline

Activity: 126
Merit: 100

Re: Rule 30 automaton as hash function

July 29, 2014, 03:44:21 PM

#120

I noticed that the rightmost bit in the initial condition needs to be 1. If all the bits on the right side of the initial condition are zero, the hash values become nonrandom, with collisions for similar messages. The whole left side of the Rule 30 cellular automaton is nonrandom, it seems.