Potentially faster method for mining on the CPU

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 17, 2013, 09:31:37 AM

#41

Quote from: jlcooke on August 15, 2013, 08:41:58 PM

Hehe. I like encouraging people to explore this. Good learning "exorcize" (pun). 32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce! Tongue

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

ZirconiumX

Full Member

Offline

Activity: 286
Merit: 100

Re: Potentially faster method for mining on the CPU

August 17, 2013, 01:16:37 PM
Last edit: August 18, 2013, 08:07:31 AM by ZirconiumX

#42

Quote from: botnet on August 17, 2013, 09:31:37 AM

Quote from: jlcooke on August 15, 2013, 08:41:58 PM

Hehe. I like encouraging people to explore this. Good learning "exorcize" (pun). 32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce! Tongue

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

~~Replacing ADD with MUL makes no difference.~~ It does indeed, since I didn't see that a₁b₁ was a multiplication, not a value.

Matthew:out

faiza1990

Sr. Member

Offline

Activity: 420
Merit: 250

★☆★777Coin★☆★

Re: Potentially faster method for mining on the CPU

August 17, 2013, 01:27:13 PM

#43

Minning could return to normal people, minning Undecided

▓▓▓ BITMIXER.IO ▓▓▓▓▓▓ High Volume Bitcoin MIXER ▓▓▓

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 19, 2013, 04:03:09 PM

#44

Quote from: jesse11 on August 02, 2013, 09:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

smolen

Hero Member

Offline

Activity: 524
Merit: 500

Re: Potentially faster method for mining on the CPU

August 19, 2013, 04:20:42 PM

#45

Quote from: botnet on August 19, 2013, 04:03:09 PM

can any bits in the block header can be treated as constant

hashMerkleRoot Grin

Of course I gave you bad advice. Good one is way out of your price range.

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 19, 2013, 04:28:54 PM

#46

Quote from: botnet on August 17, 2013, 09:31:37 AM

your c1...c31 bits can be reduced by one instruction

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0

Yes, but at the expense of 1 extra depth in the circuit. The form on the site is for shortest path - ideal for implementation.

c1 == a1b1 + a1c0 + b1c0
== a1(b1+c0) + b1c0
or
== a1b1 + c0(a1 + b1)
or
== b1(a1 + c0) + a1c0

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 19, 2013, 04:46:37 PM

#47

Quote from: botnet on August 19, 2013, 04:03:09 PM

Quote from: jesse11 on August 02, 2013, 09:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

The last version change IIRC was at the last 50%+1 blockchain split earlier in the year.

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 20, 2013, 09:12:13 PM

#48

Quote from: botnet on August 19, 2013, 04:03:09 PM

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

http://jheusser.github.io/2013/02/03/satcoin.html

SAT solvers tried, didn't work. But I'd still like to see a set of 256 monster logic formula with 2048 bit inputs for the SHA256 round function.

smolen

Hero Member

Offline

Activity: 524
Merit: 500

Re: Potentially faster method for mining on the CPU

August 21, 2013, 01:45:21 AM

#49

Quote from: jlcooke on August 20, 2013, 09:12:13 PM

http://jheusser.github.io/2013/02/03/satcoin.html
SAT solvers tried, didn't work

...in such a naïve way

Of course I gave you bad advice. Good one is way out of your price range.

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 22, 2013, 06:57:07 PM

#50

Quote from: botnet on August 19, 2013, 04:03:09 PM

Quote from: jesse11 on August 02, 2013, 09:54:00 PM

Thoughts? Make sure you have 32GB of RAM installed Grin

Do you guys know how often 'Version' changes in the block header? Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization. The fewer variables in these equations the better

Any updates?

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 23, 2013, 04:51:08 PM

#51

Quote from: jlcooke on August 22, 2013, 06:57:07 PM

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel. I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached; It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree; I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children. This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption. Any suggestions?

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 23, 2013, 05:06:21 PM

#52

Quote from: botnet on August 23, 2013, 04:51:08 PM

Quote from: jlcooke on August 22, 2013, 06:57:07 PM

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel. I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached; It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree; I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children. This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption. Any suggestions?

Cool. How are you generating the formulas? Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits. Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 26, 2013, 06:40:30 AM

#53

Quote from: jlcooke on August 23, 2013, 05:06:21 PM

Cool. How are you generating the formulas? Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits. Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica. BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better. Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact. It also consumed a modest 300 megs of memory for 8 bits. Still learning the maple syntax so things might improve.

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 26, 2013, 12:24:24 PM

#54

Quote from: botnet on August 26, 2013, 06:40:30 AM

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica. BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better. Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact. It also consumed a modest 300 megs of memory for 8 bits. Still learning the maple syntax so things might improve.

OK I think I see now.

I'll PM you a link to my C code for logic reduction in a few mins. You can compile C right? And know how to code? It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

!(AB) = !A + !B,
A = A + AB, and
A + B = A + !AB

It takes about 4 hours to complete a single 32-bit addition. The API has all the basic operations you want (AND, OR, XOR, NOT, etc).

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 27, 2013, 03:48:04 AM

#55

Quote from: jlcooke on August 26, 2013, 12:24:24 PM

I'll PM you a link to my C code for logic reduction in a few mins. You can compile C right? And know how to code? It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way

I will start integrating with my code. Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling - though at first glance of your API I don't think marshaling will have too bad of a perf impact. I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 27, 2013, 02:16:16 PM
Last edit: August 27, 2013, 05:36:29 PM by jlcooke

#56

Quote from: botnet on August 27, 2013, 03:48:04 AM

Quote from: jlcooke on August 26, 2013, 12:24:24 PM

I'll PM you a link to my C code for logic reduction in a few mins. You can compile C right? And know how to code? It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way

I will start integrating with my code. Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling - though at first glance of your API I don't think marshaling will have too bad of a perf impact. I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm

Quine-McCluskey is the analytical twin of Karnaugh Maps (http://en.wikipedia.org/wiki/Karnaugh_map)

Basically, QMC/KMap takes I/O and creates formulas. My lib lets you build the formulas directly and simplify them. You *could* create the formulas based on I/Os I guess. But since we know the SHA-256 formulas you could skip that step.

In fact you really want to, otherwise you will never have the correct formula unless you iterate all possible inputs for the SHA256 compression function (2^256 + 2^(W-array inputs) ).

Even 2^32 addition is going to be massive. The nth (0..31) output bit of 32bit-add will be 2^(n+1) terms long. You're only hope is multiple 32bit-adds start simplifying.

Download logred_v3.zip - there was a bug in performing (A+BCD+EFG+...)(HI+!JK+...)(...)... operations.

Cheers

Edit:
Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add. Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 29, 2013, 04:10:15 PM

#57

Quote from: jlcooke on August 27, 2013, 02:16:16 PM

Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add. Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0? for me it outputs a00

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

3) Haven't debugged enough, but part 2 could be related to this?:
   logred_set_scanf(a, "!a00!b00"); = works
    logred_set_scanf(a, "!a00"); = crash

4) Actually, is there support for numeric literals (0, 1)? c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

4) x64 only? This is giving me heap problems:
#define VAR_T size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS 256
#define VAR_A_LEN 4L

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems?

)

jlcooke

Newbie

Offline

Activity: 46
Merit: 0

Re: Potentially faster method for mining on the CPU

August 29, 2013, 04:27:22 PM

#58

Note: working on Quine-McClusky now.

Always wanted a high speed lib for logic reductions ...

Quote from: botnet on August 29, 2013, 04:10:15 PM

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0? for me it outputs a00

Nope. A or A == A. A or !A == 1 and A!A = 0

Quote from: botnet on August 29, 2013, 04:10:15 PM

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

a = "a00"
b = "!a00"
c = "a00" ^ "!a00"
= a00!(!a00) + !(a00)!a00
= a00 + !a00
= 1

Yup, it crashes. Comments out "logred_set_reduce_4_A_nAB_n(src)" in "logred_set_reduce()" to fix this.

It doesn't resolve down to "1" yet. soon.

Quote from: botnet on August 29, 2013, 04:10:15 PM

3) Haven't debugged enough, but part 2 could be related to this?:
logred_set_scanf(a, "!a00!b00"); = works
logred_set_scanf(a, "!a00"); = crash

Version I'm working on doesn't crash on this.

Quote from: botnet on August 29, 2013, 04:10:15 PM

4) Actually, is there support for numeric literals (0, 1)? c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

Humm, good point. Will not be ready for this release. Soon.

Quote from: botnet on August 29, 2013, 04:10:15 PM

4) x64 only? This is giving me heap problems:
#define VAR_T size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS 256
#define VAR_A_LEN 4L

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems?

)

The code should be x32/x64 agnostic. logred_v6.zip is a snapshot of what I have so far. Ignore the logred_set_reduce_5_Quine_McCluskey() functions.

The Makefile now lets you define a bunch of flags like VAR_A_BITS. But it should work for values up to 960. It may work beyond that with some tweeks to the variable string lenghts ("a960" and "a1024" are different lengths).

Cheers.

botnet (OP)

Newbie

Offline

Activity: 35
Merit: 0

Re: Potentially faster method for mining on the CPU

August 29, 2013, 09:40:11 PM

#59

Quote from: jlcooke on August 29, 2013, 04:27:22 PM

Quote from: botnet on August 29, 2013, 04:10:15 PM

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0? for me it outputs a00

Nope. A or A == A. A or !A == 1 and A!A = 0

Ahh, there was misunderstanding about the + operator.
I've been thinking about arithmetic operations mod 2, where "(a+b) mod 2" is equivalent to XOR
0+0 = 0, 0 mod 2 = 0
0+1 = 1, 1 mod 2 = 1
1+0 = 1, 1 mod 2 = 1
1+1 = 2, 2 mod 2 = 0
= XOR

Nancarrow

Hero Member

Offline

Activity: 492
Merit: 500

Re: Potentially faster method for mining on the CPU

August 30, 2013, 12:22:46 PM

#60

Best of luck with this, botnet, but I feel it's a wild goose chase. I first heard about bitcoins in autumn 2011 and immediately set about trying to 'simplify' sha256(sha256(x)) in the same way as you, by trying to find boolean expressions for the first bits of the output in terms of the 640 bits of input. As you've found out, it's the seemingly simple addition of two 32-bit numbers that's the big problem. It simply doesn't lend itself to being simplfied, as it's recursive. As you crawl along the bits from right to left the expression balloons exponentially. All the other operations in SHA-256, the logical connectives and the rotations/shifts are trivial by comparison.

You might want to look into the structure of the block header to see how the 640 bits change at different rates. Fr'instance I believe the version number (first 32 bits of the header, but in stupid-endian) has been 1 for most of bitcoin's history but may be 2 now or soon. That's 30 zero bits right there. Then you have the first (or, goddammit, probably last) 32 bits of the previous hash guaranteed to be zero. The difficulty changes every two weeks, the merkle hash every block, and so on.

But I still think you haven't quite grasped the magnitude of the P/NP problem yet! Then again, if the impossible ever gets done, it'll be as always, by someone who didn't know it was impossible.

If I've said anything amusing and/or informative and you're feeling generous:
1GNJq39NYtf7cn2QFZZuP5vmC1mTs63rEW