Bitcoin Forum
May 14, 2024, 04:43:14 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 [3] 4 »  All
  Print  
Author Topic: Potentially faster method for mining on the CPU  (Read 14410 times)
botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 17, 2013, 09:31:37 AM
 #41

Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0


1715661794
Hero Member
*
Offline Offline

Posts: 1715661794

View Profile Personal Message (Offline)

Ignore
1715661794
Reply with quote  #2

1715661794
Report to moderator
1715661794
Hero Member
*
Offline Offline

Posts: 1715661794

View Profile Personal Message (Offline)

Ignore
1715661794
Reply with quote  #2

1715661794
Report to moderator
1715661794
Hero Member
*
Offline Offline

Posts: 1715661794

View Profile Personal Message (Offline)

Ignore
1715661794
Reply with quote  #2

1715661794
Report to moderator
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
ZirconiumX
Full Member
***
Offline Offline

Activity: 286
Merit: 100



View Profile
August 17, 2013, 01:16:37 PM
Last edit: August 18, 2013, 08:07:31 AM by ZirconiumX
 #42

Hehe.  I like encouraging people to explore this.  Good learning "exorcize" (pun).  32 bit addition logic for botnet: http://jlcooke.ca/btc/sha256_logred.html

Just do that about 768 times and reduce!  Tongue

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0




Replacing ADD with MUL makes no difference. It does indeed, since I didn't see that a1b1 was a multiplication, not a value.

Matthew:out
faiza1990
Sr. Member
****
Offline Offline

Activity: 420
Merit: 250


★☆★777Coin★☆★


View Profile
August 17, 2013, 01:27:13 PM
 #43

Minning could return to normal people, minning Undecided

botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 19, 2013, 04:03:09 PM
 #44

Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley
smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
August 19, 2013, 04:20:42 PM
 #45

can any bits in the block header can be treated as constant
hashMerkleRoot Grin

Of course I gave you bad advice. Good one is way out of your price range.
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 19, 2013, 04:28:54 PM
 #46

your c1...c31 bits can be reduced by one instruction Smiley

c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0

Yes, but at the expense of 1 extra depth in the circuit.  The form on the site is for shortest path - ideal for implementation.


c1 == a1b1 + a1c0 + b1c0
   == a1(b1+c0) + b1c0
or
   == a1b1 + c0(a1 + b1)
or
   == b1(a1 + c0) + a1c0
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 19, 2013, 04:46:37 PM
 #47

Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

The last version change IIRC was at the last 50%+1 blockchain split earlier in the year.
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 20, 2013, 09:12:13 PM
 #48

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

http://jheusser.github.io/2013/02/03/satcoin.html

SAT solvers tried, didn't work.  But I'd still like to see a set of 256 monster logic formula with 2048 bit inputs for the SHA256 round function.  Smiley
smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
August 21, 2013, 01:45:21 AM
 #49

...in such a naïve way Smiley

Of course I gave you bad advice. Good one is way out of your price range.
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 22, 2013, 06:57:07 PM
 #50

Thoughts? Make sure you have 32GB of RAM installed   Grin

Do you guys know how often 'Version' changes in the block header?  Or more specifically, can any bits in the block header can be treated as constant for some extended period of time, for the purposes of mining?

I am running into some memory concerns, mostly due to all the parallelization.  The fewer variables in these equations the better Smiley

Any updates?

botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 23, 2013, 04:51:08 PM
 #51

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley

jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 23, 2013, 05:06:21 PM
 #52

Any updates?

Memory is the big issue right now, trying to run simulations to determine how much ram will be needed.
I did runs with 4, 6, 7, and 8 symbolic input bits to see the peak working set of the mathematica kernel.  I picked bits from nonce to be symbolic, and the fixed bits came from a previously solved blockheader.

4 bits: 79,312 K
6 bits: 84,720 K
7 bits: 89,372 K
8 bits: 850,800 K

What's odd is it looked very linear until 8 bits were reached;  It's running now with 9 bits across 4 cores, I'll have those results in about 30 hours.

I also computed the equation length after PolynomialReduce at each node in the tree;  I trust all the data except for '7 bits' where I maybe have recorded it wrong:

4 bits: median: 25, max: 62
6 bits: median: 170, max:222
7 bits: median: 379, max:495
8 bits: median: 919, max:1218

There the growth does appear linear.

For 8 bits I also computed the number of nodes that contained two leaf node children.  This is essentially the number of equations that can be simplified in parallel (before any potential adjustments to the tree.) Is there a word for this type of node?

median: 17, max:96

I can post the output equations if anyone is interested.

Lastly, I would be open to try any other symbolic computation engines that support modulus->2 to compare speed and memory consumption.   Any suggestions? Smiley



Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.
botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 26, 2013, 06:40:30 AM
 #53

Cool.  How are you generating the formulas?  Iterating through each input possibility?

The 8th bit "explosion" is probably due to triggering the use of a 2nd path of 32bit addition.

The formulas for 32bit add (64 in, 32 out) get very large very quickly (http://jlcooke.ca/btc/sha256_logred_formulas.html) due to carry bits.  Result r_31 depends on a_31 .. a_0 and b_31 .. b_0.

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 26, 2013, 12:24:24 PM
 #54

To generate formulas, I do an in-order walk of the tree, and simplify at each level going up... although I'm not sure I fully understand your quesiton.

Quick update, I've been trying out Maple as a potential replacement for Mathematica.  BooleanSimplify chokes on even 8 bits, but doing simplify( (arithmetic expression mod 2, {a*a-a, b*b-b, c*c-c, ....}) does way better.   Maple seems to evaluate these expressions an order of magnitude faster than mathematica, although the resulting equations aren't as compact.  It also consumed a modest 300 megs of memory for 8 bits.  Still learning the maple syntax so things might improve.


OK I think I see now.

I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.
 
  • !(AB) = !A + !B,
  • A = A + AB, and
  • A + B = A + !AB

It takes about 4 hours to complete a single 32-bit addition.  The API has all the basic operations you want (AND, OR, XOR, NOT, etc).
botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 27, 2013, 03:48:04 AM
 #55

I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way Smiley I will start integrating with my code.   Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling -  though at first glance of your API I don't think marshaling will have too bad of a perf impact.   I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 27, 2013, 02:16:16 PM
Last edit: August 27, 2013, 05:36:29 PM by jlcooke
 #56

I'll PM you a link to my C code for logic reduction in a few mins.  You can compile C right?  And know how to code?  It should be quite a bit faster than Mathmatica or Maple.

It uses de Morgan's rules and some basic reduction rules to optimize the circuit.

This is excellent by the way Smiley I will start integrating with my code.   Right now my stuff is all C#, so I will have to compile your code into a lib and do marshaling -  though at first glance of your API I don't think marshaling will have too bad of a perf impact.   I'll try with c=a+b to test the runtime and validate the results match yours.

Do you know how your de Morgan approach compares to this Quine McCluskey one I keep reading about? http://en.wikipedia.org/wiki/Quine%E2%80%93McCluskey_algorithm


Quine-McCluskey is the analytical twin of Karnaugh Maps (http://en.wikipedia.org/wiki/Karnaugh_map)

Basically, QMC/KMap takes I/O and creates formulas.  My lib lets you build the formulas directly and simplify them.  You *could* create the formulas based on I/Os I guess.  But since we know the SHA-256 formulas you could skip that step.

In fact you really want to, otherwise you will never have the correct formula unless you iterate all possible inputs for the SHA256 compression function (2^256 + 2^(W-array inputs) ).

Even 2^32 addition is going to be massive.  The nth (0..31) output bit of 32bit-add will be 2^(n+1) terms long.  You're only hope is multiple 32bit-adds start simplifying.

Download logred_v3.zip - there was a bug in performing (A+BCD+EFG+...)(HI+!JK+...)(...)... operations.

Cheers

Edit:
Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add.  Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()
botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 29, 2013, 04:10:15 PM
 #57

Playing a bit here, turns out the A=A+AB and A+B=A+!AB reductions have no effect in 32bit add.  Guess that's what makes it a nice 32bit bijection.

Download logred_v4.zip which will be faster for your application.

You can disable those reduction algorithms by commenting out logred_set_reduce_2_A_AB() and logred_set_reduce_4_A_nAB_n() inside logred_set_reduce()

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

3) Haven't debugged enough, but part 2 could be related to this?:
   logred_set_scanf(a, "!a00!b00"); = works
    logred_set_scanf(a, "!a00"); = crash

4) Actually, is there support for numeric literals (0, 1)?  c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

4) x64 only?  This is giving me heap problems:
#define VAR_T           size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS      256
#define VAR_A_LEN       4L   

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems? Smiley)
jlcooke
Newbie
*
Offline Offline

Activity: 46
Merit: 0



View Profile
August 29, 2013, 04:27:22 PM
 #58

Note: working on Quine-McClusky now.  Smiley  Always wanted a high speed lib for logic reductions ...

Been trying this out, so many questions...

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00

Nope.  A or A == A.   A or !A == 1  and A!A = 0

2) Similarly I would expect (a + !a) = 1
   logred_set_scanf(a, "a00");
   logred_set_not(b, a);
   logred_set_xor(c, a, b);
= crash

a = "a00"
b = "!a00"
c = "a00" ^ "!a00"
  = a00!(!a00) + !(a00)!a00
  = a00 + !a00
  = 1

Yup, it crashes.  Comments out "logred_set_reduce_4_A_nAB_n(src)" in "logred_set_reduce()" to fix this.

It doesn't resolve down to "1" yet.  soon.

3) Haven't debugged enough, but part 2 could be related to this?:
   logred_set_scanf(a, "!a00!b00"); = works
    logred_set_scanf(a, "!a00"); = crash

Version I'm working on doesn't crash on this. 

4) Actually, is there support for numeric literals (0, 1)?  c = a+b is purely symbolic, but during SHA256 you add in those K-Constants which can sometimes help reduce things further.

Humm, good point.  Will not be ready for this release. Soon.

4) x64 only?  This is giving me heap problems:
#define VAR_T           size_t
#define VAR_T_SIZE_BITS ( 8 * sizeof(VAR_T) )
#define VAR_A_BITS      256
#define VAR_A_LEN       4L   

(worked around setting VAR_A_BITS = 128... maybe this is the source of all my problems? Smiley)

The code should be x32/x64 agnostic.  logred_v6.zip is a snapshot of what I have so far.  Ignore the logred_set_reduce_5_Quine_McCluskey() functions.

The Makefile now lets you define a bunch of flags like VAR_A_BITS.  But it should work for values up to 960.  It may work beyond that with some tweeks to the variable string lenghts ("a960" and "a1024" are different lengths).

Cheers.
botnet (OP)
Newbie
*
Offline Offline

Activity: 35
Merit: 0


View Profile
August 29, 2013, 09:40:11 PM
 #59

1)    logred_set_scanf(a, "a00 + a00");
   logred_set_reduce(a);
   logred_set_printf(a);

Shouldn't that always produce 0?   for me it outputs a00
Nope.  A or A == A.   A or !A == 1  and A!A = 0

Ahh, there was misunderstanding about the + operator.
I've been thinking about arithmetic operations mod 2, where "(a+b) mod 2" is equivalent to XOR
0+0 = 0, 0 mod 2 = 0
0+1 = 1, 1 mod 2 = 1
1+0 = 1, 1 mod 2 = 1
1+1 = 2, 2 mod 2 = 0
= XOR
Nancarrow
Hero Member
*****
Offline Offline

Activity: 492
Merit: 500


View Profile
August 30, 2013, 12:22:46 PM
 #60

Best of luck with this, botnet, but I feel it's a wild goose chase. I first heard about bitcoins in autumn 2011 and immediately set about trying to 'simplify' sha256(sha256(x)) in the same way as you, by trying to find boolean expressions for the first bits of the output in terms of the 640 bits of input. As you've found out, it's the seemingly simple addition of two 32-bit numbers that's the big problem. It simply doesn't lend itself to being simplfied, as it's recursive. As you crawl along the bits from right to left the expression balloons exponentially. All the other operations in SHA-256, the logical connectives and the rotations/shifts are trivial by comparison.

You might want to look into the structure of the block header to see how the 640 bits change at different rates. Fr'instance I believe the version number (first 32 bits of the header, but in stupid-endian) has been 1 for most of bitcoin's history but may be 2 now or soon. That's 30 zero bits right there. Then you have the first (or, goddammit, probably last) 32 bits of the previous hash guaranteed to be zero. The difficulty changes every two weeks, the merkle hash every block, and so on.

But I still think you haven't quite grasped the magnitude of the P/NP problem yet! Then again, if the impossible ever gets done, it'll be as always, by someone who didn't know it was impossible.

If I've said anything amusing and/or informative and you're feeling generous:
1GNJq39NYtf7cn2QFZZuP5vmC1mTs63rEW
Pages: « 1 2 [3] 4 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!