Bitcoin Forum
June 27, 2024, 04:53:53 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 [2] 3 »  All
  Print  
Author Topic: [XPM] So, why is porting primecoin mining to the GPU so difficult?  (Read 18678 times)
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 22, 2013, 01:15:28 PM
 #21

So, to get my numbers straight (bottom line still is that porting Fermat's test is though):

A primecoin miner has two stages, filtering candidates and testing numbers for primality by using Fermat tests (with base 2). Latest default setting are: On testnet you spend 30% sieving and 70% testing candidates, on mainnet 70% sieving and 30% testing. On mainnet you will filter out 97% of all candidates you started with by sieving.

In order to produce a meaningful speedup, you will have to port both stages to the GPU or make one of them very small. Lets say you keep the original settings, port the sieve to GPU, interleave Fermat tests on the CPU. In this scenario your speedup is at most ~3x, because you still have to do that 30% candidate testing. Ok, you can sieve much faster now, so why not sieve with 10x more primes? You already filter 97%, and each new prime will not kill more than 1/p more candidates. The low-hanging fruits are the smaller primes and you have already used them. So, dimishing returns for the extra computations if these primes get bigger. At some point, even if you sieve 100x faster it won't be worth the extra computations and just checking if that number is prime with a Fermat test makes more sense. (90% of all Fermat tests come back positive, that is, the start of that possible chain is definitely composite. Only for the remaining 10% you check if you hit a chain of some type.). Bottom line: to produce meaningful speedups with gpu mining, you will have to port both the sieve and Fermat tests to the GPU. Another argument for this is that transfer between CPU and GPU is costly and can quickly become a bottleneck.
cryptocoinsnews
Sr. Member
****
Offline Offline

Activity: 299
Merit: 250


View Profile WWW
August 22, 2013, 01:36:33 PM
 #22

http://www.cryptocoinsnews.com/2013/08/22/xpm-so-why-is-porting-primecoin-mining-to-the-gpu-so-difficult/

/David Parker, Director of CCN
Sy
Legendary
*
Offline Offline

Activity: 1484
Merit: 1003


Bounty Detective


View Profile
August 22, 2013, 01:46:10 PM
 #23

So once we are at it - what about fpgas?

BOUNTY DETECTIVE


















Powered by,
ilostcoins
Sr. Member
****
Offline Offline

Activity: 274
Merit: 250



View Profile
August 22, 2013, 01:52:13 PM
 #24

Interesting insight, thank you.

Does Nvidia or AMD have a better chance of boosting efficiency in primecoin mining? Is CUDA more flexible in some ways and would that help?

LTC: LSyqwk4YbhBRtkrUy8NRdKXFoUcgVpu8Qb   NVC: 4HtynfYVyRYo6yM8BTAqyNYwqiucfoPqFW   TAG id: 4313
CMC: CAHrzqveVm9UxGm7PZtT4uj6su4suxKzZv   YAC: Y9m5S7M24sdkjdwxnA9GZpPez6k6EqUjUt
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 22, 2013, 02:36:53 PM
 #25

So once we are at it - what about fpgas?

I don't know much about FPGA's, but in theory, if you build a simple 1024-bit processor that can multiply,add,subtract and shift, then you could also do Fermat tests. 1024-bit would be enough for current prime records on the primecoin chain.

RSA also heavily relies on modular expansion and a basic FPGA implementation would start with 1024bit, so if you search for research papers on the topic you find papers like that one: http://irep.iium.edu.my/29540/1/FPGA_Implementation_of_RSA_Encryption.pdf

No idea how fast that would be though.

pyromaniac
Hero Member
*****
Offline Offline

Activity: 639
Merit: 500



View Profile
August 22, 2013, 03:28:42 PM
 #26

I am happy it is proving GPU resistant  Cheesy

Just like LTC is currently ASIC resistant :-)
Yep. Like it was GPU resistant just one and half year ago.  Roll Eyes

smolen
Hero Member
*****
Offline Offline

Activity: 524
Merit: 500


View Profile
August 22, 2013, 03:34:49 PM
 #27

So once we are at it - what about fpgas?

I don't know much about FPGA's, but in theory, if you build a simple 1024-bit processor that can multiply,add,subtract and shift, then you could also do Fermat tests. 1024-bit would be enough for current prime records on the primecoin chain.

RSA also heavily relies on modular expansion and a basic FPGA implementation would start with 1024bit, so if you search for research papers on the topic you find papers like that one: http://irep.iium.edu.my/29540/1/FPGA_Implementation_of_RSA_Encryption.pdf

No idea how fast that would be though.


http://www-mtl.mit.edu/researchgroups/icsystems/pubs/conferences/2001/goodman_isscc01_slides.pdf
Primitives performance points (@50 MHz):
Modexp (IF/DL)
 GFexp (DL)
 ECmult (ECDL)
8.2/5.2 ms @ 512b
 8.0ms @ 512b
 7.0ms @ 176b
32.1/17 ms @ 1024b
 31.7ms @ 1024b
 14.5ms @ 256b

Of course I gave you bad advice. Good one is way out of your price range.
Schleicher
Hero Member
*****
Offline Offline

Activity: 675
Merit: 513



View Profile
August 22, 2013, 07:49:02 PM
 #28

Bottom line: to produce meaningful speedups with gpu mining, you will have to port both the sieve and Fermat tests to the GPU. Another argument for this is that transfer between CPU and GPU is costly and can quickly become a bottleneck.
I imagine that it can work like this:
Run the sieve on the GPU until you have lots of candidates.
Then run a single fermat test on the GPU for all of these candidates.
All candidates passing the first test will be tested for chain length 2.
And so on.
I'm not sure how much memory we will need on the GPU if we store all the candidates there.

Koooooj
Member
**
Offline Offline

Activity: 75
Merit: 10



View Profile
August 22, 2013, 10:20:27 PM
 #29

Could you explain what primecoin algorithm actually does?

Why doesn't primecoin for example just read some known prime numbers from some database, compare them if there is any Cunningham chain candidate and ... offer result?

Does this algorithm have to find all these prime numbers by brute force first, and then find the chains..?

If you could describe this in layman's terms, I'd be very happy.

The algorithm is kind of complicated to explain in lay terms.  First, I'd like to point out that there is no database of primes of this size that would be even kind of useful.  The numbers used are on the order of 10100.  In that range about 1 in 180 numbers is prime.  That means that in order to store all of the prime numbers between 10100 and 10101 you would more memory than there could ever exist in the universe--more bits than there are atoms.  No such database can ever exist within the current understanding of physics.

So, the algorithm is somewhat of a brute force.  It has some elegance to it, though.  The chains have a very specific form, which gives a lot of opportunity for speedups.  For example, if your chain starts at N then 2N-1 is in the chain (for one of the chain types).  It would be inefficient to check the primality of 2N-1 if N is already shown to be composite--the chain is already proven to be invalid.  Many optimizations take this general form.

The algorithm itself is broken into two main parts: sieving and primality tests.  Sieving is a really fast way to show that lots of numbers are composite, but it is impractical to show that any number is prime by a sieve alone.  The most famous (and oldest) sieve is the Sieve of Eratosthenes, which has a lovely Gif and explanation on its wikipedia page.  Sieves work by eliminating numbers that have small factors;  it is easy and computationally cheap to carry out a single division to determine if a big number, N, is divisible by a small number, k, and about 1 in k divisions will show that N is divisible by k and that N is therefore composite.  This can be taken a step farther by noting that if N is divisible by k then N + a*k is also divisible by k, allowing many candidates to be eliminated by a single division and a few multiplications.  Thus, the mining algorithm first generates a set of candidate numbers based on a hash function and the data in the block, then runs that set of numbers through a sieve.

The next step is to actually test the remaining numbers after the sieve takes out most of them.  This is accomplished by the Fermat (Probable) Primality Test, which holds that if aP-1 ≡ 1 (mod P) for all a and prime P.  That is to say, you can take any number (a) and raise it to the power of your candidate (P) minus one, then divide the result by the candidate; if you get anything other than 1 as the remainder of that division then P is composite, otherwise P is probably prime.  For PrimeCoin a is always taken to be 2, and "probable" is taken to be good enough--the odds that 2 is a "Fermat Liar" for P are much less than the odds that P is prime.

Thus, the general flow is (Protocol to build a block and find the appropriate hash value to start from) -> (form a set of numbers that could be valid as part of prime chains for this block) -> (run the numbers through a sieve) -> (Eliminate numbers that passed the sieve but aren't parts of chains of length 9 or longer (current integer difficulty)) -> (for each chain of possibly prime numbers remaining, execute a Fermat Primality test until you either find a complete chain or find a composite number that proves that no chain long enough will be found there).
ilostcoins
Sr. Member
****
Offline Offline

Activity: 274
Merit: 250



View Profile
August 23, 2013, 12:02:48 AM
 #30


The algorithm is kind of complicated to explain in lay terms.  First, I'd like to point out that there is no database of primes of this size that would be even kind of useful.  The numbers used are on the order of 10100.  In that range about 1 in 180 numbers is prime.  That means that in order to store all of the prime numbers between 10100 and 10101 you would more memory than there could ever exist in the universe--more bits than there are atoms.  No such database can ever exist within the current understanding of physics.

So, the algorithm is somewhat of a brute force.  It has some elegance to it, though.  The chains have a very specific form, which gives a lot of opportunity for speedups.  For example, if your chain starts at N then 2N-1 is in the chain (for one of the chain types).  It would be inefficient to check the primality of 2N-1 if N is already shown to be composite--the chain is already proven to be invalid.  Many optimizations take this general form.

The algorithm itself is broken into two main parts: sieving and primality tests.  Sieving is a really fast way to show that lots of numbers are composite, but it is impractical to show that any number is prime by a sieve alone.  The most famous (and oldest) sieve is the Sieve of Eratosthenes, which has a lovely Gif and explanation on its wikipedia page.  Sieves work by eliminating numbers that have small factors;  it is easy and computationally cheap to carry out a single division to determine if a big number, N, is divisible by a small number, k, and about 1 in k divisions will show that N is divisible by k and that N is therefore composite.  This can be taken a step farther by noting that if N is divisible by k then N + a*k is also divisible by k, allowing many candidates to be eliminated by a single division and a few multiplications.  Thus, the mining algorithm first generates a set of candidate numbers based on a hash function and the data in the block, then runs that set of numbers through a sieve.

The next step is to actually test the remaining numbers after the sieve takes out most of them.  This is accomplished by the Fermat (Probable) Primality Test, which holds that if aP-1 ≡ 1 (mod P) for all a and prime P.  That is to say, you can take any number (a) and raise it to the power of your candidate (P) minus one, then divide the result by the candidate; if you get anything other than 1 as the remainder of that division then P is composite, otherwise P is probably prime.  For PrimeCoin a is always taken to be 2, and "probable" is taken to be good enough--the odds that 2 is a "Fermat Liar" for P are much less than the odds that P is prime.

Thus, the general flow is (Protocol to build a block and find the appropriate hash value to start from) -> (form a set of numbers that could be valid as part of prime chains for this block) -> (run the numbers through a sieve) -> (Eliminate numbers that passed the sieve but aren't parts of chains of length 9 or longer (current integer difficulty)) -> (for each chain of possibly prime numbers remaining, execute a Fermat Primality test until you either find a complete chain or find a composite number that proves that no chain long enough will be found there).

You explained very well. Thank you.

LTC: LSyqwk4YbhBRtkrUy8NRdKXFoUcgVpu8Qb   NVC: 4HtynfYVyRYo6yM8BTAqyNYwqiucfoPqFW   TAG id: 4313
CMC: CAHrzqveVm9UxGm7PZtT4uj6su4suxKzZv   YAC: Y9m5S7M24sdkjdwxnA9GZpPez6k6EqUjUt
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 23, 2013, 12:33:13 AM
 #31

Bottom line: to produce meaningful speedups with gpu mining, you will have to port both the sieve and Fermat tests to the GPU. Another argument for this is that transfer between CPU and GPU is costly and can quickly become a bottleneck.
I imagine that it can work like this:
Run the sieve on the GPU until you have lots of candidates.
Then run a single fermat test on the GPU for all of these candidates.
All candidates passing the first test will be tested for chain length 2.
And so on.
I'm not sure how much memory we will need on the GPU if we store all the candidates there.

Problem being, that Fermat tests are not a good fit for current GPUs, because these tests have a strongly data-dependent execution flow. Memory isn't a problem, even for many many candidates.
runlinux
Hero Member
*****
Offline Offline

Activity: 566
Merit: 500



View Profile WWW
August 23, 2013, 12:36:51 AM
 #32

Why couldn't each core / thread on the GPU run the whole algorithm from start to finish? yeah, each one would be slow, but as a whole, everything wins. It gets around the threading issue that I know plagues most of algorithm being ported. Then again, I could be totally out in left field.

tacotime
Legendary
*
Offline Offline

Activity: 1484
Merit: 1005



View Profile
August 23, 2013, 09:28:39 PM
 #33

There is already a fast implementation of the Lucas–Lehmer primality test on AMD cards, I'd guess one for the Fermat primality test would be a matter of time.

http://mersenneforum.org/showthread.php?t=12576&page=176

Mfakto implements sieving as well: http://www.mersennewiki.org/index.php/Mfakto

Code:
XMR: 44GBHzv6ZyQdJkjqZje6KLZ3xSyN1hBSFAnLP6EAqJtCRVzMzZmeXTC2AHKDS9aEDTRKmo6a6o9r9j86pYfhCWDkKjbtcns
Sharky444
Hero Member
*****
Offline Offline

Activity: 724
Merit: 500


View Profile
August 24, 2013, 12:44:53 AM
 #34

So, to get my numbers straight (bottom line still is that porting Fermat's test is though):

A primecoin miner has two stages, filtering candidates and testing numbers for primality by using Fermat tests (with base 2).

It's not sufficient to do the test with base 2 only. You get just about 99% prim probability with that.

Radix - just imagine
ilostcoins
Sr. Member
****
Offline Offline

Activity: 274
Merit: 250



View Profile
August 24, 2013, 12:49:37 AM
 #35

So, to get my numbers straight (bottom line still is that porting Fermat's test is though):

A primecoin miner has two stages, filtering candidates and testing numbers for primality by using Fermat tests (with base 2).

It's not sufficient to do the test with base 2 only. You get just about 99% prim probability with that.

I think that's what this coin does? I mean some chains accepted in this coin are composite?

LTC: LSyqwk4YbhBRtkrUy8NRdKXFoUcgVpu8Qb   NVC: 4HtynfYVyRYo6yM8BTAqyNYwqiucfoPqFW   TAG id: 4313
CMC: CAHrzqveVm9UxGm7PZtT4uj6su4suxKzZv   YAC: Y9m5S7M24sdkjdwxnA9GZpPez6k6EqUjUt
Sunny King
Legendary
*
Offline Offline

Activity: 1205
Merit: 1010



View Profile WWW
August 24, 2013, 01:16:07 AM
 #36

Base 2 Fermat test is strong enough. It's known that pseudoprimes are extremely rare compared to primes. The term probable prime is misleading to layman because they are actually almost certain primes.

In fact I would be very surprised if you can find a pseudoprime in the block chain.
bcp19
Hero Member
*****
Offline Offline

Activity: 532
Merit: 500



View Profile
August 24, 2013, 02:12:03 AM
 #37

There is already a fast implementation of the Lucas–Lehmer primality test on AMD cards, I'd guess one for the Fermat primality test would be a matter of time.

http://mersenneforum.org/showthread.php?t=12576&page=176

Mfakto implements sieving as well: http://www.mersennewiki.org/index.php/Mfakto
The Lucas-Lehmer test currently available on GPU is very simple compared to the fermat testing needed for Primecoin.  A mersenne number is stated as 2p-1.  In order for a mersenne number to be prime, P must also be prime.  As stated on http://Mersenne.org/various/math.php, The Lucas-Lehmer primality test is remarkably simple. It states that for P > 2, 2P-1 is prime if and only if Sp-2 is zero in this sequence: S0 = 4, SN = (SN-12 - 2) mod (2P-1).  Trust me when I say the explanation is a lot harder than the actual math involved.  To prove 27-1 is prime take 6 steps, which is 1 less that the exponent of 7.  The current record Mersenne Prime, 257,885,161-1 took 57,885,160 calculations to prove as prime, requiring 15 days on my overclocked i5-2500k (4.3GHz) compared to ~4 days on an nVidia GTX 480 or ~3.5 days on a GTX 580.

Trial factoring is even easier, as all factors of Mersenne numbers can be expressed as 2 * k * P - 1.  Since 2 and P are givens, you simply sieve out prime k's from whatever bit depth you are working at and take 2kp-1 mod (2^P-1).  The fast fournier transforms available make GPUs operates at speeds in excess of 100 times a CPU.

The more complicate P-1 test, which has 'somewhat' successfully been ported to GPU can barely get 2-3x the speed of a good i5/i7.

I do not suffer fools gladly... "Captain!  We're surrounded!"
I embrace my inner Kool-Aid.
Sharky444
Hero Member
*****
Offline Offline

Activity: 724
Merit: 500


View Profile
August 24, 2013, 02:38:00 AM
 #38

Base 2 Fermat test is strong enough. It's known that pseudoprimes are extremely rare compared to primes. The term probable prime is misleading to layman because they are actually almost certain primes.

In fact I would be very surprised if you can find a pseudoprime in the block chain.
Pseudoprimes are not that rare! Here is an example:

341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911

Radix - just imagine
Sunny King
Legendary
*
Offline Offline

Activity: 1205
Merit: 1010



View Profile WWW
August 24, 2013, 02:44:48 AM
 #39

Pseudoprimes are not that rare! Here is an example:

341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911

Asymptotically they are extremely rare compared to primes. Please refer to my design paper.

Primecoin is dealing with numbers over 256 bits long, so you cannot compare the probability of the really small numbers.
bcp19
Hero Member
*****
Offline Offline

Activity: 532
Merit: 500



View Profile
August 24, 2013, 02:47:47 AM
 #40

Base 2 Fermat test is strong enough. It's known that pseudoprimes are extremely rare compared to primes. The term probable prime is misleading to layman because they are actually almost certain primes.

In fact I would be very surprised if you can find a pseudoprime in the block chain.
Pseudoprimes are not that rare! Here is an example:

341, 561, 645, 1105, 1387, 1729, 1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957, 8321, 8481, 8911
22 Pseudoprimes by 8911 as compared to 1384 prime numbers.  Definitely very uncommon.  Plus, similar to Mersenne Primes, while there are a lot of psuedoprimes at the lower range, they become a lot more scarce:  22 mersenne primes had an exponent less than 10,000.  #48, found Jan this year, and has an exponent of 57,885,161

I do not suffer fools gladly... "Captain!  We're surrounded!"
I embrace my inner Kool-Aid.
Pages: « 1 [2] 3 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!