Bitcoin Forum

Alternate cryptocurrencies => Altcoin Discussion => Topic started by: Vorksholk on July 17, 2013, 12:21:06 AM



Title: [Removed Bounty] PrimeCoin CUDA miner
Post by: Vorksholk on July 17, 2013, 12:21:06 AM
As a result of extensive development of OpenCL miners for Primecoin with impressive performance, this bounty is now removed. GPU mining of Primecoin is going to soon be saturated. If you were currently developing something for this bounty (unlikely as all development seems to be Opencl and it has been months) PM me.


Note: mtrlt is making a OpenCL-miner. That doesn't qualify for this bounty.

New rule: CUDA miner has to somewhat-significantly outperform mtrlt miner on equivalently-priced hardware (aka CUDA miner that gets 800 coins/day on a $400 NVidia GPU on testnet will get this bounty if mtrlt's miner will get 500 coins/day on a $400 AMD GPU).

The first person who makes a CUDA miner that outperforms a CPU of equivalent value by more than 2x ($330 NVidia GPU, like a 670, should give more than 2x the blocks of an i7-3770 on testnet over the same period of time) will get 6 Bitcoins! If anyone else wants to add to the bounty, just post a message with your 'pledge'.  

Requirements: compiled windows binary, source code provided.
Optional: compiled linux binaries.

Current bounty chip-ins:
Vorksholk: 6BTC

Alternately, if someone wants to dig through the primecoin code and give me a nice concise summary of exactly how the mining process works (purpose that the sieve serves, relation of block hash to potential primes, process of getting data from primecoind, process of submitting data to primecoind) I will either write my own, or attempt to hire someone with the USD equivalent.

If you provide a great explanation of all of the following (and are the first to do so), I will give you one BTC:
-What purpose the sieve serves
-What relation the block hash has to the potential block-solving primes
-Process of obtaining all mining data from primecoind
-process of submitting prime chain to mining data

Also, at heart, I love NVidia, I'm a NVidia fanboy. :)


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Eli0t on July 17, 2013, 12:23:19 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 12:27:21 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: logicsol on July 17, 2013, 12:37:14 AM
Why not Opencl?


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Palmdetroit on July 17, 2013, 12:39:19 AM
Why not Opencl?

Imagine if we finally had a Nvidia coin, or at least one worthwhile to mine on nvidia..... would bring tons new people into the cyrptoworld.. good for all IMO



Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: logicsol on July 17, 2013, 12:42:05 AM
Very true, I had figured something like that was the reasoning. It just struck me that most in this field(mining clients) would be more familiar with Opencl than CUDA.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Eli0t on July 17, 2013, 12:49:16 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?
the sieve works by getting increasing more likely 'primes' by altering a value to be more likely prime, and the table is used as seeds for that 'winnowing' process.
You want to have a big table because the more you have, the more likely one of them will work for a given block.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: FiiNALiZE on July 17, 2013, 12:51:22 AM
Not sure if this answers your 2nd question, but:

"Block hash, the value that is embedded in the child block, is derived from hashing the
header together with the proof-of-work certificate. This not only prevents the proof-of-work certificate from being tampered with, but also defeats attempt at generating a single
proof-of-work certificate usable on multiple blocks on the block chain, since the block
header hash of a descendant block then depends on the certificate itself. Note that, if an
attacker generates a different proof-of-work certificate for an existing block, the block
would then have a different block hash even though the block content remains the same
other than the certificate, and would be accepted to the block chain as a sibling block to
the existing block."

Taken from the design paper.




Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: razorfishsl on July 17, 2013, 01:01:45 AM
Or you could just go onto the research site and pull down the data for and including the first 17 million  digit primes....

Then use that as a 'feeder' mechanism....


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: AgentME on July 17, 2013, 01:05:39 AM
I thought AMD cards were better for Bitcoin mining because they were better at integer operations, and Nvidia was mainly good at floating point. Primecoin mining doesn't involve many floating point operations as far as I know, so targeting Nvidia specifically could be a waste of time.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: gateway on July 17, 2013, 01:11:11 AM
Prob everyone has already looked around and found these but just encase

http://www.macs-site.net/CudaPrimes.htm (http://www.macs-site.net/CudaPrimes.htm) CUDA Primes
https://github.com/MNorthwind/OpenCL-Prime-Numbers-Generator (https://github.com/MNorthwind/OpenCL-Prime-Numbers-Generator) OpenCL

Of course this is just some basic stuff, prob lots of stuff needs to be done, mapping to the memory, dealing with how cores work
on the work, esp since blocks are completed so fast right now.. etc..



Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: rethaw on July 17, 2013, 01:15:32 AM
I'm confused, why wouldn't you write it as OpenCL? This provides at least some level of portability.

EDIT: That being said, it's a generous offer!


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 01:20:40 AM
Why not Opencl?

From what I understand, CUDA is more apt to this type of work. As well, I think variety is a cool thing, and getting both groups of major video card fanboys in on cryptocoins couldn't hurt. :P

While I would love OCL (I have 10 7950s and 3 7970s), I think NVidia would be a nice change.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: rethaw on July 17, 2013, 01:22:53 AM

If you provide a great explanation of all of the following (and are the first to do so), I will give you one BTC:
-What purpose the sieve serves
-What relation the block hash has to the potential block-solving primes
-Process of obtaining all mining data from primecoind
-process of submitting prime chain to mining data

1) I think you got the purpose of the sieve.

2) None.

3) primecoind getblocktemplate

4) primecoind submitblock


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 01:32:19 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?
the sieve works by getting increasing more likely 'primes' by altering a value to be more likely prime, and the table is used as seeds for that 'winnowing' process.
You want to have a big table because the more you have, the more likely one of them will work for a given block.

This is really interesting! Mathematically, how does the sieve table work? Is one of the primes multiplied by a certain value and then changed in order to obtain a true prime? Also, what address do you want your 0.25 BTC to? :)


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: rabit on July 17, 2013, 01:33:04 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?
the sieve works by getting increasing more likely 'primes' by altering a value to be more likely prime, and the table is used as seeds for that 'winnowing' process.
You want to have a big table because the more you have, the more likely one of them will work for a given block.

Could you explain this a little bit deeper?
So the sieve produces the set P:={p prime | p <= sieve size} and i have the block hash h. Now i need to find an integer k such that h*k is the origin of a chain of length difficulty. So how is P explicitly used to find a candidate for such an integer k?


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 01:33:10 AM
I'm confused, why wouldn't you write it as OpenCL? This provides at least some level of portability.

EDIT: That being said, it's a generous offer!

Mainly because from what I understand CUDA is more fit to this purpose, as well as it serving as a nice way to get the NVidia players into cryptos as well. :D


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 01:39:21 AM

If you provide a great explanation of all of the following (and are the first to do so), I will give you one BTC:
-What purpose the sieve serves
-What relation the block hash has to the potential block-solving primes
-Process of obtaining all mining data from primecoind
-process of submitting prime chain to mining data

1) I think you got the purpose of the sieve.

2) None.

3) primecoind getblocktemplate

4) primecoind submitblock

Looks like you know what's up, so I'm gonna grill you for some more information, and if you can provide it you'll be the new owner of a shiny Bitcoin. :)

2.) I thought that the block hash was used as some factor of either start of chain +1 or start of chain -1 (or in a bi-twin chain, the average between the two first values?). Was this something else that served the purpose of not allowing the same prime chain to be used multiple times?

3.) "getblocktemplate" provides this, which parts are relevant, and how are they relevant, to mining?
{
    "version" : 2,
    "previousblockhash" : "7072f0272b3e38536e072c94e340b7c7824a71989f6e9f3112ce30d6be5ae525",
    "transactions" : [
    ],
    "coinbaseaux" : {
        "flags" : "062f503253482f"
    },
    "coinbasevalue" : 1249000000,
    "target" : "0000000000000000000000000000000000000000000000007118620000000000",
    "mintime" : 1374023934,
    "mutable" : [
        "time",
        "transactions",
        "prevblock"
    ],
    "noncerange" : "00000000ffffffff",
    "sigoplimit" : 20000,
    "sizelimit" : 1000000,
    "curtime" : 1374024884,
    "bits" : "08f11862",
    "height" : 55422
}


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Eli0t on July 17, 2013, 02:10:59 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?
the sieve works by getting increasing more likely 'primes' by altering a value to be more likely prime, and the table is used as seeds for that 'winnowing' process.
You want to have a big table because the more you have, the more likely one of them will work for a given block.

This is really interesting! Mathematically, how does the sieve table work? Is one of the primes multiplied by a certain value and then changed in order to obtain a true prime? Also, what address do you want your 0.25 BTC to? :)
the prime chain is linked to the block header hash by requiring that its origin be divisible by the block header hash.
pk/r is used to measure the difficulty of the chain. r is the Fermat test remainder of the next number in chain pk
d = k + (pk- r)/pk  

k is the prime chain length. d = difficulty if pk passes probable primality tests then it should be considered as a chain of higher integral length.

my BTC addy is 13YGmE2CCAxWpAhqQSm5NVz9pU7Q4B22jm
thanks :)


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 03:00:50 AM
-What purpose the sieve serves

In mathematics, the sieve of Eratosthenes, one of a number of prime number sieves, is a simple, ancient algorithm for finding all prime numbers up to any given limit. It does so by iteratively marking as composite (i.e. not prime) the multiples of each prime, starting with the multiples of 2.

The multiples of a given prime are generated starting from that prime, as a sequence of numbers with the same difference, equal to that prime, between consecutive numbers. This is the sieve's key distinction from using trial division to sequentially test each candidate number for divisibility by each prime.

The sieve of Eratosthenes is one of the most efficient ways to find all of the smaller primes (below 10 million or so)

awesome bounty btw i hope someone gets it

Yeah, during my short digging I did find the sieve of Eratosthenes, though your explanation was better than what I found. My main question with the sieve is, since we are finding very huge primes, what purpose this 'table of primes' serves the finding of large prime chains?
the sieve works by getting increasing more likely 'primes' by altering a value to be more likely prime, and the table is used as seeds for that 'winnowing' process.
You want to have a big table because the more you have, the more likely one of them will work for a given block.

This is really interesting! Mathematically, how does the sieve table work? Is one of the primes multiplied by a certain value and then changed in order to obtain a true prime? Also, what address do you want your 0.25 BTC to? :)
the prime chain is linked to the block header hash by requiring that its origin be divisible by the block header hash.
pk/r is used to measure the difficulty of the chain. r is the Fermat test remainder of the next number in chain pk
d = k + (pk- r)/pk  

k is the prime chain length. d = difficulty if pk passes probable primality tests then it should be considered as a chain of higher integral length.

my BTC addy is 13YGmE2CCAxWpAhqQSm5NVz9pU7Q4B22jm
thanks :)

Stellar, sent your 0.25 BTC :) Cheers!!


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: rethaw on July 17, 2013, 03:43:21 AM
Look at CheckPrimeProofOfWork in prime.cpp.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Koooooj on July 17, 2013, 06:05:30 AM
I think I have a pretty good understanding of the algorithm from a mathematics standpoint.  I don't know CUDA (or OpenCL, which I would suggest--it runs on both NVidia and AMD with reasonably comparable performance) and I don't know the getwork protocol, but I can give some insight into the mathematics, at least.

The first thing to understand is that if the block hash is H then H, 2H, 3H, 4H, etc are all valid origins, and that H-1, 2H-1, 4H-1, 8H-1, etc. makes a valid Cunningham chain (if all of those numbers are prime and the chain is long enough; this is for a Cunningham chain of the first kind.  For the second kind replace the "-" with "+".  Most of the math here will assume Cunningham chains of the First kind, but it is identicle for chains of the Second kind.  If H ± 1, 2H ± 1, ... are all prime (i.e. a chain of the first and second) then the chain is a bi-twin chain).  That means that when you are eliminating composite origins you are also eliminating numbers that could appear later in the chain.  This will become useful later.

Another thing to notice is that H, 2H, 3H, 4H, etc. are very unlikely to be prime.  If you look at the probability that a random number on the order of 2^256 is prime then it is about 1 in 177; that makes the probability that 8 randomly chosen numbers are all primes about 1 in 1 quintillion--impractically low.  That probability is greatly improved by using a "primorial" number--the product of many small primes.  If you take pRimorial= R = H * 2 * 3 * 5 * 7 * .... * 47 (the first 15 prime numbers; quick tests on my own code suggest that this is about ideal, although some tweaking and testing is in order) then you increase the size of the number you are testing, but you also make the numbers much more likely to be prime.  Thus if you look at R-1, 2R -1, etc then you know that each of those numbers is not divisible by 2, 3, 5, ... , 47 because they are 1 less than (or greater than, if you are looking for Cunningham chains of the Second kind) a multiple of each of those numbers.  This means that the smallest factor NR ± 1 can have is 53, where N is any integer.

From here you execute the Sieve.  The standard Sieve of Eratosthenes is very straightforward, but it is designed to be executed starting at 1 and then continuing over the small, consecutive integers, such as is shown on the GIF on the Wikipedia page.  We are not interested in consecutive numbers, and they are far from small, so a change must be adopted.  This change requires knowledge of modular arithmetic.  To execute the sieve we first take a small prime number (starting with 53 if our primorial ended with 47) and take R (mod 53).  If (and only if) R ≡ 0 (mod 53) then R is divisible by 53, and all future R will be divisible by 53.  In that case, no NR-1 will be divisible by 53 and none should be eliminated.

In every other case, R ≡ r (mod 53), where 0 < r < 53.  If you have found that r = 1 then that shows that R-1 ≡ 0 (mod 53) and is therefore not prime (and cannot be a member of a Cunningham Chain of the first kind).  If r = 52 then R+1 ≡ 0 (mod 53) and is therefore not prime.  If r is any other number then you have to search.  Since r is relatively prime to 53 (i.e. gcd(r,53) = 1) it can be shown that N * r (mod 53) will cycle through every number less than 53.  Thanks to modular arithmetic, N * r ≡ N * R (mod 53), so you can search for the N for which N1 * r ≡ 1 (mod 53) and for which N2 * r ≡ 52 (mod 53).  You can eliminate N1R - 1 as a candidate for chains of the first kind, and you can eliminate N2R + 1 as a candidate for chains of the second kind.

The power of finding this N is that you then know that if N * r ≡ 1 (mod 53) then (53 + N) * r ≡ 1 (mod 53) since N * r is cyclic with a period of 53.  Thus, you can quickly go through, just like in the classic form of the sieve of Eratosthenes, and eliminate every 53rd N.  You would then repeat this process for subsequent prime numbers (59, 61, 67, ...), until sieving against another prime number is not worth it--it takes too long and removes too few numbers.

The next thing you can do to eliminate candidates is to look for clusters (you can actually do this while sieving if you want, but I have separated it since it is a separate topic).  Since the goal is to find chains of (currently) 8 primes it makes no sense to waste time performing a long primality test on a chain of numbers that have passed the sieve that is 7 or fewer long.  Thus, you can look through the sieve for chains in the form N-1, 2N-1, 4N-1, ... 128N-1 where all of those numbers have no small factors (which would have been found in the sieve).  When you find these, they are passed to the formal (probable) primality tests.

The main test used in this step is the Fermat Primality Test, based on Fermat's Little Theorem.  It states that aP-1 ≡ 1 (mod P) for all prime P.  (It does not state that if that identity holds true then P is prime, but in practice prime numbers are far more likely than "Fermat liars").  For the reference client "a" is chosen to be 2, since 1 is trivially a Fermat Liar for every composite P and since a larger "a" would take more time for no benefit.  To perform the operation it is best to use "fast modular exponentiation," which is shown in pseudocode in the Modular Exponentiation Wikipedia page.

The second test used is the Euler Lagrange Lifchitz test (a brief reading of the code suggests that it is not actually used, but it is implemented.  You can skip this paragraph and not lose much).  It states that if P is prime and P ≡ 1 (mod 4) then 2P + 1 is prime if and only if 2P + 1 ≡ 0 (mod 2P + 1).  There are other forms as well:  if P ≡ 3 (mod 4) then 2P + 1 is prime if and only if 2P - 1 ≡ 0 (mod 2P +1).  There are forms for 2P-1 as well; if P ≡ 1 (mod 4) then you use 2P-1 - 1 ≡ 0 (mod 2P - 1); otherwise it's 2P-1 + 1 ≡ 0 (mod 2P -1).  These are not much, if any, faster than just using the Fermat primality test--they all rely on a single modular exponentiation, and in all cases the base is 2, and both the exponent and the modulus are on the order of P.  Lifchitz mentions a way to look for Cunningham chains of the second kind "in cluster" by taking 2ni-1-1-1 ≡ 0 (mod n * n1 * n2 * ... * ni) where ni = 2ni-1-1.  This test only looks for one kind of chain, though, and requires that n ≡ 1 (mod 4).  Also, it is a much more expensive modular exponentiation since the modulus is the product of 8 or more large numbers.

When a prime chain is found that is at least as long as floor(difficulty) (e.g. at least 8 when the difficulty is 8.9) it is necessary to calculate the fractional difficulty.  I haven't verified this in the source code, but in the design paper he takes r = (2P-1 (mod P) ) as the Fermat remainder, then defines the fractional length as (P - r) / P.  P, in all of this, is the number that would have been next in the chain.  I'm not sure which half of a bi-twin chain has to be used (note that a bi-twin chain can have an odd length if the two Cunningham chains it is made up of differ in length by 1).  If the integer chain length plus the fractional length is greater than the difficulty, it is a valid share.

I hope this has been easy enough to follow, while still being informative.  If anyone has questions I'll try to answer them.  Note that my understanding is based partially on looking at the code but also largely on my own insights; I've likely missed something major, but there is a slim chance that I've offered an improvement on what's already being done simply from coming at it from a different persepctive.  If you feel this satisfies part of your bounty for money, I can receive donations at 1FbWVR3LpGoWp5rc4SJc3cHo42ALrYhcxC.  Not really doing this for the bounty, though.  I just want to be as much help as possible towards making GPU mining a reality, to help neutralize the botnet and massive VPS farm threats. 


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: meta.p02 on July 17, 2013, 07:55:16 AM
What currently is the operation that takes up the most time? Is it the testing of primality, or the building of the sieve, or something else altogether?


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 01:34:19 PM
I think I have a pretty good understanding of the algorithm from a mathematics standpoint.  I don't know CUDA (or OpenCL, which I would suggest--it runs on both NVidia and AMD with reasonably comparable performance) and I don't know the getwork protocol, but I can give some insight into the mathematics, at least.

The first thing to understand is that if the block hash is H then H, 2H, 3H, 4H, etc are all valid origins, and that H-1, 2H-1, 4H-1, 8H-1, etc. makes a valid Cunningham chain (if all of those numbers are prime and the chain is long enough; this is for a Cunningham chain of the first kind.  For the second kind replace the "-" with "+".  Most of the math here will assume Cunningham chains of the First kind, but it is identicle for chains of the Second kind.  If H ± 1, 2H ± 1, ... are all prime (i.e. a chain of the first and second) then the chain is a bi-twin chain).  That means that when you are eliminating composite origins you are also eliminating numbers that could appear later in the chain.  This will become useful later.

Another thing to notice is that H, 2H, 3H, 4H, etc. are very unlikely to be prime.  If you look at the probability that a random number on the order of 2^256 is prime then it is about 1 in 177; that makes the probability that 8 randomly chosen numbers are all primes about 1 in 1 quintillion--impractically low.  That probability is greatly improved by using a "primorial" number--the product of many small primes.  If you take pRimorial= R = H * 2 * 3 * 5 * 7 * .... * 47 (the first 15 prime numbers; quick tests on my own code suggest that this is about ideal, although some tweaking and testing is in order) then you increase the size of the number you are testing, but you also make the numbers much more likely to be prime.  Thus if you look at R-1, 2R -1, etc then you know that each of those numbers is not divisible by 2, 3, 5, ... , 47 because they are 1 less than (or greater than, if you are looking for Cunningham chains of the Second kind) a multiple of each of those numbers.  This means that the smallest factor NR ± 1 can have is 53, where N is any integer.

From here you execute the Sieve.  The standard Sieve of Eratosthenes is very straightforward, but it is designed to be executed starting at 1 and then continuing over the small, consecutive integers, such as is shown on the GIF on the Wikipedia page.  We are not interested in consecutive numbers, and they are far from small, so a change must be adopted.  This change requires knowledge of modular arithmetic.  To execute the sieve we first take a small prime number (starting with 53 if our primorial ended with 47) and take R (mod 53).  If (and only if) R ≡ 0 (mod 53) then R is divisible by 53, and all future R will be divisible by 53.  In that case, no NR-1 will be divisible by 53 and none should be eliminated.

In every other case, R ≡ r (mod 53), where 0 < r < 53.  If you have found that r = 1 then that shows that R-1 ≡ 0 (mod 53) and is therefore not prime (and cannot be a member of a Cunningham Chain of the first kind).  If r = 52 then R+1 ≡ 0 (mod 53) and is therefore not prime.  If r is any other number then you have to search.  Since r is relatively prime to 53 (i.e. gcd(r,53) = 1) it can be shown that N * r (mod 53) will cycle through every number less than 53.  Thanks to modular arithmetic, N * r ≡ N * R (mod 53), so you can search for the N for which N1 * r ≡ 1 (mod 53) and for which N2 * r ≡ 52 (mod 53).  You can eliminate N1R - 1 as a candidate for chains of the first kind, and you can eliminate N2R + 1 as a candidate for chains of the second kind.

The power of finding this N is that you then know that if N * r ≡ 1 (mod 53) then (53 + N) * r ≡ 1 (mod 53) since N * r is cyclic with a period of 53.  Thus, you can quickly go through, just like in the classic form of the sieve of Eratosthenes, and eliminate every 53rd N.  You would then repeat this process for subsequent prime numbers (59, 61, 67, ...), until sieving against another prime number is not worth it--it takes too long and removes too few numbers.

The next thing you can do to eliminate candidates is to look for clusters (you can actually do this while sieving if you want, but I have separated it since it is a separate topic).  Since the goal is to find chains of (currently) 8 primes it makes no sense to waste time performing a long primality test on a chain of numbers that have passed the sieve that is 7 or fewer long.  Thus, you can look through the sieve for chains in the form N-1, 2N-1, 4N-1, ... 128N-1 where all of those numbers have no small factors (which would have been found in the sieve).  When you find these, they are passed to the formal (probable) primality tests.

The main test used in this step is the Fermat Primality Test, based on Fermat's Little Theorem.  It states that aP-1 ≡ 1 (mod P) for all prime P.  (It does not state that if that identity holds true then P is prime, but in practice prime numbers are far more likely than "Fermat liars").  For the reference client "a" is chosen to be 2, since 1 is trivially a Fermat Liar for every composite P and since a larger "a" would take more time for no benefit.  To perform the operation it is best to use "fast modular exponentiation," which is shown in pseudocode in the Modular Exponentiation Wikipedia page.

The second test used is the Euler Lagrange Lifchitz test (a brief reading of the code suggests that it is not actually used, but it is implemented.  You can skip this paragraph and not lose much).  It states that if P is prime and P ≡ 1 (mod 4) then 2P + 1 is prime if and only if 2P + 1 ≡ 0 (mod 2P + 1).  There are other forms as well:  if P ≡ 3 (mod 4) then 2P + 1 is prime if and only if 2P - 1 ≡ 0 (mod 2P +1).  There are forms for 2P-1 as well; if P ≡ 1 (mod 4) then you use 2P-1 - 1 ≡ 0 (mod 2P - 1); otherwise it's 2P-1 + 1 ≡ 0 (mod 2P -1).  These are not much, if any, faster than just using the Fermat primality test--they all rely on a single modular exponentiation, and in all cases the base is 2, and both the exponent and the modulus are on the order of P.  Lifchitz mentions a way to look for Cunningham chains of the second kind "in cluster" by taking 2ni-1-1-1 ≡ 0 (mod n * n1 * n2 * ... * ni) where ni = 2ni-1-1.  This test only looks for one kind of chain, though, and requires that n ≡ 1 (mod 4).  Also, it is a much more expensive modular exponentiation since the modulus is the product of 8 or more large numbers.

When a prime chain is found that is at least as long as floor(difficulty) (e.g. at least 8 when the difficulty is 8.9) it is necessary to calculate the fractional difficulty.  I haven't verified this in the source code, but in the design paper he takes r = (2P-1 (mod P) ) as the Fermat remainder, then defines the fractional length as (P - r) / P.  P, in all of this, is the number that would have been next in the chain.  I'm not sure which half of a bi-twin chain has to be used (note that a bi-twin chain can have an odd length if the two Cunningham chains it is made up of differ in length by 1).  If the integer chain length plus the fractional length is greater than the difficulty, it is a valid share.

I hope this has been easy enough to follow, while still being informative.  If anyone has questions I'll try to answer them.  Note that my understanding is based partially on looking at the code but also largely on my own insights; I've likely missed something major, but there is a slim chance that I've offered an improvement on what's already being done simply from coming at it from a different persepctive.  If you feel this satisfies part of your bounty for money, I can receive donations at 1FbWVR3LpGoWp5rc4SJc3cHo42ALrYhcxC.  Not really doing this for the bounty, though.  I just want to be as much help as possible towards making GPU mining a reality, to help neutralize the botnet and massive VPS farm threats.  

Awesome explanation, sending 0.25BTC over. I think I have a good enough explanation of the math to go forward, however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)

Oh, and another dumb question... What does '≡' mean? :)


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: eCoinomist on July 17, 2013, 02:02:53 PM
I vote for OpenCL, beacause I don't have nor going to buy NVidia, will put another 5BTC to bounty if it's designed for ATI Raedon GPUs.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 17, 2013, 02:05:29 PM
I vote for OpenCL, beacause I don't have nor going to buy NVidia, will put another 5BTC to bounty if it's designed for ATI Raedon GPUs.

For now, this thread is only for a CUDA-based miner, however I recommend if you would like to see an OCL one that you put that bounty in its own thread, I might contribute a Bitcoin to that one as well. :)


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: meta.p02 on July 17, 2013, 02:37:34 PM
... however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)
The nice thing about using bignums is that you don't have to convert them to decimal - since the internal representation of bignums is in a base of a power of 2 (usually the "digit" size of bignums is 1 or 2 bytes), and the hash is already in that format, we don't have to convert it to decimal. Unless you want a human-readable representation, the number is stored as raw bytes.

e.g. If the prevhash is 0x000...0053, it is internally represented as {0x00, 0x00, ..., 0x53} (or in the opposite order). We can then use this to construct a bignum without converting it into its decimal representation 83.

Quote
Oh, and another dumb question... What does '≡' mean? :)

'≡' means "congruent". In this case it refers to modular congruence - if x ≡ y (mod z), it means that they both have the same remainder mod z. E.g. 3^4 ≡ 1 (mod 5) because 3^4=81 and 1 have the same remainder (1) mod 5.


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: gateway on July 17, 2013, 05:21:39 PM
Btw, ill also chip in some btc if we can get this implemented into BFGMiner... https://github.com/luke-jr/bfgminer/tree/prime


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Koooooj on July 17, 2013, 06:07:49 PM
(long quote)

Awesome explanation, sending 0.25BTC over. I think I have a good enough explanation of the math to go forward, however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)

Oh, and another dumb question... What does '≡' mean? :)

Thanks for the BTC!  The explanations on ≡ are correct.  It's just a fancy equals sign for modular arithmetic.

As for the divisibility requirement, it is not just the block depth (not sure if that was what you were asking).  If it were then it would be trivial to start working on the prime chain for the 75,000th block far ahead of time and everyone would be looking for the exact same primes at the same time, meaning the fastest computer would always win.  It is also not just the hex of the block, I don't think, as that would open it up to someone trying to manipulate the block to have the correct hex to allow a precomputed chain of primes to be accepted.  As I understand it you essentially perform a single Bitcoin-style proof of work hash on the block you're looking at and then instead of checking to see if that number is less than a difficulty target as you would do in Bitcoin you search for numbers divisible by it.  I'm not sure the exact order that everything has to go into this hash, but I know that it includes the previous block's hash, the time, the Merkle Root of the transactions, an arbitrary nonce, and likely other data.

Looking at the data associated with a few blocks it would appear that the specific hash that is used is not the current block's "hash" that is used to identify it; I think that hash includes data about the prime chain, which means it can't be used as a requirement for what the primes are.  Looking at block 54321, for example, the prime origin is 4981759032498050152560261402218509159687271713794968969555671831618386493918345 8194973000.  The primorial used here appears to be 2*3*5*7*11*13*17*19*23 (since it is not divisible by the next prime number, 29).  That means the hash was most likely 2233042693160946897388635236176982778377126850264183238825907717991339881869979 00, or 7887e497dbc303fd77e86f8c2ce7c73d96d039643ab695c311987fc2a16f6bfc48 in hex.  If one were to take the (most likely SHA256) hash of parts of the block in a prescribed order then they should get that number.  however, troublingly, that number is 67 hex digits long, which is 268 bits; it appears that I've either made an arithmetic error or that I've misunderstood the procedure. 

As an aside, not the security that this divisibility requirement brings:  the hash value above was approximated by Wolfram Alpha as about 2.2 * the number of atoms in the visible universe.  It is effectively impossible to work on a block out of sequence by guessing a valid origin, as you would be guessing until the heat death of the universe. 


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: iCEBREAKER on July 17, 2013, 07:20:27 PM
Coding directly in low-level CUDA or OpenCL will get the best performance, but perhaps a more quick-and-dirty approach is possible via middleware and code morphing.   ???

There are a number of CUDA/OpenCL translators and abstraction layers available, allowing development at a higher level.   8)


Quote
ArrayFire is a fast and comprehensive, high-level GPU matrix library.

    It contains the fastest implementations for hundreds of routines in areas of matrix arithmetic, signal processing, linear algebra, statistics, and image processing, all built upon thousands of lines of highly-tuned device code optimized for any CUDA-enabled GPU.

    ArrayFire operates on vectors, matrices, volumes, N-dimensional arrays, and supports single- and double-precision floating point values, complex numbers, booleans, 32-bit signed and unsigned integers.

    A simple array notation allows a few lines of ArrayFire code to accomplish what would have taken 10-100X lines in raw CUDA. The array object is also easier to use than template programming and is more flexible and faster than directive-based approaches.

    ArrayFire can be used as a stand-alone C++ application, or easily be integrated with existing CUDA code.

    ArrayFire performs run-time analysis of your code to increase arithmetic intensity and memory throughput, while avoiding unnecessary temporary allocations (via custom JIT compiler). It can also execute loop iterations in parallel with gfor, and supports easy multi-GPU scaling.

http://www.accelereyes.com/arrayfire/c/

Quote
http://www.gpusystems.com/images/librachart.png

Developing and designing high performance software for CPUs and GPUs using Libra SDK is robust and easy. You start with a well-known and simple API following standard interfaces in common programming environments resulting in less code and massive performance to your program. Hundreds of high performance functions and operators focused on standard math, dense and sparse matrix/vector operations also including high performance random number generator, compares, conditional ops, fft, equation solvers and lots more.

http://www.gpusystems.com/libra.aspx

Quote
Mint is a domain-specific programming model and translator that generates highly optimized CUDA C from annotated C source. Using just 5 different Mint pragmas, non expert CUDA programmer can benefit from GPU technology without having to be an expert.  Mint includes an optimizer that targets 3D stencil methods. The translator generates both host and device code and handles the memory management details including host-device and shared memory optimizations. Mint parallelizes loop-nests in appropriately annotated C source, performing domain-specific optimizations important in three-dimensional problems.

https://sites.google.com/site/mintmodel/home/c-to-cuda-translation

Quote
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software.

http://thrust.github.io/07-02-2013/Thrust-v1.7.0-release.html

Quote
PARALUTION is a library which enables you to perform various sparse iterative solvers and preconditioners on multi/many-core CPU and GPU devices. Based on C++, it provides generic and flexible design which allows seamless integration with other scientific software packages.

http://www.paralution.com/

Quote
The MGPU library strives to simplify the implementation of high performance applications and algorithms on multi-GPU systems. Its main goal is both to abstract platform dependent functions and vendor specific APIs, as well as simplifying communication between different compute elements.

http://sschaetz.github.io/mgpu/

Quote
Automated Tool to Generate Parallel CUDA code from a Serial C Code

http://www.academia.edu/1970761/Automated_Tool_to_Generate_Parallel_CUDA_Code_from_a_Serial_C_Code

Quote
Though GPUs are widely used in Supercomputers today, they are not code transparent because one has to sit and code the algorithms in CUDA C to run them on GPU. So if we can have some middleware that converts the C programs to CUDA, the end user gets transparency. I tried to develop a prototype compiler using ANTLR in visual studio that converts the C programs in CUDA C language.

https://code.google.com/p/c2cudatranslator/





Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: teknohog on July 21, 2013, 10:55:16 PM
Coding directly in low-level Verilog or VHDL will get the best performance, but perhaps a more quick-and-dirty approach is possible via middleware and code morphing. 

Fixed that for you. FPGA FTW!


Title: Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!
Post by: Vorksholk on July 21, 2013, 11:10:55 PM
This bounty is coming close to being claimed--I can smell it! https://bitcointalk.org/index.php?topic=258982.0