[Removed Bounty] PrimeCoin CUDA miner

rethaw

Sr. Member

Offline

Activity: 378
Merit: 255

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 03:43:21 AM

#21

Look at CheckPrimeProofOfWork in prime.cpp.

Cashing out or buying in? Use Coinbase

Koooooj

Member

Offline

Activity: 75
Merit: 10

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 06:05:30 AM

#22

I think I have a pretty good understanding of the algorithm from a mathematics standpoint. I don't know CUDA (or OpenCL, which I would suggest--it runs on both NVidia and AMD with reasonably comparable performance) and I don't know the getwork protocol, but I can give some insight into the mathematics, at least.

The first thing to understand is that if the block hash is H then H, 2H, 3H, 4H, etc are all valid origins, and that H-1, 2H-1, 4H-1, 8H-1, etc. makes a valid Cunningham chain (if all of those numbers are prime and the chain is long enough; this is for a Cunningham chain of the first kind. For the second kind replace the "-" with "+". Most of the math here will assume Cunningham chains of the First kind, but it is identicle for chains of the Second kind. If H ± 1, 2H ± 1, ... are all prime (i.e. a chain of the first and second) then the chain is a bi-twin chain). That means that when you are eliminating composite origins you are also eliminating numbers that could appear later in the chain. This will become useful later.

Another thing to notice is that H, 2H, 3H, 4H, etc. are very unlikely to be prime. If you look at the probability that a random number on the order of 2^256 is prime then it is about 1 in 177; that makes the probability that 8 randomly chosen numbers are all primes about 1 in 1 quintillion--impractically low. That probability is greatly improved by using a "primorial" number--the product of many small primes. If you take pRimorial= R = H * 2 * 3 * 5 * 7 * .... * 47 (the first 15 prime numbers; quick tests on my own code suggest that this is about ideal, although some tweaking and testing is in order) then you increase the size of the number you are testing, but you also make the numbers much more likely to be prime. Thus if you look at R-1, 2R -1, etc then you know that each of those numbers is not divisible by 2, 3, 5, ... , 47 because they are 1 less than (or greater than, if you are looking for Cunningham chains of the Second kind) a multiple of each of those numbers. This means that the smallest factor NR ± 1 can have is 53, where N is any integer.

From here you execute the Sieve. The standard Sieve of Eratosthenes is very straightforward, but it is designed to be executed starting at 1 and then continuing over the small, consecutive integers, such as is shown on the GIF on the Wikipedia page. We are not interested in consecutive numbers, and they are far from small, so a change must be adopted. This change requires knowledge of modular arithmetic. To execute the sieve we first take a small prime number (starting with 53 if our primorial ended with 47) and take R (mod 53). If (and only if) R ≡ 0 (mod 53) then R is divisible by 53, and all future R will be divisible by 53. In that case, no NR-1 will be divisible by 53 and none should be eliminated.

In every other case, R ≡ r (mod 53), where 0 < r < 53. If you have found that r = 1 then that shows that R-1 ≡ 0 (mod 53) and is therefore not prime (and cannot be a member of a Cunningham Chain of the first kind). If r = 52 then R+1 ≡ 0 (mod 53) and is therefore not prime. If r is any other number then you have to search. Since r is relatively prime to 53 (i.e. gcd(r,53) = 1) it can be shown that N * r (mod 53) will cycle through every number less than 53. Thanks to modular arithmetic, N * r ≡ N * R (mod 53), so you can search for the N for which N₁ * r ≡ 1 (mod 53) and for which N₂ * r ≡ 52 (mod 53). You can eliminate N₁R - 1 as a candidate for chains of the first kind, and you can eliminate N₂R + 1 as a candidate for chains of the second kind.

The power of finding this N is that you then know that if N * r ≡ 1 (mod 53) then (53 + N) * r ≡ 1 (mod 53) since N * r is cyclic with a period of 53. Thus, you can quickly go through, just like in the classic form of the sieve of Eratosthenes, and eliminate every 53rd N. You would then repeat this process for subsequent prime numbers (59, 61, 67, ...), until sieving against another prime number is not worth it--it takes too long and removes too few numbers.

The next thing you can do to eliminate candidates is to look for clusters (you can actually do this while sieving if you want, but I have separated it since it is a separate topic). Since the goal is to find chains of (currently) 8 primes it makes no sense to waste time performing a long primality test on a chain of numbers that have passed the sieve that is 7 or fewer long. Thus, you can look through the sieve for chains in the form N-1, 2N-1, 4N-1, ... 128N-1 where all of those numbers have no small factors (which would have been found in the sieve). When you find these, they are passed to the formal (probable) primality tests.

The main test used in this step is the Fermat Primality Test, based on Fermat's Little Theorem. It states that a^P-1 ≡ 1 (mod P) for all prime P. (It does not state that if that identity holds true then P is prime, but in practice prime numbers are far more likely than "Fermat liars"). For the reference client "a" is chosen to be 2, since 1 is trivially a Fermat Liar for every composite P and since a larger "a" would take more time for no benefit. To perform the operation it is best to use "fast modular exponentiation," which is shown in pseudocode in the Modular Exponentiation Wikipedia page.

The second test used is the Euler Lagrange Lifchitz test (a brief reading of the code suggests that it is not actually used, but it is implemented. You can skip this paragraph and not lose much). It states that if P is prime and P ≡ 1 (mod 4) then 2P + 1 is prime if and only if 2^P + 1 ≡ 0 (mod 2P + 1). There are other forms as well: if P ≡ 3 (mod 4) then 2P + 1 is prime if and only if 2^P - 1 ≡ 0 (mod 2P +1). There are forms for 2P-1 as well; if P ≡ 1 (mod 4) then you use 2^P-1 - 1 ≡ 0 (mod 2P - 1); otherwise it's 2^P-1 + 1 ≡ 0 (mod 2P -1). These are not much, if any, faster than just using the Fermat primality test--they all rely on a single modular exponentiation, and in all cases the base is 2, and both the exponent and the modulus are on the order of P. Lifchitz mentions a way to look for Cunningham chains of the second kind "in cluster" by taking 2^n_i-1-1-1 ≡ 0 (mod n * n₁ * n₂ * ... * n_i) where n_i = 2n_i-1-1. This test only looks for one kind of chain, though, and requires that n ≡ 1 (mod 4). Also, it is a much more expensive modular exponentiation since the modulus is the product of 8 or more large numbers.

When a prime chain is found that is at least as long as floor(difficulty) (e.g. at least 8 when the difficulty is 8.9) it is necessary to calculate the fractional difficulty. I haven't verified this in the source code, but in the design paper he takes r = (2^P-1 (mod P) ) as the Fermat remainder, then defines the fractional length as (P - r) / P. P, in all of this, is the number that would have been next in the chain. I'm not sure which half of a bi-twin chain has to be used (note that a bi-twin chain can have an odd length if the two Cunningham chains it is made up of differ in length by 1). If the integer chain length plus the fractional length is greater than the difficulty, it is a valid share.

I hope this has been easy enough to follow, while still being informative. If anyone has questions I'll try to answer them. Note that my understanding is based partially on looking at the code but also largely on my own insights; I've likely missed something major, but there is a slim chance that I've offered an improvement on what's already being done simply from coming at it from a different persepctive. If you feel this satisfies part of your bounty for money, I can receive donations at 1FbWVR3LpGoWp5rc4SJc3cHo42ALrYhcxC. Not really doing this for the bounty, though. I just want to be as much help as possible towards making GPU mining a reality, to help neutralize the botnet and massive VPS farm threats.

meta.p02

Full Member

Offline

Activity: 196
Merit: 100

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 07:55:16 AM

#23

What currently is the operation that takes up the most time? Is it the testing of primality, or the building of the sieve, or something else altogether?

Earn Devcoins by Writing | Trade on Cryptsy! Faucets: Watch ads, earn Bitcoin | Visit pages, get Bitcoin | Gamble with faucet earnings!
If you found my post informative/interesting, consider tipping at BTC: 15877457612137dj4MM57bGXRkPzU4wPRM or DVC: 1B2PAYVe9BQRrZKaWZxWtunutwrm6fVcF7.

Vorksholk (OP)

Legendary

Offline

Activity: 1713
Merit: 1029

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 01:34:19 PM

#24

Quote from: Koooooj on July 17, 2013, 06:05:30 AM

Awesome explanation, sending 0.25BTC over. I think I have a good enough explanation of the math to go forward, however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)

Oh, and another dumb question... What does '≡' mean?

VeriBlock: Securing The World's Blockchains Using Bitcoin
https://veriblock.org

eCoinomist

Member

Offline

Activity: 112
Merit: 10

Independent Analyst

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 02:02:53 PM

#25

I vote for OpenCL, beacause I don't have nor going to buy NVidia, will put another 5BTC to bounty if it's designed for ATI Raedon GPUs.

>>> Learn How to Build The Best GPU Miner in 1 Hour! <<<

Vorksholk (OP)

Legendary

Offline

Activity: 1713
Merit: 1029

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 02:05:29 PM

#26

Quote from: eCoinomist on July 17, 2013, 02:02:53 PM

I vote for OpenCL, beacause I don't have nor going to buy NVidia, will put another 5BTC to bounty if it's designed for ATI Raedon GPUs.

For now, this thread is only for a CUDA-based miner, however I recommend if you would like to see an OCL one that you put that bounty in its own thread, I might contribute a Bitcoin to that one as well.

VeriBlock: Securing The World's Blockchains Using Bitcoin
https://veriblock.org

meta.p02

Full Member

Offline

Activity: 196
Merit: 100

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 02:37:34 PM

#27

Quote from: Vorksholk on July 17, 2013, 01:34:19 PM

... however (and this is a really dumb question), the block that has to be a factor of the origin of the prime, we just convert the hex of the previous block (so say we are trying to mine block 54320, we would convert the hex of block 54319 to decimal, correct?)

The nice thing about using bignums is that you don't have to convert them to decimal - since the internal representation of bignums is in a base of a power of 2 (usually the "digit" size of bignums is 1 or 2 bytes), and the hash is already in that format, we don't have to convert it to decimal. Unless you want a human-readable representation, the number is stored as raw bytes.

e.g. If the prevhash is 0x000...0053, it is internally represented as {0x00, 0x00, ..., 0x53} (or in the opposite order). We can then use this to construct a bignum without converting it into its decimal representation 83.

Quote

Oh, and another dumb question... What does '≡' mean?

'≡' means "congruent". In this case it refers to modular congruence - if x ≡ y (mod z), it means that they both have the same remainder mod z. E.g. 3^4 ≡ 1 (mod 5) because 3^4=81 and 1 have the same remainder (1) mod 5.

gateway

Hero Member

Offline

Activity: 552
Merit: 500

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 05:21:39 PM

#28

Btw, ill also chip in some btc if we can get this implemented into BFGMiner... https://github.com/luke-jr/bfgminer/tree/prime

Koooooj

Member

Offline

Activity: 75
Merit: 10

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 06:07:49 PM

#29

Quote from: Vorksholk on July 17, 2013, 01:34:19 PM

Quote from: Koooooj on July 17, 2013, 06:05:30 AM

(long quote)

Thanks for the BTC! The explanations on ≡ are correct. It's just a fancy equals sign for modular arithmetic.

As for the divisibility requirement, it is not just the block depth (not sure if that was what you were asking). If it were then it would be trivial to start working on the prime chain for the 75,000th block far ahead of time and everyone would be looking for the exact same primes at the same time, meaning the fastest computer would always win. It is also not just the hex of the block, I don't think, as that would open it up to someone trying to manipulate the block to have the correct hex to allow a precomputed chain of primes to be accepted. As I understand it you essentially perform a single Bitcoin-style proof of work hash on the block you're looking at and then instead of checking to see if that number is less than a difficulty target as you would do in Bitcoin you search for numbers divisible by it. I'm not sure the exact order that everything has to go into this hash, but I know that it includes the previous block's hash, the time, the Merkle Root of the transactions, an arbitrary nonce, and likely other data.

Looking at the data associated with a few blocks it would appear that the specific hash that is used is not the current block's "hash" that is used to identify it; I think that hash includes data about the prime chain, which means it can't be used as a requirement for what the primes are. Looking at block 54321, for example, the prime origin is 4981759032498050152560261402218509159687271713794968969555671831618386493918345 8194973000. The primorial used here appears to be 2*3*5*7*11*13*17*19*23 (since it is not divisible by the next prime number, 29). That means the hash was most likely 2233042693160946897388635236176982778377126850264183238825907717991339881869979 00, or 7887e497dbc303fd77e86f8c2ce7c73d96d039643ab695c311987fc2a16f6bfc48 in hex. If one were to take the (most likely SHA256) hash of parts of the block in a prescribed order then they should get that number. however, troublingly, that number is 67 hex digits long, which is 268 bits; it appears that I've either made an arithmetic error or that I've misunderstood the procedure.

As an aside, not the security that this divisibility requirement brings: the hash value above was approximated by Wolfram Alpha as about 2.2 * the number of atoms in the visible universe. It is effectively impossible to work on a block out of sequence by guessing a valid origin, as you would be guessing until the heat death of the universe.

iCEBREAKER

Legendary

Offline

Activity: 2156
Merit: 1072

Crypto is the separation of Power and State.

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 17, 2013, 07:20:27 PM

#30

Coding directly in low-level CUDA or OpenCL will get the best performance, but perhaps a more quick-and-dirty approach is possible via middleware and code morphing. Huh

There are a number of CUDA/OpenCL translators and abstraction layers available, allowing development at a higher level. Cool

Quote

ArrayFire is a fast and comprehensive, high-level GPU matrix library.

It contains the fastest implementations for hundreds of routines in areas of matrix arithmetic, signal processing, linear algebra, statistics, and image processing, all built upon thousands of lines of highly-tuned device code optimized for any CUDA-enabled GPU.

ArrayFire operates on vectors, matrices, volumes, N-dimensional arrays, and supports single- and double-precision floating point values, complex numbers, booleans, 32-bit signed and unsigned integers.

A simple array notation allows a few lines of ArrayFire code to accomplish what would have taken 10-100X lines in raw CUDA. The array object is also easier to use than template programming and is more flexible and faster than directive-based approaches.

ArrayFire can be used as a stand-alone C++ application, or easily be integrated with existing CUDA code.

ArrayFire performs run-time analysis of your code to increase arithmetic intensity and memory throughput, while avoiding unnecessary temporary allocations (via custom JIT compiler). It can also execute loop iterations in parallel with gfor, and supports easy multi-GPU scaling.

http://www.accelereyes.com/arrayfire/c/

Quote

Developing and designing high performance software for CPUs and GPUs using Libra SDK is robust and easy. You start with a well-known and simple API following standard interfaces in common programming environments resulting in less code and massive performance to your program. Hundreds of high performance functions and operators focused on standard math, dense and sparse matrix/vector operations also including high performance random number generator, compares, conditional ops, fft, equation solvers and lots more.

http://www.gpusystems.com/libra.aspx

Quote

Mint is a domain-specific programming model and translator that generates highly optimized CUDA C from annotated C source. Using just 5 different Mint pragmas, non expert CUDA programmer can benefit from GPU technology without having to be an expert. Mint includes an optimizer that targets 3D stencil methods. The translator generates both host and device code and handles the memory management details including host-device and shared memory optimizations. Mint parallelizes loop-nests in appropriately annotated C source, performing domain-specific optimizations important in three-dimensional problems.

https://sites.google.com/site/mintmodel/home/c-to-cuda-translation

Quote

Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software.

http://thrust.github.io/07-02-2013/Thrust-v1.7.0-release.html

Quote

PARALUTION is a library which enables you to perform various sparse iterative solvers and preconditioners on multi/many-core CPU and GPU devices. Based on C++, it provides generic and flexible design which allows seamless integration with other scientific software packages.

http://www.paralution.com/

Quote

The MGPU library strives to simplify the implementation of high performance applications and algorithms on multi-GPU systems. Its main goal is both to abstract platform dependent functions and vendor specific APIs, as well as simplifying communication between different compute elements.

http://sschaetz.github.io/mgpu/

Quote

Automated Tool to Generate Parallel CUDA code from a Serial C Code

http://www.academia.edu/1970761/Automated_Tool_to_Generate_Parallel_CUDA_Code_from_a_Serial_C_Code

Quote

Though GPUs are widely used in Supercomputers today, they are not code transparent because one has to sit and code the algorithms in CUDA C to run them on GPU. So if we can have some middleware that converts the C programs to CUDA, the end user gets transparency. I tried to develop a prototype compiler using ANTLR in visual studio that converts the C programs in CUDA C language.

https://code.google.com/p/c2cudatranslator/

██████████
██████████████████
██████████████████████
██████████████████████████
████████████████████████████
██████████████████████████████
████████████████████████████████
████████████████████████████████
██████████████████████████████████
██████████████████████████████████
██████████████████████████████████
██████████████████████████████████
██████████████████████████████████
████████████████████████████████
██████████████
██████████████
████████████████████████████
██████████████████████████
██████████████████████
██████████████████
██████████
Monero

"The difference between bad and well-developed digital cash will determine
whether we have a dictatorship or a real democracy." David Chaum 1996
"Fungibility provides privacy as a side effect." Adam Back 2014

▪Private▪Auditable▪100% Fungible
▪Wild & Free▪Flexible Blocksize
▪Intro▪Core GUI▪Github▪Android

Buy and sell XMR near you
P2P Exchange Network
Buy XMR with fiat
Is Dash a scam?

teknohog

Sr. Member

Offline

Activity: 520
Merit: 253

555

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 21, 2013, 10:55:16 PM

#31

Quote from: iCEBREAKER on July 17, 2013, 07:20:27 PM

Coding directly in low-level Verilog or VHDL will get the best performance, but perhaps a more quick-and-dirty approach is possible via middleware and code morphing.

Fixed that for you. FPGA FTW!

world famous math art | masternodes are bad, mmmkay?
Every sha(sha(sha(sha()))), every ho-o-o-old, still shines

Vorksholk (OP)

Legendary

Offline

Activity: 1713
Merit: 1029

Re: [New Bounty] 6BTC for PrimeCoin CUDA miner!

July 21, 2013, 11:10:55 PM

#32

This bounty is coming close to being claimed--I can smell it! https://bitcointalk.org/index.php?topic=258982.0

VeriBlock: Securing The World's Blockchains Using Bitcoin
https://veriblock.org

Pages: « 1 [2] All

Bitcoin Forum > Alternate cryptocurrencies > Altcoin Discussion > [Removed Bounty] PrimeCoin CUDA miner

« previous topic next topic »