ncr1pt0r
Newbie
Offline
Activity: 44
Merit: 0
|
|
August 13, 2013, 03:25:52 PM |
|
good news , glad to see your still working on it
|
|
|
|
Spoetnik
Legendary
Offline
Activity: 1540
Merit: 1011
FUD Philanthropist™
|
|
August 13, 2013, 05:13:21 PM |
|
i am watching you CUDA !!!!!!
|
FUD first & ask questions later™
|
|
|
Lauda
Legendary
Offline
Activity: 2674
Merit: 2965
Terminated.
|
|
August 13, 2013, 08:37:27 PM |
|
good news , glad to see your still working on it
I wondered if he quit it, so we got some good news
|
"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks" 😼 Bitcoin Core ( onion)
|
|
|
jaakkop
|
|
August 14, 2013, 05:46:11 AM |
|
I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.
Status:
I will push my lastest changes soon, I have updated my code basis to hp-9 and I implemented a fast big num small prime trial division for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU than computes the fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains versus small primes, but apparently the high performance client still filters out some candidates with trial divisions and does this before doing fermat tests.
If a fast fermat test for the GPU surfaces, than filtering+fermat tests could be chained directly on the GPU to give a better speed up.
To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more prime division tests than what the high performance client does by default yields already better speed ups directly on the CPU.
Thanks for the update and keep up the good work
|
I'd buy that for a dollar bitcoin!
|
|
|
SynergyCores
Newbie
Offline
Activity: 20
Merit: 0
|
|
August 14, 2013, 04:42:03 PM |
|
I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.
Status:
I will push my lastest changes soon, I have updated my code basis to hp-9 and I implemented a fast big num small prime trial division for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU than computes the fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains versus small primes, but apparently the high performance client still filters out some candidates with trial divisions and does this before doing fermat tests.
If a fast fermat test for the GPU surfaces, than filtering+fermat tests could be chained directly on the GPU to give a better speed up.
To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more prime division tests than what the high performance client does by default yields already better speed ups directly on the CPU.
I only wish that I knew enough to help, but as it is, I know nothing. Thanks for the update!
|
|
|
|
primedigger (OP)
Member
Offline
Activity: 75
Merit: 10
|
|
August 15, 2013, 05:10:59 PM |
|
So, sad news: I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end. I pushed my changes for anyone who that wants to play with it. There is also still a very slow ported version of the Fermat test, which is easily outperformed by GMP's implementation on the CPU. I think there is no easy way to avoid doing Fermat tests on the GPU. So for now, there is sadly nothing for the GPU which is faster than hp-9 on the CPU. I will have a close look at Mtrlt's project, but as it seems, he might have similar problems. It would be a major achievement if gets a GPU Fermat test working with a good speed-up. This means that a fast GPU "exponentiation by squaring" algorithm is available to the research community and prime research would benefit from that in general, as most prime tests (not only Fermat's test) need that. There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier. Also if Mtrlt succeds, he really deserves his price money... it's really not an easy task and I doubt other GPU implementations are in the wild. I would also then port over his method to my CUDA project.
|
|
|
|
lemons
|
|
August 15, 2013, 05:36:20 PM |
|
CUDA ++1
|
|
|
|
gigawatt
|
|
August 15, 2013, 05:58:29 PM |
|
I take it CUMP didn't have what you needed?
|
|
|
|
gigawatt
|
|
August 15, 2013, 06:25:27 PM |
|
There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier. I just skimmed over that paper. Their results are novel, but almost useless in application. If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64.
|
|
|
|
primedigger (OP)
Member
Offline
Activity: 75
Merit: 10
|
|
August 15, 2013, 07:20:27 PM |
|
There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier. I just skimmed over that paper. Their results are novel, but almost useless in application. If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64. Exactly, and they are much greater than 2^64 in primcoin. As for CUMP, it is completely useless for primecoin, as it only implements floating point arithmetic and then only addition, multiplication and subtraction.
|
|
|
|
refer_2_me
|
|
August 16, 2013, 04:30:33 AM |
|
Keep on keeping on, I guess.
|
BTC: 1reFerkRnftob5YvbB112bbuwepC9XYLj XPM: APQpPZCfEz3kejrYTfyACY1J9HrjnRf34Y
|
|
|
mikaelh
|
|
August 16, 2013, 06:51:20 AM |
|
I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.
I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.
|
|
|
|
crendore
|
|
August 16, 2013, 09:37:42 AM |
|
I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.
I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug. So uhh... should we not be using HP9 then?
|
|
|
|
Lauda
Legendary
Offline
Activity: 2674
Merit: 2965
Terminated.
|
|
August 16, 2013, 09:50:08 AM |
|
CUDA ++1
We all love CUDA because of XPM
|
"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks" 😼 Bitcoin Core ( onion)
|
|
|
mikaelh
|
|
August 16, 2013, 11:16:26 AM |
|
I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.
I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug. So uhh... should we not be using HP9 then? The fast-div test seems to be filtering out about 6% of candidates. HP9 should still be faster than HP8 despite that.
|
|
|
|
primedigger (OP)
Member
Offline
Activity: 75
Merit: 10
|
|
August 16, 2013, 11:48:06 AM |
|
If you can't wait for Mikaelh's fix: The fast div can be disabled easily, open prime.cpp in an editor and comment out the whole "if (fFastDiv) {...}" block in the function FermatProbablePrimalityTestFast and comment out the block after the comment "// Compute parameters for fast div test" in MineProbablePrimeChain.
The number of candidates it filtered out was depended on "nFastDivPrimes", which by default was chosen fairly low, thus only ~6% of candidates are (wrongly) discarded. But hp-9 is still doing unnecessary divisions on all candidates, so you can speed up hp-9 by commenting out those divisions.
|
|
|
|
crendore
|
|
August 16, 2013, 12:03:42 PM |
|
If you can't wait for Mikaelh's fix: The fast div can be disabled easily, open prime.cpp in an editor and comment out the whole "if (fFastDiv) {...}" block in the function FermatProbablePrimalityTestFast and comment out the block after the comment "// Compute parameters for fast div test" in MineProbablePrimeChain.
The number of candidates it filtered out was depended on "nFastDivPrimes", which by default was chosen fairly low, thus only ~6% of candidates are (wrongly) discarded. But hp-9 is still doing unnecessary divisions on all candidates, so you can speed up hp-9 by commenting out those divisions.
Thanks! Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.
|
|
|
|
paulthetafy
|
|
August 16, 2013, 12:13:47 PM |
|
Thanks! Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.
It's a nice simple way to limit the scope of any variables declared inside the {} to only be "visible" inside those brackets
|
|
|
|
crendore
|
|
August 16, 2013, 12:17:57 PM |
|
Thanks! Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.
It's a nice simple way to limit the scope of any variables declared inside the {} to only be "visible" inside those brackets Ahhh. sweet, good to know. I don't write much c++, just learning it now.
|
|
|
|
crendore
|
|
August 16, 2013, 12:21:07 PM |
|
I can confirm this gave me a speed up on my 4 machines as follows: 1: 6.88% 2: 4.52% 3: 4.69% 4: 8.42%
|
|
|
|
|