Bitcoin Forum
May 11, 2024, 10:55:05 AM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: « 1 2 3 4 5 6 [7] 8 »  All
  Print  
Author Topic: [XPM] CUDA enabled qt client miner for primecoins. Source code inside. WIP  (Read 31716 times)
ncr1pt0r
Newbie
*
Offline Offline

Activity: 44
Merit: 0



View Profile WWW
August 13, 2013, 03:25:52 PM
 #121

good news , glad to see your still working on it
1715424905
Hero Member
*
Offline Offline

Posts: 1715424905

View Profile Personal Message (Offline)

Ignore
1715424905
Reply with quote  #2

1715424905
Report to moderator
1715424905
Hero Member
*
Offline Offline

Posts: 1715424905

View Profile Personal Message (Offline)

Ignore
1715424905
Reply with quote  #2

1715424905
Report to moderator
The network tries to produce one block per 10 minutes. It does this by automatically adjusting how difficult it is to produce blocks.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
Spoetnik
Legendary
*
Offline Offline

Activity: 1540
Merit: 1011


FUD Philanthropist™


View Profile
August 13, 2013, 05:13:21 PM
 #122

i am watching you Wink

CUDA !!!!!!

FUD first & ask questions later™
Lauda
Legendary
*
Offline Offline

Activity: 2674
Merit: 2965


Terminated.


View Profile WWW
August 13, 2013, 08:37:27 PM
 #123

good news , glad to see your still working on it
I wondered if he quit it, so we got some good news  Smiley

"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
😼 Bitcoin Core (onion)
jaakkop
Member
**
Offline Offline

Activity: 63
Merit: 10


View Profile WWW
August 14, 2013, 05:46:11 AM
 #124

I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my lastest changes soon, I have updated my code basis to hp-9 and I implemented a fast big num small prime trial division for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU than computes the fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains versus small primes, but apparently the high performance client still filters out some candidates with trial divisions and does this before doing fermat tests.

If a fast fermat test for the GPU surfaces, than filtering+fermat tests could be chained directly on the GPU to give a better speed up.

To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more prime division tests than what the high performance client does by default yields already better speed ups directly on the CPU.

Thanks for the update and keep up the good work Smiley

I'd buy that for a dollar bitcoin!
SynergyCores
Newbie
*
Offline Offline

Activity: 20
Merit: 0



View Profile
August 14, 2013, 04:42:03 PM
 #125


I was away for the past week and will look into it again this week. Yes, it's just a hobby project and it got bigger than I expected. Currently, I'm the only one working on this, so if someone wants to chip in and help (programming), send me a PM.

Status:

I will push my lastest changes soon, I have updated my code basis to hp-9 and I implemented a fast big num small prime trial division for the GPU. Depending on the settings, this can filter out 10-90% of all candidates. The CPU than computes the fermat tests on the remaining candidates. I was under the impression that the sieve would already filter out all chains versus small primes, but apparently the high performance client still filters out some candidates with trial divisions and does this before doing fermat tests.

If a fast fermat test for the GPU surfaces, than filtering+fermat tests could be chained directly on the GPU to give a better speed up.

To clarify: I didn't push my changes because I still have a silly bug somewhere, so that apparently not all prime divisors are found. But doing more prime division tests than what the high performance client does by default yields already better speed ups directly on the CPU.

I only wish that I knew enough to help, but as it is, I know nothing. Thanks for the update!
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 15, 2013, 05:10:59 PM
 #126

So, sad news:

I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.

I pushed my changes for anyone who that wants to play with it. There is also still a very slow ported version of the Fermat test, which is easily outperformed by GMP's implementation on the CPU. I think there is no easy way to avoid doing Fermat tests on the GPU. So for now, there is sadly nothing for the GPU which is faster than hp-9 on the CPU.

I will have a close look at Mtrlt's project, but as it seems, he might have similar problems. It would be a major achievement if gets a GPU Fermat test working with a good speed-up. This means that a fast GPU "exponentiation by squaring" algorithm is available to the research community and prime research would benefit from that in general, as most prime tests (not only Fermat's test) need that. There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier.

Also if Mtrlt succeds, he really deserves his price money... it's really not an easy task and I doubt other GPU implementations are in the wild. I would also then port over his method to my CUDA project.
lemons
Full Member
***
Offline Offline

Activity: 178
Merit: 100



View Profile
August 15, 2013, 05:36:20 PM
 #127

CUDA ++1
gigawatt
Full Member
***
Offline Offline

Activity: 168
Merit: 100



View Profile
August 15, 2013, 05:58:29 PM
 #128

I take it CUMP didn't have what you needed?

BTC: 1E2egHUcLDAmcxcqZqpL18TPLx9Xj1akcV   Ψ: AWHJbwoM67Ez12SHH4pH5DnJKPoMSdvLz2   Primecoin All-In-One VPS Setup Script   Quarkcoin All-In-One VPS Setup Script   Metiscoin VPS Pool Mining Script
gigawatt
Full Member
***
Offline Offline

Activity: 168
Merit: 100



View Profile
August 15, 2013, 06:25:27 PM
 #129

There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier.

I just skimmed over that paper.  Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64.   Sad

BTC: 1E2egHUcLDAmcxcqZqpL18TPLx9Xj1akcV   Ψ: AWHJbwoM67Ez12SHH4pH5DnJKPoMSdvLz2   Primecoin All-In-One VPS Setup Script   Quarkcoin All-In-One VPS Setup Script   Metiscoin VPS Pool Mining Script
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 15, 2013, 07:20:27 PM
 #130

There are also a couple of papers on Fermat tests on the GPU (e.g. http://www.gpgpgpu.com/gecco2009/6.pdf), however these implementations are usually assuming that n is smaller than 32 or 64bits, which makes the test much easier.

I just skimmed over that paper.  Their results are novel, but almost useless in application.
If you're doing Fermat tests, there's a good chance the numbers you want to analyze are greater than 2^64.   Sad

Exactly, and they are much greater than 2^64 in primcoin.

As for CUMP, it is completely useless for primecoin, as it only implements floating point arithmetic and then only addition, multiplication and subtraction.
refer_2_me
Full Member
***
Offline Offline

Activity: 213
Merit: 100



View Profile
August 16, 2013, 04:30:33 AM
 #131

Keep on keeping on, I guess.

BTC: 1reFerkRnftob5YvbB112bbuwepC9XYLj
XPM: APQpPZCfEz3kejrYTfyACY1J9HrjnRf34Y
mikaelh
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
August 16, 2013, 06:51:20 AM
 #132

I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.
crendore
Sr. Member
****
Offline Offline

Activity: 363
Merit: 250


View Profile
August 16, 2013, 09:37:42 AM
 #133

I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.

So uhh... should we not be using HP9 then?

Lauda
Legendary
*
Offline Offline

Activity: 2674
Merit: 2965


Terminated.


View Profile WWW
August 16, 2013, 09:50:08 AM
 #134

CUDA ++1
We all love CUDA because of XPM  Cheesy

"The Times 03/Jan/2009 Chancellor on brink of second bailout for banks"
😼 Bitcoin Core (onion)
mikaelh
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
August 16, 2013, 11:16:26 AM
 #135

I think there might be a bug in hp-9 somewhere, so that the trial division doesn't work quite right in that version and sorts out wrong candidates. I will need to confirm this with the hp-9 sources without my changes, so that I'm sure I didn't introduce that bug. The CUDA trial division seems to be doing the right think and it doesn't find any candidates to discard, because if I understand it right, the sieve already did that. To put it in different words: this idea is likely a dead end.

I can confirm that fast division is buggy in hp9. It's also not needed in hp9 anymore because of Sunny's optimization. I'll remove the code in my next release. Thanks for spotting the bug.

So uhh... should we not be using HP9 then?

The fast-div test seems to be filtering out about 6% of candidates. HP9 should still be faster than HP8 despite that.
primedigger (OP)
Member
**
Offline Offline

Activity: 75
Merit: 10


View Profile
August 16, 2013, 11:48:06 AM
 #136

If you can't wait for Mikaelh's fix: The fast div can be disabled easily, open prime.cpp in an editor and comment out the whole "if (fFastDiv) {...}" block in the function FermatProbablePrimalityTestFast and comment out the block after the comment "// Compute parameters for fast div test" in MineProbablePrimeChain.

The number of candidates it filtered out was depended on "nFastDivPrimes", which by default was chosen fairly low, thus only ~6% of candidates are (wrongly) discarded. But hp-9 is still doing unnecessary divisions on all candidates, so you can speed up hp-9 by commenting out those divisions.
crendore
Sr. Member
****
Offline Offline

Activity: 363
Merit: 250


View Profile
August 16, 2013, 12:03:42 PM
 #137

If you can't wait for Mikaelh's fix: The fast div can be disabled easily, open prime.cpp in an editor and comment out the whole "if (fFastDiv) {...}" block in the function FermatProbablePrimalityTestFast and comment out the block after the comment "// Compute parameters for fast div test" in MineProbablePrimeChain.

The number of candidates it filtered out was depended on "nFastDivPrimes", which by default was chosen fairly low, thus only ~6% of candidates are (wrongly) discarded. But hp-9 is still doing unnecessary divisions on all candidates, so you can speed up hp-9 by commenting out those divisions.

Thanks!  Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.

paulthetafy
Hero Member
*****
Offline Offline

Activity: 820
Merit: 1000


View Profile
August 16, 2013, 12:13:47 PM
 #138

Thanks!  Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.
It's a nice simple way to limit the scope of any variables declared inside the {} to only be "visible" inside those brackets
crendore
Sr. Member
****
Offline Offline

Activity: 363
Merit: 250


View Profile
August 16, 2013, 12:17:57 PM
 #139

Thanks!  Just as an aside, in c++ what does it mean to wrap a block of code in { } when there isn't a function definition, if, else, while, for, etc. statement? (in regards to the second block of code you mentioned to comment out, it seems wrapped in {} with no initial statement.
It's a nice simple way to limit the scope of any variables declared inside the {} to only be "visible" inside those brackets

Ahhh. sweet, good to know.  I don't write much c++, just learning it now.

crendore
Sr. Member
****
Offline Offline

Activity: 363
Merit: 250


View Profile
August 16, 2013, 12:21:07 PM
 #140

I can confirm this gave me a speed up on my 4 machines as follows:
1: 6.88%
2: 4.52%
3: 4.69%
4: 8.42%

Pages: « 1 2 3 4 5 6 [7] 8 »  All
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!