Bitcoin Forum
Topic: doubling Litecoin mining efficiency on nVidia
cbuchner1 (OP)
March 28, 2013, 11:27:27 AM
Last edit: March 28, 2013, 11:40:10 AM by cbuchner1
 #1

Hi,

I was always a bit dismayed at the state of OpenCL mining on nVidia cards. Getting 20 kHash/sec on an nVidia GTX 260 (the 216-shader version) was always rather disappointing. Even my GTX 660 Ti peaked at 80 kHash/sec.

I sat down and rewrote the entire computationally heavy part in CUDA. That is the two loops of 1024 iterations each that call the Salsa20/8 core (the part in between the two PBKDF2_SHA256() calls).

My reference is this clean, concise implementation: https://github.com/litecoin-project/litecoin/blob/master/src/scrypt.c  I ported it to CUDA, making heavy use of shared memory and making sure that memory accesses are close to optimal.
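
For readers who have not looked at the reference file, the part between the two PBKDF2_SHA256() calls can be sketched in plain C as follows. This is a CPU-side sketch modeled on the structure of the linked scrypt.c (N = 1024, r = 1), not the CUDA kernel described in this post; the names SCRYPT_N and scrypt_core are chosen here for illustration. Each hash needs a scratchpad of 1024 x 128 bytes = 128 KiB, which is why the memory access pattern matters so much on a GPU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCRYPT_N 1024  /* illustrative name; Litecoin uses N = 1024, r = 1 */
#define ROTL32(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* B ^= Bx, then B += Salsa20/8(B): one 64-byte block, 8 rounds. */
void xor_salsa8(uint32_t B[16], const uint32_t Bx[16])
{
    uint32_t x[16];
    int i;

    for (i = 0; i < 16; i++)
        x[i] = (B[i] ^= Bx[i]);

    for (i = 0; i < 8; i += 2) {
        /* Column round. */
        x[ 4] ^= ROTL32(x[ 0] + x[12],  7);  x[ 8] ^= ROTL32(x[ 4] + x[ 0],  9);
        x[12] ^= ROTL32(x[ 8] + x[ 4], 13);  x[ 0] ^= ROTL32(x[12] + x[ 8], 18);
        x[ 9] ^= ROTL32(x[ 5] + x[ 1],  7);  x[13] ^= ROTL32(x[ 9] + x[ 5],  9);
        x[ 1] ^= ROTL32(x[13] + x[ 9], 13);  x[ 5] ^= ROTL32(x[ 1] + x[13], 18);
        x[14] ^= ROTL32(x[10] + x[ 6],  7);  x[ 2] ^= ROTL32(x[14] + x[10],  9);
        x[ 6] ^= ROTL32(x[ 2] + x[14], 13);  x[10] ^= ROTL32(x[ 6] + x[ 2], 18);
        x[ 3] ^= ROTL32(x[15] + x[11],  7);  x[ 7] ^= ROTL32(x[ 3] + x[15],  9);
        x[11] ^= ROTL32(x[ 7] + x[ 3], 13);  x[15] ^= ROTL32(x[11] + x[ 7], 18);
        /* Row round. */
        x[ 1] ^= ROTL32(x[ 0] + x[ 3],  7);  x[ 2] ^= ROTL32(x[ 1] + x[ 0],  9);
        x[ 3] ^= ROTL32(x[ 2] + x[ 1], 13);  x[ 0] ^= ROTL32(x[ 3] + x[ 2], 18);
        x[ 6] ^= ROTL32(x[ 5] + x[ 4],  7);  x[ 7] ^= ROTL32(x[ 6] + x[ 5],  9);
        x[ 4] ^= ROTL32(x[ 7] + x[ 6], 13);  x[ 5] ^= ROTL32(x[ 4] + x[ 7], 18);
        x[11] ^= ROTL32(x[10] + x[ 9],  7);  x[ 8] ^= ROTL32(x[11] + x[10],  9);
        x[ 9] ^= ROTL32(x[ 8] + x[11], 13);  x[10] ^= ROTL32(x[ 9] + x[ 8], 18);
        x[12] ^= ROTL32(x[15] + x[14],  7);  x[13] ^= ROTL32(x[12] + x[15],  9);
        x[14] ^= ROTL32(x[13] + x[12], 13);  x[15] ^= ROTL32(x[14] + x[13], 18);
    }

    for (i = 0; i < 16; i++)
        B[i] += x[i];
}

/* X: 32 words (128 bytes) of state. V: scratchpad of SCRYPT_N * 32 words. */
void scrypt_core(uint32_t X[32], uint32_t *V)
{
    uint32_t i, j, k;

    assert(X != NULL && V != NULL);

    /* Loop 1: fill the scratchpad sequentially (1024 iterations). */
    for (i = 0; i < SCRYPT_N; i++) {
        memcpy(&V[i * 32], X, 128);
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }

    /* Loop 2: read it back in a data-dependent order (1024 iterations). */
    for (i = 0; i < SCRYPT_N; i++) {
        j = 32 * (X[16] & (SCRYPT_N - 1));
        for (k = 0; k < 32; k++)
            X[k] ^= V[j + k];
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }
}
```

The second loop is the awkward one for a GPU: the index j depends on the current state, so the reads from V cannot be predicted ahead of time.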

Now I am getting about 45 kHash/sec from the GTX 260, a bit more than double the OpenCL figure. That is still no match for any ATI card, but about the hashing power of a high-end CPU miner. I still need to figure out which existing miner application I can integrate this new CUDA code into.

Christian
wndrbr3d
March 28, 2013, 01:24:54 PM
 #2

Christian,

I was curious if there were any more serious CUDA developers out there looking to tackle this issue. Thank you for stepping up for the community!

As far as a miner is concerned re: CUDA, the only one I know of that actually uses CUDA vs. OpenCL is the ooooold version of RPCMiner (here: https://bitcointalk.org/index.php?topic=2444.0)

I'd be interested in helping you test your code as I have a 690GTX that is begging to be spun up for Litecoins! :)

Additionally, I'm very much a novice at CUDA development (as in, I went through "CUDA By Example" six months ago as research for work), but I'd like to help any way I can. One thing I was curious about is the new "Shift Left" (SHLFT) instructions in the new Kepler architecture (more info: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0642-GTC2012-Inside-Kepler.pdf) and how they might be used to juice even more performance from a pure CUDA-based miner.

Thanks again and keep the updates coming! :)
cbuchner1 (OP)
March 28, 2013, 01:52:00 PM
 #3

Quote from: wndrbr3d on March 28, 2013, 01:24:54 PM
I'd be interested in helping you test your code as I have a 690GTX that is begging to be spun up for Litecoins! :)

Quote from: wndrbr3d on March 28, 2013, 01:24:54 PM
One thing I was curious about is the new "Shift Left" (SHLFT) instructions in the new Kepler architecture and how they might be used to juice even more performance from a pure CUDA-based miner.

For integrating my CUDA code, I briefly looked at the source code of the reaper GPU miner, but I do not really like it. Looks like a hack.

The only similarly named instruction I know of is SHFL (shuffle), which exchanges data between the threads of a warp. I do not see it speeding up hashing at the moment.

The new 64-bit funnel shifter, however, is only available with compute capability 3.5, as offered by the Titan card or the GK110-based Teslas. Lesser GeForce cards like your GTX 690 have compute capability 3.0 only. A question on Stack Overflow deals with this feature:
http://stackoverflow.com/questions/12767113/funnel-shift-what-is-it  It has the potential to somewhat speed up SHA-256 and scrypt hashing, but the price of the cards is just way too high.
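
To make the funnel shift idea concrete, here is a small plain C sketch that emulates it on the CPU. It assumes the semantics documented for the CUDA intrinsic __funnelshift_l(lo, hi, shift): concatenate hi:lo into 64 bits, shift left by (shift & 31), and return the upper 32 bits. The function names below are made up for illustration. The point is that a 32-bit rotate, which SHA-256 and Salsa20/8 use all the time, normally takes two shifts plus an OR, while a funnel shift with both halves set to the same word does it in one operation.

```c
#include <assert.h>
#include <stdint.h>

/* CPU emulation of a left funnel shift (hypothetical helper name):
 * upper 32 bits of (hi:lo) << (shift & 31). */
uint32_t funnelshift_l_emul(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((wide << (shift & 31)) >> 32);
}

/* Classic rotate left: two shifts plus an OR. Valid for 0 < n < 32. */
uint32_t rotl32_classic(uint32_t x, uint32_t n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Rotate left expressed as a single funnel shift with hi == lo == x. */
uint32_t rotl32_funnel(uint32_t x, uint32_t n)
{
    return funnelshift_l_emul(x, x, n);
}
```

For example, rotl32_funnel(0x80000001, 1) and rotl32_classic(0x80000001, 1) both give 0x00000003. Whether the saved instructions turn into a real speedup on compute 3.5 hardware is a separate question that only measurements can answer.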
msm595
March 28, 2013, 03:29:28 PM
 #4

Quote from: cbuchner1 on March 28, 2013, 01:52:00 PM
For integrating my CUDA code, I briefly looked at the source code of the reaper GPU miner, but I do not really like it. Looks like a hack.

Have you looked at cgminer?

wndrbr3d
March 28, 2013, 03:43:54 PM
 #5

Quote from: cbuchner1 on March 28, 2013, 01:52:00 PM
The new 64-bit funnel shifter, however, is only available with compute capability 3.5, as offered by the Titan card or the GK110-based Teslas. Lesser GeForce cards like your GTX 690 have compute capability 3.0 only. A question on Stack Overflow deals with this feature:
http://stackoverflow.com/questions/12767113/funnel-shift-what-is-it  It has the potential to somewhat speed up SHA-256 and scrypt hashing, but the price of the cards is just way too high.

I would be willing to invest in a GeForce TITAN card (GK110) to collaborate with you on a more efficient pure CUDA mining routine for both SHA-256 and scrypt. I think there's huge potential there, because the funnel shifter brings performance gains along the same lines as BFI_INT did for OpenCL.

Let me know how you would like to collaborate on this. Again, I'm not a seasoned CUDA developer, but I feel I can still be of assistance :)
cbuchner1 (OP)
March 28, 2013, 04:35:25 PM
 #6


This thread on the NVIDIA forums discusses the possible benefits of the compute capability 3.5 architecture for SHA-256 hashing: https://devtalk.nvidia.com/default/topic/496471/cuda-programming-and-performance/amd-radeon-3x-faster-on-bitcoin-mining-sha-256-hashing-performance/3/

Don't be too optimistic: the instruction count is "only" reduced by about 20%, so the speed boost is expected to be moderate at best.
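
As a rough upper bound on what a 20% lower instruction count could buy, assume (and these are assumptions, not measurements) that the kernel is limited purely by instruction throughput and that every instruction costs the same. With N instructions before and 0.8 N after:

```latex
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}}
               = \frac{N}{(1 - 0.20)\,N}
               = \frac{1}{0.80}
               = 1.25
```

So about 25% more hashes per second under those assumptions. If the removed instructions are more or less expensive than the average one, or if the kernel is limited by something other than instruction throughput, the real figure can land on either side of this.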
cbuchner1 (OP)
March 28, 2013, 04:41:27 PM
 #7

Quote from: msm595 on March 28, 2013, 03:29:28 PM
Have you looked at cgminer?

I've just compiled pooler's cpuminer on Windows (without the assembly bits) and I think I will be adding my CUDA stuff there.

Christian
msm595
March 28, 2013, 04:45:22 PM
 #8

Awesome :D Will you be releasing binaries? (And maybe a donation address?)

wndrbr3d
March 28, 2013, 04:54:19 PM
 #9


Quote from: cbuchner1 on March 28, 2013, 04:35:25 PM
This thread on the NVIDIA forums discusses the possible benefits of the compute capability 3.5 architecture for SHA-256 hashing: https://devtalk.nvidia.com/default/topic/496471/cuda-programming-and-performance/amd-radeon-3x-faster-on-bitcoin-mining-sha-256-hashing-performance/3/

Don't be too optimistic: the instruction count is "only" reduced by about 20%, so the speed boost is expected to be moderate at best.

My sign of hope there is that the 20% reduction was in instruction count only, not in actual execution time. ;)