Bitcoin Forum
Author Topic: doubling Litecoin mining efficiency on nVidia  (Read 12196 times)
cbuchner1
Sr. Member

Activity: 392


March 28, 2013, 11:27:27 AM
 #1

Hi,

I have always been a bit dismayed by the state of OpenCL mining on nVidia cards. Getting 20 kHash/sec on an nVidia GTX 260 (the 216-shader version) was rather disappointing. Even my GTX 660 Ti peaked at 80 kHash/sec.

I sat down and rewrote the computationally heavy part in CUDA: the two loops of 1024 iterations each that call the Salsa20/8 core (the part in between the two PBKDF2_SHA256() calls).

My reference is this cleaned-up, concise implementation: https://github.com/litecoin-project/litecoin/blob/master/src/scrypt.c  I ported it to CUDA, making heavy use of shared memory and making sure that memory accesses are close to optimal.
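
For readers who have not looked at that file, the part between the two PBKDF2 calls can be sketched in plain C roughly as follows. This is only a CPU-side sketch, assuming Litecoin's scrypt parameters (N = 1024, r = 1, p = 1); it is not the CUDA port described in this post, and the names xor_salsa8, scrypt_core and SCRYPT_N are used here for illustration.

```c
#include <stdint.h>
#include <string.h>

#define SCRYPT_N 1024  /* assumed: Litecoin's N; r = 1, p = 1 */
#define ROTL32(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* B ^= Bx, then B = Salsa20/8(B). Each argument is 16 x 32-bit words. */
static void xor_salsa8(uint32_t B[16], const uint32_t Bx[16])
{
    uint32_t x[16];
    int i;

    for (i = 0; i < 16; i++)
        x[i] = (B[i] ^= Bx[i]);

    /* 8 rounds = 4 double rounds (column round + row round). */
    for (i = 0; i < 8; i += 2) {
        /* Column round */
        x[ 4] ^= ROTL32(x[ 0] + x[12],  7);  x[ 8] ^= ROTL32(x[ 4] + x[ 0],  9);
        x[12] ^= ROTL32(x[ 8] + x[ 4], 13);  x[ 0] ^= ROTL32(x[12] + x[ 8], 18);
        x[ 9] ^= ROTL32(x[ 5] + x[ 1],  7);  x[13] ^= ROTL32(x[ 9] + x[ 5],  9);
        x[ 1] ^= ROTL32(x[13] + x[ 9], 13);  x[ 5] ^= ROTL32(x[ 1] + x[13], 18);
        x[14] ^= ROTL32(x[10] + x[ 6],  7);  x[ 2] ^= ROTL32(x[14] + x[10],  9);
        x[ 6] ^= ROTL32(x[ 2] + x[14], 13);  x[10] ^= ROTL32(x[ 6] + x[ 2], 18);
        x[ 3] ^= ROTL32(x[15] + x[11],  7);  x[ 7] ^= ROTL32(x[ 3] + x[15],  9);
        x[11] ^= ROTL32(x[ 7] + x[ 3], 13);  x[15] ^= ROTL32(x[11] + x[ 7], 18);
        /* Row round */
        x[ 1] ^= ROTL32(x[ 0] + x[ 3],  7);  x[ 2] ^= ROTL32(x[ 1] + x[ 0],  9);
        x[ 3] ^= ROTL32(x[ 2] + x[ 1], 13);  x[ 0] ^= ROTL32(x[ 3] + x[ 2], 18);
        x[ 6] ^= ROTL32(x[ 5] + x[ 4],  7);  x[ 7] ^= ROTL32(x[ 6] + x[ 5],  9);
        x[ 4] ^= ROTL32(x[ 7] + x[ 6], 13);  x[ 5] ^= ROTL32(x[ 4] + x[ 7], 18);
        x[11] ^= ROTL32(x[10] + x[ 9],  7);  x[ 8] ^= ROTL32(x[11] + x[10],  9);
        x[ 9] ^= ROTL32(x[ 8] + x[11], 13);  x[10] ^= ROTL32(x[ 9] + x[ 8], 18);
        x[12] ^= ROTL32(x[15] + x[14],  7);  x[13] ^= ROTL32(x[12] + x[15],  9);
        x[14] ^= ROTL32(x[13] + x[12], 13);  x[15] ^= ROTL32(x[14] + x[13], 18);
    }

    for (i = 0; i < 16; i++)
        B[i] += x[i];
}

/* X: 32 words (128 bytes), updated in place.
 * V: scratchpad of SCRYPT_N * 32 words, i.e. 128 KiB per hash. */
static void scrypt_core(uint32_t X[32], uint32_t *V)
{
    uint32_t i, j, k;

    /* Loop 1: fill the scratchpad sequentially. */
    for (i = 0; i < SCRYPT_N; i++) {
        memcpy(&V[i * 32], X, 128);
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }
    /* Loop 2: read the scratchpad back in a data-dependent order. */
    for (i = 0; i < SCRYPT_N; i++) {
        j = 32 * (X[16] & (SCRYPT_N - 1));
        for (k = 0; k < 32; k++)
            X[k] ^= V[j + k];
        xor_salsa8(&X[0], &X[16]);
        xor_salsa8(&X[16], &X[0]);
    }
}
```

Under these assumptions every hash touches a 128 KiB scratchpad and makes 4096 Salsa20/8 calls, and the second loop reads the scratchpad at addresses that depend on the data, which is presumably why the memory access pattern matters so much on a GPU.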

Now I am getting about 45 kHash/sec from the GTX 260. That is still no match for any ATI card, but about the hashing power of a high-end CPU miner. I still need to figure out which existing miner application I can integrate this new CUDA code into.

Christian
wndrbr3d
Sr. Member

Activity: 295


March 28, 2013, 01:24:54 PM
 #2

Christian,

I was curious if there were any more serious CUDA developers out there looking to tackle this issue. Thank you for stepping up for the community!

As far as a miner is concerned re: CUDA, the only one I know of that actually uses CUDA vs. OpenCL is the ooooold version of RPCMiner (here: https://bitcointalk.org/index.php?topic=2444.0)

I'd be interested in helping you test your code as I have a 690GTX that is begging to be spun up for Litecoins! :)

Additionally, I'm very much a novice at CUDA development (as in, I went through "CUDA By Example" six months ago as research for work), but I'd like to help any way I can. One thing I was curious about is the new "Shift Left" (SHLFT) instructions in the Kepler architecture (more info: http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0642-GTC2012-Inside-Kepler.pdf) and how they might be used to juice even more performance from a pure CUDA-based miner.

Thanks again and keep the updates coming! :)
cbuchner1
Sr. Member

Activity: 392


March 28, 2013, 01:52:00 PM
 #3

Quote from: wndrbr3d
I'd be interested in helping you test your code as I have a 690GTX that is begging to be spun up for Litecoins! :)

Quote from: wndrbr3d
One thing I was curious about is the new "Shift Left" (SHLFT) instructions in the new Kepler architecture and how it might be used to juice even more performance from a pure CUDA based miner.

For integrating my CUDA code, I briefly looked at the source code of the reaper GPU miner, but I do not really like it. Looks like a hack.

I only know about a SHFL instruction, which is for intra-warp data exchange (shuffle?). I do not see this speeding up hashing at the moment.

But the new 64-bit funnel shifter is only available with compute capability 3.5, as offered by the Titan card or the GK110-based Teslas. Lesser GeForce cards like your 690GTX are compute 3.0 only. A question on stackoverflow deals with this feature:
http://stackoverflow.com/questions/12767113/funnel-shift-what-is-it
It has the potential to somewhat speed up SHA-256 and scrypt hashing, but the price of the cards is just way too high.
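
To illustrate why a funnel shifter helps with hashing: both SHA-256 and Salsa20/8 are full of 32-bit rotations, which otherwise cost two shifts plus an OR. The sketch below models the idea in plain C; funnelshift_l_model is a hypothetical software stand-in for the hardware instruction (CUDA exposes it as the __funnelshift_l intrinsic, as far as I know), not real GPU code.

```c
#include <stdint.h>

/* Conventional rotate-left: two shifts and an OR, i.e. three ALU
 * operations when no rotate or funnel-shift instruction exists. */
static uint32_t rotl32_shifts(uint32_t x, unsigned s)
{
    s &= 31;
    return (x << s) | (x >> ((32 - s) & 31));
}

/* Software model of a funnel shift left: concatenate hi:lo into
 * 64 bits, shift left by s mod 32, return the upper 32 bits. */
static uint32_t funnelshift_l_model(uint32_t lo, uint32_t hi, unsigned s)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)((wide << (s & 31)) >> 32);
}

/* Feeding the same word in as both halves turns the funnel shift
 * into a rotation, which is a single instruction in hardware. */
static uint32_t rotl32_funnel(uint32_t x, unsigned s)
{
    return funnelshift_l_model(x, x, s);
}
```

For example, rotl32_shifts(0x12345678, 8) and rotl32_funnel(0x12345678, 8) both give 0x34567812. The adds and XORs around each rotation stay the same, so only part of the instruction stream shrinks.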
msm595
Full Member

Activity: 173


March 28, 2013, 03:29:28 PM
 #4

Quote from: cbuchner1
For integrating my CUDA code, I briefly looked at the source code of the reaper GPU miner, but I do not really like it. Looks like a hack.

Have you looked at cgminer?

1LSbhxShMmymNQ1Li5qd7pYUgrMUcVTokc | LQz2pJYaeqntA9BFB8rDX5AL2TTKGd5AuN
wndrbr3d
Sr. Member

Activity: 295


March 28, 2013, 03:43:54 PM
 #5

Quote from: cbuchner1
But the new 64-bit funnel shifter is only available with compute capability 3.5, as offered by the Titan card or the GK110-based Teslas. Lesser GeForce cards like your 690GTX are compute 3.0 only. A question on stackoverflow deals with this feature:
http://stackoverflow.com/questions/12767113/funnel-shift-what-is-it
It has the potential to somewhat speed up SHA-256 and scrypt hashing, but the price of the cards is just way too high.

I would be willing to invest in a GeForce TITAN card (GK110) so that we can collaborate on a more efficient pure CUDA mining routine for both SHA-256 and scrypt. I think there's huge potential there, because the funnel shifter promises performance gains along the same lines as those BFI_INT brought to OpenCL.

Let me know how you would like to collaborate on this. Again, I'm not a seasoned CUDA developer, but I feel I can still be of assistance :)
cbuchner1
Sr. Member

Activity: 392


March 28, 2013, 04:35:25 PM
 #6


This thread on the nvidia forums discusses the possible benefits of the Compute 3.5 architecture for SHA-256 hashing: https://devtalk.nvidia.com/default/topic/496471/cuda-programming-and-performance/amd-radeon-3x-faster-on-bitcoin-mining-sha-256-hashing-performance/3/

Don't be too optimistic: the instruction count is "only" reduced by about 20%, so the speed boost is expected to be moderate at best.
cbuchner1
Sr. Member

Activity: 392


March 28, 2013, 04:41:27 PM
 #7

Quote from: msm595
Have you looked at cgminer?

I've just compiled pooler's cpuminer on Windows (without the assembly bits) and I think I will be adding my CUDA stuff there.

Christian
msm595
Full Member

Activity: 173


March 28, 2013, 04:45:22 PM
 #8

Awesome :D Will you be releasing binaries? (And maybe a donation address)

1LSbhxShMmymNQ1Li5qd7pYUgrMUcVTokc | LQz2pJYaeqntA9BFB8rDX5AL2TTKGd5AuN
wndrbr3d
Sr. Member

Activity: 295


March 28, 2013, 04:54:19 PM
 #9


Quote from: cbuchner1
This thread on the nvidia forums discusses the possible benefits from the Compute 3.5 architecture for SHA-256 hashing https://devtalk.nvidia.com/default/topic/496471/cuda-programming-and-performance/amd-radeon-3x-faster-on-bitcoin-mining-sha-256-hashing-performance/3/

Don't be too optimistic, the instruction count is "only" reduced by about 20%. So the speed boost is expected to be moderate at best.

My ray of hope there is that the 20% reduction was in instruction count only, not actual execution time. ;)