Bitcoin Forum
November 11, 2024, 11:12:08 AM *
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426930 times)
ilovecudacompute
July 23, 2014, 10:24:34 PM
 #18201

I'm still working on stabilizing the rest of my optimizations, so I'm not creating binaries yet.

Thanks a ton for your efforts

KEEP US UPDATED  Grin Grin

EDIT: tsiv, I sent you a tiny BTC donation :PPP
CodyF86
July 23, 2014, 10:53:40 PM
 #18202

Not sure if this is what you meant, djm, but you can add the files you want git to ignore to the .gitignore file in your repo (if you have one; it's pretty easy to set up) and you won't have to worry about that.
cayars
July 24, 2014, 12:48:58 AM
 #18203

Hey guys,

Just wanted to give you an update.  I was compiling a new version of nvMiner with djm34's X17 compiled in.

He changed his github at about 3 hours into the compile. I killed the compile and I pulled down his latest and started compiling again.  Not sure what went wrong but my VS project file got messed up.  After 6 hours compiling with CUDA 5.5 I killed it.

I reverted to my earlier version and I'm starting over to keep things clean on my side.  If this were a profitable algo I could have done a quick compile with just that algo to get it out the door for you guys, but since there really is no reason to mine this coin (maybe a clone in the future), I see no reason to do it "wrong".  So I'm going to take my time and get it right.

Right now PPL/X17 isn't worth mining so there is no reason to push a release other than to have a nvidia miner that supports one more algo. I'm not into "bragging rights on number of algos supported" if it doesn't make sense.

I'm also going to benchmark the stuff sp_ has published in the last 5 to 10 pages.  Plus, if TSIV doesn't release source code for his recent mods to CryptoNight, I'm going to benchmark Wolf's changes one by one and include whatever benefits us.

So long story short, I'm going to delay the next release until I'm ready and have done some testing.  The next nvMiner will have:
1) x17
2) any speed up proposed by sp_ or Wolf if they pan out for either Kepler or Maxwell.
3) If TSIV does a source release I'll include this also.
(should be 24 hours or less)

Also, since CUDA 6.5 is right around the corner, using 5.5 will basically mean being 2 versions behind.  There comes a point when it's not worth supporting older software, and I think we are getting there.  The next nvMiner WILL SUPPORT 5.5, but I don't know about future releases.

CUDA 6.0 (compute 3.0/3.5/5.0) compiles a lot faster than 5.5 does with both 3.0/3.5.  So in the future we will move to 6.0 for nvMiner once all algos have been tested on both Maxwell and Kepler and work OK.  Right now (or as of my last test) I had a problem with FRESH on 6.0.

I delay release of all nvMiner releases until I test EVERY algo after each build.  Damn you djm34 because you are starting to make my testing time take longer. Smiley

So moral of this post is to start upgrading your Rigs to use the latest nvidia drivers.  For the last 6 months (at least) I've been running the latest beta drivers at all times with no problems at all on both Maxwell and Kepler GPUs. So I see no reason not to run the latest or beta releases (what I run).

I'll compile this next version with 5.5, probably release the version after that as 6.0 first and then 5.5, and the 3rd version from now might very well be 6.0 or greater only.

So I just wanted to give a heads-up on my plans to move to CUDA 6.0, which is a normal release and not a beta.  If during testing I find it performs worse, I'll let you guys know and will rethink this (we want the highest hash rates, of course).

So start thinking or doing upgrades to the latest nvidia driver releases.

Carlo
yellowduck2
July 24, 2014, 01:08:49 AM
 #18204

Thank you very much.

I think we should seriously think about Nvidia Miner Foundation and a foundation donation address.
tsiv
July 24, 2014, 01:51:38 AM
 #18205

Note to self: __CUDA_ARCH__ is a fickle bitch.

I think I got the damn thing to use the new 4-way version of the phase 2 kernel for compute 3.0+ and the old one for 2.0. Since __CUDA_ARCH__ is apparently not defined when compiling the host code I didn't see much choice but to fire up the kernel with four threads per hash even if it's the single thread per hash compute 2.0 version. Dealt with it by making the single thread kernel do work only on the first of the four subthreads. Not very happy with it but it doesn't seem to matter that much performance-wise.
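
A rough Python model of the workaround described above, not the actual CUDA code: every hash is launched with four threads, and the single-thread variant lets only the first subthread of each group do any work. The names `THREADS_PER_HASH`, `single_thread_kernel` and `launch` are made up for illustration.

```python
THREADS_PER_HASH = 4  # the host always launches four threads per hash


def single_thread_kernel(global_tid, process_hash):
    """Model of the single-thread-per-hash kernel running under a 4-way launch."""
    hash_index, sub_thread = divmod(global_tid, THREADS_PER_HASH)
    if sub_thread != 0:
        return  # subthreads 1..3 sit idle in this variant
    process_hash(hash_index)


def launch(num_hashes, kernel):
    """Run the kernel once per simulated thread and collect the hashes processed."""
    processed = []
    for tid in range(num_hashes * THREADS_PER_HASH):
        kernel(tid, processed.append)
    return processed


if __name__ == "__main__":
    print(launch(3, single_thread_kernel))
```

Each hash is still processed exactly once; three quarters of the launched threads simply return early, which is the waste the post says did not seem to cost much.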

Bottom line: Fuck all difference on Maxwell, apparently some other compute 3.0+ cards like the new 4-way kernel and gain some performance, compute 2.0 should work like before.

I'll look into pulling some of Wolf's mods, also got some ideas for the phase 1&3 kernels but we'll see.

Win32 binary at https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15/ccminer-cryptonight_20140724.zip
bathrobehero
July 24, 2014, 02:35:08 AM
 #18206

So I take back that 64 bit builds are faster with x15. I just have so many different versions of ccminer at this point (~20 GB) that I ended up using a borked 32 bit version which used the GPU less and the CPU more than its 64 bit brother.

Anyway, here are the average hashrates of a 750Ti and a 780Ti rig per card, running a couple of minutes per algo with djm34's commit 58 compiled with CUDA 5.5:
Code:
            750 Ti          780 Ti
            x32     x64     x32     x64
x11         2.4     2.3     5.5     5.2
x13         2.0     1.8     4.0     3.8
x14         1.9     1.8     4.0     3.8
x15         1.7     1.6     3.7     3.5
x17         1.6     1.5     3.6     3.4
jackpot     5.0     5.1     11      11
qubit       4.0     3.7     8.9     8.1
nist5       7.7     7.7     16.3    16.4
fresh       3.1     2.8     7.2     6.2
groestl     7.3     7.3     14.5    14.7
Gigabyte cards, solo mining, very slight 60 MHz core overclock.
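
To make the 32 bit vs 64 bit comparison easier to read, here is a small Python sketch that computes the relative change per algo from the posted numbers. The figures are copied from the table; the helper names `x64_change_pct` and `slower_on_x64` are only illustrative.

```python
# (x32, x64) per card, copied from the posted table; units as posted.
# Index 0 = 750 Ti, index 1 = 780 Ti.
RESULTS = {
    "x11":     ((2.4, 2.3), (5.5, 5.2)),
    "x13":     ((2.0, 1.8), (4.0, 3.8)),
    "x14":     ((1.9, 1.8), (4.0, 3.8)),
    "x15":     ((1.7, 1.6), (3.7, 3.5)),
    "x17":     ((1.6, 1.5), (3.6, 3.4)),
    "jackpot": ((5.0, 5.1), (11.0, 11.0)),
    "qubit":   ((4.0, 3.7), (8.9, 8.1)),
    "nist5":   ((7.7, 7.7), (16.3, 16.4)),
    "fresh":   ((3.1, 2.8), (7.2, 6.2)),
    "groestl": ((7.3, 7.3), (14.5, 14.7)),
}


def x64_change_pct(x32, x64):
    """Percent change of the 64 bit build relative to the 32 bit build."""
    return (x64 - x32) / x32 * 100.0


def slower_on_x64(card):
    """Algos where the 64 bit build is strictly slower on the given card index."""
    return [algo for algo, cards in RESULTS.items() if cards[card][1] < cards[card][0]]


if __name__ == "__main__":
    for algo, cards in RESULTS.items():
        changes = ["%+.1f%%" % x64_change_pct(*pair) for pair in cards]
        print(algo.ljust(8), *changes)
```

On these numbers the 64 bit build is slower for the same seven algos on both cards, and roughly level on jackpot, nist5 and groestl.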

Not your keys, not your coins!
tsiv
July 24, 2014, 05:07:13 AM
 #18207

Something I pretty much suspected but never bothered to check up on, run times for the various parts of the hash. Well, actually I did benchmark the core loops earlier and found the second one to be the biggest hog. Throw in the numbers for the prep and final phases and you get this:

Prepare: 0.001388 sec
Phase 1: 0.148383 sec
Phase 2: 1.414880 sec
Phase 3: 0.147834 sec
Final: 0.003590 sec

That's 32x15 hashes on a GTX 750 Ti. Can't tell how it works out on other cards since all I've got is a bunch of 750 Tis, but in this case optimizing the living fuck out of the prep and final parts all the way to instant completion with zero run time would bump up the total hashrate by 0.3%. Don't get me wrong, Wolf's doing nice work on unfucking stuff I pretty much just yanked out of cpuminer-multi and left as is. I just prefer to focus on shit that matters, again, no offense intended. Too bad I'm not even making a dent on that goddamn clusterfuck that is the second main loop Grin
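
The 0.3% figure can be checked from the timings above. A short Python sketch, assuming the five numbers are the run times for one batch of 32x15 = 480 hashes; the variable and function names are only illustrative:

```python
# Per-batch timings in seconds, copied from the post.
TIMINGS = {
    "prepare": 0.001388,
    "phase1": 0.148383,
    "phase2": 1.414880,
    "phase3": 0.147834,
    "final": 0.003590,
}
HASHES_PER_BATCH = 32 * 15  # assumed: one batch is 32x15 hashes


def hashrate(timings, hashes=HASHES_PER_BATCH):
    """Hashes per second if the phases run back to back."""
    return hashes / sum(timings.values())


def gain_if_free(timings, free_phases):
    """Percent hashrate gain if the named phases took zero time."""
    remaining = {k: v for k, v in timings.items() if k not in free_phases}
    return (hashrate(remaining) / hashrate(timings) - 1.0) * 100.0


if __name__ == "__main__":
    print("hashrate: %.1f H/s" % hashrate(TIMINGS))
    print("phase 2 share: %.1f%%" % (100 * TIMINGS["phase2"] / sum(TIMINGS.values())))
    print("gain with free prep+final: %.2f%%" % gain_if_free(TIMINGS, {"prepare", "final"}))
```

Making prep and final free yields about 0.29%, which rounds to the 0.3% quoted, while phase 2 alone accounts for a little over 82% of the batch time.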
Equitum
July 24, 2014, 05:37:54 AM
 #18208

I know this really doesn't have to do with ccminer discussions as much as nVidia mining/hashing in general, but I figured a good deal of people come to this thread to discuss the most profitable way to use nVidia power, so here goes: I've done a little trial run over the past day and a half of using Folding@Home to "mine" Curecoins, and I'm getting a pretty good payout (payout should be at about 30-35 or so Curecoins/day for my card's PPD; roughly 0.0035-0.0042 BTC/day at the current rate).
It certainly doesn't hurt that my GPU is folding proteins and helping researchers while using all that power, instead of doing random hashing, but even without those considerations, the profit margin speaks for itself (for reference, Bombadil's profit calculator shows my 3 most profitable mining options at: {TAG: VEIL | Name:Veilcoin | Algo: X13 | BTC/day: .00316889, TAG: PP9X11 | Name:Multipool X11 (PP) | Algo: X11 | BTC/day: .00302276, TAG: XMR | Name:Monero | Algo: CryptoNight | BTC/day: .00215359}).

I get about 250k PPD with my 780 Ti and i5-4670k, so folding might be more relevant to single-card/gaming rigs than to pure mining rigs, but looking into F@H couldn't hurt for other kinds of rigs (I'd be interested to see what a full 750 Ti rig with a mid-range processor could put out in terms of PPD)!
sp_
July 24, 2014, 06:43:06 AM
 #18209

So I take back that 64 bit builds are faster with x15. I just have so many different versions of ccminer at this point (~20 GB) that I ended up using a borked 32 bit version which used the GPU less and the CPU more than its 64 bit brother.

What we need is 32 bit addressing with 64 bit hashing. (Use 100% of the CUDA cores per cycle instead of 50%.) This is not done by changing the build target to 64 bit. Each hash needs to be re-implemented in CUDA asm from scratch.
Compute 3.0 allows at most 63 32-bit registers per thread, compute 3.5 allows 255, etc. But there is no speedup when compiling ccminer for 5.0. This means that the generated code is suboptimal and needs to be fine-tuned (preferably in 100% CUDA asm).
Remove latency, reduce register use, pipeline instructions, improve cache hits, etc.

Today each thread in ccminer is computing 1 hash by doing a full run-through of all algorithms:

x1->x2->x3->x4->x5->x6->x7->x8->x9->x10->x11

This is suboptimal

An FPGA implementation will run at the speed of the slowest x, thus eliminating the cost of the other x'es since they are done in parallel. We should do something similar.

The slowest algorithm for the 750TI is groestl. This algorithm is running at 7.5 MHASH on a single 750TI.

The target for the optimized GPU miner will be in the range 5-7.5 MHASH on the 750TI for x11(darkcoin).
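
A back-of-the-envelope Python sketch of the two throughput models being compared here. Only the 7.5 MHASH groestl figure comes from the post; the other ten stage rates are made-up placeholders, and the function names are illustrative.

```python
def sequential_rate(stage_rates):
    """One thread runs every stage in turn: per-hash time is the sum of stage times."""
    return 1.0 / sum(1.0 / rate for rate in stage_rates)


def pipelined_upper_bound(stage_rates):
    """Every stage has its own dedicated hardware (the FPGA case): the slowest stage sets the pace."""
    return min(stage_rates)


if __name__ == "__main__":
    # Hypothetical rates in MHASH: groestl at 7.5 (from the post), ten other stages at 20 (placeholder).
    rates = [7.5] + [20.0] * 10
    print("sequential: %.2f MHASH" % sequential_rate(rates))
    print("pipelined upper bound: %.2f MHASH" % pipelined_upper_bound(rates))
```

The upper bound assumes each stage gets its own hardware. On a GPU the stages share the same cores, so the slowest-stage rate is a ceiling rather than something reordering alone would deliver.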

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner
DougB62
July 24, 2014, 06:47:26 AM
 #18210

Now you're gettin' serious... and I like that!!  Grin
yellowduck2
July 24, 2014, 06:50:10 AM
 #18211


The target for the optimized GPU miner will be in the range 5-7.5 MHASH on the 750TI for x11(darkcoin).


Wow. This is big if it's true. It will more than double the speed of pretty much every algo that ccminer is using. Very big improvement.

This is the kind of improvement that is groundbreaking!

cbuchner1 aka Nvidia Satoshi, any comment about sp_'s theory?
Amph
July 24, 2014, 07:32:00 AM
 #18212

Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, result is spectacularly unimpressive. The best I've come up with breaks even with the current single thread per hash implementation. Well, almost. It's actually a percent slower AND loses compute 2.0 compatibility due to using shuffle. On the other hand it performs a lot more reasonably with various launch configurations, 15 blocks of 32 threads works out equally well as the original 8x60 magic bullet for 750 Ti.

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something else than a 750 Ti in would be willing to take it for a spin, I'd appreciate it. I've added the number for SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.
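
The rule of thumb above (block count a multiple of the SMX count, threads one of 4/8/16/32/64) can be turned into a list of configs to try. A minimal Python sketch; the function name and the `max_multiple` cutoff are made up, and the SM count of 5 used for the 750 Ti is an assumption, since the miner prints the real number at start-up.

```python
def candidate_launch_configs(sm_count, max_multiple=4, thread_options=(4, 8, 16, 32, 64)):
    """Return (blocks, threads) pairs with blocks a multiple of sm_count."""
    return [
        (sm_count * multiple, threads)
        for multiple in range(1, max_multiple + 1)
        for threads in thread_options
    ]


if __name__ == "__main__":
    # Assumed: 5 SMMs on a GTX 750 Ti.
    for blocks, threads in candidate_launch_configs(5):
        print("%d blocks of %d threads" % (blocks, threads))
```

With an SM count of 5, the "15 blocks of 32 threads" config from the post is in the list, and 60 blocks of 8 threads (how the 8x60 setting appears to read) shows up once `max_multiple` reaches 12.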

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

Also, any chances for this code to get released already? Or are you competing against Wolf0 Cheesy
It works like a charm, 220H/s for GTX760, before it was 190. GTX750TIs seem unchanged.



I get 270H(peaks of 297H with -l 8x50)  with this release and a GTX 760 overclocked -->v0.15-rc1 ccminer-cryptonight_20140723

Thanks for that launch setting Cheesy 306H/s (MSI gaming, +180core, +500mem). Still have to test what's the most stable, but thanks for giving me a start Wink

Ooh damn, you've released that a looong time ago, tsiv. Should've noticed ^^"

EDIT: 320H/s with +222core, +666mem Tongue I'm waiting anxiously for a driver crash Wink

Fantastic Bombadill...im on +180 core +300 Memory
If you find any better launch configs please post it
I also asked in the other thread if there are binaries for wolf nvidia xmr miner

Dunno, but I can't OC my cards, at least one of them keeps crashing if I do so. Did you change the power limit in the BIOS?
sp_
July 24, 2014, 07:34:53 AM
Last edit: July 24, 2014, 07:49:08 AM by sp_
 #18213

I'm pretty sure the output of the last algorithm is used as the input of the next one for X11, precisely so you can't do that.

Yes you can. If each thread is working on a different hash.

example
4 threads 4 hashes

HASH1: x1->x2->x3->
HASH2: x4->x5->x6->
HASH3: x7->x8->x9->
HASH4: x10->x11

Swap the 4 hashes

HASH4: x1->x2->x3->
HASH1: x4->x5->x6->
HASH2: x7->x8->x9->
HASH3: x10->x11

Swap the 4 hashes

HASH3: x1->x2->x3->
HASH4: x4->x5->x6->
HASH1: x7->x8->x9->
HASH2: x10->x11

Swap the 4 hashes

HASH2: x1->x2->x3->
HASH3: x4->x5->x6->
HASH4: x7->x8->x9->
HASH1: x10->x11

Complete
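
The rotation above can be written down as a small Python sketch. It assumes worker w always runs stage group w and the hashes shift by one worker at every swap, which is how the schedule reads; `rotation_schedule` and `visits` are illustrative names only.

```python
# Stage groups from the example: x1..x11 split across four workers.
STAGE_GROUPS = [("x1", "x2", "x3"), ("x4", "x5", "x6"), ("x7", "x8", "x9"), ("x10", "x11")]


def rotation_schedule(num_hashes=4):
    """One row per step; row[w] is the 0-based hash that worker w holds in that step."""
    workers = len(STAGE_GROUPS)
    return [[(w - step) % num_hashes for w in range(workers)] for step in range(num_hashes)]


def visits(schedule, hash_index):
    """Order in which one hash passes through the workers (stage groups)."""
    return [row.index(hash_index) for row in schedule]


if __name__ == "__main__":
    for step, row in enumerate(rotation_schedule()):
        line = ", ".join(
            "HASH%d: %s" % (h + 1, "->".join(STAGE_GROUPS[w])) for w, h in enumerate(row)
        )
        print("step %d: %s" % (step + 1, line))
```

In these four steps only HASH1 goes through x1 to x11 in order from the start; the other three are caught mid-flight, so the listing looks like a steady-state snapshot of the pipeline rather than a cold start.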


sp_
July 24, 2014, 07:47:35 AM
 #18214

I'm pretty sure the output of the last algorithm is used as the input of the next one for X11, precisely so you can't do that.
Yes you can. If each thread is working on a different hash.
Oh, I get it. Clever.
You're not going to raise the hash to that of the slowest alg, though, because the GPU is partially occupied by the other hashes going on. However, I see no reason why that won't work.

Yes, the GPU is occupied, but on separate and non-overlapping memory blocks.

The slowest alg can be optimized...

PVmining
July 24, 2014, 08:26:16 AM
 #18215

Just saw tsiv's parallelization of the second loop. Quite impressive.

...he's a cool guy.
Hey tsiv thanks a lot for your launch-config change for kopiemtu - that's really awesome!
sp_
July 24, 2014, 08:38:46 AM
 #18216

I haven't looked at TSIV's code. Isn't CryptoNight just a variation of x11 + scryptn? 20% gain is a good job. Now do another 20% Smiley

Anyway, I will start implementing some code soon. I will start with the 11 x'es. One by One.

bigjme
July 24, 2014, 08:48:45 AM
 #18217

ouch i am getting left behind, my mining rig has been off over a week and this thread just looks like a developer chatroom  Cool it is great to see so many of you all working together, who's this Christian guy that releases stuff? I've never seen him here  Wink

Owner of: cudamining.co.uk
yellowduck2
July 24, 2014, 09:14:47 AM
 #18218

ouch i am getting left behind, my mining rig has been off over a week and this thread just looks like a developer chatroom  Cool it is great to see so many of you all working together, who's this Christian guy that releases stuff? I've never seen him here  Wink

He is Nvidia Satoshi.

Retired behind the scenes.
S_tring
July 24, 2014, 09:16:52 AM
 #18219

Linux users might be pleased to know that the profit switching capability of ccManager is coming along nicely, too. It uses TradeMyBit for now, and I've just coded a facility to stop mining on TMB altogether if the daily profit projection is poor. In this case it switches to an alternative pool of your choice (last resort pool), or it stops mining altogether and monitors TMB for a decent profit margin before starting again.
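
The switching rule described above boils down to a three-way decision. A minimal Python sketch of that logic only; it is not ccManager's code, and `choose_action`, its parameters, the threshold value and the pool URL are all hypothetical.

```python
def choose_action(projected_btc_per_day, min_btc_per_day, last_resort_pool=None):
    """Decide what to do from TradeMyBit's projected daily profit."""
    if projected_btc_per_day >= min_btc_per_day:
        return ("mine", "TradeMyBit")
    if last_resort_pool is not None:
        # Profit is poor: fall back to the user's alternative pool.
        return ("mine", last_resort_pool)
    # No fallback configured: stop mining and keep polling for better margins.
    return ("idle", None)


if __name__ == "__main__":
    print(choose_action(0.0031, 0.0025))
    print(choose_action(0.0012, 0.0025, last_resort_pool="stratum+tcp://example.invalid:3333"))
    print(choose_action(0.0012, 0.0025))
```

A real manager would wrap this in a polling loop and add some hysteresis so it does not flip between pools on every small change in the projection.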

I should have the GitHub repo updated with something for you to play with some time next week.

yellowduck2
July 24, 2014, 09:18:02 AM
 #18220

I haven't looked at TSIV's code. Isn't CryptoNight just a variation of x11 + scryptn? 20% gain is a good job. Now do another 20% Smiley

Anyway, I will start implementing some code soon. I will start with the 11 x'es. One by One.

Do you mind me asking if you have a degree / master's / PhD in computer science? You spotted something that no one here understood at first. I see you had to explain it in so many posts before people got it.

Are you related to Satoshi ?