Bitcoin Forum
  Show Posts
81  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 11:30:10 AM


Grin

tsiv, you solomined 150 of the 600 total blocks so far? LOL, djm34's title of fastest instaminer in the cryptoworld should be given to you  Grin
Yep, I don't have that many  Grin
I solomined only 5950 (took me a while to get the R9 working, strangely...), but blocks are still coming...
(Actually there wasn't any instamine, or I arrived late to that coin... and blocks weren't coming that easily even when I had 1/2 of the net hashrate.)

ok there are still 331 unaccounted blocks  Grin

I was up and running with 290 MH/s roughly 2 minutes after the wallet was made available, I'd say that qualifies as insta Tongue

Sitting on 156 blocks solomined now and the diff is looking like it's time to switch to a pool.
82  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 10:14:06 AM
Code:
/* Every operand is read and written in place, so each one is declared
   as a read-write ("+r") output rather than a plain input. */
#define SUBCRUMB(a0,a1,a2,a3,a4)\
    asm( \
        "mov.b32    %4, %0;\n\t" \
        "or.b32     %0, %0, %1;\n\t" \
        "xor.b32    %2, %2, %3;\n\t" \
        "not.b32    %1, %1;\n\t" \
        "xor.b32    %0, %0, %3;\n\t" \
        "and.b32    %3, %3, %4;\n\t" \
        "xor.b32    %1, %1, %3;\n\t" \
        "xor.b32    %3, %3, %2;\n\t" \
        "and.b32    %2, %2, %0;\n\t" \
        "not.b32    %0, %0;\n\t" \
        "xor.b32    %2, %2, %1;\n\t" \
        "or.b32     %1, %1, %3;\n\t" \
        "xor.b32    %4, %4, %1;\n\t" \
        "xor.b32    %3, %3, %2;\n\t" \
        "and.b32    %2, %2, %1;\n\t" \
        "xor.b32    %1, %1, %0;\n\t" \
        "mov.b32    %0, %4;\n\t" \
        : "+r"(a0), "+r"(a1), "+r"(a2), "+r"(a3), "+r"(a4))

Massive +1 MH/s on 750 Ti  Cheesy

Well, at least with CUDA 5.5. No idea how that can actually be even a single bit faster than straight C, the compiler seems to do a piss poor job sometimes with simple statements like in that define.
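
For comparison, here is a minimal sketch of what the "straight C" form of that define would look like, transcribed one statement per PTX instruction from the block above. The function name subcrumb_ref is made up for this sketch and is not taken from any ccminer source; it is written as plain host-side C++ so it can be checked without a GPU.

```cpp
#include <cstdint>

// Plain C++ transcription of the PTX sequence above, one statement per
// instruction. subcrumb_ref is a hypothetical name used only for this sketch.
inline void subcrumb_ref(std::uint32_t &a0, std::uint32_t &a1, std::uint32_t &a2,
                         std::uint32_t &a3, std::uint32_t &a4)
{
    a4  = a0;    // mov.b32 %4, %0
    a0 |= a1;    // or.b32  %0, %0, %1
    a2 ^= a3;    // xor.b32 %2, %2, %3
    a1  = ~a1;   // not.b32 %1, %1
    a0 ^= a3;    // xor.b32 %0, %0, %3
    a3 &= a4;    // and.b32 %3, %3, %4
    a1 ^= a3;    // xor.b32 %1, %1, %3
    a3 ^= a2;    // xor.b32 %3, %3, %2
    a2 &= a0;    // and.b32 %2, %2, %0
    a0  = ~a0;   // not.b32 %0, %0
    a2 ^= a1;    // xor.b32 %2, %2, %1
    a1 |= a3;    // or.b32  %1, %1, %3
    a4 ^= a1;    // xor.b32 %4, %4, %1
    a3 ^= a2;    // xor.b32 %3, %3, %2
    a2 &= a1;    // and.b32 %2, %2, %1
    a1 ^= a0;    // xor.b32 %1, %1, %0
    a0  = a4;    // mov.b32 %0, %4
}
```

Note that a4 is overwritten by the first statement, so its incoming value never matters; it only serves as a scratch register.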
83  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 09:57:25 AM

Mine. Butchered up a quick SPH port pre-launch and it just happened to work right out of the box. I hadn't been following the other algos and didn't realize there was a variation of the luffa512 kernel in djm's repo that handles 80 input bytes. I couldn't figure out how to mod the one in x11 to take 80 bytes of input instead of 64, so yet another ugly SPH copy&paste job it was. I expect djm's will be the faster of the two, compiling it as we speak to confirm.

What speeds did you reach with the 750 Ti? Smiley

290 MH/s with a 6 card rig so about 48.3 MH/s each on average. Still stuck compiling djm's version, I did read it compiles slow on CUDA 5.5 but dddddddddddddaaaaaaaaaaaaaamn....

Edit: Compilation finished. No surprises there, djm's is indeed faster by roughly 17%

1 card gives you 48 MH/s??? Isn't it supposed to be 8 MH/s? You mean one rig gives you 48 MH/s?

Pretty sure you're confusing it with some other coin. DOOM is just a single round of luffa-512 and runs at about 48-49 MH/s using a poor implementation on my 750 Ti. Djm's version bumped it up to the 56 MH/s area. Per card.
84  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 09:21:12 AM

Mine. Butchered up a quick SPH port pre-launch and it just happened to work right out of the box. I hadn't been following the other algos and didn't realize there was a variation of the luffa512 kernel in djm's repo that handles 80 input bytes. I couldn't figure out how to mod the one in x11 to take 80 bytes of input instead of 64, so yet another ugly SPH copy&paste job it was. I expect djm's will be the faster of the two, compiling it as we speak to confirm.

What speeds did you reach with the 750 Ti? Smiley

290 MH/s with a 6 card rig so about 48.3 MH/s each on average. Still stuck compiling djm's version, I did read it compiles slow on CUDA 5.5 but dddddddddddddaaaaaaaaaaaaaamn....

Edit: Compilation finished. No surprises there, djm's is indeed faster by roughly 17%
85  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 07:50:34 AM

Mine. Butchered up a quick SPH port pre-launch and it just happened to work right out of the box. I hadn't been following the other algos and didn't realize there was a variation of the luffa512 kernel in djm's repo that handles 80 input bytes. I couldn't figure out how to mod the one in x11 to take 80 bytes of input instead of 64, so yet another ugly SPH copy&paste job it was. I expect djm's will be the faster of the two, compiling it as we speak to confirm.
86  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 07:34:15 AM
Can someone help me with setting up a 750 Ti for mining XMR? I have it running but it seems to only do 23 H/s.
340.43 beta drivers
win 8
newest CCminer done by tsiv.

i have no idea as this is my first Nvidia build.

23 H/s sounds like there's something massively wrong indeed. I can't really see how it would work but run that poorly, you could always try cayars' nvminer build.

is that version linked on the first page?

If you mean the first page of this thread... Nope, it's pretty out of date for anything. Latest nvMiner seems to be up at http://www.cudamining.cc/url/releases
87  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 07:18:02 AM
Can someone help me with setting up a 750 Ti for mining XMR? I have it running but it seems to only do 23 H/s.
340.43 beta drivers
win 8
newest CCminer done by tsiv.

i have no idea as this is my first Nvidia build.

23 H/s sounds like there's something massively wrong indeed. I can't really see how it would work but run that poorly, you could always try cayars' nvminer build.
88  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 29, 2014, 07:13:54 AM


Grin
89  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 26, 2014, 09:34:39 AM
750Ti @ 1300/1500 gives ~ 270

mine are at 1350(stock) 1450(+200) and only 235h per card...

using this https://github.com/tsiv/ccminer-cryptonight
Palit DualX (1300/1500 stock) makes 270 using
-l 8x40 switch

still 235, are you using beta drivers? i'm at 337.88

8x60 seems to give the best hash rates on a 750 Ti. If you're on Windows you're also slightly limited by the default bfactor 6, you could try lowering it and the default bsleep of 100 microseconds but your interactivity will take a hit.

My rig has 6x Palit GeForce GTX 750 Ti StormX Dual, each hitting about 280 H/s at the default factory overclock using 8x60. The piece of shit Asus 750 Ti DC2OCwhatever on my win machine does like 244. Marvelous piece of engineering by Asus: take 6 GHz memory chips and fuck something up so badly that they can't run beyond 5.4 GHz.

8x60 is better, yes. I'm getting 240 now or slightly more. How far can I lower bfactor and bsleep?

Is Nvidia like AMD with Hynix and Elpida memory? That could explain why some cards are better than others.


p.s. tried 6 and 66, only 5 H/s more: 245 H/s per card

You can probably get away with bfactor 1, maybe even 0. If you're not concerned about interactivity I'm pretty sure you can run bsleep 0. The way it works is described in the help text (and maybe readme, can't remember) but here's the quick recap:

The bfactor option controls how much of the biggest main loop is done in a single kernel launch. That particular kernel gets split into 2^bfactor pieces. At bfactor 0 (default on Linux because you don't need to worry about your OS being a dick and resetting the display driver because your CUDA kernel is taking too long to run) you get 2^0 = 1 meaning doing the whole thing in a single launch. At 1 you get 2^1 = 2 parts and at the Windows default 6 it gets split into 2^6 = 64 parts. 6 seems to be a reasonable balance between interactivity and performance. Of course if you're running more than one card you only need to worry about interactivity on your primary display GPU. So you could run --bfactor 6,1 which would make the primary GPU run the 64 part split and be fairly interactive while the rest of your cards would run a 2 part split and spend a little more time working instead of sleeping.

After typing that I just realized how insignificant the effect of the interactivity hack is after all Tongue It's 0.0064 seconds added to the roughly 1.5 second run time of the second loop, not really a big deal. Well, there's probably some overhead for each kernel launch but it's still not much.

And then we have --bsleep, the amount of time to wait before launching the kernel for the next part of the split. Windows defaults of bfactor 6 and bsleep 100 leave you with 64 parts and 100 microseconds of doing nothing after each part, for a total of 0.0064 seconds spent sleeping instead of working. Well, there's probably some overhead involved with each kernel launch too. Might be just the fact that Windows is doing stuff on the GPU that's slowing it down.
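
The arithmetic in the recap above can be sketched in a few lines of host-side C++. This only restates the numbers from the post (2^bfactor parts, one bsleep pause per part); the function names are made up for illustration and are not from the ccminer-cryptonight source.

```cpp
#include <cassert>
#include <cstdint>

// Number of kernel launches the big main loop is split into: 2^bfactor.
inline std::uint64_t split_parts(unsigned bfactor)
{
    assert(bfactor < 64);               // keep the shift well defined
    return std::uint64_t{1} << bfactor;
}

// Total time spent sleeping per pass, in microseconds, assuming one
// bsleep pause after every part (as the post describes).
inline std::uint64_t total_sleep_us(unsigned bfactor, std::uint64_t bsleep_us)
{
    return split_parts(bfactor) * bsleep_us;
}

// Sleep time as a fraction of the loop's run time. loop_seconds is supplied
// by the caller, e.g. the "roughly 1.5 second" figure quoted in the post.
inline double sleep_fraction(unsigned bfactor, std::uint64_t bsleep_us,
                             double loop_seconds)
{
    return (static_cast<double>(total_sleep_us(bfactor, bsleep_us)) * 1e-6)
           / loop_seconds;
}
```

With the Windows defaults, total_sleep_us(6, 100) gives 6400 microseconds, which is the 0.0064 seconds mentioned above, or roughly 0.4% of a 1.5 second loop. Kernel launch overhead is not modelled here.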

What a useless post.

PS. The temperature in my apartment just hit 35C, fuck my balls and fuck trying to think straight in this shit Grin
90  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 26, 2014, 09:09:09 AM
750Ti @ 1300/1500 gives ~ 270

mine are at 1350(stock) 1450(+200) and only 235h per card...

using this https://github.com/tsiv/ccminer-cryptonight
Palit DualX (1300/1500 stock) makes 270 using
-l 8x40 switch

still 235, are you using beta drivers? i'm at 337.88

8x60 seems to give the best hash rates on a 750 Ti. If you're on Windows you're also slightly limited by the default bfactor 6, you could try lowering it and the default bsleep of 100 microseconds but your interactivity will take a hit.

My rig has 6x Palit GeForce GTX 750 Ti StormX Dual, each hitting about 280 H/s at the default factory overclock using 8x60. The piece of shit Asus 750 Ti DC2OCwhatever on my win machine does like 244. Marvelous piece of engineering by Asus: take 6 GHz memory chips and fuck something up so badly that they can't run beyond 5.4 GHz.

tsiv, you may want to pull my keccak implementation - getting a lot of reports it works better on Kepler.

da70729, fd9d114 and f26efdb, right? Pulled them into my local repo, building right now and will push to my git repo if everything seems fine. As far as I can tell, the last commit doesn't really do anything useful though. Apparently __CUDA_ARCH__ is only defined when nvcc (or cudafe++, whatever) is processing device code and the define (like pretty much every single arch based define in ccminer) is in the host code, meaning you'll always hit the else branch on #if __CUDA_ARCH__ >= something. I might be wrong on this, haven't had the time to see what actually goes in the final product but on the other hand I've never seen any improvement on these compute based specific macros either which kind of supports the fact that they tend to default to the "ah well, we'll use the crappy default thingy then" branch.

I think my last commit was making the scratchpad pointer restricted - seemed to provide a small boost even on Maxwell.

EDIT: Also, any thoughts on the memory allocation bug?

I prefer to think of it more as a feature Cheesy

Yea, I know it's a bitch. One of the last ideas I've got left for the first and third main loops involves rearranging the scratchpad layout for better access patterns instead of 2MB strides between hashes, and at first glance I think having separate scratchpads would fuck that up. If that doesn't pan out I'll probably turn the single 2MB*hashcount allocation into hashcount 2MB allocations like you did, maybe make it an option. I tend to forget how much the single alloc can mess with stuff because I'm running on a headless rig that simply doesn't use the video memory for anything and lets me pretty much use the full 2GB.
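
To put numbers on the single allocation, here is a small host-side sketch. It assumes the hash count is simply threads times blocks from the launch config (as in "32x15 hashes" elsewhere in the thread) and that each hash uses the 2 MiB CryptoNight scratchpad; the names are hypothetical and not from the miner's source.

```cpp
#include <cstddef>

// CryptoNight uses a 2 MiB scratchpad per hash.
constexpr std::size_t kScratchpadBytes = 2u * 1024u * 1024u;

// Hashes in flight for a launch config of threads x blocks (assumption:
// one scratchpad per hash, hash count = threads * blocks).
constexpr std::size_t hash_count(unsigned threads, unsigned blocks)
{
    return static_cast<std::size_t>(threads) * blocks;
}

// Size of the single contiguous 2MB*hashcount allocation.
constexpr std::size_t single_alloc_bytes(unsigned threads, unsigned blocks)
{
    return hash_count(threads, blocks) * kScratchpadBytes;
}

// Byte offset of one hash's scratchpad inside that allocation, i.e. the
// 2MB stride between hashes mentioned in the post.
constexpr std::size_t scratchpad_offset(std::size_t hash_index)
{
    return hash_index * kScratchpadBytes;
}

static_assert(single_alloc_bytes(8, 60) / (1024u * 1024u) == 960,
              "8x60 needs one 960 MiB block");
```

So an 8x60 config asks for one contiguous 960 MiB block, which is easy on a headless 2GB card and plausibly much harder on a card that is also driving a display.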
91  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 26, 2014, 08:57:07 AM
Pulled Wolf's keccak code, not seeing any difference on Maxwell but if it helps Kepler/Fermi, I'm all for it.

Win32 binary at https://github.com/tsiv/ccminer-cryptonight/releases/tag/v0.16 and source obviously at https://github.com/tsiv/ccminer-cryptonight/
92  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 25, 2014, 04:39:40 PM

mine are at 1350(stock) 1450(+200) and only 235h per card...
using this https://github.com/tsiv/ccminer-cryptonight


linux seems to be faster...
OC cards make about 280-290 H/s here.
nonOC 240-250 H/s


Running on Linux, and I get 230 H/s @ 1350 MHz core/stock memory on a 750 Ti.
But I have a low-end CPU (cheap Haswell @ $50), is that probably the cause?

model name      : Intel(R) Celeron(R) CPU G1820 @ 2.70GHz

Highly doubt that your CPU is holding you back. I'd say it's the memory clock you need to worry about.
93  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 25, 2014, 04:37:08 PM
750Ti @ 1300/1500 gives ~ 270

mine are at 1350(stock) 1450(+200) and only 235h per card...

using this https://github.com/tsiv/ccminer-cryptonight
Palit DualX (1300/1500 stock) makes 270 using
-l 8x40 switch

still 235, are you using beta drivers? i'm at 337.88

8x60 seems to give the best hash rates on a 750 Ti. If you're on Windows you're also slightly limited by the default bfactor 6, you could try lowering it and the default bsleep of 100 microseconds but your interactivity will take a hit.

My rig has 6x Palit GeForce GTX 750 Ti StormX Dual, each hitting about 280 H/s at the default factory overclock using 8x60. The piece of shit Asus 750 Ti DC2OCwhatever on my win machine does like 244. Marvelous piece of engineering by Asus: take 6 GHz memory chips and fuck something up so badly that they can't run beyond 5.4 GHz.

tsiv, you may want to pull my keccak implementation - getting a lot of reports it works better on Kepler.

da70729, fd9d114 and f26efdb, right? Pulled them into my local repo, building right now and will push to my git repo if everything seems fine. As far as I can tell, the last commit doesn't really do anything useful though. Apparently __CUDA_ARCH__ is only defined when nvcc (or cudafe++, whatever) is processing device code and the define (like pretty much every single arch based define in ccminer) is in the host code, meaning you'll always hit the else branch on #if __CUDA_ARCH__ >= something. I might be wrong on this, haven't had the time to see what actually goes in the final product but on the other hand I've never seen any improvement on these compute based specific macros either which kind of supports the fact that they tend to default to the "ah well, we'll use the crappy default thingy then" branch.
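
The host-side behaviour described above can be reproduced without a GPU. The sketch below is plain C++: since __CUDA_ARCH__ is not defined outside device compilation, the preprocessor treats it as 0 and the #else branch is taken. The runtime helper is only one possible alternative, assuming the compute capability is read from the device at start-up (the CUDA runtime reports it in cudaDeviceProp::major and cudaDeviceProp::minor); all names here are made up for illustration.

```cpp
// In host code __CUDA_ARCH__ is undefined, and an undefined identifier in
// an #if expression evaluates to 0, so this always lands in the #else branch.
#if __CUDA_ARCH__ >= 300
constexpr bool kHostTookArchBranch = true;
#else
constexpr bool kHostTookArchBranch = false;
#endif

// Runtime alternative (an assumption, not what ccminer does): decide from the
// device's compute capability. The two numbers are plain parameters here so
// the sketch compiles without CUDA; 3.0 maps to 300, 3.5 to 350, 5.0 to 500.
constexpr bool arch_at_least(int cc_major, int cc_minor, int min_arch)
{
    return cc_major * 100 + cc_minor * 10 >= min_arch;
}

static_assert(!kHostTookArchBranch, "host code never sees __CUDA_ARCH__");
```

Host code that needs to pick a code path per card would call something like arch_at_least(major, minor, 300) instead of relying on the macro.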
94  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 25, 2014, 02:49:59 PM
750Ti @ 1300/1500 gives ~ 270

mine are at 1350(stock) 1450(+200) and only 235h per card...

using this https://github.com/tsiv/ccminer-cryptonight
Palit DualX (1300/1500 stock) makes 270 using
-l 8x40 switch

still 235, are you using beta drivers? i'm at 337.88

8x60 seems to give the best hash rates on a 750 Ti. If you're on Windows you're also slightly limited by the default bfactor 6, you could try lowering it and the default bsleep of 100 microseconds but your interactivity will take a hit.

My rig has 6x Palit GeForce GTX 750 Ti StormX Dual, each hitting about 280 H/s at the default factory overclock using 8x60. The piece of shit Asus 750 Ti DC2OCwhatever on my win machine does like 244. Marvelous piece of engineering by Asus: take 6 GHz memory chips and fuck something up so badly that they can't run beyond 5.4 GHz.
95  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 24, 2014, 05:07:13 AM
Something I pretty much suspected but never bothered to check up on: run times for the various parts of the hash. Well, actually I did benchmark the core loops earlier and found the second one to be the biggest hog. Throw in the numbers for the prep and final phases and you get this:

Prepare: 0.001388 sec
Phase 1: 0.148383 sec
Phase 2: 1.414880 sec
Phase 3: 0.147834 sec
Final: 0.003590 sec

That's 32x15 hashes on a GTX 750 Ti. Can't tell how it works out on other cards since all I've got is a bunch of 750 Tis, but in this case optimizing the living fuck out of the prep and final parts all the way to instant completion with zero run time would bump up the total hashrate by 0.3%. Don't get me wrong, Wolf's doing nice work on unfucking stuff I pretty much just yanked out of cpuminer-multi and left as is. I just prefer to focus on shit that matters, again, no offense intended. Too bad I'm not even making a dent on that goddamn clusterfuck that is the second main loop Grin
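
The 0.3% figure can be checked directly from the timings above. This is just a worked calculation in host-side C++ using the five numbers from the post and the 32x15 = 480 hashes per pass; the names are made up for the sketch.

```cpp
// Phase timings from the post, in seconds, for one pass of 32x15 hashes.
constexpr double kPrepare = 0.001388;
constexpr double kPhase1  = 0.148383;
constexpr double kPhase2  = 1.414880;
constexpr double kPhase3  = 0.147834;
constexpr double kFinal   = 0.003590;
constexpr int    kHashes  = 32 * 15;

// Total run time of one pass.
constexpr double total_seconds()
{
    return kPrepare + kPhase1 + kPhase2 + kPhase3 + kFinal;
}

// Hashrate for a pass that takes the given number of seconds.
constexpr double hashrate(double seconds)
{
    return kHashes / seconds;
}

// Relative hashrate gain if prepare and final took zero time.
constexpr double gain_if_prep_and_final_free()
{
    return total_seconds() / (total_seconds() - kPrepare - kFinal) - 1.0;
}

// Share of the total run time spent in phase 2.
constexpr double phase2_share()
{
    return kPhase2 / total_seconds();
}
```

The pass takes about 1.716 seconds, which works out to just under 280 H/s. Zeroing out prepare and final gains about 0.29%, while phase 2 alone accounts for a bit over 82% of the run time.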
96  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 24, 2014, 01:51:38 AM
Note to self: __CUDA_ARCH__ is a fickle bitch.

I think I got the damn thing to use the new 4-way version of the phase 2 kernel for compute 3.0+ and the old one for 2.0. Since __CUDA_ARCH__ is apparently not defined when compiling the host code I didn't see much choice but to fire up the kernel with four threads per hash even if it's the single thread per hash compute 2.0 version. Dealt with it by making the single thread kernel do work only on the first of the four subthreads. Not very happy with it but it doesn't seem to matter that much performance-wise.
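
A rough host-side model of that workaround, for illustration only. It assumes the four subthreads of a hash sit next to each other in the linear thread index, which the post does not actually state; the struct and function names are hypothetical.

```cpp
// Assumed layout: four consecutive threads belong to one hash.
constexpr unsigned kThreadsPerHash = 4;

struct SubThread {
    unsigned hash;  // which hash this thread belongs to
    unsigned sub;   // position within the hash's group of four (0..3)
};

// Map a linear thread index to (hash, subthread) under the assumed layout.
constexpr SubThread map_thread(unsigned tid)
{
    return { tid / kThreadsPerHash, tid % kThreadsPerHash };
}

// The 4-way kernel uses all four subthreads. The single-thread-per-hash
// (compute 2.0) kernel is launched with the same shape, so only subthread 0
// does any work and the other three idle.
constexpr bool does_work(unsigned tid, bool four_way_kernel)
{
    return four_way_kernel || map_thread(tid).sub == 0;
}
```

In this model three out of four threads in the old kernel do nothing, which matches the "not very happy with it" verdict even if the measured cost is small.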

Bottom line: Fuck all difference on Maxwell, apparently some other compute 3.0+ cards like the new 4-way kernel and gain some performance, compute 2.0 should work like before.

I'll look into pulling some of Wolf's mods, also got some ideas for the phase 1&3 kernels but we'll see.

Win32 binary at https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15/ccminer-cryptonight_20140724.zip
97  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 23, 2014, 02:17:03 PM
tsiv,

Wouldn't you want to have the block size a multiple of 32?  Ie 32,64,96,128

Ye, full warps do sound tasty. We're starting to get there too. The launch config isn't exactly about threads per block anymore, the kernels are starting to use more than one thread per hash and the launch config is actually hashes per block and blocks per grid. For example the kernels I modified earlier are now running eight threads per hash, so they're actually already at full warp size at four hashes per block. The latest experimental build takes the slowest kernel that is running only a single thread per hash on the latest committed source and spreads it out between four threads per hash. Again, full warp at eight hashes per block while four hashes per block remains kinda iffy.
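
The warp arithmetic in that reply, as a small host-side sketch. The warp size of 32 is the standard figure for NVIDIA GPUs; the function names are made up for illustration.

```cpp
// Threads per warp on NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Real threads per block once a kernel uses several threads per hash.
constexpr unsigned threads_per_block(unsigned hashes_per_block,
                                     unsigned threads_per_hash)
{
    return hashes_per_block * threads_per_hash;
}

// True when the block consists of whole warps only.
constexpr bool fills_whole_warps(unsigned hashes_per_block,
                                 unsigned threads_per_hash)
{
    return threads_per_block(hashes_per_block, threads_per_hash) % kWarpSize == 0;
}

static_assert(fills_whole_warps(4, 8),  "8 threads/hash: full warp at 4 hashes/block");
static_assert(fills_whole_warps(8, 4),  "4 threads/hash: full warp at 8 hashes/block");
static_assert(!fills_whole_warps(4, 4), "4 threads/hash at 4 hashes/block is half a warp");
```

This is why a launch config like 8x60 can already be warp-friendly for the eight-thread kernels even though 8 is not a multiple of 32.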
98  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 23, 2014, 12:19:07 PM
At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something other than a 750 Ti in them would be willing to take it for a spin, I'd appreciate it. I've added the number of SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip

Improved hashrate by about 70 H/s on a 780 Ti. Up from 320 to about 390 (using 8x60). Also doesn't seem to hang and bring the system to its knees when using all GFX cards.

Seems to be in line with the ~18% improvements I saw when benchmarking only the AES part of the kernel. Have you tried other configs? 390 is still pretty low for a 780 Ti, I think people were getting best results with 4x120 on the 780 Ti.
99  Alternate cryptocurrencies / Bounties (Altcoins) / Re: Bounty for Open-Sourced XMR/Cryptonight GPU Miner Bounties Thread on: July 23, 2014, 05:37:34 AM
As of today I'll be considering the bounty for the Nvidia GPU miner to be fulfilled by Tsiv. It seems as if it's in widespread use and there are few problems. Thank you very much for your effort!

Tsiv: The bounty addresses below will be used. It seems that the BTC part (.2 BTC) has already been paid by HardwarePal.

XMR: 42uasNqYPnSaG3TwRtTeVbQ4aRY3n9jY6VXX3mfgerWt4ohDQLVaBPv3cYGKDXasTUVuLvhxetcuS16ynt85czQ48mbSrWX
BTC: 1JHDKp59t1RhHFXsTw2UQpR3F9BBz3R3cs


There will be no collection, and everyone who has pledged should just send to those addresses and then post the tx info here if you'd like.

-updated by kbm

Just popping in to confirm having received the BTC part from HardwarePal a while back and the 150 and 300 XMR from kbm and smooth respectively. Cheers, gents.
100  Alternate cryptocurrencies / Mining (Altcoins) / Re: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX] on: July 23, 2014, 05:27:44 AM
Welp. Managed to split the most offensive part of the kernel into four parallel threads per hash, and the result is spectacularly unimpressive. The best I've come up with breaks even with the current single thread per hash implementation. Well, almost. It's actually a percent slower AND loses compute 2.0 compatibility due to using shuffle. On the other hand it performs a lot more reasonably with various launch configurations, 15 blocks of 32 threads works out equally well as the original 8x60 magic bullet for 750 Ti.

At this point I'm starting to think I'll just forget about that part and start looking if there's something else to be improved. I'm still curious as to how it runs on other hardware, so if a couple of gents on Win boxes with something other than a 750 Ti in them would be willing to take it for a spin, I'd appreciate it. I've added the number of SMX/SMM/Whateverthingmabobs into the miner thread start-up info, you'll probably find your card performing best when the block count is a multiple of the SMX count and the number of threads a power of 2. 4/8/16/32/64 are the best bets.

https://github.com/tsiv/ccminer-cryptonight/releases/download/v0.15-rc1/ccminer-cryptonight_20140723_exp.zip
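
The rule of thumb above (block count a multiple of the SM count, thread count a power of 2) is easy to check before trying a config. The sketch below assumes the -l value has the form threads x blocks, as the "15 blocks of 32 threads" and "8x60" examples suggest. The parser and helper names are hypothetical and not taken from the miner, and the SM count has to be supplied by the caller, for example from the start-up info the miner prints.

```cpp
#include <cstdio>

struct LaunchConfig {
    unsigned threads;  // threads (hashes) per block
    unsigned blocks;   // blocks per grid
};

// Parse a "TxB" string such as "8x60". Returns false on malformed input.
inline bool parse_launch(const char *s, LaunchConfig &out)
{
    unsigned t = 0, b = 0;
    char tail = 0;
    // Exactly two numbers must convert; a third match means trailing junk.
    if (std::sscanf(s, "%ux%u%c", &t, &b, &tail) != 2 || t == 0 || b == 0)
        return false;
    out = {t, b};
    return true;
}

constexpr bool is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// Rule of thumb from the post: blocks a multiple of the SM count,
// threads a power of 2.
inline bool follows_rule_of_thumb(const LaunchConfig &c, unsigned sm_count)
{
    return sm_count != 0 && c.blocks % sm_count == 0 && is_power_of_two(c.threads);
}
```

A GTX 750 Ti has 5 SMMs, so both 8x60 and 32x15 pass the check; a GTX 780 Ti has 15 SMX units, so 4x120 passes as well. It is only a heuristic for narrowing down candidates, the actual best config still has to be found by benchmarking.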