Bitcoin Forum
February 19, 2019, 10:25:10 PM *
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3403448 times)
cbuchner1
January 06, 2014, 02:12:51 AM
Last edit: January 06, 2014, 02:36:53 AM by cbuchner1
 #2021


EDIT: I was trying to use the tesla kernel. It doesn't appear to like Jane. I can squeeze out 2.3 KH/s on my gtx 780 using K9x2. Going to keep tinkering.

2.4 kHash with K4x4 on GTX 660Ti, I also use -C 1

Tesla, Fermi and Legacy Kernels don't do scrypt-jane yet.

"GPU #0: Given launch config 'K4x4' exceeds limits for 1D cache." No warnings about -C 2, but the results don't validate.

6x3 and 9x2 with -C 0 give me the best results at ~2.3 and ~2.35 kh/s. Unfortunately my machine becomes annoyingly slow with any scrypt-jane config as it stands. With K9x2 I have 40MB of free gpu memory, with K1x1 over 2GB are free, yet my computer still approaches unusable, and as bathrobehero said, -i 1 doesn't seem to help either.

I'm thinking of picking up a 4GB GT 630 for $30 to play around with. It seems to match your criteria for being efficient. 96 shaders, 128 bit bus, lots of ram. I'm curious what kind of hash rate it can pull, but mostly I want to play around with scrypt-jane without crippling my dev box.

On another note (regarding regular scrypt), I've just been using the 12-18-2013 commit from github until yesterday... but with the changes from the 20th (autotune up to 32 warps) I get another 20 kh/s with T15x32. I played around with the Kepler kernel and noticed -C 1 adds nearly 50 kh/s. Is the texture cache a possibility for Tesla kernels in the future?

cbuchner1, you are a beast!
I concur. I will have to find some LTC to send your way to show my appreciation.

4GB GT 630 for $30 : wow! good price. I have yet to enable the Fermi kernel for scrypt-jane though.

Tesla kernels don't need to explicitly enable a texture for cached reading, as they automatically pull their data through this cache (look up what the __ldg intrinsics do in the latest CUDA programming guide)

I might try to figure out a way to chop the scrypt-jane kernels into a series of smaller kernel launches, which may make it less taxing on the display and also allow the use of interactive mode again.

Titan kernels are now scrypt-jane enabled! I get 3.2 kHash/s on GTX 780Ti using -l T7x3 now. And power use is cut in half compared to LTC mining. What a pity the 780Ti doesn't have 6 Gigs of RAM, or I could use -l T14x3, doubling the speed. Someone should try this launch config with the 6 GB Geforce Titan models though. Could yield some 6 kHash/s.

I also have a crazy idea that would basically remove the memory limitations for scrypt-jane mining. It requires joining the A and B kernels into a single kernel again and re-using the scratchpad memory on the GPU. So instead of giving each thread a unique 4 MB scratchpad, we may be able to reuse the same scratchpad memory for all non-concurrently executed thread blocks. I think this is similar to what the "intensity" parameter controls on ATI cards when running cgminer. Unfortunately this idea might be incompatible with the texture cache, as this cache does not guarantee read/write coherency within a single kernel invocation. But hey, it could get my 780Ti's to 6 kHash/s...maybe.
EDIT: Okay, I made a mistake in my thinking here. With so few thread blocks running on the GPU, ALL of them would be executing concurrently. Hence the memory reuse concept falls flat.

Christian




cbuchner1
January 06, 2014, 02:15:45 AM
 #2022

Ahh ok. If I can figure out how to read the benchmarks and get my gpu crunching at the 880khash/s the benchmark says, I will gladly donate some litecoins

I also got a few suspiciously high readings on autotune, but they never quite materialized in real world hashing. Autotune is a bitch.


cdogster
January 06, 2014, 02:19:46 AM
 #2023

3.2 kHash/s on GTX 780Ti

3.2 kHash/s?  Is that a typo?  Or perhaps that's the increase you're seeing now?  Sorry, I'm confused.
cbuchner1
January 06, 2014, 02:22:35 AM
 #2024

3.2 kHash/s on GTX 780Ti

3.2 kHash/s?  Is that a typo?  Or perhaps that's the increase you're seeing now?  Sorry, I'm confused.

scrypt-jane. that is a STELLAR value (considering the memory limitations of the card - it cannot run enough threads to fully occupy all the multiprocessors because each thread requires 4MB of RAM on the card).

With the Kepler kernel I was getting only 2.5 kHash/s at -l K7x3. By the way, my GTX 660Ti got 2.5 kHash/s too.  High end cards have too many shaders and not enough RAM for scrypt-jane.
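
To make the memory limit concrete, here is a rough sketch of the arithmetic. It assumes that a launch config such as T7x3 means 7 thread blocks of 3 warps each with 32 threads per warp, and it uses the 4 MB per-thread scratchpad figure mentioned above. The function names are made up for illustration; this is not cudaMiner's actual code.

```python
# Rough estimate of the scratchpad memory a scrypt-jane launch config needs.
# Assumptions (not taken from cudaMiner's source): "T7x3" = 7 blocks x 3 warps,
# 32 threads per warp, and one 4 MB scratchpad per thread.
import re

WARP_SIZE = 32        # threads per warp on NVIDIA GPUs
SCRATCHPAD_MB = 4     # per-thread scratchpad size quoted in the thread


def parse_launch_config(cfg):
    """Split a config like 'T7x3' into (kernel letter, blocks, warps per block)."""
    match = re.fullmatch(r"([A-Z])(\d+)x(\d+)", cfg)
    if match is None:
        raise ValueError("unrecognized launch config: " + cfg)
    return match.group(1), int(match.group(2)), int(match.group(3))


def scratchpad_mb(cfg):
    """Total scratchpad memory in MB if every thread gets its own buffer."""
    _, blocks, warps = parse_launch_config(cfg)
    return blocks * warps * WARP_SIZE * SCRATCHPAD_MB


for cfg in ("T7x3", "T14x3"):
    print(cfg, scratchpad_mb(cfg), "MB")
```

Under these assumptions T7x3 needs 2688 MB, which fits on a 3 GB card, while T14x3 needs 5376 MB and would only fit on a 6 GB card.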


bigjme
January 06, 2014, 02:25:50 AM
 #2025

So I'm best off ignoring the high hash rate benchmarks?
For now I have a random value set because autotune is messing up so badly.

I tried T11x13 on my 780. The benchmark says it gives 880 kH/s, but it actually gives around 440, while I am able to get 503 from another config. I cannot figure out which to use because the benchmark values are so far off.

Owner of: cudamining.co.uk
cbuchner1
January 06, 2014, 02:26:39 AM
 #2026

So I'm best off ignoring the high hash rate benchmarks?
For now I have a random value set because autotune is messing up so badly.

I tried T11x13 on my 780. The benchmark says it gives 880 kH/s, but it actually gives around 440, while I am able to get 503 from another config. I cannot figure out which to use because the benchmark values are so far off.

-l T12x32 should be best on a 780

Christian
bathrobehero
January 06, 2014, 02:30:07 AM
 #2027

So I'm best off ignoring the high hash rate benchmarks?
For now I have a random value set because autotune is messing up so badly.

I tried T11x13 on my 780. The benchmark says it gives 880 kH/s, but it actually gives around 440, while I am able to get 503 from another config. I cannot figure out which to use because the benchmark values are so far off.

I guess you have overclocks. Autotune doesn't seem to like that.
Anyway, cudaminer uses scrypt by default, which is what you're using; scrypt-jane is something else.

I just mined my first Yacoin block SOLO. One 660Ti plus 2 GT 640 cards add up to 4.5 kHash/s, which is significant hashing power for Yacoin (the whole Yacoin network is around 1000-1500 kHash/s only, with blocks being generated once per minute.)

Some scrypt-jane hashrates for comparison:
i5 3570k - 0.41 kH/s
i7-3930K - 1.20 kH/s
FX-8350 - 0.57 kH/s

5770 - 1.0 kH/s
7950 - 1.3-1.5 kH/s
R7 250 - 1.44 kH/s
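
For a sense of what 4.5 kHash/s means for solo mining, here is a quick back-of-the-envelope sketch. It assumes the chance of finding a block is simply proportional to your share of the network hashrate and uses the figures quoted above (a 1000-1500 kHash/s network, one block per minute). The function name is hypothetical.

```python
# Expected solo-mined blocks per day, assuming block finds are proportional
# to the miner's share of the total network hashrate (a simplification).

def expected_blocks_per_day(my_khs, network_khs, blocks_per_minute=1.0):
    """Return the average number of blocks a solo miner finds per day."""
    share = my_khs / network_khs
    return share * blocks_per_minute * 60 * 24


for network_khs in (1000.0, 1500.0):
    blocks = expected_blocks_per_day(4.5, network_khs)
    print("network %.0f kH/s: about %.2f blocks/day" % (network_khs, blocks))
```

With these numbers the estimate lands somewhere between roughly 4 and 6.5 blocks per day. It is an average only, so actual results on any given day can differ a lot.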

RIP Bittrex
RIP Poloniex
see360
January 06, 2014, 02:30:40 AM
 #2028

So eager to try the latest additions that I'm almost ready to learn how to compile code from git, but alas the last compiler I used was java 10 years ago. I volunteer to be a beta tester if you want to build what you have now :-)
bigjme
January 06, 2014, 02:32:07 AM
 #2029

I was just thinking about scrypt-jane: would there be a way to force some system memory to be used? Yes, it is much slower than GPU memory, but if you can give it 16 GB, that is over 5 times the memory of a 3 GB GPU at around a third of the speed. So in theory it would get more done in the same time.

Owner of: cudamining.co.uk
cbuchner1
January 06, 2014, 02:32:20 AM
 #2030

5770 - 1.0 kH/s
7950 - 1.3-1.5 kH/s
R7 250 - 1.44 kH/s

Watch my taillights, puny Radeons! ;-)

I already have 1% of the entire Yacoin network's hashing power with my few nVidia GPUs. Granted, it is not the most popular coin around.

cbuchner1
January 06, 2014, 02:34:42 AM
 #2031

So eager to try the latest additions that I'm almost ready to learn how to compile code from git, but alas the last compiler I used was java 10 years ago. I volunteer to be a beta tester if you want to build what you have now :-)

Compilation from git is relatively painless on Linux. There should be some guides around on the Internet, because some people have been doing this on rented Amazon EC2 instances previously.
see360
January 06, 2014, 02:43:12 AM
 #2032

So eager to try the latest additions that I'm almost ready to learn how to compile code from git, but alas the last compiler I used was java 10 years ago. I volunteer to be a beta tester if you want to build what you have now :-)

Compilation from git is relatively painless on Linux. There should be some guides around on the Internet, because some people have been doing this on rented Amazon EC2 instances previously.

Thanks for the suggestion, and I do run Mint from time to time, but I'll patiently wait for your next creation. It looks like there is even a way to do it in Windows without VS (https://github.com/blog/1127-github-for-windows), but I'd better stick to my day job.

bigjme
January 06, 2014, 02:47:11 AM
 #2033

I just had a quick look and GDDR5 is twice as fast as system DDR3.

But DDR3 is better for smaller files, 64 MB or smaller. With jane being 8 MB, DDR3 system memory would be ideal. Then a memory war would start. If 3 GB of GPU memory can do 3.2 kH/s, 12 GB of system memory should do double.

3 GB of GPU memory is roughly 6 GB of system memory.
Systems with 32 GB should be able to hand over 26 GB of system memory. That should be the same speed as 13 GB of GPU memory. Might be worth a look if it can be done. I'm no software developer, so sorry if this is all waffle.
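
Spelled out as arithmetic, the estimate above looks like the sketch below. It only reproduces the guess as stated: it assumes hashrate scales linearly with usable memory and that system memory counts for half as much as GPU memory. It ignores the PCIe link between the GPU and host RAM, so the real numbers would likely be lower. The function name and defaults are hypothetical.

```python
# Naive scaling model from the post: system RAM is worth half as much as GPU
# RAM, and hashrate grows linearly with the GPU-equivalent amount of memory.
# Baseline taken from the thread: 3 GB of GPU memory gives about 3.2 kH/s.

def estimated_khs(system_gb, baseline_gpu_gb=3.0, baseline_khs=3.2,
                  system_to_gpu_factor=0.5):
    """Estimate kH/s if system_gb of host memory were used for scratchpads."""
    gpu_equivalent_gb = system_gb * system_to_gpu_factor
    return baseline_khs * gpu_equivalent_gb / baseline_gpu_gb


for system_gb in (12, 26):
    print("%d GB system RAM: about %.1f kH/s" % (system_gb, estimated_khs(system_gb)))
```

Under these assumptions 12 GB of system memory comes out at double the 3.2 kH/s baseline, and 26 GB at a bit under 14 kH/s.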

Owner of: cudamining.co.uk
bathrobehero
January 06, 2014, 03:07:05 AM
 #2034

One of the best things about this scrypt-jane algorithm is that it uses my GPU in a very specific way: it is far, far lighter on the card than scrypt, so I can overclock better.
With scrypt at 110% TDP, 62°C and a maximum stable core clock of 1201 MHz, I got 220 kH/s, while
with scrypt-jane at 73% TDP, 54°C and a (currently stable, but still climbing) 1319 MHz core clock I have 2.30 kH/s.

RIP Bittrex
RIP Poloniex
bigjme
January 06, 2014, 03:10:20 AM
 #2035

So it uses much less power? I had a look and I need 150 yacoins to get the same value as 3 days of doge mining. How much are we all mining in a day at the moment?

Owner of: cudamining.co.uk
bathrobehero
January 06, 2014, 03:13:50 AM
 #2036

I think with my hashrate I could get 150 in 24 hours, although I'm not sure. The pool I'm on is rubbish (43% valid hashrate) while I have 100%, but I can't find a decent pool I can connect to.

Edit: @cbuchner1, have you tried increasing -s? The default 5 seems to cut back a bit on my hashrate, and increasing it to 20-30 made a difference

RIP Bittrex
RIP Poloniex
bigjme
January 06, 2014, 03:16:51 AM
 #2037

My hashrate should be similar to that of the 780Ti, as it has roughly the same performance and memory. So 3.2 kH/s, plus I could put my CPU to work if the GPU draws less power.

If I can find a pool I will try it. Maybe cbuchner1 has a better pool as he is testing his hash rates I believe?

Just found yac.coinmine.pl
Seems to have some good reviews and 0% invalid. Do I need to do anything to the last official zip file to mine them?

Do I need to compile from github or just use what I have now? And what settings would I need to set? Worth a try to see how many I can get this early on in the coin's life.

Owner of: cudamining.co.uk
bathrobehero
January 06, 2014, 03:27:51 AM
 #2038

I can't connect to any coinmine.pl pools for some reason.
For now only the github version has scrypt-jane.

RIP Bittrex
RIP Poloniex
bigjme
January 06, 2014, 03:31:44 AM
 #2039

I will have to try it. I have a second mining rig being connected tomorrow which will double my production, so my 780 could mine these and save me some power, as I won't be paying for any power on the new rig.

If anyone can PM me with instructions on setting up scrypt-jane for my 780 and some good pool suggestions, it would be great, and I can give it a try tomorrow once my second machine is running. As long as I can make 0.01 BTC in 3 days or so, I am happy to mine it and be a tester for the code in cudaminer.

Owner of: cudamining.co.uk
coercion
January 06, 2014, 05:42:27 AM
Last edit: January 06, 2014, 10:42:59 AM by coercion
 #2040

4GB GT 630 for $30 : wow! good price. I have yet to enable the Fermi kernel for scrypt-jane though.

Tesla kernels don't need to explicitly enable a texture for cached reading, as they automatically pull their data through this cache (look up what the __ldg intrinsics do in the latest CUDA programming guide)

I might try to figure out a way to chop the scrypt-jane kernels into a series of smaller kernel launches, which may make it less taxing on the display and also allow the use of interactive mode again.

Titan kernels are now scrypt-jane enabled! I get 3.2 kHash/s on GTX 780Ti using -l T7x3 now. And power use is cut in half compared to LTC mining. What a pity the 780Ti doesn't have 6 Gigs of RAM, or I could use -l T14x3, doubling the speed. Someone should try this launch config with the 6 GB Geforce Titan models though. Could yield some 6 kHash/s.

I also have a crazy idea that would basically remove the memory limitations for scrypt-jane mining. It requires joining the A and B kernels into a single kernel again and re-using the scratchpad memory on the GPU. So instead of giving each thread a unique 4 MB scratchpad, we may be able to reuse the same scratchpad memory for all non-concurrently executed thread blocks. I think this is similar to what the "intensity" parameter controls on ATI cards when running cgminer. Unfortunately this idea might be incompatible with the texture cache, as this cache does not guarantee read/write coherency within a single kernel invocation. But hey, it could get my 780Ti's to 6 kHash/s...maybe.
EDIT: Okay, I made a mistake in my thinking here. With so few thread blocks running on the GPU, ALL of them would be executing concurrently. Hence the memory reuse concept falls flat.

Christian
Well this is pretty awesome. I do have a question though.

What is the significance of values that do show up in autotune vs those that don't? Auto-tune selected T7x2 for my 780.  I had been using K9x2 previously, but T9x2 was blank in autotune. I gave it a shot anyway and it worked. So I tried a few others. Previously, K10x2 gave me memory warnings and wouldn't verify on the CPU. T10x2 now gives me 3.26 kHash and T21x1 gives me 3.4 and I'm getting shares accepted like nobody's business!

According to wikipedia, some 630s have kepler cores in them, I'm not sure if the particular one I'm after does or not, but I'm trying to find out. Either way, I can't pass it up for $30.

Thanks for the info about Titan kernels. I'm not current with CUDA at all. I wrote a very poor automated satisfiability theorem prover with CUDA in 2009, and I've forgotten most of it since. I've spent most of the day looking through CUDA code though, when I ought to be working on my own completely unrelated code. Perhaps one of these days I'll be caught up enough to contribute.

Regarding your memory brainstorming... how much overhead is involved with creating thread blocks, and is there any concept of synchronization, i.e. locking, between thread blocks? I'm guessing no, but on the off chance that there is, can you queue up some blocks and have them wait on the currently executing blocks before they start using the scratch pad?

I will have to try it. I have a second mining rig being connected tomorrow which will double my production, so my 780 could mine these and save me some power, as I won't be paying for any power on the new rig.

If anyone can PM me with instructions on setting up scrypt-jane for my 780 and some good pool suggestions, it would be great, and I can give it a try tomorrow once my second machine is running. As long as I can make 0.01 BTC in 3 days or so, I am happy to mine it and be a tester for the code in cudaminer.
With my GTX 780 I'm currently on track to make 0.014 BTC in 24 hours. You need the latest commit from git, then just pass the argument "--algo=scrypt-jane". As for compiling, I don't know much about the Windows world, but on Linux or OS X you just run ./autogen.sh; ./configure; make. If that doesn't work, read the error messages and try to deduce what you're missing.

That reminds me, I always have to modify a few includes when I compile on OS X, perhaps I'll submit a pull request with a few #ifdefs.