Bitcoin Forum
Pages: « 1 2 [3] 4 5 6 7 8 9 10 11 12 13 »  All
Author Topic: [ANN][YAC] Yacminer GPU miner for Yacoin  (Read 57470 times)
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 12:57:07 PM
 #41

I tried mining for a while (about 18 hours) and didn't find any blocks with about 283 kh/s per card. Am I doing something wrong?

Settings I use with solo mining and Yacminer 3.3.1 x64:

yacminer --scrypt -o 127.0.0.1:8112 -u yacoin -p x -I 12 -w 256 --thread-concurrency 8192 -g 2

If your HW error count isn't going up, your miner should be doing just fine. It's just a matter of the difficulty being high and being unlucky in general.
"Bitcoin: the cutting edge of begging technology." -- Giraffe.BTC
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 01:06:13 PM
 #42

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:
Code:
--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:
Code:
x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.
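The adjustment described above can be sketched in a few lines. This is only an illustration: it assumes the divisor is N / 1024 and that the current N is 4096, which is what "divided by 4" suggests. The function name is made up for the example and is not part of Yacminer.

```python
# Sketch of how the configured thread concurrency (TC) maps to the real TC.
# Assumption: the kernel divides TC by N / 1024, so "divided by 4"
# corresponds to N = 4096. This rule is inferred, not taken from the code.

BASE_N = 1024  # assumed N at which configured TC equals real TC


def effective_tc(configured_tc: int, n: int) -> int:
    """Return the real thread concurrency for a given configured TC and N."""
    divisor = max(n // BASE_N, 1)
    return configured_tc // divisor


if __name__ == "__main__":
    for n in (1024, 2048, 4096, 8192):
        print(f"N = {n:5d}: configured 6912 -> real {effective_tc(6912, n)}")
```

If the assumption holds, the same configured value gives half the real TC after each increase of N, which is why the settings need to be revisited every time N changes.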
paulthetafy
Hero Member
*****
Offline Offline

Activity: 820
Merit: 1000


View Profile
July 08, 2013, 01:56:45 PM
 #43

Does anyone know when N last changed?
Sun 07 Jul 2013 21:54:40

http://yacexplorer.tk/graphs.htm
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 02:50:04 PM
 #44

Well, let's reiterate this once more. GPU mining is not going to magically stop at N = 8192. My miner should survive it just fine. I'm not going to benchmark it because people always keep finding new settings after N changes. Some people will use a higher lookup gap and others will find ways to use lower values. I might come up with some new tricks but at the moment I'm pretty happy with how the code scales with the lookup gap.

Hi Mikael. Firstly, thanks for your releases. Could you make a prediction about the long term viability of GPU mining? Yacoin was originally designed to be a currency that could be mined with CPU efficiently. Is that a busted flush now? Or will continued increments in N lead to a situation where CPU miners compete with, or even outmine GPU for equivalent energy input? I'm sure it's difficult to say for certain, but your guess is likely to be more educated than others.

Well, my current theory is that N = 8192 will still be doable with lookup gap = 2 on my HD 7790. After that I will probably need to increase lookup gap when N hits 16384 near the end of September.

My HD 7790 has 896 cores and 1 GB of memory. I can allocate about 768 MB of that (which corresponds to thread concurrency = 12000 with lookup gap = 2). 896 is the absolute minimum value of TC needed to sustain 896 cores at N = 1024. The effective thread concurrency is divided by 2 for each Nfactor increase after N = 1024. At N = 8192, the lower bound of TC is 8 * 896 = 7168. My GPU still has enough memory for that but not after the next increase of N.

The exact technical details are fairly complicated so I'm not going to try to explain them here. I'm not even sure about all the details myself. Please note using the optimal value for TC is the key to being able to use high intensities with lookup gap = 2.

I will try to do some benchmarks and post some numbers to support my theory. That's going to take a while though.
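To make the arithmetic above easier to check, here is a rough sketch. It rests on two assumptions that are not confirmed by the miner's source: each scratchpad entry is 128 bytes, and the buffer is sized from the configured TC as if N were 1024, then divided by the lookup gap. With those assumptions TC = 12000 at lookup gap 2 comes out at 750 MiB, which is close to the "about 768 MB" figure.

```python
# Rough estimate of the minimum TC and the GPU buffer size it implies.
# Assumptions (not verified against the miner's source):
#   * 128 bytes per scratchpad entry
#   * the buffer is sized as if N == 1024, scaled by TC and 1 / lookup gap

BYTES_PER_ENTRY = 128
BASE_N = 1024
MIB = 2 ** 20


def min_tc(cores: int, n: int) -> int:
    """Lower bound on configured TC needed to keep every core busy at N."""
    return cores * (n // BASE_N)


def buffer_mib(configured_tc: int, lookup_gap: int) -> float:
    """Estimated scratchpad buffer size in MiB for a configured TC."""
    return configured_tc * BYTES_PER_ENTRY * BASE_N / lookup_gap / MIB


if __name__ == "__main__":
    for n in (8192, 16384):
        tc = min_tc(896, n)
        print(f"N = {n}: min TC {tc}, about {buffer_mib(tc, 2):.0f} MiB at lookup gap 2")
```

Under these assumptions the HD 7790 needs about 448 MiB at N = 8192 and about 896 MiB at N = 16384, so the second case no longer fits in the roughly 768 MB that can be allocated.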
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 03:23:57 PM
 #45

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:
Code:
--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:
Code:
x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

If someone wants to help test the above formula, I calculated the suggested TC values for the HD 7000 series:
Code:
Model               | Shaders | Compute Units | Suggested TC
--------------------+---------+---------------+--------------
HD 7750             |     512 |             8 |         3840
HD 7770 GHz Edition |     640 |            10 |         4864
HD 7790             |     896 |            14 |         6912
HD 7850             |    1024 |            16 |         7936
HD 7870 GHz Edition |    1280 |            20 |         9984
HD 7870 XT          |    1536 |            24 |        12032
HD 7950             |    1792 |            28 |        14080
HD 7970             |    2048 |            32 |        16128

Use lookup gap = 2 and grab the thread concurrency from the table. Try increasing intensity as high as possible. You can ignore a few HW errors but if you start getting a lot of them, then it's not working.
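For cards not in the table, the same numbers can be reproduced with a short script. This is just the formula from this post written out; the function name is chosen for the example, and the 64 shaders per compute unit figure applies to the HD 7000 series only.

```python
# Reproduces the "Suggested TC" column: compute units * 512 - 256,
# where compute units = shaders / 64 on the HD 7000 series.

SHADERS_PER_COMPUTE_UNIT = 64  # HD 7000 series only

HD7000_SHADERS = {
    "HD 7750": 512,
    "HD 7770 GHz Edition": 640,
    "HD 7790": 896,
    "HD 7850": 1024,
    "HD 7870 GHz Edition": 1280,
    "HD 7870 XT": 1536,
    "HD 7950": 1792,
    "HD 7970": 2048,
}


def suggested_tc_from_shaders(shaders: int) -> int:
    """Suggested thread concurrency for a given shader count."""
    compute_units = shaders // SHADERS_PER_COMPUTE_UNIT
    return compute_units * 512 - 256


if __name__ == "__main__":
    for model, shaders in HD7000_SHADERS.items():
        print(f"{model:20s} {shaders:5d} {suggested_tc_from_shaders(shaders):6d}")
```

These are starting points for testing, not guaranteed optimal values.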
gyverlb
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
July 08, 2013, 03:42:54 PM
 #46

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:
Code:
--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:
Code:
x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with 7950 and 7970, with your formula I get more hashrate in yacminer but lots of HW errors (several times the accepted shares) and less than half of the effective hashrate I can get with optimal settings as reported by my p2pool node.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

P2pool tuning guide
Trade BTC for €/$ at bitcoin.de (referral), it's cheaper and faster (acts as escrow and lets the buyers do bank transfers).
Tip: 17bdPfKXXvr7zETKRkPG14dEjfgBt5k2dd
eule
Hero Member
*****
Offline Offline

Activity: 756
Merit: 501


View Profile
July 08, 2013, 03:53:54 PM
 #47

yacminer.exe --scrypt -w 64 -I 10 --thread-concurrency 5120
Using that for my 5770, get 23-24Kh/s. Can't go higher than I 10, thread concurrency seems to be at the sweet spot, got 20kH around TC 8000 and 4000. Switching to YBC for now

mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 03:57:35 PM
 #48

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:
Code:
--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:
Code:
x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with 7950 and 7970, with your formula I get more hashrate in yacminer but lots of HW errors (several times the accepted shares) and less than half of the effective hashrate I can get with optimal settings as reported by my p2pool node.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

Alright, thanks for testing. I don't have a 7970, so I'm just guessing here.

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares with the difficulty reported by the miner before you can compare it with HW errors.
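As a sketch of that comparison: scale the accepted shares by the share difficulty so both counts are in difficulty-1 units, then look at what fraction of the work ended up as HW errors. The function name and the sample numbers below are made up for illustration.

```python
# Compare HW errors with accepted shares on the same (difficulty-1) scale.
# The sample numbers in the demo are hypothetical.


def hw_error_fraction(accepted_shares: int, share_difficulty: float, hw_errors: int) -> float:
    """Fraction of difficulty-1 work that was reported as HW errors."""
    accepted_diff1 = accepted_shares * share_difficulty
    total = accepted_diff1 + hw_errors
    if total == 0:
        return 0.0
    return hw_errors / total


if __name__ == "__main__":
    # Hypothetical: 100 accepted shares at difficulty 16 and 50 HW errors.
    fraction = hw_error_fraction(100, 16, 50)
    print(f"HW errors make up {fraction:.1%} of difficulty-1 work")
```

With a pool difficulty of 16, fifty HW errors against a hundred accepted shares is far less alarming than the raw counts suggest.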
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 04:06:11 PM
 #49

yacminer.exe --scrypt -w 64 -I 10 --thread-concurrency 5120
Using that for my 5770, get 23-24Kh/s. Can't go higher than I 10, thread concurrency seems to be at the sweet spot, got 20kH around TC 8000 and 4000. Switching to YBC for now

The HD 5770 has 10 compute units. You can try adding/subtracting values like 64, 128, 256 to/from 5120. I'm not sure how exactly it works for HD 5000 series.
gyverlb
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
July 08, 2013, 04:11:32 PM
 #50

I have discovered a new sweet spot for thread concurrency. After setting it to 6912 (which is 14 * 512 - 256), I'm getting 61 kh/s on a 7790 (instead of 59.3 kh/s with a TC of 8000). My full settings are:
Code:
--scrypt -w 128 --lookup-gap 2 -I 17 --thread-concurrency 6912 -g 1

For other models in the HD 7000 series, I suggest the following formula for thread concurrency with high intensities:
Code:
x * 512 - 256

x should be the number of compute units if possible. That's 14 for my HD 7790. For the HD 7000 series, the number of compute units should be the number of shaders divided by 64. You can also try other values.

Note that these values will need to be adjusted when N changes because thread concurrency is automatically adjusted by the kernel. Right now it's being divided by 4, so setting TC to 6912 means that the real TC is 1728.

It doesn't work for me with 7950 and 7970, with your formula I get more hashrate in yacminer but lots of HW errors (several times the accepted shares) and less than half of the effective hashrate I can get with optimal settings as reported by my p2pool node.

I have to use 16x the number of shaders to avoid HW errors on 79x0 and get the best result. I can use higher numbers but it doesn't improve the effective hashrate.

As I mine on p2pool I had to reduce my intensity to 16 from 17 to cope with longer processing times. I noticed that gpu-memdiff=60 is the sweet spot too now (-150 was optimal with previous N).

On 5x70 the situation is noticeably different on my hardware, I need to have more than 20000 shaders to avoid hardware errors and low effective hashrate (20000 with lookup-gap 5 seems optimal with some but relatively few hardware errors). The hashrate is ~30% of what it was before.

gyverlb
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
July 08, 2013, 04:17:57 PM
 #51

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares with the difficulty reported by the miner before you can compare it with HW errors.

Looking for this I noticed in my yacoin p2pool logs lots of:
Code:
2013-07-08 18:07:05.117352 Worker <hidden> submitted share with hash > target:
2013-07-08 18:07:05.117431     Hash:   fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
2013-07-08 18:07:05.117495     Target: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure of the formula for Target -> diff for scrypt-jane, but that's an interesting yacminer/p2pool diff mismatch. Doesn't affect my effective hashrate though.

cryptrol
Hero Member
*****
Offline Offline

Activity: 637
Merit: 500


View Profile
July 08, 2013, 04:22:02 PM
 #52

Using the TC suggested by the table above does not work with a 7950.

I get the best hashrate for a 7950 with these settings:
Quote
yacminer --scrypt -o http://pool -u me -p x --gpu-platform 0 -w 256 -I 19 --thread-concurrency 41216 --gpu-memclock 1483 --gpu-engine 1044 --lookup-gap 2
I get 120 Kh/s with the above settings. I was getting 230 before the N change.
FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
July 08, 2013, 04:57:22 PM
 #53


Well, my current theory is that N = 8192 will still be doable with lookup gap = 2 on my HD 7790. After that I will probably need to increase lookup gap when N hits 16384 near the end of September.

My HD 7790 has 896 cores and 1 GB of memory. I can allocate about 768 MB of that (which corresponds to thread concurrency = 12000 with lookup gap = 2). 896 is the absolute minimum value of TC needed to sustain 896 cores at N = 1024. The effective thread concurrency is divided by 2 for each Nfactor increase after N = 1024. At N = 8192, the lower bound of TC is 8 * 896 = 7168. My GPU still has enough memory for that but not after the next increase of N.

The exact technical details are fairly complicated so I'm not going to try to explain them here. I'm not even sure about all the details myself. Please note using the optimal value for TC is the key to being able to use high intensities with lookup gap = 2.

I will try to do some benchmarks and post some numbers to support my theory. That's going to take a while though.

Thanks for the extra info and explanation. Based on some rough back of the envelope calculations, assuming a GPU with 4GB vs. a Quad Core CPU, assuming the CPU cores are 10x as efficient as the GPU cores at hashing (correct me if that sounds barmy) . . . we're looking at about an NFactor of 19 (August 2017) before CPUs draw even with GPUs. At this point, about 95% of the GPU cores could be idling for lack of memory, and most of the memory on the PC could still be unused. Does that sound about right?

Membercoin - Layer 1 Coin used for the member.cash decentralized social network.
10% Interest On All Balances. Browser and Solo Mining. 100% Distributed to Users and Developers.
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 05:09:48 PM
 #54

Just to double check, can you tell me the difficulty on the p2pool? The HW error numbers are a bit difficult to interpret because they are difficulty-1 shares. You need to multiply your accepted shares with the difficulty reported by the miner before you can compare it with HW errors.

Looking for this I noticed in my yacoin p2pool logs lots of:
Code:
2013-07-08 18:07:05.117352 Worker <hidden> submitted share with hash > target:
2013-07-08 18:07:05.117431     Hash:   fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
2013-07-08 18:07:05.117495     Target: fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure of the formula for Target -> diff for scrypt-jane, but that's an interesting yacminer/p2pool diff mismatch. Doesn't affect my effective hashrate though.

That's interesting. The numbers are missing leading zeros, but it does look like a target miss. If I add the leading zeros, I get:

Code:
Hash:   0000fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b
Target: 00000fffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

I'm not sure if that's a bug in Yacminer or P2pool. Yacminer is working fine with pushpools and yacoind.

Yacminer (cgminer) uses the target 0x0000ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff as difficulty 1. It then calculates difficulty as diff1_target / target. That gives a nice integer number. Your P2pool's difficulty should be 16.

Yacoind uses the target 00000000ffff0000000000000000000000000000000000000000000000000000 as difficulty 1. It then uses a tricky algorithm that corresponds roughly to diff1_target / target. That produces a floating point number. Your P2pool's difficulty should be about 0.00024413 that way.

In either case, the difficulty is pretty low. If the number of HW errors is exceeding accepted shares with that difficulty, then it's definitely not good.
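The two difficulty figures can be checked with plain integer arithmetic. This is a sketch using the hash, target and difficulty-1 constants quoted in this post; the function names are invented for the example and are not taken from cgminer or yacoind. Dividing each difficulty-1 target by the share target reproduces both numbers.

```python
# Check the two difficulty conventions against the padded values from the log.
# Function names are illustrative only.

CGMINER_DIFF1 = int("0000" + "f" * 60, 16)          # 0x0000ffff...ffff
YACOIND_DIFF1 = int("00000000ffff" + "0" * 52, 16)  # 0x00000000ffff0000...0000

SHARE_HASH = int("0000fe983383877a82de6330326a41011914da8606f1900c3b669d4fcea9bb8b", 16)
TARGET = int("00000" + "f" * 59, 16)                # padded target from the log


def cgminer_difficulty(target: int) -> int:
    """Integer difficulty relative to the cgminer-style difficulty-1 target."""
    return CGMINER_DIFF1 // target


def yacoind_difficulty(target: int) -> float:
    """Floating point difficulty relative to the yacoind-style difficulty-1 target."""
    return YACOIND_DIFF1 / target


def meets_target(share_hash: int, target: int) -> bool:
    """A share is valid only if its hash does not exceed the target."""
    return share_hash <= target


if __name__ == "__main__":
    print("cgminer-style difficulty:", cgminer_difficulty(TARGET))
    print("yacoind-style difficulty:", yacoind_difficulty(TARGET))
    print("share meets target:", meets_target(SHARE_HASH, TARGET))
```

The hash from the log is larger than the target, so the share really is a miss at that difficulty.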
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 05:49:22 PM
 #55

Thanks for the extra info and explanation. Based on some rough back of the envelope calculations, assuming a GPU with 4GB vs. a Quad Core CPU, assuming the CPU cores are 10x as efficient as the GPU cores at hashing (correct me if that sounds barmy) . . . we're looking at about an NFactor of 19 (August 2017) before CPUs draw even with GPUs. At this point, about 95% of the GPU cores could be idling for lack of memory, and most of the memory on the PC could still be unused. Does that sound about right?

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.
gyverlb
Hero Member
*****
Offline Offline

Activity: 896
Merit: 1000



View Profile
July 08, 2013, 06:58:11 PM
 #56

In either case, the difficulty is pretty low. If the number of HW errors is exceeding accepted shares with that difficulty, then it's definitely not good.

I'm wondering if the p2pool fork ( https://github.com/Rav3nPL/p2pool-yac.git ) was based on a recent enough p2pool: there was a bug with scrypt where p2pool didn't handle the difficulty correctly and sent a lower one to the miners. These log entries are exactly what we had with Litecoin.

HW errors can be limited, but 5x70 cards perform poorly: the 7950 and 7970 are OK, and I only had to tune memory and intensity for my particular setup to get hashrate/2 compared to the previous N value (as expected), but for the 5870 and 5970 this is more like hashrate/3.

FreeTrade
Legendary
*
Offline Offline

Activity: 1428
Merit: 1030



View Profile
July 08, 2013, 07:20:34 PM
 #57

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.

Yes, just using some rough numbers and simple assumptions as best I am able with my limited knowledge. Also assuming the hardware stays relatively the same.

Interesting that you say that some idle cores on the GPU would allow others to run faster . . .  less heat to dissipate I'm guessing . . . and I guess the GPU could cycle through its cores, so as one got hotter, it could switch to a cooler one.

mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 07:57:07 PM
 #58

GPUs being 10x as efficient sounds a bit dubious but I don't have the numbers right now to check that. I'm not sure how you did your math either. Did you calculate when the efficiency of GPUs would meet with CPUs? Also you seem to be assuming that GPUs scale linearly when cores go idle which may not be the case. I actually did try idle cores but I don't remember the numbers unfortunately.

Yes, just using some rough numbers and simple assumptions as best I am able with my limited knowledge. Also assuming the hardware stays relatively the same.

Interesting that you say that some idle cores on the GPU would allow others to run faster . . .  less heat to dissipate I'm guessing . . . and I guess the GPU could cycle through its cores, so as one got hotter, it could switch to a cooler one.

Well, it's almost linear. I actually did a quick benchmark of it.

Code:
Active cores | Hashrate
-------------+----------
100%         | 59.88
50%          | 30.78
25%          | 14.61
12.5%        | 7.361

I would assume that power consumption would go down as well but I cannot measure that now. It is very unlikely that it would scale linearly though.
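To put a number on "almost linear", the measured hashrates can be compared with what perfect linear scaling from the 100% row would give. A small sketch using only the figures from the table; the function name is made up.

```python
# Compare measured hashrates with ideal linear scaling from the 100% row.
# Data copied from the benchmark table; units are as reported there.

MEASUREMENTS = [
    (1.0, 59.88),
    (0.5, 30.78),
    (0.25, 14.61),
    (0.125, 7.361),
]


def relative_to_linear(measurements):
    """Return (fraction, measured / ideal) pairs; the first row is the baseline."""
    full_rate = measurements[0][1]
    return [(fraction, rate / (full_rate * fraction)) for fraction, rate in measurements]


if __name__ == "__main__":
    for fraction, ratio in relative_to_linear(MEASUREMENTS):
        print(f"{fraction:6.1%} active cores: {ratio:.3f} of linear")
```

Every row lands within a few percent of the linear prediction, which is about what one would expect from measurement noise in a quick benchmark.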
mikaelh (OP)
Sr. Member
****
Offline Offline

Activity: 301
Merit: 250


View Profile
July 08, 2013, 08:22:23 PM
 #59

Here's an alternative table how the miner scales with lookup gap:

Code:
Lookup gap | Hashrate
-----------+----------
2          | 61.01
4          | 39.24
8          | 21.61
16         | 11.97

Power consumption is likely to remain the same but I haven't measured it.

As you can see, there are at least 2 options for scaling. One option reduces both the hashrate and the power consumption. The other option provides higher hashrates at the cost of high power consumption. These options can also be combined. And this is exactly why I don't do back-of-the-envelope calculations of when CPUs are going to break even with GPUs.
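The lookup gap table can be read the same way: how much of the hashrate survives each doubling of the lookup gap. A sketch using only the numbers from the table; the function name is made up.

```python
# Fraction of hashrate retained each time the lookup gap doubles.
# Data copied from the lookup gap table; units are as reported there.

GAP_RATES = [
    (2, 61.01),
    (4, 39.24),
    (8, 21.61),
    (16, 11.97),
]


def retained_per_doubling(gap_rates):
    """Return hashrate ratios between consecutive rows of the table."""
    return [later / earlier for (_, earlier), (_, later) in zip(gap_rates, gap_rates[1:])]


if __name__ == "__main__":
    for (gap, _), ratio in zip(GAP_RATES[1:], retained_per_doubling(GAP_RATES)):
        print(f"lookup gap {gap:2d}: keeps {ratio:.1%} of the previous hashrate")
```

Each doubling keeps more than half of the previous hashrate while halving the memory needed per thread, which is what makes higher lookup gaps a usable option when memory runs out.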
Thirtybird
Hero Member
*****
Offline Offline

Activity: 693
Merit: 500



View Profile
July 08, 2013, 09:36:34 PM
 #60

I've updated my settings on my 7850 with 1GB of RAM to the following :

yacminer --scrypt --worksize 256 --lookup-gap 3 -I 15 --thread-concurrency 18200 --gpu-engine 1100 --gpu-memclock 1250

This produces 61.15 KH/sec. Higher intensities push it up slightly, but the WU/s go down. I get no hardware errors with this configuration. With a LUG (lookup gap) of 2, the best I could get was 76 KH/sec with a significant number of hardware errors.

I've been tuning my thread-concurrency on each card by setting it to an outrageous number and running cgminer. It will then tell me how much memory I'm trying to allocate (TRY) and how much it could allocate (MAX). I then use those two numbers to lower the TC to (MAX/TRY) * TC and then subtract 256.
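That tuning step is simple enough to write down. A minimal sketch, assuming TRY and MAX are read off the miner's output by hand; the function name and the sample numbers are hypothetical.

```python
# Scale an oversized trial TC down to what the card could actually allocate,
# then subtract a safety margin of 256 as described above.
# TRY and MAX are the two memory figures reported by the miner.


def tuned_tc(trial_tc: int, try_bytes: int, max_bytes: int, margin: int = 256) -> int:
    """Return (MAX / TRY) * TC - margin, using integer arithmetic."""
    return trial_tc * max_bytes // try_bytes - margin


if __name__ == "__main__":
    # Hypothetical figures: a trial TC of 40000 asks for 2621440000 bytes,
    # while the card could allocate 1207959552 bytes.
    print(tuned_tc(40000, 2_621_440_000, 1_207_959_552))
```

The result is a starting point; it still needs a test run to confirm there are no hardware errors.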

My 5870 still gets about 51KH/sec using the same settings as N=10, just with a lower intensity.  Again, with no hardware errors.

In the code, what actually causes a hardware error? Is it running out of memory, hashes not adding up, or something else?

YACMiner: https://github.com/Thirtybird/YACMiner  N-Factor information : https://docs.google.com/spreadsheet/ccc?key=0Aj3vcsuY-JFNdC1ITWJrSG9VeWp6QXppbVgxcm0tbGc&usp=drive_web#gid=0
BTC: 183eSsaxG9y6m2ZhrDhHueoKnZWmbm6jfC  YAC: Y4FKiwKKYGQzcqn3M3u6mJoded6ri1UWHa