Linit
Newbie
Offline
Activity: 13
Merit: 0
|
|
December 15, 2016, 03:06:13 PM Last edit: December 15, 2016, 07:00:37 PM by Linit |
|
Windows 10 64 Bit. Driver 16.6. Gigabyte R9 390 G1
160 S/s
|
|
|
|
xeridea
|
|
December 15, 2016, 04:57:25 PM |
|
You should probably update the bottom of the readme... "Author Marc Bevand -- http://zorinaq.com". I probably won't be using it: most of my cards are back on Eth for now, and CM is faster and has Remote Monitor. I like open source projects though. I am a developer too, but can't contribute due to issues with my hands. I would like to tinker with OpenCL if I could. Good luck with the project!
|
Profitability over time charts for many GPUs - http://xeridea.us/charts
BTC: bc1qr2xwjwfmjn43zhrlp6pn7vwdjrjnv5z0anhjhn LTC: LXDm6sR4dkyqtEWfUbPumMnVEiUFQvxSbZ Eth: 0x44cCe2cf90C8FEE4C9e4338Ae7049913D4F6fC24
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 06:33:48 PM |
|
Thanks! After I rest a little, I will optimize the miner further. My ultimate goal would be to create a GUI-based, feature-rich, multi-algorithm miner.
|
Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ
|
|
|
Linit
Newbie
Offline
Activity: 13
Merit: 0
|
|
December 15, 2016, 06:59:56 PM |
|
Ubuntu 15.04 64 bit. Driver fglrx 15.12. Gigabyte R9 390 G1.
160 S/s.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 07:04:10 PM |
|
Very nice! I would like to reach 200 sol/s without a GCN assembler. We will see.
|
|
|
Linit
Newbie
Offline
Activity: 13
Merit: 0
|
|
December 15, 2016, 07:16:18 PM |
|
Excellent...
|
|
|
|
laik2
|
|
December 15, 2016, 07:25:50 PM |
|
Without GCN asm, 390s should reach 300 S/s at most. A multi-algo miner already exists in sgminer, but its documentation is hell... by the time I find some useful values for a card, my beard looks like Santa Claus's.
|
|
|
|
Vetal_inside
Member
Offline
Activity: 78
Merit: 10
|
|
December 15, 2016, 07:56:28 PM |
|
R9 280X w/ modded BIOS: 85 s/s with instances=1 and 90-95 s/s with instances=2 (not stable), the same as the original SA miner v5. Win 8.1 x64, drivers 15.12.
Addendum: with CM it shows 210-220 s/s, depending on memclock.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 09:14:51 PM |
|
The slow speed is probably due either to the modded BIOS or to the driver. Mods for Claymore's do not necessarily work with Gateless Gate/SILENTARMY. I would try the stock BIOS first. Also, I only tested the miner with Crimson drivers. I suppose I need to be more clear about requirements...
|
|
|
krnlx
|
|
December 15, 2016, 09:43:52 PM |
|
Total 1094.3 sol/s [dev0 177.9, dev1 176.8, dev2 182.7, dev3 180.9, dev4 185.4, dev5 185.1] 36 shares
Total 1093.9 sol/s [dev0 177.6, dev1 177.4, dev2 181.9, dev3 180.4, dev4 184.8, dev5 185.2] 36 shares
Total 1094.0 sol/s [dev0 177.6, dev1 177.4, dev2 182.0, dev3 180.7, dev4 185.5, dev5 185.4] 38 shares
Total 1093.3 sol/s [dev0 177.5, dev1 176.6, dev2 182.2, dev3 179.8, dev4 186.6, dev5 184.7] 38 shares
Total 1092.8 sol/s [dev0 178.5, dev1 176.9, dev2 181.7, dev3 180.7, dev4 185.8, dev5 184.8] 38 shares
Total 1093.1 sol/s [dev0 177.7, dev1 177.1, dev2 181.4, dev3 180.4, dev4 186.1, dev5 184.0] 40 shares
Total 1093.2 sol/s [dev0 177.1, dev1 177.8, dev2 182.2, dev3 179.9, dev4 186.3, dev5 182.7] 40 shares
Total 1093.5 sol/s [dev0 176.8, dev1 178.0, dev2 182.0, dev3 180.2, dev4 186.5, dev5 182.8] 40 shares
6x1070 with a little tune
|
|
|
|
Vetal_inside
Member
Offline
Activity: 78
Merit: 10
|
|
December 15, 2016, 09:45:45 PM |
|
@zawawa This is a memory timings patch. I'm not sure it can be the reason for such a low sol rate. But in the next few days I will try installing the latest Crimson drivers and reflashing the stock BIOS. We will see what changes.
|
|
|
|
krnlx
|
|
December 15, 2016, 09:53:09 PM |
|
@zawawa Nvidia cards run faster with NR_ROWS_LOG = 14. Can you check my settings for NR_ROWS_LOG = 14? Is everything correct? I think it will be faster with NR_ROWS_LOG=12...
#define NR_ROWS_LOG 14
#define NR_SLOTS 240
#define LOCAL_WORK_SIZE 512
#define THREADS_PER_ROW 512
#define LOCAL_WORK_SIZE_SOLS 256
#define THREADS_PER_ROW_SOLS 256
#define GLOBAL_WORK_SIZE_RATIO 512
#define SLOT_CACHE_SIZE (NR_SLOTS * (LOCAL_WORK_SIZE/THREADS_PER_ROW) * 75 / 100)
#define LDS_COLL_SIZE (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 240 / 100)
|
|
|
|
laik2
|
|
December 15, 2016, 10:21:40 PM |
|
A proper CUDA implementation is required for NV cards to get over 300 S/s. The closed-source nicehash and EWBF CUDA miners already do ~300 S/s on a 1070. I am waiting on my 1070s to arrive so I can test some CUDA tweaks.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 11:01:35 PM |
|
@krnlx What speed are you getting, and on which card? I'm very curious. You could lower NR_SLOTS by 10 or 20, I think. You can uncomment "#define ENABLE_DEBUG", rebuild the app, and run sa-solver.exe to see how many slots drop out at each round. Too many dropped slots would hurt performance. Adding NR_ROWS_LOG=12 itself is trivial, but there may not be enough shared memory.
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 11:05:43 PM |
|
@Vetal_inside Performance does suffer if memory timings are too tight. In the meantime, I will test the miner with my trusty 7990's...
|
|
|
nerdralph
|
|
December 15, 2016, 11:31:30 PM |
|
Not bad, zawawa. You still have room to improve ht_store.
[...] Will at best result in 2 store_dwordx4 instructions, and 2 core cycles to the memory controller. [...] Which you use after round 5 should only be one cycle, but it will force a 32-byte read burst from the GDDR into the L2, modification of 16 bytes, and then a write back. This will waste a lot of GDDR cycles due to the bus turnaround delay. The solution is to have an n-way operation where n threads write 32/n bytes. That will be just one core cycle to transfer 32 bytes to the L2, and a single 32-byte write burst to one of the 2 GDDR5 chips per memory controller channel.
I also think using an odd number for NR_SLOTS should be a tiny bit faster by balancing out the writes between the odd and even memory chips. With NR_SLOTS even, the first write to a given row will always be to an even memory chip. With more slots per row this becomes less significant because the rows don't fill up equally. Using an odd number for NR_SLOTS may also reduce channel conflicts.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 15, 2016, 11:46:42 PM |
|
Very interesting suggestions. Let me see...
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 16, 2016, 03:10:03 AM |
|
@nerdralph I tried 4-way writes with mixed results. The 4-way write version was actually slower than the single-thread-write version, but it seems to speed up the last few rounds. That makes sense, as these rounds are more memory-intensive. I will explore this approach further.
|
|
|
bigchirv
Newbie
Offline
Activity: 19
Merit: 0
|
|
December 16, 2016, 03:31:09 AM |
|
Thanks for publishing your repo! Appreciated. I'm not a C programmer (or an OpenCL one, for that matter), but I'm a fan of DRY, so when I was reading input.cl I found the get_row() function, and I think we can make it a little bit DRYer by doing something like this:
uint get_row(uint round, uint xi0)
{
    uint row;
    uint swp;
    uint num;
#if NR_ROWS_LOG == 14
    swp = 0;
#elif NR_ROWS_LOG == 15
    swp = 1;
#elif NR_ROWS_LOG == 16
    swp = 2;
#else
#error "unsupported NR_ROWS_LOG"
#endif
    /* num is 0x3f, 0x7f or 0xff for NR_ROWS_LOG = 14, 15 or 16 */
    num = (0x40 << swp) - 1;
    if (!(round % 2))
        row = xi0 & (num << 8 | 0xff);
    else
        row = ((xi0 & (num << 16 | 0xf00)) >> 8) | ((xi0 & 0xf0000000) >> 24);
    return row;
}
So, what do you think, @zawawa? I don't know if this can be useful at all, but if you like it I can make a PR so you can merge the changes later.
|
|
|
|
zawawa (OP)
Sr. Member
Offline
Activity: 728
Merit: 304
Miner Developer
|
|
December 16, 2016, 03:52:25 AM |
|
I appreciate your enthusiasm and willingness to help, but I will keep the current code. With GPGPU, and especially with AMD OpenCL drivers, repeated code is often better because it keeps register usage low, which is crucially important. My general approach to GPGPU is to sacrifice everything for performance, including readability.
|
|
|
|