Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480!

Linit

Newbie

Offline

Activity: 13
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 03:06:13 PM
Last edit: December 15, 2016, 07:00:37 PM by Linit

#21

Windows 10 64 Bit.
Driver 16.6.
Gigabyte R9 390 G1

160 S/s

xeridea

Sr. Member

Offline

Activity: 449
Merit: 251

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 04:57:25 PM

#22

You should probably update the bottom of the README...

"Author

Marc Bevand -- http://zorinaq.com"

I probably won't be using it, since most of my cards are back on Eth for now, and CM is faster and has Remote Monitor. I like open-source projects though. I am a developer too, but I can't contribute due to issues with my hands. I would like to tinker with OpenCL if I could. Good luck with the project!

Profitability over time charts for many GPUs - http://xeridea.us/charts

BTC: bc1qr2xwjwfmjn43zhrlp6pn7vwdjrjnv5z0anhjhn LTC: LXDm6sR4dkyqtEWfUbPumMnVEiUFQvxSbZ Eth: 0x44cCe2cf90C8FEE4C9e4338Ae7049913D4F6fC24

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 06:33:48 PM

#23

Quote from: xeridea on December 15, 2016, 04:57:25 PM

Thanks! After I rest a little, I will optimize the miner further. My ultimate goal would be to create a GUI-based, feature-rich, multi-algorithm miner.

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ

Linit

Newbie

Offline

Activity: 13
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 06:59:56 PM

#24

Ubuntu 15.04 64 bit.
Driver fglrx 15.12.
Gigabyte R9 390 G1.

160 S/s.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 07:04:10 PM

#25

Quote from: Linit on December 15, 2016, 06:59:56 PM

Ubuntu 15.04 64 bit.
Driver fglrx 15.12.
Gigabyte R9 390 G1.

160 S/s.

Very nice! I would like to reach 200 sol/s without a GCN assembler.
We will see.


Linit

Newbie

Offline

Activity: 13
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 07:16:18 PM

#26

Quote from: zawawa on December 15, 2016, 07:04:10 PM

Quote from: Linit on December 15, 2016, 06:59:56 PM

Ubuntu 15.04 64 bit.
Driver fglrx 15.12.
Gigabyte R9 390 G1.

160 S/s.

Very nice! I would like to reach 200 sol/s without a GCN assembler.
We will see.

Excellent...

laik2

Sr. Member

Offline

Activity: 652
Merit: 266

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 07:25:50 PM

#27

Quote from: zawawa on December 15, 2016, 07:04:10 PM

Quote from: Linit on December 15, 2016, 06:59:56 PM

Ubuntu 15.04 64 bit.
Driver fglrx 15.12.
Gigabyte R9 390 G1.

160 S/s.

Very nice! I would like to reach 200 sol/s without a GCN assembler.
We will see.

Without GCN asm, 390s should reach 300 S/s at most.
sgminer is a multi-algo miner, but its documentation is hell... by the time I find useful values for a card, my beard looks like Santa Claus's.

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/

Vetal_inside

Member

Offline

Activity: 78
Merit: 10

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 07:56:28 PM

#28

R9 280X w/ modded BIOS: 85 s/s with instances=1 and 90-95 s/s with instances=2 (not stable), the same as the original SA miner v.5.
Win8.1, x64, drivers 15.12

Addendum: with CM it shows 210-220 s/s, depending on memclock.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 09:14:51 PM

#29

Quote from: Vetal_inside on December 15, 2016, 07:56:28 PM

The slow speed is probably due either to the modded BIOS or to the driver. Mods for Claymore's do not necessarily work with Gateless Gate/SILENTARMY. I would try the stock BIOS first. Also, I only tested the miner with Crimson drivers. I suppose I need to be more clear about requirements...


krnlx

Full Member

Offline

Activity: 243
Merit: 105

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 09:43:52 PM

#30

Quote

Total 1094.3 sol/s [dev0 177.9, dev1 176.8, dev2 182.7, dev3 180.9, dev4 185.4, dev5 185.1] 36 shares
Total 1093.9 sol/s [dev0 177.6, dev1 177.4, dev2 181.9, dev3 180.4, dev4 184.8, dev5 185.2] 36 shares
Total 1094.0 sol/s [dev0 177.6, dev1 177.4, dev2 182.0, dev3 180.7, dev4 185.5, dev5 185.4] 38 shares
Total 1093.3 sol/s [dev0 177.5, dev1 176.6, dev2 182.2, dev3 179.8, dev4 186.6, dev5 184.7] 38 shares
Total 1092.8 sol/s [dev0 178.5, dev1 176.9, dev2 181.7, dev3 180.7, dev4 185.8, dev5 184.8] 38 shares
Total 1093.1 sol/s [dev0 177.7, dev1 177.1, dev2 181.4, dev3 180.4, dev4 186.1, dev5 184.0] 40 shares
Total 1093.2 sol/s [dev0 177.1, dev1 177.8, dev2 182.2, dev3 179.9, dev4 186.3, dev5 182.7] 40 shares
Total 1093.5 sol/s [dev0 176.8, dev1 178.0, dev2 182.0, dev3 180.2, dev4 186.5, dev5 182.8] 40 shares

6x 1070 with a little tuning

Vetal_inside

Member

Offline

Activity: 78
Merit: 10

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 09:45:45 PM

#31

Quote from: zawawa on December 15, 2016, 09:14:51 PM

This is a memory timings patch. I'm not sure it can be the reason for this low sol rate.
But in the next few days I will try installing the latest Crimson drivers and reflashing the stock BIOS. We'll see what changes.

krnlx

Full Member

Offline

Activity: 243
Merit: 105

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 09:53:09 PM

#32

@zawawa

Nvidia cards run faster with NR_ROWS_LOG = 14.
Can you check my settings for NR_ROWS_LOG = 14? Is everything correct?

I think it will be faster with NR_ROWS_LOG=12...

Code:

#define NR_ROWS_LOG            14
#define NR_SLOTS               240
#define LOCAL_WORK_SIZE        512
#define THREADS_PER_ROW        512
#define LOCAL_WORK_SIZE_SOLS   256
#define THREADS_PER_ROW_SOLS   256
#define GLOBAL_WORK_SIZE_RATIO 512
#define SLOT_CACHE_SIZE        (NR_SLOTS * (LOCAL_WORK_SIZE/THREADS_PER_ROW) * 75 / 100)
#define LDS_COLL_SIZE          (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 240 / 100)
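For reference, here is a minimal sketch of what the two derived macros evaluate to with these settings. Only the #define values come from the post above; the helper function names are hypothetical and not part of the miner.

```c
/* Values copied from the settings in the post above. */
#define NR_ROWS_LOG            14
#define NR_SLOTS               240
#define LOCAL_WORK_SIZE        512
#define THREADS_PER_ROW        512
#define SLOT_CACHE_SIZE        (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 75 / 100)
#define LDS_COLL_SIZE          (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 240 / 100)

/* Hypothetical helpers that only expose the macro values. */

/* Number of hash table rows: 2^14 = 16384. */
unsigned nr_rows(void) { return 1u << NR_ROWS_LOG; }

/* Rows handled per work-group: 512 / 512 = 1. */
unsigned rows_per_work_group(void) { return LOCAL_WORK_SIZE / THREADS_PER_ROW; }

/* 240 * 1 * 75 / 100 = 180 (integer arithmetic). */
unsigned slot_cache_size(void) { return SLOT_CACHE_SIZE; }

/* 240 * 1 * 240 / 100 = 576 (integer arithmetic). */
unsigned lds_coll_size(void) { return LDS_COLL_SIZE; }
```

Both derived sizes scale linearly with NR_SLOTS, so raising or lowering NR_SLOTS changes the local memory footprint in the same proportion.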

laik2

Sr. Member

Offline

Activity: 652
Merit: 266

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 10:21:40 PM

#33

Quote from: krnlx on December 15, 2016, 09:53:09 PM

@zawawa

Nvidia cards run faster with NR_ROWS_LOG = 14.
Can you check my settings for NR_ROWS_LOG = 14? Is everything correct?

I think it will be faster with NR_ROWS_LOG=12...

Code:

#define NR_ROWS_LOG            14
#define NR_SLOTS               240
#define LOCAL_WORK_SIZE        512
#define THREADS_PER_ROW        512
#define LOCAL_WORK_SIZE_SOLS   256
#define THREADS_PER_ROW_SOLS   256
#define GLOBAL_WORK_SIZE_RATIO 512
#define SLOT_CACHE_SIZE        (NR_SLOTS * (LOCAL_WORK_SIZE/THREADS_PER_ROW) * 75 / 100)
#define LDS_COLL_SIZE          (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 240 / 100)

Proper CUDA implementation is required for NV to boost over 300S/s. There are already nicehash and EWBF CUDA closed source miners doing ~300S/s on 1070. I am waiting on my 1070s to arrive so I can test some CUDA tweaks.


zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 11:01:35 PM

#34

Quote from: krnlx on December 15, 2016, 09:53:09 PM

@zawawa

Nvidia cards run faster with NR_ROWS_LOG = 14.
Can you check my settings for NR_ROWS_LOG = 14? Is everything correct?

I think it will be faster with NR_ROWS_LOG=12...

Code:

#define NR_ROWS_LOG            14
#define NR_SLOTS               240
#define LOCAL_WORK_SIZE        512
#define THREADS_PER_ROW        512
#define LOCAL_WORK_SIZE_SOLS   256
#define THREADS_PER_ROW_SOLS   256
#define GLOBAL_WORK_SIZE_RATIO 512
#define SLOT_CACHE_SIZE        (NR_SLOTS * (LOCAL_WORK_SIZE/THREADS_PER_ROW) * 75 / 100)
#define LDS_COLL_SIZE          (NR_SLOTS * (LOCAL_WORK_SIZE / THREADS_PER_ROW) * 240 / 100)

What speed are you getting on which card? I'm very curious. You could lower NR_SLOTS by 10 or 20, I think. You can uncomment "#define ENABLE_DEBUG", rebuild the app, and run sa-solver.exe to see how many slots drop out at each round. Too many dropped slots would hurt performance. Adding NR_ROWS_LOG=12 itself is trivial, but there may not be enough shared memory.
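A rough way to see the trade-off between NR_ROWS_LOG and NR_SLOTS is to estimate how many entries land in each row on average. The sketch below assumes Equihash (n=200, k=9) with about 2^21 entries per round spread uniformly over the rows; the function names are hypothetical and not taken from the miner.

```c
#include <stdint.h>

/* Assumption: about 2^21 entries per round (Equihash n=200, k=9),
   distributed uniformly over 2^nr_rows_log rows. */
#define ENTRIES_LOG 21u

/* Average number of entries that fall into one row.
   Only meaningful for nr_rows_log <= ENTRIES_LOG. */
uint32_t avg_slots_per_row(unsigned nr_rows_log)
{
    return 1u << (ENTRIES_LOG - nr_rows_log);
}

/* Slot capacity as a percentage of the average row occupancy.
   Values close to 100 mean many rows overflow and drop slots. */
uint32_t slot_capacity_percent(unsigned nr_rows_log, uint32_t nr_slots)
{
    return nr_slots * 100u / avg_slots_per_row(nr_rows_log);
}
```

Under these assumptions, NR_ROWS_LOG=14 gives 128 entries per row on average, so NR_SLOTS=240 leaves roughly 87% headroom. NR_ROWS_LOG=12 gives 512 entries per row, so NR_SLOTS would have to grow about fourfold, and the caches sized from NR_SLOTS would grow with it.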


zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 11:05:43 PM

#35

Quote from: Vetal_inside on December 15, 2016, 09:45:45 PM

Quote from: zawawa on December 15, 2016, 09:14:51 PM

This is a memory timings patch. I'm not sure it can be the reason for this low sol rate.
But in the next few days I will try installing the latest Crimson drivers and reflashing the stock BIOS. We'll see what changes.

Performance does suffer if memory timings are too tight.
In the meantime, I will test the miner with my trusty 7990's...


nerdralph

Sr. Member

Offline

Activity: 588
Merit: 251

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 11:31:30 PM

#36

Not bad zawawa. You still have room to improve ht_store.

Code:

p = slot.ui8

Will at best result in 2 store_dwordx4 instructions, and 2 core cycles to the memory controller.

Code:

p = slot.ui4[0]

This form, which you use after round 5, should take only one cycle, but it will force a 32-byte read burst from the GDDR into the L2, modification of 16 bytes, and then a write back. This will waste a lot of GDDR cycles due to the bus turnaround delay. The solution is to have an n-way operation where n threads write 32/n bytes each. That will take just one core cycle to transfer 32 bytes to the L2, and a single 32-byte write burst to one of the 2 GDDR5 chips per memory controller channel.
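The n-way store idea can be sketched on the host in plain C. This is only an illustration of how the 32 bytes are split between lanes, not the miner's ht_store: the function names are hypothetical, and the loop stands in for n adjacent GPU threads that would issue their stores together.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_BYTES 32u

/* One lane's share of an n-way store: lane number `lane` out of `n`
   copies SLOT_BYTES / n contiguous bytes. n must divide SLOT_BYTES. */
void nway_store_lane(uint8_t *dst, const uint8_t *src,
                     unsigned n, unsigned lane)
{
    unsigned chunk = SLOT_BYTES / n;
    memcpy(dst + lane * chunk, src + lane * chunk, chunk);
}

/* Host-side stand-in for n threads executing the store together.
   On a GPU each lane would be a separate work-item. */
void nway_store(uint8_t *dst, const uint8_t *src, unsigned n)
{
    for (unsigned lane = 0; lane < n; lane++)
        nway_store_lane(dst, src, n, lane);
}
```

With n = 4, each lane writes 8 bytes and the four writes together cover the whole 32-byte slot, so no partial read-modify-write of the slot is needed.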
I also think using an odd number for NR_SLOTS should be a tiny bit faster by balancing out the writes between the odd and even memory chips. With NR_SLOTS even, the first write to a given row will always be to an even memory chip. With more slots per row this becomes less significant because the rows don't fill up equally. Using an odd number for NR_SLOTS may also reduce channel conflicts.
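The odd/even argument can be checked with a deliberately simplified model. It assumes 32-byte slots, rows stored back to back, and consecutive 32-byte blocks alternating between an even and an odd memory chip; real address mapping is more involved, and the function names are hypothetical.

```c
#include <stdint.h>

/* Simplified model (assumption): block index b maps to chip b % 2.
   Returns 0 if the first slot of `row` sits on the even chip,
   1 if it sits on the odd chip. */
unsigned first_slot_chip(uint32_t row, uint32_t nr_slots)
{
    return (unsigned)(((uint64_t)row * nr_slots) & 1u);
}

/* Counts how many of the first `nr_rows` rows start on the odd chip. */
uint32_t rows_starting_on_odd_chip(uint32_t nr_rows, uint32_t nr_slots)
{
    uint32_t count = 0;
    for (uint32_t row = 0; row < nr_rows; row++)
        count += first_slot_chip(row, nr_slots);
    return count;
}
```

In this model, NR_SLOTS=240 puts the first slot of every row on the even chip, while NR_SLOTS=239 splits the 16384 rows evenly, with 8192 starting on each chip.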

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 15, 2016, 11:46:42 PM

#37

Quote from: nerdralph on December 15, 2016, 11:31:30 PM

Not bad zawawa. You still have room to improve ht_store.

Code:

p = slot.ui8

Will at best result in 2 store_dwordx4 instructions, and 2 core cycles to the memory controller.

Code:

p = slot.ui4[0]

Very interesting suggestions. Let me see...


zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 16, 2016, 03:10:03 AM

#38

Quote from: nerdralph on December 15, 2016, 11:31:30 PM

Not bad zawawa. You still have room to improve ht_store.

Code:

p = slot.ui8

Will at best result in 2 store_dwordx4 instructions, and 2 core cycles to the memory controller.

Code:

p = slot.ui4[0]

I tried 4-way writes with mixed results. The 4-way write version was actually slower than the single-thread-write version, but the former seems to speed up the last few rounds. It makes sense as these rounds are more memory-intensive. I will explore this approach further.


bigchirv

Newbie

Offline

Activity: 19
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 16, 2016, 03:31:09 AM

#39

Thanks for publishing your repo! Appreciated.

I'm not a C programmer (or an OpenCL one, for that matter), but I'm a fan of DRY. While reading input.cl I found the get_row() function, and I think we can make it a little DRYer by doing something like this:

Code:

uint get_row(uint round, uint xi0)
{
  uint           row;
  uint           swp;
  uint           num;
#if NR_ROWS_LOG == 14
  swp = 0;
#elif NR_ROWS_LOG == 15
  swp = 1;
#elif NR_ROWS_LOG == 16
  swp = 2;
#else
#error "unsupported NR_ROWS_LOG"
#endif
  /* 0x40 (hex, not decimal 40) so that num is 0x3f, 0x7f or 0xff */
  num = (0x40 << swp) - 1;
  if (!(round % 2))
    row = xi0 & ((num << 8) | 0xff);
  else
    row = ((xi0 & (num << 16 | 0xf00)) >> 8) | ((xi0 & 0xf0000000) >> 24);
  return row;
}

So, what do you think, @zawawa?

I don't know if this can be useful at all, but if you like it I can make a PR so you can merge the changes later.
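As a quick sanity check of the masks, here is a plain C version of the same idea with NR_ROWS_LOG passed at run time. It is a sketch, not the code in input.cl: the function name is hypothetical, and the shift base is taken as 0x40, since decimal 40 would give num = 39 and a non-contiguous mask such as 0x27ff.

```c
#include <stdint.h>

/* Row index for nr_rows_log in {14, 15, 16}; returns 0 for any
   other value. Assumes the shift base is 0x40 (see note above). */
uint32_t get_row_generic(unsigned nr_rows_log, uint32_t round, uint32_t xi0)
{
    if (nr_rows_log < 14 || nr_rows_log > 16)
        return 0;

    uint32_t swp = nr_rows_log - 14;
    uint32_t num = (0x40u << swp) - 1;   /* 0x3f, 0x7f, 0xff */

    if (!(round % 2))
        /* Even rounds: low nr_rows_log bits of xi0. */
        return xi0 & ((num << 8) | 0xff);

    /* Odd rounds: gather the row bits from three separate fields. */
    return ((xi0 & ((num << 16) | 0xf00)) >> 8) |
           ((xi0 & 0xf0000000u) >> 24);
}
```

Feeding in xi0 = 0xffffffff shows that both branches produce exactly NR_ROWS_LOG set bits: 0x3fff for 14, 0x7fff for 15 and 0xffff for 16.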

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 16, 2016, 03:52:25 AM

#40

Quote from: bigchirv on December 16, 2016, 03:31:09 AM

Code:

uint get_row(uint round, uint xi0)
{
  uint           row;
  uint           swp;
  uint           num;
#if NR_ROWS_LOG == 14
  swp = 0;
#elif NR_ROWS_LOG == 15
  swp = 1;
#elif NR_ROWS_LOG == 16
  swp = 2;
#else
#error "unsupported NR_ROWS_LOG"
#endif
  /* 0x40 (hex, not decimal 40) so that num is 0x3f, 0x7f or 0xff */
  num = (0x40 << swp) - 1;
  if (!(round % 2))
    row = xi0 & ((num << 8) | 0xff);
  else
    row = ((xi0 & (num << 16 | 0xf00)) >> 8) | ((xi0 & 0xf0000000) >> 24);
  return row;
}

So, what do you think, @zawawa?

I don't know if this can be useful at all, but if you like it I can make a PR so you can merge the changes later.

I appreciate your enthusiasm and willingness to help, but I will keep the current code. With GPGPU, and especially with AMD OpenCL drivers, repeats are often better because you can keep register usage low that way, which is crucially important. My general approach toward GPGPU is that I sacrifice everything for performance, including readability.

