Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480!

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 18, 2016, 09:00:32 PM

#81

Quote from: jiggytom on December 18, 2016, 07:29:13 PM

Quote from: zawawa on December 18, 2016, 05:41:37 PM

I think I figured out how to coalesce global memory reads.
(Memory writes cannot be coalesced because the destination of each slot is not predictable.)
It would have been impossible with the original design of SA, but it should be possible with GG because it loads slots differently.
If everything works out, there should be a massive speedup, hehehe...

Great news! Does that include CUDA also?

Definitely for NVIDIA, and potentially for AMD. I already have some positive results, but I still need to reduce the overhead and bring NR_ROWS_LOG down to 13. More work, more work...

Gateless Gate Sharp, an open-source ETH/XMR miner: http://bit.ly/2rJ2x4V
BTC: 1BHwDWVerUTiKxhHPf2ubqKKiBMiKQGomZ

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 18, 2016, 09:18:52 PM

#82

This should be pretty useful once I get down to the GCN assembly:

https://community.amd.com/thread/165710

kilo17

Legendary

Offline

Activity: 980
Merit: 1001

aka "whocares"

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 19, 2016, 05:58:45 AM

#83

Not sure if you can help with this. I am trying out the miner on Ubuntu 16.10 with the 4.9 kernel and the open-source drivers.

I changed the OpenCL location in the Makefile but got results similar to Eliovp's:

Code:

echo 'const char *ocl_code = R"_mrb_(' >_kernel.h
cpp input.cl >>_kernel.h
echo ')_mrb_";' >>_kernel.h
gcc -O2 -std=gnu99 -pedantic -Wextra -Wall -Wno-deprecated-declarations -Wno-overlength-strings -I"/opt/AMDAPPSDK-3.0/include"  -c -o main.o main.c
main.c: In function ‘examine_ht’:
main.c:534:26: warning: unused parameter ‘round’ [-Wunused-parameter]
 void examine_ht(unsigned round, cl_command_queue queue, cl_mem *hash_table_buffers, cl_mem row_counters_buffer)
                          ^~~~~
main.c:534:50: warning: unused parameter ‘queue’ [-Wunused-parameter]
 void examine_ht(unsigned round, cl_command_queue queue, cl_mem *hash_table_buffers, cl_mem row_counters_buffer)
                                                  ^~~~~
main.c:534:65: warning: unused parameter ‘hash_table_buffers’ [-Wunused-parameter]
 void examine_ht(unsigned round, cl_command_queue queue, cl_mem *hash_table_buffers, cl_mem row_counters_buffer)
                                                                 ^~~~~~~~~~~~~~~~~~
main.c:534:92: warning: unused parameter ‘row_counters_buffer’ [-Wunused-parameter]
 d round, cl_command_queue queue, cl_mem *hash_table_buffers, cl_mem row_counters_buffer)
                                                                     ^~~~~~~~~~~~~~~~~~~
main.c: In function ‘store_encoded_sol’:
main.c:640:25: warning: left shift of negative value [-Wshift-negative-value]
    uint32_t mask = ~(-1 << (8 - x_bits_used));
                         ^~
main.c: In function ‘solve_equihash’:
main.c:958:57: warning: unused parameter ‘ctx’ [-Wunused-parameter]
 uint32_t solve_equihash(cl_device_id dev_id, cl_context ctx, cl_command_queue queue,
                                                         ^~~
main.c: In function ‘mining_mode’:
main.c:1408:18: warning: unused variable ‘status’ [-Wunused-variable]
  cl_int          status;
                  ^~~~~~
main.c:1393:50: warning: unused parameter ‘program’ [-Wunused-parameter]
 void mining_mode(cl_device_id dev_id, cl_program program, cl_context ctx, cl_command_queue queue,
                                                  ^~~~~~~
gcc -O2 -std=gnu99 -pedantic -Wextra -Wall -Wno-deprecated-declarations -Wno-overlength-strings -I"/opt/AMDAPPSDK-3.0/include"  -c -o blake.o blake.c
blake.c:26:25: warning: ‘blake2b_block_len’ defined but not used [-Wunused-const-variable=]
 static const uint32_t   blake2b_block_len = 128;
                         ^~~~~~~~~~~~~~~~~
gcc -O2 -std=gnu99 -pedantic -Wextra -Wall -Wno-deprecated-declarations -Wno-overlength-strings -I"/opt/AMDAPPSDK-3.0/include"  -c -o sha256.o sha256.c
gcc -o sa-solver main.o blake.o sha256.o -rdynamic -L"/usr/lib/x86_64-linux-gnu" -lOpenCL

and then:

Code:

kilo17@kilo-GT7:~/gatelessgate-master$ ./gatelessgate.py -c stratum+tcp://us1-zcash.flypool.org:3333 -u t1cVviFvgJinQ4w3C2m2CfRxgP5DnHYaoFC
Gateless Gate, a Zcash miner
Copyright 2016 zawawa @ bitcointalk.org
Connecting to us1-zcash.flypool.org:3333
Stratum server sent us the first job
Mining on 1 device

Bitcoin Will Only Succeed If The Community That Supports It Gets Support - Support Home Miners & Mining

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 19, 2016, 05:34:37 PM

#84

Quote from: kilo17 on December 19, 2016, 05:58:45 AM

You can run sa-solver to see what's actually going on.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 19, 2016, 06:06:29 PM

#85

I just noticed three things:

(1) With NR_ROWS_LOG=14, Rounds 1 through 8 are actually much faster.
(2) However, kernel_sols() becomes a bottleneck because there are just too many slots in one row. This problem can be easily solved by using a different sorting algorithm for kernel_sols().
(3) Currently, NR_ROWS_LOG<14 is not possible because there is not enough space in shared memory. This problem can be partially solved by making NR_ROWS_LOG variable across rounds. Since less space in shared memory is required for caching slots at later rounds, NR_ROWS_LOG can be decreased accordingly.

This is it. I'm catching up with Claymore's and Eqminer.

chown.multi

Newbie

Offline

Activity: 28
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 19, 2016, 06:11:18 PM

#86

Quote from: zawawa on December 19, 2016, 06:06:29 PM

That is good news, keep up the good work.

laik2

Sr. Member

Offline

Activity: 652
Merit: 266

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 19, 2016, 08:46:08 PM

#87

Updated to latest crimson fglrx drivers and...still no go, amdgpu-pro has a bug and is not working well on R9 series...

Miners Mining Platform [ MMP OS ] - https://app.mmpos.eu/

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 11:28:49 AM

#88

Quote from: laik2 on December 19, 2016, 08:46:08 PM

Updated to latest crimson fglrx drivers and...still no go, amdgpu-pro has a bug and is not working well on R9 series...

I am preparing the next point release today. PM me when your Linux servers are ready. I will do my best to make GG compatible with fglrx.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 11:40:31 AM

#89

By the way, it turned out that NR_ROWS_LOG=12 and 13 actually work. (There was a bug in the code.)
I am also thinking about rewriting GG in CUDA for better performance.
The miner is already running 40% faster on GTX 1060, but I need an extra boost to catch up with Eqminer.

qwep1

Hero Member

Offline

Activity: 610
Merit: 500

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 02:12:55 PM

#90

Where can I download the miner for Windows?

krnlx

Full Member

Offline

Activity: 243
Merit: 105

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 02:56:35 PM

#91

Quote from: zawawa on December 20, 2016, 11:40:31 AM

I made a CUDA port of SA5, and there was no difference in performance compared with OpenCL plus the NVIDIA CPU load fix. The only thing that cannot be implemented in OpenCL is cudaDeviceSetCacheConfig.

In OpenCL you can inline NVIDIA PTX assembly easily, just as in CUDA.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 03:06:05 PM

#92

Quote from: krnlx on December 20, 2016, 02:56:35 PM

Quote from: zawawa on December 20, 2016, 11:40:31 AM

Thank you so much for letting me know.
What I specifically had in mind was "shfl."
If that instruction can be exposed through inline PTX, I can save a considerable amount of time.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 03:08:13 PM

#93

Quote from: qwep1 on December 20, 2016, 02:12:55 PM

Where can I download the miner for Windows?

https://github.com/zawawawa/gatelessgate

maztheman

Newbie

Offline

Activity: 9
Merit: 0

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 03:10:38 PM

#94

Quote from: zawawa on December 20, 2016, 03:06:05 PM

Quote from: krnlx on December 20, 2016, 02:56:35 PM

Quote from: zawawa on December 20, 2016, 11:40:31 AM

Thank you so much for letting me know.
What I specifically had in mind was "shfl."
If that instruction can be exposed through inline PTX, I can save a considerable amount of time.

Yeah, you can, but check the compute capability needed; I think it's compute 3.0+.

krnlx

Full Member

Offline

Activity: 243
Merit: 105

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 03:28:01 PM

#95

Quote from: zawawa on December 20, 2016, 03:06:05 PM

Quote from: krnlx on December 20, 2016, 02:56:35 PM

Quote from: zawawa on December 20, 2016, 11:40:31 AM

Thank you so much for letting me know.
What I specifically had in mind was "shfl."
If that instruction can be exposed through inline PTX, I can save a considerable amount of time.

From cuda include files:

Code:

int __shfl(int var, int srcLane, int width) {
        int ret;
        int c = ((warpSize-width) << 8) | 0x1f;
        asm volatile ("shfl.idx.b32 %0, %1, %2, %3;" : "=r"(ret) : "r"(var), "r"(srcLane), "r"(c));
        return ret;
}

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 04:44:15 PM

#96

Quote from: krnlx on December 20, 2016, 03:28:01 PM

Awesome!

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 08:23:22 PM

#97

Alright peeps, the bug fixes for the next version are almost done.
I'm getting 164 sol/s with RX 480 and 128 sol/s with GTX 1060 3GB.
That should be good enough for now.
I will upload the new version tonight, US PST.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 20, 2016, 11:57:31 PM
Last edit: December 21, 2016, 05:34:57 AM by zawawa

#98

Quote from: laik2 on December 19, 2016, 08:46:08 PM

Updated to latest crimson fglrx drivers and...still no go, amdgpu-pro has a bug and is not working well on R9 series...

I just tested GG on your server. I think I already fixed the problem.
(The next version should be much more stable overall.)
I will push it to the repo in the next few hours, so you can check it yourself.

This compatibility issue turned out to be much more complicated than I thought, and I am afraid I need to drop support for fglrx for now. I am practically a one-man development team, and I don't have the resources to support outdated drivers that are known to be notoriously buggy. I will continue to support the AMDGPU-PRO drivers, though.

zawawa (OP)

Sr. Member

Offline

Activity: 728
Merit: 304

Miner Developer

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 21, 2016, 03:00:52 AM

#99

I just pushed the new version to GitHub:

https://github.com/zawawawa/gatelessgate

More speed enhancements are coming very soon. Enjoy!

laik2

Sr. Member

Offline

Activity: 652
Merit: 266

Re: Gateless Gate: zawawa's open-source cross-platform OpenCL Zcash miner

December 21, 2016, 08:04:47 AM

#100

Quote from: zawawa on December 20, 2016, 11:57:31 PM

Quote from: laik2 on December 19, 2016, 08:46:08 PM

Updated to latest crimson fglrx drivers and...still no go, amdgpu-pro has a bug and is not working well on R9 series...

RX cards should work well with amdgpu-pro but R9 are currently poorly supported...
