Genoil
|
November 14, 2016, 01:21:52 PM |
|
Build instructions for Windows:
1. Install AMD APP SDK for Windows from here: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/
2. Get cygwin from http://cygwin.com
3. Install cygwin with the packages:
   - Devel/git
   - Devel/make
   - Devel/gcc-core
   - Python/python3
4. Start cygwin64 terminal
5. Clone repo: git clone https://github.com/mbevand/silentarmy.git
6. cd silentarmy
7. Build: make OPENCL_HEADERS="/cygdrive/c/Program\ Files\ \(x86\)/AMD\ APP\ SDK/3.0/include/" LIBOPENCL="/cygdrive/c/Program\ Files\ \(x86\)/AMD\ APP\ SDK/3.0/lib/x86_64"
8. Copy cygwin1.dll: cp /cygdrive/c/cygwin64/bin/cygwin1.dll .
9. Get your build path by typing: cygpath . -w -a
10. Create start.bat with:
    c:\cygwin64\bin\python3.4m silentarmy --instances=1 -c stratum+tcp://address -u x.x -p x --use 1,2
    pause

Does the Cygwin port also have issues with multiple instances like my MSVC port?
|
ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d BTC: 1Nu2fMCEBjmnLzqb8qUJpKgq5RoEWFhNcW
|
|
|
oslak
|
November 14, 2016, 01:34:02 PM |
|
Thank you very much for these Windows build instructions. ![Grin](https://bitcointalk.org/Smileys/default/grin.gif)
|
|
|
|
ioglnx
Sr. Member
Offline
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
|
November 14, 2016, 01:40:44 PM |
|
Try replacing ht_store function with this. I'm getting little speed increase (1070):

    uint ht_store(uint round, __global char *ht, uint i,
            ulong xi0, ulong xi1, ulong xi2, ulong xi3, __global uint *rowCounters)
    {
        uint row;
        __global char *p;
        uint cnt;
        uint tid = get_global_id(0);
        uint tlid = get_local_id(0);
    #if NR_ROWS_LOG == 16
        if (!(round % 2))
            row = (xi0 & 0xffff);
        else
            // if we have in hex: "ab cd ef..." (little endian xi0) then this
            // formula computes the row as 0xdebc. it skips the 'a' nibble as it
            // is part of the PREFIX. The Xi will be stored starting with "ef...";
            // 'e' will be considered padding and 'f' is part of the current PREFIX
            row = ((xi0 & 0xf00) << 4) | ((xi0 & 0xf00000) >> 12) |
                ((xi0 & 0xf) << 4) | ((xi0 & 0xf000) >> 12);
    #elif NR_ROWS_LOG == 18
        if (!(round % 2))
            row = (xi0 & 0xffff) | ((xi0 & 0xc00000) >> 6);
        else
            row = ((xi0 & 0xc0000) >> 2) |
                ((xi0 & 0xf00) << 4) | ((xi0 & 0xf00000) >> 12) |
                ((xi0 & 0xf) << 4) | ((xi0 & 0xf000) >> 12);
    #elif NR_ROWS_LOG == 19
        if (!(round % 2))
            row = (xi0 & 0xffff) | ((xi0 & 0xe00000) >> 5);
        else
            row = ((xi0 & 0xe0000) >> 1) |
                ((xi0 & 0xf00) << 4) | ((xi0 & 0xf00000) >> 12) |
                ((xi0 & 0xf) << 4) | ((xi0 & 0xf000) >> 12);
    #elif NR_ROWS_LOG == 20
        if (!(round % 2))
            row = (xi0 & 0xffff) | ((xi0 & 0xf00000) >> 4);
        else
            row = ((xi0 & 0xf0000) >> 0) |
                ((xi0 & 0xf00) << 4) | ((xi0 & 0xf00000) >> 12) |
                ((xi0 & 0xf) << 4) | ((xi0 & 0xf000) >> 12);
    #else
    #error "unsupported NR_ROWS_LOG"
    #endif
        xi0 = (xi0 >> 16) | (xi1 << (64 - 16));
        xi1 = (xi1 >> 16) | (xi2 << (64 - 16));
        xi2 = (xi2 >> 16) | (xi3 << (64 - 16));
        p = ht + row * NR_SLOTS * SLOT_LEN;
        uint rowIdx = row/ROWS_PER_UINT;
        uint rowOffset = BITS_PER_ROW*(row%ROWS_PER_UINT);
        uint xcnt = atomic_add(rowCounters + rowIdx, 1 << rowOffset);
        xcnt = (xcnt >> rowOffset) & ROW_MASK;
        cnt = xcnt;
        if (cnt >= NR_SLOTS) {
            // avoid overflows
            atomic_sub(rowCounters + rowIdx, 1 << rowOffset);
            return 1;
        }
        p += cnt * SLOT_LEN + xi_offset_for_round(round);
        // store "i" (always 4 bytes before Xi)
        // *(__global uint *)(p - 4) = i;
        if (round == 0 || round == 1) {
            //*(__global uint *)(p - 4) = i;
            // store 24 bytes
            ulong2 store;
            store.x=xi1;
            store.y=xi2;
            //*(__global ulong *)(p + 0) = xi0;
            *(__global uint *)(p - 4) = i;
            *(__global ulong *)(p + 0) = xi0;
            *(__global ulong2 *)(p + 8)=store;
        } else if (round == 2) {
            // *(__global uint *)(p - 4) = i;
            // store 20 bytes
            *(__global ulong *)(p - 4) = ((ulong)i) | (xi0 << 32);
            *(__global ulong *)(p + 4) = (xi0 >> 32) | (xi1 << 32);
            *(__global ulong *)(p + 12) = (xi1 >> 32) | (xi2 << 32);
        } else if (round == 3) {
            // *(__global uint *)(p - 4) = i;
            // store 16 bytes
            //8 byte align
            *(__global ulong *)(p - 4) = ((ulong)i) | (xi0 << 32);
            *(__global ulong *)(p + 4) = (xi0 >> 32) | (xi1 << 32);
            *(__global uint *)(p + 12) = (xi1 >> 32);
        } else if (round == 4) {
            // *(__global uint *)(p - 4) = i;
            // store 16 bytes
            *(__global uint *)(p - 4) = i;
            *(__global ulong *)(p + 0) = xi0;
            *(__global ulong *)(p + 8) = xi1;
        } else if (round == 5) {
            //*(__global uint *)(p - 4) = i;
            // store 12 bytes
            // *(__global uint *)(p - 4) = i;
            *(__global uint *)(p - 4) = i;
            *(__global ulong *)(p + 0) = xi0;
            *(__global uint *)(p + 8) = xi1;
        } else if (round == 6 || round == 7) {
            // *(__global uint *)(p - 4) = i;
            // store 8 bytes
            *(__global ulong *)(p - 4) = ((ulong)i) | (xi0 << 32);
            *(__global uint *)(p + 4) = (xi0 >> 32);
        } else if (round == 8) {
            //4 byte align
            *(__global uint *)(p - 4) = i;
            // store 4 bytes
            *(__global uint *)(p + 0) = xi0;
        }
        //*(__global uint *)(p - 4) = i;
        return 0;
    }

And part of xor_and_store:

    else if (round == 3) {
        // xor 20 bytes
        uint one = *(__global uint *)a ^ *(__global uint *)b;
        uint4 loada = *(__global uint4 *)((__global char *)a + 4);
        uint4 loadb = *(__global uint4 *)((__global char *)b + 4);
        uint4 stor = loada ^ loadb;
        xi0 = ((ulong)one ) | ((ulong) stor.x << 32);
        xi1 = ((ulong)stor.y << 32) | ((ulong)stor.z );
        xi2 = stor.w;
        //xi0 = half_aligned_long(a, 0) ^ half_aligned_long(b, 0);
        //xi1 = half_aligned_long(a, 8) ^ half_aligned_long(b, 8);
        //xi2 = well_aligned_int(a, 16) ^ well_aligned_int(b, 16);
    }

these changes are made in the kernel.cu? Am I right?
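The merged stores in the round 2, 3, 6 and 7 branches of the quoted ht_store pack the 4-byte index i and the low half of xi0 into one 8-byte write at p - 4. A minimal host-side sketch in plain C, assuming a little-endian machine, suggests why this yields the same bytes as the separate stores in the commented-out lines (shown here for the round 3 layout). The names store_separate, store_merged, layouts_match and check_layouts are hypothetical and exist only for this illustration; they are not part of silentarmy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference layout: 4-byte i at p-4, 8-byte xi0 at p, 8-byte xi1 at p+8
 * (three separate stores, like the commented-out originals). */
void store_separate(unsigned char *p, uint32_t i, uint64_t xi0, uint64_t xi1)
{
    memcpy(p - 4, &i, 4);
    memcpy(p + 0, &xi0, 8);
    memcpy(p + 8, &xi1, 8);
}

/* Merged layout modelled on the round == 3 branch: two 8-byte stores
 * followed by one 4-byte store, all shifted down by 4 bytes. */
void store_merged(unsigned char *p, uint32_t i, uint64_t xi0, uint64_t xi1)
{
    uint64_t w0 = (uint64_t)i | (xi0 << 32);     /* i, then low half of xi0   */
    uint64_t w1 = (xi0 >> 32) | (xi1 << 32);     /* high xi0, then low xi1    */
    uint32_t w2 = (uint32_t)(xi1 >> 32);         /* high half of xi1          */
    memcpy(p - 4, &w0, 8);
    memcpy(p + 4, &w1, 8);
    memcpy(p + 12, &w2, 4);
}

/* Returns 1 when both variants write identical bytes.
 * Assumption: little-endian host, as on the GPUs discussed here. */
int layouts_match(uint32_t i, uint64_t xi0, uint64_t xi1)
{
    unsigned char a[24] = {0}, b[24] = {0};
    store_separate(a + 4, i, xi0, xi1);
    store_merged(b + 4, i, xi0, xi1);
    return memcmp(a, b, sizeof a) == 0;
}

/* Aborts if the two layouts ever differ for the given inputs. */
void check_layouts(uint32_t i, uint64_t xi0, uint64_t xi1)
{
    assert(layouts_match(i, xi0, xi1));
}
```

Calling check_layouts(0x11223344u, 0x0102030405060708ULL, 0x1112131415161718ULL) passes silently on a little-endian host. The sketch only checks the byte layout; whether the wider writes are faster on a given GPU is something it does not measure.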
|
GTX 1080Ti rocks da house... seriously... this card is a beast³ Owning by now 18x GTX1080Ti :-D @serious love of efficiency
|
|
|
snk0752
Newbie
Offline
Activity: 39
Merit: 0
|
November 14, 2016, 01:46:22 PM |
|
these changes are made in the kernel.cu? Am I right
input.cl
|
|
|
|
ioglnx
|
November 14, 2016, 01:48:36 PM |
|
these changes are made in the kernel.cu? Am I right
input.cl
Well, krnlx has a CUDA port of it, so I assume it's in kernel.cu, since there is no input.cl in his CUDA implementation. Or I'm blind, since I found these functions in kernel.cu too.
|
|
|
|
snk0752
|
November 14, 2016, 01:54:40 PM |
|
Well, krnlx has a CUDA port of it, so I assume it's in kernel.cu, since there is no input.cl in his CUDA implementation. Or I'm blind, since I found these functions in kernel.cu too.
Ah, got it. As I have no CUDA (I'm using the AMD OpenCL drivers), I've just adjusted input.cl in my case. Sorry for the confusion.
|
|
|
|
qqqq
Legendary
Offline
Activity: 1596
Merit: 1011
|
November 14, 2016, 02:01:27 PM |
|
Why aren't the devs making Windows builds? I don't get it.
|
|
|
|
mgmk
Newbie
Offline
Activity: 3
Merit: 0
|
November 14, 2016, 02:03:42 PM |
|
Does the Cygwin port also have issues with multiple instances like my MSVC port?
Yes, using Cygwin results in the same performance issue when running multiple instances.
|
|
|
|
gross
|
November 14, 2016, 02:08:38 PM |
|
The Windows build instructions don't work for me: 3 devices show "0 sol/s". Do I need to install both the 64-bit and the 32-bit AMD APP SDK? I only installed the 64-bit AMD APP SDK.
|
|
|
|
eXtremal
|
November 14, 2016, 02:12:57 PM |
|
Yeah, I think it's totally worth it to put up a serious open source competition against Sir. Claymore ![Wink](https://bitcointalk.org/Smileys/default/wink.gif) We shall be back.
I'll try to reach Claymore's v5/optiminer speed in 2-3 days. P.S. He has more motivation because his model (a miner with a fee) works perfectly, as opposed to mine (an open-source miner and my own pool); miners don't want to support coinsforall.io ![Sad](https://bitcointalk.org/Smileys/default/sad.gif)
|
|
|
|
Kubuxu
Newbie
Offline
Activity: 4
Merit: 0
|
November 14, 2016, 02:50:13 PM |
|
What do you think about a FOSS miner with a dev fee? A small fee of 1-2%, split among the devs (an address list in the sources, with weights). Anyone who builds the miner themselves has the chance to opt out. No enforcing, no saying that people who opt out are bad. There probably won't be much profit in that, but some motivation is better than none; as someone who has done OSS from the beginning and currently works for a FOSS company, I know that well.
Also, I am planning to rewrite the computation pipeline to always have some job queued up, thus reducing kernel downtime. The overhead is small right now, but when the limits of the GPUs are truly reached it might make a difference.
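The "address list in sources with weights" idea could be sketched roughly as follows in plain C. This is only an illustration under stated assumptions: the function name pick_dev, the integer weights and the notion of a running fee-share counter ("ticket") are all hypothetical and are not taken from any existing miner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decide which developer receives fee share number
 * `ticket`, in proportion to integer weights. With weights {3, 1}, three
 * out of every four fee shares go to developer 0 and one to developer 1. */
size_t pick_dev(const uint32_t *weights, size_t n, uint64_t ticket)
{
    uint64_t total = 0;
    for (size_t k = 0; k < n; k++)
        total += weights[k];
    assert(n > 0 && total > 0);      /* need at least one non-zero weight */

    uint64_t r = ticket % total;     /* position inside one weight cycle  */
    for (size_t k = 0; k < n; k++) {
        if (r < weights[k])
            return k;
        r -= weights[k];
    }
    return n - 1;                    /* not reached when total > 0        */
}
```

A build with the fee disabled would simply never call such a function, which matches the opt-out described in the post.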
|
|
|
|
ioglnx
|
November 14, 2016, 02:55:12 PM |
|
Hello Krnlx,
can you please post the complete kernel.cu? Somehow your snippets aren't as complete as I thought at first glance. I'm getting this error:
Error 3 error : no operator "^" matches these operands X:\Mining\sources\nheqminer-cuda-silentarmy\cuda_silentarmy\kernel.cu 496 1 cuda_silentarmy

    uint4 loada = *(__global uint4 *)((__global char *)a + 4);
    uint4 loadb = *(__global uint4 *)((__global char *)b + 4);
    uint4 stor = loada ^ loadb;

Or wasn't it meant for your CUDA port?
|
|
|
|
padrino
Legendary
Offline
Activity: 1428
Merit: 1000
https://www.bitworks.io
|
November 14, 2016, 03:18:24 PM |
|
I had a rough time with your pool so I gave it up some time ago, but have no problem letting the miner mine for a dev fee, perhaps you can sync up with mrb and figure something out..
|
|
|
|
cryptomined
|
November 14, 2016, 03:22:50 PM Last edit: November 14, 2016, 03:33:18 PM by cryptomined |
|
I would have mined on coinsforall.io from day one, but your pool seems to be blocked for Chinese IPs. Chinese miners were complaining from day 1 that you blocked them from your pool. Is it you or your host? Everyone in China was complaining that it was not fair to them, and that the ZCash start was not fair because your pool did not allow direct Chinese connections.

It was a pain in the butt to connect through a VPN in Linux: my VPN keeps dropping when the screen turns off, so I would have to always leave the monitor on, and since I was using just one monitor for 3 rigs, that was impossible.

Did/do you know about the issue with Chinese IPs not connecting to your pool? Is this done on purpose? If not, can you ask your host for a change of IP?

I liked your miner and was very upset that I was not able to use it efficiently on your pool. I left you a small tip, as I was using it for a few hours until I gave up due to the IP address situation, so you got my remaining balance from day 1, which today isn't worth jack, but still, I lost hours of mining on your pool on genesis day.

You may want to consider changing hosts, but maybe the damage is already done and there are too many pools now anyway. How about adding a ZClassic pool?
|
|
|
|
Tmdz
|
November 14, 2016, 03:35:15 PM |
|
I like the way optiminer mines the dev fee, i.e. it mines continuously instead of dropping all the hash to mine for the dev. That also gives the user an accurate picture of how much hash they are getting. People don't like being stuck with or forced to use a single pool, that's just science.
|
|
|
|
krnlx
|
November 14, 2016, 04:02:55 PM |
|
It's for OpenCL. CUDA has no native 128-bit XOR (I don't know about AMD or future cards). For CUDA you can try:

    uint4 stor;
    stor.x = loada.x ^ loadb.x;
    stor.y = loada.y ^ loadb.y;
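The component-wise approach extends to all four lanes of the vector. A minimal sketch in plain C, using a hypothetical struct u32x4 as a stand-in for the uint4 vector type (the names u32x4, xor4 and xor4_check are assumptions for illustration, not CUDA or OpenCL API):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four-lane uint4 vector type. */
typedef struct { uint32_t x, y, z, w; } u32x4;

/* XOR two four-lane vectors one 32-bit lane at a time, which is what a
 * 128-bit vector XOR amounts to when no single instruction covers it. */
u32x4 xor4(u32x4 a, u32x4 b)
{
    u32x4 r = { a.x ^ b.x, a.y ^ b.y, a.z ^ b.z, a.w ^ b.w };
    return r;
}

/* XOR is its own inverse, so applying b twice must give back a. */
void xor4_check(u32x4 a, u32x4 b)
{
    u32x4 c = xor4(xor4(a, b), b);
    assert(c.x == a.x && c.y == a.y && c.z == a.z && c.w == a.w);
}
```

In the round 3 snippet, lanes z and w matter as much as x and y, since xi1 and xi2 are built from stor.z and stor.w.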
|
|
|
|
Genoil
|
November 14, 2016, 04:11:12 PM |
|
It's for opencl. Cuda have not native 128bit xor (don't know about amd, and future cards). For cuda you can test uint4 stor; stor.x = loada.x ^ loadb.x; stor.y = loada.y ^ loadb.y;

    static __device__ __forceinline__ uint4 operator^ (uint4 a, uint4 b)
    {
        return make_uint4(a.x ^ b.x, a.y ^ b.y, a.z ^ b.z, a.w ^ b.w);
    }
|
|
|
|
ioglnx
|
November 14, 2016, 04:26:08 PM |
|
Thanks Genoil :-D
Edit: I could build the CUDA_SA now... let's see if it works :-D
But it doesn't work :-D Registers get spammed :-D It takes 5 min to unload the application ![Cheesy](https://bitcointalk.org/Smileys/default/cheesy.gif)
|
|
|
|
laik2
|
November 14, 2016, 05:42:13 PM |
|
Add CPU mining support for the DevFee, with 1 core mining on your T addresses. Include an option to set the DevFee from 0 to 100%. I am aware that binary distributions are preferred due to speed, and that developers get more donations due to the existence of a devfee, but mining 24/7 with 1 core to a dev address, or 1 hour at full speed every 24h, will make a huge difference in development speed.
P.S. We need solutions fast. Everyone with CPU/GPU power should donate ASAP for faster development. Other closed source projects already support 150+ S/s on high-end AMD cards, and their instability is the only barrier that stops everyone using silentarmy from switching to them. Since nobody seems to care about the devfee of Claymore's and the others, only about speed, please donate by mining on the silentarmy devs' addresses. I will ask every active developer to put their address on the github page so each and every one of us can donate to you!
|
|
|
|
toptek
Legendary
Offline
Activity: 1274
Merit: 1000
|
November 14, 2016, 07:08:30 PM |
|
Add a devfee we can turn off and on; if it doesn't slow us down when it's off, I'll keep it on a few days at a time. I'll put it to use. I hope others would too.
|
|
|
|
|