winmkx
Newbie
Offline
Activity: 86
Merit: 0
|
|
April 26, 2015, 10:17:14 PM |
|
Has anybody done a per-algorithm* comparison of CPU and GPU hashrates (MH/s), and/or of the wattage on GPU?
For example: BMW at 100 MH/s on an R9 280X versus 20 MH/s on a Core i7 with 8 threads.
Which of the SPH algorithms are the best for CPU?
* By algorithm here I mean the primitive algos, like BMW, Skein, Luffa, etc., not the combined ones like Quark, X11, X15, etc.
Best for CPU compared to GPU? None. You want a KDF, not a hash function.
No, not better on CPU. I'd say "less bad" on CPU than on GPU. Perhaps on some algorithms the CPU is 100X worse, while on others it is 5X worse? For example, CPUs have built-in AES-NI, so algorithms like Fugue should be not as lousy on CPU?
|
|
|
|
winmkx
Newbie
Offline
Activity: 86
Merit: 0
|
|
April 27, 2015, 01:10:06 AM |
|
OK, I did some measurements on CPU and GPU; the results are below. Detailed CPU results for multiple architectures can be obtained from eBACS, but I used my own quick code on a Core i7:

#include "stdafx.h"
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#include "sha3/sph_blake.h"
#include "sha3/sph_bmw.h"
#include "sha3/sph_groestl.h"
#include "sha3/sph_jh.h"
#include "sha3/sph_keccak.h"
#include "sha3/sph_skein.h"
#include "sha3/sph_luffa.h"
#include "sha3/sph_cubehash.h"
#include "sha3/sph_shavite.h"
#include "sha3/sph_simd.h"
#include "sha3/sph_echo.h"
#include "sha3/sph_hamsi.h"
#include "sha3/sph_fugue.h"
#include "sha3/sph_shabal.h"
#include "sha3/sph_whirlpool.h"

// One context per hash function; each test below uses its own context.
sph_blake512_context ctx_blake;
sph_bmw512_context ctx_bmw;
sph_groestl512_context ctx_groestl;
sph_jh512_context ctx_jh;
sph_keccak512_context ctx_keccak;
sph_skein512_context ctx_skein;
sph_luffa512_context ctx_luffa;
sph_cubehash512_context ctx_cubehash;
sph_shavite512_context ctx_shavite;
sph_simd512_context ctx_simd;
sph_echo512_context ctx_echo;
sph_hamsi512_context ctx_hamsi;
sph_fugue512_context ctx_fugue;
sph_shabal512_context ctx_shabal;
sph_whirlpool_context ctx_whirlpool;

// Prints the name, fills a 64-byte input, starts the clock and opens
// a loop that runs the statements that follow 1,000,000 times.
#define TEST_PRE(na) printf("%s: ", na); \
    unsigned char hash[256]; \
    for(int i=0; i<64; i++) hash[i] = i; \
    clock_t t = clock(); \
    for(int i=0; i<1000000; i++) \
    {

// Closes the loop and prints the elapsed time in seconds.
#define TEST_POST() } \
    printf("\t\t%g seconds\n", ((float)(clock() - t)) / CLOCKS_PER_SEC);

int main()
{
    {//blake512
        TEST_PRE("blake512");
        sph_blake512_init(&ctx_blake);
        sph_blake512(&ctx_blake, hash, 64);
        sph_blake512_close(&ctx_blake, hash);
        TEST_POST();
    }
    {//bmw512
        TEST_PRE("bmw512");
        sph_bmw512_init(&ctx_bmw);
        sph_bmw512(&ctx_bmw, hash, 64);
        sph_bmw512_close(&ctx_bmw, hash);
        TEST_POST();
    }
    {//groestl512
        TEST_PRE("groestl512");
        sph_groestl512_init(&ctx_groestl);
        sph_groestl512(&ctx_groestl, hash, 64);
        sph_groestl512_close(&ctx_groestl, hash);
        TEST_POST();
    }
    {//skein512
        TEST_PRE("skein512");
        sph_skein512_init(&ctx_skein);
        sph_skein512(&ctx_skein, hash, 64);
        sph_skein512_close(&ctx_skein, hash);
        TEST_POST();
    }
    {//jh512
        TEST_PRE("jh512");
        sph_jh512_init(&ctx_jh);
        sph_jh512(&ctx_jh, hash, 64);
        sph_jh512_close(&ctx_jh, hash);
        TEST_POST();
    }
    {//keccak512
        TEST_PRE("keccak512");
        sph_keccak512_init(&ctx_keccak);
        sph_keccak512(&ctx_keccak, hash, 64);
        sph_keccak512_close(&ctx_keccak, hash);
        TEST_POST();
    }
    {//luffa512
        TEST_PRE("luffa512");
        sph_luffa512_init(&ctx_luffa);
        sph_luffa512(&ctx_luffa, hash, 64);
        sph_luffa512_close(&ctx_luffa, hash);
        TEST_POST();
    }
    {//cubehash512
        TEST_PRE("cubehash512");
        sph_cubehash512_init(&ctx_cubehash);
        sph_cubehash512(&ctx_cubehash, hash, 64);
        sph_cubehash512_close(&ctx_cubehash, hash);
        TEST_POST();
    }
    {//shavite512
        TEST_PRE("shavite512");
        sph_shavite512_init(&ctx_shavite);
        sph_shavite512(&ctx_shavite, hash, 64);
        sph_shavite512_close(&ctx_shavite, hash);
        TEST_POST();
    }
    {//simd512
        TEST_PRE("simd512");
        sph_simd512_init(&ctx_simd);
        sph_simd512(&ctx_simd, hash, 64);
        sph_simd512_close(&ctx_simd, hash);
        TEST_POST();
    }
    {//echo512
        TEST_PRE("echo512");
        sph_echo512_init(&ctx_echo);
        sph_echo512(&ctx_echo, hash, 64);
        sph_echo512_close(&ctx_echo, hash);
        TEST_POST();
    }
    {//hamsi512
        TEST_PRE("hamsi512");
        sph_hamsi512_init(&ctx_hamsi);
        sph_hamsi512(&ctx_hamsi, hash, 64);
        sph_hamsi512_close(&ctx_hamsi, hash);
        TEST_POST();
    }
    {//fugue512
        TEST_PRE("fugue512");
        sph_fugue512_init(&ctx_fugue);
        sph_fugue512(&ctx_fugue, hash, 64);
        sph_fugue512_close(&ctx_fugue, hash);
        TEST_POST();
    }
    {//shabal512
        TEST_PRE("shabal512");
        sph_shabal512_init(&ctx_shabal);
        sph_shabal512(&ctx_shabal, hash, 64);
        sph_shabal512_close(&ctx_shabal, hash);
        TEST_POST();
    }
    {//whirlpool
        TEST_PRE("whirlpool");
        sph_whirlpool_init(&ctx_whirlpool);
        sph_whirlpool(&ctx_whirlpool, hash, 64);
        sph_whirlpool_close(&ctx_whirlpool, hash);
        TEST_POST();
    }
    return 0;
}

All results are scaled to be relative to the fastest hash function, which is shabal. And here's the graph: https://i.imgur.com/ySLE1xK.png
GPU results were obtained on an R9 280X with sgminer 5.1 and driver 13.X using AMD CodeXL, and scaled to be relative to the fastest function, shabal: https://i.imgur.com/k6lif5s.png
Notice that the CPU speed difference is 1X - 8X, while the GPU speed difference is 1X - 45X.
PS: Note that you cannot compare values from the CPU graph and the GPU graph directly. Only an algorithm's relative performance within its respective graph matters.
|
|
|
|
MaxDZ8
|
|
April 27, 2015, 05:45:35 AM |
|
Very interesting experiment... curious to see CubeHash be so slow. However, I would suggest iterating at least 1024 calls to rule out possible I$ (instruction cache) effects. Also, clock() does not do what people really want (though it might be equivalent in this context); the main problem is that its granularity is implementation-dependent. I'm surprised it can be used to benchmark code like this nowadays. In any case, don't use it. Use the performance counter or, even better, C++11 std::chrono.
Groestl should be faster than SIMD on a CPU with AES-NI, I would think. Are the AES-NI instructions applicable to the bigger Groestl round?
I think it's most likely the compiler didn't emit the right code; it seems likely it wouldn't even try. It might be intrinsics or nothing.
|
|
|
|
MaxDZ8
|
|
April 27, 2015, 08:27:04 AM |
|
I agree. However, that's what most people use.
|
|
|
|
o00o
Member
Offline
Activity: 93
Merit: 11
|
|
April 30, 2015, 09:30:14 PM |
|
It would be much appreciated if someone could confirm which algorithms/miners are supported by my Compute Capability 1.1 GPUs such as the 9800GTX+/GTS250. I've tried mining x11 w/ Cudaminer & ccminer on 340.52, the latest driver w/ support for my GPUs, but unfortunately GPU utilization is ~1%, making Scrypt my only viable yet unprofitable option.
Thanks to anyone in advance for offering their help!
|
BTC:1Gk3p6KbCKiVhJYksaYPeAGL948rAsjmUS
|
|
|
platinum4
|
|
May 01, 2015, 06:36:49 AM |
|
It would be much appreciated if someone could confirm which algorithms/miners are supported by my Compute Capability 1.1 GPUs such as the 9800GTX+/GTS250. I've tried mining x11 w/ Cudaminer & ccminer on 340.52, the latest driver w/ support for my GPUs, but unfortunately GPU utilization is ~1%, making Scrypt my only viable yet unprofitable option.
Thanks to anyone in advance for offering their help!
sgminer is for AMD GPU architectures...
|
|
|
|
Atomicat
Legendary
Offline
Activity: 952
Merit: 1002
|
|
May 01, 2015, 08:47:26 AM |
|
It would be much appreciated if someone could confirm which algorithms/miners are supported by my Compute Capability 1.1 GPUs such as the 9800GTX+/GTS250. I've tried mining x11 w/ Cudaminer & ccminer on 340.52, the latest driver w/ support for my GPUs, but unfortunately GPU utilization is ~1%, making Scrypt my only viable yet unprofitable option.
Thanks to anyone in advance for offering their help!
You're looking for CudaMiner. You can find the thread here... https://bitcointalk.org/index.php?topic=167229.0
|
|
|
|
o00o
Member
Offline
Activity: 93
Merit: 11
|
|
May 01, 2015, 10:06:46 PM |
|
It would be much appreciated if someone could confirm which algorithms/miners are supported by my Compute Capability 1.1 GPUs such as the 9800GTX+/GTS250. I've tried mining x11 w/ Cudaminer & ccminer on 340.52, the latest driver w/ support for my GPUs, but unfortunately GPU utilization is ~1%, making Scrypt my only viable yet unprofitable option.
Thanks to anyone in advance for offering their help!
sgminer is for AMD GPU architectures...
I am well aware that sgminer is optimized for AMD GPUs, but given that Cudaminer & ccminer work only w/ Scrypt, which is currently dominated by ASICs, the difficulty is now too high for these GPUs to submit any shares even when connected to low-difficulty servers. Despite the performance hit which OpenCL will introduce compared to CUDA, it seems to remain my best option unless someone can point me in the right direction.
It would be much appreciated if someone could confirm which algorithms/miners are supported by my Compute Capability 1.1 GPUs such as the 9800GTX+/GTS250. I've tried mining x11 w/ Cudaminer & ccminer on 340.52, the latest driver w/ support for my GPUs, but unfortunately GPU utilization is ~1%, making Scrypt my only viable yet unprofitable option.
Thanks to anyone in advance for offering their help!
You're looking for CudaMiner. You can find the thread here... https://bitcointalk.org/index.php?topic=167229.0
Thanks for your input!
|
BTC:1Gk3p6KbCKiVhJYksaYPeAGL948rAsjmUS
|
|
|
_javi_
|
|
May 02, 2015, 05:43:11 PM |
|
Could somebody plz share the config settings for quarkcoin with 7850/7950 cards? thanks a lot!
|
|
|
|
MysteryX
Member
Offline
Activity: 98
Merit: 10
|
|
May 03, 2015, 06:53:23 AM |
|
Could somebody plz share the config settings for quarkcoin with 7850/7950 cards? thanks a lot!
I just use the same settings as x11 etc, but I change gpu-threads to 1, and worksize 256. I wish there was a better quark kernel, it's still wayyyy slower than nvidia cards.
|
|
|
|
smokiepot
|
|
May 03, 2015, 07:27:38 AM |
|
there is a better quark kernel, but the person who wrote it doesn't make it public. look at the last few pages here and you'll see it.
|
|
|
|
_javi_
|
|
May 03, 2015, 11:01:15 AM |
|
thanks for your answers, but i still can't get decent speeds
|
|
|
|
smokiepot
|
|
May 03, 2015, 12:18:21 PM |
|
3.8 MH/s with my R9 280 (without the X). how much do you get? which hardware, driver, settings?
|
|
|
|
djm34
Legendary
Offline
Activity: 1400
Merit: 1050
|
|
May 03, 2015, 12:32:53 PM |
|
new addition to sgminer (github/djm34/sgminer): yescrypt algo (binaries here: http://ge.tt/5AAmOfF2/v/0?c)
there are 2 implementations of the algo:
--kernel yescrypt for amd
--kernel yescrypt-multi for newer nvidia cards (compute 5.2, i.e. 900 series; for other cards ccminer should be better)
see example.bat for an example of how to use it
|
djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
|
|
|
JuanHungLo
|
|
May 03, 2015, 02:18:23 PM |
|
new addition to sgminer (github/djm34/sgminer): yescrypt algo (binaries here: http://ge.tt/5AAmOfF2/v/0?c)
there are 2 implementations of the algo:
--kernel yescrypt for amd
--kernel yescrypt-multi for newer nvidia cards (compute 5.2, i.e. 900 series; for other cards ccminer should be better)
see example.bat for an example of how to use it
Thanks for uploading this. In order to get it to run I had to put a few more .dll files in the folder, specifically:
|
Bull markets are born on pessimism, grow on skepticism, mature on optimism, and die on euphoria. - John Templeton
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
May 04, 2015, 07:53:39 AM |
|
why was the supposedly optimized quark and qubit sgminer post removed? was it a trojan or a wolf0's leak?
|
|
|
|
mitache365
|
|
May 04, 2015, 08:25:22 AM |
|
does anyone know the X11 hashrate of the 290X 8GB?
|
|
|
|
smokiepot
|
|
May 04, 2015, 09:03:14 AM |
|
why was the supposedly optimized quark and qubit sgminer post removed? was it a trojan or a wolf0's leak?
i think it was because there was a link to another board in there.
|
|
|
|
smolen
|
|
May 04, 2015, 11:47:16 AM Last edit: May 04, 2015, 01:14:19 PM by smolen |
|
why was the supposedly optimized quark and qubit sgminer post removed? was it a trojan or a wolf0's leak?
The author still distributes them via PM at the original Russian forum.
algo    Wolf's speed   speed   PC power draw with 1 card (W)   power increase per card (W)
x11     5900           7400    310                             10
x13     4600           5700    310                             5
qubit   5300           12500   285                             35
quark   3000           12900   315                             65
(algo/Wolf's speed/speed/Watt/Watt delta) EDIT: measured on 280x 1100/1500, wattage is for PC with 1 GPU, measurement is not mine
|
Of course I gave you bad advice. Good one is way out of your price range.
|
|
|
pallas
Legendary
Offline
Activity: 2716
Merit: 1094
Black Belt Developer
|
|
May 04, 2015, 12:08:45 PM |
|
why was the supposedly optimized quark and qubit sgminer post removed? was it a trojan or a wolf0's leak?
The author still distributes them via PM at the original Russian forum.
algo    Wolf's speed   speed   PC power draw with 1 card (W)   power increase per card (W)
x11     5900           7400    310                             10
x13     4600           5700    310                             5
qubit   5300           12500   285                             35
quark   3000           12900   315                             65
(algo/Wolf's speed/speed/Watt/Watt delta)
is there a linux build? :-) kernels have been split and parameters changed so you can't just copy the bins over... :-/
|
|
|
|
|