Bitcoin Forum
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347845 times)
ryen123 | Sr. Member | Activity: 292 | Merit: 250
August 16, 2015, 02:52:47 PM | #4961

Hey there,

Thanks for having a look at my CUDA kernel. I actually took the keccak and ROTL64 code from your ccminer fork (or was it someone else's fork... not sure) a few months ago, so I'm surprised it can even be faster :).

Unfortunately I don't have a working Maxwell development environment to build and test, only Kepler.



I hope the bug causing the low hashrate on windows can be solved.

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 03:20:59 PM | #4962

Hey there,

Thanks for having a look at my CUDA kernel. I actually took the keccak and ROTL64 code from your ccminer fork (or was it someone else's fork... not sure) a few months ago, so I'm surprised it can even be faster :).

Unfortunately I don't have a working Maxwell development environment to build and test, only Kepler.


i think making ethminer less cpu intensive (full cpu usage on a G3240 processor) would boost performance considerably as ccminer barely uses the cpu...

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 03:34:56 PM | #4963

To reduce the cpu usage you can try to add:

cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

like this:

Code:
void run_ethash_search(
    uint32_t blocks,
    uint32_t threads,
    cudaStream_t stream,
    uint32_t* g_output,
    hash32_t const* g_header,
    hash128_t const* g_dag,
    uint64_t start_nonce,
    uint64_t target
)
{
    // Ask the runtime to block the host thread instead of spinning while it waits for the GPU.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
#if __CUDA_ARCH__ >= SHUFFLE_MIN_VER
    ethash_search<<<blocks, threads, 0, stream>>>(g_output, g_header, g_dag, start_nonce, target);
#else
    // Fallback path: reserve dynamic shared memory for each block.
    ethash_search<<<blocks, threads, (sizeof(compute_hash_share) * threads) / THREADS_PER_HASH, stream>>>(g_output, g_header, g_dag, start_nonce, target);
#endif
}

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + tripple mining.. https://github.com/sp-hash/TeamBlackMiner

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 03:36:35 PM | #4964

Nice work SP , I wish to test it , but can't compile unfortunately...
copy cuda_helper.h from sp's ccminer into libethash-cu directory and add the following line on top of keccak.cuh:
Code:
__constant__ uint64_t keccak_round_constants[24];
unfortunately no improvements on a 6 750ti rig (~52 mhs)

Use CUDA 6.5, and remember to compile for compute 5.0 (it seems compute 5.0 is already included in CMakeLists.txt). Use the second version I posted, with #pragma unroll and bitselect.

And add cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync); if your cpu is weak, the high cpu usage will slow things down a lot.

bensam1231 | Legendary | Activity: 1806 | Merit: 1024
August 16, 2015, 03:43:20 PM | #4965

Ethereum has no GUI wallet, little windows support, difficult configurations..
I think it's too early to call it a coin, but it seems to have some money support on the exchanges. (pump)
A waste of time.

(Is Ethereum a volunteer botnet?) Join us and we will give you some cash every day.

lol... a lot of promises and hype, little to show for it. It reminds me of Star Citizen. But I go where the money does, so it's what I've been mining.

Also this coin really sucks without pools. Solo mining is for big miners or you get shafted hard. The one pool that is available has closed source code. If they wanted to be fair they should kick off everyone who has more than 1 GH/s. Those miners can effectively solo mine quite easily. You know, if they actually cared about 51% attacks and being fair.

I buy private Nvidia miners. Send information and/or inquiries to my PM box.

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 03:46:03 PM | #4966 | Last edit: August 16, 2015, 04:02:54 PM by sp_

Hey there,
Thanks for having a look at my CUDA kernel. I actually took the keccak and ROTL64 code from your ccminer fork (or was it someone else's fork... not sure) a few months ago, so I'm surprised it can even be faster :).

It must have been another fork. In my latest ccminer fork (sp-mod 60), Keccak256 for maxcoin (uint2/bitselect) is running at:

400 MH/s on the gtx 970
155 MH/s on the gtx 750ti

In the old cudaminer it was something like 70 MH/s on the 750ti and 200 MH/s on the 970.

Found a doc with the old keccak hashrates (maxcoin) in cudaminer:

In the old cudaminer a 780ti is doing 218MHASH (250watt)

https://docs.google.com/spreadsheets/d/1BIaD-12rmsoz3t64k3_hU79morafOURkEUjXKSdJVLo/edit?pli=1#gid=0


These algos cannot be compared directly, though.

The ether algo is doing a double round of modded keccak512 (24 iterations) plus some other stuff.
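
For readers who have not met the uint2/bitselect idea mentioned above, here is a minimal host-side C++ sketch of it. This is an illustration under assumptions, not the actual ccminer kernel code: Lane is a hypothetical stand-in for CUDA's uint2, and split, join, rotl64_halves, bitselect and chi are names made up for this example. The point is that a 64-bit Keccak lane can be kept as two 32-bit halves, so rotations become 32-bit shifts (or a plain swap for a rotation by 32), and the chi step can be written as a single bit-select.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's uint2: one 64-bit lane as two 32-bit halves.
struct Lane {
    uint32_t lo;
    uint32_t hi;
};

Lane split(uint64_t v) { return { static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32) }; }
uint64_t join(Lane l) { return (static_cast<uint64_t>(l.hi) << 32) | l.lo; }

// Rotate a lane left by n bits (0..63) using only 32-bit operations.
Lane rotl64_halves(Lane a, unsigned n) {
    n &= 63;
    if (n >= 32) {                       // rotating by 32 is just swapping the halves
        uint32_t t = a.lo;
        a.lo = a.hi;
        a.hi = t;
        n -= 32;
    }
    if (n == 0) return a;                // avoids an undefined 32-bit shift by 32 below
    return { (a.lo << n) | (a.hi >> (32 - n)),
             (a.hi << n) | (a.lo >> (32 - n)) };
}

// bitselect(x, y, mask): take bits from y where mask is 1, otherwise from x.
uint32_t bitselect(uint32_t x, uint32_t y, uint32_t mask) { return (x & ~mask) | (y & mask); }

// Keccak chi step on one 32-bit half: a ^ (~b & c), expressed as one select.
uint32_t chi(uint32_t a, uint32_t b, uint32_t c) { return bitselect(a ^ c, a, b); }

// Small self-check against the plain 64-bit rotate.
void self_check() {
    const uint64_t v = 0x0123456789ABCDEFULL;
    for (unsigned n = 1; n < 64; ++n) {
        assert(join(rotl64_halves(split(v), n)) == ((v << n) | (v >> (64 - n))));
    }
}
```

Whether this buys anything depends on the GPU generation and compiler; the hashrates quoted in this post are for the real kernels, not for this sketch.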

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 04:02:35 PM | #4967

To improve the cpu usage you can try to add:

cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

like this:

Code:
void run_ethash_search(
    uint32_t blocks,
    uint32_t threads,
    cudaStream_t stream,
    uint32_t* g_output,
    hash32_t const* g_header,
    hash128_t const* g_dag,
    uint64_t start_nonce,
    uint64_t target
)
{
    // Ask the runtime to block the host thread instead of spinning while it waits for the GPU.
    cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
#if __CUDA_ARCH__ >= SHUFFLE_MIN_VER
    ethash_search<<<blocks, threads, 0, stream>>>(g_output, g_header, g_dag, start_nonce, target);
#else
    // Fallback path: reserve dynamic shared memory for each block.
    ethash_search<<<blocks, threads, (sizeof(compute_hash_share) * threads) / THREADS_PER_HASH, stream>>>(g_output, g_header, g_dag, start_nonce, target);
#endif
}

this change causes this:
Code:
  ✘  17:49:23|cudaminer1  Error CUDA mining: cannot set while device is active in this process
  ✘  17:49:23|cudaminer2  Error CUDA mining: cannot set while device is active in this process
  ✘  17:49:23|cudaminer3  Error CUDA mining: cannot set while device is active in this process
  ✘  17:49:23|cudaminer0  Error CUDA mining: cannot set while device is active in this process
  ✘  17:49:23|cudaminer4  Error CUDA mining: cannot set while device is active in this process
  ✘  17:49:23|cudaminer5  Error CUDA mining: cannot set while device is active in this process

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 04:05:02 PM | #4968

Nice work SP , I wish to test it , but can't compile unfortunately...
copy cuda_helper.h from sp's ccminer into libethash-cu directory and add the following line on top of keccak.cuh:
Code:
__constant__ uint64_t keccak_round_constants[24];
unfortunately no improvements on a 6 750ti rig (~52 mhs)

Use CUDA 6.5, and remember to compile for compute 5.0 (it seems compute 5.0 is already included in CMakeLists.txt). Use the second version I posted, with #pragma unroll and bitselect.

And add cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync); if your cpu is weak, the high cpu usage will slow things down a lot.
never moved away from cuda 6.5 and yes, i'm sure it's compiled for compute 5.0...

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 04:08:49 PM | #4969

Sorry, add it after cudaDeviceReset() in ethash_cu_miner.cpp:

Code:
cudaDeviceReset();
// cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

For Maxwell, comment out the cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte) call and add cudaDeviceSetCacheConfig(cudaFuncCachePreferL1).
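
Why the flag matters for CPU load: with cudaDeviceScheduleBlockingSync the host thread sleeps on a synchronization primitive while it waits for the GPU, instead of spinning. The sketch below is only a plain C++ analogy of that difference and does not call the CUDA runtime at all. The worker thread is a hypothetical stand-in for a running kernel, and cpu_seconds_waiting is a name made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <ctime>
#include <mutex>
#include <thread>

// CPU seconds the process burns while a host thread waits for "device" work.
// std::clock() reports process CPU time on POSIX systems; some other platforms
// report wall time instead, where this comparison would not be meaningful.
double cpu_seconds_waiting(bool blocking, std::chrono::milliseconds work) {
    std::atomic<bool> done{false};
    std::mutex m;
    std::condition_variable cv;

    const std::clock_t start = std::clock();
    std::thread device([&] {
        std::this_thread::sleep_for(work);   // the "kernel" runs; costs no host CPU
        {
            std::lock_guard<std::mutex> lock(m);
            done.store(true);
        }
        cv.notify_one();
    });

    if (blocking) {
        // Blocking-sync style: sleep until the worker signals completion.
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return done.load(); });
    } else {
        // Spin style: poll the flag as fast as possible, keeping one core busy.
        while (!done.load()) { }
    }
    device.join();
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it once with blocking set to false and once with true, for the same work duration, should show the spinning variant using far more CPU time. That only explains the cpu usage; it says nothing about the hash rate.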

ryen123 | Sr. Member | Activity: 292 | Merit: 250
August 16, 2015, 04:18:00 PM | #4970

Wow cpu utilization dropped to zero.

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 04:26:34 PM | #4971

And the speed? Any faster?

ryen123 | Sr. Member | Activity: 292 | Merit: 250
August 16, 2015, 04:33:34 PM | #4972

I compiled for windows with cuda 6.5. There is a windows bug in it, my 750Ti is only at 2.2+MHs instead of the possible 9+MHs.

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 04:35:06 PM | #4973

Sorry, add it after cudaDeviceReset() in ethash_cu_miner.cpp:

Code:
cudaDeviceReset();
// cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte);
cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);

For Maxwell, comment out the cudaDeviceSetSharedMemConfig(cudaSharedMemBankSizeEightByte) call and add cudaDeviceSetCacheConfig(cudaFuncCachePreferL1).

yess! that did it :)
cpu is almost idle now, but performance-wise the hash rate is still the same (even with your keccak changes)

hashbrown9000 | Sr. Member | Activity: 427 | Merit: 250
August 16, 2015, 04:36:13 PM | #4974

submitted a pull request for util.cpp, which currently prevents compiling under linux. somehow it shows up under tpruvot's fork. can you check, @sp_?

djm34 | Legendary | Activity: 1400 | Merit: 1050
August 16, 2015, 04:46:42 PM | #4975

There is no projectfile for visual studio 2013. and the cuda 7.0 compiler is slow. Perhaps 30% faster with cuda 6.5.

You can tip the crypto mining blog, they have compiled a windows version before..

Ask them to build for cuda 6.5

doesn't make much of a difference actually, everything is slowed down by mem load/store (moving to 32bit would probably help a little as well, but I get some problem with some dll).
actually I can't get my 780ti/750ti to work properly because of the high mem allocation (which actually isn't so high... but it doesn't like that)
 

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 05:06:22 PM | #4976

Nvidia fixed a memory allocation bug in one of the latest drivers (353.30 or newer). You should upgrade.

sp_ (OP) | Legendary | Activity: 2926 | Merit: 1087 | Team Black developer
August 16, 2015, 05:11:10 PM | #4977

And the speed? Any faster?

I compiled for windows with cuda 6.5. There is a windows bug in it, my 750Ti is only at 2.2+MHs instead of the possible 9+MHs.

Did you build with debug? :) If so, build release. I think the kernel needs some tuning. Might work on it later. I see work has been done by tpruvot in his fork.

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 05:25:53 PM | #4978

There is no projectfile for visual studio 2013. and the cuda 7.0 compiler is slow. Perhaps 30% faster with cuda 6.5.

You can tip the crypto mining blog, they have compiled a windows version before..

Ask them to build for cuda 6.5

doesn't make much of a difference actually, everything is slowed down by mem load/store (moving to 32bit would probably help a little as well, but I get some problem with some dll).
actually I can't get my 780ti/750ti to work properly because of the high mem allocation (which actually isn't so high... but it doesn't like that)
 
under linux ethminer allocates 1087mb on each gpu and almost 800mb on the system, i wonder if it's moving hashes between cpu and gpu's memory...

ryen123 | Sr. Member | Activity: 292 | Merit: 250
August 16, 2015, 05:30:51 PM | #4979

And the speed? Any faster?

I compiled for windows with cuda 6.5. There is a windows bug in it, my 750Ti is only at 2.2+MHs instead of the possible 9+MHs.

Did you build with debug? :) If so, build release. I think the kernel needs some tuning. Might work on it later. I see work has been done by tpruvot in his fork.

I always build release. From what I've read on the other forum, the same code works correctly in linux. Others that run in windows get bad hashrates. Whatever the problem is, it is beyond my understanding.

My system: Win10, Nvidia driver 355.60, CUDA 6.5

skunk | Sr. Member | Activity: 329 | Merit: 250
August 16, 2015, 05:37:40 PM | #4980

windows 10 is the problem, it sends the missing hashes to microsoft's pool :D
