Bitcoin Forum
October 07, 2025, 12:27:00 PM *
 
Pages: « 1 ... 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 [371] 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 ... 1240 »
Author Topic: CCminer(SP-MOD) Modded GPU kernels.  (Read 2347983 times)
chrysophylax
Legendary

Activity: 3122
Merit: 1093


--- ChainWorks Industries ---


November 05, 2015, 06:19:28 AM
 #7401


the argument is not whether the auction is the cause of the price rise or not - it is about the reason WHY it's being pumped ...


I'm not sure it is being pumped. Pumping altcoins is relatively easy, but bitcoin takes a lot more capital. Not government
size, not megacorp size, but players bigger than anyone else currently in the crypto market, to be able to
manipulate the price of a USD 6B security by 100%.

Like I said before I'm liking the idea that the Chinese are turning to bitcoin to circumvent currency controls.
If so it's a big step forward for crypto.

that could be the case mate ...

whichever way - we are all speculating anyhow ...

the beauty about it all is - that btc IS moving - and moving very well at the moment ...

if someone injected another few million dollars ( like the winklevoss bros did a couple of years back with their $11 million ) - then the price would move up very nicely again ...

i've never really been hung up about btc / fiat prices ... coins are coins are coins ... you can use them all if the service provider is willing to accept the coin you are willing to trade ... so it's no major issue on this end ... i just like to watch - beats the hell out of sports ...

btc however IS a seriously confronting notion for the big players in this world - the ones that do control markets ...

so if it ever came out that 'they' were involved in the fiat price fluctuations - i wouldn't bat an eyelid over it ...

give me my partner - good food - my place - my farm - and my sports car - and i'm a very happy guy whichever way it turns out ...

#crysx

sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 07:24:46 AM
Last edit: November 05, 2015, 07:43:23 AM by sp_
 #7402

According to http://coinmarketcap.com, $259,625,000 worth of bitcoins has been traded in the last 24 hours. But that page doesn't include the biggest Chinese exchanges, like OKCoin, Huobi, BTCChina and Bter.

The bitcoin/USD market is just a small fraction of the world market. In the recent rally, more than $1 billion worth of bitcoins has been traded in a couple of days.

Another interesting point is that there is arbitrage between the CNY exchanges and the USD exchanges:

Huobi (China): $431.73 (2740.21 CNY)
Bitstamp:      $389.98

Bitcoins are about 10% more expensive to buy in China than in Europe/America. Perhaps because somebody is buying in China and selling in Europe...
Perhaps someone is moving fiat currency across borders through the bitcoin blockchain without any government control...

Team Black Miner (ETHB3 ETH ETC VTC KAWPOW FIROPOW EVRPROGPOW MEOWPOW + dual mining + triple mining) https://github.com/sp-hash/TeamBlackMiner
pallas
Legendary

Activity: 2716
Merit: 1094


Black Belt Developer


November 05, 2015, 08:34:23 AM
 #7403

and the price rebounded ;-)

sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 09:21:04 AM
 #7404

And the altcoins are up, and the mining profit measured in BTC is up..

cbuchner1
Hero Member

Activity: 756
Merit: 502


November 05, 2015, 09:47:24 AM
 #7405


Hi all, could this thread be relevant for Maxwell-based mining? Supposedly there is a trick to boost the memory clock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian
pallas
Legendary

Activity: 2716
Merit: 1094


Black Belt Developer


November 05, 2015, 10:43:35 AM
 #7406


Quote from: cbuchner1 on November 05, 2015, 09:47:24 AM (post #7405 above)


I tried it some months ago on a 970 and it worked, but you lose the automatic clock tuning based on temperature and fan speed, which you may or may not like ;-)
I personally like it, and ended up with a better hashrate using the default settings than after changing them with nvidia-smi, but YMMV.

sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 11:03:43 AM
Last edit: November 05, 2015, 11:15:39 AM by sp_
 #7407

You can boost the core clock as well if you put the GPU in the P1 state.

P2 state: adjust the memory clock
P1 state: adjust the GPU boost clock

I think I will write code that does this automatically, behind an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each GPU in the rig
2. Set P1 mode and set the max possible GPU boost clock
3. Set P2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command-line tool from the C++ code, the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse...

(e.g. all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small case.)

sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 11:30:15 AM
 #7408

Testing sp-mod release 72 (open source)

On the GTX 970 OC Windforce G1 (boost: 1505 MHz core clock) I get

19 MH/s in the quark algo.

The Titan X reference card does 20.8 MH/s

(source http://cryptomining-blog.com/tag/gtx-titan-x-hashrate/)

But the Titan X costs $1000 and the GTX 970 a third of that...

So I get three GTX 970s for the price of one Titan X, and almost 3x the hashrate and income for the $1000 spent...

sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 11:36:17 AM
 #7409

With a $10 000 rig:

Only 188 MH/s @ quark

But with 30 GTX 970 cards (around $10 000 in total) you get

540 - 570 MH/s @ quark

tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 03:41:59 PM
 #7410

Quote from: sp_ on November 05, 2015, 11:03:43 AM (post #7407 above)
I can't get my 980 Ti to boost the memory clock at all...
tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 03:43:47 PM
 #7411


Quote from: cbuchner1 on November 05, 2015, 09:47:24 AM (post #7405 above)

Hi Christian, how have you been? Thank you for the info.
tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 03:46:14 PM
 #7412

Quote from: sp_ on November 05, 2015, 09:21:04 AM
And the altcoins are up, and the mining profit measured in BTC is up..

I have twice the hash power now and am making the same amount.
tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 04:02:26 PM
 #7413

The EVGA_PrecisionX_16 does support more than 3 cards.....
Genoil
Sr. Member

Activity: 438
Merit: 250


November 05, 2015, 04:12:22 PM
 #7414

Thanks :)
I am looking at the Ethereum miner now. I have some improvements.
Very curious what you come up with. I hope you can challenge me to look at the code once again; I kind of lost interest with the whole TLB thrashing thing going on on Windows.

In dagger.cuh:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;
   c.x = a.x * FNV_PRIME ^ b.x;
   c.y = a.y * FNV_PRIME ^ b.y;
   c.z = a.z * FNV_PRIME ^ b.z;
   c.w = a.w * FNV_PRIME ^ b.w;
   return c;
}


Since a.x * 2^24 == a.x << 24,

this can be rewritten to:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;
   c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
   c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
   c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
   c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
   return c;
}

The precalc buffer must hold 32-bit entries (256 * 4 bytes), with the values shifted left by 24 bits (kept in shared memory / L1 cache):

xx000000

Code:
__shared__ uint32_t sharedmemprecalc[256];  // 256 entries * 4 bytes
for (int i = 0; i < 256; i++)
{
   sharedmemprecalc[i] = (193 * i) << 24;  // Since the FNV_PRIME is a high number the 24 highest bits of the product are ignored. We only need to know the 8 low bits.
}
__syncthreads();  // make the table visible to every thread in the block before it is read





Since you only need to read 1 byte and not the whole 4 bytes (32 bits), you might be able to solve it with 1/4 of the memory reads...

__device__ uint4 fnv4(uchar4 a, uint4 b)
{
   uint4 c;
   c.x = sharedmemprecalc[a.x] ^ b.x;
   c.y = sharedmemprecalc[a.y] ^ b.y;
   c.z = sharedmemprecalc[a.z] ^ b.z;
   c.w = sharedmemprecalc[a.w] ^ b.w;
   return c;
}

But you might have to reorganize/scramble the memory, and read 32 bits linearly in one read to fill the uchar4.



I looked into this for a bit. I rewrote it like this:

block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Code:
__shared__ uint32_t sharedmemprecalc[256];

In compute_hash_shuffle:

sharedmemprecalc[threadIdx.x] = (147 * threadIdx.x) << 24; 
sharedmemprecalc[threadIdx.x + 128] = (147 * (threadIdx.x + 128)) << 24;
__syncthreads();

And this unmodified:

__device__ uint4 fnv4s(uint4 a, uint4 b)
{
   uint4 c;
   c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
   c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
   c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
   c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
   return c;
}

It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...


ETH: 0xeb9310b185455f863f526dab3d245809f6854b4d
BTC: 1Nu2fMCEBjmnLzqb8qUJpKgq5RoEWFhNcW
scryptr
Legendary

Activity: 1798
Merit: 1028



November 05, 2015, 04:19:23 PM
 #7415

Quote from: tbearhere on November 05, 2015, 04:02:26 PM
The EVGA_PrecisionX_16 does support more than 3 cards.....

MAXIMUM OF FOUR CARDS--

It only supports four at the most. I had to remove EVGA PrecisionX 16 from my rig when I added a fifth card; it complained on boot-up.       --scryptr

SCRYPTR'S NOTEBOOK: https://bitcointalk.org/index.php?topic=5035515.msg46035530#msg46035530
GITHUB: "github.com/scryptr"  MERIT is appreciated, also.  Thanks!
tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 04:26:43 PM
Last edit: November 05, 2015, 04:58:09 PM by tbearhere
 #7416

Quote from: scryptr on November 05, 2015, 04:19:23 PM (post #7415 above)
OK scryptr, what are you using in place of it? Thx
EDIT: It was complaining about the 4th card, though... that's what I have in there.

dominuspro
Full Member

Activity: 201
Merit: 100


November 05, 2015, 05:02:55 PM
 #7417

Quote from: tbearhere on November 05, 2015, 03:41:59 PM (post #7410 above)

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and boost the memory directly.
Maybe it can help you if you are using Windows. It works on my 970 and 980.
tbearhere
Legendary

Activity: 3304
Merit: 1003



November 05, 2015, 05:19:23 PM
 #7418

Quote from: dominuspro on November 05, 2015, 05:02:55 PM (post #7417 above)
Thx... I'll give it a try.
sp_ (OP)
Legendary

Activity: 2926
Merit: 1087

Team Black developer


November 05, 2015, 05:34:30 PM
 #7419

Quote from: Genoil on November 05, 2015, 04:12:22 PM
I looked into this for a bit. I rewrote it like this:
block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Then the precalc table would have to be 0x193 * 4 bytes big:

   c.x = sharedmemprecalc[a.x & 0x1ff] ^ b.x;
   c.y = sharedmemprecalc[a.y & 0x1ff] ^ b.y;
   c.z = sharedmemprecalc[a.z & 0x1ff] ^ b.z;
   c.w = sharedmemprecalc[a.w & 0x1ff] ^ b.w;

But there is a better way.

Quote from: Genoil on November 05, 2015, 04:12:22 PM
It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...

I haven't started on my mod yet, but it will come later. I can remove many instructions, but I cannot remove the memory latency...


bensam1231
Legendary

Activity: 1848
Merit: 1024


November 05, 2015, 05:55:25 PM
 #7420


Quote from: pallas on November 05, 2015, 10:43:35 AM (post #7406 above)

Keep in mind there are only a handful of algos that get a boost out of memory speed... Juicing your memory on a 970 adds something like an extra 20 W.

Quote from: tbearhere on November 05, 2015, 03:46:14 PM (post #7412 above)

Yeah, they're still playing with BTC... it will probably be a week or two before altcoins rebound, depending on how long they mess around with BTC. All hands are on deck for BTC right now.

I buy private Nvidia miners. Send information and/or inquiries to my PM box.

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!