Bitcoin Forum
Author Topic: collection for cgminer 7970 [Card has been sent! THANK YOU EVERYONE]  (Read 9742 times)
d3m0n1q_733rz
Sr. Member (Activity: 378, Merit: 250)
February 12, 2012, 12:16:26 PM
 #61

Working on the 7970 tuning, I have tried to port both the diapolo and diablo kernels to cgminer. Alas, neither of them is actually working yet, so instead I started modifying the existing poclbm kernel in cgminer to improve throughput. This should work on other GPUs as well as the 7970, but I have no idea if it's better or worse than phatk.

When it's released it will get a new date/version number, but I haven't changed the number right now so that people can download it now and give it a try:
https://raw.github.com/ckolivas/cgminer/kernels/poclbm120203.cl

Remember to delete any .bin files and if you're not on a 7970 with the latest cgminer, you'll have to tell it to use that kernel with -k poclbm.

So what's the damage? Well, on the 7970 at 1200/1050 clocks, which was getting 694 MHash, it's now getting 711 MHash. The 7970 has this unusual behaviour where the hashrate slowly rises for the first 5-10 minutes.
The problem, I believe, is that the GCN cards are built to take large vector counts and perform a single instruction on them all at once.  This is in contrast to the small vector count of VLIW which can perform large instructions quickly.  So, a straight-forward 16-vector miner with simple instructions should work better than a 4-vector miner with multiple instructions.  Granted, this is speculation, but from what I've seen of the hardware specs, it should hold true.

Funroll_Loops, the theoretically quicker breakfast cereal!
Check out http://www.facebook.com/JupiterICT for all of your computing needs.  If you need it, we can get it.  We have solutions for your computing conundrums.  BTC accepted!  12HWUSguWXRCQKfkPeJygVR1ex5wbg3hAq
-ck
Legendary (Activity: 4046, Merit: 1622)
Ruu \o/
February 13, 2012, 06:16:45 AM
 #62

New release of cgminer 2.2.5 with a fresh kernel and no more zero binary error. It finally copes with multiple different cards and works well with SDK 2.6.

7970 running at 1200/1050+5% is getting 714 Mhash with -I 11 and cgminer 2.2.5 defaults.

Developer/maintainer for cgminer, ckpool/ckproxy, and the -ck kernel
2% Fee Solo mining at solo.ckpool.org
-ck
-ck
February 13, 2012, 06:17:59 AM
 #63

The problem, I believe, is that the GCN cards are built to take large vector counts and perform a single instruction on them all at once.  This is in contrast to the small vector count of VLIW which can perform large instructions quickly.  So, a straight-forward 16-vector miner with simple instructions should work better than a 4-vector miner with multiple instructions.  Granted, this is speculation, but from what I've seen of the hardware specs, it should hold true.
I tried it, and a 16 vector simplest possible mining kernel had performance which was, unfortunately, appalling. GCN with SDK 2.6 (the only one it works with) really does not want any vectors at all.

d3m0n1q_733rz
February 13, 2012, 10:00:52 AM
 #64

The problem, I believe, is that the GCN cards are built to take large vector counts and perform a single instruction on them all at once.  This is in contrast to the small vector count of VLIW which can perform large instructions quickly.  So, a straight-forward 16-vector miner with simple instructions should work better than a 4-vector miner with multiple instructions.  Granted, this is speculation, but from what I've seen of the hardware specs, it should hold true.
I tried it, and a 16 vector simplest possible mining kernel had performance which was, unfortunately, appalling. GCN with SDK 2.6 (the only one it works with) really does not want any vectors at all.
Hmm, interesting.  Did you drop the worksize in your tests as well?

-ck
February 13, 2012, 10:28:47 AM
 #65

The problem, I believe, is that the GCN cards are built to take large vector counts and perform a single instruction on them all at once.  This is in contrast to the small vector count of VLIW which can perform large instructions quickly.  So, a straight-forward 16-vector miner with simple instructions should work better than a 4-vector miner with multiple instructions.  Granted, this is speculation, but from what I've seen of the hardware specs, it should hold true.
I tried it, and a 16 vector simplest possible mining kernel had performance which was, unfortunately, appalling. GCN with SDK 2.6 (the only one it works with) really does not want any vectors at all.
Hmm, interesting.  Did you drop the worksize in your tests as well?
Absolutely. I tried all sorts of combinations. Specifically the optimum, as always, is using the card's reported max_work_size and then dividing that by the vector size. It gave the least worst performance... but we're talking 20% of the performance of running no vectors.
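As a rough illustration of the rule described above (take the card's reported maximum work size and divide it by the vector size), here is a minimal Python sketch. The helper name worksize_for and the example maximum of 256 are assumptions for illustration only; this is not cgminer's actual code.

```python
def worksize_for(max_work_size: int, vector_width: int) -> int:
    """Return the worksize to request for a given vector width.

    Hypothetical helper: divides the device's reported maximum work size
    by the number of vector lanes each work item processes.
    """
    if vector_width not in (1, 2, 4, 8, 16):
        raise ValueError("vector_width must be 1, 2, 4, 8 or 16")
    if max_work_size < vector_width:
        raise ValueError("max_work_size must be at least vector_width")
    return max_work_size // vector_width


# Assumed maximum work size of 256, purely as an example value.
for width in (1, 4, 16):
    print(width, worksize_for(256, width))
```

With a width of 1 (no vectors) the full reported work size is used, while wider vectors shrink the worksize in proportion.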

d3m0n1q_733rz
February 14, 2012, 01:54:41 PM
 #66

The problem, I believe, is that the GCN cards are built to take large vector counts and perform a single instruction on them all at once.  This is in contrast to the small vector count of VLIW which can perform large instructions quickly.  So, a straight-forward 16-vector miner with simple instructions should work better than a 4-vector miner with multiple instructions.  Granted, this is speculation, but from what I've seen of the hardware specs, it should hold true.
I tried it, and a 16 vector simplest possible mining kernel had performance which was, unfortunately, appalling. GCN with SDK 2.6 (the only one it works with) really does not want any vectors at all.
Hmm, interesting.  Did you drop the worksize in your tests as well?
Absolutely. I tried all sorts of combinations. Specifically the optimum, as always, is using the card's reported max_work_size and then dividing that by the vector size. It gave the least worst performance... but we're talking 20% of the performance of running no vectors.
Weird.  I would have thought it would do better considering the 16-vector ALUs.  VLIW actually showed the best output for me so long as I used VECTORS8 and GOFFSET=false, as I'm using an HD5450.  But that comes with new architectures, I suppose.  I wish I could find more literature on programming for GCN, but it's so new that I can't find much.  Combine that with not being able to test the code, and I might as well stick with modifying Phatk2 (which is taking a while considering life's little disruptions lately).  I'll save up for a 7970 and try to figure out what I can do about GCN code.  I'm thinking about having the code alternate between two variables in case it has any read/write timing conflicts.  On the other hand, it probably won't do much more than add another GPR and a few more ALUs.  Considering its instruction execution timing, it probably won't matter.  Too many theories to be pondering at the same time.  I'll rest on it and see what I can come up with.
If you have any documentation, I would really appreciate it.

-ck
February 15, 2012, 10:49:32 PM
 #67

cgminer 2.2.6 out. 2+ more mhash on 7970. It's getting harder and harder to extract much more :P

1200/1050+5% clocks, intensity 11 - 717 Mhash.

d3m0n1q_733rz
February 16, 2012, 10:17:42 AM
 #68

cgminer 2.2.6 out. 2+ more mhash on 7970. It's getting harder and harder to extract much more :P

1200/1050+5% clocks, intensity 11 - 717 Mhash.

I'm going to have to look at what methods you're using.  I'm curious as to how the programming differs between VLIW and GCN.

-ck
February 16, 2012, 10:19:27 AM
 #69

cgminer 2.2.6 out. 2+ more mhash on 7970. It's getting harder and harder to extract much more :P

1200/1050+5% clocks, intensity 11 - 717 Mhash.

I'm going to have to look at what methods you're using.  I'm curious as to how the programming differs between VLIW and GCN.
Direct link to the kernel in the git tree:
https://github.com/ckolivas/cgminer/blob/master/poclbm120214.cl

It's using a worksize of 256, 1 vector (i.e. no vectors) and intensity 11.
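For anyone wanting to reproduce those settings by hand, here is a small Python sketch that assembles a matching cgminer command line. The flag names (-o, -u, -p, -k, -w, -v, -I) are as I recall them from the cgminer README, so check cgminer --help on your build; the pool URL, user and password are placeholders, and cgminer_command is a hypothetical helper, not part of cgminer.

```python
import shlex


def cgminer_command(pool_url, user, password,
                    kernel="poclbm", worksize=256, vectors=1, intensity=11):
    """Build an argv list for cgminer with explicit GPU tuning options.

    Defaults mirror the settings quoted in the post: worksize 256,
    1 vector (i.e. no vectors) and intensity 11.
    """
    return [
        "cgminer",
        "-o", pool_url,        # pool URL (placeholder value below)
        "-u", user,            # pool username (placeholder)
        "-p", password,        # pool password (placeholder)
        "-k", kernel,          # kernel selection, e.g. poclbm
        "-w", str(worksize),   # worksize
        "-v", str(vectors),    # vector width, 1 means no vectors
        "-I", str(intensity),  # intensity
    ]


argv = cgminer_command("http://pool.example.com:8332", "worker", "secret")
print(shlex.join(argv))
```

Building the arguments as a list rather than one string avoids shell quoting problems if the list is later passed to subprocess.run.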

-ck
February 20, 2012, 05:27:03 AM
Last edit: February 20, 2012, 06:18:41 AM by ckolivas
 #70

Cgminer 2.2.7 out. This version fixes the bug with 12.2 ATI drivers. Reworked to use -w 64 by default on Tahiti which is worth just under 1 more MHash. 718.5 MHash now at 1200/1050+5% clocks intensity 11.

-ck
March 31, 2012, 05:20:08 AM
Last edit: March 31, 2012, 06:09:00 AM by ckolivas
 #71

cgminer 2.3.2 is out with a new poclbm kernel that I've been bashing with a mallet for a week to try to extract some more out of it. I hit my target, which is 720 MHash at 1200/1050+5% clocks, intensity 11, with 12.3 AMD drivers. :D
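To put the numbers from this thread in perspective, here is a quick Python sketch of the relative gains. The hashrates are the ones quoted in the posts above; the labels and the percent_gain helper are just for illustration. Note that the first two readings were at 1200/1050 and the later ones at 1200/1050+5%, so the comparison is only approximate.

```python
# Hashrates (MHash) quoted in the thread, in chronological order.
readings = [
    ("before kernel changes", 694.0),
    ("modified poclbm kernel", 711.0),
    ("cgminer 2.2.5", 714.0),
    ("cgminer 2.2.6", 717.0),
    ("cgminer 2.2.7", 718.5),
    ("cgminer 2.3.2", 720.0),
]


def percent_gain(baseline: float, value: float) -> float:
    """Relative improvement of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


baseline = readings[0][1]
for label, mhash in readings[1:]:
    print(f"{label}: {mhash} MHash, {percent_gain(baseline, mhash):.2f}% over baseline")
```

The total comes to a little under 4% over the starting point, with most of it arriving in the first kernel rewrite.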

jjiimm_64 (OP)
Legendary (Activity: 1876, Merit: 1000)
March 31, 2012, 12:51:00 PM
 #72

you da man....


I am stuck on Windows until I can get linuxcoin to cooperate.

At 1125/1000, I 11: 672 Mh. I am happy with that. I have not tried to clock it up; I think I will try that now.

1jimbitm6hAKTjKX4qurCNQubbnk2YsFw