Bitcoin Forum
June 29, 2024, 02:36:46 PM *
Author Topic: [ANN] cudaMiner & ccMiner CUDA based mining applications [Windows/Linux/MacOSX]  (Read 3426880 times)
Bearclaw
Newbie
*
Offline Offline

Activity: 52
Merit: 0


View Profile
April 09, 2014, 01:54:01 PM
 #10721


I should try djm34's version because I don't get that sort of performance on scrypt with my GTX 780 Ti :)
For scrypt-jane, try autotune first.

;D djm... it was an EVGA 780Ti Superclocked with Skynet BIOS.  OC was +135 (total 1180.6).

The autotune for YAC works on my 780, so I will try the 750s when I get home tonight, as I have to leave for work in a few minutes.

ltcnim
Legendary
*
Offline Offline

Activity: 914
Merit: 1001



View Profile
April 09, 2014, 02:01:14 PM
Last edit: April 09, 2014, 02:11:21 PM by ltcnim
 #10722

Hmm, Linux compilation of cudaminer is borked.

cpu-miner.c won't build, most likely because Alexey used C++-isms. That works on Windows because I compile this module with the /TP flag to trick the compiler into allowing inline declarations of variables (and other things) that require C99 support, which Visual Studio 2010 lacks.

EDIT: it's fixed!

Christian


I just compiled it under Ubuntu 13.10 Server, but something seems to be very wrong. It looks like only one card out of my 5x750Ti rig is used. Even with -d 0,1,2,3,4 only one card seems to be working (I'm guessing that from the cards' temperatures) when using scrypt-n.



Edit: switched back to tagged release 2014-2-28 and everything is back to normal.

cbuchner1 (OP)
Hero Member
*****
Offline Offline

Activity: 756
Merit: 502


View Profile
April 09, 2014, 02:11:27 PM
 #10723

I just compiled it under Ubuntu 13.10 Server, but something seems to be very wrong. It looks like only one card out of my 5x750Ti rig is used. Even with -d 0,1,2,3,4 only one card seems to be working (I'm guessing that from the cards' temperatures).

Well, he made substantial changes in unfamiliar code with only a few days to spare, so it's expected that he broke a couple of things ;)  Don't worry, I'll clean up.

Christian
ltcnim
Legendary
*
Offline Offline

Activity: 914
Merit: 1001



View Profile
April 09, 2014, 02:13:34 PM
 #10724

Just drop a line here when we can test it again; I'll happily help with testing as much as I can.

bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
April 09, 2014, 03:02:11 PM
 #10725



how are you getting the reported hashrate?
one thing i have wanted is to be able to read the last reported hashrate for each cudaminer instance

Owner of: cudamining.co.uk
ltcnim
Legendary
*
Offline Offline

Activity: 914
Merit: 1001



View Profile
April 09, 2014, 03:10:42 PM
 #10726

parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.
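
For anyone wanting to try this, here is a rough sketch of the idea, not the script I actually run. It assumes the miner prints lines shaped like "GPU #0: GeForce GTX 750 Ti, 265.43 khash/s" (check your own output and adjust the regex) and that the screen session name you pass in is one you chose yourself. The function names are made up for illustration.

```python
import re
import subprocess
import tempfile
import time
from pathlib import Path

# Assumed log line shape: "GPU #0: GeForce GTX 750 Ti, 265.43 khash/s".
HASHRATE_RE = re.compile(r"GPU #(\d+):.*?([\d.]+) khash/s")


def last_hashrates(text):
    """Return {gpu_index: last reported khash/s} found in the text."""
    rates = {}
    for match in HASHRATE_RE.finditer(text):
        # Later lines overwrite earlier ones, so the newest value wins.
        rates[int(match.group(1))] = float(match.group(2))
    return rates


def read_screen(session):
    """Dump the visible window of a named, detached screen session."""
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "hardcopy.txt"
        subprocess.run(
            ["screen", "-S", session, "-X", "hardcopy", str(dump)],
            check=True,
        )
        # screen handles -X commands asynchronously, so wait briefly.
        for _ in range(20):
            if dump.exists():
                break
            time.sleep(0.05)
        return dump.read_text(errors="replace")
```

Usage would be something like last_hashrates(read_screen("miner0")), where "miner0" is whatever name you gave the session with screen -S.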

bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
April 09, 2014, 03:55:07 PM
 #10727

parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

Owner of: cudamining.co.uk
ltcnim
Legendary
*
Offline Offline

Activity: 914
Merit: 1001



View Profile
April 09, 2014, 04:15:05 PM
 #10728

parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
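
A minimal sketch of what I mean, in Python since it runs the same on Windows and Linux. The function name is made up, and the command line you pass in is whatever you already use to start the miner. I merge stderr into stdout here on the assumption that the miner may log to stderr.

```python
import subprocess


def stream_lines(command):
    """Start a process and yield its output lines as they arrive."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: log lines may go to stderr
        text=True,
        bufsize=1,  # line buffered on our side of the pipe
    )
    with proc.stdout:
        for line in proc.stdout:
            yield line.rstrip("\n")
    proc.wait()
```

You would loop over stream_lines([...your miner command and arguments...]) and pick out the lines that contain the hashrate. One instance per loop; run several loops in threads if you start several miners.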

ManIkWeet
Full Member
***
Offline Offline

Activity: 182
Merit: 100


View Profile
April 09, 2014, 04:56:24 PM
 #10729

parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.
Can even be done in Java

BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
NuggetFlipper
Newbie
*
Offline Offline

Activity: 4
Merit: 0


View Profile
April 09, 2014, 05:24:34 PM
 #10730

I have a GT 650M, using cudaminer from Feb 9.
Whenever I set clocks below 800 MHz, it mines at 65 C with no problem, at 65k/h.
When I increase the clocks even a little bit, the temps slowly rise to 90-100 C (wtf?) and the mining rate doesn't even change significantly.
Is anyone else having this issue?

I would also appreciate it if cudaminer had backup pool support...
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
April 09, 2014, 05:26:52 PM
Last edit: April 09, 2014, 06:42:52 PM by bigjme
 #10731

parsing the cudaminer/ccminer output from a detached screen. since you can name a screen instance, you can easily get the output from each detached screen instance.

straight over my head :p
im stuck in windows so i am pretty limited. hmm i wonder if i can go through the cudaminer code and see where it reports hashrate and edit the outputs

under windows, you could write a little app which starts the cudaminer instances and then reads from stdout. should be easy.

I will have to have a look. Writing websites I can do; writing and manipulating batch files, I'm not so good at.
OK, so I have been able to write code that reads the last reported hashrate from a file. The only problem is that after a while the file will get large, so I would need the file to be overwritten with every output line cudaminer gives. That part I cannot figure out.
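
One possible way around the growing file, as a sketch only: instead of redirecting the miner's output to a log, have a small wrapper keep just the newest line in the file. The helper name and file name here are made up, and this assumes something else is already reading the miner's output line by line.

```python
import os
import tempfile


def write_latest(path, line):
    """Replace the file's contents with a single line."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same folder, then swap it into place,
    # so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(line + "\n")
    os.replace(tmp_path, path)
```

Call write_latest("hashrate.txt", line) for each hashrate line and the file stays one line long. On Windows the swap can fail if another program has the file open at that moment, so the reader should open, read and close quickly.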

soooo much easier if someone was able to figure out how to add in a curl to cudaminer :-(

Owner of: cudamining.co.uk
scriptfu
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
April 09, 2014, 07:00:33 PM
Last edit: April 09, 2014, 08:36:34 PM by scriptfu
 #10732

I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched). I based block size on a multiple of SMs per card (e.g. 110 * 5 SMs on 750ti == 550).
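
To make the block size arithmetic concrete, here is a small sketch. The helper name and the multiplier range are just for illustration and are not the exact set I tested; the SM count of 5 for the 750ti and the 768 threads per block are the values from this post.

```python
def block_counts(sm_count, multipliers):
    """Return grid sizes that are whole multiples of the SM count."""
    return [m * sm_count for m in multipliers]


SM_COUNT_750TI = 5                # 5 SMs on a 750ti, as stated above
MULTIPLIERS = range(100, 121, 5)  # hypothetical sweep, not the tested set

for blocks in block_counts(SM_COUNT_750TI, MULTIPLIERS):
    print(f"{blocks} blocks x 768 threads")
```

With a multiplier of 110 this gives the 550-block config.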

Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):

Code:
‡ is default launch config.
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    ||   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  |   768   |       13987       |        17        |       16        |        1        |       94         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    ||  -133  |    -    |       +2794       |       +15        |      +12        |       +3        |       -7         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
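
If you want to check the diff row and the success column yourself, this sketch recomputes them from the two rows above. The variable names are only for illustration. The percentages in the table appear to be truncated rather than rounded (28/32 is 87.5%), so the sketch uses floor.

```python
from math import floor


def success_percent(accepted, attempted):
    """Share success rate as a whole percentage, truncated."""
    return floor(100 * accepted / attempted)


best = {"blocks": 550, "khash": 16781, "attempted": 32, "accepted": 28}
default = {"blocks": 683, "khash": 13987, "attempted": 17, "accepted": 16}

# Per-column difference, best minus default (the "diff" row).
diff = {key: best[key] - default[key] for key in best}

# Relative hashrate gain of the best config over the default, in percent.
gain = 100 * diff["khash"] / default["khash"]

print(diff)
print(success_percent(28, 32), success_percent(16, 17))
print(f"{gain:.1f}% faster than default")
```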

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking for both host and device objects)
  • Removed maxrregcount to let the compiler choose the register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

edits:
  • added default block size baseline for comparison
  • clarified block size calculation
  • added ± diff comparison
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
April 09, 2014, 07:04:59 PM
 #10733

Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Owner of: cudamining.co.uk
scriptfu
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
April 09, 2014, 07:10:07 PM
 #10734

Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
bigjme
Sr. Member
****
Offline Offline

Activity: 350
Merit: 250


View Profile
April 09, 2014, 07:11:18 PM
Last edit: April 09, 2014, 07:24:37 PM by bigjme
 #10735

How on earth did you manage that? We haven't been able to get over 13 MH/s.

Christian, I found an example of how to implement RPC in a C++ program. I will try to get it working, but I don't have a clue what I am doing 8)

Owner of: cudamining.co.uk
ManIkWeet
Full Member
***
Offline Offline

Activity: 182
Merit: 100


View Profile
April 09, 2014, 07:35:00 PM
 #10736

Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?

BTC donations: 18fw6ZjYkN7xNxfVWbsRmBvD6jBAChRQVn (thanks!)
djm34
Legendary
*
Offline Offline

Activity: 1400
Merit: 1050


View Profile WWW
April 09, 2014, 07:43:52 PM
 #10737

I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak as the majority of the runtime is spent on it according to nvprof (due to stream synchronization after hefty and sha256 kernels are launched).

Each launch config was tested 5 times over 5 minute intervals (25 minute total sample) at the hvc.1gh.com pool, and results were averaged. Note that I did see CPU validation failures; however, both the average hashrate and accepted shares outweighed them, confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking for both host and device objects)
  • Removed maxrregcount to let the compiler choose the register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

Can you add to your table what changed between each line?
Did you try on Windows?
I must say I am a bit surprised by the 23 MHash/s. You should run a little longer to make sure everything is stable.

djm34 facebook page
BTC: 1NENYmxwZGHsKFmyjTc5WferTn5VTFb7Ze
Pledge for neoscrypt ccminer to that address: 16UoC4DmTz2pvhFvcfTQrzkPTrXkWijzXw
scriptfu
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
April 09, 2014, 07:54:44 PM
 #10738

How on earth did you manage that? We havent been able to get over 13Mh/s

Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?

Could you clarify what you mean by failed hashes? If you're referring to ones that didn't pass CPU validation, yes they are included in the hashrate average, but they are not included in the share metrics (I care more about these, as these are the canonical numbers by which one gets credited for work).
scriptfu
Newbie
*
Offline Offline

Activity: 19
Merit: 0


View Profile
April 09, 2014, 08:03:11 PM
Last edit: April 09, 2014, 08:23:36 PM by scriptfu
 #10739

Can you add to your table what changed between each line?

Sure! I have updated my post accordingly.

Did you try on Windows?

I haven't, because I do not have a Windows rig, and I likely will not test this because I do not want to reimage or deal with Windows taking over my boot record :) See my diff in a previous post above for the changes I made. If you are able to compile this, I'd be very curious to see the results.

I must say I am a bit surprised by the 23 MHash/s. You should run a little longer to make sure everything is stable.

Configurations with the highest hashrates were stable in the sense that the program would not crash, but they were not stable enough to provide valid shares. For instance, 384 blocks x 768 threads @ 23213 khash/s attempted 27 shares, but only 16 were valid (about 59%, versus 28 of 32, or 87%, for the 550x768 config).
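
To show why I rank configs by accepted shares and not by raw hashrate, here is a small sketch using only the three configs whose numbers appear in this thread. The sort key is my own choice for illustration (accepted shares first, success rate as a tie-breaker), not something the miner computes.

```python
configs = [
    # (blocks, threads, avg khash/s, shares attempted, shares accepted)
    (550, 768, 16781, 32, 28),
    (683, 768, 13987, 17, 16),  # default launch config
    (384, 768, 23213, 27, 16),
]


def by_valid_work(config):
    """Sort key: accepted shares, then share success rate."""
    blocks, threads, khash, attempted, accepted = config
    return (accepted, accepted / attempted)


fastest = max(configs, key=lambda config: config[2])
ranked = sorted(configs, key=by_valid_work, reverse=True)

print("highest raw hashrate:", fastest[0], "blocks")
for blocks, threads, khash, attempted, accepted in ranked:
    print(f"{blocks}x{threads}: {accepted}/{attempted} accepted at {khash} khash/s")
```

The 384-block config wins on raw hashrate but comes last on this ordering, which is the point I was trying to make. The share counts are small, so treat the ordering as a hint and not as proof.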
zelante
Full Member
***
Offline Offline

Activity: 263
Merit: 100



View Profile
April 09, 2014, 08:40:23 PM
 #10740


 Grin