Bitcoin Forum
Author Topic: Looking for FPGA cgminer testers.  (Read 3790 times)
P_Shep (OP)
June 18, 2012, 07:16:07 PM
Last edit: June 18, 2012, 09:19:17 PM by P_Shep
 #1

I've been working on many MANY improvements for BFL support in cgminer.
I think I'm about ready to submit a pull request into cgminer, but I still need to test how it operates with other FPGAs.
Essentially it should have no effect on Icarus, ztex and ModMiner, but since I don't have any of those, I can't test them, so I'm hoping some kind volunteers could compile my fork here and run it for a while:
https://github.com/pshep/cgminer

The most significant change for FPGAs is the inclusion of 'SICK' and 'DEAD' processing, which was previously reserved only for OpenCL devices. For Icarus, Ztex and ModMiner, this should tell you if they are sick or dead (for BFL it'll attempt to re-init the device).
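
Roughly, the idea looks like this. It is only a Python sketch, not the actual C code in cgminer: the `Health` enum and `classify` function are made-up names, and the thresholds are the 60s / 10 min values listed for BFL in the change list in this post.

```python
from enum import Enum

SICK_AFTER_S = 60        # no response for 60s -> sick
DEAD_AFTER_S = 10 * 60   # no response for 10 minutes -> dead


class Health(Enum):
    WELL = "well"
    SICK = "sick"
    DEAD = "dead"


def classify(last_response_s: float, now_s: float) -> Health:
    """Classify a device by how long ago it last responded."""
    silent_for = now_s - last_response_s
    if silent_for >= DEAD_AFTER_S:
        return Health.DEAD
    if silent_for >= SICK_AFTER_S:
        return Health.SICK
    return Health.WELL
```

A watchdog thread would call something like `classify` for each device on every pass and try a re-init when the result is not `WELL`.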

For BFL devices, my changes do the following:
- Timeout to restart work if it's taking too long
   A nonce range should take just over 5s. Any longer and the device is throttling.
   Fixes issue where BFL appears to stall in cgminer
- Count throttling as a zero hash error
- Temp taken in watchdog thread
  Now a disabled device will still return a current temperature, rather than the last value before disabling.
- Work restart on new work
  This was causing a very high stale rate for me...
  Previously on a work restart, cgminer would allow the BFL to continue with the stale block. Now this is checked, and while any nonces found in that time will be wasted, the work is discarded and new work is started immediately rather than after the 5s the BFL takes to return results.
- Timing adjustments
   The 'wait for results' was hard-set to 4500ms before polling at 10ms intervals. With the variation between systems and new firmwares of differing hash rates, a hard-set timer could be either inefficient (starts polling way before necessary) or wasteful (starts polling way after the result is ready). The auto-adjustment will find the correct wait time to minimize both polling and the delay in retrieving results.
- Device re-initialization
   When a device is disabled (for whatever reason - user or by cgminer) then re-enabled, the device is re-initialized, rather than assuming communications are still working.
- Sick / Dead monitoring
  As with OpenCL devices, BFL devices will be checked for sickness (60s no response) or dead (10 mins no response) and try to re-initialize them.
- Improved logging
   Most logs now include the device in question, i.e.: "BFL0: took longer than 15s"
- Device start offset
  Delays the start of each device by a random time between 0 and 100ms so that they don't all make calls at exactly the same time.
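
To give a feel for the timing adjustment in the list above, here is a rough Python sketch. It is not the code in the fork: `adjust_wait`, `step_ms` and `target_polls` are names and values invented for illustration. Only the 4500ms starting point and the 10ms poll interval come from the description.

```python
POLL_INTERVAL_MS = 10    # polling interval from the description
INITIAL_WAIT_MS = 4500   # the old hard-set value, used as a starting point


def adjust_wait(wait_ms: int, polls_needed: int,
                step_ms: int = 10, target_polls: int = 2) -> int:
    """Return the wait to use before polling for the next work item.

    polls_needed is how many polls it took before the result was ready
    (1 means the result was already waiting on the first poll).
    """
    if polls_needed > target_polls:
        # Started polling too early: wait longer next time.
        return wait_ms + (polls_needed - target_polls) * POLL_INTERVAL_MS
    if polls_needed <= 1:
        # Result was already there, so we may be waiting too long.
        return max(0, wait_ms - step_ms)
    return wait_ms
```

Starting from `INITIAL_WAIT_MS` and feeding in the poll count after each result, the wait settles at the point where the result shows up within the first couple of polls.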

If you can help, I'll be much appreciative.

Thanks :)


rudrigorc2
June 18, 2012, 11:55:00 PM
 #2

if you compile for the tplink mips toy I can test =)
P_Shep (OP)
June 19, 2012, 12:02:50 AM
 #3

I tried actually, just to see if my compiler would!

It wouldn't.

Actually I think the compiler would, but the libraries are all wrong for your kind of processor, so it got nowhere.
nedbert9
June 19, 2012, 12:07:06 AM
 #4




Really appreciated, P_Shep
nedbert9
June 19, 2012, 12:08:55 AM
 #5




I guess there's no hope for scan-serial to work for BFL's in Windows, eh?
P_Shep (OP)
June 19, 2012, 12:32:35 AM
 #6

I guess there's no hope for scan-serial to work for BFL's in Windows, eh?

Well I don't know... an ugly way might be just to try and open each port in turn. But then where do you stop... 8? 16? 100? That might take a while...
rjk
June 19, 2012, 12:35:45 AM
 #7

I guess there's no hope for scan-serial to work for BFL's in Windows, eh?

Well I don't know... an ugly way might be just to try and open each port in turn. But then where do you stop... 8? 16? 100? That might take a while...
Can they be done in parallel? Ufasoft is able to do it, somehow.

Mining Rig Extraordinaire - the Trenton BPX6806 18-slot PCIe backplane [PICS] Dead project is dead, all hail the coming of the mighty ASIC!
Phraust
June 19, 2012, 06:01:38 AM
 #8

I tried it out on OSX, seems to work pretty well.

With the stock cgminer, I've been noticing a decline in hashrate over the course of about 4 hours (from 8.5gh down to 7.8), so I've been restarting it when I notice it dropping.  I've been running your version for the last five hours, and it's still up at 8.5gh.  Thanks a ton, I'll keep you updated.

I should add that after about 4 hours, one or more of the BFLs will drop below a U of 10 (normally around 8, sometimes down to 6).  This behavior started after I moved my rig back, so it might be how I laid everything out; I was thinking it was probably noise across all the USB cables.  Right now, they are all at or above 11.8, with one at 11.3, much much better.
P_Shep (OP)
June 19, 2012, 03:46:48 PM
 #9

I guess there's no hope for scan-serial to work for BFL's in Windows, eh?

Well I don't know... an ugly way might be just to try and open each port in turn. But then where do you stop... 8? 16? 100? That might take a while...
Can they be done in parallel? Ufasoft is able to do it, somehow.

Of course.  Separate pool of "test" threads (say 10-20) could signal the main "scan" thread when they are done so that new port can be assigned to be scanned.  The upper limit can be read from the OS.  The "scan" thread would assign "untested" ports to worker "test" threads.

The BFL I/O code should be non-blocking, overlapped IO so that scanning can be stopped if needed.

When/if this is accepted, I'll look into it. I also want to have it scan ports during operation so you can yank out and replace/add devices while it's running.
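
For what it's worth, the pool-of-test-threads idea could look something like this in Python. Purely a sketch: `scan_ports`, `probe` and `fake_probe` are made-up names, and a real probe would have to do the serial I/O with a short timeout.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def scan_ports(port_names: List[str],
               probe: Callable[[str], bool],
               max_workers: int = 16) -> List[str]:
    """Probe every port on a small pool of worker threads.

    probe(name) is expected to open the port, ask the device to identify
    itself, and return True if a BFL answered.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as port_names.
        results = list(pool.map(probe, port_names))
    return [name for name, found in zip(port_names, results) if found]


def fake_probe(name: str) -> bool:
    # Stand-in for real serial I/O: pretend devices sit on COM3 and COM7.
    return name in {"COM3", "COM7"}


ports = [f"COM{i}" for i in range(1, 33)]
found = scan_ports(ports, fake_probe)
```

With 16 workers and a probe timeout of a second or so, even 100 ports would only take a few seconds instead of minutes.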
nedbert9
June 19, 2012, 04:39:42 PM
 #10

Not sure if this helps, but the COM identifiers reserved for use are recorded here.  The key keeps a record of every identifier that has ever been assigned - offline devices, too.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\COM Name Arbiter

The problem with this method is that unplugging and replugging devices without a reboot results in incrementing COM IDs.
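
If I remember right, that key holds a binary value called ComDB, a bitmap with one bit per COM number (bit 0 of byte 0 being COM1). Assuming that layout is correct, decoding it is simple. A Python sketch: `reserved_com_ports` is my own name, and actually reading the value out of the registry (e.g. with `winreg`) is left out so this runs anywhere.

```python
from typing import List


def reserved_com_ports(comdb: bytes) -> List[str]:
    """Decode a ComDB-style bitmap into COM port names.

    Assumes bit n (counting from the least significant bit of byte 0)
    being set means COM(n+1) is reserved.
    """
    ports = []
    for byte_index, byte in enumerate(comdb):
        for bit in range(8):
            if byte & (1 << bit):
                ports.append(f"COM{byte_index * 8 + bit + 1}")
    return ports


# Example bitmap: COM1, COM3 and COM9 marked as reserved.
example = bytes([0b00000101, 0b00000001])
print(reserved_com_ports(example))  # prints ['COM1', 'COM3', 'COM9']
```

That would at least give scan-serial an upper bound and a shortlist, though it would still include ports whose devices are no longer plugged in.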

Phraust
June 20, 2012, 01:44:14 AM
 #11

Just wanted to update, it's been 24 hours and it's working like a champ, a solid 8.5gh with no issues.
P_Shep (OP)
June 20, 2012, 03:44:35 AM
 #12

Great :)

Just need someone who has an FPGA other than BFL to test it...
kano
June 23, 2012, 03:47:21 PM
 #13

Oh - you have a thread.
You didn't mention that :)

Yeah been running for a while - but it's 1:45am
I'll leave it running overnight anyway.

If you show up shortly in #cgminer I'll give you the link to see my rig (or in the morning when I wake up)
But yeah it's mining and showing the same av MH/s mine does.

Pool: https://kano.is - low 0.5% fee PPLNS 3 Days - Most reliable Solo with ONLY 0.5% fee   Bitcointalk thread: Forum
Discord support invite at https://kano.is/ Majority developer of the ckpool code - k for kano
The ONLY active original developer of cgminer. Original master git: https://github.com/kanoi/cgminer
fred0
June 25, 2012, 03:26:16 PM
 #14

Some numbers from testing BFL rev2 x16 running 800Mh/s Firmware

I disrupted the results by disconnecting the power on one unit mistakenly and did not notice, so results might be a teeny bit better for the pshep changes to cgminer.

Running under ubuntu 12.04 64-bit

CGMiner  Reject  Accepted  Util    MH/s   Getwork  Remote  Local   Discard  Found  HW Err  Network  Uptime hh:mm:ss  Rej %  Eff
std      8249    700825    174.83  12514  92038    8       700853  9102     0      0       448      66:48:43         1.16%  761%
pshep    588     367158    175.68  12575  39888    102     353390  3685     0      0       213      34:49:58         0.16%  920%



P_Shep (OP)
June 25, 2012, 04:20:27 PM
 #15

Thanks for that, Kano. I'm just about ready to submit a pull request now, just sorting out one more thing...

Looks good there, fred :)
kano
June 25, 2012, 05:02:34 PM
 #16

Not sure if you are doing this, so:
As I have mentioned to luke-jr, I'll mention it to you also :)
The BFL abort should only be done if --no-submit-stale is enabled and the getwork said to not submit stale.
(i.e. you need to somehow check those two before aborting the work)

Reasons:
1) If you abort work on a pool that allows stale shares, then when you abort you may be throwing away shares (since BFL doesn't tell you what shares you have worked out already when you abort the work) - so on such a pool (or a getwork that says to submit stale) you'd never abort the work
2) On p2pool only ~1 in every ~60 LPs represent a real BTC LP - for all other LP's, if the stale work is a full difficulty block, it is a valid payable BTC block - and p2pool will send it to the bitcoind ... and throwing away blocks is bad :)

Of course no one in their right mind would mine on p2pool with a BFL since either you throw away blocks or you throw away shares - you must do one or the other with a BFL on p2pool.

P_Shep (OP)
June 25, 2012, 05:48:19 PM
 #17

Well, that's the thing, either way the work is lost, no? It's a matter of minimizing work lost. We can submit shares which may/may not be accepted, or we can restart work which we know will be accepted. As you say, this is only really a problem on P2Pool, which is a problem anyway, so what's lost?
P_Shep (OP)
June 25, 2012, 07:12:20 PM
 #18

Actually it'd be interesting to get real performance data from someone with BFLs mining on P2Pool with the existing 2.4.3 and with my version. I'm quite curious :)
kano
June 25, 2012, 11:58:04 PM
 #19

...
Well, that's the thing, either way the work is lost, no? It's a matter of minimizing work lost. We can submit shares which may/may not be accepted, or we can restart work which we know will be accepted. As you say, this is only really a problem on P2Pool, which is a problem anyway, so what's lost?
Well, no it's not an actual problem as such.
How the code must work is quite straightforward - as I said:
If --no-submit-stale is set and the getwork didn't say to submit stale, then yes abort.
i.e. the choice is the user's with using "--no-submit-stale" or the pool's by saying to submit stale in the getwork
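
In other words, something like this (Python just to show the logic; the function and parameter names are made up and this is not the cgminer code):

```python
def should_abort_stale_work(no_submit_stale: bool,
                            pool_wants_stale: bool) -> bool:
    """Decide whether to abort in-flight BFL work on a work restart.

    no_submit_stale  -- the user passed --no-submit-stale
    pool_wants_stale -- the getwork said to submit stale shares
    """
    # Abort only when neither the user nor the pool wants stale shares.
    return no_submit_stale and not pool_wants_stale
```

In every other case the device is left to finish its range, and whatever it returns gets submitted.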

P_Shep (OP)
June 26, 2012, 12:28:08 AM
 #20

It still does submit stales though. My logs still show plenty being submitted, just not the 2-3% I was getting. Now it's 0.4%.
I see your point, stick to what the command has set... but if you do that, one way or the other you lose performance.

Disable stales:
No stales are submitted potentially losing valid shares
Re-start work quickly and get on with new block

Enable stales:
Potentially gain some shares through submitting stales
Potentially lose up to 5s of work on each device

It's marginal either way, but why not take the best from both worlds and maximise the performance as far as we can, at the cost of not strictly doing what the user has asked? How many users do you think give a crap about whether stales are submitted or not over getting the best performance? My U has gone up about 0.5 from 69.2 to 69.7. Fred's has gone up 0.8.

If BFL worked the way every other device worked and returned a result as soon as it found it, of course we'd do it your way. We've got to handle what we're dealt the best we can, eh?
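
A back-of-envelope look at what's at stake, as a Python sketch. The assumptions are mine, not measurements: a nonce range takes about 5.1s, a new block arrives on average every 600s at a random point in the range, and in the no-restart case the pool rejects every share from the stale range.

```python
RANGE_S = 5.1            # assumed time for one BFL nonce range
BLOCK_INTERVAL_S = 600   # assumed average time between new blocks


def loss_with_restart(range_s: float = RANGE_S,
                      block_s: float = BLOCK_INTERVAL_S) -> float:
    """Fraction of hashing lost when work is aborted on a new block.

    On average the block change lands halfway through the range, so half
    a range of already-done work is thrown away.
    """
    return (range_s / 2) / block_s


def loss_without_restart(range_s: float = RANGE_S,
                         block_s: float = BLOCK_INTERVAL_S) -> float:
    """Fraction lost when the device finishes the stale range and the
    pool rejects every share from it (worst case)."""
    return range_s / block_s


print(f"with restart:    {loss_with_restart():.2%}")
print(f"without restart: {loss_without_restart():.2%}")
```

Under those assumptions both numbers are well under 1%, and the restart case loses half as much as the worst case without it. If the pool does accept stales, the no-restart loss shrinks accordingly.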