Bitcoin Forum
Topic: Stale share idea? Eliminate them from the miner side.
Anonymous (Guest)

July 10, 2011, 01:28:13 PM
Last edit: September 11, 2011, 04:00:56 PM by davidonpda
 #1

agadf
JoelKatz
July 10, 2011, 01:31:59 PM
 #2

Quote
What if the miner programs added the new block to their side. Miners ran bitcoin, miner program listened for the new block, and as soon as it came across, miner program asks pool for fresh work? Wouldn't that completely eliminate stale shares and reduce a little bit of overhead on the pool?
Umm, no. That would mean the pool controller would have to process 1,000 requests for new blocks from its miners rather than 1 new block arrival from bitcoind. How could that possibly be less overhead?

JoelKatz
July 10, 2011, 01:57:57 PM
 #3

Quote
When the server finds a new block, it tells 1000 people that there is a new block and sends them new work... how is it not more efficient taking this burden off the server?
You aren't taking the burden off the server, you're increasing it. In the current scheme, when the mining controller sees a new block, it sends new work out to all the miners. In your scheme, as each miner detects a new block, it requests new work from the mining controller. This means instead of processing one block coming in, the mining controller instead has to process 1,000 work requests.

Are you suggesting the miners would autonomously just start working on the next block without receiving a work unit from the controller? Then how would you determine which miner worked on what? And would each miner choose its own transactions to include in the block? And generate its own coinbase?

error
July 10, 2011, 06:52:04 PM
 #4

It sounds like you're wanting long polling. Pushpool already does this.
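For anyone who hasn't seen it, the miner's side of long polling is tiny. Here is a minimal sketch using libcurl, assuming the pool advertises its long-poll endpoint via the X-Long-Polling header (the URL below is a placeholder): the request simply blocks, sometimes for minutes, until the server has new work to push down the already-open connection.

Code:
// Minimal long-polling client loop (illustrative only).
#include <curl/curl.h>
#include <iostream>
#include <string>

static size_t collect(char* ptr, size_t size, size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (!curl)
        return 1;
    for (;;) {
        std::string response;
        // Placeholder URL; a real miner takes this from X-Long-Polling.
        curl_easy_setopt(curl, CURLOPT_URL, "http://pool.example/longpoll");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
        // Blocks here, possibly for minutes, until the server answers.
        if (curl_easy_perform(curl) != CURLE_OK)
            break;
        std::cout << "new work: " << response << std::endl; // hand to miner
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}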

1bitc0inplz
July 11, 2011, 12:31:44 AM
 #5


Longpolling on pushpool is done with blkmond and is what I was referring to in the OP. I was trying to think of a better way to do it and in the process got the cart ahead of the horse.

There are already ways of reducing miners' stale shares to near zero. Just remove pushpool from the equation and replace it with something else and you're most of the way there. This burden doesn't have to be placed on the miners; the pools themselves have a lot that they can be doing to help the situation, but most of them aren't.

I'm personally enjoying a 0.02% stale rate. ;)

fpgaminer
July 11, 2011, 04:14:38 AM
 #6

Quote
You aren't taking the burden off the server, you're increasing it. In the current scheme, when the mining controller sees a new block, it sends new work out to all the miners. In your scheme, as each miner detects a new block, it requests new work from the mining controller. This means instead of processing one block coming in, the mining controller instead has to process 1,000 work requests.
You make it sound like the server only has to come up with one new unit of work. The server still has to generate 1000 new pieces of work, so it is the same amount of CPU and memory processing required. As far as network activity goes, it is not an increase in average network load; instead of 1000 Long Polling connections being opened over a period of time, it's a burst of 1000 getwork requests. The average activity is the same, but yes it is an increase in burst load.

Even if it were an increased load on the mining server, the pool operator would have to determine whether the decrease in invalid shares would outweigh the costs of increased network traffic. Remember that invalid shares cause the server to effectively waste resources attempting to validate old data. It also means a reduced effective hashrate for the pool; higher pool hashrates mean more profit for the pool-op.

If I were a pool-op, I would do everything I could to reduce stale shares. It drives more business to your pool, it brings in more profits through a higher hashrate, it reduces wasted processing of useless shares, and your pool server should be able to handle network bursts anyway; otherwise you are bound to fail under the pressure of increased popularity, DDoS, or just a bunch of getworks randomly coming in all at once.


Quote
What if the miner programs added the new block to their side. Miners ran bitcoin, miner program listened for the new block, and as soon as it came across, miner program asks pool for fresh work? Wouldn't that completely eliminate stale shares and reduce a little bit of overhead on the pool?
I've implemented this in a modified poclbm client I test. I call it Local Long Polling (even though it isn't actually long polling ;) ). However, it isn't theoretically better than Long Polling. Here's why:

Suppose opening a TCP connection to the pool with request data takes 30ms; 10ms for the pool to get work ready and shoved down the TCP pipe; 30ms for the data to get to you and the connection to finish.

A typical getwork request would thus take 70ms before your miner has the data it needs to mine.

In Long Polling, the miner opens a connection to the pool and leaves it open, so there is effectively zero opening cost. With Long Polling, when a block is found it would take your miner 40ms before it has the data it needs to begin mining again. Contrast this with the 70ms round trip required for a fresh getwork request.
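To make that concrete, here is the same arithmetic as a tiny compile-time check (the millisecond figures are the illustrative ones assumed above, not measurements):

Code:
// Illustrative latencies from the example above, in milliseconds.
constexpr int open_ms  = 30;  // open TCP connection and send request
constexpr int work_ms  = 10;  // pool prepares work and starts sending
constexpr int reply_ms = 30;  // response travels back to the miner

constexpr int getwork_ms  = open_ms + work_ms + reply_ms;  // 70 ms round trip
constexpr int longpoll_ms = work_ms + reply_ms;            // 40 ms, link already open

static_assert(getwork_ms - longpoll_ms == open_ms,
              "long polling saves exactly the connection-setup cost");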

The pool server is also likely to be better connected to the Bitcoin network than you, both in the quality of its connection and the quality of the peers it is connected to. So it will get a block notification before you do.

Having said all that (phew), I should note a few things. In reality, a lot of pools have Long Polling problems. The connection usually just hangs, making it completely useless. So my LLP solution can be effective there. Some invalid shares are also the result of duplicate work, which can happen by random chance, faulty pool software, and faulty mining software.

I should also put on the table an improvement idea I had the other day. When an invalid share is detected, the pool should send back new work along with the rest of the response. That way the miner can immediately begin chugging away on the correct work. This can be half-solved immediately by modifications to mining software: if an invalid share is detected, queue up a new getwork request. Modifications to pooling software would take a while, if anyone even agrees that it's a good idea.
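The miner-side half of that is small. A sketch, with made-up types and a stand-in for the actual getwork call:

Code:
#include <queue>

struct Work { /* data, midstate, target ... */ };
struct ShareResult { bool stale = false; };

Work request_getwork();  // ordinary getwork request, defined elsewhere

// On a stale result, immediately queue fresh work rather than letting the
// miner grind out the rest of a nonce range on a dead block.
void on_share_result(const ShareResult& r, std::queue<Work>& pending)
{
    if (r.stale)
        pending.push(request_getwork());
}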

JoelKatz
July 15, 2011, 12:10:40 AM
 #7

Quote
Having said all that (phew), I should note a few things. In reality, a lot of pools have Long Polling problems. The connection usually just hangs, making it completely useless. So my LLP solution can be effective there. Some invalid shares are also the result of duplicate work, which can happen by random chance, faulty pool software, and faulty mining software.
Giel de Nijs and I looked into this issue. It seems that there are two things going on.

The first is that a lot of miners are connecting through NAT, and some cheap routers have buggy TCP NAT implementations. If a connection is idle for a long time (often ten minutes), they just seem to drop it, causing it to hang on both sides until data is sent. The client never sends data, and about half the time it takes more than ten minutes to find a block, so the pool tries to send to the broken connection. This send times out after two minutes or so, but it leaves the pool with thousands of busted connections all timing out, generated 'getwork' blocks going to waste, and miners still working on stale blocks.
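One plausible client-side mitigation, sketched under the assumption of Linux sockets: enable TCP keepalives tuned well under the ~10 minute NAT timeout, so the router never sees the connection as idle. (The values here are illustrative.)

Code:
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Probe an idle connection every 30s after 2 idle minutes, so the NAT box
// sees traffic long before its ~10 minute timeout kicks in.
void enable_keepalive(int sock)
{
    int on = 1, idle = 120, intvl = 30, cnt = 4;
    setsockopt(sock, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
    setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
}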

The second is that when a new block is found, the pool must create a new work unit for every single worker all at the same time. Even at 1,200 units per second (which it takes great effort to get to) that would still be four seconds for a pool with 5,000 workers. During those four seconds, some miners will still be working on stale work units.

twmz
July 15, 2011, 12:39:49 AM
 #8

While I like the idea, isn't there a timing issue here? New blocks do not arrive at all nodes on the bitcoin network at the same time. If a new block was detected on a miner before it arrived at the pool, the miner would just request new work and get back work based on the old block. This would still cause stale shares (or invalid blocks, if a share looked like a valid block).

Long polling has the advantage that it doesn't happen until pushpool for sure has new work to hand out based on the new block. The long-running HTTP requests are a problem, but perhaps the solution is to just set a max time on them and return after 5 minutes with or without a new block. That would solve the problem with buggy NAT routers and not add too much overhead (a long poll returning early is basically just another getwork). And because long poll requests are initiated from the universe of miners in a staggered way, the "early ending" long polls would be staggered as well.
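On the server side, that capped long poll is just a timed wait. A rough sketch; Work, make_work(), and the block-height bookkeeping are placeholders, not pushpool's actual code:

Code:
#include <chrono>
#include <condition_variable>
#include <mutex>

struct Work { /* data, midstate, target ... */ };
Work make_work();  // builds a fresh work unit, defined elsewhere

std::mutex mtx;
std::condition_variable new_block_cv;  // notified when bitcoind sees a block
unsigned int block_height = 0;

// Hold the long poll open until a new block arrives, but never longer than
// five minutes; an early return is just one extra getwork, and no NAT box
// ever sees a ten-minute-idle connection.
Work serve_long_poll(unsigned int height_at_request)
{
    std::unique_lock<std::mutex> lock(mtx);
    new_block_cv.wait_for(lock, std::chrono::minutes(5),
        [&] { return block_height != height_at_request; });
    return make_work();  // fresh work whether or not a block arrived
}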

This doesn't solve the problem of an actual long poll (caused by a new block arriving) requiring 5000 work units being generated for larger pools.  But I don't know that anything can solve that except possibly beefy CPUs + changes to bitcoind to allow it to calculate more getwork requests in parallel (and I think there may already be a patch for that floating around).

fpgaminer
July 15, 2011, 01:05:22 AM
 #9

Quote
The second is that when a new block is found, the pool must create a new work unit for every single worker all at the same time. Even at 1,200 units per second (which it takes great effort to get to) that would still be four seconds for a pool with 5,000 workers. During those four seconds, some miners will still be working on stale work units.
That does sound like an issue! For a pool with >5,000 workers, though, it seems like multiple servers would be the way to go.

Why is it difficult to get more than 1,200 work units pushed out per second? Is it a computational limit, or a networking limit? If computational, couldn't the server simply modify the extraNonce and recalculate the merkleroot + midstate very quickly? Even my dumpy ole CPU gets >2MH/s. This is assuming that the extraNonce is at the top of the merkle tree, so only a few SHA-256 calculations would be needed to recalculate everything for a getwork when the extraNonce is modified.

I'm sure you and other devs are already fully aware of the issues ... I'm just asking for my personal curiosity.

JoelKatz
July 15, 2011, 01:52:56 AM
 #10

Quote
Why is it difficult to get more than 1,200 work units pushed out per second? Is it a computational limit, or a networking limit? If computational, couldn't the server simply modify the extraNonce and recalculate the merkleroot + midstate very quickly? Even my dumpy ole CPU gets >2MH/s. This is assuming that the extraNonce is at the top of the merkle tree, so only a few SHA-256 calculations would be needed to recalculate everything for a getwork when the extraNonce is modified.

I'm sure you and other devs are already fully aware of the issues ... I'm just asking for my personal curiosity.
It's a combination of issues. We've already knocked down most of the easy ones. One of the biggest ones left is that the code recomputes the entire merkle transaction tree every time the coinbase is changed; since the extraNonce is at the top, only that branch actually needs to be recomputed.

It would also help to have a 'get many works' command that gets, say, 50 work units. That would save a lot of the 'go from here to here' overhead and allow key bits of code to run while hot in the cache. Also, a binary interface between bitcoind and the pool manager would help. A lot of overhead is marshalling between binary and text. A patch to queue work units in shared memory would be awesome.
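A 'get many works' call could be little more than a loop around the existing single-unit path, so the locking and marshalling are paid once per batch instead of once per miner. A sketch, with getwork_one() standing in for the existing path:

Code:
#include <cstddef>
#include <vector>

struct Work { /* data, midstate, target ... */ };
Work getwork_one();  // the existing single-unit generation path

// Generate a batch in one call: one lock acquisition and one round of
// marshalling amortized over the whole batch, with the generation code
// staying hot in cache across iterations.
std::vector<Work> getworks(std::size_t n)
{
    std::vector<Work> batch;
    batch.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        batch.push_back(getwork_one());
    return batch;
}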

Another huge issue is that when a new block comes in, the bitcoin code has to forget all the previous work units it issued. This requires an expensive traverse/free/update/free operation. And then as each new work unit is generated, it has to be tracked along with which transactions are in it. That means creating a new transaction (which involves allocating memory), creating a new object to track the work unit (again allocating memory), and then allocating a thunk object to link it to the structure that tracks work units (you guessed it, allocating memory again). When a new block is found, all those tiny little objects have to be freed for each work unit we were tracking. That's a giant hiccup at the worst possible time. (A patch to get that out of the hot path would be easy to do. I'll add that to my list.)

fpgaminer
July 15, 2011, 03:00:27 AM
 #11

Thank you for the insightful posts JoelKatz!

Quote
One of the biggest ones left is recomputing the entire merkle transaction tree every time the coinbase is changed. The extraNonce is at the top, so only that branch needs to be recomputed.
You can get 2^32 new getworks from a single coinbase by just changing the extraNonce. With extraNonce at the top of the merkle tree, I'm guessing that means 1 or 2 re-hashes to re-do the merkleRoot, plus a re-hash for midstate. A total of 3 Bitcoin Hashes, right? As opposed to the giant number needed for a full merkle re-hash.
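To spell that out: with the coinbase as the leftmost leaf, you re-hash it and then fold in one cached sibling hash per tree level. A sketch using OpenSSL's SHA256 (the cached branch would be computed once per transaction set; this is not bitcoind's actual code):

Code:
#include <openssl/sha.h>
#include <array>
#include <cstring>
#include <vector>

typedef std::array<unsigned char, 32> Hash256;

// Bitcoin's double SHA-256.
static void dhash(const unsigned char* data, size_t len, unsigned char* out)
{
    unsigned char tmp[32];
    SHA256(data, len, tmp);
    SHA256(tmp, 32, out);
}

// Recompute the merkle root after only the coinbase changed: hash the new
// coinbase, then climb the tree using cached sibling hashes. The coinbase
// is the leftmost leaf, so it is always the left half of each pair. For a
// small tree this is a handful of double-SHA256 calls, not a full rebuild.
Hash256 reroot(const std::vector<unsigned char>& coinbase,
               const std::vector<Hash256>& branch)
{
    Hash256 h;
    dhash(coinbase.data(), coinbase.size(), h.data());
    for (const Hash256& sibling : branch) {
        unsigned char cat[64];
        std::memcpy(cat, h.data(), 32);
        std::memcpy(cat + 32, sibling.data(), 32);
        dhash(cat, 64, h.data());
    }
    return h;
}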

Quote
Another huge issue is that when a new block comes in, the bitcoin code has to forget all the previous work units it issued.
I remember someone mentioning that bitcoind doesn't do that anymore (if it ever did?). Pushpool does, though. But pushpool's list has a fixed maximum size, does it not (it should, anyway)? So it could be allocated as a single chunk of memory. A hashtable with fixed-size buckets, for example.
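Such a table might look like the sketch below: one allocation at startup, open addressing, and a clear() that only sweeps flags, so dropping all tracked work after a new block never touches the allocator. All names are made up, not pushpool's actual structures.

Code:
#include <cstdint>
#include <vector>

struct Work { /* data, midstate, target ... */ };

struct Slot {
    std::uint32_t key = 0;
    Work work;
    bool used = false;
};

class FixedWorkTable {
    std::vector<Slot> slots;  // single allocation, made once at startup
public:
    explicit FixedWorkTable(std::size_t cap) : slots(cap) {}

    // Open addressing with linear probing; returns nullptr when full so
    // the caller can evict the oldest work instead of allocating.
    Slot* insert(std::uint32_t key)
    {
        for (std::size_t i = 0; i < slots.size(); ++i) {
            Slot& s = slots[(key + i) % slots.size()];
            if (!s.used) { s.used = true; s.key = key; return &s; }
        }
        return nullptr;
    }

    // Emptying the table after a new block is a flag sweep, not N frees.
    void clear()
    {
        for (Slot& s : slots) s.used = false;
    }
};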

Again, just thinking out loud here. I'm too afraid to touch bitcoind's code, for fear of introducing a nasty bug in such critical code. :-\

JoelKatz
July 15, 2011, 03:23:43 AM
 #12

Quote
You can get 2^32 new getworks from a single coinbase by just changing the extraNonce. With extraNonce at the top of the merkle tree, I'm guessing that means 1 or 2 re-hashes to re-do the merkleRoot, plus a re-hash for midstate. A total of 3 Bitcoin Hashes, right? As opposed to the giant number needed for a full merkle re-hash.
Unless I'm seriously misunderstanding the code, the coinbase and the extranonce are the same thing. They're just called the 'extranonce' in the client code and the 'coinbase' in the block format.

Quote
Quote
Another huge issue is that when a new block comes in, the bitcoin code has to forget all the previous work units it issued.
I remember someone mentioning that bitcoind doesn't do that anymore (if it ever did?). Pushpool does, though. But pushpool's list has a fixed maximum size, does it not (it should, anyway)? So it could be allocated as a single chunk of memory. A hashtable with fixed-size buckets, for example.

Again, just thinking out loud here. I'm too afraid to touch bitcoind's code, for fear of introducing a nasty bug in such critical code. :-\
Bitcoind has to do this. If a work unit is successfully mined, it must assemble the completed block. In order to do this, it must recover the exact transaction set used, coinbase, and so on.

Here's the relevant code (from the latest git):
Code:
        // Update block
        static unsigned int nTransactionsUpdatedLast;
        static CBlockIndex* pindexPrev;
        static int64 nStart;
        static CBlock* pblock;
        if (pindexPrev != pindexBest ||
            (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60))
        {
            if (pindexPrev != pindexBest)
            {
                // Deallocate old blocks since they're obsolete now
                mapNewBlock.clear();
                BOOST_FOREACH(CBlock* pblock, vNewBlock)
                    delete pblock;
                vNewBlock.clear();
            }


Notice the map has to be cleared, the blocks deleted, and the vector cleared. Since each block also has its own coinbase transaction, each of those transactions will also wind up getting deleted (since they'll no longer be referenced).

fpgaminer
July 15, 2011, 04:32:56 AM
 #13

Quote
Unless I'm seriously misunderstanding the code, the coinbase and the extranonce are the same thing. They're just called the 'extranonce' in the client code and the 'coinbase' in the block format.
You seem to be correct. The relevant code is currently beyond my understanding, because I haven't yet learned the details of transactions in Bitcoin. Here it is, though:

CreateNewBlock does:
Code:
    // Create coinbase tx
    CTransaction txNew;
    txNew.vin.resize(1);
    txNew.vin[0].prevout.SetNull();
    txNew.vout.resize(1);
    txNew.vout[0].scriptPubKey << reservekey.GetReservedKey() << OP_CHECKSIG;

    // Add our coinbase tx as first transaction
    pblock->vtx.push_back(txNew);

IncrementExtraNonce does:
Code:
pblock->vtx[0].vin[0].scriptSig = CScript() << pblock->nBits << CBigNum(nExtraNonce);

I'll have to go through the code and see if I can finally learn how transactions are represented in Bitcoin. Should be useful information. :)

Anyway, you pointed out that transaction 0 (the Generation transaction) is at the top of the tree, correct? So it would have to be rehashed, and then it and the hash of transaction 1 would have to be re-hashed to get the new merkle root. So what I said earlier, only needing a few hashes, is still correct unless I've missed something else.

Quote
Bitcoind has to do this. If a work unit is successfully mined, it must assemble the completed block. In order to do this, it must recover the exact transaction set used, coinbase, and so on.
OOooohhh ... that makes a lot more sense. What a nasty problem. And it uses CBlock, so you'd need to destruct all of those anyway. Hmm ...

A double buffering approach would work. Some pseudo-code:

Code:
// Assuming mapNewBlock just holds pointers to the blocks,
// and vNewBlock is now a pointer to a vector.

// Update block
mapNewBlock.clear();                              // should complete fairly quickly
std::vector<CBlock*>* vOldNewBlock = vNewBlock;   // stash the obsolete blocks
vNewBlock = new std::vector<CBlock*>();           // fresh, empty list


and either later, piece by piece, or in a separate thread you can destroy the blocks in vOldNewBlock and finally delete that vector. You could also just do:

Code:
size_t oldBlocksEnd = vNewBlock.size();

and slowly delete and remove those blocks later (as well as removing them from the map). I'm assuming having old blocks in the vector wouldn't cause a problem, since they'd simply be ignored.

JoelKatz
July 15, 2011, 06:18:04 PM
 #14

By keeping pointers to these structures, you can simply allocate new ones, put pointers to the new ones in the slot, and free the old ones later. This would get the time to free these structures out of the 'getwork' hot path.

Unfortunately, there's a big problem with this approach. When you later go to free those blocks, you have to adjust the reference counts on the transactions in them. The coinbase transactions will go to zero and have to be deleted. This requires acquiring the locks that protect those transactions because they're shared with other operations. (For example, any of them that didn't get into that block are candidate transactions for the next mined block.)

Doing it slowly would mean having to acquire the lock multiple times. And it would require more complex code to do a partial traverse rather than just calling a method that empties it.

However, even if you just defer it for five minutes, at least you won't have to do the cleanup while you're in a mad scramble to give all your workers new work. But you would still cause the same hiccup in work issuing while you do the massive cleanup.

I wonder if this change would be a net improvement: on each 'getwork' call, check whether the oldest block is stale. If so, delete it and do the check one more time. Deleting up to two stale blocks per 'getwork' call would amortize the cost of emptying over many calls to 'getwork'. It still dumps the costs on the very important 'getwork' calls right after a block is found, when LP triggers a mess of them. Maybe we should exempt the first 75% (estimated) of calls after a new block is found, and free up to 5 per call after that?
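In code, the amortized cleanup might look like this; CBlock here is just a placeholder for the real structure with its coinbase and tracking objects:

Code:
#include <cstddef>
#include <deque>

struct CBlock { /* placeholder for the real block + its coinbase tx */ };

std::deque<CBlock*> stale_blocks;  // filled when a new block obsoletes work

// Called at the top of each 'getwork': free a bounded number of stale
// blocks so the teardown cost is spread across many calls instead of
// landing all at once right after a block is found.
void reap_stale(std::size_t max_frees)
{
    for (std::size_t i = 0; i < max_frees && !stale_blocks.empty(); ++i) {
        delete stale_blocks.front();  // the real version would also release
        stale_blocks.pop_front();     // the block's transaction references
    }
}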

JoelKatz
July 16, 2011, 07:51:16 AM
Last edit: July 16, 2011, 06:31:43 PM by JoelKatz
 #15

Quote
Here's maybe another stupid question from me just talking out loud and usually not making sense.

What if the server gave more than one work at the same time? The miner software would need to be smarter but here's kind of what I'm saying:

Miner requests work. Server sends a whole bunch of it, enough to keep them busy for 10 minutes (the average time until a new block is found). When the miner gets to its last set of work, it requests more from the server.

There's no point in solving the easy problem if you do nothing for the hard problem. Later on in the block, the pool manager has no problem keeping up. The problem is when a new block is found and every miner needs a new work unit.

Quote
That would require significant changes on the miner side as well as the server side. But then the only time you need new work is when long polling dings. Instead of automatically getting fresh work every 5 seconds, it may be 10 minutes before they get new work. The amount of work they request/get could also depend on how fast they are. A CPU miner obviously needs much less work than the latest and greatest ATI GPU.
It could be done, but it wouldn't help much. I think it would make much more sense to allow the pool manager to request multiple units from the bitcoin client code which it would then parcel out to the miners. This could reduce the number of pool<->bitcoin transactions in the hot path (when the network finds a new block and all miners need work units) by a factor of 50 or so.

fpgaminer
July 17, 2011, 01:29:58 AM
 #16

Quote
I wonder if this change would be a net improvement: on each 'getwork' call, check whether the oldest block is stale. If so, delete it and do the check one more time. Deleting up to two stale blocks per 'getwork' call would amortize the cost of emptying over many calls to 'getwork'. It still dumps the costs on the very important 'getwork' calls right after a block is found, when LP triggers a mess of them. Maybe we should exempt the first 75% (estimated) of calls after a new block is found, and free up to 5 per call after that?
It sounds like this is forming into a kind of garbage collection. That being the case, you'd want to run your "garbage collection" only when there is time available. If bitcoind detects that it doesn't have any getworks to process, and nothing else to do, it could then spend some time picking up the trash, so to speak.

How smart is bitcoind about storing the transaction lists for the possible new blocks? I imagine most of the time they're exactly the same, except for the generation transaction. Could it store a list of unique transaction lists, excluding the generation transaction, instead? Then the blocks would only need a reference to a transaction list, along with their specific extraNonce, instead of each block carrying a duplicate of a transaction list that differs only by its extraNonce.
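Structurally, that sharing could be as simple as the sketch below: one immutable, reference-counted transaction list, with each candidate block holding a pointer to it plus its own extraNonce. CTransaction stands in for the real type; this is not how bitcoind organizes things today.

Code:
#include <memory>
#include <vector>

struct CTransaction { /* placeholder for the real transaction type */ };

// Everything except the generation transaction, shared and refcounted.
struct TxList {
    std::vector<CTransaction> txs;
};

// A candidate block is then just a reference to the shared list plus the
// one field that actually differs between work units.
struct CandidateBlock {
    std::shared_ptr<const TxList> txs;
    unsigned int nExtraNonce;
};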

JoelKatz
July 17, 2011, 03:11:52 AM
 #17

Quote
How smart is bitcoind about storing the transaction lists for the possible new blocks? I imagine most of the time they're exactly the same, except for the generation transaction. Could it store a list of unique transaction lists, excluding the generation transaction, instead? Then the blocks would only need a reference to a transaction list, along with their specific extraNonce, instead of each block carrying a duplicate of a transaction list that differs only by its extraNonce.
I've thought about optimizations similar to that. It would seem to be quite a bit of work for not very much gain. I also thought about copying the transactions to a special table used only in 'getwork' blocks. That way, removing all the memorized work units wouldn't require holding the lock that protects the regular transaction table. (And so it could be done in a dedicated thread that didn't interfere with generating new work units.)
