fabrizziop
April 11, 2012, 01:11:56 PM
Any plans to implement merged mining on P2Pool?
spiccioli
Legendary
Offline
Activity: 1379
Merit: 1003
nec sine labore
April 11, 2012, 01:17:46 PM
There is no such thing as "wasting hashing power".
...
D&T, this should go, IMHO, onto the 1st page and/or the p2pool wiki. spiccioli
kano
Legendary
Offline
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
April 11, 2012, 01:57:40 PM
Except ... it isn't correct.
There IS wasted hashing power.
The problem is not P2Pool specifically, but rather that people believe it is OK to get a crappy, high reject rate (9%) because someone here said that rate was OK while they themselves were getting a much lower one.
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
The cause is actually that miners are not configured by default to handle the ridiculously high share rate (one share roughly every 10 seconds). So P2Pool is the cause, but the solution is simply to configure your miner to handle that issue.
Aside: if you have one or more BFL FPGA Singles, you cannot mine on P2Pool without wasting a large % of your hash rate.
forrestv (OP)
April 11, 2012, 02:04:54 PM
Any plans to implement merged mining on P2Pool?
First, P2Pool has long supported solo merged mining. However, as for pooled merged mining... Merged mining, as it exists, cannot efficiently be implemented, because every share would need to include the full generation transaction from the parent chain. P2Pool's generation transaction is pretty large, and so this would increase the size of P2Pool shares by more than an order of magnitude (along with P2Pool's network usage).

There are a few solutions: compute main-P2Pool's generation transaction instead of redundantly storing nearly the same thing over and over. Alternatively, change the merged mining spec to not require storing the entire parent gentx. I don't like the first because it would be very complex and tie the MM-P2Pool to the main-P2Pool. The second is obviously impractical in the short term.

Anyone else have ideas?
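To make the size argument concrete, here is a rough back-of-the-envelope sketch. The payout count, bytes-per-output and base share size below are illustrative assumptions, not measured P2Pool values; the point is only the order of magnitude.

# Rough size estimate, NOT P2Pool code. All numbers are assumptions chosen to
# illustrate why embedding the parent-chain generation transaction in every
# share would blow up the share size.

def gentx_size_bytes(num_payouts, bytes_per_output=34, fixed_overhead=100):
    # ~34 bytes per pay-to-pubkey-hash output plus some fixed tx overhead (assumed)
    return fixed_overhead + num_payouts * bytes_per_output

base_share_size = 300                 # assumed size of a share today, in bytes
parent_gentx = gentx_size_bytes(200)  # assume ~200 miners paid in the gentx

grown = base_share_size + parent_gentx
print("parent gentx: ~%d bytes" % parent_gentx)
print("share size: ~%d -> ~%d bytes (%.0fx)"
      % (base_share_size, grown, grown / float(base_share_size)))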
1J1zegkNSbwX4smvTdoHSanUfwvXFeuV23
DiabloD3
Legendary
Offline
Activity: 1162
Merit: 1000
DiabloMiner author
April 11, 2012, 02:12:10 PM
Except ... it isn't correct.
There IS wasted hashing power.
The problem is not P2Pool specifically, but rather that people believe it is OK to get a crappy, high reject rate (9%) because someone here said that rate was OK while they themselves were getting a much lower one.
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
The cause is actually that miners are not configured by default to handle the ridiculously high share rate (one share roughly every 10 seconds). So P2Pool is the cause, but the solution is simply to configure your miner to handle that issue.
Aside: if you have one or more BFL FPGA Singles, you cannot mine on P2Pool without wasting a large % of your hash rate.
Except reject rate means nothing; the delta from the average reject rate is what you need to pay attention to. Also, BFL's firmware is broken: it won't return shares until it has done 2^32 hashes, and any attempt to force it to update on a long poll dumps valid shares. BFL needs to fix their shit before they sell any more FPGAs.
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
April 11, 2012, 02:14:46 PM
Except ... it isn't correct.
There IS wasted hashing power.
The problem is not P2Pool specifically, but rather that people believe it is OK to get a crappy, high reject rate (9%) because someone here said that rate was OK while they themselves were getting a much lower one.
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
The cause is actually that miners are not configured by default to handle the ridiculously high share rate (one share roughly every 10 seconds). So P2Pool is the cause, but the solution is simply to configure your miner to handle that issue.
Aside: if you have one or more BFL FPGA Singles, you cannot mine on P2Pool without wasting a large % of your hash rate.
Orphans aren't wasted hashing power for the p2pool "pool", which is what was being discussed. The node will broadcast any block it finds to all p2pool peers and all Bitcoin peers. Thus even a miner with an 80% orphan rate isn't wasting his hashing power from the point of view being discussed, which is the average number of shares per block (or pool luck).

I think it was made pretty clear that one's PERSONAL compensation depends on one's relative orphan rate:
Miner has a 5% orphan rate, p2pool has a 10% orphan rate: the miner is compensated 5% over "fair value".
Miner has a 10% orphan rate, p2pool has a 10% orphan rate: the miner is compensated at "fair value".
Miner has a 15% orphan rate, p2pool has a 10% orphan rate: the miner is compensated 5% under "fair value".

Aside: if you have one or more BFL FPGA Singles, you cannot mine on P2Pool without wasting a large % of your hash rate.
Even there the hashing power isn't WASTED. Blocks will still be found at the same rate regardless of orphan rate, but the miner's compensation will be lower (due to the miner having a higher orphan rate relative to the pool).

Still, theoretically I do think it is possible to make a "merged" sharechain. Bitcoin must have a single block at each height. This is an absolute requirement: blocks aren't just compensation, they include transactions, and there must be a single consensus on which transactions are included in a block (or set of blocks). With p2pool it may be possible to include "late shares" in the chain to reduce the orphan rate. Honestly I'm not sure it is worth it because, as discussed, if one's orphan rate is roughly equal to the pool's orphan rate, the absolute values don't really matter: miner 0% orphan, pool 0% orphan is the same as miner 10% orphan, pool 10% orphan.
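A small sketch of that relative-orphan-rate arithmetic. The ratio form below is an assumption about how the comparison works out; for small orphan rates it reduces to the simple plus/minus differences quoted above.

# Sketch (not P2Pool code) of "compensation depends on your orphan rate
# relative to the pool's".

def relative_compensation(miner_orphan, pool_orphan):
    """Payout relative to 'fair value' (1.0 = fair), under the ratio assumption."""
    return (1.0 - miner_orphan) / (1.0 - pool_orphan)

for miner in (0.05, 0.10, 0.15):
    r = relative_compensation(miner, pool_orphan=0.10)
    print("miner %2.0f%% vs pool 10%% orphans -> %+.1f%% vs fair value"
          % (miner * 100, (r - 1.0) * 100))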
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
April 11, 2012, 02:26:25 PM
There are a few solutions: compute main-P2Pool's generation transaction instead of redundantly storing nearly the same thing over and over. Alternatively, change the merged mining spec to not require storing the entire parent gentx.
I don't like the first because it would be very complex and tie the MM-P2Pool to the main-P2Pool. The second is obviously impractical in the short term.
Anyone else have ideas?
Yeah, even if space/bandwidth weren't an issue, I don't like complicating the sharechain w/ merged mining data. Most of the alt chains are nearly worthless, and I wonder about the load if it became popular to merge-mine a dozen or more alt-chains on p2pool. Local generation may be rough on low-end nodes, so anything which makes p2pool less viable isn't worth the cost IMHO.

Would it be possible to have a separate merged mining sharechain and a different p2pool instance? Still, I am not clear on what level of communication or interaction would be necessary between instances, or even if it is possible. Given the nearly worthless nature of alt-coins I don't see it as a useful venture. There is so much that can be done to improve p2pool (in terms of GUI frontends, monitoring/reporting, updated docs, custom distros, simplification, etc.) that I would hate to see any skill, resources, and time devoted to worthless alt-chains.
freshzive
April 11, 2012, 02:50:38 PM
p2pool randomly freezes up (freezing my Mac for about ~10 seconds) every half an hour or so. Any idea what's causing this? Should I use a different version of python?

2012-04-11 07:47:16.501037 > Watchdog timer went off at:
2012-04-11 07:47:16.501107 >   File "run_p2pool.py", line 5, in <module>
2012-04-11 07:47:16.501141 >     main.run()
2012-04-11 07:47:16.501172 >   File "/Users/christian/p2pool/p2pool/main.py", line 1005, in run
2012-04-11 07:47:16.501203 >     reactor.run()
2012-04-11 07:47:16.501234 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/base.py", line 1169, in run
2012-04-11 07:47:16.501267 >     self.mainLoop()
2012-04-11 07:47:16.501297 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/base.py", line 1178, in mainLoop
2012-04-11 07:47:16.501331 >     self.runUntilCurrent()
2012-04-11 07:47:16.501361 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/base.py", line 800, in runUntilCurrent
2012-04-11 07:47:16.501394 >     call.func(*call.args, **call.kw)
2012-04-11 07:47:16.501424 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/defer.py", line 368, in callback
2012-04-11 07:47:16.501456 >     self._startRunCallbacks(result)
2012-04-11 07:47:16.501487 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/defer.py", line 464, in _startRunCallbacks
2012-04-11 07:47:16.501520 >     self._runCallbacks()
2012-04-11 07:47:16.501550 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/defer.py", line 551, in _runCallbacks
2012-04-11 07:47:16.501583 >     current.result = callback(current.result, *args, **kw)
2012-04-11 07:47:16.501614 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/defer.py", line 1101, in gotResult
2012-04-11 07:47:16.501647 >     _inlineCallbacks(r, g, deferred)
2012-04-11 07:47:16.501677 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Twisted-12.0.0-py2.7-macosx-10.5-intel.egg/twisted/internet/defer.py", line 1045, in _inlineCallbacks
2012-04-11 07:47:16.501710 >     result = g.send(result)
2012-04-11 07:47:16.501740 >   File "/Users/christian/p2pool/p2pool/main.py", line 799, in status_thread
2012-04-11 07:47:16.501770 >     print this_str
2012-04-11 07:47:16.501799 >   File "/Users/christian/p2pool/p2pool/util/logging.py", line 81, in write
2012-04-11 07:47:16.501830 >     self.inner_file.write(data)
2012-04-11 07:47:16.501860 >   File "/Users/christian/p2pool/p2pool/util/logging.py", line 69, in write
2012-04-11 07:47:16.501891 >     self.inner_file.write('%s %s\n' % (datetime.datetime.now(), line))
2012-04-11 07:47:16.501921 >   File "/Users/christian/p2pool/p2pool/util/logging.py", line 55, in write
2012-04-11 07:47:16.501951 >     output.write(data)
2012-04-11 07:47:16.501981 >   File "/Users/christian/p2pool/p2pool/util/logging.py", line 46, in write
2012-04-11 07:47:16.502011 >     self.inner_file.write(data)
2012-04-11 07:47:16.502041 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 691, in write
2012-04-11 07:47:16.502073 >     return self.writer.write(data)
2012-04-11 07:47:16.502103 >   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 352, in write
2012-04-11 07:47:16.502134 >     self.stream.write(data)
2012-04-11 07:47:16.558924 >   File "/Users/christian/p2pool/p2pool/main.py", line 702, in <lambda>
2012-04-11 07:47:16.559463 >     sys.stderr.write, 'Watchdog timer went off at:\n' + ''.join(traceback.format_stack())
rudrigorc2
Legendary
Offline
Activity: 1064
Merit: 1000
April 11, 2012, 02:55:53 PM
There are a few solutions: compute main-P2Pool's generation transaction instead of redundantly storing nearly the same thing over and over. Alternatively, change the merged mining spec to not require storing the entire parent gentx.
I don't like the first because it would be very complex and tie the MM-P2Pool to the main-P2Pool. The second is obviously impractical in the short term.
Anyone else have ideas?
Yeah, even if space/bandwidth weren't an issue, I don't like complicating the sharechain w/ merged mining data. Most of the alt chains are nearly worthless, and I wonder about the load if it became popular to merge-mine a dozen or more alt-chains on p2pool. Local generation may be rough on low-end nodes, so anything which makes p2pool less viable isn't worth the cost IMHO.

Would it be possible to have a separate merged mining sharechain and a different p2pool instance? Still, I am not clear on what level of communication or interaction would be necessary between instances, or even if it is possible. Given the nearly worthless nature of alt-coins I don't see it as a useful venture. There is so much that can be done to improve p2pool (in terms of GUI frontends, monitoring/reporting, updated docs, custom distros, simplification, etc.) that I would hate to see any skill, resources, and time devoted to worthless alt-chains.

2x
gyverlb
April 11, 2012, 02:59:33 PM
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
I'm not sure how. I have a ~9% reject rate with 5x cgminer 2.3.1 connected to a p2pool node with 5 to 30 ms latency. cgminer is set to use only one thread and intensity 8, which on my hardware (300+ MH/s for each GPU) adds between 0 and 3 ms of latency when cgminer must wait for a GPU thread to return. If there's a way to get better results, I'd like to know it. Currently I think the large majority of orphan/dead blocks on my configuration are caused by the whole P2Pool network's latency, not my configuration, but I'd be glad to be proven wrong.
spiccioli
Legendary
Offline
Activity: 1379
Merit: 1003
nec sine labore
April 11, 2012, 03:40:48 PM
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
I'm not sure how. I have a ~9% reject rate with 5x cgminer 2.3.1 connected to a p2pool node with 5 to 30 ms latency. cgminer is set to use only one thread and intensity 8, which on my hardware (300+ MH/s for each GPU) adds between 0 and 3 ms of latency when cgminer must wait for a GPU thread to return. If there's a way to get better results, I'd like to know it. Currently I think the large majority of orphan/dead blocks on my configuration are caused by the whole P2Pool network's latency, not my configuration, but I'd be glad to be proven wrong.

gyverlb, same here, at times I'm a little better than the pool, at times a little worse. You can use two threads per GPU, though, so that when a long poll comes in, one thread can start fetching new data while the other is waiting for the GPU to finish. spiccioli
gyverlb
April 11, 2012, 04:07:58 PM
You can use two threads per GPU, though, so that when a long poll comes in, one thread can start fetching new data while the other is waiting for the GPU to finish.
Are you sure? The way I understand cgminer's threads, they should all try to keep working in parallel (for <n> threads, each thread should be using 1/n of the processing power) and fetching work is done asynchronously so that it is ready as soon as a GPU thread is available. So with a given intensity, the more threads you have, the more time you should spend working on a workbase invalidated by a long poll. This is how I understood the advice ckolivas gives in cgminer's README to use only one thread.
spiccioli
Legendary
Offline
Activity: 1379
Merit: 1003
nec sine labore
April 11, 2012, 04:23:48 PM
You can use two threads per GPU, though, so that when a long poll comes in, one thread can start fetching new data while the other is waiting for the GPU to finish.
Are you sure? The way I understand cgminer's threads, they should all try to keep working in parallel (for <n> threads, each thread should be using 1/n of the processing power) and fetching work is done asynchronously so that it is ready as soon as a GPU thread is available. So with a given intensity, the more threads you have, the more time you should spend working on a workbase invalidated by a long poll. This is how I understood the advice ckolivas gives in cgminer's README to use only one thread.

gyverlb, the one thread per GPU was a work-around for old versions of cgminer. As I understand it, while a GPU is processing a batch, the thread that submitted it is blocked waiting for the answer, so if you have a single thread it cannot fetch new work before the GPU completes its batch. Using two threads makes it possible to have the second thread start fetching new work while the first one is still waiting for the GPU to finish its work. I'm using two threads without problems (stales are around 1-2% lower than p2pool's). spiccioli.
kano
Legendary
Offline
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
April 11, 2012, 04:30:03 PM
Except ... it isn't correct.
There IS wasted hashing power.
The problem is not P2Pool specifically, but rather that people believe it is OK to get a crappy, high reject rate (9%) because someone here said that rate was OK while they themselves were getting a much lower one.
If you use a good miner program and configure it correctly, you will not get a crappy 9% reject rate.
The cause is actually that miners are not configured by default to handle the ridiculously high share rate (one share roughly every 10 seconds). So P2Pool is the cause, but the solution is simply to configure your miner to handle that issue.
Aside: if you have one or more BFL FPGA Singles, you cannot mine on P2Pool without wasting a large % of your hash rate.
Except reject rate means nothing; the delta from the average reject rate is what you need to pay attention to.
Well, yes, but that is of course what I meant by saying that 9% is bad and you can get a lower %.
Also, BFL's firmware is broken: it won't return shares until it has done 2^32 hashes, and any attempt to force it to update on a long poll dumps valid shares. BFL needs to fix their shit before they sell any more FPGAs.
Yep, but to put it more specifically: the time to do 2^32 hashes at 830 MH/s is 5.17 s. Thus each BFL device will complete, on average, 1 nonce range and then abort the 2nd one for each average 10 second share. So, on average, it would only mine 5.17 s out of every 10 s, or 51.7%, thus wasting 48.3% of its hashes ... yep, it's that bad.

Oddly enough, that is more similar to GPU mining than Icarus FPGA mining ... GPU mining cannot be aborted for each nonce sub-range sent to the GPU, but of course as long as the processing time of the sub-range is small, then you aren't wasting much time waiting after an LP occurs. In this case, on each LP you waste 1/2 of the expected processing time for a sub-nonce range (which is very small - but higher as you increase the intensity - each increase in intensity in cgminer increases it 2x). On cgminer, an intensity of 9 usually means a nonce range of 2^24 or 2^25, which is of the order of 4.5 to 9 ms on ~370 MH/s (e.g. ~ ATI 6950), and of course different on other hardware. Thus with GPUs, reducing the intensity by one reduces the amount of time wasted on each LP, and since there are 60 times the number of LPs with P2Pool vs normal network LPs, then of course that makes sense.

With the Icarus FPGA, it aborts when it finds a share and returns it immediately. This means that if you have to abort the work due to an LP, you know the hashes being thrown away contain no shares (there is a very tiny window afterwards where there could be shares - until the new work is sent). So being able to abort and restart is very advantageous. The approximate time is less than 0.014 s on my hardware; ~0.014 s is the overhead when processing a work item (job start time after sending the work to the FPGA plus the time to return the result if there is one).
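A quick sketch of that BFL arithmetic. The 830 MH/s figure and the ~10 s P2Pool share interval are the ones quoted in this post; the rest is just the division spelled out, not miner code.

# Sanity check of the BFL waste estimate above (illustrative only).
BFL_HASHRATE = 830e6        # hashes per second, as quoted above
FULL_NONCE_RANGE = 2 ** 32  # the BFL won't return until it has done 2^32 hashes
SHARE_INTERVAL = 10.0       # average seconds between P2Pool shares / long polls

time_per_range = FULL_NONCE_RANGE / BFL_HASHRATE   # ~5.17 s
useful = time_per_range / SHARE_INTERVAL           # first range counts, the aborted 2nd doesn't
print("time per full nonce range: %.2f s" % time_per_range)
print("useful: %.1f%%  wasted: %.1f%%" % (useful * 100, (1 - useful) * 100))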
kano
Legendary
Offline
Activity: 4634
Merit: 1851
Linux since 1997 RedHat 4
April 11, 2012, 04:32:26 PM
You can use two threads per GPU, though, so that when a long poll comes in, one thread can start fetching new data while the other is waiting for the GPU to finish.
Are you sure? The way I understand cgminer's threads, they should all try to keep working in parallel (for <n> threads, each thread should be using 1/n of the processing power) and fetching work is done asynchronously so that it is ready as soon as a GPU thread is available. So with a given intensity, the more threads you have, the more time you should spend working on a workbase invalidated by a long poll. This is how I understood the advice ckolivas gives in cgminer's README to use only one thread.

gyverlb, the one thread per GPU was a work-around for old versions of cgminer. As I understand it, while a GPU is processing a batch, the thread that submitted it is blocked waiting for the answer, so if you have a single thread it cannot fetch new work before the GPU completes its batch. Using two threads makes it possible to have the second thread start fetching new work while the first one is still waiting for the GPU to finish its work. I'm using two threads without problems (stales are around 1-2% lower than p2pool's). spiccioli.

Nope. It doesn't wait for work to finish before getting new work from the pool. A separate thread deals with getting the work before it is needed, so the GPU isn't idle for the long time that would otherwise be spent sending out a work request and getting a reply.
spiccioli
Legendary
Offline
Activity: 1379
Merit: 1003
nec sine labore
April 11, 2012, 04:41:21 PM
You can use two threads per GPU, though, so that when a long poll comes in, one thread can start fetching new data while the other is waiting for the GPU to finish.
Are you sure? The way I understand cgminer's threads, they should all try to keep working in parallel (for <n> threads, each thread should be using 1/n of the processing power) and fetching work is done asynchronously so that it is ready as soon as a GPU thread is available. So with a given intensity, the more threads you have, the more time you should spend working on a workbase invalidated by a long poll. This is how I understood the advice ckolivas gives in cgminer's README to use only one thread.

gyverlb, the one thread per GPU was a work-around for old versions of cgminer. As I understand it, while a GPU is processing a batch, the thread that submitted it is blocked waiting for the answer, so if you have a single thread it cannot fetch new work before the GPU completes its batch. Using two threads makes it possible to have the second thread start fetching new work while the first one is still waiting for the GPU to finish its work. I'm using two threads without problems (stales are around 1-2% lower than p2pool's). spiccioli.

Nope. It doesn't wait for work to finish before getting new work from the pool. A separate thread deals with getting the work before it is needed, so the GPU isn't idle for the long time that would otherwise be spent sending out a work request and getting a reply.

kano, good to know, so what advantage do you get using a single thread per GPU? spiccioli.
gyverlb
April 11, 2012, 05:11:17 PM
kano,
good to know, so what advantage do you get using a single thread per GPU?
spiccioli.
What kano described matches my understanding, so I assume using a single thread instead of two makes the single thread complete twice as fast: statistically, the GPU latency should be halved. In terms of latency gains, it should be nearly identical to decrementing intensity by one (which halves the space explored by the GPU on each work item). Maybe with low intensities, decreasing the thread count lowers the hashrate less than decreasing intensity further, and is thus preferred with p2pool?
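A tiny sketch of that equivalence, under the simple model used in this thread (the GPU's hashrate is split evenly across threads, and each intensity step halves or doubles the batch; the 2^(15+intensity) batch-size formula quoted a couple of posts below is assumed):

# Minimal sketch: "1 thread at intensity I" vs "2 threads at intensity I-1".
def thread_latency_s(intensity, gpu_hashrate, threads):
    # seconds for one thread to finish its batch, assuming an even hashrate split
    return (2 ** (15 + intensity)) * threads / float(gpu_hashrate)

R = 300e6  # ~300 MH/s per GPU, as in the setup described above
print(thread_latency_s(8, R, threads=1))  # one thread at intensity 8
print(thread_latency_s(7, R, threads=2))  # two threads at intensity 7 -> same latency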
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
April 11, 2012, 05:16:09 PM
good to know, so what advantage do you get using a single thread per GPU?
The thread completes faster, so it has less stale work due to LPs. Intensity determines the # of hashes in a workload (batch): hashes in workload = 2^(15+Intensity). With 1 thread, a 500 MH/s GPU simply runs at 500 MH/s. With 2 threads, the GPU is split into two threads each running at 250 MH/s. For a given intensity, a GPU will finish a batch faster with fewer threads.
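A short sketch of that batch-time arithmetic, taking the 2^(15+intensity) figure above at face value (actual cgminer batch sizes may differ by version and kernel settings):

# Batch completion time per thread, using the hashes-per-batch formula quoted
# above; the 500 MH/s GPU is the example from this post.
def batch_time_ms(intensity, gpu_hashrate, threads=1):
    hashes_per_batch = 2 ** (15 + intensity)
    per_thread_rate = gpu_hashrate / float(threads)  # hashrate split across threads
    return 1000.0 * hashes_per_batch / per_thread_rate

gpu = 500e6
for threads in (1, 2):
    for intensity in (7, 8, 9):
        print("I=%d, %d thread(s): %.1f ms"
              % (intensity, threads, batch_time_ms(intensity, gpu, threads)))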
rudrigorc2
Legendary
Offline
Activity: 1064
Merit: 1000
April 11, 2012, 06:34:25 PM
So is it better to use high intensity and a single thread than low intensity and two threads on a 5970?
DeathAndTaxes
Donator
Legendary
Offline
Activity: 1218
Merit: 1079
Gerald Davis
April 11, 2012, 06:56:58 PM Last edit: April 12, 2012, 12:16:15 AM by DeathAndTaxes
So is it better to use high intensity and a single thread than low intensity and two threads on a 5970?
I have found Intensity 8 with 1 thread to work well for the 5970: <0.2% DOA and <5% orphans without too much of a hit in throughput. I did try Intensity 8 and Intensity 7 w/ 2 threads and got worse results. Intensity 9 with 1 thread also worked but had a higher stale rate. There is some variance in stales, so I opted for I:8, T:1 instead. I:9, T:2 was downright horrible with stales, 10%+.