gmaxwell
Moderator
Legendary
Offline
Activity: 2072


May 31, 2011, 02:27:05 PM 

No, they are not measurements. There is no way to measure how much work went into finding any given hash, unless you are actually monitoring each and every miner involved.
I think you don't actually know what a measurement is. If you ask everyone coming into the emergency room complaining of stomach pains "Did you eat cucumber in the last 24 hours?", is that not a measurement, even though the results will be contaminated by noise and biases? If the census randomly selects 100,000 houses out of a million to get demographics from, is this not a measurement?
In this case we know the hashrate by virtue of the network reporting when it finds a block. This is a measurement of a _well_ defined process with explicable behavior, and we can speak with certainty about it. Unlike my silly examples, it's not subject to much in the way of surprising biases or unknown sources of noise (because, e.g., if broken clients diminish our hash rate, that's not a random effect we'd wish to exclude). About all there is to worry about is how you time the blocks: if you time from the block timestamps you're subject to node timestamp stupidity; if you measure locally you will be subject to a small amount of network propagation noise. But bitcoin propagation time is minuscule compared to the average time between blocks.
Regardless, if the expected time between solutions is r, then the proportion of times in which a block will take longer than x is e^(-x/r).
Moreover, the particular process at play here has its maximum likelihood value at the mean. So you don't have to do any fancier math than taking an average to reach the most accurate measurement possible given the available data.
So, e.g., if we see long gaps then we can say with very high confidence that the _measurement_ being performed by the network is telling us that the hashrate is almost certainly low.
As far as the timestamps go: yes, bitcoin network time can be wonky. So don't pay any attention to it. If you have a node with good visibility you'll observe the blocks directly.
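A minimal sketch of the two calculations described in this post, assuming block gaps are exponentially distributed with a 10-minute nominal mean. The function names and the list of observed gaps are hypothetical and made up for illustration; they are not taken from the thread or from any Bitcoin software.

```python
import math

def prob_gap_longer_than(x_minutes, mean_minutes=10.0):
    """P(gap > x) when gaps are exponential with the given mean."""
    return math.exp(-x_minutes / mean_minutes)

def estimate_mean_gap(gaps_minutes):
    """Maximum-likelihood estimate of the mean gap: the plain average."""
    return sum(gaps_minutes) / len(gaps_minutes)

# Hypothetical observed gaps between blocks, in minutes (made-up data).
observed = [4.0, 31.0, 12.5, 7.0, 22.0, 9.5]

mean_gap = estimate_mean_gap(observed)
# Hashrate relative to the rate that would produce 10-minute blocks.
relative_rate = 10.0 / mean_gap

print(f"mean gap: {mean_gap:.2f} min, relative hashrate: {relative_rate:.2f}")
print(f"P(gap > 30 min at nominal rate): {prob_gap_longer_than(30):.4f}")
```

With only six gaps the estimate is of course coarse; the sketch shows the arithmetic, not how much confidence a handful of blocks deserves.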







kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 03:00:38 PM 

No, they are not measurements. There is no way to measure how much work went into finding any given hash, unless you are actually monitoring each and every miner involved.
I think you don't actually know what a measurement is. If you ask everyone coming into the emergency room complaining of stomach pains "Did you eat cucumber in the last 24 hours?", is that not a measurement, even though the results will be contaminated by noise and biases? If the census randomly selects 100,000 houses out of a million to get demographics from, is this not a measurement? In this case we know the hashrate by virtue of the network reporting when it finds a block.
No. The only thing you know at this point is the time (roughly) that this block was found.
This is a measurement of a _well_ defined process with explicable behavior, and we can speak with certainty about it.
Yes, well defined, but nondeterministic. You can speak with statistical certainty about it in bulk, but not at small scales. Think radioactive decay. I can give you a very accurate estimate of how long it will take for a mole of U-235 to decay halfway, but I can't tell you anything at all about how long it will be until the next hit on your Geiger counter.
Unlike my silly examples, it's not subject to much in the way of surprising biases or unknown sources of noise (because, e.g., if broken clients diminish our hash rate, that's not a random effect we'd wish to exclude). About all there is to worry about is how you time the blocks: if you time from the block timestamps you're subject to node timestamp stupidity; if you measure locally you will be subject to a small amount of network propagation noise. But bitcoin propagation time is minuscule compared to the average time between blocks.
Regardless, if the expected time between solutions is r, then the proportion of times in which a block will take longer than x is e^(-x/r).
Moreover, the particular process at play here has its maximum likelihood value at the mean. So you don't have to do any fancier math than taking an average to reach the most accurate measurement possible given the available data.
So, e.g., if we see long gaps then we can say with very high confidence that the _measurement_ being performed by the network is telling us that the hashrate is almost certainly low.
As far as the timestamps go: yes, bitcoin network time can be wonky. So don't pay any attention to it. If you have a node with good visibility you'll observe the blocks directly.
Gah. You understand the statistics, but you can't seem to accept the implications. I'll say it again. The time to find one block tells you nothing about the amount of work that went into finding it. When you start talking about large numbers of blocks, you can start saying things like "the probability is very high that the network hashing rate was X over this long interval". This is not the language of measurements. It is the language of estimation.




tiberiandusk


May 31, 2011, 03:01:37 PM 

Is there a good alternative to deepbit?
I switched to eligius a while back when I saw deepbit growing too quickly. If eligius gets too large someday I will switch again. Eventually I would like to be strictly solo, but I need to earn more coins to buy more hardware so I can earn more coins and buy more hardware so I can earn more coins, until the entire Midwest of the United States is covered under a mountain of GPUs and spontaneously combusts into a second sun. Anyone want to invest in some wind turbines/solar panels I can put up on my farm to make a totally off-grid, self-sufficient mining facility?




CydeWeys


May 31, 2011, 03:07:15 PM 

Kuji, let us examine a hypothetical scenario. Let's say I am flipping a fair coin several thousand times. Do you believe it is impossible for me to calculate the odds of getting ten heads in a row because "You can speak with statistical certainty about it in bulk, but not at small scales"? Because it is, in fact, possible to calculate exactly the odds of that happening, as any introductory Statistics student could tell you.
This is no different than Bitcoin. In fact, the distribution of measurements is exactly the same between the two scenarios (results of coinflipping and results of hashing to find improbable hashes); both fit a Poisson distribution.
I will refer you to the earlier chapters of a typical college-level Statistics book, especially the parts on calculating probabilities.




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 03:25:51 PM 

Kuji, let us examine a hypothetical scenario. Let's say I am flipping a fair coin several thousand times. Do you believe it is impossible for me to calculate the odds of getting ten heads in a row because "You can speak with statistical certainty about it in bulk, but not at small scales"? Because it is, in fact, possible to calculate exactly the odds of that happening, as any introductory Statistics student could tell you.
This is no different than Bitcoin. In fact, the distribution of measurements is exactly the same between the two scenarios (results of coinflipping and results of hashing to find improbable hashes); both fit a Poisson distribution.
I will refer you to the earlier chapters of a typical college-level Statistics book, especially the parts on calculating probabilities.
Who the hell is Kuji? And your example is nothing at all like bitcoin. You can calculate the odds of getting 10 in a row out of X thousand flips. And then if you do X thousand flips repeatedly, say Y times, the number of 10-in-a-row events in reality will approach your calculation as Y grows larger.
Better example. You have a billion sided die, and you throw it and it comes up showing "1". Should I infer from that event that you had actually thrown the die 500 million times?
If you are looking for a good book on statistics, I suggest Savage's Foundations of Statistics. Slog your way through that and you'll have a MUCH better understanding of what statistics can, and cannot, do. Oh, and a drinking problem.




gmaxwell
Moderator
Legendary
Offline
Activity: 2072


May 31, 2011, 05:55:26 PM 

Better example. You have a billion sided die, and you throw it and it comes up showing "1". Should I infer from that event that you had actually thrown the die 500 million times?
We're not measuring any outcome. We're measuring a small set of outcomes. If I send you a block header and you come back with a solution at the current difficulty, how many hashes did you do before you found it? The most likely answer is 1,867,835,568,419,371 (I changed this number after my initial post because I'd accidentally done a nice precise calculation with the wrong value for the current difficulty).
Maybe you did one hash, maybe you did 1000 times that. But those cases are VERY unlikely. Far more unlikely than the error that would be found in virtually any measurement performed of any physical quantity. So we can easily specify whatever error bounds we like, and say with high confidence that the work was almost certainly within that range and that the chance of the result coming from chance is small, just as if we were comparing the weights of things on a scale (though the difference in process gives a different error distribution, of course).
Because bitcoin solutions arise from such a fantastically large number of fantastically unlikely events, giving a fairly high overall rate, the confidence bounds are fairly small even for a single example. E.g. at the normal expectation of 10 minutes, we'd expect 99% of all gaps to be under 46.05 minutes, and our average rate has been faster than that.
So you're asking us to accept that there were multiple p<1% events but not a loss of hashrate, when at the same time the biggest pool operator is obviously taking an outage which is visible to all. OOooookkkaaay...
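The 46.05-minute figure can be reproduced with a short sketch, again assuming exponentially distributed gaps. The helper names are hypothetical. The rule that expected hashes per block is roughly the difficulty times 2^32 is a commonly used approximation assumed here; it is not stated in the post, and no particular difficulty value is taken from the thread.

```python
import math

def gap_quantile(q, mean_minutes=10.0):
    """Gap length below which a fraction q of exponential gaps fall."""
    return -mean_minutes * math.log(1.0 - q)

def expected_hashes(difficulty):
    """Approximate expected hashes per block (assumed: difficulty * 2**32)."""
    return difficulty * 2**32

# 99% of gaps should be shorter than this at a 10-minute mean.
print(f"99% of gaps are under {gap_quantile(0.99):.2f} minutes")

# Central 98% range for a single gap, to show how wide one sample can be.
low, high = gap_quantile(0.01), gap_quantile(0.99)
print(f"98% of single gaps fall between {low:.2f} and {high:.2f} minutes")

# Expected work at difficulty 1, as a scale reference.
print(f"expected hashes at difficulty 1: {expected_hashes(1)}")
```

The same quantile function applies to hashes instead of minutes if the mean is replaced by the expected hash count, since both are modeled as the waiting time for one rare success.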




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 06:08:17 PM 

Because bitcoin solutions arise from such a fantastically large number of fantastically unlikely events, giving a fairly high overall rate, the confidence bounds are fairly small even for a single example.
Rubbish. The only thing that you can tell from a single sample is how unlikely it would be for you to repeat it in the same amount of time. You can tell absolutely nothing at all about the amount of work actually done.
Say I get a block, and return the result to you in a nanosecond. What is more likely?
Option A) I got really lucky.
Option B) I have more hashing power at my disposal than the theoretical hashing power of the entire solar system, should it ever be converted into a computer.




gmaxwell
Moderator
Legendary
Offline
Activity: 2072


May 31, 2011, 08:34:54 PM 

The only thing that you can tell from a single sample is how unlikely it would be for you to repeat it in the same amount of time. You can tell absolutely nothing at all about the amount of work actually done.
Say I get a block, and return the result to you in a nanosecond. What is more likely?
Option A) I got really lucky. Option B) I have more hashing power at my disposal than the theoretical hashing power of the entire solar system, should it ever be converted into a computer.
Obviously the former. Really really really really really lucky. In fact, I might instead think it's more likely that you've found a weakness in SHA256 and are able to generate preimages fast.
And by the same token, if the network goes a day without a result, what's more likely: that an astronomically unlikely run of bad luck happened (p=2.8946403116483e-63, a one in a wtf-do-they-have-a-word-for-that event), or that it lost hashrate or had some other kind of outage?
We had block gaps with combined one in a thousand chances given the expected nominal rate, and at the same time big pools were down and their operators were saying they were suffering an outage. But you say we know nothing about the hash rate?
*ploink*
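The "what's more likely" comparison can be sketched as follows, assuming exponential gaps and a 10-minute mean at full hashrate. The function name and the 1% outage figure are hypothetical choices for illustration only.

```python
import math

def prob_no_block(window_minutes, hashrate_fraction=1.0, nominal_mean=10.0):
    """P(no block in the window) when the network runs at the given
    fraction of the hashrate that yields nominal_mean-minute blocks."""
    # The mean gap scales as nominal_mean / hashrate_fraction.
    return math.exp(-window_minutes * hashrate_fraction / nominal_mean)

DAY = 24 * 60  # minutes

p_full = prob_no_block(DAY)                            # pure bad luck
p_outage = prob_no_block(DAY, hashrate_fraction=0.01)  # hypothetical 99% outage

print(f"no block for a day at full hashrate: {p_full:.3e}")
print(f"no block for a day at 1% hashrate:   {p_outage:.3f}")
print(f"likelihood ratio (outage vs luck):   {p_outage / p_full:.3e}")
```

At full hashrate the probability comes out near the 2.9e-63 quoted in the post, which is why an outage is by far the more plausible explanation for a day-long gap.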




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 08:51:53 PM 

I like how you move the goalposts. But really, you've already agreed with me.
I never said that we know nothing of the hash rate, over a long period of time. I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.
Oh, and there were gaps with one in a thousand chances in the other direction (quick solutions) during that same timeframe too.
http://blockexplorer.com/b/127594
http://blockexplorer.com/b/127588
And more.




gmaxwell
Moderator
Legendary
Offline
Activity: 2072


May 31, 2011, 08:58:09 PM 

I like how you move the goalposts. But really, you've already agreed with me.
I never said that we know nothing of the hash rate, over a long period of time. I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.
Kaji, you've evaded the clear points I raised about the nature of measurement only to continue asserting that what we have isn't one. Why should I even continue to discuss with you when you won't explain why you say that this isn't a measurement when e.g. using a scale (which is also not infinitely precise, and is subject to complicated biases and noise), taking a survey, or taking a census are all considered to be measurements?
If you will say that there is no such thing as a measurement of a physical quantity then I will agree with you, given this definition of measurement, that we haven't made a measurement here. Otherwise, I fail to see what you're arguing: that the pool operators are liars when they said they went down, that the pool users were hallucinating when they saw their miners idle, that the rate didn't really go down and that the system was just really unlucky at the time of these lies and hallucinations?




CydeWeys


May 31, 2011, 09:00:37 PM 

Better example. You have a billion sided die, and you throw it and it comes up showing "1". Should I infer from that event that you had actually thrown the die 500 million times?
Nope, Kaji, that isn't a better example. A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up. You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 10:52:38 PM 

Who the fuck is Kaji?




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 10:58:38 PM 

Better example. You have a billion sided die, and you throw it and it comes up showing "1". Should I infer from that event that you had actually thrown the die 500 million times?
Nope, Kaji, that isn't a better example. A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up. You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.
But if it comes up more or less often than you'd predict, you don't then pretend that you've "measured" my rolling speed.




CydeWeys


May 31, 2011, 11:07:56 PM 

Better example. You have a billion sided die, and you throw it and it comes up showing "1". Should I infer from that event that you had actually thrown the die 500 million times?
Nope, Kaji, that isn't a better example. A better example would be you roll a billion-sided die repeatedly and you measure how many rolls it takes between each occurrence of, say, a face <= 100 coming up. You can assign probabilities to how many expected rolls it takes on average between getting <= 100, and then if you get something way outside of that range, you can say that something very unexpected happened.
But if it comes up more or less often than you'd predict, you don't then pretend that you've "measured" my rolling speed.
Not necessarily. But the null hypothesis is "Events are occurring at X speed." If I consistently get experimental data with extremely low probability values at X speed, say they are outside a 99% confidence interval, then I can reject the null hypothesis and posit that the actual speed is different. To calculate the actual speed, you would divide the number of events by the length of the time window, but of course this is very noisy in a smaller time window.
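How noisy "events divided by time window" is can be explored with a small simulation. This is only a sketch under stated assumptions: a constant true hashrate, exponential gaps with a 10-minute mean, and hypothetical function names, window sizes, and trial counts chosen for illustration.

```python
import math
import random
import statistics

def simulate_block_count(window_minutes, mean_gap, rng):
    """Count blocks found inside a window when gaps are exponential."""
    elapsed, count = 0.0, 0
    while True:
        elapsed += rng.expovariate(1.0 / mean_gap)
        if elapsed > window_minutes:
            return count
        count += 1

def relative_spread(window_minutes, mean_gap=10.0, trials=500, seed=1):
    """Standard deviation of the rate estimate, relative to the true rate."""
    rng = random.Random(seed)
    true_rate = 1.0 / mean_gap
    estimates = [
        simulate_block_count(window_minutes, mean_gap, rng) / window_minutes
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates) / true_rate

for label, minutes in [("8 hours", 480), ("1 day", 1440), ("1 week", 10080)]:
    simulated = relative_spread(minutes)
    # For a Poisson count with n expected events, the relative spread is 1/sqrt(n).
    predicted = 1.0 / math.sqrt(minutes / 10.0)
    print(f"{label:>8}: simulated {simulated:.3f}, predicted {predicted:.3f}")
```

Under these assumptions the one-sigma spread shrinks with the square root of the expected number of blocks in the window, so shorter windows give visibly noisier estimates than longer ones.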




kjj
Legendary
Offline
Activity: 1302


May 31, 2011, 11:22:22 PM 

I like how you move the goalposts. But really, you've already agreed with me.
I never said that we know nothing of the hash rate, over a long period of time. I said that we are not measuring the hash rate, and that our estimates of it aren't very accurate over short periods of time.
Kaji, you've evaded the clear points I raised about the nature of measurement only to continue asserting that what we have isn't one. Why should I even continue to discuss with you when you won't explain why you say that this isn't a measurement when e.g. using a scale (which is also not infinitely precise, and is subject to complicated biases and noise), taking a survey, or taking a census are all considered to be measurements?
If you will say that there is no such thing as a measurement of a physical quantity then I will agree with you, given this definition of measurement, that we haven't made a measurement here. Otherwise, I fail to see what you're arguing: that the pool operators are liars when they said they went down, that the pool users were hallucinating when they saw their miners idle, that the rate didn't really go down and that the system was just really unlucky at the time of these lies and hallucinations?
There is an actual real number of hashes performed, and there is an actual real rate of them being performed over a given interval. These numbers are only known at the miner itself. They are not collected by the pools, and they are not collated globally. If they were, we would have actual measurements.
Instead, we have an interval, and the number of hashes that we expect it would take, on average, to perform the work demonstrated. Divide the expected number of hashes by the interval, and we have an estimate of the hash rate. This is only an estimate because, and this is key, the number of hashes it takes to make any given block is nondeterministic.
You can't measure a single nondeterministic event, or even a small number of them, and call it a measurement. I guess if we can't agree on that, it is pointless to continue, but you should totally write a paper and claim your Nobel Prize, because a way to determine nondeterministic systems would probably be the biggest discovery since relativity. Maybe since ever.
Or maybe you can only go so far as to say that SHA256 is predictable, in which case the world's cryptographers would love to hear what you have to say.
By the way, I've never said that the pool operators were liars, nor that the pool users were hallucinating. I merely said, and I'll quote myself:
No one has any idea where the error bars on the hash rate graphs should be. On the 1 day line, they will be astronomically huge (not to mention the 8 hour window, lol). Like several times the width of the plotted channel huge.
DO NOT MAKE ASSUMPTIONS BASED ON THOSE GRAPHS. They are estimates, not measurements.




