mmgen-py (OP)
|
|
February 24, 2016, 11:16:04 AM Last edit: February 24, 2016, 12:57:23 PM by mmgen-py |
|
I just did a full initial sync of the blockchain on a low-end laptop running Bitcoin Core 0.12. Here are the results:
Hardware:
  CPU:  Pentium quad-core N3540 @ 2.16GHz
  Disk: Seagate 5400 RPM, 16MB cache, SATA 6.0Gb/s
  RAM:  4GB
Software: Bitcoin Core v0.12 (bitcoind) / gcc 5.2.1 / Linux 4.2
Internet connection: 3.6 Mbyte/sec
Sync summary:
  Block height:   399811
  Sync time:      35:32
  Avg CPU load:   47%
  Max RAM usage:  640 MB (16%)
  Max net usage:  2.4 Mbyte/sec (67%)
First of all, I should mention that 35.5 hours to download and verify the blockchain (currently ≃67 GB) on a budget laptop is a good result, exceeding my expectations. The speedups afforded by libsecp256k1 undoubtedly played a large role here.
There are some concerning issues, however: As the sync process proceeded, disk and CPU usage increased and progress became slower and slower. This slowdown is reflected in the progress summary below:
PROGRESS   TIME    TIME(%)
--------   -----   -------
    0%     00:00     0.0%
    5%     01:34     4.4%
   10%     03:04     8.6%
   15%     05:07    14.4%
   20%     05:55    16.7%
   25%     06:36    18.6%
   30%     07:17    20.5%
   35%     07:59    22.5%
   40%     08:53    25.0%
   45%     09:45    27.4%
   50%     10:42    30.1%
   55%     11:46    33.1%
   60%     12:59    36.5%
   65%     14:19    40.3%
   70%     15:40    44.1%
   75%     17:30    49.2%
   80%     19:43    55.5%
   85%     21:54    61.6%
   90%     25:03    70.5%
   95%     29:16    82.4%
  100%     35:32   100.0%
As can be seen from the table, it took longer to sync the last 25% of the blockchain than it did to sync the first 75%. Even more dramatically, almost 30% of total sync time was spent on just the last 10% of the blockchain.

DISCLAIMER: Progress is estimated by bitcoind (in the function GuessVerificationProgress) by computing the number of remaining transactions in the blockchain using a complex formula. Inaccuracies in these progress estimates might skew the data and lead to inaccurate conclusions.
During the last 10% of the blockchain sync, both the I/O wait ("wa") figure in "top" and the furious clicking of disk heads revealed that disk activity had become an obvious bottleneck, leaving the CPU idle for longer and longer periods.
When not idle, CPU usage (which was low at the beginning of the sync process) reached occasional peaks of around 90% in this final stage. Most of the CPU time during these peaks was consumed by signature-verifying threads (i.e. libsecp256k1), so there's probably little room for further optimization here.
Network bandwidth peaked at around 67% capacity and was therefore never a limiting factor.
Another curious pattern that emerged during the later stages of the sync was long delays (4-5 min) during which data was still coming in over the network at a high rate, but both disk and CPU were relatively idle and little or no chain updating was occurring. This can be seen in the following excerpt of block polling data:
TIME      BLOCK   NEW BLOCKS
--------  ------  ----------
22:36:25  321977  148
22:37:29  322112  135
22:38:29  322212  100
22:39:29  322345  133
22:40:39  322467  122
22:41:48  322599  132
22:42:52  322740  141
22:43:53  322885  145
22:44:53  322979   94
22:45:53  322979    0
22:46:53  322979    0
22:47:53  322979    0
22:48:53  322979    0
22:50:42  323125  146
22:51:57  323324  199
22:52:57  323467  143
22:54:05  323631  164
22:55:12  323791  160
22:56:29  323996  205
22:57:29  324045   49
22:58:39  324085   40
22:59:39  324107   22
23:00:39  324112    5
23:02:00  324193   81
23:03:17  324342  149
23:04:28  324491  149
23:05:49  324671  180
23:06:50  324835  164
23:08:10  325037  202
23:09:21  325185  148
23:10:23  325287  102
23:11:23  325376   89
23:12:23  325402   26
23:13:24  325402    0
23:14:24  325402    0
23:15:37  325439   37
23:16:40  325600  161
23:17:46  325727  127
23:18:48  325904  177
23:20:20  326114  210
23:21:24  326236  122
23:22:31  326349  113
23:23:37  326441   92
23:25:01  326502   61
23:26:01  326503    1
23:27:01  326503    0
23:28:01  326503    0
23:29:01  326504    1
23:30:04  326598   94
23:31:07  326756  158
23:32:07  326917  161
23:33:20  327087  170
23:34:46  327259  172
23:35:50  327378  119
23:36:50  327506  128
The gaps became longer as syncing progressed (and slowed):
TIME      BLOCK   NEW BLOCKS
--------  ------  ----------
09:46:49  397357   59
09:48:00  397397   40
09:49:00  397423   26
09:50:01  397442   19
09:51:19  397498   56
09:52:28  397541   43
09:57:52  397616   75
09:58:54  397660   44
09:59:54  397663    3
10:00:54  397663    0
10:01:54  397663    0
10:02:54  397663    0
10:03:54  397663    0
10:04:54  397663    0
10:06:50  397720   57
10:08:27  397779   59
10:09:27  397779    0
10:10:27  397779    0
10:12:06  397817   38
10:13:08  397857   40
10:14:30  397888   31
10:16:49  397952   64
10:17:57  397982   30
10:19:16  398042   60
10:20:43  398098   56
10:22:03  398146   48
10:23:38  398193   47
...
10:42:44  398692   23
10:44:13  398729   37
10:45:19  398756   27
10:46:24  398782   26
10:47:33  398805   23
10:48:45  398834   29
10:49:47  398860   26
10:51:38  398910   50
10:52:38  398914    4
10:53:38  398914    0
10:54:38  398914    0
10:55:38  398914    0
10:56:47  398924   10
10:57:52  398956   32
10:59:17  398991   35
11:00:18  399028   37
11:01:38  399073   45
11:02:45  399110   37
11:03:45  399117    7
11:04:46  399117    0
11:05:46  399117    0
11:06:46  399117    0
11:07:46  399117    0
11:08:47  399135   18
11:09:47  399171   36
11:11:02  399214   43
11:13:28  399320  106
11:15:31  399394   74
11:16:54  399440   46
11:18:43  399503   63
11:20:41  399575   72
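For anyone wanting to collect similar data, a minimal polling loop along these lines would do the job (just a sketch, assuming bitcoin-cli is on the PATH and can reach the syncing node's RPC interface):

Code:
#!/usr/bin/env python
# Poll the local bitcoind once a minute and print the current block height
# together with the number of new blocks connected since the last sample.
import subprocess, time

prev = None
while True:
    height = int(subprocess.check_output(['bitcoin-cli', 'getblockcount']).strip())
    now = time.strftime('%H:%M:%S')
    delta = 0 if prev is None else height - prev
    print('%s  %7d  %5d' % (now, height, delta))
    prev = height
    time.sleep(60)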
Since these "dead spots" can be attributed neither to hardware nor the network, they must be the result of something sub-optimal in the software. At quick glance, it appears that a 20-30% speedup could be achieved by fixing this problem, so there's work for the core devs to do here.
Conclusion
Even if we discount possible inaccuracies in progress estimates, it's clear from these results that sync time is not a linear function of block chain size but an exponential one. This has serious implications for Bitcoin's scalability: even with no block size increase, sync times will continue to grow exponentially as the years pass. If consumer-grade hardware is unable to keep pace with this growth (which can't be ruled out, now that Moore's Law has all but officially been declared dead), it may become impossible for ordinary users to run full nodes at some point in the not-so-distant future.
|
|
|
|
watashi-kokoto
|
|
February 24, 2016, 12:03:31 PM |
|
it's clear from these results that sync time is not a linear function of block chain size but an exponential one.
You are wrong. The initial checking is fast because signatures up until 2011 are not checked at all, if I remember correctly.
|
|
|
|
mmgen-py (OP)
|
|
February 24, 2016, 12:10:33 PM Last edit: February 24, 2016, 12:53:34 PM by mmgen-py |
|
it's clear from these results that sync time is not a linear function of block chain size but an exponential one.
You are wrong. The initial checking is fast because signatures up until 2011 are not checked at all, if I remember correctly.
The progress estimating function takes this into account, separating "expensive" transactions (after the last checkpoint, signature verification required) from "cheap" ones (before the last checkpoint, no signature verification), with the former given 5x weight over the latter. Perhaps this factor needs to be adjusted.
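To make the weighting concrete, here is a simplified sketch of the idea. This is not the actual GuessVerificationProgress code (which also extrapolates the remaining transaction count from checkpoint data); the 5x constant corresponds to SIGCHECK_VERIFICATION_FACTOR in the source, and the numbers in the example are purely illustrative:

Code:
# Simplified illustration of transaction-weighted progress estimation,
# loosely modelled on Bitcoin Core's GuessVerificationProgress().
# Transactions after the last checkpoint need signature verification and
# are therefore counted with a higher weight (5x in Bitcoin Core 0.12).

SIGCHECK_VERIFICATION_FACTOR = 5.0

def guess_progress(txs_done_cheap, txs_done_expensive,
                   txs_left_cheap, txs_left_expensive):
    """All arguments are transaction counts; 'cheap' means before the
    last checkpoint (no sig checks), 'expensive' means after it."""
    work_done = txs_done_cheap + SIGCHECK_VERIFICATION_FACTOR * txs_done_expensive
    work_left = txs_left_cheap + SIGCHECK_VERIFICATION_FACTOR * txs_left_expensive
    return work_done / (work_done + work_left)

# Example: 40M cheap txs and 10M expensive txs already verified, 60M
# expensive txs still to go. Raw tx progress would be ~45%, but the
# weighted estimate is only ~23%, so early progress looks deceptively fast.
print(guess_progress(40e6, 10e6, 0, 60e6))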
|
|
|
|
belcher
|
|
February 24, 2016, 01:00:03 PM |
|
Did you use the -dbcache parameter? It can speed up sync by storing parts of the UTXO set in RAM instead of on the hard disk. You have 4GB, so there is space to keep at least some of the UTXO set in memory.
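For example, something like this (the exact number is up to you; dbcache is given in MiB, and it works either in bitcoin.conf or on the command line):

Code:
# bitcoin.conf -- give the UTXO/database cache ~2GB instead of the small default
dbcache=2000

# or equivalently on the command line:
# bitcoind -dbcache=2000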
|
|
|
|
|
capcher
Full Member
Offline
Activity: 224
Merit: 100
|
|
February 24, 2016, 01:21:27 PM |
|
Did you do the same benchmark for 0.11.2? Wondering if some of the behaviors could be normal for your specs.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 01:22:34 PM |
|
The first 200K blocks are mostly empty and represent about 10% of the total size, maybe less. The big increase in size happens around block 250K, and then I think again around block 350K.
So it is normal for it to zoom through the first 50% to 75% and then slow down.
With a parallel sync, you can sustain network bandwidth speeds but that requires a lot of custom data files and multipass processing of the data. Not sure it would be practical using a DB based system.
The pauses are most likely due to waiting for data for a specific block. Depending on how many blocks are requested from a peer, it could be a long pipeline, and since the order in which blocks are returned is at the discretion of the other peer (if you send a request with a long list), you might be getting a lot of blocks out of order. What I ended up doing is requesting one specific block at a time, so there is no long queue from a specific peer. Still, I see bandwidth drop to 50% saturation at times, as peers disconnect or are just slow sometimes, and until there are enough pending tasks to keep things saturated, it will be a bit inconsistent.
It is very tricky to keep things streaming.
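Roughly, the one-block-per-peer scheduling looks like this (just an illustrative sketch, not iguana's actual code; request_block and poll_responses stand in for the real networking):

Code:
# Hypothetical sketch of the "one block per peer at a time" idea described
# above. Each peer gets exactly one outstanding block request; stalled
# requests go back into the queue so a slow peer can't hold up a long
# pipeline of blocks.
import time
from collections import deque

REQUEST_TIMEOUT = 30  # seconds before a request is considered stalled

def sync_blocks(peers, heights_needed, request_block, poll_responses):
    pending = deque(heights_needed)     # block heights we still need
    in_flight = {}                      # peer -> (height, time requested)

    while pending or in_flight:
        # give every idle peer exactly one block to fetch
        for peer in peers:
            if peer not in in_flight and pending:
                h = pending.popleft()
                request_block(peer, h)
                in_flight[peer] = (h, time.time())

        # collect whatever has arrived, re-queue anything that timed out
        for peer, block in poll_responses():
            in_flight.pop(peer, None)
            yield block
        for peer, (h, t0) in list(in_flight.items()):
            if time.time() - t0 > REQUEST_TIMEOUT:
                del in_flight[peer]
                pending.appendleft(h)   # retry the stalled block elsewhere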
The pruning of 0.12 is a great improvement.
James
P.S. Certainly, if things are stuck on an HDD-based DB write, with all that entails, that could stop things dead in their tracks if it triggers some data movement, which DBs are prone to do.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 01:24:08 PM |
|
Did you do the same benchmark for 0.11.2? Wondering if some of the behaviors could be normal for your specs.
I saw similar times with 0.11; I don't think there was anything that would change performance a lot, unless you consider not using up all of RAM and VM and thrashing all over the place without pruning. The pruning is fantastic. I can finally run bitcoind on my laptop again.
James
|
|
|
|
RealBitcoin
|
|
February 24, 2016, 01:26:32 PM |
|
Free bump from me. This topic sounds very important; I hope the core devs see this and consider it. We can make Bitcoin great!
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 01:32:46 PM |
|
Free bump from me. This topic sounds very important; I hope the core devs see this and consider it. We can make Bitcoin great!
iguana already does the parallel sync, and does it while fully saturating the bandwidth, as long as you have enough CPU to keep up. On a beefy server with a 1Gbps connection (I know, not typical) it downloads the entire 60GB in about 12 minutes, but takes about 30 minutes to an hour to process it. That is with 8 cores. For a more typical home connection of 20Mbps, it takes 6 hours, and even with a 1.4GHz CPU it seems to mostly keep up with things, but I haven't verified that the latest iteration does, and CPU speed will probably be an issue if using just one core. Using 4 cores, the 24hrs of CPU time available is plenty.
James
P.S. iguana is an open source project in the "almost beta" stage.
|
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4284
Merit: 8808
|
|
February 24, 2016, 01:38:30 PM |
|
Those pauses aren't pauses. You're going by log entries, which only note when verification has completed. Verification can only proceed in order (block _chain_, after all). When a peer is slow sending a block, you'll continue to get blocks from other peers in parallel but the rest of the pipeline will have to wait until the slower block finishes. Once it does it will catch up, and you'll get a burst of messages. If a peer truly and persistently stalls the process it will be kicked off. So long as your network is staying active, it's doing what it can. (There are times during IBD where network activity does fall down, which can be improved... but it sounds like your IO is slow enough that this isn't a factor for you.)
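Put another way, the catch-up behaviour looks roughly like this (an illustrative sketch, not the actual Bitcoin Core code; connect_block stands in for the real block connection and logging):

Code:
# Blocks may arrive out of order from multiple peers, but connecting and
# verifying them (and hence the log entries) can only proceed in chain
# order, so one slow block delays the whole pipeline and then releases a
# burst of activity when it finally arrives.
received = {}        # height -> block, parked until its turn comes
next_height = 0      # next block we are allowed to connect

def on_block_received(height, block, connect_block):
    global next_height
    received[height] = block
    # connect as many consecutive blocks as are now available
    while next_height in received:
        connect_block(received.pop(next_height))   # log entry appears here
        next_height += 1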
35 hours is a horribly long time, though I'm sure much better than prior versions.
The primary cause is that the disk in your system is very slow (a 5400 RPM laptop drive is tens of times slower than a modern SSD). If you were to run with a larger dbcache setting you would likely see vastly improved performance-- the defaults are sized to keep the software runnable on systems with fairly low memory. (It can't really be set automatically because, unless you had much more RAM than you do, Bitcoin would end up hogging most of your memory when you wanted to use your computer for other things-- the fact that you have more memory doesn't mean it's okay for Bitcoin to use it.)
By comparison, I normally measure under 4 hours on a fast desktop with a home internet connection (with dbcache cranked up).
|
|
|
|
mmgen-py (OP)
|
|
February 24, 2016, 01:42:53 PM |
|
The first 200K blocks are mostly empty and represent about 10% of the total size, maybe less. The big increase in size happens around block 250K, and then I think again around block 350K.
So it is normal for it to zoom through the first 50% to 75% and then slow down.
Bitcoind's progress estimation is transaction-based, not block-based.
The pauses are most likely due to waiting for data for a specific block. Depending on how many blocks are requested from a peer, it could be a long pipeline, and since the order in which blocks are returned is at the discretion of the other peer
Thanks, that probably explains the pauses. And thanks for the other points in your informative post.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 01:52:54 PM Last edit: February 24, 2016, 02:06:42 PM by jl777 |
|
The first 200K blocks are mostly empty and represent about 10% of the total size, maybe less. The big increase in size happens around block 250K, and then I think again around block 350K.
So it is normal for it to zoom through the first 50% to 75% and then slow down.
Bitcoind's progress estimation is transaction-based, not block-based.
The pauses are most likely due to waiting for data for a specific block. Depending on how many blocks are requested from a peer, it could be a long pipeline, and since the order in which blocks are returned is at the discretion of the other peer
Thanks, that probably explains the pauses. And thanks for the other points in your informative post.
Most of the early blocks have very few transactions. Just not much crypto going on in 2010. And the estimate for the total number of transactions is probably updated based on the blocks it has seen so far, so it won't know they are blocks full of transactions until it gets there.
I take the above back: I am at block 350K, which is 85%+ of the chain height, but the progress bar is nowhere close to that, so it must be estimating the number of tx somehow.
|
|
|
|
mmgen-py (OP)
|
|
February 24, 2016, 02:04:18 PM |
|
Those pauses aren't pauses. You're going by log entries, which only note when verification has completed.
Of course that explains it. Otherwise there would be out-of-order entries in the log, which there never are. It doesn't explain the lack of CPU activity during the pauses, though. Sigs could be checked in the order received, couldn't they?
If you were to run with a larger dbcache setting you would likely see vastly improved performance
I may try that -- and post the results for comparison purposes.
It can't really be set automatically because, unless you had much more RAM than you do, Bitcoin would end up hogging most of your memory when you wanted to use your computer for other things
A dynamic dbcache setting might be OK if the limit were set conservatively. A default setting should never be seriously sub-optimal, in my opinion.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 02:09:49 PM |
|
Those pauses aren't pauses. You're going by log entries, which only note when verification has completed.
Of course that explains it. Otherwise there would be out-of-order entries in the log, which there never are. It doesn't explain the lack of CPU activity during the pauses, though. Sigs could be checked in the order received, couldn't they?
If you were to run with a larger dbcache setting you would likely see vastly improved performance
I may try that -- and post the results for comparison purposes.
It can't really be set automatically because, unless you had much more RAM than you do, Bitcoin would end up hogging most of your memory when you wanted to use your computer for other things
A dynamic dbcache setting might be OK if the limit were set conservatively. A default setting should never be seriously sub-optimal, in my opinion.
While you can verify sigs a lot of the time for recent blocks, there are many times where you just don't know which tx the vin's txid refers to, since you haven't seen it yet during a parallel sync. So you could verify a significant subset as it comes in during the first pass, but until the blockchain is fully there you can't verify all the sigs. As the highest validated block advances, whatever sigs are not verified yet could be verified. But with the checkpointing, there isn't any sig verification needed until the last checkpoint anyway.
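The deferral looks something like this (an illustrative sketch, not iguana's or Core's actual code; verify_input stands in for the real script/signature check, and the tx dictionaries are a simplified stand-in for real transaction objects):

Code:
# Hypothetical sketch of deferred signature verification during an
# out-of-order (parallel) sync: a vin can only be checked once the
# transaction its txid refers to has actually been downloaded.

seen_txs = {}      # txid -> transaction we have already received
deferred = {}      # missing txid -> list of (spending_tx, vin_index)

def process_tx(tx, verify_input):
    # verify whatever inputs we can now, defer the rest
    for i, vin in enumerate(tx['vin']):
        prev = seen_txs.get(vin['txid'])
        if prev is not None:
            verify_input(tx, i, prev)
        else:
            deferred.setdefault(vin['txid'], []).append((tx, i))

    # this tx may unblock inputs that were parked earlier
    seen_txs[tx['txid']] = tx
    for spender, i in deferred.pop(tx['txid'], []):
        verify_input(spender, i, tx)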
|
|
|
|
LiteCoinGuy
Legendary
Offline
Activity: 1148
Merit: 1014
In Satoshi I Trust
|
|
February 24, 2016, 03:55:54 PM |
|
35.5 hours to download and verify the blockchain now? Very good results.
|
|
|
|
mmgen-py (OP)
|
|
February 24, 2016, 04:09:57 PM |
|
While you can verify sigs a lot of the time for recent blocks, there are many times where you just don't know which tx the vin's txid refers to, since you haven't seen it yet during a parallel sync.
Which places rather tight constraints on parallelism.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 04:29:00 PM |
|
While you can verify sigs a lot of the time for recent blocks, there are many times where you just don't know which tx the vin's txid refers to, since you haven't seen it yet during a parallel sync.
Which places rather tight constraints on parallelism.
I have the blockchain validating serially all throughout the parallel sync, so all the sig validations can take 30 minutes using N cores without affecting the final completion time. Also, I think I only need to validate from the latest checkpoint, so that significantly reduces the sheer number of sigs that need to be verified. As long as users can see balances even if validation isn't complete yet, you just need to prevent any spends until all the inputs for the tx are validated. My guess is that for most of the people, most of the time, it can be done so the time for sig validation is not noticeable. Also, for user tx before the chain is fully validated, I could select the oldest inputs and specifically validate the things needed to make sure those are valid.
James
|
|
|
|
mmgen-py (OP)
|
|
February 24, 2016, 04:39:00 PM Last edit: February 24, 2016, 06:04:07 PM by mmgen-py |
|
Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.
Actually, I was running rc5 as it turns out, which had the checkpoint way back at block #295000. That's been updated in the final release, which of course will make things go much faster.
EDIT: Not so, as it turns out. Checkpoints are no longer being updated (since v0.10), so the last one remains at #295000.
|
|
|
|
jl777
Legendary
Offline
Activity: 1176
Merit: 1134
|
|
February 24, 2016, 04:43:56 PM |
|
Also I think I only need to validate from the latest checkpoint, so that reduces significantly the sheer number of sigs that need to be verified.
Actually, I was running rc5 as it turns out, which had the checkpoint way back at block #295000. That's been updated in the final release, which of course will make things go much faster.
The majority of tx are from after 295000. It doesn't take too long to process that far, but it starts really slowing down at around 350,000. I think that is when p2sh started being used a lot more and tx just got bigger and more complicated.
|
|
|
|
|