Bitcoin Forum

Bitcoin => Development & Technical Discussion => Topic started by: ABCbits on April 22, 2021, 12:34:13 PM



Title: Are there any benchmark about Bitcoin full node client resource usage?
Post by: ABCbits on April 22, 2021, 12:34:13 PM
Are there any benchmarks of Bitcoin full node client resource usage/time complexity at different block sizes? For reference, I'm looking for a benchmark similar to the Bitfury report (Block Size Increase (https://bitfury.com/content/downloads/block-size-1.1.1.pdf)). Unfortunately, Bitfury's paper doesn't mention the hardware, software and parameters used, so it's not a reliable source.

https://i.ibb.co/WGT62x7/s1-cleaned.png (https://ibb.co/pbmxL0N)
https://i.ibb.co/KVGb8cp/s2-cleaned.png (https://ibb.co/j35bj9P)

What I'm NOT looking for is a benchmark like this: 2020 Bitcoin Node Performance Tests (https://blog.lopp.net/2020-bitcoin-node-performance-tests/).


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: Shymaa-Arafat on May 03, 2021, 07:38:37 PM
I know the locality plots here
https://github.com/mit-dci/utreexo/issues/257
are not exactly what you are looking for, but on the Utreexo project site you may find the open-source code and data to do your own plots/data analysis (they do perform the IBD, Initial Block Download, which is the full data of the previous 200 blocks).

Also, this site gives data about block sizes, number of txs, etc.:
https://blockstream.info/testnet/block/00000000000000167572bba29bdcec2cbb0e6926b61d33233b927d44fb75cc33

Hope this helps...


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: NotATether on May 04, 2021, 06:30:57 AM
How would somebody simulate the block size for blocks received by Bitcoin Core to perform these tests, though? Blocks are limited to 1 MB (legacy) or 4 million weight units, so it involves running in regtest and submitting your own blocks en masse, and then forking Core and changing the code to allow bigger size limits. I don't believe anyone has done this yet. And by the way, I think it makes more sense to benchmark resource usage in WUs instead of megabytes, since new versions of the client don't have a concept of a megabyte-denominated block size.


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: Quickseller on May 04, 2021, 06:43:09 AM
Quote
How would somebody simulate the block size for blocks received by Bitcoin Core to perform these tests, though?
You can run Bitcoin Core in testnet mode, change the code to accept different block sizes, and use a second device to mine blocks at the various specifications.


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: Shymaa-Arafat on May 04, 2021, 12:37:21 PM
Quote
I appreciate the effort, but those are far from what I'm looking for.

You could use the (available) IBD of previous blocks that they use.


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: NotATether on May 04, 2021, 05:14:42 PM
Quote
Roughly, this is what I had in mind:
1. Prepare a separate network (with a specific block weight).
2. Prepare a few fast computers solely to broadcast transactions & mine blocks.
3. Prepare a few cheap/low-powered/old computers (e.g. Raspberry Pi and old Intel NUC) used to perform the benchmark.
4. On the fast computers, mine blocks and broadcast transactions (at various network loads such as 25%, 50%, 100% and 125% of the block size limit).
5. Measure the time to verify blocks/transactions and the CPU, RAM and I/O usage on the other computers.

You do not need separate computers for the benchmark; for more reliable results you can spin up virtual machines and selectively limit their RAM and CPU usage. You still need separate HDD and SSD disks, but these can be passed through to the VM via PCI (I assume internet speed is constant, but even that can be throttled using different virtual adapter types).


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: NotATether on May 05, 2021, 01:16:30 PM
Quote
Good idea, that could work. But the downside is that it's not the best reflection of the block size limit's impact on low-end devices, since we would use fast CPU cores/RAM rather than slow CPU cores/RAM.

VirtualBox lets you throttle CPU usage by percentage. Transaction verification is single threaded, so you could spawn a 1-core VM on some high-end CPU like a Threadripper and give the VM different usage limits. This will allow you to simulate different processor speeds (but keep in mind that Turbo Boost will cause the actual processor speed to vary greatly, so capping the speed at 99% or below in all cases is a good idea to remove this noise).
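
For instance, a rough sketch of cycling through caps with VBoxManage from the host (the VM name and cap values are just placeholders, not something I've tested):

Code:
# Sketch: cycle a 1-vCPU VirtualBox VM through different CPU execution caps.
# "btc-bench-vm" and the cap values are placeholders.
import subprocess

VM = "btc-bench-vm"

def vbox(*args):
    subprocess.run(["VBoxManage", *args], check=True)

for cap in (25, 50, 75, 99):        # 99% instead of 100% to sidestep Turbo Boost noise
    vbox("modifyvm", VM, "--cpuexecutioncap", str(cap))   # only works while the VM is powered off
    vbox("startvm", VM, "--type", "headless")
    # ... run the bitcoind benchmark inside the guest and wait for it to finish ...
    vbox("controlvm", VM, "acpipowerbutton")              # then power off before the next cap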


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: gmaxwell on May 06, 2021, 03:07:48 AM
Quote
Transaction verification is single threaded
Hasn't been since 2012 (https://github.com/bitcoin/bitcoin/pull/2060).


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: NotATether on May 06, 2021, 09:10:06 AM
Quote
Transaction verification is single threaded
Hasn't been since 2012 (https://github.com/bitcoin/bitcoin/pull/2060).

I totally forgot about that feature; I always thought it was a useless one. But how well does this feature (Execution Cap) actually work? I couldn't find any solid article/review about it.

Alright, this is going to open another can of worms, because I'm not sure how the execution cap handles multiple cores. But on the plus side, it looks like all your benchmark has to do is run Bitcoin Core with -reindex, measure the time it takes to finish from debug.log, and use tools like top to keep track of resource usage. Automatic profiling with sysstat, where the metrics are stored in separate log files, is better IMO, though.
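
As a rough sketch of that measurement (the datadir path and the one-second sampling interval are placeholders, and it assumes the psutil Python package is available):

Code:
# Sketch: launch bitcoind with -reindex and -debug=bench, then log its CPU and
# memory usage once per second. Paths and intervals are placeholders.
import csv, subprocess, time
import psutil

proc = subprocess.Popen([
    "bitcoind",
    "-datadir=/mnt/bench",     # placeholder datadir
    "-reindex",
    "-debug=bench",
])
ps = psutil.Process(proc.pid)

with open("bitcoind_usage.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["unix_time", "cpu_percent", "rss_mib"])
    # Sample until bitcoind exits (e.g. after `bitcoin-cli stop` once debug.log
    # shows the reindex has caught up).
    while proc.poll() is None:
        cpu = ps.cpu_percent(interval=1.0)           # may exceed 100% on multi-core
        rss = ps.memory_info().rss / (1024 * 1024)
        writer.writerow([int(time.time()), cpu, rss])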

According to https://www.virtualbox.org/manual/ch03.html#settings-processor, the cap applies to each virtual core, so you could have, for example, 2 cores both running at 50%. I think that to accurately measure Bitcoin Core performance, we must also measure how much the multithreaded verification speeds things up, so it would make sense to do things like emulating a Core 2 or i3 with as many vCPUs as it has threads, and then use the execution cap to try to emulate its baseline processor speed. This, in my opinion, will give more useful results than just testing at 10%, 20%, 30%, etc. caps.


Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: gmaxwell on May 09, 2021, 03:14:57 PM
Quote
Alright, this is going to open another can of worms, because I'm not sure how the execution cap handles multiple cores. But on the plus side, it looks like all your benchmark has to do is run Bitcoin Core with -reindex, measure the time it takes to finish from debug.log, and use tools like top to keep track of resource usage. Automatic profiling with sysstat, where the metrics are stored in separate log files, is better IMO, though.

I don't think those CPU percentage limits are going to be useful for much-- I doubt they result in a repeatable measurement.

A better way to do a benchmark to get a tx/s figure is to use invalidateblock to roll back the chain 1000 blocks or so, then restart the process to flush the signature caches, reconsiderblock, and collect data from that.
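
Roughly, in script form (a sketch only-- the restart in the middle still has to be done by hand, and paths/flags are whatever your setup uses):

Code:
# Sketch of the rollback-and-replay measurement via bitcoin-cli.
# Assumes a node with default datadir; adjust flags as needed.
import subprocess, time

def cli(*args):
    return subprocess.run(["bitcoin-cli", *args],
                          capture_output=True, text=True, check=True).stdout.strip()

tip = int(cli("getblockcount"))
rollback_hash = cli("getblockhash", str(tip - 1000))

cli("invalidateblock", rollback_hash)   # roll the chain back ~1000 blocks
cli("stop")                             # shut down to flush the signature cache
# ... restart bitcoind here (with -debug=bench) and wait for RPC to come back ...

start = time.time()
# reconsiderblock reconnects the invalidated blocks; it can take a long time,
# so disable the RPC client timeout.
cli("-rpcclienttimeout=0", "reconsiderblock", rollback_hash)
while int(cli("getblockcount")) < tip:
    time.sleep(5)
print(f"reconnected ~1000 blocks in {time.time() - start:.1f}s")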

If you enable the bench debugging option you'll get data like this:

Quote
2021-05-01T19:10:02.540246Z received block 0000000000000000000922bf7fce4f900d7696f0c1c7221f97d3f367fdd9c44d peer=0
2021-05-01T19:10:02.553389Z   - Load block from disk: 0.00ms [0.00s]
2021-05-01T19:10:02.553414Z     - Sanity checks: 0.00ms [0.00s (0.00ms/blk)]
2021-05-01T19:10:02.553449Z     - Fork checks: 0.04ms [0.00s (0.04ms/blk)]
2021-05-01T19:10:03.255699Z       - Connect 2532 transactions: 702.21ms (0.277ms/tx, 0.116ms/txin) [0.70s (702.21ms/blk)]
2021-05-01T19:10:03.255837Z     - Verify 6043 txins: 702.38ms (0.116ms/txin) [0.70s (702.38ms/blk)]
2021-05-01T19:10:03.265095Z     - Index writing: 9.26ms [0.01s (9.26ms/blk)]
2021-05-01T19:10:03.265110Z     - Callbacks: 0.02ms [0.00s (0.02ms/blk)]
2021-05-01T19:10:03.265490Z   - Connect total: 712.10ms [0.71s (712.10ms/blk)]
2021-05-01T19:10:03.270861Z   - Flush: 5.37ms [0.01s (5.37ms/blk)]
2021-05-01T19:10:03.270885Z   - Writing chainstate: 0.03ms [0.00s (0.03ms/blk)]
2021-05-01T19:10:03.278491Z UpdateTip: new best=0000000000000000000922bf7fce4f900d7696f0c1c7221f97d3f367fdd9c44d height=681059 version=0x20800000 log2_work=92.840892 tx=637825747 date='2021-04-29T05:56:20Z' progress=0.998789 cache=2.3MiB(17699txo)
2021-05-01T19:10:03.278523Z   - Connect postprocess: 7.64ms [0.01s (7.64ms/blk)]
2021-05-01T19:10:03.278540Z - Connect block: 725.14ms [0.73s (725.14ms/blk)]

Unfortunately any kind of reindex or cold cache benchmark only tells you about the performance while catching up.

During normal operation there is typically no validation of transactions at all when a block is accepted, or only a couple-- they've already been validated when they were previously relayed on the network.

This is obvious when you look at the performance of blocks after a node has been running for a while:

Quote
2021-05-09T14:11:43.013649Z received: cmpctblock (10734 bytes) peer=14002
2021-05-09T14:11:43.017889Z Initialized PartiallyDownloadedBlock for block 0000000000000000000ccd134daad627f62fbb52258fbc400220cbcd7cd38639 using a cmpctblock of size 10734
2021-05-09T14:11:43.018046Z received: blocktxn (33 bytes) peer=14002
2021-05-09T14:11:43.023885Z Successfully reconstructed block 0000000000000000000ccd134daad627f62fbb52258fbc400220cbcd7cd38639 with 1 txn prefilled, 1715 txn from mempool (incl at least 0 from extra pool) and 0 txn requested
2021-05-09T14:11:43.028245Z PeerManager::NewPoWValidBlock sending header-and-ids 0000000000000000000ccd134daad627f62fbb52258fbc400220cbcd7cd38639 to peer=4
2021-05-09T14:11:43.029259Z sending cmpctblock (10734 bytes) peer=4
[...]
2021-05-09T14:11:43.032630Z sending cmpctblock (10734 bytes) peer=31588
2021-05-09T14:11:43.044382Z   - Load block from disk: 0.00ms [7.36s]
2021-05-09T14:11:43.044427Z     - Sanity checks: 0.01ms [1.48s (0.80ms/blk)]
2021-05-09T14:11:43.044492Z     - Fork checks: 0.07ms [0.10s (0.05ms/blk)]
2021-05-09T14:11:43.068471Z       - Connect 1716 transactions: 23.96ms (0.014ms/tx, 0.004ms/txin) [157.68s (84.87ms/blk)]
2021-05-09T14:11:43.068508Z     - Verify 6370 txins: 24.01ms (0.004ms/txin) [159.77s (85.99ms/blk)]
2021-05-09T14:11:43.081081Z     - Index writing: 12.57ms [18.65s (10.04ms/blk)]
2021-05-09T14:11:43.081107Z     - Callbacks: 0.03ms [0.05s (0.02ms/blk)]
2021-05-09T14:11:43.081346Z   - Connect total: 36.97ms [177.38s (95.47ms/blk)]
2021-05-09T14:11:43.092634Z   - Flush: 11.29ms [15.98s (8.60ms/blk)]
2021-05-09T14:11:43.092672Z   - Writing chainstate: 0.04ms [0.09s (0.05ms/blk)]
2021-05-09T14:11:43.117336Z UpdateTip: new best=0000000000000000000ccd134daad627f62fbb52258fbc400220cbcd7cd38639 height=682762 version=0x20000004 log2_work=92.865917 tx=640740048 date='2021-05-09T14:11:34Z' progress=1.000000 cache=196.6MiB(1235174txo)
2021-05-09T14:11:43.117376Z   - Connect postprocess: 24.71ms [42.18s (22.70ms/blk)]
2021-05-09T14:11:43.117393Z - Connect block: 73.01ms [242.98s (130.77ms/blk)]

So in those numbers you can see that it spent 24.01ms verifying 6370 txins, compared to the earlier cold-cache example that spent 702.38ms verifying fewer (6043) txins.

Depending on what you're considering, that faster on-tip performance may not matter, because a miner could fill their block with new, never-before-seen txns, even ones constructed to be expensive to verify-- it's not the worst case. The worst case can only really be characterized by making special test blocks that intentionally trigger the most expensive costs.



Title: Re: Are there any benchmark about Bitcoin full node client resource usage?
Post by: NotATether on May 09, 2021, 07:16:48 PM
Quote
A better way to do a benchmark to get a tx/s figure is to use invalidateblock to roll back the chain 1000 blocks or so, then restart the process to flush the signature caches, reconsiderblock, and collect data from that.

To measure TPS on real-world hardware we can add together each of the "Connect block" times and take an average, but I'm sure the speed of at least some of the steps of block processing is independent of the number of txns per block, so to estimate a result for a different block size, only some of the times in each "Connect block" would need to be scaled by {max #txns in a 2-4-8-etc. vMB block / max #txns in a 1 vMB block}.

For example, I'm pretty sure callbacks and writing the chainstate are O(1) with respect to the number of transactions, but flush and verify could be O(n) [even though, as you said, incoming blocks aren't really verified much, whatever makes up the verify time may still at least iterate through all the transactions].

Again, these are just my guesses; I don't know the actual runtime of these steps.
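
A rough sketch of that averaging (the debug.log path is a placeholder, and it assumes -debug=bench output like the samples above):

Code:
# Sketch: average the per-block bench figures out of debug.log and turn the
# "Connect N transactions" lines into a crude tx/s estimate.
import re

tx_re  = re.compile(r"- Connect (\d+) transactions: ([\d.]+)ms")
blk_re = re.compile(r"- Connect block: ([\d.]+)ms")

txs, tx_ms, blk_ms = 0, 0.0, []
with open("/home/user/.bitcoin/debug.log") as log:   # placeholder path
    for line in log:
        if m := tx_re.search(line):
            txs += int(m.group(1))
            tx_ms += float(m.group(2))
        elif m := blk_re.search(line):
            blk_ms.append(float(m.group(1)))

if blk_ms and tx_ms:
    print(f"{len(blk_ms)} blocks, avg Connect block: {sum(blk_ms)/len(blk_ms):.1f} ms")
    print(f"connect-limited throughput: ~{txs / (tx_ms / 1000.0):.0f} tx/s")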

At any rate, I got wildly varying block connect times from a run of bitcoind on testnet catching up on blocks, so the analysis needs nearly full blocks to work, like mainnet blocks.