Bitcoin Forum
Topic: Bitcoin profiling results
Gercio (OP)
August 22, 2016, 07:32:16 PM
Last edit: August 22, 2016, 07:49:21 PM by Gercio

Hi everybody :)

Case:
Performance investigation - CPU usage.

Result:
1. Around 35% of processor time is used on SHA-256 calculations.
2. Around 17% is used adding new blocks to the blockchain.

Conclusion:
Actually, SHA-256 calculation is rather slow, so improving it could improve the application's performance considerably. As it looks now, it could be made 10 times faster with readily available implementations, which could speed up the whole Bitcoin system a lot.

Doubts:
My first thought was: Yupieee! ;D Let's do it! Just use another, faster SHA implementation! But then I started to have doubts: there is probably a reason, which I don't know yet, why the current implementation is kept instead of using something like https://github.com/weidai11/cryptopp. ???

Questions:
Can we switch the SHA-256 implementation to something faster? If not, what could the reason be: security, code readability, or is using assembler prohibited?


Technical details:

Profiling result for bitcoind (starting with an empty blockchain, running for about 30 minutes):
https://s10.postimg.org/ukg7su3u1/profiling_bitcoin.png
The slowest methods are sha256::Transform and crc32c::Extend (the latter is another good candidate for optimization).
The Valgrind (callgrind) output file can be downloaded here: http://www.filedropper.com/bitcoindloadingblockchaincallgrindout
Bitcoin Core version: 0.12, taken from https://github.com/bitcoin/bitcoin/tree/0.12
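For context on why sha256::Transform tops the profile: SHA-256 pads every message (one 0x80 byte, zero bytes, then an 8-byte length) to a multiple of 64 bytes and runs its compression function once per 64-byte block. The sketch below counts those calls, assuming Transform processes one 64-byte block per call; the helper names are hypothetical. It shows, for example, that a double SHA-256 of an 80-byte block header costs three compression calls.

```python
def compression_calls(n_bytes: int) -> int:
    """Number of 64-byte blocks SHA-256 processes for an n-byte message.

    Padding appends one 0x80 byte and an 8-byte length, then zero-fills
    to a multiple of 64 bytes, so the padded size is ceil((n + 9) / 64) * 64.
    """
    if n_bytes < 0:
        raise ValueError("message length must be non-negative")
    return (n_bytes + 9 + 63) // 64


def double_sha256_calls(n_bytes: int) -> int:
    """Compression calls for SHA256(SHA256(msg)); the inner digest is 32 bytes."""
    return compression_calls(n_bytes) + compression_calls(32)


if __name__ == "__main__":
    for size in (32, 55, 56, 64, 80, 1_000_000):
        print(size, compression_calls(size))
    print("80-byte header, double SHA-256:", double_sha256_calls(80))
```

Since the number of blocks is fixed by the input sizes, a faster library can only lower the cost of each Transform call, not the number of calls.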

Operating system: Ubuntu 16.04
Processor: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz


Hugs,
Gercio
gmaxwell
August 23, 2016, 12:53:16 AM
Last edit: August 23, 2016, 01:25:06 AM by gmaxwell

Why isn't that done yet? Because until very recently it was buried in the profiles; optimization elsewhere has exposed it.

Any such change has to be done with great care because of consensus consistency, of course, and optimized hash functions for non-parallel use are not ~THAT~ much faster:

SHA256_avx,255,0.0039705,0.0040791,0.00397483
SHA256_basic,175,0.00599706,0.00610304,0.00599851
SHA256_rorx,319,0.00334549,0.00345182,0.00334802
SHA256_rorx_x8ms,319,0.00328481,0.00339317,0.00328667
SHA256_sse4,255,0.00395852,0.00404716,0.00396052

"basic" is the plain code we have today; the fastest in that test (rorx_x8ms) is only 1.825x faster.
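The speedup figure can be recomputed from the rows above. This sketch assumes the columns are name, run count, min, max and average time, with lower being better; the post does not label them, but dividing the last column of "basic" by that of "rorx_x8ms" reproduces the quoted 1.825x. The `speedups` helper is hypothetical.

```python
import csv
import io

# Benchmark rows as posted; columns assumed to be name,count,min,max,average.
RESULTS = """\
SHA256_avx,255,0.0039705,0.0040791,0.00397483
SHA256_basic,175,0.00599706,0.00610304,0.00599851
SHA256_rorx,319,0.00334549,0.00345182,0.00334802
SHA256_rorx_x8ms,319,0.00328481,0.00339317,0.00328667
SHA256_sse4,255,0.00395852,0.00404716,0.00396052
"""


def speedups(text: str, baseline: str = "SHA256_basic") -> dict:
    """Return baseline_avg / variant_avg for every row (higher = faster)."""
    avg = {row[0]: float(row[4]) for row in csv.reader(io.StringIO(text)) if row}
    base = avg[baseline]
    return {name: base / t for name, t in avg.items()}


if __name__ == "__main__":
    for name, s in sorted(speedups(RESULTS).items(), key=lambda kv: -kv[1]):
        print(f"{name:18s} {s:.3f}x")
```

Using the min column instead gives about 1.826x for rorx_x8ms, so the choice of column barely matters here.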

https://github.com/laanwj/bitcoin/tree/2016_05_sha256_accel

I expect this to go into 0.14 sometime relatively soon.

Use of a 4-way implementation would speed it up further, but making good use of 4-way SHA-2 is technically somewhat difficult: it's not just a drop-in change, and there are only a few places where it can really be used at all.

Wladimir has done similar testing for CRC32C: https://github.com/laanwj/bitcoin/commit/431c1b987b34589f32f4c2d0ee0f2571ba70e349
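For reference, CRC32C is the CRC-32 variant built on the Castagnoli polynomial (0x82F63B78 in bit-reversed form); the crc32c::Extend hot spot is presumably the copy bundled with LevelDB. Below is a minimal bit-at-a-time sketch with the same "extend a running CRC" semantics. It is deliberately the slow baseline that table-driven or hardware-assisted versions (for example ones using the SSE4.2 crc32 instruction) improve on, and any accelerated variant must return identical values. The function name is hypothetical.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c_extend(crc: int, data: bytes) -> int:
    """Continue a CRC32C over `data`, starting from a previous result `crc`.

    Starting from 0 gives the CRC of `data` alone; feeding the result of one
    call into the next gives the CRC of the concatenated input.
    """
    crc ^= 0xFFFFFFFF  # undo the final inversion of the previous result
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the low bit was set, fold in the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32c_extend(0, b"123456789")))
```

The inner loop costs eight shift/xor steps per byte, which is why this kind of code shows up so prominently in a profile.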

In all cases, these higher-performance versions require special instruction sets that aren't available on all systems, so additional code is needed for runtime autodetection. Not a big deal, but it's part of the reason it wasn't magically changed the moment it became the highest point in the profile.
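A minimal sketch of that runtime selection, written in Python for brevity. Candidates are tried fastest first, and one is used only if the CPU reports the features it needs and it reproduces known SHA-256 digests, which is a cheap guard for the consensus-consistency concern. The feature labels, the candidate list and `select_sha256` are hypothetical. Real code would query CPUID, and the stand-in candidates here just wrap hashlib.

```python
import hashlib
from typing import Callable, FrozenSet, Iterable, NamedTuple, Set

Hasher = Callable[[bytes], bytes]


class Candidate(NamedTuple):
    name: str
    required: FrozenSet[str]  # CPU features the variant needs (hypothetical labels)
    fn: Hasher


def sha256_basic(data: bytes) -> bytes:
    """Portable fallback; hashlib stands in for the plain C++ code."""
    return hashlib.sha256(data).digest()


# Known-answer tests: standard SHA-256 digests of b"abc" and b"".
KNOWN_ANSWERS = [
    (b"abc", bytes.fromhex(
        "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad")),
    (b"", bytes.fromhex(
        "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")),
]


def passes_self_test(fn: Hasher) -> bool:
    """An accelerated variant must agree with the reference bit for bit."""
    return all(fn(msg) == digest for msg, digest in KNOWN_ANSWERS)


def select_sha256(candidates: Iterable[Candidate], cpu_features: Set[str]) -> Candidate:
    """Return the first usable candidate (list assumed ordered fastest-first)."""
    for cand in candidates:
        if cand.required <= cpu_features and passes_self_test(cand.fn):
            return cand
    return Candidate("basic", frozenset(), sha256_basic)


if __name__ == "__main__":
    # Hypothetical candidates; both wrap hashlib, so only the selection logic is shown.
    candidates = [
        Candidate("rorx_x8ms", frozenset({"avx2", "bmi2"}), sha256_basic),
        Candidate("sse4", frozenset({"sse4.1"}), sha256_basic),
    ]
    print(select_sha256(candidates, {"sse4.1"}).name)  # no AVX2 on this CPU
```

Falling back to the portable implementation rather than failing keeps every machine able to validate; only the speed differs.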

Gercio (OP)
August 23, 2016, 07:55:20 AM

Gregory, thank you for explaining this in detail :)
Good to see there is already so much work done on this topic. I will keep an eye on it.
 
AlexGR
September 07, 2016, 05:16:59 PM

Quote from: gmaxwell on August 23, 2016, 12:53:16 AM
I expect this to go into 0.14 sometime relatively soon.

That's great news.

If you are going to do speed improvements for 0.14, can you also profile or check the I/O, especially writes? I think something there is doing needless work, and it also stalls the CPU, which has to wait for the I/O to finish.

I keep the blockchain on a mechanical disk, and although my connection is just 600kb/sec, syncing the last 3-4 days (-blocksonly) generates a near-constant 10-13mb/s of reads and 11-14mb/s of writes. Obviously my question was "wtf is it writing?" If I use full indexing support in the filesystem, writes have to be performed twice, and speed slows to a crawl.
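To put those figures in proportion, here is the ratio of disk writes to download rate. This assumes "600kb/sec" and "mb/s" are both byte rates (kilobytes and megabytes, 1 MB = 1000 KB); if the download figure is in kilobits, the ratio is eight times higher. The helper name is hypothetical.

```python
def write_amplification(write_mb_s: float, download_kb_s: float) -> float:
    """Bytes written to disk per byte downloaded (both treated as byte rates)."""
    return (write_mb_s * 1000) / download_kb_s


if __name__ == "__main__":
    low, high = write_amplification(11, 600), write_amplification(14, 600)
    print(f"~{low:.0f}x to ~{high:.0f}x more data written than downloaded")
```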

I initially thought it's a flushing issue or something... I was like "OK, if I'm 4 days behind, that's at most ~570mb... can't this be loaded entirely into RAM and then flushed at the end in some kind of batch, to avoid all these writes which kill I/O speed and slow down my sync? It would take just 4-5 seconds at the end..."
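The "~570mb" estimate checks out as a rough ceiling, assuming the 10-minute average block interval (144 blocks per day) and the 1 MB maximum block size in force in 2016. The constant and function names are hypothetical.

```python
BLOCKS_PER_DAY = 24 * 60 // 10  # 10-minute target spacing -> 144 blocks
MAX_BLOCK_MB = 1.0              # 1,000,000-byte block size limit (2016)


def max_catchup_mb(days: float) -> float:
    """Rough ceiling on block data (MB) produced in `days` of chain time."""
    return days * BLOCKS_PER_DAY * MAX_BLOCK_MB


if __name__ == "__main__":
    print(max_catchup_mb(4))  # 4 days behind
```

It is only approximate: the actual number of blocks per day varies around 144, and the figure ignores the undo data written alongside each block.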

I searched the source files for "flush" and found some parameters in main.h... So I changed the flush times in main.h on lines 97 and 99 (multiplying the defaults by 100), but it didn't have any obvious effect.

Then I realized that of course it wouldn't have any obvious effect, because it's doing something else. With a 600kb/sec download, the 11+ mb/s of writes are not the blockchain being downloaded. It's some other process rearranging files, writing undo instructions, writing logfiles, etc.

So I left it for a few hours and launched it a few minutes ago to sync the last 10 hours. This time I noticed 5-11mb/sec of writes even during the phase where bitcoin-qt launches and does the 0%-100% verification, before any new blocks had been downloaded.

Running lsof -p 7464 (the bitcoin-qt PID) shows that the following files are open for writing (w) or for read/write with a whole-file write lock (uW):

bitcoin-q 7464    15uW     REG              254,1        .bitcoin/.lock
bitcoin-q 7464    16w      REG              254,1  .bitcoin/debug.log
bitcoin-q 7464    17w      REG              254,1       .bitcoin/db.log
bitcoin-q 7464    18w      REG              254,1      .bitcoin/blocks/index/LOG
bitcoin-q 7464    19uW     REG              254,1        .bitcoin/blocks/index/LOCK
bitcoin-q 7464    20w      REG              254,1     .bitcoin/blocks/index/003446.log
bitcoin-q 7464    21w      REG              254,1     .bitcoin/blocks/index/MANIFEST-003444
bitcoin-q 7464    22w      REG              254,1   .bitcoin/chainstate/LOG
bitcoin-q 7464    23uW     REG              254,1        .bitcoin/chainstate/LOCK
bitcoin-q 7464    24w      REG              254,1       .bitcoin/chainstate/749940.log
bitcoin-q 7464    25w      REG              254,1   .bitcoin/chainstate/MANIFEST-749938

...so the culprit is somewhere in there doing way too many writes for apparently no reason (?), or in a suboptimal way (?).

I also tried dbcache settings of 0 and 1000; neither made this ~10mb/s write phenomenon disappear (whether at bitcoin-qt launch or later when syncing).
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4284
Merit: 8808



View Profile WWW
September 07, 2016, 06:29:47 PM

You have txindex enabled-- that severely harms performance, there are just so many transactions in the history that the performance isn't great.

Without txindex, and the dbcache turned up enough-- it will manage to complete a full sync without doing any writes other than the block/undo files.

The reason you see writes at start is leveldb will recompact its logs at start, so that is expected.

No doubt there is potential for improvement here, but the level of write amplification you see isn't unexpected. Bitcoin transactions are effectively a highly compressed format. There is a lot of work involved in synchronization.
AlexGR
September 07, 2016, 09:27:20 PM
Last edit: September 08, 2016, 04:07:14 AM by AlexGR

Aha. I'm under the impression that one needs txindex if they are going to swap wallets (which I do) and avoid reindexing (?).

But still, is the writing even necessary at launch, while it does the 0-100%? Nothing has been downloaded since the wallet was last switched off, so it should be dominated by read-only operations.

If the writing can't be avoided, but it involves the same few files changing internally over and over, perhaps these changes could be done in RAM and then written to disk once (?).
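That suggestion amounts to a write-back cache: keep the latest value for each key in RAM and write each dirty key to disk once per flush. The sketch below is hypothetical and uses a dict as the stand-in "disk". It only illustrates why repeated updates to the same keys need not cost repeated writes, which is roughly what a large enough dbcache achieves during sync; it is not Bitcoin Core's actual code.

```python
from typing import Dict, Optional


class WriteBackCache:
    """Hypothetical write-back cache: updates stay in RAM until flush()."""

    def __init__(self, backend: Dict[str, bytes]):
        self.backend = backend                       # stand-in for on-disk storage
        self.dirty: Dict[str, Optional[bytes]] = {}  # None marks a deletion
        self.disk_writes = 0

    def put(self, key: str, value: bytes) -> None:
        self.dirty[key] = value                      # overwrite in RAM, no I/O

    def delete(self, key: str) -> None:
        self.dirty[key] = None

    def get(self, key: str) -> Optional[bytes]:
        if key in self.dirty:
            return self.dirty[key]
        return self.backend.get(key)

    def flush(self) -> None:
        """Write each dirty key once, however many times it was updated."""
        for key, value in self.dirty.items():
            if value is None:
                self.backend.pop(key, None)
            else:
                self.backend[key] = value
            self.disk_writes += 1
        self.dirty.clear()


if __name__ == "__main__":
    disk: Dict[str, bytes] = {}
    cache = WriteBackCache(disk)
    for i in range(1000):                            # 1000 updates spread over 10 keys
        cache.put(f"key{i % 10}", str(i).encode())
    cache.flush()
    print(cache.disk_writes, len(disk))
```

The trade-off in this sketch is that anything not yet flushed is lost on a crash, so that work has to be redone at the next start.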

Just throwing ideas on the table. If, as you say, there is potential for improvement, that's, in a sense, good news. The bad news starts when there are no ideas on how to get further improvements :)