Bitcoin Forum
Author Topic: Looking into forking the core wallet to use parallel computing to verify blocks  (Read 3533 times)
skeeterskeeter (OP)
Full Member
Activity: 160
Merit: 100
October 08, 2014, 03:30:49 PM
Last edit: October 08, 2014, 03:58:11 PM by skeeterskeeter
 #1

I am taking a parallel computing course at my university and we are supposed to do a semester-long project on anything related to the course material. I have decided that I want to do something with Bitcoin or cryptos.

My idea is to fork the current Bitcoin Core wallet. The fork would add a feature to verify the blockchain on a local GPU, offloading the block-verification work from the CPU. This is similar to what miners do, except that they are searching for hashes, while I want to verify existing blocks.

This idea came about after watching my CPU sit at 100% for a few hours after doing a fresh install of the Core wallet on a new desktop and waiting for the blockchain to verify.

My questions:

Is this a feasible idea? From everything I know it is, but I haven't dug into the code at all yet.

Would this be useful? I assume it would be, as long as the chain is downloaded fast enough to keep the GPU supplied with work. For testing, I think I will just copy a blockchain data file to a fresh Core wallet install and see whether my software or the original verifies it faster.

Any general ideas or concerns?

EDIT: I had another idea. I think it would be a lot easier to just write a standalone chain verifier... Give it the file path of the chain you want verified, set some GPU parameters, and let it rip through it.

andytoshi
Full Member
Activity: 179
Merit: 151
October 08, 2014, 03:49:15 PM
 #2

Hi Skeeter,


I've played around with parallelizing verification (though not as part of Bitcoin Core), and once past 100k blocks I can pin all eight cores of my CPU constantly. So there is definitely a benefit to parallel verification with a fairly large number of cores. There are a few complications, though. Here are my thoughts:

Transactions refer to earlier transactions' outputs, so it is difficult to verify out-of-order unless you can do fast lookups (so memory bandwidth is gonna wreck your performance). Further, within a block, transactions are allowed to refer to earlier transactions' outputs but not vice-versa. So to be consensus-correct you need to verify them in order. You can solve this by doing two passes: one to make sure that all transaction references are valid (this can be done in about 30 minutes on my CPU with my crappy Rust code), and the other to actually do the signature verifications. However, this is actually what Bitcoin Core is already doing for the majority of the chain (full signature verification could take a couple of days), so you wouldn't be gaining anything vs the Bitcoin Core behaviour.

One thing you could do (and what I do in my own code) is to break consensus-compatibility and verify all transactions in each block in parallel. For blocks with hundreds or thousands of transactions there is a big benefit here. The way to do it is:
1. Scan the block and blindly add all transaction outputs to your output set in a linear pass. This makes sure that backreferences will work, but will also allow forward references. (It occurred to me while typing this that it's easy to label the outputs with an index into the block, so you can still easily detect forward refs. Oops :) I will fix my code to do this.)
2. Split the block's transactions into chunks of N_TRANSACTIONS/N_CORES, and pass each chunk to a new thread for verification. Each thread will need to be able to look up old outputs, so you will need to make sure your output set is immutable during this phase.
3. Scan the block and delete all spent outputs from your output set in a linear pass.

I expect that by using a parallel data structure for your output set you can merge some or all of these passes. There is room for experimentation and measurement here.
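
If it helps, here is a rough C++ sketch of those three phases. Everything in it is a made-up stand-in (the Tx/TxIn/TxOut types, the verify_tx stub), not Bitcoin Core's actual structures; treat it as a sketch of the scheme above, not a drop-in implementation.

Code:
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical minimal types for illustration -- not Bitcoin Core's.
struct TxOut { int64_t value; std::string script; };
struct TxIn  { std::string prev_txid; uint32_t prev_vout; };
struct Tx    { std::string txid; std::vector<TxIn> in; std::vector<TxOut> out; };

using OutputSet = std::unordered_map<std::string, TxOut>; // key = "txid:vout"

static std::string key(const std::string& txid, uint32_t vout) {
    return txid + ":" + std::to_string(vout);
}

// Stand-in for real script/signature verification against the output set.
static bool verify_tx(const Tx&, const OutputSet&) { return true; }

bool verify_block(const std::vector<Tx>& block, OutputSet& utxo, unsigned n_cores) {
    // Phase 1: linear pass -- blindly add every output so backreferences
    // within the block resolve. (Tagging each entry with its index into
    // the block would let forward references be rejected too.)
    for (const Tx& tx : block)
        for (uint32_t i = 0; i < tx.out.size(); ++i)
            utxo.emplace(key(tx.txid, i), tx.out[i]);

    // Phase 2: split the transactions into n_cores chunks, one thread per
    // chunk. The output set is only read here, so it must stay immutable
    // for the duration of this phase.
    std::atomic<bool> ok{true};
    std::vector<std::thread> workers;
    const size_t chunk = (block.size() + n_cores - 1) / n_cores;
    for (unsigned c = 0; c < n_cores; ++c)
        workers.emplace_back([&, c] {
            const size_t lo = c * chunk;
            const size_t hi = std::min(block.size(), lo + chunk);
            for (size_t t = lo; t < hi; ++t)
                if (!verify_tx(block[t], utxo)) ok = false;
        });
    for (auto& w : workers) w.join();

    // Phase 3: linear pass -- delete every output this block spends.
    for (const Tx& tx : block)
        for (const TxIn& in : tx.in)
            utxo.erase(key(in.prev_txid, in.prev_vout));

    return ok;
}

The std::atomic<bool> is the only shared mutable state in phase 2; the output set is read-only there, which is what makes the fan-out safe without locks.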

Andrew
skeeterskeeter (OP)
Full Member
Activity: 160
Merit: 100
October 08, 2014, 04:04:08 PM
 #3

Andrew,

I think I now have a much better question: what are the steps involved in verifying the blockchain?

1. Verify each transaction (highly parallelizable?)
2. Verify each block hash... ~300k hashes, could be done serially
3. ...?

So a big issue you have encountered is that transactions many blocks apart can reference each other, which breaks down the parallelizability (that is an awesome word)? Because to verify a single block you might have to look far up or down the chain to find out whether a specific address really does have x BTC to send to another address in the current block?
andytoshi
Full Member
Activity: 179
Merit: 151
October 08, 2014, 05:07:03 PM
 #4

It's not so much that they are far away from each other (though they are), it's the fact that they can be that hurts parallelizability because you need access to a lot of data. The way to handle this is to maintain an "output set" of all unspent outputs; you add every output for each transaction and delete every output that is referenced in each transaction. Maintaining this data structure (which requires random insert/delete access in roughly equal and enormous proportions, which is a nasty data structure problem ... not sure you can do better than a hashmap) is why you have to go through the song and dance I described.
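
Concretely, the simplest version of that output set is a hashmap keyed on (txid, output index). A rough C++ sketch, with made-up field choices:

Code:
#include <array>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Key for the output set: which transaction, which of its outputs.
struct Outpoint {
    std::array<uint8_t, 32> txid;
    uint32_t vout;
    bool operator==(const Outpoint& o) const {
        return vout == o.vout && txid == o.txid;
    }
};

// txids are themselves uniformly distributed hashes, so the first eight
// bytes already make a decent hash value.
struct OutpointHash {
    size_t operator()(const Outpoint& o) const {
        size_t h;
        std::memcpy(&h, o.txid.data(), sizeof h);
        return h ^ o.vout;
    }
};

struct Utxo { int64_t value; /* a real set also needs the scriptPubKey */ };

// Every transaction inserts one entry per output and erases one entry per
// input -- random inserts and deletes in roughly equal, enormous numbers.
using OutputSet = std::unordered_map<Outpoint, Utxo, OutpointHash>;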

To see what's involved with verifying the blockchain, you should check out my high level overview and also check out the Bitcoin Wiki pages on transactions and Script. It's a bit involved.
skeeterskeeter (OP)
Full Member
Activity: 160
Merit: 100
October 08, 2014, 05:55:00 PM
 #5

Quote
It's not so much that they are far away from each other (though they are), it's the fact that they can be that hurts parallelizability because you need access to a lot of data. The way to handle this is to maintain an "output set" of all unspent outputs; you add every output for each transaction and delete every output that is referenced in each transaction. Maintaining this data structure (which requires random insert/delete access in roughly equal and enormous proportions, which is a nasty data structure problem ... not sure you can do better than a hashmap) is why you have to go through the song and dance I described.

To see what's involved with verifying the blockchain, you should check out my high level overview and also check out the Bitcoin Wiki pages on transactions and Script. It's a bit involved.


Ahh I see. So I might launch a kernel to verify a block. I would have to pass the block data, along with a pointer to an array of unspent coins in device memory, so that it can do this referencing up and down the chain. The issue, though, is that the R/W bandwidth to device memory would be massive. You would have to do a lookup on it once per unique address in the blockchain, because at some point every coin is unspent? (Is that correct logic?)

To get around this you propose to verify all blocks in parallel, and write any unspent coins for each block into memory. Then do a second pass and verify for each unspent coin that it has some link in the blockchain? Then remove that unspent coin from the list, and if the list is empty after the algorithm runs then the chain is verified?

// I read a lot of the wiki in the past, I need to get back to it. Thanks for the reminder!

r3c4ll
Member
Activity: 100
Merit: 10
October 08, 2014, 11:36:47 PM
 #6

Very interesting conversation! I'll follow it.

marcus_of_augustus
Legendary
Activity: 3920
Merit: 2348
Eadem mutata resurgo
October 09, 2014, 12:58:11 AM
 #7

Interested in this also. Maybe begin by looking at how parallelisable sipa's libsecp256k1 is? ...which is basically the crypto verification engine.

https://github.com/bitcoin/secp256k1

cr1776
Legendary
Activity: 4032
Merit: 1301
October 09, 2014, 01:29:22 AM
 #8

As an academic exercise this is an interesting discussion, and from a coding perspective too. I wrote some code for a hypercube (and some other highly parallel machines) in grad school and love the area.
andytoshi
Full Member
Activity: 179
Merit: 151
October 09, 2014, 02:10:16 AM
 #9

Quote
Interested in this also. Maybe begin by looking at how parallelisable sipa's libsecp256k1 is? ...which is basically the crypto verification engine.

https://github.com/bitcoin/secp256k1

You should be using libsecp256k1 if you are doing verification. It is trivially parallelizable in the sense that you can verify many signatures at once... however, it does not do batch verification (it is not obvious that this is even possible for ECDSA, since there is a sign ambiguity in the k value). There are some improvements coming, but even today it is something like 6x faster than OpenSSL.
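
To illustrate what "trivially parallelizable" means here: every verification is independent of every other, so you can just fan them out across threads. A minimal C++ sketch; verify_one is a made-up stand-in for the actual libsecp256k1 call (I won't reproduce its signature, since that API is still moving):

Code:
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// One unit of work: a 32-byte message hash, a signature, and a pubkey.
struct SigJob { /* msg32, sig, pubkey bytes */ };

// Hypothetical stub standing in for libsecp256k1's ECDSA verify call.
static bool verify_one(const SigJob&) { return true; }

bool verify_all(const std::vector<SigJob>& jobs, unsigned n_threads) {
    std::atomic<bool> all_ok{true};
    std::vector<std::thread> ts;
    for (unsigned c = 0; c < n_threads; ++c)
        ts.emplace_back([&, c] {
            // Strided partition: thread c handles jobs c, c+n, c+2n, ...
            // No shared mutable state, hence no locks. This is the
            // "trivial" parallelism; batch verification would instead
            // merge many checks into one algebraic check.
            for (size_t i = c; i < jobs.size(); i += n_threads)
                if (!verify_one(jobs[i])) all_ok = false;
        });
    for (auto& t : ts) t.join();
    return all_ok;
}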
gmaxwell
Moderator
Legendary
Activity: 4172
Merit: 8417
October 09, 2014, 05:45:35 AM
 #10

Bitcoin Core's verification is already parallel. But there isn't as much parallelism to extract as you might guess, not without extensive architectural changes (e.g. pipelining multiple blocks and accepting partially verified ones). You should not, however, have been seeing it use only one CPU core... in fact, Windows users have frequently complained that it must be broken because it was using all of their cores for minutes at a time. :)
skeeterskeeter (OP)
Full Member
Activity: 160
Merit: 100
October 09, 2014, 07:36:16 AM
 #11

Quote
Bitcoin Core's verification is already parallel. But there isn't as much parallelism to extract as you might guess, not without extensive architectural changes (e.g. pipelining multiple blocks and accepting partially verified ones). You should not, however, have been seeing it use only one CPU core... in fact, Windows users have frequently complained that it must be broken because it was using all of their cores for minutes at a time. :)


Could you rephrase that slightly? I am not sure which of these you mean:

There is no speedup over what is already implemented? or,
Bitcoin Core already sneaks some code onto the GPU, and this is already fully implemented? or,
The GPU would be of no advantage; the most parallel it can get can be run on a 2+ core CPU just as fast (high step complexity overall)?
gmaxwell
Moderator
Legendary
Activity: 4172
Merit: 8417
October 09, 2014, 03:56:55 PM
 #12

Quote
Could you rephrase that slightly? I am not sure which of these you mean:

There is no speedup over what is already implemented? or,
Bitcoin Core already sneaks some code onto the GPU, and this is already fully implemented? or,
The GPU would be of no advantage; the most parallel it can get can be run on a 2+ core CPU just as fast (high step complexity overall)?
It is already parallel; it does not use the GPU. Using the GPU effectively will be difficult because there is not that much parallelism available unless you are verifying multiple blocks at once, and communicating with the GPU has huge latencies. Competing with state-of-the-art CPU code will take state-of-the-art GPU code (my own vanitygen code on the CPU is roughly as fast as the published GPU vanitygen).

I think you should try it; a port of libsecp256k1 to OpenCL would be pretty interesting and useful. If it were faster, we'd probably be willing to make the architectural changes (making it possible to verify multiple blocks at once) to enable using it.
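
If someone wants a starting point, the host side of such a port might look roughly like the skeleton below. The kernel body is an empty placeholder (the real work is porting the field and group arithmetic into it), the buffer sizes just assume 32-byte hashes, 64-byte compact signatures and 33-byte compressed pubkeys, and error handling and cleanup are omitted:

Code:
#include <CL/cl.h>
#include <cstdio>
#include <vector>

// Placeholder kernel: a real port would carry secp256k1's field and group
// arithmetic here and check sigs[i] against msgs[i] and pubs[i].
static const char* kSrc =
    "__kernel void verify(__global const uchar* msgs,\n"
    "                     __global const uchar* sigs,\n"
    "                     __global const uchar* pubs,\n"
    "                     __global int* results) {\n"
    "    size_t i = get_global_id(0);\n"
    "    results[i] = 1; /* stand-in for the actual verification */\n"
    "}\n";

int main() {
    const size_t n = 262144;  // big batches amortize the transfer latency

    cl_platform_id plat; cl_device_id dev; cl_int err;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, &err);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel kern = clCreateKernel(prog, "verify", &err);

    // Assumed sizes: 32-byte hash, 64-byte compact sig, 33-byte pubkey.
    std::vector<unsigned char> msgs(n * 32), sigs(n * 64), pubs(n * 33);
    std::vector<cl_int> results(n);
    cl_mem dm = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               msgs.size(), msgs.data(), &err);
    cl_mem ds = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sigs.size(), sigs.data(), &err);
    cl_mem dp = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               pubs.size(), pubs.data(), &err);
    cl_mem dr = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(cl_int),
                               nullptr, &err);
    clSetKernelArg(kern, 0, sizeof(dm), &dm);
    clSetKernelArg(kern, 1, sizeof(ds), &ds);
    clSetKernelArg(kern, 2, sizeof(dp), &dp);
    clSetKernelArg(kern, 3, sizeof(dr), &dr);

    // One upload, one launch, one download per batch: the point is to
    // cross the PCIe bus as few times as possible.
    clEnqueueNDRangeKernel(q, kern, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dr, CL_TRUE, 0, n * sizeof(cl_int),
                        results.data(), 0, nullptr, nullptr);

    std::printf("first result: %d\n", results[0]);
    return 0;
}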
hhanh00
Sr. Member
Activity: 467
Merit: 266
October 12, 2014, 04:38:21 AM
 #13

Do you have recommendations?

My current thinking is that the best bang for the buck is parallelizing the two group multiplications (by G and Q).
They use the field doubling and multiplication routines that were optimized in asm, and those must be put on the device.
The rest, which uses GMP, isn't very much: a modular inverse and a few mult/adds. It barely registers on my execution profile.
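
(For reference, those two multiplications come from textbook ECDSA verification. With message hash z, signature (r, s), public key Q, base point G and group order n:

Code:
\begin{aligned}
u_1 &= z \, s^{-1} \bmod n, \qquad u_2 = r \, s^{-1} \bmod n, \\
R   &= u_1 G + u_2 Q, \qquad \text{valid iff } R_x \equiv r \pmod{n}.
\end{aligned}

The modular inverse s^{-1} is the GMP part; the two scalar multiplications u_1*G and u_2*Q are where nearly all the time goes.)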

Thanks

hhanh00
Sr. Member
Activity: 467
Merit: 266
October 13, 2014, 06:43:45 AM
 #14

I made an OpenCL version of sipa's secp256k1 library. It's on GitHub if someone is interested. It executes faster, but only once you throw hundreds of thousands of signatures at it.
On a sample test run
# of sigs = 262144

Times in ms per phase:

prepare -> 2518
compute -> 15201
check   -> 35
total   -> 17755

Phases 1 & 3 are on the CPU; phase 2 is on the GPU. That works out to 0.067733 ms per verification.
I have a GTX 560 Ti.

marcus_of_augustus
Legendary
Activity: 3920
Merit: 2348
Eadem mutata resurgo
October 14, 2014, 01:04:56 AM
 #15

Quote
I made an openCL version of sipa's secp256k1 library. It's on github if someone is interested.

Link?

AMD would probably be better than Nvidia if there are a lot of integer steps.

gmaxwell
Moderator
Legendary
Activity: 4172
Merit: 8417
October 14, 2014, 01:14:50 AM
 #16

Quote
Phases 1 & 3 are on the CPU; phase 2 is on the GPU. That works out to 0.067733 ms per verification.
I have a GTX 560 Ti.
So that's 14763 per second? That's not favourable compared to running on the CPU, but there may be many optimizations still available to you.
hhanh00
Sr. Member
Activity: 467
Merit: 266
October 14, 2014, 08:54:55 AM
 #17

It's an old video card and there are a few improvements that I can see. It's a baseline.

Giulio Prisco
Full Member
Activity: 173
Merit: 101
October 15, 2014, 09:57:20 AM
 #18

But isn't massively parallel computing precisely what the Bitcoin network as a whole does? Why parallelize locally?
hhanh00
Sr. Member
Activity: 467
Merit: 266
October 15, 2014, 10:22:28 AM
 #19

It's decentralized, not parallel. Every full node has to verify the whole blockchain.

dexX7
Legendary
Activity: 1106
Merit: 1024
October 18, 2014, 04:15:51 PM
 #20

Quote
Link?

I believe this is it:

https://github.com/hhanh00/secp256k1-cl
