Bitcoin Forum
Topic: Block chain size/storage and slow downloads for new users
Mike Hearn (OP)
July 09, 2013, 03:20:37 PM
 #1

There have been quite a lot of threads lately with people complaining about the size of the block chain, specifically (1) how long it takes to download for new users and (2) the amount of disk space used, often combined with complaints that the "core dev team" isn't doing anything about it.

This is just a quick note to explain where we're up to with this.

  • Starting from a few days ago, MultiBit is the default recommended desktop client on the bitcoin.org "choose your wallet" page. MultiBit is what we call an "SPV wallet", so it is capable of processing thousands of blocks per second, and its checkpoints are refreshed frequently enough that brand new users will usually be synced with the chain in 5 seconds or less. I'll explain a bit more about how this works in a moment, as many newbies have joined us in recent months who may not be familiar with the details.

  • At some point Bitcoin-Qt will change such that it's able to delete old blocks. The details are still being worked out, but most likely you'll be able to say "Use up to 10 GB of disk space" and it will never use more than that. Nodes will broadcast how much of the chain they have and are able to serve. New nodes that are starting from scratch will have to search out other nodes that still have the full chain and sync from them, but any node that just wasn't online for a while and needs to grab the latest parts of the chain will be able to use most of the others. By controlling disk space usage, you can also indirectly control bandwidth usage (you can't upload data you don't have).

The latter piece of work isn't done yet, basically because Pieter has been busy lately with other things (like: real life). He did the bulk of the work already last year, but some parts still need to be designed and written. Remember that nearly everyone taking part is still a volunteer except for Gavin.
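
To make the "use up to 10 GB of disk space" idea from the second bullet concrete, here is a minimal Python sketch. It is only an illustration, since the post says the real design is still being worked out: the class name `PrunedBlockStore`, its methods, and the policy of always dropping the oldest block first are assumptions, not Bitcoin-Qt behaviour.

```python
from collections import deque


class PrunedBlockStore:
    """Hypothetical store that keeps only the newest blocks within a byte budget."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.blocks = deque()  # (height, size_bytes) pairs, oldest first
        self.used = 0

    def add_block(self, height, size_bytes):
        self.blocks.append((height, size_bytes))
        self.used += size_bytes
        # Delete the oldest blocks until we are back under budget,
        # but always keep at least the newest block.
        while self.used > self.max_bytes and len(self.blocks) > 1:
            _, old_size = self.blocks.popleft()
            self.used -= old_size

    def servable_range(self):
        """Heights this node still holds and could advertise to peers."""
        if not self.blocks:
            return None
        return (self.blocks[0][0], self.blocks[-1][0])


store = PrunedBlockStore(max_bytes=250)
for height in range(5):
    store.add_block(height, size_bytes=100)
print(store.servable_range(), store.used)
```

A node like this can still help peers that only need the latest part of the chain, while peers starting from scratch have to find a node whose range begins at the first block.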

Intro to SPV mode

OK, now that we're recommending MultiBit as the default wallet app for new users, what does this do? MultiBit is like the Android "Bitcoin Wallet" app by Andreas Schildbach. They're both based on the bitcoinj project that I run. Essentially, these clients download sub-parts of the block chain and then do a bunch of maths to verify that it all hangs together. Because it doesn't download the whole chain, an SPV wallet is light and fast. But because it does download and verify parts, an SPV wallet doesn't really have to trust the remote server, so it can talk to the regular P2P network. This makes it more decentralised than something like Electrum or blockchain.info, which rely on special servers.

How does this work? It's described in Satoshi's original white paper in the "simplified payment verification" section (hence, SPV). But here's a brief description to save you opening up your copy of his paper. Each block in the Bitcoin protocol has two parts, the header and a list of transactions. The header contains data linking the block to a place in the chain (like the hash of the previous block). A full Bitcoin node (Bitcoin-Qt/bitcoind) examines the block headers to figure out which chain of blocks has the most mining work done on it, and then verifies all the transactions in order in those blocks. The best chain determines the order the transactions are applied to the database and thus which transaction loses if there's a double spend. But (and this is crucial), the ordering is the only thing determined by miners. All the transactions still have to make sense. Miners don't have arbitrary power: if they mined a block that just magicked money out of nowhere or included bogus transactions, full nodes would all reject it.

In SPV mode things work differently. Because SPV wallets don't download the full chain, they can't verify each transaction individually or build a copy of the database. Instead they verify the headers to find the best chain, and then assume the contents of the best chain must be correct. This is usually a valid assumption, because the majority of mining power is honest. However, if there were a 51% attack, SPV wallets might display arbitrary nonsense for as long as the attack lasts. They would get back to reality once the good chain became longer (harder) than the bogus chain again.
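
As a rough illustration of "verify the headers to find the best chain", here is a toy Python sketch. The `Header` layout, the `chain_work` and `best_chain` helpers and the `mine` function are all hypothetical and much simpler than real Bitcoin headers. In particular, a real client also checks that each header's target follows the network's difficulty rules, which this sketch simply takes on trust.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    prev_hash: bytes    # hash of the previous header (the link in the chain)
    merkle_root: bytes  # commitment to the block's transactions
    target: int         # the header hash must be <= target
    nonce: int

    def hash(self) -> bytes:
        data = (self.prev_hash + self.merkle_root
                + self.target.to_bytes(32, "big")
                + self.nonce.to_bytes(8, "big"))
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def chain_work(headers, genesis_hash):
    """Total work of a header chain, or None if a link or proof of work is bad."""
    prev, work = genesis_hash, 0
    for h in headers:
        if h.prev_hash != prev:
            return None  # does not link to the previous header
        if int.from_bytes(h.hash(), "big") > h.target:
            return None  # proof of work not satisfied
        work += 2**256 // (h.target + 1)  # expected number of hashes needed
        prev = h.hash()
    return work


def best_chain(chains, genesis_hash):
    """Pick the valid chain with the most work, not merely the most blocks."""
    scored = [(chain_work(c, genesis_hash), c) for c in chains]
    valid = [(w, c) for w, c in scored if w is not None]
    return max(valid, key=lambda wc: wc[0])[1] if valid else None


def mine(prev_hash, merkle_root, target):
    """Toy miner: try nonces until the header hash meets the target."""
    nonce = 0
    while True:
        h = Header(prev_hash, merkle_root, target, nonce)
        if int.from_bytes(h.hash(), "big") <= target:
            return h
        nonce += 1
```

In this sketch a single header mined against a much harder target can outweigh several easy ones, which is why the post describes the winning chain as "longer (harder)".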

This leads to the question of how SPV wallets find transactions that send them money, if they don't download the whole chain. The answer is they upload to the remote Bitcoin nodes a "filter", which that node applies to each transaction in the block. If the filter matches, the transaction is sent to the SPV wallet along with a mathematical proof that it was really in the chain (we call this proof a Merkle branch, after Ralph Merkle who invented them). The wallet verifies this proof and thus knows the transaction really was accepted by the majority of miners, without having to trust the server.
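
The Merkle branch check can be sketched in a few lines of Python. This is a simplified illustration, not bitcoinj's code: the function names are made up for this example, byte-order details of the real protocol are ignored, and the list of leaves is assumed to be non-empty. It does follow Bitcoin's convention of double SHA-256 and of duplicating the last hash when a level has an odd number of entries.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_branch(leaves, index):
    """Return (root, branch) for the leaf at `index`.

    The branch is a list of (sibling_hash, sibling_is_right) pairs,
    one per tree level, from the leaf up towards the root.
    """
    level = list(leaves)
    branch = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        sibling = index ^ 1          # the other half of this node's pair
        branch.append((level[sibling], sibling > index))
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], branch


def verify_branch(leaf, branch, root):
    """Recompute the path to the root using only the leaf and its siblings."""
    h = leaf
    for sibling, sibling_is_right in branch:
        h = sha256d(h + sibling) if sibling_is_right else sha256d(sibling + h)
    return h == root


tx_hashes = [sha256d(bytes([i])) for i in range(5)]
root, branch = merkle_root_and_branch(tx_hashes, 2)
print(verify_branch(tx_hashes[2], branch, root))
```

The wallet only needs the transaction, the branch, and the Merkle root from a header it has already verified, so the proof grows with the logarithm of the number of transactions in the block.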

Because we're talking to random computers on the internet and not a trusted third party, the filter is designed to let you control your privacy. It is not a list of your addresses, as it is with the blockchain.info wallet. It's actually what we call a Bloom filter (named after Burton Howard Bloom who invented them in 1970). You can't directly get the user's addresses back out of a Bloom filter; instead you have to test each one you find in the chain against it to see if it matches. Also, the filter can be made "noisy", which means it randomly matches some other addresses as well. When the Bitcoin P2P node you're downloading from finds a match, it doesn't know if it really found one of your transactions or if it was a false positive. And because there are so many P2P nodes, it's possible to split up your list of addresses and send a subset to lots of different peers, so none get an accurate idea of what's in your wallet (bitcoinj doesn't do that today though). By adjusting your false positive rate, you can decide how much bandwidth you want to spend on garbling the other nodes' picture of your wallet. If you're on a very slow or expensive link you might decide you want no noise in your filter at all; if you're on a fast wifi connection, you might be OK with downloading a megabyte or two of other people's transactions just to obscure which ones are yours.
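
For readers who have not met Bloom filters before, here is a minimal Python sketch of the data structure. It is an illustration only: the `BloomFilter` class, its `size_bits` and `num_hashes` parameters, and the use of SHA-256 to pick bit positions are assumptions made for this example, and real wallets use a different hash function and parameter encoding.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest, "big") % self.size_bits

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # True for every added item, and occasionally for items never added.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))


wallet = BloomFilter(size_bits=1024, num_hashes=3)
for addr in [b"addr-1", b"addr-2", b"addr-3"]:
    wallet.add(addr)
print(wallet.might_contain(b"addr-2"))
```

Shrinking `size_bits` while keeping the same addresses sets a larger fraction of the bits, so more unrelated items match. That is the "noise" knob described above: more false positives cost bandwidth but blur the remote node's view of the wallet.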

Using these fancy mathematical tools MultiBit and the Android wallet app give us the same nice performance that we can get from a web wallet like blockchain.info or Coinbase, but without the need for any central servers and keeping Bitcoin's P2P nature intact. SPV wallets will always be fast no matter how popular Bitcoin gets. Together with being able to delete old blocks, these are our solutions to the ever-growing size of the chain - which has been Satoshi's plan since the very first day Bitcoin was announced.

I hope it's all clearer now and everyone understands what's going on.
knight22
July 09, 2013, 03:31:06 PM
 #2

Thanks for the update

Peter Todd
July 09, 2013, 05:56:38 PM
Last edit: April 13, 2015, 06:17:18 PM by Peter Todd
 #3

Having said that, remember that if you want to keep the contents of your wallet private, you have to be careful with SPV mode. Because your node doesn't relay transactions, any transaction you do relay is obviously from you. In the future the payment protocol will help, but for now assume that by using an SPV wallet your peers can figure out what coins you own.

The other issue is that to scan for new transactions you have to tell your peers what addresses are in your wallet. Fortunately we can use a trick called bloom filters, which is kinda like telling a peer "give me all transactions paying addresses starting with 1ab, 1ar, 1a5 and 1w2", but combined with other information that may be enough to track you down. Edit: since I wrote that, Bloom filters have been found to provide almost no privacy.

You should be connecting over Tor, or at least over a proxy, if you want to maintain your privacy. Unfortunately as far as I know right now most SPV clients don't make it easy to connect over Tor, but it is possible.

Full nodes don't have anywhere near as serious a privacy problem even without Tor because they do relay transactions, so unless someone is watching every node out there they can't know if you were the one who broadcast the transaction first.

Of course, while finding peers for your SPV node to connect to is free in the short term, that won't be true forever:

I think Gavin has alluded to possibly rewarding those who run full nodes, which I think is the way to go. I don't see any reason why miners should get rewarded, yet those who run full nodes and eat the bandwidth/disk space get nothing.

When running a node becomes expensive enough that people can't do it for free you'll still be able to find full nodes willing to accept incoming connections. You'll pay for that service in a variety of ways:

1) Transaction fees: You connect directly to a miner who lets you do so because they want your transaction fees. They may require some # of transactions per unit time, and part of the agreement may be that you only send transactions to them. (easily verified) In return they'll run your bloom filter against incoming blocks, although don't be surprised if they force you to give them a bloom filter specific enough to identify exactly what addresses are in your wallet as part of the deal.

2) Pay-for-service: You pay for the service directly. In return they resend your transactions to miners to get them mined, possibly with preferential deals (kickbacks) that may or may not be public knowledge. They also run your bloom filter against the blockchain, and again, they may or may not let you do so in a non-specific manner. Given AML regulations I wouldn't be surprised if the services that operate out in the open only allow you to tell them what addresses you are interested in rather than a bloom filter obscuring that information. (AML rules apply to case #1 too)

3) Datamining: Google and other search engines already provide a lot of services purely in return for the data they can gather. The blockchain itself is a rich source of transaction data, made richer by figuring out the real identities behind the pseudonymous addresses on it. Just like #1 and #2 if you can determine who is sending what transactions and owns what addresses you can integrate that into a rich dataset to do things like get real-world information on what vendors are actually popular, which in turn can feed search engine results and other services.

It'll be interesting to see how AML regulations apply to all these services in the future. I suspect they'll eventually be subject to the same know-your-customer rules as any other financial service provider to help authorities link identities to Bitcoin addresses. This doesn't have to be very intrusive: in case #3 that might be as simple as using your Google login to authenticate with Google's Bitcoin servers.

There's also the issue of DoS attacks, which we will need to solve soon: anyone with access to around 125 or more IP addresses from a similar number of /16 subnets can currently DoS attack the entire network by simply filling up the incoming connection table of every node on the network. (Finding the IP address of every Bitcoin node that accepts incoming connections isn't hard.) We limit that table to 125 connections right now. Such an attack wouldn't use very much bandwidth, and is actually pretty easy to carry out. A small botnet with a few hundred or thousand hijacked desktop computers could definitely pull this attack off. At the very least SPV nodes would become unusable, and after a few hours to days the whole network would probably collapse, aside from miners and the like who have made manual connections to each other.

What we are going to have to do is require peers to either do something useful, like relay valid fee-paying transactions and valid blocks to us, or expend some kind of limited resource, like perform a proof-of-work or just pay directly via micropayment. That'll make widescale DoS attacks prohibitively expensive, but it also impacts SPV nodes too that don't contribute to the health of the network. Of course, obviously if such an attack happens this code will be written and deployed very quickly, so don't get any ideas...
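
One of the options mentioned, making a peer perform a proof-of-work before it is given a connection slot, could be sketched along the following lines. This is purely a hypothetical illustration in Python: the `solve` and `check` functions, the challenge format and the `difficulty_bits` parameter are invented for this example and are not part of the Bitcoin P2P protocol.

```python
import hashlib


def _digest_value(challenge: bytes, nonce: int) -> int:
    data = challenge + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Connecting peer: search for a nonce whose hash has enough leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _digest_value(challenge, nonce) >= target:
        nonce += 1
    return nonce


def check(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Accepting node: one hash is enough to verify the peer's work."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


challenge = b"hypothetical-per-connection-challenge"
nonce = solve(challenge, difficulty_bits=12)  # about 4096 hashes on average
print(check(challenge, nonce, difficulty_bits=12))
```

The asymmetry is the point: verification costs one hash, while each connection attempt costs the connecting peer many. As the post notes, that cost would also fall on honest SPV nodes.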

Cudahuda
July 09, 2013, 06:20:33 PM
 #4

This, combined with off-chain transactions like https://inputs.io, gives me a lot of hope for Bitcoin.
tvbcof
July 09, 2013, 06:30:06 PM
 #5

...
What we are going to have to do is require peers to either do something useful, like relay valid fee-paying transactions and valid blocks to us, or expend some kind of limited resource, like perform a proof-of-work or just pay directly via micropayment. That'll make widescale DoS attacks prohibitively expensive, but it also impacts SPV nodes too that don't contribute to the health of the network. Of course, obviously if such an attack happens this code will be written and deployed very quickly, so don't get any ideas...

'Something useful' could be, among other things, being verifiably situated in a domain which is underpopulated.  The domain could be geographical, political, implementational (meaning it works in a particular way such as implementing an underrepresented overlay messaging protocol) or whatever.


sig spam anywhere and self-moderated threads on the pol&soc board are for losers.
Peter Todd
July 09, 2013, 06:50:49 PM
 #6

...
What we are going to have to do is require peers to either do something useful, like relay valid fee-paying transactions and valid blocks to us, or expend some kind of limited resource, like perform a proof-of-work or just pay directly via micropayment. That'll make widescale DoS attacks prohibitively expensive, but it also impacts SPV nodes too that don't contribute to the health of the network. Of course, obviously if such an attack happens this code will be written and deployed very quickly, so don't get any ideas...

'Something useful' could be, among other things, being verifiably situated in a domain which is underpopulated.  The domain could be geographical, political, implementational (meaning it works in a particular way such as implementing an underrepresented overlay messaging protocol) or whatever.

Indeed - that's what we try to achieve with the current system of trying to connect to nodes with ip addresses in a varied set of /16's. Varying implementations is an interesting idea too, although one that's harder to actually verify.

If you can come up with ways to do more than that we'd love to know, but be warned it's a really, really difficult problem.
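
The /16 diversity idea mentioned above can be illustrated with a short Python sketch. It is not the actual peer-selection code: the helper names `slash16` and `pick_diverse` are made up, only IPv4 addresses are considered, and the addresses in the usage example are arbitrary.

```python
import ipaddress


def slash16(ip: str) -> str:
    """Return the /16 network an IPv4 address belongs to, e.g. '10.1.0.0/16'."""
    return str(ipaddress.ip_network(f"{ip}/16", strict=False))


def pick_diverse(candidates, max_peers):
    """Choose peers in order, taking at most one address per /16 group."""
    chosen, seen = [], set()
    for ip in candidates:
        if len(chosen) >= max_peers:
            break
        group = slash16(ip)
        if group not in seen:
            seen.add(group)
            chosen.append(ip)
    return chosen


peers = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.168.5.5"]
print(pick_diverse(peers, max_peers=8))
```

Spreading connections like this means an attacker needs addresses in many different network ranges rather than one large block, which is why the attack described earlier in the thread is phrased in terms of a "similar number of /16 subnets".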

TippingPoint
July 09, 2013, 07:04:46 PM
 #7

Intro to SPV mode
<snip>


Thank you.  Very good explanation.
tvbcof
July 09, 2013, 07:40:07 PM
Last edit: July 09, 2013, 08:36:22 PM by tvbcof
 #8

...
What we are going to have to do is require peers to either do something useful, like relay valid fee-paying transactions and valid blocks to us, or expend some kind of limited resource, like perform a proof-of-work or just pay directly via micropayment. That'll make widescale DoS attacks prohibitively expensive, but it also impacts SPV nodes too that don't contribute to the health of the network. Of course, obviously if such an attack happens this code will be written and deployed very quickly, so don't get any ideas...

'Something useful' could be, among other things, being verifiably situated in a domain which is underpopulated.  The domain could be geographical, political, implementational (meaning it works in a particular way such as implementing an underrepresented overlay messaging protocol) or whatever.

Indeed - that's what we try to achieve with the current system of trying to connect to nodes with ip addresses in a varied set of /16's. Varying implementations is an interesting idea too, although one that's harder to actually verify.

If you can come up with ways to do more than that we'd love to know, but be warned it's a really, really difficult problem.

I started on a path to describing some of this stuff before I got more interested in running my sawmill:

  https://sites.google.com/a/tcilgl.com/paracoin/home/depth_l1/network_mesh#TOC-n-space-characterization:

Several concepts which I've yet to attempt to describe would be

 - multi-pass proof of work with non-predictable algorithms.  So, for instance, head blocks are solved mainly by transfer nodes while legacy datasets are consolidated and locked with long duration (very high difficulty) md5 leveraging existing ASIC investments (at a frequency aligned with codebase releases.)

 - dedicated hardware nodes which contain useful things (like FPGA for adaptable algorithms, TPM for certain node identity needs, POE for a reasonable balance of power, cost, and convenience.)

 - node link rewards where 'close' peers are issued part of a reward which would foster a greater mesh density health.

I hope that the nature, capabilities, and relationships between potential adversaries of Bitcoin are becoming more widely appreciated in these post-Snowden days.  It is why I consider broadly defensive systems to be the most critical aspect defining value in a crypto-currency solution...or at least a 'reserve' one.

Whether there is the potential to adapt the existing Bitcoin as a value core in total and develop a robust supporting framework around it is debatable and I continue to feel that it is probably a long-shot.  Most likely Bitcoin will evolve toward (if not 'remain') an element of something akin to PRISM.

 edit: s/density/health/.  By 'health' I mean diversity more than anything.  The methods by which a 'healthy' node population could be engineered have some interesting parallels to how biological populations implement genetic material transfers.


niko
July 09, 2013, 08:06:04 PM
 #9

This thread only makes me think: Bitcoin is too much. The possibilities it opens, technologies it enables - it's all just too much for businesses and governments and people to grasp easily.
Having said that, it is precisely these kinds of threads that help us digest bits and pieces.

Thanks Mike, retep, and everyone else!

They're there, in their room.
Your mining rig is on fire, yet you're very calm.
solex
July 10, 2013, 11:40:45 AM
 #10

Thanks Mike for posting this info about SPV. Great work indeed.

Coincidentally, I have known Ralph Merkle for a long time and also listened to his talks about nanotechnology. But I did not realize that he also devised the Merkle tree which Satoshi used.  Just think that he is not only at the forefront of the next revolution in manufacturing, but also has an influence in the next revolution in finance. What an amazing guy!

🏰 TradeFortress 🏰
July 10, 2013, 11:42:17 AM
 #11

Thank you for the nice write up. I've donated a bitcoin to your tips address.
Mike Hearn (OP)
July 10, 2013, 12:21:23 PM
 #12

Thanks! Much appreciated!
Pentium100
July 10, 2013, 12:33:44 PM
 #13

What about saving the blockchain file from a full client and uploading it to some server once a week or so? Then new users could just download that big file and have their full clients sync up quite fast since they won't need to download the blockchain from other nodes.

1GStzEi48CnQN6DgR1s3uAzB8ucuwdcvig
Mike Hearn (OP)
July 10, 2013, 12:36:32 PM
 #14

Jeff runs a blockchain BitTorrent for that exact purpose.
flipper
July 14, 2013, 10:07:59 PM
 #15

Good explanation. Thank you.
giszmo
July 17, 2013, 05:29:47 PM
 #16

Mike, thanks and keep up the good work. You are very dedicated to letting us know what's going on and I love that. We need people to teach others about Bitcoin, and while all people registered here on the forum consider themselves experts on Bitcoin in front of their families, there's only some 0.1% who actually know the code and details and can teach the other experts about these fascinating details. (What I miss a bit is some teasers to actually get my hands dirty. I saw links to the code maybe twice. I'm pretty sure that more people would get involved in working on the code if there were more invitations like that. After all there must be tons of developers on the forum dedicating their brainz to other stuff, like posting on the forum ;) )

niko
July 17, 2013, 05:44:58 PM
 #17

This would also be a good place to remind people of Moore's "law" - the fact that, since the mid-20th century, the cost of data storage and the cost of computing power have been decreasing exponentially year after year, regardless of technological challenges, market crashes, or anything else. Surely Bitcoin may go through growth spurts, but eventually blockchain growth will become linear (when we hit the block size limit, or when we remove the limit and let the transaction frequency reach an economic equilibrium based on tx fees). Either way, I see no big problem, and I've been running a full node for several years without any additional cost.

jubalix
July 20, 2013, 02:33:34 PM
 #18

what about electrum how does this compare?


Admitted Practicing Lawyer::BTC/Crypto Specialist. B.Engineering/B.Laws

https://www.binance.com/?ref=10062065
TheButterZone
July 20, 2013, 08:40:59 PM
 #19

what about electrum how does this compare?



Electrum doesn't store the blockchain locally. It reads it off remote servers.

Saying that you don't trust someone because of their behavior is completely valid.
jubalix
July 21, 2013, 02:03:48 AM
 #20

what about electrum how does this compare?



Electrum doesn't store the blockchain locally. It reads it off remote servers.

Yes, I know, but this MultiBit proposal doesn't store the entire blockchain either, just a clever compressed version.

