Bitcoin Forum
Author Topic: UTXO set size does not increase storage costs, you just suck at computer science  (Read 3115 times)
killerstorm (OP)
March 16, 2013, 01:29:12 PM
Last edit: March 16, 2013, 01:54:24 PM by killerstorm
#1

Haha, sorry, I didn't mean to offend anyone, I'm just somewhat disappointed that people consider only the simplest solutions and do not want to see a bigger picture.

Let's start with an example... Suppose I am a miner in a kinda exotic situation: I have unlimited computational and networking resources, and enough temporary storage (RAM) to verify the blockchain, but local permanent storage is incredibly scarce. But if I want to store something, I can store it externally over the network. Say, in a DHT like Freenet, Tahoe-LAFS or something like that.

So I want to download and scan the blockchain up to block N just once; after that I need to be able to verify the validity of any transaction. Is it possible?

Of course. As I go through the blockchain I will build an index and store it in an external DHT. I only need to keep the hash of the latest block locally, that is, 32 bytes.

Having this hash, I can retrieve things from the DHT until I find the transaction outputs I need to verify in the index I've built. (I do not care whether the DHT is secure: if something was tampered with, the hashes won't match.)
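
A minimal sketch of that idea in Python (a plain dict stands in for the DHT; the helper names are made up, nothing here is an actual Bitcoin or DHT API): every value is keyed by its own SHA-256 hash, so tampered data simply fails to verify, and the only thing that has to be kept locally is one 32-byte key.

Code:
import hashlib

def put(dht: dict, value: bytes) -> bytes:
    """Store a value under its SHA-256 hash and return the 32-byte key."""
    key = hashlib.sha256(value).digest()
    dht[key] = value
    return key

def get_verified(dht: dict, key: bytes) -> bytes:
    """Fetch a value and check that it really hashes to the key we asked for."""
    value = dht[key]
    if hashlib.sha256(value).digest() != key:
        raise ValueError("DHT returned tampered data")
    return value

dht = {}
root_key = put(dht, b"serialized root node of my index")  # 32 bytes kept locally
assert get_verified(dht, root_key) == b"serialized root node of my index"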

This is kinda obvious: if secure distributed file systems can exist, then I can simply store data in such file systems instead of a local file system.

But... how much would it cost me to verify a transaction? Well, tree-like data structures generally have lookup costs on the order of log2 N, where N is the number of elements. In the worst case each individual satoshi is an individual UTXO, so we have 21000000*100000000 = 2100000000000000 ~= 2^51 UTXOs. Thus I need something like 51 lookups to find a UTXO in a binary search tree. Or just 9 lookups if I have a 64-ary tree.
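
Just to check those numbers (plain arithmetic, nothing protocol-specific):

Code:
import math

total_satoshis = 21_000_000 * 100_000_000        # 2.1e15, roughly 2**51
print(math.ceil(math.log2(total_satoshis)))      # 51 lookups in a binary tree
print(math.ceil(math.log(total_satoshis, 64)))   # 9 lookups in a 64-ary tree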

But people could argue that 9 lookups per UTXO is a lot... network latency, yada yada. How about zero?

That's possible. Suppose the person who sends a transaction knows how I store the index in the DHT; it isn't a secret. To make sure I'll include his transaction in a block, he fetches all the data I need from the DHT himself and sends me a message with his transaction and all the information I need.

I don't need to look up anything in the DHT myself; I only need the data included in the message. And this is secure: if the data was tampered with, the hashes won't match.
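
A rough sketch of what such a message buys the miner, using a toy two-leaf index and made-up encodings (not a real transaction or tree format): the sender ships the output being spent plus the sibling hashes along its path, and the miner re-hashes up to the 32-byte root it already holds.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def verify_branch(leaf: bytes, branch: list, root: bytes) -> bool:
    """branch lists (side, sibling_hash) pairs from the leaf up to the root."""
    node = h(leaf)
    for side, sibling in branch:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root

# Toy index with two outputs; the miner stores only `root`.
utxo, other = b"txid:0 -> 5 BTC to A", b"txid:1 -> 3 BTC to B"
root = h(h(utxo) + h(other))

# The sender includes the branch in his message; no DHT lookup is needed.
assert verify_branch(utxo, [("right", h(other))], root)
assert not verify_branch(b"txid:0 -> 500 BTC to A", [("right", h(other))], root)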
 
So, basically, the number of transactions I can include in a block is limited by my computational and network abilities, but storage capacity/cost is irrelephant.

But what about blocks mined by others? Oh...

Well, it is possible to do 9 DHT lookups per UTXO mentioned in a block. The number of outputs is limited and I can do the lookups in parallel, so it isn't such a big problem. But still...

You know, miners are friendly guys, how about they all use the same DHT and then include confirmation information together with the block they have just mined?

So I receive a new block plus supplementary information, which is all that is needed to confirm that the block is valid.

In the end, it is possible to do all mining with only 32 bytes of permanent secure storage. It requires somewhat more bandwidth, though. But the extra bandwidth cost is approximately proportional to block size. So maybe not a big problem...

E.g. I either need 128 GB of RAM, an array of hard drives and a 100 Mbit/sec pipe, or I need 1 GB of RAM, no hard drives at all and a 1 Gbit/sec pipe. Which is cheaper?

So what I'm talking about is a storage/bandwidth trade-off. Using less storage might increase latency, but possibly in such a way that it won't be critical.

Next time I will teach you how to implement a blockchain-based cryptocurrency in such a way that new miners can start mining right away without downloading the whole blockchain. Stay tuned...

markm
March 16, 2013, 02:05:48 PM
#2

Quote
You know, miners are friendly guys, how about they all use the same DHT and then include confirmation information together with the block they have just mined?

We have to share? Darn, I was gonna say I want me one of those 2^51 record databases!

-MarkM-

killerstorm (OP)
March 16, 2013, 02:10:03 PM
#3

Quote
We have to share? Darn, I was gonna say I want me one of those 2^51 record databases!

Well, that's just 2^19 records for each of 2^32 humans using Bitcoin... I mean, we are talking about the future, right?

Basically, if you own some coins (UTXOs), it is in your interest to keep the index records which point to them in your wallet. Then you aren't at the mercy of a DHT; you can always prove that you own your coins without any external help.

This way all storage costs can be shifted onto users... It's their money, after all.

markm
March 16, 2013, 02:15:01 PM
#4

Y'know what, don't apologise, those guys really do suck at computer science! Smiley Cheesy

-MarkM-

justusranvier
March 16, 2013, 02:18:03 PM
#5

Quote
I'm just somewhat disappointed that people consider only the simplest solutions and do not want to see a bigger picture.
What you're seeing is ex post facto reasoning.

Some people have an unstated goal that requires Bitcoin not to scale. For whatever reason they don't want to talk about their actual reason, so they invent reasons to justify their conclusion. When the invented reasons are debunked they don't change their position as they would if those were the real reasons, but instead just invent new reasons.
jl2012
March 16, 2013, 03:00:06 PM
#6

In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

markm
March 16, 2013, 03:13:05 PM
#7

Quote
In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

Distributed data is one thing, reliable distributed data is a whole additional layer.

When "everyone" ("enough" people) have the blockchain, anything that "goes missing" from the DHT can be replenished.

But in principle you could do RAID type stuff to try to make it reliable.

There are challenges before that, though, like spam and who can change which entries in the database and so on, so it might be best to do just the UTXO set first and see how that works out before risking one's precious blockchain.

I am not sure offhand why thin clients weren't already thinking DHT; maybe because one usually has the folks using it be the providers of it, whereas thin clients seem to have been associated in people's minds with centralised servers. (Maybe those people had service offerings in mind as a business they wanted to create a market for, or something?)

(Or maybe phones just aren't good at being DHTs, maybe they aren't "always on" or have limited bandwidth.)

-MarkM-

killerstorm (OP)
March 16, 2013, 03:36:05 PM
#8

Quote
In simple words: a distributed and shared UTXO dataset, right?

So we could also have a distributed and shared full blockchain?

Yes, that's where I'm heading; but in this post I've simply outlined a protocol which would allow one to make a bandwidth/storage trade-off without sacrificing too much latency.

It doesn't require a distributed UTXO/blockchain dataset; I've simply assumed the existence of such a dataset to make the introduction easier.

It's just a protocol which requires users to keep all data required to verify their transactions.

jtimon
March 16, 2013, 03:37:28 PM
#9

This reminds me of Ripple's ledger a lot.

https://ripple.com/wiki/Ledger_Format

markm
March 16, 2013, 03:55:37 PM
#10

Quote
It's just a protocol which requires users to keep all data required to verify their transactions.

When the user is offline, where then can that data be found, or is it even needed at such times at all?

-MarkM-

killerstorm (OP)
March 16, 2013, 06:26:40 PM
#11

Quote
It's just a protocol which requires users to keep all data required to verify their transactions.

When the user is offline, where then can that data be found, or is it even needed at such times at all?

It is like a certificate of authenticity; it is needed only when one sends a transaction. Basically, he sends a message: "Here's my transaction: xxx, and here are the certificates for all coins involved: yyy".

The person you're sending coins to will need these "certificates" too, so he can prove the authenticity of the coins he got.

And miners need them to decide whether to include the transaction in a block. They will then repackage and optimize the "certificates" of each coin involved into a certificate for the block, so other miners can check the whole block without asking anything from the person who sent the transaction.

There is one caveat, though: people need to update their certificates periodically, otherwise they become stale... In that case one might need the help of a third party to update them. Or a DHT with historic data.
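
To illustrate the staleness caveat with the same kind of toy two-leaf tree (again, invented encodings): an inclusion proof only verifies against the root it was built for, and the committed root moves every time a block touches the set, so wallets have to refresh the sibling hashes in their proofs from time to time.

Code:
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

my_utxo   = b"my coin"
neighbour = b"someone else's coin"

root_v1  = h(h(my_utxo) + h(neighbour))
proof_v1 = h(neighbour)                      # the sibling hash kept in my wallet

# A later block spends the neighbouring output and creates a new one.
root_v2 = h(h(my_utxo) + h(b"a freshly created coin"))

assert h(h(my_utxo) + proof_v1) == root_v1   # old proof against old root: fine
assert h(h(my_utxo) + proof_v1) != root_v2   # old proof against new root: stale
proof_v2 = h(b"a freshly created coin")      # refreshed sibling repairs the proof
assert h(h(my_utxo) + proof_v2) == root_v2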

In any case, I'm not saying that this protocol is actually worth implementing. But the mere fact that it can exist shows that there is no good reason to limit UTXO set growth.

Also, nothing I mentioned requires state-of-the-art tech... It is literally something a mildly talented undergrad CS student could implement.

killerstorm (OP)
March 16, 2013, 07:06:28 PM
#12

Quote
There are challenges before that, though, like spam and who can change which entries in the database and so on, so it might be best to do just the UTXO set first and see how that works out before risking one's precious blockchain.

Actually keeping historic blockchain data isn't even necessary in practice.

Let's assume that the UTXO set is referenced from blocks in such a way that a block which refers to a wrong UTXO set is invalid. This means that it costs a lot of money to make a block with a fake UTXO set, basically...

For all practical intents and purposes it is enough to have the N latest blocks + the UTXO set.

A new node which joins the network can grab the UTXO set from a year-old block and see if subsequent blocks agree with it.

In the hypothetical scenario where the UTXO set is fake, forged by an attacker, the attacker is so powerful that he can create a year's worth of blocks... He clearly dominates the Bitcoin network. He could just as well re-create all blocks from the start, so it doesn't matter whether we have a year's worth of full data or the full blockchain.

But if such an attack were under way, that would mean Bitcoin is worthless. Thus if it is not worthless, there is no such attack, and so it is safe to grab the UTXO set and work from there.
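
A rough sketch of that bootstrap (the header fields here are invented for illustration; real validation would also check proof of work, difficulty and so on): accept a downloaded UTXO snapshot only if it hashes to the commitment in a sufficiently old block and enough later blocks chain on top of that block.

Code:
import hashlib
from dataclasses import dataclass

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

@dataclass
class Header:
    prev: bytes              # hash of the previous header
    utxo_commitment: bytes   # hypothetical: hash of the UTXO set after this block

    def hash(self) -> bytes:
        return h(self.prev + self.utxo_commitment)

def accept_snapshot(utxo_blob: bytes, old_header: Header,
                    later_headers: list, min_depth: int) -> bool:
    """Trust the snapshot only if it matches the old block's commitment and
    at least min_depth subsequent headers build on that block."""
    if h(utxo_blob) != old_header.utxo_commitment:
        return False
    tip = old_header.hash()
    for hdr in later_headers:
        if hdr.prev != tip:
            return False
        tip = hdr.hash()
    return len(later_headers) >= min_depth

# Toy usage: one old header plus two headers built on top of it.
utxo_blob = b"serialized UTXO set"
old = Header(prev=b"\x00" * 32, utxo_commitment=h(utxo_blob))
h1 = Header(prev=old.hash(), utxo_commitment=h(b"utxo set after block 1"))
h2 = Header(prev=h1.hash(), utxo_commitment=h(b"utxo set after block 2"))
assert accept_snapshot(utxo_blob, old, [h1, h2], min_depth=2)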

Quote
I am not sure offhand why thin clients weren't already thinking DHT

It offers practically no advantages at this point, but is considerably harder to implement.

Quote
(Or maybe phones just aren't good at being DHTs, maybe they aren't "always on" or have limited bandwidth.)

Yes, it is likely impractical to make phones DHT nodes, although phones will be able to query the DHT.

It could work for desktop clients, though. A lot of people think that running Bitcoin-Qt is the right thing to do; I doubt they will mind running a DHT node for themselves and for their mobile brethren.

gmaxwell
March 17, 2013, 03:43:14 PM
#13

Quote
For all practical intents and purposes it is enough to have the N latest blocks + the UTXO set.
What you're describing is SPV UTXO security, and it's also been brought up many times before. Perhaps you should save your allegations of unimaginativeness for ideas that are actually original. Tongue

SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future. They could steal anyone's coins, they could freely inflate the supply, etc.

You have the Bitcoin security model wrong: Bitcoin verifies everything because it doesn't trust, except in the very limited fashion that it must because we can't do ordering autonomously. Abandon that and you lose the incentives that make mining secure enough to even think that you could abandon validation.

Perhaps, ultimately, the scaling means that it's a superior security model, but if you're going to change the security, you could hypothesize even weaker models which are even less costly to operate.
killerstorm (OP)
March 17, 2013, 05:13:20 PM
#14

Quote
What you're describing is SPV UTXO security, and it's also been brought up many times before. Perhaps you should save your allegations of unimaginativeness for ideas that are actually original. Tongue

Again, the main topic of my post was the bandwidth/storage trade-off, which is independent of the security model you use. Actually, I've outlined a very conservative approach which changes nothing about the security model, i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

But, of course, it's also possible to use a UTXO set referenced by blocks instead of computing your own.

I'm not sure that "SPV UTXO security" is the right term because what I'm talking about is applicable to miners, but SPV is used in the context of client systems, no?

Quote
SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future.


This attack is possible right now; N equals the current blockchain height.

What changes if we make N less than the current blockchain height?

Quote
They could steal anyone's coins,

The same thing happens if you mine starting from the genesis block.

If you do a history-rewriting attack of limited depth you can steal some people's coins.

Quote
they could freely inflate the supply, etc.

No, it is possible to verify the supply having only the UTXO set.
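
A sketch of that supply check, assuming the UTXO set itself has already been verified against a block commitment (amounts in satoshis): the honest total at any height is fixed by the subsidy schedule, so an inflated set gives itself away by its sum.

Code:
def max_supply_at(height: int) -> int:
    """Total block subsidy issued up to and including `height`, in satoshis."""
    subsidy, total, remaining = 50 * 100_000_000, 0, height + 1
    while remaining > 0 and subsidy > 0:
        blocks = min(remaining, 210_000)   # one halving era at a time
        total += blocks * subsidy
        remaining -= blocks
        subsidy //= 2
    return total

def supply_ok(utxo_amounts: list, height: int) -> bool:
    # <= rather than ==, since miners may claim less than the full subsidy
    return sum(utxo_amounts) <= max_supply_at(height)

assert max_supply_at(209_999) == 210_000 * 50 * 100_000_000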

Quote
You have the Bitcoin security model wrong:

I think you've got it wrong: a history-rewriting attack is fundamentally no different from a UTXO manipulation attack in terms of severity.

If you have enough computational power and checkpointing is not a problem, you can do either attack, with the same consequences.

If you don't have enough computational power, you can't do either attack. If centralized checkpoints are present in clients, you can't do either attack.

jl2012
March 18, 2013, 10:13:55 AM
#15

Quote
i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

You have the Bitcoin security model wrong.

How did all these exabytes of data get into the DHT? If you didn't put them there yourself, you're trusting whoever put them there that they constitute a well-formed blockchain.


You can always verify the data in the DHT against the full blockchain, and you only need to do it once.

eldentyrell
March 18, 2013, 10:14:36 AM
#16

Quote
i.e. you still scan the whole blockchain, you just don't store the UTXO set locally.

You have the Bitcoin security model wrong.

How did all these exabytes of data get into the DHT?  If you didn't put them there yourself (or at least scan all of them), you're trusting whoever put them there that they constitute a well-formed blockchain.

eldentyrell
March 18, 2013, 10:15:52 AM
#17

Quote
You can always verify the data in the DHT against the full blockchain, and you only need to do it once.

So you're saying that having to store exabytes of data isn't a problem; all you have to have is exabytes of network bandwidth in order to install the client.

Seriously?

The OP has missed the forest for the trees by interpreting "storage costs" too literally. Turning storage costs into network costs doesn't make them go away. But it does tempt people to change the security model by skipping that bothersome scanning-exabytes-worth-of-data phase and just trusting the DHT root hash that their friendly neighborhood multinational bank says is safe to use.

All of this makes me wonder just whose computer science abilities he thinks "suck". I don't know for sure, but I'd hazard a guess he's misrepresenting their point.

gmaxwell
March 18, 2013, 01:01:48 PM
#18

Quote
you just don't store the UTXO set locally.
[...]
But, of course, it's also possible to use a UTXO set referenced by blocks instead of computing your own.

I'm not sure that "SPV UTXO security" is the right term because what I'm talking about is applicable to miners, but SPV is used in the context of client systems, no?
SPV security means trusting the longest chain instead of validating for yourself. Being a "client" or not isn't relevant.

But indeed, I assumed you meant not bothering to validate for yourself rather than validating and then pruning, though only in the context of a hash-committed UTXO set, so that you can securely query. I would have just called that "pruning". And there is nothing wrong with that. But I also don't know that there is any great benefit in it, as it substantially increases bandwidth usage. Tradeoffs are fine though.

Quote
SPV UTXO security means that the reward for someone who can produce N blocks isn't just reversing the transactions they can make and replacing them; it's on the order of the value of the whole Bitcoin economy, past and into the future.
This attack is possible right now; N equals the current blockchain height.
No, in fact it isn't, as nodes won't reorganize back infinitely far. (This has its own problems: a longer fork that branches from a point nodes won't reorganize back to makes the system divergent, as newly initialized nodes will end up on the 'wrong' chain.)

Quote
If you do a history-rewriting attack of limited depth you can steal some people's coins.
Only the ones mined within the last N blocks, which the system makes unspendable for 100 blocks specifically for this reason.

Quote
they could freely inflate the supply, etc.
No, it is possible to verify the supply having only the UTXO set.
To autonomously verify it you must query the entire set. If you're assuming a self-verified pruned UTXO set, that's fine; if you're assuming SPV-security UTXO... not so much.
markm
March 18, 2013, 03:27:35 PM
#19

Quote
All of this makes me wonder just whose computer science abilities he thinks "suck". I don't know for sure, but I'd hazard a guess he's misrepresenting their point.

There was some tongue-in-cheek going on, y'know.

Similarly, when I agreed "those guys" do suck at computer science, really that was just a left-handed way of saying "good going, old chap", not really anything about "those guys" at all.

-MarkM-

2112
March 18, 2013, 05:00:22 PM
Last edit: March 18, 2013, 05:18:39 PM by 2112
#20

Quote
However, after that you need to keep UTXO set in RAM. RAM is expensive.
Which branch of computer science postulates that? Transactions arrive nearly randomly during the inter-block period, so the mean time available to verify a transaction is about 5 minutes.

What is going on here with this "in-RAM database requirement"?

Edit: I mean the only new transaction in the block is the coinbase. Every other transaction should already have been broadcast on the p2p network.

Are you trying to preserve some artefact of Satoshi's implementation? From the computer science point of view, the array of transactions appended to the block is pure overhead and redundant.

Or maybe there is some sort of game-theoretic advantage to broadcasting the transactions as late as possible? I'm not aware of any, but I haven't put much thought into the game-theoretic aspects of this concept of hiding the transactions from the other miners. To me it seems counterproductive globally.
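
A sketch of that point (not Bitcoin's actual wire protocol, just the idea): if every non-coinbase transaction has already been relayed, a block announcement only needs the header, the coinbase and the list of txids; the receiver rebuilds the rest from its mempool and requests only what it is missing.

Code:
def reconstruct_block(txids: list, coinbase_tx: bytes, mempool: dict):
    """txids: ids of the non-coinbase transactions; mempool maps txid -> raw tx."""
    txs, missing = [coinbase_tx], []
    for txid in txids:
        if txid in mempool:
            txs.append(mempool[txid])
        else:
            missing.append(txid)     # only these need to travel with the block
    return txs, missing

mempool = {"aa": b"raw tx aa", "bb": b"raw tx bb"}
txs, missing = reconstruct_block(["aa", "bb", "cc"], b"raw coinbase", mempool)
assert missing == ["cc"] and len(txs) == 3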
