SER_DISK vs SER_NETWORK

Sergio_Demian_Lerner (OP)

SER_DISK vs SER_NETWORK

September 25, 2012, 05:32:47 PM

In the Satoshi client, every object can be serialized either to disk or to the network.
Nevertheless, I haven't found any difference between the serialization of the blockchain for SER_DISK and for SER_NETWORK.

Which classes are sensitive to the SER_* serialization type?

I'm writing a Bitcoinj class to read and process the Satoshi client's blockchain (blk*.dat) files, and I want to know whether I should care about the SER_* flags.

Thanks, Sergio.

jgarzik

September 25, 2012, 06:32:30 PM

Quote from: Sergio_Demian_Lerner on September 25, 2012, 05:32:47 PM

The Python implementation, pynode, does not have any notion of serialization differences between the two either. pynode successfully imports bitcoin-generated blk000?.dat files, and it also talks on the network.

Perhaps this was for future expansion? I would love to know any differences, myself.

kjj

September 25, 2012, 06:57:50 PM

I was just grepping through the source, and those enums get passed around a lot, but appear only to be consumed in the IMPLEMENT_SERIALIZE functions of various classes.

For example, CAddress::IMPLEMENT_SERIALIZE in protocol.h adds the nVersion and nTime if called with SER_DISK, but does not otherwise. The others look mostly similar.

SER_DISK only seems to be consumed in protocol.h and wallet.cpp, neither of which involves the block chain, so there should be no differences in the block format there.

If you want to go looking for them yourself, don't forget to also look for SER_GETHASH.
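
To illustrate how such a flag can gate fields, here is a minimal Python sketch of a CAddress-like record as described in this post. The flag values, field widths, and the function name serialize_address are assumptions for illustration; the real macro may apply additional version-dependent conditions (for example for nTime on the network), so treat this as a sketch rather than the client's actual layout.

```python
import struct

# Serialization type flags (values assumed to mirror the client's enum).
SER_NETWORK = 1 << 0
SER_DISK = 1 << 1
SER_GETHASH = 1 << 2


def serialize_address(n_type, version, time, services, ip16, port):
    """Serialize a CAddress-like record (hypothetical helper).

    nVersion and nTime are written only when SER_DISK is set,
    following the behaviour described in the post above.
    """
    if len(ip16) != 16:
        raise ValueError("ip16 must be exactly 16 bytes")
    out = b""
    if n_type & SER_DISK:
        out += struct.pack("<i", version)  # nVersion, little-endian int32
        out += struct.pack("<I", time)     # nTime, little-endian uint32
    out += struct.pack("<Q", services)     # nServices, little-endian uint64
    out += ip16                            # 16-byte address
    out += struct.pack(">H", port)         # port, big-endian uint16
    return out


ip = b"\x00" * 10 + b"\xff\xff" + bytes([127, 0, 0, 1])
on_disk = serialize_address(SER_DISK, 60002, 1348594367, 1, ip, 8333)
on_wire = serialize_address(SER_NETWORK, 60002, 1348594367, 1, ip, 8333)
```

In this sketch the disk form is exactly 8 bytes longer than the network form, and the shared tail is identical. Block data has no such branch, which is why the same parser can handle both.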

Sergio_Demian_Lerner (OP)

September 25, 2012, 10:07:12 PM

Thanks! I finished implementing the blk0001.dat parser for Bitcoinj.
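
For readers who want to try something similar, here is a minimal Python sketch (not the Bitcoinj parser mentioned above) of walking a blk*.dat-style stream. It assumes each record is the 4-byte main-network magic, a 4-byte little-endian length, and then the block in network format, starting with the 80-byte header. The names iter_blocks and parse_header are hypothetical, and the sample record is made up rather than real chain data.

```python
import io
import struct

# Assumption: main-network magic bytes that prefix every record.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
# version, previous block hash, merkle root, time, bits, nonce (80 bytes)
HEADER_FORMAT = "<i32s32sIII"


def iter_blocks(stream):
    """Yield the raw bytes of each block record found in the stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of data
        magic = prefix[:4]
        (size,) = struct.unpack("<I", prefix[4:])
        if magic != MAINNET_MAGIC:
            return  # padding or unknown data: stop rather than guess
        block = stream.read(size)
        if len(block) < size:
            return  # truncated final record
        yield block


def parse_header(block):
    """Decode the 80-byte block header into a dict."""
    version, prev_hash, merkle_root, time, bits, nonce = struct.unpack(
        HEADER_FORMAT, block[:80]
    )
    return {
        "version": version,
        # Hashes are conventionally displayed byte-reversed.
        "prev_hash": prev_hash[::-1].hex(),
        "merkle_root": merkle_root[::-1].hex(),
        "time": time,
        "bits": bits,
        "nonce": nonce,
    }


# Usage with an in-memory, made-up record.
fake_block = struct.pack(
    HEADER_FORMAT, 1, b"\x00" * 32, b"\x11" * 32, 1348594367, 0x1D00FFFF, 42
) + b"\x00"
sample = io.BytesIO(MAINNET_MAGIC + struct.pack("<I", len(fake_block)) + fake_block)
headers = [parse_header(block) for block in iter_blocks(sample)]
```

With a real file, the same loop would run over a handle opened in binary mode, for example open("blk0001.dat", "rb").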

For me, it was the simplest way to get statistics out of the blockchain. Tomorrow I will post a histogram of the average volume transacted per amount range (e.g. 0 to 1 BTC, 10 to 100 BTC, 100 to 1K BTC, etc.).
(Note that I had to assume that the last output from a transaction is the change).
This reveals interesting information regarding the average use.

If someone wants to experiment with it, send me a message.

Best regards,
Sergio.

Pieter Wuille

September 25, 2012, 10:17:08 PM

Quote from: Sergio_Demian_Lerner on September 25, 2012, 10:07:12 PM

(Note that I had to assume that the last output from a transaction is the change).

That assumption will be wrong 50% of the time...

Sergio_Demian_Lerner (OP)

September 27, 2012, 06:44:08 PM

Quote from: Pieter Wuille on September 25, 2012, 10:17:08 PM

That assumption will be wrong 50% of the time...

Yes! I forgot the change position randomization!

But it's still generally possible to guess which output is the change, since:

1. The payment amount is always greater than the sum of the input amounts excluding the smallest input (otherwise that input would not have been needed).
2. The change amount is always smaller than any of the inputs.

The only case where this guessing fails is when there is a single input. In this case, the payment amount is generally an integer value and the change is not, so you can still guess with some accuracy.
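
The two rules and the single-input fallback above can be sketched in Python as follows. This is only an illustration of the heuristic as stated, not a reliable classifier: it assumes exactly two outputs, amounts in satoshis, and ignores fees, and the name guess_change_index is hypothetical.

```python
COIN = 100_000_000  # satoshis per BTC


def guess_change_index(inputs, outputs):
    """Guess which of two outputs is the change.

    Returns the index of the presumed change output, or None when the
    heuristic cannot decide.
    """
    if len(outputs) != 2 or not inputs:
        return None
    if len(inputs) > 1:
        total = sum(inputs)
        smallest = min(inputs)
        # Rule 2: change is smaller than every input.
        # Rule 1: the other output (the payment) exceeds the sum of
        # the inputs without the smallest one.
        candidates = [
            i
            for i in (0, 1)
            if outputs[i] < smallest and outputs[1 - i] > total - smallest
        ]
    else:
        # Single input: assume the payment is a whole number of BTC
        # and the change is not.
        candidates = [i for i in (0, 1) if outputs[i] % COIN != 0]
    return candidates[0] if len(candidates) == 1 else None


multi = guess_change_index([20 * COIN, 30 * COIN], [5 * COIN, 45 * COIN])
single = guess_change_index([50 * COIN], [3 * COIN, 46 * COIN + 12345])
```

Returning None when zero or both outputs qualify keeps the ambiguous cases visible, so they can be counted separately instead of being silently misclassified.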

Best regards Pieter!

Mike Hearn

September 27, 2012, 07:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change address, nothing says it has to be smaller than the payment and often it won't be

2) You cannot assume payments are round numbers as often they will have been converted through an exchange rate. For instance many payments I make look like essentially random numbers because they are some round figure of my local currency multiplied by the exchange rate at the time.

Block chain analysis is hard; I doubt there is an accurate way to calculate what you want.

Sergio_Demian_Lerner (OP)

September 27, 2012, 09:10:17 PM

Quote from: Mike Hearn on September 27, 2012, 07:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

Yes, if anyone wants it

Quote from: Mike Hearn on September 27, 2012, 07:25:04 PM

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change address, nothing says it has to be smaller than the payment and often it won't be

But no client would automatically generate a transaction where the change is greater than a transaction input? What for?

Do you mean something like (A):

Inputs: 10, 20, 30
Outputs: 15 (change), 45 (payment)

Why not create the tx (B):

Inputs: 20, 30
Outputs: 5 (change), 45 (payment)

Is the client so dumb as to generate a transaction like A, which wastes space, instead of B?

kjj

September 27, 2012, 09:24:57 PM

Quote from: Sergio_Demian_Lerner on September 27, 2012, 09:10:17 PM

Quote from: Mike Hearn on September 27, 2012, 07:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

Yes, if anyone wants it

Quote from: Mike Hearn on September 27, 2012, 07:25:04 PM

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change address, nothing says it has to be smaller than the payment and often it won't be

In the future, I would like to see the client attempt to make outputs that are roughly equal in size, with equal probability of being higher or lower. Just to make it harder to guess. But that is hardly a promise of anonymity. Which one was the change will quickly be revealed when it is merged with another address known to belong to you, or with the change you sent before, or with another transaction sent to the same address as one of the inputs, or...

Sergio_Demian_Lerner (OP)

September 27, 2012, 09:31:47 PM

#10

But you still have to know which addresses belong to you, so there is the chicken and egg problem.

kjj

September 27, 2012, 09:53:39 PM

#11

Quote from: Sergio_Demian_Lerner on September 27, 2012, 09:31:47 PM

But you still have to know which addresses belong to you, so there is the chicken and egg problem.

But that information, if ever found, then travels back in time and infects every transaction you've ever done, which is bad.

SgtSpike

September 27, 2012, 10:44:34 PM

#12

Quote from: Sergio_Demian_Lerner on September 27, 2012, 06:44:08 PM

Quote from: Pieter Wuille on September 25, 2012, 10:17:08 PM

That assumption will be wrong 50% of the time...

I'm glad someone finally is doing analysis based on these assumptions! No, they aren't exact, but they would be generally pretty close. I am excited to see what you come up with.

Quote from: Mike Hearn on September 27, 2012, 07:25:04 PM

He did say "guess". And this is certainly a good way to get it right a vast majority of the time.

apetersson

September 28, 2012, 12:58:20 AM

#13

I would rather postpone the decision about "what was change" and merge addresses into entities. Once you see a transaction that signs multiple inputs at once, you can "assume" that it was one entity and assign change status retroactively.
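
The merging step can be sketched with a union-find structure. This is a minimal Python illustration that assumes each transaction is given simply as the list of addresses that funded its inputs; the name cluster_addresses is hypothetical, and the approach inherits the caveat that co-spent inputs are only assumed to share an owner.

```python
def cluster_addresses(transactions):
    """Group addresses into entities.

    transactions is an iterable of lists; each list holds the addresses
    whose outputs were spent together as inputs of one transaction.
    Returns a list of sets, one per inferred entity.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for input_addresses in transactions:
        if not input_addresses:
            continue
        first = input_addresses[0]
        find(first)  # make sure single-input transactions are recorded too
        for other in input_addresses[1:]:
            union(first, other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


entities = cluster_addresses([["a", "b"], ["b", "c"], ["d"], ["e", "f"]])
```

Once the clusters exist, an output can be labelled as change retroactively whenever its address ends up in the same cluster as the inputs of the transaction that created it.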

Sergio_Demian_Lerner (OP)

September 28, 2012, 01:54:04 PM

#14

Quote from: apetersson on September 28, 2012, 12:58:20 AM

Interesting. That would cover all spent outputs...
I can also use that information to validate the naive method I suggested and measure its false positive and false negative rates.

Mike Hearn

September 29, 2012, 10:07:25 AM

#15

Sure, contributing back the blk parser would be welcome.

My concern with your heuristics is not that they will always be wrong (they won't), but that people will use whatever statistics you come up with to make judgements or even investment decisions, without understanding the quite serious caveats that go along with your methodology. See the Silk Road study, which is now being quoted as fact in various news sources despite being based on a VERY shaky set of assumptions.
