Obtaining all transactions since a given txid

Gavin Andresen

Legendary

Activity: 1652
Merit: 2216

Chief Scientist

February 03, 2011, 09:13:55 PM

#21

Quote from: jon_smark on February 03, 2011, 07:52:18 PM

Okay, let us consider that scenario. Step by step, just to make sure we are on the same page:

The Bitcoind daemon knows of transactions [A, B, C, D] with timestamps [T1, T1, T2, T2].
Upon first invocation, the client asks for a list of all transactions. It receives [A, B, C, D] (and timestamps) as response.
There's a block chain re-org invalidating [C, D].

No, there is a block chain re-org invalidating B and C. D is still valid in my scenario.

So the client remains oblivious to B and C being invalid, whether it asks for transactions older than T2 or transactions after D.
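A toy model of the scenario above may help. The timestamps, the in-memory lists standing in for bitcoind, and the helpers transactions_since and transactions_after are all hypothetical, not real RPC calls. The sketch shows that neither a timestamp cursor nor a txid cursor tells the client that B and C were dropped:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tx:
    txid: str
    timestamp: int


T1, T2 = 1, 2
# What the client received on its first call: A, B at T1 and C, D at T2.
first_snapshot = [Tx("A", T1), Tx("B", T1), Tx("C", T2), Tx("D", T2)]
# After a re-org invalidating B and C, the daemon only knows A and D.
after_reorg = [Tx("A", T1), Tx("D", T2)]


def transactions_since(wallet: list[Tx], since_ts: int) -> list[Tx]:
    """Timestamp cursor: every transaction at or after since_ts."""
    return [tx for tx in wallet if tx.timestamp >= since_ts]


def transactions_after(wallet: list[Tx], txid: str) -> Optional[list[Tx]]:
    """Txid cursor: every transaction listed after txid (None if txid is unknown)."""
    ids = [tx.txid for tx in wallet]
    return wallet[ids.index(txid) + 1:] if txid in ids else None


# The client merges the incremental answer into what it already had.
stale_view = {tx.txid for tx in first_snapshot}
stale_view |= {tx.txid for tx in transactions_since(after_reorg, T2)}
# stale_view still contains "B" and "C": nothing in the response revealed they are gone.
```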

How often do you get the chance to work on a potentially world-changing project?

jon_smark (OP)

Member

Activity: 90
Merit: 10

February 04, 2011, 01:52:17 PM

#22

Quote from: gavinandresen on February 03, 2011, 09:13:55 PM

No, there is a block chain re-org invalidating B and C. D is still valid in my scenario.
So the client remains oblivious to B and C being invalid, whether it asks for transactions older than T2 or transactions after D.

Okay, I was under the impression that if B and C were invalidated, then D would be invalidated too. (As I noted a few posts ago, one of my assumptions was that the txid was unique across the entire tree).

So, we're back to square one: we need to find some ID which is unique across the entire tree. Is there such a thing as a block hash? I assume that when there is a fork in the block chain, both children of block N will number themselves N+1. Although one of them will eventually be invalidated, this means the block number cannot be used as a unique ID. That is why some sort of hash which uniquely identifies a block would be useful.

In terms of the API, the client would invoke listtransactionssince with the hash of the last known block. If found, the daemon would return the hash of the last confirmed block in the chain, together with all transactions which have occurred between the two blocks. And instead of a timestamp, the optional parameter indicating early breakout criteria would be the block number.
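A small sketch of why a hash works as an identifier where a height does not, using a toy block tree with a fork. The short strings such as "h101a" are placeholders for real block hashes, and main_chain / still_on_main_chain are illustrative helpers, not bitcoind functions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    block_hash: str
    height: int
    parent: Optional[str]


# Toy tree with a fork: both children of block 100 number themselves 101.
blocks = {
    "h100": Block("h100", 100, None),
    "h101a": Block("h101a", 101, "h100"),
    "h101b": Block("h101b", 101, "h100"),
    "h102b": Block("h102b", 102, "h101b"),  # this branch wins
}


def main_chain(tip: str) -> list[str]:
    """Follow parent links from the tip back to the root; return hashes oldest-first."""
    chain = []
    cur: Optional[str] = tip
    while cur is not None:
        chain.append(cur)
        cur = blocks[cur].parent
    return chain[::-1]


def still_on_main_chain(known_hash: str, tip: str) -> bool:
    """A remembered hash identifies exactly one block, so this question is unambiguous."""
    return known_hash in main_chain(tip)
```

Height 101 matches two blocks, so "since block 101" is ambiguous, while "since h101a" can be answered with a clear "no longer on the main chain".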

(Alternatively, there could be two separate API calls: listblockssince and getblocktransactions).

As far as the client goes, the algorithm to ensure a correct transaction history -- even in the context of possible invalidations -- would be similar to the one I've already described. Just s/txid/blockhash/.

Anyway, as I mentioned I'm not that familiar with the inner workings of the Bitcoin protocol, so I'm making lots of assumptions. I hope you can fill in the blanks, because without an API that allows clients to have a fool-proof way of ensuring transaction consistency it will be very hard for Bitcoin to be taken seriously for e-commerce applications.

Cheers,
Jon

Gavin Andresen

February 04, 2011, 04:34:15 PM

#23

Quote from: jon_smark on February 04, 2011, 01:52:17 PM

Anyway, as I mentioned I'm not that familiar with the inner workings of the Bitcoin protocol, so I'm making lots of assumptions. I hope you can fill in the blanks, because without an API that allows clients to have a fool-proof way of ensuring transaction consistency it will be very hard for Bitcoin to be taken seriously for e-commerce applications.

The fool-proof way of dealing with your use case (customer orders something, you want to ship after you're sure payment has cleared):

+ Give each customer an account. When they order, use getaccountaddress to get a bitcoin address to which they can send payment.

+ Every N minutes, ask bitcoin for the balance (with a minimum of 6 confirmations) of either the accounts with pending orders or all accounts.

+ If the account balance is enough to pay for the order, ship it and move the coins from the customer's account to a PAID account.
If not... either wait or tell the customer they paid the wrong amount or maybe refund any extra they sent (you'll have to ask them for a refund address).
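The polling procedure above can be sketched as one pass of a loop. The Wallet protocol is an assumed thin wrapper whose method names mirror bitcoind's getbalance and move calls; Order, poll_once and the in-memory FakeWallet are hypothetical. Moving only the order price (and leaving any overpayment in the customer's account for a later refund) is one reading of "move the coins", not something the post specifies:

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Protocol


class Wallet(Protocol):
    # Assumed wrapper; method names mirror bitcoind's getbalance / move RPCs.
    def getbalance(self, account: str, minconf: int) -> Decimal: ...
    def move(self, fromaccount: str, toaccount: str, amount: Decimal) -> bool: ...


@dataclass
class Order:
    account: str
    price: Decimal
    shipped: bool = False


def poll_once(wallet: Wallet, pending: list[Order], minconf: int = 6) -> list[Order]:
    """One polling pass: ship every order whose account holds enough confirmed coins."""
    shipped = []
    for order in pending:
        if order.shipped:
            continue
        if wallet.getbalance(order.account, minconf) >= order.price:
            # Sweep the payment into PAID so it cannot pay for a second order.
            wallet.move(order.account, "PAID", order.price)
            order.shipped = True
            shipped.append(order)
        # Otherwise keep waiting, or contact the customer about the amount.
    return shipped


@dataclass
class FakeWallet:
    """In-memory stand-in for testing; it ignores minconf and treats all coins as confirmed."""
    confirmed: dict[str, Decimal] = field(default_factory=dict)

    def getbalance(self, account: str, minconf: int) -> Decimal:
        return self.confirmed.get(account, Decimal(0))

    def move(self, fromaccount: str, toaccount: str, amount: Decimal) -> bool:
        self.confirmed[fromaccount] = self.getbalance(fromaccount, 0) - amount
        self.confirmed[toaccount] = self.getbalance(toaccount, 0) + amount
        return True
```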

The inelegant polling will eventually be fixed by bitcoin POSTing when new blocks or transactions arrive, but I think you'll still need to ask bitcoin what the account's current balance is-- trust me, you really don't want to recreate all the bitcoin logic dealing with double-spent transactions or block chain reorganizations.

If you grow to handling thousands of orders per day (which would be a very good problem to have) you'll want to buy or build a version of bitcoin optimized for high-volume transaction websites. Or maybe you'll run 20 bitcoinds, each handling 1/20th of the customers -- I dunno, I don't spend a lot of time worrying about problems I'll have when my project is outrageously successful.
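Running 20 bitcoinds implies some stable way to route each customer to one instance. One possible scheme, assumed here and not taken from the thread, is to hash the customer ID; node_for_customer is a hypothetical helper:

```python
import hashlib

NUM_NODES = 20  # the "20 bitcoinds" from the post; purely illustrative


def node_for_customer(customer_id: str, num_nodes: int = NUM_NODES) -> int:
    """Map a customer to one of num_nodes bitcoind instances, always the same one."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

Plain modulo hashing reassigns most customers whenever the node count changes; consistent hashing would limit that churn if instances are added later.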

jon_smark (OP)

February 04, 2011, 05:48:41 PM

#24

Quote

+ Give each customer an account. When they order, use getaccountaddress to get a bitcoin address to which they can send payment.

+ Every N minutes, ask bitcoin for the balance (with a minimum of 6 confirmations) of either the accounts with pending orders or all accounts.

+ If the account balance is enough to pay for the order, ship it and move the coins from the customer's account to a PAID account.
If not... either wait or tell the customer they paid the wrong amount or maybe refund any extra they sent (you'll have to ask them for a refund address).

Well, that's the straightforward procedure which the Bitcoin daemon already caters for, and which a lazy programmer might implement in their client. It's awfully inefficient, though, and it won't scale at all. I guess that's my fundamental problem with it, but it's not the only one:

Suppose there are multiple items associated with a customer (for instance, adverts whose impression count may be asynchronously "recharged" at any moment). With the simple algorithm above, one must have a separate account for each asset. Since each customer can have multiple assets, things can quickly grow out of control. Alternatively, one could group all assets from one user into one account and distinguish between assets via their address. However, the "simple" algorithm is suddenly much more complicated: after an initial polling of all accounts, one would have to loop through all accounts with a changed balance, doing an additional API invocation to get all address balances associated with that account, and comparing each one to the previously stored balance. Just think of all the extra pressure this would put on the database backend that stores the account balances!
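The two-round scheme just described (poll all account balances, then query per-address balances only for accounts whose total changed) can be sketched as follows. The address_balances_for callable is a hypothetical stand-in for whatever per-address query the daemon offers, and the stored_* mappings stand for the client's database:

```python
from decimal import Decimal
from typing import Callable, Mapping


def changed_addresses(
    account_balances: Mapping[str, Decimal],
    stored_account_balances: Mapping[str, Decimal],
    address_balances_for: Callable[[str], Mapping[str, Decimal]],  # hypothetical query
    stored_address_balances: Mapping[str, Decimal],
) -> dict[str, Decimal]:
    """Return {address: balance delta} for every address whose balance changed."""
    changes: dict[str, Decimal] = {}
    for account, balance in account_balances.items():
        if balance == stored_account_balances.get(account, Decimal(0)):
            continue  # unchanged account: skip the extra per-address call
        for address, addr_balance in address_balances_for(account).items():
            delta = addr_balance - stored_address_balances.get(address, Decimal(0))
            if delta != 0:
                changes[address] = delta
    return changes
```

Every changed account costs one extra daemon call plus one database comparison per address, which is the load the post is worried about.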

Anyway, my point is that the simplicity of the algorithm you describe is only valid for a narrow range of scenarios. There are situations (such as mine) where it would actually make the client logic more complex and unable to scale beyond triviality.

Note that I'm not advocating that clients should be forced to deal with block chain re-organisations. On the contrary, I think the client of the API should have the freedom to implement whatever scheme they desire. I don't dispute that there are plenty of people for whom the simple algorithm will suffice. However, there are also people who would like a more efficient and scalable way of dealing with the Bitcoin daemon...

But going back to my original question: is there a unique way to identify a block even in the face of reorgs?

Gavin Andresen

February 04, 2011, 06:33:36 PM

#25

Quote from: jon_smark on February 04, 2011, 05:48:41 PM

But going back to my original question: is there a unique way to identify a block even in the face of reorgs?

Sure, every block has a unique hash.

But I don't think that helps at all; you might see that transactions A B C D are in block #100,000 with hash H1, but after a block chain re-org block #100,000 might contain transactions A D (with block hash H2).

The probability of that happening rapidly approaches zero as the block gets confirmed; after 6 confirmations you can safely assume it just won't happen.
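Confirmations can be counted from heights: a block at the tip has one confirmation and each block built on top of it adds one. A minimal sketch, with the 6-confirmation threshold taken from the post above; still_at_height and its height-to-hash map are hypothetical, and show how a client would notice that block #100,000 is now H2 rather than the H1 it remembered:

```python
SAFE_CONFIRMATIONS = 6  # rule of thumb from the post


def confirmations(block_height: int, tip_height: int) -> int:
    """A block at the tip has 1 confirmation; each later block adds one."""
    if block_height > tip_height:
        raise ValueError("block is above the current tip")
    return tip_height - block_height + 1


def is_settled(block_height: int, tip_height: int) -> bool:
    return confirmations(block_height, tip_height) >= SAFE_CONFIRMATIONS


def still_at_height(hash_at_height: dict[int, str], height: int, remembered_hash: str) -> bool:
    """True if the block remembered at this height is still the main-chain block there."""
    return hash_at_height.get(height) == remembered_hash
```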

theymos

Administrator
Legendary

Activity: 5166
Merit: 12864

February 04, 2011, 09:21:14 PM
Last edit: February 04, 2011, 09:36:25 PM by theymos

#26

Quote from: jon_smark on February 04, 2011, 01:52:17 PM

In terms of the API, the client would invoke listtransactionssince with the hash of the last known block. If found, the daemon would return the hash of the last confirmed block in the chain, together with all transactions which have occurred between the two blocks. And instead of a timestamp, the optional parameter indicating early breakout criteria would be the block number.

Good idea. This will work. The client can also keep a stack of block hashes if desired to reduce work in case of a reorg (which happens more often than you might think).

It won't work for 0-confirmation transactions, of course.
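The client-side stack of block hashes mentioned above can be sketched like this. After a reorg, the client discards remembered hashes until it finds one the daemon still has on its main chain, and resumes from there. BlockHashStack is hypothetical, and is_on_main_chain is an assumed callable standing in for a query to the daemon:

```python
from collections import deque
from typing import Callable, Optional


class BlockHashStack:
    """Client-side record of recently processed main-chain block hashes (newest last)."""

    def __init__(self, depth: int = 100) -> None:
        self._hashes: deque[str] = deque(maxlen=depth)  # oldest entries fall off

    def push(self, block_hash: str) -> None:
        self._hashes.append(block_hash)

    def fork_point(self, is_on_main_chain: Callable[[str], bool]) -> Optional[str]:
        """Drop hashes that are no longer on the main chain; return the newest survivor."""
        while self._hashes and not is_on_main_chain(self._hashes[-1]):
            self._hashes.pop()
        return self._hashes[-1] if self._hashes else None
```

A None result means the reorg went deeper than the stack, so the client falls back to a full rescan.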

Bitcoin Block Explorer does something similar: it uses the current block number for ETag/If-None-Match caching on certain pages.
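Block Explorer's actual code is not shown in the thread; the sketch below is only a generic illustration of height-based ETag caching. etag_for and conditional_get are hypothetical, and If-None-Match handling is simplified to a single tag:

```python
from typing import Callable, Optional


def etag_for(block_count: int) -> str:
    # Hypothetical scheme: the page can only change when a new block arrives.
    return f'"blk-{block_count}"'


def conditional_get(
    block_count: int, if_none_match: Optional[str], render: Callable[[], str]
) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body); skip rendering when the client's copy is current."""
    etag = etag_for(block_count)
    if if_none_match == etag:
        return 304, {"ETag": etag}, ""  # Not Modified: no body
    return 200, {"ETag": etag}, render()
```

One caveat ties back to the reorg discussion: if the tip block is replaced by a different block at the same height, the block count does not change, so an ETag built from the tip hash would be stricter.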

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD

jon_smark (OP)

February 05, 2011, 05:00:03 PM

#27

Hello again,

I have a new proposal. This one is IMHO simpler and more robust. Moreover, it caters for two kinds of clients of the API: 1) those who don't want to bother with block chain reorgs and therefore can just play it safe and request only transactions which have a high number of confirmations (and therefore have an infinitesimal chance of belonging to a block which might be discarded down the road), and 2) those who want absolute control, 100% robustness, and are willing to deal with possible block chain reorgs.

My suggestion is to base the API call not on transactions but on blocks. After all, transactions are grouped into blocks, and saying that a given transaction has N confirmations means that all transactions in that same block also have N confirmations. It therefore makes more sense for the client application to receive transactions in block-based batches.

So, I propose a gettransactionssince method with the following parameters:

Code:

gettransactionssince [account] [blkhash] [blknum] [minconf=6]

If given, the optional parameter account restricts the result set to transactions pertaining to that account. The blkhash parameter, also optional, gives the hash of the last block the client knows about. When this parameter is given, the Bitcoin daemon should return all transactions which have occurred since this block (but not the transactions in the block itself!). If it is not given, then all transactions since the beginning of the chain should be returned.

Very important: together with a set of transactions, the method gettransactionssince should also return the block hash and the block number of the last block in the chain with minconf confirmations (and whose transactions are possibly contained in the result set). This way, the client can in subsequent invocations tell the Bitcoin daemon what it already knows about.
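A sketch of how a daemon might answer the proposed gettransactionssince over an in-memory main chain (oldest block first). Block, BlockNotFound and the (txids, last_hash, last_height) return tuple are assumptions for illustration; the call itself was only a proposal:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    block_hash: str
    height: int
    txs: list[tuple[str, str]] = field(default_factory=list)  # (txid, account)


class BlockNotFound(Exception):
    """blkhash is not on the main chain, most likely because of a re-org."""


def gettransactionssince(
    chain: list[Block],
    account: Optional[str] = None,
    blkhash: Optional[str] = None,
    blknum: Optional[int] = None,
    minconf: int = 6,
) -> tuple[list[str], Optional[str], Optional[int]]:
    """Return (txids, last_hash, last_height) for blocks after blkhash.

    last_hash/last_height identify the newest block with at least minconf
    confirmations; the client passes them back on its next call.
    """
    tip_height = chain[-1].height if chain else -1
    eligible = [b for b in chain if tip_height - b.height + 1 >= minconf]
    start = 0
    if blkhash is not None:
        # Walk back from the newest eligible block looking for blkhash.
        for i in range(len(eligible) - 1, -1, -1):
            block = eligible[i]
            if blknum is not None and block.height < blknum:
                raise BlockNotFound(blkhash)  # early breakout: already below blknum
            if block.block_hash == blkhash:
                start = i + 1  # exclude the known block's own transactions
                break
        else:
            raise BlockNotFound(blkhash)
    txids = [
        txid
        for b in eligible[start:]
        for txid, acct in b.txs
        if account is None or acct == account
    ]
    if not eligible:
        return txids, None, None
    last = eligible[-1]
    return txids, last.block_hash, last.height
```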

Note the minconf parameter. I suggest setting its default value to a number so high that the probability of a reorg affecting the returned blocks is practically zero. This way, "dumb" clients can just call gettransactionssince, each time passing the block hash returned by the previous call, and not worry about any other book-keeping.

As for the blknum parameter, it allows an early breakout from the search if there has been a reorg and blkhash therefore cannot be found: once the daemon has walked back past block number blknum without seeing blkhash, it can stop. If blknum is not given, the daemon has to walk all the way back to the beginning of the chain before it can detect and report the error.

Smarter clients can use blknum together with a lower minconf value to get lower confirmation latency; they must, however, take care of some extra book-keeping in case there has in fact been a reorg. The algorithm for this is not that complicated, but in any case it is solely the client's responsibility to implement it correctly; the Bitcoin daemon should not have to care.
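The client-side book-keeping mentioned above might look like the sketch below. SyncingClient is hypothetical; fetch is an assumed callable wrapping the proposed call, returning (txids, new_hash, new_height) and raising LookupError when the cursor block is gone. The client remembers which batch of transactions came from which cursor, so after a reorg it discards the newest batch and re-fetches from the cursor before it:

```python
from typing import Callable, Optional

Cursor = tuple[Optional[str], Optional[int]]  # (blkhash, blknum); (None, None) = from the start
Fetch = Callable[[Optional[str], Optional[int]], tuple[list[str], Optional[str], Optional[int]]]


class SyncingClient:
    """Hypothetical stateful client for a gettransactionssince-style call."""

    def __init__(self, fetch: Fetch) -> None:
        self.fetch = fetch
        # Each entry: (cursor the batch was requested from, txids in that batch).
        self.batches: list[tuple[Cursor, list[str]]] = []
        self.cursor: Cursor = (None, None)

    def sync(self) -> None:
        while True:
            try:
                txids, new_hash, new_height = self.fetch(*self.cursor)
                break
            except LookupError:
                # Re-org: forget the newest batch and retry from the cursor before it.
                # (A fetch from (None, None) is assumed never to raise.)
                self.cursor, _ = self.batches.pop()
        if txids or (new_hash, new_height) != self.cursor:
            self.batches.append((self.cursor, txids))
            self.cursor = (new_hash, new_height)

    def known_txids(self) -> list[str]:
        return [txid for _, batch in self.batches for txid in batch]
```

A "dumb" client with a high minconf would never see LookupError in practice, so only the cursor-passing part of sync matters for it.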

Anyway, what do you think?

Best regards,
Jon

Gavin Andresen

February 05, 2011, 05:13:24 PM

#28

It doesn't 'feel right' to me.

Seems like the ideal API would be:

"Hey bitcoin, I want to keep track of all transactions for account FOO (or all accounts) that have [minconf] confirmations. Please POST them to [url]."

or

"Hey bitcoin, I want to keep track of all transactions for account FOO (or all accounts) that have [minconf] confirmations. I'll be polling you to see if there are any new ones every once in a while, I'll pass you [unique_token] so you know it is me."

... at least for the simple case. You'd get back two lists of transactions: new transactions with [minconf] that you haven't been told about before (maybe empty in the polling case), and a list of transactions you were told about before that now have less than [minconf] confirmations because of a block chain re-org (always empty if [minconf] is big enough).
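The token-based polling variant could be served by daemon-side state along these lines. ConfirmedTxTracker is hypothetical, and the confirmations mapping (txid to current confirmation count for the account) is assumed to come from the wallet. Each poll returns the two lists described above:

```python
from collections import defaultdict
from typing import Mapping


class ConfirmedTxTracker:
    """Per-token memory of which transactions have already been reported."""

    def __init__(self) -> None:
        self._reported: defaultdict[str, set[str]] = defaultdict(set)

    def poll(
        self, token: str, confirmations: Mapping[str, int], minconf: int
    ) -> tuple[list[str], list[str]]:
        """Return (new, revoked).

        new: txids that reached minconf and were not reported to this token before.
        revoked: txids reported before that now have fewer than minconf confirmations.
        """
        confirmed = {txid for txid, n in confirmations.items() if n >= minconf}
        seen = self._reported[token]
        new = sorted(confirmed - seen)
        revoked = sorted(seen - confirmed)
        self._reported[token] = confirmed
        return new, revoked
```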

For the "I really want to shoot myself in the foot and deal with block-chain reorgs myself" you can call getblock and/or monitorblock to get all the gory details about which transactions are in which blocks.

jon_smark (OP)

February 05, 2011, 07:44:22 PM

#29

Quote

It doesn't 'feel right' to me.

As I explain below, whether it feels right or not is mostly a function of the role one envisions for the Bitcoin daemon.

Quote

"Hey bitcoin, I want to keep track of all transactions for account FOO (or all accounts) that have [minconf] confirmations. Please POST them to [url]."

I agree that would also be useful. But note that it requires some extra book-keeping on the Bitcoin daemon, in order to keep track of possible lost messages.
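The extra book-keeping for POST delivery could be as simple as a per-subscriber outbox that keeps each notification until the POST succeeds. Outbox is hypothetical, and post is an assumed callable that returns True when the client's URL answered with success:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Outbox:
    """Queue of notifications for one subscriber URL, retried until delivered."""

    post: Callable[[str], bool]  # assumed: True if the POST to [url] succeeded
    pending: list[str] = field(default_factory=list)

    def enqueue(self, txid: str) -> None:
        self.pending.append(txid)

    def flush(self) -> list[str]:
        """Deliver in order; stop at the first failure and keep the rest queued."""
        delivered = []
        while self.pending:
            if not self.post(self.pending[0]):
                break
            delivered.append(self.pending.pop(0))
        return delivered
```

This gives at-least-once delivery: if a POST arrives but its response is lost, the notification is sent again, so the receiving site must ignore duplicates.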

Quote

"Hey bitcoin, I want to keep track of all transactions for account FOO (or all accounts) that have [minconf] confirmations. I'll be polling you to see if there are any new ones every once in a while, I'll pass you [unique_token] so you know it is me."

That's essentially what I want, with the difference that my proposal does not require the Bitcoin daemon to keep any state for this method; it is instead the client's responsibility to keep track of the state.

Choosing one approach or the other is a matter of deciding what role the bitcoind daemon should play. Should it be a relatively dumb gateway to the Bitcoin protocol (in which case my approach makes more sense)? Or should it make life as easy as possible for clients, even at the expense of being a more complex daemon (in which case your solution is appropriate)? If this fundamental decision has already been made, then the choice of approach should be straightforward.

Quote

For the "I really want to shoot myself in the foot and deal with block-chain reorgs myself" you can call getblock and/or monitorblock to get all the gory details about which transactions are in which blocks.

I'm running version 0.3.19 and I can find neither getblock nor monitorblock. Are these new methods coming in 0.3.20? Well, if getblock does what I think it does (i.e., return all information available about a given block number, including block hash and transactions), then it would of course be an alternative way to implement the "shoot-myself-in-the-foot" client. Everybody can be happy, yay!

theymos

February 05, 2011, 09:35:01 PM

#30

A very large site would probably want to use getblock anyway, since it allows you to talk to Bitcoin only once per block instead of querying it every time you want to do anything.
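Talking to bitcoind once per block might look like the loop below. Per the thread, getblock was not yet released, so get_block(height), assumed to return the (address, amount) outputs in a block, is hypothetical, as is scan_new_blocks. The sketch ignores reorgs, so a real site would pair it with one of the rollback schemes discussed earlier:

```python
from decimal import Decimal
from typing import Callable, Iterable

# Hypothetical: get_block(height) yields the (address, amount) outputs in that block.
GetBlock = Callable[[int], Iterable[tuple[str, Decimal]]]


def scan_new_blocks(
    get_block: GetBlock,
    last_scanned: int,
    tip: int,
    watched: set[str],
    credits: dict[str, Decimal],
) -> int:
    """Process each new block exactly once; return the new last_scanned height."""
    for height in range(last_scanned + 1, tip + 1):
        for address, amount in get_block(height):
            if address in watched:
                credits[address] = credits.get(address, Decimal(0)) + amount
    return tip
```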

Gavin Andresen

February 05, 2011, 10:38:50 PM

#31

Quote from: jon_smark on February 05, 2011, 07:44:22 PM

I'm running version 0.3.19 and I can find neither getblock nor monitorblock. Are these new methods coming in 0.3.20? Well, if getblock does what I think it does (i.e., return all information available about a given block number, including block hash and transactions), then it would of course be an alternative way to implement the "shoot-myself-in-the-foot" client. Everybody can be happy, yay!

They're not in 0.3.20; maybe 0.3.21.
https://github.com/gavinandresen/bitcoin-git/tree/monitorreceived
... is the not-yet-ready-for-prime-time branch they're on.