The transaction creation algorithm is not optimized for anonymity. It leaks information which could potentially be used to track money flowing through the system. While this may not be of concern to some individuals, it is of concern to others. In order to maximize anonymity for those who wish it, the behaviour of the transaction creation in the standard client should be biased as much as possible towards anonymity. This biasing can easily be accomplished without incurring any extra transaction fees.The Problem
One of Bitcoin's properties is its ability to be anonymous. This ability is not perfect, but it is there. However, the current implementation allows enough side-channel information leakage to make it less difficult for someone to track the flow of cash through various transfers. Closing these side channels may not stop a determined analyst, especially if they have sufficient computing resources, but it will certainly make their job harder. To make the tracker's job harder, we need to minimize the amount of information that allows the tracker to draw an association between two bitcoin addresses.
The primary problem with the current software is the coin selection algorithm (by "coin" I mean a previous transaction output, i.e. vin[x
].prevout. I will sometimes use the terms "coin" and "address" interchangeably). By the way, transaction fees do not affect this discussion, and are assumed to be included in the "payment amount" - the amount of Bitcoins being transferred from one person to another.
The selection algorithm, which I call the "best fit" algorithm, boils down to:
- If there is a coin which exactly matches the payment amount, then use it
- Otherwise, find the coin (if any) which exceeds the payment amount by the least amount; i.e. if the payment is 10BTC and you have an 11BTC coin and a 15BTC coin, use the 11BTC coin
- Otherwise, find the best combination of coins which sums closest to the payment amount.
The selection algorithm makes it a little easier for someone to track the ownership of a given coin, which could allow the tracker to link two transactions to a given individual.
Rule 1 is easy. The coin goes from address A to address B. Following this transaction is trivial.
Rule 2 is a little more difficult, but not much. The standard client doesn't allow a single payment to multiple recipients, therefore one of the two outputs (the change) must be controlled by the same person who controls address A. The transaction generation routine mitigates this somewhat by randomly selecting whether the change goes into vout or vout. But we know (and therefore the tracker also knows) that the client will try to choose a coin which comes as close as possible in value to the payment, therefore it's very likely that the output with the lesser value is the change, and the output with the greater value is the payment. The greater the difference between the output values, the more this is likely. It's still a guess, of course.
Examples (output addresses are O1 and O2, input addresses are A, B, C, etc.):
A=4, O1=3, O2=1: O2 is likely (but not guaranteed) the payment. Therefore O1 is likely controlled by the sender, and O2 is likely controlled by the recipient.
A=100, O1=99, O2=1: Again, O1 is likely (but still not guaranteed) the payment
A=10, O1=4, O2=6: Since O1 and O2 are close together, either one is likely the payment.
A=10, O1=5, O2=5: Impossible to tell which is the payment and which is the change.
Rule 3 is trivial. One of the outputs will have a value greater than any individual input coin, and the other output will have a value smaller than any individual coin. The output coin that has the smaller value is the change, and the output coin with the larger value is the payment. Unlike rule 2, that property is guaranteed, and no assumptions or guesses are necessary.
A=2, B=3, O1=4, O2=1: O1 is the payment, O2 is the change. If the payment were 1BTC, then coin B is not needed at all for the transaction.
A=2, B=2, C=3, O1=1, O2=6: O2 is the payment, O1 is the change. As in the first example, if the payment were 1BTC, then any one of the inputs would satisfy the payment and the other two would not be included.
A=?, B=?; O1=4, O2=4: This cannot happen. If it did, one of the two input coins would have to be greater than or equal to the payment amount, so the other one would not be included.Potential Coin Selection Algorithms
A tracker knows that all the input coins to a transaction are controlled by a single person (except under unusual situations not covered by the standard client). Therefore, transactions should use as few inputs as possible.
Transactions with a single output give away who received the money (it's very unlikely you'll make a payment to yourself).
Randomizing which output is the payment and which is the change is good, so keep that.
The goal is to make it difficult for a tracker to determine which output is the payment, and which is the change.
Keeping those principles in mind, I propose the following algorithms:
1) Select a coin which is exactly twice the value of the payment
This makes the two outputs identical, making it impossible to guess which is the payment and which is the change. Variations: select two or more coins that total exactly twice the payment amount; select one coin which is approximately twice the payment amount (this will result in two outputs which are close in value, and as shown above you cannot reliably guess which is the payment)
2) Select two coins such that either (a) the value of each input is less than the payment amount and sums to greater than (but NOT equal to) the payment amount, or (b) the value of each input is greater than the payment amount
A=4, B=2, O1=5, O2=1
A tracker looking at this transaction cannot tell if it satisfies condition (a) or condition (b), therefore they cannot determine which is the payment and which is the change. For condition (a), O1 is the payment, and for condition (b) O2 is the payment.
3) Select two coins: one whose value exactly matches the payment, and another coin at random.
A=5, B=2; O1=5, O2=2
It's rather obvious what we're doing here, but since we (and the tracker) know the payment and change are randomized, a tracker cannot know which is the payment and which is the change.
4) A special variation on (1) (I call this one the "In yo face!" selection): Select multiple coins adding to more than the payment amount, and if possible ensure at least one coin is redundant.
A=2, B=3, C=1, D=1, E=2, O1=4, O2=5
It's clear from examining this transaction that several combinations of the coins could satisfy either O1 or O2, so the tracker cannot tell which is the payment and which is the change. Note that this algorithm, unlike the others, is guaranteed to find a set of coins that can be used.Important considerations
The transaction creation function must be able to use any of the algorithms above. The specific algorithm must be chosen randomly. The randomness can be weighted towards algorithms 1 and 2.
The transactions that are generated are unlikely to incur any transaction fees greater than they would normally. None of these algorithms chooses excessively large numbers of coins, thus there is no additional cost to the user.
The current algorithm can potentially loop 1000 times seeking the best combination of coins to come closest to the payment amount. These algorithms eliminate that overhead.
It is important that the default behaviour of the Bitcoin client is geared towards anonymity. In order for anonymity to succeed, it must be difficult or impossible to distinguish between a transaction generated by someone seeking anonymity, and a transaction generated by someone who doesn't care about anonymity. If the random algorithm selection is used only by those seeking anonymity, then the tracker just needs to look for transactions that don't fit the "best fit" algorithm.
(Minor edits to clean up formatting)