Bitcoin is often promoted as a tool for privacy but the only privacy that exists in Bitcoin comes from pseudonymous addresses which are fragile and easily compromised through reuse, "taint" analysis, tracking payments, IP address monitoring nodes, web-spidering, and many other mechanisms. Once broken this privacy is difficult and sometimes costly to recover.
Traditional banking provides a fair amount of privacy by default. Your inlaws don't see that you're buying birth control that deprives them of grand children, your employer doesn't learn about the non-profits you support with money from your paycheck, and thieves don't see your latest purchases or how wealthy you are to help them target and scam you. Poor privacy in Bitcoin can be a major practical disadvantage for both individuals and businesses.
Even when a user ends address reuse by switching to
BIP 32 address chains, they still have privacy loss from their old coins and the joining of past payments when they make larger transactions.
Privacy errors can also create externalized costs: You might have good practices but when you trade with people who don't (say ones using "green addresses") you and everyone you trade with loses some privacy. A loss of privacy also presents a grave systemic risk for Bitcoin: If degraded privacy allows people to assemble centralized lists of good and bad coins you may find Bitcoin's fungibility destroyed when your honestly accepted coin is later not honored by others, and its decentralization along with it when people feel forced to enforce popular blacklists on their own coin.
As I write this people with unknown motivations are raining down tiny little payments on old addresses, presumably in an effort to get wallets to consume them and create evidence of common address ownership.
I think this must be improved, urgently.
This message describes a transaction style Bitcoin users can use to dramatically improve their privacy which I've been calling
CoinJoin. It involves no changes to the Bitcoin protocol and has already seen some very limited use spanning back a couple of years now but it seems to not be widely understood.
I first publicly described this transaction style in a whimsically-named thread— "
I taint rich!"— where I focused on a specific side effect of these transactions, with an expectation that people would see the rest of the implications on their own.
Explicit beats implicit, and even people who understand the idea have had some questions which could use answering. Thus this post.
The idea is very simple, first some quick background:
A Bitcoin transaction consumes one or more inputs and creates one or more outputs with specified values.
Each input is an output from a past transaction. For each input there is a distinct signature (scriptsig) which is created in accordance with the rules specified in the past-output that it is consuming (scriptpubkey).
The Bitcoin system is charged with making sure the signatures are correct, that the inputs exist and are spendable, and that the sum of the output values is less than or equal to the sum of the input values (any excess becomes fees paid to miners for including the transaction).
It is normal for a transaction to spend many inputs in order to get enough value to pay its intended payment, often also creating an additional 'change' output to receive the unspent (and non-fee) excess.
There is no requirement that the scriptpubkeys of the inputs used be the same; i.e., no requirement that they be payments to the same address. And, in fact, when Bitcoin is correctly used with one address per payment, none of them will be the same.
When considering the history of Bitcoin ownership one could look at transactions which spend from multiple distinct scriptpubkeys as co-joining their ownership and make an assumption: How else could the transaction spend from multiple addresses unless a common party controlled those addresses?
In the illustration 'transaction 2' spends coins which were assigned to 1A1 and 1C3. So 1A1 and 1C3 are necessarily the same party?
This assumption is incorrect. Usage in a single transaction does not prove common control (though it's currently pretty suggestive), and this is what makes
CoinJoin possible:
The signatures, one per input, inside a transaction are
completely independent of each other. This means that it's possible for Bitcoin users to agree on a set of inputs to spend, and a set of outputs to pay to, and then to individually and separately sign a transaction and later merge their signatures. The transaction is not valid and won't be accepted by the network until all signatures are provided, and no one will sign a transaction which is not to their liking.
To use this to increase privacy, the N users would agree on a uniform output size and provide inputs amounting to at least that size. The transaction would have N outputs of that size and potentially N more change outputs if some of the users provided input in excess of the target. All would sign the transaction, and then the transaction could be transmitted. No risk of theft at any point.
In the illustration 'transaction 2' has inputs from 1A1 and 1C3. Say we beliece 1A1 is an address used for Alice and 1C3 is an address used for Charlie. Which of Alice and Charlie owns which of the 1D and 1E outputs?
The idea can also be used more casually. When you want to make a payment, find someone else who also wants to make a payment and make a joint payment together. Doing so doesn't increase privacy much, but it actually makes your transaction
smaller and thus easier on the network (and
lower in fees); the extra privacy is a perk.
Such a transaction is externally
indistinguishable from a transaction created through conventional use. Because of this, if these transactions become widespread they improve the privacy even of people who do not use them, because no longer will input co-joining be strong evidence of common control.
There are many variations of this idea possible, and all can coexist because the idea requires no changes to the Bitcoin system. Let a thousand flowers bloom: we can have diversity in ways of accomplishing this and learn the best.
FAQ:
Don't you need tor or something to prevent everyone from learning everyone's IP?Any transaction privacy system that hopes to hide user's addresses should start with some kind of anonymity network. This is no different. Fortunately networks like Tor, I2P, Bitmessage, and Freenet all already exist and could all be used for this. (Freenet would result in rather slow transactions, however)
However, gumming up "taint analysis" and reducing transaction sizes doesn't even require that the users be private from each other. So even without things like tor this would be no worse than regular transactions.
Don't the users learn which inputs match up to which outputs?In the simplest possible implementation where users meet up on IRC over tor or the like, yes they do. The next simplest implementation is where the users send their input and output information to some meeting point server, and the server creates the transaction and asks people to sign it. The server learns the mapping, but no one else does, and the server still can't steal the coins.
More complicated implementations are possible where even the server doesn't learn the mapping.
E.g. Using chaum blind signatures: The users connect and provide inputs (and change addresses) and a cryptographically-blinded version of the address they want their private coins to go to; the server signs the tokens and returns them. The users anonymously reconnect, unblind their output addresses, and return them to the server. The server can see that all the outputs were signed by it and so all the outputs had to come from valid participants. Later people reconnect and sign.
Similar things can be accomplished with various zero-knowledge proof systems.
Does the totally private version need to have a server at all? What if it gets shut down?No. The same privacy can be achieved in a decentralized manner where all users act as blind-signing servers. This ends up needing n^2 signatures, and distributed systems are generally a lot harder to create. I don't know if there is, or ever would be, a reason to bother with a fully distributed version with full privacy, but it's certainly possible.
What about DOS attacks? Can't someone refuse to sign even if the transaction is valid?Yes, this can be DOS attacked in two different ways: someone can refuse to sign a valid joint transaction, or someone can spend their input out from under the joint transaction before it completes.
However, if all the signatures don't come in within some time limit, or a conflicting transaction is created, you can simply leave the bad parties and try again. With an automated process any retries would be invisible to the user. So the only real risk is a persistent DOS attacker.
In the non-decentralized (or decentralized but non-private to participants) case, gaining some immunity to DOS attackers is easy: if someone fails to sign for an input, you blacklist that input from further rounds. They are then naturally rate-limited by their ability to create more confirmed Bitcoin transactions.
Gaining DOS immunity in a decentralized system is considerably harder, because it's hard to tell which user actually broke the rules. One solution is to have users perform their activity under a zero-knowledge proof system, so you could be confident which user is the cheater and then agree to ignore them.
In all cases you could supplement anti-DOS mechanisms with proof of work, a fidelity bond, or other scarce resource usage. But I suspect that it's better to adapt to actual attacks as they arise, as we don't have to commit to a single security mechanism in advance and for all users. I also believe that bad input exclusion provides enough protection to get started.
Isn't the anonymity set size limited by how many parties you can get in a single transaction?Not quite. The anonymity set size of a single transaction is limited by the number of parties in it, obviously. And transaction size limits as well as failure (retry) risk mean that really huge joint transactions would not be wise. But because these transactions are cheap, there is no limit to the number of transactions you can cascade.
In particular, if you have can build transactions with m participants per transaction you can create a sequence of m*3 transactions which form a three-stage
switching network that permits any of m^2 final outputs to have come from any of m^2 original inputs (e.g. using three stages of 32 transactions with 32 inputs each 1024 users can be joined with a total of 96 transactions). This allows the anonymity set to be any size, limited only by participation.
In practice I expect most users only want to prevent nosy friends (and thieves) from prying into their financial lives, and to recover some of the privacy they lost due to bad practices like address reuse. These users will likely be happy with only a single pass; other people will just operate opportunistically, while others may work to achieve many passes and big anonymity sets. All can coexist.
How does this compare to zerocoin?As a crypto and computer science geek I'm super excited by Zerocoin: the technology behind it is fascinating and important. But as a Bitcoin user and developer the promotion of it as
the solution to improved privacy disappoints me.
Zerocoin has a number of serious limitations:
- It uses cutting-edge cryptography which may turn out to be insecure, and which is understood by relatively few people (compared to ECDSA, for example).
- It produces large (20kbyte) signatures that would bloat the blockchain (or create risk if stuffed in external storage).
- It requires a trusted party to initiate its accumulator. If that party cheats, they can steal coin. (Perhaps fixable with more cutting-edge crypto.)
- Validation is very slow (can process about 2tx per second on a fast CPU), which is a major barrier to deployment in Bitcoin as each full node must validate every transaction.
- The large transactions and slow validation also means costly transactions, which will reduce the anonymity set size and potentially make ZC usage unavailable to random members of the public who are merely casually concerned about their privacy.
- Uses an accumulator which grows forever and has no pruning. In practice this means we'd need to switch accumulators periodically to reduce the working set size, reducing the anonymity set size. And potentially creating big UTXO bloat problems if the horizon on an accumulator isn't set in advance.
Some of these things may improve significantly with better math and software engineering over time.
But above all:
Zerocoin requires a soft-forking change to the Bitcoin protocol, which all full nodes must adopt, which would commit Bitcoin to a particular version of the Zerocoin protocol. This cannot happen fast—probably not within years, especially considering that there is so much potential for further refinement to the algorithm to lower costs. It would be politically contentious, as some developers and Bitcoin businesses are very concerned about being overly associated with "anonymity". Network-wide rule changes are something of a suicide pact: we shouldn't, and don't, take them lightly.
CoinJoin transactions work today, and they've worked since the first day of Bitcoin. They are indistinguishable from normal transactions and thus cannot be blocked or inhibited except to the extent that any other Bitcoin transaction could be blocked.
(As an aside: ZC could potentially be used externally to Bitcoin in a decentralized CoinJoin as a method of mutually blinding the users in a DOS attack resistant way. This would allow ZC to mature under live fire without taking its costs or committing to a specific protocol network-wide.)
The primary argument I can make for ZC over CoinJoin, beyond it stoking my crypto-geek desires, is that it may potentially offer a larger anonymity set. But with the performance and scaling limits of ZC, and the possibility to construct sorting network transactions with CJ, or just the ability to use hundreds of CJ transactions with the storage and processing required for one ZC transactions, I don't know which would actually produce bigger anonymity sets in practice. E.g. To join 1024 users, just the ZC redemptions would involve 20k * 1024 bytes of data compared to less than 3% of that for a complete three-stage cascade of 32 32-way joint transactions. Though the ZC anonymity set could more easily cross larger spans of time.
The anonymity sets of CoinJoin transactions could easily be big enough for common users to regain some of their casual privacy and that's what I think is most interesting.
How does this compare to CoinWitness?CoinWitness is even rocket-sciency than Zerocoin, it also shares many of the weaknesses as a privacy-improver: Novel crypto, computational cost, and the huge point of requiring a soft fork and not being available today. It may have some scaling advantages if it is used as more than just a privacy tool. But it really is overkill for this problem, and won't be available anytime real soon.
Sounds great! Where is it?Theres the rub: There exist no ready made, easy-to-use software for doing this. You can make the transactions by hand using bitcoin-qt and the raw transactions API, as we did in that "taint rich" thread, but to make this into a practical reality we need easy-to-use automated tools.
Luke has written up some
sketches a protocol which would enable establishing joint transactions over the regular Bitcoin network.
The Bitcoin-qt RPC system provides everything someone needs to write a side-car applet (including the ability to lock txouts to prevent them from being spent out from from under it) that participants in such a system. But the fact that so many users use centralized webwallets today which can spy on them will ultimately limit the userbase for these tools.
Personally, most of my coding brain capacity is spent on other things which are even more important to me. And what I could spare on Bitcoin is spent on more core and security things— if I work on anything wallet related anytime soon it will likely be improving the privacy behavior of coin selection... But moreover:
Anyone who builds this is going to be accused of enabling criminal activity, it doesn't matter if any actual criminals use this or not: Criminal activity sells headlines. Being a Bitcoin core developer already fills my quota for accusations of this kind, especially my quota for risk that I'm not even paid for.
In reality, real criminals don't need CoinJoin if they have even the slightest clue: They can afford to
buy privacy in a way that regular users cannot, it's just a cost of their (often lucrative) business.
Joe-criminal can go out and buy 120% PPS mining to get brand new coins, or run his money through a series of semi-sham high cashflow gambling businesses for a 50% cut, they can afford the cost of seeking out and interfacing with these seedy services... Joe and Jane doe? Their names are up in neon on blockchain.info. It might not seem great to them, but if there a high cost of fixing it they simply won't, because the cost of fixing it is very concrete and the cost or privacy loss is speculative and distant. They might just need to give up bitcoin and switch to something almost totally private: cash... Regular users need efficient and inexpensive privacy if it is to help them at all.
I know that making such a tool doesn't fit into the get-rich-quick mold of many Bitcoin businesses, but the importance is self-apparent and the simplest versions of this don't require very deep technical wizardry. I think the "political" risk of improving people's privacy is a real one that you should carefully consider, but around these parts I see people sticking their names on some rather outrageously risky stuff. I'd hoped the "taint rich" thread would be enough to inspire some community action, but perhaps this will be.
So, instead, I ask
you: Where is it?