Meni Rosenfeld (OP)
Donator
Legendary
Offline
Activity: 2058
Merit: 1054
|
|
November 27, 2012, 06:04:55 PM |
|
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change? How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used? We can assume that no custom anonymization techniques are used. Among other things it means that there will be at most 1 change output. Looking at some transactions I've made, it looks like Bitcoin-qt always puts the change first, and blockchain.info MyWallet always puts the change last. Do they consistently do so? What about other clients? Some suggested heuristics: 1. In sendmany, if the outputs have simple integer ratios between them, the transaction is likely to be some proportional distribution, and the one output which isn't an integer multiple of the GCD is the change. Example: In this transaction, all outputs are a multiple of 0.01648442, except for the change. 2. Change outputs are likely to have more decimal places (or be otherwise more complicated). Example: In this transaction, 114 is the real transfer, while 7.527 is the difference which has 3 decimal places since that's what the input had. 3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined. 4. Transactions involving addresses known to belong to some service can have their own rules to be deduced on a per-service basis. Any other suggestions?
|
|
|
|
Stephen Gornick
Legendary
Offline
Activity: 2506
Merit: 1010
|
|
November 27, 2012, 08:49:21 PM |
|
Any other suggestions?
The Bitcoin-Qt client will pull a new address from the key pool for change. Thus any amount sent to an address that already has been used in a transaction is unlikely to be change. Of course, with other clients, particularly Blockchain.info, this isn't necessarily the case. There are some clues from anecdotes as well. For over-the-counter trades, quite often upon receipt of funds the recipient will turn around and spend those funds that were received as soon as there has been a confirmation. This alone doesn't help much but because the coins chosen by Bitcoin-Qt client will first try to use coins for a transaction where all coins selected already have at least six confirmations, then if a transaction spends a coin with fewer than six confirmations it was only done so because there were insufficient funds in the wallet of coins with six or more confirmations. So take a transaction A with outputs X and Y. Then another transaction, (transaction B), spends X after 1 confirmation. More often than not, it would seem, the argument could be made that looking at both transactions together, that Y was the change in transaction A. The odds are much less likely to be that X was the change but that change was then used in another transaction from the same wallet shortly thereafter. It might even be likely that in transaction B, if there is more than one output that the smaller output is the change. A person that spends funds shortly after receiving them will generally be spending all of the funds received or at least a large chunk of it.
|
|
|
|
jgarzik
Legendary
Offline
Activity: 1596
Merit: 1100
|
|
November 27, 2012, 08:51:40 PM |
|
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?
It is intentionally randomized to make it harder to guess
|
Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own. Visit bloq.com / metronome.io Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
|
|
|
kjj
Legendary
Offline
Activity: 1302
Merit: 1026
|
|
November 27, 2012, 09:18:34 PM |
|
3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined.
This is a decent assumption most of the time, but once every few weeks, someone pops on IRC complaining that the coin selector overselected. Most recent was, if I recall, something like 0.5 + 0.2 + 0.3 + 0.01 -> 1.0 + 0.01 On the topic of tracking in general, I would just say, "Welcome to the arms race". The raw transaction API lets crazy people like me do silly things like create all transactions with 2 to 5 outputs of nearly equal size. The next step will be a public mix-sender.
|
17Np17BSrpnHCZ2pgtiMNnhjnsWJ2TMqq8 I routinely ignore posters with paid advertising in their sigs. You should too.
|
|
|
etotheipi
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
November 27, 2012, 10:35:32 PM Last edit: November 27, 2012, 10:52:50 PM by etotheipi |
|
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?
How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?
I spent a lot of time implementing coin selection in Armory. A few of your assumptions will be incorrect if the tx was created with Armory. Here's how Armory creates transactions: (1) Coin-selection-evaluation method. This is a method that scores a coin-selection based on various factors: How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below). All factors are scored between 0 (unfavorable) and 1 (favorable). (2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures. Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation). (3) Score all the solutions in part (2) with the eval method in part (1). Default weights are hard-coded into armoryengine.py, but the user could modify them. It defines how important each factor is: most important is not using zero-conf. Next is whether the transaction can be free (not unrelated to zero-conf). Then number of input addresses linked. Then tx priority, tx size, and output anonymity. However, the user is free to make output anonymity most important (I do! but only for fun, not because I actually care ). (4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled. Transaction is signed and broadcast. The parts you care about: (1) If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude. It may help clean up dust, it may help anonymity (but no guarantee for either). But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order. This violates your assumption that the change address is always smaller. Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone. So even without Armory's technique, you'll see that assumption violated with any client. (2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this ) (3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient. Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output. As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83). It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip: (12.4, 10.0). i.e. deceptively-sized outputs is favored. How often does (3) trigger? I don't know. I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included. But if it ever happens, then your assumption about sig figs would be violated, as well.
|
|
|
|
keystroke
|
|
November 29, 2012, 04:51:50 AM |
|
Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?
How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?
I spent a lot of time implementing coin selection in Armory. A few of your assumptions will be incorrect if the tx was created with Armory. Here's how Armory creates transactions: (1) Coin-selection-evaluation method. This is a method that scores a coin-selection based on various factors: How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below). All factors are scored between 0 (unfavorable) and 1 (favorable). (2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures. Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation). (3) Score all the solutions in part (2) with the eval method in part (1). Default weights are hard-coded into armoryengine.py, but the user could modify them. It defines how important each factor is: most important is not using zero-conf. Next is whether the transaction can be free (not unrelated to zero-conf). Then number of input addresses linked. Then tx priority, tx size, and output anonymity. However, the user is free to make output anonymity most important (I do! but only for fun, not because I actually care ). (4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled. Transaction is signed and broadcast. The parts you care about: (1) If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude. It may help clean up dust, it may help anonymity (but no guarantee for either). But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order. This violates your assumption that the change address is always smaller. Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone. So even without Armory's technique, you'll see that assumption violated with any client. (2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this ) (3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient. Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output. As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83). It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip: (12.4, 10.0). i.e. deceptively-sized outputs is favored. How often does (3) trigger? I don't know. I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included. But if it ever happens, then your assumption about sig figs would be violated, as well. Awesome.
|
"The difference between a castle and a prison is only a question of who holds the keys."
|
|
|
etotheipi
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
December 02, 2012, 05:05:56 AM |
|
By the way, I just implemented coin-control in Armory (not released yet, but soon). And I discovered a good use for it: cleaning up BTC crumbles in my wallet. For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance: the integer part goes back to me, the non-integer part gets donated to another project. Then my wallet is a ton cleaner (its balance is 28.15 instead of 28.37302811). I just started doing this with each of my wallets and it made me think of this thread:
(1) The recipient of my transaction is being sent an output with 8 decimal places, the change has zero decimal places -- the exact opposite of what you would expect of a normal transaction (2) Future transactions will be much cleaner in terms of decimal places, because when I pay someone 10.1 BTC, my change is 18.05 BTC, which is fairly indistinguishable. (3) Not that it matters to determining change, but I think it's good for the network (albeit, very mildly) to be sweeping up the crumbs like that.
I suspect, when I do release coin control, there will be other users doing the same thing, further complicating anyone's efforts to determine what is the correct change address.
|
|
|
|
casascius
Mike Caldwell
VIP
Legendary
Offline
Activity: 1386
Merit: 1140
The Casascius 1oz 10BTC Silver Round (w/ Gold B)
|
|
December 02, 2012, 05:14:33 AM |
|
For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance: the integer part goes back to me, the non-integer part gets donated to another project.
Funny I'm not the only one who does this. I normally work mostly with whole bitcoins (since my lowest physical coin is 1 BTC anyway), and then I use coin crumbles to make paper banknote wallets worth 0.1 BTC and carry them around to leave in tip jars etc. Anyway, one useful (but less useful in a deterministic sense) heuristic when following the blockchain is to try and correlate behavior as well. If I notice consistent behavior following a chain one way, it's probably the same person. This is what led to me providing what was later determined to be a critical clue as to Goat's lost 400 BTC a while back: I notice that if I followed the chain a certain way I could see a pattern of a sender who frequently sent bitcoins in amounts like 3.995 1.995 8.995 which led me to suggest that it was a service that charged a 0.005 BTC transaction fee and took it out of the transaction amount.
|
Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable. I never believe them. If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins. I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion. Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice. Don't keep coins online. Use paper or hardware wallets instead.
|
|
|
Hal
VIP
Sr. Member
Offline
Activity: 314
Merit: 4276
|
|
December 14, 2012, 01:43:57 AM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:
// Insert change txn at random position: vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size()); wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));
The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.
I think it should be size()+1.
|
Hal Finney
|
|
|
MoonShadow
Legendary
Offline
Activity: 1708
Merit: 1010
|
|
December 14, 2012, 02:00:06 AM |
|
I wonder if having a choice in the automatic selection methods would be beneficial as well. Consider this scenario.
Say I have a full desktop client, as well as a thin hardware client (think bitcoincard) that syncs with the desktop client on a regular basis. So I want my desktop client to automaticly create simple transactions that the thin hardware client can use easily, and preferablely in combinations that don't have a change address at all. (Thin hardware clients most likely would have only one, or only a few, addresses to work with, change addresses would be trivial to identify if they go back to the sending address)
So what I would want my desktop client to do is to regularly produce transactions wherein I have two change transactions; one for the thin client, in the useful denominations, and one with the true change back to the desktop client. One time, the thin client might get a .01 transaction, another time it might get .02, and another .05; up through .1|.2|.5|1|2|5 so that it would have a selction of transactions that would permit it to combine a spending transaction to any exact amount, or almost any exact amount. A full client could also use this same technique to, occasionally, produce a changeless transaction as well. I'm not sure if this would be useful or not, but I would guess that it might complicate 'taint' tracking systems.
|
"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."
- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
|
|
|
Gavin Andresen
Legendary
Offline
Activity: 1652
Merit: 2311
Chief Scientist
|
|
December 14, 2012, 02:30:16 AM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years? I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh).
|
How often do you get the chance to work on a potentially world-changing project?
|
|
|
keystroke
|
|
December 14, 2012, 07:26:04 AM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years? I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh). Oh wow, so the change was always the first output if you send bitcoins to 1 address? Awesome find. But yea.. how did we miss it.
|
"The difference between a castle and a prison is only a question of who holds the keys."
|
|
|
BkkCoins
|
|
December 14, 2012, 08:45:00 AM |
|
The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.
I think it should be size()+1.
Even with many outputs it would always not be the last one. Would it be useful as well to randomly split change into more than one address when above some value? To further complicate predictability.
|
|
|
|
Meni Rosenfeld (OP)
Donator
Legendary
Offline
Activity: 2058
Merit: 1054
|
|
December 14, 2012, 08:52:12 AM Last edit: December 14, 2012, 12:08:04 PM by Meni Rosenfeld |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years? Maybe because all this time nobody made an analysis of the Bitcoin transaction graph which prompted an investigation into ways to determine common ownership of addresses . So you can thank Adi and Dorit for that. Also: http://dilbert.com/strips/comic/2001-10-25/. Many of us have looked at transactions in Blockexplorer, assuming change should be in a random location, but without paying special attention to this, there's hardly any smoking gun saying "This isn't random".
|
|
|
|
|
molecular
Donator
Legendary
Offline
Activity: 2772
Merit: 1019
|
|
December 14, 2012, 11:44:52 AM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:
// Insert change txn at random position: vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size()); wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));
The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.
I think it should be size()+1.
Holy moly! That means we can get a much better approximation of transaction volume
|
PGP key molecular F9B70769 fingerprint 9CDD C0D3 20F8 279F 6BE0 3F39 FC49 2362 F9B7 0769
|
|
|
gmaxwell
Moderator
Legendary
Offline
Activity: 4298
Merit: 8818
|
|
December 14, 2012, 11:51:26 AM |
|
Maybe because all this time nobody made an analysis of the Bitcoin transaction graph which prompted an investigation into ways to determine common ownership of addresses . So you can thank Adi and Dorit for that. More like: the fact that their analysis didn't turn that up further confirms how shallow and ineffective the research was. If only there were real research happening it might have been discovered more quickly! (Of course, many other people did do things with the transaction graph without noticing this too— see for example blockchain.info's taint tracking service. So they were hardly alone in not noticing it)
|
|
|
|
Sergio_Demian_Lerner
|
|
December 14, 2012, 01:35:58 PM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:
// Insert change txn at random position: vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size()); wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));
The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.
I think it should be size()+1.
Excellent catch Hal!!!!. It's clear that is very important that programmers/researchers of the community, apart from the core developers, review the code. Some time ago I added to the Weaknesses page of the Bitcoin wiki a section "Security Vulnerabilities and bugs" ( https://en.bitcoin.it/wiki/Weaknesses#Security_Vulnerabilities_and_bugs) regarding the impact of bugs. I will update to reflect this one.
|
|
|
|
SgtSpike
Legendary
Offline
Activity: 1400
Merit: 1005
|
|
December 14, 2012, 04:34:02 PM |
|
When will we have a fix for this?
|
|
|
|
Rassah
Legendary
Offline
Activity: 1680
Merit: 1035
|
|
December 14, 2012, 06:31:24 PM |
|
bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years? Because we're still in Beta
|
|
|
|
|