Bitcoin Forum
April 18, 2014, 12:15:12 AM *
News: Due to the OpenSSL heartbleed bug, changing your forum password is recommended.
 
   Home   Help Search Donate Login Register  
Pages: [1] 2  All
  Print  
Author Topic: How to detect the change output?  (Read 3986 times)
Meni Rosenfeld
Donator
Hero Member
*
expert
Offline Offline

Activity: 1148



View Profile WWW

Ignore
November 27, 2012, 06:04:55 PM
 #1

Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?

We can assume that no custom anonymization techniques are used. Among other things it means that there will be at most 1 change output.

Looking at some transactions I've made, it looks like Bitcoin-qt always puts the change first, and blockchain.info MyWallet always puts the change last. Do they consistently do so? What about other clients?

Some suggested heuristics:

1. In sendmany, if the outputs have simple integer ratios between them, the transaction is likely to be some proportional distribution, and the one output which isn't an integer multiple of the GCD is the change. Example: In this transaction, all outputs are a multiple of 0.01648442, except for the change.

2. Change outputs are likely to have more decimal places (or be otherwise more complicated). Example: In this transaction, 114 is the real transfer, while 7.527 is the difference which has 3 decimal places since that's what the input had.

3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined.

4. Transactions involving addresses known to belong to some service can have their own rules to be deduced on a per-service basis.

Any other suggestions?

1EofoZNBhWQ3kxfKnvWkhtMns4AivZArhr   |   Who am I?   |   bitcoin-otc WoT
Bitcoil - Exchange bitcoins for ILS (thread)   |   Israel Bitcoin community homepage (thread)
Analysis of Bitcoin Pooled Mining Reward Systems (thread, summary)  |   PureMining - Infinite-term, deterministic mining bond
1397780112
Hero Member
*
Offline Offline

Posts: 1397780112

View Profile Personal Message (Offline)

Ignore
1397780112
Reply with quote  #2

1397780112
Report to moderator
1397780112
Hero Member
*
Offline Offline

Posts: 1397780112

View Profile Personal Message (Offline)

Ignore
1397780112
Reply with quote  #2

1397780112
Report to moderator
1397780112
Hero Member
*
Offline Offline

Posts: 1397780112

View Profile Personal Message (Offline)

Ignore
1397780112
Reply with quote  #2

1397780112
Report to moderator
Creating a Bitcoin client that fully implements the network protocol is extremely difficult. Bitcoin-Qt is the only known safe implementation of a full node. Some other projects attempt to compete, but it is not recommended to use such software for anything serious. (Lightweight clients like Electrum and MultiBit are OK.)
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction. Advertise here.
1397780112
Hero Member
*
Offline Offline

Posts: 1397780112

View Profile Personal Message (Offline)

Ignore
1397780112
Reply with quote  #2

1397780112
Report to moderator
Stephen Gornick
Hero Member
*****
Offline Offline

Activity: 1232



View Profile WWW

Ignore
November 27, 2012, 08:49:21 PM
 #2

Any other suggestions?

The Bitcoin-Qt client will pull a new address from the key pool for change.  Thus any amount sent to an address that already has been used in a transaction is unlikely to be change.   Of course, with other clients, particularly Blockchain.info, this isn't necessarily the case.

There are some clues from anecdotes as well.  For over-the-counter trades, quite often upon receipt of funds the recipient will turn around and spend those funds that were received as soon as there has been a confirmation.

This alone doesn't help much but because the coins chosen by Bitcoin-Qt client will first try to use coins for a transaction where all coins selected already have at least six confirmations, then if a transaction spends a coin with fewer than six confirmations it was only done so because there were insufficient funds in the wallet of coins with six or more confirmations.

So take a transaction A with outputs X and Y.
Then another transaction, (transaction B), spends X after 1 confirmation.   More often than not, it would seem, the argument could be made that looking at both transactions together, that Y was the change in transaction A.   The odds are much less likely to be that X was the change but that change was then used in another transaction from the same wallet shortly thereafter.   It might even be likely that in transaction B, if there is more than one output that the smaller output is the change.  A person that spends funds shortly after receiving them will generally be spending all of the funds received or at least a large chunk of it.

jgarzik
Staff
Hero Member
*****
qt
Offline Offline

Activity: 1260


View Profile

Ignore
November 27, 2012, 08:51:40 PM
 #3

Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

It is intentionally randomized to make it harder to guess Smiley


Jeff Garzik, bitcoin core dev team and BitPay engineer; opinions are my own, not my employer.
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
kjj
Hero Member
*****
Offline Offline

Activity: 1022


Bitcoin Foundation - Lifetime Member


View Profile

Ignore
November 27, 2012, 09:18:34 PM
 #4

3. (Assuming no anonymity-targeted coin selection): The change must be lower in value than all inputs (otherwise that input would not have been selected). This is especially useful for transactions where multiple coins are combined.

This is a decent assumption most of the time, but once every few weeks, someone pops on IRC complaining that the coin selector overselected.  Most recent was, if I recall, something like 0.5 + 0.2 + 0.3 + 0.01 -> 1.0 + 0.01

On the topic of tracking in general, I would just say, "Welcome to the arms race".  The raw transaction API lets crazy people like me do silly things like create all transactions with 2 to 5 outputs of nearly equal size.  The next step will be a public mix-sender.

p2pcoin: a USB/CD/PXE p2pool miner - 1N8ZXx2cuMzqBYSK72X4DAy1UdDbZQNPLf - todo
I routinely ignore posters with paid advertising in their sigs.  You should too.
etotheipi
Hero Member
*****
expert
Offline Offline

Activity: 1036


Core Armory Developer


View Profile WWW

Ignore
November 27, 2012, 10:35:32 PM
 #5

Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?


I spent a lot of time implementing coin selection in Armory.  A few of your assumptions will be incorrect if the tx was created with Armory.  Here's how Armory creates transactions:

(1) Coin-selection-evaluation method.   This is a method that scores a coin-selection based on various factors:  How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below).  All factors are scored between 0 (unfavorable) and 1 (favorable).

(2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures.  Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation).  

(3) Score all the solutions in part (2) with the eval method in part (1).  Default weights are hard-coded into armoryengine.py, but the user could modify them.  It defines how important each factor is:  most important is not using zero-conf.  Next is whether the transaction can be free (not unrelated to zero-conf).  Then number of input addresses linked.  Then tx priority, tx size, and output anonymity.  However, the user is free to make output anonymity most important (I do!  but only for fun, not because I actually care Smiley).

(4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled.  Transaction is signed and broadcast.

The parts you care about:  

(1)  If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude.  It may help clean up dust, it may help anonymity (but no guarantee for either).  But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order.  This violates your assumption that the change address is always smaller.  Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone.  So even without Armory's technique, you'll see that assumption violated with any client.

(2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this Smiley)

(3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient.  Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output.  As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83).  It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip:  (12.4, 10.0).  i.e. deceptively-sized outputs is favored.  

How often does (3) trigger?  I don't know.  I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included.  But if it ever happens, then your assumption about sig figs would be violated, as well.

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
keystroke
Hero Member
*****
Offline Offline

Activity: 499



View Profile

Ignore
November 29, 2012, 04:51:50 AM
 #6

Given a transaction, what methods exist to detect which of the outputs is most likely to be the change?

How do different clients choose the order of the outputs? Assuming no order information can be used, what kind of other heuristics can be used?


I spent a lot of time implementing coin selection in Armory.  A few of your assumptions will be incorrect if the tx was created with Armory.  Here's how Armory creates transactions:

(1) Coin-selection-evaluation method.   This is a method that scores a coin-selection based on various factors:  How many unique addresses are linked on the input side, the size of the transaction, the required fee, and... the output anonymity of the transaction (explained below).  All factors are scored between 0 (unfavorable) and 1 (favorable).

(2) I generate a dozen deterministic coin selection methods that accommodate different, specific wallet structures.  Those are added to the list. Then I generate some semi-random solutions (deterministic solutions like above, but with some perturbation).  

(3) Score all the solutions in part (2) with the eval method in part (1).  Default weights are hard-coded into armoryengine.py, but the user could modify them.  It defines how important each factor is:  most important is not using zero-conf.  Next is whether the transaction can be free (not unrelated to zero-conf).  Then number of input addresses linked.  Then tx priority, tx size, and output anonymity.  However, the user is free to make output anonymity most important (I do!  but only for fun, not because I actually care Smiley).

(4) A new address is retrieved from the wallet, a change TxOut is added to the recipient list, then the recipient list is shuffled.  Transaction is signed and broadcast.

The parts you care about:  

(1)  If the transaction is small relative to the wallet balance, it a couple of the deterministic solutions are approx 2x the output value and 3x, so that change and recip are approximately the same or at least same order of magnitude.  It may help clean up dust, it may help anonymity (but no guarantee for either).  But when the tx amount is small relative to the wallet balance, you will frequently get equal or reversed size outputs, in random order.  This violates your assumption that the change address is always smaller.  Even if Armory didn't do it, a wallet with one very large TxOut, must have a large change output if you want to send only a small amount to someone.  So even without Armory's technique, you'll see that assumption violated with any client.

(2) Recipients are always shuffled, though only with python RNG (don't need cryptographic RNG for this Smiley)

(3) The fun part of Armory's "output anonymity" scoring is that it actually takes into account the number of significant digits in both change output and recipient.  Even though (recip, change) = (12.4, 11.8320385) are the same order of magnitude, it's pretty obvious which one is change which is output.  As such, I actually count the number of significant digits, and give the selection a higher score if it's closer, like (12.4, 11.83).  It gets an extra bonus -- a score actually above 1.0 -- if it finds a solution in which the change address has less sigfigs than the recip:  (12.4, 10.0).  i.e. deceptively-sized outputs is favored.  

How often does (3) trigger?  I don't know.  I've tested the evaluation code, though, and it definitely scores those solutions higher, but I don't know how often such deceptive solutions are ever found/included.  But if it ever happens, then your assumption about sig figs would be violated, as well.


Awesome.

We are participating in one of the greatest experiments in history.
etotheipi
Hero Member
*****
expert
Offline Offline

Activity: 1036


Core Armory Developer


View Profile WWW

Ignore
December 02, 2012, 05:05:56 AM
 #7

By the way, I just implemented coin-control in Armory (not released yet, but soon).  And I discovered a good use for it:  cleaning up BTC crumbles in my wallet.  For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance:  the integer part goes back to me, the non-integer part gets donated to another project.   Then my wallet is a ton cleaner (its balance is 28.15 instead of 28.37302811).  I just started doing this with each of my wallets and it made me think of this thread:

(1) The recipient of my transaction is being sent an output with 8 decimal places, the change has zero decimal places -- the exact opposite of what you would expect of a normal transaction
(2) Future transactions will be much cleaner in terms of decimal places, because when I pay someone 10.1 BTC, my change is 18.05 BTC, which is fairly indistinguishable.  
(3) Not that it matters to determining change, but I think it's good for the network (albeit, very mildly) to be sweeping up the crumbs like that.

I suspect, when I do release coin control, there will be other users doing the same thing, further complicating anyone's efforts to determine what is the correct change address. 

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
casascius
Mike Caldwell
VIP
Hero Member
*
Offline Offline

Activity: 1204


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


View Profile WWW

Ignore
December 02, 2012, 05:14:33 AM
 #8

For instance, I just select all addresses that have balances with more than 2 decimal places, then I create a transaction sending their combined balance:  the integer part goes back to me, the non-integer part gets donated to another project.

Funny I'm not the only one who does this.  I normally work mostly with whole bitcoins (since my lowest physical coin is 1 BTC anyway), and then I use coin crumbles to make paper banknote wallets worth 0.1 BTC and carry them around to leave in tip jars etc.

Anyway, one useful (but less useful in a deterministic sense) heuristic when following the blockchain is to try and correlate behavior as well.  If I notice consistent behavior following a chain one way, it's probably the same person.  This is what led to me providing what was later determined to be a critical clue as to Goat's lost 400 BTC a while back: I notice that if I followed the chain a certain way I could see a pattern of a sender who frequently sent bitcoins in amounts like 3.995 1.995 8.995 which led me to suggest that it was a service that charged a 0.005 BTC transaction fee and took it out of the transaction amount.

Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper wallets instead.
Hal
VIP
Sr. Member
*
expert
Offline Offline

Activity: 314



View Profile

Ignore
December 14, 2012, 01:43:57 AM
 #9

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:

// Insert change txn at random position:
vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size());
wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));

The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.

Hal Finney
MoonShadow
Hero Member
*****
Online Online

Activity: 1358



View Profile

Ignore
December 14, 2012, 02:00:06 AM
 #10

I wonder if having a choice in the automatic selection methods would be beneficial as well.  Consider this scenario.

Say I have a full desktop client, as well as a thin hardware client (think bitcoincard) that syncs with the desktop client on a regular basis.  So I want my desktop client to automaticly create simple transactions that the thin hardware client can use easily, and preferablely in combinations that don't have a change address at all.  (Thin hardware clients most likely would have only one, or only a few, addresses to work with, change addresses would be trivial to identify if they go back to the sending address)

So what I would want my desktop client to do is to regularly produce transactions wherein I have two change transactions; one for the thin client, in the useful denominations, and one with the true change back to the desktop client.  One time, the thin client might get a .01 transaction, another time it might get .02, and another .05; up through .1|.2|.5|1|2|5 so that it would have a selction of transactions that would permit it to combine a spending transaction to any exact amount, or almost any exact amount.  A full client could also use this same technique to, occasionally, produce a changeless transaction as well.  I'm not sure if this would be useful or not, but I would guess that it might complicate 'taint' tracking systems.

"The powers of financial capitalism had another far-reaching aim, nothing less than to create a world system of financial control in private hands able to dominate the political system of each country and the economy of the world as a whole. This system was to be controlled in a feudalist fashion by the central banks of the world acting in concert, by secret agreements arrived at in frequent meetings and conferences. The apex of the systems was to be the Bank for International Settlements in Basel, Switzerland, a private bank owned and controlled by the world's central banks which were themselves private corporations. Each central bank...sought to dominate its government by its ability to control Treasury loans, to manipulate foreign exchanges, to influence the level of economic activity in the country, and to influence cooperative politicians by subsequent economic rewards in the business world."

- Carroll Quigley, CFR member, mentor to Bill Clinton, from 'Tragedy And Hope'
Gavin Andresen
Hero Member
*****
qt
Offline Offline

Activity: 1330


Chief Scientist


View Profile WWW

Ignore
December 14, 2012, 02:30:16 AM
 #11

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?

I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh).


Will I see you in Amsterdam?
  http://bitcoin2014.com/
keystroke
Hero Member
*****
Offline Offline

Activity: 499



View Profile

Ignore
December 14, 2012, 07:26:04 AM
 #12

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?

I introduced that bug with the 'sendmany' command two years ago (commit b9d1ed85). This is why programmers should not be trusted to test their own code (I probably carefully tested to make sure the change position looked random when I send to more than one destination, and never tested the degenerate send-to-one case; sigh).



Oh wow, so the change was always the first output if you send bitcoins to 1 address? Awesome find. But yea.. how did we miss it.

We are participating in one of the greatest experiments in history.
BkkCoins
Hero Member
*****
Offline Offline

Activity: 784


firstbits:1MinerQ


View Profile WWW

Ignore
December 14, 2012, 08:45:00 AM
 #13

The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.
Even with many outputs it would always not be the last one. Would it be useful as well to randomly split change into more than one address when above some value? To further complicate predictability.

Meni Rosenfeld
Donator
Hero Member
*
expert
Offline Offline

Activity: 1148



View Profile WWW

Ignore
December 14, 2012, 08:52:12 AM
 #14

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?
Maybe because all this time nobody made an analysis of the Bitcoin transaction graph which prompted an investigation into ways to determine common ownership of addresses Smiley. So you can thank Adi and Dorit for that.

Also: http://dilbert.com/strips/comic/2001-10-25/. Many of us have looked at transactions in Blockexplorer, assuming change should be in a random location, but without paying special attention to this, there's hardly any smoking gun saying "This isn't random".

1EofoZNBhWQ3kxfKnvWkhtMns4AivZArhr   |   Who am I?   |   bitcoin-otc WoT
Bitcoil - Exchange bitcoins for ILS (thread)   |   Israel Bitcoin community homepage (thread)
Analysis of Bitcoin Pooled Mining Reward Systems (thread, summary)  |   PureMining - Infinite-term, deterministic mining bond
elux
Hero Member
*****
Offline Offline

Activity: 868



View Profile

Ignore
December 14, 2012, 11:07:56 AM
 #15

I think Hal's donation wallet address has a flaw. The problem is that balance() returned 0.

I added one bitcoin, but balance()+1 doesn't seem like enough.

Details here:
http://blockchain.info/address/1HALookLvrqrn2kxHhXCnwDgknD3RXifhH
http://blockchain.info/address/157i5gK7iN4bNAN39Ahuoiq6Tx5TaQukTE

I think it should be size()+1.

Seeing this made me really happy.
molecular
Donator
Hero Member
*
Offline Offline

Activity: 1176



View Profile

Ignore
December 14, 2012, 11:44:52 AM
 #16

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:

// Insert change txn at random position:
vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size());
wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));

The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.

Holy moly! That means we can get a much better approximation of transaction volume Wink

gmaxwell
Moderator
Hero Member
*
qt
Offline Offline

Activity: 1078


View Profile

Ignore
December 14, 2012, 11:51:26 AM
 #17

Maybe because all this time nobody made an analysis of the Bitcoin transaction graph which prompted an investigation into ways to determine common ownership of addresses Smiley. So you can thank Adi and Dorit for that.
More like: the fact that their analysis didn't turn that up further confirms how shallow and ineffective the research was.  If only there were real research happening it might have been discovered more quickly!  (Of course, many other people did do things with the transaction graph without noticing this too— see for example blockchain.info's taint tracking service. So they were hardly alone in not noticing it)
Sergio_Demian_Lerner
Sr. Member
****
expert
Offline Offline

Activity: 467


View Profile WWW

Ignore
December 14, 2012, 01:35:58 PM
 #18

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw:

// Insert change txn at random position:
vector<CTxOut>::iterator position = wtxNew.vout.begin()+GetRandInt(wtxNew.vout.size());
wtxNew.vout.insert(position, CTxOut(nChange, scriptChange));

The problem is that size() is one in the common case of one payee, so GetRandInt will always return 0.The change ends up in the first output.

I think it should be size()+1.

Excellent catch Hal!!!!.
It's clear that is very important that programmers/researchers of the community, apart from the core developers, review the code.

Some time ago I added to the Weaknesses page of the Bitcoin wiki a section "Security Vulnerabilities and bugs" (https://en.bitcoin.it/wiki/Weaknesses#Security_Vulnerabilities_and_bugs) regarding the impact of bugs. I will update to reflect this one.

SgtSpike
Hero Member
*****
Offline Offline

Activity: 1106


Firstbits: 18tkn


View Profile WWW

Ignore
December 14, 2012, 04:34:02 PM
 #19

When will we have a fix for this?

Rassah
Hero Member
*****
Offline Offline

Activity: 1064


Director of Bitcoin100


View Profile

Ignore
December 14, 2012, 06:31:24 PM
 #20

bitcoin-qt tries to randomize the position of the change output, but I believe the code has a flaw...
Ay carumba, how did we not notice that for over two years?


Because we're still in Beta  Grin

Pages: [1] 2  All
  Print  
 
Jump to:  

Sponsored by , a Bitcoin-accepting VPN.
Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!