Hi, I understand a little about how a deterministic Bitcoin wallet works. As far as I know, the master key can generate a practically unlimited number of different addresses, but these addresses cannot be linked back to a given wallet (= derivation principle).
From reading some press articles (particularly about companies specializing in on-chain data analysis), I understand that some are able to link addresses together and know, for example, that address x and address y come from the same wallet (i.e. from the same seed).
Is this possible? How could it be done?
Thanks
The most common tactic used by companies specializing in on-chain data analysis is known as the "common input ownership heuristic", which Satoshi describes in the "Privacy" section of the whitepaper:
As an additional firewall, a new key pair should be used for each transaction to keep them from being linked to a common owner. Some linking is still unavoidable with multi-input transactions, which necessarily reveal that their inputs were owned by the same owner. The risk is that if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.
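As a rough illustration of how an analyst might apply this heuristic, here is a minimal Python sketch that merges addresses into clusters whenever they appear together as inputs of one transaction. The transaction format, the `cluster_addresses` function, and the address names are all hypothetical; real analysis tools work on parsed blockchain data and combine several heuristics.

```python
from collections import defaultdict

def cluster_addresses(transactions):
    """Group addresses that are co-spent as inputs of the same transaction."""
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address, even if spent alone
        for other in inputs[1:]:
            union(inputs[0], other)  # assume one owner for all inputs

    clusters = defaultdict(set)
    for addr in parent:
        clusters[find(addr)].add(addr)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical transactions: only the input addresses matter here.
txs = [
    {"inputs": ["addr_x", "addr_y"]},
    {"inputs": ["addr_y", "addr_z"]},
    {"inputs": ["addr_q"]},
]
clusters = cluster_addresses(txs)
```

Here `addr_x` and `addr_z` never appear in the same transaction, yet they end up in one cluster because both were co-spent with `addr_y`. No seed or master key is involved: the link comes purely from on-chain spending behaviour.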
When viewing a transaction with multiple inputs, a generally accurate assumption is that all those inputs belong to the same wallet, even if those inputs come from different addresses. To break this tracking tactic, you can use a specially designed wallet that groups your inputs with other users' inputs in the same transaction, called a coinjoin.
Question: when address 1 (which holds for example 1 BTC) of a wallet A sends 0.8 BTC to address 1 of a wallet B, do the remaining 0.2 remain on address 1 of A or are they assigned to a new virgin address (let's call it 2) on wallet A?
Yes, in a properly designed wallet, the remaining 0.2 BTC change will be sent to a new virgin address in wallet A to increase privacy. In practice, however, the change output rarely gains much privacy, because its spending conditions usually match those of the inputs that funded the transaction. The change can therefore often be distinguished from the recipient's output, since wallet B's software will probably not share the exact same script type, lock time, version number, fee rate construction, and other fingerprints as the sender's wallet.
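To make the script-type fingerprint concrete, here is a small Python sketch of one way an observer could guess which output is the change. The `guess_change_output` function, the dictionary layout, and the example values are assumptions for illustration only; it checks just one fingerprint (script type), whereas real tools weigh several signals together.

```python
def guess_change_output(input_script_types, outputs):
    """Return the index of the likely change output, or None if ambiguous.

    An output is a change candidate when its script type matches a script
    type used by the transaction's inputs.
    """
    input_types = set(input_script_types)
    candidates = [
        index
        for index, output in enumerate(outputs)
        if output["script_type"] in input_types
    ]
    # Only commit to a guess when exactly one output matches.
    return candidates[0] if len(candidates) == 1 else None

# Hypothetical transaction: wallet A spends a P2WPKH input and pays
# wallet B, which uses a P2PKH address. Values are in satoshis.
tx_outputs = [
    {"value_sat": 80_000_000, "script_type": "p2pkh"},   # payment to B
    {"value_sat": 19_990_000, "script_type": "p2wpkh"},  # change back to A
]
change_index = guess_change_output(["p2wpkh"], tx_outputs)

# If both outputs use the same script type as the inputs, this
# fingerprint alone cannot tell them apart.
ambiguous = guess_change_output(
    ["p2wpkh"],
    [{"value_sat": 1, "script_type": "p2wpkh"},
     {"value_sat": 2, "script_type": "p2wpkh"}],
)
```

In the first case the second output matches the input's script type, so it is flagged as the probable change and can be added to wallet A's cluster even though it sits on a fresh address.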