Bitcoin Forum
Topic: Transaction metadata (do we need an OP_DROP transaction type?)
Gavin Andresen (OP)
Chief Scientist


September 10, 2012, 06:25:20 PM
 #1

Hoisted from the comments on pull request 1809:

Quote
Only data that is strictly necessary for the world to validate a transaction has a place in the block chain. That's the whole point of it. Everything that is only significant to the sender and receiver (or miner) should be between the sender and receiver (or miner) and doesn't need to be stored forever by every other full node in the world.

I think there are definitely use-cases for associating some immutable meta-data with a transaction. Example: a bitcoin client that gave a unique refund address for every outgoing transaction, and automatically groups refund transactions together with the original payment transactions.

Somebody could create a service that associates data with transaction ids, but they need to do more work to make the data immutable... and it is not clear to me how you make that secure.

I really want my refund address to be 'baked in' to the transaction that I sign, so if the transaction is accepted into the block chain I know there hasn't been some hacker somewhere who managed to rewrite the refund address so they get my coins.

If I'm doing some type of smart contract with bitcoin transactions, I want the contract data baked in and covered by the transaction signature. And the person I'm transacting with would like to be sure I can't change the terms of the contract once the transaction is signed.

It seems to me the simplest, most straightforward, and secure way to do that is with a limited-data OP_DROP transaction type. The data in the blockchain is (transaction+HASH(metadata)), and that is what is signed.  The actual metadata can be stored outside the blockchain and looked up (and verified) by hash (hand-wave, hand-wave, I have no idea how that happens, if there is more than one place that stores transaction metadata, etc).
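To make the hand-waved lookup-and-verify step a bit more concrete, here is a rough sketch. It assumes HASH is SHA-256 and uses an in-memory dict as a stand-in for whatever off-chain store ends up holding the metadata; the function names and the example metadata are made up for illustration.

```python
import hashlib

# Hypothetical off-chain store: maps HASH(metadata) -> metadata bytes.
metadata_store = {}


def commit(metadata: bytes) -> bytes:
    """Store metadata off-chain and return the 32-byte hash that would be
    pushed (and dropped) inside the signed transaction."""
    digest = hashlib.sha256(metadata).digest()
    metadata_store[digest] = metadata
    return digest


def lookup_verified(digest: bytes) -> bytes:
    """Fetch metadata by hash and reject it if it does not match the
    commitment taken from the transaction."""
    metadata = metadata_store[digest]
    if hashlib.sha256(metadata).digest() != digest:
        raise ValueError("metadata does not match on-chain hash")
    return metadata


onchain_hash = commit(b'{"refund_address": "example-address"}')
assert lookup_verified(onchain_hash) == b'{"refund_address": "example-address"}'

# A tampered store entry is detected because its hash no longer matches.
metadata_store[onchain_hash] = b'{"refund_address": "attacker-address"}'
try:
    lookup_verified(onchain_hash)
except ValueError as err:
    print("rejected:", err)
```

The store itself does not need to be trusted in this model: anyone holding the transaction can check whatever the store returns against the signed hash.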

Any scheme that tries to move the HASH(metadata) outside the transaction signature recorded in the blockchain will, at the very least, be more complicated. And, therefore, very likely to be less secure.

Am I missing some other simple, secure, decentralized, non-blockchain scheme for attaching metadata to transactions?

How often do you get the chance to work on a potentially world-changing project?
Stefan Thomas
AKA: Justmoon


September 10, 2012, 06:56:35 PM
 #2

Quote
Am I missing some other simple, secure, decentralized, non-blockchain scheme for attaching metadata to transactions?

Maybe. Off the top of my head (sorry in advance if I'm missing something obvious):

Currently, the way we handle metadata in Bitcoin is that the metadata is transferred to the recipient, who then replies with a uniquely generated address. Once the sender makes the payment, the metadata can be determined by the address.

In theory, this already allows secure association of arbitrary metadata with a Bitcoin transaction. The problem is that we need to contact the recipient *before* we make the transaction, which doesn't work well for many use cases.

So how about this.

The recipient publishes their public ECDSA point P.

A sender generates a JSON metadata object M and calculates its hash e = SHA256(M). The sender then calculates a new public point PM = P * e. Next, the sender creates a transaction sending the money to the address RIPEMD160(SHA256(PM)). Finally, he transmits M to the recipient through a secure channel - this could be sent directly via HTTPS, encrypted email, etc., or perhaps left as a message in a DHT, encrypted with ECDH and the recipient's public point P as the key.


What properties does such a scheme have?

The recipient is committed to one set of metadata M for this transaction unless they can find a SHA256 collision. As long as the metadata object is kept private, no one else can determine the relationship between the public point P and the transaction-specific point PM. The recipient does not need to be always-online.
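Here is a rough sketch of the key derivation, in case it helps to see it run. It is plain-Python secp256k1 arithmetic, far too slow and side-channel-prone for real use. The helper names (ec_add, ec_mul), the toy private key and the example metadata are all assumptions made for illustration, and the address-hashing step is left out.

```python
import hashlib

# secp256k1 domain parameters: field prime, group order, generator point.
P_FIELD = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # a + (-a)
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    y3 = (lam * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


# Recipient: publishes P once (toy private key, illustration only).
recipient_priv = 0xC0FFEE
P = ec_mul(recipient_priv, G)

# Sender: needs only P and the metadata M.
M = b'{"order": 42, "refund": "example-address"}'
e = int.from_bytes(hashlib.sha256(M).digest(), "big") % N_ORDER
PM = ec_mul(e, P)

# Recipient: after receiving M, derives the matching private key.
derived_priv = recipient_priv * e % N_ORDER
assert ec_mul(derived_priv, G) == PM
```

The last assertion is the whole trick: the sender can compute PM without any private key, while only the holder of the private key for P can compute the key that spends from PM.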

Twitter: @justmoon
PGP: D16E 7B04 42B9 F02E 0660  C094 C947 3700 A4B0 8BF3
Gavin Andresen (OP)
September 10, 2012, 08:34:53 PM
 #3

Productive discussion in IRC today:
   http://bitcoinstats.com/irc/bitcoin-dev/logs/2012/09/10#l4463724

Executive summary: Good idea, Stefan.  Lots of details to be worked out...

How often do you get the chance to work on a potentially world-changing project?
Mike Hearn
September 10, 2012, 10:14:21 PM
 #4

Doing tricks with the public key doesn't solve the bond market cases where you want to explicitly advertise what the transaction "is" (beyond a Bitcoin payment).

I think the concerns about bloat are overblown for the use cases I've outlined. The chances of many or most transactions having additional data in them seem remote. For things like return addresses, associated messages, etc., there are almost always better ways to do it - like a payment protocol, or for person-to-person transactions just attaching a nice simple protocol buffer to an email that contains:

Code:
message Payment {
  required bytes tx = 1;   // Has no outputs, signed with SIGHASH_NONE
  optional bytes refund_scripts = 2;
  optional string message = 3;
}

would do the trick.

As I said before, for use cases like bond markets and pay to policy outputs, having the extra data in the transaction has some benefits. Bitcoin already compromises on storage costs for reasons of simplicity and usability where it makes sense (addresses for ease of typing, checkmultisig rather than secret sharing for threshold signatures).

I think Gregory is primarily arguing that if we add this capability, the majority use case will not be smart property or other protocols with fairly subtle double-binding issues, but just pointless graffiti. And that's a real concern. Bond markets and other things are pretty exotic, I'd totally support a request to really nail payment protocols beforehand.
ByteCoin
September 11, 2012, 12:23:45 AM
 #5

Quote
I think there are definitely use-cases for associating some immutable meta-data with a transaction.

I'm posting in part because I felt my ears burning on #bitcoin-dev.

Does everything that is broadcast have to be incorporated into the block chain? I suggested a long time ago that the signatures not be part of the hash so that the signature data could be pruned out.

There are two issues associated with including OP_DROP data. One is the amount of data stored in the block chain for eternity and the other is the amount of data that has to propagate through the network right now. These issues are not the same and don't suffer the same constraints. I believe the current system smooshes them together to make things 'simpler'. I don't think this is a good idea.

ByteCoin
Stefan Thomas
September 11, 2012, 03:48:14 AM
 #6

Quote
Doing tricks with the public key doesn't solve the bond market cases where you want to explicitly advertise what the transaction "is" (beyond a Bitcoin payment).

Well, at the very least it would be a size optimization.

Take for example the "BOND" message:

Quote
"BOND" <hash of bond message> DROP DROP <issuer pubkey> CHECKSIG

Can be reduced to:

Quote
"BOND" DROP <message pubkey> CHECKSIG

Where <message pubkey> = <issuer pubkey> * <hash of bond message>.

The only necessary change to the bond network protocol is that the Issuer object must (rather than "may") contain the issuer pubkey.

You might say that now we cannot derive the hash of the bond message from the transaction if we haven't seen the corresponding Bond message yet. But if that's the case the hash of the Bond message wouldn't tell us anything anyway, so we don't lose anything. For the hashmaps we can use the message pubkey as the key, which can be derived from both the transaction and the Bond message, so indexing and lookups in either direction are still possible.

All the "BOND" string really does is alert us to the fact that we need to check this transaction against our index. In other words, it is there for performance reasons only, so whether it is necessary is really a practical/implementation concern, weighing blockchain bloat against bond node performance. Given that it's just six bytes though it seems certainly worth it.

If we want to really minimize the size of the output script, we can even use the hash of the message pubkey instead of the message pubkey itself.

Quote
"BOND" DROP DUP HASH160 <msgPubKeyHash> EQUALVERIFY CHECKSIG

I haven't looked at it in-depth, but it seems the same optimizations apply to the "POLICY" transaction, although I think a new broadcast and a new hashmap would have to be added to the bond network.
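To put rough numbers on the three BOND output scripts above, here is a small sketch that serializes them and compares sizes. The opcode byte values are the standard Bitcoin script ones; the push/script helpers and the all-zero placeholder hash and keys are assumptions for illustration, and the pubkey is taken to be a 65-byte uncompressed one.

```python
OP_DROP, OP_DUP, OP_HASH160 = 0x75, 0x76, 0xA9
OP_EQUALVERIFY, OP_CHECKSIG = 0x88, 0xAC


def push(data: bytes) -> bytes:
    """Direct push for 1..75 bytes: a length byte followed by the data."""
    if not 1 <= len(data) <= 75:
        raise ValueError("sketch only handles direct pushes")
    return bytes([len(data)]) + data


def script(*items) -> bytes:
    """Serialize opcodes (ints) and data pushes (bytes) into one script."""
    return b"".join(bytes([i]) if isinstance(i, int) else push(i) for i in items)


bond_hash = bytes(32)          # placeholder for SHA256 of the bond message
pubkey = b"\x04" + bytes(64)   # placeholder 65-byte uncompressed pubkey
pubkey_hash = bytes(20)        # placeholder HASH160 of the message pubkey

original = script(b"BOND", bond_hash, OP_DROP, OP_DROP, pubkey, OP_CHECKSIG)
reduced = script(b"BOND", OP_DROP, pubkey, OP_CHECKSIG)
hashed = script(b"BOND", OP_DROP, OP_DUP, OP_HASH160, pubkey_hash,
                OP_EQUALVERIFY, OP_CHECKSIG)

for name, s in [("original", original), ("reduced", reduced), ("hashed", hashed)]:
    print(name, len(s), "bytes")
```

Under these assumptions the scripts come out at 107, 73 and 31 bytes, and the "BOND" marker plus its DROP accounts for six bytes in each.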

Twitter: @justmoon
PGP: D16E 7B04 42B9 F02E 0660  C094 C947 3700 A4B0 8BF3
Mike Hearn
September 11, 2012, 03:59:42 PM
Last edit: September 11, 2012, 09:34:56 PM by Mike Hearn
 #7

Quote
Does everything that is broadcast have to be incorporated into the block chain? I suggested a long time ago that the signatures not be part of the hash so that the signature data could be pruned out.

Signatures are already not part of the hash; otherwise it would be impossible to sign a transaction. See the definition of SignatureHash in the code. (Edit: sorry ByteCoin, I think you meant transaction hash, not signature hash, right? I think the fact that the block chain is entirely self-validating is pretty important though!)

Stefan - excellent suggestion. That's a neat way to do things. To sign for the control output then, the current bond owner would have to calculate owner privkey*bond record hash too to make it match the pubkey, is that right?
Stefan Thomas
September 11, 2012, 08:32:03 PM
 #8

Quote
Stefan - excellent suggestion. That's a neat way to do things. To sign for the control output then, the current bond owner would have to calculate owner privkey*bond record hash too to make it match the pubkey, is that right?

Yep.

--

One more point that is going to be obvious to those comfortable with elliptic curve math, but bears writing down: It's important that the base pubkey is captured in the hash, otherwise the scheme becomes ambiguous. I'll use the bond message case as an example.

Let's say you have a message pubkey M. It was calculated from issuer public key P1 and Bond message hash b1 as M = P1 * b1.

Now I'm an evil attacker and I want to create another pair P2, b2 that also results in M. What I can do is choose an arbitrary Bond message, calculate its hash b2 and then calculate P2 = M * b2^-1. Obviously I don't have the corresponding private key, but having a valid pair P2, b2 might be enough to cause problems depending on the use case. What prevents this attack is the fact that the Bond message contains the pubkey. If I try to enter P2 into the bond message, its hash changes and P2 is no longer correct. To make the message valid I would have to find a SHA256 collision, i.e. a Bond message where I've inserted P2, but that results in the same hash b2.
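A rough sketch of the ambiguity, for anyone who wants to try it. It is plain-Python secp256k1 arithmetic and not suitable for real use. The helper names, the toy private key, the message strings and the way the pubkey is appended to the message are all assumptions for illustration. The inverse of b2 is taken modulo the group order.

```python
import hashlib

# secp256k1 domain parameters: field prime, group order, generator point.
P_FIELD = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
N_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def msg_hash(msg: bytes) -> int:
    """SHA256 of a message, reduced to a scalar."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N_ORDER


def encode(point) -> bytes:
    """Fixed-width encoding of a point, used to embed it in a message."""
    return point[0].to_bytes(32, "big") + point[1].to_bytes(32, "big")


P1 = ec_mul(0xC0FFEE, G)              # issuer pubkey (toy private key)
b1 = msg_hash(b"bond terms A")
M = ec_mul(b1, P1)                    # message pubkey

# Attack when the message does NOT contain the pubkey:
b2 = msg_hash(b"bond terms B")
P2 = ec_mul(pow(b2, -1, N_ORDER), M)  # P2 = M * b2^-1
assert ec_mul(b2, P2) == M            # (P2, b2) also maps to M

# If the message must contain the pubkey, inserting P2 changes the hash,
# so the forged pair no longer lands on M.
b2_with_key = msg_hash(b"bond terms B" + encode(P2))
assert ec_mul(b2_with_key, P2) != M
```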

Twitter: @justmoon
PGP: D16E 7B04 42B9 F02E 0660  C094 C947 3700 A4B0 8BF3
Mike Hearn
September 11, 2012, 09:26:26 PM
 #9

Thanks for the explanation. I had to look up some notation/take a quick ECC tutorial (again) :-) but now I think I understand it.

For those following along b^-1 means the modular inverse of b, that is xb mod p = 1 where x is the solution and p is defined by secp256k1 as the prime in which all modular operations take place.
ByteCoin
September 12, 2012, 12:24:24 AM
 #10

Quote
Let's say you have a message pubkey M. It was calculated from issuer public key P1 and Bond message hash b1 as M = P1 * b1.

Now I'm an evil attacker and I want to create another pair P2, b2 that also results in M. What I can do is choose an arbitrary Bond message, calculate its hash b2 and then calculate P2 = M * b2^-1. Obviously I don't have the corresponding private key but having a valid pair P2, b2 might be enough to cause problems

Whoever holds the private key for P1 can easily calculate the private key for P2. I haven't been following your scheme but I presume that's the issuer. If I catch on correctly then the issuer could misrepresent some information about the bond, saying that the issuer public key was actually P2.

Quote
For those following along b^-1 means the modular inverse of b, that is xb mod n = 1 where x is the solution and n is defined by secp256k1 as the prime in which all modular operations take place.

The "modular operations" use the prime p, but for the above calculation you should use the group order n, which is a somewhat smaller prime.
Code:
p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
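A quick scalar-only sketch of why the modulus matters, using the two constants above. Multiplying a point by a scalar only depends on the scalar modulo n, so private keys can be checked without any curve arithmetic. The toy private key d1 and the two message strings are made up for illustration.

```python
import hashlib

p = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def msg_hash(msg: bytes) -> int:
    """SHA256 of a message, reduced to a scalar modulo the group order."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n


d1 = 0xC0FFEE                      # toy private key for P1
b1 = msg_hash(b"bond terms A")
b2 = msg_hash(b"bond terms B")

# The inverse modulo n undoes multiplication of a private key by b2 ...
assert d1 * b2 * pow(b2, -1, n) % n == d1
# ... while the inverse modulo p does not.
assert d1 * b2 * pow(b2, -1, p) % n != d1

# Whoever holds d1 can compute the private key for P2 = M * b2^-1,
# because M has private key d1 * b1.
d2 = d1 * b1 * pow(b2, -1, n) % n
assert d2 * b2 % n == d1 * b1 % n  # P2 * b2 and P1 * b1 are the same point M
```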


Quote
I think you meant transaction hash not signature hash, right? I think the fact that the block chain is entirely self validating is pretty important though!

Yes, that's what I meant. The block chain is still entirely self-validating if the transaction hash doesn't include the signature as long as you bother to store the signature. At the moment though you HAVE to store the signatures. Can anyone propose a remotely plausible scenario in which we would regret not hashing the signatures?

ByteCoin
gmaxwell
September 12, 2012, 02:41:28 AM
Last edit: September 12, 2012, 02:53:39 AM by gmaxwell
 #11

Quote
Yes that's what I meant. The block chain is still entirely self-validating if the transaction hash doesn't include the signature as long as you bother to store the signature. At the moment though you HAVE to store the signatures. Can anyone propose a remotely plausible scenario in which we would regret not hashing the signatures?

You don't have to store the signatures; you have to store the signatures OR the txid. Ultraprune stores the txid but not the signatures in its coins database (which is the only thing it uses for validation, so this is not just possible but implemented). Because maximally efficient pruning always instantly prunes the signatures, and prunes per txout instead of per transaction, you're going to have to store the txid anyway.
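A toy model of that idea, to show the shape of the data. This is not ultraprune's actual database format, and the type and function names are invented for illustration. Unspent outputs are keyed by (txid, index), scriptSigs are looked at during validation and then thrown away, and script checking itself is left out.

```python
from typing import Dict, List, NamedTuple, Tuple


class TxOut(NamedTuple):
    value: int
    script_pubkey: bytes


class TxIn(NamedTuple):
    prev_txid: bytes
    prev_index: int
    script_sig: bytes  # needed to validate the spend, never stored afterwards


Coins = Dict[Tuple[bytes, int], TxOut]


def apply_tx(coins: Coins, txid: bytes, inputs: List[TxIn],
             outputs: List[TxOut]) -> None:
    """Remove the txouts a transaction spends and add the ones it creates."""
    for txin in inputs:
        # Real validation would run script_sig against the stored
        # script_pubkey here; the sketch only checks the coin exists.
        del coins[(txin.prev_txid, txin.prev_index)]
    for index, out in enumerate(outputs):
        coins[(txid, index)] = out


coins: Coins = {}
apply_tx(coins, b"\xaa" * 32, [], [TxOut(50, b"script-a"), TxOut(25, b"script-b")])
apply_tx(coins, b"\xbb" * 32, [TxIn(b"\xaa" * 32, 0, b"sig")], [TxOut(50, b"script-c")])
print(sorted((txid.hex()[:4], index) for txid, index in coins))
```

Because the second output of the first transaction is still unspent after the first has been spent, its txid has to stay around as part of the key even though the signatures are gone.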

This is why I was loudly beating the drum up-thread, saying that any OP_DROP stuff, if it exists at all, should be in the scriptsig and not the scriptpubkey... scriptsigs are always instantly prunable.

Unless you were talking about the data you need to initialize another node that doesn't trust you completely, in which case you can't discard the signatures no matter what is hashed.


ByteCoin
September 13, 2012, 09:35:42 PM
 #12

Quote
Unless you were talking about the data you need to initialize another node that doesn't trust you completely, in which case you can't discard the signatures no matter what is hashed.

Why would the other node have to trust you completely to initialize itself under my scheme?
Can you come up with a remotely plausible scheme in which anyone would regret us excluding the signature from the hash that generates the transaction id?

ByteCoin
Stefan Thomas
September 13, 2012, 09:38:14 PM
 #13

Quote
Whoever holds the private key for P1 can easily calculate the private key for P2. I haven't been following your scheme but I presume that's the issuer. If I catch on correctly then the issuer could misrepresent some information about the bond, saying that the issuer public key was actually P2.

Very true, I hadn't thought of that. So yeah, that underlines the point that you need to have the pubkey contained in the hash (and confirm it matches, of course) to prevent this class of attack.

Twitter: @justmoon
PGP: D16E 7B04 42B9 F02E 0660  C094 C947 3700 A4B0 8BF3
Peter Todd
September 17, 2012, 12:33:18 AM
 #14

This has probably been brought up before, somewhere, but Gavin's suggestion is to add a new standard transaction of the following type:

Code:
..64-or-fewer bytes.. OP_DROP ..n.. ..pubkeys.. ..m.. OP_CHECKMULTISIG

So what's stopping someone from using transactions like the following as an interim measure?

Code:
1 pubkey data1 (data2) 2/3 OP_CHECKMULTISIG

That stores 64 or 128 bytes encoded in the public key(s), or even just 32 bytes with a compressed public key, while still having one real public key that you have a private key to so that you can spend the output of the transaction later. (any 1 of the 3 signatures can be used to spend) The encoded data can still be in the clear as well, for instance one of the two public keys can easily be "BOND", plus padding. (I assume random padding would be required to ensure the secret key can't be forced from an overly-simple public key)
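A rough sketch of the 1-of-2 variant, to show what such a script looks like on the wire. The opcode values are the standard ones; the helper names, the all-zero placeholder for the real public key, and the choice of zero padding are assumptions made so the round trip stays deterministic. Random padding, as suggested above, would need the data length carried somewhere as well.

```python
OP_1, OP_2, OP_CHECKMULTISIG = 0x51, 0x52, 0xAE


def push(data: bytes) -> bytes:
    """Direct push for 1..75 bytes: a length byte followed by the data."""
    if not 1 <= len(data) <= 75:
        raise ValueError("sketch only handles direct pushes")
    return bytes([len(data)]) + data


def data_as_pubkey(data: bytes) -> bytes:
    """Shape up to 64 bytes of data like a 65-byte uncompressed pubkey."""
    if len(data) > 64:
        raise ValueError("at most 64 bytes fit in one uncompressed key slot")
    return b"\x04" + data.ljust(64, b"\x00")


def one_of_two(real_pubkey: bytes, data: bytes) -> bytes:
    """Build: 1 <real pubkey> <data shaped as pubkey> 2 OP_CHECKMULTISIG."""
    return (bytes([OP_1]) + push(real_pubkey) + push(data_as_pubkey(data))
            + bytes([OP_2, OP_CHECKMULTISIG]))


def extract_data(script: bytes) -> bytes:
    """Read the 64 data bytes back out of the second key slot."""
    pos = 1                      # skip OP_1
    pos += 1 + script[pos]       # skip the real pubkey push
    length = script[pos]
    fake_key = script[pos + 1:pos + 1 + length]
    return fake_key[1:]          # drop the 0x04 prefix


real_pubkey = b"\x02" + bytes(32)   # placeholder 33-byte compressed pubkey
script = one_of_two(real_pubkey, b"BOND")
# Stripping zero padding like this would also eat trailing zero bytes of data.
print(len(script), extract_data(script).rstrip(b"\x00"))
```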

You could also use a P2SH-style transaction, moving the data into the scriptSig:

scriptPubKey:
Code:
OP_HASH160 [20-byte-hash-value] OP_EQUAL

scriptSig:
Code:
 signature {1 pubkey data1 (data2) 2/3 OP_CHECKMULTISIG}

jgarzik
September 18, 2012, 12:54:46 AM
 #15

Quote
You could also use a P2SH-style transaction, moving the data into the scriptSig:

Using P2SH at least means the unspent txout is small, and does not carry the extra data.


Jeff Garzik, Bloq CEO, former bitcoin core dev team; opinions are my own.
Visit bloq.com / metronome.io
Donations / tip jar: 1BrufViLKnSWtuWGkryPsKsxonV2NQ7Tcj
Mike Hearn
September 18, 2012, 11:07:34 AM
 #16

You can't put the data into the scriptSig, for obvious reasons - the purpose of the OP_DROPd data is to announce that the output is special and needs special handling despite being unspent.
Peter Todd
September 18, 2012, 02:07:08 PM
 #17

Quote
You can't put the data into the scriptSig, for obvious reasons - the purpose of the OP_DROPd data is to announce that the output is special and needs special handling despite being unspent.

Sure you can. By putting it in the scriptSig we're announcing that the output of the transaction using that scriptSig is special and needs special handling despite being unspent.

Mike Hearn
September 18, 2012, 03:49:02 PM
 #18

Oh, I see what you mean now. That's a bit awkward but I suppose it could work. You would need to announce in the scriptSig which output you were actually marking as special.
Peter Todd
September 18, 2012, 07:09:11 PM
 #19

Quote
Oh, I see what you mean now. That's a bit awkward but I suppose it could work. You would need to announce in the scriptSig which output you were actually marking as special.

You don't even need to do that actually: outputs are numbered, so your standard can be that if there is more than one output, the first one is reserved for change. At worst you need to add some extra coins if what you are spending divides exactly.


Actually, this made me realize that for many applications any type of scriptPubKey message opens you to a nasty problem: replay attacks. Let's suppose your "issue a bond" protocol is that any coins from a transaction with the specially marked scriptPubKey are considered to be issued. Now anyone can create a transaction with an identically marked scriptPubKey, creating an output that looks like it's part of the bond issue. Obviously there are lots of ways to mitigate this but...

On the other hand a scriptSig-based issue is protected from this griefing, as only you can issue more bonds from that address as only you have the private key to create a valid scriptSig.

If you were to use my P2SH hack you'd have to go a step further in the protocol, as it's really the act of spending from the special address itself that communicates the information. So just say that the information is valid only if the last bit of the output amount, i.e. the one-satoshi bit, is 0. Now you can freely spend any transactions sent to your magic address without communicating anything, at worst by including an extra input to get you an odd number of satoshis.


Of course this is all irrelevant for something like timestamping, where what you're really trying to do is get some data into the merkle tree of a block.

Mike Hearn
September 19, 2012, 11:04:10 AM
 #20

Quote
Now anyone can create a transaction with an identically marked scriptPubKey, creating an output that looks like it's part of the bond issue.

I'm not so sure - the point of smart property is that the transaction is the property. The scriptPubKey isn't read in isolation, it's fundamentally a part of the containing transaction.

If you copied a scriptPubKey like that it would create a different bond that happened to share the same parameters as the first. It would show up under somebody else's identity, which would be confusing, yes, but the app that people use to trade bonds would notice the problem when it contacted the node in the bond advert and said "I want to buy the bond identified by output <hash>:index" and the node said "er, I never issued that". Contacting the origin node is a part of the protocol already (that's why the protobuf message has a URL in it).

It might open up interesting DoS attacks though.