Bitcoin Forum
May 12, 2024, 05:14:47 PM *
News: Latest Bitcoin Core release: 27.0 [Torrent]
 
   Home   Help Search Login Register More  
Pages: [1]
  Print  
Author Topic: Is it possible to keep non-transaction data out of a blockchain?  (Read 1254 times)
DeathAndTaxes (OP)
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
January 10, 2015, 07:42:57 PM
 #1

Is it possible to keep non-transaction data out of a blockchain? This doesn't necessarily need to be Bitcoin for the sake of the thread lets just consider a generic blockchain like system.  Also for the sake of the thread lets not get into "if" it is a good idea to limit the blockchain to only transactional data (that is a good debate for another thread).

For partial solutions which make it more difficult lets give higher value to solutions which keep non-transactional data out of the UTXO.  Spent or provably unspendable transactions can be pruned.

One idea would be to make all transaction outputs being P2SH.  This could even be implicit with no opcodes in the output at all where outputs are simply a hash of the redeem script and a value.   As a side note this has an added benefit of making the UTXO (but not the full blockchain) smaller with fixed length records, making lookups simpler and cheaper.  Of course this by itself would not significantly limit inclusion of non-transactional data as encoded data (the 20 bytes "hash" could be put any encoded data without verification).  This by itself would actually be a step back because unlike OP_RETURN outputs the output would be unspendable but not provably so leading to UTXO bloat.

To secure the output against misuse one could require that the "pre-hash" be included when a transaction is relayed or when a block is published.  For Bitcoin the ScriptHash = RIPEMD160(SHA256(script)).  So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction.   Nodes would take the SHA256 "prehash" hash it and verify it matches the output to confirm the output has a valid hash.  Once a txn was sufficiently deep in the blockchain to make re-orgs unlikely, nodes could discard the prehash.

Any other ideas?







1715534087
Hero Member
*
Offline Offline

Posts: 1715534087

View Profile Personal Message (Offline)

Ignore
1715534087
Reply with quote  #2

1715534087
Report to moderator
1715534087
Hero Member
*
Offline Offline

Posts: 1715534087

View Profile Personal Message (Offline)

Ignore
1715534087
Reply with quote  #2

1715534087
Report to moderator
The forum strives to allow free discussion of any ideas. All policies are built around this principle. This doesn't mean you can post garbage, though: posts should actually contain ideas, and these ideas should be argued reasonably.
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
1715534087
Hero Member
*
Offline Offline

Posts: 1715534087

View Profile Personal Message (Offline)

Ignore
1715534087
Reply with quote  #2

1715534087
Report to moderator
TimS
Sr. Member
****
Offline Offline

Activity: 250
Merit: 253


View Profile WWW
January 10, 2015, 07:55:21 PM
 #2

I'm pretty sure it's impossible.
So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction.
So you stick your non-transaction data in there as the "prehash".

The only thing I can think of is to make it prohibitively expensive to include a substantial amount of non-transaction data, e.g. via high fees or a proof of work alongside your transaction.

If it's not possible to prevent it, you might as well make it pruneable.
DeathAndTaxes (OP)
Donator
Legendary
*
Offline Offline

Activity: 1218
Merit: 1079


Gerald Davis


View Profile
January 10, 2015, 08:24:40 PM
 #3

So you stick your non-transaction data in there as the "prehash".

Nodes will discard the prehash once the txn is included in a block (or alternatively once the txn is sufficiently deep in the blockchain).[/quote]

It is only used for temporary validation.

Quote
The only thing I can think of is to make it prohibitively expensive to include a substantial amount of non-transaction data, e.g. via high fees or a proof of work alongside your transaction.

The difficulty is how to distinguish encoded data from actual transactions.  Any ideas? It is made difficult because Bitcoin has a rather open scripting language.   This provides flexibility but it also makes it difficult to avoid using opcodes for other purposes.   For example OP_CHECKMULTISIG is great for encoding bulk data in the blockchain.  Instead of 20 bytes per transaction you can put hundeds and it appears indistinguishable from normal transactions and is even considered standard by the network.

Quote
If it's not possible to prevent it, you might as well make it pruneable.

That is another possible route.   Imagine if OP_RETURN had no size limit but outputs were hashed first and the txn hash was created from the output hashes.  This would allow pruning the OP_RETURN data with just its hash.   Not only could OP_RETURN data be pruned out of the blockchain, txns could still be validated so even full nodes could choose validate th txn without the data.   Those who wanted to keep the data would need to pay the cost to store and relay the data.  Other nodes would relay the hash only.  However for that to work it should be difficult or expensive to bypass this and bury data in "fake" txns instead or it does no good.
amaclin
Legendary
*
Offline Offline

Activity: 1260
Merit: 1019


View Profile
January 10, 2015, 09:27:45 PM
 #4

Quote
Is it possible to keep non-transaction data out of a blockchain?
[...]
Any other ideas?
Alt-coin with strict rules.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8420



View Profile WWW
January 10, 2015, 09:41:42 PM
Last edit: January 10, 2015, 10:37:14 PM by gmaxwell
 #5

To secure the output against misuse one could require that the "pre-hash" be included when a transaction is relayed or when a block is published.  For Bitcoin the ScriptHash = RIPEMD160(SHA256(script)).  So the RIPEMD160 hash would be in the output and the transmitting node would also include the prehash (output of the SHA256 function) with the transaction.   Nodes would take the SHA256 "prehash" hash it and verify it matches the output to confirm the output has a valid hash.  Once a txn was sufficiently deep in the blockchain to make re-orgs unlikely, nodes could discard the prehash.
When someone independantly invents the same idea, you know it must be at least somewhat good. Smiley (uh, well actually, maybe not. Tongue )

I called this P2SH^2: http://sourceforge.net/p/bitcoin/mailman/message/30705609/

Quote
Any other ideas?
Yes, I have a more powerful scheme that doesn't _ever_ distribute the pre-image, it goes as follows:

First some super sloppy background so the rest makes sense to people who don't know it (you can skip if you're familiar with pairing crypto):

A BLS signature is a simple signature scheme for pairing cryptography which uses groups where computational discrete log is believed intractable but the decisional discrete log problem is tractable. BLS signatures have the important property that a signature is _unique_ for a given pubkey,message tuple there is only one signature possible. Pairing crypto works like ECC, but you have an additional operation pairing(point, point)  that can be used to solve the decisional discrete log problem.

The construction of a BLS signature is,

G is a generator of the group, I'm writing it additively here.
H(y) is that converts y to a point on the curve such that the discrete log is not known. (e.g. it's some direct bit coercion not y*G)

secret key: x
pubkey:  xG
sign:  sig = x H(message)
verify:  pairing(H(message), pubkey) == pairing(sig, g)

Verification is basically answering the question "Is sig the ECDH value for the pair of points pubkey and H(message)".

Here is how you can use this to suppress the convert channel:

H2 = sha256 or what have you, random collision resistant hash.

secret key:  x  == data you want to encode
pubkey:  xG  == assuming the data is the size of the order of the group this is a one way computable bijection, if the data you wanted to encode was a a secure hash, so is the pubkey.
proof:  proof = x H(H2(pubkey))

Proof now shows that pubkey is a point you know the discrete log of,  hardness of the discrete log in the group shows you can't pick some data embedding pubkey as you wouldn't know the discrete log.

You transmit {pubkey, proof}

You can't do this trick using regular ECDSA/schnorr signatures as far as I can tell because the non-uniqueness allows you to use knowledge of the nonce as a trapdoor.

This reduces your covert channel to only the small bits that you can afford to grind into it with guess and check. If the values already have to meet some computational criteria (e.g. you always have to grind) then any additional grinding is a multiplication on that, so with a linear cost to honest users that exponential cost channel can be made as computationally expensive as you like.

Better: a complete failure of the cryptosystem only reintroduces the covert-channel, it doesn't compromise the security of your P2SH.

Downside:  The proof adds another group-element (e.g. 32 bytes for 256 bit security) size amount of data, and the verification takes on the order of 1ms. Hashing (required at spending time) requires a point scalar multiply, which is also slower than a classical hashing function (though much better than a half ms).

It may be possible to construct something similar with RSA signatures; but I haven't tried and the cost of deterministically generating a private key (hashing your message) may be problematic... and obviously the size would be unattractive. I think basically the same can be done with any unique signature scheme where the signature leaks no information about the private key (E.g. doesn't work with a lamport signature, though they're trivially unique).

I'm pretty sure it's impossible.
The impossible presented. I'm glad to be of service.

So you stick your non-transaction data in there as the "prehash".
Yes, but you can verify this prehash only for blocks on the tip, not the history. and forget about it. So I think that fits at least some definitions of "out of the blockchain". (Though as I just demonstrated even that weakness can be closed.)

Anything here works better with a somewhat narrow definition of "out of a blockchain". Even with outputs fixed signatures still end up with a covert channel (though something with very limited functionality could also make it arbitrarily narrow, again using unique signatures), but most of my thought in this space has been centred around eliminating high bandwidth covert channels for data that all participants are strongly forced by the system to store.

There are also other things that can be done here.  A utxo is, roughly,   txid:vout -> {scriptPubkey, value}.   You could instead store H1(txid:vout) -> Enc(H2(txid:vout), {scriptPubkey, value})   to locally blind yourself to this data until the very moment it's needed (when a block shows up that actually spends the output).
TimS
Sr. Member
****
Offline Offline

Activity: 250
Merit: 253


View Profile WWW
January 11, 2015, 12:36:09 AM
 #6

This reduces your covert channel to only the small bits that you can afford to grind into it with guess and check. If the values already have to meet some computational criteria (e.g. you always have to grind) then any additional grinding is a multiplication on that, so with a linear cost to honest users that exponential cost channel can be made as computationally expensive as you like.
I had considered a plan similar to this: e.g. that the first x bits of the public key have to be 0. However, similar to Hashcash, I figured that what is too expensive for a legitimate user is still too weak for a serious attacker and his botnets (especially if ASICs can get involved!).

This hypothetical network should also prohibit address reuse. Why? Imagine a scenario where an attacker wishes to use the last byte of the address/P2SH (whatever is visible in the blockchain) to store his data. An attacker only needs to search through (he doesn't even have to generate them himself, just find people's addresses in the blockchain) about 1568 addresses to find 256 distinct-end-byte addresses and use them to represent 1 byte each in an unlimited number of data storage transactions. Or, if having an off-blockchain lookup table is acceptable, just choose or generate any 256 addresses.

How long is acceptable to force a legitimate user on a mobile device to come up with a new address for a transaction? Seconds? Minutes? Whatever it is, someone with a farm of ten desktop computers could be expected to generate 100 addresses in that time. 15 times that is still easy if the data is really worth putting in the blockchain.

Granted, all of this does make it significantly more expensive. But it's still quite possible.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8420



View Profile WWW
January 11, 2015, 12:50:06 AM
 #7

Granted, all of this does make it significantly more expensive. But it's still quite possible.
It basically makes a data storing party have a huge (e.g. dozens fold) disadvantage against normal users, even if you give them large computation, and at that point economic pressures are likely more effective.  Pragmatically it may change how things work legally.  People might order a newspaper to censor an old classified in its archive, they're not so likely to do ask the censoring of 50 articles each which leak one letter; at some point the "message" is really embedded in the decode instructions rather than the data elsewhere.   You've gone from outputs in a transaction being able to code kilobytes with modest overhead, to, say, a couple hundred bytes with huge overhead.
Sukrim
Legendary
*
Offline Offline

Activity: 2618
Merit: 1006


View Profile
January 11, 2015, 01:50:35 AM
 #8

There might be a way to only allow transactions to public keys and include a clause that unspent transactions can be re-distributed after e.g. a year or so. This way, even if someone spams the chain, it is not really permanent but will be resolved after a fixed time period.

Another way might be to decouple account management and transactions - this way you might have a block chain that only deals with transfers between accounts that would have very tiny transactions and very strict rules about what can be entered and a different one that contains public keys, smart contracts and whatnot that might have non-transaction data too but is expensive to write into as well as slower and larger in size.

https://www.coinlend.org <-- automated lending at various exchanges.
https://www.bitfinex.com <-- Trade BTC for other currencies and vice versa.
TimS
Sr. Member
****
Offline Offline

Activity: 250
Merit: 253


View Profile WWW
January 11, 2015, 04:28:28 AM
 #9

There might be a way to only allow transactions to public keys and include a clause that unspent transactions can be re-distributed after e.g. a year or so. This way, even if someone spams the chain, it is not really permanent but will be resolved after a fixed time period.
This doesn't stand in the way of storing data in the blockchain at all: you just send 0 or 1 satoshi to a dummy public key. You don't care whether that's locked up forever or can be reclaimed eventually: the data is still in the blockchain (and, for a year, in UTXO).
Another way might be to decouple account management and transactions - this way you might have a block chain that only deals with transfers between accounts that would have very tiny transactions and very strict rules about what can be entered and a different one that contains public keys, smart contracts and whatnot that might have non-transaction data too but is expensive to write into as well as slower and larger in size.
So I'll encode my data in the transaction amounts (this can be limited by giving this no-data chain low divisibility, e.g. tenths of BTC instead of satoshis), addresses, or even txids (this last one is more far-fetched, I'll admit).

And don't forget in all of this that a short amount of data can link to a larger amount of data: e.g. 80 bits can be an .onion address, 256 bits can be a magnet URI, etc. Which prompts the question: just what problem are we trying to solve here? If that can be identified, then I can know whether a certain solution is feasible for that or not. I think it will always be possible, but can be made more expensive, given certain tradeoffs.
gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8420



View Profile WWW
January 11, 2015, 05:36:18 AM
 #10

So I'll encode my data in the transaction amounts
Again, you're limited to a fairly small number of bits at considerable costs, even without any additional artificial limitations.

Quote
can link to a larger amount of data: e.g. 80 bits can be an .onion address
Great; and thats outside of the blockchain, which was the goal of the OP's post.
hhanh00
Sr. Member
****
Offline Offline

Activity: 467
Merit: 266


View Profile
January 11, 2015, 06:04:20 AM
 #11

x = SHA-256(offchain data)?
So you can refer to arbitrary size data, right?

gmaxwell
Moderator
Legendary
*
expert
Offline Offline

Activity: 4172
Merit: 8420



View Profile WWW
January 11, 2015, 07:25:12 AM
 #12

x = SHA-256(offchain data)?
So you can refer to arbitrary size data, right?
Sure.
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.19 | SMF © 2006-2009, Simple Machines Valid XHTML 1.0! Valid CSS!