Bitcoin Forum
Author Topic: Signature aggregation for improved scalability.  (Read 7536 times)
gmaxwell (OP)
February 26, 2016, 12:52:05 AM
Last edit: April 11, 2016, 05:16:20 PM by gmaxwell
Merited by Hueristic (35), suchmoon (29), ABCbits (22), Welsh (10), LoyceV (4), darosior (2), vapourminer (1)
 #1

Back in 2013 an Anonymous Author wrote a brief paper on using BLS signatures to achieve a kind of non-interactive coinjoin.  I jumped on it right away, pointing out that it also could have useful anti-censorship properties. (Edit: Paper link is down, a copy is available from Andrew Poelstra.)

The core property it depends on is the fact that for BLS signatures one can take a collection of {message, pubkey, signature} tuples and merge their signatures: {[message1, message2, ...], [pubkey1, pubkey2, ...], signature}, in a way which makes it infeasible to unmerge them without knowing the components, while still being able to verify the result. Via a clever trick that involves signing per input and per output, the authorship of inputs and outputs can be unlinked. I have a sidechain implementation of this that I've been working on-- like coinjoin generally it composes well with CT, but I'm unsure what to do about the performance (more below).
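
For concreteness, here is a minimal sketch of that merge property. It assumes the py_ecc library's IETF-style BLS12-381 ciphersuite API (an environment assumption; any BLS library with sign/aggregate/aggregate-verify would do), and the key seeds and messages are purely illustrative:

Code:
# Minimal sketch of BLS aggregation, assuming py_ecc's IETF-style API.
from py_ecc.bls import G2Basic as bls

# Three independent signers, each with their own key and message
# (think: one per input / per output of a transaction).
sks  = [bls.KeyGen(bytes([i]) * 32) for i in (1, 2, 3)]
pks  = [bls.SkToPk(sk) for sk in sks]
msgs = [b"spend input 1", b"spend input 2", b"create output 1"]

sigs = [bls.Sign(sk, m) for sk, m in zip(sks, msgs)]

# Merge the three signatures into one.  The components cannot feasibly be
# recovered from the aggregate, yet the whole thing still verifies against
# the full list of {message, pubkey} pairs.
agg_sig = bls.Aggregate(sigs)
assert bls.AggregateVerify(pks, msgs, agg_sig)

(The basic ciphersuite used here requires distinct messages, which matches the per-input/per-output signing described above.)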

Another application of these aggregatable signatures which has come up from time to time in #bitcoin-wizards is using them to reduce blocksize ("2013-09-13 21:38 < gmaxwell> sipa: I'm mildly excited about this pairing crypto aggregate signature idea. Not because of the anonymity stuff, but because it makes scalable relay fees viable. and can also reduce transaction sizes in the non-anonymous case"), since you'd save the sizes of the free-standing signatures. Aggregate BLS signatures could even be made compatible with 'efficient' fraud proofs, though the cost of doing so probably makes it not a win. (You would do a Merkle sum tree over the interior computation, which aggregates the partial signatures as products in a 3kbit field; this would make it possible to show a block had an invalid aggregate signature with a compact proof.)
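
Purely to illustrate that last parenthetical, here is a toy "Merkle product tree" in which each node commits to the running product of its children's partial aggregates; small integers stand in for the ~3 kbit pairing-field elements, and everything here is an illustrative assumption rather than a worked-out fraud-proof format:

Code:
# Toy Merkle tree over partial aggregates: each node = (hash, running product).
import hashlib

P = 2**127 - 1   # toy modulus standing in for the ~3 kbit pairing field

def leaf(partial: int):
    return hashlib.sha256(str(partial).encode()).digest(), partial % P

def parent(left, right):
    h = hashlib.sha256(left[0] + right[0]).digest()
    return h, (left[1] * right[1]) % P

def build(partials):
    level = [leaf(x) for x in partials]
    while len(level) > 1:
        nxt = [parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])          # odd node is promoted unpaired
        level = nxt
    return level[0]                        # (root hash, full aggregate)

root_hash, aggregate = build([3, 5, 7, 11])
assert aggregate == (3 * 5 * 7 * 11) % P

One way a compact proof could then work is a Merkle path to an interior node whose stored product doesn't match its children's, though the exact fraud-proof format isn't spelled out here.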

The main downsides of this are twofold: the BLS signatures involve pairing cryptography-- less mature security assumptions than plain ECC signatures-- and, more importantly, they're slow to verify: on hardware that can verify 70,000 secp256k1 signatures per second (or 140,000/s with batched Schnorr) they could only process about 8,000 BLS signatures per second, with state-of-the-art parameter selection and software.  So, while bandwidth would be reduced, the capacity of most nodes wouldn't be increased much because they'd be bottlenecked on signatures.  (And, right now, signature validation is extensively cached in Bitcoin-- and the aggregation would move validation back on the critical path).

[I've had some conversations with Dan Boneh about using multi-exponentiation-like optimizations in computing the required product of pairings here-- but it's likely that the performance improvement this would yield would only be on the order of 2x.]

Adam Back was asking me about this again today and I pointed him to the prior discussions regarding the performance limits.  Then he suggested just using Schnorr multi-signature instead of the BLS signatures. The idea is that if you have inputs with pubkeys P and P2, you can combine them to form a P+P2 pubkey and sign with that to prove authorization by both keys.  When this had previously been discussed, the fact that someone malicious could set their P2 to P3-P and then sign the aggregate with just P3 seemed to kill it.

But actually, a few months ago Pieter Wuille solved this as a side effect of his key tree multi-signature work. He suggested that to combine P and P2 you instead use P*H(P) + P2*H(P2); this doesn't inhibit signing, but it makes it impossible to create a pubkey as a linear combination of other pre-existing pubkeys. Pieter has an implementation of this as part of the libsecp256k1 multi-signature work: https://github.com/bitcoin/secp256k1/pull/322
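
To make the cancellation problem and the H(P) fix concrete, here is a toy demonstration in the multiplicative group modulo a small prime (deliberately not real elliptic-curve code; the group, hash, and key derivation are illustrative assumptions only):

Code:
# Toy demo (NOT real ECC) of key cancellation and the H(P)-weighted fix.
import hashlib

p = 2**127 - 1            # toy prime modulus; completely insecure
g = 3                     # toy generator

def keypair(seed: bytes):
    x = int.from_bytes(hashlib.sha256(seed).digest(), "big") % (p - 1)
    return x, pow(g, x, p)                 # (secret exponent, public element)

def H(*elems) -> int:
    data = b"|".join(str(e).encode() for e in elems)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)

# Honest user publishes P1; the attacker knows x3 for P3 and claims
# P2 = P3 / P1  (i.e. "P3 - P1" in additive point notation).
x1, P1 = keypair(b"honest")
x3, P3 = keypair(b"attacker")
P2 = (P3 * pow(P1, -1, p)) % p

# Naive combination: the aggregate key is just P3, which the attacker can
# sign for alone -- this is the cancellation that killed the simple scheme.
assert (P1 * P2) % p == P3

# Delinearized combination: P1^H(P1) * P2^H(P2).  The attacker would need a
# P2 with P2^H(P2) == P3 / P1^H(P1) for some P3 they control, but H(P2)
# depends on P2 itself, so the same algebraic trick no longer goes through.
agg = (pow(P1, H(P1), p) * pow(P2, H(P2), p)) % p
assert agg != P3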

So the way we could use this to improve scalability in Bitcoin would be to add an OP_GROUPCHECKSIGVERIFY (and perhaps a multi-version of it) as part of a new segwit script typecode that takes as input sighash flags and a pubkey.  The script operation would push the pubkey onto a transaction-scoped 'signature stack' and the sighash-masked transaction hash onto a transaction-scoped hash stack.

After all inputs are processed, the pubkeys on the stack would be combined into a new pubkey as P = P1*H(H(all_Pn) || P1) + P2*H(H(all_Pn) || P2) + ..., and a root message hash computed-- then an additional group-signature witness at the end of the transaction would contain a signature of the root with P.
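
Continuing in the same spirit, here is a rough end-to-end sketch of the verifier-side flow just described: collect per-input pubkeys and sighashes, form the weighted aggregate key, and check one Schnorr-style signature over the root hash. The toy group, the helper names, and the collapsed one-round signing are all assumptions for illustration, not a specification of the proposed opcode:

Code:
# Toy sketch of transaction-wide aggregation and a single group signature.
import hashlib

p = 2**127 - 1            # toy prime modulus (insecure, demo only)
g = 3                     # toy generator
q = p - 1                 # modulus for exponent arithmetic

def Hnum(*parts) -> int:
    data = b"|".join(str(x).encode() for x in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

# "Transaction-scoped stacks" filled while processing each input.
secrets   = [Hnum(b"key", i) for i in range(3)]          # signers' x_i
pubkeys   = [pow(g, x, p) for x in secrets]              # pushed P_i
sighashes = [Hnum(b"sighash", i) for i in range(3)]      # pushed masked hashes

# Verifier-side combination after all inputs are processed.
set_hash = Hnum(*pubkeys)                                # H(all_Pn)
coeffs   = [Hnum(set_hash, P) for P in pubkeys]          # H(H(all_Pn) || Pi)
P_agg = 1
for P, c in zip(pubkeys, coeffs):
    P_agg = (P_agg * pow(P, c, p)) % p
root = Hnum(*sighashes)                                  # root message hash

# Cooperative signing (two rounds in practice; collapsed here for brevity).
nonces = [Hnum(b"nonce", i) for i in range(3)]
R = 1
for r in nonces:
    R = (R * pow(g, r, p)) % p
e = Hnum(R, P_agg, root)
s = sum(r + e * c * x for r, c, x in zip(nonces, coeffs, secrets)) % q

# The single "group signature" check at the end of the transaction.
assert pow(g, s, p) == (R * pow(P_agg, e, p)) % p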

Verification speed would be similar to individual inputs with batch Schnorr (the multiplication with H() can be folded into the multi-exp in verification), so this is more of a space savings than a speedup; but the size of the transaction would be reduced by 41% asymptotically-- without any new security assumptions or performance hit. The main limit is that a transaction must have many inputs to fully realize that benefit.

Unlike other past shared signature proposals based on exploiting pubkey reuse, there is no advantage in this design in reusing public keys; so it also doesn't create a new privacy moral hazard for the network.

Critically, multiple distinct users could cooperate to securely produce a single large transaction with a two round protocol which may make it reasonable to get _many_ inputs in a single transaction-- but unlike the BLS signatures they must interact in order to merge their signatures.  Effectively this would be a CoinJoin, with a big space (and thus fee) savings for the CoinJoin users.

Even without combining multiple users, this approach would make it much more economical for a single user to sweep up txout dust, which would be an important improvement on its own. For block 400083, if all transactions used this (with no increase in CoinJoin, or even the behavioral changes to prefer sendmany over chains of transactions which we would expect if this actually existed; and also ignoring multisig use, which would further improve this figure), I compute this would reduce the size of the block by 172050 bytes out of 897956, or a 19% reduction, which is a substantial fraction of the asymptotic result. [Edit: A scan of 2016 recent blocks, with accurate counting for multisig, shows the actual reduction for existing usage patterns would be 30%.]

In effect this could achieve at least half the asymptotic gain while having no negative performance impact (10x-17x better performance than BLS) and no new strong cryptographic security assumptions. To me it seems like a no-brainer improvement that Bitcoin is almost sure to eventually adopt once implemented.
jl777
February 26, 2016, 02:13:40 AM
Last edit: February 26, 2016, 05:50:28 AM by theymos
 #2

I am no cryptographer, just a C coder that calls the cool functions in creative ways, so I can't speak to all the deeper maths, but if there is a way to allow Bitcoin to have arbitrary-sized anon sets participate in a transaction, then that would be fantastic.

With IP privacy compromised, an attacker would be able to correlate a lot more, but if there is an off-chain DC-net to create a shared transaction, I think it would be possible to have zero-knowledge-protected IP privacy that feeds into group signatures like the above, providing a provably secure method of crypto payments even assuming all IP traffic is fully monitored.

Once IP privacy can be assured, then nodes can randomly choose peers and crunch away at making the joint transaction without worry of leaking information to an attacker in the group. Even if the computation required is 1000x more, it won't bother the nodes that value privacy. Maybe there is a way for the group transaction to solve for a "zero" or some other special-case value to make the verification much faster?

An implementation idea would be to assume that SIGHASH_ALL is used for all of these; other SIGHASH modes seem to be rarely used, and I'm not sure it makes sense to support them for radically new use cases.

Also, is there a reason a unique number couldn't be used to identify each txid and even each output script (address)? To my thinking, once a block is past any chance of being reorganized, there is a canonical ordering of all blocks, and therefore of all transactions and their vins and vouts.

Since each spend currently requires the 32-byte txid and vout, mapping this to 4 or 6 bytes creates a lot of space savings. There is a lot of redundancy in the blockchain, with each txid potentially being duplicated once for each vout.
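
For illustration, a minimal sketch of this canonical numbering; the block iterator and data layout are stand-ins rather than the actual implementation being described:

Code:
# Sketch: assign every (txid, vout) a small integer in canonical order.
from typing import Dict, Iterable, List, Tuple

Outpoint = Tuple[str, int]                 # (txid hex, vout)

def build_index(blocks: Iterable[List[Tuple[str, int]]]):
    """blocks yields, per block, (txid, n_outputs) pairs in canonical order."""
    to_num: Dict[Outpoint, int] = {}
    to_outpoint: List[Outpoint] = []
    for block in blocks:
        for txid, n_outputs in block:
            for vout in range(n_outputs):
                to_num[(txid, vout)] = len(to_outpoint)
                to_outpoint.append((txid, vout))
    return to_num, to_outpoint

# A 36-byte outpoint becomes a small index; 4 bytes covers ~4 billion outputs.
to_num, to_outpoint = build_index([[("aa" * 32, 2)], [("bb" * 32, 1)]])
assert to_num[("bb" * 32, 0)] == 2 and to_outpoint[2] == ("bb" * 32, 0)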

The other big redundancy is reused addresses, which are actually rmd160 hashes inside spend scripts. Using the canonical ordering of everything, each rmd160 hash would map to a 32-bit index, and with the vast majority of scripts being standard a few more bytes can be encoded into a few bits. However, with the coming diaspora of p2sh script proliferation, it could be that address numbering won't be so effective in the future.

I am getting more than a 50% size reduction in the blockchain using the canonical numbering method, but I still need to add the raw sigs. The actual data is significantly smaller, but I wanted an "instant-on" feature so there is no need to verify the vast majority of the blockchain on each restart. So there is a fair amount of space used for bloom filters and hashtables, but it still all adds up to less than half the size.

However, when it is compressed it shrinks to about half the size, so I end up with a squashfs volume, around 15GB now. During the parallel sync, I create a write-once data set for each bundle of 2000 blocks in read-only files that are memory-mapped data structures. Once created, a bundle never changes and has enough data to directly implement the bitcoin RPC. Without an additional layer that creates a global search, the RPC calls might have to query all 200 bundles; however, scanning backwards tends to find things much quicker than the "expected" N/2 bundles, and all the lookups in each bundle use constant-time hash tables, so the overhead even in the worst case is typically less than the RPC overhead.

The only part that is ever changing is the utxo set, and I have it taking 6 bytes per vout, but this is quite inefficient as most vouts get spent and only the unspents have the data. I guess I will need to make a 50%-full hashtable, which would get close to constant-time performance at a cost of 12 bytes per utxo. That is almost small enough now to fit inside the CPU's cache.

With about 200+ million high-entropy 72-byte sigs, that seems to double the total required size to 30GB. So anything that allows for encoding sigs will allow for much smaller full copies of the blockchain.

James

http://www.digitalcatallaxy.com/report2015.html
100+ page annual report for SuperNET
gmaxwell (OP)
February 26, 2016, 03:18:22 AM
 #3

Quote
An implementation idea would be to assume that SIGHASH_ALL is used for all of these; other SIGHASH modes seem to be rarely used, and I'm not sure it makes sense to support them for radically new use cases.
Well, everything I described there is completely compatible with all sighash types, so there is no real need to limit flexibility. Assuming the sighash code is separate and reused, it shouldn't increase implementation complexity.

Quote
Also, is there a reason a unique number couldn't be used to identify each txid and even each output script (address)? To my thinking, once a block is past any chance of being reorganized, there is a canonical ordering of all blocks, and therefore of all transactions and their vins and vouts.

Since each spend currently requires the 32-byte txid and vout, mapping this to 4 or 6 bytes creates a lot of space savings. There is a lot of redundancy in the blockchain, with each txid potentially being duplicated once for each vout.
That could be done-- but there is no need to do so normatively. E.g., peers could agree to transmit data using these compressed indexes while still hashing the original values. This has the advantage that the IDs don't change out from under already-authored transactions, and it makes sure that offline devices don't need access to (or to trust) external index providers.

Quote
The other big redundancy is reused addresses, which are actually rmd160 hashes inside spend scripts. Using the canonical ordering of everything, each rmd160 hash would map to a 32-bit index, and with the vast majority of scripts being standard a few more bytes can be encoded into a few bits. However, with the coming diaspora of p2sh script proliferation, it could be that address numbering won't be so effective in the future.
This would undermine the privacy/fungibility properties of Bitcoin as a whole by incentivizing parties to reuse addresses. Bitcoin's privacy strongly depends on non-reuse, and additional pressure against it for the benefit of saving on the order of 12 bytes per txout doesn't sound like a win-- especially as it would be used to justify bad software, bad business practices, and bad public policy that force users to reuse.  Access to those indexes would also have to be handled quite carefully, since if you paid the wrong one the funds would be stolen.

Thanks for your thoughts!
kanzure
February 26, 2016, 02:10:16 PM
 #4

Related comment from jl2012 on bip131 "coalescing transactions" appears at https://github.com/bitcoin/bips/pull/268#issuecomment-170780308

Quote from: jl2012
With Schnorr signatures it is not only possible to combine signatures of the same public key, but also of different public keys, as long as they are signing the same hash. I think it is the plan after BIP141 is deployed.

Also some other comments from the aethers...

Quote from: gmaxwell
If you have pubkey P1, I can craft P2 = P3-P1, then multisign P1+P2 using P3 all on my own;

Quote from: sipa
P1*H(P1)+P2*H(P2)

gmaxwell (OP)
February 26, 2016, 07:19:19 PM
 #5

Yea, I spoke to jl2012-- it's interesting how a bit of ignorance can be brilliance: He was previously unaware of the linear cancellation that was keeping this from working, so he just assumed it would work, when others had previously discarded the idea.  I still can't figure out why he wasn't jumping up and down excited to do it-- it's a nice improvement with little downside.
NoMoneyNoHoney
February 29, 2016, 05:03:11 PM
 #6

Quote
(And, right now, signature validation is extensively cached in Bitcoin-- and the aggregation would move validation back on the critical path).

Validation would not be fully on the critical path. The individual pairings could all be calculated in the mempool. Only the pairing on the aggregate signature e(g, sigma) (where sigma is the aggregate sig) and any pairing e(pk_i, H(m_i)), where i corresponds to a "message/tx and pubkey pair" that has not yet been seen would need to be calculated on the critical path. Pairings on transactions already seen can be precalculated. If all transactions in a block have been previously seen, then only 1 pairing need be calculated for validation: e(g, sigma), which itself uses a fixed-point and can be sped up significantly. This value would then be compared to the product of all pairings  e(pk_i, H(m_i)), which have been precalculated.

Therefore, the total best-case validation time would be the time to compute e(g, sigma) and the product of n pre-calculated pairings.
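
To make the bookkeeping concrete, here is a structurally faithful but cryptographically meaningless sketch: group "elements" are bare exponents and the pairing is just exponentiation of a toy generator, so it runs standalone. Only the caching pattern, not the crypto, is the point:

Code:
# Toy model of caching per-{pubkey, message} pairings in the mempool.
import hashlib

P_GT = 2**127 - 1        # toy modulus for the pairing target group
q    = P_GT - 1          # exponent modulus
gt   = 3                 # toy target-group generator
g    = 1                 # toy G1 generator, represented by its exponent

def Hmsg(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def pairing(a: int, b: int) -> int:
    # Stand-in for e(.,.): inputs are bare exponents, output lives "in GT".
    return pow(gt, (a * b) % q, P_GT)

pair_cache = {}                              # {(pk, msg): precomputed pairing}

def see_in_mempool(pk: int, msg: bytes) -> None:
    pair_cache[(pk, msg)] = pairing(pk, Hmsg(msg))

def validate_block(pairs, agg_sig: int) -> bool:
    product = 1
    for pk, msg in pairs:
        gt_val = pair_cache.get((pk, msg))
        if gt_val is None:                   # unseen: pay for it on the critical path
            gt_val = pairing(pk, Hmsg(msg))
        product = (product * gt_val) % P_GT
    return pairing(g, agg_sig) == product

# Demo: three signers; two of the pairs were precalculated in the mempool,
# so only one fresh pairing plus e(g, sigma) lands on the critical path.
keys = [(7, b"tx-a"), (11, b"tx-b"), (13, b"tx-c")]   # (pk == sk in this toy)
agg_sig = sum(sk * Hmsg(m) for sk, m in keys) % q     # toy BLS-style aggregate
see_in_mempool(7, b"tx-a")
see_in_mempool(11, b"tx-b")
assert validate_block(keys, agg_sig)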


gmaxwell (OP)
February 29, 2016, 10:03:27 PM
Last edit: February 29, 2016, 10:24:04 PM by gmaxwell
 #7

Quote
Therefore, the total best-case validation time would be the time to compute e(g, sigma) and the product of n pre-calculated pairings.
Great point! Though it also precludes gaining a product-of-pairings speedup (though, as I mentioned, efficient fraud proofs would preclude that anyway).  Products in the transfer group (which is going to be 3 kilobits or so) are not ultra-fast either, so that is not totally negligible.  It would also mean that the 3kbit intermediate elements would need to be cached, instead of just a set.  Memory usage for our signature caches is already a consideration.

My thinking there was also somewhat influenced by the fact that, to get the privacy or anti-reorg benefits, transactions need to not be broadcast in advance to some degree. But you're absolutely right that caching is not totally lost.
priestc
March 13, 2016, 11:42:21 PM
 #8

Quote
Unlike other past shared signature proposals based on exploiting pubkey reuse, there is no advantage in this design in reusing public keys; so it also doesn't create a new privacy moral hazard for the network.

Could you please clarify this paragraph? I assume by "other proposals" you mean BIP131.

It seems to me that BIP131 pretty much solves zipping inputs into a single input in a very simple manner. Your proposal sounds like there are only 3 people in the world with enough PhDs to understand what's going on. I feel like I can explain BIP131 to a group of kindergartners and they'll pretty much know what I'm talking about.

Let's say you run a hot dog stand. At the beginning of the day you generate an address and print off a QR code. Each time someone buys a hot dog, they send you $1.50 worth of bitcoin to the address in that QR code. At the end of the day, your wallet generates a single 233-byte transaction which moves all the payments for that day into your Coinbase account (or wherever). Then you throw away that QR code and generate a new one for the next day. A lot of people use bitcoin this way already. There is no moral hazard because no private key is being re-used. BIP131 solves the moral hazard.

All you have to do to enable the "wildcard" feature is to flip a bit in the version field. Now all inputs that have the same scriptPubKey and are confirmed in a block earlier than the block of the input you did sign get spent by the transaction as well. The work to implement this can be done by someone with 0 PhDs. I could probably implement it myself if I knew C++...
gmaxwell (OP)
March 14, 2016, 07:41:52 PM
 #9

I wasn't aware of 131 when I wrote that text, but aggregating public key reuse is a perennial proposal which reoccurs every 6 to 9 months and is shot down each time.

Practical fungibility is an essential property of money, and privacy is an essential requirement for a financial transaction system.  No widespread money system is in use without these properties. Bitcoin's ability to have them depends _strongly_ on the use of random pseudonymous addresses; whenever people don't use them, Bitcoin's status as money is degraded for everyone.  Inserting a huge incentive to compromise fungibility and privacy into the system to get a modest capacity boost is a non-starter, even more than it was a non-starter in 2011 when I first recall seeing it proposed. And yes, some people currently use Bitcoin in ways that damage the system-- it can take that-- but that in no way makes it acceptable to reward that harmful behavior.

(As an aside, the example you give is pretty broken-- if every customer pays to the same address you cannot distinguish which of multiple concurrent payments has actually gone through; so that only works so long as your hotdog stand is a failure. As soon as you had multiple customers paying close together in time it would turn into a mass of confusion.)


Quote
It seems to me that BIP131 pretty much solves zipping inputs into a single input in a very simple manner. [...]
All you have to do to enable the "wildcard" feature is to flip a bit in the version field.
When you lack understanding, many things which are not simple seem simple, and many things which are simple seem not simple.

No kind of aggregation can be done just by "flipping a bit in the version field", as that is utterly incompatible with the design of the system; it violates all the layering and would be rejected as coin theft by all the existing nodes.

Quote
I feel like I can explain BIP131 to a group of kindergartners and they'll pretty much know what I'm talking about. [...] Now all inputs that have the same scriptPubKey and are confirmed in a block earlier than the block of the input you did sign get spent by the transaction as well
In fact, the way you're describing it here would result in _immediate_ funds loss, even absent an attacker. Imagine an additional payment shows up that you weren't expecting when you signed but happens to arrive first at miners, and the total value of that additional payment gets converted into fees! As you described it here, it would also be replay vulnerable... where someone sends the same transaction to the chain a second time to move new payments that have shown up since. This is why we don't have kindergartners design the Bitcoin protocol, I guess.  That kind of design also results in a substantial scalability loss, as every node would need an additional search index for the utxo set (or would have to perform a linear scan of it, which takes tens of seconds currently) in order to gather all the inputs with the same scriptpubkey.

What I've described here is actually very simple, straightforward to implement, and understood by many people... and it achieves its goal without the massive downside of forcing address reuse-- and in doing so avoids trashing fungibility and sensible business workflows; and as a bonus it isn't grievously broken.

If I've bamboozled you with my explanation, that is likely because I took the time to explain some of the history of the thinking in this space and because my intended audience was other Bitcoin experts (and not PhDs, I can assure you)-- who understood it just fine; not your conjectural kindergartners. Not to mention... good explanation is a difficult art, and generally only ideas which are too simple to actually be correct can be simply explained without considerable effort. When implementation moves forward you can trust that simple and clear explanations will be provided.

priestc
March 15, 2016, 07:36:05 PM
 #10

Quote
I wasn't aware of 131 when I wrote that text, but aggregating public key reuse is a perennial proposal which reoccurs every 6 to 9 months and is shot down each time.
Do you have a link to any of these proposals? I wrote BIP131, so I'm curious to see other people's (failed) take on it.

Quote
Inserting a huge incentive to compromise fungibility and privacy into the system to get a modest capacity boost is a non-starter, even more than it was a non-starter in 2011 when I first recall seeing it proposed. And yes, some people currently use Bitcoin in ways that damage the system-- it can take that-- but that in no way makes it acceptable to reward that harmful behavior.

I keep seeing this posted over and over again by multiple people, but it makes no sense to me. I'm referring to the notion that me using the system a certain way has any effect on your privacy. Like me re-using addresses has any effect on you. Could you explain how this is so? It's not a self-evident claim at all.

Let's say my shower is bugged with a hidden camera. This will cause my privacy to be lost, not anyone else's. The only way this bugged shower will affect your privacy is if I were to paste a naked picture of you over the camera hole. In order for me to do this, I need a naked picture of you in the first place. I can't violate someone's privacy unless I have their private information in the first place. If you're sending me bitcoin from a wallet that generates a new change address for every transaction, and I'm using the same address over and over again, what is the proverbial naked picture I have of you that I'm leaking to the world?

Quote
(As an aside, the example you give is pretty broken-- if every customer pays to the same address you cannot distinguish which of multiple concurrent payments has actually gone through; so that only works so long as your hotdog stand is a failure. As soon as you had multiple customers paying close together in time it would turn into a mass of confusion.)

Oh come on, you're being pedantic. If you want to accept concurrent payments, you print out multiple QR codes, one for each register.

Quote
No kind of aggregation can be done just by "flipping a bit in the version field", as that is utterly incompatible with the design of the system; it violates all the layering and would be rejected as coin theft by all the existing nodes.

You're the only person who thinks this.

Quote
In fact, the way you're describing it here would result in _immediate_ funds loss, even absent an attacker. Imagine an additional payment shows up that you weren't expecting when you signed but happens to arrive first at miners, and the total value of that additional payment gets converted into fees! As you described it here, it would also be replay vulnerable... where someone sends the same transaction to the chain a second time to move new payments that have shown up since.

Only UTXOs confirmed in a block previous to the signed input get included in the coalesce. Let's say you have three UTXOs to the same address:

#1 value 4 BTC in block 1000
#2 value 2 BTC in block 2000
#3 value 1 BTC in block 3000

and you included input #2 in your coalescing transaction, then only outputs #1 and #2 will be included in the coalesce. Output #3 is left off because it was confirmed in a block after the output that was signed in the TX. The total input amount in this case would be 6 BTC.
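
A minimal sketch of that selection rule as described (purely illustrative, not an endorsement of the design; the field names and the <= cutoff are assumptions):

Code:
# Sketch of the coalescing selection rule: same scriptPubKey, confirmed no
# later than the explicitly signed input.
from typing import List, NamedTuple

class Utxo(NamedTuple):
    txid: str
    vout: int
    value: float          # BTC
    height: int
    script_pubkey: str

def coalesce_set(utxos: List[Utxo], signed: Utxo) -> List[Utxo]:
    return [u for u in utxos
            if u.script_pubkey == signed.script_pubkey
            and u.height <= signed.height]

utxos = [Utxo("tx1", 0, 4.0, 1000, "addr"),
         Utxo("tx2", 0, 2.0, 2000, "addr"),
         Utxo("tx3", 0, 1.0, 3000, "addr")]
picked = coalesce_set(utxos, utxos[1])            # sign input #2
assert sum(u.value for u in picked) == 6.0        # #1 and #2 only; #3 excluded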

Quote
That kind of design also results in a substantial scalability loss, as every node would need an additional search index for the utxo set (or would have to perform a linear scan of it, which takes tens of seconds currently) in order to gather all the inputs with the same scriptpubkey.

You just invoked a pet peeve of mine. You can't throw out numbers like that without mentioning the hardware. Currently the UTXO database has 35 million rows. On my 3-year-old laptop I can query a 35-million-row database and have it finish in a few hundred milliseconds. I guess if you're running this on an Apple II it would take "tens of seconds"...

Even if it does take a long time to query the UTXO database, it shouldn't matter because the purpose is to spend those outputs. This results in a smaller UTXO database, which makes subsequent queries faster.
tryexcept
March 15, 2016, 11:28:22 PM
 #11


Quote
I keep seeing this posted over and over again by multiple people, but it makes no sense to me. I'm referring to the notion that me using the system a certain way has any effect on your privacy. Like me re-using addresses has any effect on you. Could you explain how this is so? It's not a self-evident claim at all.


I can give you a simple example: say I don't reuse addresses and I send you a transaction. If neither of us reused addresses, then someone looking at the transaction wouldn't know how much I spent versus how much was change or went to other payment recipients. However, if you reuse your address, it would make it quite evident to external observers that _your_ address is, with very high probability, not the change, and that the other address used is likely my change.

priestc
March 16, 2016, 02:12:55 AM
 #12

Quote
However, if you reuse your address, it would make it quite evident to external observers that _your_ address is, with very high probability, not the change, and that the other address used is likely my change.

Only for that one single transaction. If your wallet is generating new addresses for each change output, nobody is even going to know it's you making the transaction to my re-used address in the first place.

When doing blockchain analysis to link addresses together, there are two types of links: Links that are 100% guaranteed to be the identified person, and links that are only thought to be an identified person according to a probability. Each time you move coins to a newly generated change address, you're decreasing the probability that those funds belong to you from the perspective of someone trying to do analysis.

If you buy BTC from Coinbase and then send those coins directly to the Silk Road, Coinbase is going to know with 100% certainty that you made that transaction, and they could report you to the police. If you instead send those coins to a separate address first, and then send them to the Silk Road, there is plausible deniability. That intermediate address could be another person; it could also be your own address. Coinbase can't do anything about it because they can't be 100% sure. The court system works under the premise of reasonable doubt. The intermediate address creates enough reasonable doubt to make this not be a privacy leak.

To use the naked picture analogy, moving your coins to a new address is like blurring a naked picture of yourself. Each time your funds move addresses, a stronger blur is applied. If somebody publishes a naked picture of you that is blurred, is that a violation of privacy? This is sort of a philosophical question.

Sending coin to an address that is being re-used does technically increase the probability that it's you, since the change address is no longer ambiguous, but it doesn't destroy your privacy; it merely decreases it slightly. It's like somebody taking a blurry naked picture and making it slightly less blurry. It's still blurry, so you're still private.