thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 08, 2013, 10:11:24 AM |
|
3. Isn't it cleaner for type-1 if k_i is the direct output of a HMAC?
Yes, it seems better to do k_i=I_L instead of k_i=k_par+I_L (mod n) If I'm guessing correctly, the reason that BIP32 specifies that for I_L>=n that resulting keypair is invalid is because we would like to have the privkey chosen uniformly at random from the valid range, so if we take I_L modulo n then the smaller privkey values will have a little bit higher probability? Edit: instead of specifying that k_i is invalid, we could specify for example k_i=SHA256(I_L) in this case (and invoke SHA256 again if k_i is still invalid, and so on), so that we couldn't have forced gaps in the indices of the HD wallet? But if the probability is less than 1/2^127 then specifying that i is invalid is also fine. I still agree with thanke that for type-1 derivation it makes slightly more sense to have k_i=I_L and specify that k_i is invalid if I_L=0 or I_L>=n, rather than to have k_i=k_par+I_L (mod n) and specify that k_i is invalid if I_L>=n. Note that the OP of this thread also specifies for type-1 that "Privatekey(type,n) is then simply set to H(n|S|type)". With k_i=I_L we could still treat d=k_i-k_par as the difference, so d*G+K_par is the corresponding pubkey, in case this d in needed for compatibility with type-2 in some possible context. Since n = 2^256 - 2^32 - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1 the case I_L >= n occurs in one out of 2^224 times...
|
|
|
|
iddo
|
|
May 08, 2013, 10:15:40 AM |
|
Since n = 2^256 - 2^32 - 2^9 - 2^8 - 2^7 - 2^6 - 2^4 - 1 the case I_L >= n occurs in one out of 2^224 times...
I think that that's the order of the field F_p, not n=order(G)
|
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 08, 2013, 11:24:50 AM |
|
True, the order of G is 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 so roughly one out of 2^128 times
|
|
|
|
iddo
|
|
May 08, 2013, 09:27:22 PM |
|
It is generally discouraged to receive more than one payment to the same address. One reason is anonymity (not only the receiver's, also the payer's anonymity), the other reason is that if you have two outputs to the same address, you're unlikely to spend them both at the same time, so the public key will be revealed before you get to spend the second output. I am saying that this general guideline is not compatible with using intermediate tree nodes for receiving, because an intermediate node is too "static" (it roots a subtree which probably has a significant lifetime). Only leafs are suitable "one-time objects".
I don't really see the problem here. The default HD wallet layout that probably most users will have is just one or more subaccounts where the subaccount is an unbounded linear path, meaning that it'd be similar to the current random-independent wallet in the sense that the user would just keep generating new receiving addresses as needed, except that your initial backup of the root node is enough to restore all the subsequent privkeys, and greater security / less hassle because there's no need to access the encrypted privkeys in order to generate new receiving addresses (plus the other possible benefits like type-2 pubkey delegation and brainwallet passphrase).
|
|
|
|
iddo
|
|
May 08, 2013, 09:28:46 PM |
|
I still agree with thanke that for type-1 derivation it makes slightly more sense to have k_i=I_L and specify that k_i is invalid if I_L=0 or I_L>=n, rather than to have k_i=k_par+I_L (mod n) and specify that k_i is invalid if I_L>=n. Note that the OP of this thread also specifies for type-1 that "Privatekey(type,n) is then simply set to H(n|S|type)". With k_i=I_L we could still treat d=k_i-k_par as the difference, so d*G+K_par is the corresponding pubkey, in case this d in needed for compatibility with type-2 in some possible context.
One tiny correction: k_i=n is valid according to https://en.bitcoin.it/wiki/Private_key, so if type-1 is k_i=I_L then k_i is invalid when I_L=0 or I_L>n Alternatively, type-1 could be k_i=1+I_L, and it'd be invalid when k_i>n Or maybe we should stick with the current k_i=k_par+I_L (mod n), any of these 3 possibilities would be OK.
|
|
|
|
Pieter Wuille
|
|
May 08, 2013, 10:26:04 PM |
|
The reason for specifying that I_L >= n isn't allowed is indeed to guarantee uniformity. However, as this has such a ridiculously small chance to ever occur, I don't want whatever rule is used to deal with it to be complex to implement.
Also, the reason for using the same construction (k_i = k_par + I_L) for type-1, is to 1) make the code to implement it more uniform 2) not give a performance advantage to type-1. I agree that strictly from a security point of view k_i = I_L is just as good.
|
I do Bitcoin stuff.
|
|
|
iddo
|
|
May 09, 2013, 06:14:57 PM Last edit: May 09, 2013, 07:12:11 PM by iddo |
|
The reason for specifying that I_L >= n isn't allowed is indeed to guarantee uniformity. However, as this has such a ridiculously small chance to ever occur, I don't want whatever rule is used to deal with it to be complex to implement.
Also, the reason for using the same construction (k_i = k_par + I_L) for type-1, is to 1) make the code to implement it more uniform 2) not give a performance advantage to type-1. I agree that strictly from a security point of view k_i = I_L is just as good.
OK, great, please let me just mention that IMHO you should finalize BIP32 as it is now. In the time period from when BIP32 is final until it's ready to be rolled out in the Satoshi client, maybe we should solicit more peer review by other crypto specialists, just to be extra safe before this major change. Regarding thanke's alternative scheme, I'd say that it could have some drawbacks (possibly less security/privacy, using less than optimal entropy), and its use case isn't too convincing (and doesn't support type-1 derivations that already existed in the subtree, so we'd need to be careful not to cause mix-ups). There are still pending issues that are external to BIP32 itself, and as you mentioned it'd be good to have recommended guidelines. I'll summarize here several external issues that seemed important to me: 1) exporting some intermediate node as the root node of a new HD wallet. One use case for this is the hot storage / cold storage issue, where we use type-1 at some relevant place of the tree structure to derive a new keypair whose subtree would be considered hot storage, and exporting this new keypair as a root of a secondary HD wallet file so that its AES passphrase will decrypt only the hot storage privkeys. In this scenario it could be nice that the secure cold storage wallet also gives access to the hot storage part. 2) incentivize users to keep their chaincodes private, I recommended to have a deniable encryption feature in this context. 3) HD brainwallet passphrase derivation via self-descriptive strengthened keying. BTW if you agree that scrypt makes sense for password derivation, this also implies that you wouldn't need to mess with OpenCL code in the Satoshi client (because we'd minimize the GPU advantage anyway). I asked Colin Percival about his recommended scrypt parameters for password derivation and he said r=8, p=1: http://bitbin.it/E68HeKkM
|
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 10, 2013, 12:09:38 PM Last edit: May 10, 2013, 12:57:14 PM by thanke |
|
I found an argument why the multiplier I_L in K_i := I_L *Kpar (or k_i := I_L*G + Kpar) should depend on Kpar. Recall: In BIP 32 we have (IL,IR) = func(cpar,Kpar,i) whereas I previously had suggested (IL,IR) = func(cpar,i). The argument is about the provability of the link between Kpar and K_i. There are three things that one might want to prove: 1. Kpar and Ki have the same owner 2. Ki was derived from Kpar (and not some other parent) 3. Kpar existed before Ki Revealing I_L clearly proves 1. So let's discuss the other two properties. Case IL=func(cpar,i): Given Ki we can choose arbitrary cpar and i, compute I_L:=func(cpar,i), compute Kpar with Ki=I_L*Kpar. So we can trivially come up with infinitely many plausible parents Kpar. Case IL=func(cpar,Kpar,i): The above is not possible anymore, at least if Kpar is properly hashed in func. If we know Kpar we can check that it is a parent, otherwise we cannot come up with any plausible parent at all. So we can prove 2. and 3. after all. For this to be useful, as we might not want to reveal cpar, we would define: I_L := func(func(cpar,i),Kpar) and reveal func(cpar,i) as proof of the link. So I actually suggest (for type-2): I := HMAC( HMAC(cpar,i) , Kpar) However, this is not yet an argument why c_i should be func(cpar,Kpar,i) instead of func(cpar,i).
|
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 10, 2013, 12:56:22 PM |
|
About type-2. Here is a strong argument that re-using chaincodes across different pubkeys makes sense. The argument is multisigs. We can have a multisig address as "parent", which contains several pubkeys, and still a unique chaincode for it, ie. one chaincode for all the pubkeys that appear in the multisig. If I understand Pieter correctly, his concern was to keep the wallet organized around addresses, which serve as the key to a table in which we store the values "pubkey", "privkey", "chaincode" associated to the address. "Extended" multisig addresses can be totally consistent with that: several pubkeys are stored under a multisig address, still only one chaincode. To get a derived multisig address, CKD is applied to each pubkey, each time with the same chaincode. If you consider hierarchies of multisig addresses this turns into an argument why c_i should not depend on Kpar. Because if the parent is a multisig address, which of the pubkeys should c_i depend on? I say let c_i not depend on any of them. Then you can split your pubkeys across several machines and still derive arbitrary many levels on each machine, even if not all pubkeys of the multisig address are present on the machine. It all seems to fit very natural with how addresses are currently stored and handled. Summarizing this and my previous post, the suggestion for type-2 (additive version) is: K_i := HMAC( HMAC(cpar,i||0), Kpar ) * G + Kpar c_i := HMAC(cpar,i||1)
|
|
|
|
Pieter Wuille
|
|
May 10, 2013, 01:37:29 PM |
|
I imagined constructing multisig addresses of a set of (potentially completely independent) BIP32 chains, using equal indices in each, in lockstep.
In my opinion, BIP32 is mostly about generating sequences of keys, which can either be used directly as pay-to-pubkeyhash addresses, or combined using multisig into a P2SH address. Isn't that both more simple than trying to put BIP32 on top of multisig, and no reason for having the chaincodes be independent?
|
I do Bitcoin stuff.
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 10, 2013, 02:46:27 PM |
|
I imagined constructing multisig addresses of a set of (potentially completely independent) BIP32 chains, using equal indices in each, in lockstep.
In my opinion, BIP32 is mostly about generating sequences of keys, which can either be used directly as pay-to-pubkeyhash addresses, or combined using multisig into a P2SH address. Isn't that both more simple than trying to put BIP32 on top of multisig, and no reason for having the chaincodes be independent?
I thought to derive like this from a multisig given by a CScriptID called id: - get the basescript with GetCScript(id) - extract the pubkeys with Solver() - get the chaincode by calling a new function GetCChaincode(id) - call CKD() for each pubkey - put together with SetMultisig() - store with AddCScript() As you describe it, you would have to keep track of several chaincodes or run another lookup at the third line. What if some of the individual pubkeys are simply not present in pwalletMain? Maybe they aren't on some of your machines. Just seems like more to take track of this way. Of course, combining pubkeys from arbitrary chains into a multisig is somewhat more flexible. But deriving from multisig scripts may be more straight forward than we thought. Individual pubkeys and multisig scripts are nicely unified as a "destination" after all. Why shall BIP 323 break that up again?
|
|
|
|
Pieter Wuille
|
|
May 10, 2013, 11:30:41 PM |
|
- get the basescript with GetCScript(id)
Where does the CScript from if you don't know the other keys in the first place? P2SH transactions do not contain the actual (public) keys anyway, so whatever you do, you need some metadata about the P2SH script in your wallet. I don't see the problem with extending this with the individual BIP32 public parent keys (if you need to be able to access the entire chain of multisig addresses in the first place).
|
I do Bitcoin stuff.
|
|
|
iddo
|
|
May 11, 2013, 01:59:28 AM Last edit: May 11, 2013, 07:21:28 AM by iddo |
|
If you consider hierarchies of multisig addresses this turns into an argument why c_i should not depend on Kpar. Because if the parent is a multisig address, which of the pubkeys should c_i depend on? I say let c_i not depend on any of them. Then you can split your pubkeys across several machines and still derive arbitrary many levels on each machine, even if not all pubkeys of the multisig address are present on the machine.
I don't understand, if some pubkeys aren't present, then what exactly do you claim that you could do now, which you couldn't do if c_i depends on K_par ? If there's a single chaincodes tree, you still cannot derive a new child pubkey if its parent pubkey isn't present. Sync'ing the multisig keys just according to the derivation indices makes more sense as far as I can see. This appears to be once again an argument for gaining efficiency in exchange for potential loss of security, and in the context of multisig the security concerns are more pronounced, because the whole point of having separate keys on different machines that control an address once they are combined is that the data that's stored on a single machine isn't enough to steal the coins, and now you're proposing that the chaincode data will be shared across the separate machines, just for (dubious?) gains in efficiency? Even if it's a good idea, you haven't shown how to support a single chaincode tree when type-1 derivations are allowed.
|
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 11, 2013, 05:44:39 PM |
|
- get the basescript with GetCScript(id)
Where does the CScript from if you don't know the other keys in the first place? P2SH transactions do not contain the actual (public) keys anyway, so whatever you do, you need some metadata about the P2SH script in your wallet. I don't see the problem with extending this with the individual BIP32 public parent keys (if you need to be able to access the entire chain of multisig addresses in the first place). I mean you do have the other keys, of course, inside the CScript. But that doesn't necessarily mean that the other keys are present as entries of their own in the key database. HasKey() will still return false for the ids of the individual pubkeys of the script, unless you imported them individually. My impression was you wanted to stay as close as possible with current key/address handling and in particular wanted a map addr -> chaincode. The easiest multisig generalization (in terms of handling) would be one chaincode per multisig address. Matter of taste I guess. Individual chaincodes per pubkey in a multisig can do potentially more, like combining different pubkeys from unrelated chains into a multisig. But that's far beyond scope of BIP 32 I assume. With one chaincode per address there is no handling issues and you can trivially implement derivation from a multisig address with very few additional lines of code.
|
|
|
|
etotheipi
Legendary
Offline
Activity: 1428
Merit: 1093
Core Armory Developer
|
|
May 11, 2013, 09:54:40 PM |
|
My impression was you wanted to stay as close as possible with current key/address handling and in particular wanted a map addr -> chaincode. The easiest multisig generalization (in terms of handling) would be one chaincode per multisig address. Matter of taste I guess.
Individual chaincodes per pubkey in a multisig can do potentially more, like combining different pubkeys from unrelated chains into a multisig. But that's far beyond scope of BIP 32 I assume. With one chaincode per address there is no handling issues and you can trivially implement derivation from a multisig address with very few additional lines of code.
Btw, it is my intention with Armory to do exactly what you just said is out of scope. There's no reason to optimize the case of multiple wallets coming from the same seed, because most of the time you don't ever want a single device seeing all the keys. You want your phone to create one key, your computer to create another key, completely independently. In the case of a big organization, you want different branches to create their wallets independently, and swap pubkey-chaincode pairs. They need to do that to guarantee that no one employee ever has access to all the pieces. And that all the pieces are never geographically co-located. Plus, if there's, say, a vulnerability in one of the devices' RNG, you'd prefer that it only affect one piece of the multisig wallet. There's a lot of possibilities there. While there may some use cases where keys are generated on the same device, that's far from guaranteed. In fact, I think it's the opposite, if we want to promote security best practices.
|
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 12, 2013, 09:26:29 AM |
|
Btw, it is my intention with Armory to do exactly what you just said is out of scope. There's no reason to optimize the case of multiple wallets coming from the same seed, because most of the time you don't ever want a single device seeing all the keys. You want your phone to create one key, your computer to create another key, completely independently. In the case of a big organization, you want different branches to create their wallets independently, and swap pubkey-chaincode pairs. Exactly: swap chaincodes. If they swap their chaincodes anyway they could just as well use the exact same one. (It doesn't make a difference securitywise if they do.) Plus, if there's, say, a vulnerability in one of the devices' RNG, you'd prefer that it only affect one piece of the multisig wallet.
That's an argument, indeed. Plus, a party may not want to rely on another party's entropy but prefer to use it's own. There's a lot of possibilities there. While there may some use cases where keys are generated on the same device, that's far from guaranteed. In fact, I think it's the opposite, if we want to promote security best practices.
Yes, lot's of different ways. I'm just saying prepare for the fact that chaincodes will get re-used across different pubkeys. Whether it's for multisigs or some other purpose. Even if it's not the default behaviour for multisigs and even if you don't promote it in any other way, people might still find applications for it and do it. Chaincodes are only semi-secret after all.
|
|
|
|
Pieter Wuille
|
|
May 12, 2013, 09:41:27 AM |
|
I still haven't seen a compelling argument for being able to change chaincode independently of keys. As far as I'm concerned, they constitute an somewhat-less-secret overflow entropy buffer for keys, and are thus intimately linked with them.
Yes, I understand there are use cases, but they seem far-fetched and either assume it is hard to set up a secure communication channel between trusted parties (which I think in practice is easy), or integrating them more tightly into multisig schemes (which I don't like; scripts/addresses are a layer strictly above key derivation, imho; in that sense I also dislike using the term "address" for identifying keys - it should only be called address when you're talking about the script template associated with a particular key, with the intention of receiving coins on it).
Maybe at some point there will be a use case important enough, and we can still standardize a system for "updating" a BIP32 chain with a new chaincode, though I don't like the complexity of reasoning about the security of that.
Unless someone comes up with a use case that it worth supporting (and hopefully doesn't require pretty much rewriting it from scratch...), I'm going to announce the current specification as final on the conference next week.
|
I do Bitcoin stuff.
|
|
|
thanke
Member
Offline
Activity: 104
Merit: 10
|
|
May 12, 2013, 01:05:05 PM |
|
BTW, there has not been a compelling argument for making I_L in type-2 depend on Kpar either. IMO this just closes a door, is completely unnecessary, and even complicates reasoning about the security of the whole scheme. But ok, let it be as it is and close that discussion now, as time pressurizes before the conference. Pieter, can you please address these three points: 1. Can we reserve more than one bit of the number i for different kind of derivations, not only the highest bit? This would allow addition other kinds of derivations or tweaks of existing ones in the future.
5. It is certainly possible to do everything with HMAC-SHA256 alone. For instance, if you need two values like (I_L,I_R) you can do I_L=HMAC(secret,0) and I_L=HMAC(secret,1). The question is, does it reduce dependencies of the code, code review, etc. to be worthwhile?
All this might get implemented on smardcards one day. I really don't understand why you'd want to rely on a new, second hash function here, SHA512. Entropy is _not_ a reason. And third, can you change I = HMAC( cpar, Kpar || i) in the public derivation to I = HMAC( HMAC(cpar,i), Kpar) ? I see the provability of the link as a quite an important property (cf. #228). BTW, the function I will propose for payer-derived payment addresses (which is basically just a deterministic wallet of the payee in the hands of the payer) will be K_derived := HMAC( m, K_base) *G + K_base where m is the (hash of the) invoice or contract that is being paid for. This would fit nicely together, as it would be the exact same function just with m substituted for HMAC(cpar,i). We could re-use code as well as security reasoning.
|
|
|
|
iddo
|
|
May 12, 2013, 01:51:04 PM |
|
BTW, there has not been a compelling argument for making I_L in type-2 depend on Kpar either. IMO this just closes a door, is completely unnecessary, and even complicates reasoning about the security of the whole scheme.
I offered you some arguments with regard to reduced security and privacy in post #218, and Pieter and I mentioned that you'd feed less than optimal entropy into the hash function (e.g. you'd repeatedly feed 256 pseudorandom bits to derive the next 512 pseudorandom bits, instead of 512-->512 as in BIP32). One reason why you might not find the entropy argument compelling is that you don't spell out your exact proposal on how to do type-2 and type-1 derivations. I'm not sure if you think that the chaincode should depend on the parent privkey with type-1 derivation.
|
|
|
|
iddo
|
|
May 12, 2013, 02:40:32 PM |
|
I found an argument why the multiplier I_L in K_i := I_L *Kpar (or k_i := I_L*G + Kpar) should depend on Kpar. Recall: In BIP 32 we have (IL,IR) = func(cpar,Kpar,i) whereas I previously had suggested (IL,IR) = func(cpar,i). The argument is about the provability of the link between Kpar and K_i. There are three things that one might want to prove: 1. Kpar and Ki have the same owner 2. Ki was derived from Kpar (and not some other parent) 3. Kpar existed before Ki Revealing I_L clearly proves 1. So let's discuss the other two properties. Case IL=func(cpar,i): Given Ki we can choose arbitrary cpar and i, compute I_L:=func(cpar,i), compute Kpar with Ki=I_L*Kpar. So we can trivially come up with infinitely many plausible parents Kpar. Case IL=func(cpar,Kpar,i): The above is not possible anymore, at least if Kpar is properly hashed in func. If we know Kpar we can check that it is a parent, otherwise we cannot come up with any plausible parent at all. So we can prove 2. and 3. after all. For this to be useful, as we might not want to reveal cpar, we would define: I_L := func(func(cpar,i),Kpar) and reveal func(cpar,i) as proof of the link. So I actually suggest (for type-2): I := HMAC( HMAC(cpar,i) , Kpar) What would be some possible use cases for your (2) and (3) here? You can prove that you know c_par such that both (I_L,I_R)=hash(c_par, K_par, i) and K_i=I_L*K_par, without revealing c_par itself, via zero-knowledge computational integrity (PCP) proofs. There will be a talk about it by my PhD supervisor in the upcoming Bitcoin conference (unfortunately I cannot attend the conference). So we could achieve your goals with (significantly) more complexity which would be used only when needed, instead of pushing this hash(hash()) extra complexity on everyone.
|
|
|
|
|