Title: [Experimental-2] Better mnemonic Post by: Coding Enthusiast on April 13, 2021, 06:56:38 AM We're trying to address some of the problems and shortcoming with the existing mnemonic algorithms namely BIP-39 and Electrum. The idea is to add the information that the wallet needs to derive child keys to final string that the user writes down and more importantly we see mnemonic as a simple encoding and avoid modifying the entropy when deriving the BIP-32 seed (see #4).
The C# implementation can be found here: https://github.com/Autarkysoft/Denovo/blob/master/Src/Autarkysoft.Bitcoin/Experimental/BetterMnemonic.cs Like any experimental class under Experimental namespace this is a rough idea and the code is untested. Code: 4 bit version | 4 bit depth | 32 bit index * depth | CompactInt entropy length | entropy | checksum (pad) 1. Scalability To make the algorithm scalable a small 4 bit version is added to the beginning of the encoding. This way we can have different definitions using the same scheme. 2. Derivation path/script type In order to let the wallet know which BIP-32 derivation path to use, that path has to be encoded into the mnemonic. This path could also act as the script (address) type of the child keys. For instance m/84'/0'/0'/0 is for P2WPKH addresses. The downside is increasing the length of the final mnemonic. This could be mitigated by introducing version 2 where instead of encoding the entire path a 4 bit integer is used to indicate the predefined paths. 16 hard-coded paths can be defined this way and for anything new next version could be used. version 1: 4-bit version | 4 bit derivation path depth | each index as 32-bit unsigned integers version 2: 4-bit version | 4 bit hard-coded path BIP-39 doesn't have this feature. Electrum has a partial solution by first increasing the entropy size from 128 to 132 bits then brute forcing a version that indicates the derivation paths and script types 3. Creation time This can be useful for the full nodes to avoid re-scanning the entire blockchain when the user imports their mnemonic. The creation time (as a LockTime object) can quickly tell the client from when in the history to begin the scan. For example a mnemonic created today doesn't have to scan the 300 GB of existing history. Since this is LockTime as used in transactions (instead of simple DateTime) it could be used by both online and offline clients generating the mnemonic as the online synced client knows the block height and can use that and the offline (cold storage) can use the DateTime (values as Unix timestamp and above 500,000,000). The rest is implementation detail, for example the wallet UI can show the LockTime (block height or DateTime) and still give the user the option to do a full rescan. 4. No modification of the initial entropy This is the main and biggest difference. Both BIP-39[2] and Electrum[3] are modifying the mnemonic: A. Modifying by normalization BIP-39 uses a Unicode normalization with full compatibility decomposition (form KD) but Electrum takes a step further and modifies the input mnemonic more by removing accents, changing Chinese words, removing spaces for word lists other than English, etc. Then the byte array representation of the string is used to derive the keys. This can create incompatibility between different implementations and lead to losses specially for any word list other than English. If the words are only used to look up index of each word in the word list to get the entropy out and nothing else, this problem is eliminated. In other words if user enters a word like "aliener" instead of "aliéner" (from French word list) it still can be searched inside the word list and if the search fails (due to bad normalization) the import process fails right away instead of going ahead. While normalizing this word can give different values: Code: 0x616c6965cc816e6572 No normalization Code: そつう れきだい ほんやく わかす りくつ ばいか ろせん やちん そつう れきだい ほんやく わかめ B. Use the UTF-8 decoded bytes of the mnemonic instead of the entropy When creating a mnemonic a random entropy with a good distribution of bits is chosen. For example in a 12 word mnemonic the entropy is an octet string with length of 16 (ie. 128 bits). Each octet can have a value between 0 and 255 inclusive which is 256 different values. When this is converted to 12 words using English word list then UT8 decoded to be used to derive BIP-32 seed, it is a longer octet string but each octet can only have a value between 97 and 122 inclusive which is only 26 different values. In other words a big bias is introduced in the input of PBKDF2 while the initial entropy had no bias at all (assuming it was chosen using a strong RNG). By using the entropy itself we can eliminate this bias and also avoid the normalization bugs that could occur in different implementations while greatly simplifying the implementation. Downside The only disadvantage of this proposal is the longer mnemonic. For example a 12-word BIP-39/Electrum mnemonic would turn into a 26-word version 1 BetterMnemonic using BIP-84 derivation path (without locktime) or 14-word version 2+ (without locktime). [1] Experimental-1: https://bitcointalk.org/index.php?topic=5245712.0 [2] https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki#from-mnemonic-to-seed [3] https://github.com/spesmilo/electrum/blob/392a648de5f6edf2e06620763bf0685854d0d56d/electrum/mnemonic.py |