In either event, I think we agree that the 1M or 200k discrepancy is largely irrelevant. For brainwallets, there are two constraints on word selection: 1) They must be memorizable. 2) They must be randomly selectable.
Adding to point (2): to achieve maximum entropy, it is essential that no word is more or less likely to be selected than any other, and that each selection is independent of every other. Some people erroneously try to think up their own words or select them from random pages of a book.
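A minimal Python sketch of selection that meets both requirements, assuming the operating system's CSPRNG (exposed through the standard `secrets` module) is an acceptable stand-in for physical dice. The function name and the toy word list are hypothetical. Words are drawn with replacement, which is what the usual n * log2(pool size) entropy figure assumes, and the list must not contain duplicates or some words would become more likely than others.

```python
import secrets

def pick_words(word_list, n):
    """Pick n words uniformly and independently (with replacement)."""
    # Each secrets.choice call is uniform over the list and independent
    # of every other call, which is exactly what point (2) requires.
    return [secrets.choice(word_list) for _ in range(n)]

# Hypothetical toy list; a real brainwallet would use a full word list.
toy_list = ["apple", "bridge", "candle", "dolphin", "ember", "falcon", "garden", "harbor"]
print(" ".join(pick_words(toy_list, 12)))
```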
Diceware uses five rolls of a six-sided die to select each word. This gives 7,776 possible "words", some of which aren't words, aren't well known, and won't be easily memorized. There are other lists out there, but they suffer from the same problems. 10,000 is a generous estimate of the usable word-pool size for this purpose.
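Diceware lists key each word by a five-digit string of die faces, from 11111 to 66666. Reading the five rolls as a base-6 number turns that key into a 0-based position in the list, which makes the 6^5 = 7,776 figure concrete. A minimal sketch; the function name is hypothetical:

```python
def diceware_index(rolls):
    """Map five d6 rolls (each 1..6) to a list position in 0..7775."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("need exactly five rolls, each between 1 and 6")
    index = 0
    for r in rolls:
        # Treat each die as one base-6 digit (face 1 -> digit 0).
        index = index * 6 + (r - 1)
    return index

print(6 ** 5)                           # 7776 possible outcomes
print(diceware_index([1, 1, 1, 1, 1]))  # 0, the entry keyed "11111"
print(diceware_index([6, 6, 6, 6, 6]))  # 7775, the entry keyed "66666"
```

Because every sequence of five fair rolls is equally likely, every position is equally likely, which satisfies the uniformity requirement above.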
Agreed. I made my own version of the Diceware list years ago to counter this problem. 10 000 words is indeed generous. Even as a native English speaker I wouldn't care to push much beyond 1000 words.
These days I use the English 2048-word list supplied with BIP0039:
abandon ability able about above ... zero zone zoo
Memorizing 12+ words, selected at random via dice roll, is a mathematically provable method to generate a sufficiently safe brainwallet. Additional steps, shortcuts, obfuscations, etc. are unnecessary at best and crippling to security at worst.
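The 2048-word BIP0039 list is not a power of six, so dice cannot be mapped onto it directly without favouring some words. One standard fix, which is my choice here rather than anything specified in the posts above, is rejection sampling: read five rolls as a number in 0..7775, discard anything at or above 6144 (= 3 * 2048), and reduce the rest mod 2048. Every index then has exactly three accepted pre-images. The names below are hypothetical, and `roll_five` uses `secrets` as a stand-in for physical dice.

```python
import secrets

WORDS_IN_POOL = 2048                            # size of the BIP0039 English list
OUTCOMES = 6 ** 5                               # 7776 outcomes from five d6 rolls
LIMIT = OUTCOMES - OUTCOMES % WORDS_IN_POOL     # 6144, largest multiple of 2048

def index_from_rolls(rolls):
    """Return a uniform index in 0..2047, or None if the rolls must be redone."""
    value = 0
    for r in rolls:
        value = value * 6 + (r - 1)
    if value >= LIMIT:
        return None  # reject: keeping these values would favour low indices
    return value % WORDS_IN_POOL

def roll_five():
    # Stand-in for five physical rolls of a six-sided die.
    return [secrets.randbelow(6) + 1 for _ in range(5)]

def pick_index():
    # Keep rolling until a set of rolls is accepted.
    while True:
        idx = index_from_rolls(roll_five())
        if idx is not None:
            return idx
```

About 21% of five-roll sets (1632 of 7776) are rejected, so selecting a word takes a little over five rolls on average.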
Certainly, shortcuts can cost entropy, and while method obscurity may increase security, it will typically do so in a non-quantifiable way. Relying on one's intuition regarding the difficulty of divining an obscure method is to abandon a foundational premise of information theory.
However, I'd like to highlight key-stretching as a fair source of additional security for a true brainwallet. In essence, one simply forgets the last few words of their passphrase and brute-forces them whenever access is required.
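A minimal sketch of recovering forgotten trailing words by brute force. It assumes the classic brainwallet derivation, private key = SHA-256 of the passphrase, and uses `target_key` as a stand-in for whatever the owner can check a guess against (in practice, the address holding the funds). The function names and the toy list are hypothetical.

```python
import hashlib
import itertools

def derive_key(passphrase):
    # ASSUMPTION: classic brainwallet derivation, private key = SHA-256(passphrase).
    return hashlib.sha256(passphrase.encode("utf-8")).digest()

def recover(known_words, word_list, missing, target_key):
    """Try every combination of `missing` trailing words.

    Worst case: len(word_list) ** missing derivations.
    """
    for tail in itertools.product(word_list, repeat=missing):
        candidate = " ".join(list(known_words) + list(tail))
        if derive_key(candidate) == target_key:
            return candidate
    return None

toy_list = ["apple", "bridge", "candle", "dolphin", "ember"]
target = derive_key("apple bridge candle dolphin")
print(recover(["apple", "bridge"], toy_list, 2, target))  # apple bridge candle dolphin
```

With the 2048-word list, each forgotten word multiplies the worst-case recovery work by 2048 (11 bits), so two forgotten words cost about 4.2 million derivations.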
I'd also like to expand on "sufficiently safe" here.
Selecting 12 words randomly and uniformly from a pool of 10 000 words gives 12 * log2(10000) = 159.45 bits of entropy (to 2 d.p.). Roughly speaking, there are as many equally plausible 12-word passphrases as there are Bitcoin addresses. Assuming the entropy of the passphrase is not reduced as it is converted into a private key, such a private key will be no less effective in securing a Bitcoin output than a standard random key.
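The entropy arithmetic above can be checked with a few lines of Python. The sketch assumes words are drawn uniformly and independently, as required earlier; the function name is hypothetical. The last line compares the passphrase count with the 2^160 possible address hashes.

```python
import math

def passphrase_entropy_bits(pool_size, n_words):
    """Entropy in bits of n words drawn uniformly and independently from a pool."""
    return n_words * math.log2(pool_size)

print(round(passphrase_entropy_bits(10_000, 12), 2))  # 159.45
print(passphrase_entropy_bits(2048, 12))              # 132.0
print(passphrase_entropy_bits(2048, 9))               # 99.0
print(round(10_000 ** 12 / 2 ** 160, 2))              # 0.68
```

So 10 000^12 passphrases come to roughly two thirds of the 2^160 address space, which is what "roughly as many" amounts to.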
Selecting 12 words from a pool of just 2048 yields 12 * log2(2048) = 12 * 11 = 132 bits of entropy. This is less secure than a standard address but is arguably "sufficiently safe" today. Electrum[1] seeds have 128 bits by default. Casascius coins used special 128-bit compact private keys.
Even 9 words from 2048 gives 99 bits of entropy. We're well past the point of general cryptographic recommendation here, but as far as a convenience/security tradeoff is concerned, I believe there are cases where 9 words would be a reasonable choice. Extending your earlier point of reference: as of block #387287, approximately 2^83.71 hashes have been calculated by miners in Bitcoin's lifetime, and such a hash is computationally cheaper than converting a private key to an address.
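To put the 9-word case against the mining figure quoted above, a quick sketch of the margin. It treats one mining hash and one key-to-address conversion as equal cost, which understates the margin since the conversion is the more expensive operation.

```python
import math

bits_9_words = 9 * math.log2(2048)   # 99 bits for 9 words from 2048
mining_bits = 83.71                  # cumulative mining work quoted above (block #387287)
margin = bits_9_words - mining_bits  # extra bits beyond all mining work to date

print(round(margin, 2))              # 15.29
print(round(2 ** margin))            # keyspace as a multiple of all mining work
```

2^15.29 is roughly 40,000: exhausting the 9-word keyspace would take about 40,000 times the hashing done by miners up to that block, or about half that for an attacker who succeeds on average halfway through.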
[1] Most new Electrum seeds are 13 words from the pool of 2048 words I linked to above. One might expect such a seed to have 13 * 11 = 143 bits of entropy but some of the data is dedicated to a checksum/version-number and the final word is underutilised (usually begins 'ab' or 'ac').
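For context on the "version-number" part of footnote [1]: as I understand Electrum 2.x, a phrase counts as a standard seed only if HMAC-SHA512 of the phrase, keyed with b"Seed version", has a hex digest beginning with "01". Treat those details as an assumption to check against Electrum's source; the normalisation below is also simplified (Electrum additionally applies NFKD and strips accents), and the function name is hypothetical.

```python
import hashlib
import hmac

SEED_PREFIX_STANDARD = "01"  # ASSUMPTION: Electrum 2.x prefix for "standard" seeds

def has_electrum_version_prefix(mnemonic, prefix=SEED_PREFIX_STANDARD):
    # Simplified normalisation: lowercase and collapse whitespace only.
    normalized = " ".join(mnemonic.lower().split())
    digest = hmac.new(b"Seed version", normalized.encode("utf-8"), hashlib.sha512).hexdigest()
    return digest.startswith(prefix)

print(has_electrum_version_prefix("abandon ability able about above"))
```

Two hex digits are 8 bits, so only about 1 in 256 random word sequences carries the prefix; Electrum finds one by searching over candidates, which is part of why 13 words do not yield a full 143 bits.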