SirArthur2 (OP)
Newbie

Activity: 4
Merit: 1
|
 |
May 20, 2026, 09:47:05 PM |
|
I had an idea for compressing BIP-39 seeds into a tiny string suitable for QR or RFID storage.
The process:
- We convert the wordlist into line numbers.
- We Base58-encode the line number of each word, so that every word becomes exactly 2 characters (zero-padded).
- We optionally append a + sign to the end of the output to indicate that the seed is extended with an extra word that is not stored here.
E.g., let's assume the following seed (freshly generated with the BIP-39 tool):
cannon garment estate enforce remind attract about that east retreat uncle route method practice chunk
As a QR code, this 102-character string produces a massive code that low-definition cameras struggle to read.
Using the Base58 word-to-line-number compressor we get 5dEFBhBFS63415XuAdSQZeT2LMQP6d, a 30-character string that renders as a much smaller and more easily readable QR code.
Sample Python code for this (requires tkinter and the wordlist, saved as english.txt in the same folder as the .py script; the wordlist can be found in the BIP-39 GitHub repository [ https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt ]):

#!/usr/bin/python3
import os
import tkinter as tk
import tkinter.messagebox as messagebox
from typing import Mapping

BITCOIN_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

# Load the wordlist from the script's own folder, whatever the current directory is.
dir_path = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(dir_path, "english.txt"), "r") as f:
    words = [line.strip() for line in f if line.strip()]

win = tk.Tk()
win.title("BIP-39 Compressor/Decompressor")

# Upper frame: the plain BIP-39 words.
frame1 = tk.Frame(master=win)
frame1.grid(row=0, column=0, padx=5, pady=5)
label1 = tk.Label(master=frame1, text="Decompressed Seed")
label1.grid(row=0, column=0)
label2 = tk.Label(master=frame1, text="Your words BIP-39 seed")
label2.grid(row=0, column=2)
decomp = tk.Text(master=frame1, height=10)
decomp.grid(row=1, column=0, columnspan=3)
btn = tk.Button(master=frame1, text="Compress", command=lambda: compressSeed())
btn.grid(row=2, column=2)

# Lower frame: the compressed Base58 string.
frame2 = tk.Frame(master=win)
frame2.grid(row=1, column=0, padx=5, pady=5)
label3 = tk.Label(master=frame2, text="Compressed Seed")
label3.grid(row=0, column=0)
label4 = tk.Label(master=frame2, text="Your compressed seed")
label4.grid(row=0, column=2)
comp = tk.Text(master=frame2, height=1)
comp.grid(row=1, column=0, columnspan=3)
btn2 = tk.Button(master=frame2, text="Decompress", command=lambda: decompressSeed())
btn2.grid(row=2, column=2)

expanded_var = tk.BooleanVar()
check_btn = tk.Checkbutton(
    master=win,
    text="Mark as Expanded (adds '+' to output)",
    variable=expanded_var,
)
check_btn.grid(row=3, column=0, sticky='w', pady=10)

def b58encode(i: int, width: int = 2, alphabet: str = BITCOIN_ALPHABET) -> str:
    # Encode i in Base58 and left-pad with the zero digit ('1') to a fixed width,
    # so that every word always takes exactly `width` characters.
    string = ""
    base = len(alphabet)
    while i:
        i, idx = divmod(i, base)
        string = alphabet[idx] + string
    return string.rjust(width, alphabet[0])

def b58decode(v: str, alphabet: str = BITCOIN_ALPHABET, *, autofix: bool = False) -> int:
    decode_map = _get_base58_decode_map(alphabet, autofix=autofix)
    decimal = 0
    base = len(alphabet)
    for char in v:
        decimal = decimal * base + decode_map[char]
    return decimal

def _get_base58_decode_map(alphabet: str, autofix: bool) -> Mapping[str, int]:
    invmap = {char: index for index, char in enumerate(alphabet)}
    if autofix:
        # Map look-alike characters onto the single variant the alphabet contains.
        for group in ('0Oo', 'Il1'):
            pivots = [c for c in group if c in invmap]
            if len(pivots) == 1:
                for alternative in group:
                    invmap[alternative] = invmap[pivots[0]]
    return invmap

def compressSeed():
    bip_words = decomp.get("1.0", "end-1c").strip().split()
    out = []
    for word in bip_words:
        if word not in words:
            messagebox.showinfo("Seed Error", "The word " + word + " wasn't found in the dictionary!")
            return  # do not output a partial seed
        out.append(b58encode(words.index(word) + 1))  # 1-based line number
    compressed_seed = "".join(out)
    # Append '+' if expanded option is selected
    if expanded_var.get():
        compressed_seed += '+'
    comp.delete("1.0", "end-1c")
    comp.insert("1.0", compressed_seed)

def decompressSeed():
    compressed_seed = comp.get("1.0", "end-1c").strip()
    # Remove trailing '+' if present (expanded seed indicator)
    if compressed_seed.endswith('+'):
        compressed_seed = compressed_seed[:-1]
        messagebox.showinfo("Expanded Seed Alert", "The seed appears to be expanded with an unknown secret extra word!")
    if len(compressed_seed) % 2:
        messagebox.showinfo("Seed Error", "The compressed seed must have an even number of characters!")
        return
    decompressed_words = []
    for i in range(0, len(compressed_seed), 2):
        chunk = compressed_seed[i:i+2]
        try:
            pos = b58decode(chunk)
        except KeyError:
            pos = 0  # character outside the Base58 alphabet
        if not 1 <= pos <= len(words):
            messagebox.showinfo("Seed Error", "The pair " + chunk + " doesn't match any line of the wordlist!")
            return
        decompressed_words.append(words[pos - 1])
    decomp.delete("1.0", "end-1c")
    decomp.insert("1.0", " ".join(decompressed_words))

win.mainloop()

Footnote: sorry for posting from a new account, but I can't find the TOTP code for my old account (and I don't quite remember having created one).
|
|
|
|
|
odolvlobo
Legendary

Activity: 5026
Merit: 3780
|
 |
May 21, 2026, 08:31:38 AM |
|
Quote from: SirArthur2
I had an idea for compressing BIP-39 seeds into a tiny string suitable for QR or RFID storage.

A BIP-39 phrase is an encoding of a binary value. If you want to "compress" the phrase, you can use the binary value instead. It can't be compressed any more than that. If you want text, I think base-64 is a better choice, because it is more compact and more widely used.
|
Join an anti-signature campaign: Click ignore on the members of signature campaigns. PGP Fingerprint: 6B6BC26599EC24EF7E29A405EAF050539D0B2925 Signing address: 13GAVJo8YaAuenj6keiEykwxWUZ7jMoSLt
|
|
|
|
BattleDog
|
This is not really compression, BIP39 words are already an encoding of entropy plus checksum. Each word is basically an 11-bit index into a 2048-word list, so replacing the words with line numbers and then base58'ing those numbers is not squeezing magic juice out of it. You are just moving from a human-readable standard format into a custom format that future-you and/or tired-you will have to remember how to decode.
For QR use, sure, representing the raw index/entropy data directly can make the QR smaller. That part is reasonable. But I would not casually store this on RFID/NFC unless the threat model is "I hope nobody waves a reader near my backup." Seeds are not coupons. Anything electronic that can be read conveniently can usually be stolen conveniently too.
The bigger danger is making a clever backup that only one Python script understands. Wallets understand BIP39 words. Humans can check BIP39 words. Recovery tools understand BIP39 words. Your custom base58 seed-string with a "+ means unknown expanded word" flag is the sort of thing that feels elegant now and becomes archaeological pain in 2031 when you are trying to recover coins from an old drawer and a half-dead laptop. If doing this for fun, fine. If doing this for actual funds, keep the real BIP39 phrase backed up in a boring, standard, preferably non-electronic way. Boring survives. Clever often needs tech support.
|
|
|
|
|
SirArthur2 (OP)
Newbie

Activity: 4
Merit: 1
|
 |
May 21, 2026, 02:03:36 PM Last edit: May 21, 2026, 02:21:30 PM by SirArthur2 |
|
That's some valid criticism; let me address it.

Quote from: odolvlobo
A BIP-39 phrase is an encoding of a binary value. If you want to "compress" the phrase, you can use the binary value instead. It can't be compressed any more than that.

We are talking about compressing to a human-usable format, not to binary formats. The computed seed for that textual representation is
eac38d38b91495d46f12e28790c78cf7c5da2937f3b1ff7332ad68007f06316e8156be8520dc880db3946fac8e0e896f6ce27a367e204ac3feb50676a56ab10c
...or in decimal
7377687107399363427919506940445526085943085060767798863389917008452289517492375 9375047662272726680500679881548796212384941982210304075203137347288298516924490 7766500354732069958083853735531831032887267586702456712895290533588458780
This is of no use to humans; it's the technical part for the computer.

Quote from: odolvlobo
If you want text, I think base-64 is a better choice, because it is more compact and more widely used.

Even Base58 is more than needed; base 46 would be the ideal size [the square root of 2048, rounded up]. Base64 wouldn't give any advantage, since the whole idea is to fit each number into at most 2 characters.

Quote from: BattleDog
The bigger danger is making a clever backup that only one Python script understands.

You can even do it by hand with pen and paper (sorry for using a Tkinter GUI in the sample Python). All you need to do is split the string into pairs, convert each pair from Base58 to decimal, and check which line number it corresponds to in the BIP-39 wordlist. Even Windows Notepad can show line numbers, and if you open the file on GitHub instead of downloading it, the GitHub UI already shows the line numbers.
Using: https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt
5d = 268 = cannon
EF = 768 = garment
Bh = 620 = estate
BF = 594 = enforce
(...)

As for the "+", it's there to remind you that this is not the whole seed. The whole purpose of adding a custom word (the BIP-39 passphrase) when deriving the wallet is that someone who gets your seed backup won't be able to take your funds. You can also write it like this: 5dEFBhBFS63415XuAdSQZeT2LMQP6d+<your custom word>
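The pen-and-paper procedure described above can also be sketched in a few lines of Python as a cross-check. This is only a minimal sketch, not part of the posted tool: the function name pairs_to_line_numbers is made up for illustration, and it assumes every word was written as exactly two Base58 characters. Looking the line numbers up in english.txt is left out so that the snippet needs no files.

```python
# Bitcoin's Base58 alphabet (no 0, O, I or l).
BITCOIN_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def pairs_to_line_numbers(compressed: str) -> list[int]:
    """Split a compressed seed into 2-character pairs; return 1-based line numbers."""
    # Anything from the first '+' onwards is the passphrase marker, not seed data.
    body = compressed.split("+", 1)[0]
    if len(body) % 2:
        raise ValueError("expected an even number of Base58 characters")
    numbers = []
    for i in range(0, len(body), 2):
        high, low = body[i], body[i + 1]
        # str.index raises ValueError for characters outside the alphabet.
        numbers.append(BITCOIN_ALPHABET.index(high) * 58 + BITCOIN_ALPHABET.index(low))
    return numbers


print(pairs_to_line_numbers("5dEFBhBF"))  # [268, 768, 620, 594]
```

The four numbers are the wordlist lines of cannon, garment, estate and enforce from the example above.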
|
|
|
|
|
nc50lc
Legendary

Activity: 3150
Merit: 8800
Self-proclaimed Genius
|
 |
May 22, 2026, 05:26:01 AM |
|
Quote from: SirArthur2
We are talking about compressing to a human-usable format, not to binary formats. The computed seed for that textual representation is

They are talking about the entropy that the words represent, not the binary seed that's computed from it.
In your example, those 15 words represent a fairly short 160-bit value (excluding the checksum).
In hex: 216bfd35a51b5c1d801eff45b703b25e48c352ca
In Base64: IWv9NaUbXB2AHv9FtwOyXkjDUso=
Since you mentioned the BIP39 tool, you can find it by toggling "Show entropy details" after typing/generating the seed phrase.
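To illustrate how the entropy relates to the words, here is a minimal sketch that splits the entropy quoted above into 11-bit word indices, following the BIP-39 rule of appending one SHA-256 checksum bit per 32 bits of entropy. The function name entropy_to_indices is an assumption made for this example. The indices are 0-based, so each one is the wordlist line number minus 1.

```python
import base64
import hashlib

ENTROPY_HEX = "216bfd35a51b5c1d801eff45b703b25e48c352ca"  # value quoted above


def entropy_to_indices(entropy: bytes) -> list[int]:
    """Return the 0-based BIP-39 word indices for the given entropy."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # BIP-39: one checksum bit per 32 entropy bits
    digest = hashlib.sha256(entropy).digest()
    # Append the first cs_bits bits of the digest to the entropy.
    # cs_bits is at most 8 for the standard sizes, so the first byte is enough.
    value = (int.from_bytes(entropy, "big") << cs_bits) | (digest[0] >> (8 - cs_bits))
    total_bits = ent_bits + cs_bits
    # Cut the bit string into 11-bit groups, most significant group first.
    return [(value >> shift) & 0x7FF for shift in range(total_bits - 11, -1, -11)]


entropy = bytes.fromhex(ENTROPY_HEX)
indices = entropy_to_indices(entropy)
print(len(indices))                        # 15
print(indices[:4])                         # [267, 767, 619, 593]
print(base64.b64encode(entropy).decode())  # IWv9NaUbXB2AHv9FtwOyXkjDUso=
```

The first four indices plus 1 give lines 268, 768, 620 and 594, matching the line numbers used earlier in the thread.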
|
|
|
|
NotATether
Legendary

Activity: 2338
Merit: 9727
┻┻ ︵㇏(°□°㇏)
|
 |
May 22, 2026, 12:20:17 PM |
|
Quote from: SirArthur2
We are talking about compressing to a human-usable format, not to binary formats. The computed seed for that textual representation is
eac38d38b91495d46f12e28790c78cf7c5da2937f3b1ff7332ad68007f06316e8156be8520dc880db3946fac8e0e896f6ce27a367e204ac3feb50676a56ab10c
...or in decimal
7377687107399363427919506940445526085943085060767798863389917008452289517492375 9375047662272726680500679881548796212384941982210304075203137347288298516924490 7766500354732069958083853735531831032887267586702456712895290533588458780
This is of no use to humans; it's the technical part for the computer.

You don't have to compress that many bytes. Since the dictionary has 2048 = 2^11 words, each word can be encoded in 11 bits of information. For example, abandon: 0x000 (using 0-based indexing). For 12 words, that's 132 bits, which fits in 17 bytes. So you only need to encode 17 bytes, which is really small, but it assumes that the wordlist is constant and cannot be changed. Thus it will break for e.g. Electrum seeds.
If you want to make it human-readable, you'd use Base64, which adds about 33% more length.
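As a rough sketch of packing word indices into raw bytes: the helper names pack_indices and unpack_indices are hypothetical, and the example indices are placeholders rather than a valid mnemonic. The sketch assumes 11 bits per word (2048 = 2^11), so 12 words take 132 bits, which round up to 17 bytes.

```python
def pack_indices(indices: list[int]) -> bytes:
    """Pack 0-based word indices (11 bits each) into bytes, zero-padded at the end."""
    value = 0
    for index in indices:
        if not 0 <= index < 2048:
            raise ValueError(f"index out of range: {index}")
        value = (value << 11) | index
    total_bits = 11 * len(indices)
    pad_bits = -total_bits % 8  # bits needed to reach a whole number of bytes
    return (value << pad_bits).to_bytes((total_bits + pad_bits) // 8, "big")


def unpack_indices(data: bytes, word_count: int) -> list[int]:
    """Inverse of pack_indices; word_count is needed to discard the padding."""
    pad_bits = len(data) * 8 - 11 * word_count
    value = int.from_bytes(data, "big") >> pad_bits
    return [(value >> (11 * i)) & 0x7FF for i in range(word_count - 1, -1, -1)]


twelve = [267, 767, 619, 593] * 3  # placeholder indices, not a valid mnemonic
packed = pack_indices(twelve)
print(len(packed))                           # 17
print(unpack_indices(packed, 12) == twelve)  # True
```

The word count has to be stored or known separately, because the padding bits at the end cannot be told apart from data.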
|
|
|
|
SirArthur2 (OP)
Newbie

Activity: 4
Merit: 1
|
 |
May 22, 2026, 03:37:37 PM |
|
Base64 is NOT human friendly; that's why Satoshi used Base58 for Bitcoin. Also, you would gain nothing with Base64 other than a potentially confusing alphabet. Be it Base64 or any other base from 46 upwards, you will always need two characters per word. The ordering of Bitcoin's Base58 alphabet is not only well known, it is also what this tool is meant to be used with, so if Base58 ever disappears, your bitcoins and your seed won't be of any use anyway.
The only improvement would be a base 2048, in which you would use only one character per word, but our alphabet clearly doesn't have enough characters for it. (And actually that's what a seed phrase is: a number in base 2048.)
So, please, let's stop with b64. It's not a "standard"; it's just the highest power of two you can reach with the printable characters of the ASCII table (you could probably go to base 128 using the extended table, but only by adding odd, totally unreadable symbols), and that's it. Good for machines, meaningless to humans, and for this purpose we want it to be human-usable and as simple as possible. That's why we use words and not simply two bytes stuck together in hex format.
Encoding just the entropy in Base58 you would get "U1MVu8LMiXXMUKqBUtuCZ3ZrUaV", 1 character shorter than Base64. The issue is that using the entropy removes the ability to do it with pen and paper.
As for "other wordlists": sure, but as long as the list doesn't exceed 3364 words, Base58 pairs can handle it. If the seed is not pure BIP-39, you will probably need the software that generated it in order to get its wordlist, and that's all.
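The claims about bases can be checked with a small sketch (the function name chars_per_word is made up here). It counts how many digits of a given base are needed to address every entry of a 2048-word list, assuming one fixed-width group per word.

```python
def chars_per_word(base: int, wordlist_size: int = 2048) -> int:
    """Smallest number of base-`base` digits that can address every word."""
    digits, capacity = 1, base
    while capacity < wordlist_size:
        digits += 1
        capacity *= base
    return digits


for base in (10, 16, 45, 46, 58, 64):
    print(base, chars_per_word(base))

print(58 ** 2)  # 3364, the number of values two Base58 characters can hold
```

Base 45 still needs three characters (45^2 = 2025), while 46, 58 and 64 all need two, so per word nothing is gained by going from Base58 to Base64.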
|
|
|
|
|
NotATether
Legendary

Activity: 2338
Merit: 9727
┻┻ ︵㇏(°□°㇏)
|
 |
May 22, 2026, 05:56:41 PM |
|
Quote from: SirArthur2
Base64 is NOT human friendly; that's why Satoshi used Base58 for Bitcoin. Also, you would gain nothing with Base64 other than a potentially confusing alphabet. Be it Base64 or any other base from 46 upwards, you will always need two characters per word. The ordering of Bitcoin's Base58 alphabet is not only well known, it is also what this tool is meant to be used with, so if Base58 ever disappears, your bitcoins and your seed won't be of any use anyway.

There's absolutely no problem with using other bases, since the idea of such a thing is to make them shareable by QR codes. Users are never actually going to read the string anyway. If they want the human-readable version, well, that's why mnemonic words exist.
|
|
|
|
odolvlobo
Legendary

Activity: 5026
Merit: 3780
|
 |
Today at 06:37:14 AM |
|
Quote from: odolvlobo
A BIP-39 phrase is an encoding of a binary value. If you want to "compress" the phrase, you can use the binary value instead. It can't be compressed any more than that.

Quote from: SirArthur2
We are talking about compressing to a human-usable format, not to binary formats....

You wrote:
I had an idea for compressing BIP-39 seeds into a tiny string suitable for QR or RFID storage.
Neither of those is a "human-usable" format.

Quote from: SirArthur2
So, please, let's stop with b64, it's not a "standard" ...

Base64 is a standard: RFC 4648, "The Base16, Base32, and Base64 Data Encodings".
|
|
|
|
SirArthur2 (OP)
Newbie

Activity: 4
Merit: 1
|
 |
Today at 02:59:22 PM Last edit: Today at 04:06:45 PM by SirArthur2 |
|
The discussion is going sideways for no reason.
Let's sum this up:
What's the intent?
Provide more options for storing a Bitcoin (or BIP-39 altcoin) seed.
The seed is the weakest link in security and often leads users into error: some think that if someone gets the seed but not their gizmo (Trezor, Ledger, whatever) they won't be able to take the coins; others think that reordering the seed or leaving one word out will make any difference... And ultimately, simply writing it down on a piece of paper makes the whole security of your "air-gapped, unhackable, 512-bit encrypted device" depend on how secure that piece of paper is.
QRs and RFID (NFC)
These formats aren't "immediately" human-readable (although you can learn to read a QR code with your eyes), but nowadays the readers are widely available: your mobile from 2015 can read a QR code, and your mobile from 2020 most likely can read an NFC tag. You can print a QR code on paper, but you can also engrave it in metal or wood... There are a lot of options for working with these formats nowadays.
However, these media often hold only a small number of bytes: RFID tags often as few as 64 bytes, and QR codes, as you encode more data, become increasingly hard to read or require a much higher print resolution or size.
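To put the size constraint in numbers, here is a small sketch comparing three ways of storing a mnemonic against the 64-byte tag mentioned above. The names TAG_CAPACITY and sizes are made up for this example, and the plain-text figure is a worst case that assumes every word has 8 letters, the longest length in the BIP-39 English list.

```python
TAG_CAPACITY = 64  # bytes; the low-end tag size mentioned above


def sizes(word_count: int) -> dict[str, int]:
    """Storage needed for one mnemonic under three encodings, in bytes/characters."""
    total_bits = 11 * word_count
    return {
        # At most 8 letters per word, plus single-space separators.
        "words (worst case)": 8 * word_count + (word_count - 1),
        "base58 pairs": 2 * word_count,
        "packed 11-bit indices": (total_bits + 7) // 8,
    }


for n in (12, 15, 18, 21, 24):
    for name, size in sizes(n).items():
        verdict = "fits" if size <= TAG_CAPACITY else "too big"
        print(f"{n} words, {name}: {size} ({verdict})")
```

Under these assumptions even a 24-word seed stays at 48 characters as Base58 pairs, while the worst-case plain words exceed 64 bytes already at 12 words.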
About bases
You can work with any base at all: the higher the base, the fewer symbols are needed to represent a number. However, we quickly run out of symbols to increase it further, and as we go up we get into troublesome symbols (Base64 includes /, making it unsuitable for filenames on any *nix-based FS, and ambiguous characters such as O and 0). The RFC doesn't mean "it's a standard that everyone should use"; the RFC is there to fix the alphabet (the order of the symbols) once we go past 9 (which for Base64 isn't even the case, as b64(9) = 67). We have the decimal system (base 10) as the convention for numerals, so to translate from any base to another we need to know the proper position of each symbol. Satoshi settled on base 58 as the highest base still usable by humans with the Western alphabet; Base64 is far too machine-oriented, including a pad terminator (=) that is useless to humans.
That said, there's no way to get to base 2048, in which each line number would be a single character, using our set of letters and symbols. The ideal base would be base 46, as the square root of 2048 is 45.25..., in which case it would also take 2 characters per number, but given that Base58 is already there, it fulfills the purpose.
The human decoding
Despite the need for a piece of technology to decode the QR code or tag contents, what it extracts is simple readable text; the further work (Base58 to decimal to line number in the wordlist) can be done totally offline with a pen and a piece of paper.
Final considerations
Even though the system "complicates" the process a bit, it still offers alternative storage options, especially ones that aren't obvious to the human eye, such as writing it down on paper and putting it in a vault. Also note that the seed backup isn't meant to be interacted with on a regular basis, where this process would be a pain, but only as a last resort after a hardware failure (Trezor, Ledger, etc. bricked).
|
|
|
|
|
|