Bitcoin Forum
May 22, 2026, 08:15:49 PM *
Author Topic: Compressing BIP-39 Seeds  (Read 77 times)
SirArthur2 (OP)
May 20, 2026, 09:47:05 PM
Merited by BattleDog (1)
 #1

I had an idea for compressing BIP-39 seeds into a tiny string suitable for QR or RFID storage.

The process:

  • We convert each word into its line number in the wordlist.
  • We base58-encode that line number, so that each word turns into exactly 2 characters (padded on the left with "1", the base58 zero digit).
  • We can optionally add a + sign to the end of the output, indicating that the seed is extended with an extra secret word.
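
A minimal sketch of the steps above, assuming the line numbers have already been looked up in english.txt. The function names encode_line_number and compress are made up for illustration, and the sample line numbers are the ones worked out by hand later in the thread:

```python
BITCOIN_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_line_number(line_number: int) -> str:
    """Return the 2-character base58 pair for a 1-based wordlist line number."""
    if not 1 <= line_number < 58 * 58:
        raise ValueError("line number does not fit in two base58 characters")
    high, low = divmod(line_number, 58)
    # '1' is the base58 zero digit, so small numbers come out padded with it.
    return BITCOIN_ALPHABET[high] + BITCOIN_ALPHABET[low]


def compress(line_numbers, expanded=False):
    """Join the pairs; a trailing '+' marks a seed that has an extra secret word."""
    body = "".join(encode_line_number(n) for n in line_numbers)
    return body + "+" if expanded else body


print(compress([268, 768, 620, 594]))  # cannon garment estate enforce
print(compress([4], expanded=True))    # "about" is line 4
```

Because the high digit is always emitted, a word from the first 57 lines still takes two characters, which is what keeps the string splittable into pairs.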

Eg:

Let's assume the following seed (freshly generated using BIP-39 tool):

Code:
cannon garment estate enforce remind attract about that east retreat uncle route method practice chunk

As a QR code, this 102-character string produces a dense code that low-resolution cameras struggle to read.

Using the base58 word-to-line-number compressor, we get
Code:
5dEFBhBFS63415XuAdSQZeT2LMQP6d
, a 30-character string, which renders as a much smaller and more easily readable QR code.

Sample Python code for this (requires tkinter and the wordlist, saved as english.txt in the same folder as the .py script; the wordlist can be found in the BIP-39 GitHub repository [ https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt ]):

Code:
#!/usr/bin/python3
import os
import tkinter as tk
import tkinter.messagebox as messagebox
from typing import Mapping, Union

BITCOIN_ALPHABET = \
    '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
dir_path = os.path.dirname(__file__)
with open(f"{dir_path}/english.txt", "r") as f:
    words = [line.strip() for line in f if line.strip()]

win = tk.Tk()
win.title("BIP-39 Compressor/Decompressor")
frame1 = tk.Frame(master=win)
frame1.grid(row=0, column=0, padx=5, pady=5)

label1 = tk.Label(master=frame1, text="Decompressed Seed")
label1.grid(row=0, column=0)
label2 = tk.Label(master=frame1, text="Your words BIP-39 seed")
label2.grid(row=0, column=2)
decomp = tk.Text(master=frame1, height=10)
decomp.grid(row=1, column=0, columnspan=3)

btn = tk.Button(master=frame1, text="Compress", command=lambda: compressSeed())
btn.grid(row=2, column=2)

frame2 = tk.Frame(master=win)
frame2.grid(row=1, column=0, padx=5, pady=5)

label3 = tk.Label(master=frame2, text="Compressed Seed")
label3.grid(row=0, column=0)
label4 = tk.Label(master=frame2, text="Your compressed seed")
label4.grid(row=0, column=2)
comp = tk.Text(master=frame2, height=1)
comp.grid(row=1, column=0, columnspan=3)

btn2 = tk.Button(master=frame2, text="Decompress", command=lambda: decompressSeed())
btn2.grid(row=2, column=2)
expanded_var = tk.BooleanVar()
check_btn = tk.Checkbutton(
    master=win,
    text="Mark as Expanded (adds '+' to output)",
    variable=expanded_var,
)
check_btn.grid(row=3, column=0, sticky='w', pady=10)

def b58encode(i: int, default_one: bool = True, alphabet: str = BITCOIN_ALPHABET) -> str:
    """Encode a non-negative integer as a base58 string (no padding)."""
    if not i and default_one:
        return alphabet[0]
    string = ""
    base = len(alphabet)
    while i:
        i, idx = divmod(i, base)
        string = alphabet[idx] + string
    return string

def b58decode(v: str, alphabet: str = BITCOIN_ALPHABET,
              *, autofix: bool = False) -> int:
    """Decode a base58 string to an integer; raises KeyError on an invalid character."""
    decode_map = _get_base58_decode_map(alphabet, autofix=autofix)
    decimal = 0
    base = len(alphabet)
    for char in v:
        decimal = decimal * base + decode_map[char]
    return decimal

def _get_base58_decode_map(alphabet: str,
                           autofix: bool) -> Mapping[str, int]:
    invmap = {char: index for index, char in enumerate(alphabet)}
    if autofix:
        # Map look-alike characters onto the one member of each group
        # that actually exists in the alphabet (e.g. '0' and 'O' -> 'o').
        groups = ['0Oo', 'Il1']
        for group in groups:
            pivots = [c for c in group if c in invmap]
            if len(pivots) == 1:
                for alternative in group:
                    invmap[alternative] = invmap[pivots[0]]
    return invmap

def compressSeed():
    bip_words = decomp.get("1.0", "end-1c").strip().split()
    out = []
    for word in bip_words:
        if word not in words:
            messagebox.showerror("Seed Error", "The word " + word + " wasn't found in the dictionary!")
            return  # stop here rather than emit a partial compressed seed
        pos = words.index(word)
        # Left-pad with '1' (the base58 zero digit) so every word is exactly 2 characters
        out.append(b58encode(pos + 1).rjust(2, BITCOIN_ALPHABET[0]))
    compressed_seed = "".join(out)

    # Append '+' if expanded option is selected
    if expanded_var.get():
        compressed_seed += '+'

    comp.delete("1.0", "end-1c")
    comp.insert("1.0", compressed_seed)

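def decompressSeed():
    compressed_seed = comp.get("1.0", "end-1c").strip()

    # Remove trailing '+' if present (expanded seed indicator)
    if compressed_seed.endswith('+'):
        compressed_seed = compressed_seed[:-1]
        messagebox.showinfo("Expanded Seed Alert", "The seed appears to be expanded with an unknown secret extra word!")

    # Every word is exactly 2 characters, so the length must be even
    if len(compressed_seed) % 2:
        messagebox.showerror("Seed Error", "The compressed seed must have an even number of characters!")
        return

    decompressed_words = []
    for i in range(0, len(compressed_seed), 2):
        chunk = compressed_seed[i:i+2]
        try:
            pos = b58decode(chunk)
        except KeyError:
            messagebox.showerror("Seed Error", "The pair " + chunk + " contains a non-base58 character!")
            return
        # Line numbers are 1-based; reject anything outside the wordlist
        if not 1 <= pos <= len(words):
            messagebox.showerror("Seed Error", "The pair " + chunk + " doesn't match any line of the wordlist!")
            return
        decompressed_words.append(words[pos - 1])

    decomp.delete("1.0", "end-1c")
    decomp.insert("1.0", " ".join(decompressed_words))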

win.mainloop()

Footnote: sorry for posting from a new account, but I can't find the TOTP code for my old account (and I don't quite remember having created one).
odolvlobo
May 21, 2026, 08:31:38 AM
Merited by ABCbits (3)
 #2

Quote
I had an idea for compressing BIP-39 seeds into a tiny string suitable for QR or RFID storage.

A BIP-39 phrase is an encoding of a binary value. If you want to "compress" the phrase, you can use the binary value instead. It can't be compressed any more than that.

If you want text, I think base-64 is a better choice, because it is more compact and more widely used.

BattleDog
May 21, 2026, 09:51:14 AM
 #3

This is not really compression, BIP39 words are already an encoding of entropy plus checksum. Each word is basically an 11-bit index into a 2048-word list, so replacing the words with line numbers and then base58'ing those numbers is not squeezing magic juice out of it. You are just moving from a human-readable standard format into a custom format that future-you and/or tired-you will have to remember how to decode.

For QR use, sure, representing the raw index/entropy data directly can make the QR smaller. That part is reasonable. But I would not casually store this on RFID/NFC unless the threat model is "I hope nobody waves a reader near my backup." Seeds are not coupons. Anything electronic that can be read conveniently can usually be stolen conveniently too.

The bigger danger is making a clever backup that only one Python script understands. Wallets understand BIP39 words. Humans can check BIP39 words. Recovery tools understand BIP39 words. Your custom base58 seed-string with a "+ means unknown expanded word" flag is the sort of thing that feels elegant now and becomes archaeological pain in 2031 when you are trying to recover coins from an old drawer and a half-dead laptop. If doing this for fun, fine. If doing this for actual funds, keep the real BIP39 phrase backed up in a boring, standard, preferably non-electronic way. Boring survives. Clever often needs tech support.

SirArthur2 (OP)
May 21, 2026, 02:03:36 PM
Last edit: May 21, 2026, 02:21:30 PM by SirArthur2
 #4

That's some valid criticism, let me address it.

Quote
A BIP-39 phrase is an encoding of a binary value. If you want to "compress" the phrase, you can use the binary value instead. It can't be compressed any more than that.

We are talking about compressing to a human-usable format, not to a binary format. The computed seed for that textual representation is

Code:
eac38d38b91495d46f12e28790c78cf7c5da2937f3b1ff7332ad68007f06316e8156be8520dc880db3946fac8e0e896f6ce27a367e204ac3feb50676a56ab10c
...or decimal 7377687107399363427919506940445526085943085060767798863389917008452289517492375 9375047662272726680500679881548796212384941982210304075203137347288298516924490 7766500354732069958083853735531831032887267586702456712895290533588458780

This is of no use to humans; it's the machine-facing part.

Quote
If you want text, I think base-64 is a better choice, because it is more compact and more widely used.

Even base58 is more than needed; base46 would be the ideal base size [sqrt(2048) rounded up, since 46^2 = 2116 >= 2048 while 45^2 = 2025 falls short]. Base64 wouldn't give any advantage, because the whole idea is to fit each number into at most 2 chars.
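
The base-size arithmetic can be checked with a short sketch. The helper name chars_per_word is made up for illustration, and 2048 is the BIP-39 wordlist size:

```python
import math


def chars_per_word(base: int, wordlist_size: int = 2048) -> int:
    """Number of base-`base` characters needed to address every word."""
    if base < 2:
        raise ValueError("base must be at least 2")
    chars, capacity = 1, base
    while capacity < wordlist_size:
        capacity *= base
        chars += 1
    return chars


# Smallest base whose two-character pairs cover all 2048 words.
smallest_two_char_base = math.isqrt(2048 - 1) + 1

print(smallest_two_char_base)
for base in (45, 46, 58, 64, 2048):
    print(base, chars_per_word(base))
```

Any base from 46 up to 2047 needs two characters per word, so base58 and base64 tie on length under the per-word scheme.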

Quote
The bigger danger is making a clever backup that only one Python script understands.

You can even do it by hand with pen and paper (sorry for using Tk in the sample Python). All you need to do is split the string into pairs, convert each pair from base58 to decimal, and look up that line number in the BIP-39 wordlist. Even Windows Notepad can show line numbers, and if you open the file on GitHub instead of downloading it, the GitHub UI already shows the line numbers.

Code:
Using: https://github.com/bitcoin/bips/blob/master/bip-0039/english.txt
5d = decimal 268; line 268 cannon
EF = 768 = garment
Bh = 620 = estate
BF = 594 = enforce
(...)
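
The same pen-and-paper arithmetic as a minimal sketch, assuming the Bitcoin base58 alphabet and 1-based line numbers. The name decode_pairs is hypothetical, and anything after a "+" is ignored here:

```python
BITCOIN_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def decode_pairs(compressed: str) -> list:
    """Split the string into pairs and return the wordlist line numbers."""
    body = compressed.split("+", 1)[0]  # drop the '+' marker and what follows
    if len(body) % 2:
        raise ValueError("expected an even number of base58 characters")
    line_numbers = []
    for i in range(0, len(body), 2):
        # str.index raises ValueError for a character outside the alphabet
        high = BITCOIN_ALPHABET.index(body[i])
        low = BITCOIN_ALPHABET.index(body[i + 1])
        line_numbers.append(high * 58 + low)
    return line_numbers


print(decode_pairs("5dEFBhBF"))  # line numbers of the first four words
```

Each pair is just high digit times 58 plus low digit, which is the whole calculation needed on paper.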

As for the "+", it's there to remind you that this is not the whole seed. The whole purpose of adding a custom word (the BIP-39 passphrase) is that someone who gets hold of your seed backup still won't be able to get your funds. But you can also write it like:

Code:
5dEFBhBFS63415XuAdSQZeT2LMQP6d+<your custom word>
nc50lc
Today at 05:26:01 AM
Merited by ABCbits (2)
 #5

Quote
We are talking about compressing to a human-usable format, not to a binary format. The computed seed for that textual representation is
They are talking about the entropy that the words represent, not the binary seed that's computed from it.

In your example, those 15 words represent a fairly short 160-bit value (excluding the 5-bit checksum).
In Hex:
Code:
216bfd35a51b5c1d801eff45b703b25e48c352ca
in Base64:
Code:
IWv9NaUbXB2AHv9FtwOyXkjDUso=

Since you mentioned BIP39 tool, you can find it by toggling "Show entropy details" after typing/generating the seed phrase.
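
A minimal sketch of the relationship, assuming the standard BIP-39 rule that the checksum is the first ENT/32 bits of SHA-256(entropy) and that the entropy length is a multiple of 32 bits. The function name entropy_to_word_indices is made up for illustration; the indices are 0-based, so add 1 to get a line number in english.txt:

```python
import base64
import hashlib
from typing import List


def entropy_to_word_indices(entropy: bytes) -> List[int]:
    """Append the BIP-39 checksum bits and cut the result into 11-bit indices."""
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32                      # 5 bits for 160-bit entropy
    digest = hashlib.sha256(entropy).digest()
    checksum = digest[0] >> (8 - cs_bits)         # top cs_bits bits of the hash
    value = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    word_count = (ent_bits + cs_bits) // 11
    return [(value >> (11 * (word_count - 1 - k))) & 0x7FF
            for k in range(word_count)]


entropy = bytes.fromhex("216bfd35a51b5c1d801eff45b703b25e48c352ca")
as_base64 = base64.b64encode(entropy).decode("ascii")
indices = entropy_to_word_indices(entropy)

print(as_base64)
print(indices[:7])  # 0-based: cannon, garment, estate, enforce, remind, attract, about
```

Only the last word depends on the checksum; every other index is read straight out of the entropy bits.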

NotATether
Today at 12:20:17 PM
 #6

Quote
We are talking about compressing to a human-usable format, not to a binary format. The computed seed for that textual representation is

Code:
eac38d38b91495d46f12e28790c78cf7c5da2937f3b1ff7332ad68007f06316e8156be8520dc880db3946fac8e0e896f6ce27a367e204ac3feb50676a56ab10c
...or decimal 7377687107399363427919506940445526085943085060767798863389917008452289517492375 9375047662272726680500679881548796212384941982210304075203137347288298516924490 7766500354732069958083853735531831032887267586702456712895290533588458780

This is of no use to humans; it's the machine-facing part.

You don't have to compress that many bytes. Since the dictionary size is 2048 words (2^11), each word can be encoded in 11 bits of information. For example, abandon is 0x000 (using 0-based indexing). For 12 words, that's 132 bits, which fits in 17 bytes.

So you only need to encode 17 bytes, which is really small, but it assumes that the wordlist is constant and cannot be changed. Thus it will break for e.g. Electrum seeds.

If you want to make it human readable, you'd use Base64 which adds about 33% more length.
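
A minimal sketch of that packing, assuming 0-based indices into the 2048-word list, so each index fits in 11 bits (2048 = 2^11). The name pack_indices is hypothetical, and zero bits are added at the end to fill the last byte:

```python
import base64
from typing import Iterable


def pack_indices(indices: Iterable[int]) -> bytes:
    """Concatenate 11-bit word indices and pad with zero bits to a whole byte."""
    value, bit_count = 0, 0
    for index in indices:
        if not 0 <= index < 2048:
            raise ValueError("index does not fit in 11 bits")
        value = (value << 11) | index
        bit_count += 11
    pad = -bit_count % 8                 # zero bits needed to reach a byte boundary
    return (value << pad).to_bytes((bit_count + pad) // 8, "big")


packed = pack_indices([0] * 12)          # twelve times "abandon"
print(len(packed))                       # bytes for a 12-word phrase
print(len(base64.b64encode(packed)))     # characters once Base64-encoded
print(pack_indices([267, 767, 619, 593]).hex())
```

Decoding needs the word count (or the bit count) as well, because the padding bits are indistinguishable from data.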

 
SirArthur2 (OP)
Today at 03:37:37 PM
 #7

Base64 is NOT human friendly; that's why Satoshi used base58 for Bitcoin. Also, you would gain nothing with base64 other than a potentially confusing alphabet. Be it base64 or any other base from base46 upwards, you will always require two chars. The ordering of the Bitcoin base58 alphabet is well known, and this tool is meant to be used with Bitcoin anyway, so if base58 ever disappears, your bitcoins and seed won't be of any use either.

The only improvement would be base2048, where each word would take only one char, but our alphabet clearly lacks enough characters for it. (And that's actually what a seed phrase is: a number in base2048 format.)

So, please, let's stop with b64. It's not a "standard"; it's just the highest power-of-two base that fits within the printable characters of the ASCII table (you could probably go to base128 using the extended table, but only by adding odd, totally unreadable symbols), and that's it. Good for machines, meaningless to humans, and for this purpose we want it to be human-usable and as simple as possible - that's why we use words and not simply two bytes stuck together in hex format.


Encoding just the entropy in base58 gives "U1MVu8LMiXXMUKqBUtuCZ3ZrUaV", 1 char shorter than base64. The issue is that working from the entropy makes it impractical to do with pen and paper.
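
A minimal sketch of that whole-entropy encoding, assuming the 160-bit entropy quoted earlier in the thread is treated as one big-endian integer. The names b58encode_int and b58decode_int are made up for illustration. Leading zero bytes are lost in the integer conversion, so the decoder has to know the byte length in advance:

```python
BITCOIN_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode_int(value: int) -> str:
    """Encode a non-negative integer with the Bitcoin base58 alphabet."""
    if value == 0:
        return BITCOIN_ALPHABET[0]
    digits = []
    while value:
        value, remainder = divmod(value, 58)
        digits.append(BITCOIN_ALPHABET[remainder])
    return "".join(reversed(digits))


def b58decode_int(text: str) -> int:
    """Inverse of b58encode_int; raises ValueError on a non-base58 character."""
    value = 0
    for char in text:
        value = value * 58 + BITCOIN_ALPHABET.index(char)
    return value


entropy = bytes.fromhex("216bfd35a51b5c1d801eff45b703b25e48c352ca")
encoded = b58encode_int(int.from_bytes(entropy, "big"))
restored = b58decode_int(encoded).to_bytes(len(entropy), "big")

print(encoded, len(encoded))
```

Unlike the pair scheme, every output character here depends on the whole 160-bit number, which is why this variant means long division on a very large integer when done by hand.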

As for "other wordlists": sure, as long as the list doesn't exceed 3363 words, base58 pairs can handle it (58^2 = 3364 pairs, and "11" is unused because line numbers start at 1). If the seed is not pure BIP-39, you will probably need the software that generated it in order to get its wordlist, and that's all.
NotATether
Today at 05:56:41 PM
 #8

Quote
Base64 is NOT human friendly; that's why Satoshi used base58 for Bitcoin. Also, you would gain nothing with base64 other than a potentially confusing alphabet. Be it base64 or any other base from base46 upwards, you will always require two chars. The ordering of the Bitcoin base58 alphabet is well known, and this tool is meant to be used with Bitcoin anyway, so if base58 ever disappears, your bitcoins and seed won't be of any use either.

There's absolutely no problem with using other bases, since the whole point of such a string is to make seeds shareable via QR codes.

Users are never actually going to read the string anyway. If they want the human-readable version, well, that's why mnemonic words exist.

 