Bitcoin Forum
February 27, 2024, 08:10:07 AM
 News: Latest Bitcoin Core release: 26.0 [Torrent]
 Home Help Search Login Register More
 Pages: [1]
 Author Topic: Modulo Bias  (Read 242 times) This is a self-moderated topic. If you do not want to be moderated by the person who started this topic, create a new topic.
nullius (OP)
Copper Member
Hero Member

Offline

Activity: 630
Merit: 2610

If you don’t do PGP, you don’t do crypto!

 February 15, 2018, 07:11:56 PMMerited by hatshepsut93 (5), achow101 (3), ABCbits (2), DannyHamilton (1), johhnyUA (1), Taras (1), TechPriest (1)

A few weeks ago, I promised this explanation to hatshepsut93; and I half-wrote it at the time.  I am making this its own topic, because that’s not the first time here I’ve seen people either ask about this, or make this mistake without even realizing it.  For those in a hurry, a code snippet is below.

Roll a six-sided die with output d between 1 and 6, inclusive.  Convert the results d to a 2-bit number b, using the equation b = (d - 1) % 4, where “%” denotes “modulo”.

Here is how all potential inputs map to all potential outputs:

Code:
Input d: 1 2 3 4
5 6

Output b: 0 1 2 3

As you can see, the output numbers 2 and 3 each have one way of being chosen; whereas the numbers 0 and 1 each have two ways of being chosen.  That is to say, 0 and 1 are twice as likely as 2 and 3.

This is “modulo bias”; and it must be avoided anytime you need to pick a uniformly distributed output number from a range which mismatches the range of inputs.

Now, consider the common use case of generating a random alphanumeric password.  Picking a password alphabet in ASCII order in the 62-character range of [0-9A-Za-z] using a random_octet % 62 (0x3e), we obtain:

Code:
'0' '1' '2' '3' '4' '5' '6' '7' '8' '9' 'A' 'B' 'C' ... 'x' 'y' 'z'
-------------------------------------------------------------------
00  01  01  03  04  05  06  07  08  09  0a  0b  0c  ... 3b  3c  3d  % 0x3e
3e  3f  40  41  42  43  44  45  46  47  48  49  4a  ... 79  7a  7b  % 0x3e
7c  7d  7e  7f  80  81  82  83  84  85  86  87  88  ... b7  b8  b9  % 0x3e
ba  bb  bc  bd  be  bf  c0  c1  c2  c3  c4  c5  c6  ... f5  f6  f7  % 0x3e
f8  f9  fa  fb  fc  fd  fe  ff  [!!! OOPS !!!]

Observe that the eight characters [0-7] can be picked 5 different ways, whereas the others [8-9A-Za-z] can only be picked 4 different ways!  As a result, each character in [0-7] will be picked 5/256 = 1.953125% of the time, whereas each of the others will be picked only 4/256 = 1.5625% of the time.  (The decimals given are exact.)  Although that doesn’t look like much, it is a relative difference of 8 characters being a whopping 25% more likely than the other 54 characters.  You do not want a password with those properties!

Please keep handy and adapt as needed the following algorithm for avoiding modulo bias, here presented as a C snippet which I here copy with minor modifications from from FreeBSD’s libc (it was copied from OpenBSD, and probably somewhere else before that).  The code comment (not written by me) explains how it works.  Over the course of years, it will save you many instances of shooting yourself in the foot:

Code:
#include <stdint.h>

/*
* Add here a source of uniformly distributed,
* cryptographically secure random unsigned 32-bit integers:
*/
uint32_t arc4random(void);

/* Begin (mostly) copied code: */

/*
* Calculate a uniformly distributed random number less than upper_bound
* avoiding "modulo bias".
*
* Uniformity is achieved by generating new random numbers until the one
* returned is outside the range [0, 2**32 % upper_bound).  This
* guarantees the selected random number will be inside
* [2**32 % upper_bound, 2**32) which maps back to [0, upper_bound)
* after reduction modulo upper_bound.
*/
uint32_t
arc4random_uniform(uint32_t upper_bound)
{
uint32_t r, min;

if (upper_bound < 2)
return 0;

/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;
/*
* This could theoretically loop forever but each retry has
* p > 0.5 (worst case, usually far better) of selecting a
* number inside the range we need, so it should rarely need
* to re-roll.
*/
for (;;) {
r = arc4random();
if (r >= min)
break;
}

return (r % upper_bound);
}

[This thread is self-moderated, based on experience; it is for on-topic technical discussion only.]

 ∞/0 PGP: C2E9 1CD7 4A4C 57A1 05F6  C21B 5A00 591B 2F30 7E0C
On Truth.There is only one Bitcoin.Tips: bc1qas0cuqzdr8r9rv2n6v0vd50vddcaz0ayl6kfe5
😺 Solve the Lauda Memorial Puzzle for 0.001 0.002 0.00333333 BTC! 😺
1709021407
Hero Member

Offline

Posts: 1709021407

Ignore
 1709021407

1709021407
 Report to moderator
1709021407
Hero Member

Offline

Posts: 1709021407

Ignore
 1709021407

1709021407
 Report to moderator
"Your bitcoin is secured in a way that is physically impossible for others to access, no matter for what reason, no matter how good the excuse, no matter a majority of miners, no matter what." -- Greg Maxwell
Advertised sites are not endorsed by the Bitcoin Forum. They may be unsafe, untrustworthy, or illegal in your jurisdiction.
hatshepsut93
Legendary

Offline

Activity: 2898
Merit: 2124

 February 15, 2018, 09:22:33 PMMerited by Anti-Cen (1)

Let's say I have some 6-sided dice and want to generate a new Bitcoin wallet. The goal is to get a 64-character long hex string that can be passed to BIP39 converter to get xprv and a nice mnemonic.
When I was researching about using dice for password generation, I've seen that you can just generate a long string of your dice throws and pass it to SHA256. So, in my case I have 5d6 and concatenating results of 28 rolls will give me slightly more than 128 bits of entropy, that will get passed to SHA-256 to get a hex string. Is this approach correct?

 .BEST.CHANGE.. ███████████████████████████████████████ BESTEXCHANGERATES ███████████████████████████████████████ ..BUY/ SELL CRYPTO..
HeRetiK
Legendary

Offline

Activity: 2856
Merit: 2038

 February 15, 2018, 11:07:44 PM

Let's say I have some 6-sided dice and want to generate a new Bitcoin wallet. The goal is to get a 64-character long hex string that can be passed to BIP39 converter to get xprv and a nice mnemonic.
When I was researching about using dice for password generation, I've seen that you can just generate a long string of your dice throws and pass it to SHA256. So, in my case I have 5d6 and concatenating results of 28 rolls will give me slightly more than 128 bits of entropy, that will get passed to SHA-256 to get a hex string. Is this approach correct?

SHA-256 being a cryptographic hash function the output should by definition be uniformly distributed, ie. not be affected by the problem described by nullius.

The tricky part is selecting an input with sufficient entropy, as to make it hard for an attacker to guess the input itself. Dice throws meet this condition.

 █▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀██          ▄         ▄      ▄▄▄▄▄█       ▄███      ▄███      ██████        ████      ████     ▀▀▀▀▀█         ████      █████          ████▄▄▄▄▄▄████▄▄▄▄▄▄▄▄█           ██████████████████████            ▀█████▄   ▀█████▄█              ▀█████▀   ▀█████▀█                 ▀▀        ▀▀██▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀WASABIWALLET▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ ▀▀▀▀▀▀▀████████████▄▄▄▄▄▄▄█ .....Your private Bitcoin wallet for desktop..... █▀▀▀▀▀▀████████████▄▄▄▄▄▄ ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀.Wasabi Wallet 2.0DOWNLOAD NOW.▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄ ▀▀▀▀▀▀████████████▄▄▄▄▄▄█
 Pages: [1]