Title: bad randomness when generating keys within a specified range Post by: citb0in on November 19, 2022, 07:31:11 PM I am trying to learn Python and am currently facing the following question. I have a small program that generates random keys within a defined range. For this I have used the random library. Please bear with me if I have used bad code for this and I am of course happy about corrections or suggestions and hints how to do it better.
Code: $ cat rand_generator_within_range.py Quote #!/usr/bin/python3 # generate n keys within specified range # by citb0in@bitcointalk.org, 2022/11/19 import random def gen_prvkey(): low = 0x000000 high = 0xFFFFFFF return str(hex(random.randrange(low,high))) counter = 1 while counter <= 10: print(gen_prvkey()) counter += 1 Here are 5 examples of output when I run the program on my computer: Quote 0x7532e18 0x7034239 0x9172e57 0xcffbf6c 0x8b2aae1 0xc2ab798 0x19521ee 0xea443e7 0xf6e445e 0x1932bba Quote 0x78bc5a5 0x6554548 0x48a3554 0x782c14c 0x8932a36 0x68069ab 0xbc701b4 0xb29fa05 0x209db37 0x62599e Quote 0xbbc0afa 0x5eede30 0x8e7799d 0x9d08774 0xbaf7a7d 0x6056a86 0xc4d4863 0xcaf4ded 0x4c83c67 0x3c1310d Quote 0x5348039 0x1e73913 0x3c25c99 0x8f16f1 0x61ca173 0x5dead1b 0x9a9d314 0xf314f8a 0x32af92e 0xec09cde Quote 0xdfcfd55 0x21e1e56 0x927f524 0x3d94474 0xc4c42a7 0xa34b810 0x74ec8f7 0xe91077 0x694504e 0xc9502fd If at all, at most only one key is generated, which consists of 6 characters. All other generated keys are always in the 7-character range. Why so few 6-characters keys? I don't understand why obviously and clearly keys are generated from the upper range half. I would actually have expected by random generation that the generated keys would be wildly distributed in the specified span. However, randomness does not seem to work well here. Where do I have the thinking or programming error? Looking forward to any helpful answer. Title: Re: bad randomness when generating keys within a specified range Post by: jackg on November 19, 2022, 08:14:34 PM In your function you set low to 0x000000 6 0s and high to 0xFFFFFFF 7 Fs. I think this is where your "issue" is caused but it does expand your range if those are submitted to the random function.
Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 19, 2022, 08:39:11 PM Thanks for your reply, however I didn't understand (yet). Can you be more specific, please? What do you mean by "expanding the range" ? In my humble understanding the range I specified was:
000000 <- 6 characters FFFFFFF <- 7 charachters The half of this range should be 7FFFFFF, shouldn't it ? So I would expect to see about equal results that are lower than 7FFFFFF and higher than 7FFFFFF. The behavior does not change when I set the lower range limit to 0000000 <- 7 characters same output results Title: Re: bad randomness when generating keys within a specified range Post by: COBRAS on November 19, 2022, 08:47:57 PM Thanks for your reply, however I didn't understand (yet). Can you be more specific, please? What do you mean by "expanding the range" ? In my humble understanding the range I specified was: 000000 <- 6 characters FFFFFFF <- 7 charachters The half of this range should be 7FFFFFF, shouldn't it ? So I would expect to see about equal results that are lower than 7FFFFFF and higher than 7FFFFFF. The behavior does not change when I set the lower range limit to 0000000 <- 7 characters same output results half of a range is 0xfff * 0xfff = 0xffffff, 1/2 = 0xfff+ 1/2f Title: Re: bad randomness when generating keys within a specified range Post by: odolvlobo on November 19, 2022, 09:46:43 PM ... 0x062599e ... 0x08f16f1 ... 0x0e91077 ... If at all, at most only one key is generated, which consists of 6 characters. All other generated keys are always in the 7-character range. Why so few 6-characters keys? Perhaps if I add the missing 0s the answer will become more clear. The probability of a digit being 0 is 1/16, so the probability of a number with the first digit being 0, making it a 6-digit number, is 1/16. That means that you should expect 1/16 of the numbers to have 6 or fewer digits. The probability of a 5 digit number (two leading 0s) is 1/256 for the same reason. The half of this range should be 7FFFFFF, shouldn't it ? So I would expect to see about equal results that are lower than 7FFFFFF and higher than 7FFFFFF. Yes, but don't forget that 1000000-7FFFFFE are all 7-digit numbers that are less than 7FFFFFF.Title: Re: bad randomness when generating keys within a specified range Post by: jackg on November 20, 2022, 03:32:03 AM Thanks for your reply, however I didn't understand (yet). Can you be more specific, please? What do you mean by "expanding the range" ? In my humble understanding the range I specified was: 000000 <- 6 characters FFFFFFF <- 7 charachters The half of this range should be 7FFFFFF, shouldn't it ? So I would expect to see about equal results that are lower than 7FFFFFF and higher than 7FFFFFF. The behavior does not change when I set the lower range limit to 0000000 <- 7 characters same output results I looked at this again and are you going from 0 to 268435455 (0 to 0xFFFFFFF) with your range? Half way is, as you say, 7FFFFFF or 134 217 727. Have you tested this with a lot of numbers to come to this conclusion (10000 can normally be processed fairly quickly and you can plot a graph if you don't want to read the data) or have you only done it in sets of 10 if it's done it continuously with a lot of tries then it might be an issue with the number generator? What module are you using for random numbers? The random number generator in python is also not truly random which might be contributing to your problem (os.urandom apparently is - though it needs you to import os): https://stackoverflow.com/questions/22891583/can-i-generate-authentic-random-number-with-python Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 20, 2022, 07:26:56 AM Quote from: citb0in The half of this range should be 7FFFFFF, shouldn't it ? Quote The half of the range is: 0x7ffffff half of a range is 0xfff * 0xfff = 0xffffff, 1/2 = 0xfff+ 1/2f So what is the exact result for half of range 0x0000000 - 0xfffffff according your opinion ? @odolvlodo: unfortunately it is not clear to me with that explanation. Maybe I have still a thinking error because of this? Will look into again... @all: I have rewritten the python code with following changes and addition: It calculcates the half of the specified range and then compares the output number if it falls into the upper range or in the bottom range. It adds a counter and you will get a good result overview, how many numbers in total that were output were in the lower range or in the upper range. You can then raise the random numbers of 10 to a high number eg 1,000,000 The output is what I initially described. There are significant more numbers output in the upper range. Code: #!/usr/bin/python3 This is the output of last 5 lines, generated 1 million numbers Quote [...] Random number was: 0xac9e74d Bottom half total: 466376 Upper half total: 533620 Exact half total: 0 Random number was: 0x1f5b50c Bottom half total: 466377 Upper half total: 533620 Exact half total: 0 Random number was: 0x3a707c0 Bottom half total: 466378 Upper half total: 533620 Exact half total: 0 Random number was: 0xce3fd7d Bottom half total: 466378 Upper half total: 533621 Exact half total: 0 Random number was: 0xc260324 Bottom half total: 466378 Upper half total: 533622 Exact half total: 0 Another run, this time even 10 million numbers... Quote [...] Random number was: 0xc16b2e1 Bottom half total: 4669562 Upper half total: 5330434 Exact half total: 0 Random number was: 0xc98c1d4 Bottom half total: 4669562 Upper half total: 5330435 Exact half total: 0 Random number was: 0x5b892db Bottom half total: 4669563 Upper half total: 5330435 Exact half total: 0 Random number was: 0x51e8999 Bottom half total: 4669564 Upper half total: 5330435 Exact half total: 0 Random number was: 0x8f44860 Bottom half total: 4669564 Upper half total: 5330436 Exact half total: 0 Another run with 10M numbers... Quote [...] Random number was: 0x2157410 Bottom half total: 4666783 Upper half total: 5333213 Exact half total: 0 Random number was: 0x646a7e4 Bottom half total: 4666784 Upper half total: 5333213 Exact half total: 0 Random number was: 0xd9d63de Bottom half total: 4666784 Upper half total: 5333214 Exact half total: 0 Random number was: 0xba35fde Bottom half total: 4666784 Upper half total: 5333215 Exact half total: 0 Random number was: 0x2700b7f Bottom half total: 4666785 Upper half total: 5333215 Exact half total: 0 I did run a huge run with 1,000,000,000 numbers and let it go a while, then aborted ... Quote [...] I even hit two times the exact half :) but again as you see the upper half hits are significantly higher which is really weird.Random number was: 0x419bad0 Bottom half total: 97085644 Upper half total: 110965268 Exact half total: 2 Random number was: 0x35ef60e Bottom half total: 97085645 Upper half total: 110965268 Exact half total: 2 Random number was: 0xea6a6ca Bottom half total: 97085645 Upper half total: 110965269 Exact half total: 2 Random number was: 0x7fce229 Bottom half total: 97085646 Upper half total: 110965269 Exact half total: 2 Random number was: 0xe29ef8e Bottom half total: 97085646 Upper half total: 110965270 Exact half total: 2 Random number was: 0x2115726 Bottom half total: 97085647 Upper half total: 110965270 Exact half total: 2 Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 20, 2022, 01:37:55 PM Meanwhile, I was able to find out the cause of the problem. Am still clarifying the finer points and then I post the result with all the details.
Title: Re: bad randomness when generating keys within a specified range Post by: BlackHatCoiner on November 20, 2022, 02:02:11 PM 0xfff * 0xfff = 0xffffff I'm not sure if you're multiplying here, but 0xfff * 0xfff is not equal with 0xffffff, but with 0xffe001.@odolvlodo: unfortunately it is not clear to me with that explanation. Maybe I have still a thinking error because of this? Will look into again... What haven't you understood exactly? If you're picking a random number netween 0x0000000 and 0xfffffff, then there's 1 in 16 chance to pick one that starts with 0 (e.g., 0x062599e), because there are 16 possible hexadecimal characters the number can begin with. Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 20, 2022, 02:12:58 PM Let's keep it simple and assume the range is 1-10. The half of the range is 5. When you randomly generate numbers you would expect that those generated numbers are well distributed within the range. That means that as more numbers you generate (let's say thousands and millions of them) the total of "lower half" and "upper half" should be around 'n' (=total of numbers generated). That's what randomness does, nothing fancy.
For example, range is 1-10 and we generate 1000 numbers. I would expect that about 500 numbers are on the lower half and 500 numbers are in the upper half. As n raises this distribution should be more clear and you should see the results clearly near n/2 Currently I'm busy with some other stuff, I will post the solution later, be patient. The solution was not library secure (I tried that, too) but it works also with the simple random library in Python. The culprit was something quite different EDIT: Well, back again. First of all, here are two variations of my updated program. The first one is using python's "random" and the second one is using python's "secrets" library. Both programs are doing the same. I added the option for command line argument where we can specify n (how many numbers to create) and also with some detailled output information. Code: #!/usr/bin/python3 Code: #!/usr/bin/python3 Now let's run the first program and generate 10 numbers... Code: $ ./rand_generator_using_random.py 10 Quote Range information: ------------------ HEX = 0x0 - 0xfffffff DEC = 0 - 268435455 Half of range in HEX: 0x7ffffff Half of range in DEC: 134217727 You're about to generate 10 numbers in total. We expect even distribution in bottom/upper half of the range with each count of about 5 Press ENTER to start ... Random number was: 0xa97a9bb ( DEC= 177711547 ) Bottom half count: 0 Upper half count: 1 Exact half count: 0 Random number was: 0x78d6d5 ( DEC= 7919317 ) Bottom half count: 1 Upper half count: 1 Exact half count: 0 Random number was: 0xe6d74d7 ( DEC= 242054359 ) Bottom half count: 1 Upper half count: 2 Exact half count: 0 Random number was: 0xa5df423 ( DEC= 173929507 ) Bottom half count: 1 Upper half count: 3 Exact half count: 0 Random number was: 0x215860 ( DEC= 2185312 ) Bottom half count: 2 Upper half count: 3 Exact half count: 0 Random number was: 0x4b80cdc ( DEC= 79170780 ) Bottom half count: 3 Upper half count: 3 Exact half count: 0 Random number was: 0x21b609b ( DEC= 35348635 ) Bottom half count: 4 Upper half count: 3 Exact half count: 0 Random number was: 0x26ff112 ( DEC= 40890642 ) Bottom half count: 5 Upper half count: 3 Exact half count: 0 Random number was: 0x9f3785 ( DEC= 10434437 ) Bottom half count: 6 Upper half count: 3 Exact half count: 0 Random number was: 0x197c28b ( DEC= 26722955 ) Bottom half count: 7 Upper half count: 3 Exact half count: 0 We see that 7 of those numbers fall into the lower range and 3 numbers fall into the upper range. Let's run the same command again ... Code: $ ./rand_generator_using_random.py 10 Quote [...] This time it was perfectly even distributed, what a coincidence :)Random number was: 0xf48da93 ( DEC= 256432787 ) Bottom half count: 5 Upper half count: 5 Exact half count: 0 Let's run the same command again ... Code: $ ./rand_generator_using_random.py 10 Quote [...] Random number was: 0xb2eed0c ( DEC= 187624716 ) Bottom half count: 4 Upper half count: 6 Exact half count: 0 Now let's run the program with 10K numbers a few times and compare the results... Code: $ ./rand_generator_using_random.py 10000 Quote [...] Random number was: 0xc4cb938 ( DEC= 206354744 ) Bottom half count: 5037 Upper half count: 4963 Exact half count: 0 Quote [...] Random number was: 0x2b80462 ( DEC= 45614178 ) Bottom half count: 5041 Upper half count: 4959 Exact half count: 0 Quote [...] Random number was: 0x5c0f08 ( DEC= 6033160 ) Bottom half count: 5135 Upper half count: 4865 Exact half count: 0 Quote [...] Random number was: 0x6230922 ( DEC= 102959394 ) Bottom half count: 4973 Upper half count: 5027 Exact half count: 0 we already see that with increasing n the even distribution yield better results. Now let's raise n to 1M and run it a few times to compare results ... Code: $ ./rand_generator_using_random.py 1000000 Quote [...] Random number was: 0x3d38eb8 ( DEC= 64196280 ) Bottom half count: 499887 Upper half count: 500113 Exact half count: 0 Quote [...] Random number was: 0x8dec3e2 ( DEC= 148816866 ) Bottom half count: 500167 Upper half count: 499833 Exact half count: 0 Quote [...] Random number was: 0x9a6de7b ( DEC= 161930875 ) Bottom half count: 500477 Upper half count: 499523 Exact half count: 0 Quote [...] Random number was: 0xd049ba1 ( DEC= 218405793 ) Bottom half count: 499698 Upper half count: 500302 Exact half count: 0 This is what I call a perfect even distribution as expected. One last try, let's go ahead and raise n to 100M (hundred million) ... Code: $ ./rand_generator_using_random.py 100000000 Quote [...] Random number was: 0xdbec231 ( DEC= 230605361 ) Bottom half count: 50003376 Upper half count: 49996624 Exact half count: 0 ==> PERFECT! That is what I expected to see. Now the question remains "What was the issue?". Well, I modified my program to calculate with decimals instead of hex. I need to dig into it and see and understand where the coding error was with hex calculation. Title: Re: bad randomness when generating keys within a specified range Post by: jackg on November 20, 2022, 05:21:51 PM (transcribed from my computer because the captcha wouldn't load)
I can't reproduce your error running: Code: import random Ran for 3 iterations I get: 499866 500134 499625 500375 500135 499065 Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 20, 2022, 05:36:28 PM I said that already in my posting before and I explained where the culprit was. You are using the same method (in decimal, not in hex) for calculation as I did in my shown solution that's why you get the same results as me.
My initial try in posting #7 Code: half = hex(int((high-low)/2)) the correct way which worked in posting #11 Code: #half = hex(int((high-low)/2)) Now the question remains "What was the issue?". Well, I modified my program to calculate with decimals instead of hex. I need to dig into it and see and understand where the coding error was with hex calculation. I can't reproduce your error running: I don't see anything fancy in your solution, you just arranged it a little more different, that's all. Cannot find valuable information in your post, sorry.Title: Re: bad randomness when generating keys within a specified range Post by: jackg on November 20, 2022, 05:50:30 PM I can't reproduce your error running: I don't see anything fancy in your solution, you just arranged it a little more different, that's all. Cannot find valuable information in your post, sorry.I reworked it because your solution wouldn't work on my version of python... I just got type errors for it which might mean there's something causing yours to do that considering I had to explicitly move things to different lines (and rework a bit of it) to get it to actually run. ALL the errors were to do with converting hex types to integers and strings to hex so if you don't find the error, it's because it's in the python library. Title: Re: bad randomness when generating keys within a specified range Post by: PowerGlove on November 21, 2022, 12:39:28 AM @citb0in: Haven't read this thread properly, but just wanted to point out that if your intention is to generate random integers between (and including) low and high with random.randrange, then you should remember to add 1 to high or use random.randint instead.
So, either this (will select from the full range): Code: random.randrange(low, high+1) Or this (will also select from the full range): Code: random.randint(low, high) But not this (will never select high): Code: random.randrange(low, high) Also, for anyone unaware: Most of what's in the random module in Python only strives for well-distributed values that are suitable for modeling and/or simulation (not cryptographic use). Be careful and do your own research before using it for anything important (see — not exhaustive, and in no particular order — os.getrandom, os.urandom, random.SystemRandom and the secrets module for other approaches). Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 21, 2022, 09:02:55 AM @citb0in: Haven't read this thread properly, but just wanted to point out that if your intention is to generate random integers between (and including) low and high with random.randrange, then you should remember to add 1 to high or use random.randint instead. Hello PowerGlove. Many thanks for pointing out to this important part, will keep this in mind. Also, for anyone unaware: Most of what's in the random module in Python only strives for well-distributed values that are suitable for modeling and/or simulation (not cryptographic use). Be careful and do your own research before using it for anything important (see — not exhaustive, and in no particular order — os.getrandom, os.urandom, random.SystemRandom and the secrets module for other approaches). Absolutely. That's what ETFbitcoin and JackG suggested to me and which was very helpful to read and understand. Have you tried running multiple experiment with different seed for the RNG? And as @jackg said, random library isn't secure. You should use os.urandom or secrets library. Thanks to all of you for pointing to those modules. I did check the secrets (https://docs.python.org/3/library/secrets.html) module and carefully did read the warning in the random module (https://docs.python.org/3/library/random.html)Quote Warning That's why I had rewritten my python program as I posted it. However, for the sake of completeness I want to emphasize that this issue was not related to the python module random, the issue lies somewhere in between the hex calculation.The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module. Thanks to you I learned some very important things. Big thanks Title: Re: bad randomness when generating keys within a specified range Post by: PowerGlove on November 23, 2022, 06:22:19 AM Now the question remains "What was the issue?". Well, I modified my program to calculate with decimals instead of hex. I need to dig into it and see and understand where the coding error was with hex calculation. I think I see the piece of missing information that might be preventing you from understanding what went wrong: the program listing in post #7 was accidentally doing string comparisons instead of integer comparisons (on lines 14, 16 and 18).The hex function returns a string, so when you compared randhex with half, you were not actually doing the numeric comparison that you intended, but instead were doing the comparison lexicographically (i.e. using the Unicode code points that make up the string). I'm not sure how far along you are in your Python learning, so please don't be offended if this is obvious to you, but in normal day-to-day programming (i.e. outside of specialist applications) it doesn't really make sense to do calculations "in hex" or "in decimal"; those are just notations and they don't affect the underlying value that is stored in a variable. For example, these three variables hold the same value: Code: a = 0o33653337357 Code: b = 0xDEADBEEF Code: c = 0b11011110101011011011111011101111 They all share the same type (which is int) and they all compare equal to each other and to the value 3735928559. The fact that they were entered with different base notations (octal, hexadecimal and binary) doesn't affect the underlying representation (and therefore doesn't affect how the calculation proceeds). A small aside: You can use floor division (//) to prevent integers from being converted into floats. So, you can do this: (high-low)//2 instead of this: int((high-low)/2). Title: Re: bad randomness when generating keys within a specified range Post by: citb0in on November 23, 2022, 06:58:12 AM the program listing in post #7 was accidentally doing string comparisons instead of integer comparisons (on lines 14, 16 and 18). That should it be. Thank you.The hex function returns a string, so when you compared randhex with half, you were not actually doing the numeric comparison that you intended, but instead were doing the comparison lexicographically (i.e. using the Unicode code points that make up the string). I'm not sure how far along you are in your Python learning, so please don't be offended if this is obvious to you, I'm absolute beginner and trying to learn the ropes, see also my introductory sentence at the top ;) Therefore, I also ask openly and directly, so that my mistakes are pointed out because only in this way I can become better. Many thanks at this point for the constructive criticism, which I appreciate very much.but in normal day-to-day programming (i.e. outside of specialist applications) it doesn't really make sense to do calculations "in hex" or "in decimal"; those are just notations and they don't affect the underlying value that is stored in a variable. Yeah, I know about this. I had already read in another place that it doesn't matter which display format is currently selected because the underlying representation matters. Thanks for pointing it out.A small aside: You can use floor division (//) to prevent integers from being converted into floats. So, you can do this: (high-low)//2 instead of this: int((high-low)/2). Yes, I stumbled over this on another python script I looked into and found it very helpful. Much appreciate your feedback, thank you so much. |