Bitcoin Forum
Topic: Pieter and Greg, Bech32, please  (Read 1019 times)
nullius
Copper Member
Hero Member

Activity: 630
Merit: 2610


If you don’t do PGP, you don’t do crypto!


December 24, 2017, 08:40:35 PM
 #21

Quote
I am the same.  It's easier to read the address with mixed case, as I find it easier to visually break it into "chunks" of 3-5 characters I can check.  These chunks can be anywhere in the address, pretty much wherever patterns catch my eye.  I check several of these "chunks" before sending.

All uppercase just looks like one big... well, something.  But it's actually harder to read and check for me.

Here is a good GUI idea, which I’ve had before for displaying hashes:  Subtly break the long pseudorandom-looking string into chunks using something between a hair-space (U+200A) and a punctuation space (U+2008).  The space should not be too large—just enough for subtle visual breakup of the string.  In my experience, the chunks should each be about four characters.  The particulars really depend on the display font, and also the alphabet of the “pseudorandom” string.

Of course, copy/paste of the string must not be affected.  The extra space is display formatting only, not the actual addition of characters to the string.
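A minimal Python sketch of the display-only chunking described above, assuming four-character groups and U+2009 THIN SPACE as the separator (both are tunable; the post suggests something between U+200A and U+2008).  The function only builds a string for rendering; the wallet keeps the unmodified address for copying.  A real GUI would more likely do this with spacing in its rendering layer, so that no separator character exists at all.

```python
THIN_SPACE = "\u2009"  # U+2009 THIN SPACE; U+200A HAIR SPACE is a subtler option


def chunk_for_display(s: str, size: int = 4, sep: str = THIN_SPACE) -> str:
    """Return a display-only copy of `s` split into groups of `size` characters.

    The original string is not modified; keep it for copy/paste and only
    render the returned value.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return sep.join(s[i:i + size] for i in range(0, len(s), size))


address = "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"  # example address from BIP 173
shown = chunk_for_display(address)
print(shown)
assert shown.replace(THIN_SPACE, "") == address  # separators are purely cosmetic
```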

In the forum font, with very little typographic control of what I can post, how does this look to you?  (Warning:  I added actual Unicode U+2009 THIN SPACE; do not copypaste the display text, though the link URI is unaffected.)


Rebroken into evenly divisible, larger chunks:


With this address length, that can be done two different ways.  N.b. all Bitcoin address lengths can vary.
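The example addresses in this post did not survive, so we cannot tell which one was used.  Assuming a 42-character P2WPKH Bech32 address, this small Python sketch lists the chunk sizes in an assumed range of 4 to 8 that split it with no remainder; the two results would correspond to the "two different ways" mentioned above.

```python
def even_chunk_sizes(length: int, smallest: int = 4, largest: int = 8) -> list:
    """Chunk sizes in [smallest, largest] that divide `length` with no remainder."""
    return [k for k in range(smallest, largest + 1) if length % k == 0]


# BIP 173 example of a 42-character P2WPKH address.
address = "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
print(even_chunk_sizes(len(address)))  # → [6, 7]
```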


Which of those do you find easiest on the eyes?  I myself deem the four-character chunks to be visually optimal.  That also seems a de facto standard in printing account numbers, tracking numbers, and the like.

I used to have a set of LaTeX macros to do this for high-quality typesetting and dead-tree printing of hashes; thus I can attest that it much enhances readability.  I think I may also have worked out a way to do it in HTML/CSS without adding any spaces on copy/paste.

Additional implementation note:  Wallet GUIs should filter out any Unicode space-class characters from a pasted string, in case other software did this wrongly—or in case somebody added spacing characters by hand, as here.
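A Python sketch of that input filter, under the assumption that the wallet removes every whitespace character (per Python's `str.isspace()`, which covers the Unicode space separators) plus a hypothetical `ZERO_WIDTH` set of invisible characters before validating.  Everything else is left in place, so the address checksum, not this cleanup, decides whether the input is valid.

```python
# Zero-width characters are not "spaces" to str.isspace(), but they are just as
# invisible; including them is an assumption beyond the post's "space-class".
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def clean_pasted(text: str) -> str:
    """Remove whitespace and zero-width characters from pasted input.

    str.isspace() is True for ASCII whitespace and for Unicode space
    separators such as U+2009 THIN SPACE and U+00A0 NO-BREAK SPACE.
    """
    return "".join(ch for ch in text if not ch.isspace() and ch not in ZERO_WIDTH)


pasted = "bc1q\u2009w508 d6qe\u00a0jxtd\u200b"
print(clean_pasted(pasted))  # → bc1qw508d6qejxtd
```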

Implementers, please do this!  I mean generally, not only for Bech32.

vapourminer
Legendary

Activity: 4340
Merit: 3570


what is this "brake pedal" you speak of?


December 24, 2017, 10:49:58 PM
 #22


Quote
Which of those do you find easiest on the eyes?  I myself deem the four-character chunks to be visually optimal.  That also seems a de facto standard in printing account numbers, tracking numbers, and the like.

this one (4 character chunks) seems easiest.
nullius
Copper Member
Hero Member

Activity: 630
Merit: 2610


December 24, 2017, 11:14:20 PM
 #23


Quote
Which of those do you find easiest on the eyes?  I myself deem the four-character chunks to be visually optimal.  That also seems a de facto standard in printing account numbers, tracking numbers, and the like.

this one (4 character chunks) seems easiest.

Thanks for the feedback.  I agree; that’s easiest for me, too.

Now, how would you find that to read aloud to someone else using a radio alphabet?  Whether an official standard radio alphabet, or your own ad hoc choices of words to unambiguously represent letters.  Please try it, by yourself or with a friend.  Also imagine what it would be like to read or hear this in word-letters over the phone, perhaps over a bad mobile or VoIP connection with some dropouts.

I’ve tried doing this aloud, though only to myself thus far.  The ease of reading these out without case distinction is why I nickname these “Bravo Charlie Addresses”, an idea I had earlier in this thread; we now intersect with a topic I had been preparing to post about separately.


That won’t work so well with the old-style addresses.
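To make the “Bravo Charlie” idea concrete, here is a Python sketch that spells a Bech32 address with the NATO/ICAO radio alphabet in four-character groups.  Because Bech32 is case-insensitive, each character maps to exactly one word.  The word list is the standard alphabet with "Niner" for 9, as in common military usage; the group size is an assumption.

```python
LETTER_WORDS = ("Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett "
                "Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango "
                "Uniform Victor Whiskey X-ray Yankee Zulu").split()
DIGIT_WORDS = "Zero One Two Three Four Five Six Seven Eight Niner".split()

# Map each lowercase letter and digit to its spoken word.
RADIO = {chr(ord("a") + i): word for i, word in enumerate(LETTER_WORDS)}
RADIO.update({str(d): word for d, word in enumerate(DIGIT_WORDS)})


def spell_single_case(address: str, group: int = 4) -> list:
    """Spell a single-case string as radio words, one entry per spoken group."""
    words = [RADIO[ch] for ch in address.lower()]
    return [" ".join(words[i:i + group]) for i in range(0, len(words), group)]


for spoken in spell_single_case("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"):
    print(spoken)
```

The first group comes out as "Bravo Charlie One Quebec", which is where the nickname comes from.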

vapourminer
Legendary

Activity: 4340
Merit: 3570


December 24, 2017, 11:49:40 PM
 #24

I've used the military alphabet for years in poor voice quality communications.  It works well.  But it's not hard to say "caps alfa, niner, lower bravo caps charlie" for mixed case either.
nullius
Copper Member
Hero Member

Activity: 630
Merit: 2610


December 25, 2017, 12:25:44 AM
 #25

Quote
I've used the military alphabet for years in poor voice quality communications.  It works well.  But it's not hard to say "caps alfa, niner, lower bravo caps charlie" for mixed case either.

Well, I suppose that different people will have differing comfort levels on that point; and it also sounds as if you have more practical experience with such than I do.  I know it drives me crazy, as well as effectually doubling the spoken length of the alphabetic parts.

Thanks again for the feedback.
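The doubling of the spoken length can be counted.  This Python sketch assumes the convention of saying "caps" or "lower" before every letter (announcing only case changes would be shorter) and spells a Base58 address with the standard radio alphabet; the helper names are ours.

```python
LETTER_WORDS = ("Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett "
                "Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango "
                "Uniform Victor Whiskey X-ray Yankee Zulu").split()
DIGIT_WORDS = "Zero One Two Three Four Five Six Seven Eight Niner".split()

RADIO = {chr(ord("a") + i): word for i, word in enumerate(LETTER_WORDS)}
RADIO.update({str(d): word for d, word in enumerate(DIGIT_WORDS)})


def spell_mixed_case(address: str) -> list:
    """Spell a case-sensitive string, prefixing every letter with "caps" or "lower"."""
    words = []
    for ch in address:
        if ch.isalpha():
            words.append("caps" if ch.isupper() else "lower")
        words.append(RADIO[ch.lower()])
    return words


genesis = "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"  # the well-known genesis block address
print(len(genesis), "characters ->", len(spell_mixed_case(genesis)), "spoken words")
# → 34 characters -> 61 spoken words
```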

cellard
Legendary

Activity: 1372
Merit: 1252


December 25, 2017, 05:25:24 PM
 #26

I'm not sure what I like more.  The strings separated into 4-character chunks remind me of the usual bank account and credit card numbers, which the GUI usually separates into chunks of 4 even if you paste the entire number without any gaps; I think that is your idea.  So this could help most people read addresses more easily and spot any mistakes faster.

But for some reason I think I can discern shapes more easily in the 6-character chunks, where the address is divided into fewer chunks; I think my brain takes less effort because there are fewer of them.  Anyway, I guess this could be switchable in options, to make it customizable to everyone's taste.
casascius (OP)
Mike Caldwell
VIP
Legendary

Activity: 1386
Merit: 1136


The Casascius 1oz 10BTC Silver Round (w/ Gold B)


December 31, 2017, 03:29:49 PM
 #27

I might take a little liberty here, after running across this:

(from https://blockstream.com/team/greg-maxwell )
Quote
For many in the Bitcoin community, Greg is likely the person telling you that your protocol is broken and why, but he usually feels pretty bad about it.

I am going to submit that the idea to start introducing Bitcoin addresses that contain both the number 1 and lowercase l together, the majority of the time (71% by my calculation), is broken from a UX perspective.  It is as fashionable as tomato soup on a dress shirt.  At the risk of sounding redundant, this is a regression from Satoshi that will be observable to the majority of users who ever make use of a bitcoin address.
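The post does not show how 71% was derived; one plausible reconstruction follows.  Every Bech32 address already contains the separator "1", so the question is how often a lowercase "l" also appears.  Assuming a 42-character P2WPKH address whose 39 characters after "bc1" are uniformly distributed over the 32-symbol alphabet (our assumption), the probability is 1 - (31/32)^39, about 71%; treating the fixed witness-version character as non-random gives about 70%.

```python
def chance_symbol_appears(n_chars: int, alphabet_size: int = 32) -> float:
    """P(at least one of n uniformly random symbols equals one particular symbol)."""
    return 1.0 - ((alphabet_size - 1) / alphabet_size) ** n_chars


# A 42-character P2WPKH address is "bc1" followed by 39 Bech32 characters.
print(f"{chance_symbol_appears(39):.1%}")  # all 39 treated as random
print(f"{chance_symbol_appears(38):.1%}")  # skipping the fixed witness-version "q"
```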

If I ever propose any sort of amendment to BIP 173, it will be to formally suggest the usage of the letter "b" in place of "l", preferably before this spec gets serious traction for Bitcoin.  I would further specify that if not implemented in Bitcoin, it remains a proposed recommendation for altcoin implementers: a free gimme gift to help differentiate their coin with a superior UX that will be salient to even the most casual users.

Presumably, B could be said to look like an 8, and that's the only reason I can imagine that "b" is on the list of exclusions.  But since the spec strongly encourages the usage of lowercase except where technical constraints dictate otherwise, it would seem like allowing "b" is a far better choice for a spec that clearly spells out visual dissimilarity as a valued attribute in the chosen character selection.

What does it take for me to formally propose fixing this and put my OCD-like energy to work improving this here? :)


Companies claiming they got hacked and lost your coins sounds like fraud so perfect it could be called fashionable.  I never believe them.  If I ever experience the misfortune of a real intrusion, I declare I have been honest about the way I have managed the keys in Casascius Coins.  I maintain no ability to recover or reproduce the keys, not even under limitless duress or total intrusion.  Remember that trusting strangers with your coins without any recourse is, as a matter of principle, not a best practice.  Don't keep coins online. Use paper or hardware wallets instead.
DannyHamilton
Legendary

Activity: 3402
Merit: 4656



December 31, 2017, 04:59:45 PM
 #28

Quote
Presumably, B could be said to look like an 8,

It could.

Quote
and that's the only reason I can imagine that "b" is on the list of exclusions.  But since the spec strongly encourages the usage of lowercase except where technical constraints dictate otherwise, it would seem like allowing "b" is a far better choice for a spec that clearly spells out visual dissimilarity as a valued attribute in the chosen character selection.

Except that (especially when crowded by a number of random characters on each side) "b" can look a LOT like "lo".  Even more so with certain typeface selections.
casascius (OP)
Mike Caldwell
VIP
Legendary

Activity: 1386
Merit: 1136


December 31, 2017, 11:41:15 PM
Last edit: January 01, 2018, 12:10:05 AM by casascius
 #29

Quote
Except that (especially when crowded by a number of random characters on each side) "b" can look a LOT like "lo".  Even more so with certain typeface selections.


True. I forgot that random letters have a magnetic quality that pulls the two halves of the letter b (but not d) apart, making it confusing like this. And the research probably shows that 1 and l don’t have this problern, loecause they are all one piece. Good call

Spendulus
Legendary

Activity: 2898
Merit: 1386



January 01, 2018, 12:34:23 AM
 #30

Quote
....

I am sorry that popping up with "constructive criticism" out of the blue without much in the way of reintroduction or community engagement probably seems more confrontational in the absence of regular positive contributions to the project's development.  I'm actually sad because I'm embarrassed to be explaining $30 transaction fees to people ....

Seeing this thread impels me to two comments.

Kudos to you for your early and continuing visions of crypto.

Regarding address formats.  I'm seeing a lot of engineering-level discussion, but it would be better to simply toss samples at test groups, have them type them in, gather comments and so forth.  Or just leave things alone.  But hey, whatever.  The market will decide, sooner or later.
nullius
Copper Member
Hero Member

Activity: 630
Merit: 2610


January 01, 2018, 02:36:52 AM
 #31

I wish to reply further (also to cellard’s last post above); for now, simply to address one issue:

Quote
Regarding address formats. I'm seeing a lot of engineering-level discussion but better would be to simply toss samples at test groups and have them type them in, gather comments and so forth.

They did.  Did you read what gmaxwell said upthread?  Red highlights are here added.

Quote
Bech32 is designed for human use and basically nothing else [...]

In actual testing, transferring bech32 addresses to another person is on the order of 5x faster with bech32 due to errors being made even in careful usage of base58-- more than the time itself, transferring a base58 address is often insanely frustrating-- you read it, and ... nope, no idea where it's wrong, only that it's wrong - then you try reading the whole thing again and again and again.

[...]

Mature software will tell you _exactly_ where such errors are located, especially if they involve a charset mistake, but even errors beyond that. There should be very little hunt and peck with BECH32, and in my experience there isn't any at all.

that I hate it, I can’t handle it—I find it absurdly frustrating and error-prone.
This is what many people report, and even people that said they didn't mind it handled it much more slowly.  My general experience from when we started on this was that people who said mixed case wasn't an issue changed their mind after actually trying to convey an address via spoken word or writing it down by hand with pencil and paper. ... Either they had never done it before or had done it infrequently enough that they'd already repressed the traumatic memories. I don't doubt that there are some odd people out there who never have any trouble with it, but I haven't encountered anyone yet who doesn't when actually tested on it.

[...]
1. My regards to Pieter Wuille and Greg Maxwell:  I can tell that an excruciatingly detailed thought process about Bitcoin address formats went into that bit of engineering.  Somebody stayed up in the dark wee hours, pondering the philosophy of Bitcoin address formats.  Somebody aspired to consummate perfection in the art of Bitcoin address formats.  Well, you are probably also “odd”.  Coming from me, take that as a compliment.

Thanks, including a lot of testing with both people and machines, several CPU decades went into the design of the error correcting code... and in fact the techniques even required to be able to measure their performance are themselves novel and probably publishable innovations.   Not to mention extensive review and redesign with many other similarly crazy people.   We understood that introducing a new address format is a big step that can't be done often, and thought it would be appropriate and acceptable to really work hard on it.

IIUC, the “CPU decades” must have crunched alphabets with the NIST visual similarity data referenced in BIP 173 and the error-correcting code to find the alphabet which, per available data, would have the lowest statistical likelihood of undetected or unrecoverable bit errors.  (gmaxwell, am I correct in this inference?)  Those data were originally gathered from humans; and the resulting address format was tested with humans.

It is a rule in UI/UX design that you never ask people what they subjectively prefer, because they don’t actually know what works better for them.  Instead, you test performance.  How long does an average user take to transcribe an address in a particular format?  How many errors are made on average?  These are objective measures.  According to the foregoing, all discussed upthread, this was done with Bech32.


P.S., try playing around with Bech32.  It’s really an awesome format for pseudorandom bitstrings.  I’m currently trying to apply it elsewhere, too.

Spendulus
Legendary

Activity: 2898
Merit: 1386



January 01, 2018, 03:45:07 AM
 #32

Quote
I wish to reply further (also to cellard’s last post above); for now, simply to address one issue:

Regarding address formats. I'm seeing a lot of engineering-level discussion but better would be to simply toss samples at test groups and have them type them in, gather comments and so forth.

They did.  Did you read what gmaxwell said upthread?  Red highlights are here added.

Bech32 is designed for human use and basically nothing else [...]

In actual testing, transferring bech32 addresses to another person is on the order of 5x faster with bech32 due to errors being made even in careful usage of base58-- more than the time itself, transferring a base58 address is often insanely frustrating-- you read it, and ... nope, no idea where it's wrong, only that it's wrong - then you try reading the whole thing again and again and again.

[...]....

Missed that.

Thanks for pointing it out.
gmaxwell
Moderator
Legendary

Activity: 4186
Merit: 8424



April 07, 2019, 08:55:44 AM
Last edit: April 07, 2019, 09:14:03 AM by gmaxwell
Merited by Carlton Banks (4), bones261 (3), nullius (2)
 #33

Sorry, I missed this question at the time but thought it was sad to have not followed up on it. -- For posterity:

Quote
IIUC, the “CPU decades” must have crunched alphabets with the NIST visual similarity data referenced in BIP 173 and the error-correcting code to find the alphabet which, per available data, would have the lowest statistical likelihood of undetected or unrecoverable bit errors.  (gmaxwell, am I correct in this inference?)  Those data were originally gathered from humans; and the resulting address format was tested with humans.
Yes, however we didn't have to jointly make all the decisions at once.

We searched all the possible BCH codes of the relevant size for the ones which could handle the most character errors (5); among those we selected the ones with the lowest false acceptance rate for just beyond 5 errors, then searched the remaining ties (codes that were equally good for character errors) to find the one that was the best at bit errors (which could guarantee detecting up to 6 arbitrarily placed bit errors).
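The search itself is not reproduced here, but its result is: BIP 173 fixes the chosen BCH code as a set of generator constants.  This Python sketch is modeled on the reference checksum verification in BIP 173 (the `polymod` loop and HRP expansion) and only shows what the chosen code computes.  The name `checksum_ok` is ours, and a real decoder would also enforce length and segwit rules.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def polymod(values):
    """Remainder of the BCH code over GF(32) used by Bech32 (BIP 173)."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk


def hrp_expand(hrp):
    """Feed the human-readable part into the checksum (high bits, 0, low bits)."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def checksum_ok(address: str) -> bool:
    """Check a Bech32 string's checksum (no segwit-specific rules)."""
    if address != address.lower() and address != address.upper():
        return False  # mixed case is not allowed
    address = address.lower()
    sep = address.rfind("1")
    if sep < 1 or sep + 7 > len(address):
        return False  # need a non-empty HRP and 6 checksum characters
    hrp, data = address[:sep], address[sep + 1:]
    if any(c not in CHARSET for c in data):
        return False
    return polymod(hrp_expand(hrp) + [CHARSET.index(c) for c in data]) == 1


good = "bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"
print(checksum_ok(good), checksum_ok(good[:-1] + "5"))  # → True False
```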

While these searches were ongoing, we used the NIST data to make a machine search check all 58905 possible selections of 4 excluded characters out of 36 to find the one which had the least internal character confusion, assuming that the input was all uppercase or all lowercase.  It made a decision that I wouldn't have guessed by hand-- e.g. excluding both 1 and i-- but actually looking at the result confirmed it was a good choice.  B can be confused a lot of ways: B looks like 8 and 3, b looks like p or lo, etc.  Other guesswork versions looked a lot worse when compared objectively.  I don't doubt you could make a somewhat different decision with a different metric or using better input data... but you could also do a lot worse, as I found just from informally picking the characters to exclude after looking at the NIST data.
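The shape of that exhaustive exclusion search can be pictured with a toy Python sketch.  The count 58905 is just C(36, 4).  The scoring rule (sum of weights for confusable pairs that both remain) and the `TOY_CONFUSION` table are invented for illustration and stand in for the NIST data, so the set it prints says nothing about the real charset, which BIP 173 fixes by excluding 1, b, i and o.

```python
from itertools import combinations
from math import comb

SYMBOLS = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 case-folded symbols

# HYPOTHETICAL confusion weights for illustration only; NOT the NIST data.
TOY_CONFUSION = {
    ("0", "o"): 9, ("1", "l"): 9, ("1", "i"): 7, ("i", "l"): 6,
    ("8", "b"): 5, ("6", "b"): 3, ("5", "s"): 4, ("2", "z"): 4,
    ("9", "g"): 4, ("g", "q"): 3, ("u", "v"): 3,
}


def leftover_confusion(excluded) -> int:
    """Total weight of confusable pairs whose members both survive the exclusion."""
    gone = set(excluded)
    return sum(w for (x, y), w in TOY_CONFUSION.items()
               if x not in gone and y not in gone)


# Exhaustively score every way of excluding 4 of the 36 symbols.
candidates = list(combinations(SYMBOLS, 4))
assert len(candidates) == comb(36, 4) == 58905
best = min(candidates, key=leftover_confusion)
print(best, leftover_confusion(best))
```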

[There was an earlier version of the search, before we implemented all the NIST data, that wanted to exclude 'B', 'L', 'O', 'Q', which I found pretty amusing, but I was ultimately relieved that we wouldn't have to convince anyone that we hadn't intentionally targeted the name of a former developer's ICO mill company :P ...]

For most applications the HRP should be fixed by context (e.g. you don't have a reason to intentionally input an arbitrary altcoin address into your bitcoin wallet), which means that any charset error (like entering 1s) can be immediately highlighted in the UI with zero false positives (I am highly dubious about auto-fixing anything, since a detectable error may signify carelessness which could result in other non-detectable errors).
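With the HRP fixed, highlighting charset errors is a simple scan.  This Python sketch assumes a mainnet "bc" HRP (the function name and parameter are ours) and returns the positions a UI could mark, without attempting any correction, in line with the caution about auto-fixing.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def charset_error_positions(address: str, hrp: str = "bc") -> list:
    """Positions (0-based) of characters that can never appear after '<hrp>1'.

    Assumes the HRP is fixed by context, so every reported position is
    certainly wrong.  It only reports; nothing is auto-corrected.
    """
    prefix = hrp + "1"
    lowered = address.lower()
    if not lowered.startswith(prefix):
        raise ValueError(f"expected the address to start with {prefix!r}")
    return [i for i in range(len(prefix), len(lowered)) if lowered[i] not in CHARSET]


typo = "bc1qw5o8d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"  # '0' mistyped as 'o'
print(charset_error_positions(typo))  # → [6]
```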

Finally we non-exhaustively searched the space of 32! permutations to find a permutation that mapped the remaining most likely character confusions to single bit errors.  E.g. q and g are in the charset and differ by only a single bit.  Likewise c/e, r/t, e/6, 7/l, q/p, S/5, n/m, n/h (the obviousness of these examples depends on your font...), and many other confusable pairs.  This mapping means that likely errors for users get mapped to fewer bit errors, and since the error detecting code was chosen to have improved detection for single bit errors, this means improved detection for errors users are likely to make.  This is an optimization with only a small effect, but it is one with fairly low marginal cost, since virtually any encode/decode implementation would already use a table due to the missing characters-- just as existing base58 (and base64) (de)coders do.  Given that a table is being used, the choice of the order is essentially arbitrary, so we were free to choose it to optimize the performance of the error detecting code.  I doubt an implementation that includes signing (taking tens of kb of code) would care much about saving a few bytes by using a slow branchy alternative rather than just a simple table in read only memory. :)
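The effect of the permutation can be checked against the BIP 173 character order, where a symbol's position in the string is its 5-bit value.  This Python sketch counts the differing bits for the pairs named above.  By our reading of that table, every pair is one bit apart except e/6 (values 25 and 26), which differ in two bits, so that one example looks like a slip while the others hold as stated.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # BIP 173 order: index = 5-bit value


def bit_distance(a: str, b: str) -> int:
    """Number of differing bits between the 5-bit values of two charset symbols."""
    return bin(CHARSET.index(a) ^ CHARSET.index(b)).count("1")


for pair in ["qg", "ce", "rt", "e6", "7l", "qp", "s5", "nm", "nh"]:
    print(f"{pair[0]}/{pair[1]}: {bit_distance(pair[0], pair[1])} bit(s)")
```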

The code searches and permutation searches were computationally expensive but we didn't have to do the product of their efforts, only the sum because they could be searched independently and combined (and the charset search wasn't terribly expensive).  Pieter has code for all these searches in his ezbase32 repository...

