Topic: Encoding bug in JSON-RPC handling of account/label names?
matsh (OP)
September 05, 2013, 08:16:26 AM
 #1

Hi!

I just bumped into something that might be an encoding bug. I sent a few millibitcoins to my plain vanilla 0.8.3 wallet and labeled the address "Från MultiBit" ("From MultiBit" in Swedish). Then I shut down the Qt client, started bitcoind in -daemon mode, called "listreceivedbyaccount", and got the following result:

listreceivedbyaccount = [{"account":"FrÃ¥n MultiBit","amount":0.07,"confirmations":133}]

It *could* be just me doing something wrong in my Java commons-httpclient or net.sf.json-lib code; I haven't really dug deep into that yet.

Anyone seen anything similar?
2112
September 05, 2013, 05:45:12 PM
 #2

Quote from: matsh
Anyone seen anything similar?
Yes, you are mixing character encodings: UTF-8 and ISO-8859-1. This forum is using ISO-8859-1. I manually forced it to UTF-8 and your listreceivedbyaccount example displayed correctly in my browser. You need to configure your OS and your terminal program and your HTTP library for the correct character encodings.

Please comment, critique, criticize or ridicule BIP 2112: https://bitcointalk.org/index.php?topic=54382.0
Long-term mining prognosis: https://bitcointalk.org/index.php?topic=91101.0
theymos
September 05, 2013, 10:12:52 PM
 #3

It's a problem with your terminal, probably. Bitcoin just accepts whatever bytes you give it IIRC.

Quote from: 2112
This forum is using ISO-8859-1.

The HTML is sent in ISO-8859-1, but Unicode is fully supported via HTML entities.
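
As a rough illustration of that last point (a sketch, not the forum software's actual code): a page served as ISO-8859-1 can still carry any Unicode character if everything outside that charset is written as a numeric character reference. Python's built-in `xmlcharrefreplace` error handler does exactly this; `to_latin1_html` is a hypothetical helper name.

```python
import html

def to_latin1_html(text: str) -> bytes:
    """Encode text for an ISO-8859-1 page, writing characters the
    charset cannot represent as numeric character references."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

# 'å' exists in ISO-8859-1, so it is sent as the single byte 0xE5.
print(to_latin1_html("Från"))    # b'Fr\xe5n'
# '€' (U+20AC) does not, so it becomes a character reference.
print(to_latin1_html("€"))       # b'&#8364;'
# A browser reverses that step when it parses the page.
print(html.unescape("&#8364;"))  # €
```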

1NXYoJ5xU91Jp83XfVMHwwTUyZFK64BoAD
Foxpup
September 06, 2013, 11:24:26 PM
 #4

Would you guys at least try to reproduce the bug before assuming it's pilot error? Because I did, and it's not. This is what I get (in bitcoind 0.7.0 and 0.8.4):
Code:
"account" : "Fr\u00C3\u00A5n MultiBit"

Note the \u00C3\u00A5 instead of the correct \u00E5. It appears that bitcoind (and Bitcoin-Qt, but only in the debug console) is performing an ISO 8859-1 to UTF-8 conversion on a string that was already UTF-8 to begin with, even though neither bitcoind nor Bitcoin-Qt ever actually encode anything in ISO 8859-1 or anything other than UTF-8. A terminal (or other application) properly configured for Unicode will correctly display the resulting mess as "FrÃ¥n MultiBit".
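
The double encoding described above can be reproduced outside bitcoind. The following Python sketch only mimics the suspected behaviour (UTF-8 bytes misread as ISO 8859-1 before JSON escaping). It is not bitcoind's code, and `double_encode` is a hypothetical name.

```python
import json

def double_encode(label: str) -> str:
    """Misread the UTF-8 bytes of `label` as ISO 8859-1, one character per byte."""
    return label.encode("utf-8").decode("iso-8859-1")

label = "Från MultiBit"
print(json.dumps(label))                 # "Fr\u00e5n MultiBit"        (expected)
print(json.dumps(double_encode(label)))  # "Fr\u00c3\u00a5n MultiBit"  (observed)
```

Python writes the hex digits of the escapes in lowercase where bitcoind uses uppercase; JSON treats both the same.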

Will pretend to do unspeakable things (while actually eating a taco) for bitcoins: 1K6d1EviQKX3SVKjPYmJGyWBb1avbmCFM4
I am not on the scammers' paradise known as Telegram! Do not believe anyone claiming to be me off-forum without a signed message from the above address! Accept no excuses and make no exceptions!
2112
September 07, 2013, 04:44:00 AM
 #5

Quote from: Foxpup
Would you guys at least try to reproduce the bug before assuming it's pilot error? Because I did, and it's not. This is what I get (in bitcoind 0.7.0 and 0.8.4):
Code:
"account" : "Fr\u00C3\u00A5n MultiBit"

Note the \u00C3\u00A5 instead of the correct \u00E5. It appears that bitcoind (and Bitcoin-Qt, but only in the debug console) is performing an ISO 8859-1 to UTF-8 conversion on a string that was already UTF-8 to begin with, even though neither bitcoind nor Bitcoin-Qt ever actually encode anything in ISO 8859-1 or anything other than UTF-8. A terminal (or other application) properly configured for Unicode will correctly display the resulting mess as "FrÃ¥n MultiBit".
If you really did see \u00C3\u00A5 then it appears that you are trying to program in Java without understanding the inner Buddha-nature of the char type in Java. The following koan applies to you:
Quote from: Jargon file
A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly: "You cannot fix a machine by just power-cycling it with no understanding of what is going wrong."

Knight turned the machine off and on.

The machine worked.
To understand what you're doing wrong you'll need to do the following:

1) grab the culprit JSON-RPC packets off the wire using Ethereal/Wireshark
2) display their hex dump
3) locate the documentation for the JSON-RPC class you've used as well as the internal TextStreamReader/TextStreamWriter classes used by the HTTP classes
4) print the JavaDoc of the entire inheritance hierarchy of the above all the way down to 'char' & 'String' on recycled/biodegradable paper with a vegetable-based ink
5) consume by mouth the above printout while intensely staring at the above hex dump.

Sometime during step 5) the internal Buddha-nature of Java's char&String types will illuminate your brain. You'll then easily fix your erroneous program and you'll never have any more problems of this type in your life.

Foxpup
September 07, 2013, 05:51:30 AM
 #6

Quote from: 2112
If you really did see \u00C3\u00A5 then it appears that you are trying to program in Java without understanding the inner Buddha-nature of the char type in Java.
I'm not trying to program in Java at all. That is the raw output of the JSON-RPC interface, which I am showing because it makes the source of the bug clear (if you want it in hex, it's 22 61 63 63 6f 75 6e 74 22 3a 22 46 72 5c 75 30 30 43 33 5c 75 30 30 41 35 6e 20 4d 75 6c 74 69 42 69 74 22). The application is expected to translate the escape sequences into the appropriate (or, in this case, inappropriate) Unicode characters.

As you can clearly see, these characters are U+00C3 (LATIN CAPITAL LETTER A WITH TILDE) and U+00A5 (YEN SIGN), which are correctly displayed thus: Ã¥. If you're displaying these characters any other way, you're doing it wrong.

However, while the application is displaying these characters correctly, the characters themselves are incorrect. Obviously, the intended character is U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE), which in UTF-8 is represented by the byte sequence C3 A5, which is also the ISO 8859-1 representation of the above (incorrect) characters. Interpreting this byte sequence as though it were ISO 8859-1 instead of UTF-8 is what is causing the bug. This is happening to the text before it is output by the JSON-RPC interface, so clearly the bug is in bitcoind or one of its libraries, rather than the application making use of this faulty output.
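
The claims in this post can be checked mechanically. The sketch below (Python, standard library only) decodes the hex dump quoted above, wraps the fragment in braces so that it parses as JSON, and names each non-ASCII code point. The variable names are illustrative only.

```python
import json
import unicodedata

hex_dump = ("22 61 63 63 6f 75 6e 74 22 3a 22 46 72 5c 75 30 30 43 33 "
            "5c 75 30 30 41 35 6e 20 4d 75 6c 74 69 42 69 74 22")

# The dump is pure ASCII: the escapes are literal backslash-u sequences.
raw = bytes.fromhex(hex_dump).decode("ascii")
print(raw)  # "account":"Fr\u00C3\u00A5n MultiBit"

# Wrap the fragment in braces so it forms a complete JSON object.
account = json.loads("{" + raw + "}")["account"]
for ch in account:
    if ord(ch) > 0x7F:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Written out as ISO 8859-1, those two characters are the bytes C3 A5,
# which is the UTF-8 encoding of U+00E5.
intended = account.encode("iso-8859-1").decode("utf-8")
print(intended)  # Från MultiBit
```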

Schleicher
September 07, 2013, 04:18:04 PM
 #7

Try this:
rename one of your labels to Ã
open the debug console in bitcoin-qt
type listreceivedbyaccount

Then you can see this:
"account" : "�\u0083"

nikitos99
August 20, 2014, 07:04:16 AM
 #8

Quote from: Schleicher
Try this:
rename one of your labels to Ã
open the debug console in bitcoin-qt
type listreceivedbyaccount

Then you can see this:
"account" : "�\u0083"

Yes, and this is the problem: Ã should be \u00c3.

I have the same problem when using UTF-8 account names in bitcoind. It shows up identically from the console, the Bitcoin-Qt debug window, and JSON-RPC. I don't fully understand how this is supposed to work, but I assume that if I pass a UTF-8 string to bitcoind over JSON-RPC, it may store it however it likes but should return the same UTF-8 string. Instead, in all cases it returns a double-encoded string.



In [16]: import json

In [17]: a = 'Ã'

In [18]: json.dumps(a)
Out[18]: '"\\u00c3"'

That is how the character 'Ã' should be escaped.


But bitcoind returns

In [19]: b = "\u00C3\u0083"

In [20]: b.encode('latin-1').decode()
Out[20]: 'Ã'

so I have to decode it before I can use it.
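
Until the server side is fixed, a client could apply that decode step defensively. This is a minimal sketch assuming the double encoding shown in this thread. `undo_double_encoding` is a hypothetical helper, and it returns the string unchanged when the round trip fails, so labels that were never double-encoded usually pass through.

```python
def undo_double_encoding(s: str) -> str:
    """Reverse UTF-8 bytes that were misread as ISO 8859-1.

    Returns `s` unchanged if it contains characters above U+00FF or if
    the recovered bytes are not valid UTF-8.
    """
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s

print(undo_double_encoding("\u00c3\u0083"))              # Ã
print(undo_double_encoding("Fr\u00c3\u00a5n MultiBit"))  # Från MultiBit
print(undo_double_encoding("Från MultiBit"))             # unchanged: Från MultiBit
```

One caveat of this approach: a label that was stored correctly but genuinely contains a sequence such as "Ã¥" would also be rewritten, so the helper is a workaround and not a substitute for a fix in bitcoind.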

sedgydean
August 20, 2014, 04:50:40 PM
 #9

Note the \u00C3\u00A5 instead of the correct \u00E5.