Bitcoin Forum
Topic: [BOUNTY 2.0 BTC] Python2.X encoding problems in windows - Please Help
etotheipi (OP)
Core Armory Developer


April 28, 2013, 04:49:50 PM
Last edit: April 28, 2013, 09:01:07 PM by etotheipi
 #1

So, I'm getting lots of reports of unicode issues on non-US systems trying to run Armory.  I had tested unicode support by setting my Satoshi datadir to "Bitcoiné" and then letting Armory try to figure out the rest.  This was tested in both Windows and Linux.  But now I have reports of this failing.  I realize that I didn't do parts of it right, but there are other parts that I can't figure out in the slightest.

If I instead use this directory name:  Bitcoinéś , everything falls apart.  The ś apparently can't be converted to the encoding used by subprocess.Popen, even though it succeeds everywhere else.  Having that filename in pure unicode works fine for os.path.exists() and I can even open a file inside and write data to it.  I think it's because the os module knows how to talk to Windows.  But I don't.

So here I am:

Code:
import os
import sys
import locale
import subprocess

# Doubled backslashes throughout: a lone "\b" would be read as a backspace character.
pathUni = u'C:\\Users\\vbox\\ArmoryCheckout\\Bitcoin\xe9\u015b\\bitcoin.conf'
os.path.exists(pathUni)  # True
open(pathUni, 'w').write(...)  # works

print locale.getpreferredencoding()  # cp1252
print sys.getfilesystemencoding()  # mbcs

# Errors out trying to convert to ASCII
subprocess.Popen(['something.exe', pathUni])

# Fails to find path
subprocess.Popen(['something.exe', pathUni.encode('utf-8')])

# Fails to find path
subprocess.Popen(['something.exe', pathUni.encode(locale.getpreferredencoding())])

# 'charmap' codec can't encode u'\u015b': character maps to <undefined>
subprocess.Popen(['something.exe', pathUni.encode(sys.getfilesystemencoding())])

The single post I could find on stackexchange that had this exact problem was resolved by modifying "something.exe", because it was an app they controlled.  That doesn't solve my problem, because I don't control the executable.

I don't even know how to ask for help.  But if someone has experience with this and can help me fix it, it's worth 2 BTC to me.  I've wasted almost a full day on this!  (p.s. this doesn't seem to be a problem in Linux, for which preferred and fs encoding are all UTF-8... it's only a problem in Windows).

I suppose you can just create a directory or file in Windows with a ton of crazy unicode characters, and then attempt to run a Popen command using that file or directory as an argument.  It will fail.  
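A minimal sketch of why the attempts in the post behave differently from one another. It assumes, as is commonly reported for Python 2 on Windows, that subprocess passes the command line through the ANSI process-creation API, so every argument has to survive a trip through the system code page. The helper name can_encode is hypothetical, and cp1252 stands in for the code page of the machine in the post.

```python
from __future__ import print_function

# Same directory name as in the post: e-acute (U+00E9) followed by s-acute (U+015B).
path = u'C:\\Users\\vbox\\ArmoryCheckout\\Bitcoin\xe9\u015b\\bitcoin.conf'


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ascii:  neither accented character fits (the implicit-conversion error).
# cp1252: contains U+00E9 but not U+015B (the 'charmap' error).
# utf-8:  always encodes, but the bytes are not what an ANSI API expects.
report = dict((enc, can_encode(path, enc)) for enc in ('ascii', 'cp1252', 'utf-8'))
print(report)

# UTF-8 bytes reinterpreted as cp1252 spell a different directory name,
# which would be consistent with the "fails to find path" result.
misread = u'Bitcoin\xe9'.encode('utf-8').decode('cp1252')
print(len(misread))  # one character longer than the real name
```

If that assumption holds, no choice of byte encoding can carry ś on a cp1252 system, which is why the fix has to change the API being called rather than the encoding.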

Founder and CEO of Armory Technologies, Inc.
Armory Bitcoin Wallet: Bringing cold storage to the average user!
Only use Armory software signed by the Armory Offline Signing Key (0x98832223)

Please donate to the Armory project by clicking here!    (or donate directly via 1QBDLYTDFHHZAABYSKGKPWKLSXZWCCJQBX -- yes, it's a real address!)
flatfly
April 29, 2013, 09:18:51 PM
 #2

Hi etotheipi,

I don't have access to a PC right now, but I did have to deal with this kind of annoyance in the past...
Can you simply try to add the below comment as the first line of your test file?
#encoding=utf-8

Does this fix the issue? If not, I'll dig some more into it tomorrow evening...
jackjack
April 29, 2013, 09:40:05 PM
 #3

I thought it was # -*- coding: utf-8 -*-

Looks like it's an OS bug though
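For what it's worth, both spellings of the declaration should be accepted, since the interpreter only looks for "coding:" or "coding=" in a comment on the first or second line (PEP 263). A small sketch of that check follows. It uses tokenize.detect_encoding, which exists in Python 3 only, so it illustrates the matching rule rather than the Python 2.X setup in this thread; the helper name declared_encoding is hypothetical.

```python
import io
import tokenize


def declared_encoding(first_line):
    """Return the source encoding Python detects from a file's first line."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(first_line).readline)
    return encoding


# The short form from post #2 and the Emacs-style form from post #3.
short_form = declared_encoding(b'#encoding=utf-8\n')
emacs_form = declared_encoding(b'# -*- coding: utf-8 -*-\n')

# A different declaration, to show the value really is read from the comment.
latin_form = declared_encoding(b'# coding: latin-1\n')

print(short_form, emacs_form, latin_form)
```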

Own address: 19QkqAza7BHFTuoz9N8UQkryP4E9jHo4N3 - Pywallet support: 1AQDfx22pKGgXnUZFL1e4UKos3QqvRzNh5 - Bitcointalk++ script support: 1Pxeccscj1ygseTdSV1qUqQCanp2B2NMM2
Pywallet: instructions. Encrypted wallet support, export/import keys/addresses, backup wallets, export/import CSV data from/into wallet, merge wallets, delete/import addresses and transactions, recover altcoins sent to bitcoin addresses, sign/verify messages and files with Bitcoin addresses, recover deleted wallets, etc.
etotheipi (OP)
April 29, 2013, 09:40:22 PM
 #4

For reference, QuantumFoam might have found the answer already.  He pointed me to using the win32process::CreateProcessW method which actually looks like it will work.  I haven't tried it yet, but I did a little googling about it and it looks like it's the correct answer.  I just want to get his answer here so no one posts "first", instead of him.
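A rough sketch of what calling CreateProcessW directly could look like. This is an assumption-heavy illustration, not what was tested in this thread: it goes through ctypes and kernel32 instead of the win32process module mentioned above, the function names build_command_line and spawn_unicode are hypothetical, and the process is started with default settings only (no handle redirection). The point is that the command line is handed to Windows as a wide (UTF-16) string, so it never passes through the ANSI code page.

```python
import ctypes
import subprocess
import sys

# Win32 types spelled with plain ctypes so the module imports on any platform.
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
LPWSTR = ctypes.c_wchar_p


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ('cb', DWORD), ('lpReserved', LPWSTR), ('lpDesktop', LPWSTR),
        ('lpTitle', LPWSTR), ('dwX', DWORD), ('dwY', DWORD),
        ('dwXSize', DWORD), ('dwYSize', DWORD), ('dwXCountChars', DWORD),
        ('dwYCountChars', DWORD), ('dwFillAttribute', DWORD),
        ('dwFlags', DWORD), ('wShowWindow', WORD), ('cbReserved2', WORD),
        ('lpReserved2', ctypes.c_void_p), ('hStdInput', HANDLE),
        ('hStdOutput', HANDLE), ('hStdError', HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ('hProcess', HANDLE), ('hThread', HANDLE),
        ('dwProcessId', DWORD), ('dwThreadId', DWORD),
    ]


def build_command_line(args):
    """Join arguments into one command line using Windows quoting rules."""
    return subprocess.list2cmdline(args)


def spawn_unicode(args):
    """Start a process via CreateProcessW and return its process id (Windows only)."""
    if sys.platform != 'win32':
        raise OSError('CreateProcessW is only available on Windows')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.CloseHandle.argtypes = [HANDLE]
    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(startup)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    command_line = ctypes.create_unicode_buffer(build_command_line(args))
    ok = kernel32.CreateProcessW(
        None, command_line, None, None, False, 0, None, None,
        ctypes.byref(startup), ctypes.byref(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    # This sketch does not wait on the child, so release both handles right away.
    kernel32.CloseHandle(info.hThread)
    kernel32.CloseHandle(info.hProcess)
    return info.dwProcessId
```

On Windows, something like spawn_unicode([u'something.exe', pathUni]) would be the intended call. Whether the child program then handles the wide argument correctly is up to that program.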

etotheipi (OP)
April 29, 2013, 09:41:32 PM
 #5


The problem is not the source-file encoding.  I think that's what you're talking about, and would only matter if the source file itself had non-ASCII in it.  Is this correct?

The problem is not the source file, but rather, strings and filesystem objects that are handled by the code.
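To make the distinction concrete, here is a small sketch. The coding declaration changes how non-ASCII bytes inside a source file's literals are read, and nothing else. A path that arrives at run time (from argv, a settings file, or the OS) never goes through it. The helper name load_name is hypothetical, and cp1252 stands in for whatever encoding the run-time bytes actually use.

```python
# The same bytes on disk, compiled under two different declared encodings.
literal_bytes = b"name = u'Bitcoin\xc3\xa9'\n"


def load_name(coding):
    """Compile the snippet with the given coding declaration and return `name`."""
    source = b'# -*- coding: ' + coding.encode('ascii') + b' -*-\n' + literal_bytes
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['name']


as_utf8 = load_name('utf-8')      # bytes C3 A9 read as one character
as_latin1 = load_name('latin-1')  # bytes C3 A9 read as two characters

# A byte string obtained at run time is decoded by the code, not by the declaration.
runtime_path = b'Bitcoin\xe9'.decode('cp1252')

print(len(as_utf8), len(as_latin1), runtime_path == as_utf8)
```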

yuzhe
April 29, 2013, 11:00:54 PM
 #6

In my case it appears to be utf8/codepage confusion:

After setting:

DEFAULT_ENCODING = locale.getpreferredencoding()

in armoryengine.py:89, it no longer complains about non-existent --satoshi-datadir

Furthermore, popen works as expected:

Code:
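import sys
import locale
from subprocess import Popen

# sys.argv[1] arrives as a byte string: decode it to unicode first,
# then encode it to the filesystem encoding before handing it to Popen.
arg = sys.argv[1].decode(locale.getpreferredencoding())
Popen(['msg.exe', '*', '/server:127.0.0.1', arg.encode(sys.getfilesystemencoding())])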

The popup shows the text exactly as it was typed on the command line.
etotheipi (OP)
April 29, 2013, 11:04:40 PM
 #7


I was able to get it to work in some contexts but not others.  When I got it to work from the command line, I wasn't able to get it working from the settings file, when set from the File->Settings menu.  I also wasn't sure whether the encoding was being written to the file correctly.  There were just so many combinations...

Also, it worked with some unicode, and not others.
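One possible explanation for "some unicode works, some doesn't" is that the outcome depends on which characters the system code page happens to contain. A small sketch, with cp1252 (Western European) and cp1250 (Central European) as example code pages and a hypothetical helper name:

```python
def surviving_characters(text, code_page):
    """Return the characters of text that the given code page can represent."""
    kept = []
    for char in text:
        try:
            char.encode(code_page)
            kept.append(char)
        except UnicodeEncodeError:
            pass  # this character has no slot in the code page
    return u''.join(kept)


name = u'Bitcoin\xe9\u015b'
western = surviving_characters(name, 'cp1252')  # loses U+015B
central = surviving_characters(name, 'cp1250')  # keeps both accented letters

print(len(name), len(western), len(central))
```

So the same directory name could work on one user's machine and fail on another's, depending only on their regional settings.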

yuzhe
April 29, 2013, 11:12:06 PM
Last edit: April 29, 2013, 11:50:42 PM by yuzhe
 #8

Beware that arguments passed via sys.argv[] are "ascii" (so you must decode(locale) first to get unicode), and every unicode string you pass to Windows in non-obvious ways, such as Popen arguments, and even CreateProcessW, must be encoded to multibyte (sys.getfilesystemencoding() is reasonably portable for that). Doing:

Code:
Popen(['msg.exe', '*', '/server:127.0.0.1', unicode(sys.argv[1].decode(locale.getpreferredencoding()))])

is wrong (Python will try to convert to mbcs, but with ascii as the source encoding). Generally one should be careful with win32api args and env variables (including the command line). Everything else in Python is unicode....
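The rule described here (decode on the way in, encode on the way out) can be sketched as two small helpers. The names to_text and to_os_bytes are hypothetical, and the default encodings are simply the two calls already used in this thread; treat it as an outline of the pattern, not a drop-in fix.

```python
import locale
import sys


def to_text(value, encoding=None):
    """Decode a byte string coming from the OS; pass unicode through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding or locale.getpreferredencoding())
    return value


def to_os_bytes(text, encoding=None):
    """Encode unicode for an API that only accepts byte strings."""
    return text.encode(encoding or sys.getfilesystemencoding())


# Example with an explicit code page so the result does not depend on the machine.
arg = to_text(b'Bitcoin\xe9', 'cp1252')
round_trip = to_os_bytes(arg, 'cp1252')
print(round_trip == b'Bitcoin\xe9')
```

Keeping every string as unicode between those two calls avoids the implicit ascii conversions that show up elsewhere in the code.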

To demonstrate the dialog bug:

Code:
Traceback (most recent call last):
  File "armoryqt.py", line 716, in openSettings
    dlgSettings = DlgSettings(self, self)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdialogs.py", line 10073, in __init__
    '(%s)' % BTC_HOME_DIR, size=2)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdefines.py", line 212, in __init__
    self.setText(txt, **kwargs)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdefines.py", line 215, in setText
    text = unicode(text)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 51: ordinal not in range(128)

Once again, we're trying to convert an mbcs string without specifying the source encoding (i.e. wherever the string comes from, it should be decoded with decode('mbcs') first).

It seems that Qt suffers from the same behaviour (mbcs strings are treated as ascii).

All of this madness probably stems from the fact that mbcs is only a subset of utf16.
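The error in the traceback can be reproduced in isolation. In this sketch the path is shortened and hypothetical, and cp1252 stands in for 'mbcs', which only exists as a codec name on Windows. The explicit raw.decode('ascii') mirrors what unicode(text) does implicitly to a byte string in Python 2.

```python
# Byte string as an ANSI API would hand it back (0xe9 is e-acute in cp1252).
raw = b'C:\\Users\\vbox\\Bitcoin\xe9'

try:
    raw.decode('ascii')  # same conversion unicode(text) attempts implicitly
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Naming the source encoding explicitly avoids the ascii default.
fixed = raw.decode('cp1252')

print(message)
```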

Partial fix for command line:
https://github.com/wyuzhe/BitcoinArmory/commit/fd7ff04bd0b343ad119980c85996840803771a1d