What encoding is used when initializing sys.argv?

Post by Petr Prikr » Sat, 01 Oct 2005 22:02:40


Hi,

When trying to pass a Unicode
directory name through the command line into a script
(MS Windows environment), I discovered that
I do not understand what encoding should be used
to convert sys.argv into unicode.

I know about the rejected attempt to implement
sys.argvu. Still, how is sys.argv filled? What
encoding is used when parsing the command line internally?
To what encoding is it converted when non-ASCII
characters appear?

Thanks for your time and experience,
pepr


--
Petr Prikryl (prikrylp at skil dot cz)
 
 
 

Post by Martin v. » Sun, 02 Oct 2005 07:19:17


Python does not perform any conversion whatsoever.
It has a traditional main() function, with the
char *argv[] argument.

So if you think that the arguments are inherently
Unicode on your system, your question should be
"how does my operating system convert the arguments"?

That, of course, depends on your operating system.
"MS Windows environment" is not precise enough, since
it also depends on the specific incarnation of that
environment. On Windows 9x, I believe the command
line arguments are "inherently" *not* in Unicode,
but in a char array. On Windows NT+, they are Unicode,
and Windows (or is it the MS VC runtime?) converts them
to characters using the CP_ACP code page.
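
A minimal sketch of why the code page matters, written for Python 3 so it runs anywhere. The helper name to_ansi and the sample name are hypothetical, cp1250 and cp1252 stand in for two possible CP_ACP settings, and errors="replace" is only an assumption standing in for whatever substitution Windows itself performs:

```python
def to_ansi(arg, codec):
    """Encode a Unicode argument into the bytes a char *argv[] would hold.

    Assumption: errors="replace" approximates the lossy conversion;
    the real Windows conversion may substitute differently.
    """
    return arg.encode(codec, errors="replace")


# Hypothetical Czech directory name ("Prilis" with diacritics).
name = "P\u0159\u00edli\u0161"

central_european = to_ansi(name, "cp1250")  # code page that covers Czech
western = to_ansi(name, "cp1252")           # code page that lacks U+0159

# The name survives a round trip only if the code page contains
# every character in it.
print(central_european.decode("cp1250") == name)
print(western.decode("cp1252") == name)
```

Under these assumptions, a script can recover the original name from sys.argv only when the active code page covers every character that was typed.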

Kind regards,
Martin

 
 
 

Post by Neil Hodgs » Sun, 02 Oct 2005 07:47:57

Petr Prikryl:


Martin mentioned CP_ACP. In Python on Windows, this can be accessed
as the "mbcs" codec.

import sys
print repr(sys.argv[1])
print repr(unicode(sys.argv[1], "mbcs"))

C:\bin>python glurp.py abcß•
'abc\xdf\x95'
u'abc\xdf\u2022'
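
The "mbcs" codec exists only on Windows. As a portable sketch of the same decoding step (Python 3 syntax), the example below assumes the ANSI code page is Windows-1252 and uses the explicit "cp1252" codec; the variable names are illustrative only:

```python
# Bytes as the Python 2 script above would have received them in sys.argv[1].
raw_arg = b"abc\xdf\x95"

# Assumption: CP_ACP is Windows-1252. On Windows itself, "mbcs" selects
# the active ANSI code page without naming it.
ASSUMED_ANSI_CODEC = "cp1252"

decoded = raw_arg.decode(ASSUMED_ANSI_CODEC)

# ascii() escapes the non-ASCII characters, much like repr() in Python 2.
print(ascii(decoded))
```

On a machine with a different ANSI code page, the same bytes would decode to different characters, which is why "mbcs" is the safer choice on Windows.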

Neil
 
 
 

Post by Tim Robert » Mon, 03 Oct 2005 15:15:48


There's another entry in my "keep this post forever" file.
--
- Tim Roberts, XXXX@XXXXX.COM
Providenza & Boekelheide, Inc.
 
 
 

Post by Petr Prikr » Fri, 07 Oct 2005 15:39:21

Thanks, Martin v. Löwis, Neil Hodgson, and Tim Roberts.
I really appreciate your valuable comments. It simply
works.

Thanks again,
Petr