Non-English language problems

Post by mjame » Thu, 25 Aug 2005 23:22:07


I'm trying to use VB6 to read a Word document, manipulate the text,
then output the results to a text file. Parts of the document are in
non-English languages, and that text doesn't come through properly.

Code snippet as follows:
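    ' wordApp is assumed to be an existing Word.Application object
    Dim wordDoc As Object, nFile As Integer, nRow As Long, strVal As String
    Set wordDoc = wordApp.Documents.Open("c:\Document1.doc", , True) ' read-only
    nFile = FreeFile
    Open "c:\Results.txt" For Output As #nFile
    ' looping through cells in a table
    With wordDoc.Tables(1)
        For nRow = 1 To .Rows.Count
            strVal = .Cell(nRow, 1).Range.Text
            ' drop the end-of-cell marker (Chr(13) & Chr(7)) Word appends
            strVal = Left$(strVal, Len(strVal) - 2)
            ' manipulate text omitted
            Print #nFile, strVal
        Next nRow
    End With
    Close #nFile
    wordDoc.Close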

While looping, if I set breakpoints and watch strVal, all I see is a
string of ? characters, and the same ? characters are what get printed
to the file. I don't entirely expect the watch window to work, but I
would expect better results from the file.

I have tried opening the document with an Encoding value (the Encoding
argument of Documents.Open), but that doesn't seem to help.

If anyone has any ideas on how to improve my current results, I'd
appreciate it.

Thanks!
Matt

Post by Tony Proctor » Fri, 26 Aug 2005 01:42:38

The Word doc will probably be holding the data in Unicode (in the binary
.doc format, text that doesn't fit the 8-bit code page is stored as
UTF-16). When read into VB String variables, the same data will be real
Unicode, so there's no problem there either. However, when displayed in
the IDE, or when written to files with Print #, the data has to be
converted to the current ANSI code page. Unless you're operating in the
same locale as that to which the data applies (e.g. Chinese), those
characters have no equivalent in your ANSI character set, and the
conversion replaces each one with a ?. Word can still display those
characters in your locale because it "jumps through hoops" internally.

Tony Proctor
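The lossy conversion Tony describes can be reproduced outside VB6. The sketch below is a minimal illustration in Python, not VB6: it assumes "cp1252" (the Western European ANSI code page) stands in for the poster's ANSI code page, and uses errors="replace" to mimic the '?' substitution that happens when Unicode text is converted to that code page. A Unicode encoding such as UTF-8 or UTF-16 round-trips the same text intact.

```python
# Minimal sketch of Unicode -> ANSI conversion loss (illustration only).
# Assumption: "cp1252" stands in for the user's ANSI code page.

def round_trip(text: str, encoding: str) -> str:
    """Encode then decode, as when text is written to a file and read back.

    errors="replace" substitutes '?' for characters the target
    encoding cannot represent.
    """
    return text.encode(encoding, errors="replace").decode(encoding)

sample = "Label: 中文"

# The Chinese characters have no cp1252 mapping, so each becomes '?'.
ansi_text = round_trip(sample, "cp1252")

# Unicode encodings preserve every character.
utf8_text = round_trip(sample, "utf-8")
utf16_text = round_trip(sample, "utf-16-le")

print(ansi_text)                                  # Label: ??
print(utf8_text == sample, utf16_text == sample)  # True True
```

The ASCII prefix survives the ANSI conversion unchanged, while only the characters outside the code page collapse to '?', which matches the symptom Matt saw in both the watch window and the output file.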