Reading file saved from Excel as Unicode text

G

Greg Lovern

I inherited some Excel code that does this:

Set fso = CreateObject("Scripting.FileSystemObject")
strTempFile = fso.GetSpecialFolder(2) & "\" & fso.GetTempName & ".txt"
'2 = TemporaryFolder.

ActiveWorkbook.SaveAs(Filename:=strTempFile, FileFormat:=xlText)

nSourceFile = FreeFile
Open strTempFile For Input As #nSourceFile
sText = Input$(LOF(1), 1)
Close

Then, sText is uploaded to a server.

That works fine for English, German, and French. But then we added
Japanese. Excel's xlText (tab delimited) file format changes all
Japanese characters into question marks. So, I changed that line to
save as Unicode tab delimited instead:

ActiveWorkbook.SaveAs(Filename:=strTempFile,
FileFormat:=xlUnicodeText)


The file looks great in Notepad; Japanese characters are preserved.
But, then I get Error 62 on this line:

sText = Input$(LOF(1), 1)


So, I changed that section to this:

Open strTempFile For Binary As #nSourceFile
sText = InputB$(LOF(1), 1)


With that, I don't get errors in VBA, but the backend server chokes on
extra characters that I guess must be invisible in Notepad.


So I tried this:


Set oTextStream = fso_OpenTextFile(strTempFile )
sText = oTextStream.ReadAll


The result is garbled; where the Japanese characters should be,
instead there are random characters like this:

K0Q0f0$B!&(Bc0_0$B!&%?(BL0$B!&%f(B0$B!&!V(B0$B%-(B0$B!&%#(B0$B%F(B0$B%M(B0g0



Any suggestions on how I can preserve the Japanese characters without
getting extra characters that are invisible in Notepad?



Thanks,

Greg
 
J

James Snell

:

The result is garbled; where the Japanese characters should be,
instead there are random characters like this:

K0Q0f0・c0_0・タL0・ユ0・「0ã‚­0・ィ0テ0ãƒ0g0

That result isn't garbled - it looks exactly like a unicode string being
displayed in ascii, because it is one. UTF-16 would be a good educated guess.

VBA strings are not unicode, they're only ascii (well, 8-bit anyway). So if
you're handling UTF-16 strings then the VBA native string functions will be
useless to you (this may or may not apply to Excel 2007 strings, I've no
idea).

You need some unicode aware string tools, I'm not aware of any for vba but
that doesn't mean they don't exist. It's also a very complex area and I
wouldn't recommend anyone trying to write their own tools to handle it.


Put very approximately; UTF-16 works on words (pairs of bytes) to represent
a character. For example (byte values shown in hex) a space rendered as 20
in ascii and would be the byte pair 0020 in UTF-16*. A line break might be
000D which vba strings would interpret as two characters 00 (which would
display as a block or question mark as it has no ascii character to display)
and 0D the carriage return. Using 0D as the end of line marker (as is
correct) would leave the vba string handler with an extra byte 00 at the end
of the string. Culling this may be enough to get the string through but it's
not substitute for proper localisation functions.

There's more to it than that but it's enough to understand the scope of the
problem. Notepad knows about UTF-16 and a number of other text formats and
renders the text correctly, which is why it is fine there.


Not sure how much that actually helps but at least you know what's
happening...
 
G

Greg Lovern

You need some unicode aware string tools, I'm not aware of any for vba

I was under the impression that FileSystemObject is Unicode-aware; is
that not correct?

proper localisation functions.

Are these available in a library, or do you mean rolling your own?


Greg
 
J

James Snell

Sorry for the delay - I'm not always around to respond to posts.

FileSystemObject is unicode aware in that you can open a file with a unicode
name, the content that is returned by a stream is language neutral (it's just
a bytestream) which would need to be handled downstream. By the same token
you could oly open a unicode filename if you were able to represent that
unicode name in VBA.

I've not looked for existing libraries, on further inspection there are some
unicode functions available in vba such as ascw() & chrw(). It may be easier
to just open the file up as a hidden sheet / workbook.

The biggest issue is that you'd have to know what flavour of unicode the
server can speak (if any). If that's only ANSI (in which case you'd have to
rethink the use of japanese) if it's UTF-7 / UTF-8 then a script like
(http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=44569&lngWId=1) might help.

Do you have the info about the server? Once you know what that handles then
you might be able to web-search for a conversion script...
 
G

Greg Lovern

Thanks, we've been banging on this for a few days now and I've posted
a new question that reflects some little progress we've made and more
detail on the problem:

http://groups.google.com/group/micr.../browse_thread/thread/12681da5ff2fb1df?hl=en#

It turns out the server requires Shift-JIS, not Unicode. But we're
still trying to go through Unicode, then convert the Unicode to Shift-
JIS, because Excel won't save as Shift-JIS. Details in the thread
referenced above. I hope you look at that post and comment.


Thanks,

Greg
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top