How can I make Excel save Unicode CSV data correctly?

F

FooBarBaz

Excel 2007's Unicode for CSV (comma separated values) support seems sketchy
at best:
- Reads ANSI CSV OK.
- Reads Unicode (UTF-16, UCS-2) CSV OK.
- Regularly garbles UTF-8 CSV when reading (depends on the included
characters).
- Regularly garbles Unicode big-endian CSV when reading (depends on the
included characters).
- Writes ANSI CSV OK.
- Can't write any form of non-ASCII Unicode CSV data OK.

When saving into CSV format it saves in ANSI CSV and breaks any non-ANSI
characters by converting them into ? (question mark) characters.

To replicate:
1. Open notepad and type in the following line of text:
Degrees C (℃),foo
You can type the special ℃ symbol by pressing-and-holding the Alt key while
typing 2103 on the numeric keypad, then release the Alt.
2. File / Save as... into a file named "sample-unicode.csv" and select the
"Unicode" encoding.
3. Open the "sample-unicode.csv" file in Excel. The data is still displayed
correctly.
4. Press F12 to get the "Save as..." window, save into a file named
"sample-excel.csv" in CSV format.

Excel incorrectly writes the following data, omitting any kind of
UTF-8/Unicode/Unicode-BE BOM:
0000: 44 65 67 72 65 65 73 20 43 20 28 3F 29 2C 66 6F Degrees C (?),fo
0010: 6F 0D 0A o..

Correct Unicode UTF-8 output should look like the following:
0000: EF BB BF 44 65 67 72 65 65 73 20 43 20 28 E2 84 ...Degrees C (..
0010: 83 29 2C 66 6F 6F 0D 0A .),foo..
(Note the EF BB BF byte order mark)

Correct Unicode UTF-16/UCS-2 output should look like the following:
0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00 ..D.e.g.r.e.e.s.
0010: 20 00 43 00 20 00 28 00 03 21 29 00 2C 00 66 00 .C. .(..!).,.f.
0020: 6F 00 6F 00 0D 00 0A 00 o.o.....
(Note the FF FE byte order mark)

Interestingly, saving from Excel to "Unicode text" format does save into
UTF-16/UCS-2 correctly:
0000: FF FE 44 00 65 00 67 00 72 00 65 00 65 00 73 00
0010: 20 00 43 00 20 00 28 00 03 21 29 00 09 00 66 00
0020: 6F 00 6F 00 0D 00 0A 00

But it converts the comma field separators (\x002c) into tabs (\x0009).

How can I make Excel behave correctly and save into some kind of Unicode CSV
format?

I'm using Microsoft Office Excel 2007 (12.0.6300.5000) SP1 MSO
(12.0.6213.1000).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top