CSV Format

CGA

I'm well aware of the accepted format of CSV files: optional headers, escaped
quotes and quoted values. What I've never found is a published standard.
Does anyone know what standard Microsoft follows and what organization
published the standard? The closest thing to a published standard I have
found is a recent proposed standard for the Text/CSV MIME type.
 
Gary Smith

So far as I know, there is no standard. Every software producer seems to
have come up with its own variant, and some use more than one. The best
are configurable so you can produce whatever you like.
 
Harlan Grove

Gary Smith wrote...
So far as I know, there is no standard. Every software producer seems to
have come up with its own variant, and some use more than one. The best
are configurable so you can produce whatever you like.

Further, CSV files are text files. There's no more standardization in
text files than there is in CSV files. On IBM mainframes, text files
use EBCDIC character encoding, while on most other systems they use
ASCII or Unicode. On Unix systems, lines end with a single linefeed
character, on Macs with a single carriage return, and on PCs/Windows
systems with a carriage return-linefeed combination. Programmers call
each of these newlines.
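
To make the point about encodings and line endings concrete, here is a minimal Python sketch. The sample strings are made up for illustration, and cp037 is just one of several EBCDIC code pages. It relies on the standard csv module accepting any of the three line endings when the input is opened with newline="".

```python
import csv
import io

# The same two records with Unix, old Mac, and Windows line endings.
samples = {
    "unix": "a,b\nc,d\n",
    "old_mac": "a,b\rc,d\r",
    "windows": "a,b\r\nc,d\r\n",
}

def parse(text):
    # newline="" passes the raw line endings through to the csv module,
    # which treats \n, \r, and \r\n as record terminators on input.
    return list(csv.reader(io.StringIO(text, newline="")))

rows = {name: parse(text) for name, text in samples.items()}

# The same text has different bytes in ASCII and in an EBCDIC code page.
ascii_bytes = "a,b".encode("ascii")
ebcdic_bytes = "a,b".encode("cp037")
```

All three samples parse to the same rows, but only because the reader was told nothing about which convention to expect. A program that splits on "\n" alone would mishandle two of the three.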

Some software uses Unix escape character conventions for embedding
double quotes, e.g., "The next character \" is a double quote.", while
Excel uses doubled double quotes, e.g., "The next character "" is a
double quote." Most software doesn't accept newlines in fields, but
Excel does as long as they're embedded in double quote delimited
fields.
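
As a rough illustration of the two quoting conventions, the sketch below reads both forms with Python's csv module. The helper name first_field is hypothetical. The parameters doublequote and escapechar are the module's real dialect options, but whether any particular program writes one form or the other is something you would have to check.

```python
import csv
import io

# Excel style: a double quote inside a quoted field is doubled.
doubled = '"The next character "" is a double quote."\n'
# Unix style: a double quote inside a quoted field is backslash-escaped.
backslash = '"The next character \\" is a double quote."\n'

def first_field(text, **fmt):
    # Hypothetical helper: parse one record and return its first field.
    return next(csv.reader(io.StringIO(text, newline=""), **fmt))[0]

excel_style = first_field(doubled)  # default dialect has doublequote=True
unix_style = first_field(backslash, doublequote=False, escapechar="\\")

# A newline embedded in a double-quoted field stays inside the field.
multiline = '"line one\nline two",42\n'
row = next(csv.reader(io.StringIO(multiline, newline="")))
```

Both inputs decode to the same field text. Reading the backslash form with the default settings, or the doubled form with the escape settings, would not.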

Finally, the comma isn't always the field separator. Most continental
European countries use commas as the decimal point in numbers with
fractional parts. Excel uses the Windows list separator character as
the field separator in CSV files.
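
For example, a file written under a locale that uses the comma as the decimal point typically has semicolon-separated fields. A minimal Python sketch for reading one, using made-up sample data and assuming you already know the separator is a semicolon:

```python
import csv
import io

# Hypothetical sample: semicolon-separated fields, decimal commas.
data = "item;price\nwidget;3,50\ngadget;12,25\n"

reader = csv.DictReader(io.StringIO(data, newline=""), delimiter=";")

# Convert decimal commas to decimal points before parsing the numbers.
prices = {r["item"]: float(r["price"].replace(",", ".")) for r in reader}
```

The replace() call is only safe here because the sample has no thousands separators. Real data may need proper locale-aware parsing.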

Unfortunately, there are no published specifications for CSV files. As
general rules, it's safest to put double quotes around all strings and
never embed newlines. As for embedding double quotes in text fields,
you'll need to experiment.
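
Those general rules could be sketched in Python roughly as follows. The function name write_conservative is hypothetical, and rejecting embedded newlines outright is just one way to follow the "never embed newlines" advice. Embedded double quotes are left to the csv module's default, which doubles them.

```python
import csv
import io

def write_conservative(rows):
    # Refuse fields with embedded newlines rather than rely on the
    # reader at the other end accepting them.
    for row in rows:
        for value in row:
            if isinstance(value, str) and ("\n" in value or "\r" in value):
                raise ValueError("embedded newline in field: %r" % value)
    buf = io.StringIO(newline="")
    # QUOTE_NONNUMERIC puts double quotes around every non-numeric field.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()

text = write_conservative([["name", "qty"], ["Smith, Gary", 3]])
```

Here the string fields come out quoted, including the one that contains a comma, and the number is left bare.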
 
Gary Smith

Harlan Grove said:
Finally, the comma isn't always the field separator. Most continental
European countries use commas as the decimal point in numbers with
fractional parts. Excel uses the Windows list separator character as
the field separator in CSV files.

The tab character is very commonly used as the separator and may be the
best choice in many situations. I've also seen the semicolon and pipe
used.
 
Harlan Grove

Gary Smith wrote...
....
The tab character is very commonly used as the separator and may be the
best choice in many situations. I've also seen the semicolon and pipe
used.

Tab isn't always safe. There are too many programs that convert tabs to
spaces automatically. Also, eye-checking tab-delimited files isn't
reliable when fields contain embedded spaces. Graphic (visible)
characters are usually best.
 
Gary Smith

Harlan Grove said:
Gary Smith wrote...
...
Tab isn't always safe. There are too many programs that convert tabs to
spaces automatically. Also, eye-checking tab-delimited files isn't
reliable when fields contain embedded spaces. Graphic (visible)
characters are usually best.

Of course, the choice -- if you have one -- necessarily depends on what
you're going to do with the file. No separator is always safe. Sometimes
it's a real challenge to find one that will get the job done.
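
One way to hunt for a workable separator is to scan the data for candidates that never occur in any field. This is only a sketch: the function name pick_separator and the candidate list are assumptions, and it matters mainly when the program reading the file can't cope with quoted fields.

```python
def pick_separator(rows, candidates=(",", ";", "\t", "|")):
    # Collect every character that occurs in any field.
    used = {ch for row in rows for field in row for ch in field}
    # Return the first candidate that never occurs, or None if all do.
    for sep in candidates:
        if sep not in used:
            return sep
    return None

rows = [["Smith, Gary", "a;b"], ["x\ty", "plain"]]
sep = pick_separator(rows)
```

With the sample rows above, the comma, semicolon and tab all occur in the data, so only the pipe is left. If every candidate occurs, the function returns None and you're back to quoting.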
 
