code page

Carol · Jul 27, 2004

How do I force the import wizard to use the ASCII code page rather than
Unicode?

When importing a text file, the program freezes for a long time trying to
parse the file as Unicode.
It may be happening because there is a Unicode character in there somewhere.
Still, I want to ignore that and treat the file as pure ASCII.

John Nurick · Jul 27, 2004

Hi Carol,

Click the Advanced... button in the text import wizard. You can set the
code page there.

Carol · Jul 28, 2004

Yes, but the thing locks up before I can get to that.
That is why I need to force it.

John Nurick · Jul 28, 2004

Umm. I don't understand exactly what "have a unicode character in there
somewhere" means. Typically there are two ways for a program to identify
a textfile as Unicode (apart from being told by the user):

1) A byte order mark at the beginning of the file (e.g. if the first two
bytes are FF FE the file should be interpreted as little-endian UTF-16,
if they are FE FF as big-endian UTF-16, and if the first three bytes are
EF BB BF it should be interpreted as UTF-8).

2) Analysing the contents of the file to see if they look more like
Unicode than some other encoding.
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_42jv.asp)

So: what's in your file that could be triggering one of these? Or does
the problem lie somewhere else? (By the way: are you using a far-eastern
version of Windows or Office?)
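
The two checks above can be roughed out in a few lines of Python. This is only a sketch: the function name `sniff` is made up for illustration, and counting null bytes is a crude stand-in for real content analysis, not what Access itself does.

```python
import codecs

# Byte order marks for the encodings mentioned above.
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),                    # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 little-endian"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 big-endian"),     # FE FF
]


def sniff(data: bytes) -> dict:
    """Report a leading byte order mark (if any) and the number of null bytes."""
    bom = next((name for mark, name in BOMS if data.startswith(mark)), None)
    return {"bom": bom, "null_bytes": data.count(b"\x00")}
```

Reading the first few kilobytes of the file in binary mode and passing them to `sniff` is enough to see whether either trigger is present.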

Carol · Jul 28, 2004

No, I'm not using Far Eastern characters (oops).

I am getting those tiny boxes and strange characters.

John Nurick · Jul 29, 2004

One possibility might be to use a SCHEMA.INI file with the file
specification including character set. This is documented very sketchily
in Help and somewhat better in the following links:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcjetschema_ini_file.asp

Create a Schema.ini file based on an existing table in your database:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;155512

http://support.microsoft.com/default.aspx?scid=kb;EN-US;149090
http://www.devx.com/tips/Tip/12566

Otherwise, if you email me a textfile that exhibits the problem I can
examine it for unusual features. Just remove the reversed spam trap from
my address.
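
For illustration, here is a small Python sketch that builds a Schema.ini section with the character set spelled out. The file name `export.txt` and the column names and types in the usage note are hypothetical; only the key names (`Format`, `ColNameHeader`, `CharacterSet`, `ColN`) come from the Schema.ini format described in the links above, and whether this cures the freeze is untested.

```python
def schema_ini(file_name, columns):
    """Build one Schema.ini section for a comma-delimited text file.

    columns is a list of (column_name, column_type) pairs, in file order.
    """
    lines = [
        f"[{file_name}]",        # section name is the text file's name
        "Format=CSVDelimited",
        "ColNameHeader=False",   # assumes the file has no header row
        "CharacterSet=ANSI",     # the point of the exercise: not Unicode
    ]
    for position, (name, col_type) in enumerate(columns, start=1):
        lines.append(f"Col{position}={name} {col_type}")
    return "\r\n".join(lines) + "\r\n"
```

Calling `schema_ini("export.txt", [("ACCOUNT", "Text"), ("AMOUNT", "Double")])` returns the text to save as Schema.ini in the same folder as the data file.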

Carol · Jul 29, 2004

hey thanks!!!!!!!!!!!!!!!!!!!

John Nurick · Jul 29, 2004

Hi Carol,

I've had a look at the file. A terminal CRLF normally isn't a problem,
and I reckon the trouble is being caused by all the null bytes in the
fields CHKDT and CLEARDT. They are the only thing that could cause it to
be mistaken for Unicode.

Access 2002 couldn't understand your original file at all, but when I
got rid of the nulls with a find-and-replace in a text editor it
imported just fine.

I guess the nulls are the result of a problem in the EBCDIC->ANSI
conversion. If that can't be fixed it's not difficult to strip them out
before you import the file, using a little script or program in your
language of choice. If Perl is installed on your machine, here's the one
I used:

while (<>) {
    # dispose of null bytes and trailing spaces in text fields
    s/\x00+| +(?=")//g;

    # remove quote marks and spaces from numeric fields
    # so they import as numeric
    s/" *(\d[\d.]+) *"/$1/g;

    print;
}
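
For anyone without Perl, roughly the same cleanup can be sketched in Python. The function names and file paths are made up for illustration; the two regular expressions are carried over unchanged, so like the Perl version it leaves a quoted single-digit value such as "5" alone.

```python
import re

# Null bytes anywhere, or runs of spaces immediately before a quote mark.
NULLS_AND_TRAILING_SPACES = re.compile(rb'\x00+| +(?=")')
# A quoted number, possibly padded with spaces inside the quotes.
QUOTED_NUMBER = re.compile(rb'" *(\d[\d.]+) *"')


def clean_line(line: bytes) -> bytes:
    """Strip nulls and trailing spaces, then unquote numeric fields."""
    line = NULLS_AND_TRAILING_SPACES.sub(b"", line)
    return QUOTED_NUMBER.sub(rb"\1", line)


def clean_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, cleaning each line on the way."""
    # Binary mode keeps the null bytes and CRLF line endings visible as-is.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(clean_line(line))
```

Working on bytes rather than text means no encoding has to be guessed before the nulls are gone.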

Carol · Jul 30, 2004

John, thanks for your help!!
I believe you are right about the whole thing.
Thank you for the Perl script.
I think CHKDT and CLEARDT are the problem.
Could this be a case where I need to offset the begin points, as I've heard
in this thread? It is strange that I am not getting any values for CLEARDT.
Maybe my start point is wrong.
How would I possibly solve that? Trial and error? I guess that the starting
point depends on the field preceding CHKDT and CLEARDT.

Second, are nulls the same thing as spaces?
I thought I was going from EBCDIC->ASCII, not EBCDIC->ANSI. I obviously don't
know the difference, and don't expect you to explain it all to me.
Thank you for your time, John.

Carol · Jul 30, 2004

I guess I mean 'slack bytes' when I say offset points.

John Nurick · Jul 30, 2004

Carol,

I'm glad we seem to have worked it out.

Null bytes aren't the same as spaces. A space is ASCII hex 20 (decimal
32), and a null is ASCII 0. (A Null value in a database field is
something else again, a sort of non-value that represents the fact that
the actual value of the thingy that the field represents is unknown.)

As for ASCII and ANSI: if the file contains no accented characters,
line-drawing characters or the like, and the computer was made in the
last 20 years or so, the difference is academic.

I've never used the software you mentioned - was it Novastor? Your
theory may be right: I can imagine a program inserting null bytes to
fill in gaps left by incorrect specification of the starting positions
and sizes of the COBOL fields. One place to check that is in CHKDT,
which seems to contain a yyyy-mm-dd date string followed by a null byte,
while the other date field came across fine. So maybe you've done
something that sets the width of the output field as 11 rather than 10,
so the program is padding it out with a null.
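
To see the difference between a null and a space, and to spot a field that is one byte too wide, something like this Python sketch would do. The sample line in the usage note is invented, and the splitting is deliberately naive: it assumes quoted, comma-separated fields with no commas or quote marks inside the values.

```python
NULL_BYTE = b"\x00"  # byte value 0
SPACE = b" "         # byte value hex 20 (decimal 32)


def field_report(line: bytes) -> list:
    """Return (value, length, null_count) for each field in one line."""
    body = line.rstrip(b"\r\n")
    # Naive split: assumes no commas or quote marks inside field values.
    fields = [field.strip(b'"') for field in body.split(b",")]
    return [(field, len(field), field.count(NULL_BYTE)) for field in fields]
```

For a line such as `b'"2004-07-27\x00","2004-07-30"\r\n'`, the first field is reported as 11 bytes long with one null, the second as 10 bytes with none, which is the pattern you would expect if an output width were set to 11 instead of 10.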
