Word XP converts characters

Gert · Jan 15, 2004

Hey,

I have a very strange problem when mail merging:
I have a *.dat file containing the data (which may vary); it is actually
a tab-delimited text file. (I cannot change the extension
because it is determined by another app.)

The first word in this *.dat file is "NEDERLANDS", just to make sure
that Word recognizes the language in this file as Dutch.

When merging a Word document with this file, some characters are shown
as Japanese/Chinese characters. It only happens when the data contains
é / à / è / ... which are rather normal characters in Dutch.
The merge is part of a VBA macro.

In Word XP, I've turned off the 'Confirm Conversion on opening' option, the
grammar check, the spell check and the automatic language detection option.

Does anyone have a solution to prevent this oddity from happening?

TIA,

Gert

Gert · Jan 16, 2004

Just giving more information on the topic:

As said, an app makes a *.dat file containing data, and the first word
is NEDERLANDS (to make sure Word XP recognizes it as Dutch).

When I open this *.dat file with Notepad, everything is OK.
When I try to open this *.dat file with Word XP, sometimes (!) Word XP
asks about a conversion. And when it does, the Japanese (Shift-JIS)
encoding is suggested.

The mail merge is executed by a macro, so it is not possible to
choose the right conversion manually. The macro uses SendKeys
(ENTER) to dismiss this dialog, so whatever Word suggests is accepted.

So, what I'd like to know:
- Is placing the word "Nederlands" a good way to make sure Word
recognizes our data as being Dutch?
- Is there a way (in the macro or in the .dat file) to make sure that
Word uses the right conversion?

Peter Jamieson · Jan 16, 2004

I think the only option is probably to do a separate conversion of the file
format before you use the file as a data source. However, this may introduce
other problems.

If you convert to another type of encoded text file, as far as I know, the
only text file formats that Word will /always/ recognise correctly are the
Unicode ones, particularly UTF-8, as long as they start with the Unicode
Byte Order Mark (BOM). Although the BOM is strictly speaking optional, both
Notepad (on Windows 2000 and later) and Word always insert it. However, the
other problem is that Word always seems to display the Encoding dialog box
when the file is Unicode, so the only thing you really gain by using this
approach is that you do not have to choose the right encoding (as it is
already selected), and the characters should be correct.

If you convert to a Word .doc file, you may encounter performance problems
and (possibly) restrictions on the column count. But otherwise, that is
probably the best option.

To do the conversion, you need a simple macro, but you might need to do more
to cope with different file names and so on.

E.g.

Sub ConvertToUTF8()
    ' Convert to a UTF-8 format text file.
    ' Needs error checking etc.
    Dim oDoc As Word.Document

    ' Change msoEncodingWestern to be the encoding you need.
    ' I think this should work.
    Set oDoc = Documents.Open( _
        FileName:="the path name of the file you need to convert.txt", _
        ConfirmConversions:=False, _
        AddToRecentFiles:=False, _
        Format:=wdOpenFormatEncodedText, _
        Encoding:=msoEncodingWestern, _
        Visible:=False, _
        OpenAndRepair:=False, _
        NoEncodingDialog:=True)

    ' Several of the parameters here are optional or
    ' irrelevant - you can probably remove the lines from
    ' ReadOnlyRecommended to SaveAsAOCELetter.
    oDoc.SaveAs _
        FileName:="the path name of the file to convert to.txt", _
        FileFormat:=wdFormatUnicodeText, _
        AddToRecentFiles:=False, _
        ReadOnlyRecommended:=False, _
        EmbedTrueTypeFonts:=False, _
        SaveNativePictureFormat:=False, _
        SaveFormsData:=False, _
        SaveAsAOCELetter:=False, _
        Encoding:=msoEncodingUTF8, _
        InsertLineBreaks:=False, _
        AllowSubstitutions:=False, _
        LineEnding:=wdCRLF

    oDoc.Close SaveChanges:=False
    Set oDoc = Nothing
End Sub

or

Sub ConvertToWord()
    ' Convert to a Word document file.
    ' Needs error checking etc.
    Dim oDoc As Word.Document

    ' Change msoEncodingWestern to be the encoding you need.
    ' I think this should work.
    Set oDoc = Documents.Open( _
        FileName:="the path name of the file you need to convert.txt", _
        ConfirmConversions:=False, _
        AddToRecentFiles:=False, _
        Format:=wdOpenFormatEncodedText, _
        Encoding:=msoEncodingWestern, _
        Visible:=False, _
        OpenAndRepair:=False, _
        NoEncodingDialog:=True)

    ' Several of the parameters here are optional or
    ' irrelevant - you can probably remove the lines from
    ' ReadOnlyRecommended to Encoding.
    oDoc.SaveAs _
        FileName:="the path name of the file to convert to.doc", _
        FileFormat:=wdFormatDocument, _
        AddToRecentFiles:=False, _
        ReadOnlyRecommended:=False, _
        EmbedTrueTypeFonts:=False, _
        SaveNativePictureFormat:=False, _
        SaveFormsData:=False, _
        SaveAsAOCELetter:=False, _
        Encoding:=msoEncodingUTF8, _
        InsertLineBreaks:=False, _
        AllowSubstitutions:=False, _
        LineEnding:=wdCRLF

    oDoc.Close SaveChanges:=False
    Set oDoc = Nothing
End Sub
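
Since the macros above use fixed placeholder paths and no error checking, a parameterised variant might look roughly like the sketch below. The function name ConvertDatToDoc and its two parameters are hypothetical, and the Western encoding is an assumption carried over from the macros above; it has not been tested against the poster's data.

```vba
' Sketch only: ConvertDatToDoc, sourcePath and targetPath are
' hypothetical names, not part of the original macros.
Function ConvertDatToDoc(ByVal sourcePath As String, _
                         ByVal targetPath As String) As Boolean
    Dim oDoc As Word.Document

    On Error GoTo Failed

    ' Open the text file with an explicit encoding so that Word
    ' does not have to guess (assumed here to be Western).
    Set oDoc = Documents.Open( _
        FileName:=sourcePath, _
        ConfirmConversions:=False, _
        AddToRecentFiles:=False, _
        Format:=wdOpenFormatEncodedText, _
        Encoding:=msoEncodingWestern, _
        Visible:=False, _
        NoEncodingDialog:=True)

    ' Save a Word-format copy to use as the merge data source.
    oDoc.SaveAs FileName:=targetPath, _
        FileFormat:=wdFormatDocument, _
        AddToRecentFiles:=False

    oDoc.Close SaveChanges:=False
    Set oDoc = Nothing
    ConvertDatToDoc = True
    Exit Function

Failed:
    ' Report failure to the caller instead of stopping the macro.
    If Not oDoc Is Nothing Then oDoc.Close SaveChanges:=False
    ConvertDatToDoc = False
End Function
```

The calling macro could then attach the converted file as the data source only when the function returns True.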

Cindy M -WordMVP- · Jan 16, 2004

Hi Gert,

I have a couple of thoughts following up on Peter's response:

1. If you rename this to a *.txt file does it work? (Note that I'm not
suggesting you do so permanently, but this will give us a better idea of
what process is connecting to the data)

2. How about using *.prn, instead?

3. If you activate Tools/Options/Confirm conversions on open you can
choose to use OLE DB, Word's internal file converter or ODBC to connect.
(ODBC may not be in the list, but activate "Show all" and you can see
it.) Test with these different methods. Does any one of them give you a
better result?

4. Possibly, you could set up a DSN for the ODBC driver to work
specifically with TAB and a different character set if none of the above
help very much. The bigger problem then becomes the distribution of the
DSN.

5. Do you NEED any Asian languages in your work? If not, one could try
deactivating (and uninstalling) any Asian language support for Office
and see if this helps.

Additional note: It's quite possible that the program creating a *.dat
file did so explicitly for *previous* versions of Office to force Word
to use its internal text converter to open the data source.
Unfortunately, OLE DB does recognize *.dat as something it can work
with, so this "trick" doesn't work with Word 2002 and later.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Sep 30 2003)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow-up question
or reply in the newsgroup and not by e-mail
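
For a merge that is driven from a macro, the connection method from point 3 cannot be picked in a dialog. A rough sketch of one way to request the pre-2002 style of connection in code follows. It relies on the SubType argument that OpenDataSource has in Word 2002 and later; the path is a placeholder, and whether this avoids the wrong encoding guess for this particular file is untested.

```vba
Sub AttachDataSourceWord2000Style()
    ' Sketch only: the path below is a placeholder.
    With ActiveDocument.MailMerge
        .MainDocumentType = wdFormLetters
        ' wdMergeSubTypeWord2000 asks Word to connect the way
        ' Word 2000 did instead of going through OLE DB.
        .OpenDataSource _
            Name:="C:\path\to\data.dat", _
            ConfirmConversions:=False, _
            AddToRecentFiles:=False, _
            SubType:=wdMergeSubTypeWord2000
    End With
End Sub
```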

Peter Jamieson · Jan 16, 2004

Hi Cindy,

FWIW my response came from a conversation early last year, and now that I
look further:
a. I see your response in that conversation:
"How about using ODBC as the connection method? I think that doesn't call up
a prompt..."
b. I remember thinking that that might be the way to go, but the original
questioner never told us what worked, if anything.

Peter Jamieson · Jan 17, 2004

Hello Gert,

There have been several replies already with suggestions and some conversion
macros.

Although I think using the word NEDERLANDS was worth trying, your experience
already says that it has not helped solve the problem. I do not know exactly
how Word determines the character set of a data file, but I suspect that
a. it uses a standard Win32 routine that takes a piece of text and tries to
determine the character set/locale
b. that routine is easily confused, particularly when there are a lot of
delimiter characters in the text.
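
How Word actually guesses is not known here, but the byte-level ambiguity itself is easy to show. In Windows-1252, é is byte &HE9, è is &HE8 and à is &HE0, and all three fall inside the Shift-JIS lead-byte range, so an accented letter followed by an ordinary letter is also a well-formed Shift-JIS double-byte pair. The sketch below only checks the standard Shift-JIS byte ranges; the function name is hypothetical and this is not Word's detection routine.

```vba
' Sketch only: tests whether two bytes form a well-formed
' Shift-JIS double-byte pair (lead byte + trail byte).
Function IsShiftJisPair(ByVal b1 As Byte, ByVal b2 As Byte) As Boolean
    Dim isLead As Boolean
    Dim isTrail As Boolean

    isLead = (b1 >= &H81 And b1 <= &H9F) Or (b1 >= &HE0 And b1 <= &HFC)
    isTrail = (b2 >= &H40 And b2 <= &H7E) Or (b2 >= &H80 And b2 <= &HFC)
    IsShiftJisPair = isLead And isTrail
End Function

Sub DemoShiftJisPair()
    ' &HE9 is e-acute in Windows-1252; Asc("n") is &H6E; 9 is a tab.
    Debug.Print "E9 + n:  "; IIf(IsShiftJisPair(&HE9, Asc("n")), "valid pair", "not a pair")
    Debug.Print "E9 + tab: "; IIf(IsShiftJisPair(&HE9, 9), "valid pair", "not a pair")
End Sub
```

The first case is a valid pair and the second is not, which fits the observation that only some records with accented characters go wrong.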

Just in case you cannot see the suggestions, please look again at Cindy
Meister's reply of Jan 16 above: her points 1 to 5 and the note about
OLE DB and *.dat files.


From me: I think you should pursue Cindy's suggestions, especially the ODBC
ones, before trying my file conversion approach. If you do, it is worth
adding that ODBC creates a SCHEMA.INI file in the folder containing the .DAT
file, and that it is possible to edit that file using e.g. Notepad to specify
a character set of ANSI or OEM. It is also possible to specify a Windows
character set number, but
a. that is an undocumented "feature"
b. any changes in the ODBC Administrator will overwrite a character set
number with either ANSI or OEM.
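
If editing SCHEMA.INI by hand is not practical, the file can also be written from a macro. The sketch below uses only plain VBA file I/O. The folder and file name are placeholders, and ColNameHeader=False is an assumption (a data file without a header row); adjust both to the real setup. Note that it overwrites any existing SCHEMA.INI in that folder.

```vba
Sub WriteSchemaIni()
    ' Sketch only: folder and file name are placeholders.
    Const DATA_FOLDER As String = "C:\path\to\data\"
    Const DATA_FILE As String = "data.dat"
    Dim f As Integer

    f = FreeFile
    Open DATA_FOLDER & "schema.ini" For Output As #f

    ' One section per data file, named after the file.
    Print #f, "[" & DATA_FILE & "]"
    Print #f, "Format=TabDelimited"
    Print #f, "ColNameHeader=False"   ' assumption: no header row
    Print #f, "CharacterSet=ANSI"

    Close #f
End Sub
```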

My own suggestion, converting the file before using it as a data source with
the ConvertToUTF8 or ConvertToWord macro, is in my Jan 16 reply above.

Cindy M -WordMVP- · Jan 18, 2004

Hi Peter,

> FWIW my response came from a conversation early last year

ah, I do remember that, now; hadn't made the connection,
before. Thanks for the reminder!

Cindy Meister

Gert · Jan 19, 2004

Peter & Cindy,

thanks for your replies.

I'm going to give your suggestions a shot and I'll let you know what
comes out !!

Thanks again,

Gert

Gert · Jan 22, 2004

Dear Peter & Cindy,

I've been trying your suggestions, but switching the extension to *.txt
or *.prn did not work.

But there is something else that seems to be working:

Our data source contains a variable amount of data. This data depends
on the number of fields used in the document before merging. We do this
because Word XP really slows things down and people just don't like
waiting...

What did we change? As mentioned, we put the word NEDERLANDS in our
source file. Now we have made an addition: we also put the client's
name, address and 3 other parameters as standard in the source file.
Since then, the Japanese/Chinese characters have not occurred.

I cross my fingers because it still needs some testing...

If something (or everything) is not clear to you, let me know and I'll
try to rephrase it.

Greetings and thanks,

Gert

Cindy M -WordMVP- · Jan 23, 2004

Hi Gert,

> What did we change? As mentioned, we put the word NEDERLANDS in our
> source file. Now we have made an addition: we also put the client's
> name, address and 3 other parameters as standard in the source file.
> Since then, the Japanese/Chinese characters have not occurred.
>
> I cross my fingers because it still needs some testing...
>
> If something (or everything) is not clear to you, let me know and I'll
> try to rephrase it.

I believe I understand what you're saying, and if it's working, this is
great news. I'm seeing a lot of similar problem reports in the German
groups... What's not quite clear to me is how the text file looks when
you've added this information. Could you just copy and paste the relevant
lines (of a data source with only a few fields) - up through the field
names - into a reply?

Cindy Meister

Gert · Jan 27, 2004

Cindy,

I'll try to respond to your request this week !

Gert

GeDiSoft · Jan 28, 2004

Hello Cindy,

The files look like this:
(datasource: huw.dat)
Nederlands Grote Markt 34 8900 Ieper West Vlaanderen Ieper 116 Devoldere Steven André Albien

(headersource: huwveld.dat)
M_0 M_1 M_2 M_3 M_4 M_5 M_52 M_53 M_55

Unfortunately, the problem isn't solved. It's better than it was
before, but sometimes the Chinese characters are still there, so
we'll (have to) look further for a solution.

greetz
Geert D.

Peter Jamieson · Jan 31, 2004

Geert -

This one is new to me - might be worth a look:

http://support.microsoft.com/default.aspx?scid=kb;en-us;290981

GeDiSoft · Feb 3, 2004

Hey Peter,

If we get our hands on the fix, we'll surely have a look at it.

Thnx in advance

Geert D.

GeDiSoft · Mar 3, 2004

Hi Cindy / Peter,

It seems that we finally found a workaround (I refuse to call it a
solution ;( ). This has worked fine for about a month now.

As the first field (M_0) in our data file, we export a sentence in
Dutch ("Dit is een nederlandse tekst. Het is de bedoeling dat Word dit
herkent als een stuk nederlandse tekst om te zorgen dat het
samenstellen nu wel goed gaat en er dus geen vreemde tekens in
voorkomen. Als dit werkt bewijst onze geliefde leverancier dat ze
niets ").
Freely translated: "This is a Dutch text. The intention is that Word
recognizes this as a piece of Dutch text to make sure that the merge
goes well and no strange characters occur. If this works, our
favourite software maker proves that they....."

Thanks a lot for the advice and help,
Geert & Gert

Peter Jamieson · Mar 4, 2004

Uitstekend! (Excellent!)
