Word XP converts characters

Gert

Hey,

I have a very strange problem when mail merging:
I have a *.dat file containing the data (which may vary); it is actually
a tab-delimited text file. (I cannot change the extension
because it is determined by another app.)

The first word in this *.dat file is "NEDERLANDS", just to make sure
that Word recognizes the language in this file as Dutch.

When merging a Word document with this file, some characters are shown
as Japanese/Chinese characters. It only happens when the data contains
é / à / è / ..., which are rather normal characters in Dutch.
The merge is part of a VBA macro.

In Word XP, I've turned off 'Confirm Conversion on opening', the
grammar check, the spell check and the option to detect the language automatically.

Does anyone have a solution to prevent this oddity from happening?

TIA,

Gert
 
Gert

Just giving more information on the topic :

as said, an app. makes a *.dat file containing data and the first word
is NEDERLANDS (to make sure Word XP recognizes it as Dutch).

When I open this *.dat file with Notepad, everything is OK.
When I try to open this *.dat file with Word XP, sometimes (!) Word XP
asks about a conversion. And when it does, the Japanese (Shift-JIS)
code is suggested.

The mail merge is executed by a macro, so it is not possible to
choose the right conversion manually. The macro contains a SendKeys
(ENTER) for this prompt.

So, what I'd like to know:
- is placing the word "Nederlands" in the file a good way to make sure Word
recognizes our data as being Dutch?
- is there a way (in the macro or in the .dat file) to make sure that
Word uses the right conversion?
 
Peter Jamieson

I think the only option is probably to do a separate conversion of the file
format before you use the file as a data source. However, this may introduce
other problems.

If you convert to another type of encoded text file, as far as I know, the
only text file formats that Word will /always/ recognise correctly are the
Unicode ones, particularly UTF-8, as long as they start with the Unicode
Byte Order Mark (BOM). Although the BOM is strictly speaking optional, both
Notepad (on Windows 2000 and later) and Word always insert it. However, the
other problem is that Word always seems to display the Encoding dialog box
when the file is Unicode, so the only thing you really gain by using this
approach is that you do not have to choose the right encoding (as it is
already selected), and the characters should be correct.
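
As an aside on the BOM mentioned above: a minimal VBA sketch, not taken from the original posts, that checks whether a file starts with the three UTF-8 BOM bytes (EF BB BF). The function name HasUtf8Bom and its argument are made up for illustration, and there is no error handling.

```vba
' Returns True if the file begins with the UTF-8 byte order mark EF BB BF.
' HasUtf8Bom is a hypothetical helper name, not part of Word's object model.
Function HasUtf8Bom(ByVal sPath As String) As Boolean
    Dim iFile As Integer
    Dim b(0 To 2) As Byte

    ' A file shorter than three bytes cannot contain the BOM
    If FileLen(sPath) < 3 Then Exit Function

    iFile = FreeFile
    Open sPath For Binary Access Read As #iFile
    Get #iFile, 1, b      ' read the first three bytes into the array
    Close #iFile

    HasUtf8Bom = (b(0) = &HEF And b(1) = &HBB And b(2) = &HBF)
End Function
```

Running it on the converted file before attaching it as a data source would at least confirm that the conversion step wrote the BOM.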

If you convert to a Word .doc file, you may encounter performance problems
and (possibly) restrictions on the column count. But otherwise, that is
probably the best option.

To do the conversion, you need a simple macro, but you might need to do more
to cope with different file names and so on.

E.g.


Sub ConvertToUTF8()
' convert to a UTF8 format text file

' Needs error checking etc.
Dim oDoc As Word.Document

' Change msoEncodingWestern to the encoding you need. I think this should work.

Set oDoc = Documents.Open( _
"the path name of the file you need to convert.txt", _
False, , False, , , , , , _
wdOpenFormatEncodedText, _
msoEncodingWestern, _
False, False, , True)

' Several of the parameters here are optional or
' irrelevant - you can probably remove the lines from
' ReadOnlyRecommended to SaveAsAOCLetter

oDoc.SaveAs _
FileName:="the path name of the file to convert to.txt", _
FileFormat:=wdFormatUnicodeText, _
AddToRecentFiles:=False, _
ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, _
SaveFormsData:=False, _
SaveAsAOCLetter:=False, _
Encoding:=msoEncodingUTF8, _
InsertLineBreaks:=False, _
AllowSubstitutions:=False, _
LineEnding:=wdCRLF

oDoc.Close Savechanges:=False
Set oDoc = Nothing
End Sub

or

Sub ConvertToWord()
' convert to a Word document file

' Needs error checking etc.
Dim oDoc As Word.Document

' Change msoEncodingWestern to the encoding you need. I think this should work.

Set oDoc = Documents.Open( _
"the path name of the file you need to convert.txt", _
False, , False, , , , , , _
wdOpenFormatEncodedText, _
msoEncodingWestern, _
False, False, , True)

' Several of the parameters here are optional or
' irrelevant - you can probably remove the lines from
' ReadOnlyRecommended to Encoding

oDoc.SaveAs _
FileName:="the path name of the file to convert to.doc", _
FileFormat:=wdFormatDocument, _
AddToRecentFiles:=False, _
ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, _
SaveFormsData:=False, _
SaveAsAOCLetter:=False, _
Encoding:=msoEncodingUTF8, _
InsertLineBreaks:=False, _
AllowSubstitutions:=False, _
LineEnding:=wdCRLF

oDoc.Close Savechanges:=False
Set oDoc = Nothing
End Sub
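
To show how a conversion like the ones above might be wired into the merge itself, here is a rough, untested sketch. The paths and the Sub name are invented for illustration, and it assumes the first row of the converted file holds the field names (with a separate header file you would presumably also need MailMerge.OpenHeaderSource). It also assumes the active document is the mail merge main document.

```vba
Sub MergeFromConvertedSource()
    ' Hypothetical paths, for illustration only
    Const SRC_PATH As String = "C:\Data\source.dat"
    Const DOC_PATH As String = "C:\Data\source_converted.doc"

    Dim oMain As Word.Document
    Dim oData As Word.Document

    ' Remember the main document before opening anything else
    Set oMain = ActiveDocument

    ' Open the text file with an explicit encoding and no encoding dialog
    Set oData = Documents.Open(FileName:=SRC_PATH, _
        ConfirmConversions:=False, _
        AddToRecentFiles:=False, _
        Format:=wdOpenFormatEncodedText, _
        Encoding:=msoEncodingWestern, _
        Visible:=False, _
        NoEncodingDialog:=True)

    ' Save a Word-format copy and close the text file
    oData.SaveAs FileName:=DOC_PATH, _
        FileFormat:=wdFormatDocument, _
        AddToRecentFiles:=False
    oData.Close SaveChanges:=False
    Set oData = Nothing

    ' Attach the converted copy as the data source and run the merge
    With oMain.MailMerge
        .OpenDataSource Name:=DOC_PATH, _
            ConfirmConversions:=False, _
            ReadOnly:=True, _
            AddToRecentFiles:=False
        .Destination = wdSendToNewDocument
        .Execute Pause:=False
    End With
End Sub
```

Because the encoding is stated when the text file is opened, Word should not need to guess it at merge time, which is the point of converting first.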
 
Cindy M -WordMVP-

Hi Gert,

I have a couple of thoughts following up on Peter's response:

1. If you rename this to a *.txt file does it work? (Note that I'm not
suggesting you do so permanently, but this will give us a better idea of
what process is connecting to the data)

2. How about using *.prn, instead?

3. If you activate Tools/Options/Confirm conversions on open you can
choose to use OLE DB, Word's internal file converter or ODBC to connect.
(ODBC may not be in the list, but activate "Show all" and you can see
it.) Test with these different methods. Does any one of them give you a
better result?

4. Possibly, you could set up a DSN for the ODBC driver to work
specifically with TAB and a different character set if none of the above
help very much. The bigger problem then becomes the distribution of the
DSN.

5. Do you NEED any Asian languages in your work? If not, one could try
deactivating (and uninstalling) any Asian language support for Office
and see if this helps.

Additional note: It's quite possible that the program creating a *.dat
file did so explicitly for *previous* versions of Office to force Word
to use its internal text converter to open the data source.
Unfortunately, OLE DB does recognize *.dat as something it can work
with, so this "trick" doesn't work with Word 2002 and later.

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Sep 30 2003)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question
or reply in the newsgroup and not by e-mail :)
 
Peter Jamieson

Hi Cindy,

FWIW my response came from a conversation early last year, and now that I
look further:
a. I see your response in that conversation...

"How about using ODBC as the connection method? I think that doesn't call up
a prompt..."

b. I remember thinking that that might be the way to go... but the original
questioner never told us what worked, if anything.
 
Peter Jamieson

Hello Gert,

There have been several replies already with suggestions and some conversion
macros.

Although I think using the word NEDERLANDS was worth trying, your experience
already says that it has not helped solve the problem. I do not know exactly
how Word determines the character set of a data file, but I suspect that
a. it uses a standard Win32 routine that takes a piece of text and tries to
determine the character set/locale
b. that routine is easily confused, particularly when there are a lot of
delimiter characters in the text.
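
To illustrate why such a routine could plausibly land on Shift-JIS (this is my reading of the encodings, not a description of what Word actually does): in the Western Windows code page, é, è and à are the single bytes E9, E8 and E0, and in Shift-JIS those same values are lead bytes of two-byte characters. If the next byte is an ordinary letter, the pair is a well-formed Shift-JIS sequence. A small VBA sketch of that test follows; the function name is made up, and the lead-byte range used is the wider one from the Windows variant of Shift-JIS.

```vba
' Returns True if the two bytes form a well-formed Shift-JIS double-byte pair.
' LooksLikeShiftJisPair is a hypothetical helper name for illustration only.
Function LooksLikeShiftJisPair(ByVal lead As Byte, ByVal trail As Byte) As Boolean
    Dim bLead As Boolean
    Dim bTrail As Boolean

    ' Lead bytes: 81-9F and E0-FC
    bLead = (lead >= &H81 And lead <= &H9F) Or (lead >= &HE0 And lead <= &HFC)

    ' Trail bytes: 40-7E and 80-FC
    bTrail = (trail >= &H40 And trail <= &H7E) Or (trail >= &H80 And trail <= &HFC)

    LooksLikeShiftJisPair = bLead And bTrail
End Function
```

For example, LooksLikeShiftJisPair(&HE9, Asc("d")) is True (é followed by "d"), while LooksLikeShiftJisPair(&HE9, 9) is False (é followed by a tab). That would be consistent with the garbling only showing up around accented characters.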

Just in case you cannot see the suggestions, here they are again:

From Cindy Meister:

-----------------------------------------------------------------
Hi Gert,

I have a couple of thoughts following up on Peter's response:

1. If you rename this to a *.txt file does it work? (Note that I'm not
suggesting you do so permanently, but this will give us a better idea of
what process is connecting to the data)

2. How about using *.prn, instead?

3. If you activate Tools/Options/Confirm conversions on open you can
choose to use OLE DB, Word's internal file converter or ODBC to connect.
(ODBC may not be in the list, but activate "Show all" and you can see
it.) Test with these different methods. Does any one of them give you a
better result?

4. Possibly, you could set up a DSN for the ODBC driver to work
specifically with TAB and a different character set if none of the above
help very much. The bigger problem then becomes the distribution of the
DSN.

5. Do you NEED any Asian languages in your work? If not, one could try
deactivating (and uninstalling) any Asian language support for Office
and see if this helps.

Additional note: It's quite possible that the program creating a *.dat
file did so explicitly for *previous* versions of Office to force Word
to use its internal text converter to open the data source.
Unfortunately, OLE DB does recognize *.dat as something it can work
with, so this "trick" doesn't work with Word 2002 and later.

-----------------------------------------------------------------

From me (but I think you should pursue Cindy's suggestions before trying
this, especially the ODBC ones). If you do, it is worth adding that ODBC
creates a SCHEMA.INI file in the folder containing the .DAT file, and that it
is possible to edit that file using e.g. Notepad to specify a character set of
ANSI or OEM. It is also possible to specify a Windows character set number,
but
a. that is an undocumented "feature"
b. any changes in the ODBC Administrator will overwrite a character set
number with either ANSI or OEM.
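
For reference, a SCHEMA.INI entry for a tab-delimited file might look roughly like the fragment below. The file name in the section header is an assumption (the section must match the actual data file name), and ColNameHeader=False assumes the field names are not in the first row of the data file.

```ini
[huw.dat]
Format=TabDelimited
ColNameHeader=False
CharacterSet=ANSI
```

The CharacterSet line is the one that tells the text driver how to interpret the accented characters.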
-----------------------------------------------------------------
I think the only option is probably to do a separate conversion of the file
format before you use the file as a data source. However, this may introduce
other problems.

If you convert to another type of encoded text file, as far as I know, the
only text file formats that Word will /always/ recognise correctly are the
Unicode ones, particularly UTF-8, as long as they start with the Unicode
Byte Order Mark (BOM). Although the BOM is strictly speaking optional, both
Notepad (on Windows 2000 and later) and Word always insert it. However, the
other problem is that Word always seems to display the Encoding dialog box
when the file is Unicode, so the only thing you really gain by using this
approach is that you do not have to choose the right encoding (as it is
already selected), and the characters should be correct.

If you convert to a Word .doc file, you may encounter performance problems
and (possibly) restrictions on the column count. But otherwise, that is
probably the best option.

To do the conversion, you need a simple macro, but you might need to do more
to cope with different file names and so on.

E.g.


Sub ConvertToUTF8()
' convert to a UTF8 format text file

' Needs error checking etc.
Dim oDoc As Word.Document

' Change msoEncodingWestern to the encoding you need. I think this should work.

Set oDoc = Documents.Open( _
"the path name of the file you need to convert.txt", _
False, , False, , , , , , _
wdOpenFormatEncodedText, _
msoEncodingWestern, _
False, False, , True)

' Several of the parameters here are optional or
' irrelevant - you can probably remove the lines from
' ReadOnlyRecommended to SaveAsAOCLetter

oDoc.SaveAs _
FileName:="the path name of the file to convert to.txt", _
FileFormat:=wdFormatUnicodeText, _
AddToRecentFiles:=False, _
ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, _
SaveFormsData:=False, _
SaveAsAOCLetter:=False, _
Encoding:=msoEncodingUTF8, _
InsertLineBreaks:=False, _
AllowSubstitutions:=False, _
LineEnding:=wdCRLF

oDoc.Close Savechanges:=False
Set oDoc = Nothing
End Sub

or

Sub ConvertToWord()
' convert to a Word document file

' Needs error checking etc.
Dim oDoc As Word.Document

' Change msoEncodingWestern to the encoding you need. I think this should work.

Set oDoc = Documents.Open( _
"the path name of the file you need to convert.txt", _
False, , False, , , , , , _
wdOpenFormatEncodedText, _
msoEncodingWestern, _
False, False, , True)

' Several of the parameters here are optional or
' irrelevant - you can probably remove the lines from
' ReadOnlyRecommended to Encoding

oDoc.SaveAs _
FileName:="the path name of the file to convert to.doc", _
FileFormat:=wdFormatDocument, _
AddToRecentFiles:=False, _
ReadOnlyRecommended:=False, _
EmbedTrueTypeFonts:=False, _
SaveNativePictureFormat:=False, _
SaveFormsData:=False, _
SaveAsAOCLetter:=False, _
Encoding:=msoEncodingUTF8, _
InsertLineBreaks:=False, _
AllowSubstitutions:=False, _
LineEnding:=wdCRLF

oDoc.Close Savechanges:=False
Set oDoc = Nothing
End Sub
 
Cindy M -WordMVP-

Hi Peter,

"FWIW my response came from a conversation early last year"

Ah, I do remember that now; I hadn't made the connection
before. Thanks for the reminder!

Cindy Meister
 
Gert

Peter & Cindy,

thanks for your replies.

I'm going to give your suggestions a shot and I'll let you know what
comes out !!

Thanks again,

Gert
 
Gert

Dear Peter & Cindy,

I've been trying your suggestions, but switching the extension to *.txt
or *.prn did not work.

But there is something else that seems to be working:

Our data source contains a variable amount of data. This data depends
on the number of fields used in the document before merging. We do this
because Word XP really slows things down and people just don't like
waiting...

What did we change? As mentioned, we put the word NEDERLANDS in our
source file. Now we have made an addition: we also put the client's
name, address and 3 other parameters in the source file as standard.
Since then, the Japanese/Chinese characters have not occurred.

I'm crossing my fingers because it still needs some testing...

If something (or everything) is not clear to you, let me know and I'll
try to rephrase it.

Greetings and thanks,

Gert
 
Cindy M -WordMVP-

Hi Gert,
I believe I understand what you're saying, and if it's working, this is
great news. I'm seeing a lot of similar problem reports in the German
groups... What's not quite clear to me is how the text file looks when
you've added this information. Could you just copy and paste the relevant
lines (of a data source with only a few fields) - up through the field
names - into a reply?

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Sep 30 2003)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or
reply in the newsgroup and not by e-mail :)
 
GeDiSoft

Hello Cindy,

The files look like this:
(datasource: huw.dat)
Nederlands Grote Markt 34 8900 Ieper West
Vlaanderen Ieper 116 Devoldere Steven André Albien

(headersource: huwveld.dat)
M_0 M_1 M_2 M_3 M_4 M_5 M_52 M_53 M_55


Unfortunately, the problem isn't solved :( . It's better than it was
before, but sometimes the Chinese characters are still there..... so
we'll (have to) look further for a solution.

greetz
Geert D
 
GeDiSoft

Hey Peter,

If we get our hands on the fix, we'll surely have a look at it.

Thnx in advance

Geert D.
 
GeDiSoft

Hi Cindy / Peter,

It seems that we finally found a workaround (I refuse to call it a
solution ;( ). This has worked fine for about a month now....

As the first field (M_0) in our datafield file, we export a sentence in
Dutch ("Dit is een nederlandse tekst. Het is de bedoeling dat Word dit
herkent als een stuk nederlandse tekst om te zorgen dat het
samenstellen nu wel goed gaat en er dus geen vreemde tekens in
voorkomen. Als dit werkt bewijst onze geliefde leverancier dat ze
niets ").
Freely translated: "This is a Dutch text. The intention is that Word
recognizes this as a piece of Dutch text to make sure that the merge
goes well and no strange characters occur. If this works, our
favourite software maker proves that they.....".

Thanks a lot for the advice and help,
Geert & Gert
 
