HELP: How do I get Formatted Word Text into ASCII?


regex_jedi

OK, I have searched the universe online and have not found anyone with
a solution that makes sense to me.

Here is what I am wanting to do:

I have a Word document. It contains text, and it's formatted so that
the POSITION/LOCATION of the information on each line matters. Like
most Word docs, the text uses various fonts, with variable spacing,
font sizes, etc.

I would like to estimate, as accurately as possible, the position of
each word on any given line and convert the same information to
monospaced text for export.

Simply saving as a .txt file loses the positioning of each word
relative to the lines above and below it, because ASCII text is
monospaced, while Word documents rely on fonts, kerning, etc. for much
of their positioning (not just the raw character position).

So... What I would like to do is write a routine that can calculate
(even if only as a reasonable estimate) the pixel position (or DPI
position, I guess, in the print world) of the start of each word on
each line of a Word document.

Is this possible? Is there something that could come close to doing
this? A function/routine, etc. Something I can access and execute
through COM/VB or such.

I am guessing Word programming gurus have had to figure out how to
locate a position in a Word document for other reasons. Hoping this is
possible.

Regex Jedi...
 

LVTravel

Highlight all the text in Word and then change the font to Courier New,
which is a fixed-width font. Reformat the text as needed and then use
Save As with the .txt (Plain Text) file type. You should then retain the
proper character spacing (including any tab positioning, which is
replaced with spaces in the .txt format) in the final output. If you use
any variable-width font (Times, Arial, etc.), you will always have
issues with the spacing between words or tables when you convert to a
.txt file.
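These manual steps can also be run over a whole folder through COM. The sketch below assumes Python with the pywin32 package (the same object-model calls, Documents.Open, Content.Font.Name, SaveAs and Close, are available from VBA): it opens each .doc/.docx, sets all text to Courier New and saves a plain-text copy using Word's wdFormatText value (2). The helper names are hypothetical. Changing the font reflows the document, so the output follows the new Courier layout rather than the original rendering.

```python
from pathlib import Path

WD_FORMAT_TEXT = 2          # WdSaveFormat.wdFormatText (plain text)
WD_DO_NOT_SAVE_CHANGES = 0  # WdSaveOptions.wdDoNotSaveChanges
WD_ALERTS_NONE = 0          # WdAlertLevel.wdAlertsNone


def is_word_file(path: Path) -> bool:
    """True for .doc/.docx files, skipping Word's "~$" owner/lock files."""
    return path.suffix.lower() in (".doc", ".docx") and not path.name.startswith("~$")


def txt_path_for(doc_path: Path) -> Path:
    """Plain-text output next to the source: report.docx -> report.txt."""
    return doc_path.with_suffix(".txt")


def convert_folder(folder: str, font_name: str = "Courier New") -> None:
    """Set every document in folder to a fixed-width font and save it as .txt.

    Needs Windows, Microsoft Word and pywin32 (imported here, lazily, so the
    helpers above can be used without them).
    """
    import win32com.client

    # DispatchEx starts a separate Word instance, so Quit() below does not
    # close a Word session the user already has open.
    word = win32com.client.DispatchEx("Word.Application")
    word.Visible = False
    word.DisplayAlerts = WD_ALERTS_NONE  # suppress conversion prompts
    try:
        for doc_path in sorted(p for p in Path(folder).iterdir() if is_word_file(p)):
            doc = word.Documents.Open(str(doc_path.resolve()))
            try:
                # The "highlight all, change to Courier New" step.
                doc.Content.Font.Name = font_name
                doc.SaveAs(str(txt_path_for(doc_path).resolve()),
                           FileFormat=WD_FORMAT_TEXT)
            finally:
                doc.Close(WD_DO_NOT_SAVE_CHANGES)
    finally:
        word.Quit()
```

The original .doc/.docx files are left unchanged; only the .txt copies reflect the font change.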

Hope this helps, let us know.
 

regex_jedi

The reason I wanted to know if there was a programming solution is
that I need to convert literally hundreds of documents in an automated
way. Calculating the text placement is what would let the process be
automated; saving by hand, file by file, doesn't work. Even if it did
work, the way the files currently render in Word is the true, correct
format. What I am trying to do is estimate that formatting, even if not
exactly, in ASCII.

Any programmatic tips on estimating the text location (in pixels), so
I can approximate it for a process like this?
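Whatever supplies the positions, the arithmetic for turning them into pixels or monospaced columns is simple. Word reports positions in points (1/72 inch), so a pixel position is points × DPI / 72, and a monospaced column is the offset divided by the width of one fixed-width character. A minimal Python sketch; the function names are hypothetical, 96 DPI is just a common screen default, and the 6-pt character width assumes 10-pt Courier New as the target (its glyphs are about 0.6 em wide, i.e. 12 characters per inch).

```python
POINTS_PER_INCH = 72.0  # Word reports positions in points (1/72 inch)


def points_to_pixels(points: float, dpi: float = 96.0) -> float:
    """Convert a distance in points to pixels at the given dots per inch."""
    return points * dpi / POINTS_PER_INCH


def points_to_column(points: float, char_width_pt: float = 6.0) -> int:
    """Nearest 0-based column in monospaced text for an offset in points.

    char_width_pt is the width of one fixed-width character: about 6 pt
    for 10-pt Courier New, i.e. 12 characters per inch.
    """
    return max(0, round(points / char_width_pt))


# A word that starts 1.5 inches (108 pt) from the left edge:
print(points_to_pixels(108))  # 144.0 at 96 DPI
print(points_to_column(108))  # 18
```

If the positions are measured from the left edge of the page, subtract the left margin first so that column 0 lines up with the margin.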
 

MrBudgens

The following will give you a location in points:

Selection.Information(wdHorizontalPositionRelativeToPage)

You would loop through each element in your text, retrieve this
information, and do whatever you need to do with it. The vertical
position comes from wdVerticalPositionRelativeToPage in the same way,
and you can use it to detect when you are on a new line if you want to
do that.
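Following that suggestion, here is a sketch in Python via pywin32 COM (an assumption; the same calls exist in VBA). The hypothetical `word_positions` reads each word's text plus its horizontal and vertical position in points through Range.Information, using the WdInformation values 5 (wdHorizontalPositionRelativeToPage) and 6 (wdVerticalPositionRelativeToPage). The hypothetical `layout_monospaced` then starts a new output line whenever the vertical position jumps by more than a small tolerance and puts each word at the column nearest its horizontal position. The 6-pt character width (10-pt Courier New) and the 2-pt line tolerance are assumed values to tune. Word's documentation says these positions are -1 when the range is outside the screen area, so the sketch keeps Word visible in Print Layout view.

```python
from typing import Iterable, List, Optional, Tuple

# WdInformation values passed to Range.Information (results are in points).
WD_HORIZONTAL_POSITION_RELATIVE_TO_PAGE = 5
WD_VERTICAL_POSITION_RELATIVE_TO_PAGE = 6
WD_PRINT_VIEW = 3  # WdViewType.wdPrintView

WordPos = Tuple[str, float, float]  # (word text, x in points, y in points)


def word_positions(path: str) -> List[WordPos]:
    """Read every word of a document with its page-relative position via COM.

    Needs Windows, Microsoft Word and pywin32; imported lazily so that
    layout_monospaced below can be used without them.
    """
    import win32com.client

    word = win32com.client.DispatchEx("Word.Application")
    word.Visible = True  # off-screen ranges may report a position of -1
    try:
        doc = word.Documents.Open(path)
        try:
            doc.ActiveWindow.View.Type = WD_PRINT_VIEW
            result: List[WordPos] = []
            for w in doc.Words:  # each item is a Range
                x = float(w.Information(WD_HORIZONTAL_POSITION_RELATIVE_TO_PAGE))
                y = float(w.Information(WD_VERTICAL_POSITION_RELATIVE_TO_PAGE))
                result.append((w.Text, x, y))
            return result
        finally:
            doc.Close(0)  # wdDoNotSaveChanges
    finally:
        word.Quit()


def layout_monospaced(words: Iterable[WordPos], char_width_pt: float = 6.0,
                      line_tolerance_pt: float = 2.0) -> List[str]:
    """Rebuild monospaced lines from (text, x, y) word positions in points."""
    lines: List[str] = []
    current = ""
    current_y: Optional[float] = None
    for text, x, y in words:
        # Drop paragraph marks, cell markers, line/page breaks and tabs;
        # keep trailing spaces, which Word includes in each word's text.
        clean = text.strip("\r\n\x07\x0b\x0c\t")
        if not clean.strip():
            continue
        # A jump in vertical position means a new line (or page) started.
        if current_y is None or abs(y - current_y) > line_tolerance_pt:
            if current_y is not None:
                lines.append(current.rstrip())
            current, current_y = "", y
        # Nearest monospaced column, but never overlap text already placed.
        col = max(round(x / char_width_pt), len(current))
        current = current.ljust(col) + clean
    if current_y is not None:
        lines.append(current.rstrip())
    return lines
```

Usage would be along the lines of `print("\n".join(layout_monospaced(word_positions(r"C:\docs\sample.docx"))))`. Walking `doc.Words` through COM is slow for long documents, and blank lines, text boxes and page margins are not handled here; those are left out to keep the sketch short.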
 
