automating the insertion of HTML tables (or tab delimited files)

Rob

Hi,
I am completely new to Word programming, so I am not even sure if this
is the way to solve my problem, but here goes:

I have approximately 400-500 HTML tables in various HTML files (the
data making up these tables also exists in tab delimited ASCII files).
My goal is to insert the tables into a Word document.


The problems I am facing are:
1.) How do I "loop" over each HTML file & "grab" the table & insert
it into the appropriate place within the Word document? I can do this
manually by going to "Insert" then "file" then navigating to the
proper file & inserting it into Word, then manually moving my cursor
to the next place within Word and doing it again. However, because
there are, as I said, hundreds of tables, this obviously will not
work. I recorded a simple macro as a "proof of concept" to ensure I
could do what I wanted, but it needs to be a lot more robust:


Sub TableFillAndRefill(pTable1 As Table, pTable2 As Table)
    ' Copies the text of pTable1 into pTable2, cell by cell, left to right.
    Dim pCell1 As Word.Cell
    Dim pCell2 As Word.Cell
    Set pCell1 = pTable1.Cell(1, 1)
    Set pCell2 = pTable2.Cell(1, 1)
    Do
        ' Drop the two-character end-of-cell marker before copying the text.
        pCell2.Range.Text = Left$(pCell1.Range.Text, Len(pCell1.Range.Text) - 2)
        Set pCell1 = pCell1.Next
        Set pCell2 = pCell2.Next
    ' Stop as soon as either table runs out of cells.
    Loop Until pCell1 Is Nothing Or pCell2 Is Nothing
End Sub


Sub Insert_Table()
    '
    ' Insert_Table Macro
    ' Macro recorded 12/1/2009 by Rob
    '
    ChangeFileOpenDirectory _
        "D:\data\Rob\Desktop\macro_examples\Table_Macro\"
    Selection.InsertFile FileName:="Table_Example.htm", Range:="", _
        ConfirmConversions:=False, Link:=False, Attachment:=False
    Selection.MoveDown Unit:=wdLine, Count:=6
    Selection.InsertFile FileName:="Table_Example2.htm", Range:="", _
        ConfirmConversions:=False, Link:=False, Attachment:=False
End Sub


2.) What I need help with is modifying the macro to place the tables
into specific places within Word. What is the best way to "prepare"
Word to accept table data in specific places? Do I use bookmarks, or
what?


The person providing the tables said that he could make "one huge"
HTML file and I could just import that into Word, but then the problem
would be how to correctly set up the captions within Word to say
"Table XX-X blah" and "Table XX-Y blah" such that the Table headings will
show up properly in a Table of Contents.


Another option is to insert the tab delimited files and have Word
convert them to Word tables. Whichever is easier, it doesn't matter
to me.
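
For the tab-delimited route, the following is only a minimal VBA sketch of one way it could be done, not a tested solution for these particular files. The file name C:\myinserts\t001.txt is a hypothetical placeholder, and the sketch assumes a plain ASCII file with one table row per line. It reads the file, places the text at the cursor, and lets Word convert it to a table.

```vba
Sub InsertTabDelimitedAsTable()
    ' Hypothetical file name; substitute one of the tab-delimited files.
    Const DATA_FILE As String = "C:\myinserts\t001.txt"
    Dim fileNum As Integer
    Dim content As String
    Dim rng As Range
    Dim tbl As Table

    ' Read the whole ASCII file into a string.
    fileNum = FreeFile
    Open DATA_FILE For Input As #fileNum
    content = Input$(LOF(fileNum), fileNum)
    Close #fileNum

    ' Normalise line endings to Word paragraph marks and drop trailing
    ' blank lines so the table does not end with empty rows.
    content = Replace(content, vbCrLf, vbCr)
    content = Replace(content, vbLf, vbCr)
    Do While Right$(content, 1) = vbCr
        content = Left$(content, Len(content) - 1)
    Loop
    If Len(content) = 0 Then Exit Sub

    ' Put the text at the cursor, then convert it: each line becomes
    ' a row and each tab starts a new column.
    Set rng = Selection.Range
    rng.Collapse Direction:=wdCollapseStart
    rng.Text = content
    Set tbl = rng.ConvertToTable(Separator:=wdSeparateByTabs)
End Sub
```

Because the heading row arrives as the first line of the file, it ends up as the first row of the table; any heading formatting would still need to be applied afterwards.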


Thanks in advance,
 
Doug Robbins - Word MVP

It might be easier to work from the ASCII files.

How is the data arranged in those files and how do you want it arranged in
the Word document?

--
Hope this helps,

Doug Robbins - Word MVP

Please reply only to the newsgroups unless you wish to obtain my services on
a paid professional basis.
 
Rob

Doug,

The data is tab delimited in the ASCII files.

The data has the names (headings) of the columns, and the data as
well.

However, I can request from the person supplying the files to me to
arrange the data in any way I want, as this is a work in progress. He
is parsing data from some tests we are running, and putting the data
into tables in both HTML and tab delimited ASCII files. So he has
complete control over the formats of both, and if there is anything I
would need in order to import these tables into Word, I can request it
and he can accommodate it.

As my simple macro shows, I can import HTML tables into Word, but I
have no control over *where* the files are placed. In my sample code,
I have:

Selection.MoveDown Unit:=wdLine, Count:=6

which is the recorded actions of me manually moving the cursor down 6
lines. This amount (6) may vary. I was thinking I could perhaps
"key" off of the word "Table" or something like that, and then insert
the tables directly above the word "Table."

Another issue is that there are hundreds of tables, so how do I "loop"
over the files? Right now each file contains one table, but my
co-worker said he could put all the tables into 1 huge HTML file if
that would be easier for me.

So, the main issues for me are:
1.) How do I "loop" over many files
2.) How do I place the data/tables into *specific* places within Word
3.) If there is additional text within the HTML and the tab delimited
ASCII files, how do I "key" off of a certain keyword or phrase such
that I *only* copy the data I'm interested in?
4.) Do I go ahead & set up "Table" captions within Word, and place
the tables above the captions?
5.) Is VBA the correct "path" or do I use C#, ASP, OLE, bookmarks,
references/hot links, etc?

Anyway, thanks for your response.

Rob

P.S. Sorry for also posting this to the microsoft.word.programming
forum - I mistakenly assumed they were completely different forums, as
I found this forum in Google Groups, and found the Microsoft forum
from Microsoft's website.
 
Peter Jamieson

First, there are two fairly simple non-VBA methods to insert material
from external files into Word:
{ INCLUDETEXT } fields
{ DATABASE } fields (more suited to inserting material from
tab-delimited files)

Since you are probably unfamiliar with fields, you may want to read
around that subject a little. Word 2007 (is that the version you are
using?) is poor on that front, but there is plenty of material around.

For example, to insert the complete text of an HTML file called
c:\myinserts\t001.htm

you might do the following:
a. use ctrl-F9 to insert a pair of special "field braces", which look
like this { } but are not the ordinary keyboard characters
b. type INCLUDETEXT between the braces:
{ INCLUDETEXT }
c. type the pathname of the file you want to include, surrounded by
double quotes, with the backslashes doubled up:

{ INCLUDETEXT "c:\\myinserts\\t001.htm" }
d. Select the field code and press F9 to update it.
e. use Alt-F9 to switch between "field codes" and "field results"

If that basic technique works for you, you may be able to forget about
the VBA and just use the INCLUDETEXT fields. However, you would need to
verify a number of things, e.g.
a. does it work with 400-500 of these fields?
b. how long does it take to update them all? (Use ctrl-A to select
the document body, then F9 to update all the fields.)
c. When you want to finalise your document, make a backup copy of your
document, then replace the field codes by their results by selecting all
the codes (e.g. select the entire document body using ctrl-A), then press
ctrl-shift-F9 to "unlink" the field codes.
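
If the keyboard route becomes tedious while testing points (b) and (c), the same two steps can be sketched in VBA. This is an illustrative sketch rather than part of the field-only approach. It only touches fields in the main document body, not headers, footers, or text boxes, and the timing is printed to the Immediate window of the VBA editor.

```vba
Sub UpdateAllFieldsTimed()
    Dim startTime As Single
    Dim firstBad As Long

    startTime = Timer
    ' Fields.Update returns 0 on success, otherwise the index of the
    ' first field that reported an error.
    firstBad = ActiveDocument.Fields.Update

    Debug.Print "Fields: " & ActiveDocument.Fields.Count & _
                ", seconds: " & Format(Timer - startTime, "0.0")
    If firstBad <> 0 Then
        Debug.Print "First field with an error: " & firstBad
    End If
End Sub

Sub UnlinkAllFields()
    ' Run this on a backup copy only: it replaces every field in the
    ' document body with its current result, like ctrl-shift-F9.
    ActiveDocument.Fields.Unlink
End Sub
```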

Whatever approach you use, what you will need to determine is "how do I
know which table to insert where?" There is either
a. some natural sequence in the names of the HTML or delimited files -
for example they are named t001, t002 etc., and when you loop through
you insert the next table at the next "placeholder" - however they are
specified, or
b. something in the placeholder tells you which table to insert at
that point, e.g. your table files are named t001, t002, but you pick up
the file name from a caption line that you inserted saying (say) Table T001.

For the purposes of automation, (a) is simpler but it means you have to
have been able to plan the file names and sequence from the very start.
(b) is more flexible but requires that when you insert your
placeholders, you insert each one manually with a different table
identifier, which is also quite a slog.

There is also option (c), which would mean "there's a natural sequence
with a manageable number of exceptions that you would need to identify
in your document"
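
If you do end up using VBA rather than fields, option (b) could be sketched roughly as follows. This is a hedged illustration, not a finished macro. It assumes that each placeholder is a Word bookmark named after its file (for example a bookmark called t001 for t001.htm) and that the files live in the example folder C:\myinserts\. Neither the bookmark names nor the macro name come from the original posts.

```vba
Sub InsertTablesAtBookmarks()
    ' Assumed folder, matching the example path used in this thread.
    Const SOURCE_FOLDER As String = "C:\myinserts\"
    Dim bookmarkNames() As String
    Dim i As Long
    Dim filePath As String

    If ActiveDocument.Bookmarks.Count = 0 Then Exit Sub

    ' Take a snapshot of the names first, because inserting HTML can
    ' add or remove bookmarks while the loop is running.
    ReDim bookmarkNames(1 To ActiveDocument.Bookmarks.Count)
    For i = 1 To ActiveDocument.Bookmarks.Count
        bookmarkNames(i) = ActiveDocument.Bookmarks(i).Name
    Next i

    For i = 1 To UBound(bookmarkNames)
        filePath = SOURCE_FOLDER & bookmarkNames(i) & ".htm"
        If Dir(filePath) = "" Then
            ' No matching file: report it and carry on.
            Debug.Print "No file for bookmark: " & bookmarkNames(i)
        ElseIf ActiveDocument.Bookmarks.Exists(bookmarkNames(i)) Then
            ' Insert the file at the bookmark's position.
            ActiveDocument.Bookmarks(bookmarkNames(i)).Range.InsertFile _
                FileName:=filePath, ConfirmConversions:=False, Link:=False
        End If
    Next i
End Sub
```

The loop is driven by the placeholders rather than by the files, so a table with no bookmark is simply never inserted, and a bookmark with no file is listed in the Immediate window.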

With the INCLUDETEXT approach and option (a) or (c), what you would
probably do is nest a sequence number field inside your INCLUDETEXT
field, e.g. at its simplest you might have the following field at the
beginning of your file:

{ SEQ tn \r0 }

then use the following fields:

{ INCLUDETEXT "c:\\myinserts\\t{ SEQ tn \*Arabic }.htm" }

(You probably don't need the \*Arabic, which just says "use the natural
numbers 1, 2, 3...", and as long as your sequence starts at 1 you don't
really need that extra SEQ field at the beginning either.)

Since this is always the same set of fields, you can select the complete
field construction and create an autotext/building block from it. Then
for option (a) you would need to insert that block at each point where
you need a table. For option (c) I would probably do that, then fix any
out-of-sequence placeholders by
a. moving the { SEQ } field outside the INCLUDETEXT field and adding a
\h switch to hide its result. This would mean the sequence was
maintained for subsequent insertions
b. putting the correct name inside the INCLUDETEXT field, e.g.

{ INCLUDETEXT "c:\\myinserts\\t235.htm" }{ SEQ tn \*Arabic \h }
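
Typing several hundred field codes by hand is the slow part, so here is a small, hedged VBA sketch that inserts one INCLUDETEXT field with an explicit file name at the cursor. The folder and the table number 235 are just the example values used above, and the macro name is made up for illustration. It does not build the nested { SEQ } construction, only the plain form with the name written out.

```vba
Sub InsertIncludeTextField()
    ' Example folder from this thread; adjust to suit.
    Const SOURCE_FOLDER As String = "c:\myinserts\"
    Dim tableNumber As Long
    Dim fieldText As String

    tableNumber = 235   ' example value only

    ' Field codes need the backslashes doubled and the path in quotes.
    fieldText = """" & _
        Replace(SOURCE_FOLDER & "t" & tableNumber & ".htm", "\", "\\") & _
        """"

    ' PreserveFormatting:=False keeps Word from appending a
    ' \* MERGEFORMAT switch to the new field.
    ActiveDocument.Fields.Add Range:=Selection.Range, _
        Type:=wdFieldIncludeText, Text:=fieldText, _
        PreserveFormatting:=False
End Sub
```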

With this approach, so far we have:

> So, the main issues for me are:
> 1.) How do I "loop" over many files

You do it when you insert your placeholders, then use the ctrl-A, F9
approach to update the field results.

> 2.) How do I place the data/tables into *specific* places within Word

You do it using the suggested approach.

> 3.) If there is additional text within the HTML and the tab delimited
> ASCII files, how do I "key" off of a certain keyword or phrase such
> that I *only* copy the data I'm interested in?

With HTML files, you should be able to specify HTML bookmarks to
"cover" specific parts of the content, e.g. using

<a name="mybookmarkname">whatever it is that you need to mark</a>

You then specify the bookmark as an additional parameter in the
INCLUDETEXT, e.g.

{ INCLUDETEXT "c:\\myinserts\\t{ SEQ tn \*Arabic }.htm" mybookmarkname }

(Make sure that these bookmark names do not have spaces and are
otherwise easy to use. Also, you may need to verify that Word does not
insert an extra paragraph mark when it inserts a bookmarked piece of
text in this way - off the top of my head I forget when it comes to
material imported from HTML)

NB, you may be wondering whether this would allow you to store all your
stuff in one HTML file. Well, perhaps it would, and perhaps you would
then use sequenced-numbered bookmarks and a slightly different field
construction, e.g.

{ INCLUDETEXT "c:\\myinserts\\myfile.htm" { SEQ tn \*Arabic } }

Again, I would verify that this actually works when you have several
hundred INCLUDETEXTs getting text from one file, or a small number of files.

> 4.) Do I go ahead & set up "Table" captions within Word, and place
> the tables above the captions?

I probably would. You can use the SEQ approach to number them, or
perhaps something else that is more in keeping with Word's standard
numbering features (not really my area).

> 5.) Is VBA the correct "path" or do I use C#, ASP, OLE, bookmarks,
> references/hot links, etc?

Use fields! (If they work for you.) But even if you don't/can't, VBA and
other things such as bookmarks and fields are not mutually exclusive.

Peter Jamieson

http://tips.pjmsn.me.uk
 
Rob

Peter,

I think this might be exactly what I'm looking for! I made a "proof
of concept" file to ensure it would work. Now I'm just trying to
grasp the scope/size of the actual project, and determine if this
method (using field codes) will actually work in my specific
application.

But, while I'm off doing that, I wanted to respond to let you and Doug
know that I appreciate you taking the time to reply to my question.

I'll write back when I have made further progress to give an update on
the success.

Thanks again,
Rob


 
Rob

Peter,

Hi, I have been working with the INCLUDETEXT fields, and now I see
another issue. I can correctly insert the tables, but for some
reason, they are Left Justified when they are in Word. I have even
looked at the HTML code making up these tables and my co-worker has
used the <center> tag to center them, and as such, they do indeed show/
appear as centered when viewed in a browser.

But for some reason, when imported/inserted into Word, they are Left
Justified. I even went through and centered the paragraph marks where
I placed the field codes, but when I do an F9 to update the field
codes, it *still* inserts the tables as Left Justified.

Why is this? I have been looking at some possible solutions, one
being to make a new style called "centered_table" and just apply that
style to all my tables, but I would think I could set yet another
field code to force the center?

Thanks,
Rob
 
Peter Jamieson

If I create a table in Word and centre it, then save as HTML (filtered)
and look at the generated HTML, in this case I see that Word does it by
inserting a <div align=center> tag.

I don't know if that will work for you. Word's own markup is pretty
idiosyncratic to say the least, and I don't know what "native" HTML it
actually recognises, or whether you can also control the appearance of
imported text using CSS.

> but I would think I could set yet another
> field code to force the center?

Probably not. Field codes are really about inserting stuff. Although
some codes have some formatting capabilities, the expectation is mostly
that formatting will be applied to the field code result. The database
field has some standard switches for table formatting but that's about
it. There's one code that lets you specify that the following text
should start at a certain point, either at an absolute x-y position or
relative to the location of the field (it's the ADVANCE field) but I
don't think you would be able to get it to center anything and in
general it's not a field I'd recommend anyone used unless they
absolutely had to.
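
If no field switch will do it, one fallback is to centre the tables with a few lines of VBA after the fields have been updated. This is a minimal sketch of that idea, not something tested against these HTML files. It centres every table in the document, so it assumes there are no tables you want left alone, and it would need re-running after each field update unless the fields have been unlinked.

```vba
Sub CenterAllTables()
    Dim tbl As Table

    ' Centre each table between the page margins. This sets the row
    ' alignment of the table itself, not the text inside the cells.
    For Each tbl In ActiveDocument.Tables
        tbl.Rows.Alignment = wdAlignRowCenter
    Next tbl
End Sub
```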

Peter Jamieson

http://tips.pjmsn.me.uk
 
Rob

Peter,
I think I may have found the answer. In my quest to figure out why
Word is overriding the obvious intent of the HTML to center the table
via a <center> tag, I ran across the "preserve formatting during
updates" checkbox. You can see this if you right-click the field &
select the "edit field" option (I am using Word 2003, so I don't know
if the sequence and/or menu titles are the same in other versions).
Anyway, if you select the "edit field" option, it will bring up a
"Field" GUI window w/ options to "choose a field" and "field
properties" and "Field options." At the bottom right, next to the
"OK" and "Cancel" buttons is the checkbox for the "Preserve formatting
during updates" selector. I have found that if you leave this
"off" (for some reason Word turns this "On" by default) that it allows
the HTML formatting to take precedence, otherwise it will allow Word's
formatting to take precedence.

From what I gather, if you select it to be "on" then any formatting
changes you make (say, centering it yourself, or making the font bold,
or different font type, etc) - those changes would carry over to the
next time you update the field, regardless of what the HTML says it
should be. So, if you have a table in HTML that has the <center> tag
used to center it in a browser, but you want it to always show up on
the Right side of your Word document (Right Justified), then just go
ahead & Right Justify it within Word, then make sure you have the
"Preserve formatting during updates" checkbox enabled. Then any time
the data in the HTML table changes & you want to update it, it will
always be Right Justified.

This is a cool option if you want to override the source file's
formatting, but NOT a cool option if you want to use the source file's
formatting.

Also, as an aside, I believe I may have found a bug in Word. As I
was playing around w/ checking the "Preserve formatting during
updates" selection on & off, I noticed that sometimes when I updated
my field, Word would present the "Error! Not a valid filename"
message. Obviously this is incorrect because I didn't change the
filename or anything like that, all I would do is toggle on & off the
"Preserve formatting during updates" selector. I haven't played
around w/ it enough to know when/why it happens, or under what
conditions it may or may not happen, I've just noticed it & thought
I'd pass the info along to warn others this may happen to them.
 
Rob

P.S. When the "preserve formatting during updates" is enabled, it
adds the "Mergeformat" command to the field. I noticed this when I
went to toggle my field code on & off. Just an FYI. I'll try to
research the "Mergeformat" command more & present my results later.

Thanks,
Rob
 
Peter Jamieson

> P.S. When the "preserve formatting during updates" is enabled, it
> adds the "Mergeformat" command to the field.

What Mergeformat actually does is to take the existing result of a field
and apply the formatting of that result, word by word (however the field
determines what a "word" is), to the next result of the field. There may
be a limit on the number of words that the formatting is applied to; if
the number of words is more than the previous number of words inserted
by the field, I forget exactly what happens; and not all types of
formatting are preserved.

Peter Jamieson

http://tips.pjmsn.me.uk
 
Peter Jamieson

I've described the effect of the \*Mergeformat switch in another
response, but there is another problem to do with merging text from two
different documents that have Word styles with different characteristics
but the same name. While that scenario isn't strictly applicable if you
are merging plain ("non-Word") HTML, it is quite possible that the way
MS has implemented it has an unexpected impact on merging from non-Word
formats. In Word 2007 the relevant options are in Office
Button->Word Options->Advanced->Cut, copy, and paste. Whether they affect
INCLUDETEXT these days I cannot tell you off the top of my head, but I
would assume you may need to look at these options anyway if you are
including text any other way, e.g. via VBA.



Peter Jamieson

http://tips.pjmsn.me.uk
 
