Word Data Extraction

kmbarz

I have a series of Word documents that some computerized process generates.
While the data is arranged in a rectangular format for the most part, it's
not actually in a Word table (from which I've written extracts before). So
what I would like to do is extract from certain "cells" what data I need and
somehow get it out into Excel or Access. In general, it looks roughly like
this:

WED, JAN 25, 2006, 8:56 AM REPORT
PAGE NO: 3


PART: 0017042 MINQTY: 1.00 FLTIME: 10 INVLOC: MXBOB
NDAYS : 0 MULTPL: 40.00 VALUE : 80.38 INSPCODE: 0


PAST DUE 01/23/06 01/30/06 02/06/06 02/13/06 02/20/06
GROSS REQTS: 8 0 0 0 14 14 14 14
OPEN ORDER : 0 0 0 0 0 0 0 0
ORDER DUE : 0 0 0 0 0 0 0 0
ORDER START: 0 0 0 0 0 0 0 0
PROJ AVAIL : 382 382 382 382 368 354 340
RESCHED TO : 0 0 0 0 0 0 0 0

What I need is to pull the value after PART: and the "cell" value that comes
at the intersection of ORDER DUE : and any column where there is a date (and I
need to associate that date with the cell value as well). This data can also
exist on different pages within the same document, so I need to figure out
how to search the whole thing. My VBA is a bit dated at this point, so any
help you can provide would be greatly appreciated.
Thanks,
Ken
 
Jezebel

Before you get into using VBA for this (which is certainly do-able), try
opening the document in Excel and using the 'Text to columns' function (on
the Data menu). If the source documents are as regular as your sample
appears, this might do all that you need to put the required data into
specific cells.
 
kmbarz

Yeah, I thought about that, but there's a whole mess of other stuff in this
report that wouldn't do well. Plus, I have hundreds of these documents
coming in and I'd like to be able to pull it in and extract just the data I
need.
 
Chuck Henrich

Do you have access to the "computerised process" that generates the Word
documents? It looks like a database report - can you request that the report
be provided in Excel format or ask for a report that provides just the
information you need in a format you can work with?

If not, in order to extract data from the example you provided, you'd need
to examine how the data is formatted in the Word document to come up with the
best way to identify what data you want. Are the columns tabbed or spaced?
Is there only one tab or a set number of tabs or spaces per column or can it
vary from column to column and row to row?

By far the easiest way to do what you want is to get the data formatted the
way you want in a database-generated report, rather than to extract it from
the Word formatted report.
 
Ed

Ken:

I deal with something sort of related to this. My reports come with
"tables" like this that are actually generic text outputs saved as Word docs
or at least opened in Word. It's a monospaced font, there are spaces only -
no tabs or tables, and every line is a separate paragraph. I've used two
different processes, depending on what I'm dealing with and what I'm looking
for.

In one process, I select the lines of data, set a range to the Selection,
and then convert that range to a table. I am able to make the assumption
that anything more than one space in a row is a data separator, and use that
as the "marker" to determine the columns.
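
As a rough illustration of that separator rule, here is a minimal sketch in
Python rather than VBA, assuming the lines have been copied out of Word as
plain text. The function name split_columns and the sample line are made up
for the example; they are not taken from any real report.

```python
import re

def split_columns(line):
    """Split a line into fields wherever two or more spaces occur in a row."""
    # A single space (as in "ORDER DUE :") stays inside its field;
    # a run of two or more spaces is treated as a column separator.
    return [field for field in re.split(r" {2,}", line.strip()) if field]

# Hypothetical line, spaced the way a monospaced report might lay it out.
sample = "ORDER DUE :      0      0     40      0"
print(split_columns(sample))
```

The rule only holds while no label or value contains a double space of its
own, and an empty cell simply disappears from the result, so the column
count is worth checking against the header before trusting it.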

In another process, I can use Find to locate the text that marks the
beginning of the line of data I want. Then, because it's a monospaced font
and everything is consistent for each report, I can count the character
positions to where a bit of data might be, capture the length of text it
would be in, and check that string for data (use Trim to remove
leading/trailing spaces).
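
The position-counting idea could be sketched along these lines, again in
Python rather than VBA and on the assumption that the report text is
available as a list of plain-text lines with the original spacing intact.
The names extract_order_due, DATE and PART are hypothetical, the sample
layout is invented, and the code assumes each value sits directly under its
date, which would need checking against a real report.

```python
import re

DATE = re.compile(r"\d{2}/\d{2}/\d{2}")   # dates such as 01/23/06
PART = re.compile(r"PART:\s*(\S+)")       # part number after "PART:"

def extract_order_due(lines, marker="ORDER DUE"):
    """Return (part, date, value) tuples for every marker row found."""
    results = []
    part = None      # most recent part number seen
    header = None    # most recent line containing dates
    for line in lines:
        found = PART.search(line)
        if found:
            part = found.group(1)
        elif line.startswith(marker):
            if header is None:
                continue  # no date header seen yet, nothing to align to
            for m in DATE.finditer(header):
                # Assumption: the value occupies the same character
                # positions as the date above it.
                cell = line[m.start():m.end()].strip()
                results.append((part, m.group(), cell))
        elif DATE.search(line):
            header = line
    return results

# Hypothetical fixed-width layout: an 11-character label, then 10-wide columns.
fmt = "{:<11}{:>10}{:>10}{:>10}"
lines = [
    "PART: 0017042   MINQTY: 1.00",
    fmt.format("", "PAST DUE", "01/23/06", "01/30/06"),
    fmt.format("ORDER DUE :", "0", "0", "40"),
]
print(extract_order_due(lines))
```

Because the loop walks every line and remembers the latest PART and date
header, the same pass covers data repeated on later pages of the document.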

HTH
Ed
 
Kevin B

Have you ever considered using an application called Monarch to produce
Excel documents from text print files?
 
