Get text strings from closed docs?

Ed

I have a folder with over 14,000 Word docs. I would like to extract a few
text strings from each doc to build a table in a "master list" doc (for
instance: for each doc, find "Title" and get the following 24 characters;
find "Date" and get the following 8 characters, etc). Can this be done
without opening and closing each doc?

Ed
 
Jonathan West

Ed said:
I have a folder with over 14,000 Word docs. I would like to extract a few
text strings from each doc to build a table in a "master list" doc (for
instance: for each doc, find "Title" and get the following 24 characters;
find "Date" and get the following 8 characters, etc). Can this be done
without opening and closing each doc?

Unfortunately not.
 
Chris Shelley

This may be possible using the file system. When you right-click one of
your docs, select Properties, and go to the Summary tab, are the correct
titles listed there? Then click the Advanced button and check that the dates
shown are the ones you are interested in. If that is the information you
want, it is exposed by the file system and you can read it without opening
the file. If you need more information than this, I'm not sure what to do.

Do you need an example or can you go from here?
 
Ed

Unfortunately, Chris, "Title" and "Date" were just examples of fields that
appear in the text of each document; they are not the document statistics.
The actual fields I want (there are about 10 of them) are not covered by the
doc stats.

Ed
 
Chris Shelley

Then I would agree with Jonathan and say there isn't a way to do this. If
you do want to go the route of opening each file and parsing it, consider
using a StreamReader (StreamReader <variable name> =
File.OpenText(<string filename>); assuming that you are doing this in .NET).
The StreamReader makes this operation very fast and doesn't require you to
start Word (for example via Word.Application). You can then search the stream
for the data you are looking for.
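
A minimal sketch of that raw-scan idea follows, written in Python rather than .NET purely for illustration. It reads each file as bytes without starting Word, looks for a label such as "Title", and returns the characters that follow it. The names text_after and scan_folder are hypothetical, and the sketch assumes the label and its value sit next to each other in the file as plain 8-bit or 16-bit text. That will not hold for every Word file, so treat any result as something to spot-check against the real document.

```python
from pathlib import Path
from typing import Optional


def text_after(data: bytes, label: str, count: int) -> Optional[str]:
    """Return the `count` characters that follow the first `label` in raw bytes.

    The label is tried as 8-bit text first, then as UTF-16LE (assumption:
    these are the two encodings worth checking). Returns None if not found.
    """
    for encoding, width in (("latin-1", 1), ("utf-16-le", 2)):
        needle = label.encode(encoding)
        pos = data.find(needle)
        if pos != -1:
            start = pos + len(needle)
            chunk = data[start:start + count * width]
            return chunk.decode(encoding, errors="replace")
    return None


def scan_folder(folder: str, fields: dict) -> list:
    """Build one row per .doc file: the file name plus each requested field.

    `fields` maps a label to the number of characters to take after it.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.doc")):
        data = path.read_bytes()  # raw read; Word is never started
        row = {"file": path.name}
        for label, count in fields.items():
            row[label] = text_after(data, label, count)
        rows.append(row)
    return rows
```

For the example in the original question, the call would look something like scan_folder(folder, {"Title": 24, "Date": 8}), where folder is the path to the documents. A field that comes back as None means the label was not found as contiguous text in that file.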
 
Ed

Most likely, Chris, I would do this in Word VBA. If I really had to use VB,
it would be "Classic" 6.0. (It's what I got and what I know.) Thanks for
the boost.

Ed
 
Reptilican

Ed,
Chris Shelley was very close to your solution. You can create your own
"Custom Properties" either manually or in code (I use VBA and VB 6). After
setting your custom properties, right-click the saved, unopened file and
select Properties, then the Custom tab.

My question is how to use code to read those custom properties for
automation in a VB app.
 
Jonathan West

Reptilican said:
Ed,
Chris Shelley was very close to your solution. You can create your own
"Custom Properties" either manually or in code (I use VBA and VB 6). After
setting your custom properties, right-click the saved, unopened file and
select Properties, then the Custom tab.

My question is how to use code to read those custom properties for
automation in a VB app.

That's not too hard.

Getting access to the Document Properties of a Word file
http://word.mvps.org/FAQs/MacrosVBA/DSOFile.htm

The problem is, as far as I can tell, that Ed's documents aren't currently
structured like that. If they could be structured like that, then it might
become a viable approach.
 
