Extracting text from files

L

Luther Gunter Jr.

Question:

I have a large number (in the hundreds) of folders, each with one file.
Each file is a Microsoft Word document with the same name.

I was wondering: Is it is possible to extract the text from each file and
combine into a single file?
 
G

Graham Mayor

I think that in the circumstances I might be inclined to create a list of
the folders - the printfolders utility downloadable from my web site will
enable you to grab all the *folder* names eg

D:\My Documents\DISK LABELS
D:\My Documents\DOWNLOAD
D:\My Documents\MAILMESSAGES
D:\My Documents\MONEY
etc

Open the resulting document in Word, then use replace to add the common
filename to the ends of the
entries ie replace
^p
with
\filename.doc^p
(where filename.doc is the name of your file)
to give

D:\My Documents\DISK LABELS\filename.doc
D:\My Documents\DOWNLOAD\filename.doc
D:\My Documents\MAILMESSAGES\filename.doc
D:\My Documents\MONEY\filename.doc

replace the slashes with double slashes ie \ with \\

then add quotes start and finish of each line ie *wildcard* replace
(D*)(^13)
with
"\1"\2

Where D here is the drive letter at the start of each line

You should then have:

"D:\\My Documents\\DISK LABELS\\filename.doc"
"D:\\My Documents\\DOWNLOAD\\filename.doc"
"D:\\My Documents\\MAILMESSAGES\\filename.doc"
"D:\\My Documents\\MONEY\\filename.doc"
etc

Finally run the following macro on the list:

Sub MakeFields()
Dim myRange As Range
Dim FieldText As String

Set myRange = ActiveDocument.Range
With myRange.Find
.Text = """*"""
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = True
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Do While myRange.Find.Execute
FieldText = (myRange.Text)
ActiveDocument.Fields.Add Range:=myRange, Type:=wdFieldIncludeText,
Text:=FieldText
'move the range past the current field
myRange.MoveEndUntil cset:=""""""
myRange.Collapse wdCollapseEnd
myRange.Select
Loop
End Sub

http://www.gmayor.com/installing_macro.htm

This will produce the following:

{ INCLUDETEXT "D:\\My Documents\\DISK LABELS\\filename.doc" \* MERGEFORMAT }
{ INCLUDETEXT "D:\\My Documents\\DOWNLOAD\\filename.doc" \* MERGEFORMAT }
{ INCLUDETEXT "D:\\My Documents\\MAILMESSAGES\\filename.doc" \*
MERGEFORMAT }
{ INCLUDETEXT "D:\\My Documents\\MONEY\\filename.doc" \* MERGEFORMAT }

which will include all your documents in a single document.


--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 
G

Graham Mayor

Just for the hell of it, in case you are still watching - the following
variation on the macro will take your folder list and produce a document
containing all the files:

Sub MakeFields()
'Macro to combine a list of files of the same name
'each in its own folder, to a single document.
'Begin with a list of paths to the folders that contain the files.
'Use the PrintFolders utility to obtain the list.
'The macro prompts for the filename.

Dim myRange As Range
Dim FieldText As String
Dim sFilename As String
Dim sFile As String

sFilename = InputBox("Enter filename as filename.doc")
sFile = "^92^92" & sFilename & "^p"
Selection.HomeKey Unit:=wdStory
With Selection.Find
.ClearFormatting
.Text = "\"
.Replacement.ClearFormatting
.Replacement.Text = "\\"
.Forward = True
.Wrap = wdFindStop
.Format = False
.MatchCase = False
.MatchWholeWord = False
.MatchWildcards = False
.MatchSoundsLike = False
.MatchAllWordForms = False
End With
Selection.Find.Execute replace:=wdReplaceAll
Selection.HomeKey Unit:=wdStory
With Selection.Find
.Text = "^13([!^13])"
.Replacement.Text = sFile
.MatchWildcards = True
End With
Selection.Find.Execute replace:=wdReplaceAll
With Selection.Find
.Text = "(D*)(^13)"
.Replacement.Text = """\1""\2"
End With
Selection.Find.Execute replace:=wdReplaceAll
Selection.EndKey Unit:=wdStory

Set myRange = ActiveDocument.Range
With myRange.Find
.Text = """*"""
.MatchWildcards = True
End With
Do While myRange.Find.Execute
FieldText = (myRange.Text)
ActiveDocument.Fields.Add Range:=myRange, _
Type:=wdFieldIncludeText, Text:=FieldText
myRange.MoveEndUntil cset:=""""""
myRange.Collapse wdCollapseEnd
myRange.Select
Loop
Selection.HomeKey Unit:=wdStory
ActiveWindow.View.ShowFieldCodes = True
With Selection.Find
.Text = "\* MERGEFORMAT "
.Replacement.Text = ""
.MatchWildcards = False
End With
Selection.Find.Execute replace:=wdReplaceAll
ActiveWindow.View.ShowFieldCodes = False
ActiveDocument.Fields.Update
Selection.WholeStory
Selection.Fields.Unlink
End Sub


--
<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
Graham Mayor - Word MVP


<>>< ><<> ><<> <>>< ><<> <>>< <>><<>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top