VBA to Find Key Words in Word doc then Extract Content to Excel.xlsm file

Joined
Sep 8, 2016
Messages
12
Reaction score
1
Short on time and long on the pile of files that need extractions - Used to have a Sub that would do something of this sort many years ago but can't locate it - or any other posts that are similar. Hope a wiz on here can assist as this is an awesome help site!
Need Excel to extract content from Word documents and place it into 4 columns like this:
Column C..........................Col D............Col E.................Col F...........
Field Name......................Condition......Rule Name.........Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF

FYI:
*Do not need the code to open any files
(User will already have open 1 Excel file (ExampleExtractor.xlsm) and 1 Word file (LogRuleSourceXX.doc) open before running the SUB.
User will open ea Word file one at a time then Run the Sub from Excel
There's a large folder full of Word docs to extract from - so I'm needing a sub like this again to expedite ongoing extractions).

Here's the 4 Basic Rules the Sub Should Accommodate:
1==
If Excel finds one of these rule words in All Caps [IF,AND,OR,PERFORM,THRU] in the Word doc content, then copy the content that sits to the immediate RIGHT of that rule word & paste into Excel column C. (copy until a space occurs I think will work).
Here's what the WORD DOCUMENT content format looks like:
-------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))

MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
-------------------------------------------------------------------------------------
==The content is not in tables. One, any or all of the RULE words could potentially be present within a Word doc.
==With each FIND occurrence, paste the extracted content on it's own row within the Excel table Col C (below).

Column C..........................Col D............Col E.....................Col F...........
Field Name......................Condition......Rule Name.............Output.........
ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
WS-RECORD-CD.............4..................AA.....................ACN-KEY-SUF
WS-BGCD........................8..................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D7A..............AA.....................ACN-KEY-SUF
WS-TT.............................N..................AA......................ACN-KEY-SUF
WS-DOC-ID.....................D6R..............AA.....................ACN-KEY-SUF
WS-TT.............................T...................AA.....................ACN-KEY-SUF
WS-DOC-ID.....................D6A...............AA....................ACN-KEY-SUF
WS-TT.............................N...................AA....................ACN-KEY-SUF
WS-TT............................SPACES........AA....................ACN-KEY-SUF
or can do it like this:
WS-TT............................N,SPACES.....AA...................ACN-KEY-SUF

(for example, LOOK ABOVE to the WORD CONTENT CHUNK:
"WS-TT" appears more than once (3xs),
so it gets listed each time it appears on it's own row in EXCEL, along w/ whatever Condition Code is listed..
(but here's where it gets tricky).......,
The 3rd time it is listed, there's a 'N' code in single quotes, (and) there's an "OR" present followed by a code SPACES.
(SPACES doesn't have single quote marks around it like the code on the LEFT side of "OR" has - so a rule that says pick up both and either record them on 2 individual rows --
(or) if easier, list both N,SPACES on a single line like shown in the above Excel table (either is great):

2== Then, pick up (copy/paste) the content that sits on the immediate RIGHT of the equals sign into Excel column D.
(sometimes that Condition Code content is in quotes - sometimes it's not - so not using quotes in the definition is probably more accurate (to just say in the code to pick up everything to the RIGHT of the equals sign on that same line until a space or end of line occurs) - and I'll parse off any extra garbage that I don't need if something gets picked up)

3== Need the code to look at the Filename of the Word doc and extract the 12th+13th char position, paste that into Col E of Excel
-------------------------------------------------------------------------------------------
the 'Rule Name' is simply 2 characters of the Word doc file name repeated all the way down,
....(if too hard to do or time consuming for more code, just pick up the full file name, that's fine!)
--------------------------------------------------------------------------------------------

4== Last, the code should locate the "Output" content which always follows the rule word: [TO], copy/paste into Col F of Excel
The 'Output' is re-used as it is relational (meaning all the line items listed in that little chunk all tie to/relate to the 'Output' name listed at the end of that chunk signified by preceding the word "TO": (see below)
--------------------------------------------------------------------------------------------
IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
(WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
((WS-DOC-ID = 'D6A') AND
(WS-TT = 'N' OR SPACES))

MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
---------------------------------------------------------------------------------------------

OTHER: (in the example below, following the "IF" rule word, there's 2 Field names:
..............PUBLIC-SALES-CODE
..............PUBLIC-TIV-12
----------------------------------------------------------------------------
WORD CONTENT EXAMPLE:
IF HOLD-REIMB = ‘A’ OR ‘Y’<<<<<<<<<<<<< A & Y would get pasted into Col C either together or individually
IF PUBLIC-SALES-CODE AND PUBLIC-TIV-12
MOVE ‘12’ TO ACH-KEY-IND-VAL
----------------------------------------------------------------------------
(each would be listed on their own rows within the Excel table and both are associated to the ACH-KEY-IND-VAL Output)
Unlike the other IF's.. these PUBLICs don't have a code so, we are to use: "YES" or "NO" as the codes.
"NO" if the word "NOT" precedes it: NOT PUBLIC-SALES-CODE
"YES" alternately
(see the final table example at the bottom of this post to see how they would be laid into the Excel table)

That's it!

(I threw those RULE WORDS into Col A of the provided Excel file (cross-posted) as an idea that the code could look to that column when performing it's LOOKUP/INDEX - but it's probably much better to just hard code the rules into code and manage them there)

Forever in debt to you if you can figure out a decent sub to expedite this painful process...
Thanks, Chris

cross-posted to be able to upload/attach a sample Excel file: http://www.ozgrid.com/forum/showthread.php?t=202211&p=782727#post782727

THE GOAL IS: to use EXCEL to extract content from WORD docs; achieved in the fastest, most efficient way possible... if it's a little messy with picking up a little extra garb at the end of a line, so-be-it- as long as I can collect all the content needed from the Word doc -- I can do some cleaning then pass it on to programmers in a table such as is shown below.

Column C............................Col D..........Col E..............Col F...........
Field Name.........................Condition......Rule Name........Output.........
ACJ-TRANS-LR-CODE...........05..............AA..............ACN-KEY-SUF
WS-DOC-ID.............................F7A............AA..............ACN-KEY-SUF
WS-RECORD-CD......................4...............AA..............ACN-KEY-SUF
WS-DOC-NR1..........................M...............AA..............ACN-KEY-SUF
WS-RECORD-CD.....................4................AA..............ACN-KEY-SUF
WS-BGCD................................8................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D7A............AA..............ACN-KEY-SUF
WS-TT......................................N................AA..............ACN-KEY-SUF
WS-DOC-ID.............................D6R............AA..............ACN-KEY-SUF
WS-TT......................................T.................AA..............ACN-KEY-SUF
WS-DOC-ID..............................D6A.............AA.............ACN-KEY-SUF
WS-TT......................................N.................AA.............ACN-KEY-SUF
PUBLIC-SALES-CODE............YES..............AA.............ACN-KEY-SUF
PUBLIC-TIV-12........................YES..............AA.............ACN-KEY-SUF
NOT-PUBLIC-SALES-CODE....NO...............AA.............ACN-KEY-SUF <<< "NOT" INDICATES "NO" Condition
------------------------------------------------------------------------------------------------
Hope that makes sense and THANKS AGAIN! - (sample file attachment is on cross-post link)

EXAMPLE-image.GIF
 
Joined
Sep 8, 2016
Messages
12
Reaction score
1
Here's an update to the other cross- post; hope it can be of help to others!:
It's designed to go open/extract from all the files sitting within a single folder (assuming they're the correct file type) -- note: it can be one file or many.

There's a couple of add-in pieces of code that can be added temporarily/retained or removed as desired: (these will generate Msg boxes that will allow you to see what kind of file that's sitting in the folder, see/verify the path, etc..

If the folder contains doc files you want to process, you should change:
strFile = Dir(strFolder & "\*.docx", vbNormal)
to:
strFile = Dir(strFolder & "\*.doc", vbNormal)
Note that, with this change, the code will pick up doc, docx & docm files.

To test whether the code is processing anything in the folder, you might insert:
MsgBox strFolder & "" & strFile
before:
Set wdDoc =
If that returns nothing, you might try repairing the Office installation (via Windows Control Panel > Programs > Programs & Features > Microsoft Office (version) > Change > Repair).
If the above message box returns the file path & name, you might then insert:
MsgBox StrIn
before:
If StrIn = UCase(StrIn) Then
or after:
StrIn = Trim(StrIn)
------------------------------
Next, I encountered this issue (which was easily fixed by going to the VBA Tools>References menu, locating the Microsoft Word 16.0 object library and making sure it was check-marked! - If not check-marked, you'll see the error and this is what happened:

The sub name was highlighted in yellow: Sub GetWordDocumentData()
and row 4 is highlighted in blue: wdApp As New Word.Application
says: "Compile Error User defined type not defined"?

(The code had been pasted into a new file into the "ThisWorkbook" area...
NOTE: you'll have to re-check that box within the VBA editor under Tools>References every time you move it to a new workbook- or alternatively, I decided to place the code into a Module within the "Personal.xlsm" area so it is available globally to any workbook I open. In the past, I've helped other teammates (when we're trying to meet tight deadlines and process a ton of work quickly) by going to their computer and pasting the code into their Personal.xlsm areas as well so they too can run it globally on any file.
With that, 10 of us can be cranking out extractions at once rather than 1 of us...

So - if you've simply placed the code within a single macro-enabled Excel file, and you need it to work after you've sent the file to someone who DOES NOT HAVE the code within their PERSONAL.xlsm area, you'll need to tell them to navigate to their VBA editor Tools>References area to checkmark that box mentioned otherwise the below is what they'll encounter... If you send them multiple Excel files w/ the code in each, they'll need to re-check that box every time they open the Excel file to run it's code.

---------------------------------------------------------------------------------------------------------------------------------------
Here's the final batch that works beautifully - assuming you've checked the Word 16.0 Object Library:
----------------------------------------------------------------------------------------------------------------------------------------
Code:
Sub GetWordDocumentData()
'Note: this code requires a reference to the Word object model to be set via Tools|References in the VBA editor.
Application.ScreenUpdating = False
Dim wdApp As New Word.Application, wdDoc As Word.Document
Dim strFolder As String, strFile As String
Dim StrCode As String, StrIn As String, StrTmp As String, StrOut As String
Dim WkSht As Worksheet, i As Long, j As Long, r As Long
strFolder = GetFolder
If strFolder = "" Then Exit Sub
Set WkSht = ActiveSheet
r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
strFile = Dir(strFolder & "\*.docx", vbNormal)
While strFile <> ""
 
  MsgBox strFolder & "" & strFile
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, ReadOnly:=True, AddToRecentFiles:=False, Visible:=False)
  With wdDoc
  StrCode = Mid(.Name, 12, 2): StrOut = ""
  With .Range
  With .Find
  .ClearFormatting
  .Replacement.ClearFormatting
  .Text = "[A-Z0-9][!a-z]@^13"
  .Replacement.Text = ""
  .Forward = True
  .Wrap = wdFindStop
  .Format = False
  .MatchWildcards = True
  .Execute
  End With
  Do While .Find.Found
  StrIn = Split(.Paragraphs(1).Range.Text, vbCr)(0)
 

  If StrIn = UCase(StrIn) Then
  StrIn = Replace(Replace(Replace(StrIn, vbTab, " "), ")", " "), "(", " ")
  StrIn = Trim(StrIn)
 
  Do While InStr(StrIn, "  ")
  StrIn = Replace(StrIn, "  ", " ")
  Loop
  For i = 1 To UBound(Split(StrIn, " = "))
  StrTmp = Split(Split(StrIn, " = ")(i - 1), " ")(UBound(Split(Split(StrIn, " = ")(i - 1), " ")))
  StrOut = StrOut & vbCr & StrTmp & vbTab & Split(Split(StrIn, " = ")(i), " ")(0) & vbTab & StrCode
  Next
  If UBound(Split(StrIn, " OR ")) = 1 Then
  If InStr(Split(StrIn, " OR ")(UBound(Split(StrIn, " OR "))), " = ") = 0 Then
  StrOut = StrOut & vbCr & StrTmp & vbTab & Split(StrIn, " OR ")(1) & vbTab & StrCode
  End If
  End If
  If Split(StrIn, " ")(0) = "MOVE" Then
  StrOut = StrOut & vbCr
  StrTmp = Split(StrIn, " ")(UBound(Split(StrIn, " ")))
  StrOut = Replace(StrOut, StrCode & vbCr, StrCode & vbTab & StrTmp & vbCr)
  StrOut = Left(StrOut, Len(StrOut) - 1)
  Else
  For i = 1 To UBound(Split(StrIn, " TO "))
  StrTmp = Split(Split(StrIn, " TO ")(i - 1), " ")(UBound(Split(Split(StrIn, " TO ")(i - 1), " ")))
  StrOut = StrOut & vbCr & StrTmp & vbTab & Split(Split(StrIn, " TO ")(i), " ")(0) & vbTab & StrCode
  Next
  End If
  If InStr(StrIn, "NOT-PUBLIC-SALES-CODE") > 0 Then
  StrOut = StrOut & "PUBLIC-SALES-CODE" & vbTab & "NO" & vbTab & StrCode
  ElseIf InStr(StrIn, "PUBLIC-SALES-CODE") > 0 Then
  StrOut = StrOut & vbCr & "PUBLIC-SALES-CODE" & vbTab & "YES" & vbTab & StrCode
  End If
  If InStr(StrIn, "NOT-PUBLIC-TIV-12") > 0 Then
  StrOut = StrOut & vbCr & "PUBLIC-TIV-12" & vbTab & "NO" & vbTab & StrCode
  ElseIf InStr(StrIn, "PUBLIC-TIV-12") > 0 Then
  StrOut = StrOut & vbCr & "PUBLIC-TIV-12" & vbTab & "YES" & vbTab & StrCode
  End If
  End If
  .End = .Paragraphs(1).Range.End
  .Collapse wdCollapseEnd
  .Find.Execute
  Loop
  End With
  .Close SaveChanges:=False
  End With
  StrOut = Replace(Replace(Replace(Replace(StrOut, Chr(39), ""), Chr(96), ""), Chr(145), ""), Chr(146), "")
  For i = 1 To UBound(Split(StrOut, vbCr))
  r = r + 1
  StrTmp = Split(StrOut, vbCr)(i)
  For j = 0 To UBound(Split(StrTmp, vbTab))
  WkSht.Cells(r, j + 3).Value = Split(StrTmp, vbTab)(j)
  Next
  Next
  strFile = Dir()
Wend
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
Application.ScreenUpdating = True
End Sub
Function GetFolder() As String
Dim oFolder As Object
GetFolder = ""
Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
Set oFolder = Nothing
End Function
--------------------------------
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top