VBA to Find Key Words in Word doc then Extract Content to Excel.xlsm file

Discussion in 'Excel' started by ChrisOK, Dec 21, 2016.

  1. ChrisOK

    ChrisOK

    Joined:
    Sep 8, 2016
    Messages:
    7
    Likes Received:
    1
    Short on time and long on the pile of files that need extractions - Used to have a Sub that would do something of this sort many years ago but can't locate it - or any other posts that are similar. Hope a wiz on here can assist as this is an awesome help site!
    Need Excel to extract content from Word documents and place it into 4 columns like this:
    Column C..........................Col D............Col E.................Col F...........
    Field Name......................Condition......Rule Name.........Output.........
    ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
    WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
    WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
    WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF

    FYI:
    *Do not need the code to open any files
    (User will already have open 1 Excel file (ExampleExtractor.xlsm) and 1 Word file (LogRuleSourceXX.doc) open before running the SUB.
    User will open ea Word file one at a time then Run the Sub from Excel
    There's a large folder full of Word docs to extract from - so I'm needing a sub like this again to expedite ongoing extractions).

    Here's the 4 Basic Rules the Sub Should Accommodate:
    1==
    If Excel finds one of these rule words in All Caps [IF,AND,OR,PERFORM,THRU] in the Word doc content, then copy the content that sits to the immediate RIGHT of that rule word & paste into Excel column C. (copy until a space occurs I think will work).
    Here's what the WORD DOCUMENT content format looks like:
    -------------------------------------------------------------------------------------
    IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
    (WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
    ((WS-DOC-ID = 'D6A') AND
    (WS-TT = 'N' OR SPACES))

    MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
    -------------------------------------------------------------------------------------
    ==The content is not in tables. One, any or all of the RULE words could potentially be present within a Word doc.
    ==With each FIND occurrence, paste the extracted content on it's own row within the Excel table Col C (below).

    Column C..........................Col D............Col E.....................Col F...........
    Field Name......................Condition......Rule Name.............Output.........
    ACJ-TRANS-LR-CODE.....05................AA......................ACN-KEY-SUF
    WS-DOC-ID......................F7A..............AA......................ACN-KEY-SUF
    WS-RECORD-CD..............4..................AA.....................ACN-KEY-SUF
    WS-DOC-NR1...................M.................AA.....................ACN-KEY-SUF
    WS-RECORD-CD.............4..................AA.....................ACN-KEY-SUF
    WS-BGCD........................8..................AA.....................ACN-KEY-SUF
    WS-DOC-ID.....................D7A..............AA.....................ACN-KEY-SUF
    WS-TT.............................N..................AA......................ACN-KEY-SUF
    WS-DOC-ID.....................D6R..............AA.....................ACN-KEY-SUF
    WS-TT.............................T...................AA.....................ACN-KEY-SUF
    WS-DOC-ID.....................D6A...............AA....................ACN-KEY-SUF
    WS-TT.............................N...................AA....................ACN-KEY-SUF
    WS-TT............................SPACES........AA....................ACN-KEY-SUF
    or can do it like this:
    WS-TT............................N,SPACES.....AA...................ACN-KEY-SUF

    (for example, LOOK ABOVE to the WORD CONTENT CHUNK:
    "WS-TT" appears more than once (3xs),
    so it gets listed each time it appears on it's own row in EXCEL, along w/ whatever Condition Code is listed..
    (but here's where it gets tricky).......,
    The 3rd time it is listed, there's a 'N' code in single quotes, (and) there's an "OR" present followed by a code SPACES.
    (SPACES doesn't have single quote marks around it like the code on the LEFT side of "OR" has - so a rule that says pick up both and either record them on 2 individual rows --
    (or) if easier, list both N,SPACES on a single line like shown in the above Excel table (either is great):

    2== Then, pick up (copy/paste) the content that sits on the immediate RIGHT of the equals sign into Excel column D.
    (sometimes that Condition Code content is in quotes - sometimes it's not - so not using quotes in the definition is probably more accurate (to just say in the code to pick up everything to the RIGHT of the equals sign on that same line until a space or end of line occurs) - and I'll parse off any extra garbage that I don't need if something gets picked up)

    3== Need the code to look at the Filename of the Word doc and extract the 12th+13th char position, paste that into Col E of Excel
    -------------------------------------------------------------------------------------------
    the 'Rule Name' is simply 2 characters of the Word doc file name repeated all the way down,
    ....(if too hard to do or time consuming for more code, just pick up the full file name, that's fine!)
    --------------------------------------------------------------------------------------------

    4== Last, the code should locate the "Output" content which always follows the rule word: [TO], copy/paste into Col F of Excel
    The 'Output' is re-used as it is relational (meaning all the line items listed in that little chunk all tie to/relate to the 'Output' name listed at the end of that chunk signified by preceding the word "TO": (see below)
    --------------------------------------------------------------------------------------------
    IF (WS-DOC-ID = 'D7A' AND WS-TT = 'N') OR
    (WS-DOC-ID = 'D6R' AND WS-TT = 'T') OR
    ((WS-DOC-ID = 'D6A') AND
    (WS-TT = 'N' OR SPACES))

    MOVE 'M' TO ACN-KEY-SUF <<< Output name is at RIGHT of "TO", every line abv it relates to this Output name
    ---------------------------------------------------------------------------------------------

    OTHER: (in the example below, following the "IF" rule word, there's 2 Field names:
    ..............PUBLIC-SALES-CODE
    ..............PUBLIC-TIV-12
    ----------------------------------------------------------------------------
    WORD CONTENT EXAMPLE:
    IF HOLD-REIMB = ‘A’ OR ‘Y’<<<<<<<<<<<<< A & Y would get pasted into Col C either together or individually
    IF PUBLIC-SALES-CODE AND PUBLIC-TIV-12
    MOVE ‘12’ TO ACH-KEY-IND-VAL
    ----------------------------------------------------------------------------
    (each would be listed on their own rows within the Excel table and both are associated to the ACH-KEY-IND-VAL Output)
    Unlike the other IF's.. these PUBLICs don't have a code so, we are to use: "YES" or "NO" as the codes.
    "NO" if the word "NOT" precedes it: NOT PUBLIC-SALES-CODE
    "YES" alternately
    (see the final table example at the bottom of this post to see how they would be laid into the Excel table)

    That's it!

    (I threw those RULE WORDS into Col A of the provided Excel file (cross-posted) as an idea that the code could look to that column when performing it's LOOKUP/INDEX - but it's probably much better to just hard code the rules into code and manage them there)

    Forever in debt to you if you can figure out a decent sub to expedite this painful process...
    Thanks, Chris

    cross-posted to be able to upload/attach a sample Excel file: http://www.ozgrid.com/forum/showthread.php?t=202211&p=782727#post782727

    THE GOAL IS: to use EXCEL to extract content from WORD docs; achieved in the fastest, most efficient way possible... if it's a little messy with picking up a little extra garb at the end of a line, so-be-it- as long as I can collect all the content needed from the Word doc -- I can do some cleaning then pass it on to programmers in a table such as is shown below.

    Column C............................Col D..........Col E..............Col F...........
    Field Name.........................Condition......Rule Name........Output.........
    ACJ-TRANS-LR-CODE...........05..............AA..............ACN-KEY-SUF
    WS-DOC-ID.............................F7A............AA..............ACN-KEY-SUF
    WS-RECORD-CD......................4...............AA..............ACN-KEY-SUF
    WS-DOC-NR1..........................M...............AA..............ACN-KEY-SUF
    WS-RECORD-CD.....................4................AA..............ACN-KEY-SUF
    WS-BGCD................................8................AA..............ACN-KEY-SUF
    WS-DOC-ID.............................D7A............AA..............ACN-KEY-SUF
    WS-TT......................................N................AA..............ACN-KEY-SUF
    WS-DOC-ID.............................D6R............AA..............ACN-KEY-SUF
    WS-TT......................................T.................AA..............ACN-KEY-SUF
    WS-DOC-ID..............................D6A.............AA.............ACN-KEY-SUF
    WS-TT......................................N.................AA.............ACN-KEY-SUF
    PUBLIC-SALES-CODE............YES..............AA.............ACN-KEY-SUF
    PUBLIC-TIV-12........................YES..............AA.............ACN-KEY-SUF
    NOT-PUBLIC-SALES-CODE....NO...............AA.............ACN-KEY-SUF <<< "NOT" INDICATES "NO" Condition
    ------------------------------------------------------------------------------------------------
    Hope that makes sense and THANKS AGAIN! - (sample file attachment is on cross-post link)

    EXAMPLE-image.GIF
     
    ChrisOK, Dec 21, 2016
    #1
    1. Advertisements

  2. ChrisOK

    macropod Microsoft MVP

    Joined:
    Mar 2, 2012
    Messages:
    194
    Likes Received:
    14
    macropod, Dec 27, 2016
    #2
    ChrisOK likes this.
    1. Advertisements

  3. ChrisOK

    ChrisOK

    Joined:
    Sep 8, 2016
    Messages:
    7
    Likes Received:
    1
    ChrisOK, Dec 28, 2016
    #3
  4. ChrisOK

    ChrisOK

    Joined:
    Sep 8, 2016
    Messages:
    7
    Likes Received:
    1
    Here's an update to the other cross- post; hope it can be of help to others!:
    It's designed to go open/extract from all the files sitting within a single folder (assuming they're the correct file type) -- note: it can be one file or many.

    There's a couple of add-in pieces of code that can be added temporarily/retained or removed as desired: (these will generate Msg boxes that will allow you to see what kind of file that's sitting in the folder, see/verify the path, etc..

    If the folder contains doc files you want to process, you should change:
    strFile = Dir(strFolder & "\*.docx", vbNormal)
    to:
    strFile = Dir(strFolder & "\*.doc", vbNormal)
    Note that, with this change, the code will pick up doc, docx & docm files.

    To test whether the code is processing anything in the folder, you might insert:
    MsgBox strFolder & "" & strFile
    before:
    Set wdDoc =
    If that returns nothing, you might try repairing the Office installation (via Windows Control Panel > Programs > Programs & Features > Microsoft Office (version) > Change > Repair).
    If the above message box returns the file path & name, you might then insert:
    MsgBox StrIn
    before:
    If StrIn = UCase(StrIn) Then
    or after:
    StrIn = Trim(StrIn)
    ------------------------------
    Next, I encountered this issue (which was easily fixed by going to the VBA Tools>References menu, locating the Microsoft Word 16.0 object library and making sure it was check-marked! - If not check-marked, you'll see the error and this is what happened:

    The sub name was highlighted in yellow: Sub GetWordDocumentData()
    and row 4 is highlighted in blue: wdApp As New Word.Application
    says: "Compile Error User defined type not defined"?

    (The code had been pasted into a new file into the "ThisWorkbook" area...
    NOTE: you'll have to re-check that box within the VBA editor under Tools>References every time you move it to a new workbook- or alternatively, I decided to place the code into a Module within the "Personal.xlsm" area so it is available globally to any workbook I open. In the past, I've helped other teammates (when we're trying to meet tight deadlines and process a ton of work quickly) by going to their computer and pasting the code into their Personal.xlsm areas as well so they too can run it globally on any file.
    With that, 10 of us can be cranking out extractions at once rather than 1 of us...

    So - if you've simply placed the code within a single macro-enabled Excel file, and you need it to work after you've sent the file to someone who DOES NOT HAVE the code within their PERSONAL.xlsm area, you'll need to tell them to navigate to their VBA editor Tools>References area to checkmark that box mentioned otherwise the below is what they'll encounter... If you send them multiple Excel files w/ the code in each, they'll need to re-check that box every time they open the Excel file to run it's code.

    ---------------------------------------------------------------------------------------------------------------------------------------
    Here's the final batch that works beautifully - assuming you've checked the Word 16.0 Object Library:
    ----------------------------------------------------------------------------------------------------------------------------------------
    Code:
    Sub GetWordDocumentData()
    'Note: this code requires a reference to the Word object model to be set via Tools|References in the VBA editor.
    Application.ScreenUpdating = False
    Dim wdApp As New Word.Application, wdDoc As Word.Document
    Dim strFolder As String, strFile As String
    Dim StrCode As String, StrIn As String, StrTmp As String, StrOut As String
    Dim WkSht As Worksheet, i As Long, j As Long, r As Long
    strFolder = GetFolder
    If strFolder = "" Then Exit Sub
    Set WkSht = ActiveSheet
    r = WkSht.Cells(WkSht.Rows.Count, 1).End(xlUp).Row
    strFile = Dir(strFolder & "\*.docx", vbNormal)
    While strFile <> ""
     
      MsgBox strFolder & "" & strFile
      Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, ReadOnly:=True, AddToRecentFiles:=False, Visible:=False)
      With wdDoc
      StrCode = Mid(.Name, 12, 2): StrOut = ""
      With .Range
      With .Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Text = "[A-Z0-9][!a-z]@^13"
      .Replacement.Text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = False
      .MatchWildcards = True
      .Execute
      End With
      Do While .Find.Found
      StrIn = Split(.Paragraphs(1).Range.Text, vbCr)(0)
     
    
      If StrIn = UCase(StrIn) Then
      StrIn = Replace(Replace(Replace(StrIn, vbTab, " "), ")", " "), "(", " ")
      StrIn = Trim(StrIn)
     
      Do While InStr(StrIn, "  ")
      StrIn = Replace(StrIn, "  ", " ")
      Loop
      For i = 1 To UBound(Split(StrIn, " = "))
      StrTmp = Split(Split(StrIn, " = ")(i - 1), " ")(UBound(Split(Split(StrIn, " = ")(i - 1), " ")))
      StrOut = StrOut & vbCr & StrTmp & vbTab & Split(Split(StrIn, " = ")(i), " ")(0) & vbTab & StrCode
      Next
      If UBound(Split(StrIn, " OR ")) = 1 Then
      If InStr(Split(StrIn, " OR ")(UBound(Split(StrIn, " OR "))), " = ") = 0 Then
      StrOut = StrOut & vbCr & StrTmp & vbTab & Split(StrIn, " OR ")(1) & vbTab & StrCode
      End If
      End If
      If Split(StrIn, " ")(0) = "MOVE" Then
      StrOut = StrOut & vbCr
      StrTmp = Split(StrIn, " ")(UBound(Split(StrIn, " ")))
      StrOut = Replace(StrOut, StrCode & vbCr, StrCode & vbTab & StrTmp & vbCr)
      StrOut = Left(StrOut, Len(StrOut) - 1)
      Else
      For i = 1 To UBound(Split(StrIn, " TO "))
      StrTmp = Split(Split(StrIn, " TO ")(i - 1), " ")(UBound(Split(Split(StrIn, " TO ")(i - 1), " ")))
      StrOut = StrOut & vbCr & StrTmp & vbTab & Split(Split(StrIn, " TO ")(i), " ")(0) & vbTab & StrCode
      Next
      End If
      If InStr(StrIn, "NOT-PUBLIC-SALES-CODE") > 0 Then
      StrOut = StrOut & "PUBLIC-SALES-CODE" & vbTab & "NO" & vbTab & StrCode
      ElseIf InStr(StrIn, "PUBLIC-SALES-CODE") > 0 Then
      StrOut = StrOut & vbCr & "PUBLIC-SALES-CODE" & vbTab & "YES" & vbTab & StrCode
      End If
      If InStr(StrIn, "NOT-PUBLIC-TIV-12") > 0 Then
      StrOut = StrOut & vbCr & "PUBLIC-TIV-12" & vbTab & "NO" & vbTab & StrCode
      ElseIf InStr(StrIn, "PUBLIC-TIV-12") > 0 Then
      StrOut = StrOut & vbCr & "PUBLIC-TIV-12" & vbTab & "YES" & vbTab & StrCode
      End If
      End If
      .End = .Paragraphs(1).Range.End
      .Collapse wdCollapseEnd
      .Find.Execute
      Loop
      End With
      .Close SaveChanges:=False
      End With
      StrOut = Replace(Replace(Replace(Replace(StrOut, Chr(39), ""), Chr(96), ""), Chr(145), ""), Chr(146), "")
      For i = 1 To UBound(Split(StrOut, vbCr))
      r = r + 1
      StrTmp = Split(StrOut, vbCr)(i)
      For j = 0 To UBound(Split(StrTmp, vbTab))
      WkSht.Cells(r, j + 3).Value = Split(StrTmp, vbTab)(j)
      Next
      Next
      strFile = Dir()
    Wend
    wdApp.Quit
    Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing
    Application.ScreenUpdating = True
    End Sub
    Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
    End Function
    
    --------------------------------
     
    ChrisOK, Jan 3, 2017
    #4
    Becky likes this.
    1. Advertisements

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.
Similar Threads
  1. Jorge
    Replies:
    0
    Views:
    563
    Jorge
    Aug 27, 2013
  2. HoodGordon
    Replies:
    0
    Views:
    686
    HoodGordon
    Apr 25, 2015
  3. RandellLeon
    Replies:
    0
    Views:
    388
    RandellLeon
    May 13, 2015
  4. CharleGilb
    Replies:
    0
    Views:
    516
    CharleGilb
    May 23, 2015
  5. Echo-2
    Replies:
    1
    Views:
    419
    XLPadawan
    Jun 14, 2016
  6. Pete Marsh
    Replies:
    0
    Views:
    263
    Pete Marsh
    Nov 26, 2015
  7. Comrade Conrad
    Replies:
    0
    Views:
    187
    Comrade Conrad
    Aug 22, 2016
  8. tcebob
    Replies:
    0
    Views:
    216
    tcebob
    Dec 31, 2016
Loading...