Extract name, address and ph#

Joined
Nov 11, 2017
Messages
5
Reaction score
0
Hello

I’m looking for a macro that will extract

Name, address and phone numbers

from a RAW DATA file.

Attached is the RAW DATA file.

Sample RAW Data block ( in the data file ) where text can be search and extracted.
----------------------------------------------------------------------------------------------------
Melanie Pursglove </a>
</h3>
<div class="c-people-result__address">5720 Premier Park DR, West Palm Beach, FL 33407</div>
<div class="c-people-result__phone">(412) 264-6619</div>


The best way to handle this I think is to search for phone number in word as (^?^?^?) ^?^?^?-^?^?^?^? format and once that format is found then go up to the name area and extract the whole block of text in a new document and then once all of them are extracted then with the SEARCH and REPLACE cleanup the extra raw data.

I was able to write a macro doing above, but the whole macro takes up about ½ hr and that’s too much.

A friend wrote this macro USING RANGES for extracting data from a financial document and it works REALLY FAST.

I was wondering if anyone has any knowledge about RANGES and modify the following macro to extract the

Name, address and phone numbers

From the attached RAW DATA file.

Thanks.


Macro that was written. USING RANGES
---------------------------------------------------
Option Explicit

Sub Macro1()
Dim oDoc As Document
Dim oNewDoc As Document
Dim oRng As Range, oRng2 As Range, oFound As Range
Dim vFind As Variant
Dim fso As Object
Dim strPath As String
Const strFind As String = "Add to watchlist|TOTAL REVENUE"

strPath = Environ("USERPROFILE") & "\Desktop\DataExtract.doc" 'The name of the document to save the extract
Set fso = CreateObject("Scripting.FileSystemObject")
vFind = Split(strFind, "|")
Set oDoc = ActiveDocument
If fso.FileExists(strPath) Then
Set oNewDoc = Documents.Open(FileName:=strPath, AddToRecentFiles:=False)
Else
Set oNewDoc = Documents.Add
oNewDoc.SaveAs FileName:=strPath
End If
Set oRng = oDoc.Range
With oRng.Find
Do While .Execute(FindText:=vFind(0))
oRng.MoveStart wdParagraph, -2
oNewDoc.Range.InsertAfter _
Left(oRng.Paragraphs(1).Range.Text, _
Len(oRng.Paragraphs(1).Range.Text) - 1)
Set oFound = oRng
oFound.End = oDoc.Range.End
With oFound.Find
Do While .Execute(FindText:=vFind(1))
oFound.End = oFound.Paragraphs(1).Range.End - 1
Set oRng2 = oNewDoc.Range
oRng2.End = oRng2.End - 1
oRng2.Collapse 0
oRng2.Text = vbTab & oFound.Text & vbCr
oRng.Collapse 0
Exit Do
Loop
End With
oRng.Collapse 0
Loop
End With
With oNewDoc.Range
.ParagraphFormat.TabStops.ClearAll
.ParagraphFormat.TabStops.Add CentimetersToPoints(6.5)
.ParagraphFormat.SpaceAfter = 0
.Font.Name = "Arial"
.Font.Size = 8
End With

'oNewDoc.Close wdSaveChanges 'Optional
lbl_Exit:
Set fso = Nothing
Set oDoc = Nothing
Set oNewDoc = Nothing
Set oRng = Nothing
Set oRng2 = Nothing
Set oFound = Nothing
Exit Sub
End Sub
 

Attachments

  • raw file.doc
    468.5 KB · Views: 195
  • output data.doc
    19 KB · Views: 182

macropod

Microsoft MVP
Joined
Mar 2, 2012
Messages
578
Reaction score
50
Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim StrOut As String, wdDoc As Document
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "name^34\>[!\<]@(\</a\>\</li\>)*address^34\>[!\<]@\1*phone^34\>[!\<]@\1"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrOut = StrOut & Split(Split(.Text, "</a></li>")(0), ">")(1) & vbTab
    StrOut = StrOut & Split(Split(.Text, "</a></li>")(1), ">")(2) & vbTab
    StrOut = StrOut & Split(Split(.Text, "</a></li>")(2), ">")(2) & vbCr
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
End With
Set wdDoc = Documents.Add
wdDoc.Range.Text = StrOut
Application.ScreenUpdating = True
End Sub
 
Last edited:
Joined
Nov 11, 2017
Messages
5
Reaction score
0
Hello

When i ran the macro, this is what the output that i got.
-------------------------------------------------------------------------

By Name By Address By Phone Number
By Name By Address By Phone Number
By Name By Address By Phone Number


Attached are both the

1) Raw file that contains the Raw Data, and
2) The output file

This is the output i'm looking for.

Melanie Pursglove 5720 Premier Park DR, West Palm Beach, FL 33407 (412) 264-6619
Cheryl L Martiner 5720 Premier Park DR, West Palm Beach, FL 33407 (941) 926-2727
Steve E Arons 5720 Premier Park DR, West Palm Beach, FL 33407 (707) 864-0878
Gail M Perez 5720 Premier Park DR, West Palm Beach, FL 33407 (717) 796-6578

Thanks.
 

macropod

Microsoft MVP
Joined
Mar 2, 2012
Messages
578
Reaction score
50
Evidently there are two forms of the strings in your file. Try:
Code:
Sub Demo()
Application.ScreenUpdating = False
Dim StrOut As String, wdDoc As Document
With ActiveDocument.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Text = "^34\>[!\<]@\</a\>^13[ ]@\</h3\>*address^34\>[!\<]@\</div\>*phone^34\>[!\<]@\</div\>"
    .Replacement.Text = ""
    .Forward = True
    .Wrap = wdFindStop
    .Format = False
    .MatchWildcards = True
    .Execute
  End With
  Do While .Find.Found
    StrOut = StrOut & Trim(Split(Split(.Text, "</a>")(0), vbCr)(1)) & vbTab
    StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2) & vbTab
    StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1) & vbCr
    .Collapse wdCollapseEnd
    .Find.Execute
  Loop
End With
Set wdDoc = Documents.Add
wdDoc.Range.Text = StrOut
Application.ScreenUpdating = True
End Sub
 
Joined
Nov 11, 2017
Messages
5
Reaction score
0
Hello

It worked kind of good, except its extracting some more RAW Data,

attached is the 1) RAW file and the 2) Output file.

Here is the output.
-------------------------
Gail M Perez 5720 Premier Park DR, West Palm Beach, FL 33407 (717) 796-6578
Curtis L Sober 5720 Premier Park DR, West Palm Beach, FL 33407

<a href="/whitepages/show?fp=eyJwcm9maWxlaWQiOiIiLCJmaXJzdCI6IkN1cnRpcyIsImxhc3QiOiJTb2JlciIsInN0cmVldCI6IjU3MjAgUHJlbWllciBQYXJrIERSIiwiY2l0eSI6Ildlc3QgUGFsbSBCZWFjaCIsInN0YXRlIjoiRkwiLCJwaG9uZSI6Ii0tIiwiemlwIjoiMzM0MDcifQ%3D%3D&form=sba"

William D Rodriguez 5720 Premier Park DR, West Palm Beach, FL 33407 (305) 207-1641
Richard E Troy 5720 Premier Park DR, West Palm Beach, FL 33407

<a href="/whitepages/show?fp=eyJwcm9maWxlaWQiOiIiLCJmaXJzdCI6IlJpY2hhcmQiLCJsYXN0IjoiVHJveSIsInN0cmVldCI6IjU3MjAgUHJlbWllciBQYXJrIERSIiwiY2l0eSI6Ildlc3QgUGFsbSBCZWFjaCIsInN0YXRlIjoiRkwiLCJwaG9uZSI6Ii0tIiwiemlwIjoiMzM0MDcifQ%3D%3D&form=sba"

Thanks.
 

Attachments

  • output data.doc
    20.5 KB · Views: 178
  • raw data.doc
    585 KB · Views: 188

macropod

Microsoft MVP
Joined
Mar 2, 2012
Messages
578
Reaction score
50
That's because, unlike your original data, some of the records in your latest set don't include phone #s. Try replacing:
Code:
   StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1) & vbCr
with:
Code:
    If InStr(.Text, "<span>") = 0 Then
      StrOut = StrOut & Split(Split(Split(.Text, "</a>")(1), "</div>")(1), ">")(1)
    End If
    StrOut = StrOut & vbCr
 

macropod

Microsoft MVP
Joined
Mar 2, 2012
Messages
578
Reaction score
50
Refinement -
Change:
.Collapse wdCollapseEnd
to:
.MoveStart wdCharacter, InStr(.Text, Split(Split(Split(.Text, "</a>")(1), "</div>")(0), ">")(2))
.Collapse wdCollapseStart
Without this, the record following the one lacking a phone # will be skipped.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top