European Address Data

M

Mark S

Hi and thanks in advance for any help.

I have 700,000 European Companies in a csv file.

Unfortunately the COUNTRY is separate but all other information (street,
zip/postal, city, region/state) is in one ADDRESS column.

I would like to be able to "find and copy/move" info to separate
street/city/zip columns as best as possible

For example for French or German addresses the zip code is five digits so I
want to search the the ability to move '#####' ("five number characters in a
row")strings to a ZIP column

Ideally this would also allow the ability to also take everything AFTER the
"five number characters in a row" string (i.e. Paris and/or Chalonnes sur
Loire) to a CITY column given the European ordering standard of
address-zip-city)

Suggestions???
 
P

Peo Sjoblom

You almost have an impossible task, for instance in your example you can see
how hard it would be given that you have 2 cities in France, one use one
word the other 3 words, how would you know where one city ends? What about
UK postal codes or is this just for Germany and France?
 
M

Mark S

Yes, I do realize (and have been working at it for a couple of days
already). FWIW I've had "some success" replacing some spaces with _ then
using the spaces to separate into different coloumns but obviously very time
consuming -- and british codes are the worst, what with not only a space
between but variable total length of characters . . . so as before, I'm
open to suggestions!
 
N

NickHK

Mark,
You mean you have addresses like these 2 (bad) examples, i.e. no delimeter
between the various parts:

2 Rue de boulogne Chalonnes sur Loire 28544, France
17 Cul de Sac Taunton Somerset TA2 7HF, england

There are list of country names (or create your own limited ones of
interest) that you can search through.
Once you have the country name, you could search a list of that country's
postal/area/zip codes.
May be also a list of country's counties/departements and towns/cities.
Split out the relevant part each time a section is resolved.
Road names may well be a problem, but that may be all you are left with
after you have removed the other information.

Such list are commercially available. Some free lists are also available,
but quality may be patchy.
You could try throwing the address at some suitable web site that deals in
locating/mapping.

Not easy all round, but doing 700K manually would also take a while.

NickHK
 
T

Tim Williams

This is a job for regular expressions. Try this out:

'***********************
Sub ExtractZipAndCity()
Dim regEx As Object
Dim m
Dim i As Integer
Dim rngText As Range
Dim s As String

Set regEx = CreateObject("vbscript.regexp")
regEx.Pattern = "\b(\d{5})\b"
regEx.Global = True
regEx.IgnoreCase = True

Set rngText = ActiveSheet.Range("A1")

Do While rngText.Value <> ""

s = rngText.Value
Set m = regEx.Execute(s)
If Not m Is Nothing Then
If m.Count = 1 Then
rngText.Offset(0, 1).Value = m(0)
rngText.Offset(0, 2).Value = Trim(Split(s, m(0))(1))
End If
End If

Set rngText = rngText.Offset(1, 0)
Loop

End Sub
'***************************

However, with 700k records you've little hope of being able to check the
results
in any meaningful way, so your final version will need to be a lot more
robust.

Tim
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top