Cleaning data - extracting Post Codes

  • Thread starter SimonG via AccessMonster.com
  • Start date
S

SimonG via AccessMonster.com

I have a set of data which I am trying to tidy. The address table has just
been filled in from the top line down, so the address structure is not common
between records. As part of tiding this I would like to able to extract Post
Codes to their own field.

Currently Post Codes are typically found within AddressLine4 or AddressLine5,
but may also share the field with town or county,
eg
AB12 34CD
London EF56 7GH
West Midlands J9 12KL

Has any one achieved anything similar, or any ideas how this could be done?

Many thanks,
Simon
 
R

ruralguy via AccessMonster.com

Is the Post Code *always* on the right of the field and does it always
contain an embedded space? Did you also want to remove it from the field
where it currently resides?
 
S

SimonG via AccessMonster.com

Yes, its the last thing in the address (well I cannot find any fields with
anything following the post code).

A space in the post code is normal (again cannot guarantee for all, but very
few, if any, break this).

Yes, will be removing it from source fields to its own dedicated field
(remainder of source field will be left as is).
Is the Post Code *always* on the right of the field and does it always
contain an embedded space? Did you also want to remove it from the field
where it currently resides?
I have a set of data which I am trying to tidy. The address table has just
been filled in from the top line down, so the address structure is not common
[quoted text clipped - 12 lines]
Many thanks,
Simon
 
R

ruralguy via AccessMonster.com

I can figure our how to get the PostCode but how do you figure out which
field it is in?
Yes, its the last thing in the address (well I cannot find any fields with
anything following the post code).

A space in the post code is normal (again cannot guarantee for all, but very
few, if any, break this).

Yes, will be removing it from source fields to its own dedicated field
(remainder of source field will be left as is).
Is the Post Code *always* on the right of the field and does it always
contain an embedded space? Did you also want to remove it from the field
[quoted text clipped - 5 lines]
 
S

SimonG via AccessMonster.com

Simple way would be to use some sort of nested if

new post code = if postcode-in(address line 5) then
extract post code from address line 5
else
if postcode-in(address line 4) then
extract post code from address line 4
else
""
end if
end if

ok, it not pretty, but it would work; maybe possible to make it nicer
depending on how your routine works.

Regards,
Simon
I can figure our how to get the PostCode but how do you figure out which
field it is in?
Yes, its the last thing in the address (well I cannot find any fields with
anything following the post code).
[quoted text clipped - 10 lines]
 
R

ruralguy via AccessMonster.com

How do you know it is a post code?
Simple way would be to use some sort of nested if

new post code = if postcode-in(address line 5) then
extract post code from address line 5
else
if postcode-in(address line 4) then
extract post code from address line 4
else
""
end if
end if

ok, it not pretty, but it would work; maybe possible to make it nicer
depending on how your routine works.

Regards,
Simon
I can figure our how to get the PostCode but how do you figure out which
field it is in?
[quoted text clipped - 4 lines]
 
R

ruralguy via AccessMonster.com

We seem to be going around in circles. I'll work on some code and post back.
 
S

SimonG via AccessMonster.com

Sorry, I understood you could extract the post code from a given field, but
were unsure which field to extract from. Hence I thought a simple check by
trying to extract from the most likely field, which if it failed then tried
extracting from the next most likely.
 
R

ruralguy via AccessMonster.com

Well I found a site that describes the format of a UK PostCode but it does
not seem to agree with your examples.

http://en.wikipedia.org/wiki/UK_postcodes#Format

It describes the format as *generally*

A9 9AA
A99 9AA
A9A 9AA
AA9 9AA
AA99 9AA
AA9A 9AA

I can easily separate a string at the 2nd to the last space in the string,
but how do we know we have got a PostCode?

Sorry, I understood you could extract the post code from a given field, but
were unsure which field to extract from. Hence I thought a simple check by
trying to extract from the most likely field, which if it failed then tried
extracting from the next most likely.
We seem to be going around in circles. I'll work on some code and post back.
[quoted text clipped - 3 lines]
 
J

John Spencer

If the Trimmed value you find is like

"[a-z]*[0-9]* [0-9][a-z][a-z]"
then I would guess it is probably going to be a post code.

'====================================================
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
'====================================================

Well I found a site that describes the format of a UK PostCode but it does
not seem to agree with your examples.

http://en.wikipedia.org/wiki/UK_postcodes#Format

It describes the format as *generally*

A9 9AA
A99 9AA
A9A 9AA
AA9 9AA
AA99 9AA
AA9A 9AA

I can easily separate a string at the 2nd to the last space in the string,
but how do we know we have got a PostCode?

Sorry, I understood you could extract the post code from a given field, but
were unsure which field to extract from. Hence I thought a simple check by
trying to extract from the most likely field, which if it failed then tried
extracting from the next most likely.
We seem to be going around in circles. I'll work on some code and post back.
[quoted text clipped - 3 lines]
I can figure our how to get the PostCode ...
 
L

Linq Adams via AccessMonster.com

"Yes, its the last thing in the address (well I cannot find any fields with
anything following the post code)."

If this is true, wouldn't it make sense to check to see if AddressLine5 is
null, if so check AddressLine4, and so forth until you find an occupied
AddressLine textbox, then do the extraction routine?

If Not IsNull(Me.AddressLine5) Then
'Code to extract postal code goes here
Exit Sub
ElseIf Not IsNull(Me.AddressLine4) Then
'Code to extract postal code goes here
Exit Sub
ElseIf Not IsNull(Me.AddressLine3) Then
'Code to extract postal code goes here
Exit Sub
 
R

raskew via AccessMonster.com

R

ruralguy via AccessMonster.com

Here's a couple of functions for a standard module that can pick out the
PostCode
and I'll leave moving the strings to their respective fields to you.

Public Function StripPostCode(InField As String) As String

'-- Return InField as two separate strings separated by "|"
'London EF56 7GH - "London|EF56 7GH"
'West Midlands J9 12KL - "West Midlands|J9 12KL"
'London - ""

'The format of UK PostCodes are "generally":
'A9 9AA
'A99 9AA
'A9A 9AA
'AA9 9AA
'AA99 9AA
'AA9A 9AA
'...and this function does not currently check for any type of valid PostCode.


'-- Declare variables so the code is easier to read
Dim PostCodeStart As Long
Dim PostCodeLength As Long
'-- Check for at least 2 spaces in the InField
If InStr(InStr(1, InField, " ") + 1, InField, " ") > 0 Then
PostCodeStart = LocateSpace(InField)
PostCodeLength = Len(InField) - PostCodeStart
StripPostCode = Left(InField, PostCodeStart - 1) & _
"|" & Right(InField, PostCodeLength)
Else
StripPostCode = ""
End If

End Function

Function LocateSpace(InString As String) As Long
'--Return the location of the 2nd space from the end of the string

LocateSpace = InStrRev(InString, " ", InStrRev(InString, " ") - 1)

End Function

Sorry, I understood you could extract the post code from a given field, but
were unsure which field to extract from. Hence I thought a simple check by
trying to extract from the most likely field, which if it failed then tried
extracting from the next most likely.
We seem to be going around in circles. I'll work on some code and post back.
[quoted text clipped - 3 lines]
 
S

SimonG via AccessMonster.com

Many thanks Ruralguy and all,
John Spencer:
If the Trimmed value you find is like

  "[a-z]*[0-9]* [0-9][a-z][a-z]"
then I would guess it is probably going to be a post code.

Thanks John,
This is the sort of thing I was originally thinking of – unfortunately I'm
unsure how to use it.

My final solution was based on Ruralguy's code. This was modified slightly
to actually return the PostCode, and also to cope with fields which held just
a post code.
For completeness here is the code I used:

Public Function StripPostCode(InField As String) As String

'-- Returns UK Post Code, where Post Code is last item in field;
'-- if no Post Code found returns “â€

'The format of UK PostCodes are "generally":
'A9 9AA
'A99 9AA
'A9A 9AA
'AA9 9AA
'AA99 9AA
'AA9A 9AA
'...and this function does not currently check for any type of valid PostCode.


'-- Declare variables so the code is easier to read
Dim PostCodeStart As Long
Dim PostCodeLength As Long

'-- simple Post Code check first
If (Len(InField) > 5 And Len(InField) < 9) And (InStr(1, InField, " ") > 2
And InStr(1, InField, " ") < 6) Then
StripPostCode = InField 'InField only contains PostCode
Else
'-- Check for at least 2 spaces in the InField
If InStr(InStr(1, InField, " ") + 1, InField, " ") > 0 Then
PostCodeStart = LocateSpace(InField)
PostCodeLength = Len(InField) - PostCodeStart
If PostCodeLength < 6 Or PostCodeLength > 8 Then
'InField unlikely to contain post code
StripPostCode = ""
Else
StripPostCode = Mid(InField, PostCodeStart + 1, PostCodeLength)
End If
Else
StripPostCode = ""
End If
End If

End Function

Function LocateSpace(InString As String) As Long
'--Return the location of the 2nd space from the end of the string

LocateSpace = InStrRev(InString, " ", InStrRev(InString, " ") - 1)

End Function


Although the above function worked nicely for me (thanks rg), I could not get
it to cope with being passed a null value. I got around this by including an
nz in the calling query:
PostCode: StripPostCode(nz([AddressLine5]))

Thanks again to all who replied,
Regards,
Simon
 
R

ruralguy via AccessMonster.com

Excellent! Glad you got it sorted.
Many thanks Ruralguy and all,
John Spencer:
If the Trimmed value you find is like

  "[a-z]*[0-9]* [0-9][a-z][a-z]"
then I would guess it is probably going to be a post code.

Thanks John,
This is the sort of thing I was originally thinking of – unfortunately I'm
unsure how to use it.

My final solution was based on Ruralguy's code. This was modified slightly
to actually return the PostCode, and also to cope with fields which held just
a post code.
For completeness here is the code I used:

Public Function StripPostCode(InField As String) As String

'-- Returns UK Post Code, where Post Code is last item in field;
'-- if no Post Code found returns “â€

'The format of UK PostCodes are "generally":
'A9 9AA
'A99 9AA
'A9A 9AA
'AA9 9AA
'AA99 9AA
'AA9A 9AA
'...and this function does not currently check for any type of valid PostCode.

'-- Declare variables so the code is easier to read
Dim PostCodeStart As Long
Dim PostCodeLength As Long

'-- simple Post Code check first
If (Len(InField) > 5 And Len(InField) < 9) And (InStr(1, InField, " ") > 2
And InStr(1, InField, " ") < 6) Then
StripPostCode = InField 'InField only contains PostCode
Else
'-- Check for at least 2 spaces in the InField
If InStr(InStr(1, InField, " ") + 1, InField, " ") > 0 Then
PostCodeStart = LocateSpace(InField)
PostCodeLength = Len(InField) - PostCodeStart
If PostCodeLength < 6 Or PostCodeLength > 8 Then
'InField unlikely to contain post code
StripPostCode = ""
Else
StripPostCode = Mid(InField, PostCodeStart + 1, PostCodeLength)
End If
Else
StripPostCode = ""
End If
End If

End Function

Function LocateSpace(InString As String) As Long
'--Return the location of the 2nd space from the end of the string

LocateSpace = InStrRev(InString, " ", InStrRev(InString, " ") - 1)

End Function

Although the above function worked nicely for me (thanks rg), I could not get
it to cope with being passed a null value. I got around this by including an
nz in the calling query:
PostCode: StripPostCode(nz([AddressLine5]))

Thanks again to all who replied,
Regards,
Simon
 
J

John Spencer

Public Function StripPostCode(InField As String) As String

'-- Returns UK Post Code, where Post Code is last item in field;
'-- if no Post Code found returns “â€

'The format of UK PostCodes are "generally":
'A9 9AA
'A99 9AA
'A9A 9AA
'AA9 9AA
'AA99 9AA
'AA9A 9AA
'...and this function does not currently check for any type of valid PostCode.


'-- Declare variables so the code is easier to read
Dim PostCodeStart As Long
Dim PostCodeLength As Long

'-- simple Post Code check first
If (Len(InField) > 5 And Len(InField) < 9) And (InStr(1, InField, " ") > 2
And InStr(1, InField, " ") < 6) Then
StripPostCode = InField 'InField only contains PostCode
Else
'-- Check for at least 2 spaces in the InField
If InStr(InStr(1, InField, " ") + 1, InField, " ") > 0 Then
PostCodeStart = LocateSpace(InField)
PostCodeLength = Len(InField) - PostCodeStart
If PostCodeLength < 6 Or PostCodeLength > 8 Then
'InField unlikely to contain post code
StripPostCode = ""
'Test format of possible Postcode
'==================================================================
Elseif Trim(Mid(InField, PostCodeStart + 1, PostCodeLength)) _
LIKE "[a-z]*[0-9]* [0-9][a-z][a-z]" THEN
'==================================================================

StripPostCode = Mid(InField, PostCodeStart + 1, PostCodeLength)
End If
Else
StripPostCode = ""
End If
End If

End Function

Function LocateSpace(InString As String) As Long
'--Return the location of the 2nd space from the end of the string

LocateSpace = InStrRev(InString, " ", InStrRev(InString, " ") - 1)

End Function


John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
Many thanks Ruralguy and all,
John Spencer:
If the Trimmed value you find is like

"[a-z]*[0-9]* [0-9][a-z][a-z]"
then I would guess it is probably going to be a post code.

Thanks John,
This is the sort of thing I was originally thinking of – unfortunately I'm
unsure how to use it.
 
S

SimonG via AccessMonster.com

Thanks for the example John,

The *'s in your like string, mean any character, or no character – hence A9
is acceptable.
I hadn't appreciated * could also represent the absence of a character.

Many thanks,
Simon
 
Top