Searching for case specific data

C

Colin Foster

Hi,
I've been given aan Excel spreadsheet to "sort out"!

In simple terms it's just a series of names, addresses & postcodes...
HOWEVER, rather than use one column for address line #1, a second for Address
line #2, & others for Post Town, County & Post Code, they've been mixed up!

So, what I'd like to do is find a nice simple way of working through these
2000+ records & pull out the relevant data. As a start, I want to pull the
Postcodes into a single column & I had originally thought of looking for a
part of a text string of 3 characters using LEFT & Spaces, however, some
postcodes are 4 characters for the first part & some counties have 4
characters followed by a space (e.g. West Yorkshire).

So, I then thought... "ok... let's look for words that start with a capital
letter followed by lower case" This will give me place names, rather than
post codes, but it will help point the way... and that's where I came
unstuck!! I can use PROPER, UPPER & LOWER to convert one to the other, but is
there a way of searching for these? so that I can pull out all of the
"non-postcode" lines?

Alternatively, is there a different approach that could be suggested?

Regards
Colin
 
D

Dave O

Hi, Colin-
Greetings from Virginia, USA, near Washington DC.

This is always a tricky task, because finding a reliable pattern in
the data is difficult, as you're finding out. When you isolate the
post code out of the mix, you may find another pattern from there.

Do you mind posting some examples? If the post code always appears at
the end of the address, you might use and IF statement with a MID that
looks for a space 4 or 5 characters from the end, and isolate the post
code that way. If you post examples, for privacy reasons I suggest
substituting fake data.

Another technique that you may be able to use: you can use ASCII
character codes to look for upper and lower cases. For instance, you
can find the lower case "L" in "Colin" with this formula:
=SEARCH(CHAR(108),"Colin",1)
Character 108 is the ASCII representation of lower case L. One place
to find ASCII codes is
http://www.neurophys.wisc.edu/www/comp/docs/ascii.html

Dave O
 
C

Colin Foster

Hi Dave,
Greetings back from Hyde, England :)

Part of the problem is that, for example, they've sometimes used Column F
for Post Code (or ZipCode) & sometimes used it for the Post Town or County!
Here's some examples...
E F G
2 North Yorkshire HG4 1HJ
3 Rotherham South Yorkshire S63 7QQ
4 Beverley North Humberside HU17 8AZ
5 Sheffield South Yorkshire S10 2LN
6 South Yorkshire DN1 2DZ

So ideally, I want the "North" & "South" Yorkshires in one column,
Rotherham, Beverley & Sheffield in another & Postcodes in a third. But how to
do this programatically is, I believe, a bit of a nightmare!!

Regards
Colin
 
D

Dave O

In trying to find a pattern in your example, it looks like column G
always contains the post code, although it may be blank, and column F
should contain the county, although it might show the post code.
Column E contains the city name.

I have to display some of my American ignorance here: is a post code
always either 7 or 8 characters long? I see two examples: 3 or 4
alphanumeric characters, a space, then 3 alphanumeric chars. If that
is universally true, you can use those to your advantage. I'll see if
I can work up a solution- I know you're getting ready to leave for the
day.

Dave O
 
D

Dave O

More info: I searched Royal Mail's website for information about the
post code, and learned its format is 2 or 3 or 4 alphanumeric
characters, a space, and 3 alphanumeric characters. Then I searched a
list of counties in England, Scotland, and Wales for a " " (space) in
column n-4 (n minus 4) of each county name where n is the length of
the name: there are no counties in the list like that. So in your
example data we can search column F, which contains either the post
code or the county, for a space in column n-4.

I added two columns, H and I, to contain the county and postcode. I
used this formula in H:
=IF(MID(RIGHT(F2,4),1,1)=" ","",F2)
....and this in column I:
=IF(MID(RIGHT(F2,4),1,1)=" ",F2,G2)

The formula in H reviews the entry in column F to see if the text
string contains a space in column n-4. If it does, column H is blank;
if it does not, the formula pulls the county name from F. The formula
in I acts similarly: if F contains a space in column n-4, the formula
pulls the postcode from F; if the space is not in n-4, the formula
pulls the postcode from column G.

Does this work for the rest of your data?

Dave O
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top