parse some text from an inconsistent text field

T

tstew

Hello,

I need to construct an update query using some fairly bad/dirty data (at
least by my standards). There is one text field that has a lot of data with
an inconsistent number of spaces, commas, characters. It's actually an
address, poorly and inconsistently formatted. The one constant is that there
is a character sequence " CA " preceding the 5 characters I want. It is the
"zip code" data. Sometimes it is missing completely (1-3% of the time),
sometimes it is in zip+4 format (15% of the time), usually it's just the zip
code. I just want the Zip Code portion for filtering in the query.

The Len function won't work, nor will the Right function. So, I'm guessing
it would be some sort of compound criteria. Any ideas?

Different scenario: Same bad data source, different field, formatted as a
text field. If the field contains information like"Postponed to 10/14/2009 @
10:30 AM - Trustees Discretion", I need to parse out just the date. That
field could also contain data like "Cancelled 10/09/2009" or "Active". Any
ideas on this one?

Thanks!!
Mark
 
K

kc-mass

Re First Scenario

MyShortZip = Mid(myField, instr(myField,"CA") + 3,5

Regards

Kevin

)
 
J

John W. Vinson

Hello,

I need to construct an update query using some fairly bad/dirty data (at
least by my standards). There is one text field that has a lot of data with
an inconsistent number of spaces, commas, characters. It's actually an
address, poorly and inconsistently formatted. The one constant is that there
is a character sequence " CA " preceding the 5 characters I want. It is the
"zip code" data. Sometimes it is missing completely (1-3% of the time),
sometimes it is in zip+4 format (15% of the time), usually it's just the zip
code. I just want the Zip Code portion for filtering in the query.

This will of course give you all sorts of trouble if you have an address like
"3124 Macaca Ave, Calistoga, CA 12345"... Can you at least count on the zip
code immediately following the CA, if it's there at all, and being numeric?
The Len function won't work, nor will the Right function. So, I'm guessing
it would be some sort of compound criteria. Any ideas?

Different scenario: Same bad data source, different field, formatted as a
text field. If the field contains information like"Postponed to 10/14/2009 @
10:30 AM - Trustees Discretion", I need to parse out just the date. That
field could also contain data like "Cancelled 10/09/2009" or "Active". Any
ideas on this one?

Thanks!!
Mark

For either of these, I suspect you'll need to construct some VBA code to step
through the string; or you'll need to use "Regular Expressions" (Bing or
Google for the term, I'm not well versed in their use). I don't see any easy
way to do it inline in a query with builtin functions.
 
T

tstew

Hi John,

Yes, about the only thing I can count on is a sequence like: comma space CA
space zip (or zip+4 or nothing (rarely). Some live examples:

957- 963 S BONNIE BCH PL, Los Angeles, CA 90023
12127-12129 1/4 MOREHOUSE STREET, El Monte, CA 91732
8650 Belford Ave.,#A202, Los Angeles, CA 90045
10957 MCLENNAN AVENUE, LOS ANGELES (GRANADA HILLS AR,, CA 91344
4649 NORTH MIDSITE AVENUE, Los Angeles, (Covina Area), CA 91722-4645

I think the solution will end up being the INSTR() function getting the
absolute next five characters immediately after the "spaceCAspace"
characters, maybe nesting a Right() function in there. This is a bit beyond
my skill level. I know where the data is, I just don't know how to get it
out. :(

The file starts as an excel download. I tried "text to columns" but the
inconsistent commas caused too many problems. Hmmm, maybe an Iff formula in
excel using the Right function could work well enough.

Any other ideas?

Thanks,
Mark
 
T

TedMi

Then modify kc-mass's expression like so:
MyShortZip = Mid(myField, instr(myField,", CA " ) + 5,5)
-TedMi
 
J

John Spencer

Use this to return everything after ", CA "
Mid([AddressField],Instr(1,[AddressField],", CA ")+5)
Use this to return the next 5 characters after that string
Mid([AddressField],Instr(1,[AddressField],", CA ")+5,5)

Second scenario would involve a custom VBA function using SPLIT to break the
field into "words" based on the space character.

Then testing each "word" to see if it could be a date (IsDate function).

If it could be interpreted as as a date then test to see if the date value is
not a time. DateValue(txtArray(elementNumber)) > #1/1/1910#.

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top