How to extract specific text from a string of characters

R

rushdhih

In order to meet client reporting requirements, we download data from our ERP
system into Excel. Our ERP system has limitations and to overcome this we
have codified client budget data. Every transaction description field entered
into the ERP system includes a 6 digit code (Eg:A52101). The problem I am
encountering in the downloaded data is there are system generated text and
other spurious text that creeps in during the conversion process. The code
appears sometime in the left, right or center of the transaction description
together with other text.

I need a formula that will check through the transaction description and
extract only the first occurence of the code.

The pattern of the code - Alpha + 5 Numeric character (no spaces)


Thank you in advance for your help
 
S

Shane Devenshire

Why don't you show us a few examples, technically this could be

A52101
A52101A52101
zzzA52101
dfsdsafdsa A52101 amb
asA52101bcdA52101
B123456787vA52101dsfdsf

How can you tell all these apart? How can you tell which 6 digit code is
really a code as in the last example?
 
R

rushdhih

here are some actual data from which the code needs to be extracted. As you
can see the code appears sometimes in the begining preceded by a semicolon,
in the middle or towards the end.

I have marked as 1,2,3 & 4 for clarity to show the possibilities

1)AP Accruals; A52101Battery 6 cell; A52101 Battery 6 cell; LKOC_RH_Toshiba
Re-Inv#3004461
2); ; A52101 Internet usage charges on Dialog USB Modem for December 2008;
3)AP Accruals; A52101 Internet usage charges; A52101 Internet usage charges
on Dialog USB Modem for December 2008; LKOC_RH_Dialog Internet-Dec 08
4)2008 PO ACCRUAL REVERSAL; ; A52101 PCB;

Thank you for any help you can give.
 
G

Gary''s Student

Consider the following User Defined Function:

Function lookit(r As Range) As String
Dim s As String
s = r.Value
Dim outputt(5) As String
iPlace = 0
For i = 1 To Len(s)
ch = Mid(s, i, 1)
If iPlace = 0 Then
If caps(ch) Then
outputt(iPlace) = ch
iPlace = iPlace + 1
End If
Else
If digit(ch) Then
outputt(iPlace) = ch
iPlace = iPlace + 1
If iPlace = 6 Then Exit For
Else
iPlace = 0
End If
End If
Next
lookit = Join(outputt, "")
End Function


Function digit(v As Variant) As Boolean
v2 = Array("1", "2", "3", "4", "5", "6", "7", "8", "9", "0")
digit = False
For i = LBound(v2) To UBound(v2)
If v = v2(i) Then
digit = True
Exit Function
End If
Next
End Function


Function caps(v As Variant) As Boolean
Strn = "A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z"
v2 = Split(Strn, ",")
caps = False
For i = LBound(v2) To UBound(v2)
If v = v2(i) Then
caps = True
Exit Function
End If
Next
End Function

So, for example, if A1 contained:
87326428A12345lksfasjk
=lookit(A1) would display:

A12345
 
R

Ron Rosenfeld

In order to meet client reporting requirements, we download data from our ERP
system into Excel. Our ERP system has limitations and to overcome this we
have codified client budget data. Every transaction description field entered
into the ERP system includes a 6 digit code (Eg:A52101). The problem I am
encountering in the downloaded data is there are system generated text and
other spurious text that creeps in during the conversion process. The code
appears sometime in the left, right or center of the transaction description
together with other text.

I need a formula that will check through the transaction description and
extract only the first occurence of the code.

The pattern of the code - Alpha + 5 Numeric character (no spaces)


Thank you in advance for your help

This UDF will do what you describe:

====================
Option Explicit
Function GetCode(sTxt As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "[A-Z]\d{5}"
If re.test(sTxt) = True Then
Set mc = re.Execute(sTxt)
GetCode = mc(0).Value
End If
End Function
========================

To enter it, <alt-F11> opens the VBEditor. Ensure your project is highlighted
in the project explorer window, then Insert/Module and paste the code above
into the window that opens.

Then enter the function =GetCode(cell_ref) in some cell.

As written, this will return the first substring that consists of a capitalized
letter followed by at least 5 digits.

One of the issues that you may run into is how to delineate the code. From
your examples, it seems to be the case that the code could be followed
immediately by a capital letter (e.g. in your example 1: ...A52101Battery 6
cell;...)

Could it also be followed immediately by a number?
Will the letter be preceded by anything other than a <space>.

The answers to these two questions could help construct a more robust pattern
to use in the above UDF.
--ron
 
R

Ron Rosenfeld

In order to meet client reporting requirements, we download data from our ERP
system into Excel. Our ERP system has limitations and to overcome this we
have codified client budget data. Every transaction description field entered
into the ERP system includes a 6 digit code (Eg:A52101). The problem I am
encountering in the downloaded data is there are system generated text and
other spurious text that creeps in during the conversion process. The code
appears sometime in the left, right or center of the transaction description
together with other text.

I need a formula that will check through the transaction description and
extract only the first occurence of the code.

The pattern of the code - Alpha + 5 Numeric character (no spaces)


Thank you in advance for your help

Here's another UDF that does not use Regular Expressions:

========================
Option Explicit
Option Compare Binary
Function GetCode(sTxt As String) As String
Const sPattern As String = "[A-Z]#####"
Dim i As Long
For i = 1 To Len(sTxt) - 6
If Mid(sTxt, i, 6) Like sPattern Then
GetCode = Mid(sTxt, i, 6)
Exit Function
End If
Next i
End Function
=================================
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top