Help! Splitting Text And Numbers In The Same Cell

W

William

I need the formula for splitting a string of data with both numbers and
text. The sample is W435T786M32J45 I want to split the data starting at
each of the letters. i.e (CELL B2) "W435" (CELL B3) "T786" (CELL B4)
"M32" (CELL B5) "J45". All the numbers are not the same length so I
have to allow for that. Please Help!!!! THANKS TO ALL
 
P

Peo Sjoblom

If there is no "rule" to it as if the numbers of characters can differ where
they are etc..
Then it is basically impossible using functions
However if there are only 3 letters and the rest are numbers (regardless of
how many) it is possible
Assuming that W435T786M32J45 is in B1, in B2 use

=LEFT(B1,MATCH(FALSE,ISNUMBER(--MID(B1,ROW(INDIRECT("2:"&LEN(B1))),1)),0))

in B3 use

=LEFT(SUBSTITUTE(B1,B2,""),MATCH(FALSE,ISNUMBER(--MID(SUBSTITUTE(B1,B2,""),R
OW(INDIRECT("2:"&LEN(SUBSTITUTE(B1,B2,"")))),1)),0))

in B4 use

=LEFT(SUBSTITUTE(B1,B2&B3,""),MATCH(FALSE,ISNUMBER(--MID(SUBSTITUTE(B1,B2&B3
,""),ROW(INDIRECT("2:"&LEN(SUBSTITUTE(B1,B2&B3,"")))),1)),0))


all the above formulas entered with ctrl + shift & enter

finally in B5 use

=SUBSTITUTE(B1,B2&B3&B4,"")

entered normally
 
P

Peo Sjoblom

Sorry, meant 4 letters [are only 3 letters ]..

--

Regards,

Peo Sjoblom

Peo Sjoblom said:
If there is no "rule" to it as if the numbers of characters can differ where
they are etc..
Then it is basically impossible using functions
However if there are only 3 letters and the rest are numbers (regardless of
how many) it is possible
Assuming that W435T786M32J45 is in B1, in B2 use

=LEFT(B1,MATCH(FALSE,ISNUMBER(--MID(B1,ROW(INDIRECT("2:"&LEN(B1))),1)),0))

in B3 use

=LEFT(SUBSTITUTE(B1,B2,""),MATCH(FALSE,ISNUMBER(--MID(SUBSTITUTE(B1,B2,""),R
OW(INDIRECT("2:"&LEN(SUBSTITUTE(B1,B2,"")))),1)),0))

in B4 use

=LEFT(SUBSTITUTE(B1,B2&B3,""),MATCH(FALSE,ISNUMBER(--MID(SUBSTITUTE(B1,B2&B3
,""),ROW(INDIRECT("2:"&LEN(SUBSTITUTE(B1,B2&B3,"")))),1)),0))


all the above formulas entered with ctrl + shift & enter

finally in B5 use

=SUBSTITUTE(B1,B2&B3&B4,"")

entered normally
 
R

Ron Rosenfeld

I need the formula for splitting a string of data with both numbers and
text. The sample is W435T786M32J45 I want to split the data starting at
each of the letters. i.e (CELL B2) "W435" (CELL B3) "T786" (CELL B4)
"M32" (CELL B5) "J45". All the numbers are not the same length so I
have to allow for that. Please Help!!!! THANKS TO ALL

Assuming your original string is in B1, the following **array-entered**
formulas should do what you want:

B2: =MID(B1,1,MATCH(TRUE,ISERROR(-MID(B1,ROW(INDIRECT("2:255")),1)),0))

B3: =MID(RIGHT($B$1,LEN($B$1)-SUM(LEN($B$2:B2))),1,MATCH(TRUE,
ISERROR(-MID(RIGHT($B$1,LEN($B$1)-SUM(LEN($B$2:B2))),
ROW(INDIRECT("2:255")),1)),0))

Select B3 and copy/drag the formula down as far as might be needed. If you
drag further than needed, the formula should return a null length string in
those cells.

To **array-enter** a formula, after typing/pasting in the formula, hold down
<ctrl><shift> while hitting <enter>. XL will place braces around the formula.


--ron
 
H

Harlan Grove

Peo Sjoblom said:
If there is no "rule" to it as if the numbers of characters can differ where
they are etc..
Then it is basically impossible using functions
....

You should have known I'd respond. As long as I can use the defined name Seq
referring to =ROW(INDIRECT("1:1024")) I can extract the substring from the
beginning of the string to the end of the leftmost substring of numerals
using the array formula

=MID(A1,1,SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),1)-1)

and I can extract the n_th substring beginning after the (n-1)_th substring
of numerals and ending with the n_th substring of numerals using the array
formula

=MID(A1,SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n-1),
SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n)
-SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n-1))

where the string being chopped up is in cell A1.

That said, this sort of thing is MUCH easier to do using a udf making use of
VBScript regular expressions.
 
P

Peo Sjoblom

Harlan Grove said:
...

You should have known I'd respond. As long as I can use the defined name Seq
referring to =ROW(INDIRECT("1:1024")) I can extract the substring from the
beginning of the string to the end of the leftmost substring of numerals
using the array formula

=MID(A1,1,SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),1)-1)

and I can extract the n_th substring beginning after the (n-1)_th substring
of numerals and ending with the n_th substring of numerals using the array
formula

=MID(A1,SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n-1),
SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n)
-SMALL(IF(ISNUMBER(-MID(" "&A1,Seq,1))
-ISNUMBER(-MID(A1,Seq,1))=1,Seq),n-1))

where the string being chopped up is in cell A1.

That said, this sort of thing is MUCH easier to do using a udf making use of
VBScript regular expressions.

As long as I learn something from your response it is fine
and I certainly do whether it is irony/sarcasm or excel
does not matter <g>
 
R

Ron Rosenfeld

If there is no "rule" to it as if the numbers of characters can differ where
they are etc..
Then it is basically impossible using functions


Not so. I think my array-formula solution should work.


--ron
 
H

Harlan Grove

Peo Sjoblom said:
....
Try this
123456789abcdefg123456789abcdefghijklmnopqrstuvwxyz111111111111111111111111 1
1111111111

Try it with my formulas. They seem to work.
 
P

Peo Sjoblom

Harlan Grove said:
Try it with my formulas. They seem to work.


Yes. I changed the string a few times and tested your second formula and it
worked nicely
for all that I came up with..
 
R

Ron Rosenfeld

Try this

123456789abcdefg123456789abcdefghijklmnopqrstuvwxyz1111111111111111111111111
1111111111

Seems to work fine. But perhaps you made a different assumption than I as to
how the string should be parsed.

When he stated that he wanted "to split the data starting at each of the
letters", I assumed that each substring would have a maximum of one letter.

So here is what I get:

123456789
a
b
c
d
e
f
g123456789
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z11111111111111111111111111111111111

--ron
 
P

Peo Sjoblom

Ron Rosenfeld said:
1

Seems to work fine. But perhaps you made a different assumption than I as to
how the string should be parsed.

When he stated that he wanted "to split the data starting at each of the
letters", I assumed that each substring would have a maximum of one
letter.


I am not disputing that, but you answered my post where I said:
"If there is no "rule" to it as if the numbers of characters can differ
where
they are etc..
Then it is basically impossible using functions"

I assumed the same as you but the "disclaimer" was meant
to explain that it is very difficult to parse strings
and given that it is almost impossible to adapt with a formula
unless there is some sort of pattern.
 
R

Ron Rosenfeld

I assumed the same as you but the "disclaimer" was meant
to explain that it is very difficult to parse strings
and given that it is almost impossible to adapt with a formula
unless there is some sort of pattern.

OK. I guess I thought your statement referred to any rules *except* what the
OP originally posted with regard to "starting at each of the letters."




--ron
 
H

Harlan Grove

Peo Sjoblom said:
to explain that it is very difficult to parse strings
and given that it is almost impossible to adapt with a formula
unless there is some sort of pattern.
....

While text parsing by formula isn't something Excel is ideally suited to
perform, it's not almost impossible. The trick is finding expressions that
identify transitions as economically as possible. Also, there's need for
some basic, common building blocks such as an array of sequential integers
to feed to MID. There could also be defined names for numerals, letters,
token characters (letters, numerals and underscores), and various others.
With these, fairly complicated parsing may be perfomred within the nested
function call and maximum formula length limitations.
 
K

kapahia

I think there is a very easy way to resolve. Go to data text to colum
and split all alpha numeric numbers into different colums. Then do
paste special and transpose. Then run a Data sort. Nos and labels ar
seperate!. A simple Macros recording can help
 
H

Harlan Grove

If you install the functions at
http://www.tmehta.com/regexp/add_code.htm
you can then select (at least) as many cells as needed to contain the
result and array-enter =RegExpFind(A1,"[a-zA-Z]\d+")
...

Yeah, but pre-selection of an appropriately sized range is a PITA. Using my own
Subst udf wrapper around the VBScript RegExp object, I could put the formula for
the first match in cell A2 as

=subst(A$1,"([A-Za-z]*\d*){"&SUM(ROW()-ROW($A$2))&"}([A-Za-z]*\d*).*","$2")

and fill A2 down as far as needed. Note that I'm also allowing for multiple
letters anywhere, no letters at all at the beginning, and no numerals at the
end.
 
W

William

SORRY! To the people smarter than I am

I just wanted to give you more information; The letters come i
patterns. The sample (CELL B1) W435T786M32J45 I want to split this dat
string with the letter at the start of each column. i.e (CELL B2
"W435" (CELL B3) "T786" (CELL B4) "M32" (CELL B5) "J45".

(CELL C1) R23.4E89W34.7T1218M2378J0.0 (CELL C2) "R23.4" (CELL C3
"E89"(CELL C3) "W34.7" (CELL C4) "T1218" (CELL C5)"M2378" (CELL C6
"J0.0" The letters come in the same pattern in each string I am tryin
to split. There may be more letters added to the ends of the string
but they come in a pattern also, and do not change the pattern of th
letters in the middle of the string . The T still comes after the W
The W is always after the E with the numbers in between the letters.

There is only a single letter anywhere in the string no double letter
to contend with. But the numbers are of different digits in length
Some with 1 digit and some with 10 and anywhere between.
Thanks so much for your help!!!!!!
 
H

Harlan Grove

William > said:
I just wanted to give you more information; The letters come in
patterns. The sample (CELL B1) W435T786M32J45 I want to split this data
string with the letter at the start of each column. i.e (CELL B2)
"W435" (CELL B3) "T786" (CELL B4) "M32" (CELL B5) "J45".

(CELL C1) R23.4E89W34.7T1218M2378J0.0 (CELL C2) "R23.4" (CELL C3)
"E89"(CELL C3) "W34.7" (CELL C4) "T1218" (CELL C5)"M2378" (CELL C6)
"J0.0" The letters come in the same pattern in each string I am trying
to split. There may be more letters added to the ends of the strings
but they come in a pattern also, and do not change the pattern of the
letters in the middle of the string . The T still comes after the W.
The W is always after the E with the numbers in between the letters.
....

So only the single letters R/E/W/T/M/J mark the beginning of each section
and some may be missing. There may be an optional set of ending characters
after the last of these letters present as well as their following numbers.
FTHOI, I'll assume those letters are "XYZ". Good. This is much easier to
handle. For the first one in cell B1, enter the following formula in B2.

B2:
=MID(B$1,
SMALL(FIND({"R","E","W","T","M","J","XYZ"},B$1&"XYZREWTMJ"),ROW()-1),
SMALL(FIND({"R","E","W","T","M","J","XYZ"},B$1&"XYZREWTMJ"),ROW())
-SMALL(FIND({"R","E","W","T","M","J","XYZ"},B$1&"XYZREWTMJ"),ROW()-1))

Fill B2 down as far as needed (into B3:B5). Fill B2 right into C2, then fill
C2 down as far as needed (into C3:C7).
There is only a single letter anywhere in the string no double letters
to contend with. But the numbers are of different digits in length.
Some with 1 digit and some with 10 and anywhere between.

We don't always answer just your question. Thus the digression into multiple
letters. This would still be much easier to do using regular expressions,
for which there are now quite a few postings in this newsgroup's archives in
Google Groups. This would just change the match pattern to
"[REWTMJ][0-9.]+".
 
T

Tushar Mehta

Maybe, you can show us how your method would work with the OP's
datasets (note, plural!)

--
Regards,

Tushar Mehta, MS MVP -- Excel
www.tushar-mehta.com
Excel, PowerPoint, and VBA add-ins, tutorials
Custom MS Office productivity solutions
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top