extract word function

D

Dinesh

Is their a function to extract first 3 words from a text string?

Thanks.

Dinesh Shah
 
M

Max

One way ..

Assuming space as text separators, with source data in A1 down

In B1:
=IF(ISERROR(LEFT(A1,SEARCH(" ",A1,SEARCH(" ",A1,SEARCH("
",A1)+1)+1)-1)),A1,LEFT(A1,SEARCH(" ",A1,SEARCH(" ",A1,SEARCH("
",A1)+1)+1)-1))
Copy down

If there's less than 3 words in the string, then the above simply returns
col A as-is
 
P

PapaDos

=LEFT( TRIM( A1 ), IF( ISERR( SEARCH( " ", SUBSTITUTE( TRIM( A1 ), " ", "
", 3 ) ) ), LEN( A1 ), SEARCH( " ", SUBSTITUTE( TRIM( A1 ), " ", " ", 3 ) )
- 1 ) )
 
P

PapaDos

If your text is coming from web pages, this one is a bit better:

=LEFT( TRIM( SUBSTITUTE( CLEAN( A1 ), CHAR( 160 ), " " ) ), IF( ISERR(
SEARCH( " ", SUBSTITUTE( TRIM( SUBSTITUTE( CLEAN( A1 ), CHAR( 160 ), " " )
), " ", " ", 3 ) ) ), LEN( A1 ), SEARCH( " ", SUBSTITUTE( TRIM( SUBSTITUTE(
CLEAN( A1 ), CHAR( 160 ), " " ) ), " ", " ", 3 ) ) - 1 ) )
 
R

Roger Govier

Hi Luc

An excellent solution, but because of text wrapping in the posting,
anybody copying and pasting the formula may get a problem.

In both of the Substitute terms,
it is a single space " " that is substituted by a double space " "

When I copied and pasted, it somehow dropped one of the double spaces in
the first term, and of course this returned the whole of the text from
cell A1.
 
E

Epinn

Luc, brilliant! I can change the number of words I want easily by just changing the instance.

Roger, you said it. I played with the formula (first one) before you posted. I copied and pasted it and there was no problem whatsoever i.e. I got the correct result. Just because it was so smooth, I didn't even know we were supposed to substitute one space with two spaces. At this point, I had not figured out the logic yet. I only found out about the double spaces when I tried to remove some spaces in the formula. I deleted one space between double quotes thinking that it was an extra space and I got #VALUE error. When I finally sorted out everything (including the logic), I decided to make the double spaces more visible by changing it to " *". Of course, I didn't get the right result. It stopped after the first word. Please confirm that " *" is treated as wildcard. Now, I am going to use " ^" instead of double spaces purely for visibility reasons. Does anyone see a problem with my change?

Appreciate guidance.

Epinn

Hi Luc

An excellent solution, but because of text wrapping in the posting,
anybody copying and pasting the formula may get a problem.

In both of the Substitute terms,
it is a single space " " that is substituted by a double space " "

When I copied and pasted, it somehow dropped one of the double spaces in
the first term, and of course this returned the whole of the text from
cell A1.
 
P

PapaDos

Thanks, Roger.
I should have thought of that problem.

I used the double space because it is a combinatioin that is sure not to
exist in the searched string (TRIMmed). I could have used CHAR(160) but,
being lazy I used the "shortest to type" route...
;-]
 
P

PapaDos

Using " ^" is not guaranteed to work as expected.

Since we don't know which characters could be part of the string, we need to
use a combination that will not be present in it.

With my first formula, only the double spaces is guaranteed to work.
With my second one (which is more robust), we have a few more choices;
CHAR(160), double spaces, one of the characters eliminated by CLEAN(), or a
combination of them...
 
R

Roger Govier

Hi Luc
being lazy I used the "shortest to type" route...
Now why am I familiar with that? <vbg>

No problem, I just didn't want any others to mistakenly think your
solution didn't work, just because of quirks we sometimes get with the
way a message gets formatted by our particular newsreader.

--
Regards

Roger Govier


PapaDos said:
Thanks, Roger.
I should have thought of that problem.

I used the double space because it is a combinatioin that is sure not
to
exist in the searched string (TRIMmed). I could have used CHAR(160)
but,
being lazy I used the "shortest to type" route...
;-]

--
Regards,
Luc.

"Festina Lente"


Roger Govier said:
Hi Luc

An excellent solution, but because of text wrapping in the posting,
anybody copying and pasting the formula may get a problem.

In both of the Substitute terms,
it is a single space " " that is substituted by a double space " "

When I copied and pasted, it somehow dropped one of the double spaces
in
the first term, and of course this returned the whole of the text
from
cell A1.
 
D

Dinesh

Thanks a lot. What criteria do I change if I want to extract 2 words or 4
words from this formula?
 
E

Epinn

As indicated in my previous post, we can easily change "3" in the formula to any number of words we want. For syntax of the SUBSTITUTE function, please refer to Excel Help text.

Epinn

Thanks a lot. What criteria do I change if I want to extract 2 words or 4
words from this formula?
 
R

Roger Govier

Hi

It is the number 3 in Luc's formula that you would need to amend, to the
number of words you wanted.
It might be easier to refer to another cell, e,g D1, and enter the
number of words required in that cell.
A single change of value here would save having to keep modifying the
formula. If B1 were left blank, then it would pick up all of the text
from A1

=LEFT( TRIM( A1 ),
IF( ISERR(
SEARCH( " ", SUBSTITUTE( TRIM( A1 ), " ", " ", $D$1 ) ) ),
LEN( A1 ),
SEARCH( " ", SUBSTITUTE( TRIM( A1 ), " ", " ", $D$1 ) )- 1 )
)
 
E

Epinn

Luc, thank you for your response. I want to apologize to all for not doing detailed testing. I shouldn't have asked ...... It was late and I wasn't thinking clearly. While I did my test on " *", I failed with " ^". I kept thinking that even if " ^" existed in the string it would still be okay because " ^" will be converted to " ^^". So, " ^" is still there to be searched. I focused on SUBSTITUTE which could be specified with instance_num and for a minute I forgot that SEARCH was a different story. It will grab the *first* instance. I used good test data today and " ^" failed miserably. On the bright side, this helps me realize that Luc's formula is more beautiful than I first thought - case in point TRIM and double spaces; TRIM will remove extra spaces even *between* words and not just the leading and trailing spaces of the string.

Thank you for an interesting question and a brilliant answer.

Epinn

Using " ^" is not guaranteed to work as expected.

Since we don't know which characters could be part of the string, we need to
use a combination that will not be present in it.

With my first formula, only the double spaces is guaranteed to work.
With my second one (which is more robust), we have a few more choices;
CHAR(160), double spaces, one of the characters eliminated by CLEAN(), or a
combination of them...
 
M

Max

Dinesh,
I'm not sure to whom your "diplomatic" feedback is addressed <g>.
Let me know if I'm one of the intended recipients. Thanks.
 
Top