Extracting numbers from a single sell

G

Georgi

Hi, there!

The situation goes like that : I have a messy cell that contains both
adresses and a couple of phone numbers per client. The problem is that
i need only the phone numbers. Is there a way that i can exctract each
one of them in a different cell - i mean if this client has 3 numbers
specified in the messy cell, they should be distributed in 3 separated
cells. Also as you can guess the adresses includes numbers, that are
not needed, and basically this can be worked around ( i guess ) by
specifying that we need numbers with more that 5 digits.

P.S. I`m not very advanced users, so I`ll apreciate detailed ( as for
dummies ) explanation.

Thanks in Advance,

Georgi
 
P

Per Jessen

Hi

You can use "TextToColumns", if address and each phone number has a
seperator like a comma.

Select the cells, goto Data menu > TextToColumns, select Delemited > Next >
select your seperator and follow instructions.

Hopes it helps.

Regards,
Per
 
R

Ron Rosenfeld

Hi, there!

The situation goes like that : I have a messy cell that contains both
adresses and a couple of phone numbers per client. The problem is that
i need only the phone numbers. Is there a way that i can exctract each
one of them in a different cell - i mean if this client has 3 numbers
specified in the messy cell, they should be distributed in 3 separated
cells. Also as you can guess the adresses includes numbers, that are
not needed, and basically this can be worked around ( i guess ) by
specifying that we need numbers with more that 5 digits.

P.S. I`m not very advanced users, so I`ll apreciate detailed ( as for
dummies ) explanation.

Thanks in Advance,

Georgi

Yes, it can be done.

Please post some examples of the cell contents that contains the variations in
telephone number formats for your region.
--ron
 
G

Georgi

First of all I would like to say Merry Christmas to all of you that
are celebrating it and thanks for the input !

@ Per : the only separators are spaces so TextToColumns, will give me
10 cells with different information, so again I need a way to select
only the ones that I need ( the cells with phone number )

@ Gary, @ Ron : here is the content of a exemplary cell :

adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 9754543 0898892158 9754543 000 e_mail:[email protected]

The numbers that we need are 9754543 ; 0898892158 ; 9754543,
everything else is needless, i.e. the phone numbers contains between 5
and 10 ( maybe in some case more ) digits without spaces or dashes in
it. Also the number of phone numbers provided can vary, i.e. the cell
can contain 1,2,3,4,5 different numbers that we should extract.

I`m really confused and I`ll be really grateful if you can help me.
Thank you in advance !

Georgi
 
R

Ron Rosenfeld

First of all I would like to say Merry Christmas to all of you that
are celebrating it and thanks for the input !

@ Per : the only separators are spaces so TextToColumns, will give me
10 cells with different information, so again I need a way to select
only the ones that I need ( the cells with phone number )

@ Gary, @ Ron : here is the content of a exemplary cell :

adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 9754543 0898892158 9754543 000 e_mail:[email protected]

The numbers that we need are 9754543 ; 0898892158 ; 9754543,
everything else is needless, i.e. the phone numbers contains between 5
and 10 ( maybe in some case more ) digits without spaces or dashes in
it. Also the number of phone numbers provided can vary, i.e. the cell
can contain 1,2,3,4,5 different numbers that we should extract.

I`m really confused and I`ll be really grateful if you can help me.
Thank you in advance !

Georgi

Using a UDF (User Defined Function) making use of Regular Expressions (see
below for details):

With your data in, for example, A1, to extract all instances of 5 or more
consecutive digits:

B1: =RegexMid($A1,"\d{5,}",COLUMNS($A:A))

Then fill right far enough to be sure to get all of the returns

To be able to use this UDF:

<alt-F11> opens the VBEditor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens:

===========================================
Option Explicit

Function RegexMid(Str As String, Pattern As String, _
Optional Index As Variant = 1, _
Optional CaseSensitive As Boolean = True, _
Optional MultiLin As Boolean = False) _
As Variant 'Variant as value may be string or array

'Index -- negative values return groups counting from end of string

Dim objRegExp As Object
Dim objMatch As Object
Dim colMatches As Object

Dim i As Long 'counter
Dim T() As String 'container for array results

' Create a regular expression object.
Set objRegExp = CreateObject("vbscript.regexp")

'Set the pattern by using the Pattern property.
objRegExp.Pattern = Pattern

' Set Case Insensitivity.
objRegExp.IgnoreCase = Not CaseSensitive

'Set global applicability.
objRegExp.Global = True

'Set multiline
objRegExp.MultiLine = MultiLin

'Test whether the String can be compared.
If (objRegExp.Test(Str) = True) Then

'Get the matches.
Set colMatches = objRegExp.Execute(Str) ' Execute search.

On Error Resume Next 'return null string if a colmatch index is non-existent
If IsArray(Index) Then
ReDim T(1 To UBound(Index))
For i = 1 To UBound(Index)
T(i) = colMatches(IIf(Index(i) > 0, Index(i) - 1, Index(i) +
colMatches.Count))
Next i
RegexMid = T()
Else
RegexMid = CStr(colMatches(IIf(Index > 0, Index - 1, Index +
colMatches.Count)))
If IsEmpty(RegexMid) Then RegexMid = ""
End If
On Error GoTo 0 'reset error handler
Else
RegexMid = ""
End If
End Function
=====================================

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/

--ron
 
R

Ron Rosenfeld

Using a UDF (User Defined Function) making use of Regular Expressions (see
below for details):

With your data in, for example, A1, to extract all instances of 5 or more
consecutive digits:

B1: =RegexMid($A1,"\d{5,}",COLUMNS($A:A))

Then fill right far enough to be sure to get all of the returns

To be able to use this UDF:

<alt-F11> opens the VBEditor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens:

===========================================
Option Explicit

Function RegexMid(Str As String, Pattern As String, _
Optional Index As Variant = 1, _
Optional CaseSensitive As Boolean = True, _
Optional MultiLin As Boolean = False) _
As Variant 'Variant as value may be string or array

'Index -- negative values return groups counting from end of string

Dim objRegExp As Object
Dim objMatch As Object
Dim colMatches As Object

Dim i As Long 'counter
Dim T() As String 'container for array results

' Create a regular expression object.
Set objRegExp = CreateObject("vbscript.regexp")

'Set the pattern by using the Pattern property.
objRegExp.Pattern = Pattern

' Set Case Insensitivity.
objRegExp.IgnoreCase = Not CaseSensitive

'Set global applicability.
objRegExp.Global = True

'Set multiline
objRegExp.MultiLine = MultiLin

'Test whether the String can be compared.
If (objRegExp.Test(Str) = True) Then

'Get the matches.
Set colMatches = objRegExp.Execute(Str) ' Execute search.

On Error Resume Next 'return null string if a colmatch index is non-existent
If IsArray(Index) Then
ReDim T(1 To UBound(Index))
For i = 1 To UBound(Index)
T(i) = colMatches(IIf(Index(i) > 0, Index(i) - 1, Index(i) +
colMatches.Count))
Next i
RegexMid = T()
Else
RegexMid = CStr(colMatches(IIf(Index > 0, Index - 1, Index +
colMatches.Count)))
If IsEmpty(RegexMid) Then RegexMid = ""
End If
On Error GoTo 0 'reset error handler
Else
RegexMid = ""
End If
End Function
=====================================

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/

--ron


By the way, should you have, within your data, strings of digits that are
greater than 5, but are not phone numbers (e.g. ... 123456abc) where you do
not want to return the 123456, then we will need to modify the regular
expression pattern: e.g.

"\b\d{5,}\b"

So, the formula would be:

B1: =RegexMid($A1,"\b\d{5,}\b",COLUMNS($A:A))

Also, with the information you have supplied, there is no way to differentiate
a 5 digit telephone number from a 5 digit address.
--ron
 
R

Ron Rosenfeld

I noticed an inappropriate line wrap in the UDF. Below should correct it:

================================
Option Explicit

Function RegexMid(Str As String, Pattern As String, _
Optional Index As Variant = 1, _
Optional CaseSensitive As Boolean = True, _
Optional MultiLin As Boolean = False) _
As Variant 'Variant as value may be string or array

'Index -- negative values return groups counting from end of string

Dim objRegExp As Object
Dim objMatch As Object
Dim colMatches As Object

Dim i As Long 'counter
Dim T() As String 'container for array results

' Create a regular expression object.
Set objRegExp = CreateObject("vbscript.regexp")

'Set the pattern by using the Pattern property.
objRegExp.Pattern = Pattern

' Set Case Insensitivity.
objRegExp.IgnoreCase = Not CaseSensitive

'Set global applicability.
objRegExp.Global = True

'Set multiline
objRegExp.MultiLine = MultiLin

'Test whether the String can be compared.
If (objRegExp.Test(Str) = True) Then

'Get the matches.
Set colMatches = objRegExp.Execute(Str) ' Execute search.

On Error Resume Next 'return null string if a colmatch index is non-existent
If IsArray(Index) Then
ReDim T(1 To UBound(Index))
For i = 1 To UBound(Index)
T(i) = colMatches(IIf(Index(i) > 0, Index(i) - 1, Index(i) + _
colMatches.Count))
Next i
RegexMid = T()
Else
RegexMid = CStr(colMatches(IIf(Index > 0, Index - 1, Index + _
colMatches.Count)))
If IsEmpty(RegexMid) Then RegexMid = ""
End If
On Error GoTo 0 'reset error handler
Else
RegexMid = ""
End If
End Function
=====================================
--ron
 
G

Georgi

I noticed an inappropriate line wrap in the UDF.  Below should correct it:

================================
Option Explicit

Function RegexMid(Str As String, Pattern As String, _
    Optional Index As Variant = 1, _
    Optional CaseSensitive As Boolean = True, _
    Optional MultiLin As Boolean = False) _
    As Variant 'Variant as value may be string or array

'Index -- negative values return groups counting from end of string

Dim objRegExp As Object
Dim objMatch As Object
Dim colMatches   As Object

Dim i As Long 'counter
Dim T() As String 'container for array results

' Create a regular expression object.
   Set objRegExp = CreateObject("vbscript.regexp")

   'Set the pattern by using the Pattern property.
   objRegExp.Pattern = Pattern

   ' Set Case Insensitivity.
   objRegExp.IgnoreCase = Not CaseSensitive

   'Set global applicability.
   objRegExp.Global = True

   'Set multiline
   objRegExp.MultiLine = MultiLin

   'Test whether the String can be compared.
   If (objRegExp.Test(Str) = True) Then

   'Get the matches.
    Set colMatches = objRegExp.Execute(Str)   ' Execute search.

On Error Resume Next 'return null string if a colmatch index is non-existent
If IsArray(Index) Then
    ReDim T(1 To UBound(Index))
    For i = 1 To UBound(Index)
        T(i) = colMatches(IIf(Index(i) > 0, Index(i) - 1, Index(i) + _
                colMatches.Count))
    Next i
    RegexMid = T()
Else
    RegexMid = CStr(colMatches(IIf(Index > 0, Index - 1, Index + _
        colMatches.Count)))
    If IsEmpty(RegexMid) Then RegexMid = ""
End If
On Error GoTo 0     'reset error handler
 Else
        RegexMid = ""
    End If
End Function
=====================================
--ron

Hi Ron,

Thanks for your help, but I do have a problem with this.

First of all I forgot to tell you that I`m using Excel 2007.

Here is what I did :
I`ve opened Visual Basic ( by pressing Alt+F11 ), then I`ve copied the
lines you provided and saved the Module. Now I have Module1 in Modules
in the VBA project of the excel document that I wanna exctract phone
numbers from ( I do think that I did that part OK ).
After that I`ve copied the formula that you`ve provided in a cell V2
( the cell that I wanna exctract information from is U2 ), so cell V2
looks like that : " =RegexMid($U2,"\d{5,}",COLUMNS($U:U)) ". This
formula don`t works, so I`ve changed the commas with semicolons, and
V2 looks like that " =RegexMid($U2;"\d{5,}";COLUMNS($U;U)) ". The
problem is that now the formula works, but it returns only the first
phone number of the cell, but not the other ones.

I`m sorry for disturbing you again, but I`m trying to get around this,
but I cannot find what`s wrong (since I am a dummy as I said ).

P.S. I do have a exemplary worksheet with just 5 examples, but as far
as I know in my country there is no case that a adress contains number
with more than 4 digits. Even if we find some, it`s not going to be
that crucial since it`s going to be one in a thousand so we can
correct it manually.

Thank you very much, again,

Georgi
 
R

Ron Rosenfeld

Hi Ron,

Thanks for your help, but I do have a problem with this.

First of all I forgot to tell you that I`m using Excel 2007.

Here is what I did :
I`ve opened Visual Basic ( by pressing Alt+F11 ), then I`ve copied the
lines you provided and saved the Module. Now I have Module1 in Modules
in the VBA project of the excel document that I wanna exctract phone
numbers from ( I do think that I did that part OK ).
After that I`ve copied the formula that you`ve provided in a cell V2
( the cell that I wanna exctract information from is U2 ), so cell V2
looks like that : " =RegexMid($U2,"\d{5,}",COLUMNS($U:U)) ". This
formula don`t works, so I`ve changed the commas with semicolons, and
V2 looks like that " =RegexMid($U2;"\d{5,}";COLUMNS($U;U)) ". The
problem is that now the formula works, but it returns only the first
phone number of the cell, but not the other ones.

I`m sorry for disturbing you again, but I`m trying to get around this,
but I cannot find what`s wrong (since I am a dummy as I said ).

P.S. I do have a exemplary worksheet with just 5 examples, but as far
as I know in my country there is no case that a adress contains number
with more than 4 digits. Even if we find some, it`s not going to be
that crucial since it`s going to be one in a thousand so we can
correct it manually.

Thank you very much, again,

Georgi


Georgi,

I have Excel 2007 also. But this routine should work on most versions.

You are correct in that the commas should be replaced by whatever your country
separators are -- semicolon is common.

You initially wrote that you wanted the numbers "distributed in 3 separated
cells."

So each cell with the formula will only return one number.

In my instructions, I had written to "Then fill right far enough to be sure to
get all of the returns". Did you do that? What does the formula look like
when you filled it into W2, X2, Y2 etc?

--ron
 
G

Georgi

Georgi,

I have Excel 2007 also. šBut this routine should work on most versions.

You are correct in that the commas should be replaced by whatever your country
separators are -- semicolon is common.

You initially wrote that you wanted thenumbers"distributed in 3 separated
cells."

So each cell with the formula will only return one number.

In my instructions, I had written to "Then fill right far enough to be sure to
get all of the returns". šDid you do that? šWhat does the formula look like
when you filled it into W2, X2, Y2 etc?

--ron

Hi Ron,

I didn`t saw this part of your post. I did it and it is working
briliantly! Sorry for the waste of your time !

P.S. Now after I`ve asked they`ve sent me a real table with around
1000 cells, so I can see if there is any variantions in the mobile
numbers.... and I found that the personal that entered the phone
numbers, used all kind of ways to enter them. In some cases they used
0877/55/44/33 ; 0877-55-44-33 - I`ve used Find&Replace on them and
after that your formula worked just perfect. But there is another case
- 0877 554433 or 0877 55 44 33 - in this case I don`t think that there
is something that we can do, despite some handwork :). Now I`m trying
to find a way to just find the cells that contain numbers with 3 to 6
digits, so I can manualy check what is the problem with them, but I
cannot find the right way to do it - I`ve tried some strings in
Find&Replace but they don`t seem to work.

Thanks for your help, I really appreciate that !

Georgi
 
R

Ron Rosenfeld

Hi Ron,

I didn`t saw this part of your post. I did it and it is working
briliantly! Sorry for the waste of your time !

P.S. Now after I`ve asked they`ve sent me a real table with around
1000 cells, so I can see if there is any variantions in the mobile
numbers.... and I found that the personal that entered the phone
numbers, used all kind of ways to enter them. In some cases they used
0877/55/44/33 ; 0877-55-44-33 - I`ve used Find&Replace on them and
after that your formula worked just perfect. But there is another case
- 0877 554433 or 0877 55 44 33 - in this case I don`t think that there
is something that we can do, despite some handwork :). Now I`m trying
to find a way to just find the cells that contain numbers with 3 to 6
digits, so I can manualy check what is the problem with them, but I
cannot find the right way to do it - I`ve tried some strings in
Find&Replace but they don`t seem to work.

Instead of using the Find/Replace, try using the following Regex instead:

"\b\d[-\d\s\/]{3,}\d\b"

So your formula might be:

=RegexMid($U2,"\b\d[-\d\s\/]{3,}\d\b",COLUMNS($A:A))

It may be that this may return data that is not valid, but that may be easier
to deal with. In particular, this regex is looking for a string that
1. Start with a digit
2. The next three or more characters can be a hyphen, digit, space or
backslash.
3. End with a digit.

So it should pick up all five digit string, even if they include the extra
characters. But it could also pick up something like 1///2. That's probably
not a problem for your database. (If it is, we could deal with it by making
the regex more complex, but I'm lazy :))

--ron
 
G

Georgi

I didn`t saw this part of your post. I did it and it is working
briliantly! Sorry for the waste of your time !
P.S. Now after I`ve asked they`ve sent me a real table with around
1000 cells, so I can see if there is any variantions in the mobile
numbers.... and I found that the personal that entered the phone
numbers, used all kind of ways to enter them. In some cases they used
0877/55/44/33 ; 0877-55-44-33 - I`ve used Find&Replace on them and
after that your formula worked just perfect. But there is another case
- 0877 554433 or 0877 55 44 33 - in this case I don`t think that there
is something that we can do, despite some handwork :). Now I`m trying
to find a way to just find the cells that containnumberswith 3 to 6
digits, so I can manualy check what is the problem with them, but I
cannot find the right way to do it - I`ve tried some strings in
Find&Replace but they don`t seem to work.

Instead of using the Find/Replace, try using the following Regex instead:

"\b\d[-\d\s\/]{3,}\d\b"

So your formula might be:

=RegexMid($U2,"\b\d[-\d\s\/]{3,}\d\b",COLUMNS($A:A))

It may be that this may return data that is not valid, but that may be easier
to deal with.  In particular, this regex is looking for a string that
1.  Start with a digit
2.  The next three or more characters can be a hyphen, digit, space or
backslash.
3.  End with a digit.

So it should pick up all five digit string, even if they include the extra
characters.  But it could also pick up something like  1///2.  That's probably
not a problem for your database.  (If it is, we could deal with it by making
the regex more complex, but I'm lazy :))

--ron

Hi there Ron,

The formula you provided is not working very well. First of all when I
try to enter it, Excel tells me that it`s wrong, and suggest to change
the formula. The funny thing is that as far as I can see the new
formula is exactly the same as the one you provided by you and this
must be some kind of bug :). After that this formula gets all of the
phone numbers from the big cell into the first destination cell + a
couple of other numbers :).

What I saw now is that the big cell contains two spaces ( the only two
consecutive space between something in the cell ) between the address
part and the numbers part, so it looks like that :
adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 (two spaces here ) 9754543 0898892158 9754543 000
e_mail:[email protected]
So maybe I can get rid of the adress part using this....
Also if we want to select the number that consist 2 or 3 parts, there
should be something like "If there is a number with less that 6
number, next to another number with less than 6 numbers and possibly
next to another number with less than 6 symbols, then consolidate
them". This would work in most of the cases, except when there is two
or three different phone numbers entered with spaces ( most of them
are entered in different branches in different parts of the country,
so if somebody is using spaces for delimation for one of the number,
he`ll probably use this pattern again with the next number in the cell
- it`s part of his style :-D ) in this case we cannot understand where
one number starts and the other ends.
If I have to be honest I cannot see how it can be done and I`m
starting to think that it`s impossible - they`re too many variantions
and too many possibilities that can occur. I`ll try to find some way
around by myself, since I`ve already taken enough advantage of you
kindness, but in the end I do think that some of the messes should be
untwisted manually.

Thank you very much for your exceptional help, and I`m wishing you to
get all the best from the upcoming 2009 year :)
 
E

eliano

Instead of using the Find/Replace, try using the following Regex instead:
"\b\d[-\d\s\/]{3,}\d\b"

So your formula might be:
=RegexMid($U2,"\b\d[-\d\s\/]{3,}\d\b",COLUMNS($A:A))

It may be that this may return data that is not valid, but that may be easier
to deal with.  In particular, this regex is looking for a string that
1.  Start with a digit
2.  The next three or more characters can be a hyphen, digit, space or
backslash.
3.  End with a digit.
So it should pick up all five digit string, even if they include the extra
characters.  But it could also pick up something like  1///2.  That's probably
not a problem for your database.  (If it is, we could deal with it bymaking
the regex more complex, but I'm lazy :))

Hi there Ron,

The formula you provided is not working very well. First of all when I
try to enter it, Excel tells me that it`s wrong, and suggest to change
the formula. The funny thing is that as far as I can see the new
formula is exactly the same as the one you provided by you and this
must be some kind of bug :). After that this formula gets all of the
phone numbers from the big cell into the first destination cell + a
couple of other numbers :).

What I saw now is that the big cell contains two spaces ( the only two
consecutive space between something in the cell ) between the address
part and the numbers part, so it looks like that :
adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 (two spaces here ) 9754543 0898892158 9754543 000
e_mail:[email protected]
So maybe I can get rid of the adress part using this....
Also if we want to select the number that consist 2 or 3 parts, there
should be something like "If there is a number with less that 6
number, next to another number with less than 6 numbers and possibly
next to another number with less than 6 symbols, then consolidate
them". This would work in most of the cases, except when there is two
or three different phone numbers entered with spaces ( most of them
are entered in different branches in different parts of the country,
so if somebody is using spaces for delimation for one of the number,
he`ll probably use this pattern again with the next number in the cell
- it`s part of his style :-D ) in this case we cannot understand where
one number starts and the other ends.
If I have to be honest I cannot see how it can be done and I`m
starting to think that it`s impossible - they`re too many variantions
and too many possibilities that can occur. I`ll try to find some way
around by myself, since I`ve already taken enough advantage of you
kindness, but in the end I do think that some of the messes should be
untwisted manually.

Thank you very much for your exceptional help, and I`m wishing you to
get all the best from the upcoming 2009 year :)- Nascondi testo citato

- Mostra testo citato -

Hi Georgi.
Try.

Public Sub trovapho() 'Trova il telefono (numerico >5 pos)
Dim CL As Range
Dim RNG As Range
Dim B() As String
Dim I As Long
Dim Pho As Long
Dim P As Long
Dim test As String
Dim X As Long
Sheets("Foglio2").Select
Range("B1:Z100") = "" ' <----- da variare
Pho = 5
Set RNG = Range("a1:a100") ' <----- da variare
For Each CL In RNG
X = 0
B = Split(CL, " ")
For I = LBound(B) To UBound(B)
If B(I) = "" Then Exit Sub
test = B(I)
If I <> 1 Then
If IsNumeric(test) Then
If Len(test) > Pho Then
X = X + 1
Cells(CL.Row, CL.Column + X) = B(I)
End If
End If
End If
Next
Next
End Sub

Have an Happy New Year.
Eliano
 
R

Ron Rosenfeld

Hi there Ron,

The formula you provided is not working very well. First of all when I
try to enter it, Excel tells me that it`s wrong, and suggest to change
the formula. The funny thing is that as far as I can see the new
formula is exactly the same as the one you provided by you and this
must be some kind of bug :). After that this formula gets all of the
phone numbers from the big cell into the first destination cell + a
couple of other numbers :).


What I saw now is that the big cell contains two spaces ( the only two
consecutive space between something in the cell ) between the address
part and the numbers part, so it looks like that :
adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 (two spaces here ) 9754543 0898892158 9754543 000
e_mail:[email protected]
So maybe I can get rid of the adress part using this....
Also if we want to select the number that consist 2 or 3 parts, there
should be something like "If there is a number with less that 6
number, next to another number with less than 6 numbers and possibly
next to another number with less than 6 symbols, then consolidate
them". This would work in most of the cases, except when there is two
or three different phone numbers entered with spaces ( most of them
are entered in different branches in different parts of the country,
so if somebody is using spaces for delimation for one of the number,
he`ll probably use this pattern again with the next number in the cell
- it`s part of his style :-D ) in this case we cannot understand where
one number starts and the other ends.
If I have to be honest I cannot see how it can be done and I`m
starting to think that it`s impossible - they`re too many variantions
and too many possibilities that can occur. I`ll try to find some way
around by myself, since I`ve already taken enough advantage of you
kindness, but in the end I do think that some of the messes should be
untwisted manually.

Thank you very much for your exceptional help, and I`m wishing you to
get all the best from the upcoming 2009 year :)


To pick out everything except that which uses <space> for the separator, you
could use the regex (this is what you substitute for the pattern in the
function):

"\b\d[-\d\/]{3,}\d\b"

You will note this differs by not having a "\s" which is the symbol for a
<space>

More information will be needed to determine how to deal with a space
separation.

If someone is entering phone numbers, using space as a separator, how does he
separate different phone numbers? 2 spaces?
--ron
 
R

Ron Rosenfeld

I didn`t saw this part of your post. I did it and it is working
briliantly! Sorry for the waste of your time !
P.S. Now after I`ve asked they`ve sent me a real table with around
1000 cells, so I can see if there is any variantions in the mobile
numbers.... and I found that the personal that entered the phone
numbers, used all kind of ways to enter them. In some cases they used
0877/55/44/33 ; 0877-55-44-33 - I`ve used Find&Replace on them and
after that your formula worked just perfect. But there is another case
- 0877 554433 or 0877 55 44 33 - in this case I don`t think that there
is something that we can do, despite some handwork :). Now I`m trying
to find a way to just find the cells that containnumberswith 3 to 6
digits, so I can manualy check what is the problem with them, but I
cannot find the right way to do it - I`ve tried some strings in
Find&Replace but they don`t seem to work.

Instead of using the Find/Replace, try using the following Regex instead:

"\b\d[-\d\s\/]{3,}\d\b"

So your formula might be:

=RegexMid($U2,"\b\d[-\d\s\/]{3,}\d\b",COLUMNS($A:A))

It may be that this may return data that is not valid, but that may be easier
to deal with.  In particular, this regex is looking for a string that
1.  Start with a digit
2.  The next three or more characters can be a hyphen, digit, space or
backslash.
3.  End with a digit.

So it should pick up all five digit string, even if they include the extra
characters.  But it could also pick up something like  1///2.  That's probably
not a problem for your database.  (If it is, we could deal with it by making
the regex more complex, but I'm lazy :))

--ron

Hi there Ron,

The formula you provided is not working very well. First of all when I
try to enter it, Excel tells me that it`s wrong, and suggest to change
the formula. The funny thing is that as far as I can see the new
formula is exactly the same as the one you provided by you and this
must be some kind of bug :). After that this formula gets all of the
phone numbers from the big cell into the first destination cell + a
couple of other numbers :).

What I saw now is that the big cell contains two spaces ( the only two
consecutive space between something in the cell ) between the address
part and the numbers part, so it looks like that :
adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 (two spaces here ) 9754543 0898892158 9754543 000
e_mail:[email protected]
So maybe I can get rid of the adress part using this....
Also if we want to select the number that consist 2 or 3 parts, there
should be something like "If there is a number with less that 6
number, next to another number with less than 6 numbers and possibly
next to another number with less than 6 symbols, then consolidate
them". This would work in most of the cases, except when there is two
or three different phone numbers entered with spaces ( most of them
are entered in different branches in different parts of the country,
so if somebody is using spaces for delimation for one of the number,
he`ll probably use this pattern again with the next number in the cell
- it`s part of his style :-D ) in this case we cannot understand where
one number starts and the other ends.
If I have to be honest I cannot see how it can be done and I`m
starting to think that it`s impossible - they`re too many variantions
and too many possibilities that can occur. I`ll try to find some way
around by myself, since I`ve already taken enough advantage of you
kindness, but in the end I do think that some of the messes should be
untwisted manually.

Thank you very much for your exceptional help, and I`m wishing you to
get all the best from the upcoming 2009 year :)


Here is one that seems to work with all of the examples you have provided,
including the space separated numbers:

=RegexMid($U2,"\b((\d[-\d\/]{4,}\d)|((\d{2,5}\s?){3,4}))\b",COLUMNS($A:A))

Changed to semicolons:

=RegexMid($U2;"\b((\d[-\d\/]{4,}\d)|((\d{2,5}\s?){3,4}))\b";COLUMNS($A:A))

There will be an issue, however, if you have a space delimiter and one of the
included digit strings is six digits (or more). Since you have stated that you
could have a phone number consisting of a minimum of six digits, there would be
no way of telling for example the 123456 123456 is two 6 digit numbers or a
single 12 digit number. This also requires that space delimited entries have
at least two digits in each segment. (Otherwise non phone numbers would be
detected)
--ron
 
G

Georgi

Hi Ron,
I didn`t saw this part of your post. I did it and it is working
briliantly! Sorry for the waste of your time !
P.S. Now after I`ve asked they`ve sent me a real table with around
1000 cells, so I can see if there is any variantions in the mobile
numbers.... and I found that the personal that entered the phone
numbers, used all kind of ways to enter them. In some cases they used
0877/55/44/33 ; 0877-55-44-33 - I`ve used Find&Replace on them and
after that your formula worked just perfect. But there is another case
- 0877 554433 or 0877 55 44 33 - in this case I don`t think that there
is something that we can do, despite some handwork :). Now I`m trying
to find a way to just find the cells that containnumberswith 3 to 6
digits, so I can manualy check what is the problem with them, but I
cannot find the right way to do it - I`ve tried some strings in
Find&Replace but they don`t seem to work.
Instead of using the Find/Replace, try using the following Regex instead:
"\b\d[-\d\s\/]{3,}\d\b"
So your formula might be:
=RegexMid($U2,"\b\d[-\d\s\/]{3,}\d\b",COLUMNS($A:A))
It may be that this may return data that is not valid, but that may beeasier
to deal with.  In particular, this regex is looking for a string that
1.  Start with a digit
2.  The next three or more characters can be a hyphen, digit, space or
backslash.
3.  End with a digit.
So it should pick up all five digit string, even if they include the extra
characters.  But it could also pick up something like  1///2.  That's probably
not a problem for your database.  (If it is, we could deal with it by making
the regex more complex, but I'm lazy :))
--ron
Hi there Ron,
The formula you provided is not working very well. First of all when I
try to enter it, Excel tells me that it`s wrong, and suggest to change
the formula. The funny thing is that as far as I can see the new
formula is exactly the same as the one you provided by you and this
must be some kind of bug :). After that this formula gets all of the
phonenumbersfrom the big cell into the first destination cell + a
couple of othernumbers:).
What I saw now is that the big cell contains two spaces ( the only two
consecutive space between something in the cell ) between the address
part and thenumberspart, so it looks like that :
adr:Sofia, P.K. 1715 Druzhba district 4 flat 407 entrance 6 floor 6
apartment 38 (two spaces here ) 9754543 0898892158 9754543 000
e_mail:[email protected]
So maybe I can get rid of the adress part using this....
Also if we want to select the number that consist 2 or 3 parts, there
should be something like "If there is a number with less that 6
number, next to another number with less than 6numbersand possibly
next to another number with less than 6 symbols, then consolidate
them". This would work in most of the cases, except when there is two
or three different phonenumbersentered with spaces ( most of them
are entered in different branches in different parts of the country,
so if somebody is using spaces for delimation for one of the number,
he`ll probably use this pattern again with the next number in the cell
- it`s part of his style :-D ) in this case we cannot understand where
one number starts and the other ends.
If I have to be honest I cannot see how it can be done and I`m
starting to think that it`s impossible - they`re too many variantions
and too many possibilities that can occur. I`ll try to find some way
around by myself, since I`ve already taken enough advantage of you
kindness, but in the end I do think that some of the messes should be
untwisted manually.
Thank you very much for your exceptional help, and I`m wishing you to
get all the best from the upcoming 2009 year :)

Here is one that seems to work with all of the examples you have provided,
including the space separatednumbers:

=RegexMid($U2,"\b((\d[-\d\/]{4,}\d)|((\d{2,5}\s?){3,4}))\b",COLUMNS($A:A))

Changed to semicolons:

=RegexMid($U2;"\b((\d[-\d\/]{4,}\d)|((\d{2,5}\s?){3,4}))\b";COLUMNS($A:A))

There will be an issue, however, if you have a space delimiter and one ofthe
included digit strings is six digits (or more).  Since you have stated that you
could have a phone number consisting of a minimum of six digits, there would be
no way of telling for example the 123456 123456 is two 6 digitnumbersor asingle12 digit number.  This also requires that space delimited entries have
at least two digits in each segment.  (Otherwise non phonenumberswould be
detected)
--ron

Hi there Ron,

First of all I wanna all the best in the new year to you, and all of
the other helpers here.

After that I would like to thank you for your help until now - I think
that this formula is as good as we can go.

Despite that now I`ve got a official confirmation from the Telecoms in
our region that :

A mobile phone can contain 10 numbers ( or 13 if our national code is
included )

A home phone can contain 8 or 9 numbers, including the area code ( or
11 or 12 if our national code is also included ).

This means that the minimum lenght of a valid number is 8 digits, and
the maximum is 13.

Even then there is some cases that a number is with less than 8
numbers - in these cases the area code is not provided - so the number
is not OK for our use, but is as good as we can go. Whatever we do,
there still be a need for someone to take a look at the table, before
we can use it, and with your previous formula the time needed is as
small as we can go :).

Thank you again,

Georgi

P.S. I wrote a post, before new year, but it dissapeared somewhere :)
 
R

Ron Rosenfeld

Even then there is some cases that a number is with less than 8
numbers - in these cases the area code is not provided - so the number
is not OK for our use, but is as good as we can go. Whatever we do,
there still be a need for someone to take a look at the table, before
we can use it, and with your previous formula the time needed is as
small as we can go :).

Thank you again,

Georgi

Well, sometimes we cannot achieve "perfection". Glad to help where I could.
--ron
 
G

Georgi

Well, sometimes we cannot achieve "perfection".   Glad to help where I could.
--ron

Hi there Ron,

My troubles with this sheet continues. I don`t know if I need to
create a new tread or .....

After we`ve created the new cells, a prefix should be attached. I`ve
tried with Concatenate, but I`ll need something a little bit more
complex.

What I mean :

- when the phone number starts with zero, another zero should be
prefixed ( ex. 0885475810 , becomes 00885475810 )

- if the number starts with a digit other than zero, then two zeros
(00) should be prefixed ( ex 885475810 , becomes 00885475810 ; ex.
684115452 becomes 0068115454)

- if the number starts with 359 then this prefix (359) should be
substituted with (00) ( ex. 359885475810, becomes 00885475810 ;
3592458965 becomes 002458965)

- if the number starts with two zeros, no action should be taken
( this is a case that we can exclude because it`s too rare ).

I can hardly find a word to express my gratitude for what you`ve done
for me so far.

Regards,
 
R

Ron Rosenfeld

What I mean :

- when the phone number starts with zero, another zero should be
prefixed ( ex. 0885475810 , becomes 00885475810 )

- if the number starts with a digit other than zero, then two zeros
(00) should be prefixed ( ex 885475810 , becomes 00885475810 ; ex.
684115452 becomes 0068115454)

- if the number starts with 359 then this prefix (359) should be
substituted with (00) ( ex. 359885475810, becomes 00885475810 ;
3592458965 becomes 002458965)

- if the number starts with two zeros, no action should be taken
( this is a case that we can exclude because it`s too rare ).

This can be done with worksheet functions, but given the numbers of times
you've revised your specifications, and anticipating that you might be revising
again in the future, I offer this UDF:

==================
Function Prefix(Pnum)
If Left(Pnum, 3) = 359 Then
Prefix = Replace(Pnum, "359", "00", 1, 1)
Exit Function
End If

Select Case Left(Pnum, 1)
Case Is = 0
Prefix = "0" & Pnum
Case 1 To 9
Prefix = "00" & Pnum
End Select
End Function
=======================

And you would use it by embedding the RegexMid function within it; like:

=prefix(regexmid(...))
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top