Split and Count text in a field

S

Simon

I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
J

John Nurick

Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.
 
S

Simon

This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
S

Simon

This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

Simon

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go about
it.

Thanks for any help,
Simon
 
J

John Nurick

Did you miss the sentence at the bottom of my previous message about using a
totals query to count them?


Simon said:
This is great.
Now all the values are listed in the query, how am I able to count every
occurrence within the field? Are they not still complete strings?

Simon

John Nurick said:
Hi Simon,

You can use this function

Public Function SafeSplit(V As Variant, _
Delim As String, Item As Long) As Variant

On Error Resume Next 'to return Null if Item is out of range
SafeSplit = Split(V, Delim)(Item)
End Function

First create a Union query that returns all the values in one column,
something like this, where XXX and YYY are field and table respectively,
and N is the largest number of values in any one field:

SELECT SafeSplit([XXX], 0) AS TheValue FROM YYY
UNION
SELECT SafeSplit([XXX], 1) AS TheValue FROM YYY
UNION
...
UNION
SELECT SafeSplit([XXX], N-1) AS TheValue FROM YYY
;

Then use this union query as the basis of a totals query that groups and
counts the values.





I have a field as follows:

aa_comp
03, 04, 05
03, 07, 05, 20
03, 06, 07

It is imported from xml (along with other fields), and I am aware of
multiple values in a field is bad normalisation.

I need to count of each text occurrence within the whole field column.
The results would look like:

03: 3
04: 1
05: 2
06: 1
07: 2
20: 1

The possible values range from 01 to 99.
The field could contain 0 values or anything up to 99 values
The delimiter is with a comma.

Please can someone offer some help?
I assume I would use split() and count(), but I am not sure how to go
about
it.

Thanks for any help,
Simon
 
K

Ken Higgins

Simon -

Try something like this. I did and it worked on your sample
You can use numbers strings etc but if you use strings
then you have to "normalize" them by removing leading and
trailing blanks which can cause errors.

this example uses 3 functions:

CountValues - which retuns a variant
which contains an Nx2 array with the
resulting answer

UniqueElementArray - which returns an array
that contains only 1 instance of any value
in the input - in other words the duplicated
values are eliminated.

IndexArray - which returns the position of any
instance in an array of values. or -1 if not
contained in the target array

You can write your own dumparray sub for enumerating the
resultant of CountValues which will either be NULL if an error
or the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

Example:
CountValues("03, 04, 05, 03, 07, 05, 20 ,03, 06, 07")

Produces:
Row Column (0) Column (1)
---------- ---------
0) 03 3
1) 04 1
2) 05 2
3) 07 2
4) 20 1
5) 06 1

------------------------
Public Function CountValues(ByVal pstrInput) As Variant
'Counts occurances of single values in a comma delimited string
'retuns an array as a variant

Dim strLastChar As String
Dim astrInstances() As String
Dim avUniqueInstances As Variant
Dim lngPosition As Long

Dim lngInstanceCounts() As Long
Dim vElmt As Variant
Dim I As Long
Dim lngMaxInstances As Long

Dim avOutput As Variant

'First be sure that your input length is > 0
If Len(pstrInput) = 0 Then
CountValues = Null
GoTo CommonExit
Else
strLastChar = mID(pstrInput, Len(pstrInput), 1)
If strLastChar = "," Then
pstrInput = mID(pstrInput, 1, Len(pstrInput) - 1)
End If
End If

'Then create an array using the SPLIT function
astrInstances = Split(pstrInput, ",")
lngMaxInstances = UBound(astrInstances)

'eliminate leading and trailing blanks
For I = 0 To lngMaxInstances
astrInstances(I) = Trim(astrInstances(I))
Next I

'Create an Array of unique instances (See UniqueElementArray - Below)
avUniqueInstances = UniqueElementArray(astrInstances)

'now count the number of Each Element
ReDim lngInstanceCounts(UBound(avUniqueInstances))

For I = 0 To UBound(astrInstances)
lngPosition = IndexArray(avUniqueInstances, astrInstances(I))
If lngPosition > -1 Then
lngInstanceCounts(lngPosition) = lngInstanceCounts(lngPosition) + 1
End If
Next I

'load the unique instance and the counts into a nx2 array
ReDim avOutput(UBound(avUniqueInstances), 1)
For I = 0 To UBound(avUniqueInstances)
avOutput(I, 0) = avUniqueInstances(I)
avOutput(I, 1) = lngInstanceCounts(I)
Next I

'return the result as an nx2 array
CountValues = avOutput

CommonExit:
Exit Function

Errorhandler:
CountValues = Null
Resume CommonExit

End Function

---

You'll need these 2 functions

Public Function UniqueElementArray(aIN As Variant) As Variant
'creates a Unique Element array
'Array must be a one dimensioned array

Dim objDictionary As Dictionary
Dim vElmt As Variant
Dim aVars As Variant
Dim aLngs As Long
Dim aInts As Integer
Dim aSngs As Single
Dim aDbls As Double
Dim aCurs As Currency
Dim aObjs As Object
Dim aStrs As String
Dim lngCount As Long
Dim aDts As Date
Dim aBytes As Byte
Dim I As Long
Dim lngDim As Long
Dim lngVarType As Long
Dim aOut As Variant

On Error GoTo Errorhandler

'use the Dictionary object to force uniqueness
Set objDictionary = New Dictionary
With objDictionary
For Each vElmt In aIN
.Add vElmt, vElmt
NextElement:
Next vElmt

'now output the array into "aout"
lngCount = .Count
lngDim = lngCount - 1
ReDim aOut(lngDim)
'vartype returns a value vbArray + Scalar Object type
lngVarType = varType(aOut) - vbArray

Select Case lngVarType
Case vbInteger
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aInts = .Items(I)
aOut(I) = aInts
Next I

Case vbVariant
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aVars = .Items(I)
aOut(I) = aVars
Next I

Case vbLong
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aLngs = .Items(I)
aOut(I) = aLngs
Next I

Case vbString
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aStrs = .Items(I)
aOut(I) = aStrs
Next I

Case vbDouble
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aDbls = .Items(I)
aOut(I) = aDbls
Next I

Case vbSingle
For I = 0 To lngDim
'Debug.Print I, .Items(I)
aSngs = .Items(I)
aOut(I) = aSngs
Next I
End Select
End With
UniqueElementArray = aOut

CommonExit:
Exit Function
Errorhandler:
Select Case Err.Number
Case 457 'Key already There
Err.Clear
Resume NextElement
Case Else
WTO Err.Number & " - " & Err.Description & vbCrLf &
"UniqueElementArray"
Resume CommonExit
End Select
End Function

----------

Public Function IndexArray(vArray As Variant, _
vSearch As Variant) As Long
Dim I As Long
For I = 0 To UBound(vArray, 1)
If vArray(I) = vSearch Then
IndexArray = I
GoTo CommonExit
End If
Next I
IndexArray = -1
CommonExit:

End Function

Happy Coding!
 
J

John Nurick

On Thu, 15 Jun 2006 21:49:02 -0700, Ken Higgins

[snip]
the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

[lots of useful VBA code snipped]

Hi Ken,

I'm not sure this does what Simon needs, which AIUI is to count the
occurrences of each value not just in one list but in a whole column.

Setting that aside: I looked at your 150+ lines of code and then thought
how one might do the same job in Perl:

sub getCountsOfItemsInList($) {
my %items;
$items{$_}++ foreach split /,\s*/ , $_[0];
return %items;
}

which translates into VBA as

Public Function GetCountsOfItemsInList(List As String) As Variant
Dim AllItems As Variant
Dim Item As Variant
Dim Uniques As Object 'Scripting.Dictionary

Set Uniques = CreateObject("Scripting.Dictionary")

AllItems = Split(List, ",")
For Each Item In AllItems
Item = Trim(Item)
If Uniques.Exists(Item) Then 'increment count
Uniques(Item) = Uniques(Item) + 1
Else 'add new item
Uniques.Add Item, 1
End If
Next
Set GetCountsOfItemsInList = Uniques

End Function
 
K

Ken Higgins

John:

I tried out that VB module - what a nice clean algorithm! Hats off for
simplification
I dont know Pearl, but i do know that the Dictionary object in VB had its
genesis in
Pearl. Thanks for offering the simplification. Nice work.

Ken

John Nurick said:
On Thu, 15 Jun 2006 21:49:02 -0700, Ken Higgins

[snip]
the Nx2 array where n is the number of unique instances
and the columns are:

Column(0) = the unique instance
Column(1) = the count of the instance in the input array

[lots of useful VBA code snipped]

Hi Ken,

I'm not sure this does what Simon needs, which AIUI is to count the
occurrences of each value not just in one list but in a whole column.

Setting that aside: I looked at your 150+ lines of code and then thought
how one might do the same job in Perl:

sub getCountsOfItemsInList($) {
my %items;
$items{$_}++ foreach split /,\s*/ , $_[0];
return %items;
}

which translates into VBA as

Public Function GetCountsOfItemsInList(List As String) As Variant
Dim AllItems As Variant
Dim Item As Variant
Dim Uniques As Object 'Scripting.Dictionary

Set Uniques = CreateObject("Scripting.Dictionary")

AllItems = Split(List, ",")
For Each Item In AllItems
Item = Trim(Item)
If Uniques.Exists(Item) Then 'increment count
Uniques(Item) = Uniques(Item) + 1
Else 'add new item
Uniques.Add Item, 1
End If
Next
Set GetCountsOfItemsInList = Uniques

End Function
 
J

John Nurick

If you want an SQL solution, try something like this. Ironically, I'm
using VBA to create the tables and data required to demo my SQL
solution <g>. Rather than create sequence tables (i.e. tables of
incrementing integers) on the fly as I have, you could create a
permanent auxillary table in your database:

[Ingenious SQL snipped].

Interesting... I'll have to spend part of my weekend working out how it
works. Thank you.
 
Top