Sub to extract uniques from 200k data in xl03

M

Max

Using xl 2003, subject to 65k row limit. I've got over 200k ids listed in
cols A to D from row1 down. Looking for a sub which can extract the unique
ids from amongst the 200k into a new sheet, listing these ids into as many
cols as may be required, ie down col A, then down col B, etc. Thanks for
insights.
 
Z

Zone

Max, this is surely a simple-minded approach to the problem, but it should
work. It will take a long time to run. I assumed that the data is in
Sheet1 with no column headings and that Sheet2 is an unused sheet. HTH,
James

Sub Uniques4Columns()
Dim FromRow As Long, FromCol As Integer
Dim ToRow As Long, ToCol As Integer
Dim This As Variant, c As Range
Worksheets(2).Activate
Cells.Clear
ToCol = 1: ToRow = 1
With Worksheets(1)
For FromCol = 1 To 4
For FromRow = 1 To .Cells(65536, FromCol).End(xlUp).Row
This = .Cells(FromRow, FromCol)
Set c = Cells.Find(This, LookIn:=xlValues, lookat:=xlWhole)
If c Is Nothing Then
Cells(ToRow, ToCol) = This
ToRow = ToRow + 1
If ToRow > 65535 Then
ToRow = 1
ToCol = ToCol + 1
End If
End If
Next FromRow
Next FromCol
End With
End Sub
 
A

Alan Beban

Max said:
Using xl 2003, subject to 65k row limit. I've got over 200k ids listed in
cols A to D from row1 down. Looking for a sub which can extract the unique
ids from amongst the 200k into a new sheet, listing these ids into as many
cols as may be required, ie down col A, then down col B, etc. Thanks for
insights.

I have a solution that takes 15 to 20 seconds to run. Are you interested?

Alan Beban
 
T

Tom Ogilvy

Just note that if the OP has data down to rows 65536 in columns 1 to 3 (as is
probable based on the description), you will miss examining 65535 names in
each of those columns.
 
Z

Zone

Tom, good point. I probably should have used some variation of rows.count.
How would you do this, otherwise using the same code? James
 
M

Max

James, thanks for your Sub Uniques4Columns().

I've noted the preceding discussions between you and Tom. I'll try running
your sub over lunchtime today.
 
M

Max

Instead of loading a listbox, write the unique items to another sheet.

I don't know enough vba to get this done, Tom. Could you assist?
 
A

Alan Beban

Max said:
Of course. I'm game.

Copy and paste the following three procedures into a general module in
your workbook; watch for wordwrap, though I think they're clean.

The first returns an array of the unique elements of the input array; by
default it is a case-sensitive, 1-based, vertical array.

The second transposes an array without some of the limitations of the
Excel TRANSPOSE function. It's necessary in this case because the first
function needs to convert the collection of unique elements, which
is a horizontal array in Excel, to a vertical array to match your data.
It's much more general than is required by your inquiry, but it's from
my library so that's what you get.

Then, in the VBEditor, select Tools|References and check Microsoft
Scripting Runtime; I believe this step is also necessary in John
Walkenbach's code cited by Tom Ogilvy, though neither John nor Tom
mentioned it.

Then, assuming that your data is on Sheets(1), and Sheets(2) is
available for the output, run the third SubProcedure, abtest1.

Post to let us know how it comes out.

Function ArrayUniques(InputArray, _
Optional MatchCase As Boolean = True, _
Optional Base_Orient As String = "1vert", _
Optional OmitBlanks As Boolean = True)
'THIS PROCEDURE REQUIRES A PROJECT REFERENCE
'TO "MICROSCOPIC SCRIPTING RUNTIME".
'The function returns an array of unique
'values from an array or range. By default
'it returns a 1-based vertical array; for
'other results enter "0horiz", "1horiz" or
'"0vert" as the third argument. By default,
'the function is case-sensitive; i.e., e.g.,
'"red" and "Red" are treated as two separate
'unique values; to avoid case-sensitivity,
'enter False as the second argument.

'Declare the variables
Dim arr, arr2
Dim i As Long, p As Object, q As String
Dim Elem, x As Dictionary
Dim CalledDirectFromWorksheet As Boolean

'For later use in selecting cells for worksheet output
CalledDirectFromWorksheet = False
If TypeOf Application.Caller Is Range Then
Set p = Application.Caller
q = p.Address
iRows = Range(q).Rows.Count
iCols = Range(q).Columns.Count
If InStr(1, p.FormulaArray, "ArrayUniques") = 2 _
Or InStr(1, p.FormulaArray, "arrayuniques") = 2 _
Or InStr(1, p.FormulaArray, "ARRAYUNIQUES") = 2 Then
CalledDirectFromWorksheet = True
End If
End If

'Convert an input range to a VBA array
arr = InputArray

'Load the unique elements into a Dictionary Object
Set x = New Dictionary
x.CompareMode = Abs(Not MatchCase) '<--Case-sensitivity
On Error Resume Next
For Each Elem In arr
x.Add Item:=Elem, key:=CStr(Elem)
Next
If OmitBlanks Then x.Remove ("")
On Error GoTo 0

'Load a 0-based horizontal array with the unique
'elements from the Dictionary Object
arr2 = x.Items

'This provides appropriate base and orientation
'of the output array
Select Case Base_Orient
Case "0horiz"
arr2 = arr2
Case "1horiz"
ReDim Preserve arr2(1 To UBound(arr2) + 1)
Case "0vert"
arr2 = ArrayTranspose(arr2)
Case "1vert"
ReDim Preserve arr2(1 To UBound(arr2) + 1)
arr2 = ArrayTranspose(arr2)
End Select

'Assure that enough cells are selected to accommodate output
If CalledDirectFromWorksheet Then
If Range(Application.Caller.Address).Count < x.Count Then
ArrayUniques = "Select a range of at least " & _
x.Count & " cells"
Exit Function
End If
End If

ArrayUniques = arr2
End Function

Function ArrayTranspose(InputArray)
'This function returns the transpose of
'the input array or range; it is designed
'to avoid the limitation on the number of
'array elements and type of array that the
'worksheet TRANSPOSE Function has.

'Declare the variables
Dim outputArrayTranspose As Variant, arr As Variant, p As Integer
Dim i As Long, j As Long

'Check to confirm that the input array
'is an array or multicell range
If IsArray(InputArray) Then

'If so, convert an input range to a
'true array
arr = InputArray

'Load the number of dimensions of
'the input array to a variable
On Error Resume Next

'Loop until an error occurs
i = 1
Do
z = UBound(arr, i)
i = i + 1
Loop While Err = 0

'Reset the error value for use with other procedures
Err = 0

'Return the number of dimensions
p = i - 2
End If

If Not IsArray(InputArray) Or p > 2 Then
Msg = "#ERROR! The function accepts only multi-cell ranges " & _
"and 1D or 2D arrays."
If TypeOf Application.Caller Is Range Then
ArrayTranspose = Msg
Else
MsgBox Msg, 16
End If
Exit Function
End If

'Load the output array from a one-
'dimensional input array
If p = 1 Then

Select Case TypeName(arr)
Case "Object()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Object
For i = LBound(outputArrayTranspose) To _
UBound(outputArrayTranspose)
Set outputArrayTranspose(i, _
LBound(outputArrayTranspose)) = _
arr(i)
Next
Case "Boolean()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Boolean
Case "Byte()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Byte
Case "Currency()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Currency
Case "Date()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Date
Case "Double()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Double
Case "Integer()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Integer
Case "Long()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Long
Case "Single()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Single
Case "String()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As String
Case "Variant()"
ReDim outputArrayTranspose(LBound(arr) To UBound(arr), _
LBound(arr) To LBound(arr)) As Variant
Case Else
Msg = "#ERROR! Only built-in types of arrays " & _
"are supported."
If TypeOf Application.Caller Is Range Then
ArrayTranspose = Msg
Else
MsgBox Msg, 16
End If
Exit Function
End Select
If TypeName(arr) <> "Object()" Then
For i = LBound(outputArrayTranspose) To _
UBound(outputArrayTranspose)
outputArrayTranspose(i, _
LBound(outputArrayTranspose)) = arr(i)
Next
End If

'Or load the output array from a two-
'dimensional input array or range
ElseIf p = 2 Then
Select Case TypeName(arr)
Case "Object()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Object
For i = LBound(outputArrayTranspose) To _
UBound(outputArrayTranspose)
For j = LBound(outputArrayTranspose, 2) To _
UBound(outputArrayTranspose, 2)
Set outputArrayTranspose(i, j) = arr(j, i)
Next
Next
Case "Boolean()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Boolean
Case "Byte()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Byte
Case "Currency()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Currency
Case "Date()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Date
Case "Double()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Double
Case "Integer()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Integer
Case "Long()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Long
Case "Single()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Single
Case "String()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As String
Case "Variant()"
ReDim outputArrayTranspose(LBound(arr, 2) To _
UBound(arr, 2), _
LBound(arr) To UBound(arr)) As Variant
Case Else
Msg = "#ERROR! Only built-in types of arrays " & _
"are supported."
If TypeOf Application.Caller Is Range Then
ArrayTranspose = Msg
Else
MsgBox Msg, 16
End If
Exit Function
End Select
If TypeName(arr) <> "Object()" Then
For i = LBound(outputArrayTranspose) To _
UBound(outputArrayTranspose)
For j = LBound(outputArrayTranspose, 2) To _
UBound(outputArrayTranspose, 2)
outputArrayTranspose(i, j) = arr(j, i)
Next
Next
End If
End If

'Return the transposed array
ArrayTranspose = outputArrayTranspose
End Function

Sub abtest1()
Dim arr1(), arr2(), arrA(), arrB(), arrC(), arrD()
Dim rng As Range
Dim rngA As Range, rngB As Range, rngC As Range, rngD As Range
Set rng = Sheets(1).Range("A:D")
Set rngA = Sheets(2).Range("A:A")
Set rngB = Sheets(2).Range("B:B")
Set rngC = Sheets(2).Range("C:C")
Set rngD = Sheets(2).Range("D:D")
arr1 = rng
arr2 = ArrayUniques(arr1)
x = ArrayCount(arr2)
z = 65536
y = x - (x \ z) * z
Select Case x \ z
Case 0
Sheets(2).Range("A1:A" & y).Value = arr2
Case 1
ReDim arrA(1 To 65536, 1 To 1)
ReDim arrB(1 To y, 1 To 1)
For i = 1 To z
arrA(i, 1) = arr2(i, 1)
Next
For i = 1 To y
arrB(i, 1) = arr2(i + z, 1)
Next
Sheets(2).Range("A1:A" & z).Value = arrA
Sheets(2).Range("B1:B" & y).Value = arrB
Case 2
ReDim arrA(1 To z, 1 To 1)
ReDim arrB(1 To z, 1 To 1)
ReDim arrC(1 To y, 1 To 1)
For i = 1 To z
arrA(i, 1) = arr2(i, 1)
arrB(i, 1) = arr2(i + z, 1)
Next
For i = 1 To y
arrC(i, 1) = arr2(i + 2 * z, 1)
Next
Sheets(2).Range("A1:A" & z).Value = arrA
Sheets(2).Range("B1:B" & z).Value = arrB
Sheets(2).Range("C1:C" & y).Value = arrC
Case 3
ReDim arrA(1 To z, 1 To 1)
ReDim arrB(1 To z, 1 To 1)
ReDim arrC(1 To z, 1 To 1)
ReDim arrD(1 To y, 1 To 1)
For i = 1 To z
arrA(i, 1) = arr2(i, 1)
arrB(i, 1) = arr2(i + z, 1)
arrC(i, 1) = arr2(i + 2 * z, 1)
Next
For i = 1 To y
arrD(i, 1) = arr2(i + 3 * z, 1)
Next
Sheets(2).Range("A1:A" & z).Value = arrA
Sheets(2).Range("B1:B" & z).Value = arrB
Sheets(2).Range("C1:C" & z).Value = arrC
Sheets(2).Range("A1:D" & y).Value = arrD
Case 4
Sheets(2).Range("A:D").Value = Sheets(1).Range("A:D").Value
End Select
End Sub

Alan Beban
 
M

Max

Alan,

I hit a "sub or function not defined" at this line:
x = ArrayCount(arr2)

when i ran Sub abtest1()

Pl advise.
 
A

Alan Beban

Max said:
Alan,

I hit a "sub or function not defined" at this line:
x = ArrayCount(arr2)

when i ran Sub abtest1()

Pl advise.
Drat! I didn't notice that the procedure was using another Function from
the Library; I'll post a fix.

Alan Beban
 
A

Alan Beban

Max said:
Alan,

I hit a "sub or function not defined" at this line:
x = ArrayCount(arr2)

when i ran Sub abtest1()

Pl advise.

Copy and paste the following into the same module as th other three.

Function ArrayCount(InputArray)
'This function counts NOT the number of
'non-blank values in the array, but the
'number of available slots for values,
'whether the slots contain anything or not.
'It's similar to the Count Property [e.g.,
'Range("a1:c3").Count]

Dim j As Long, k As Long

'Convert range to array
'InputArray = InputArray

If IsArray(InputArray) Then
If Not TypeOf InputArray Is Range Then

j = 1: k = 1

On Error Resume Next

Do
k = k * (UBound(InputArray, j) - _
LBound(InputArray, j) + 1)
j = j + 1
Loop While Err.Number = 0

ArrayCount = k
Else
If TypeOf Application.Caller Is Range Then
ArrayCount = "#ERROR! This function accepts only arrays."
Else
MsgBox "#ERROR! The ArrayCount function " & _
"accepts only arrays.", 16
End If
End If

Else
If TypeOf Application.Caller Is Range Then
ArrayCount = "#ERROR! This function accepts only arrays."
Else
MsgBox "#ERROR! The ArrayCount function " & _
"accepts only arrays.", 16
End If
End If

End Function

Sorry,
Alan Beban
 
M

Max

Alan, thanks.

Inserted that. Ran the sub,
it halted at the line: "End Select" within

"...
Case 4
Sheets(2).Range("A:D").Value = Sheets(1).Range("A:D").Value
End Select ... "
 
A

Alan Beban

Max said:
Re-ran the sub again. The run completes, but Sheet2 is blank (no results)

I reran it; it worked as it should. You mention Sheet2; is that in fact
the name of your Sheets(2)?

The test file I range it on had A:D on Sheets(1) filled like this

Set rng = Sheets(1).Range("A:D")
For i = 1 to 65536:For j = 1 to 4
rng(i,j).value = i * j
Next:Next

and arr2 had 158,378 elements that filled A1:C27306 on Sheets(2).

One thing to do is in the abtest1 SubProcedure, after the line

y = x - (x \ z) * z

insert Stop and when the code stops, mouse over x to see that it is
158738, and mouse over y to see that it is 27306.

Let me know.

Alan Beban
 
A

Alan Beban

My last email assumed your data was the same as my test file. In your
file, when you mouse over x it should be equal to the number of unique
elements in your data; and y should be the integral part of x/65536,
i.e., 0,1,2,3 or 4.

Alan Beban
 
M

Max

I reran it; it worked as it should ...

Alan, many thanks. I re-booted the PC,, and
re-ran it again, and this time round, it *worked*.

And yes, runtime took only around 15** sec, simply amazing !!
**for an estimated 200k worth of items
.. You mention Sheet2; is that in fact the name of your Sheets(2)?

Does Sheets(2) mean the 2nd tab from the left?
If so, yes, I have "Sheet2" -- tabname -- as the 2nd tab from the left

In VBE under Microsoft Excel Objects,
the tab appears as: Sheet2 (Sheet2)

What is the first instance of Sheet2 in the above ? I gather that the 2nd
instance -- within parens -- is the tabname, which can be changed by user.
How could we make use of the 1st instance which apparently cannot be changed
and is more robust? Thanks.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top