Delete matching cells

Ron Rosenfeld · Jan 16, 2012

SEE BELOW FOR OOPS.

Well, if you need column A to be numeric, then column B must be numeric also. If column B values are text, then you should use the Text format "@".

When I was testing, I had preformatted both columns as text, and had no problems.

Also, please note that I assumed you would have some label in Row 1. If there are no labels, then try this variation, which should work whether or not there is a label:

===============================
Option Explicit
Sub PruneColA()
'Requires setting reference (tools/references) to
' Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim d As Variant

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) To UBound(vColA, 1)
If Not dColB.Exists(Key:=vColA(i, 1)) Then
With dColA
If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1)
End With
End If
Next i

ReDim vColA(1 To dColA.Count, 1 To 1)
i = 0
For Each d In dColA
i = i + 1
vColA(i, 1) = dColA(d)
Next d

rColA.Offset.ClearContents
rColA.EntireColumn.NumberFormat = "@"
Set rColA = rColA.Resize(rowsize:=dColA.Count)

rColA = vColA
End Sub
=================================================

OOPS:

rColA.Offset.ClearContents

should read:

rColA.EntireColumn.ClearContents

GS · Jan 16, 2012

ET on my machine was 35 secs as per timing method used as shown. I
didn't think this task deserved the trouble to setup and use
cHiResTimer class.

Ron Rosenfeld · Jan 16, 2012

From the responses and their results, I think it'd be best to re-state
my OP:

I need a list of the values in Col A that are NOT found in Col B.

Just use the same routine, but instead of clearing Col A and then writing the results back to Col A, define rDest and write the results there:

=============================
Option Explicit
Sub SelectFromColA()
'Requires setting reference (tools/references) to
' Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim d As Variant
Dim rDest As Range

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 5)
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) To UBound(vColA, 1)
If Not dColB.Exists(Key:=vColA(i, 1)) Then
With dColA
If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1)
End With
End If
Next i

ReDim vColA(1 To dColA.Count, 1 To 1)
i = 0
For Each d In dColA
i = i + 1
vColA(i, 1) = dColA(d)
Next d

rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"
Set rDest = rDest.Resize(rowsize:=dColA.Count)

rDest = vColA
End Sub
==========================

Ron Rosenfeld · Jan 16, 2012

I modified my sub to use your idea to use Dictionary, but NOT put colA
in a dictionary and it shaved 11 secs off the ET...

That should preserve the duplicates in col A also, to answer your previous question.

GS · Jan 16, 2012

Ron Rosenfeld wrote :

That should preserve the duplicates in col A also, to answer your previous
question.

Hhm.., that's quite true where non-matches occur. I suppose that might
be a better way to go when comparing 2 or more items. Won't help my
data logger file parser, though. It just loops 1 array, but it does
parse each element into a temp array for the test. It runs blazingly
fast on my machine (1.6Ghz Intel dual-core on a Dell Precision series
laptop w/2GB RAM).

Jim Cone · Jan 16, 2012

Garry,
More FWIW...
I ran Ron's code on my XP, i3 machine (xl2007) against 600,000 6 digit random numbers.
360,000 in col A and 240,000 in col B. It took about 7 1/2 seconds.
It returned ~87000 numbers not in col B.
'---
Jim Cone

Ron Rosenfeld · Jan 16, 2012

Next 'i

GS,
Technique question:
Why, on the "Next" line, do you comment out the counter variable that you are looping on?
I've not commented out, and have had the VBE help me out when I might be using nested loops.

-- Ron

GS · Jan 16, 2012

Ron Rosenfeld submitted this idea :

GS,
Technique question:
Why, on the "Next" line, do you comment out the counter variable that you
are looping on? I've not commented out, and have had the VBE help me out when
I might be using nested loops.

-- Ron

Ron,
Just something I picked up from the Classic VB crowd. It falls in the
same bucket as the dif using Mid() and Mid$(), and how VB handles this
at runtime. Sorry, but I can't give you technical details about these
without going back over a few years of posts. I include the comment for
notation purposes so I know which counter is repeating in nested or
long loops. Otherwise, I don't see any problem with leaving the
apostrophe out if desired. My choice to use it was formed a long time
ago because I didn't want to lose the notation. (You'll see other code
samples that use a similar technique for Select Case, If, While, and Do
constructs as well. I also do similar for end of procedures because
it's helpful when reading through modules in a text editor outside the
VBE.

HTH

GS · Jan 16, 2012

Jim Cone explained on 1/15/2012 :

Garry,
More FWIW...
I ran Ron's code on my XP, i3 machine (xl2007) against 600,000 6 digit random
numbers.
360,000 in col A and 240,000 in col B. It took about 7 1/2 seconds.
It returned ~87000 numbers not in col B.
'---
Jim Cone

Thanks, Jim. Can you run my final version on your sample data and
report back. I'd be curious to know the results. I'm running XP SP3 and
did the test in xl2007. Thanks in advance...

Jim Cone · Jan 16, 2012

Garry,
It may turn out to be one of those days, especially if the snow sticks.

I plugged in a new set of sample data into xl2010 this morning.
It appears that your code is returning mismatched items... items in col A that are not in Col B.
But it is not eliminating duplicates.
Column A has 360,000 random 6 digit numbers.
Column B has 240,000 random 6 digit numbers.

Ron's code returns 231,414 unique entries.
Your code returns 279,200 entries: 231,514 unique and 47,686 duplicates.
(i ran my own unique counter on your returned data)

It's too early in the day for me to try to figure out why. <g>
'---
Regards,
Jim Cone

wrote in message

Thanks, Jim. Can you run my final version on your sample data and report back. I'd be curious to
know the results. I'm running XP SP3 and did the test in xl2007. Thanks in advance...

Ron Rosenfeld · Jan 16, 2012

Garry,
It may turn out to be one of those days, especially if the snow sticks.

I plugged in a new set of sample data into xl2010 this morning.
It appears that your code is returning mismatched items... items in col A that are not in Col B.
But it is not eliminating duplicates.
Column A has 360,000 random 6 digit numbers.
Column B has 240,000 random 6 digit numbers.

Ron's code returns 231,414 unique entries.
Your code returns 279,200 entries: 231,514 unique and 47,686 duplicates.
(i ran my own unique counter on your returned data)

It's too early in the day for me to try to figure out why. <g>
'---
Regards,
Jim Cone

Jim,

I believe I mentioned that in response to Gary's posting a version where he did not use the dictionary for column A. He actually WAS looking for a way to PRESERVE the duplicates in Column A, and I opined that that particular version should do so.

The "why" is because by not using a dictionary to collect the non-matches for column A, the duplicates do not get filtered.

So, if preserving duplicate entries in Column A is a requirement, Gary's version will do so.

-- Ron

GS · Jan 16, 2012

Garry,
It may turn out to be one of those days, especially if the snow sticks.

I plugged in a new set of sample data into xl2010 this morning.
It appears that your code is returning mismatched items... items in col A
that are not in Col B.
But it is not eliminating duplicates.
Column A has 360,000 random 6 digit numbers.
Column B has 240,000 random 6 digit numbers.

Ron's code returns 231,414 unique entries.
Your code returns 279,200 entries: 231,514 unique and 47,686 duplicates.
(i ran my own unique counter on your returned data)

It's too early in the day for me to try to figure out why. <g>
'---
Regards,
Jim Cone

wrote in message

Jim,
Ron explains the what/why of my version of the task fairly well. What
I'm more interested in is how long it took on your machine to process
the same amount of data as when you ran Ron's version.

Jim Cone · Jan 16, 2012

Garry,
Ran your code and Ron's twice each.
Ron: 14.0 seconds
Garry: 8.2 seconds
Times were identical for tests on each.
Xl2010 on WindowsXP - 360,000 nums in col A, 240,000 nums in col B.

Changes from yesterday: xl2010 vs. xl2007 and more data overlap between columns.
I'm thinking that the xl2010 vba Rnd function may be different.
'---
Jim Cone
Portland, Oregon USA
http://blog.contextures.com/archives/2011/07/18/find-last-row-with-excel-vba/
(workbook with "universal" Last Row function code - free)

GS · Jan 16, 2012

Jim Cone wrote :

Garry,
Ran your code and Ron's twice each.
Ron: 14.0 seconds
Garry: 8.2 seconds
Times were identical for tests on each.
Xl2010 on WindowsXP - 360,000 nums in col A, 240,000 nums in col B.

Changes from yesterday: xl2010 vs. xl2007 and more data overlap between
columns.
I'm thinking that the xl2010 vba Rnd function may be different.
'--- Jim Cone
Portland, Oregon USA
http://blog.contextures.com/archives/2011/07/18/find-last-row-with-excel-vba/
(workbook with "universal" Last Row function code - free)

Thanks, Jim. I guess I was expecting a slower time as compared to Ron's
(approx RonsTime * 0.75), but I'm very happy that you report it was
better by almost half.

Obviously, the Scripting Dictionary is the better way to compare
columns of data. What I find interesting is how slow doing VB
comparison using arrays is. I've learned something valuable here..! My
thanks to you and Ron for your efforts...

Ron Rosenfeld · Jan 16, 2012

Jim,
Ron explains the what/why of my version of the task fairly well. What
I'm more interested in is how long it took on your machine to process
the same amount of data as when you ran Ron's version.

Garry,

I tried it on a dataset similar to Jim's. 360,000 entries in column A; 240,000 entries in Column B. They were 13 digit text strings with values from 1 to 10^6.

Removing duplicates by using two dictionaries: 28 seconds
Not removing duplicates using code similar to yours: 16.1 seconds

Ignore the timer stuff. It depends on a class installed in my personal .xlam add-in

=====================
Option Explicit
Sub PreserveDups()
Dim oTimer As RonsLibrary.CHiResTimer
Set oTimer = RonsLibrary.New_CHiResTimer
oTimer.StartTimer
'Requires setting reference (tools/references) to
' Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim vResults As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim lBlanks As Long
Dim d As Variant
Dim rDest As Range

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 7)
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) To UBound(vColA, 1)
If dColB.Exists(Key:=vColA(i, 1)) Then
vColA(i, 1) = ""
lBlanks = lBlanks + 1
End If
Next i

ReDim vResults(1 To UBound(vColA) - lBlanks, 1 To 1)
i = 0
For Each d In vColA
If d <> "" Then
i = i + 1
vResults(i, 1) = d
End If
Next d

rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"

Set rDest = rDest.Resize(rowsize:=UBound(vResults, 1))
rDest = vResults

oTimer.StopTimer
Debug.Print oTimer.Elapsed
End Sub
===========================

GS · Jan 16, 2012

Thanks, Ron. I really appreciate your efforts!

I was thinking to now create a function that returns a boolean on
success, and accepts "Optional AllowDupes As Boolean = True" as its arg
so the faster code runs unless the user wants to remove dupes in ColA.
Does that make sense?

GS · Jan 16, 2012

Something to play with...

Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: AllowDupes: True by default
' Keeps duplicate values found in colA that are not found
in colB;
' If False, duplicate values in colA not found in colB are
removed.
'
' Returns: True if matches are found --AND-- no error occurs;
' False if matches are NOT found --OR-- error occurs.
'
' Sources: Ron Rosenfeld, Jim Cone, Garry Sansom

Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As New Dictionary

On Error GoTo ErrExit

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

For j = LBound(vRngB) To UBound(vRngB)
With dRngB
If Not .Exists(Key:=vRngB(j, 1)) Then _
.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
End With
Next 'j

If AllowDupes Then '//fastest
For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then _
vRngOut(j, 0) = vRngA(i, 1): j = j + 1
Next 'i

Else '//slowest
Dim dRngA As New Dictionary
For i = LBound(vRngA) To UBound(vRngA)
If Not dRngB.Exists(vRngA(i, 1)) Then
With dRngA
If Not .Exists(Key:=vRngA(i, 1)) Then _
.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
End With 'dRngA
End If 'Not dRngB.Exists(vRngA(i, 1))
Next 'i

Dim v As Variant
j = 0: ReDim vRngOut(dRngA.Count, 0)
For Each v In dRngA
vRngOut(j, 0) = dRngA(v): j = j + 1
Next 'v
End If 'AllowDupes

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

ErrExit:
StripDupes = (Err = 0)
End Function 'StripDupes()

GS · Jan 16, 2012

We can shave off even more time if we eliminate the checks when adding
items to the dictionary because the dictionary won't allow dupes...

Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: AllowDupes: True by default. Keeps duplicate values
' found in colA that are not found in colB. If False,
' duplicate values in colA not found in colB are removed.
'
' Returns: True if matches found and no error occurs;
' False if matches not found --OR-- error occurs.
'
' Sources: Ron Rosenfeld, Jim Cone, Garry Sansom

Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As New Dictionary

On Error GoTo ErrExit

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

On Error Resume Next
For j = LBound(vRngB) To UBound(vRngB)
dRngB.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
Next 'j
On Error GoTo 0

If AllowDupes Then '//fastest
For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then _
vRngOut(j, 0) = vRngA(i, 1): j = j + 1
Next 'i

Else '//slowest
Dim dRngA As New Dictionary
On Error Resume Next
For i = LBound(vRngA) To UBound(vRngA)
If Not dRngB.Exists(vRngA(i, 1)) Then _
dRngA.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
Next 'i
On Error GoTo 0

Dim v As Variant
j = 0: ReDim vRngOut(dRngA.Count, 0)
For Each v In dRngA
vRngOut(j, 0) = dRngA(v): j = j + 1
Next 'v
End If 'AllowDupes

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

ErrExit:
StripDupes = (Err = 0)
End Function 'StripDupes()

Ron Rosenfeld · Jan 17, 2012

We can shave off even more time if we eliminate the checks when adding
items to the dictionary because the dictionary won't allow dupes...

Your efforts prompted me to study whether dictionary or collection would work faster. And it turns out that my "prune" routine, which eliminates duplicates in Col A, when rewritten using Collections, runs in about 1/6 the time!

My last effort, using dictionaries for col a and col b, on the Jim style database (240,000 entries colA; 360,000 entries col b) took about 29 sec to run. The following process that same data base in 5.5 seconds!!

=============================
Option Explicit
Sub PruneColA2()
Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim vResults As Variant
Dim cColB As Collection
Dim i As Long
Dim lBlanks As Long
Dim v As Variant
Dim rDest As Range

Set cColB = New Collection
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 10)
End With

vColB = rColB
vColA = rColA

On Error Resume Next
For i = LBound(vColB, 1) To UBound(vColB, 1)
With cColB
.Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i
On Error GoTo 0

On Error GoTo NotUniqueItem
For i = LBound(vColA, 1) To UBound(vColA, 1)
cColB.Add Item:=vColA(i, 1), Key:=vColA(i, 1)
Next i

ReDim vResults(1 To UBound(vColA) - lBlanks, 1 To 1)
i = 0
For Each v In vColA
If v <> "" Then
i = i + 1
vResults(i, 1) = v
End If
Next v

rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"

Set rDest = rDest.Resize(rowsize:=UBound(vResults, 1))

rDest = vResults

Exit Sub

NotUniqueItem:
vColA(i, 1) = ""
lBlanks = lBlanks + 1
Resume Next
End Sub
=================================

Ron Rosenfeld · Jan 17, 2012

From the responses and their results, I think it'd be best to re-state
my OP:

I need a list of the values in Col A that are NOT found in Col B.

My spreadsheet contains:

A B
0000000021957 0000000022002
0000000022002 0000000032002

Gary,

Hopefully you've got the formatting issue sorted.
While you've been away, Jim, GS and I have been doing further work on this method. Here is a routine that also provides a list of unique (no duplicates) items in Column A that are not found in Column B, and it runs in 1/6 the time of my last macro. If the previous took a minute to run on your data set, I expect this one will run in about 10 seconds. Note that it does NOT require a reference to Microsoft Scripting Runtime

================================
Option Explicit
Sub PruneColA2()
Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim vResults As Variant
Dim cColB As Collection
Dim i As Long
Dim lBlanks As Long
Dim v As Variant
Dim rDest As Range

Set cColB = New Collection
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 5) 'sets column for the results
End With

vColB = rColB
vColA = rColA

On Error Resume Next
For i = LBound(vColB, 1) To UBound(vColB, 1)
With cColB
.Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i
On Error GoTo 0

On Error GoTo NotUniqueItem
For i = LBound(vColA, 1) To UBound(vColA, 1)
cColB.Add Item:=vColA(i, 1), Key:=vColA(i, 1)
Next i

ReDim vResults(1 To UBound(vColA) - lBlanks, 1 To 1)
i = 0
For Each v In vColA
If v <> "" Then
i = i + 1
vResults(i, 1) = v
End If
Next v

rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"

Set rDest = rDest.Resize(rowsize:=UBound(vResults, 1))

rDest = vResults

Exit Sub

NotUniqueItem:
vColA(i, 1) = ""
lBlanks = lBlanks + 1
Resume Next
End Sub
==================================

Delete matching cells

Ron Rosenfeld

GS

Ron Rosenfeld

Ron Rosenfeld

GS

Jim Cone

Ron Rosenfeld

GS

GS

Jim Cone

Ron Rosenfeld

GS

Jim Cone

GS

Ron Rosenfeld

GS

GS

GS

Ron Rosenfeld

Ron Rosenfeld

Ask a Question

Similar Threads