Count unique records?

J

Jason O

Hi

Can anyone tell me if there is a simple way of counting the number of unique
records in a range. For example, a simple list of cities in cells A2 -
A1000, but with many cities being in the list more than once. How can I
count the number of unique cities?

TIA (& Happy holidays)

Jason
___
 
J

JE McGimpsey

Jason O said:
Hi

Can anyone tell me if there is a simple way of counting the number of unique
records in a range. For example, a simple list of cities in cells A2 -
A1000, but with many cities being in the list more than once. How can I
count the number of unique cities?

One way:

=SUMPRODUCT((A2:A1000<>"")/COUNTIF(A2:A1000,A2:A1000&""))
 
K

Ken Johnson

Hi Jason,

=SUMPRODUCT((A2:A1000<>"")/COUNTIF(A2:A1000,A2:A1000&""))

I got this gem off Bob Phillips last year. SUMPRODUCT is the most
amazing formula

Ken Johnson
 
J

Jason O

Hi Jason,

=SUMPRODUCT((A2:A1000<>"")/COUNTIF(A2:A1000,A2:A1000&""))

I got this gem off Bob Phillips last year. SUMPRODUCT is the most
amazing formula

Ken Johnson

I'm kinda getting that impression:)

I understand the principle but don't quite get the countif criteria
A2:A1000&"" Could you enlighten me?

*Many* thanks,

Jay
__
 
P

Paul Berkowitz

One way:

=SUMPRODUCT((A2:A1000<>"")/COUNTIF(A2:A1000,A2:A1000&""))

What does the argument ( A2:A1000&"" ) represent, John? It seems to be
concatenating a range to blank text. What sort of value is that meant to
indicate?
 
J

JE McGimpsey

One way:

=SUMPRODUCT((A2:A1000<>"")/COUNTIF(A2:A1000,A2:A1000&""))

What does the argument ( A2:A1000&"" ) represent, John? It seems to be
concatenating a range to blank text. What sort of value is that meant to
indicate?[/QUOTE]

It allows the formula to work when there are blanks in the range.
 
J

Jason O

What does the argument ( A2:A1000&"" ) represent, John? It seems to be
concatenating a range to blank text. What sort of value is that meant to
indicate?

It allows the formula to work when there are blanks in the range.[/QUOTE]

I think I must be missing somrthing. How does the formula actually work?
What is the countif actually counting? When I just applied that (on its own)
to my range to try & understand it, it just returns zero? I also don't
understand how sumproduct is being used. Could I trouble you for a
breakdown of exactly how it works.

Sorry to be a pain :)

Jason
___
 
P

Paul Berkowitz

What does the argument ( A2:A1000&"" ) represent, John? It seems to be
concatenating a range to blank text. What sort of value is that meant to
indicate?

It allows the formula to work when there are blanks in the range.[/QUOTE]

OK, thanks. But the formula

COUNTIF(A2:A1000,A2:A1000&"")

is supposed to contain the range it's operating on as its first argument,
and the value you're looking for as the second argument. What value is this
looking for, aside from not disallowing blank cells?

--
Paul Berkowitz
MVP MacOffice
Entourage FAQ Page: <http://www.entourage.mvps.org/faq/index.html>
AppleScripts for Entourage: <http://macscripter.net/scriptbuilders/>

Please "Reply To Newsgroup" to reply to this message. Emails will be
ignored.

PLEASE always state which version of Microsoft Office you are using -
**2004**, X or 2001. It's often impossible to answer your questions
otherwise.
 
P

Paul Berkowitz

OK, thanks. But the formula

COUNTIF(A2:A1000,A2:A1000&"")

is supposed to contain the range it's operating on as its first argument,
and the value you're looking for as the second argument. What value is
this looking for, aside from not disallowing blank cells?


And perhaps you could help with the next stage, too? SUMPRODUCT is supposed
to operate on a series of arrays. No matter what the result of the COUNTIF
function, that result is an integer, right? So if you divide the first outer
argument (A2:A1000<>"") - an array of cells which are not blank, by this
integer, do you not still only get one array? In fact, I'm trying to work
out what the "/" division operator on an array of cells actually does. So
far, I haven't found it in the Help.
 
J

JE McGimpsey

Jason O said:
I think I must be missing somrthing. How does the formula actually work?
What is the countif actually counting? When I just applied that (on its own)
to my range to try & understand it, it just returns zero? I also don't
understand how sumproduct is being used. Could I trouble you for a
breakdown of exactly how it works.

OK, here's what's happening.

First, SUMPRODUCT() is an array formula, so all the terms within it are
handled as if they had been entered with CTRL-RETURN. So

(A2:A1000<>"")

returns an array of 999 boolean values: TRUE if the cell in column A is
non-blank, FALSE if the cell is blank.

COUNTIF(A2:A1000,A2:A1000)

similarly returns an array of 999 values representing the number of
instances of each cell (i.e., the first element of the array returns the
number of instances of the value in A2, the second element returns the
number of instances of the value in A3, etc.). However, if one of the
cells in column A is blank, the COUNTIF() element will be 0, which, when
divided into the first array, will produce a #DIV/0 error. So we append
a null string to the target range, which forces the blank in the first
COUNTIF argument to a null string (so the types are the same), and
returns 1 (or more, if there are more blanks).

When the first array is divided by the second, the boolean values
TRUE/FALSE in the first array are converted to 1/0, respectively, then
divided by the second array. This effectively counts a unique value
once, a value with a single duplicate 1/2, a value with two duplicates
1/3, etc...

Since the denominator is the count of all non-blanks multiplied by the
number of their duplication, the result of the division is the count of
all non-blank, unique values.

For example, if A2:A7 is filled with

1,2,{blank},2,5,6

with 4 unique values and one blank. The the first array (A2:A7<>"")
returns

{TRUE,TRUE,FALSE,TRUE,TRUE,TRUE}

The COUNTIF(A2:A7,A2:A7&"") returns

{1,2,1,2,1,1}

The division operator now coerces the booleans to numeric:

{1,1,0,1,1,1}/{1,2,1,2,1,1}

or

{1/1,1/2,0/1,1/2,1/1,1/1}

which returns

{1,0.5,0,0.5,1,1}

SUMPRODUCT now sums the array to return 4.
 
J

JE McGimpsey

Paul Berkowitz said:
OK, thanks. But the formula

COUNTIF(A2:A1000,A2:A1000&"")

is supposed to contain the range it's operating on as its first argument,
and the value you're looking for as the second argument. What value is this
looking for, aside from not disallowing blank cells?

See if my explanation in the other subthread is comprehensible.
 
J

JE McGimpsey

Paul Berkowitz said:
And perhaps you could help with the next stage, too? SUMPRODUCT is supposed
to operate on a series of arrays. No matter what the result of the COUNTIF
function, that result is an integer, right? So if you divide the first outer
argument (A2:A1000<>"") - an array of cells which are not blank, by this
integer, do you not still only get one array? In fact, I'm trying to work
out what the "/" division operator on an array of cells actually does. So
far, I haven't found it in the Help.

The key is that SUMPRODUCT is an array formula, even though it's not
entered with CMD-RETURN.

So {x1,x2,x3,x4} / {y1,y2,y3,y4}

is evaluated as

{x1/y1,x2/y2,x3/y3,x4/y4}

which SUMPRODUCT then adds together.
 
P

Paul Berkowitz

Thank you, John. Mostly clear now. I did not realize that (A2:A1000<>"")
would return an array, for starters.

Just one last query: You say
However, if one of the
cells in column A is blank, the COUNTIF() element will be 0

Why is that? If A2:A1000 contains 1 blank cell, why doesn't

COUNTIF(A2:A1000,A2:A1000)

return that cell's result in the array as 1? If it contains 15 blank cells,
why doesn't it return those 15 elements in the array each with the value of
15? Why would it return 0 for these?

And how does appending "" via A2:A1000&"" force the blank to a null string,
which is different from a blank? Is this just a convention you have to know?

I'm still not understanding why blank is not a valid value, I guess. Is that
the operation can't be performed at all with a blank cell, so 0 is returned
by default?
 
J

JE McGimpsey

Paul Berkowitz said:
I'm still not understanding why blank is not a valid value, I guess. Is that
the operation can't be performed at all with a blank cell, so 0 is returned
by default?

Almost. A reference to a cell always returns a value. If the cell is
blank, the reference returns either numeric zero, or a null string,
depending on the data type that's expected. A zero is returned by
default.

COUNTIF(), however, appears to coerce both the range values and the
target value to a string.

Without the &"", the returned value is coerced to "0". This will not
match the blank's null string.

By concatenating the returned reference, the returned value is coerced
into a null string, and the result of the concatenation is also a null
string. This *does* match the blank cell's value in the range argument.
 
K

Ken Johnson

Hi Jason,
That formula never fails to stir the possoms.
I retired for the evening straight after posting my reply and missed
all the action.
JE was kept pretty busy while I was snoozing!
It's good to get JE's opinions on the workings of that formula. I'm
sure this thread will remain bookmarked on my computer for quite some
time.

I first saw the formula last Sept when Bob Phillips replied to a
similar request.
My immediate reaction was disbelief and confusion so I started a new
thread
"Can anybody explain to me how Bob Phillips' unique record counter
work?"
in microsoft.public.excel.programming

To save you going to the trouble of doing a search I've pasted Bob's
responses below.
Bob's website, xldynamic, has heaps of other SUMPRODUCT examples worth
looking at.

Ken,
Spookily, Richard Buttrey asked the same question this morning.
Here is my reply re-posted

Let's start by defining the range A1:A20 to talk specifics.
Bob,John,Bob,Bob,John,John,Bob,Bill,Bill,Max
or data in just A1:A10
The basic formula to count unique items is
=SUMPRODUCT(1/COUNTIF($A$1:$A$10,$A$1:$A$10))
The COUNTIF($A$1:$A$10,$A$1:$A$10) part of the formula builds an array
of
the number of occurrences of each item, in this
case{4;3;4;4;3;3;4;2;2;1}.
As can be seen, each occurrence of the repeated value is counted, so
there
are four occurrences of Bob in the array. There will always be the same

number of occurrences of value as the count of that value, unless two
or
more items are repeated the same number of times, in which case it will
be
some multiple of that count.
Thus the item that is repeated 4 times has 4 instances of that count,
dividing 1 by the count of 4, gives 0.25 4 times. The full array of
values
is
{0.25;0.333333333333333;0.25;0.25;0.333333333333333;0.333333333333333;0.25;0

..5;0.5;1}.
The item that repeats 4 times sums to 1. The item that repeats 3 times
also
sums to 1. It should be clear from this that every value works in the
same
way and sums to 1. In other words, 1 is returned for every unique item.
The
sum of these values becomes the count of unique items.
As our test range is A1:A20, and some of the items in A1:A20 are blank,

extending this formula to A1:A20 would return a #DIV/0! Error.
The reason for the error is blank cells in the full range A1:A20. Each
blank
cell returns a 0 value from the COUNTIF formula, which gives the
#DIV/0!
Error when divided into 1.
The solution to this is to force it to count the empty cells as well,
and
not return a zero. Adding &"" to the end of the COUNTIF formula forces
a
count of the blanks.
This addition on its own removes the #DIV/0! error, but will cause the
blanks to be counted as a unique item. A further addition to the
formula
resolves this by testing for those blanks. Instead of dividing the
array of
counts into 1 each time, adding the test creates an array of TRUE/FALSE

values to be divided by the equivalent element in the counts array.
Each
blank will resolve to FALSE in the dividend array, and the count of the

blanks in the divisor array. The result of this will be 0, so the
blanks do
not get counted.
--
HTH
Bob Phillips

(Bob,
Thanks for that, it's going to take me a while to work through it. It's
easily the most amazing bit of excel wizardry I've ever seen. Is it
your own invention?
Ken Johnson)


I wish I could lay some claim to that but I cannot. It is doubtful that
anyone can claim these things absolutely as so many contribute to their
evolution. I first saw it in a more simplistic form

=SUMPRODUCT(1/COUNTIF(A1:A20,A1:A20)

and it has evolved over the years to what I showed that day.

There are better ones out there IMO, although that does have the appeal
of
brevity. I saw one from Domenic a few weeks back that sums in
non-contiguous
cells that I really liked

=SUMPRODUCT(--ISNUMBER(MATCH(COLUMN(C1:N1)-COLUMN(C1)+1,{1,3,6,8,12},0)),--(
C1:N1>=1))

This sums C1,E1,H1,J1,N1 only. I had a solution to this problem, but
not as
elegant.

Here is one of mine for extracting the numbers from a delimited string
and
sum them. So if A1 contains 1,12,123 it will get the 1 and 123 out and
sum
them

=SUMPRODUCT(--MID(A1,FIND("~",SUBSTITUTE(B1&A1&B1,B1,"~",ROW(INDIRECT("1:"&L
EN(A1)-LEN(SUBSTITUTE(A1,B1,""))+1)))),
FIND("~",SUBSTITUTE(B1&A1&B1,B1,"~",ROW(INDIRECT("2:"&LEN(A1)-LEN(SUBSTITUTE
(A1,B1,""))+2))))-
FIND("~",SUBSTITUTE(B1&A1&B1,B1,"~",ROW(INDIRECT("1:"&LEN(A1)-LEN(SUBSTITUTE
(A1,B1,""))+1))))-1))

If you think that all this is useful, take a look at
http://www.xldynamic.com/source/xld.SUMPRODUCT.html

--
HTH

Bob Phillips



That should keep you busy for a while Jason.

Ken Johnson
 
J

JE McGimpsey

Ken Johnson said:
Bob's website, xldynamic, has heaps of other SUMPRODUCT examples worth
looking at.

Note that, at least for me, xldynamic's javascript causes errors with
Safari that can be difficult to get out of. I've emailed Bob a couple of
times, but he's indicated that the number of Safari users doesn't
justify changing his site, which I can understand.

Most of the javascript is used for formatting, not content, so you can
turn of js and see the site's content.
 
A

AntBlabby

I don't have time to do fancy functions, if I don't have to. The way
I'd do it is use the Advance Filter. I'd highlight the column, go to
Advance Filter, check the box for unique records, and have the results
go to some other column. Then, I'd highlight the results and the
number of rows is the number of unique entries. Not too elegant, but
it works really fast.
 
K

Ken Johnson

Hi Jason,
Just for fun:) I was looking at Bob's last formula for extracting and
summing the numbers in a delimited string and noticed an error, not in
the formula (as if I could ever do that!), but in what Bob says about
it.
After pasting it into C1 it appeared broken up and shared by C1:C6 so I
removed the leading = in C1 then used =CONCATENATE(C1,C2,C3,C4,C5,C6)
in C7 then Paste Special>Values to get that result into C1 then
replaced the leading = to get the whole formula working in C1.
After all that I tried it out with 1,12,123 in A1 and only got #VALUE.
Then I figured with all those references to B1 in the formula there
must be something important in B1 to make it work, something Bob has
forgotten to mention.
It didn't take me long to discover that B1 has to contain the
delimiter, in this case a comma; then the formula returned 136
(1+12+123).
Most single character delimiters seem to work, the only exceptions I've
found are the ~ (tilde) and ' (apostrophe), even the ` (don't know its
name, but it shares the same key as the ~) works.

Ken Johnson
 
P

Paul Berkowitz

I don't have time to do fancy functions, if I don't have to. The way
I'd do it is use the Advance Filter. I'd highlight the column, go to
Advance Filter, check the box for unique records, and have the results
go to some other column. Then, I'd highlight the results and the
number of rows is the number of unique entries. Not too elegant, but
it works really fast.

The fancy function is a faster, with fewer steps, however.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top