Average # in total group with multiple criteria?

J

Jonathan

Hi all,

I have a large population of data and need to find the average number of
"Subproducts" per "ID". Basically, the data has thousands of different "IDs"
and then anywhere from 1 to 6 different subproducts that can be associated
with each ID. Some have 1, some have 6, some have a number in between. I
need to find the average number of unique subproduct values per ID for the
entire population in one cell. So, basically it would come down to finding
the number of unique values for each ID and then averaging out that entire
number over the number of total IDs.

For the data below (which is much more simplified than the real set
obviously), the "Average # of Subproducts per ID" field would read: (2 for A,
3 for B, 4 for C, 3 for D, 1 for E, and 3 for F) So you get a total of 16
unique combinations of ID and subproduct over 6 different IDs which equals
2.667, which is what I would want the cell to show.

ID Subproduct
A Cat
A Dog
B Cat
D Dog
F Mouse
D Rabbit
D Rabbit
A Cat
C Dog
C Cat
B Dog
B Mouse
E Rabbit
E Rabbit
F Cat
F Dog
F Cat
D Dog
D Mouse
C Rabbit
C Rabbit
B Cat
B Dog
A Cat
B Dog
C Mouse
 
J

Jonathan

No--

I know how to get the answer that I'm looking for going through a variety of
steps, but I need it all consolidated into one formula in one cell that does
all of it.
 
R

ryguy7272

Ah! I see. Ok, another, more ambitious, idea:
With IDs in column A and animals in column B, place and ID in E1 and use
this function to list all animals that are associated with that ID:
=IF(ROWS(B$1:B1)<=COUNTIF($A$1:$A$26,$E$1),INDEX($B$1:$B$26,SMALL(IF($A$1:$A$26=$E$1,ROW($A$1:$A$26)-ROW($E$1)+1),ROWS(B$1:B1))),"")
(this is a CES function; ctrl + shift + enter...not just enter)

Change the range to suit your needs...

If that doesn't work, try any of these:
=SUMPRODUCT((A2:A78<>"")/(COUNTIF(A2:A78,A2:A78&"")))
=SUMPRODUCT((A2:A78<>"")/(COUNTIF(A2:A78,A2:A78)+(A2:A78="")))
=SUM(IF(FREQUENCY(IF(LEN(A2:A971)>0,MATCH(A2:A971,A2:A971,0),""),IF(LEN(A2:A971)>0,MATCH(A2:A971,A2:A971,0),""))>0,1))
(this last one is a CES function; ctrl + shift + enter...not just enter)

Does that help?

Regards,
Ryan---
 
J

Jonathan

Not sure what you are implying for column E in the first formula--please
explain?

As for the other formulas, they're not working on the more complex data
set--I know what my answer should be, but I'm not getting it. Also, I don't
know if it makes a difference, but it is not a weighted average by any means,
it is just an average, so even if ID #1 had 30 lines with the same animal,
and ID # 2 had 2 lines but each had a different animal, the cell after the
formula would read 1.5 because it would have a total of 1 unique from ID #1
and 2 uniques from ID#2. The number of appearances has no weight at all.
 
R

ryguy7272

Ah! A tad bit more complex than I initially thought
=SUMPRODUCT(1/COUNTIF(B1:B26,B1:B26))/SUMPRODUCT((1/COUNTIF(A1:A26,A1:A26&"")*(B1:B26<>"")))

Hope that works for you.

Regards,
Ryan--
 
D

Domenic

Assuming that A2:A27 contains the ID, and B2:B27 contains the
Subproduct, first define the following...

Insert > Name > Define

Name: Array1

Refers to:

=IF($A$2:$A$27<>"",IF(MATCH($A$2:$A$27,$A$2:$A$27,0)=ROW($A$2:$A$27)-ROW(
$A$2)+1,$A$2:$A$27))

Click Add

Name: Array2

Refers to:

=IF($A$2:$A$27<>"",IF(MATCH($A$2:$A$27&"#"&$B$2:$B$27,$A$2:$A$27&"#"&$B$2
:$B$27,0)=ROW($A$2:$A$27)-ROW($A$2)+1,$A$2:$A$27,""),"")

Click Add

Name: Array3

Refers to:

=MMULT((Array1=TRANSPOSE(Array2))+0,ROW($A$2:$A$27)^0)

Click Ok

Then try the following formula, which needs to be confirmed with
CONTROL+SHIFT+ENTER...

=AVERAGE(IF(A2:A27<>"",IF(MATCH(A2:A27,A2:A27,0)=ROW(A2:A27)-ROW(A2)+1,Ar
ray3)))

Note that the solution would need to be modified if Column B can contain
empty cells.

Hope this helps!
 
T

Teethless mama

Try this:

=SUM(N(FREQUENCY(MATCH(ID&Subproduct,ID&Subproduct,0),MATCH(ID&Subproduct,ID&Subproduct,0))>0))

ctrl+shift+enter, not just enter
 
D

Domenic

Teethless mama said:
Try this:

=SUM(N(FREQUENCY(MATCH(ID&Subproduct,ID&Subproduct,0),MATCH(ID&Subproduct,ID&S
ubproduct,0))>0))

ctrl+shift+enter, not just enter

Actually, based on the sample data provided by the OP, the above formula
returns 16. However, the desired result is 2.667.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top