Top N Records on multiple fields

C

CodeCrazy

Hello -
I have a table in access that I need to pull the Top 20 records for each
different concatenation field. Here is an example table:

Concatenation Month Year Vendor MMC Sales
16-1000 5 2008 12345 1000 20000
16-1000 5 2008 12345 1000 15000
16-1000 5 2008 12345 1000 10000
16-2000 5 2008 56789 2000 20000
16-2000 5 2008 56789 2000 15000
16-2000 5 2008 56789 2000 10000
16-3000 5 2008 45678 3000 20000
16-3000 5 2008 56789 3000 15000
16-3000 5 2008 56789 3000 10000

Each concatenation has hundreds of rows. I need the Top 20 for each
different concatenation. Is that possible? If I select Top 20 values it
only returns the first 20 for concatenation 16-1000. Any and all help
greatly appreciated!
Thanks
 
M

Michel Walsh

You can rank by group, or use a sub-query.

It is easier if you have a primary key: pk


Ranking by group:



-------------------------------------------
SELECT a.concatenation, a.pk,
LAST(a.month), LAST(a.year), LAST(a.vendor), LAST(a.mmc), LAST(a.sales)

FROM table AS a INNER JOIN table AS b
ON a.concatenation = b.concatenation
AND a.sales >= b.sales
AND a.pk >= b.pk

GROUP BY a.concatenation, a.pk

HAVING COUNT(*) <= 10

ORDER BY a.concatenation, LAST(a.sales) DESC
--------------------------------------------


Using sub-query:


------------------------------------------
SELECT concatenation, pk, month, year, vendor, mmc, sales

FROM table AS a

WHERE pk IN( SELECT TOP 10 b.pk
FROM table AS b
WHERE b.concatenation = a.concatenation
ORDER BY b.sales DESC, b,pk)

ORDER BY concatenation, sales DESC
-------------------------------------------


(probably slower, though)




The last ORDER BY clause, ofeach of these queries, is just to get a nice
output, but that is not necessary to just 'get' the records.




Vanderghast, Access MVP
 
C

CodeCrazy

Thanks! The ranking by group was acting funny but the subquery worked like a
charm (although slow like you said). Thanks so much!
 
A

Anthony

I have a very similar problem, and due to the complexity of the database as
it currently stands, it is not practical to use the sub-query option. When I
try applying the 'rank by group' solution you give, I end up with an output
which returns (for each group) the 10 smallest pk numbers and then sorts each
of these groups of 10 entries by the 'sales' value.

Any ideas?
 
M

Michel Walsh

Indeed, I messed up the query. Try the following on Northwind:


----------------------------------

SELECT a.EmployeeID, a.freight,
COUNT(*) AS rank, LAST(a.OrderID), LAST(a.OrderDate)

FROM Orders AS a INNER JOIN Orders AS b
ON a.employeeID = b.employeeID
AND ( a.freight > b.freight OR (a.freight=b.freight
AND a.OrderID >= b.OrderID))

GROUP BY a.EmployeeID, a.freight

HAVING COUNT(*) <=10
-----------------------------------


where I take the 10 lowest freight, per employee. If two freight are equal,
the primary key is used to break the tie.



Vanderghast, Access MVP
 
A

Anthony

This is nearly what I'm after - but how would you go about taking the 10
highest freight per employee?

Thanks for your help!!
 
A

Anthony

I am almost there with this - I really appreciate your help!!!

I have:

SiteNumber, Type, Score

SiteNumber is unique. I group by Type and want the top 10 of each Type by
Score.

My last problem I have with the query you have written is that I have a
number of Scores that are in the top 10 but which are the same as for other
sites (also in the top 10 obviously).

The query at the moment will not include any of the rows which have
duplicates in the Score field.

I either want the first 10 sites which appear when sorted from largest to
smallest Score. Or I want all of the sites that are captured from returning
the 10 largest scores.

I don't care which of these I get - but I cannot allow any sites to be
disregarded.

If you can help it would be gretaly appreciated - I'm pulling my hair out on
this one!!!
 
M

Michel Walsh

Ok, since you want the top 10 WITH ties, we cannot use the primary key as
tie breaker (since we do not want to break the ties in the first place).

Try something like:


SELECT a.type, a.score
FROM myTable As a LEFT JOIN myTable As b
ON a.type = b.type AND a.score < b.score
GROUP BY a.type, a.score
HAVING COUNT(b.type) < 10



Note the inner join has been changed to an outer join, <= to < and COUNT(*)
as COUNT(unpreservedSideField)


Note that technically, the rank is, here, given by:

1+ COUNT(b.type)


Since you want rank <= 10, that makes: ( rank -1 ) < 10, or we could
have wrote


HAVING 1+COUNT(b.type) <= 10

or

HAVING COUNT(b.type) <= 9


but I just found prettier:

HAVING COUNT(b.type) <10



Simply artistic considerations, at that point. :)




Hoping it may help,
Vanderghast, Access MVP
 
A

Anthony

That's solved it - thank you!!!!!!

Michel Walsh said:
Ok, since you want the top 10 WITH ties, we cannot use the primary key as
tie breaker (since we do not want to break the ties in the first place).

Try something like:


SELECT a.type, a.score
FROM myTable As a LEFT JOIN myTable As b
ON a.type = b.type AND a.score < b.score
GROUP BY a.type, a.score
HAVING COUNT(b.type) < 10



Note the inner join has been changed to an outer join, <= to < and COUNT(*)
as COUNT(unpreservedSideField)


Note that technically, the rank is, here, given by:

1+ COUNT(b.type)


Since you want rank <= 10, that makes: ( rank -1 ) < 10, or we could
have wrote


HAVING 1+COUNT(b.type) <= 10

or

HAVING COUNT(b.type) <= 9


but I just found prettier:

HAVING COUNT(b.type) <10



Simply artistic considerations, at that point. :)




Hoping it may help,
Vanderghast, Access MVP
 
N

Natasha

Hello,

I am trying to figure out the same problem, by applying the solution listed
in this chain, but the resulting dataset is not ranking properly.

I am trying to get the top three "class" by "store" based on "2008 Shrink
Unit %"

The final data set needs to be ranked in the following manner:

Store 3 Shoes Dress Shoes 2.1%
Store 3 Accessories Bags 1.8%
Store 3 Clothes Dresses 1.5%
Store 4 Clothes Pants 2.6%
Store 4 Clothes Dresses 2.3%
Store 4 .... etc.

I noticed that the inner join was being created on a mirror table, so I
copied the table and called it "shrinkcalb".

Query used:

SELECT
shrinkcal.Store,
shrinkcal.Division,
shrinkcal.Class,
count(*) AS rank,
Last (shrinkcal.[2007 Shrink Units]),
Last (shrinkcal.[2008 Shrink Units]),
Last (shrinkcal.[2007 Shrink Units %]),
Last (shrinkcal.[2008 Shrink Units %]),
Last (shrinkcal.BPC)


FROM shrinkcal INNER JOIN shrinkcalb
ON shrinkcal.store = shrinkcalb.store
AND shrinkcal.division = shrinkcalb.division
AND shrinkcal.class = shrinkcalb.class
AND (shrinkcal.[2008 shrink units %] > shrinkcalb.[2008 shrink units %]
OR (shrinkcal.[2008 shrink units %] = shrinkcalb.[2008 shrink units %]
AND shrinkcal.store >= shrinkcalb.store))

GROUP BY shrinkcal.store, shrinkcal.division, shrinkcal.class,
shrinkcal.[2008 shrink units %]

HAVING COUNT (*) <=3

I don't understand where it went wrong. I am not the best at Access or SQL,
but I am hopping you can help.

Thanks.

- Natasha
 
J

John Spencer

PERHAPS the following will work to give you what you want

SELECT
shrinkcal.Store,
shrinkcal.Division,
shrinkcal.Class,
count(*) AS rank,
Last (shrinkcal.[2007 Shrink Units]),
Last (shrinkcal.[2008 Shrink Units]),
Last (shrinkcal.[2007 Shrink Units %]),
Last (shrinkcal.[2008 Shrink Units %]),
Last (shrinkcal.BPC)


FROM shrinkcal INNER JOIN shrinkcal as shrinkcalb
ON shrinkcal.store = shrinkcalb.store
AND shrinkcal.division = shrinkcalb.division
AND shrinkcal.class = shrinkcalb.class
AND (shrinkcal.[2008 shrink units %] >= shrinkcalb.[2008 shrink units %]
AND shrinkcal.store = shrinkcalb.store))

GROUP BY shrinkcal.store, shrinkcal.division, shrinkcal.class,
shrinkcal.[2008 shrink units %]

HAVING COUNT (*) <=3

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
Hello,

I am trying to figure out the same problem, by applying the solution listed
in this chain, but the resulting dataset is not ranking properly.

I am trying to get the top three "class" by "store" based on "2008 Shrink
Unit %"

The final data set needs to be ranked in the following manner:

Store 3 Shoes Dress Shoes 2.1%
Store 3 Accessories Bags 1.8%
Store 3 Clothes Dresses 1.5%
Store 4 Clothes Pants 2.6%
Store 4 Clothes Dresses 2.3%
Store 4 .... etc.

I noticed that the inner join was being created on a mirror table, so I
copied the table and called it "shrinkcalb".

Query used:

SELECT
shrinkcal.Store,
shrinkcal.Division,
shrinkcal.Class,
count(*) AS rank,
Last (shrinkcal.[2007 Shrink Units]),
Last (shrinkcal.[2008 Shrink Units]),
Last (shrinkcal.[2007 Shrink Units %]),
Last (shrinkcal.[2008 Shrink Units %]),
Last (shrinkcal.BPC)


FROM shrinkcal INNER JOIN shrinkcalb
ON shrinkcal.store = shrinkcalb.store
AND shrinkcal.division = shrinkcalb.division
AND shrinkcal.class = shrinkcalb.class
AND (shrinkcal.[2008 shrink units %] > shrinkcalb.[2008 shrink units %]
OR (shrinkcal.[2008 shrink units %] = shrinkcalb.[2008 shrink units %]
AND shrinkcal.store >= shrinkcalb.store))

GROUP BY shrinkcal.store, shrinkcal.division, shrinkcal.class,
shrinkcal.[2008 shrink units %]

HAVING COUNT (*) <=3

I don't understand where it went wrong. I am not the best at Access or SQL,
but I am hopping you can help.

Thanks.

- Natasha


Anthony said:
That's solved it - thank you!!!!!!
 
N

Natasha

No. It's still not working. My rank order is still 1 for all the records.

John Spencer said:
PERHAPS the following will work to give you what you want

SELECT
shrinkcal.Store,
shrinkcal.Division,
shrinkcal.Class,
count(*) AS rank,
Last (shrinkcal.[2007 Shrink Units]),
Last (shrinkcal.[2008 Shrink Units]),
Last (shrinkcal.[2007 Shrink Units %]),
Last (shrinkcal.[2008 Shrink Units %]),
Last (shrinkcal.BPC)


FROM shrinkcal INNER JOIN shrinkcal as shrinkcalb
ON shrinkcal.store = shrinkcalb.store
AND shrinkcal.division = shrinkcalb.division
AND shrinkcal.class = shrinkcalb.class
AND (shrinkcal.[2008 shrink units %] >= shrinkcalb.[2008 shrink units %]
AND shrinkcal.store = shrinkcalb.store))

GROUP BY shrinkcal.store, shrinkcal.division, shrinkcal.class,
shrinkcal.[2008 shrink units %]

HAVING COUNT (*) <=3

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
Hello,

I am trying to figure out the same problem, by applying the solution listed
in this chain, but the resulting dataset is not ranking properly.

I am trying to get the top three "class" by "store" based on "2008 Shrink
Unit %"

The final data set needs to be ranked in the following manner:

Store 3 Shoes Dress Shoes 2.1%
Store 3 Accessories Bags 1.8%
Store 3 Clothes Dresses 1.5%
Store 4 Clothes Pants 2.6%
Store 4 Clothes Dresses 2.3%
Store 4 .... etc.

I noticed that the inner join was being created on a mirror table, so I
copied the table and called it "shrinkcalb".

Query used:

SELECT
shrinkcal.Store,
shrinkcal.Division,
shrinkcal.Class,
count(*) AS rank,
Last (shrinkcal.[2007 Shrink Units]),
Last (shrinkcal.[2008 Shrink Units]),
Last (shrinkcal.[2007 Shrink Units %]),
Last (shrinkcal.[2008 Shrink Units %]),
Last (shrinkcal.BPC)


FROM shrinkcal INNER JOIN shrinkcalb
ON shrinkcal.store = shrinkcalb.store
AND shrinkcal.division = shrinkcalb.division
AND shrinkcal.class = shrinkcalb.class
AND (shrinkcal.[2008 shrink units %] > shrinkcalb.[2008 shrink units %]
OR (shrinkcal.[2008 shrink units %] = shrinkcalb.[2008 shrink units %]
AND shrinkcal.store >= shrinkcalb.store))

GROUP BY shrinkcal.store, shrinkcal.division, shrinkcal.class,
shrinkcal.[2008 shrink units %]

HAVING COUNT (*) <=3

I don't understand where it went wrong. I am not the best at Access or SQL,
but I am hopping you can help.

Thanks.

- Natasha


Anthony said:
That's solved it - thank you!!!!!!

:

Ok, since you want the top 10 WITH ties, we cannot use the primary key as
tie breaker (since we do not want to break the ties in the first place).

Try something like:


SELECT a.type, a.score
FROM myTable As a LEFT JOIN myTable As b
ON a.type = b.type AND a.score < b.score
GROUP BY a.type, a.score
HAVING COUNT(b.type) < 10



Note the inner join has been changed to an outer join, <= to < and COUNT(*)
as COUNT(unpreservedSideField)


Note that technically, the rank is, here, given by:

1+ COUNT(b.type)


Since you want rank <= 10, that makes: ( rank -1 ) < 10, or we could
have wrote


HAVING 1+COUNT(b.type) <= 10

or

HAVING COUNT(b.type) <= 9


but I just found prettier:

HAVING COUNT(b.type) <10



Simply artistic considerations, at that point. :)




Hoping it may help,
Vanderghast, Access MVP


I am almost there with this - I really appreciate your help!!!

I have:

SiteNumber, Type, Score

SiteNumber is unique. I group by Type and want the top 10 of each Type by
Score.

My last problem I have with the query you have written is that I have a
number of Scores that are in the top 10 but which are the same as for
other
sites (also in the top 10 obviously).

The query at the moment will not include any of the rows which have
duplicates in the Score field.

I either want the first 10 sites which appear when sorted from largest to
smallest Score. Or I want all of the sites that are captured from
returning
the 10 largest scores.

I don't care which of these I get - but I cannot allow any sites to be
disregarded.

If you can help it would be gretaly appreciated - I'm pulling my hair out
on
this one!!!

:

Change the > and >= for < and <= in the ON clause.


Vanderghast, Access MVP


This is nearly what I'm after - but how would you go about taking the
10
highest freight per employee?

Thanks for your help!!

:

Indeed, I messed up the query. Try the following on Northwind:


----------------------------------

SELECT a.EmployeeID, a.freight,
COUNT(*) AS rank, LAST(a.OrderID), LAST(a.OrderDate)

FROM Orders AS a INNER JOIN Orders AS b
ON a.employeeID = b.employeeID
AND ( a.freight > b.freight OR (a.freight=b.freight
AND a.OrderID >= b.OrderID))

GROUP BY a.EmployeeID, a.freight

HAVING COUNT(*) <=10
-----------------------------------


where I take the 10 lowest freight, per employee. If two freight are
equal,
the primary key is used to break the tie.



Vanderghast, Access MVP




I have a very similar problem, and due to the complexity of the
database
as
it currently stands, it is not practical to use the sub-query
option.
When
I
try applying the 'rank by group' solution you give, I end up with an
output
which returns (for each group) the 10 smallest pk numbers and then
sorts
each
of these groups of 10 entries by the 'sales' value.

Any ideas?

:

You can rank by group, or use a sub-query.

It is easier if you have a primary key: pk


Ranking by group:



-------------------------------------------
SELECT a.concatenation, a.pk,
LAST(a.month), LAST(a.year), LAST(a.vendor), LAST(a.mmc),
LAST(a.sales)

FROM table AS a INNER JOIN table AS b
ON a.concatenation = b.concatenation
AND a.sales >= b.sales
AND a.pk >= b.pk

GROUP BY a.concatenation, a.pk

HAVING COUNT(*) <= 10

ORDER BY a.concatenation, LAST(a.sales) DESC
--------------------------------------------


Using sub-query:


------------------------------------------
SELECT concatenation, pk, month, year, vendor, mmc, sales

FROM table AS a

WHERE pk IN( SELECT TOP 10 b.pk
FROM table AS b
WHERE b.concatenation = a.concatenation
ORDER BY b.sales DESC, b,pk)

ORDER BY concatenation, sales DESC
-------------------------------------------


(probably slower, though)




The last ORDER BY clause, ofeach of these queries, is just to get
a
nice
output, but that is not necessary to just 'get' the records.




Vanderghast, Access MVP
 
J

Jennifer

Hello,
I'm trying to apply your answer to a similar situation I have, but am not
really following it. Maybe you can help me a little more. Here's the general
layout of my table...

Table looks like:
Div. Region District Store # Vendor Rate(%) Contact
02 11 2211 05675 55555 .308987 Person's
name
03 54 1111 01245 04716 .141239 Person's
name
02 11 3215 11254 00779 .499746 Person's
name
01 51 5448 05987 65987 .597411 Person's
name
03 52 0444 04365 77897 .597961 Person's
name


The table is about 4191 records... I'm looking to find the top rate values
for each region. I do not want to create 27 different queries using the Top
Value property(as this is the # of regions). I tried creating "region" as the
primary key, to try to follow your post below, but says it can't because
there are duplicate values.

Any ideas how I can query this table for the top 25 values of each region?

Thanks,
Jennifer
 
V

vanderghast

SELECT a.region,
LAST(a.[div.]),
LAST(a.district),
LAST(a.[store #]),
LAST(a.vendor),
a.[rate %],
LAST(a.contact)
FROM tablename AS a INNER JOIN tableName AS b
ON a.region= b.region
AND
( a.[rate %] < b.[rate %]
OR (a.[rate %]= b.[rate %] AND a.vendor<= b.vendor))
GROUP BY a.region, a.[rate %]
HAVING COUNT(*)=1



Note that if there is a tie, for a given region, then the vendor having the
largest vendor number will win the tie. If there is no possible tie, you can
simplify the query to:


SELECT a.region,
LAST(a.[div.]),
LAST(a.district),
LAST(a.[store #]),
LAST(a.vendor),
a.[rate %],
LAST(a.contact)
FROM tablename AS a INNER JOIN tableName AS b
ON a.region= b.region AND a.[rate %] <= b.[rate %]
GROUP BY a.region, a.[rate %]
HAVING COUNT(*)=1



Vanderghast, Access MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top