Query to show top x% by group

J

John

Hello.

I'm trying to figure out a query that will show me the top x% by group.
I have a query that counts the number of occurrences of a given event,
grouped by a store number. So, my results might look something like this:

Store Occurrences
1 6
2 105
3 3
4 35
5 5
6 258
7 167
8 43
9 42
10 810
11 3
12 28
13 109
14 204
15 5
16 5
17 3
18 3
19 6
20 3



First, I would like to have a query that returns, for example, the top
25%. I tried doing this by sorting the Occurrences field descending, and
then specifying the "top" property in the query as 25%. However, if I do
this, it will return the top 5 records (and any subsequent records that
are tied with the 5th record in the Occurrences field). So basically,
the results would be this:

Store Occurrences
10 810
6 258
14 204
7 167
13 109

However, I would like a different measure of the top 25%. If you add up
all the occurrences in the first data set above, you get a total of
1843. 25% of this number is 460.75, so what I want to show is the top x
number of stores whose occurrences add up to 460.75. Or, put another
way, if we start at the top of the results and work our way down,
keeping a running sum of the occurrences, how many stores do we go
through before we reach 460.75. In fact, this is the exact logic that
I've used in my current implementation, but it's all done with VBA code
iterating through a recordset and outputting to Excel - I would like it
all done in a query .In this example, I want the results to be:

Store Occurrences
10 810

The next piece of this is that I also need to be able to do this by
group. Each store is assigned to a region - maybe something like
"British Columbia" or "Greater Toronto Area". I would like the query to
show the top x% for each region.


Any thoughts?
 
K

KARL DEWEY

Use two queries (unless you know subqueries). The first calculates the total
of
1843 and a prompt for percent give results of this number is 460.75.
Use that query in second query as criteria - > [FirstQuery].[Percent
Result]
 
J

John Spencer

First query - is simple

SELECT Sum(Occurences)*.25
FROM YourTable

Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store

Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
(SELECT Sum(Occurences)*.25
FROM YourTable)

Your final criteria of by region I will leave you to solve by noting that you
will need to include the region in the first query and second query.

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
 
J

John

Thanks for the reply - the solution looks a lot simpler than I thought
it would be. I'll have to try it when I get into work tomorrow (I don't
have access to the necessary data from home).
 
J

jrtwynam

First query - is simple

SELECT Sum(Occurences)*.25
FROM YourTable

Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store

Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*.25
    FROM YourTable)

Your final criteria of by region I will leave you to solve by noting thatyou
will need to include the region in the first query and second query.

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County












- Show quoted text -

I tried the queries you suggested, but I'm having some trouble getting
it to show what I want. Using an actual data set and doing the
calculations with a combination of Access and Excel, I get this:

STORE OCCURENCES
452 121
449 87
417 79
362 76
205 73
466 57
83 46
304 46
368 45
132 44
322 43
453 43
326 42
248 41
329 41
428 41

Please note that this is only a subset of my actual data (the first 16
out of 374 records). The total number of occurrences is 4038. Having
looked again at my requirements, I see that I'm actually looking for
the top 10%, not 25%, so 10% of 4038 is 403.8. Therefore, what I want
the query to return is this:

STORE OCCURENCES
452 121
449 87
417 79
362 76
205 73

Because store 205 is the point at which we pass 403.8 in a running
total of the occurrences. However, what the query is actually
returning is this:

Store RunningTotal
5 1315
7 438
11 1054
13 1770
16 614
20 2224
22 614
32 438
34 3037
35 614
36 438
37 1054
39 438
41 1950
43 558
44 438

Again, this is only a subset - the first 16 out of 180 records. This
is my SQL:

SELECT A.Store, Sum(B.Occurences) as RunningTotal FROM
[RPT0800_qry0300_StoreOccurrences4-7] as A LEFT JOIN
[RPT0800_qry0300_StoreOccurrences4-7] as B ON A.Occurences >
B.Occurences GROUP BY A.Store HAVING Sum(B.Occurences) >=
(SELECT Sum(Occurences)*GetNumParam("RPT0800_BOTTOM_X_PERCENT")
FROM [RPT0800_qry0300_StoreOccurrences4-7])

The result of GetNumParam("RPT0800_BOTTOM_X_PERCENT") is 0.1 (it's
just a parameter stored in a table, so that it's easily changeable
without having to go searching through queries).
"RPT0800_qry0300_StoreOccurrences4-7" is the name of my query that
gets the data into the format that I need. It's just a naming
convention I've adopted in an effort to keep track of which queries
are used for what. No, there are not 300 queries involved (qry0300) -
there are 3.

Am I doing something wrong?
 
J

jrtwynam

First query - is simple
SELECT Sum(Occurences)*.25
FROM YourTable
Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*.25
    FROM YourTable)
Your final criteria of by region I will leave you to solve by noting that you
will need to include the region in the first query and second query.
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
- Show quoted text -

I tried the queries you suggested, but I'm having some trouble getting
it to show what I want. Using an actual data set and doing the
calculations with a combination of Access and Excel, I get this:

STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73
466     57
83      46
304     46
368     45
132     44
322     43
453     43
326     42
248     41
329     41
428     41

Please note that this is only a subset of my actual data (the first 16
out of 374 records). The total number of occurrences is 4038. Having
looked again at my requirements, I see that I'm actually looking for
the top 10%, not 25%, so 10% of 4038 is 403.8. Therefore, what I want
the query to return is this:

STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73

Because store 205 is the point at which we pass 403.8 in a running
total of the occurrences. However, what the query is actually
returning is this:

Store   RunningTotal
5       1315
7       438
11      1054
13      1770
16      614
20      2224
22      614
32      438
34      3037
35      614
36      438
37      1054
39      438
41      1950
43      558
44      438

Again, this is only a subset - the first 16 out of 180 records. This
is my SQL:

SELECT A.Store, Sum(B.Occurences) as RunningTotal FROM
[RPT0800_qry0300_StoreOccurrences4-7] as A LEFT JOIN
[RPT0800_qry0300_StoreOccurrences4-7] as B ON A.Occurences >
B.Occurences  GROUP BY A.Store HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*GetNumParam("RPT0800_BOTTOM_X_PERCENT")
    FROM [RPT0800_qry0300_StoreOccurrences4-7])

The result of GetNumParam("RPT0800_BOTTOM_X_PERCENT") is 0.1 (it's
just a parameter stored in a table, so that it's easily changeable
without having to go searching through queries).
"RPT0800_qry0300_StoreOccurrences4-7" is the name of my query that
gets the data into the format that I need. It's just a naming
convention I've adopted in an effort to keep track of which queries
are used for what. No, there are not 300 queries involved (qry0300) -
there are 3.

Am I doing something wrong?- Hide quoted text -

- Show quoted text -

Ok, so I tried a slightly different method right after I posted my
previous reply (using a cartesian product), and I got most of the way
towards what I want. The first few records of my results show this:

STORE OCCURENCES SumOfOCCURENCES
452 121 121
449 87 208
417 79 287
362 76 363
205 73 436
466 57 493
83 46 585
304 46 585
368 45 630
132 44 674

This works, for the most part, but it does have trouble when there is
more than one store with the same number of occurrences (eg stores 83
and 304 above). Both of those results get added to the running total
in store 83's record. In this example, it will make no difference to
the results I want though.

My next challenge is to limit the results to the bottom 10%. I could
put some criteria in the SumOfOCCURENCES field to say <403.8 (which I
would not hard-code in there... that's just for this example).
However, what I actually want to include in this case are all records
where SumOfOCCURENCES is <403.8, PLUS the first record where the
running total passes 403.8.

This is my current SQL:

SELECT [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES,
Sum([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES) AS
SumOfOCCURENCES
FROM [RPT0800_qry0300_StoreOccurrences4-7],
[RPT0800_qry0300_StoreOccurrences4-7] AS
[RPT0800_qry0300_StoreOccurrences4-7_1]
WHERE
((([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES)>=[RPT0800_qry0300_StoreOccurrences4-7].
[OCCURENCES]))
GROUP BY [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES
ORDER BY [RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES DESC;
 
J

John Spencer

Sorry I don't have a solution to that problem (the record where the sum goes
over the amount) at present.

If I think of something, I'll try to find this thread and post my idea.

Good luck.

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County

First query - is simple
SELECT Sum(Occurences)*.25
FROM YourTable
Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
(SELECT Sum(Occurences)*.25
FROM YourTable)
Your final criteria of by region I will leave you to solve by noting that you
will need to include the region in the first query and second query.
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
John wrote:
Hello.
I'm trying to figure out a query that will show me the top x% by group.
I have a query that counts the number of occurrences of a given event,
grouped by a store number. So, my results might look something like this:
Store Occurrences
1 6
2 105
3 3
4 35
5 5
6 258
7 167
8 43
9 42
10 810
11 3
12 28
13 109
14 204
15 5
16 5
17 3
18 3
19 6
20 3
First, I would like to have a query that returns, for example, the top
25%. I tried doing this by sorting the Occurrences field descending, and
then specifying the "top" property in the query as 25%. However, if I do
this, it will return the top 5 records (and any subsequent records that
are tied with the 5th record in the Occurrences field). So basically,
the results would be this:
Store Occurrences
10 810
6 258
14 204
7 167
13 109
However, I would like a different measure of the top 25%. If you add up
all the occurrences in the first data set above, you get a total of
1843. 25% of this number is 460.75, so what I want to show is the top x
number of stores whose occurrences add up to 460.75. Or, put another
way, if we start at the top of the results and work our way down,
keeping a running sum of the occurrences, how many stores do we go
through before we reach 460.75. In fact, this is the exact logic that
I've used in my current implementation, but it's all done with VBA code
iterating through a recordset and outputting to Excel - I would like it
all done in a query .In this example, I want the results to be:
Store Occurrences
10 810
The next piece of this is that I also need to be able to do this by
group. Each store is assigned to a region - maybe something like
"British Columbia" or "Greater Toronto Area". I would like the query to
show the top x% for each region.
Any thoughts?- Hide quoted text -
- Show quoted text -
I tried the queries you suggested, but I'm having some trouble getting
it to show what I want. Using an actual data set and doing the
calculations with a combination of Access and Excel, I get this:

STORE OCCURENCES
452 121
449 87
417 79
362 76
205 73
466 57
83 46
304 46
368 45
132 44
322 43
453 43
326 42
248 41
329 41
428 41

Please note that this is only a subset of my actual data (the first 16
out of 374 records). The total number of occurrences is 4038. Having
looked again at my requirements, I see that I'm actually looking for
the top 10%, not 25%, so 10% of 4038 is 403.8. Therefore, what I want
the query to return is this:

STORE OCCURENCES
452 121
449 87
417 79
362 76
205 73

Because store 205 is the point at which we pass 403.8 in a running
total of the occurrences. However, what the query is actually
returning is this:

Store RunningTotal
5 1315
7 438
11 1054
13 1770
16 614
20 2224
22 614
32 438
34 3037
35 614
36 438
37 1054
39 438
41 1950
43 558
44 438

Again, this is only a subset - the first 16 out of 180 records. This
is my SQL:

SELECT A.Store, Sum(B.Occurences) as RunningTotal FROM
[RPT0800_qry0300_StoreOccurrences4-7] as A LEFT JOIN
[RPT0800_qry0300_StoreOccurrences4-7] as B ON A.Occurences >
B.Occurences GROUP BY A.Store HAVING Sum(B.Occurences) >=
(SELECT Sum(Occurences)*GetNumParam("RPT0800_BOTTOM_X_PERCENT")
FROM [RPT0800_qry0300_StoreOccurrences4-7])

The result of GetNumParam("RPT0800_BOTTOM_X_PERCENT") is 0.1 (it's
just a parameter stored in a table, so that it's easily changeable
without having to go searching through queries).
"RPT0800_qry0300_StoreOccurrences4-7" is the name of my query that
gets the data into the format that I need. It's just a naming
convention I've adopted in an effort to keep track of which queries
are used for what. No, there are not 300 queries involved (qry0300) -
there are 3.

Am I doing something wrong?- Hide quoted text -

- Show quoted text -

Ok, so I tried a slightly different method right after I posted my
previous reply (using a cartesian product), and I got most of the way
towards what I want. The first few records of my results show this:

STORE OCCURENCES SumOfOCCURENCES
452 121 121
449 87 208
417 79 287
362 76 363
205 73 436
466 57 493
83 46 585
304 46 585
368 45 630
132 44 674

This works, for the most part, but it does have trouble when there is
more than one store with the same number of occurrences (eg stores 83
and 304 above). Both of those results get added to the running total
in store 83's record. In this example, it will make no difference to
the results I want though.

My next challenge is to limit the results to the bottom 10%. I could
put some criteria in the SumOfOCCURENCES field to say <403.8 (which I
would not hard-code in there... that's just for this example).
However, what I actually want to include in this case are all records
where SumOfOCCURENCES is <403.8, PLUS the first record where the
running total passes 403.8.

This is my current SQL:

SELECT [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES,
Sum([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES) AS
SumOfOCCURENCES
FROM [RPT0800_qry0300_StoreOccurrences4-7],
[RPT0800_qry0300_StoreOccurrences4-7] AS
[RPT0800_qry0300_StoreOccurrences4-7_1]
WHERE
((([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES)>=[RPT0800_qry0300_StoreOccurrences4-7].
[OCCURENCES]))
GROUP BY [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES
ORDER BY [RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES DESC;
 
J

jrtwynam

Sorry I don't have a solution to that problem  (the record where the sum goes
over the amount) at present.

If I think of something, I'll try to find this thread and post my idea.

Good luck.

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County



First query - is simple
SELECT Sum(Occurences)*.25
FROM YourTable
Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*.25
    FROM YourTable)
Your final criteria of by region I will leave you to solve by noting that you
will need to include the region in the first query and second query.
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
John wrote:
Hello.
I'm trying to figure out a query that will show me the top x% by group.
I have a query that counts the number of occurrences of a given event,
grouped by a store number. So, my results might look something like this:
Store    Occurrences
1    6
2    105
3    3
4    35
5    5
6    258
7    167
8    43
9    42
10    810
11    3
12    28
13    109
14    204
15    5
16    5
17    3
18    3
19    6
20    3
First, I would like to have a query that returns, for example, the top
25%. I tried doing this by sorting the Occurrences field descending,and
then specifying the "top" property in the query as 25%. However, if I do
this, it will return the top 5 records (and any subsequent records that
are tied with the 5th record in the Occurrences field). So basically,
the results would be this:
Store    Occurrences
10    810
6    258
14    204
7    167
13    109
However, I would like a different measure of the top 25%. If you addup
all the occurrences in the first data set above, you get a total of
1843. 25% of this number is 460.75, so what I want to show is the top x
number of stores whose occurrences add up to 460.75. Or, put another
way, if we start at the top of the results and work our way down,
keeping a running sum of the occurrences, how many stores do we go
through before we reach 460.75. In fact, this is the exact logic that
I've used in my current implementation, but it's all done with VBA code
iterating through a recordset and outputting to Excel - I would likeit
all done in a query .In this example, I want the results to be:
Store    Occurrences
10    810
The next piece of this is that I also need to be able to do this by
group. Each store is assigned to a region - maybe something like
"British Columbia" or "Greater Toronto Area". I would like the queryto
show the top x% for each region.
Any thoughts?- Hide quoted text -
- Show quoted text -
I tried the queries you suggested, but I'm having some trouble getting
it to show what I want. Using an actual data set and doing the
calculations with a combination of Access and Excel, I get this:
STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73
466     57
83      46
304     46
368     45
132     44
322     43
453     43
326     42
248     41
329     41
428     41
Please note that this is only a subset of my actual data (the first 16
out of 374 records). The total number of occurrences is 4038. Having
looked again at my requirements, I see that I'm actually looking for
the top 10%, not 25%, so 10% of 4038 is 403.8. Therefore, what I want
the query to return is this:
STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73
Because store 205 is the point at which we pass 403.8 in a running
total of the occurrences. However, what the query is actually
returning is this:
Store   RunningTotal
5       1315
7       438
11      1054
13      1770
16      614
20      2224
22      614
32      438
34      3037
35      614
36      438
37      1054
39      438
41      1950
43      558
44      438
Again, this is only a subset - the first 16 out of 180 records. This
is my SQL:
SELECT A.Store, Sum(B.Occurences) as RunningTotal FROM
[RPT0800_qry0300_StoreOccurrences4-7] as A LEFT JOIN
[RPT0800_qry0300_StoreOccurrences4-7] as B ON A.Occurences >
B.Occurences  GROUP BY A.Store HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*GetNumParam("RPT0800_BOTTOM_X_PERCENT")
    FROM [RPT0800_qry0300_StoreOccurrences4-7])
The result of GetNumParam("RPT0800_BOTTOM_X_PERCENT") is 0.1 (it's
just a parameter stored in a table, so that it's easily changeable
without having to go searching through queries).
"RPT0800_qry0300_StoreOccurrences4-7" is the name of my query that
gets the data into the format that I need. It's just a naming
convention I've adopted in an effort to keep track of which queries
are used for what. No, there are not 300 queries involved (qry0300) -
there are 3.
Am I doing something wrong?- Hide quoted text -
- Show quoted text -
Ok, so I tried a slightly different method right after I posted my
previous reply (using a cartesian product), and I got most of the way
towards what I want. The first few records of my results show this:
STORE      OCCURENCES      SumOfOCCURENCES
452        121     121
449        87      208
417        79      287
362        76      363
205        73      436
466        57      493
83 46      585
304        46      585
368        45      630
132        44      674
This works, for the most part, but it does have trouble when there is
more than one store with the same number of occurrences (eg stores 83
and 304 above). Both of those results get added to the running total
in store 83's record. In this example, it will make no difference to
the results I want though.
My next challenge is to limit the results to the bottom 10%. I could
put some criteria in the SumOfOCCURENCES field to say <403.8 (which I
would not hard-code in there... that's just for this example).
However, what I actually want to include in this case are all records
where SumOfOCCURENCES is <403.8, PLUS the first record where the
running total passes 403.8.
This is my current SQL:
SELECT [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES,
Sum([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES) AS
SumOfOCCURENCES
FROM [RPT0800_qry0300_StoreOccurrences4-7],
[RPT0800_qry0300_StoreOccurrences4-7] AS
[RPT0800_qry0300_StoreOccurrences4-7_1]
WHERE
((([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES)>=[RPT0800_qry0300_St­oreOccurrences4-7].
[OCCURENCES]))
GROUP BY [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES
ORDER BY [RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES DESC;- Hide quoted text -

- Show quoted text -

I had thought of doing a union query with the query above and another
query that takes the top value from
RPT0800_qry0300_StoreOccurrences4-7 that DOESN'T show in the above
query. I hesitate to do unions though, simply because they take
longer. I think I could use union all in this case, since I'm not
concerned about tuplicates.
 
J

jrtwynam

Sorry I don't have a solution to that problem  (the record where the sum goes
over the amount) at present.
If I think of something, I'll try to find this thread and post my idea.
Good luck.
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
On Oct 30, 8:53 am, (e-mail address removed) wrote:
First query - is simple
SELECT Sum(Occurences)*.25
FROM YourTable
Next query mean you have to establish a Running Sum for the stores.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
Combining those into one query.
SELECT Store, Sum(B.Occurences) as RunningTotal
FROM YourTable as A LEFT JOIN YourTable as B
ON A.Occurences > B.Occurences
GROUP BY Store
HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*.25
    FROM YourTable)
Your final criteria of by region I will leave you to solve by noting that you
will need to include the region in the first query and second query..
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
John wrote:
Hello.
I'm trying to figure out a query that will show me the top x% by group.
I have a query that counts the number of occurrences of a given event,
grouped by a store number. So, my results might look something like this:
Store    Occurrences
1    6
2    105
3    3
4    35
5    5
6    258
7    167
8    43
9    42
10    810
11    3
12    28
13    109
14    204
15    5
16    5
17    3
18    3
19    6
20    3
First, I would like to have a query that returns, for example, thetop
25%. I tried doing this by sorting the Occurrences field descending, and
then specifying the "top" property in the query as 25%. However, if I do
this, it will return the top 5 records (and any subsequent recordsthat
are tied with the 5th record in the Occurrences field). So basically,
the results would be this:
Store    Occurrences
10    810
6    258
14    204
7    167
13    109
However, I would like a different measure of the top 25%. If you add up
all the occurrences in the first data set above, you get a total of
1843. 25% of this number is 460.75, so what I want to show is the top x
number of stores whose occurrences add up to 460.75. Or, put another
way, if we start at the top of the results and work our way down,
keeping a running sum of the occurrences, how many stores do we go
through before we reach 460.75. In fact, this is the exact logic that
I've used in my current implementation, but it's all done with VBAcode
iterating through a recordset and outputting to Excel - I would like it
all done in a query .In this example, I want the results to be:
Store    Occurrences
10    810
The next piece of this is that I also need to be able to do this by
group. Each store is assigned to a region - maybe something like
"British Columbia" or "Greater Toronto Area". I would like the query to
show the top x% for each region.
Any thoughts?- Hide quoted text -
- Show quoted text -
I tried the queries you suggested, but I'm having some trouble getting
it to show what I want. Using an actual data set and doing the
calculations with a combination of Access and Excel, I get this:
STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73
466     57
83      46
304     46
368     45
132     44
322     43
453     43
326     42
248     41
329     41
428     41
Please note that this is only a subset of my actual data (the first 16
out of 374 records). The total number of occurrences is 4038. Having
looked again at my requirements, I see that I'm actually looking for
the top 10%, not 25%, so 10% of 4038 is 403.8. Therefore, what I want
the query to return is this:
STORE   OCCURENCES
452     121
449     87
417     79
362     76
205     73
Because store 205 is the point at which we pass 403.8 in a running
total of the occurrences. However, what the query is actually
returning is this:
Store   RunningTotal
5       1315
7       438
11      1054
13      1770
16      614
20      2224
22      614
32      438
34      3037
35      614
36      438
37      1054
39      438
41      1950
43      558
44      438
Again, this is only a subset - the first 16 out of 180 records. This
is my SQL:
SELECT A.Store, Sum(B.Occurences) as RunningTotal FROM
[RPT0800_qry0300_StoreOccurrences4-7] as A LEFT JOIN
[RPT0800_qry0300_StoreOccurrences4-7] as B ON A.Occurences >
B.Occurences  GROUP BY A.Store HAVING Sum(B.Occurences) >=
   (SELECT Sum(Occurences)*GetNumParam("RPT0800_BOTTOM_X_PERCENT")
    FROM [RPT0800_qry0300_StoreOccurrences4-7])
The result of GetNumParam("RPT0800_BOTTOM_X_PERCENT") is 0.1 (it's
just a parameter stored in a table, so that it's easily changeable
without having to go searching through queries).
"RPT0800_qry0300_StoreOccurrences4-7" is the name of my query that
gets the data into the format that I need. It's just a naming
convention I've adopted in an effort to keep track of which queries
are used for what. No, there are not 300 queries involved (qry0300) -
there are 3.
Am I doing something wrong?- Hide quoted text -
- Show quoted text -
Ok, so I tried a slightly different method right after I posted my
previous reply (using a cartesian product), and I got most of the way
towards what I want. The first few records of my results show this:
STORE      OCCURENCES      SumOfOCCURENCES
452        121     121
449        87      208
417        79      287
362        76      363
205        73      436
466        57      493
83 46      585
304        46      585
368        45      630
132        44      674
This works, for the most part, but it does have trouble when there is
more than one store with the same number of occurrences (eg stores 83
and 304 above). Both of those results get added to the running total
in store 83's record. In this example, it will make no difference to
the results I want though.
My next challenge is to limit the results to the bottom 10%. I could
put some criteria in the SumOfOCCURENCES field to say <403.8 (which I
would not hard-code in there... that's just for this example).
However, what I actually want to include in this case are all records
where SumOfOCCURENCES is <403.8, PLUS the first record where the
running total passes 403.8.
This is my current SQL:
SELECT [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES,
Sum([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES) AS
SumOfOCCURENCES
FROM [RPT0800_qry0300_StoreOccurrences4-7],
[RPT0800_qry0300_StoreOccurrences4-7] AS
[RPT0800_qry0300_StoreOccurrences4-7_1]
WHERE
((([RPT0800_qry0300_StoreOccurrences4-7_1].OCCURENCES)>=[RPT0800_qry0300_St­­oreOccurrences4-7].
[OCCURENCES]))
GROUP BY [RPT0800_qry0300_StoreOccurrences4-7].STORE,
[RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES
ORDER BY [RPT0800_qry0300_StoreOccurrences4-7].OCCURENCES DESC;- Hidequoted text -
- Show quoted text -

I had thought of doing a union query with the query above and another
query that takes the top value from
RPT0800_qry0300_StoreOccurrences4-7 that DOESN'T show in the above
query. I hesitate to do unions though, simply because they take
longer. I think I could use union all in this case, since I'm not
concerned about tuplicates.- Hide quoted text -

- Show quoted text -

I think I solved it. Instead of showing all records where the running
sum is less than or equal to 10% of the total, what I did is a couple
of sub queries. One of them gets the minimum value of the running sum
that is greater than or equal to 10% of the total. The next uses that
value to select all records where the running sum is less than or
equal to that value. This actually works well, because say the store
that first passes the 10% mark has 10 occurrences, but there are 3
other stores that also have 10 occurrences. This method will return
all of these stores, rather than just the first. Here's the SQL, if
you're curious:

SELECT [RPT0800_qry0400_4-7RunningSum].STORE,
[RPT0800_qry0400_4-7RunningSum].OCCURENCES,
[RPT0800_qry0400_4-7RunningSum].RUNNING_TOTAL
FROM [RPT0800_qry0400_4-7RunningSum]
WHERE ((([RPT0800_qry0400_4-7RunningSum].RUNNING_TOTAL)<=(SELECT
MIN(RUNNING_TOTAL) FROM [RPT0800_qry0400_4-7RunningSum] WHERE
RUNNING_TOTAL >= (SELECT
SUM(OCCURENCES)*GetNumParam("RPT0800_BOTTOM_X_PERCENT") FROM
[RPT0800_qry0300_StoreOccurrences4-7]))));
 
J

jrtwynam

Now I have a slightly different problem.

I have a bunch of stores, each of which is assigned to a region (this
data set is the same as the previous one). However, this time I'm
dealing with a different value (Unload Rate, rather than Occurrences).
In this version, I think somehow I can specify the query's "Top"
property, but I don't really know how to do that by group. (Note that
in this query, I don't need to use the weird running-sum method
described in previous posts.)

I have three fields - Store, Region, and Unload Rate, and it's sorted
by Unload Rate ascending. I can specify to return the top 10%, but it
will return the top 10% of all records. I want the top 10% for each
region. I can do this with an iterate query, writing the results to a
recordset, but again I'd prefer not to write to temp tables if
possible. I would also prefer not to do something such as create a
separate query for each region and then combine them with a union,
because it seems like that would be rather slow to run.

Any ideas?
 
J

John Spencer

You can use a correlated sub query

SELECT Region, Store, UnloadRate
FROM YourTable
WHERE Store in
(SELECT TOP 10% Store
FROM YourTable as Temp
WHERE Temp.Region = YourTable.Region
ORDER BY UnloadRate DESC)

If that is too slow, then post back for a more complex solution.

'====================================================
John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
'====================================================
 
J

jrtwynam

Thanks for the reply. That method worked, but it is far too slow to be
operationally feasible. Right now, I'm dealing with data from Jan 1
2008 to today, so that's about 10 months worth of data, totalling
approximately 500K records in the DB. I have a form that allows a user
to select a time period that they would like to look at - at first, I
tried running it for all records in the DB. I thought that might be
what's causing it to take so long (and certainly, that would be one of
the reasons). I then tried limiting it to Oct 1 through Oct 31, since
in general we'd be looking at a month's worth of data at a time. The
query has now been running for about 8 minutes, and it's not done yet.

I have it set up with a series of queries. The first one basically
just eliminates all records that I don't want to use in the
calculations, so that the subsequent queries are looking at far fewer
records. This first query eliminates all records NOT in the chosen
date range, as well as some stuff relating to other criteria, and runs
within about 3 or 4 seconds for a month's worth of data. The second
query, based on the first, is the one that calculates the UnloadRate
mentioned below (though the field is actually called AVG_CUBE_HR).
This one runs in about 8 seconds, since it has to run the first query
first. This second query is the one that I'm using for the top 10%
thing in this topic. Here's my current SQL:

SELECT DEALER_REGION, FACILITY_NUM, AVG_CUBE_HR
FROM RPT1100_qry0200
WHERE FACILITY_NUM in
(SELECT TOP 10 PERCENT FACILITY_NUM
FROM RPT1100_qry0200 as Temp
WHERE Temp.DEALER_REGION = RPT1100_qry0200.DEALER_REGION
ORDER BY AVG_CUBE_HR)

This should return probably around 100 records in total
(RPT1100_qry0200 itself returns 463 records). The problem, as far as I
can tell, is that the subquery takes about 8 seconds to run. So, for
each of the 463 records, it has to run the subquery to see if it
should return that record. 8 seconds per record, times 463 records,
means this query will take around an hour to run. At least, that's
what I'm assuming, though I'm not sure if it works quite like that.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top