M
Michel Walsh
To illustrate what GROUP BY is and is NOT, so things can be clear in mind.
SELECT f1, f2, f3, MAX(f4), MIN(f5), SUM(f6)
FROM somewhere
GROUP BY f1, f2, f3
can be 'described' as: do all different (distinct) groups of possible
triples {f1, f2, f3} and once these groups are made, for each of them,
compute the aggregate.
If you imagine that it is what is effectively done, ie. first, define 'm'
buckets, one per triple {f1, f2, f3}, fill the each bucket with the records
that belong to them, THEN, once it is done, look again on each bucket and
compute the aggregate,... well, that is a possible 'plan', but a very
not-efficient one since you 'touch' each record twice.
We can easily solve the query by touching each record only ONCE. Indeed,
assume there is already a "group" created for f1='a', f2='b', f3='c', then
once a new record has the same values for that triple, do we have to store
it? Well, surely we don't need to store f1, f2, and f3 again, since they are
the same values than those of the 'group' identifier. What about f4 of the
new record? again, the existing group can already has the MAXupToNowOf(f4),
MINupToNowOf(f5) and SUMupToNowOf(f6). So, why not using f4 of the new
record and 'aggregate it one term at a time' with MaxUpToNowOf(f4), and same
thing with f5, and with f6. Do we need to 'touch' that new record? no! So
not only you don't need to touch each record more than once, but each
'bucket' is at most, one row deep. That is a 'line', or a 'row', if you
wish. Then, what look can take the whole set of 'rows'? you got it, a list,
a SORTED list, no dup on the sorted fields: f1, f2, f3.
So, what DISTINCT is?
Vanderghast, Access MVP
SELECT f1, f2, f3, MAX(f4), MIN(f5), SUM(f6)
FROM somewhere
GROUP BY f1, f2, f3
can be 'described' as: do all different (distinct) groups of possible
triples {f1, f2, f3} and once these groups are made, for each of them,
compute the aggregate.
If you imagine that it is what is effectively done, ie. first, define 'm'
buckets, one per triple {f1, f2, f3}, fill the each bucket with the records
that belong to them, THEN, once it is done, look again on each bucket and
compute the aggregate,... well, that is a possible 'plan', but a very
not-efficient one since you 'touch' each record twice.
We can easily solve the query by touching each record only ONCE. Indeed,
assume there is already a "group" created for f1='a', f2='b', f3='c', then
once a new record has the same values for that triple, do we have to store
it? Well, surely we don't need to store f1, f2, and f3 again, since they are
the same values than those of the 'group' identifier. What about f4 of the
new record? again, the existing group can already has the MAXupToNowOf(f4),
MINupToNowOf(f5) and SUMupToNowOf(f6). So, why not using f4 of the new
record and 'aggregate it one term at a time' with MaxUpToNowOf(f4), and same
thing with f5, and with f6. Do we need to 'touch' that new record? no! So
not only you don't need to touch each record more than once, but each
'bucket' is at most, one row deep. That is a 'line', or a 'row', if you
wish. Then, what look can take the whole set of 'rows'? you got it, a list,
a SORTED list, no dup on the sorted fields: f1, f2, f3.
So, what DISTINCT is?
Vanderghast, Access MVP