Select Distinct Help

M

masterkeys

Hi,

I'm trying to get a query to run, for use in a mail merge, where it
selects only 1 record, when more than 1 record has same address. We
need the pupil's data for that record, but we don't want to use the
value for the pupil as a DISTINCT criteria. If I try using first() or
min() it complains about Each of the records about the parent not being
part of an aggregate function. It would help if these were sorted by
pupil's form.

SELECT DISTINCT Students.[Parent Surname], Students.[Pupil Home Add1],
Students.[Pupil Home Add2], Students.[Pupil Home Add3], Students.[Pupil
Home Add4], Students.[Pupil Home PCode], Students.Form, Students.[Pupil
Surname], Students.[Pupil First Name]
FROM Students
WHERE (((Students.Form)<>"Leaver") AND ((Students.[Sibling]) Is Null))
ORDER BY Students.Form;

thanks in advance for any help
 
T

Tom Ellison

Dear frustration:

Let's start with this:

SELECT DISTINCT [Pupil Home Add1], [Pupil Home Add2],
[Pupil Home Add3], [Pupil Home Add4], [Pupil Home PCode]
FROM Students
WHERE Form <> "Leaver" AND Sibling IS NULL

Now, what's wrong with the results this gives? Well, it doesn't give you
the "pupil's data" for that record. But the point is, there could be more
than one record, even more than one pupil. Now, how could the query give
you "the pupil's data for that record" when there isn't "a record" but
potentially many records.

Do you want to see the pupil's data for just any one of the several pupils
that may be at that address? You don't care which of the several pupils at
that address may be shown? You could try an aggregate query. Start with:

SELECT [Pupil Home Add1], [Pupil Home Add2],
[Pupil Home Add3], [Pupil Home Add4], [Pupil Home PCode]
FROM Students
WHERE Form <> "Leaver" AND Sibling IS NULL
GROUP BY [Pupil Home Add1], [Pupil Home Add2],
[Pupil Home Add3], [Pupil Home Add4], [Pupil Home PCode]

Now, look at this closely. It should be the same as the DISTINCT. If the
addresses have been added by almost any human I've known, the same address
will not always be entered identically. Perhaps some have City, State while
others have City State (omitting the comma) or there might be 2 spaces. Any
little difference like this blows your theory of what you're doing right out
of the chute.

To this, you could add the name of one of the pupils at that address:

SELECT FIRST([Pupil Surname]) Surname,
FIRST([Pupil First Name]) [Last Name],
[Pupil Home Add1], [Pupil Home Add2],
[Pupil Home Add3], [Pupil Home Add4], [Pupil Home PCode]
FROM Students
WHERE Form <> "Leaver" AND Sibling IS NULL
GROUP BY [Pupil Home Add1], [Pupil Home Add2],
[Pupil Home Add3], [Pupil Home Add4], [Pupil Home PCode]

I prefer FIRST here because I believe it will give you a "consistent pupil
name." If you use MIN() and there are two pupils at that address:

John Adams
Becky Smith

(perhaps a mixed family) then with MIN you'd get Becky Adams, a non-existent
person. That doesn't sound too cool! So first is better. It at least
should pick a consistent row for both name parts.

You can expand this as needed. Does this help at all?

Tom Ellison


masterkeys said:
Hi,

I'm trying to get a query to run, for use in a mail merge, where it
selects only 1 record, when more than 1 record has same address. We
need the pupil's data for that record, but we don't want to use the
value for the pupil as a DISTINCT criteria. If I try using first() or
min() it complains about Each of the records about the parent not being
part of an aggregate function. It would help if these were sorted by
pupil's form.

SELECT DISTINCT Students.[Parent Surname], Students.[Pupil Home Add1],
Students.[Pupil Home Add2], Students.[Pupil Home Add3], Students.[Pupil
Home Add4], Students.[Pupil Home PCode], Students.Form, Students.[Pupil
Surname], Students.[Pupil First Name]
FROM Students
WHERE (((Students.Form)<>"Leaver") AND ((Students.[Sibling]) Is Null))
ORDER BY Students.Form;

thanks in advance for any help
 
V

Vincent Johns

masterkeys said:
Hi,

I'm trying to get a query to run, for use in a mail merge, where it
selects only 1 record, when more than 1 record has same address. We
need the pupil's data for that record, but we don't want to use the
value for the pupil as a DISTINCT criteria. If I try using first() or
min() it complains about Each of the records about the parent not being
part of an aggregate function. It would help if these were sorted by
pupil's form.

SELECT DISTINCT Students.[Parent Surname], Students.[Pupil Home Add1],
Students.[Pupil Home Add2], Students.[Pupil Home Add3], Students.[Pupil
Home Add4], Students.[Pupil Home PCode], Students.Form, Students.[Pupil
Surname], Students.[Pupil First Name]
FROM Students
WHERE (((Students.Form)<>"Leaver") AND ((Students.[Sibling]) Is Null))
ORDER BY Students.Form;

thanks in advance for any help

It's not totally clear to me what's going on here, so what follows is my
best guess at what you need. It may be wrong. (Caveat lector.)

Omitting what I think are immaterial (for now) fields, I start with a
Table of sample data. It's possible that [Students].[Sibling] is a
pointer to another record in [Students] (i.e., its value matches some
other [Students_ID] value), but I didn't know, so I omitted it.
Instead, I matched records based on address and parent's surname.

But, especially if you have several households with multiple pupils, I
would recommend putting the address information, parent's surname, &c.,
into a separate Table and linking the pupils ("many" side of JOIN) to
the addresses ("one" side of JOIN). You'd still need to decide, if
you're sending a letter to the household, how to include information
about the pupil(s) living there -- do you include a list of all of their
names, or do you omit them entirely and address the letter to "Mr/Ms
Smith" or to "Occupant"? You said you wanted the list sorted by [Form],
but my guess is that that's a characteristic of a pupil, not of a
household, so some [Form] values will be omitted where more than one
pupil lives at an address. You'll have to decide what to do with
situations like that, and design your Query accordingly.

[Students] Table Datasheet View:

Students_ID Parent Pupil Home Form Pupil Pupil
Surname Add1 Surname First
Name
----------- ------- ----------- ------ ------- -----
-1379959456 Miller 77 Sunset 3 Miller Mary
-1310353310 Smith 123 Main Leaver Beam Jim
223168993 Jones 333 Central 2 Jones Sue
985912635 Miller 77 Sunset 4 Miller Bob

The following Query omits the record in which [Form] = "Leaver" (BTW, I
don't know what that means), and randomly chooses one pupil record for
each equivalence class of [Parent Surname] and address. I included only
one address field here, but you'd need to include all four of them.

[Q_Students] SQL:

SELECT S1.Students_ID, S1.[Parent Surname],
S1.[Pupil Home Add1], S1.Form,
Min(S2.Students_ID) AS MinOfStudents_ID
FROM Students AS S1 INNER JOIN Students AS S2
ON (S1.[Parent Surname] = S2.[Parent Surname])
AND (S1.[Pupil Home Add1] = S2.[Pupil Home Add1])
GROUP BY S1.Students_ID, S1.[Parent Surname],
S1.[Pupil Home Add1], S1.Form, S1.[Pupil Surname],
S1.[Pupil First Name]
HAVING (((S1.Form)<>"Leaver" Or (S1.Form) Is Null)
AND ((Min(S2.Students_ID))=[S1]![Students_ID]))
ORDER BY S1.Form;

The results look like the following list, which does include [Form], but
as I said, the displayed [Form] value is chosen randomly from among the
pupils living at the address, so it probably shouldn't even be listed.
Something you might do is to list the minimum (or maximum) value of this
field for each address, which might be a bit more informative.

[Q_Students] Query Datasheet View:

Students_ID Parent Surname Pupil Home Add1 Form
----------- -------------- --------------- ----
223168993 Jones 333 Central 2
-1379959456 Miller 77 Sunset 3


-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
Top