Random Function for Selecting Records

esn

Hi everyone,

This should be simple. I'm trying to pull one random record from each
group within a table. The groups are "Units" - geographic areas, and
I want to select one random point (conveniently enough, "Point") from
each Unit that satisfies certain criteria. Step 1 - query with a
random number field - Rnd([EventID]), sorted by the random number
field. EventID is an autonumber PK field. Then a query that performs
the grouping (by Unit) and pulls the first value of interest (in this
case Point) from the first record for each group. The problem is the
first query doesn't actually perform the sort correctly. Here are the
first few values in the random number field, which is set to sort in
ascending order:

RandomNumber
0.212475836277008
0.456852912902832
0.35159033536911
0.721272110939026
0.638044655323029

Clearly that's not ascending order - it doesn't appear to be any order
at all. Seems like the logic of using this setup to select a random
record is violated if the records don't actually get sorted
correctly. Every time I run the query I end up with a different
record on top, so it seems to be sorting "randomly" somehow but not by
the random number field. Any idea what's up? Is it recalculating the
random numbers after sorting the records or something?

PS - I know there are additional problems with trusting a last or
first function to do anything meaningful - that's the next hurdle but
at the moment I'd like to get the first step worked out. If anyone
has a good suggestion for returning the top 1 record within a group
(without using the first or last functions) that would help too.
 
esn

Also, I just noticed Access redraws the random number every time I
click in one of the records of the query results, and seems to
struggle with recalculating (often tries to display the original and
new values at the same time in the same cell). Is this query just
recalculating and that's why the records never really appear in any
sort of order?
 
PieterLinden via AccessMonster.com

Read this:
http://www.mvps.org/access/queries/qry0011.htm
 
Marshall Barton

The problem is that you are looking at the numbers ;-)

Displaying a query's datasheet is done on an as-needed basis,
so the window is filled with what looks like decent data.
But the rest of the data is not calculated and retrieved
until you scroll down to see more records. Then, when you
scroll back up to see the earlier records, they too are
calculated and retrieved again, but with different
random numbers.

If you use code to access the records or display them in a
report, it should be fine regardless of whether they are
calculated only once or more than once. OTOH, since you want a
random record, why do you care how the record was chosen?
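
To illustrate the recalculation described above, here is a minimal Python sketch. It is not Access code: the LazyRow class and the event_ids list are hypothetical stand-ins for a datasheet that recomputes a volatile expression every time a row is read. It only aims to show why the displayed numbers can look unsorted, and why computing each random value once and storing it gives a stable order.

```python
import random


class LazyRow:
    """Hypothetical stand-in for a datasheet row whose random field is
    recomputed every time the row is read (repainted, scrolled, clicked)."""

    def __init__(self, event_id):
        self.event_id = event_id

    @property
    def random_number(self):
        return random.random()  # a fresh value on every read


event_ids = list(range(1, 11))  # made-up EventID values

# Sort using one read of the random field...
lazy_rows = sorted((LazyRow(i) for i in event_ids),
                   key=lambda r: r.random_number)
# ...then "display" the rows, which reads the field again and gets new values.
displayed = [r.random_number for r in lazy_rows]

# Alternative: compute each random value once, store it, sort by the stored value.
stored = sorted((random.random(), i) for i in event_ids)
stored_numbers = [n for n, _ in stored]

print("lazy display in ascending order? ", displayed == sorted(displayed))
print("stored values in ascending order?", stored_numbers == sorted(stored_numbers))
```

In the lazy case the rows really were shuffled by the sort, but the numbers shown afterwards are not the ones the sort used, which matches what the datasheet appears to be doing.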
 
esn

Thanks for the replies - I care how the record was chosen only in that
I need it to be random (or reasonably close to random). When I
checked to make sure that Access was functioning properly to select a
random record, there seemed to be a glitch, so I thought I would run
it by the experts. I figured this was just a recalculating issue and
that, at some point, the order of the records had been randomized, but
I wanted to be sure before I went too much further. And it's good to
know it's possible to build a function to stop Access from
recalculating the random field - given the crummy performance of
queries based on this one I might end up using that to speed things
up.

Now I have a question about the next step - here's the SQL I'm using
right now:

SELECT [GLSA Caps with Unit].Unit, [GLSA Caps with Unit].Point
FROM [GLSA Caps with Unit]
WHERE ((([GLSA Caps with Unit].Point) In
    (SELECT TOP 3 [GLSA Caps with Unit_1].Point
     FROM [GLSA Caps with Unit] AS [GLSA Caps with Unit_1]
     WHERE ((([GLSA Caps with Unit_1].Unit)=[GLSA Caps with Unit].Unit))
     ORDER BY Rnd([RndSeed]))))
ORDER BY [GLSA Caps with Unit].Unit;

And here's the output:

Unit Point
1 OO007
1 RR007
2 II006
2 LL001
2 LL005
2 MM001
2 MM002
3 II009
3 LL011
3 OO008
4 BB002
4 BB005
4 CC003
5 BB013
5 CC008
5 FF011
5 GG010
5 HH009
5 HH011
6 FF013
7 S002
7 U002
7 V003

Note the variable number of records per unit. FYI - Point is a text
field that identifies a geographic location (as I stated above) within
a grid based on a row identifier (a single or double letter) and a
column identifier (3 digits from 000 to 110). The source query (GLSA
Caps with Unit):

SELECT [Grid Point Info].Unit, [Trapping Data Records Table].Point,
       Min([Trapping Data Records Table].[Capture/Event ID]) AS RndSeed
FROM [Grid Point Info] INNER JOIN [Trapping Data Records Table]
     ON [Grid Point Info].LetterNumb = [Trapping Data Records Table].Point
WHERE ((([Trapping Data Records Table].[Species/Event])="GLSA"))
GROUP BY [Grid Point Info].Unit, [Trapping Data Records Table].Point;

To anticipate the first question - I already checked to make sure that
"GLSA Caps with Unit" returns at least three points per unit, and it
does. I've also tried using the "randomizer" custom function from the
link above, but I still get similar results. If I run the subquery on
its own using "Unit=1" as criteria I get the right results (3 random
points in unit 1). So why does the query return fewer than three
points for units 1 and 6, and how can a subquery with a TOP 3 clause
be returning more than 3 points for some of the units?
 
Marshall Barton

Sorry, but I am having a seriously tough time unraveling
where the random numbers are being recalculated. This is
compounded by the query optimizer doing whatever
it wants to combine your three queries into one, with who
knows what effect on the random numbers.

I have not been able to explain the varying number of
records, even allowing for the fact that TOP 3 will return
more than 3 records when there is a tie for the third value
in the sorted list.
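
One possible mechanism, offered as an assumption rather than a confirmed diagnosis: if the TOP 3 subquery is re-run with fresh random numbers for every row of the outer query, each row is tested against a different random trio, so the number of rows that pass can be anything from zero up to the size of the unit. The rough Python simulation below uses made-up unit sizes and point names and does not model the Jet optimizer; top3_reevaluated and top3_once are hypothetical helpers written only for this comparison.

```python
import random


def top3_reevaluated(points, rng):
    """Mimic `Point In (SELECT TOP 3 ... ORDER BY Rnd(...))` under the
    assumption that the subquery is re-run for each outer row."""
    kept = []
    for p in points:
        trio = rng.sample(points, 3)  # a new random "top 3" for this row
        if p in trio:
            kept.append(p)
    return kept


def top3_once(points, rng):
    """Assign each point one random key, then keep the 3 smallest keys."""
    keyed = sorted((rng.random(), p) for p in points)
    return [p for _, p in keyed[:3]]


rng = random.Random(0)  # fixed seed so repeated runs behave the same
# Hypothetical data: 7 units with 8 points each.
units = {u: [f"P{u}-{i}" for i in range(8)] for u in range(1, 8)}

print("unit  re-evaluated  keyed-once")
for unit, points in units.items():
    a = top3_reevaluated(points, rng)
    b = top3_once(points, rng)
    print(f"{unit:>4}  {len(a):>12}  {len(b):>10}")
```

The keyed-once column is always 3, while the re-evaluated column varies from unit to unit. That is the same kind of pattern as in the posted output, though it does not prove this is what the query engine is doing.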
 
esn

Thanks for checking it out, I appreciate it. I guess I have to go
through the units one by one, which certainly isn't the end of the
world.
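
For the unit-by-unit approach, here is a minimal sketch in Python rather than Access. The rows list is sample data shaped like the Unit/Point output above, with a couple of invented points added so that every unit has at least three, and pick_per_unit is a hypothetical helper. The idea is to group the points by unit and then draw the sample for each unit exactly once, so nothing is recalculated afterwards.

```python
import random
from collections import defaultdict


def pick_per_unit(rows, k, rng):
    """Return {unit: k randomly chosen points}, drawing once per unit.
    A unit with fewer than k points returns all of its points."""
    by_unit = defaultdict(list)
    for unit, point in rows:
        by_unit[unit].append(point)
    return {
        unit: rng.sample(points, min(k, len(points)))
        for unit, points in sorted(by_unit.items())
    }


# Sample (Unit, Point) rows; "QQ003" and "PP010" are invented for illustration.
rows = [
    (1, "OO007"), (1, "RR007"), (1, "QQ003"), (1, "PP010"),
    (2, "II006"), (2, "LL001"), (2, "LL005"), (2, "MM001"),
    (3, "II009"), (3, "LL011"), (3, "OO008"),
]

picks = pick_per_unit(rows, k=3, rng=random.Random())
for unit, points in picks.items():
    print(unit, points)
```

Calling it with k=1 gives the single random point per unit asked about at the start of the thread.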
 
