Median and Quartiles

S

ScottMsp

Thanks for reading this. I have read most/all of the posts out there re:
median, quartiles and other statistics. I have attempted a variety of the
recommendations however I have not been able to get them to work and so I am
posting my question on the board. I am using Access 2007. I have
little/some knowledge of VBA and SQL.

I have a table tEmployeeMaster and three of the fields are: JobCode,
PersonID, and HourlyRate

I need to be able to calculate the 1st, 2nd (Median), and 3rd quartiles of
HourlyRate based on JobCode.

If I have the following table data:

JobCode PersonID HourlyRate
01 532 $10
01 4685 $11
01 1256 $12
01 9856 $13
02 7813 $5
02 6523 $6
02 9568 $7
02 1245 $8

I need a query to produce the 1st, 2nd (Median), and 3rd quartile for each
job code

JobCode Q1 Q2 (Median) Q3
01 10.75 11.5 12.25
02 5.75 6.5 7.25

Thanks in advance for your help.

-ScottMSP
 
V

vanderghast

It seems your definition of quartile differs from mine. Why, as example, the
first quartile of {10, 11, 12, 13} is 10.75? (and not 10 ? or 10.5 ?) Is
it not the value (in the sequence) where 1/4 of the values in the sequence
are less or equal to it? (in this case, that would be 10 ).


Vanderghast, Access MVP
 
S

ScottMSP via AccessMonster.com

Vanderghast,

There will be debate on how to calculate the quartiles (in this case the
first quartile), but this is the way I am calculating it for my purposes.

Mathematically (using Method 1 of Freund, J and Perles, B (1987) "A New Look
at Quartiles of Ungrouped Data", The American Statistician), these are the
steps:

1. You order 10, 11, 12, and 13 from low to high.
2. Then you count the number of records (in this case there are 4 records)
add 3 to the count (now total 7)
3. Then multiple the percentile/quartile you want (in this case 25th
percentile or 0.25) by the total records (7 x 0.25 = 1.75)
4. 1.75 identifies which record it is (in this case the value is somewhere
between the first and second record (between 10 and 11)) which requires
interpolation

5. To interpolate, you find the difference between the two records (11 - 10 =
1)
6. then take the decimal reminder of the record (.75 of 1.75) and multiply it
by the difference of the record values (1 x .75 =.75)
7. Then add the .75 to the lower value of the two records (10 + .75 = 10.75)

Does that help?


It seems your definition of quartile differs from mine. Why, as example, the
first quartile of {10, 11, 12, 13} is 10.75? (and not 10 ? or 10.5 ?) Is
it not the value (in the sequence) where 1/4 of the values in the sequence
are less or equal to it? (in this case, that would be 10 ).

Vanderghast, Access MVP
Thanks for reading this. I have read most/all of the posts out there re:
median, quartiles and other statistics. I have attempted a variety of the
[quoted text clipped - 31 lines]
-ScottMSP
 
V

vanderghast

I won't have time to package the solution inside a VBA procedure (with error
trapping, and so on), so I will simply supply the steps.

1- Create an empty temporary table, I called it RankedJobCodeRates, three
fields: rank (autonumber, primary key), jobCode and HourlyRate.

2- Execute the following query which append all the records from your
initial data (from table or query called JobCodeRates) into the freshly
created table:


INSERT INTO RankedJobCodeRates ( jobCode, hourlyRate )
SELECT JobCodeRates.JobCode, JobCodeRates.HourlyRate
FROM JobCodeRates
ORDER BY JobCodeRates.JobCode, JobCodeRates.HourlyRate;



3- Have the following two queries:


INSERT INTO RankedJobCodeRates ( jobCode, hourlyRate )
SELECT JobCodeRates.JobCode, JobCodeRates.HourlyRate
FROM JobCodeRates
ORDER BY JobCodeRates.JobCode, JobCodeRates.HourlyRate;


saved as ForInterpollation,

and

SELECT ForInterpollation.jobCode,
RankedJobCodeRates.hourlyRate*frac0+RankedJobCodeRates_1.hourlyRate*frac1 As
Quartile
FROM (ForInterpollation INNER JOIN RankedJobCodeRates ON
(ForInterpollation.jobCode = RankedJobCodeRates.jobCode) AND
(ForInterpollation.N0 = RankedJobCodeRates.rank)) INNER JOIN
RankedJobCodeRates AS RankedJobCodeRates_1 ON (ForInterpollation.jobCode =
RankedJobCodeRates_1.jobCode) AND (ForInterpollation.N1 =
RankedJobCodeRates_1.rank);


saved as QuartileX


4- Run QuartileX, supplying 0.25 for first quartile, 0.5 for median,
0.75 for third quartile.




With the initial data:

JobCode HourlyRate
1 10
1 11
1 12
1 13
2 1
2 2
2 3
2 4
2 5
2 6
3 20
3 30
3 40



I got, for the first quartile (QuartilePC =0.25)

jobCode Quartile
1 10.75
2 2.75
3 22.5




That should be fast, even with relatively large amount of data. It is the
step 2 which should take the more time to be executed.





Hoping it makes sense,
Vanderghast, Access MVP





ScottMSP via AccessMonster.com said:
Vanderghast,

There will be debate on how to calculate the quartiles (in this case the
first quartile), but this is the way I am calculating it for my purposes.

Mathematically (using Method 1 of Freund, J and Perles, B (1987) "A New
Look
at Quartiles of Ungrouped Data", The American Statistician), these are the
steps:

1. You order 10, 11, 12, and 13 from low to high.
2. Then you count the number of records (in this case there are 4 records)
add 3 to the count (now total 7)
3. Then multiple the percentile/quartile you want (in this case 25th
percentile or 0.25) by the total records (7 x 0.25 = 1.75)
4. 1.75 identifies which record it is (in this case the value is somewhere
between the first and second record (between 10 and 11)) which requires
interpolation

5. To interpolate, you find the difference between the two records (11 -
10 =
1)
6. then take the decimal reminder of the record (.75 of 1.75) and multiply
it
by the difference of the record values (1 x .75 =.75)
7. Then add the .75 to the lower value of the two records (10 + .75 =
10.75)

Does that help?


It seems your definition of quartile differs from mine. Why, as example,
the
first quartile of {10, 11, 12, 13} is 10.75? (and not 10 ? or 10.5 ?)
Is
it not the value (in the sequence) where 1/4 of the values in the sequence
are less or equal to it? (in this case, that would be 10 ).

Vanderghast, Access MVP
Thanks for reading this. I have read most/all of the posts out there
re:
median, quartiles and other statistics. I have attempted a variety of
the
[quoted text clipped - 31 lines]
-ScottMSP
 
S

ScottMSP via AccessMonster.com

VanderGhast,

I followed your instructions to the letter and and when I executed the
QuartileX query, it gave me an error message "The action query
'ForInterpollation' cannot be used as a row source".

When I created an empty table, I literally created a new table and named it
RankedJobCodeRates and followed your instructions from there.

Obviously it worked for you. Not sure what I am doing wrong. Suggestions?

Thanks again.
I won't have time to package the solution inside a VBA procedure (with error
trapping, and so on), so I will simply supply the steps.

1- Create an empty temporary table, I called it RankedJobCodeRates, three
fields: rank (autonumber, primary key), jobCode and HourlyRate.

2- Execute the following query which append all the records from your
initial data (from table or query called JobCodeRates) into the freshly
created table:

INSERT INTO RankedJobCodeRates ( jobCode, hourlyRate )
SELECT JobCodeRates.JobCode, JobCodeRates.HourlyRate
FROM JobCodeRates
ORDER BY JobCodeRates.JobCode, JobCodeRates.HourlyRate;

3- Have the following two queries:

INSERT INTO RankedJobCodeRates ( jobCode, hourlyRate )
SELECT JobCodeRates.JobCode, JobCodeRates.HourlyRate
FROM JobCodeRates
ORDER BY JobCodeRates.JobCode, JobCodeRates.HourlyRate;

saved as ForInterpollation,

and

SELECT ForInterpollation.jobCode,
RankedJobCodeRates.hourlyRate*frac0+RankedJobCodeRates_1.hourlyRate*frac1 As
Quartile
FROM (ForInterpollation INNER JOIN RankedJobCodeRates ON
(ForInterpollation.jobCode = RankedJobCodeRates.jobCode) AND
(ForInterpollation.N0 = RankedJobCodeRates.rank)) INNER JOIN
RankedJobCodeRates AS RankedJobCodeRates_1 ON (ForInterpollation.jobCode =
RankedJobCodeRates_1.jobCode) AND (ForInterpollation.N1 =
RankedJobCodeRates_1.rank);

saved as QuartileX

4- Run QuartileX, supplying 0.25 for first quartile, 0.5 for median,
0.75 for third quartile.

With the initial data:

JobCode HourlyRate
1 10
1 11
1 12
1 13
2 1
2 2
2 3
2 4
2 5
2 6
3 20
3 30
3 40

I got, for the first quartile (QuartilePC =0.25)

jobCode Quartile
1 10.75
2 2.75
3 22.5

That should be fast, even with relatively large amount of data. It is the
step 2 which should take the more time to be executed.

Hoping it makes sense,
Vanderghast, Access MVP
Vanderghast,
[quoted text clipped - 42 lines]
 
V

vanderghast

I have wrongly pasted twice the same expression (step 2 and first part of
step 3 are identical ! ). The first part of step 3 is wrong.

Step 3 should have been:


-------------------------------------------
3- Have the following two queries:

SELECT jobCode,
QuartilePC*(2*MAX(rank)-2*MIN(rank)+1)+MIN(rank)-1 AS interpol,
int(interpol) AS N0,
1-(interpol-N0) AS frac0,
N0+1 AS N1,
1-frac0 AS frac1
FROM RankedJobCodeRates
GROUP BY jobCode;

saved as ForInterpollation,

and

SELECT ForInterpollation.jobCode,
RankedJobCodeRates.hourlyRate*frac0+RankedJobCodeRates_1.hourlyRate*frac1 As
Quartile
FROM (ForInterpollation INNER JOIN RankedJobCodeRates ON
(ForInterpollation.jobCode = RankedJobCodeRates.jobCode) AND
(ForInterpollation.N0 = RankedJobCodeRates.rank)) INNER JOIN
RankedJobCodeRates AS RankedJobCodeRates_1 ON (ForInterpollation.jobCode =
RankedJobCodeRates_1.jobCode) AND (ForInterpollation.N1 =
RankedJobCodeRates_1.rank);


saved as QuartileX
------------------------------------------

Sorry for the original wrong cut and paste.



Vanderghast, Access MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top