Calculating linear regression slope

B

Bonnie

I need to make a query that will calculate the slope for
the change in weight over time. My data includes: ID,
date, weight. Can anyone help?
 
M

Michel Walsh

Hi,




slope = ( COUNT(*)*SUM(x*y) -SUM(x)*SUM(y) ) /
(COUNT(*)*SUM(x^2)-SUM(x)^2)

where x = [date] and y=weight; the slope, if positive, is the increase of
weight in function of time (if you use standard date_time value, the time
unit is "per day")


intercept=(SUM(y)-slope*SUM(x))/COUNT(*)


estStdErr=(SUM((y-intercept-slope*x)^2)/(COUNT(*)-2))^0.5


then, roughly, if the distribution of the data is from a bell shape
population, the real slope is, 99%, or more, of the cases, between the two
values: slope-k and slope+k where

k= 3 * estStdErr * SUM((x-AVG(x))^2) ^-0.5

if COUNT(*)>=16


( ref. CRC Standard Mathematical Tables and Formulae, 30th ed., article
7.10.1 and table 7.12.3)


Clearly, that is only an approximation, since a weight cannot be negative,
so, technically, the weight distribution is not a perfect bell shape curve.



Hoping it may help,
Vanderghast, Access MVP
 
B

Bonnie

Thanks for your reply, Mike. I haven't had much experience with formulas. Does the COUNT refer to all the weights, say for a distinct ID? I'm trying to get the slope for many distinct IDs with many weights, each with different date ranges so I can track any significant weight changes. I'll try to apply this to my data. If you have any more helpful comments I'd appreciate it.
 
M

Michel Walsh

Hi,


The COUNT and the SUM apply to the groups as defined in the GROUP BY clause.
A complete query could look like:

SELECT Id,
( COUNT(*)*SUM(x*y) -SUM(x)*SUM(y) ) /
(COUNT(*)*SUM(x^2)-SUM(x)^2) AS Slope

FROM MyTable
GROUP BY Id


Hoping it may help,
Vanderghast, Access MVP




Bonnie said:
Thanks for your reply, Mike. I haven't had much experience with formulas.
Does the COUNT refer to all the weights, say for a distinct ID? I'm trying
to get the slope for many distinct IDs with many weights, each with
different date ranges so I can track any significant weight changes. I'll
try to apply this to my data. If you have any more helpful comments I'd
appreciate it.
 
B

Bonnie

I'm still having trouble. The slope returned is always 0. First I made a query to extract patient weights for the last three years, grouping by ID. I used that query for making the slope query

SELECT [MyTable].NewID, [MyTable].Patient,
(Count(*)*Sum([Date]*[weight])-Sum([Date])*Sum([weight]))
(Count(*)*Sum([Date]^2)-Sum([Date])^2) AS Slop
FROM [MyTable
GROUP BY [MyTable].NewID, [MyTable].Patient

Was I supposed to define the date to be "in days" somewhere in the query? If so, how do I do that
I would appreciate any help. Thanks


----- Michel Walsh wrote: ----

Hi


The COUNT and the SUM apply to the groups as defined in the GROUP BY clause
A complete query could look like

SELECT Id
( COUNT(*)*SUM(x*y) -SUM(x)*SUM(y) )
(COUNT(*)*SUM(x^2)-SUM(x)^2) AS Slop

FROM MyTabl
GROUP BY I


Hoping it may help
Vanderghast, Access MV




Bonnie said:
Thanks for your reply, Mike. I haven't had much experience with formulas
Does the COUNT refer to all the weights, say for a distinct ID? I'm tryin
to get the slope for many distinct IDs with many weights, each wit
different date ranges so I can track any significant weight changes. I'l
try to apply this to my data. If you have any more helpful comments I'
appreciate it
 
M

Michel Walsh

Hi,


Is Date really a date_time data field?


With


weightsDate
ID weight date
0 100 2001-01-01
0 101 2001-01-02
0 102 2001-01-03
0 103 2001-01-04
1 10 2001-01-01
1 20 2001-01-02
1 30 2001-01-03



and the query

SELECT Id,
( COUNT(*)*SUM(date*weight) -SUM(date)*SUM(weight) ) /
(COUNT(*)*SUM(date^2)-SUM(date)^2) AS Slope

FROM WeightsDate
GROUP BY Id


I obtain

Query30
Id Slope
0 1
1 10



and indeed, I increased the weigth by one, for each date, for id=0 , and by
10 for id=1 (if you look back at my data), so the slopes are right, with
that data.

Can you get the numerator:

SELECT NewID, Patient, COUNT(*), SUM(date), SUM(weight), SUM(date*weight)
FROM MyTable
GROUP BY NewID, Patient


and manually check if COUNT(*) * SUM( date*weight ) = SUM(date) *
SUM(weight)

which the only way to get a zero, without a division by zero. Note that you
must have more than a single point per group to have a non zero value for
the slope.





Hoping it may help,
Vanderghast, Access MVP



Bonnie said:
I'm still having trouble. The slope returned is always 0. First I made a
query to extract patient weights for the last three years, grouping by ID.
I used that query for making the slope query:
SELECT [MyTable].NewID, [MyTable].Patient,
(Count(*)*Sum([Date]*[weight])-Sum([Date])*Sum([weight])) /
(Count(*)*Sum([Date]^2)-Sum([Date])^2) AS Slope
FROM [MyTable]
GROUP BY [MyTable].NewID, [MyTable].Patient;

Was I supposed to define the date to be "in days" somewhere in the query? If so, how do I do that?
I would appreciate any help. Thanks.


----- Michel Walsh wrote: -----

Hi,


The COUNT and the SUM apply to the groups as defined in the GROUP BY clause.
A complete query could look like:

SELECT Id,
( COUNT(*)*SUM(x*y) -SUM(x)*SUM(y) ) /
(COUNT(*)*SUM(x^2)-SUM(x)^2) AS Slope

FROM MyTable
GROUP BY Id


Hoping it may help,
Vanderghast, Access MVP




Bonnie said:
Thanks for your reply, Mike. I haven't had much experience with
formulas.
Does the COUNT refer to all the weights, say for a distinct ID? I'm trying
to get the slope for many distinct IDs with many weights, each with
different date ranges so I can track any significant weight changes. I'll
try to apply this to my data. If you have any more helpful comments I'd
appreciate it.
 
B

Bonnie

Thank you for your help, Mike. I got it to work. Couldn't have done it without you!

----- Michel Walsh wrote: -----

Hi,


Is Date really a date_time data field?


With


weightsDate
ID weight date
0 100 2001-01-01
0 101 2001-01-02
0 102 2001-01-03
0 103 2001-01-04
1 10 2001-01-01
1 20 2001-01-02
1 30 2001-01-03



and the query

SELECT Id,
( COUNT(*)*SUM(date*weight) -SUM(date)*SUM(weight) ) /
(COUNT(*)*SUM(date^2)-SUM(date)^2) AS Slope

FROM WeightsDate
GROUP BY Id


I obtain

Query30
Id Slope
0 1
1 10



and indeed, I increased the weigth by one, for each date, for id=0 , and by
10 for id=1 (if you look back at my data), so the slopes are right, with
that data.

Can you get the numerator:

SELECT NewID, Patient, COUNT(*), SUM(date), SUM(weight), SUM(date*weight)
FROM MyTable
GROUP BY NewID, Patient


and manually check if COUNT(*) * SUM( date*weight ) = SUM(date) *
SUM(weight)

which the only way to get a zero, without a division by zero. Note that you
must have more than a single point per group to have a non zero value for
the slope.





Hoping it may help,
Vanderghast, Access MVP



Bonnie said:
I'm still having trouble. The slope returned is always 0. First I made a
query to extract patient weights for the last three years, grouping by ID.
I used that query for making the slope query:
SELECT [MyTable].NewID, [MyTable].Patient,
(Count(*)*Sum([Date]*[weight])-Sum([Date])*Sum([weight])) /
(Count(*)*Sum([Date]^2)-Sum([Date])^2) AS Slope
FROM [MyTable]
GROUP BY [MyTable].NewID, [MyTable].Patient;
Was I supposed to define the date to be "in days" somewhere in the query?
If so, how do I do that?
I would appreciate any help. Thanks. clause.
A complete query could look like:
SELECT Id,
( COUNT(*)*SUM(x*y) -SUM(x)*SUM(y) ) /
(COUNT(*)*SUM(x^2)-SUM(x)^2) AS Slope
FROM MyTable GROUP BY Id Thanks for your reply, Mike. I haven't had much experience with
formulas.
Does the COUNT refer to all the weights, say for a distinct ID? I'm trying
to get the slope for many distinct IDs with many weights, each with
different date ranges so I can track any significant weight changes. I'll
try to apply this to my data. If you have any more helpful comments I'd
appreciate it.
 
Top