How do I run a regression on data that is not numerical?

NG

I am using Microsoft Excel 2003. I have been running regressions on
numerical data and am curious to know how to run one if part of my data is
non-numerical such as gender or race.
 
Jerry W. Lewis

Where only two values are possible (as with gender), you can use a single
variable coded +1 for one gender and -1 for the other. Extending this coding
to more than two values is possible, but non-trivial.
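
To make the +1/-1 coding concrete, here is a minimal sketch in Python with NumPy rather than Excel (an assumption made for illustration; the thread itself is about Excel 2003). The data values and variable names are made up. With this coding the intercept is the midpoint of the two group means and the slope is half the difference between them.

```python
import numpy as np

# Hypothetical data: a response y and a two-level category.
gender = ["M", "F", "M", "F", "F", "M"]
y = np.array([10.0, 14.0, 12.0, 16.0, 15.0, 11.0])

# Code one level as +1 and the other as -1.
x = np.array([1.0 if g == "F" else -1.0 for g in gender])

# Design matrix: a column of ones for the intercept, then the coded variable.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares fit.
(intercept, slope), *_ = np.linalg.lstsq(X, y, rcond=None)

# The fitted values reproduce the two group means:
#   intercept + slope is the mean of the +1 group,
#   intercept - slope is the mean of the -1 group.
mean_f = y[x == 1.0].mean()
mean_m = y[x == -1.0].mean()
print(intercept, slope, mean_f, mean_m)
```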

Alternatively, if you have Excel 2003 or later, you can create an indicator
variable (0 or 1) for each possible non-numeric value. This approach
extends directly to more than two possible values.

Jerry
 
NG

Can you explain the Excel 2003 or later indicator variables a little more? I
have four non-numerical values for race.

Thanks for the information on gender. I was using 1 for men and 2 for
females.
 
Mike Middleton

NG -

For four levels of a categorical variable, e.g., A, B, C, or D, use three
indicator variables. Select one level as the base case, e.g., A. Each of the
three indicator variables (one each for B, C, and D) is 1 if the observation
is at that level and 0 if it is not. For an observation with level A, the
value of all three indicator variables is zero. The regression coefficients
measure how different B, C, and D are from the base case A, on average.
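
The coding described above can be sketched as follows. This is an illustration in Python with NumPy, not an Excel procedure, and the data, level names, and variable names are hypothetical. With only the category in the model, the intercept equals the mean of the base case A, and each coefficient equals that level's mean minus the mean of A.

```python
import numpy as np

# Hypothetical data: a four-level category and a response.
levels = ["A", "B", "C", "D"]
base = "A"  # the base case gets no indicator column
group = ["A", "B", "C", "D", "A", "B", "C", "D"]
y = np.array([5.0, 7.0, 9.0, 4.0, 7.0, 9.0, 11.0, 6.0])

# One 0/1 indicator column for every level except the base case.
indicator_levels = [lv for lv in levels if lv != base]
D = np.array([[1.0 if g == lv else 0.0 for lv in indicator_levels]
              for g in group])

# Design matrix: intercept column followed by the three indicators.
X = np.column_stack([np.ones(len(group)), D])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_B, b_C, b_D = coef

# Group means, for comparison with the fitted coefficients.
means = {lv: y[[g == lv for g in group]].mean() for lv in levels}
print(intercept, b_B, b_C, b_D, means)
```

An observation at level A has zeros in all three indicator columns, so its fitted value is the intercept alone.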

I use the same approach for gender, e.g., 0 for male and 1 for female, in
which case the regression coefficient for the gender indicator shows how
females differ from males, on average.

- Mike
http://www.mikemiddleton.com
 