Published on *Explorable.com* (https://explorable.com)

Multiple regression analysis is a powerful technique for predicting the unknown value of a variable from the known values of two or more other variables, also called the predictors.

More precisely, multiple regression analysis helps us to predict the value of Y for given values of X_{1}, X_{2}, …, X_{k}.

For example, the yield of rice per acre depends upon the quality of the seed, the fertility of the soil, the fertilizer used, temperature, and rainfall. If one is interested in studying the joint effect of all these variables on rice yield, one can use this technique.

An additional advantage of this technique is that it also enables us to study the individual influence of each of these variables on yield.

By multiple regression, we mean models with just one dependent and two or more independent (explanatory) variables. The variable whose value is to be predicted is known as the dependent variable [3], and the ones whose known values are used for prediction are known as independent [4] (explanatory) variables.

In general, the multiple regression equation of Y on X_{1}, X_{2}, …, X_{k} is given by:

Y = b_{0} + b_{1}X_{1} + b_{2}X_{2} + … + b_{k}X_{k}

Here b_{0} is the intercept, and b_{1}, b_{2}, b_{3}, …, b_{k} are analogous to the slope in the linear regression [5] equation; they are also called regression coefficients and can be interpreted the same way as a slope. Thus if b_{i} = 2.5, it would indicate that Y increases by 2.5 units for each 1-unit increase in X_{i}, holding the other predictors constant.
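As a concrete sketch of fitting such an equation, the snippet below uses NumPy's least-squares solver on made-up rice-yield data with two hypothetical predictors (fertilizer amount and rainfall). The data are synthetic, generated exactly from the formula, so the solver recovers the coefficients we put in:

```python
import numpy as np

# Hypothetical, synthetic data: Y is generated exactly as
# Y = 2 + 1.5*X1 + 0.3*X2, so least squares should recover b = (2, 1.5, 0.3).
X1 = np.array([10., 12., 15., 18., 20., 22., 25., 28.])  # e.g. fertilizer
X2 = np.array([50., 55., 60., 58., 65., 70., 68., 75.])  # e.g. rainfall
Y = 2.0 + 1.5 * X1 + 0.3 * X2

# Design matrix: a leading column of ones provides the intercept b0.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Ordinary least squares: finds b minimizing ||Y - Xb||^2.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("b0 (intercept):", b[0])
print("b1, b2 (coefficients):", b[1], b[2])
```

With real data the recovered coefficients would of course only approximate any underlying relationship; here the exact recovery simply illustrates what the fitted equation means.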

The appropriateness of the multiple regression model as a whole can be tested by the F-test in the ANOVA [6] table. A significant F indicates a linear relationship between Y and at least one of the X's.
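A minimal sketch of computing that F statistic by hand, using synthetic data (the fixed "noise" values are arbitrary; they just keep the example reproducible). F is the ratio of the mean square due to regression to the mean square error:

```python
import numpy as np

# Synthetic data with two hypothetical predictors and small fixed noise.
X1 = np.array([10., 12., 15., 18., 20., 22., 25., 28.])
X2 = np.array([50., 55., 60., 58., 65., 70., 68., 75.])
noise = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2])
Y = 2.0 + 1.5 * X1 + 0.3 * X2 + noise

X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ b

n, k = len(Y), 2                          # 8 observations, 2 predictors
ssr = np.sum((fitted - Y.mean()) ** 2)    # sum of squares due to regression
sse = np.sum((Y - fitted) ** 2)           # residual (error) sum of squares
F = (ssr / k) / (sse / (n - k - 1))       # F with (k, n-k-1) degrees of freedom

print(f"F({k}, {n - k - 1}) = {F:.1f}")   # compare with an F-table critical value
```

A value of F far above the tabulated critical value for (k, n−k−1) degrees of freedom leads to rejecting the hypothesis that all slopes are zero.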

Once a multiple regression equation has been constructed, one can check how good it is (in terms of predictive ability) by examining the coefficient of determination, R^{2}. R^{2} always lies between 0 and 1.

All statistical software reports it whenever a regression procedure is run. The closer R^{2} is to 1, the better the model and its predictions.
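R^{2} is simple to compute directly: it is one minus the ratio of the residual sum of squares to the total sum of squares. A sketch with the same kind of synthetic, hypothetical data as above:

```python
import numpy as np

# Synthetic data: two hypothetical predictors plus small fixed noise.
X1 = np.array([10., 12., 15., 18., 20., 22., 25., 28.])
X2 = np.array([50., 55., 60., 58., 65., 70., 68., 75.])
noise = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2])
Y = 2.0 + 1.5 * X1 + 0.3 * X2 + noise

X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

sse = np.sum((Y - X @ b) ** 2)        # unexplained (residual) variation
sst = np.sum((Y - Y.mean()) ** 2)     # total variation in Y
r2 = 1.0 - sse / sst                  # coefficient of determination

print("R^2 =", round(r2, 4))          # close to 1: the model fits these data well
```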

A related question is whether the independent variables individually influence the dependent variable significantly. Statistically, it is equivalent to testing [7] the null hypothesis [8] that the relevant regression coefficient is zero.

This can be done using a t-test. If the t-test for a regression coefficient is significant, it indicates that the variable in question influences Y significantly [9] while controlling for the other explanatory variables.
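The t statistic for each coefficient is the estimate divided by its standard error, where the standard errors come from the diagonal of s^{2}(X'X)^{-1}. A sketch, again on synthetic data with hypothetical predictors:

```python
import numpy as np

# Synthetic data, as in the earlier examples.
X1 = np.array([10., 12., 15., 18., 20., 22., 25., 28.])
X2 = np.array([50., 55., 60., 58., 65., 70., 68., 75.])
noise = np.array([0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2])
Y = 2.0 + 1.5 * X1 + 0.3 * X2 + noise

X = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

n, k = len(Y), 2
s2 = np.sum((Y - X @ b) ** 2) / (n - k - 1)   # residual variance estimate
cov_b = s2 * np.linalg.inv(X.T @ X)           # covariance matrix of the estimates
se = np.sqrt(np.diag(cov_b))                  # standard errors of b0, b1, b2
t = b / se                                    # t statistics, df = n - k - 1

print("t-statistics:", np.round(t, 2))        # compare with t(n-k-1) critical values
```

Each |t| is compared with the critical value of the t distribution with n−k−1 degrees of freedom; a large |t| means that coefficient is significantly different from zero.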

The multiple regression technique does not test whether the data are linear [10]. On the contrary, it proceeds by assuming that the relationship between Y and each of the X_{i}'s is linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, X_{i}), i = 1, 2, …, k. If any plot suggests non-linearity [11], one may use a suitable transformation to attain linearity.

Another important assumption is the absence of multicollinearity, that is, the independent variables should not be strongly correlated among themselves. At a very basic level, this can be checked by computing the correlation coefficient between each pair of independent variables.
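A quick way to run that basic check is to compute the pairwise correlation matrix of the predictors. In this sketch (hypothetical data), X3 is deliberately constructed to be almost exactly twice X1, so the X1–X3 correlation comes out near 1, flagging multicollinearity:

```python
import numpy as np

# Hypothetical predictors; X3 is nearly collinear with X1 by construction.
X1 = np.array([10., 12., 15., 18., 20., 22., 25., 28.])
X2 = np.array([50., 55., 60., 58., 65., 70., 68., 75.])
X3 = 2.0 * X1 + np.array([0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2])

# Each row is one variable; np.corrcoef returns the 3x3 correlation matrix.
corr = np.corrcoef([X1, X2, X3])
print(np.round(corr, 3))   # the (X1, X3) entry will be very close to 1
```

Pairwise correlations near ±1 warn that the affected predictors carry nearly the same information, which makes the individual coefficient estimates unstable.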

Other assumptions include those of homoscedasticity and normality.

Multiple regression analysis is used when one is interested in predicting a continuous dependent variable from a number of independent variables. If the dependent variable is dichotomous, then logistic regression should be used instead.

**Links**

[1] https://explorable.com/multiple-regression-analysis

[2] https://explorable.com/

[3] https://explorable.com/dependent-variable

[4] https://explorable.com/independent-variable

[5] https://explorable.com/linear-regression-analysis

[6] https://explorable.com/anova

[7] https://explorable.com/hypothesis-testing

[8] https://explorable.com/null-hypothesis

[9] https://explorable.com/statistically-significant-results

[10] https://explorable.com/linear-relationship

[11] https://explorable.com/non-linear-relationship