## Names of polynomials by degree:

- Degree 0 – constant
- Degree 1 – linear
- Degree 2 – quadratic
- Degree 3 – cubic
- Degree 4 – quartic (or, if all terms have even degree, biquadratic)
- Degree 5 – quintic

and so on... See Polynomials : Wikipedia

### How far do we go?

In simple linear regression we face the problem of under-fitting whereas in polynomial regression, if we go on adding the degree of polynomials we could be over-fitting by explaining the noise along with the signal and thus making the model unsuitable for prediction. So in most practical scenarios we do not fit beyond cubic polynomials.

.

Linear_model = lm(y ~ x)

Quadratic_model = lm(y ~ x + x^2)

Cubic_model = lm(y ~ x + x^2 + x^3)

## Programming Logic

Steps to fit the simple linear & polynomial regression models to compare the influence of independent variables on the response variable.

**Step 1:**

Find the correlation between dependent variable *dist* and independent variable *speed*

**Step 2:**

Scatter plot dependent vs independent variables to see if there is any pattern in the distribution

**Step 3:
**Fit the linear regression model, note the significance and multiple r-squared value

**Step 4:
**Fit the quadratic and cubic polynomial regression models and note the significance and multiple r-squared value

**Step 5:**

Plot the lines for predicted values of response using the linear, quadratic and cubic regression models

**Step 6:**

Do the analysis of vairance for the linear, quadratic and cubic models to decide which is the best fit for prediction.

## Plot to visualize the correlation

#scatter plot dist~speed

# pch=19 is solid circle

plot(cars$dist~cars$speed, pch=19,

xlab="Car Speed (mph)",

ylab="Distance Covered (ft)",

main = "Car Speed And Stops Taken",

las=1)

## Fit the quadratic polynomial regression model

# now fit quadratic polynomial model

fitQ = lm(dist~poly(speed,2,raw=TRUE), data=cars)

#analysis of variance

anova(fitQ)

# Analysis of Variance Table

# Response: dist

# Df Sum Sq Mean Sq F value Pr(>F)

# poly(speed, 2, raw = TRUE) 2 21714 10857.1 47.141 5.852e-12 ***

# Residuals 47 10825 230.3

# ---

# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#summary to get the r-squared value

summary(fitQ)

# Call:

# lm(formula = dist ~ poly(speed, 2, raw = TRUE), data = cars)

# Residuals:

# Min 1Q Median 3Q Max

# -28.720 -9.184 -3.188 4.628 45.152

# Coefficients:

# Estimate Std. Error t value Pr(>|t|)

# (Intercept) 2.47014 14.81716 0.167 0.868

# poly(speed, 2, raw = TRUE)1 0.91329 2.03422 0.449 0.656

# poly(speed, 2, raw = TRUE)2 0.09996 0.06597 1.515 0.136

# Residual standard error: 15.18 on 47 degrees of freedom

# Multiple R-squared: 0.6673, Adjusted R-squared: 0.6532

# F-statistic: 47.14 on 2 and 47 DF, p-value: 5.852e-12

# R-squared is 0.67, that 67 percent variability is due to predictors

# Residual Error is 15.18

## Plot the predicted values using the three regression models

Linear

Quadratic

Cubic

#plot the prediction using the linear model

plot(cars$dist~cars$speed, pch=19,

xlab="Car Speed (mph)",

ylab="Distance Covered (ft)",

main = "Linear Fit",

las=1)

#draw the linear regression fit line

lines(cars$speed, predict(fitlm), col="blue", lwd=2)

#plot the prediction using the quadratic model

plot(cars$dist~cars$speed, pch=19,

xlab="Car Speed (mph)",

ylab="Distance Covered (ft)",

main = "Quadratic Fit",

las=1)

#draw the quadratic regression fit line

lines(cars$speed, predict(fitQ), col="green", lwd=2)

#plot the prediction using the cubic model

plot(cars$dist~cars$speed, pch=19,

xlab="Car Speed (mph)",

ylab="Distance Covered (ft)",

main = "Cubic Fit",

las=1)

#draw the cubic regression fit line

lines(cars$speed, predict(fitC), col="red", lwd=2)

## Linear Vs Quadratic Vs Cubic

### Which regression model do we select?

As compared to the Linear model, Quadratic model explains the variability more significantly and also the curvature is a better fit than the straight line..

Adding the cubic term does not improve the significance greatly as compared to quadratic term of the predictor, speed, so there is no added advantage to using it.

So we prefer quadratic over cubic and linear models.

#plot the residuals for linear model

plot(fitlm,which=1)

#plot the residuals for quadratic model

plot(fitQ,which=1)