What is the equivalent of R's lm function for fitting simple linear regressions in Python?

Time:02-02

I have a dataset in CSV format that I have loaded into a pandas DataFrame. In R, I can fit a model with the lm function and view the results like this:

lm.fit=lm(response~predictor1 ,data=my_dataset)
summary(lm.fit)

Running the commands above gives results similar to these:

Call:
lm(formula = response ~ predictor1, data = my_dataset)
Residuals:
Min 1Q Median 3Q Max
-1.519533 -3.990 -1.318 2.034 24.500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384 0.56263 61.41 <2e-16 ***
lstat -0.95005 0.03873 -24.53 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16

All I want to do is port this code to Python. I have already tried the following:

import pandas as pd
from sklearn.linear_model import LinearRegression

# dataset is the pandas DataFrame loaded from the CSV file
X = dataset.iloc[:, 12].values.reshape(-1, 1)  # .values converts the column into a NumPy array
Y = dataset.iloc[:, 13].values.reshape(-1, 1)  # -1 lets NumPy infer the number of rows; 1 means a single column
linear_regressor = LinearRegression()  # create the model object
linear_regressor.fit(X, Y)  # perform linear regression
Y_pred = linear_regressor.predict(X)  # make predictions
# print(Y_pred.describe())
df = pd.DataFrame(Y_pred, columns=['Column_A'])
print(df.describe())

This produces the following output, which only describes the predicted values and is not what I want:

       Column_A
count  506.000000
mean    22.532806
std      6.784361
min     -1.519533
25%     18.445754
50%     23.761280
75%     27.950998
max     32.910255

Is there another way to fit a linear regression using Python and pandas DataFrames?

CodePudding user response:

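Use the OLS implementation from statsmodels and call the .summary() method on the fitted results. Don't forget to add the constant (intercept) term manually with add_constant, since it is not added by default.

import statsmodels.api as sm

# X holds the predictor column(s) and y the response, e.g. taken from the DataFrame
reg = sm.OLS(y, sm.add_constant(X)).fit()
print(reg.summary())  # summary is a method, so it has to be called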
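
If you would rather keep R's formula syntax, statsmodels also has a formula interface that accepts a pandas DataFrame directly and adds the intercept for you. Below is a minimal sketch of that approach. The column names response and predictor1 are taken from the R call in the question, and the data is synthetic and made up purely for illustration, since the original CSV is not available.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the CSV data: synthetic points around a known line.
rng = np.random.default_rng(0)
my_dataset = pd.DataFrame({"predictor1": rng.uniform(0, 40, size=200)})
my_dataset["response"] = (
    34.5 - 0.95 * my_dataset["predictor1"] + rng.normal(0, 6, size=200)
)

# Same formula string as in R; the intercept is included automatically.
lm_fit = smf.ols("response ~ predictor1", data=my_dataset).fit()

print(lm_fit.summary())         # coefficient table, R-squared, F-statistic
print(lm_fit.params)            # estimates, indexed by "Intercept" and "predictor1"
print(lm_fit.resid.describe())  # residual quantiles, similar to R's "Residuals" block
```

With your own data, replace the synthetic frame with the DataFrame you read from the CSV and use your actual column names in the formula.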