What is the equivalent of R's lm function for fitting simple linear regressions in Python?

I have a dataset in CSV format that I have loaded into a pandas DataFrame. With R's lm function, I can fit a model and print its summary like this:

lm.fit=lm(response~predictor1 ,data=my_dataset)
summary(lm.fit)

Running the commands above gives output similar to the following:

Call:
lm(formula = response ~ predictor1, data = my_dataset)
Residuals:
Min 1Q Median 3Q Max
-1.519533 -3.990 -1.318 2.034 24.500
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.55384 0.56263 61.41 <2e-16 ***
lstat -0.95005 0.03873 -24.53 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared: 0.5441, Adjusted R-squared: 0.5432
F-statistic: 601.6 on 1 and 504 DF, p-value: < 2.2e-16

All I want is to port this code to Python. I have already tried the following:

import pandas as pd
from sklearn.linear_model import LinearRegression

# dataset is the pandas DataFrame loaded from the CSV file
X = dataset.iloc[:, 12].values.reshape(-1, 1)  # .values gives a numpy array; reshape makes it 2-D
Y = dataset.iloc[:, 13].values.reshape(-1, 1)  # -1 infers the number of rows; 1 fixes a single column
linear_regressor = LinearRegression()  # create the estimator
linear_regressor.fit(X, Y)  # fit the linear regression
Y_pred = linear_regressor.predict(X)  # predictions on the training data
df = pd.DataFrame(Y_pred, columns=['Column_A'])
print(df.describe())  # summarises the predictions, not the fitted model

This produces the following output, which only describes the predicted values and is not what I want:

       Column_A
count  506.000000
mean    22.532806
std      6.784361
min     -1.519533
25%     18.445754
50%     23.761280
75%     27.950998
max     32.910255

Is there another way to fit a linear regression in Python on a pandas DataFrame and get a summary like the one R prints?

CodePudding user response:

Use the OLS implementation from statsmodels and call the .summary() method on the fitted result. Don't forget to add a constant manually with add_constant, since OLS does not include an intercept by default.

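import statsmodels.api as sm

# y is the response and X the predictor(s), e.g. columns of your data frame
reg = sm.OLS(y, sm.add_constant(X)).fit()
print(reg.summary())  # summary is a method, so it must be called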
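
If you prefer something closer to R's formula syntax, statsmodels also has a formula interface (statsmodels.formula.api), which adds the intercept for you. Here is a minimal sketch, assuming column names response and predictor1 as in your R call. The data below is synthetic and made up purely so the example runs on its own; substitute your own data frame.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the CSV data (hypothetical values, not the asker's dataset).
rng = np.random.default_rng(0)
predictor1 = rng.uniform(1, 40, size=200)
response = 34.5 - 0.95 * predictor1 + rng.normal(0, 6, size=200)
my_dataset = pd.DataFrame({"response": response, "predictor1": predictor1})

# The formula interface includes an intercept automatically, as R's lm does.
lm_fit = smf.ols("response ~ predictor1", data=my_dataset).fit()
print(lm_fit.summary())

# Individual parts of the summary are also available as attributes.
print(lm_fit.params)    # coefficient estimates
print(lm_fit.bse)       # standard errors
print(lm_fit.rsquared)  # multiple R-squared
```

The printed table has the coefficient estimates, standard errors, t values, p-values, R-squared and F-statistic, so it should cover most of what summary(lm.fit) shows in R, although the layout differs.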