1. Data:
As before, we use the classic house price prediction dataset.
2. Linear regression in sklearn:
Regression line parameters:
- LinearRegression(): the model class
- coef_: the weights
- intercept_: the bias term
- Regression line: y = x*coef_ + intercept_
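The relationship between coef_, intercept_ and the regression line can be checked on a tiny example. The sketch below is only an illustration: the data is synthetic (made up so that y = 3x + 2 exactly) and is not taken from the house price dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic single-feature data (made up for illustration): y = 3x + 2 exactly
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3.0 * X.ravel() + 2.0

model = LinearRegression().fit(X, y)

w = model.coef_        # array of shape (1,): one weight per feature
b = model.intercept_   # scalar bias term

# Evaluating the line by hand gives the same values as model.predict
manual = X @ w + b
print(w, b)
print(np.allclose(manual, model.predict(X)))
```

With a single feature, coef_ is a one-element array, so x*coef_ + intercept_ broadcasts cleanly over a column of x values.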
3. Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

data_file = './data/house_data.csv'

# Feature columns to use
FEAT_COLS = ['bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'sqft_above', 'sqft_basement']


def plot_fitting_line(linear_model,X,y,feat):
    """
    Plot the fitted regression line.
    """
    plt.figure()
    # Plot the sample points
    plt.scatter(X,y,alpha=0.5)
    # Get the weight and the bias term
    w = linear_model.coef_
    b = linear_model.intercept_
    # Plot the fitted line
    plt.plot(X,w*X+b,c='red')
    plt.title(feat)
    plt.show()


def main():
    """
    Main function: fit one single-feature model per column.
    """
    # Read the data
    house_data = pd.read_csv(data_file,usecols=FEAT_COLS + ['price'])
    for feat in FEAT_COLS:
        # Get the feature as a column vector of shape (n_samples, 1)
        X = house_data[feat].values.reshape(-1,1)
        # Get the labels
        y = house_data['price'].values
        # Split into training and test sets
        X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=1/3,random_state=10)
        # Build the linear regression model
        linear_model = LinearRegression()
        # Train the model
        linear_model.fit(X_train,y_train)
        # Evaluate the model on the test set
        r2 = linear_model.score(X_test,y_test)
        print('R2 for feature {}: {}'.format(feat,r2))
        # Plot the fitted line over the training data
        plot_fitting_line(linear_model,X_train,y_train,feat)


if __name__ == '__main__':
    main()
4. Output:
R2 for feature bedrooms: 0.09258779614933532
R2 for feature bathrooms: 0.2704350160884538
R2 for feature sqft_living: 0.49617378557685365
R2 for feature sqft_lot: 0.006598795645454514
R2 for feature sqft_above: 0.35340999022703223
R2 for feature sqft_basement: 0.11806711865379671
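The R2 values above come from linear_model.score, which for LinearRegression returns the coefficient of determination: 1 minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch of that computation follows. The helper name r2_manual and the numbers are made up for illustration and are not part of the original script.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up labels for illustration
y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)        # predictions equal the labels
baseline = r2_manual(y_true, [2.5] * 4)    # always predict the mean of y_true

print(perfect, baseline)
```

Read this way, an R2 close to 0 (as for sqft_lot) means that feature alone does little better than predicting the mean price, while sqft_living, at about 0.50, is the strongest single predictor of the six.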