The labels of the samples are generated with the following third-order (cubic) polynomial function:
$$
y = 1.2x - 3.4x^2 + 5.6x^3 + 5 + \varepsilon
$$

where the noise term $\varepsilon$ follows a normal distribution with mean 0 and standard deviation 0.1.
import d2lzh as d2l
from mxnet import autograd, gluon, nd
from mxnet.gluon import data as gdata, loss as gloss, nn
# Generate the dataset: 100 training and 100 test samples
n_train, n_test, true_w, true_b = 100, 100, [1.2, -3.4, 5.6], 5
features = nd.random.normal(shape=(n_train + n_test, 1))
poly_features = nd.concat(features, nd.power(features, 2), nd.power(features, 3))
labels = (true_w[0] * poly_features[:, 0] + true_w[1] * poly_features[:, 1] + true_w[2] * poly_features[:, 2] + true_b)
labels += nd.random.normal(scale=0.1, shape=labels.shape)
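Before training, it helps to glance at the generated data. The check below is a small addition for illustration, not part of the original walkthrough; it simply prints the first two samples:
# Not in the original text: a quick look at the first two generated samples
print(features[:2])        # raw inputs x
print(poly_features[:2])   # columns are x, x^2, x^3
print(labels[:2])          # noisy labels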
# Define the plotting function semilogy, which uses a log scale on the y-axis
def semilogy(x_vals, y_vals, x_label, y_label, x2_vals=None, y2_vals=None, legend=None, figsize=(3.5, 2.5)):
    d2l.set_figsize(figsize)
    d2l.plt.xlabel(x_label)
    d2l.plt.ylabel(y_label)
    d2l.plt.semilogy(x_vals, y_vals)
    if x2_vals and y2_vals:
        d2l.plt.semilogy(x2_vals, y2_vals, linestyle=':')
        d2l.plt.legend(legend)
    d2l.plt.show()
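As a quick usage example (an added illustration, not from the original text), a geometrically decaying sequence appears as a straight line under semilogy's log-scaled y-axis:
# Illustrative only: geometric decay looks linear on a log-scale y-axis
semilogy(range(1, 11), [0.5 ** i for i in range(1, 11)], 'step', 'value')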
# Define the model and the training/plotting routine
num_epochs, loss = 100, gloss.L2Loss()

def fit_and_plt(train_features, test_features, train_labels, test_labels):
    net = nn.Sequential()
    net.add(nn.Dense(1))  # one linear output unit; input width is inferred from the data
    net.initialize()
    batch_size = min(10, train_labels.shape[0])
    train_iter = gdata.DataLoader(gdata.ArrayDataset(train_features, train_labels), batch_size, shuffle=True)
    trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})
    train_ls, test_ls = [], []
    for _ in range(num_epochs):
        for X, y in train_iter:
            with autograd.record():
                l = loss(net(X), y)
            l.backward()
            trainer.step(batch_size)
        # Record the mean loss on the full training and test sets once per epoch
        train_ls.append(loss(net(train_features), train_labels).mean().asscalar())
        test_ls.append(loss(net(test_features), test_labels).mean().asscalar())
    print('final epoch: train loss', train_ls[-1], 'test loss', test_ls[-1])
    semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'loss',
             range(1, num_epochs + 1), test_ls, ['train', 'test'])
    print('weight:', net[0].weight.data().asnumpy(), '\nbias:', net[0].bias.data().asnumpy())
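A note on the loss scale: gloss.L2Loss computes the squared error halved, $\ell(\hat{y}, y) = (\hat{y} - y)^2 / 2$, so all reported losses are on that scale. A one-line sanity check (an addition, not in the original):
# Assumption verified here: L2Loss halves the squared error
print(loss(nd.array([2.0]), nd.array([1.0])))  # expected: [0.5]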
1. Third-order polynomial fit (normal):
We first fit a third-order polynomial, the same order as the data-generating function. The experiment shows that this model's training error and its error on the test set are both low, and the learned parameters are close to the true values:
# Normal fit: both the training error and the error on the test set are small
fit_and_plt(poly_features[:n_train, :], poly_features[n_train:, :], labels[:n_train], labels[n_train:])
The result is as follows:
final epoch: train loss 0.00795475 test loss 0.010587299
weight: [[ 1.0719422 -3.32675 5.6385565]]
bias: [4.8979206]
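To make "close to the true values" concrete, one can compare the printed estimates against true_w and true_b. The snippet below copies the numbers from the printout above and is only an illustrative check, not part of the original text:
import numpy as np
# Values copied from the printout above (illustrative check)
learned_w = np.array([1.0719422, -3.32675, 5.6385565])
learned_b = 4.8979206
print(np.abs(learned_w - np.array(true_w)))  # deviations of about 0.13 or less
print(abs(learned_b - true_b))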
2. Linear fit (underfitting):
Next we try a linear fit. Clearly, this model's training error falls early in training but is then hard to reduce further; even after the final epoch it remains high. A linear model easily underfits a dataset generated by a nonlinear model such as a third-order polynomial.
# Underfitting case
fit_and_plt(features[:n_train, :], features[n_train:, :], labels[:n_train], labels[n_train:])
The result is as follows:
final epoch: train loss 148.14888 test loss 74.99101
weight: [[18.996931]]
bias: [-0.06482039]
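The high training error is not an optimization failure but a capacity limit: even the best possible straight line leaves a large residual on cubic data. A closed-form least-squares check with NumPy (an added illustration, not in the original) confirms this:
import numpy as np
x = features.asnumpy().ravel()
y = labels.asnumpy()
slope, intercept = np.polyfit(x, y, deg=1)  # optimal linear fit in the least-squares sense
residual = 0.5 * np.mean((slope * x + intercept - y) ** 2)  # same half-squared-error scale as L2Loss
print(slope, intercept, residual)  # residual stays large: no line can express the x^2 and x^3 terms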
3. Insufficient training samples (overfitting):
In fact, even a third-order polynomial model of the same order as the data-generating model can easily overfit when training samples are scarce. Let us train the model on only five samples. With so few samples, barely more than the model's four parameters (three weights and a bias), the model is complex relative to the data and is easily swayed by noise in the training labels. During training, the training error stays low while the error on the test set remains high: a classic case of overfitting.
# Overfitting case
fit_and_plt(poly_features[0:5, :], poly_features[n_train:, :], labels[0:5], labels[n_train:])
The result is as follows:
final epoch: train loss 1.6807058 test loss 208.46396
weight: [[0.04566861 0.3971918 0.09108218]]
bias: [2.4448707]
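A direct way to see that the problem is sample size rather than the model family is to grow the training subset and rerun fit_and_plt; with all 100 training samples this reduces to the normal case in experiment 1. The loop below is an added illustration, not part of the original text:
# Added illustration: the train/test gap narrows as the training subset grows
for k in [5, 20, 100]:
    fit_and_plt(poly_features[:k, :], poly_features[n_train:, :], labels[:k], labels[n_train:])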
Reference:
《动手学深度学习》 (Dive into Deep Learning)