稚久 | Feature Engineering: Handling Time-Series Data (Part 4)

import matplotlib.pyplot as plt

# Assumes raw, X_test, y_test, X_test_lite, gb_reg and gb_reg_lite
# were defined in the earlier parts of this series.
fig, ax = plt.subplots(figsize=(12, 6))

# x-axis labels: timestamps of the last 100 test rows
index_ordered = raw.date_time.astype('str').tolist()[-len(X_test):][-100:]
ax.set_xlabel('Date')
ax.set_ylabel('Traffic Volume')

# the actual values
ax.plot(index_ordered, y_test[-100:].to_numpy(), color='k', ls='-', label='actual')

# predictions of model with engineered features
ax.plot(index_ordered, gb_reg.predict(X_test)[-100:], color='b', ls='--',
        label='predicted; with date-time features')

# predictions of model without engineered features
ax.plot(index_ordered, gb_reg_lite.predict(X_test_lite)[-100:], color='r', ls='--',
        label='predicted; w/o date-time features')

# show only every 5th tick label so the x-axis stays readable
every_nth = 5
for n, label in enumerate(ax.xaxis.get_ticklabels()):
    if n % every_nth != 0:
        label.set_visible(False)

ax.tick_params(axis='x', labelrotation=90)
plt.legend()
plt.title('Actual vs predicted on the last 100 data points')
plt.draw()
Predictions on the last 100 data points
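In this plot, the blue dashed line tracks the solid black line closely. In other words, our gradient-boosting model predicts the metro traffic volume well.
Meanwhile, the model that does not use date-time features (red dashed line) shows a clear gap in performance. Why is that? Simply because people rely on transportation at predictable times: traffic volume tends to drop on weekends and to spike during rush hours. So if we do not apply feature engineering to the date-time data, we miss these important predictors!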
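To make the weekend and rush-hour effects concrete, here is a minimal sketch of how such predictors could be derived from a timestamp column with pandas. It is not the author's exact pipeline: the helper name `add_datetime_features`, the sample rows, and the rush-hour windows (07:00 to 09:59 and 16:00 to 18:59) are assumptions chosen for illustration; only the `date_time` column name comes from the code above.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame, col: str = "date_time") -> pd.DataFrame:
    """Return a copy of df with simple calendar features derived from `col`.

    Hypothetical helper for illustration; the rush-hour windows are assumed.
    """
    out = df.copy()
    ts = pd.to_datetime(out[col])

    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)

    # Assumed rush-hour windows: morning 7-9 and evening 16-18 (hour of day)
    rush_hours = [7, 8, 9, 16, 17, 18]
    out["is_rush_hour"] = ts.dt.hour.isin(rush_hours).astype(int)
    return out


# Made-up sample rows: a Saturday morning, a Monday evening, a Monday night
sample = pd.DataFrame(
    {"date_time": ["2018-09-29 08:00:00", "2018-10-01 17:00:00", "2018-10-01 23:00:00"]}
)
features = add_datetime_features(sample)
print(features)
```

A tree-based model such as gradient boosting can split directly on columns like `hour` or `is_weekend`, which is one plausible reason the model with date-time features follows the actual curve more closely.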
Author: Pararawendy Indarjo
Translated by: deephub translation team, OliverLee