[Python Data Analysis] 54. Time Series: Time Series Basics

February 6, 2023


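The companion code for this series of articles is available in three ways:

It can be viewed on the following site, a web-based Jupyter environment built with JupyterLite, so no local runtime needs to be installed. On first use the browser has to download some configuration files (about 20 MB):

https://returu.github.io/Python_Data_Analysis/lab/index.html

It can also be downloaded from Baidu Netdisk, in which case a local environment for running the code is required. For environment setup, see [Python Basics] 2. Setting Up a Python Development Environment:

Link: https://pan.baidu.com/s/1MYkeYeVAIRqbxezQECHwcA?pwd=mnsj Extraction code: mnsj

Alternatively, go to the GitHub repository page, click the code button, and choose the Download ZIP option:

https://github.com/returu/Python_Data_Analysis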

Translated and compiled from "Python for Data Analysis, 3rd Edition".

—————————————————–

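A basic kind of time series object in pandas is a Series indexed by timestamps. Outside of pandas, those timestamps are often represented as Python strings or datetime objects.

Under the hood, these datetime objects are placed in a DatetimeIndex: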

>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd

>>> dates = [datetime(2022, 12, 2), datetime(2022, 12, 5),
...          datetime(2022, 12, 8), datetime(2022, 12, 12),
...          datetime(2022, 12, 15), datetime(2022, 12, 20)]

>>> ts = pd.Series(np.arange(6), index=pd.to_datetime(dates))
# The integer dtype is platform-dependent: int32 here, int64 on most Linux/macOS setups
>>> ts
2022-12-02    0
2022-12-05    1
2022-12-08    2
2022-12-12    3
2022-12-15    4
2022-12-20    5
dtype: int32

>>> ts.index
DatetimeIndex(['2022-12-02', '2022-12-05', '2022-12-08', '2022-12-12',
               '2022-12-15', '2022-12-20'],
              dtype='datetime64[ns]', freq=None)
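The opening paragraph mentions that dates often arrive as plain strings. As a minimal sketch of that case, the snippet below lets pd.to_datetime parse a list of date strings into a DatetimeIndex. The names date_strings and ts_from_strings are illustrative and do not appear in the original code.

```python
import numpy as np
import pandas as pd

# Dates supplied as plain strings rather than datetime objects.
date_strings = ["2022-12-02", "2022-12-05", "2022-12-08"]

# pd.to_datetime parses the strings and returns a DatetimeIndex.
index = pd.to_datetime(date_strings)
ts_from_strings = pd.Series(np.arange(3), index=index)

print(type(ts_from_strings.index).__name__)
print(ts_from_strings)
```

The result behaves like the ts series built from datetime objects: the index is a DatetimeIndex either way.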

pandas stores timestamps using NumPy's datetime64 data type, here at nanosecond resolution:

>>> ts.index.dtype
dtype('<M8[ns]')

Scalar values from a DatetimeIndex are pandas Timestamp objects:

>>> ts.index[2]
Timestamp('2022-12-08 00:00:00')

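As with any other Series, arithmetic operations between differently indexed time series automatically align on the dates. Dates present in only one operand yield NaN, which is why the result is float64:

>>> ts[::2]
2022-12-02    0
2022-12-08    2
2022-12-15    4
dtype: int32

>>> ts + ts[::2]
2022-12-02    0.0
2022-12-05    NaN
2022-12-08    4.0
2022-12-12    NaN
2022-12-15    8.0
2022-12-20    NaN
dtype: float64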
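If NaN for unmatched dates is not what you want, one option is Series.add with fill_value, which substitutes a value for dates missing on one side. This is a small sketch on a four-date series rather than the article's six-date ts; the names aligned and filled are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2022-12-02", "2022-12-05", "2022-12-08", "2022-12-12"])
ts = pd.Series(np.arange(4), index=dates)

# Plain addition: dates absent from ts[::2] become NaN after alignment.
aligned = ts + ts[::2]

# Series.add with fill_value=0 treats a date missing on one side as 0.
filled = ts.add(ts[::2], fill_value=0)

print(aligned)
print(filled)
```

Whether filling with 0 is appropriate depends on what the data means; for measurements, a missing observation is often better left as NaN.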

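A pandas.Timestamp can be substituted in most places where you would use a datetime object. The reverse is not true, because pandas.Timestamp can store nanosecond-precision data while datetime stores only up to microseconds. Additionally, pandas.Timestamp understands how to do time zone conversions and other kinds of manipulations. In pandas versions before 2.0 it could also carry frequency information (if any); the freq attribute on Timestamp was deprecated in pandas 1.4 and removed in 2.0.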
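The precision difference can be checked directly. The sketch below builds a Timestamp one nanosecond past midnight and converts it to a standard-library datetime. The variable names are illustrative, and warn=False is passed only to silence the warning pandas emits when nanoseconds are discarded.

```python
from datetime import datetime

import pandas as pd

# One nanosecond after midnight; datetime cannot represent this value.
stamp = pd.Timestamp("2022-12-08 00:00:00.000000001")
print(stamp.nanosecond)

# Timestamp is a subclass of datetime, so it can be passed where a datetime is expected.
print(isinstance(stamp, datetime))

# Converting to a plain datetime drops the nanosecond component.
py_dt = stamp.to_pydatetime(warn=False)
print(py_dt)
```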

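1. Indexing, selection, and subsetting:

Time series behave like any other Series when you index and select data based on the label:

>>> stamp = ts.index[-1]

>>> ts[stamp]
5

As a convenience, you can also pass a string that is interpretable as a date:

>>> ts["2022-12-20"]
5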

For longer time series, a year, or a year and a month, can be passed to easily select slices of data:

# pandas.date_range generates a range of dates
>>> long_ts = pd.Series(np.arange(1000), index=pd.date_range("2021-12-20", periods=1000))
>>> long_ts
2021-12-20      0
2021-12-21      1
2021-12-22      2
2021-12-23      3
2021-12-24      4
             ...
2024-09-10    995
2024-09-11    996
2024-09-12    997
2024-09-13    998
2024-09-14    999
Freq: D, Length: 1000, dtype: int32

# The string "2021" is interpreted as a year and selects that time period
>>> long_ts["2021"]
2021-12-20     0
2021-12-21     1
2021-12-22     2
2021-12-23     3
2021-12-24     4
2021-12-25     5
2021-12-26     6
2021-12-27     7
2021-12-28     8
2021-12-29     9
2021-12-30    10
2021-12-31    11
Freq: D, dtype: int32

# Specifying a month
>>> long_ts["2021-12"]
2021-12-20     0
2021-12-21     1
2021-12-22     2
2021-12-23     3
2021-12-24     4
2021-12-25     5
2021-12-26     6
2021-12-27     7
2021-12-28     8
2021-12-29     9
2021-12-30    10
2021-12-31    11
Freq: D, dtype: int32
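The same partial-string selection can be written with .loc, which makes it explicit that the lookup is label-based. The sketch below rebuilds long_ts and selects a full year and a single month; year_2022 and march_2022 are illustrative names.

```python
import numpy as np
import pandas as pd

long_ts = pd.Series(np.arange(1000),
                    index=pd.date_range("2021-12-20", periods=1000))

# A year string selects every timestamp that falls within that year.
year_2022 = long_ts.loc["2022"]

# A year-month string selects every timestamp within that month.
march_2022 = long_ts.loc["2022-03"]

print(len(year_2022), len(march_2022))
print(year_2022.head(3))
```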

Because most time series data is ordered chronologically, you can slice with timestamps not contained in the time series to perform a range query:

# long_ts starts at "2021-12-20"
>>> long_ts["2021-12-1":"2021-12-30"]
2021-12-20     0
2021-12-21     1
2021-12-22     2
2021-12-23     3
2021-12-24     4
2021-12-25     5
2021-12-26     6
2021-12-27     7
2021-12-28     8
2021-12-29     9
2021-12-30    10
Freq: D, dtype: int32
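A range query also works when neither endpoint appears in the index. As a small sketch, the irregularly spaced series from the start of the article is sliced below with two dates that fall between observations; window is an illustrative name.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2022-12-02", "2022-12-05", "2022-12-08",
                        "2022-12-12", "2022-12-15", "2022-12-20"])
ts = pd.Series(np.arange(6), index=dates)

# Neither 2022-12-04 nor 2022-12-13 is in the index; the slice returns
# every observation that falls between them.
window = ts.loc["2022-12-04":"2022-12-13"]
print(window)

# Label slices include both endpoints when they are present in the index.
inclusive = ts.loc["2022-12-05":"2022-12-12"]
print(inclusive)
```

Both slices return the observations for December 5, 8, and 12.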

Slicing with datetime objects works as well:

>>> long_ts[datetime(2022, 12, 1):datetime(2022, 12, 10)]
2022-12-01    346
2022-12-02    347
2022-12-03    348
2022-12-04    349
2022-12-05    350
2022-12-06    351
2022-12-07    352
2022-12-08    353
2022-12-09    354
2022-12-10    355
Freq: D, dtype: int32

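As before, you can pass a string date, a datetime, or a Timestamp. Note that slicing in this manner produces a view on the source time series, just like slicing a NumPy array: no data is copied, and modifications to the slice are reflected in the original data. This describes the pandas version used here; with Copy-on-Write enabled (optional since pandas 1.5 and planned as the default in pandas 3.0), writing to the slice no longer changes the original.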
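When a slice needs to be modified without touching the source series, taking an explicit copy is a safe choice. The following is a minimal sketch of that pattern; independent is an illustrative name.

```python
import numpy as np
import pandas as pd

long_ts = pd.Series(np.arange(1000),
                    index=pd.date_range("2021-12-20", periods=1000))

# .copy() detaches the slice from long_ts, so later writes stay local.
independent = long_ts.loc["2022-12-01":"2022-12-10"].copy()
independent.iloc[0] = -1

print(independent.iloc[0])
print(long_ts.loc["2022-12-01"])
```

The original value at 2022-12-01 stays at 346 because the write went to the copy.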

There is an equivalent instance method, truncate, that slices a Series between two dates:

>>> long_ts.truncate(before="2022-12-5", after="2022-12-10")
2022-12-05    350
2022-12-06    351
2022-12-07    352
2022-12-08    353
2022-12-09    354
2022-12-10    355
Freq: D, dtype: int32
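To illustrate the equivalence, the sketch below compares truncate with the corresponding label slice and also shows a one-sided truncate. The names sliced, truncated, and head are illustrative.

```python
import numpy as np
import pandas as pd

long_ts = pd.Series(np.arange(1000),
                    index=pd.date_range("2021-12-20", periods=1000))

# Label slice and truncate over the same date range.
sliced = long_ts.loc["2022-12-05":"2022-12-10"]
truncated = long_ts.truncate(before="2022-12-05", after="2022-12-10")
print(sliced.equals(truncated))

# Either bound can be omitted; here only the end is cut off.
head = long_ts.truncate(after="2021-12-22")
print(head)
```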

All of this holds true for a DataFrame as well, indexing on its rows:

>>> dates = pd.date_range("2022-01-01", periods=100, freq="W-FRI")

>>> df = pd.DataFrame(np.arange(300).reshape(100, 3), columns=["X", "Y", "Z"], index=dates)
>>> df
              X    Y    Z
2022-01-07    0    1    2
2022-01-14    3    4    5
2022-01-21    6    7    8
2022-01-28    9   10   11
2022-02-04   12   13   14
...         ...  ...  ...
2023-11-03  285  286  287
2023-11-10  288  289  290
2023-11-17  291  292  293
2023-11-24  294  295  296
2023-12-01  297  298  299

[100 rows x 3 columns]

# Index with loc
>>> df.loc["2022-01", ["X", "Z"]]
            X   Z
2022-01-07  0   2
2022-01-14  3   5
2022-01-21  6   8
2022-01-28  9  11
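Range queries carry over to DataFrame rows in the same way. The sketch below rebuilds df and combines a date range on the rows with a column selection; feb and window are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2022-01-01", periods=100, freq="W-FRI")
df = pd.DataFrame(np.arange(300).reshape(100, 3),
                  columns=["X", "Y", "Z"], index=dates)

# All rows in February 2022 (the four Fridays of that month).
feb = df.loc["2022-02"]

# A date range on the rows plus a column subset; the endpoints
# do not need to be Fridays themselves.
window = df.loc["2022-01-10":"2022-01-31", ["X", "Z"]]

print(feb)
print(window)
```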

2. Time series with duplicate indices:

In some applications, there may be multiple data observations falling on a particular timestamp. For example:

>>> dates = pd.DatetimeIndex(["2022-01-01", "2022-01-02", "2022-01-02", "2022-01-02", "2022-01-03"])

>>> dup_ts = pd.Series(np.arange(5), index=dates)

>>> dup_ts
2022-01-01    0
2022-01-02    1
2022-01-02    2
2022-01-02    3
2022-01-03    4
dtype: int32

We can tell that the index is not unique by checking its is_unique property:

>>> dup_ts.index.is_unique
False
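is_unique only answers yes or no. To find out which timestamps are repeated, one possible approach is Index.duplicated, sketched below; dup_mask and repeated are illustrative names.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2022-01-01", "2022-01-02", "2022-01-02",
                          "2022-01-02", "2022-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

# keep=False marks every occurrence of a repeated timestamp as True.
dup_mask = dup_ts.index.duplicated(keep=False)
print(dup_mask)

# The distinct timestamps that occur more than once.
repeated = dup_ts.index[dup_mask].unique()
print(repeated)
```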

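Indexing into this time series produces either a scalar value or a slice, depending on whether the timestamp is duplicated:

# Unique timestamp
>>> dup_ts["2022-01-01"]
0

# Duplicated timestamp
>>> dup_ts["2022-01-02"]
2022-01-02    1
2022-01-02    2
2022-01-02    3
dtype: int32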

Suppose you want to aggregate data that has non-unique timestamps. One way to do this is to use groupby and pass level=0 (the one and only level):

>>> grouped = dup_ts.groupby(level=0)
>>> grouped.count()
2022-01-01    1
2022-01-02    3
2022-01-03    1
dtype: int64
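count is only one possible aggregation. The sketch below applies mean to the same grouping and, as an alternative to aggregating, keeps only the first observation per timestamp using a boolean mask. The names means and firsts are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(["2022-01-01", "2022-01-02", "2022-01-02",
                          "2022-01-02", "2022-01-03"])
dup_ts = pd.Series(np.arange(5), index=dates)

# Average the observations that share a timestamp.
means = dup_ts.groupby(level=0).mean()
print(means)

# Alternatively, drop the repeats and keep the first observation per timestamp.
firsts = dup_ts[~dup_ts.index.duplicated(keep="first")]
print(firsts)
```

Both results have a unique index, so a later lookup by timestamp returns a single value.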

This article comes from the WeChat official account: 码农设计师
