[Python Data Analysis] 14. Pandas Data Structures: The Series Type

December 20, 2022


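The companion code for this series is available in three ways:

It can be viewed on the following site, a web-based Jupyter environment built with JupyterLite, so no local runtime is needed. On first run the browser has to download some configuration files (about 20 MB):

https://returu.github.io/Python_Data_Analysis/lab/index.html

It can also be downloaded from Baidu Netdisk, in which case a local runtime environment is required. For environment setup, see [Python Basics] 2. Setting Up a Python Development Environment:

Link: https://pan.baidu.com/s/1MYkeYeVAIRqbxezQECHwcA?pwd=mnsj Extraction code: mnsj

Alternatively, go to the GitHub repository page, click the Code button, and choose Download ZIP:

https://github.com/returu/Python_Data_Analysis

Translated and adapted from Python for Data Analysis, 3rd Edition.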

----------------------------------------

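1. Introduction to the Pandas library:

Pandas is a third-party Python library that provides high-performance, easy-to-use data types and analysis tools. Its data structures and data manipulation tools are designed to make data cleaning and analysis in Python convenient.

Pandas is built on NumPy and is often used together with NumPy, SciPy, and Matplotlib.

Import Pandas with the following statement:

import pandas as pd  # the alias can be omitted or changed, but the conventional alias pd is recommended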

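2. Understanding the Pandas library:

Pandas has two main data types: Series (a one-dimensional type) and DataFrame (a two-dimensional type that can also represent higher-dimensional data through hierarchical indexing). On top of these two types it provides several kinds of operations: basic operations, arithmetic operations, feature (descriptive) operations, relational operations, and so on.

NumPy                                      Pandas
Basic data types                           Extended data types
Focuses on the structure of the data       Focuses on the application of the data
Dimensions: relationships among the data   Relationships between the data and the index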

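3. The Series type in Pandas:

3.1 Introduction to the Series type

A Series is a one-dimensional array-like object containing a sequence of values of the same type (similar to NumPy types) and an associated array of data labels, called its index.

In other words, a Series consists of two parts: the index and the values.

3.2 Creating a Series

Pass data to the pd.Series() constructor to instantiate a Series object.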

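From a Python list:

>>> obj = pd.Series([2, 5, 8, -12])
>>> obj
0     2
1     5
2     8
3   -12
dtype: int64

Because no index was specified for the data, Pandas generates a default index of the integers 0 through N-1 (where N is the length of the data).

To customize the index, pass a sequence of labels through the index parameter. The index sequence must be the same length as the data.

>>> obj = pd.Series([2, 5, 8, -12], index=['a', 'b', 'c', 'd'])
>>> obj
a     2
b     5
c     8
d   -12
dtype: int64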
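
The length rule above can be checked with a minimal sketch. The variable names (values, default_indexed, labeled, length_error) are illustrative and not part of the original article:

```python
import pandas as pd

values = [2, 5, 8, -12]

# No index given: Pandas builds the default 0..N-1 index.
default_indexed = pd.Series(values)

# A custom index with the same length as the data.
labeled = pd.Series(values, index=["a", "b", "c", "d"])

# An index whose length differs from the data is rejected.
try:
    pd.Series(values, index=["a", "b", "c"])
    length_error = None
except ValueError as exc:
    length_error = exc

print(list(default_indexed.index))  # [0, 1, 2, 3]
print(list(labeled.index))          # ['a', 'b', 'c', 'd']
print(type(length_error).__name__)  # ValueError
```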

From a scalar value:

In this case the index parameter determines the size of the Series, and the scalar is repeated for every label.

>>> obj = pd.Series(25, index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    25
b    25
c    25
d    25
e    25
dtype: int64
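
As a small sketch of this behavior (the names labels, filled, ratios, and single are hypothetical), the scalar is repeated once per index label, and without an index it yields a Series of length 1:

```python
import pandas as pd

labels = ["a", "b", "c", "d", "e"]

# The scalar is repeated for each of the five labels.
filled = pd.Series(25, index=labels)

# A float scalar produces a float Series of the same size.
ratios = pd.Series(0.5, index=labels)

# Without an index, a scalar gives a one-element Series.
single = pd.Series(25)

print(len(filled))       # 5
print(filled.nunique())  # 1
print(ratios.dtype)      # float64
print(len(single))       # 1
```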

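From a Python dictionary:

The values and index labels of a Series are paired by position, so a Series can be thought of as a fixed-length, ordered dictionary.

A Series can be instantiated from a dictionary, in which case the dictionary keys become the index.

>>> obj = pd.Series({'d': 9, 'a': 8, 'b': 7, 'c': 6})
>>> obj
d    9
a    8
b    7
c    6
dtype: int64

The to_dict method converts a Series back to a dictionary.

>>> obj.to_dict()
{'d': 9, 'a': 8, 'b': 7, 'c': 6}

When only a dictionary is passed, the index follows the insertion order of the dictionary keys. Passing an index sequence as well makes the resulting Series use the order you specify.

>>> obj = pd.Series({'d': 9, 'a': 8, 'b': 7, 'c': 6}, index=['a', 'b', 'c', 'd'])
>>> obj
a    8
b    7
c    6
d    9
dtype: int64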
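
The following sketch, with illustrative names (scores, by_insertion, reordered, subset), suggests how the index argument both orders and selects dictionary entries, and that to_dict round-trips the original mapping:

```python
import pandas as pd

scores = {"d": 9, "a": 8, "b": 7, "c": 6}

# Index follows the insertion order of the dictionary keys.
by_insertion = pd.Series(scores)

# An explicit index imposes its own order.
reordered = pd.Series(scores, index=["a", "b", "c", "d"])

# Listing only some keys keeps just those entries.
subset = pd.Series(scores, index=["c", "a"])

print(list(by_insertion.index))          # ['d', 'a', 'b', 'c']
print(list(reordered.index))             # ['a', 'b', 'c', 'd']
print(subset.to_dict())                  # {'c': 6, 'a': 8}
print(by_insertion.to_dict() == scores)  # True
```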

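From an ndarray:

Both the data and the index can be created from ndarray objects. The dtype of np.arange depends on the platform's default integer: the int32 shown here is what Windows with NumPy 1.x reports, while most Linux and macOS systems report int64.

>>> import numpy as np
>>> obj = pd.Series(np.arange(5))
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int32

>>> obj = pd.Series(np.arange(5), index=np.arange(9, 4, -1))
>>> obj
9    0
8    1
7    2
6    3
5    4
dtype: int32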
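
If a platform-independent dtype is wanted, one option is to pass dtype explicitly. This is a sketch under that assumption, with hypothetical names (data, labels, s). It also shows that with an integer index, loc looks up by label while iloc looks up by position:

```python
import numpy as np
import pandas as pd

data = np.arange(5)           # values 0..4
labels = np.arange(9, 4, -1)  # labels 9, 8, 7, 6, 5

# Request int64 explicitly instead of relying on the platform default.
s = pd.Series(data, index=labels, dtype="int64")

print(s.dtype)    # int64
print(s.loc[9])   # 0  (label 9 is the first entry)
print(s.loc[5])   # 4  (label 5 is the last entry)
print(s.iloc[0])  # 0  (position 0)
```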

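3.3 Checking for missing data:

When a Series is built from a dictionary together with an index, any index label that has no matching dictionary key receives the value NaN, which Pandas uses to mark missing data. Here 'e' is not among the dictionary keys. Because NaN is a floating-point value, the dtype becomes float64.

>>> obj = pd.Series({'d': 9, 'a': 8, 'b': 7, 'c': 6}, index=['a', 'b', 'c', 'd', 'e'])
>>> obj
a    8.0
b    7.0
c    6.0
d    9.0
e    NaN
dtype: float64

Pandas provides isnull() and notnull() to detect missing data (isna() and notna() are equivalent aliases).

>>> pd.isnull(obj)
a    False
b    False
c    False
d    False
e     True
dtype: bool

>>> pd.notnull(obj)
a     True
b     True
c     True
d     True
e    False
dtype: bool

isnull() and notnull() are also available as instance methods of Series.

>>> obj.isnull()
a    False
b    False
c    False
d    False
e     True
dtype: bool

>>> obj.notnull()
a     True
b     True
c     True
d     True
e    False
dtype: bool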
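
The boolean results above are typically used as masks or counts. A minimal sketch follows. The names n_missing, present, dropped, and filled are illustrative, and replacing NaN with 0 is only one possible choice:

```python
import pandas as pd

obj = pd.Series({"d": 9, "a": 8, "b": 7, "c": 6},
                index=["a", "b", "c", "d", "e"])

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(obj.isna().sum())

# Keep only the entries that are present, using the mask directly.
present = obj[obj.notna()]

# dropna() gives the same result as the mask above.
dropped = obj.dropna()

# fillna() replaces missing entries with a chosen value.
filled = obj.fillna(0)

print(n_missing)                # 1
print(list(present.index))      # ['a', 'b', 'c', 'd']
print(dropped.equals(present))  # True
print(filled["e"])              # 0.0
```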

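3.4 The array and index attributes:

The array and index attributes return the values and the index of a Series, respectively.

The array attribute returns a PandasArray (named NumpyExtensionArray in newer Pandas releases), which usually wraps a NumPy array but can also contain special extension array types.

>>> obj = pd.Series({'d': 9, 'a': 8, 'b': 7, 'c': 6}, index=['a', 'b', 'c', 'd'])
>>> obj
a    8
b    7
c    6
d    9
dtype: int64

>>> obj.array
<PandasArray>
[8, 7, 6, 9]
Length: 4, dtype: int64
>>> obj.index
Index(['a', 'b', 'c', 'd'], dtype='object')

A single value or a set of values can be selected by label. To select by integer position, use iloc rather than plain brackets: obj[0] on a label-indexed Series is deprecated in recent Pandas releases.

# Get a single value by position
>>> obj.iloc[0]
8

# Get a single value by label
>>> obj['a']
8

# Use a list of labels to get a set of values
>>> obj[['d', 'a', 'c']]
d    9
a    8
c    6
dtype: int64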
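
To make the difference between label-based and position-based selection explicit, here is a short sketch using the loc and iloc accessors. The Series is rebuilt so that the block is self-contained:

```python
import pandas as pd

obj = pd.Series([8, 7, 6, 9], index=["a", "b", "c", "d"])

# loc selects by label, iloc selects by integer position.
print(obj.loc["a"])                   # 8
print(obj.iloc[0])                    # 8

# Lists of labels or positions return a new Series in the order requested.
print(obj.loc[["d", "a"]].tolist())   # [9, 8]
print(obj.iloc[[3, 0]].tolist())      # [9, 8]

# Label slices include the end label; position slices exclude the end.
print(obj.loc["b":"c"].tolist())      # [7, 6]
print(obj.iloc[1:2].tolist())         # [7]
```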

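The index of a Series can be changed in place by assignment. The new labels are matched to the existing values by position.

>>> obj.index = ['AA', 'BB', 'CC', 'DD']
>>> obj
AA    8
BB    7
CC    6
DD    9
dtype: int64

Both the Series object itself and its index have a name attribute, which integrates with other key areas of Pandas functionality.

>>> obj.name = 'data'
>>> obj.index.name = 'index'
>>> obj
index
AA    8
BB    7
CC    6
DD    9
Name: data, dtype: int64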
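
One place where the name attributes show up is conversion to a DataFrame. The sketch below is illustrative only: the names "data" and "key" are arbitrary, and to_frame and reset_index are just two of the features that use them:

```python
import pandas as pd

obj = pd.Series([8, 7, 6, 9], index=["AA", "BB", "CC", "DD"])
obj.name = "data"
obj.index.name = "key"

# to_frame() uses the Series name as the column name.
frame = obj.to_frame()

# reset_index() turns the named index into a regular column.
flat = obj.reset_index()

print(list(frame.columns))  # ['data']
print(frame.index.name)     # key
print(list(flat.columns))   # ['key', 'data']
```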

3.5 Automatic index alignment:

A useful feature of Series is that it automatically aligns by index label in arithmetic operations. Labels that appear in only one of the operands produce NaN in the result.

>>> obj_1 = pd.Series({'d': 9, 'a': 8, 'b': 7, 'c': 6})
>>> obj_1
d    9
a    8
b    7
c    6
dtype: int64

>>> obj_2 = pd.Series({'f': 5, 'e': 4, 'd': 3, 'c': 2})
>>> obj_2
f    5
e    4
d    3
c    2
dtype: int64

>>> obj_1 + obj_2
a     NaN
b     NaN
c     8.0
d    12.0
e     NaN
f     NaN
dtype: float64
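
When NaN for unmatched labels is not the desired outcome, two common options are sketched below: the add method with fill_value, which treats a label missing from one side as 0, and dropna, which keeps only the overlapping labels. The names total and overlap are illustrative:

```python
import pandas as pd

obj_1 = pd.Series({"d": 9, "a": 8, "b": 7, "c": 6})
obj_2 = pd.Series({"f": 5, "e": 4, "d": 3, "c": 2})

# Treat a label missing from one operand as 0 instead of producing NaN.
total = obj_1.add(obj_2, fill_value=0)
print(total.to_dict())
# {'a': 8.0, 'b': 7.0, 'c': 8.0, 'd': 12.0, 'e': 4.0, 'f': 5.0}

# Keep only the labels present in both operands.
overlap = (obj_1 + obj_2).dropna()
print(overlap.to_dict())  # {'c': 8.0, 'd': 12.0}
```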

This article originally appeared on the WeChat official account 码农设计师.
