[Python Data Analysis] 45. Data Wrangling: Reshaping and Pivoting 2 (Pivoting Data)

January 25, 2023


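The companion code for this series is available in three ways:

You can view it on the following website, a web-based Jupyter environment built with JupyterLite, so no local runtime needs to be installed. On first run, the browser has to download some configuration files (about 20 MB):

https://returu.github.io/Python_Data_Analysis/lab/index.html

You can also get it from Baidu Netdisk. This requires a local runtime environment; for setup, see [Python Basics] 2. Setting Up a Python Development Environment:

Link: https://pan.baidu.com/s/1MYkeYeVAIRqbxezQECHwcA?pwd=mnsj Extraction code: mnsj

Or go to the GitHub repository page, click the Code button, and select Download ZIP:

https://github.com/returu/Python_Data_Analysis

Translated and adapted from Python for Data Analysis, 3rd Edition.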

—————————————————–

1. Converting "long" to "wide" format:

Take the following DataFrame as an example:

>>> import pandas as pd
>>> df = pd.DataFrame({"Style": ["one", "two", "three", "one", "two", "three", "one", "two", "three"],
...                    "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
...                    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9]})

>>> df
   Style variable  value
0    one        A      1
1    two        A      2
2  three        A      3
3    one        B      4
4    two        B      5
5  three        B      6
6    one        C      7
7    two        C      8
8  three        C      9

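The first two values passed to the pivot method name the columns used as the row index and the column index of the result. The optional values argument names the column whose data fills the resulting DataFrame.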

>>> df_pivot = df.pivot(index="Style", columns="variable", values="value")
>>> df_pivot
variable  A  B  C
Style
one       1  4  7
three     3  6  9
two       2  5  8

>>> df_pivot.index
Index(['one', 'three', 'two'], dtype='object', name='Style')

>>> df_pivot.columns
Index(['A', 'B', 'C'], dtype='object', name='variable')

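If you want to keep both data columns when reshaping, simply omit the last argument (values). The result then has hierarchical columns.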

>>> df["value2"] = [10, 11, 12, 13, 14, 15, 16, 17, 18]
>>> df
   Style variable  value  value2
0    one        A      1      10
1    two        A      2      11
2  three        A      3      12
3    one        B      4      13
4    two        B      5      14
5  three        B      6      15
6    one        C      7      16
7    two        C      8      17
8  three        C      9      18

>>> df.pivot(index="Style", columns="variable")
         value       value2
variable     A  B  C      A   B   C
Style
one          1  4  7     10  13  16
three        3  6  9     12  15  18
two          2  5  8     11  14  17
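
Once the result has hierarchical columns, a natural next step is to pull individual pieces back out. The following is a minimal sketch, not part of the original text; the variable names `wide`, `value2_block`, and `value_a` are chosen here for illustration only.

```python
import pandas as pd

# Rebuild the long-format example with both value columns.
df = pd.DataFrame({
    "Style": ["one", "two", "three"] * 3,
    "variable": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "value2": [10, 11, 12, 13, 14, 15, 16, 17, 18],
})

# Omitting `values` keeps every remaining column and yields
# two-level columns: (original column name, variable).
wide = df.pivot(index="Style", columns="variable")

# A top-level key selects one whole block as an ordinary DataFrame.
value2_block = wide["value2"]

# A (top, bottom) tuple selects a single column as a Series.
value_a = wide[("value", "A")]

print(value2_block)
print(value_a)
```

Selecting by the top-level key is usually the simplest way to get back to the single-level layout that passing `values=` would have produced.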

The pivot method is equivalent to creating a hierarchical index with set_index and then calling unstack.

>>> df.set_index(["Style", "variable"])
                value  value2
Style variable
one   A             1      10
two   A             2      11
three A             3      12
one   B             4      13
two   B             5      14
three B             6      15
one   C             7      16
two   C             8      17
three C             9      18

>>> df.set_index(["Style", "variable"]).unstack()
         value       value2
variable     A  B  C      A   B   C
Style
one          1  4  7     10  13  16
three        3  6  9     12  15  18
two          2  5  8     11  14  17
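
The equivalence can also be checked programmatically rather than by comparing printed output. This is a small sketch under the assumption that the same example data is used; `via_pivot`, `via_unstack`, and `same` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Style": ["one", "two", "three"] * 3,
    "variable": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "value2": [10, 11, 12, 13, 14, 15, 16, 17, 18],
})

# Route 1: pivot without `values`.
via_pivot = df.pivot(index="Style", columns="variable")

# Route 2: build a two-level row index, then move the
# "variable" level into the columns.
via_unstack = df.set_index(["Style", "variable"]).unstack("variable")

# DataFrame.equals compares labels, values, and dtypes.
same = via_pivot.equals(via_unstack)
print(same)
```

Naming the level in `unstack("variable")` is optional here, since `unstack()` moves the innermost level by default, but it makes the intent explicit.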

If the combinations of the specified index and columns values contain duplicates, pivot raises an error.

# df is the original three-column DataFrame again (without value2)
>>> df
   Style variable  value
0    one        A      1
1    two        A      2
2  three        A      3
3    one        B      4
4    two        B      5
5  three        B      6
6    one        C      7
7    two        C      8
8  three        C      9

# Change the value in the second row, first column
>>> df.iloc[1, 0] = "one"
>>> df
   Style variable  value
0    one        A      1
1    one        A      2
2  three        A      3
3    one        B      4
4    two        B      5
5  three        B      6
6    one        C      7
7    two        C      8
8  three        C      9
>>> df.pivot(index="Style", columns="variable")
ValueError: Index contains duplicate entries, cannot reshape
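
When this error appears, it helps to first locate the offending rows. The sketch below goes slightly beyond the original text: it uses `DataFrame.duplicated` to find the repeated (Style, variable) pairs, and then shows `pivot_table` as one possible workaround, since it aggregates duplicates instead of raising. The choice of `aggfunc="sum"` is an arbitrary assumption for illustration; the right aggregation depends on the data.

```python
import pandas as pd

# Long-format data in which ("one", "A") appears twice.
df = pd.DataFrame({
    "Style": ["one", "one", "three", "one", "two", "three", "one", "two", "three"],
    "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# pivot refuses to reshape when an (index, columns) pair repeats.
pivot_failed = False
try:
    df.pivot(index="Style", columns="variable", values="value")
except ValueError as exc:
    pivot_failed = True
    print(exc)

# Show every row that belongs to a repeated (Style, variable) pair.
dupes = df[df.duplicated(subset=["Style", "variable"], keep=False)]
print(dupes)

# pivot_table combines the duplicates with an aggregation function.
# Missing combinations, such as ("two", "A"), become NaN.
table = df.pivot_table(index="Style", columns="variable",
                       values="value", aggfunc="sum")
print(table)
```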

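2. Converting "wide" to "long" format:

For a DataFrame, the inverse operation of pivot is the pandas.melt function. It merges multiple columns into one, producing a new DataFrame that is longer than the input.

When using melt, you need to indicate which columns (if any) are group indicators.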

>>> df = pd.DataFrame({"Style": ["one", "two", "three"], "A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
>>> df
   Style  A  B  C
0    one  1  4  7
1    two  2  5  8
2  three  3  6  9

# Use the Style column as the group indicator
>>> df_melted = pd.melt(df, id_vars=["Style"])
>>> df_melted
   Style variable  value
0    one        A      1
1    two        A      2
2  three        A      3
3    one        B      4
4    two        B      5
5  three        B      6
6    one        C      7
7    two        C      8
8  three        C      9
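
The default output column names are `variable` and `value`. As a small addition to the original example, `pd.melt` also accepts `var_name` and `value_name` to rename them; the names `"letter"` and `"score"` below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Style": ["one", "two", "three"],
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
})

# var_name labels the column holding the former column headers,
# value_name labels the column holding the cell values.
melted = pd.melt(df, id_vars=["Style"], var_name="letter", value_name="score")

print(melted.columns.tolist())
print(melted.head())
```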

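Using pivot, the data can be reshaped back to its original layout.

Because pivot turns the column used for row labels into the index, call reset_index afterwards to move that data back into an ordinary column.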

>>> df_re = df_melted.pivot(index="Style", columns="variable", values="value")
>>> df_re
variable  A  B  C
Style
one       1  4  7
three     3  6  9
two       2  5  8

>>> df_re.reset_index()
variable  Style  A  B  C
0           one  1  4  7
1         three  3  6  9
2           two  2  5  8
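
Note that the round trip is not byte-for-byte identical to the starting DataFrame: the rows come back sorted by Style, and the column axis keeps the name `variable`. The following sketch, with illustrative names `restored` and `expected`, shows one way to account for both differences before comparing.

```python
import pandas as pd

df = pd.DataFrame({
    "Style": ["one", "two", "three"],
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
})

# wide -> long -> wide
melted = pd.melt(df, id_vars=["Style"])
restored = (
    melted.pivot(index="Style", columns="variable", values="value")
    .reset_index()
)

# reset_index leaves the column axis named "variable"; clear it.
restored.columns.name = None

# pivot sorts the row labels, so sort the original the same way.
expected = df.sort_values("Style").reset_index(drop=True)

round_trip_ok = restored.to_dict("list") == expected.to_dict("list")
print(restored)
print(round_trip_ok)
```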

You can also specify a subset of the columns to use as value columns.

>>> df
   Style  A  B  C
0    one  1  4  7
1    two  2  5  8
2  three  3  6  9

>>> pd.melt(df, id_vars=["Style"], value_vars=["A", "B"])
   Style variable  value
0    one        A      1
1    two        A      2
2  three        A      3
3    one        B      4
4    two        B      5
5  three        B      6

>>> pd.melt(df, id_vars=["Style"], value_vars=["A"])
   Style variable  value
0    one        A      1
1    two        A      2
2  three        A      3

pandas.melt can also be used without any group indicators.

>>> df
   Style  A  B  C
0    one  1  4  7
1    two  2  5  8
2  three  3  6  9

>>> pd.melt(df, value_vars=["A", "B"])
  variable  value
0        A      1
1        A      2
2        A      3
3        B      4
4        B      5
5        B      6

>>> pd.melt(df, value_vars=["Style", "A", "B"])
  variable  value
0    Style    one
1    Style    two
2    Style  three
3        A      1
4        A      2
5        A      3
6        B      4
7        B      5
8        B      6
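
Without group indicators, the melted rows no longer record which original row they came from. One option, not covered in the original text and assuming pandas 1.1 or later, is to pass `ignore_index=False` so that the original row labels are kept and repeated. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Style": ["one", "two", "three"],
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
})

# By default melt discards the original index and renumbers from 0.
# With ignore_index=False the original row labels are repeated,
# once per melted column.
melted = pd.melt(df, value_vars=["A", "B"], ignore_index=False)

print(melted)
print(melted.index.tolist())
```

Each index label then points back to the source row, which can be used to join the melted values to other columns of the original DataFrame.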

This article originally appeared on the WeChat official account: 码农设计师
