Working with Python Pandas and XlsxWriter

Python Pandas是一个Python数据分析库。它可以读取,过滤和重新排列大小数据集,并以包括Excel在内的多种格式输出它们。

Pandas使用Xlwt模块(用于xls文件)和Openpyxl或XlsxWriter模块(用于xlsx文件)写入Excel 文件。

将XlsxWriter与Pandas一起使用

要将XlsxWriter与Pandas一起使用,请将其指定为Excel writer 引擎

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.save()

其输出如下所示:

_images / pandas_simple.png

请参阅示例:Pandas Excel示例中的完整示例

从熊猫访问XlsxWriter

为了将图表,条件格式设置和列格式设置等XlsxWriter功能应用于Pandas输出,我们需要访问基础 工作簿工作表对象。之后,我们可以将它们视为普通的XlsxWriter对象。

从上面的示例继续,我们执行以下操作:

import pandas as pd

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

单独使用XlsxWriter时,这等效于以下代码:

workbook  = xlsxwriter.Workbook('filename.xlsx')
worksheet = workbook.add_worksheet()

然后可以使用Workbook和Worksheet对象访问其他XlsxWriter功能,请参见下文。

将图表添加到数据框输出

如上一节所示,一旦有了Workbook和Worksheet对象,我们就可以使用它们来应用其他功能,例如添加图表:

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Create a chart object.
chart = workbook.add_chart({'type': 'column'})

# Configure the series of the chart from the dataframe data.
chart.add_series({'values': '=Sheet1!$B$2:$B$8'})

# Insert the chart into the worksheet.
worksheet.insert_chart('D2', chart)

输出如下所示:

_images / pandas_chart.png

请参阅示例:带有图表的Pandas Excel输出中的完整示例

向数据框输出添加条件格式

另一种选择是应用这样的条件格式:

# Apply a conditional format to the cell range.
worksheet.conditional_format('B2:B8', {'type': '3_color_scale'})

这将给:

_images / pandas_conditional.png

请参阅示例:条件格式的Pandas Excel输出中的完整示例

数据框输出的格式

XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.

If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.

For example it is possible to set the default date and datetime formats via the Pandas interface:

writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        datetime_format='mmm d yyyy hh:mm:ss',
                        date_format='mmmm dd yyyy')

Which would give:

_images / pandas_datetime.png

See the full example at Example: Pandas Excel output with datetimes.

It is possible to format any other, non date/datetime column data using set_column():

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)
_images / pandas_column_formats.png

Note: This feature requires Pandas >= 0.16.

See the full example at Example: Pandas Excel output with column formatting.

Formatting of the Dataframe headers

Pandas writes the dataframe header with a default cell format. Since it is a cell format it cannot be overridden using set_row(). If you wish to use your own format for the headings then the best approach is to turn off the automatic header from Pandas and write your own. For example:

# Turn off the default header and skip one row to allow us to insert a
# user defined header.
df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False)

# Get the xlsxwriter workbook and worksheet objects.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add a header format.
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1})

# Write the column headers with the defined format.
for col_num, value in enumerate(df.columns.values):
    worksheet.write(0, col_num + 1, value, header_format)
_images / pandas_header_format.png

See the full example at Example: Pandas Excel output with user defined header format.

Handling multiple Pandas Dataframes

It is possible to write more than one dataframe to a worksheet or to several worksheets. For example to write multiple dataframes to multiple worksheets:

# Write each dataframe to a different worksheet.
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')

See the full example at Example: Pandas Excel with multiple dataframes.

It is also possible to position multiple dataframes within the same worksheet:

# Position the dataframes in the worksheet.
df1.to_excel(writer, sheet_name='Sheet1')  # Default position, cell A1.
df2.to_excel(writer, sheet_name='Sheet1', startcol=3)
df3.to_excel(writer, sheet_name='Sheet1', startrow=6)

# Write the dataframe without the header and index.
df4.to_excel(writer, sheet_name='Sheet1',
             startrow=7, startcol=4, header=False, index=False)
_images / pandas_positioning.png

See the full example at Example: Pandas Excel dataframe positioning.

Passing XlsxWriter constructor options to Pandas

XlsxWriter supports several Workbook() constructor options such as strings_to_urls(). These can also be applied to the Workbook object created by Pandas as follows:

writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        options={'strings_to_urls': False})

Saving the Dataframe output to a string

It is also possible to write the Pandas XlsxWriter DataFrame output to a byte array:

import pandas as pd
import io

# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})

output = io.BytesIO()

# Use the BytesIO object as the filehandle.
writer = pd.ExcelWriter(output, engine='xlsxwriter')

# Write the data frame to the BytesIO object.
df.to_excel(writer, sheet_name='Sheet1')

writer.save()
xlsx_data = output.getvalue()

# Do something with the data...

Note: This feature requires Pandas >= 0.17.

©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容