Python Pandas是一个Python数据分析库。它可以读取,过滤和重新排列大小数据集,并以包括Excel在内的多种格式输出它们。
Pandas使用Xlwt模块(用于xls文件)和Openpyxl或XlsxWriter模块(用于xlsx文件)写入Excel 文件。
将XlsxWriter与Pandas一起使用
要将XlsxWriter与Pandas一起使用,请将其指定为Excel writer 引擎:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Close the Pandas Excel writer and output the Excel file. writer.save()
其输出如下所示:
请参阅示例:Pandas Excel示例中的完整示例。
从熊猫访问XlsxWriter
为了将图表,条件格式设置和列格式设置等XlsxWriter功能应用于Pandas输出,我们需要访问基础 工作簿和工作表对象。之后,我们可以将它们视为普通的XlsxWriter对象。
从上面的示例继续,我们执行以下操作:
import pandas as pd # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) # Create a Pandas Excel writer using XlsxWriter as the engine. writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter') # Convert the dataframe to an XlsxWriter Excel object. df.to_excel(writer, sheet_name='Sheet1') # Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1']
单独使用XlsxWriter时,这等效于以下代码:
workbook = xlsxwriter.Workbook('filename.xlsx') worksheet = workbook.add_worksheet()
然后可以使用Workbook和Worksheet对象访问其他XlsxWriter功能,请参见下文。
将图表添加到数据框输出
如上一节所示,一旦有了Workbook和Worksheet对象,我们就可以使用它们来应用其他功能,例如添加图表:
# Get the xlsxwriter objects from the dataframe writer object. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Create a chart object. chart = workbook.add_chart({'type': 'column'}) # Configure the series of the chart from the dataframe data. chart.add_series({'values': '=Sheet1!$B$2:$B$8'}) # Insert the chart into the worksheet. worksheet.insert_chart('D2', chart)
输出如下所示:
请参阅示例:带有图表的Pandas Excel输出中的完整示例。
向数据框输出添加条件格式
另一种选择是应用这样的条件格式:
# Apply a conditional format to the cell range. worksheet.conditional_format('B2:B8', {'type': '3_color_scale'})
这将给:
请参阅示例:条件格式的Pandas Excel输出中的完整示例。
数据框输出的格式
XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.
For example it is possible to set the default date and datetime formats via the Pandas interface:
writer = pd.ExcelWriter("pandas_datetime.xlsx", engine='xlsxwriter', datetime_format='mmm d yyyy hh:mm:ss', date_format='mmmm dd yyyy')
Which would give:
See the full example at Example: Pandas Excel output with datetimes.
It is possible to format any other, non date/datetime column data using
set_column():
# Add some cell formats. format1 = workbook.add_format({'num_format': '#,##0.00'}) format2 = workbook.add_format({'num_format': '0%'}) # Set the column width and format. worksheet.set_column('B:B', 18, format1) # Set the format but not the column width. worksheet.set_column('C:C', None, format2)
Note: This feature requires Pandas >= 0.16.
See the full example at Example: Pandas Excel output with column formatting.
Formatting of the Dataframe headers
Pandas writes the dataframe header with a default cell format. Since it is a
cell format it cannot be overridden using set_row(). If you wish to
use your own format for the headings then the best approach is to turn off the
automatic header from Pandas and write your own. For example:
# Turn off the default header and skip one row to allow us to insert a # user defined header. df.to_excel(writer, sheet_name='Sheet1', startrow=1, header=False) # Get the xlsxwriter workbook and worksheet objects. workbook = writer.book worksheet = writer.sheets['Sheet1'] # Add a header format. header_format = workbook.add_format({ 'bold': True, 'text_wrap': True, 'valign': 'top', 'fg_color': '#D7E4BC', 'border': 1}) # Write the column headers with the defined format. for col_num, value in enumerate(df.columns.values): worksheet.write(0, col_num + 1, value, header_format)
See the full example at Example: Pandas Excel output with user defined header format.
Handling multiple Pandas Dataframes
It is possible to write more than one dataframe to a worksheet or to several worksheets. For example to write multiple dataframes to multiple worksheets:
# Write each dataframe to a different worksheet. df1.to_excel(writer, sheet_name='Sheet1') df2.to_excel(writer, sheet_name='Sheet2') df3.to_excel(writer, sheet_name='Sheet3')
See the full example at Example: Pandas Excel with multiple dataframes.
It is also possible to position multiple dataframes within the same worksheet:
# Position the dataframes in the worksheet. df1.to_excel(writer, sheet_name='Sheet1') # Default position, cell A1. df2.to_excel(writer, sheet_name='Sheet1', startcol=3) df3.to_excel(writer, sheet_name='Sheet1', startrow=6) # Write the dataframe without the header and index. df4.to_excel(writer, sheet_name='Sheet1', startrow=7, startcol=4, header=False, index=False)
See the full example at Example: Pandas Excel dataframe positioning.
Passing XlsxWriter constructor options to Pandas
XlsxWriter supports several Workbook() constructor options such as
strings_to_urls(). These can also be applied to the Workbook object
created by Pandas as follows:
writer = pd.ExcelWriter('pandas_example.xlsx', engine='xlsxwriter', options={'strings_to_urls': False})
Saving the Dataframe output to a string
It is also possible to write the Pandas XlsxWriter DataFrame output to a byte array:
import pandas as pd import io # Create a Pandas dataframe from the data. df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]}) output = io.BytesIO() # Use the BytesIO object as the filehandle. writer = pd.ExcelWriter(output, engine='xlsxwriter') # Write the data frame to the BytesIO object. df.to_excel(writer, sheet_name='Sheet1') writer.save() xlsx_data = output.getvalue() # Do something with the data...
Note: This feature requires Pandas >= 0.17.