描述:
xlrd模块的功能主要负责对Excel文件进行读取操作,xlrd官网将其描述为:xlrd is a library for reading data and formatting information from Excel files in the historical .xls format.【注意historical】
注意:
该模块只可以读取xls格式的Excel文件,xls格式与xlms格式的区别简单来讲主要在于Excel对宏的处理上,xlrd官网将其描述为:This library will no longer read anything other than .xls
files. For alternatives that read newer file formats, please see http://www.python-excel.org/.
警告:
截止2022年09月06日更新时,官方显示以下模块将不在继续支持,在安全性上可以继续保证,但在其可靠性上不在保证,xlrd官方将描述为:The following are also not supported but will safely and reliably be ignored:
- Charts, Macros, Pictures, any other embedded object, including embedded worksheets.
- VBA modules
- Formulas, but results of formula calculations are extracted.
- Comments
- Hyperlinks
- Autofilters, advanced filters, pivot tables, conditional formatting, data validation
Password-protected files are not supported and cannot be read by this library.
建议:
当使用Python在进行数据可视化处理时,将Python与Excel在数据可视化处理中各自应发挥的优势和功能做好定位和区分,减少试图通过Python来影响Excel内容或希望Python可以一键操作数据的要求,各自发挥各自的优势,这也是Python的精髓,Python官网将其描述为:Python is a programming language that lets you work quickly and integrate systems more effectively
官网地址:
https://xlrd.readthedocs.io/en/latest/
xlwt API:
https://xlrd.readthedocs.io/en/latest/api.html
- xlrd常用类有
1. book类
《该类用于链接工作簿,但该类不建议直接实例化使用,需要使用open_workbook_xls函数进行工作簿链接,官方将其描述为:Contents of a "workbook".warning::You should not instantiate this class yourself. You use the : class:Book
object that was returned when you called :func:~xlrd.open_workbook
.》
2. sheet类
《该类用于链接工作表,该类与book类一样均不建议直接实例化使用,可由book类提供的sheet_by_name()函数返回的sheet类实例进行操作,官方将其描述为:Contains the data for one worksheet. In the cell access functions, rowx
is a row index, counting from zero, and colx
is a column index, counting from zero. Negative values for row/column indexes and slice positions are supported in the expected fashion. For information about cell types and cell values, refer to the documentation of the :class:Cell
class.
.. warning:: You don't instantiate this class yourself. You access :class:Sheet
objects via the :class:~xlrd.book.Book
object that was returned when you called :func:xlrd.open_workbook
.》
-book类常用方法
1. xlrd.book.open_workbook_xls(*)
《该函数用来打开工作簿,官方描述为:Open a spreadsheet file for data extraction.该函数接受若干参数,其中用的最多的为filename参数和formatting_info参数,其中filename参数指定要打开的工作簿,formatting_info参数指定是否读取工作簿的样式》
- 部分源码如下
def open_workbook_xls(filename=None,
logfile=sys.stdout, verbosity=0, use_mmap=True,
file_contents=None,
encoding_override=None,
formatting_info=False, on_demand=False, ragged_rows=False,
ignore_workbook_corruption=False):
:param filename:
The path to the spreadsheet file to be opened.
:param logfile:
An open file to which messages and diagnostics are written.
:param verbosity:
Increases the volume of trace material written to the logfile.
:param use_mmap:
Whether to use the mmap module is determined heuristically.Use this arg to override
the result.Current heuristic: mmap is used if it exists.
:param file_contents:
A string or an :class:`mmap.mmap` object or some other behave-alike object. If
``file_contents`` is supplied, ``filename`` will not be used, except (possibly)
in messages.
:param encoding_override:
Used to overcome missing or bad codepage information in older-version files. See :
doc:`unicode`.
:param formatting_info:
The default is ``False``, which saves memory. In this case, "Blank" cells, which
are those with their own formatting information but no data, are treated as empty
by ignoring the file's ``BLANK`` and ``MULBLANK`` records. This cuts off any bottom
or right "margin" of rows of empty or blank cells. Only :meth:`~xlrd.sheet.Sheet.cell_value`
and :meth:`~xlrd.sheet.Sheet.cell_type` are available.
When ``True``, formatting information will be read from the spreadsheet file. This
provides all cells, including empty and blank cells. Formatting information is available
for each cell.
Note that this will raise a NotImplementedError when used with an xlsx file.
:param on_demand:
Governs whether sheets are all loaded initially or when demanded by the caller. See :doc:
`on_demand`.
:param ragged_rows:
The default of ``False`` means all rows are padded out with empty cells so that all rows
have the same size as found in :attr:`~xlrd.sheet.Sheet.ncols`.
``True`` means that there are no empty cells at the ends of rows. This can result in
substantial memory savings if rows are of widely varying sizes. See also the :meth:
`~xlrd.sheet.Sheet.row_len` method.
:param ignore_workbook_corruption:
This option allows to read corrupted workbooks.
When ``False`` you may face CompDocError: Workbook corruption.
When ``True`` that exception will be ignored.
:returns:
An instance of the :class:`~xlrd.book.Book` class.
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
2.xlrd.book.nsheets
《该方法为book类属性,记录了该工作簿的工作表(sheet)数量》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
print(book.nsheets)
输出:1
3.xlrd.book.sheets()
《返回整个工作簿的所有工作表(sheet)并以数组形式返回》
- 部分源码如下
def sheets(self):
"""
:returns: A list of all sheets in the book.
All sheets not already loaded will be loaded.
"""
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
print(book.nsheets)
print(book.sheet_names())
print(book.sheets())
输出:1
输出:FirstSheet
输出:[Sheet 0:<FistSheet>]
4.xlrd.book.sheet_names()
《返回工作簿中所有工作表(sheet)的名称》
- 部分源码如下
def sheet_names(self):
"""
:returns:
A list of the names of all the worksheets in the workbook file.
This information is available even when no sheets have yet been
loaded.
"""
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
print(book.nsheets)
print(book.sheet_names())
输出:1
输出:FirstSheet
5.xlrd.book.sheet_loaded(sheet_name_or_index)
《返回参数指定的工作表是否装载完毕,若是返回TRUE,否则返回FALSE》
- 部分源码如下
def sheet_loaded(self, sheet_name_or_index):
"""
:param sheet_name_or_index: Name or index of sheet enquired upon
:returns: ``True`` if sheet is loaded, ``False`` otherwise.
.. versionadded:: 0.7.1
"""
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
print(book.nsheets)
print(book.sheet_names())
print(book.sheets())
print(book.sheet_loaded(0))
print(book.sheet_loaded(book.sheet_names()[0]))
print(book.sheet_loaded("FirstSheet"))
输出:1
输出:FirstSheet
输出:[Sheet 0:<FistSheet>]
输出:TRUE
输出:TRUE
输出:TRUE
6.xlrd.book.sheet_by_name(sheet_name)
7.xlrd.book.sheet_by_index(sheetx)
《以上两个函数都用来打开工作表,第一个函数使用工作表的名称打开,第二个函数使用工作表在工作簿中的索引打开》
- 部分源码如下
def sheet_by_name(self, sheet_name):
"""
:param sheet_name: Name of the sheet required.
:returns: A :class:`~xlrd.sheet.Sheet`.
"""
def sheet_by_index(self, sheetx):
"""
:param sheetx: Sheet index in ``range(nsheets)``
:returns: A :class:`~xlrd.sheet.Sheet`.
"""
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
openFirstBook2 = book.sheet_by_index(0)
print(openFirstBook1.name)
print(openFirstBook2.name)
输出:FirstSheet
输出:FirstSheet
8.xlrd.book.xf_list
《该方法是book类的属性,记录了该工作簿中工作表的样式集合,只有将open_workbook_xls()函数中,formatting_info设置为TRUE时,xf_list属性才会保存样式,官方描述为:A list of :class:~xlrd.formatting.XF
class instances, each corresponding to an XF
record. 返回集合的索引由具体工作表中的cell_xf_index()函数返回的单元格样式提供》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
print(book.xf_list[0])
第一个输出:<xlrd.formatting.XF object at 0x10bf2a550>
9. xlrd.book.font_list
《该方法同样是book类的属性,记录了该工作簿中单元格的字体样式集合,官方描述为:A list of :class:~xlrd.formatting.XF
class instances, each corresponding to an XF
record. 注意!返回集合的索引由单元格样式的字体索引提供,而不是由各工作表中cell_xf_index()函数指定的单元格样式提供》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
firstBook = book.sheet_by_name(book.sheet_names()[0])
cellStyle = firstBook.cell_xf_index(1,0)
xfStyle = book.xf_list[cellStyle]
#注意book.font_list的参数
xfFontStyle = book.font_list[xfStyle.font_index]
print(xfStyle.font_index)
print(xfFontStyle.name)
print(xfFontStyle.italic)
第一个输出:9
第二个输出:Arial
第三个输出:0
- sheet类常用方法
1. xlrd.sheet.nrows
2. xlrd.sheet.ncols
《以上两个方法为sheet类的属性,记录了该工作表的总行数和总列数》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.nrows,openFirstBook1.ncols)
输出:4 4
3. xlrd.sheet.cell_value(rowx, colx)
4. xlrd.sheet.cell_type(rowx, colx)
5. xlrd.sheet.cell_xf_index(rowx, colx)
6. xlrd.sheet.cell(rowx, colx)
《函数3返回参数指定的单元格内容,参数由行数和列数构成,函数4返回参数指定的单元格类型,函数5返回参数指定的单元格样式索引,该索引来自book.Book.xf_list数组保存的单元格样式,函数6返回参数指定的单元格类型、内容、样式索引》
- xlrd.sheet.cell_xf_index(rowx,colx)部分源码
def cell_xf_index(self, rowx, colx):
"""
XF index of the cell in the given row and column.
This is an index into :attr:`~xlrd.book.Book.xf_list`.
.. versionadded:: 0.6.1
"""
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.cell_value(0,0))
print(openFirstBook1.cell_type(0,0))
print(openFirstBook1.cell_xf_index(0,0))
print(openFirstBook1.cell(0,0))
第一个输出:林州欢迎你
第二个输出:1
第三个输出:17
第四个输出:text:'林州欢迎你' (XF:17)
其中type项输出为1,1代表了单元格的数据类型,详细列表如下
0:empty
1:text
2:number
3:date
4:boolean
5:error
6. xlrd.sheet.row_len(rowx)
7. xlrd.sheet.row(rowx)
8. xlrd.sheet.get_rows()
《函数6返回参数指定的行所包括的单元格数量,实际上以最大单元格数量所在行的数量为准,函数7返回参数指定的行的内容,信息构成与xrld.sheet.cell()函数返回一致,函数8返回所有行的内容,以迭代器体现》
- 部分源码如下
def row_len(self, rowx):
"""
Returns the effective number of cells in the given row. For use with
``open_workbook(ragged_rows=True)`` which is likely to produce rows
with fewer than :attr:`~Sheet.ncols` cells.
.. versionadded:: 0.7.2
"""
def row(self, rowx):
"""
Returns a sequence of the :class:`Cell` objects in the given row.
"""
def get_rows(self):
"Returns a generator for iterating through each row."
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.row_len(16))
print(openFirstBook1.row(0))
print(openFirstBook1.get_rows())
for key in openFirstBook1.get_rows():
print(key)
第一个输出:16【16实际上是第2行最大的单元格数量,代码上表示为1】
第二个输出:[text:'林州欢迎你' (XF:57), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15)]
第三个输出:<generator object Sheet.get_rows.<locals>.<genexpr> at 0x100f903d0>
第四个输出:每行的内容,类型同第二个输出
9. xlrd.sheet.row_types(rowx, start_colx=0, end_colx=None)
10. xlrd.sheet.row_values(rowx, start_colx=0, end_colx=None)
11. xlrd.sheet.col_types(colx, start_rowx=0, end_rowx=None)
12. xlrd.sheet.col_values(colx, start_rowx=0, end_rowx=None)
《函数9返回参数指定的行和限定列数的类型,函数10返回参数指定的行和限定列数的内容,函数11返回指定的列和限定的行数的类型,函数12返回指定的列和限定的行数的内容》
- 注意
其中函数9、10、11、12中用于限定列数或行数的参数,其本质上是指定取列数或行数的多少,而不是指定其范围,其部分源码实现如下
def row_types(self, rowx, start_colx=0, end_colx=None):
"""
Returns a slice of the types of the cells in the given row.
"""
if end_colx is None:
return self._cell_types[rowx][start_colx:]
return self._cell_types[rowx][start_colx:end_colx]
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.row_types(2,2,3)) #注意这里的2,2,3,代表从第2行开始(实际上是第三行),从2列开始取1列(3-2),而不是从第2列开始取到第3列取2列
print(openFirstBook1.row_values(2,2,3))
print(openFirstBook1.col_types(0,0,17))
print(openFirstBook1.col_values(0,0,17))
第一个输出:array('B', [1])
第二个输出:['红旗渠欢迎你']
第三个输出:[1, 2, 1, 0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 1]
第四个输出:['林州欢迎你', 12345.0, '阿斯达', '', '', '', '', '', 123.0, '', '', ' 阿斯达', '', '', '', '', ' sad']
13. xlrd.sheet.row_slice(rowx, start_colx=0, end_colx=None)
14. xlrd.sheet.col_slice(olx, start_rowx=0, end_rowx=None)
《函数13返回参数指定的行,限定的列的单元格类型、内容、样式索引,与xlrd_sheet_cell()函数返回值的类型完全相同,函数14则返回指定的列,限定的行的数据,数据构成同上》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.row_slice(1,1,4))
print(openFirstBook1.col_slice(1,1,6))
第一个输出:[text:'太行大峡谷欢迎你' (XF:57), empty:'' (XF:59), number:345.0 (XF:59)]
第二个输出:[text:'太行大峡谷欢迎你' (XF:57), empty:'' (XF:15), empty:'' (XF:15), empty:'' (XF:15), text:'阿斯达' (XF:60)]
15. xlrd.sheet.hyperlink_list
16. xlrd.sheet.hyperlink_map
《方法15是sheet类的属性,记录了工作表中所有的超链接对象,方法16同样也是sheet类的属性,记录了工作表中所有超链接对象及该对象的位置,该位置是代码表示位置而非视觉位置,如第3列,代码位置显示为第4列(起步0),通过这两个方法获得的超链接对象均可通过Hyperlink类属性来进行访问,该类记录了关于超链接的部分内容,官方描述为:Contains the attributes of a hyperlink. Hyperlink objects are accessible through :attr:Sheet.hyperlink_list
and :attr:Sheet.hyperlink_map
.》
- Hyperlink类部分源码如下
class Hyperlink(BaseObject):
#: Index of first row
frowx = None
#: Index of last row
lrowx = None
#: Index of first column
fcolx = None
#: Index of last column
lcolx = None
#: Type of hyperlink. Unicode string, one of 'url', 'unc',
#: 'local file', 'workbook', 'unknown'
type = None
#: The URL or file-path, depending in the type. Unicode string, except
#: in the rare case of a local but non-existent file with non-ASCII
#: characters in the name, in which case only the "8.3" filename is
#: available, as a :class:`bytes` (3.x) or :class:`str` (2.x) string,
#: *with unknown encoding.*
url_or_path = None
#: Description.
#: This is displayed in the cell,
#: and should be identical to the cell value. Unicode string, or ``None``.
#: It seems impossible NOT to have a description created by the Excel UI.
desc = None
#: Target frame. Unicode string.
#:
#: .. note::
#: No cases of this have been seen in the wild.
#: It seems impossible to create one in the Excel UI.
target = None
#: The piece after the "#" in
#: "http://docs.python.org/library#struct_module", or the ``Sheet1!A1:Z99``
#: part when type is "workbook".
textmark = None
g#: The text of the "quick tip" displayed when the cursor
#: hovers over the hyperlink.
quicktip = None
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.hyperlink_list)
print(openFirstBook1.hyperlink_map)
print(openFirstBook1.hyperlink_list[0].url_or_path,openFirstBook1.hyperlink_list[0].textmark,openFirstBook1.hyperlink_list[0].desc,openFirstBook1.hyperlink_list[0].target,openFirstBook1.hyperlink_list[0].quicktip)
第一个输出:[<xlrd.sheet.Hyperlink object at 0x1037e2350>]
第二个输出:{(12, 1): <xlrd.sheet.Hyperlink object at 0x1037e2350>}
第三个输出:https://max.book118.com/html/2019/0109/8115017112002000.shtm None https://max.book118.com/html/2019/0109/8115017112002000.shtm None None
17.xlrd.sheet.merged_cells
《该方法为sheet类的属性,记录了合并单元格的范围,官方描述为:List of address ranges of cells which have been merged》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.merged_cells)
第一个输出:[(3, 4, 0, 2)]
该输出表达的内容同row_values()中第1、2个参数表达的意思相同
即表示为从第4行开始(视觉行:3+1),合并1行(4-3)
从第1列开始(视觉行:0+1),合并2列,(2-0)
18. xlrd.sheet.cell_xf_index(rowx, colx)
《该函数返回参数指定单元格的xf样式,官方描述为:XF index of the cell in the given row and column.This is an index into :attr:~xlrd.book.Book.xf_list
.》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
firstBook = book.sheet_by_name(book.sheet_names()[0])
cellStyle = firstBook.cell_xf_index(1,0)
xfStyle = book.xf_list[cellStyle]
print(xfStyle)
print(xfStyle.protection.cell_locked)
print(xfStyle.protection.formula_hidden)
第一个输出:<xlrd.formatting.XF object at 0x10de2a910>
第二个输出:1
第三个输出:0
19. xlrd.sheet.rowinfo_map
20. xlrd.sheet.colinfor_map
《方法19和方法20均为sheet类的属性,其方法19记录了行的相关信息,官方描述为:Height and default formatting information that applies to a row in a sheet. Derived from ROW
records. 方法20记录了列的相关信息,官方描述为:Width and default formatting information that applies to one or more columns in a sheet. Derived from COLINFO
records.》
- 部分源码如下
class Rowinfo(BaseObject):
if _USE_SLOTS:
__slots__ = (
"height",
"has_default_height",
"outline_level",
"outline_group_starts_ends",
"hidden",
"height_mismatch",
"has_default_xf_index",
"xf_index",
"additional_space_above",
"additional_space_below",
)
class Colinfo(BaseObject):
#: Width of the column in 1/256 of the width of the zero character,
#: using default font (first ``FONT`` record in the file).
width = 0
#: XF index to be used for formatting empty cells.
xf_index = -1
#: 1 = column is hidden
hidden = 0
#: Value of a 1-bit flag whose purpose is unknown
#: but is often seen set to 1
bit1_flag = 0
#: Outline level of the column, in ``range(7)``.
#: (0 = no outline)
outline_level = 0
#: 1 = column is collapsed
collapsed = 0
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
firstBook = book.sheet_by_name(book.sheet_names()[0])
print(firstBook.rowinfo_map[0].height,firstBook.rowinfo_map[0].hidden,firstBook.rowinfo_map[0].xf_index)
print(firstBook.colinfo_map[0].width,firstBook.colinfo_map[0].hidden,firstBook.colinfo_map[0].collapsed)
第一个输出:260 0 -1
第二个输出:7893 0 0
3. xldata类
《该类主要用于单元格时间数据的处理,官方描述为:Tools for working with dates and times in Excel files.The conversion from days
to (year, month, day)
starts with an integral "julian day number" aka JDN.
FWIW:
- JDN 0 corresponds to noon on Monday November 24 in Gregorian year -4713.
More importantly:
- Noon on Gregorian 1900-03-01 (day 61 in the 1900-based system) is JDN 2415080.0
-
Noon on Gregorian 1904-01-02 (day 1 in the 1904-based system) is JDN 2416482.0》
加入时间的工作簿
- xldata类常用方法
1. xlrd.xldata.xldate_as_tuple(xldate, datemode)
《该函数将参数指定的单元格时间数据转换为时间元组类型的数据,detemode指定要转换基准时间类型,0表示从1900-01-01号开始计算,1表示从1904-01-01号开始计算,官方描述为:Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.col_values(3,9)[0])
convertData1 = xlrd.xldate.xldate_as_tuple(openFirstBook1.col_values(3,9)[0],0)
convertData2 = xlrd.xldate.xldate_as_tuple(openFirstBook1.col_values(3,9)[0],1)
print(convertData1)
print(convertData2)
#元组数据操作
serialData = '%d-%d-%d'%(convertData1[0],convertData1[1],convertData1[2])
print(serialData)
第一个输出:44812.0
第二个输出:(2022, 9, 8, 0, 0, 0)
第三个输出:(2026, 9, 9, 0, 0, 0)
第四个输出:2022-9-8
2. xlrd.xldata.xldate_as_datetime(xldate, datemode)
《该函数将参数指定的单元格时间数据转换为时间类型的数据,detemode参数同xlrd.xldata.xldata_as_tuple()函数相同,官方描述为: Convert an Excel date/time number into a :class:datetime.datetime
object.》
例:
book = xlrd.open_workbook_xls("TheRedCanyon.xls",formatting_info=True)
openFirstBook1 = book.sheet_by_name(book.sheet_names()[0])
print(openFirstBook1.col_values(3,9)[0])
converData1 = xlrd.xldate.xldate_as_datetime(openFirstBook1.col_values(3,9)[0],1)
converData2 = xlrd.xldate.xldate_as_datetime(openFirstBook1.col_values(3,9)[0],0)
print(converData1)
print(converData2)
#时间对象处理
print(convertData2.strftime("%Y/%m/%d"))
第一个输出:2026-09-09 00:00:00
第二个输出:2022-09-08 00:00:00
第三个输出:2022/09/08