Python Libraries: Converting PDF to Images
Installation
1. Download poppler, then add its bin directory to the system PATH environment variable.
http://macappstore.org/poppler/
2. Run
pip install pdf2image
In mainland China, a PyPI mirror can speed up the download:
pip3 install -i https://pypi.douban.com/simple pdf2image
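Before running any conversion, it can be useful to confirm that the poppler command-line tools are actually reachable through PATH. The sketch below is only an illustration: the helper name find_poppler_tools is hypothetical, and it assumes that pdftoppm and pdftocairo are the executables pdf2image needs.

```python
import shutil


def find_poppler_tools(tools=("pdftoppm", "pdftocairo")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    # Print one line per tool so a missing PATH entry is easy to spot.
    for name, path in find_poppler_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, either fix the PATH entry from step 1 or pass the bin directory to pdf2image through the poppler_path parameter.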
Sample code
from pdf2image import convert_from_path, convert_from_bytes
pdf_pdf='D:\\pythondev\\dev\\abc.pdf'
outpath='D:\\pythondev\\dev\\output'
# Specify the image format explicitly; this can help avoid MemoryError on large PDFs
images = convert_from_path(pdf_pdf, fmt='jpeg')
# You can also set an output folder
images_from_path = convert_from_path(pdf_pdf, output_folder=outpath, fmt='png')
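The sample above returns a list of PIL Image objects, one per page. As a minimal sketch of what to do with them, the following saves each page under a predictable file name. The helpers page_filename and save_pages, and the naming pattern, are assumptions made for illustration and are not part of pdf2image.

```python
from pathlib import Path


def page_filename(stem, page_number, ext="jpg"):
    """Build a zero-padded file name such as 'abc_page_001.jpg'."""
    return f"{stem}_page_{page_number:03d}.{ext}"


def save_pages(pdf_path, out_dir, fmt="jpeg"):
    """Convert every page of pdf_path and save it into out_dir."""
    # Imported here so page_filename stays usable without pdf2image installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ext = "jpg" if fmt == "jpeg" else fmt
    saved = []
    for number, image in enumerate(convert_from_path(str(pdf_path), fmt=fmt), start=1):
        target = out_dir / page_filename(Path(pdf_path).stem, number, ext)
        image.save(target)  # Pillow infers the file format from the extension
        saved.append(target)
    return saved
```

Calling save_pages(pdf_pdf, outpath) with the variables from the sample would write one JPEG per page into the output directory.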
- paths_only parameter returns image paths instead of Image objects, to avoid running out of memory when converting large files
- size parameter allows you to define the shape of the resulting images (-scale-to in pdftoppm CLI)
- size=400 will fit the image to a 400x400 box, preserving aspect ratio
- size=(400, None) will make the image 400 pixels wide, preserving aspect ratio
- size=(500, 500) will resize the image to 500x500 pixels, not preserving aspect ratio
- grayscale parameter allows you to convert images to grayscale (-gray in pdftoppm CLI)
- single_file parameter allows you to convert the first PDF page only, without adding digits at the end of the output_file
- poppler_path parameter allows you to specify poppler's installation path
- Fixed a bug where PNGs buffer with a non-terminating I-E-N-D sequence would throw an exception
- Fixed a bug that left open file descriptors when using convert_from_bytes() (Thank you @FabianUken)
- fmt='tiff' parameter allows you to create .tiff files (You need pdftocairo for this)
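The options described above can be combined in a single call. The sketch below is one possible arrangement, assuming an installed pdf2image version that supports paths_only; the helper names build_options and convert_to_paths are hypothetical and exist only for this example.

```python
import os


def build_options(output_folder, width=None, gray=False):
    """Collect keyword arguments for convert_from_path as a plain dict."""
    options = {
        "output_folder": output_folder,  # paths_only needs files written to disk
        "fmt": "png",
        "paths_only": True,              # return file paths, not Image objects
        "grayscale": gray,
    }
    if width is not None:
        # (width, None) fixes the width and preserves the aspect ratio.
        options["size"] = (width, None)
    return options


def convert_to_paths(pdf_path, output_folder, width=None, gray=False):
    """Convert a PDF and return the paths of the generated PNG files."""
    # Imported here so build_options can be used without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(output_folder, exist_ok=True)
    return convert_from_path(pdf_path, **build_options(output_folder, width, gray))
```

For example, convert_to_paths(pdf_pdf, outpath, width=400, gray=True) would be expected to produce 400-pixel-wide grayscale PNG files and return their paths.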
Parameter reference
convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None)
convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None)
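The first_page and last_page parameters in the signatures above make it possible to convert a large PDF a few pages at a time, which is another way to keep memory usage bounded. The following is a rough sketch under stated assumptions: page_ranges and convert_in_chunks are hypothetical helpers, the total page count is supplied by the caller, and handle_page is a caller-provided callback.

```python
def page_ranges(total_pages, chunk_size=10):
    """Yield 1-based, inclusive (first_page, last_page) pairs covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_in_chunks(pdf_path, total_pages, handle_page, chunk_size=10, dpi=200):
    """Convert chunk_size pages at a time and pass each page to handle_page."""
    # Imported here so page_ranges can be used without pdf2image installed.
    from pdf2image import convert_from_path

    for first, last in page_ranges(total_pages, chunk_size):
        images = convert_from_path(
            pdf_path, dpi=dpi, fmt="jpeg", first_page=first, last_page=last
        )
        for offset, image in enumerate(images):
            # Only the current chunk is held in memory at this point.
            handle_page(first + offset, image)
```

A callback such as lambda number, image: image.save(f"page_{number}.jpg") would write each page to disk as soon as its chunk has been converted.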
References
- https://pypi.org/project/pdf2image/
- https://stackoverflow.com/questions/56471728/how-to-solve-memoryerror-using-python-3-7-pdf2image-library