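PDFs can be merged with desktop software (some of it paid) or with online tools (not secure), but you can also use the powerful Python library PyPDF2.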
Installing PyPDF2
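Install it with pip or conda, e.g. pip install PyPDF2. Note that the code below uses PdfFileMerger, which was removed in PyPDF2 3.0 in favor of PdfMerger; on a 3.x release, either use PdfMerger or install an older version (pip install "PyPDF2<3").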
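To confirm the install worked and see where the package lives (handy for the source edits described later), a small helper like the one below can locate an importable package. This is a minimal sketch; package_dir is a hypothetical name, and it relies only on the standard library's importlib.util.find_spec.

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory a top-level module/package is loaded from, or None."""
    spec = importlib.util.find_spec(name)  # None if the name cannot be imported
    if spec is None or spec.origin is None:
        return None  # not installed, or a namespace package with no single origin
    # For a package, origin points at its __init__.py; dirname gives the package folder.
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints something like ...\site-packages\PyPDF2 once PyPDF2 is installed, else None.
    print(package_dir("PyPDF2"))
```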
Code
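# -*- coding: utf-8 -*-
from PyPDF2 import PdfFileMerger

# Paths of the PDFs to merge. I use local paths here; os (or a similar library)
# could also be used to merge every PDF in a folder.
paths = ['PDF1.pdf', 'PDF2.pdf']

file_merger = PdfFileMerger()
for pdf in paths:
    file_merger.append(pdf)  # append each file's pages in list order
file_merger.write("merge.pdf")  # write the merged document
file_merger.close()  # release the opened input files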
That's about as simple as it gets.
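The comment in the code mentions merging every PDF in a folder. One way to build the paths list for that is sketched below, assuming the PDFs sit directly in one folder and should be merged in file-name order; collect_pdfs is a hypothetical helper using only os.

```python
import os
from typing import List


def collect_pdfs(folder: str) -> List[str]:
    """Return paths of all .pdf files directly inside folder, sorted by file name."""
    names = sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")  # also matches .PDF
        and os.path.isfile(os.path.join(folder, name))  # skip sub-folders
    )
    return [os.path.join(folder, name) for name in names]
```

The result can replace the hard-coded list, e.g. paths = collect_pdfs(r"D:\my_pdfs") (a made-up folder). Sorting is plain string order, so "10.pdf" comes before "2.pdf"; zero-padded names such as "02.pdf" avoid surprises.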
Problem encountered
When running it, I got the following error:
'latin-1' codec can't encode characters in position 8-11: ordinal not in range(256)
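Searching online, I learned that the cause is Chinese characters: latin-1 can only represent code points 0-255, so Chinese text cannot be encoded with it.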
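The error can be reproduced in plain Python, without PyPDF2. The sketch below uses a hypothetical title string (8 ASCII letters followed by 4 Chinese characters, not the text from my actual PDF) and the made-up helper latin1_problem to show why latin-1 fails while UTF-8 does not.

```python
from typing import Optional, Tuple


def latin1_problem(text: str) -> Optional[Tuple[int, int, str]]:
    """Return None if text fits in latin-1, else (start, end, offending characters)."""
    try:
        text.encode("latin-1")  # latin-1 only covers code points 0-255
        return None
    except UnicodeEncodeError as exc:
        print(exc)  # the same kind of message PyPDF2 surfaced
        return exc.start, exc.end, text[exc.start:exc.end]


# Hypothetical bookmark title: 8 ASCII letters followed by 4 Chinese characters.
title = "Bookmark合并文档"
print(latin1_problem(title))
# 'latin-1' codec can't encode characters in position 8-11: ordinal not in range(256)
# (8, 12, '合并文档')
print(all(ord(ch) > 255 for ch in "合并文档"))  # True: every code point is above 255
print(len("合并文档".encode("utf-8")))  # 12: UTF-8 uses 3 bytes for each of these
```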
Solution
The fix is to modify PyPDF2's own source code. First find where PyPDF2 is installed (mine is D:\Anaconda\Lib\site-packages\PyPDF2), then edit the following two files:
- generic.py
Around lines 483-488, the original code is:
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")
Change it to:
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # added: if UTF-8 fails, try GBK, a common encoding for Chinese text
    try:
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
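The patched logic is a decode-with-fallback chain: UTF-8 first, then GBK, then the original warn-or-raise behavior. The sketch below isolates that idea so it can be tried without editing PyPDF2. decode_name is a hypothetical stand-alone helper, not PyPDF2 code, and it raises ValueError where PyPDF2 would raise its own PdfReadError.

```python
import warnings
from typing import Union


def decode_name(raw: bytes, strict: bool = False) -> Union[str, bytes]:
    """Try UTF-8, then GBK; otherwise warn (non-strict) or raise (strict)."""
    for encoding in ("utf-8", "gbk"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # fall through to the next encoding
    if strict:
        raise ValueError("Illegal character in Name Object")
    warnings.warn("Illegal character in Name Object")
    return raw  # non-strict: hand back the undecoded bytes


print(decode_name("中文".encode("gbk")))  # 中文 (UTF-8 fails, GBK succeeds)
```

One caveat of this design: GBK accepts many two-byte combinations, so bytes that are in some other encoding may "decode" into the wrong characters instead of raising.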
- utils.py
Around line 238 (in the b_ function), the original code is:
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r
Change it to:
try:
    r = s.encode('latin-1')
    if len(s) < 2:
        bc[s] = r
    return r
except Exception as e:
    # added: latin-1 failed (e.g. Chinese text), so fall back to UTF-8
    print(s)
    r = s.encode('utf-8')
    if len(s) < 2:
        bc[s] = r
    return r
Done.
Reference: https://www.codenong.com/cs105218309/
Off topic
My first published post!!!!!!
Hoping everything goes smoothly for me and Xiaoming in 2021!!!!!!