Alpine container
Aliyun mirror repositories used:
https://mirrors.aliyun.com/alpine/v3.16/main/
https://mirrors.aliyun.com/alpine/v3.16/community/
Docker version required on the host:
docker version 20.10.8
Install the base environment with apk:
apk add python3 python3-dev py3-pip gcc g++ make --allow-untrusted
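Python version installed: Python 3.10
Note: installing PyMuPDF is not recommended. It is very slow and error-prone, with many pitfalls that are hard to work around; pdf2image can be used instead.
Introduction to pdf2image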
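For orientation, here is a minimal sketch of how pdf2image is typically used: `convert_from_path` calls the Poppler command-line tools and returns one PIL image per page. The helper names `page_image_name` and `pdf_to_images`, the output naming scheme, and the `dpi` default are assumptions made for illustration, not part of the original setup.

```python
from pathlib import Path
from typing import List


def page_image_name(pdf_path: str, page_number: int, ext: str = "png") -> str:
    """Build an output name such as 'report_page_001.png' (hypothetical naming scheme)."""
    return f"{Path(pdf_path).stem}_page_{page_number:03d}.{ext}"


def pdf_to_images(pdf_path: str, out_dir: str = ".", dpi: int = 200) -> List[str]:
    """Render every page of a PDF to a PNG file and return the written paths."""
    # Imported lazily so this module still loads where pdf2image is not installed.
    from pdf2image import convert_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # convert_from_path needs Poppler's binaries (e.g. pdftoppm) on PATH.
    for number, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        target = out / page_image_name(pdf_path, number)
        image.save(target, "PNG")
        written.append(str(target))
    return written
```

Calling `pdf_to_images("./test.pdf", out_dir="./pages")` inside the container would write one PNG per page; `./test.pdf` is a placeholder file name.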
Without further ado, here is the Dockerfile:
FROM <base image address>
RUN echo 'https://mirrors.aliyun.com/alpine/v3.16/main/' > /etc/apk/repositories
RUN echo 'https://mirrors.aliyun.com/alpine/v3.16/community/' >> /etc/apk/repositories
RUN cat /etc/apk/repositories
RUN apk update --allow-untrusted
# Python 3 base environment
RUN apk add python3 python3-dev py3-pip gcc g++ make --allow-untrusted
# PDF to image conversion
RUN apk add poppler poppler-utils --allow-untrusted
# pdf2image only calls the Poppler binaries installed via apk, so no pip package for poppler-utils is needed
RUN pip3 install pdf2image
# PDF text extraction
RUN pip3 install PyPDF2
# Read and write .pptx files
RUN pip3 install python-pptx
# LibreOffice
RUN apk add libreoffice openjdk13-jre-headless freetype freetype-dev --allow-untrusted
RUN mkdir -p /usr/share/fonts
# Font file (Microsoft YaHei); without it, fonts fail to resolve
COPY docker/msyh.ttf /usr/share/fonts
RUN chmod 777 /usr/share/fonts
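Once the image is built, a quick check that the expected command-line tools are on PATH can catch a broken layer early. The following is only a sketch: the helper name `missing_tools` is hypothetical, and the tool list assumes that `pdftoppm` is provided by poppler-utils and `soffice` by the libreoffice package.

```python
import shutil
from typing import Iterable, List

# Tools the Dockerfile is expected to provide (assumed names, see the note above).
REQUIRED_TOOLS = ("python3", "pdftoppm", "soffice")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Running `print(missing_tools())` inside the container should print an empty list if every package installed correctly.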
LibreOffice command to convert pptx to pdf:
soffice --headless --convert-to pdf ./test.pptx --outdir ./
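The same conversion can be driven from Python with `subprocess`. This is a minimal sketch rather than a definitive wrapper: the function names `build_convert_command` and `pptx_to_pdf` and the 120-second timeout are assumptions, and the returned path relies on LibreOffice naming the output after the input file with a `.pdf` extension.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(src: str, out_dir: str = "./") -> List[str]:
    """Return the soffice argument list matching the command shown above."""
    return ["soffice", "--headless", "--convert-to", "pdf", src, "--outdir", out_dir]


def pptx_to_pdf(src: str, out_dir: str = "./", timeout: int = 120) -> Path:
    """Convert one .pptx file to PDF and return the expected output path."""
    # check=True raises CalledProcessError if soffice exits with a non-zero status.
    subprocess.run(build_convert_command(src, out_dir), check=True, timeout=timeout)
    # Assumed output name: input file stem plus '.pdf' inside out_dir.
    return Path(out_dir) / (Path(src).stem + ".pdf")
```

For example, `pptx_to_pdf("./test.pptx")` would run the command above and return `test.pdf`; it is worth verifying that the returned path actually exists before using it.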