Python convert doc to pdf linux

Converting docx to pdf using pure Python (on Linux, without LibreOffice)

I have run into a problem while developing a web application, part of which converts uploaded DOCX files to PDF files (after some processing). With python-docx and other methods, I don't need a Windows machine with Word installed, or even LibreOffice on Linux, for most of the processing (my web server is PythonAnywhere: Linux, but without LibreOffice and without sudo or apt install permissions). But the conversion to PDF seems to require one of them. From exploring questions here and elsewhere, this is what I have so far:

As you can see, one method requires comtypes, and the other requires LibreOffice as a subprocess. Is there any solution other than moving to a more sophisticated hosting server?

2 Answers

The PythonAnywhere help pages offer information on working with PDF files here: https://help.pythonanywhere.com/pages/PDF

Summary: PythonAnywhere has several Python packages for working with PDFs, and one of them may do what you want. However, shelling out to abiword seems easiest to me. The shell command abiword --to=pdf filetoconvert.docx converts the docx file to a PDF and creates a file named filetoconvert.pdf in the same directory as the docx. Note that this command prints an error message to standard error complaining about XDG_RUNTIME_DIR (or at least it did for me), but it still works, and the error message can be ignored.
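A minimal sketch of calling that abiword command from Python, assuming abiword is on the PATH. The function names are hypothetical, and the choice to discard standard error (because of the XDG_RUNTIME_DIR complaint) and to check for the output file instead of the exit status is an assumption, not something the answer specifies.

```python
import os
import subprocess


def abiword_command(docx_path):
    """Build the argument list for the abiword call quoted above."""
    return ["abiword", "--to=pdf", docx_path]


def expected_pdf_path(docx_path):
    """abiword writes the PDF next to the input, with the extension swapped."""
    return os.path.splitext(docx_path)[0] + ".pdf"


def convert_with_abiword(docx_path):
    """Run abiword and return the path of the PDF it should have produced."""
    # stderr is discarded because of the harmless XDG_RUNTIME_DIR message.
    subprocess.run(abiword_command(docx_path), stderr=subprocess.DEVNULL)
    pdf_path = expected_pdf_path(docx_path)
    # Success is judged by the output file, not by the exit status (assumption).
    if not os.path.exists(pdf_path):
        raise FileNotFoundError(pdf_path)
    return pdf_path
```

Calling convert_with_abiword("filetoconvert.docx") would then return "filetoconvert.pdf" if the conversion succeeded.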

You can also use LibreOffice; however, as the first answerer said, the quality will never be as good as with comtypes itself.

In any case, once you have installed LibreOffice, here is the code for it.
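A minimal sketch of the LibreOffice approach, assuming the binary is available as libreoffice (on some systems it is called soffice instead). The function names and the timeout value are hypothetical; the --headless, --convert-to and --outdir options are LibreOffice's standard command-line switches.

```python
import os
import subprocess


def libreoffice_command(docx_path, out_dir):
    """Build a headless LibreOffice conversion command."""
    return [
        "libreoffice",  # assumption: may be "soffice" on your system
        "--headless",
        "--convert-to", "pdf",
        "--outdir", out_dir,
        docx_path,
    ]


def convert_with_libreoffice(docx_path, out_dir, timeout=120):
    """Convert docx_path to PDF inside out_dir and return the PDF path."""
    subprocess.run(libreoffice_command(docx_path, out_dir), check=True, timeout=timeout)
    base = os.path.splitext(os.path.basename(docx_path))[0]
    return os.path.join(out_dir, base + ".pdf")
```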


Convert Docx to Pdf using docx2pdf Module in Python

Tired of online docx-to-PDF converters with clunky interfaces and conversion limits? Then look no further than Python's docx2pdf module, a hidden gem among the many modules available for the language.


This module can be used to convert files singly or in bulk using the command line or a python program.


Installation

This module does not come built in with Python. To install it, run pip install docx2pdf in the terminal.

Conversion using the command line

The basic structure of the docx2pdf command line usage is docx2pdf input [output], where the output path is optional.

If only the input file is specified, it generates a pdf from the docx and stores it in the same folder.

Example:

[Figures: docx2pdf usage on the command line; the GeeksforGeeks folder containing both the original GFG.docx and the converted GFG.pdf; the original GFG.docx on the left and GFG.pdf on the right]

For the bulk conversion, you can specify the folder containing all the Docx files. The converted pdfs will get stored in the same folder.

You can also explicitly specify the input and output file or folder by specifying the path.

Conversion by importing the module and using it in the program
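A short sketch of the library usage, assuming docx2pdf is installed and Microsoft Word is available (the package drives Word itself, so this will not run on a plain Linux server). The wrapper name convert_docx is hypothetical; convert is the function the package exports, and it accepts either a file or a folder as input.

```python
def convert_docx(input_path, output_path=None):
    """Convert one .docx file (or a folder of them) to PDF via docx2pdf."""
    # Imported lazily so this module can be loaded where docx2pdf is absent.
    from docx2pdf import convert

    if output_path is None:
        # Output lands next to the input, as with the command line tool.
        convert(input_path)
    else:
        convert(input_path, output_path)
```

For example, convert_docx("GFG.docx") writes GFG.pdf into the same folder, and convert_docx("my_docx_folder/") is the bulk form.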

An endless number of useful applications can be made using this module.


.doc to pdf using python

I'm tasked with converting tons of .doc files to .pdf, and the only way my supervisor wants me to do this is through MS Word 2010. I know I should be able to automate this with Python COM automation. The only problem is that I don't know how or where to start. I tried searching for tutorials but was not able to find any (maybe I did, but I don't know what I'm looking for).

Right now I'm reading through this. I don't know how useful it is going to be.

13 Answers

A simple example using comtypes, converting a single file, input and output filenames given as commandline arguments:
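A sketch of what such a comtypes script could look like, assuming Windows with Microsoft Word and the comtypes package installed. The function names are hypothetical; 17 is the value of Word's wdFormatPDF constant. The Word-specific work is kept in save_as_pdf so it can be exercised with any object exposing the same Documents.Open / SaveAs / Close calls.

```python
import os
import sys

WD_FORMAT_PDF = 17  # value of Word's wdFormatPDF save-format constant


def save_as_pdf(word, in_file, out_file):
    """Open in_file in a Word application object and save it as PDF."""
    # Word expects absolute paths, so resolve them first.
    doc = word.Documents.Open(os.path.abspath(in_file))
    try:
        doc.SaveAs(os.path.abspath(out_file), FileFormat=WD_FORMAT_PDF)
    finally:
        doc.Close()


def main(argv):
    """argv[1] is the input .doc/.docx file, argv[2] the output .pdf file."""
    import comtypes.client  # Windows only, hence the local import

    word = comtypes.client.CreateObject("Word.Application")
    try:
        save_as_pdf(word, argv[1], argv[2])
    finally:
        word.Quit()
```

In a script, this would be started with main(sys.argv).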

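You could also use pywin32, which would be the same except for the import (win32com.client instead of comtypes.client) and the way the application object is created (win32com.client.Dispatch('Word.Application')).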

You can use the docx2pdf python package to bulk convert docx to pdf. It can be used as both a CLI and a python library. It requires Microsoft Office to be installed and uses COM on Windows and AppleScript (JXA) on macOS.


Disclaimer: I wrote the docx2pdf package. https://github.com/AlJohri/docx2pdf

I have worked on this problem for half a day, so I think I should share some of my experience. Steven's answer is right, but it fails on my computer. There are two key points to fix it:

(1). The first time I create the 'Word.Application' object, I have to make it (the Word app) visible before opening any documents. (Actually, even I cannot explain why this works. If I do not do this on my computer, the program crashes when I try to open a document in invisible mode, and then the 'Word.Application' object is deleted by the OS.)

(2). After doing (1), the program works sometimes but may still fail often. The crash error "COMError: (-2147418111, 'Call was rejected by callee.', (None, None, None, 0, None))" means that the COM server may not be able to respond quickly enough. So I added a delay before trying to open a document.

After these two steps, the program works reliably with no more failures. The demo code is as below. If you have encountered the same problems, try following these two steps. Hope it helps.
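A sketch of the two fixes, assuming a Word application object created elsewhere (for example with comtypes.client.CreateObject('Word.Application') on Windows). The function names and the 3-second default delay are hypothetical; the answer does not say how long it waited.

```python
import os
import time

WD_FORMAT_PDF = 17  # value of Word's wdFormatPDF save-format constant


def open_with_delay(word, in_file, delay_seconds=3.0):
    """Apply fixes (1) and (2) before opening a document."""
    word.Visible = True        # fix (1): show Word before opening anything
    time.sleep(delay_seconds)  # fix (2): give the COM server time to respond
    return word.Documents.Open(os.path.abspath(in_file))


def convert_visible(word, in_file, out_file, delay_seconds=3.0):
    """Open in_file with both fixes applied and save it as PDF."""
    doc = open_with_delay(word, in_file, delay_seconds)
    try:
        doc.SaveAs(os.path.abspath(out_file), FileFormat=WD_FORMAT_PDF)
    finally:
        doc.Close()
```

The delay is a blunt workaround; if "Call was rejected by callee" still appears, a longer delay may be needed on slower machines.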


How to convert Word (doc) to PDF in linux?

I have a set of files in .doc format, that need to be converted to .pdf format. I am using Ubuntu linux.

10 Answers

Then navigate to System > Administration > Printing and create a new printer, set it as a PDF file printer, and name it "pdf".

Now you’ll find your .pdf file in

If the tetex-extra package is not available with your distribution, try texlive-base plus texlive-latex-base:

/PDF path to somewhere else ?

Printing to PDF loses a lot of the document metadata (title, authorship, the headings tree that is used for navigation, and so on).

Install unoconv, convert with: unoconv -fpdf file1.doc file2.doc…
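A small sketch of driving unoconv from Python, assuming unoconv is installed and on the PATH. The function names are hypothetical; -f pdf is the spaced form of the -fpdf option shown above.

```python
import subprocess


def unoconv_command(doc_paths):
    """Build one unoconv call that converts every listed file to PDF."""
    return ["unoconv", "-f", "pdf", *doc_paths]


def convert_with_unoconv(doc_paths):
    """Run unoconv; each PDF is written next to its source document."""
    subprocess.run(unoconv_command(doc_paths), check=True)
```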

If you're running X then you can do it through Open Office. Since you're about to object to doing it manually, remember there are some nice macro scripts in Open Office, so you can automate it. You can do something similar with AbiWord (abiword --to=pdf).


If you've not got X then there is antiword, but that just extracts the text; it doesn't do any formatting or graphics. There's also wvWare, which I've used to bulk extract images from doc files, but I've never tried using it to convert doc files to pdfs.

Oh and .docx files may well need something different, but since they’re just zipped xml files it shouldn’t be too difficult to do something useful with them. For bulk extracting images you just unzip them and copy the images directory, but I’ve never needed to convert them in Linux.
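The "unzip and copy the images directory" idea can be sketched with the standard library alone. This assumes the usual docx layout, where embedded images live under word/media/ inside the archive; the function name extract_images is hypothetical.

```python
import os
import zipfile

MEDIA_PREFIX = "word/media/"  # where a .docx archive keeps embedded images


def extract_images(docx_path, out_dir):
    """Copy every embedded image out of a .docx file into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip everything outside the media folder, and directory entries.
            if not name.startswith(MEDIA_PREFIX) or name.endswith("/"):
                continue
            target = os.path.join(out_dir, os.path.basename(name))
            with archive.open(name) as src, open(target, "wb") as dst:
                dst.write(src.read())
            extracted.append(target)
    return extracted
```

Looping this over a folder of .docx files gives the bulk extraction described above; it does not help with producing PDFs.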


docx2pdf 0.1.7

pip install docx2pdf

Released: Apr 28, 2020

Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).


License: MIT License (MIT)

Author: Al Johri

Requires: Python >=3.5

Classifiers

  • Environment
    • MacOS X
    • Win32 (MS Windows)
  • License
    • OSI Approved :: MIT License
  • Operating System
    • MacOS
    • Microsoft :: Windows
  • Programming Language
    • Python :: 3
    • Python :: 3.5
    • Python :: 3.6
    • Python :: 3.7
    • Python :: 3.8
  • Topic
    • Office/Business :: Office Suites
    • Software Development :: Libraries

Project description

docx2pdf

Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).

On Windows, this is implemented via win32com while on macOS this is implemented via JXA (Javascript for Automation, aka AppleScript in JS).

Install

Library

See CLI docs above (or in docx2pdf --help) for all the different invocations. It is the same for the CLI and python library.
