- Is there any GNU/Linux command line utility that converts .doc(x) files to .pdf? [closed]
- Software for Linux: transform XML using XSL to PDF
- Xml to pdf linux
- Free Online XML to PDF Converter
- Discover GroupDocs.Conversion free online app!
- Free Document Conversion, Viewer, Merger app for Windows
- XML (Extensible Markup Language)
- PDF (Portable Document Format)
- Generating documents in doc, Excel, PDF and other formats on the server
- Running the conversion from PHP
- A few pitfalls
Is there any GNU/Linux command line utility that converts .doc(x) files to .pdf? [closed]
Closed 4 years ago.
Surely I am the 100th user asking this, but after searching through similar topics here and on other websites I still cannot find what I need.
I would like a simple command-line tool for GNU/Linux that converts .doc(x) files to .pdf, BUT the output should look the same as the original.
LibreOffice doesn't seem like a good choice because it does not convert some documents well. I have found a website, freepdfconvert.com, which does the job very well, but I cannot upload sensitive files there because it is too big a risk. I'm not saying they would do anything bad with them, but that's how it is.
If I can't find a good tool, maybe I will have to write one myself.
Unfortunately, there are no Linux-based Word (doc/docx) to PDF converters that guarantee a 1-to-1 result. This is because Word, a Microsoft product, uses a proprietary format that changes slightly with every release. Since the format was traditionally not publicly documented, and Microsoft does not port Word/Office to Linux (and never will), you must rely on reverse-engineered third-party tools for the older format (doc) and on third-party developers interpreting the Office Open XML format (docx) correctly.
We found the best open source solution to be LibreOffice (forked from OpenOffice.org, which itself was called StarOffice before it was open-sourced). It is much more actively developed than AbiWord, which another answer suggested.
Command-line usage is simple and well documented, with plenty of examples:
On newer versions you can also use libreoffice instead of soffice.
There is also Pandoc.
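The following minimal Python sketch shows the same conversion driven from a script. It assumes LibreOffice is installed and that either `libreoffice` or `soffice` is on the PATH. It uses LibreOffice's real `--headless`, `--convert-to` and `--outdir` options. The helper names `find_office_binary`, `docx_to_pdf_command` and `convert` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def find_office_binary() -> Optional[str]:
    """Return the first LibreOffice launcher found on PATH, preferring `libreoffice`."""
    for name in ("libreoffice", "soffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


def docx_to_pdf_command(binary: str, source: str, out_dir: str) -> List[str]:
    """Build the argument list for a headless .doc/.docx -> PDF conversion."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", out_dir, source]


def convert(source: str, out_dir: str = ".") -> Path:
    """Run the conversion and return the path LibreOffice writes the PDF to."""
    binary = find_office_binary()
    if binary is None:
        raise FileNotFoundError("neither libreoffice nor soffice is on PATH")
    subprocess.run(docx_to_pdf_command(binary, source, out_dir), check=True)
    # LibreOffice keeps the input's base name and swaps the extension.
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

For example, `convert("report.docx", "out")` would write `out/report.pdf`. This does not change the layout-fidelity concerns raised in the question. It only automates the call.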
Pandoc is mainly known for processing Markdown (and outputting HTML, LaTeX, PDF, EPUB and what-not). In recent months it has also gained a fairly capable ability to read DOCX input files.
(NOTE: Pandoc only works for DOCX, not for DOC files.)
For its PDF output to work, it requires a working LaTeX installation (with any or all of pdflatex, lualatex and xelatex). In that case the following simple command should work:
Note, however, that the output layout and font styles will not look at all like what you would get by exporting the DOCX from Word to PDF. Pandoc will use the styles of a default LaTeX document.
You can influence the output style of the LaTeX-generated PDF by using a custom template file, but this is a feature for Pandoc/LaTeX experts rather than beginners.
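The following minimal sketch builds the Pandoc invocation from Python. It assumes Pandoc 2.0 or later, where the LaTeX engine is chosen with `--pdf-engine` (older releases used `--latex-engine`), plus a working LaTeX installation. `--template` points Pandoc at a custom LaTeX template. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def pandoc_pdf_command(source: str, output: str,
                       engine: str = "xelatex",
                       template: Optional[str] = None) -> List[str]:
    """Build a pandoc command converting a DOCX file to PDF via LaTeX."""
    if not source.lower().endswith(".docx"):
        # Pandoc reads DOCX but not the older binary DOC format.
        raise ValueError("pandoc only accepts .docx input, not .doc")
    cmd = ["pandoc", source, "-o", output, f"--pdf-engine={engine}"]
    if template is not None:
        # A custom LaTeX template changes the look of the generated PDF.
        cmd.append(f"--template={template}")
    return cmd


def run_pandoc(cmd: List[str]) -> None:
    """Execute a command built above, failing early if pandoc is missing."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not on PATH")
    subprocess.run(cmd, check=True)
```

The `.docx` check mirrors the note above that Pandoc does not handle DOC files. The engine can be any of the three LaTeX engines mentioned.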
Source
Software for Linux: transform XML using XSL to PDF
I have two files.
I want to create a PDF from these two files.
- Does anyone know a program on Linux that can do that?
- What's the command?
The first one is an XHTML/XML file:
and the second one is an .xsl file (combined XSLT + XSL-FO):
You'll need an XSLT processor; xsltproc is probably already in your Linux distribution. Then you'll need a processor to convert the resulting FO (Formatting Objects) to PDF. Apache has a free FO processor, FOP (see "Apache™ FOP: Downloading A Distribution").
Once you have FOP downloaded and extracted, your pipeline might look something like this:
I tried this with the XML source and XSLT you provided, and Apache FOP reported errors. I don't know enough about your XSLT to say why, but you might be able to work around them.
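The following sketch drives the two-step pipeline from Python. It assumes `xsltproc` is installed and that the `fop` launcher script from the extracted FOP distribution is on the PATH. The helper names are hypothetical. FOP can also apply the stylesheet itself (`fop -xml input.xml -xsl style.xsl -pdf out.pdf`), which avoids the intermediate .fo file.

```python
import shutil
import subprocess
from typing import List


def xslt_fo_pipeline(xml: str, xsl: str, fo: str, pdf: str) -> List[List[str]]:
    """Return the two commands: XSLT transform to XSL-FO, then FO to PDF."""
    return [
        # Step 1: apply the stylesheet; -o names the output file.
        ["xsltproc", "-o", fo, xsl, xml],
        # Step 2: Apache FOP renders the formatting objects to PDF.
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)
```

Keeping the steps separate makes it easier to inspect the intermediate .fo file when FOP reports errors.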
xsltproc can apply the stylesheet. Apache FOP can generate the PDF.
site design / logo © 2021 Stack Exchange Inc; user contributions licensed under cc by-sa. rev 2021.10.8.40416
Xml to pdf linux
What is PyXML2PDF?
PyXML2PDF is a pure Python module that generates PDF files from XML. It can be used from the command line or integrated into a Python application. PyXML2PDF can generate pixel-precise PDF documents in any page size. It can produce very complex pages while remaining easy to edit as an XML file.
PyXML2PDF wraps over the excellent Reportlab python module to generate PDFs. All PyXML2PDF does is to bring an XML semantic to the concepts used in Reportlab. Instead of having to code your PDFs using Reportlab modules, you simply write easily maintainable PyXML2PDF files which in turn generate PDF pages.
We have been using this module in healthcare applications for 3 years now, with great results. Performance is adequate, although it could certainly be optimised for greater speed.
Ease of use: Anyone who knows some basic HTML/XML concepts will be able to use PyXML2PDF within minutes.
Snapping elements: Elements can be positioned relative to one another to create intricate tables. Check the details of this feature.
Styles (CSS): PyXML2PDF supports a simple CSS implementation. This allows for flexibility and decoupling of content from appearance (as in HTML).
HTML: PyXML2PDF is NOT compatible with any XHTML/HTML/CSS. It uses a small set of tags to quickly allow generation of PDFs.
PyXML2PDF Tags: Please refer to the list of tags in the ‘Reference’ section (below)
Scripting/Templating/Dynamic content: PyXML2PDF does not include a templating language for dynamically generating XML files. But you can use any templating tool or other method to generate XML files which can then be fed to PyXML2PDF in order to generate a PDF. We have been successfully using Genshi as a templating engine.
Reporting: PyXML2PDF is not a reporting engine. Rather it could be part of the backend for an existing reporting engine to help generate PDFs.
Language support: PyXML2PDF supports UTF-8, but all language-specific handling should be done by whatever generates the XML. In our case, we use Genshi and Python language tools to generate our PyXML2PDF files in several languages.
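PyXML2PDF leaves XML generation to you. As a sketch of that step only, the following code builds input XML from data with Python's standard ElementTree instead of Genshi. ElementTree escapes special characters and writes UTF-8. The tag names (`document`, `title`, `table`, `row`, `cell`) are hypothetical placeholders, not PyXML2PDF's actual vocabulary. Substitute the tags from its reference.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_document(title: str, rows: List[Dict[str, str]]) -> bytes:
    """Build an XML document from data; all tag names are placeholders."""
    root = ET.Element("document")               # hypothetical tag
    ET.SubElement(root, "title").text = title   # hypothetical tag
    table = ET.SubElement(root, "table")        # hypothetical tag
    for row in rows:
        item = ET.SubElement(table, "row")      # hypothetical tag
        for key, value in row.items():
            # ElementTree escapes &, < and > in text automatically.
            ET.SubElement(item, "cell", name=key).text = value
    # Serialize as UTF-8 bytes with an XML declaration (Python 3.8+).
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

The resulting bytes can be written to a file and passed to PyXML2PDF in place of a hand-written XML file.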
Supported platforms:
- Windows (tested)
- Linux (tested)
- Mac should also work (untested)
- Anywhere you can get Python/Reportlab to work.
Want to help this project?
We will gladly accept your help if you have any ideas or suggestions to make this project faster or better. You can contact me if you wish to discuss an improvement you would like to make. I strongly urge you not to work on it in isolation: this project's goal is to evolve and improve over time, so please feel free to contribute patches and ideas.
Simply run the following command within the root directory of the project:
Command line:

    python xml2pdf.py -f input.xml out.pdf

In your Python code:

    from PyXML2PDF import xml2pdf
    xml2pdf.genpdf(in_xml_filename, out_pdf_filename)
Create an XML file named 'sample.xml', paste the following into it, and save it.
In the same directory as 'sample.xml', create a file named 'sample.py'. Put the following code in sample.py and save it.
    import os
    from PyXML2PDF import xml2pdf
    from reportlab.pdfbase.ttfonts import TTFont  # ReportLab TrueType font support

    xml = os.path.abspath('./sample.xml')
    pdf = os.path.abspath('./sample.pdf')
    xml2pdf.genpdf(xml, pdf)

    # Show the PDF (this line is optional)
From the command line, in the same directory as the 'sample.py' file, type the following (assuming Python is installed and on your PATH):
We strongly urge you to visit the links below to fully understand the possibilities of PyXML2PDF. They also explain in further detail topics such as:
- Fonts
- Colors
- Coordinate systems
- Positioning elements relative to one another
- And much more.
Free Online XML to PDF Converter
Convert XML to PDF documents online from any device, with a modern browser like Chrome, Opera and Firefox.
Discover GroupDocs.Conversion free online app!
- Convert PDF to WORD, DOCX to PDF, XLSX to PDF, PPTX to JPGs, VSDX to PDF, HTML to DOCX, EPUB to PDF, RTF to DOCX, XPS to PDF, ODT to DOCX, ODP to PPTX and many more document formats (see supported formats list)
- Simple way to instantly convert XML to PDF
- Save WORD to PDF, EXCEL to PDF, PDF to WORD, POWERPOINT to IMAGE and many more document formats (see supported formats list)
- Convert XML from anywhere — it works on all platforms including Windows, MacOS, Android and iOS
- All XML files are processed on our servers so no additional plugins or software installation is required
- All XML files are processed using GroupDocs.Conversion document conversion API
Free Document Conversion, Viewer, Merger app for Windows
- Easily convert, view or merge unlimited files on your own Windows PC.
- Process Word, Excel, PowerPoint, PDF and more than 100 file formats.
- No limit of file size.
- Batch conversion of multiple files.
- One app with rich features like Conversion, Viewer, Merger, Parser, Comparison, Signature
- Regular FREE updates with new features coming every month
XML (Extensible Markup Language)
XML stands for Extensible Markup Language. It is similar to HTML but differs in that its tags are used to define data objects. The whole idea behind the XML file format was to store and transport data independently of specific software or hardware. Its popularity comes from being both human- and machine-readable. This makes it suitable for common data protocols in which objects are stored and shared over networks such as the World Wide Web (WWW).
PDF (Portable Document Format)
Portable Document Format (PDF) is a document format created by Adobe in the 1990s. Its purpose was to introduce a standard for representing documents and other reference material independently of application software, hardware and operating system. PDF files can be opened in Adobe Acrobat Reader/Writer as well as in most modern browsers, such as Chrome, Safari and Firefox.
Generating documents in doc, Excel, PDF and other formats on the server
Exporting reports in various formats is a typical task for many projects, and there are plenty of tools for it today. Among them is an interesting option that, it seems to me, is not used often but definitely deserves attention, because it lets you get a document in the desired format with literally one command. That is what I will talk about.
I won't be wordy and will say right away that I mean the converter built into the LibreOffice package. You can run a conversion from the console to see how it works:
This command converts the file html.html into a PDF file. The number of supported formats is impressive.
The benefit of such a tool is obvious. Instead of writing code to generate documents in each of the required formats, we simply create an ordinary HTML view and then run the generated page through the converter.
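The following sketch illustrates the "just produce HTML" approach in Python (the author's own code is PHP). It renders report data into a simple HTML table and escapes every value so that characters such as `<` and `&` survive the conversion. The function name `render_report` is hypothetical. The resulting HTML file is what you would then pass to the converter.

```python
import html
from typing import List, Sequence


def render_report(title: str, header: Sequence[str], rows: List[Sequence[str]]) -> str:
    """Render a plain HTML table report; every value is HTML-escaped."""
    def cells(values: Sequence[str], tag: str) -> str:
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    body_rows = "\n".join(f"<tr>{cells(r, 'td')}</tr>" for r in rows)
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(title)}</title></head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        f"<table>\n<tr>{cells(header, 'th')}</tr>\n{body_rows}\n</table>\n"
        "</body></html>\n"
    )
```

Plain markup like this keeps the input simple. The author notes that styling, especially of tables, does not always carry over into the converted document.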
Running the conversion from PHP
To install the converter on the server, you need to install the libreoffice-core package:
To make working with the utility from PHP convenient, I wrote a wrapper.
The wrapper spares you from dealing with temporary files, fills in some default parameters for the command, contains constants describing the available formats, and lets you set a timeout for the conversion.
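For readers not using PHP, the following minimal Python sketch covers the same responsibilities. It is not the author's wrapper or its API. It manages a temporary directory, fills in default parameters and applies a timeout. It assumes `soffice` is on the PATH, and all function names are hypothetical. `subprocess.run` raises `subprocess.TimeoutExpired` when the timeout is exceeded.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List


def conversion_command(binary: str, source: Path, outdir: Path, target: str) -> List[str]:
    """Default parameters: headless mode, explicit output directory."""
    return [binary, "--headless", "--convert-to", target,
            "--outdir", str(outdir), str(source)]


def expected_output(source: Path, outdir: Path, target: str) -> Path:
    # LibreOffice keeps the input's base name and uses the target as extension.
    return outdir / f"{source.stem}.{target}"


def convert_html(html_text: str, target: str = "pdf",
                 binary: str = "soffice", timeout: float = 60.0) -> bytes:
    """Convert an HTML string and return the converted file's bytes."""
    # The temporary directory and everything in it is deleted on exit.
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        source = workdir / "document.html"
        source.write_text(html_text, encoding="utf-8")
        subprocess.run(conversion_command(binary, source, workdir, target),
                       check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return expected_output(source, workdir, target).read_bytes()
```

Returning bytes rather than a path means the caller never sees the temporary files, which mirrors the convenience the author describes.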
To use the wrapper, add it to your project via composer:
You can use it like this:
As a result, a docx file will be generated. More examples can be found on GitHub.
Of course, as a bonus you can also run the conversion in the other direction, from doc to HTML, and display the contents of office documents in the browser. The conversion quality will not always be great, but it will do for some cases.
A few pitfalls
It is worth describing a few quirks I ran into while working with this utility.
1. Applying CSS styles. When converting HTML to the target format, keep in mind that this notation is interpreted correctly:
But these notations are processed exactly as if we had not specified the class at all:
2. When converting HTML to the target format, style declarations do not always take effect, and sometimes you have to experiment to make them work. For example, this does not work:
But this works:
3. The same conversion can be performed by different converters, and the results will differ substantially. If the output document does not look very good, try explicitly specifying the module to use, for example:
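LibreOffice's `--convert-to` accepts `extension:FilterName`, which pins a specific export filter. The following sketch builds such an argument in Python. `writer_pdf_Export`, `writer_web_pdf_Export` and `MS Word 2007 XML` are LibreOffice filter names. Which one gives the best result for a given HTML document is something to test rather than assume. The helper names are hypothetical.

```python
from typing import List, Optional


def convert_to_argument(fmt: str, filter_name: Optional[str] = None) -> str:
    """Value for --convert-to: "fmt" or "fmt:FilterName" to pin a filter."""
    return fmt if filter_name is None else f"{fmt}:{filter_name}"


def output_extension(convert_to: str) -> str:
    """The text before the first colon is the output file's extension."""
    return convert_to.split(":", 1)[0]


def command_with_filter(source: str, fmt: str, filter_name: Optional[str] = None,
                        binary: str = "soffice") -> List[str]:
    # An argument list (no shell) means filter names containing spaces,
    # such as "MS Word 2007 XML", need no extra quoting.
    return [binary, "--headless", "--convert-to",
            convert_to_argument(fmt, filter_name), source]
```

In a shell, the same filter would need quoting, e.g. `--convert-to 'docx:MS Word 2007 XML'`.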
4. Whether the width of table rows can be configured is still a mystery to me. In general, I had difficulties styling tables when converting HTML to docx or PDF. So, in my view, this approach will be hard to apply for generating complex printed forms such as an invoice.