XML C parser on Linux

Contents

The XML C parser and toolkit of Gnome
libxml
Parsing XML with Bash
Parsing XML in BASH
And Here Is the Winner!
Parsing XML using unix terminal
8 Answers
How to parse XML using shellscript? [duplicate]
10 Answers

The XML C parser and toolkit of Gnome

libxml


"Programming with libxml2 is like the thrilling embrace of an exotic stranger." Mark Pilgrim

Libxml2 is the XML C parser and toolkit developed for the Gnome project (but usable outside of the Gnome platform). It is free software available under the MIT License. XML itself is a metalanguage for designing markup languages, i.e. text languages where semantics and structure are added to the content using extra "markup" information enclosed between angle brackets. HTML is the most well-known markup language. Though the library is written in C, a variety of language bindings make it available in other environments.

Libxml2 is known to be very portable: the library should build and work without serious trouble on a variety of systems (Linux, Unix, Windows, CygWin, MacOS, MacOS X, RISC Os, OS/2, VMS, QNX, MVS, VxWorks, ...).

Libxml2 implements a number of existing standards related to markup languages:

the XML standard: http://www.w3.org/TR/REC-xml
Namespaces in XML: http://www.w3.org/TR/REC-xml-names/
XML Base: http://www.w3.org/TR/xmlbase/
RFC 2396 : Uniform Resource Identifiers http://www.ietf.org/rfc/rfc2396.txt
XML Path Language (XPath) 1.0: http://www.w3.org/TR/xpath
HTML4 parser: http://www.w3.org/TR/html401/
XML Pointer Language (XPointer) Version 1.0: http://www.w3.org/TR/xptr
XML Inclusions (XInclude) Version 1.0: http://www.w3.org/TR/xinclude/
ISO-8859-x encodings, as well as rfc2044 [UTF-8] and rfc2781 [UTF-16] Unicode encodings, and more if using iconv support
part of SGML Open Technical Resolution TR9401:1997
XML Catalogs Working Draft 06 August 2001: http://www.oasis-open.org/committees/entity/spec-2001-08-06.html
Canonical XML Version 1.0: http://www.w3.org/TR/xml-c14n and the Exclusive XML Canonicalization CR draft http://www.w3.org/TR/xml-exc-c14n
Relax NG, ISO/IEC 19757-2:2003, http://www.oasis-open.org/committees/relax-ng/spec-20011203.html
W3C XML Schemas Part 2: Datatypes REC 02 May 2001
W3C xml:id Working Draft 7 April 2004

In most cases libxml2 tries to implement the specifications in a relatively strictly compliant way. As of release 2.4.16, libxml2 passed all 1800+ tests from the OASIS XML Tests Suite.

To some extent libxml2 provides support for the following additional specifications but doesn’t claim to implement them completely:

Document Object Model (DOM) http://www.w3.org/TR/DOM-Level-2-Core/ : libxml2 provides the document model but doesn’t implement the API itself; gdome2 does this on top of libxml2
RFC 959 : libxml2 implements a basic FTP client code
RFC 1945 : HTTP/1.0, again a basic HTTP client code
SAX: a SAX2 like interface and a minimal SAX1 implementation compatible with early expat versions

A partial implementation of XML Schemas Part 1: Structure is being worked on but it would be far too early to make any conformance statement about it at the moment.

Related projects for libxml2:

the libxslt page, providing an implementation of XSLT 1.0 and common extensions like EXSLT for libxml2
the gdome2 page: a standard DOM2 implementation for libxml2
the XMLSec page: an implementation of W3C XML Digital Signature for libxml2
Also check the related links section of the libxml2 site for more related and active projects.


Parsing XML with Bash

Good day! The situation is as follows. Somewhere far, far away there is a controller that monitors the state of certain equipment. The information I need is available as XML, where every value is enclosed in its own tag. In my case, one of the lines contains the value stop. There are only two possible values: stop and work. I need to write a parser for this XML page and a bash script that prints 0 when the value is stop and 1 when it is work. (Apparently the data has to be requested with a POST in order to get back the XML with the counters.)

On top of that, when the data page (192.168.1.1/protect/status.xml) is opened, the controller asks for a login and password, so the script also has to log in.

To clarify: we have no programmers, so the task was handed to me. The last time I dealt with programming was back in school, so I have never written a script in my life or done anything close to it, which is why I am asking for help. I expect angry comments along the lines of "you are clueless, why did you even get into this field?" are about to pour in. I completely understand: first I should read up and learn the basics of programming, and I will, but the task is urgent, nobody cares that I am a complete beginner, and without outside help I am lost.

So that you don’t think I haven’t tried anything at all, and just to make you smile, here is what I managed to come up with:

I fully realize this is nonsense. If anyone has the time, please help, or point me to similar solutions that I could at least partly understand. Thank you!
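The task described above (log in, read the XML, print 0 or 1) could be sketched in Python roughly as follows. The tag name `state`, the `http://` scheme and the use of a plain GET with HTTP Basic authentication are all assumptions, since the post does not show the real XML or say how the controller authenticates; if the controller really expects a POST, the request would need adjusting.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical values: the real tag name and URL scheme are not given in the post.
STATUS_URL = "http://192.168.1.1/protect/status.xml"
STATUS_TAG = "state"


def fetch_status_xml(url, user, password):
    """Download the XML page, sending HTTP Basic credentials up front."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def status_to_flag(xml_data, tag=STATUS_TAG):
    """Return 0 for 'stop' and 1 for 'work'; raise ValueError otherwise."""
    root = ET.fromstring(xml_data)
    # iter() walks the whole tree, including the root element itself
    element = next(root.iter(tag), None)
    if element is None or element.text is None:
        raise ValueError(f"no <{tag}> value found")
    value = element.text.strip().lower()
    flags = {"stop": 0, "work": 1}
    if value not in flags:
        raise ValueError(f"unexpected status: {value!r}")
    return flags[value]


# Offline demonstration with made-up XML; for the live controller one would use
# status_to_flag(fetch_status_xml(STATUS_URL, "login", "password")) instead.
sample = "<response><state>stop</state><temp>21</temp></response>"
print(status_to_flag(sample))
```

A bash script can then run this file with python3 and read the single digit from standard output.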


Parsing XML in BASH

Recursive grep plus a regexp?
AFAIK, bash has no native tools for XML.

Better to write the script in something that has a good XML library (for example, Python + xml.dom.minidom) and call that script from bash.

>(for example, Python + xml.dom.minidom)
I would recommend lxml.etree instead.
http://lxml.de/tutorial.html

I don’t want to pull in extra dependencies for the sake of one script.

That is what perl was invented for.

Well, you can print that line like this

Promise cookies to whoever rewrites this entirely in sed. I would be interested to see it, since there must be a more elegant solution.

Hand over my cookies.

And Here Is the Winner!

Well, splinter, I did it purely in sed; see the post above.

How do I require an exact match of the pattern?
There are two values, AutomaticLoginEnable and AutomaticLogin.
If I run awk '/AutomaticLogin/' file, both values match, but I need a strict match of the pattern.

How do you keep all of this in your head?

And if there are two values, AutomaticLoginEnable and AutomaticLogin, then

I just opened the man page and read a little. By the way, thanks for the free sed training.

That was after I replaced AutomaticLoginEnable with AutomaticLogin in sedtest, of course, which is why daemon/AutomaticLogin was printed.

That is all well and good, but what happens if someone accidentally inserts an empty line? Or, conversely, puts signature and default on the same line? A very unreliable solution, IMHO.

zolden’s solution is exactly as unreliable. And yes, read the task as stated:
"go down 3 lines and replace false with true?"
This assumes that the OP is 100% sure that the third line will be what he needs.

I wrote my suggestion in the second post. It is much more heavyweight than yours, but it will work for any valid XML. The choice is the OP’s, of course; I think he understands the pros and cons of each proposed solution.
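The parser-based approach suggested in the second post (Python + xml.dom.minidom) might look something like the sketch below. The file layout is not shown anywhere in the thread, so the `schema`, `key`, `signature` and `default` element names and the sample document are assumptions made up for illustration. The point is that the key is compared as a whole string, so AutomaticLogin does not also match AutomaticLoginEnable, and line breaks do not matter.

```python
from xml.dom import minidom

# Hypothetical layout; note the blank line and the two elements sharing a line.
SAMPLE = """<schemalist>
  <schema>
    <key>daemon/AutomaticLoginEnable</key>
    <signature>b</signature><default>false</default>
  </schema>

  <schema>
    <key>daemon/AutomaticLogin</key>
    <signature>s</signature>
    <default>nobody</default>
  </schema>
</schemalist>"""


def element_text(node):
    """Concatenate the text nodes directly under an element."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ).strip()


def set_default(xml_text, key, new_value):
    """Set <default> in every <schema> whose <key> equals `key` exactly."""
    doc = minidom.parseString(xml_text)
    for schema in doc.getElementsByTagName("schema"):
        keys = schema.getElementsByTagName("key")
        if not keys or element_text(keys[0]) != key:
            continue
        for default in schema.getElementsByTagName("default"):
            # Drop whatever the element held and store the new text
            while default.firstChild is not None:
                default.removeChild(default.firstChild)
            default.appendChild(doc.createTextNode(new_value))
    return doc.toxml()


print(set_default(SAMPLE, "daemon/AutomaticLoginEnable", "true"))
```

This is heavier than a sed one-liner, as the poster says, but it does not depend on how many lines separate the key from the value.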


Parsing XML using unix terminal

Sometimes I need to quickly extract some arbitrary data from XML files and put it into CSV format. What are your best practices for doing this in the Unix terminal? I would love some code examples; for instance, how can I solve the following problem?

Example XML input:

My desired CSV output:

8 Answers

Peter’s answer is correct, but it outputs a trailing line feed.

to generate the CSV results into standard output.

Use a command-line XSLT processor such as xsltproc, saxon or xalan to parse the XML and generate CSV. Here’s an example, which for your case is the stylesheet:

If you just want the name attributes of any element, here is a quick but incomplete solution.

(Your example text is in the file example)

grep "name" example | cut -d"\"" -f2,2 | xargs -I{} echo "{},"

XMLStarlet is a command line toolkit to query/edit/check/transform XML documents (for more information, see XMLStarlet Command Line XML Toolkit)

No files to write, just pipe your file to xmlstarlet and apply an xpath filter.

-m expression, -v value, "" included literal, -n newline

So for your xpath the xpath expression would be //myel/@name which would provide the two attribute values.

Very handy tool.

Here’s a little ruby script that does exactly what your question asks (pull an attribute called ‘name’ out of elements called ‘myel’). It should be easy to generalize.

Your test file is in test.xml.

It has its pitfalls: for example, if it is not guaranteed that each myel is on one line, you have to "normalize" the xml file first (so that each myel is on one separate line).
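For comparison, the extraction discussed in these answers (the name attribute of every myel element) can be sketched with a real parser, which removes the one-element-per-line restriction. The sample input here is invented, because the question's own example did not survive, and the exact CSV layout the asker wanted is unknown; this sketch simply writes one value per row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up input: one element is split across lines, one value contains a comma.
SAMPLE = """<root>
  <myel name="Foo" />
  <myel
      name="Bar, Inc." other="x"/>
  <other name="ignored"/>
</root>"""


def names_to_csv(xml_text, tag="myel", attribute="name"):
    """Return one CSV row per matching element that carries the attribute."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for element in root.iter(tag):
        value = element.get(attribute)
        if value is not None:
            # csv.writer quotes values that contain the delimiter
            writer.writerow([value])
    return buffer.getvalue()


print(names_to_csv(SAMPLE), end="")
```

Using the csv module rather than string concatenation means values containing commas or quotes are escaped correctly.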


How to parse XML using shellscript? [duplicate]

I would like to know the best way to parse an XML file using a shell script.

Should one do it by hand?
Do third-party libraries exist?

If you have already done this, could you let me know how you managed it?

10 Answers

The xmllint program parses one or more XML files, specified on the command line as xmlfile. It prints various types of output, depending upon the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.

It allows you to select elements in the XML doc by xpath, using the --pattern option.

On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can run apt-get install libxml2-utils

Here’s a full working example.
If it’s only extracting email addresses you could just do something like:
1) Suppose XML file spam.xml is like

2) You can get the emails and process them with this short bash code:

Result of this example is:

Important note:
Don’t use this for serious matters. This is OK for playing around, getting quick results, learning grep, etc. but you should definitely look for, learn and use an XML parser for production (see Micha’s comment below).
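As a rough illustration of the parser-based alternative recommended in the note above, the sketch below collects email addresses from element text and attribute values only. The sample document and the deliberately simple address pattern are assumptions for demonstration; the original spam.xml is not shown.

```python
import re
import xml.etree.ElementTree as ET

# Made-up stand-in for spam.xml
SAMPLE = """<contacts>
  <person><email>alice@example.com</email></person>
  <person email="bob@example.org"/>
  <!-- carol@example.net is only mentioned in a comment -->
</contacts>"""

# Simplified pattern, good enough for a demonstration only
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(xml_text):
    """Collect addresses found in element text and attribute values."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        chunks = [element.text or "", element.tail or ""]
        chunks.extend(element.attrib.values())
        for chunk in chunks:
            found.extend(EMAIL_RE.findall(chunk))
    return found


for address in extract_emails(SAMPLE):
    print(address)
```

Unlike a grep over the raw file, this skips the address inside the comment, because the default ElementTree parser discards comments.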

There’s also xmlstarlet (which is available for Windows as well).

I am surprised no one has mentioned xmlsh. The mission statement :

A command line shell for XML Based on the philosophy and design of the Unix Shells

xmlsh provides a familiar scripting environment, but specifically tailored for scripting xml processes.

A list of shell-like commands is provided here.

I use the xed command a lot which is equivalent to sed for XML, and allows XPath based search and replaces.

Try sgrep. It’s not clear exactly what you are trying to do, but I surely would not attempt writing an XML parser in bash.

Do you have xml_grep installed? It’s a perl based utility standard on some distributions (it came pre-installed on my CentOS system). Rather than giving it a regular expression, you give it an xpath expression.

See XPath syntax

A rather new project is the xml-coreutils package featuring xml-cat, xml-cp, xml-cut, xml-grep, ...

Try using xpath. You can use it to parse elements out of an xml tree.

This really is beyond the capabilities of shell script. Shell script and the standard Unix tools are okay at parsing line-oriented files, but things change when you talk about XML. Even simple tags can present a problem:

Imagine trying to write a shell script that can read the data enclosed in such a tag. These very, very simple XML examples all show different ways this can be an issue. The first two examples are the exact same syntax in XML. The third simply has an attribute attached to it. The fourth contains the data in another tag. Simple sed, awk, and grep commands cannot catch all possibilities.

You need to use a full-blown scripting language like Perl, Python, or Ruby. Each of these has modules that can parse XML data and make the underlying structure easier to access. I’ve used XML::Simple in Perl. It took me a few tries to understand it, but it did what I needed, and made my programming much easier.
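To make the point concrete, here is a small Python sketch with four invented variants of the kind described (the answer's own examples were lost, so the `foo` tag and its contents are hypothetical). A line-oriented tool would need a different pattern for each, while a parser reads them all the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical variants: single line, split across lines,
# with an attribute, and with the data inside another tag.
VARIANTS = [
    "<doc><foo>bar</foo></doc>",
    "<doc><foo>\n   bar\n</foo></doc>",
    '<doc><foo type="x">bar</foo></doc>',
    "<doc><foo><value>bar</value></foo></doc>",
]


def read_foo(xml_text):
    """Return all text inside the first <foo> child, whitespace trimmed."""
    foo = ET.fromstring(xml_text).find("foo")
    # itertext() also descends into nested elements
    return "".join(foo.itertext()).strip()


for variant in VARIANTS:
    print(repr(read_foo(variant)))
```

Every variant yields the same string, which is the behaviour that simple sed, awk, or grep patterns struggle to reproduce.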
