Linux quoted printable decode

The following functions are provided:

encode_qp($str)
encode_qp($str, $eol)
encode_qp($str, $eol, $binmode)

This function returns an encoded version of the string ($str) given as argument.

The second argument ($eol) is the line-ending sequence to use. It is optional and defaults to "\n". Every occurrence of "\n" is replaced with this string, and it is also used for additional "soft line breaks" to ensure that no line ends up longer than 76 characters. Pass it as "\015\012" to produce data suitable for external consumption. The string "\r\n" produces the same result on many platforms, but not all.

The third argument ($binmode) will select binary mode if passed as a TRUE value. In binary mode "\n" will be encoded in the same way as any other non-printable character. This ensures that a decoder will end up with exactly the same string whatever line ending sequence it uses. In general it is preferable to use the base64 encoding for binary data; see MIME::Base64.

An $eol of "" (the empty string) is special. In this case, no "soft line breaks" are introduced and binary mode is effectively enabled so that any "\n" in the original data is encoded as well.

decode_qp($str)

This function returns the plain text version of the string given as argument. The lines of the result are "\n" terminated, even if the $str argument contains "\r\n" terminated lines.
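The difference between text mode and binary mode, and the effect of soft line breaks, can also be tried out in Python, whose standard binascii module has a comparable switch. This is only a sketch of the same ideas in another language, not the Perl module's API; the helper names and sample data are made up for illustration.

```python
import binascii

def encode_text(data: bytes) -> bytes:
    # Text mode: "\n" stays a real line break in the output.
    return binascii.b2a_qp(data, istext=True)

def encode_binary(data: bytes) -> bytes:
    # Binary mode: "\n" is escaped like any other non-printable byte.
    return binascii.b2a_qp(data, istext=False)

sample = b"first line\nsecond line\n"
long_line = b"x" * 200  # long enough to need soft line breaks

text_form = encode_text(sample)
binary_form = encode_binary(sample)
wrapped = encode_text(long_line)

print(text_form)
print(binary_form)
print(wrapped)
```

In binary mode a decoder gets back exactly the bytes that went in, whatever line-ending convention it uses, which is the property the third argument above is meant to provide.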

If you prefer not to import these routines into your namespace, you can call them by their fully qualified names, MIME::QuotedPrint::encode and MIME::QuotedPrint::decode.

Perl v5.8 and better allow extended Unicode characters in strings. Such strings cannot be encoded directly, as the quoted-printable encoding is only defined for single-byte characters. The solution is to use the Encode module to select the byte encoding you want, for example by converting the string to UTF-8 bytes with Encode::encode before passing it to encode_qp.
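The same rule (pick a byte encoding first, then apply quoted-printable) holds in other languages. A minimal Python sketch, assuming UTF-8 is the byte encoding you want; the sample text is arbitrary:

```python
import quopri

text = "c'est l'été"

# quopri works on bytes, so choose the byte encoding explicitly first.
encoded = quopri.encodestring(text.encode("utf-8"))

# Decoding reverses both steps: quoted-printable first, then the charset.
decoded = quopri.decodestring(encoded).decode("utf-8")

print(encoded)
print(decoded)
```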

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Decoding Cyrillic from quoted-printable

Good day, friends! I found myself an entertaining little project for the near future: I decided to write a dialer app for Android. The app will synchronize with the system contacts and perform certain actions. What quoted-printable has to do with this, what it is, and why I needed it is the subject of this article.

So, quoted-printable is a binary-to-text encoding that uses printable ASCII characters and, judging by the English Wikipedia page, is used to encode and decode data in e-mail messages.

That is not the whole story, though. There is a file format called vCard, and it is the format in which contacts are imported and exported on any Android smartphone. Version 2.1 of this format (file extension .vcf) also uses quoted-printable. Cyrillic text in this encoding looks like this (example): "=D0=9F=D1=80=D0=B8=D0=B2=D0=B5=D1=82", i.e. each Cyrillic character is first encoded in UTF-8 as a sequence of two bytes, and then each byte is written in hex preceded by an equals sign "=".

This is the form in which all contacts containing Cyrillic characters are imported. Obviously, reading or editing such a file is out of the question, and that is exactly what I needed. I tried plugins for text editors... The problem can be solved that way, yes, but it takes far too many steps. In short, I had to sit down and write a decoder.

Along the way I hit another snag. The quoted-printable standard limits encoded lines to 76 characters, so a line holds at most 75 characters of data followed by a "=" that marks the wrap (a soft line break). An additional function was needed to join the wrapped lines back together.

The script uses the quopri module, which is part of the Python standard library (it imported right away for me, without any installation).
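The author's script is not reproduced here. What follows is a minimal sketch of the two steps described above (joining wrapped lines, then decoding), assuming vCard 2.1 property lines of the form NAME;PARAMS:VALUE with ENCODING=QUOTED-PRINTABLE among the parameters. The function names are hypothetical.

```python
import quopri

def unfold_soft_breaks(lines):
    """Join quoted-printable property lines that end with a soft line break ("=")."""
    joined = []
    pending = None
    for line in lines:
        line = line.rstrip("\r\n")
        if pending is not None:
            line = pending + line
            pending = None
        head = line.partition(":")[0].upper()
        if "QUOTED-PRINTABLE" in head and line.endswith("="):
            pending = line[:-1]   # drop the trailing "=" and wait for the next line
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)
    return joined

def decode_vcard_line(line, charset="utf-8"):
    """Decode the value of a quoted-printable property; other lines pass through."""
    head, sep, value = line.partition(":")
    if not sep or "QUOTED-PRINTABLE" not in head.upper():
        return line
    raw = quopri.decodestring(value.encode("ascii"))
    return head + sep + raw.decode(charset)

sample = [
    "BEGIN:VCARD",
    "N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=D0=9F=D1=80=D0=B8=",
    "=D0=B2=D0=B5=D1=82",
    "END:VCARD",
]
for line in unfold_soft_breaks(sample):
    print(decode_vcard_line(line))
```

Only lines that declare quoted-printable are unfolded, so a base64 value that happens to end in "=" padding is left alone.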

The result: lines in the encoded form shown above come out as readable Cyrillic text.

After editing the file, encode it back to quoted-printable if necessary.
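Re-encoding can be sketched with the same quopri module. The helper name is hypothetical, and UTF-8 is assumed as the charset, matching the example earlier in the article:

```python
import quopri

def encode_vcard_value(text, charset="utf-8"):
    """Return text as quoted-printable ASCII; long values may get soft line breaks."""
    return quopri.encodestring(text.encode(charset)).decode("ascii")

encoded = encode_vcard_value("Привет")
print(encoded)
```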

That could have been the end of it, but there is one more thing. Strings encoded in quoted-printable looked strangely similar to certain URLs that everyone has probably seen in the browser address bar, only with "%" in place of "=". For example: "%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82". And what do you know? It is a close relative: this is percent-encoding (URL encoding), a separate scheme that also writes each UTF-8 byte as two hex digits. Such strings therefore decode to Cyrillic in the way described above once "%" is replaced with "=".
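A short Python sketch of both routes for the example string: the dedicated percent-decoding function from the standard library, and the replace-then-decode trick described above.

```python
import quopri
from urllib.parse import unquote

percent_form = "%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82"

# The direct route: percent-encoding has its own decoder.
via_urllib = unquote(percent_form, encoding="utf-8")

# The trick from the text: swap "%" for "=" and decode as quoted-printable.
as_qp = percent_form.replace("%", "=").encode("ascii")
via_qp = quopri.decodestring(as_qp).decode("utf-8")

print(via_urllib, via_qp)
```

The trick only works while the string contains nothing but hex escapes and characters that mean the same in both schemes; for real URLs the dedicated decoder is the safer choice.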

Oh yes, I almost forgot. In case anyone needs it: Chinese characters and Arabic letters decode in the same way as Cyrillic (I checked personally).

Well, that is all, friends. Goodbye, and perhaps my efforts will be of some small use to someone.

Added 07.07.2020.
Recently a Habr user asked me to decode a .vmg file; it also uses quoted-printable, only for encoding SMS messages. The handling in code is exactly the same.


jjarmoc / gist:1571540

# To decode:
#   qp -d string
# To encode:
#   qp string

alias qpd='perl -MMIME::QuotedPrint -pe '\''$_=MIME::QuotedPrint::decode($_);'\'''
alias qpe='perl -MMIME::QuotedPrint -pe '\''$_=MIME::QuotedPrint::encode($_);'\'''

function qp {
    if [[ "$1" = "-d" ]]
    then
        echo ${@:2} | qpd
    else
        echo ${@} | qpe
    fi
}


klepsydra commented May 1, 2012

Don’t forget to "shopt -s expand_aliases" if you plan to use these aliases in bash scripts.


jjarmoc commented May 1, 2012

Good call, thanks! I tend to use them mostly in one liners so I can pipeline to other tools easily, but that’s a good point worth noting.


Hubro commented Apr 27, 2018

Alternatively, use Python’s quopri module:
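The snippet from this comment is not preserved here. A minimal sketch of what such a filter could look like, using quopri.decode, which copies from one binary file object to another; the function name and sample data are illustrative only:

```python
import io
import quopri

def decode_stream(src, dst):
    """Read quoted-printable bytes from src and write the decoded bytes to dst."""
    quopri.decode(src, dst)

# In-memory demonstration; the soft line break ("=" at end of line) disappears.
source = io.BytesIO(b"caf=C3=A9 =\nau lait\n")
target = io.BytesIO()
decode_stream(source, target)
print(target.getvalue().decode("utf-8"))
```

To use it as a shell filter, the same function can be called with sys.stdin.buffer and sys.stdout.buffer as the two streams.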


AloisMahdal commented Mar 4, 2019

@Hubro: this should be the accepted answer!

(oh wait, I’m not on stackoverflow?)


eiro commented Apr 26, 2019

The Perl version can be much simpler:



quoted_printable_decode

(PHP 4, PHP 5, PHP 7, PHP 8)

quoted_printable_decode — Convert a quoted-printable string to an 8 bit string

Description

This function returns an 8-bit binary string corresponding to the decoded quoted-printable string (according to RFC 2045, section 6.7, not RFC 2821, section 4.5.2, so additional periods are not stripped from the beginning of the line).

This function is similar to imap_qprint() , except this one does not require the IMAP module to work.

Parameters

The input string.

Return Values

Returns the 8-bit binary string.
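To make the decoding rules concrete, here is a hand-written sketch in Python of the two core rules of RFC 2045, section 6.7. It is an illustration under simplifying assumptions (it ignores the rules about trailing whitespace and about malformed escapes), not a description of how PHP implements the function; the name qp_decode is made up.

```python
import re

_ESCAPE = re.compile(rb"=([0-9A-Fa-f]{2})")

def qp_decode(data: bytes) -> bytes:
    # Rule 1: "=" at the very end of a line is a soft line break;
    # it is removed together with the line ending.
    data = re.sub(rb"=\r?\n", b"", data)
    # Rule 2: "=" followed by two hex digits stands for the byte with that value.
    return _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)

print(qp_decode(b"=D0=9F=D1=80=D0=B8=\r\n=D0=B2=D0=B5=D1=82").decode("utf-8"))
print(qp_decode(b".leading period is kept"))
```

The result is a plain byte string; turning it into text still requires knowing which character set the sender used.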

User Contributed Notes 21 notes

As soletan at toxa dot de reported, that function is very bad and does not produce valid quoted-printable strings. While using it I saw spam filters marking the emails as QP_EXCESS, and sometimes the email client did not recognize the header at all; I really lost time :(. This is the new version (we use it in the Drake CMS core) that works seamlessly:

//L: note $encoding that is uppercase
//L: also your PHP installation must have ctype_alpha, otherwise write it yourself
function quoted_printable_encode($string, $encoding = 'UTF-8') {
    // use this function with headers, not with the email body as it misses word wrapping
    $len = strlen($string);
    $result = '';
    $enc = false;
    for ($i = 0; $i < $len; ++$i) {
        $c = $string[$i];
        if (ctype_alpha($c))
            $result .= $c;
        else if ($c == ' ') {
            $result .= '_';
            $enc = true;
        } else {
            $result .= sprintf("=%02X", ord($c));
            $enc = true;
        }
    }
    //L: so spam agents won't mark your email with QP_EXCESS
    if (!$enc) return $string;
    return '=?' . $encoding . '?q?' . $result . '?=';
}

I hope it helps 😉
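The function above produces RFC 2047 encoded-words for headers rather than body text. A rough Python sketch of the same idea follows, with one deliberate difference: it lets ASCII digits through as well as letters. The function name is hypothetical, and the standard email.header.decode_header is used only to check the result.

```python
from email.header import decode_header

def q_encode_word(text, charset="UTF-8"):
    """Build an RFC 2047 "Q" encoded-word; return text unchanged if nothing needs escaping."""
    out, changed = [], False
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Space becomes "_"; every other byte becomes "=XX".
            out.append("_" if ch == " " else "=%02X" % byte)
            changed = True
    if not changed:
        return text
    return "=?%s?q?%s?=" % (charset, "".join(out))

word = q_encode_word("Café au lait")
print(word)
print(decode_header(word)[0][0].decode("utf-8"))
```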

Taking a bunch of the earlier comments together, you can synthesize a nice short and reasonably efficient quoted_printable_encode function like this:

Note that I put this in my standard library file, so I wrap it in a !function_exists in order that if there is a pre-existing PHP one it will just work and this will evaluate to a noop.

<?php
if (!function_exists("quoted_printable_encode")) {
    /**
     * Process a string to fit the requirements of RFC2045 section 6.7. Note that
     * this works, but replaces more characters than the minimum set. For readability
     * the spaces aren't encoded as =20 though.
     */
    function quoted_printable_encode($string) {
        return preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n", str_replace("%", "=", str_replace("%20", " ", rawurlencode($string))));
    }
}
?>

Regards,
Andrew McMillan.

I use a hack for this bug:

$str = str_replace("=\r\n", '', quoted_printable_encode($str));
if (strlen($str) > 73)

I modified the below version of legolas558 at users dot sausafe dot net and added a wrapping option.

/**
 * Encode a string as so-called 'quoted printable'. This type of encoding is
 * used to send the content of 8-bit e-mail messages as 7-bit.
 *
 * @access public
 * @param string $str The string to encode
 * @param bool $wrap Add line breaks after 74 characters?
 * @return string
 */

function quoted_printable_encode($str, $wrap = true)
{
    $return = '';
    $iL = strlen($str);
    for ($i = 0; $i < $iL; $i++)
    {
        $char = $str[$i];
        if (ctype_print($char) && !ctype_punct($char)) $return .= $char;
        else $return .= sprintf('=%02X', ord($char));
    }
    return ($wrap === true)
        ? wordwrap($return, 74, " =\n")
        : $return;
}

Please note that in the below encode function there is a bug!

<?php
if (($c == 0x3d) || ($c >= 0x80) || ($c < 0x20))
?>

$c should be compared with less-than-or-equal so that spaces are encoded as well!

So the correct code is:

<?php
if (($c == 0x3d) || ($c >= 0x80) || ($c <= 0x20))
?>

Fix the code or post this note, please

Some browsers (Netscape, for example)
send 8-bit quoted-printable text like this:
=C5=DD=A3=D2=C1= =DA

"= =" marks a continued word.
The PHP function does not detect this situation and produces a string like:
abcde=f

If you want a function to do the reverse of «quoted_printable_decode()», follow the link you will find the «quoted_printable_encode()» function:
http://www.memotoo.com/softs/public/PHP/quoted printable_encode.inc.php

Compatible with "ENCODING=QUOTED-PRINTABLE"
Example:
quoted_printable_encode(utf8_encode("c'est quand l'été ?"))
-> "c'est quand l'=C3=A9t=C3=A9 ?"

my approach for quoted printable encode using the stream converting abilities

<?php
/**
 * @param string $str
 * @return string
 * */
function quoted_printable_encode($str) {
    $fp = fopen('php://temp', 'w+');
    stream_filter_append($fp, 'convert.quoted-printable-encode');
    fwrite($fp, $str);
    fseek($fp, 0);
    $result = '';
    while (!feof($fp))
        $result .= fread($fp, 1024);
    fclose($fp);
    return $result;
}
?>

Be warned! The method below for encoding text does not work as required by RFC 1521!

Consider a line consisting of 75 'A' and a single é (or a similar non-ASCII character). The method below would encode and return a line of 78 octets, breaking RFC 1521, 5.1 Rule #5: "The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long."

Good QP-encoding takes a bit more than this.
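A quick way to catch this kind of problem is to measure the encoded lines. The checker below is a small Python sketch (the function name is made up); it is applied to the 75 'A' plus é case from this note, with é taken as the single byte E9, and to the same data with a soft line break added.

```python
def overlong_lines(encoded: bytes, limit: int = 76):
    """Return (line_number, length) for every encoded line longer than the limit."""
    problems = []
    for number, line in enumerate(encoded.splitlines(), start=1):
        if len(line) > limit:
            problems.append((number, len(line)))
    return problems

naive = b"A" * 75 + b"=E9"                # 75 letters plus one escaped byte
wrapped = b"A" * 75 + b"=\r\n" + b"=E9"   # same data with a soft line break

print(overlong_lines(naive))
print(overlong_lines(wrapped))
```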

If you do not have access to imap_* and do not want to use
"$message = chunk_split( base64_encode($message) );"
because you want to be able to read the "source" of your mails, you might want to try this:
(any suggestions very welcome!)

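function qp_enc($input = "quoted-printable encoding test string", $line_max = 76) {
    // The setup and the head of the inner loop are garbled in this copy of the note;
    // they are reconstructed here from how the variables are used further down.
    $hex = str_split('0123456789ABCDEF');
    $lines = preg_split("/\r\n|\r|\n/", $input);
    $eol = "\r\n";
    $escape = "=";
    $output = "";

    foreach ($lines as $line) {
        //$line = rtrim($line); // remove trailing white space -> no =20\r\n necessary
        $linlen = strlen($line);
        $newline = "";
        for ($i = 0; $i < $linlen; $i++) {
            $c = $line[$i];
            $dec = ord($c);
            if (($dec == 32) && ($i == ($linlen - 1))) { // encode a space at end of line only
                $c = "=20";
            } elseif (($dec == 61) || ($dec < 32) || ($dec > 126)) { // always encode "\t", which is *not* required
                $h2 = floor($dec/16); $h1 = floor($dec%16);
                $c = $escape.$hex["$h2"].$hex["$h1"];
            }
            if ((strlen($newline) + strlen($c)) >= $line_max) { // CRLF is not counted
                $output .= $newline.$escape.$eol; // soft line break; " =\r\n" is okay
                $newline = "";
            }
            $newline .= $c;
        } // end of for
        $output .= $newline.$eol;
    }
    return trim($output);
}

$eight_bit = "\xA7 \xC4 \xD6 \xDC \xE4 \xF6 \xFC \xDF = xxx yyy zzz \r\n"
    ." \xA7 \r \xC4 \n \xD6 \x09 ";
print $eight_bit."\r\n---------------\r\n";
$encoded = qp_enc($eight_bit);
print $encoded;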

A small update for Andrew’s code below. This one leaves the original CRLF pairs intact (and allows the preg_replace to work as intended):

<?php
if (!function_exists("quoted_printable_encode")) {
    /**
     * Process a string to fit the requirements of RFC2045 section 6.7. Note that
     * this works, but replaces more characters than the minimum set. For readability
     * the spaces and CRLF pairs aren't encoded though.
     */
    function quoted_printable_encode($string) {
        return preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n",
            str_replace("%", "=", str_replace("%0D%0A", "\r\n",
                str_replace("%20", " ", rawurlencode($string)))));
    }
}
?>

Regards, André

If you’re getting black diamonds or weird characters that seemingly block an echo but still encounter strlen($string) > 0 you’re probably encountering an encoding issue. Unlike the people writing ENCODE functions on a DECODE page I will actually talk about DECODE on a DECODE page.

The specific problem I encountered was that an email was encoded using a Russian encoding (KOI8-R) though I output everything as UTF-8 because: compatibility.

If you try to do this with a Russian encoding:

<?php
echo quoted_printable_decode('=81');
?>

You’ll get that corrupted data.

I did a couple of tests and it turns out the following is how you nest the mb_convert_encoding function:

<?php
echo 'Test: "' . mb_convert_encoding(quoted_printable_decode('=81'), 'UTF-8', 'KOI8-R') . '".';
?>

Unfortunately I could not find a character mapping table or anything listed under RFC 2045 Section 6.7. However I came across the website https://dencode.com/en/string/quoted-printable which allows you to manually choose the encoding (it’s an open source site, they have a GIT repository for the morbidly curious).

As it turns out the start of the range is relative to the encoding. So Latin (ISO-8859-1) and Russian (KOI8-R) will likely (not tested this) encode to different characters **relative to the string encoding**.
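The point that the decoded bytes only become text once a charset is chosen can be shown with a small Python sketch. The sample word is arbitrary; koi8_r and latin-1 are codec names from Python's standard library.

```python
import quopri

word = "Привет"
qp_koi8 = quopri.encodestring(word.encode("koi8_r"))
qp_utf8 = quopri.encodestring(word.encode("utf-8"))

raw = quopri.decodestring(qp_koi8)
right = raw.decode("koi8_r")    # the charset the sender actually used
wrong = raw.decode("latin-1")   # same bytes read with another charset

print(qp_koi8, qp_utf8)
print(right, wrong)
```

The quoted-printable layer is identical in both cases; only the byte values underneath differ, which is why the charset declared in the message headers has to be carried along to the conversion step.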

If you’re really lazy and producing HTML anyway in the end, just convert it to HTML entities and move the Unicode/ISO struggle to the document’s encoding.

If there is a NULL byte in the string that is passed, quoted_printable_decode will crop everything after the NULL byte and the NULL byte itself.

<?php
$result = quoted_printable_decode("This is a\0 test.");
// $result === 'This is a'
?>

This is not a bug, but the intended behaviour and defined by RFC 2045 (see https://www.ietf.org/rfc/rfc2045.txt) in paragraph 2.7 and 2.8.

Another (improved) version of quoted_printable_encode(). Please note the order of the array elements in str_replace().
I’ve just rewritten the previous function for better readability.

if (!function_exists("quoted_printable_encode")) {
    /**
     * Process a string to fit the requirements of RFC2045 section 6.7. Note that
     * this works, but replaces more characters than the minimum set. For readability
     * the spaces and CRLF pairs aren't encoded though.
     */
    function quoted_printable_encode($string) {
        $string = str_replace(array('%20', '%0D%0A', '%'), array(' ', "\r\n", '='), rawurlencode($string));
        $string = preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n", $string);

        return $string;
    }
}

function quoted_printable_encode($str, $chunkLen = 72)
{
    $offset = 0;

    $str = strtr(rawurlencode($str), array('%' => '='));
    $len = strlen($str);
    $enc = '';

    while ($offset < $len)
    {
        // Never cut through an "=XX" escape: shorten the chunk if the cut would land inside one.
        if (isset($str[$offset + $chunkLen - 1]) && $str[$offset + $chunkLen - 1] === '=')
        {
            $line = substr($str, $offset, $chunkLen - 1);
            $offset += $chunkLen - 1;
        }
        elseif (isset($str[$offset + $chunkLen - 2]) && $str[$offset + $chunkLen - 2] === '=')
        {
            $line = substr($str, $offset, $chunkLen - 2);
            $offset += $chunkLen - 2;
        }
        else
        {
            $line = substr($str, $offset, $chunkLen);
            $offset += $chunkLen;
        }

        // More data follows, so end this line with a soft line break.
        if ($offset < $len)
            $enc .= $line . "=\n";
        else
            $enc .= $line;
    }

    return $enc;
}

In Addition to david lionhead’s function:

<?php
function quoted_printable_encode($txt) {
    /* Make sure there are no %20 or similar */
    $txt = rawurldecode($txt);
    $tmp = "";
    $line = "";
    for ($i = 0; $i < strlen($txt); $i++) {
        if (($txt[$i] >= 'a' && $txt[$i] <= 'z') || ($txt[$i] >= 'A' && $txt[$i] <= 'Z') || ($txt[$i] >= '0' && $txt[$i] <= '9')) {
            $line .= $txt[$i];
            if (strlen($line) >= 75) {
                $tmp .= "$line=\n";
                $line = "";
            }
        }
        else {
            /* Important to differentiate this case from the above */
            if (strlen($line) >= 72) {
                $tmp .= "$line=\n";
                $line = "";
            }
            $line .= "=" . sprintf("%02X", ord($txt[$i]));
        }
    }
    $tmp .= "$line\n";
    return $tmp;
}
?>

My version of quoted_printable encode, as the convert.quoted-printable-encode filter breaks on Outlook Express. This one seems to work on Express/Outlook/Thunderbird/Gmail.

$text = <<<EOF
This function enables you to convert text to a quoted-printable string as well as to create encoded-words used in email headers (see http://www.faqs.org/rfcs/rfc2047.html).

No line of returned text will be longer than specified. Encoded-words will not contain a newline character. Special characters are removed.
EOF;

define('QP_LINE_LENGTH', 75);
define('QP_LINE_SEPARATOR', "\r\n");

function quoted_printable_encode($string, $encodedWord = false)
{
    if (!preg_match('//u', $string)) {
        throw new Exception('Input string is not valid UTF-8');
    }

    static $wordStart = '=?UTF-8?Q?';
    static $wordEnd = '?=';
    static $endl = QP_LINE_SEPARATOR;

    $lineLength = $encodedWord
        ? QP_LINE_LENGTH - strlen($wordStart) - strlen($wordEnd)
        : QP_LINE_LENGTH;

    // The three patterns below are missing from this copy of the note; the ones
    // shown are assumptions chosen to match the comments beside them.
    $string = $encodedWord
        ? preg_replace('~[\r\n]+~', ' ', $string) // we need encoded word to be single line
        : preg_replace('~\r\n?~', "\n", $string); // normalize line endings
    $string = preg_replace('~[\x00-\x08\x0B-\x1F]+~', '', $string); // remove control characters

    $output = $encodedWord ? $wordStart : '';
    $charsLeft = $lineLength;

    $chr = isset($string[0]) ? $string[0] : null;
    $ord = ord($chr);

    for ($i = 0; isset($chr); $i++) {
        $nextChr = isset($string[$i + 1]) ? $string[$i + 1] : null;
        $nextOrd = ord($nextChr);

        // The step that rewrites special bytes as =XX (note the strlen($chr)
        // checks below) is not present in this copy of the note.

        if ($ord === 10) { // line feed
            $output .= $endl;
            $charsLeft = $lineLength;
        } elseif (
            strlen($chr) < $charsLeft or
            strlen($chr) === $charsLeft and $nextOrd === 10 || $encodedWord
        ) { // add character
            $output .= $chr;
            $charsLeft -= strlen($chr);
        } elseif (isset($nextOrd)) { // another line needed
            $output .= $encodedWord
                ? $wordEnd . $endl . "\t" . $wordStart . $chr
                : '=' . $endl . $chr;
            $charsLeft = $lineLength - strlen($chr);
        }

        $chr = $nextChr;
        $ord = $nextOrd;
    }

    return $output . ($encodedWord ? $wordEnd : '');
}

echo quoted_printable_encode($text /*, true*/);
