How to Convert Files to UTF-8 Encoding in Linux

In this guide, we will describe what character encoding is and cover a few examples of converting files from one character encoding to another using a command-line tool. Finally, we will look at how to convert several files from any character set (charset) to UTF-8 encoding in Linux.

As you probably know already, a computer does not understand or store letters, numbers or anything else that we as humans can perceive; it only stores bits. A bit has only two possible values: 0 or 1, true or false, yes or no. Everything else, such as letters, numbers and images, must be represented in bits for a computer to process.

In simple terms, character encoding is a way of telling a computer how to interpret raw zeroes and ones as actual characters, where each character is represented by a number. When we type text in a file, the words and sentences we form are built from different characters, and characters are organized into a charset.

There are various encoding schemes out there such as ASCII, ANSI, Unicode among others. Below is an example of ASCII encoding.
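
For instance, in ASCII each character maps to a small number. On a typical GNU/Linux system you can see the byte values behind a string with od:

```shell
# Each character is stored as a number; od shows the decimal byte values
printf 'Tec' | od -An -tu1
# T=84, e=101, c=99 in ASCII
```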

In Linux, the iconv command line tool is used to convert text from one form of encoding to another.

You can check the encoding of a file using the file command with the -i or --mime flag, which enables printing of the mime-type string, as in the examples below:
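
For example (the file name sample.txt is just an assumption for the demo):

```shell
printf 'hello\n' > sample.txt   # a plain-ASCII sample file
file -i sample.txt
# typical output: sample.txt: text/plain; charset=us-ascii
```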

Check File Encoding in Linux

The syntax for using iconv is as follows:

Where -f or --from-code specifies the input encoding and -t or --to-code specifies the output encoding.
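
A minimal example (the file name and its contents are assumptions for the demo):

```shell
# -f/--from-code is the input encoding, -t/--to-code the output encoding
printf 'caf\351\n' > input.file                     # "café" in ISO-8859-1
iconv -f ISO-8859-1 -t UTF-8 input.file > output.file
file -i output.file                                 # now reports charset=utf-8
```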

To list all known coded character sets, run the command below:
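
The list is long, so it is usually piped through grep:

```shell
# List all known coded character sets; grep narrows the output
iconv -l | grep -i 'UTF-8' | head
```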

List Coded Charsets in Linux

Convert Files from UTF-8 to ASCII Encoding

Next, we will learn how to convert from one encoding scheme to another. The command below converts from ISO-8859-1 to UTF-8 encoding.

Consider a file named input.file which contains the characters:

Let us start by checking the encoding of the characters in the file and then view the file contents. Finally, we can convert all the characters to ASCII encoding.
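
For example (the contents of input.file are an assumption, since the original sample is not reproduced here):

```shell
# A sample UTF-8 file with a few non-ASCII characters
printf '\303\261 \303\274 caf\303\251\n' > input.file
file -i input.file        # expect: charset=utf-8
cat input.file
```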

After running the iconv command, we then check the contents of the output file and the new encoding of the characters as below.
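
A sketch of the whole round trip (sample content assumed):

```shell
printf '\303\261 \303\274 caf\303\251\n' > input.file    # sample UTF-8 input
iconv -f UTF-8 -t ASCII//TRANSLIT input.file -o output.file \
  || echo "some characters could not be transliterated"
file -i output.file       # expect: charset=us-ascii
cat output.file
```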

Convert UTF-8 to ASCII in Linux

Note: If the string //IGNORE is appended to the to-encoding, characters that can't be converted are discarded, and an error is displayed after conversion.

Again, if the string //TRANSLIT is appended to the to-encoding, as in the example above (ASCII//TRANSLIT), characters being converted are transliterated as needed and if possible. This means that if a character cannot be represented in the target character set, it is approximated through one or more similar-looking characters.

Consequently, any character that can’t be transliterated and is not in target character set is replaced with a question mark (?) in the output.
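
A quick //IGNORE demonstration (sample content assumed):

```shell
printf 'caf\303\251\n' > input.file
iconv -f UTF-8 -t ASCII//IGNORE input.file > stripped.file \
  || echo "iconv reported the characters it had to drop"
cat stripped.file         # the e-acute is simply gone: caf
```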

Convert Multiple Files to UTF-8 Encoding

Coming back to our main topic, to convert multiple or all files in a directory to UTF-8 encoding, you can write a small shell script called encoding.sh as follows:
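
The original listing is not reproduced here, so below is a minimal reconstruction based on the surrounding description: it loops over all .txt files in the current directory and writes a converted copy next to each one. The FROM_ENCODING value is an assumption; adjust it to your actual source charset.

```shell
#!/bin/sh
# encoding.sh - sketch: convert every *.txt file in the current directory
FROM_ENCODING="ISO-8859-1"     # assumed source encoding; adjust as needed
TO_ENCODING="UTF-8"
for file in *.txt; do
  [ -e "$file" ] || continue   # the directory may contain no .txt files
  iconv -f "$FROM_ENCODING" -t "$TO_ENCODING" "$file" -o "$file.utf8.converted"
done
```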

Save the file, then make the script executable. Run it from the directory where your files ( *.txt ) are located.


Important: You can also use this script for general conversion of multiple files from one given encoding to another; simply play around with the values of the FROM_ENCODING and TO_ENCODING variables, not forgetting the output file name "$file.utf8.converted".

For more information, look through the iconv man page.

To sum up this guide, understanding encoding and how to convert from one character encoding scheme to another is necessary knowledge for every computer user, and more so for programmers when it comes to dealing with text.




Linux Bash shell script for recursively converting all files with various charsets in a directory into UTF-8 (Shell-Skript für das rekursive Konvertieren von allen Files in einem Verzeichnis mit beliebigem Charset in UTF-8)

Linux Bash Shell script for recursively converting files, that are saved in any charset, into UTF-8.

LEXO created a script that can convert the charset of all files within a directory (and its subdirectories) into UTF-8.

How it works

  1. The script first creates a list of all files.
  2. This list is iterated over.
  3. For each file found, the file's extension is determined.
  4. If the file's extension is listed (in the filestoconvert array – see usage below), the file is converted using the iconv Linux tool.
  5. If iconv returns an error on conversion (sometimes that happens), you will see the file that produced the error when the conversion has finished. If that happens, check the file manually, or trust that it was converted correctly despite the error message. In our tests the iconv errors were always false negatives: all the files that were reported as "could not convert" were UTF-8 after all.

Usage

  1. First, edit the file and check the variable filestoconvert. This variable stores a list of file extensions like ".htm" or ".php". The script will only convert files whose extension is listed in this array, because there is no need to convert binary files like images or PDF documents into UTF-8. So take a close look at this array (variable) and add/remove the appropriate file extensions.
  2. It's a standard Linux shell script. You can run it in sh or bash like this:
    1. First set the script to be executable using chmod:
    2. Run the script on your shell console (SSH, telnet, whatever) like this:
  3. Before you run the script, make a backup of all your files. Conversions like this can always break some things.
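
The two steps above look like this (convert-to-utf8.sh is a placeholder name for the downloaded script; here the demo creates a stub so the commands can be shown end to end):

```shell
# Stub standing in for the real downloaded script
printf '#!/bin/sh\necho converting...\n' > convert-to-utf8.sh
chmod +x convert-to-utf8.sh     # step 1: make it executable
./convert-to-utf8.sh            # step 2: run it
```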

User feedback and colors

The script will generate a detailed output while running. Every message type has its own color:

  1. File successfully converted (green)
    Displays the filename, its original charset and that it has been converted to UTF-8
  2. File does not need conversion (blue)
    Displays the filename and the information that it is already in UTF-8 format
  3. File skipped (white)
    The file was not converted because its extension is not mentioned in the filestoconvert variable/array
  4. File conversion error (red)
    All the files that could not be converted are displayed in red text at the end of the process

Disclaimer

LEXO (we) do not take any responsibility or give any guarantee of success. The script works perfectly for our needs. Feel free to change it to your needs. You can share/distribute this information.



Download the script

You can download our shell script or copy/paste the code below into your own script (remember to change the file extension and set the shell script to be executable; see usage above):
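
The downloadable script itself is not included in this page, so here is a hedged sketch of the same idea: a recursive find over an extension whitelist, charset detection via file, conversion via iconv. It runs on a demo tree (convdemo/) created on the spot; point find at your own directory instead, after making a backup. The unknown-8bit fallback is an assumption, not part of the original.

```shell
#!/bin/sh
# Sketch of the recursive converter described above.
mkdir -p convdemo/sub
printf 'This is a gr\374n ISO-8859-1 sample.\n' > convdemo/sub/demo.php

# The -name patterns play the role of the filestoconvert whitelist
find convdemo -type f \( -name '*.php' -o -name '*.htm' -o -name '*.html' \
    -o -name '*.txt' -o -name '*.css' -o -name '*.js' \) |
while IFS= read -r f; do
  charset=$(file -bi "$f" | sed 's/.*charset=//')
  case $charset in
    utf-8|us-ascii)
      echo "already UTF-8/ASCII, skipping: $f"; continue ;;  # blue in the original
    unknown-8bit)
      charset=ISO-8859-1 ;;  # assumption: treat undetected 8-bit text as Latin-1
  esac
  if iconv -f "$charset" -t UTF-8 "$f" -o "$f.tmp"; then
    mv -- "$f.tmp" "$f"
    echo "converted from $charset: $f"                       # green in the original
  else
    rm -f -- "$f.tmp"
    echo "could not convert: $f" >&2                         # red in the original
  fi
done
```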


How to bulk convert all the file in a file system branch between Unix and Windows line break format?

Everybody knows 🙂 that in Windows plain text files lines are terminated with CR+LF, and in Unix & Linux with LF only. How can I quickly convert all my source code files from one format to the other and back?

4 Answers

That depends: if the files are under version control, this could be a rather unpopular history-polluting decision. Git has the option to automagically convert line endings on check-out.
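
For example, in a repository (demo repo created here just to show the setting):

```shell
mkdir -p gitdemo && cd gitdemo && git init -q
# Convert CRLF to LF on commit, leave files untouched on checkout:
git config core.autocrlf input
git config core.autocrlf    # prints: input
cd ..
```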

If you do not care and want to quickly convert, there are programs like fromdos / todos and dos2unix / unix2dos that do this for you. You can use find : find . -type f -name '*.php' -exec dos2unix '{}' + .

There are several dedicated programs, including

  • dos2unix and unix2dos from the dos2unix tools
  • todos and fromdos from the tofrodos tools

Simply pick the tool for the appropriate direction and pass the names of the files to convert on the command line.

If you don’t have either, but have Linux or Cygwin:
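
For instance, GNU sed can do the DOS-to-Unix direction in place (\r in the pattern is a GNU extension):

```shell
# DOS -> Unix in place with GNU sed: strip the CR at each end of line
printf 'one\r\ntwo\r\n' > crlf.txt
sed -i 's/\r$//' crlf.txt
```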

If you have perl:

With only POSIX tools (including BusyBox), to go from unix to dos, you’ll need to pass the CR character literally in the sed command.
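
One way to do that is to capture the CR with printf first:

```shell
# Unix -> DOS with POSIX sed: splice a literal CR into the expression
cr=$(printf '\r')
printf 'one\ntwo\n' > unixfile.txt
sed "s/\$/$cr/" unixfile.txt > dosfile.txt
```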

In the other direction, you can simply delete all CRs:
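
For example, with tr:

```shell
# DOS -> Unix: simply delete every CR byte
printf 'one\r\ntwo\r\n' > dosfile2.txt
tr -d '\r' < dosfile2.txt > unixfile2.txt
```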

You can use wildcards to convert many files in the same directory at once, e.g.

To convert all files in the current directory and its subdirectories, if your shell is zsh, you can use **/ , e.g.

You can use **/ in bash ≥ 4, but you need to run shopt -s globstar first (you can put this line in your ~/.bashrc ). You can use **/ in ksh93, but you need to run set -o globstar first (you can put this line in your ~/.kshrc ).
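
With zsh or dos2unix installed this is as short as dos2unix **/*.txt; the sketch below sticks to bash and tr so it is self-contained:

```shell
# bash >= 4: globstar makes **/ recurse into subdirectories
mkdir -p gsdemo/deep
printf 'x\r\n' > gsdemo/deep/a.txt
bash -c 'shopt -s globstar
for f in gsdemo/**/*.txt; do
  tr -d "\r" < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
done'
```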

If you can only use the tools that require a redirection, use a for loop.
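
Such a loop might look like this (demo directory and file names assumed):

```shell
# Filters read stdin and write stdout, so loop with a temporary file
mkdir -p loopdemo
printf 'y\r\n' > loopdemo/b.txt
for f in loopdemo/*.txt; do
  tr -d '\r' < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
done
```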

If you don't have **/ or need more complex matching to select which files to convert, use the find command. Here's a Linux/Cygwin example which converts all files under the current directory and its subdirectories recursively, except for files whose name ends in .o and files under subdirectories called bin .
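
A sketch of that find invocation; GNU sed stands in for dos2unix here so the example is self-contained, and it runs against a demo tree rather than the current directory:

```shell
# Convert everything except *.o files and anything under bin/ directories
mkdir -p exdemo/bin exdemo/src
printf 'k\r\n' > exdemo/src/main.c
printf 'k\r\n' > exdemo/bin/skip.c
find exdemo -name bin -prune -o -type f ! -name '*.o' \
  -exec sed -i 's/\r$//' {} +
```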

Here’s a POSIX example. We tell find to start a shell that can perform the necessary redirection.
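
A sketch of that approach (demo tree assumed):

```shell
# POSIX: find starts a small shell that performs the redirection
mkdir -p pxdemo
printf 'z\r\n' > pxdemo/c.txt
find pxdemo -type f -name '*.txt' -exec sh -c '
  tr -d "\r" < "$0" > "$0.tmp" && mv -- "$0.tmp" "$0"' {} \;
```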

You can make the find method slightly faster, at the expense of more complex code, by using a loop in the shell command.
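
With the + terminator, find hands each shell a whole batch of file names:

```shell
# One shell per batch of files instead of one shell per file
mkdir -p btdemo
printf 'q\r\n' > btdemo/d.txt
printf 'r\r\n' > btdemo/e.txt
find btdemo -type f -name '*.txt' -exec sh -c '
  for f; do
    tr -d "\r" < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
  done' sh {} +
```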


How can I automatically convert all source code files in a folder (recursively) to a single PDF with syntax highlighting?

I would like to convert the source code of a few projects to one printable file, to save on a USB drive and print out easily later. How can I do that?

First off I want to clarify that I only want to print the non-hidden files and directories (so no contents of .git, for example).

To get a list of all non-hidden files in non-hidden directories in the current directory you can run the find . -type f ! -regex ".*/\..*" ! -name ".*" command, as seen in the answer in this thread.

As suggested in that same thread I tried making a pdf file of the files by using the command find . -type f ! -regex ".*/\..*" ! -name ".*" ! -empty -print0 | xargs -0 a2ps -1 --delegate no -P pdf but unfortunately the resulting pdf file is a complete mess.

3 Answers

I was intrigued by your question and got kinda carried away. This solution will generate a nice PDF file with a clickable index and color highlighted code. It will find all files in the current directory and subdirectories and create a section in the PDF file for each of them (see the notes below for how to make your find command more specific).

It requires that you have the following installed (the install instructions are for Debian-based systems but these should be available in your distribution’s repositories):
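
On a Debian-based system the needed packages can be pulled in like this (package names assumed from the LaTeX packages referenced below):

```shell
sudo apt-get install texlive-latex-recommended latex-xcolor
```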


This should also install a basic LaTeX system if you don’t have one installed.

Once these are installed, use this script to create a LaTeX document with your source code. The trick is using the listings (part of texlive-latex-recommended ) and color (installed by latex-xcolor ) LaTeX packages. The \usepackage[..] is what makes the listings in the table of contents clickable links.
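
The script itself did not survive extraction, so here is a minimal sketch in the same spirit: it builds all.tex with one \section and one \lstinputlisting per file (GNU find assumed; escaping and styling are kept deliberately simple, and pdflatex is only invoked if present):

```shell
#!/bin/sh
# Sketch: build all.tex with one section per source file.
# File names containing spaces are not supported (see the notes below).
cat > all.tex <<'EOF'
\documentclass{article}
\usepackage{listings}
\usepackage[usenames,dvipsnames]{color}
\usepackage[colorlinks=true,linkcolor=blue]{hyperref}
\lstset{basicstyle=\small\ttfamily,breaklines=true,numbers=left,
        numberstyle=\tiny,keywordstyle=\color{blue}}
\begin{document}
\tableofcontents
EOF
find . -type f ! -regex ".*/\..*" ! -name ".*" ! -name "*~" \
       ! -name "all.tex" ! -name "*.pdf" | sort | while IFS= read -r f; do
  # Escape common LaTeX specials in the section title (the file name)
  title=$(printf '%s' "${f#./}" | sed 's/[_$&%#{}]/\\&/g')
  printf '\\section{%s}\n\\lstinputlisting{%s}\n' "$title" "$f" >> all.tex
done
printf '\\end{document}\n' >> all.tex
# Run twice so the table of contents resolves; skip quietly if TeX is absent
if command -v pdflatex >/dev/null 2>&1; then
  pdflatex -interaction=batchmode all.tex >/dev/null 2>&1 || true
  pdflatex -interaction=batchmode all.tex >/dev/null 2>&1 || true
fi
```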

Run the script in the directory that contains the source files

That will create a file called all.pdf in the current directory. I tried this with a couple of random source files I found on my system (specifically, two files from the source of vlc-2.0.0 ), and the resulting PDF came out with a clickable table of contents and color-highlighted code.

A couple of comments:

  • The script will not work if your source code file names contain spaces. Since we are talking about source code, I will assume they don’t.
  • I added ! -name "*~" to avoid editor backup files.

I recommend you use a more specific find command to find your files though, otherwise any random file will be included in the PDF. If your files all have specific extensions ( .c and .h for example), you should replace the find in the script with something like this
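
For instance, to limit the PDF to C sources only:

```shell
find . -type f \( -name '*.c' -o -name '*.h' \)
```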

  • Play around with the listings options, you can tweak this to be exactly as you want it.

Convert all file extensions to lower-case

I'm trying to lower-case all my file extensions, regardless of what they are. So far, from what I've seen, you have to specify which file extensions you want to convert to lower-case. However, I just want to lower-case everything after the last dot . in the name.

How can I do that in bash?

11 Answers

Solution

You can solve the task in one line:
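
The one-liner is not reproduced above, so here is a reconstruction matching the explanation that follows (GNU sed's \L is assumed; the demo runs in a scratch directory, replace lcdemo with . for the real run, and printf is used instead of echo so odd names pass through verbatim):

```shell
mkdir -p lcdemo
touch lcdemo/IMG_0001.JPG lcdemo/Notes.TXT
find lcdemo -type f -name '*.*' -exec sh -c '
  a=$(printf "%s\n" "$0" | sed "s/\.[^.]*\$/\L&/")
  if [ "$a" != "$0" ]; then mv -- "$0" "$a"; fi' {} \;
ls lcdemo
```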

Note: this will break for filenames that contain newlines. But bear with me for now.

Example of usage

Explanation

You find all files in the current directory ( . ) that have a period in their name ( -name '*.*' ) and run the command for each file:

That command tries to convert the file extension to lowercase (that is the sed part)

and saves the result to a variable.

If something was changed ( [ "$a" != "$0" ] ), the file is renamed with mv "$0" "$a" .

The name of the file being processed ( {} ) is passed to sh -c as an additional argument, and it is seen inside the command line as $0 . This makes the script safe, because in this case the shell takes {} as data, not as part of the code, as it would be if it were specified directly in the command line. (I thank @gniourf_gniourf for pointing me at this really important issue.)

As you can see, if you use {} directly in the script, it's possible to have shell injections from the filenames, something like:

In this case the injection will be considered by the shell as part of the code and will be executed.

While-version

A clearer, but a little bit longer, version of the script:
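
A reconstruction of that loop (GNU sed's \L assumed; demo directory used in place of . ):

```shell
# POSIX sh version; breaks on filenames containing newlines
mkdir -p whiledemo
touch whiledemo/Report.PDF whiledemo/data.CSV
find whiledemo -type f -name '*.*' | while IFS= read -r f; do
  a=$(printf '%s\n' "$f" | sed 's/\.[^.]*$/\L&/')   # \L lowercases the match
  if [ "$a" != "$f" ]; then mv -- "$f" "$a"; fi
done
```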

This still breaks for filenames containing newlines. To fix this issue, you need a find that supports -print0 (like GNU find ) and Bash (so that read supports the -d delimiter switch):
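
A sketch of that NUL-delimited variant (wrapped in bash -c so read -d '' is available):

```shell
# Requires GNU find (-print0) and bash (read -d '')
mkdir -p nuldemo
touch nuldemo/Strange.TXT
bash -c '
find nuldemo -type f -name "*.*" -print0 |
while IFS= read -r -d "" f; do
  a=$(printf "%s\n" "$f" | sed "s/\.[^.]*\$/\L&/")
  if [ "$a" != "$f" ]; then mv -- "$f" "$a"; fi
done'
```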

This still breaks for files that have trailing newlines in their names (they will be absorbed by the a=$(...) command substitution). If you really want a foolproof method (and you should!), with a recent version of Bash (Bash ≥ 4.0) that supports the ,, parameter expansion, here's the ultimate solution:
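
A reconstruction of that foolproof form: find hands batches of names straight to bash, and the ,, expansion lowercases the extension without any command substitution:

```shell
mkdir -p ultdemo
touch ultdemo/A.JPG ultdemo/b.png
find ultdemo -type f -name '*.*' -exec bash -c '
  for f; do
    ext=${f##*.}
    a=${f%.*}.${ext,,}
    if [ "$a" != "$f" ]; then mv -- "$f" "$a"; fi
  done' bash {} +
```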

Back to the original solution

Or in one find go (back to the original solution, with some fixes that make it really foolproof):
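
A sketch of that one-shot form, spawning one bash per file (the notes below explain the -type f choice and the efficiency trade-off):

```shell
mkdir -p godemo
touch godemo/Mixed.Case.TXT
find godemo -type f -name '*.*' -exec bash -c '
  ext=${0##*.}; a=${0%.*}.${ext,,}
  if [ "$a" != "$0" ]; then mv -- "$0" "$a"; fi' {} \;
```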

I added -type f so that only regular files are renamed. Without this, you could still have problems if directory names are renamed before file names. If you also want to rename directories (and links, pipes, etc.), you should use -depth, so that find performs a depth-first search.

You may argue that it's not efficient to spawn a bash process for each file found. That's correct, and the previous loop version would then be better.

