Linux split binary file

Learning Linux Commands: split

1. Introduction

If you have not come across the split command, you are missing out. As its name suggests, it splits a file into several smaller files. split works on any file, text or binary. This is handy when a file does not fit on a flash drive or when you need to send a large file by email. You can also cut large text files, such as logs, into pieces of a given size. This article describes the syntax and use of the split command.

2. Frequently Used Options

-b, --bytes=SIZE
Sets the size of each output file.
-d, --numeric-suffixes
Uses numeric suffixes instead of alphabetic ones.
-n, --number=CHUNKS
Sets the number of output files.

3. Splitting a File into Pieces of a Given Size

Unless told otherwise, split names the pieces starting with x, followed by two more characters in alphabetical order. The first file is called xaa, followed by xab, xac, and so on; the last possible two-letter name is xzz. A two-letter suffix therefore allows at most 676 pieces (26×26). Take a 10 MB file as an example:

Now we can split this file into 1 MB pieces with the -b option:
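A minimal sketch of these two steps, assuming GNU coreutils. The file name file.bin and the scratch directory are made up for illustration, and a file of zero bytes stands in for real data.

```shell
# Work in a scratch directory so the pieces do not mix with other files.
workdir=$(mktemp -d)
cd "$workdir"

# Create a 10 MiB example file (file.bin is a hypothetical name).
head -c 10485760 /dev/zero > file.bin

# Cut it into 1 MiB pieces; with no prefix given they are named xaa, xab, ...
split -b 1M file.bin

# List the pieces with their sizes.
ls -l x??
```

Here 1M means 1024×1024 bytes, so a 10 MiB input yields ten pieces, xaa through xaj.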

4. Splitting a File into a Given Number of Pieces

You may also need to split a file into a specific number of pieces. In the example below, the -n option cuts our 10 MB file into three parts:
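A sketch of such a call, again assuming GNU split and a made-up file named file.bin:

```shell
# Scratch directory and a 10 MiB stand-in file (hypothetical name).
workdir=$(mktemp -d)
cd "$workdir"
head -c 10485760 /dev/zero > file.bin

# -n 3 asks for three pieces: xaa, xab, xac.
split -n 3 file.bin

# The piece sizes add up to the size of the original file.
wc -c x??
```

Because 10 MiB is not divisible by three, the pieces cannot all have the same size; split decides how to distribute the remainder.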

5. Using Numeric Suffixes

split can name its output files with numeric suffixes instead of alphabetic ones. The command below creates one file for each character of the string "linuxcareer.com". The trick is that we set the piece size in bytes, and 1 byte corresponds to 1 character here. We also use numeric suffixes instead of alphabetic ones:
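One way this could look, assuming GNU split reading from standard input; printf is used here so that no trailing newline is added to the string:

```shell
# Scratch directory for the one-byte pieces.
workdir=$(mktemp -d)
cd "$workdir"

# printf '%s' writes the 15 characters without a trailing newline.
# -b 1 makes every piece one byte long, -d numbers them x00, x01, ...
printf '%s' 'linuxcareer.com' | split -d -b 1

# List the pieces and show the content of the first one.
ls x??
cat x00
```

The string has 15 characters, so the pieces run from x00 to x14.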

6. Restoring the Original File

Splitting a file is easy, but how do we put it back together? Suppose we have an ISO image and want to split it in half.

Now we cut the image in half with the -n option:

We can restore the original file ubuntu-12.04.1-server-amd64.iso with the cat command and STDOUT redirection:

To make sure that restored-ubuntu-12.04.1-server-amd64.iso was reassembled correctly and is an exact copy of the original ubuntu-12.04.1-server-amd64.iso, we use md5sum to compute a checksum of both files:
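The whole round trip can be sketched on a small stand-in file. The names image.iso and restored-image.iso are hypothetical, and 4 MiB of random data replaces the real ISO image; GNU coreutils is assumed.

```shell
# Scratch directory with a 4 MiB stand-in for the ISO image.
workdir=$(mktemp -d)
cd "$workdir"
head -c 4194304 /dev/urandom > image.iso

# Cut the file in half: xaa and xab.
split -n 2 image.iso

# The shell expands xa? in alphabetical order, which is the original order.
cat xa? > restored-image.iso

# Identical checksums mean the restored file matches the original.
md5sum image.iso restored-image.iso
```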

As you can see, the restored file is an exact copy of the original, because the checksums match.

7. Splitting tar Archives on the Fly

Now that we have covered the basics, let us look at a more involved example.
In the example below we archive the directory /tmp/Software. Instead of writing a single archive file, however, we split the archive on the fly with split.

As you can see, the archive of the /tmp/Software directory is cut into pieces of at most 2 MB. Next we restore the original directory:
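A sketch of both steps, archiving with an on-the-fly split and restoring afterwards. It assumes GNU tar and GNU split; the directory Software is created locally as a stand-in for /tmp/Software, and the piece prefix Software.tar.gz. is a made-up name.

```shell
# Scratch directory with a small stand-in for the Software directory.
workdir=$(mktemp -d)
cd "$workdir"
mkdir Software restored
head -c 5242880 /dev/urandom > Software/big.bin   # 5 MiB of random data
echo "readme" > Software/README

# tar writes the compressed archive to stdout (-f -); split reads it from
# stdin (-) and writes 2 MiB pieces Software.tar.gz.aa, Software.tar.gz.ab, ...
tar -czf - Software | split -b 2M - Software.tar.gz.

# Restore: concatenate the pieces in order and unpack into ./restored.
cat Software.tar.gz.* | tar -xzf - -C restored
ls -l Software.tar.gz.* restored/Software
```

No intermediate archive file is ever written to disk; only the pieces exist.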

8. Conclusion

As you can see, on GNU/Linux you are limited only by your imagination and skills, not by the size of your wallet. This article covers only the basics of split. You can learn more from its manual page, man split.

Split Command in Linux with Examples

The split command in Linux splits a large file into smaller files. By default each output file holds 1000 lines, and the number of lines can be changed as needed.

The output files are named PREFIXaa, PREFIXab, PREFIXac, and so on. By default the PREFIX is x and each output file holds 1000 lines; both settings can be changed easily. split is commonly used on log and archive files, which tend to be very large, to break them into smaller files for analysis.

Syntax: split [OPTION]... [FILE [PREFIX]]

Working with Split Command

1. Split a file into smaller files. Assume a file named index.txt. Running split index.txt breaks it into pieces.

index.txt is split into two pieces named ‘xaa’ and ‘xab’. Each piece holds at most 1000 lines by default. The pieces are named ‘xaa’ and ‘xab’ because we have not set a prefix.

2. Split a file based on the number of lines.

With the -l option, index.txt is split into smaller files that each hold the requested number of lines.
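A minimal sketch, assuming a made-up ten-line index.txt and four lines per piece:

```shell
# Scratch directory with a ten-line stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > index.txt

# -l 4 puts at most four lines into each piece: xaa, xab, xac.
split -l 4 index.txt

# Show how many lines each piece received.
wc -l x??
```

The first two pieces hold four lines each and the last one holds the remaining two.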

3. Split command with verbose option. split can also run in verbose mode with ‘--verbose’, which prints a diagnostic message each time a new output file is created.

Here, index.txt is split into smaller files, and the verbose output reports each file as it is created.
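A sketch of a verbose run, assuming GNU split and a made-up ten-line index.txt; the exact wording of the diagnostic messages may differ between versions:

```shell
# Scratch directory with a ten-line stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > index.txt

# --verbose prints one message for every output file as it is opened.
split -l 4 --verbose index.txt
```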

Note: The -l 4 option is not required here. It is used only for illustration.

4. Split a file by size using the ‘-b’ option.

Here, index.txt is split into separate files called indexaa, indexab, and so on, each containing 16 bytes of data (the last one may be smaller).
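A sketch of such a call, assuming a made-up 40-byte index.txt so that the result is easy to check:

```shell
# Scratch directory with a 40-byte stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
printf '%040d' 0 > index.txt      # writes forty '0' characters

# -b 16 sets the piece size in bytes; "index" is the output prefix.
split -b 16 index.txt index

# index?? matches indexaa, indexab, ... but not index.txt itself.
wc -c index??
```

Forty bytes give two full 16-byte pieces and a final piece of 8 bytes.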

5. Change in suffix length. By default, the suffix length is 2. We can change it using the ‘-a’ option.

Here the split files have a suffix length of 4.
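For illustration, a sketch with a made-up ten-line index.txt:

```shell
# Scratch directory with a ten-line stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > index.txt

# -a 4 makes the suffix four characters long: xaaaa, xaaab, xaaac.
split -l 4 -a 4 index.txt
ls x????
```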

Note: The -l 4 option is not required here. It is used only for illustration.

6. Split files created with numeric suffix. By default, output names have the form x** where ** are two letters. The ‘-d’ option switches the suffix to digits.
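A sketch with a made-up ten-line index.txt, assuming GNU split:

```shell
# Scratch directory with a ten-line stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > index.txt

# -d replaces the letter suffixes with digits: x00, x01, x02.
split -l 4 -d index.txt
ls x??
```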

Note: The -l 4 option is not required here. It is used only for illustration.

7. Create n chunks output files. To split a file into three chunks, use the ‘-n’ option, which sets the number of output files.

It will create three split files.

8. Split file with a custom prefix. We can give the split output files a custom prefix by passing it as the last argument, for example to create output files whose names start with index.

Split output files with the index prefix will be created.
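A sketch of a call with a custom prefix. The prefix split_index_ and the ten-line index.txt are made up for illustration.

```shell
# Scratch directory with a ten-line stand-in for index.txt.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10 > index.txt

# The last argument is the prefix; pieces become split_index_aa, split_index_ab, ...
split -l 4 index.txt split_index_
ls split_index_*
```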

Note: The -l 4 option is not required here. It is used only for illustration.

9. Avoid zero-sized split files. Splitting a small file into a large number of chunks can produce zero-size output files. They add no value, so we suppress them with the ‘-e’ option.

With this option no zero-size output files are created.
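A sketch comparing both behaviours, assuming GNU split. The three-byte file tiny.txt and the prefixes plain_ and elided_ are made up, and -n is used instead of -l because empty pieces arise when more chunks are requested than the file has bytes.

```shell
# Scratch directory with a three-byte stand-in file.
workdir=$(mktemp -d)
cd "$workdir"
printf 'abc' > tiny.txt

# Five chunks from three bytes: some of the five pieces must be empty.
split -n 5 tiny.txt plain_

# With -e the empty pieces are not written at all.
split -n 5 -e tiny.txt elided_

ls -l plain_* elided_*
```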

Note: The -l 4 option is not required here. It is used only for illustration.

10. Split the file into two files of equal length. To split a file into two equal parts, use the ‘-n’ option: ‘-n 2’ splits the file into two files of equal size (the sizes differ by one byte when the file size is odd).

Split a Binary File into Multiple Chunks

This page describes a program, splits, which splits a file into multiple pieces of a specified size.

splits – split a binary file into multiple chunks

SYNOPSIS

DESCRIPTION

splits reads an input file infile (or standard input if infile is “-”) and creates multiple output files which consist of the contents of infile broken into sequential pieces of size chunksize, given in “K” (units of 1024 bytes). If no chunksize is specified, splits assumes an output file size of 100K (102400 bytes).

splits is useful when transmitting large binary files over unreliable modem links with uucp. An accidental disconnection during a transmission causes the loss of everything received up to that point, requiring the user to start over from scratch. Breaking the file into multiple chunks with splits means that a disconnection only requires re-sending the chunk being transmitted when the hang-up occurred; if the chunk size is chosen so that sending one chunk takes substantially less time than the average interval between disconnects, lost communication time will be minimised.

splits may also be used to split large files being sent by electronic mail into pieces small enough to pass through intermediate mail forwarding sites. Some Internet mailers cannot process messages larger than 64K. splits allows you to circumvent this limitation. When sending binary files through electronic mail, you’ll also have to encode the output of splits with a program such as uuencode or base64 since some mail systems accept only 7 bit ASCII characters.

Finally, splits allows subdividing large files into pieces which fit on various kinds of removable media such as 1.44 megabyte floppy discs.

On Unix the collection of chunks created with splits can be reassembled with cat. Simply use:

to concatenate all the chunks together into an output file identical to the original splits input file. MS-DOS users can use the:

command to concatenate chunks created with splits.

When sending files in multiple chunks, it’s wise to accompany the transmission with a checksum created, for example, with sum or md5, so that the recipient can verify that all the pieces were correctly received and assembled in the proper order.
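The checksum workflow can be sketched as follows. This is an illustration under stated assumptions, not the splits program itself: GNU split stands in for splits, md5sum stands in for md5, and the names payload.bin, payload.md5 and received are made up.

```shell
# Sender side: a 300000-byte stand-in payload and its checksum file.
workdir=$(mktemp -d)
cd "$workdir"
head -c 300000 /dev/urandom > payload.bin
md5sum payload.bin > payload.md5

# GNU split stands in for splits: 100K chunks named payload.000, payload.001, ...
split -b 100K -d -a 3 payload.bin payload.

# Receiver side: the chunks and the checksum file arrive in another directory.
mkdir received
cp payload.0?? payload.md5 received/
cd received

# Reassemble in numeric order, then verify against the transmitted checksum.
cat payload.0?? > payload.bin
md5sum -c payload.md5
```

If a chunk were missing, damaged, or concatenated out of order, md5sum -c would report a mismatch and return a non-zero status.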

OPTIONS

FILES

If infile is “-”, splits obtains its input from standard input. In this case the output files are named StdIn.001, StdIn.002, etc.

splits assumes it can allocate memory buffers as large as the chunk size and that it can read and write blocks that large. Surgery will be required if you wish to port it to 16 bit architectures.

splits must read input and write output files in binary mode, without any translation of end of line or end of file characters. The splits source code contains code conditional on WIN32 which sets binary mode on that system. If you’re porting splits to another platform which distinguishes text and binary I/O (Unix systems do not), you’ll need to add equivalent code to set binary I/O mode.

Download splits.zip (Zipped archive)

The program is provided as splits.zip, a Zipped archive containing a ready-to-run WIN32 command-line executable program, splits.exe (compiled using Microsoft Visual C++ 5.0), and the source code along with a Makefile to build the program under Unix.

SEE ALSO

EXIT STATUS

splits returns status 0 if processing was completed without errors, 1 if an error occurs while splitting a file, and 2 if invalid command line arguments are given.

COPYING

This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions. This software is provided “as is” without express or implied warranty.

How to Split Large Text File into Smaller Files in Linux

Linux has several utilities for breaking large files into smaller ones. split and csplit are two popular commands for this purpose. They help break down big log files and even archive files so that the pieces fit on smaller storage media such as USB drives. Splitting can also speed up network transfers in some cases, because several small files can be transferred in parallel.

In this tutorial, I’ll explain how to use the split and csplit utilities to break down large files in Linux.

Split

To split large files into smaller files, we can use the split command, whose general form is split [options] filename prefix.

Replace filename with the name of the large file you wish to split, and prefix with the name you wish to give the small output files. You can omit [options], or replace it with options such as -l (lines per file) or -b (bytes per file), both of which appear in the examples in this section.

The split command names each output file with the prefix followed by a suffix that indicates its order. By default the suffix is aa for the first output file and proceeds through the alphabet to zz for subsequent files. On most systems the default prefix is x.

Split Examples

split cuts the file into pieces of n lines each and names them PREFIXaa, PREFIXab, PREFIXac, and so on. By default the PREFIX is x and each piece holds 1000 lines.

Split a file into multiple pieces with the default settings

I have a log file, the system log, with 1099 lines. Let’s see what happens to it after splitting it with this command.

The command splits the log file into two files, xaa and xab: the first holds 1000 lines and the second holds the remaining 99.
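A sketch that reproduces these numbers. The file system.log is a made-up stand-in filled with 1099 numbered lines.

```shell
# Scratch directory with a 1099-line stand-in for the system log.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 1099 > system.log

# No options: 1000 lines per piece, default prefix x.
split system.log

# xaa holds the first 1000 lines, xab the remaining 99.
wc -l xaa xab
```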

Split the file, based upon the number of lines

We can split the file into multiple pieces based on the number of lines using the -l option. Here, I’m splitting my 1099-line system log into smaller files of 200 lines each.

The command splits the log file into six files: five with 200 lines each and a last one with the remaining 99 lines.
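The same made-up 1099-line system.log can be used to sketch this case:

```shell
# Scratch directory with a 1099-line stand-in for the system log.
workdir=$(mktemp -d)
cd "$workdir"
seq 1 1099 > system.log

# -l 200 puts 200 lines into each piece: xaa ... xaf.
split -l 200 system.log
wc -l x??
```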

Split a large file into 500MB files

You can use the -b option to specify the size limit for the output files. I used it to split my 1GB Apache log file into 500MB pieces. Note that GNU split reads a size of 500M as 500×1024×1024 bytes, while 500MB means 500×1000×1000 bytes.

Split a large file into 200MB files with the given prefix

You can use the -b option to specify the 200M file size and give the required prefix as the second argument. I used this to split my 1GB Apache log into 200MB files with the prefix split.log.

In this example, the log file is broken down into 200MB files whose names start with the required prefix.
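A scaled-down sketch of the same idea, so that it runs quickly: a 1000 KiB file is cut into 200 KiB pieces instead of 1GB into 200MB. The file name apache.log is made up; the prefix split.log follows the text.

```shell
# Scratch directory with a 1000 KiB stand-in for the Apache log.
workdir=$(mktemp -d)
cd "$workdir"
head -c 1024000 /dev/zero > apache.log

# -b 200K sets the piece size; split.log is the output prefix.
split -b 200K apache.log split.log

# Pieces are named split.logaa, split.logab, ...
ls -l split.log??
```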

Split the file and name it with numbers

You can use the -d option to name the files with numeric suffixes such as 00, 01, 02, and so on, instead of aa, ab, ac. I used it to split my 1GB Apache log into 200MB files with the prefix log and numeric suffixes instead of letters.

You can see the manual page of split command using the command man split to see more information.

Csplit

csplit is another utility; it divides a single file into multiple files at boundaries determined by context lines.

The files created by csplit normally have names of the form

xxnumber
where number is a two digit decimal number which begins at zero and it increments by one for each new file that csplit creates.

csplit also displays the size, in bytes, of each file that it creates as output.

Csplit Examples

By default, the files that csplit produces in output have ‘xx’ as the prefix and the numbers produced in the output are the byte count for the files the command produced.

Split files based on the number of lines

I have a file that contains 8 lines of domain names, and I want to split it at the fourth line. This is done by passing ‘4’ as a command-line argument after the command and the file name.

With 4 as the argument, csplit splits our domainslist file at the 4th line: the first piece holds lines 1 to 3 and the second piece starts at line 4. The numbers printed are the byte counts of the files the command produced. As expected, two files are produced, namely xx00 and xx01.
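A sketch of this call, assuming GNU csplit. The eight entries in domainslist are made-up placeholder names.

```shell
# Scratch directory with an eight-line stand-in for domainslist.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist

# Split before line 4; csplit prints the byte count of each piece.
csplit domainslist 4

# xx00 holds lines 1-3, xx01 holds lines 4-8.
wc -l xx00 xx01
```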

Split files using regular expressions

We can use regular expressions and repeat counts with the csplit command. For example, in the previous case, if you want the command to repeat the pattern one more time, add the repeat count {1} after the pattern.

In this case, we get three output files.

You can use the asterisk wildcard {*} to tell csplit to repeat your split as many times as possible.
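A sketch of both forms, assuming GNU csplit. The file names, the notes.txt content and the prefixes rep_ and sec_ (set with -f) are made up for illustration; {*} is shown with a regular-expression pattern.

```shell
# Scratch directory with two stand-in input files.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist
printf '%s\n' 'intro' '## one' 'a' '## two' 'b' > notes.txt

# Line number with a repeat count: split before line 4, then once more
# after the next 4 lines. Pieces: lines 1-3, 4-7 and 8.
csplit -f rep_ domainslist 4 '{1}'

# Regular expression with {*}: start a new piece at every line that
# begins with "## ", as many times as the pattern matches.
csplit -f sec_ notes.txt '/^## /' '{*}'

wc -l rep_* sec_*
```

The braces are quoted so that the shell passes them to csplit unchanged.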

Split files with the given prefix

By default, csplit splits files and gives the output files the prefix xx. You can change that default prefix with the -f option followed by the required prefix.

For example, the following command will produce files having ‘domain’ as prefix.
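A sketch of such a command, with a made-up eight-line domainslist:

```shell
# Scratch directory with an eight-line stand-in for domainslist.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist

# -f domain replaces the default xx prefix: domain00, domain01.
csplit -f domain domainslist 4
ls domain0?
```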

Split a file by suppressing a line that matches the input pattern

csplit provides an option to suppress lines that match the input pattern. The option in question is --suppress-matched.

For example, the following command splits our file at line 4 (xx00 will contain lines up to line 3, while xx01 will contain the rest of the lines excluding line 4).
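A sketch of this call, assuming a GNU csplit version that supports --suppress-matched and a made-up eight-line domainslist:

```shell
# Scratch directory with an eight-line stand-in for domainslist.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist

# Split at line 4 and drop that boundary line from the output.
csplit --suppress-matched domainslist 4

# xx00 holds lines 1-3, xx01 holds lines 5-8; line 4 appears in neither.
cat xx00
cat xx01
```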

Customize the number of digits in the output files names

By default, the number of digits that follow the prefix in the output file name is 2. The -n option customizes this number. For example, to get names like xx001, pass the number of digits with -n 3:
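A sketch with a made-up eight-line domainslist:

```shell
# Scratch directory with an eight-line stand-in for domainslist.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist

# -n 3 uses three digits after the prefix: xx000, xx001.
csplit -n 3 domainslist 4
ls xx???
```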

Forcing csplit to save the output file in case of error

By default, csplit removes the output files it has created if an error occurs. You can force it to keep them with the -k option. The following example shows the difference between running the command with and without -k. In the first run, the command is meant to split our file ‘domainslist’ at line 3 and then repeat the split twice, three lines at a time. Since our source file has only eight lines, the first repetition succeeds but the second one runs out of lines. csplit reports an error, and no output files are left behind.

When we run the same command with the -k option, the output files are not deleted:
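A sketch of both runs, assuming GNU csplit. The directory names plain and kept and the domainslist content are made up; -f is used only to send each run's pieces into its own directory.

```shell
# Scratch directory with an eight-line stand-in for domainslist.
workdir=$(mktemp -d)
cd "$workdir"
for i in 1 2 3 4 5 6 7 8; do echo "site$i.example"; done > domainslist
mkdir plain kept

# Without -k: the second repetition runs out of lines, csplit reports an
# error and deletes the pieces it had already written.
csplit -f plain/xx domainslist 3 '{2}' || echo "csplit failed, pieces removed"

# With -k: the same error occurs, but the pieces written so far are kept.
csplit -k -f kept/xx domainslist 3 '{2}' || echo "csplit failed, pieces kept"

ls plain kept
```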

You can check the man page for this tool using man csplit to get more information about this.

Wrapping up

A Linux user may not need these command-line utilities every day, but they are important tools that will help you in server administration. I hope this article explained the basic options and uses of these tools. Please post your comments and suggestions.
