- Random Stuff
- Monday, April 16, 2012
- Convert Unix, Windows, Mac line endings using OS X command
- jonlabelle / crlf.py
- Configure Visual Studio to use UNIX line endings
- 9 Answers 9
- How to find out line-endings in a text file?
- 11 Answers 11
- Try file , then file -k , then dos2unix -ih
- Try file -k
- Real world example: Certificate Encoding
- Try dos2unix -ih
- Further reading
Random Stuff
Various ramblings of sysadmin, programmer, dancer, coffee snob, food lover and Winnipegger.
Monday, April 16, 2012
Convert Unix, Windows, Mac line endings using OS X command
Today I had to copy some MySQL data from a Debian server into a test environment on my MacBook. While importing data from tab-delimited text files, I noticed warnings that data in the last column of several tables was being truncated. I looked at the tables and noticed MySQL doing some very strange formatting when printing them. It looked almost as if the last column was padded with a bunch of white space. I opened the import file in TextWrangler and it appeared fine, but when I looked in the document options, I saw this:
The good ol’ EOL (end-of-line) character.
Different operating systems use different characters to mark the end of line:
- Unix / Linux / OS X use LF (line feed, '\n', 0x0A)
- Macs prior to OS X use CR (carriage return, '\r', 0x0D)
- Windows / DOS use CR+LF (carriage return followed by line feed, '\r\n', 0x0D 0x0A)
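To make the three conventions concrete, here is a minimal Python sketch that labels the terminator of every line in a byte string. The function name classify_lines is made up for illustration; it relies on bytes.splitlines, which splits on LF, CR and CR+LF.

```python
def classify_lines(data):
    """Return one label per line: 'CRLF', 'LF', 'CR' or 'none'."""
    labels = []
    # keepends=True keeps the terminator attached to each line.
    for line in data.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            labels.append("CRLF")   # Windows / DOS
        elif line.endswith(b"\n"):
            labels.append("LF")     # Unix / Linux / OS X
        elif line.endswith(b"\r"):
            labels.append("CR")     # Macs prior to OS X
        else:
            labels.append("none")   # last line without a terminator
    return labels


if __name__ == "__main__":
    print(classify_lines(b"a\r\nb\nc\rd"))
```

A file whose labels are not all the same has mixed line endings, which is worth knowing before converting anything.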
I’m guessing the person who sent me those files first transferred them to his Windows machine in ASCII mode, so newline characters got automatically converted during transfer.
Since some of the files were very big, instead of changing line endings in TextWrangler I decided to use the command line (shocking, I know).
First I executed
to confirm existence of the dreaded ^M (carriage return) at the end of every line, and then ran
to generate new files without CR characters.
tr (translate characters) is a nice little utility that does just that: it substitutes one character with another or deletes it (as in my example). It’s available on pretty much any *nix distro, so there is no need to install additional software.
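Where tr is not at hand, deleting the CR characters can be approximated in Python. This is only a sketch under assumptions: the name strip_cr, its parameters and the chunk size are invented for illustration. It reads in chunks so that very big files never have to fit in memory.

```python
def strip_cr(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, deleting every CR (0x0D) byte."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Single bytes are removed, so a CR+LF pair that straddles
            # two chunks is still handled correctly.
            dst.write(chunk.replace(b"\r", b""))
```

Note that this removes every CR, including any that appear in the middle of a line, and it would join all lines of a CR-only (old Mac) file into one.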
jonlabelle / crlf.py
#!/usr/bin/env python
"""Replace line breaks, from one format to another."""

from __future__ import print_function

import argparse
import glob
import os
import sys
import tempfile
from stat import ST_ATIME, ST_MTIME

# Files are read and written in binary mode, so the markers are bytes.
LF = b'\n'
CRLF = b'\r\n'
CR = b'\r'


def _normalize_line_endings(lines, line_ending='unix'):
    r"""Normalize line endings to unix (\n), windows (\r\n) or mac (\r).

    :param lines: The lines to normalize.
    :param line_ending: The line ending format.
        Acceptable values are 'unix' (default), 'windows' and 'mac'.
    :return: Line endings normalized.
    """
    # Collapse everything to LF first, then expand to the target format.
    lines = lines.replace(CRLF, LF).replace(CR, LF)
    if line_ending == 'windows':
        lines = lines.replace(LF, CRLF)
    elif line_ending == 'mac':
        lines = lines.replace(LF, CR)
    return lines


def _copy_file_time(source, destination):
    """Copy one file's atime and mtime to another.

    :param source: Source file.
    :param destination: Destination file.
    """
    file1, file2 = source, destination

    try:
        stat1 = os.stat(file1)
    except os.error:
        sys.stderr.write(file1 + ' : cannot stat\n')
        sys.exit(1)

    try:
        os.utime(file2, (stat1[ST_ATIME], stat1[ST_MTIME]))
    except os.error:
        sys.stderr.write(file2 + ' : cannot change time\n')
        sys.exit(2)


def _create_temp_file(contents):
    """Create a temp file.

    :param contents: The temp file contents.
    :return: The absolute path of the created temp file.
    """
    tf = tempfile.NamedTemporaryFile(mode='wb', suffix='txt', delete=False)
    tf.write(contents)
    tf.close()
    return tf.name


def _delete_file_if_exists(filepath):
    """Delete the file if it exists.

    :param filepath: The file path.
    """
    if os.path.exists(filepath):
        os.remove(filepath)


def _read_file_data(filepath):
    """Read file data.

    :param filepath: The file path.
    :return: The file contents.
    """
    with open(filepath, 'rb') as f:
        return f.read()


def _write_file_data(filepath, data):
    """Write file data.

    :param filepath: The file path.
    :param data: The data to write.
    """
    with open(filepath, 'wb') as f:
        f.write(data)


def main():
    """Main."""
    parser = argparse.ArgumentParser(
        prog='crlf',
        description='Replace CRLF (windows) line endings with LF (unix) '
                    'line endings in files, and vice-versa')

    parser.add_argument(
        '-q', '--quiet',
        help='suppress descriptive messages from output',
        action='store_true',
        default=False)

    parser.add_argument(
        '-n', '--dryrun',
        help='show changes, but do not modify files',
        action='store_true',
        default=False)

    parser.add_argument(
        '-w', '--windows',
        help='replace LF (unix) line endings with CRLF (windows) line endings',
        action='store_true',
        default=False)

    parser.add_argument(
        '-u', '--unix',
        help='replace CRLF (windows) line endings with LF (unix) '
             'line endings (default)',
        action='store_true',
        default=False)

    parser.add_argument(
        '-t', '--timestamps',
        help="maintains the modified file's time stamps (atime and mtime)",
        action='store_true',
        default=False)

    parser.add_argument(
        'files',
        nargs='+',
        help="a list of files or file glob patterns to process",
        default='.')

    if len(sys.argv) < 2:
        parser.print_help()
        sys.exit(2)

    args = parser.parse_args()

    if args.windows is True and args.unix is True:
        sys.stderr.write("Ambiguous options specified, 'unix' and 'windows'. "
                         "Please choose one option, or the other.\n")
        sys.exit(2)

    files_to_process = []
    for arg_file in args.files:
        files_to_process.extend(glob.glob(arg_file))

    if not files_to_process:
        if args.quiet is False:
            sys.stderr.write('No files matched the specified pattern.\n')
        sys.exit(2)

    if args.dryrun is True and args.quiet is False:
        print('Dry-run only, files will NOT be modified.')

    for file_to_process in files_to_process:
        if os.path.isdir(file_to_process):
            if args.quiet is False:
                print("- '{0}' : is a directory (skip)".format(file_to_process))
            continue

        if os.path.isfile(file_to_process):
            data = _read_file_data(file_to_process)

            # A NUL byte is taken as a sign of binary content.
            if b'\0' in data:
                if args.quiet is False:
                    print("- '{0}' : is a binary file (skip)".format(file_to_process))
                continue

            if args.windows is True:
                new_data = _normalize_line_endings(data, line_ending='windows')
            else:
                new_data = _normalize_line_endings(data, line_ending='unix')

            if new_data != data:
                if args.quiet is False:
                    if args.windows is True:
                        if args.dryrun is True:
                            print("+ '{0}' : LF would be replaced with CRLF".format(file_to_process))
                        else:
                            print("+ '{0}' : replacing LF with CRLF".format(file_to_process))
                    else:
                        if args.dryrun is True:
                            print("+ '{0}' : CRLF would be replaced with LF".format(file_to_process))
                        else:
                            print("+ '{0}' : replacing CRLF with LF".format(file_to_process))

                tmp_file_path = ""

                if args.dryrun is False:
                    try:
                        if args.timestamps is True:
                            # create a temp file with the original file
                            # contents and copy the old file's atime and mtime
                            tmp_file_path = _create_temp_file(data)
                            _copy_file_time(file_to_process, tmp_file_path)

                        # overwrite the current file with the modified contents
                        _write_file_data(file_to_process, new_data)

                        if args.timestamps is True:
                            # copy the original file's atime and mtime back to
                            # the original file w/ the modified contents,
                            # and delete the temp file.
                            _copy_file_time(tmp_file_path, file_to_process)
                            _delete_file_if_exists(tmp_file_path)
                    except Exception as ex:
                        sys.stderr.write('error : {0}\n'.format(str(ex)))
                        sys.exit(1)
            else:
                if args.quiet is False:
                    if args.windows is True:
                        print("- '{0}' : line endings already CRLF (windows)".format(file_to_process))
                    else:
                        print("- '{0}' : line endings already LF (unix)".format(file_to_process))
        else:
            sys.stderr.write("- '{0}' : file not found\n".format(file_to_process))
            sys.exit(1)


if __name__ == '__main__':
    main()
Configure Visual Studio to use UNIX line endings
We would like to use Visual Studio 2005 to work on a local copy of an SVN repository. This local copy has been checked out by Mac OS X (and updates and commits will only be made under Mac OS X, so no problem there), and as a consequence the line endings are UNIX-style.
We fear that Visual Studio will introduce Windows-style line endings. Is it possible to force Visual Studio to use UNIX line endings?
9 Answers 9
Warning: This solution no longer works for Visual Studio 2017 and later. Instead, both of the answers by jcox and Munther Jaber are needed. I have combined them into one answer.
As the OP states, under "File > Advanced Save Options", select Unix line endings.
This will only affect new files that are created. Fixing any that were previously created can be done file by file, or you can search for tools that will fix them in bulk.
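As an illustration of what such a bulk fix might look like, here is a hedged Python sketch that rewrites CRLF to LF for source files under a directory. The function name convert_tree_to_lf and the default suffix list are assumptions made for this example, not part of Visual Studio or any existing tool.

```python
from pathlib import Path


def convert_tree_to_lf(root, suffixes=(".cs", ".cpp", ".h")):
    """Rewrite CRLF as LF in matching files under root; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        data = path.read_bytes()
        if b"\0" in data:
            # Treat files containing NUL bytes as binary and leave them alone.
            continue
        fixed = data.replace(b"\r\n", b"\n")
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Running it on a copy of the working directory first, and reviewing the result with the version control diff, is the cautious way to try it.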
Here are some options available for Visual Studio Community 2017
- "File > Advanced Save Options" has been removed by Microsoft due to "uncommon use", whatever that means. https://developercommunity.visualstudio.com/content/problem/8290/file-advanced-save-options-option-is-missed.html You can add it back by going to "Tools > Customize", then the "Commands" tab, selecting "File" in the drop-down next to "Menu Bar", then "Add Command" > "File" > "Advanced Save Options...". You can then reorder it in the File menu by using "Move Down".
I don’t know if you will then have to set the advanced save options for each and every file, but it might prevent the issue I was having, where Visual Studio kept adding CRLF line endings into my files that were uniformly LF.
But I took it one step further and added an extension called "Line Endings Unifier" by going to "Tools > Extensions and Updates > Online" and then searching for "line endings" in the search bar to the right. I will use this to automatically force all of my scripts to save with uniform line endings of my choice, but you can do more with it. https://marketplace.visualstudio.com/items?itemName=JakubBielawa.LineEndingsUnifier
strip’em is another solution that does something similar to Line Endings Unifier. http://www.grebulon.com/software/stripem.php
I am not sure how they differ or the advantages/disadvantages of either. I’m mainly using Line Endings Unifier just because it was in the Visual Studio Marketplace. I think I’ve used all of these methods in the past, but my memory is fuzzy.
How to find out line-endings in a text file?
I’m trying to use something in bash to show me the line endings in a file printed rather than interpreted. The file is a dump from SSIS/SQL Server being read in by a Linux machine for processing.
Are there any switches within vi, less, more, etc.?
In addition to seeing the line endings, I need to know what type of line ending it is (CRLF or LF). How do I find that out?
11 Answers 11
You can use the file utility to give you an indication of the type of line endings.
To convert from "DOS" to Unix:
To convert from Unix to "DOS":
Converting an already converted file has no effect so it’s safe to run blindly (i.e. without testing the format first) although the usual disclaimers apply, as always.
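The "no effect on an already converted file" property can be illustrated with a small Python sketch. These two functions are simplified stand-ins written for this example, not the actual algorithm of any conversion tool; the names to_unix and to_dos are made up.

```python
def to_unix(data):
    """Turn CRLF into LF; data that is already LF-only is unchanged."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data):
    """Turn LF into CRLF without doubling the CR of existing CRLF pairs."""
    # Normalizing to LF first is what makes a second run harmless.
    return to_unix(data).replace(b"\n", b"\r\n")


if __name__ == "__main__":
    sample = b"one\r\ntwo\nthree\n"
    once = to_dos(sample)
    twice = to_dos(once)
    print(once == twice)
```

A naive to_dos that replaced every LF directly would turn existing CRLF into CR+CR+LF on each run, which is exactly the failure this ordering avoids.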
A simple cat -e works just fine.
This displays Unix line endings ( \n or LF) as $ and Windows line endings ( \r\n or CRLF) as ^M$ .
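For environments without cat, the same visualization can be imitated in Python. This is a rough sketch that only handles CR and LF (the real cat -e also marks other non-printing characters); the name show_line_endings is invented here.

```python
def show_line_endings(data):
    """Render CR as ^M and mark each LF with a trailing $."""
    # latin-1 maps every byte to a character, so decoding never fails.
    text = data.decode("latin-1")
    return text.replace("\r", "^M").replace("\n", "$\n")


if __name__ == "__main__":
    print(show_line_endings(b"windows line\r\nunix line\n"), end="")
```

Lines that end in ^M$ came from a CRLF file; lines that end in a bare $ use LF.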
:set list to see line-endings.
:set nolist to go back to normal.
While I don’t think you can see \n or \r\n in vi , you can see which type of file it is (UNIX, DOS, etc.) to infer which line endings it has.
Alternatively, from bash you can use od -t c or just od -c to display the returns.
In the bash shell, try cat -v . This should display carriage-returns for windows files.
(This worked for me in rxvt via Cygwin on Windows XP).
Editor’s note: cat -v visualizes \r (CR) chars. as ^M . Thus, line-ending \r\n sequences will display as ^M at the end of each output line. cat -e will additionally visualize \n , namely as $ . ( cat -et will additionally visualize tab chars. as ^I .)
Try file , then file -k , then dos2unix -ih
file will usually be enough. But for tough cases try file -k or dos2unix -ih.
Try file -k
Short version: file -k somefile.txt will tell you.
- It will output "with CRLF line terminators" for DOS/Windows line endings.
- It will output "with CR line terminators" for classic Mac line endings.
- And for Linux/Unix LF line endings it will just output text. (So if it does not explicitly mention any kind of line endings, this implicitly means "LF line endings".)
Long version see below.
Real world example: Certificate Encoding
I sometimes have to check this for PEM certificate files.
The trouble with regular file is this: Sometimes it’s trying to be too smart/too specific.
Let’s try a little quiz: I’ve got some files. And one of these files has different line endings. Which one?
(By the way: this is what one of my typical "certificate work" directories looks like.)
Let’s try regular file:
Huh. It’s not telling me the line endings. And I already knew that those were cert files. I didn’t need "file" to tell me that.
What else can you try?
You might try dos2unix with the --info switch like this:
So that tells you that: yup, "0.example.end.cer" must be the odd man out. But what kind of line endings are there? Do you know the dos2unix output format by heart? (I don’t.)
But fortunately there’s the --keep-going (or -k for short) option in file:
Excellent! Now we know that our odd file has DOS (CRLF) line endings. (And the other files have Unix (LF) line endings. This is not explicit in this output. It’s implicit. It’s just the way file expects a "regular" text file to be.)
(If you wanna share my mnemonic: "L" is for "Linux" and for "LF".)
Now let’s convert the culprit and try again:
Good. Now all certs have Unix line endings.
Try dos2unix -ih
I didn’t know this when I was writing the example above but:
Actually it turns out that dos2unix will give you a header line if you use -ih (short for --info=h) like so:
And another "actually" moment: The header format is really easy to remember. Here are two mnemonics:
- It’s DUMB (left to right: d for DOS, u for Unix, m for Mac, b for BOM).
- And also: "DUM" is just the alphabetical ordering of D, U and M.
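To see what the D, U, M and B columns stand for, here is a hedged Python sketch that computes similar numbers for a byte string. It does not reproduce the dos2unix output format, and it only looks for a UTF-8 BOM; the name line_break_info and the dictionary keys are assumptions for this example.

```python
import codecs


def line_break_info(data):
    """Count DOS, Unix and Mac line breaks and report a UTF-8 BOM."""
    dos = data.count(b"\r\n")
    # Every CRLF contains one LF and one CR, so subtract those pairs
    # to get the standalone LF and standalone CR counts.
    unix = data.count(b"\n") - dos
    mac = data.count(b"\r") - dos
    has_bom = data.startswith(codecs.BOM_UTF8)
    return {"dos": dos, "unix": unix, "mac": mac, "bom": has_bom}


if __name__ == "__main__":
    print(line_break_info(b"a\r\nb\nc\rd\r\n"))
```

A clean file has a non-zero count in exactly one of the three line break columns.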
Further reading
To show CR as ^M in less, use less -u or type -u once less is open.
You can use xxd to show a hex dump of the file, and hunt through for «0d0a» or «0a» chars.
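Hunting through a hex dump by eye gets tedious on large files, so the search can be sketched in Python instead. The helper names find_crlf_offsets and hex_line are hypothetical, and the output layout is only loosely modelled on a hex dump.

```python
def find_crlf_offsets(data, limit=10):
    """Return the byte offsets of the first `limit` CR+LF (0d0a) pairs."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        idx = data.find(b"\r\n", start)
        if idx == -1:
            break
        offsets.append(idx)
        start = idx + 2  # continue searching after this pair
    return offsets


def hex_line(data, offset, width=16):
    """Format `width` bytes starting at `offset` as offset plus hex digits."""
    chunk = data[offset:offset + width]
    return "{:08x}: {}".format(offset, chunk.hex())


if __name__ == "__main__":
    sample = b"ab\r\ncd\r\n"
    for off in find_crlf_offsets(sample):
        print(hex_line(sample, off))
```

An empty result from find_crlf_offsets on a text file that does contain 0a bytes suggests Unix line endings.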
You can use cat -v as @warriorpostman suggests.
You may use the command todos filename to convert to DOS endings, and fromdos filename to convert to UNIX line endings. To install the package on Ubuntu, type sudo apt-get install tofrodos .
You can use vim -b filename to edit a file in binary mode. Carriage returns then show up as ^M characters, while each LF still appears as a line break, so a ^M at the end of every line indicates Windows CRLF line endings. By LF I mean \n and by CR I mean \r. Note that with the -b option the file is always edited in UNIX mode by default, as indicated by [unix] in the status line, meaning that new lines you add will end with LF, not CRLF. If you use normal vim without -b on a file with CRLF line endings, you should see [dos] in the status line, and inserted lines will have CRLF as end of line. The vim documentation for the fileformats setting explains the complexities.
Also, I don’t have enough points to comment on the Notepad++ answer, but if you use Notepad++ on Windows, use the View / Show Symbol / Show End of Line menu to display CR and LF. In this case LF is shown whereas for vim the LF is indicated by a new line.
I dump my output to a text file, open it in Notepad++, and click the Show All Characters button. Not very elegant, but it works.