- Convert Windows Line Endings to Unix
- Converting from Windows-style to UNIX-style line endings
- The Problem
- The Symptoms
- In the Slurm job scheduler
- In other programs
- Checking a file’s line ending format
- How to Convert
- Converting using Notepad++
- Converting using dos2unix
- Line Ending Converter package
- Features
- How to use
- Status View Display:
- Perform conversion:
- jonlabelle / crlf.py
Convert Windows Line Endings to Unix
From time to time I create projects that need to be supported by both Windows and *NIX systems. Line endings are always a pain, and although Git has helped a lot, there are still times when you must deal with them manually.
Here is a quick way to publish a project and convert its line endings in one step. It is not difficult, but it probably won’t make much sense unless you’ve worked with .cmd or .bat files before.
First grab dos2unix here. You’ll need to select the correct binary for your platform, either 32-bit or 64-bit. As of today 6.0.3 is the current version.
Save the below to “your_project_name.bat”
@ECHO off
setlocal EnableDelayedExpansion
SET source=D:\Path\To\Your\Source
SET dest=D:\Path\To\Your\Destination
SET d2upath=D:\dos2unix\Folder\Path
CALL publish.bat -r -s %source% -d %dest%
cd /d %dest%
ECHO Converting files to unix line endings…
SET exts=*.json, *.js
FOR %%f in (%exts%) DO (
CALL %d2upath%\dos2unix %%f
)
ECHO Finished publishing.
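For readers who would rather not depend on the dos2unix binary, the conversion loop above can be approximated in Python. This is only a sketch under stated assumptions: the function name convert_tree_to_lf and its parameters are hypothetical, and unlike the batch loop (which only looks at the top level of the destination folder) it walks subfolders as well.

```python
import os


def convert_tree_to_lf(root, extensions=(".json", ".js")):
    """Rewrite CRLF as LF in files under root that end with one of extensions.

    Hypothetical helper, not part of the original scripts.
    Returns the list of paths that were actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(tuple(extensions)):
                continue
            path = os.path.join(dirpath, name)
            # Binary mode keeps Python from translating line endings itself.
            with open(path, "rb") as handle:
                data = handle.read()
            converted = data.replace(b"\r\n", b"\n")
            if converted != data:
                with open(path, "wb") as handle:
                    handle.write(converted)
                changed.append(path)
    return changed
```

Calling convert_tree_to_lf with the destination folder after publishing would play the same role as the FOR loop; files that already use LF are left untouched.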
Save the following to “publish.bat”
@ECHO off
setlocal EnableDelayedExpansion
REM Set npm path, default build directory various variables.
SET execDir=%~dp0
SET sourceDir=%CD%
SET destDir=
SET lastArg=0
SET sourceIdx=0
SET destIdx=0
SET ctr=0
SET empty=n
REM No args passed use defaults goto dependency check.
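IF [%~1]==[] GOTO :HELP
REM Assumption: the flag tested on this line was lost from the source; -h is a guess.
IF /i [%~1]==[-h] GOTO :HELP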
REM Loop over args and assign them.
:SETARGS
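IF [%~1]==[] GOTO :DIREXISTS
IF /i %~1==-s (
SET /a sourceIdx=%ctr%+1
)
REM The -d and -r branches are reconstructed from the help text and the variables above.
IF /i %~1==-d (
SET /a destIdx=%ctr%+1
)
IF /i %~1==-r SET empty=y
IF %ctr%==%sourceIdx% (
IF %lastArg%==-s SET sourceDir=%~1
)
IF %ctr%==%destIdx% (
IF %lastArg%==-d SET destDir=%~1
)
SET lastArg=%~1
SET /a ctr+=1
SHIFT & GOTO :SETARGS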
:DIREXISTS
IF EXIST %destDir% GOTO :PUBLISH
SET /p create="Directory %destDir% does not exist, would you like to create it (y or n)? "
IF %create%==y (
ECHO Creating directory…
MKDIR %destDir%
GOTO :PUBLISH
)
ECHO -------------------------------------------------------
ECHO Nothing to do, no output directory exiting…
GOTO :END
:PUBLISH
ECHO PUBLISHING: %sourceDir% to %destDir%
ECHO -------------------------------------------------------
IF %empty%==y (
ECHO Removing files and subfolders.
RD %destDir% /s /q
ECHO:
)
IF NOT EXIST %destDir% MKDIR %destDir%
ECHO Publishing please wait…
ECHO:
IF EXIST pubignore.txt (
XCOPY /s /y /EXCLUDE:pubignore.txt %sourceDir%\* %destDir%
) ELSE (
XCOPY /s /y %sourceDir%\* %destDir%
)
GOTO :END
:HELP
ECHO:
ECHO HELP: Publish Script Help.
ECHO -------------------------------------------------------
ECHO -s (specify source, default is current directory.)
ECHO -d (specify destination directory [required].)
ECHO -r (remove directory and re-create)
GOTO :END
:END
Converting from Windows-style to UNIX-style line endings
The Problem
In a plain text file, to tell the computer that a line of text doesn’t continue forever, the end of each line is marked by a sequence of one or more invisible characters, called control characters. While there are many control characters for different purposes, the relevant ones for line endings are the carriage return (CR) and line feed (LF) characters.
Unfortunately, the programmers of different operating systems have represented line endings using different sequences:
- All versions of Microsoft Windows represent line endings as CR followed by LF.
- UNIX and UNIX-like operating systems (including Mac OS X) represent line endings as LF alone.
Therefore, a text file prepared in a Windows environment will, when copied to a UNIX-like environment such as a NeSI cluster, have an unnecessary carriage return character at the end of each line. To make matters worse, this character will normally be invisible, though in some text editors it will show up as ^M or similar.
Many programs, including the Slurm and LoadLeveler batch queue schedulers, will give errors when given a file containing carriage return characters as input.
Therefore, you will need to convert any such file so it has only UNIX-style line endings before using it on a NeSI cluster.
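To make the invisible character visible, the short Python sketch below compares the same two-line script stored with each convention. The sample contents and the helper name first_line are made up for illustration.

```python
# The same two-line script, stored with Windows and with UNIX line endings.
windows_text = b"#!/bin/bash\r\necho hello\r\n"
unix_text = b"#!/bin/bash\necho hello\n"


def first_line(data):
    """Return the first line as a UNIX tool sees it: everything before the first LF."""
    return data.split(b"\n", 1)[0]


# repr() shows the otherwise invisible carriage return as \r.
print(repr(first_line(windows_text)))  # b'#!/bin/bash\r'
print(repr(first_line(unix_text)))     # b'#!/bin/bash'
```

The trailing \r on the first result is the stray carriage return that a UNIX program receives as part of the line.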
The Symptoms
In the Slurm job scheduler
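If you submit (using sbatch) a Slurm submission script with Windows-style line endings, you will likely receive the following error:
sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n).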
In other programs
Some UNIX or Linux programs tolerate Windows-style line endings, while others give errors. The text of the error varies widely from program to program, but typical behaviours include the following:
- Explicitly stating the problem with line endings
- Complaining more vaguely that the input data is incomplete or corrupt or that there are problems reading it
- Failing in a more serious way such as a segmentation fault
Checking a file’s line ending format
If you have what you think is a text file on the cluster but you don’t know whether its line endings are in the correct format or not, you can run the following command:
file foo.txt
Depending on the contents of foo.txt, the output of this command may vary, but if the output has "CR" or "CRLF" in it, you will need to convert foo.txt to UNIX format line endings if you want to use it on the cluster.
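If you want to run the same kind of check from a script, a rough Python equivalent is sketched below. The function names count_line_endings and needs_conversion are hypothetical, and the check only counts byte sequences; it does not try to decide whether the file is really text.

```python
def count_line_endings(data):
    """Count CRLF pairs, lone LF and lone CR characters in a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def needs_conversion(path):
    """Return True if the file at path contains any CRLF or lone CR endings."""
    with open(path, "rb") as handle:
        counts = count_line_endings(handle.read())
    return counts["CRLF"] > 0 or counts["CR"] > 0


print(count_line_endings(b"one\r\ntwo\nthree\r"))
# {'CRLF': 1, 'LF': 1, 'CR': 1}
```

A file for which needs_conversion returns True is one that should be converted before use on the cluster.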
How to Convert
Converting using Notepad++
In the Windows text editing program Notepad++ (not to be confused with ordinary Notepad), there is a function to prepare text files with UNIX-style line endings.
To write your file in this way, while you have the file open, go to the Edit menu, select the "EOL Conversion" submenu, and from the options that come up select "UNIX/OSX Format". The next time you save the file, its line endings will, all going well, be saved with UNIX-style line endings.
You can check what format line endings you are currently editing in by looking in the status bar at the bottom of the window. Between the range box (a box containing Ln, Col and Sel entries) and the text encoding box (which will contain UTF-8, ANSI, or some other technical string) will be a box containing the current line ending format.
- In most cases, this box will contain the text "DOS\Windows".
- In a few cases, such as the file having been prepared on a UNIX or Linux machine or a Mac, it will contain the text "UNIX".
- It is possible, though highly unlikely by now, that the file may have old-style (pre-OSX) Mac line endings, in which case the box will contain the text "Macintosh".
Please note that if you change a file’s line ending style, you must save your changes before copying the file anywhere, including to a cluster.
Converting using dos2unix
Suppose, though, that you’ve copied a text file to the cluster already, and you realise you need to convert it to UNIX format. How do you do that?
Simple: Use the program dos2unix.
Just give the name of your file to dos2unix as an argument, and it will convert the file’s line endings to UNIX format:
dos2unix foo.txt
There are other options in the rare case that you don’t want to just modify your existing file; run man dos2unix for details.
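For the case where the original file should stay untouched, the idea can be sketched in Python as writing the converted text to a second file. This is not how dos2unix itself is implemented; convert_to_unix_copy and its arguments are hypothetical names used only for illustration.

```python
def convert_to_unix_copy(source_path, dest_path):
    """Write a copy of source_path to dest_path with CRLF replaced by LF.

    The source file is only read, never modified.
    Returns the number of carriage returns that were dropped.
    """
    with open(source_path, "rb") as source:
        data = source.read()
    converted = data.replace(b"\r\n", b"\n")
    with open(dest_path, "wb") as dest:
        dest.write(converted)
    # Each CRLF pair shrinks by exactly one byte, the removed CR.
    return len(data) - len(converted)
```

A return value of 0 means the source already had no Windows-style line endings.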
Line Ending Converter package
Features
- Show the line ending (EOL) format of the current file in the status bar (see the note below for details)
- Convert the line endings to Unix/Windows/Old Mac format.
How to use
Status View Display:
It is enabled by default. You can disable it in the package settings.
Notes: The EOL format shown is the EOL format of the first row of the file. It cannot detect whether the file has inconsistent EOL formats.
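The limitation in the note above can be illustrated with a small Python sketch: looking only at the first line reports one format even when later lines use another. This is not the package's code (the package itself is not written in Python); the names below are hypothetical.

```python
import re

# Matches one end-of-line sequence; CRLF is listed first so it wins over a lone CR.
EOL_PATTERN = re.compile(rb"\r\n|\r|\n")


def first_line_eol(data):
    """Return the EOL sequence that ends the first line, or None if there is none."""
    match = EOL_PATTERN.search(data)
    return match.group(0) if match else None


def has_mixed_eols(data):
    """Return True if more than one kind of EOL sequence occurs in data."""
    return len(set(EOL_PATTERN.findall(data))) > 1


sample = b"first\r\nsecond\nthird\n"
print(first_line_eol(sample))   # b'\r\n'
print(has_mixed_eols(sample))   # True
```

A status display based on first_line_eol alone would label this sample as Windows format, although two of its three lines end in LF.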
Perform conversion:
Click the status view to open the list for conversion
Packages -> Convert Line Endings To -> Unix Format / Windows Format / Old Mac Format
Or, in Context Menu (inside an active editor),
Convert Line Endings To -> Unix Format / Windows Format / Old Mac Format
Or, in Command Palette (cmd-shift-p or ctrl-shift-p), type
Convert To Unix Format, or Convert To Windows Format, or Convert To Old Mac Format
(Note: This will convert the line endings of the text in the active editor.)
Notes: The conversion works only when the file has at least one EOL symbol. If the file does not have any EOL symbols, the conversion will not persist, because the current implementation of Atom falls back to a default EOL (which appears to be the UNIX format) when no EOL symbol is found in the file.
You can try the experimental feature "Normalize On Save" if you really need consistent line endings across all files.
jonlabelle / crlf.py
#!/usr/bin/env python
"""Replace line breaks, from one format to another."""

from __future__ import print_function

import argparse
import glob
import os
import sys
import tempfile
from stat import ST_ATIME, ST_MTIME

# Files are read and written in binary mode, so the markers are bytes.
LF = b'\n'
CRLF = b'\r\n'
CR = b'\r'


def _normalize_line_endings(lines, line_ending='unix'):
    r"""Normalize line endings to unix (\n), windows (\r\n) or mac (\r).

    :param lines: The lines to normalize.
    :param line_ending: The line ending format.
        Acceptable values are 'unix' (default), 'windows' and 'mac'.
    :return: Line endings normalized.
    """
    lines = lines.replace(CRLF, LF).replace(CR, LF)
    if line_ending == 'windows':
        lines = lines.replace(LF, CRLF)
    elif line_ending == 'mac':
        lines = lines.replace(LF, CR)
    return lines


def _copy_file_time(source, destination):
    """Copy one file's atime and mtime to another.

    :param source: Source file.
    :param destination: Destination file.
    """
    file1, file2 = source, destination

    try:
        stat1 = os.stat(file1)
    except os.error:
        sys.stderr.write(file1 + ' : cannot stat\n')
        sys.exit(1)

    try:
        os.utime(file2, (stat1[ST_ATIME], stat1[ST_MTIME]))
    except os.error:
        sys.stderr.write(file2 + ' : cannot change time\n')
        sys.exit(2)


def _create_temp_file(contents):
    """Create a temp file.

    :param contents: The temp file contents.
    :return: The absolute path of the created temp file.
    """
    tf = tempfile.NamedTemporaryFile(mode='wb', suffix='txt', delete=False)
    tf.write(contents)
    tf.close()
    return tf.name


def _delete_file_if_exists(filepath):
    """Delete the file if it exists.

    :param filepath: The file path.
    """
    if os.path.exists(filepath):
        os.remove(filepath)


def _read_file_data(filepath):
    """Read file data.

    :param filepath: The file path.
    :return: The file contents.
    """
    with open(filepath, 'rb') as f:
        data = f.read()
    return data


def _write_file_data(filepath, data):
    """Write file data.

    :param filepath: The file path.
    :param data: The data to write.
    """
    with open(filepath, 'wb') as f:
        f.write(data)


def main():
    """Main."""
    parser = argparse.ArgumentParser(
        prog='crlf',
        description='Replace CRLF (windows) line endings with LF (unix) '
                    'line endings in files, and vice-versa')

    parser.add_argument(
        '-q', '--quiet',
        help='suppress descriptive messages from output',
        action='store_true',
        default=False)

    parser.add_argument(
        '-n', '--dryrun',
        help='show changes, but do not modify files',
        action='store_true',
        default=False)

    parser.add_argument(
        '-w', '--windows',
        help='replace LF (unix) line endings with CRLF (windows) line endings',
        action='store_true',
        default=False)

    parser.add_argument(
        '-u', '--unix',
        help='replace CRLF (windows) line endings with LF (unix) '
             'line endings (default)',
        action='store_true',
        default=False)

    parser.add_argument(
        '-t', '--timestamps',
        help="maintains the modified file's time stamps (atime and mtime)",
        action='store_true',
        default=False)

    parser.add_argument(
        'files',
        nargs='+',
        help="a list of files or file glob patterns to process",
        default='.')

    if len(sys.argv) < 2:
        parser.print_help()
        sys.exit(2)

    args = parser.parse_args()

    if args.windows is True and args.unix is True:
        sys.stderr.write("Ambiguous options specified, 'unix' and 'windows'. "
                         "Please choose one option, or the other.\n")
        sys.exit(2)

    files_to_process = []
    for arg_file in args.files:
        files_to_process.extend(glob.glob(arg_file))

    if len(files_to_process) <= 0:
        if args.quiet is False:
            sys.stderr.write('No files matched the specified pattern.\n')
        sys.exit(2)

    if args.dryrun is True and args.quiet is False:
        print('Dry-run only, files will NOT be modified.')

    for file_to_process in files_to_process:
        if os.path.isdir(file_to_process):
            if args.quiet is False:
                print("- '{0}' : is a directory (skip)".format(file_to_process))
            continue

        if os.path.isfile(file_to_process):
            data = _read_file_data(file_to_process)
            # A NUL byte is treated as a sign that the file is binary.
            if b'\0' in data:
                if args.quiet is False:
                    print("- '{0}' : is a binary file (skip)".format(file_to_process))
                continue

            if args.windows is True:
                new_data = _normalize_line_endings(data, line_ending='windows')
            else:
                new_data = _normalize_line_endings(data, line_ending='unix')

            if new_data != data:
                if args.quiet is False:
                    if args.windows is True:
                        if args.dryrun is True:
                            print("+ '{0}' : LF would be replaced with CRLF".format(file_to_process))
                        else:
                            print("+ '{0}' : replacing LF with CRLF".format(file_to_process))
                    else:
                        if args.dryrun is True:
                            print("+ '{0}' : CRLF would be replaced with LF".format(file_to_process))
                        else:
                            print("+ '{0}' : replacing CRLF with LF".format(file_to_process))

                tmp_file_path = ""
                if args.dryrun is False:
                    try:
                        if args.timestamps is True:
                            # create a temp file with the original file
                            # contents and copy the old file's atime and mtime
                            tmp_file_path = _create_temp_file(data)
                            _copy_file_time(file_to_process, tmp_file_path)

                        # overwrite the current file with the modified contents
                        _write_file_data(file_to_process, new_data)

                        if args.timestamps is True:
                            # copy the original file's atime and mtime back to
                            # the original file w/ the modified contents,
                            # and delete the temp file.
                            _copy_file_time(tmp_file_path, file_to_process)
                            _delete_file_if_exists(tmp_file_path)
                    except Exception as ex:
                        sys.stderr.write('error : {0}\n'.format(str(ex)))
                        sys.exit(1)
            else:
                if args.quiet is False:
                    if args.windows is True:
                        print("- '{0}' : line endings already CRLF (windows)".format(file_to_process))
                    else:
                        print("- '{0}' : line endings already LF (unix)".format(file_to_process))
        else:
            sys.stderr.write("- '{0}' : file not found\n".format(file_to_process))
            sys.exit(1)


if __name__ == '__main__':
    main()
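The normalization function in the script first collapses every ending to LF and only then expands to the target format. The sketch below, with hypothetical names normalize and naive_to_windows, suggests why that order matters when a file already contains some CRLF endings.

```python
LF, CRLF, CR = b"\n", b"\r\n", b"\r"


def normalize(data, target=LF):
    """Collapse every EOL to LF first, then expand to the target sequence."""
    unified = data.replace(CRLF, LF).replace(CR, LF)
    return unified if target == LF else unified.replace(LF, target)


def naive_to_windows(data):
    """Buggy variant: skips the collapse step, so an existing CRLF gains a second CR."""
    return data.replace(LF, CRLF)


mixed = b"a\r\nb\nc\r"
print(normalize(mixed, CRLF))    # b'a\r\nb\r\nc\r\n'
print(naive_to_windows(mixed))   # b'a\r\r\nb\r\nc\r'
```

Because the collapse step maps all three conventions onto LF, running normalize a second time on its own output changes nothing.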