- Opening a Unix file in Windows Notepad++?
- 1 Answer 1
- Converting from Windows-style to UNIX-style line endings
- The Problem
- The Symptoms
- In the Slurm job scheduler
- In other programs
- Checking a file’s line ending format
- How to Convert
- Converting using Notepad++
- Converting using dos2unix
- Hackaday
- 141 thoughts on “ Windows Notepad Now Supports Unix Line Endings ”
Opening a Unix file in Windows Notepad++?
I receive a file from a supplier that I download per SFTP. Our systems are all working on Windows.
When I open the File in Notepad++ the status bar says «UNIX» and «UTF-8» The special characters aren’t displayed correctly.
I tried to convert the file to the different formats Notepad++ allows but no one converted the char ‘OSC’ to the german letter ‘ä’. Is this a known Unix-Windows-thing? My google-foo obviously isn’t good enough.
- Which kind of conversion should I try to display the file correctly?
- How can I achieve the same programmatically in C#?
1 Answer 1
It is common on windows that a file’s encoding doesn’t match what the editor or even its xml header say it is. People are sloppy. Maybe it’s really UTF-16, or the unstandard windows extended ascii thing which I think is probably cp-1252. (It’s not common on *nix since we all usually just use utf-8, no need for others. not saying *nix users are much less sloppy)
To figure out which encoding it is, I would make a copy of the file, then delete the bits that are not a problem (leaving Mägenwil as the entire file) and then save, and use the linux command «file» which will tell what the right encoding is (reliable only for small files. it doesn’t read the whole file; maybe notepad++ will do the exact same thing). The reason for deleting the other bits is that it might be a mix of UTF-8 which the editor has used for detection, plus something else.
I would try the iconv command in linux to test. For example:
And any encoding conversion should be possible in C# or any featureful language, as long as you know how it was mutilated so you can reverse it. And if you find that it is part utf-8 and part something else, then remember not to convert the whole file, but only the important parts.
Converting from Windows-style to UNIX-style line endings
The Problem
In a plain text file, to tell the computer that a line of text doesn’t continue forever, the end of each line is marked by a sequence of one or more invisible characters, called control characters. While there are many control characters for different purposes, the relevant ones for line endings are the carriage return (CR) and line feed (LF) characters.
Unfortunately, the programmers of different operating systems have represented line endings using different sequences:
- All versions of Microsoft Windows represent line endings as CR followed by LF.
- UNIX and UNIX-like operating systems (including Mac OS X) represent line endings as LF alone.
Therefore, a text file prepared in a Windows environment will, when copied to a UNIX-like environment such as a NeSI cluster, have an unnecessary carriage return character at the end of each line. To make matters worse, this character will normally be invisible, though in some text editors it will show up as ^M or similar.
Many programs, including the Slurm and LoadLeveler batch queue schedulers, will give errors when given a file containing carriage return characters as input.
Therefore, you will need to convert any such file so it has only UNIX-style line endings before using it on a NeSI cluster.
The Symptoms
In the Slurm job scheduler
If you submit (using sbatch ) a Slurm submission script with Windows-style line endings, you will likely receive the following error:
In other programs
Some UNIX or Linux programs are tolerant to Windows-style line endings, while others give errors. The text of the error is almost infinitely variable, but program behaviours might include the following responses:
- Explicitly stating the problem with line endings
- Complaining more vaguely that the input data is incomplete or corrupt or that there are problems reading it
- Failing in a more serious way such as a segmentation fault
Checking a file’s line ending format
If you have what you think is a text file on the cluster but you don’t know whether its line endings are in the correct format or not, you can run the following command:
Depending on the contents of foo.txt , the output of this command may vary, but if the output has «CR» or «CRLF» in it, you will need to convert foo.txt to UNIX format line endings if you want to use it on the cluster.
How to Convert
Converting using Notepad++
In the Windows text editing program Notepad++ (not to be confused with ordinary Notepad), there is a function to prepare text files with UNIX-style line endings.
To write your file in this way, while you have the file open, go to the Edit menu, select the «EOL Conversion» submenu, and from the options that come up select «UNIX/OSX Format». The next time you save the file, its line endings will, all going well, be saved with UNIX-style line endings.
You can check what format line endings you are currently editing in by looking in the status bar at the bottom of the window. Between the range box (a box containing Ln, Col and Sel entries) and the text encoding box (which will contain UTF-8, ANSI, or some other technical string) will be a box containing the current line ending format.
- In most cases, this box will contain the text «DOS\Windows».
- In a few cases, such as the file having been prepared on a UNIX or Linux machine or a Mac, it will contain the text «UNIX».
- It is possible, though highly unlikely by now, that the file may have old-style (pre-OSX) Mac line endings, in which case the box will contain the text «Macintosh».
Please note that if you change a file’s line ending style, you must save your changes before copying the file anywhere, including to a cluster.
Converting using dos2unix
Suppose, though, that you’ve copied a text file to the cluster already, and you realise you need to convert it to UNIX format. How do you do that?
Simple: Use the program dos2unix .
Just give the name of your file to dos2unix as an argument, and it will convert the file’s line endings to UNIX format:
There are other options in the rare case that you don’t want to just modify your existing file; run man dos2unix for details.
Hackaday
In what is probably this century’s greatest advancement in technology, Windows Notepad now supports Unix line endings. This is it, people. Where were you when Kennedy was assassinated? Where were you when Neil Armstrong set foot on the moon? Where were you when Challenger blew up? Where are you now?
Previously, Windows Notepad only supported Windows End of Line Characters — a Carriage Return (CR) and Line Feed (LF). Unix text documents use LF for line endings, and Macs use CR for line endings. The end result of this toppling of the Tower of Babel for End of Line characters is a horrific mess; Windows users can’t read Unix text files in Notepad, and everything is just terrible. Opening a Unix text file in Windows produces a solid block of text without any whitespace. Opening a Windows text file in anything else puts little rectangles at the end of each line.
Starting with the current Window 10 Insider build, Notepad now supports Unix line endings, Macintosh line endings, and Windows line endings. Rejoice, the greatest problem in technology has now been solved.
141 thoughts on “ Windows Notepad Now Supports Unix Line Endings ”
The future is now.
“We have seen the enemy, and they are us.”
Why do people even mess with this when Notepad++ exists?
Installing Notepad++ has been part of my “make a fresh Windows install usable” ritual for years.
That’s a bit heavy, but there is also notepad2 and notepad2-mod if you prefer, for the lighter use.
My personal favorite. Still a pain in a pinch on a computer that’s not yours though. About damn time
UNIX line endings are not LF. That’s just a default. UNIX line ending is “”nl” or newline (‘\n’ in printf): it’s whatever character was defined as “newline” by the stty command. ASCII contains no ‘newline’ character. LF was just convenient and intuitive for a default.
This dates back to text terminals and special character meanings, all of which can be mapped to other values. Thus demonstrating even more deeply how flawed Windows (but not its predecessor VMS!) have always been. At least VMS knew about special character mapping.
I found it interesting to see an article about windows notepad now supporting *nix line endings and then using the windows terminology [CR] [LF] instead of the *nix terminology \r \n
The association is like this –
[CR] Carriage Return => \r return
[LF] Line feed => \n new line
in web development this was solved long ago, to convert anything to *nix –
str_replace(“\r\n”, “\r”, $string); str_replace(“\r”, “\n”, $string);
which can be simplified to –
str_replace(“\r”, “\n”, str_replace(“\r\n”, “\r”, $string));
from there you can convert to the other standards easily if needed and without any double line spacing.
I use metapad instead of notepad. Another favorite is Crimson Editor (has a new name now) because you often need to first find the text to be edited and Crimson Editor has search within text, search within file and search withing directory.
A line feed is impossible to add to the ASCII standard, Yet hundreds of emojji’s are added every year. totally makes sense!
Line feed was in the ASCII standard from the beginning. “New line” could not be added because ASCII was a 7 bit code, and all 128 codes were assigned by the time the concept of newline was brought up. Unicode gets crap added to it all the time because that’s a 32 bit code, so there are over two billion “code points” available. Going fast, I’m guessing.
“Rejoice, the greatest problem in technology has now been solved.”
Two spaces after period is better…maybe.
You are wrong about spaces: https://xkcd.com/1989/
Holy crap that sounds like pseudoscience! I can’t believe people have studied this.
Annals of Improbable Research
In what way is that pseduoscience? And why shouldn’t such things be studied given that they may improve the lives of people?
It may be a flawed experiment but that doesn’t make it fake science – just (potentially) incomplete.
Is this a wonderful time to be alive or what.
First, they came for my backslashes, and I did not complain….
“The biggest problem in technology has been solved!”
For a second, I thought that meant that we all agree on how many spaces a tab character represents.
Disagree. See, not solved.
But 42 is the ultimate answer to life, the universe, and everything; therefore, it must in someway be correct……..from a certain perspective at least.
ASCII character 42 is *. So 42 really does mean everything.
That’s gold Conrad
Shouldn’t matter because they should be converted to spaces 🙂
Do that in a Makefile and see how it works for you.
Exactly! Tabs should represent levels of indentation. 1-tab per level of indent. That’s perfect! How much whitespace I want to see per level of indentation is my business and how much you want to see is yours. We can configure that in our editors’ preferences and each see the same source code in the way we want to see it.
I grew up when monitors and resolution were much smaller. Huge indents meant line wrapping and nobody wanted that. We didn’t have wide screens for coding, those were for watching movies! In college my Computer Science professors varied, some wanted 2 spaces per indent, some wanted 3. That’s it, no more than 3! I preferred 3 because that is also how far I was taught to indent the beginning of a new paragraph in English class. I like consistency.
Kids these days like huge indents with large numbers of spaces. That’s fine for you if you like it. I find that incredibly distracting when I try to read the code. All that left/right movement makes me tend to lose my place. I know some people have told me that they find my narrower indents harder to see. Well.. different people see things differently! That’s why we need to use tabs!
But… all the style guides these days push spaces instead of tabs for indents. WTF. With tabs we can ALL have our way! Just configure your editor to display tabs the way YOU like them and leave ME alone about it. That is a way better solution than trying to figuratively club everyone over the head until they agree to use YOUR favorite number of space characters.
Come on, quite being A@@H0l3s!
Ok… Sorry… End Rant
I used to think as you do, and for the same reasons. I’ve recently decided to throw in the towel and use spaces for indentation. If your editor can display tabs as your preferred amount of horizontal space, it should be able to scan the code, figure out the number of spaces used per level of indentation, and display it the same as if tabs had been used (as $DEITY intended). I suppose some unholy mixed-tabs-and-spaces files will defeat the indentation heuristic.
I’ve resolved to write plugins or modify editor code to accomplish this, if necessary, but so far I am able to just deal with it.
Don’t even get me started on where some of these heathens put their braces.
That might be a solution but it’s a difficult and brittle solution to a problem that was already solved by all our keyboards having a TAB key.
It requires you or someone to write plugins.
It only works for text editors for which such plugins have been written.
It will fail if people type the wrong number of spaces.
This solution describes displaying code. How does one write it? Do you still press the space bar x number of times at the beginning of each line? What does the plugin do while the user is entering those spaces?
How does it react to space characters elsewhere in the code. Maybe there are space characters inside string literals for example. How do those display? What is it like to edit them?
You obviously don’t get the problem. What if spaces and tabs are mixed? You may not do that but in a multi-user environment that is a real problem.
That is a really common and yet stupid excuse to use spaces instead of tabs.
Obviously you do the same thing that you do with code that has the incorrect number of spaces. Honestly it’s just another variation of the same problem!
What do you do when your standard is to use 25 spaces per indent (cause that seems to be the direction things are headed) and some dummy submits a million lines of code with only 15? If it’s a workplace the manager reminds them of the company policy and they have to fix it. If it’s an open source project then it depends on how desperate the maintainer is to attract coders. They either painstakingly fix it all or they deny the pull request with a message to go read the f’n style guide.
Switching away from using tabs does nothing to fix the problem of people not doing what they are supposed to.
Maybe I am being a little unfair. If your tab character width matches that of the dummy that mixed in spaces then the problem might go unnoticed. Well.. if someone else has a different tab width it can be dealt with once they notice it. If not… well.. do you enjoy worrying about problems when nobody even notices them? If a tree falls in the woods… or if a problem goes unnoticed is it really even a problem?
How about a text editor plugin that sightly shades tabs? Or.. better yet, maybe it brightly flags space characters that occur somewhere within a region of whitespace that is on the begining edge of a line? Then you know when somebody is a jerk right away and can deal with it.
Or… if you discover that mixed indentation has been piling up for the last 10 years unnoticed because everyone used the same tab width just run it through a code beautifier.
Or… can a script be added to git that either fixes, flags or bounces crappy pull requests that include space indented lines?
If only editors had a search and replace eh.
Ramen. And yeah, if only the placement of those curly braces had some consistency as well. I prefer them in the older style of putting the brace on the next line. I don’t really understand the argument of having it on the same line as the class/method/branch definition. Particularly because there is no consistency on where to break that line. Is everyone else still stuck on 80 columns?
I’ve always considered this a mute dispute. What improves readability for some people can have the opposite effect for others.
I’m dyslexic so I use “whitesmiths” format so that I can quickly scroll through indentation levels to find the return from one level.
https://en.wikipedia.org/wiki/Indentation_style
If someone doesn’t like it then I just say – well can you code, if you can then just change it to what you like, it’s just a string of text
So did windows 10 decide to go with emacs or vim?