Contents

Handling files with carriage return in filename on Windows
Does Windows carriage return \r\n consist of two characters or one character?
What are carriage return, linefeed, and form feed?
Carriage return required when printing to the console in Windows?
Why does a carriage return creep in when this program runs on Windows?

Handling files with carriage return in filename on Windows

I have an external USB, NTFS-formatted hard drive that contains many files I eventually need to copy to a drive on a Windows Server 2008 R2 machine.

The files on the drive were placed there by scripts run with the drive mounted on Solaris. The user who did this copy was careless and edited their copy script on a Windows machine, so every line of the shell script ended in CRLF rather than LF. The shell treated the trailing carriage return as part of the last argument on each line.

As a result, the files on the external drive have a trailing carriage return in their filenames. Standard Windows copy utilities (copy, xcopy, robocopy) fail to copy these files with error 0x7B / 123: "The filename, directory name, or volume label syntax is incorrect."
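As a rough illustration of why error 123 (ERROR_INVALID_NAME) appears: the Win32 file-name rules reject characters whose codes are 1 through 31, which includes the carriage return (13), as well as the characters < > : " / \ | ? *. The helper below, win32_component_is_valid, is a hypothetical sketch that checks a single path component against just those two rules. It is not a complete implementation of the Win32 naming rules; reserved device names such as CON, for example, are not checked.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns false if `name` (one path component, not a
 * full path) is empty or contains a character that Win32 rejects in file
 * names: a control character 1..31 (e.g. '\r' == 13) or one of <>:"/\|?* */
static bool win32_component_is_valid(const char *name)
{
    static const char reserved[] = "<>:\"/\\|?*";

    if (name[0] == '\0')
        return false;                       /* empty names are not allowed */
    for (const unsigned char *p = (const unsigned char *)name; *p; ++p) {
        if (*p < 32)                        /* control characters 1..31 */
            return false;
        if (strchr(reserved, *p) != NULL)   /* reserved punctuation */
            return false;
    }
    return true;
}
```

Under these assumptions, win32_component_is_valid("report.txt\r") returns false, which mirrors why the Windows tools refuse the names the Solaris script created.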

I have tested this and am fairly sure that if I had the drive mounted on a Linux box again, I could repair the files by renaming each one to drop the trailing carriage return.
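A sketch of that repair in POSIX C, assuming the drive is mounted on Linux or another POSIX system: strip_trailing_cr_in_dir (a hypothetical name) walks one directory with opendir/readdir and renames every entry whose name ends in '\r'. It is not recursive, and rename() silently replaces an existing file that already has the target name, so check for collisions first if that matters.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: in directory `dir`, rename every entry whose name
 * ends in '\r' to the same name without it. Not recursive. Returns the
 * number of entries renamed, or -1 if the directory cannot be opened. */
static int strip_trailing_cr_in_dir(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int renamed = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        size_t len = strlen(e->d_name);
        /* Skip names without a trailing CR; a bare "\r" is skipped too,
         * since it cannot be renamed to an empty name. */
        if (len < 2 || e->d_name[len - 1] != '\r')
            continue;

        char from[4096], to[4096];   /* fixed size is an assumption */
        snprintf(from, sizeof from, "%s/%s", dir, e->d_name);
        snprintf(to, sizeof to, "%s/%.*s", dir, (int)(len - 1), e->d_name);
        if (rename(from, to) == 0)
            ++renamed;
    }
    closedir(d);
    return renamed;
}
```

For example, strip_trailing_cr_in_dir("/mnt/external/somedir") (a hypothetical mount path) would fix one directory; a recursive version would also descend into subdirectories, whose names may carry a trailing '\r' as well.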

However, I do not have immediate access to a Linux machine.

What I have tried so far to repair/move these files:

"Application" solutions on Windows Server 2008 R2:

Renaming files in Windows Explorer: this would be an unfeasible solution given the sheer volume of files, but it doesn't work anyway.
Wildcard pattern matching the filenames from the cmd prompt, e.g. copy E:\externalDrivePath\targetFileName* anotherPath. Fails with the 0x7B error.
Copying files from the cmd prompt using 8.3 (short) filenames. The files in question do not have short names, according to the output of dir /x.

"Programming" solutions on Windows Server 2008 R2:

Copying/renaming files using Python/Java: any attempt to open/copy a carriage-return file throws an exception tracing back to the same 0x7B Windows error.
Copying files using the Windows C CopyFile API: fails with the 0x7B error. Here I found the files using the FindNextFile API and passed that source path into CopyFile, but the OS still fails to copy the file.
Writing my own file copy function in C/C++ using fopen, ofstream, etc. The fopen call again fails with 0x7B.
Copying files using C++ boost::filesystem APIs: fails with the 0x7B error. Again, I found the files using a boost::filesystem::directory_iterator and passed the found file's path to boost::filesystem::copy_file().
Providing the file path to the Win32 APIs CopyFile / MoveFile as "\\?\E:\externalDrivePath\targetFileName\r". The calls again fail with the 0x7B error.

I also dabbled with mounting this drive on an OS X machine to run the copy, expecting it would support the NTFS drive more like Solaris did. However, it fails to copy with error messages similar to those on Windows; I guess OS X's NTFS implementation is more "Windows-like"?

If this is solvable on Windows, I feel like it's going to require either a very low-level C function that manipulates the file itself without "opening" it by its string filename (I'm not sure how to go about that), or some file repair utility I'm unaware of that already incorporates this functionality.

Any alternative approaches or suggestions how to implement what I’m describing would be most appreciated.

Does Windows carriage return \r\n consist of two characters or one character?

Windows carriage return is \r\n while it is \n in Unix, is \r\n treated as two characters?

6 Answers

These are two characters:

\r is carriage return;
\n is line feed.

Together, the two characters represent a new line on Windows. On Linux, \n alone represents a new line and moves the cursor to the start of the next line. On Windows, the cursor will stay in the same column in the console, but on the next line.

\r on Linux has the same effect as on Windows: it moves the cursor to the start of the current line. This makes it possible to print different information on the same line by using \r instead of \n.
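A minimal sketch of that same-line technique: show_progress (a hypothetical helper) starts each update with '\r' instead of ending it with '\n', so on a terminal each call overwrites the previous one. The trailing spaces are an assumption meant to blank out leftovers from a longer earlier message.

```c
#include <stdio.h>

/* Hypothetical helper: '\r' returns the cursor to column 0 without moving
 * down, so each call redraws the same console line. */
static void show_progress(FILE *out, int percent)
{
    fprintf(out, "\rprogress: %3d%%   ", percent);
    fflush(out);   /* no '\n' was written, so flush explicitly */
}
```

Typical use is to call show_progress(stdout, i) inside a loop and print a single '\n' once the work is done, so the next output starts on a fresh line.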

Actually, \r is 0x0D (^M) and \n is 0x0A (^J); on Windows, a line ends with both of them, in that order: 0x0D 0x0A.
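A small check of the claim that the Windows line terminator is two separate characters, assuming an ASCII-based execution character set. The names windows_eol, WINDOWS_EOL_LEN and is_crlf_at are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* The Windows line terminator: CR (13, 0x0D) followed by LF (10, 0x0A).
 * sizeof counts the terminating '\0', hence the - 1. */
static const char windows_eol[] = "\r\n";
enum { WINDOWS_EOL_LEN = sizeof windows_eol - 1 };   /* two chars */

/* Returns true if a CR LF pair starts at buf[i] (bounds-checked). */
static bool is_crlf_at(const char *buf, size_t len, size_t i)
{
    return i + 1 < len && buf[i] == '\r' && buf[i + 1] == '\n';
}
```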

It depends on the setting. \r\n is two bytes wide (in ASCII, UTF-8, etc.), but I/O libraries such as C's stdio library and others, when operating in text mode, may translate between \n and \r\n quasi-transparently.

I.e., on a Windows platform, a C program reading a text-mode stream txt_in with getc(txt_in) will not report the ASCII code for \r. Conversely, putc('\n', txt_out) will actually write \r\n to the text-mode stream txt_out.

Windows doesn’t distinguish between \r\n and any other two characters. However, there is one situation where it is treated as one character: if you use the C runtime and open a file as text, \r\n in the file will be read as \n , and \n will be written to the file as \r\n .
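The input-side translation described above can be emulated on an in-memory buffer. The sketch below, crlf_to_lf (a hypothetical name), collapses each CR LF pair into a single LF and leaves a lone CR untouched; it illustrates the idea and is not the Microsoft C runtime's actual code.

```c
#include <stddef.h>

/* Emulate text-mode input: each "\r\n" becomes "\n"; a lone '\r' is kept.
 * Works in place and returns the new length. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;   /* drop the CR; the LF is copied on the next pass */
        buf[out++] = buf[i];
    }
    return out;
}
```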

Yes, it’s two characters: carriage return ‘\r’ followed by linefeed ‘\n’.

I conducted some tests (nothing fancy) with Notepad++ and Sublime Text 2 on Windows, and it turns out that \r and \n are indeed two distinct characters, but:

Different text editors may or may not insert \r when the Return key is pressed.

Try typing a few lines of text in your text editor, pressing Return after each one.

Then press Ctrl+F, enable regular expressions, and search for \r. Depending on your text editor and its settings, you might or might not "hit" a character at the end of each line. If the text is copy-pasted from another editor, the behavior might also differ.

Some text editors allow for customizations in their settings, where you can specify if you prefer \r\n or unix-style \n to be inserted when you press Return. On top of that, they might allow you to choose to enforce a consistent style by stripping or inserting the \r character before saving a file.

What are carriage return, linefeed, and form feed?

What is the meaning of the following control characters: carriage return, line feed, and form feed?

12 Answers

Carriage return means to return to the beginning of the current line without advancing downward. The name comes from a printer’s carriage, as monitors were rare when the name was coined. This is commonly escaped as \r , abbreviated CR, and has ASCII value 13 or 0x0D .

Linefeed means to advance downward to the next line; however, it has been repurposed and renamed. Used as "newline", it terminates lines (commonly confused with separating lines). This is commonly escaped as \n, abbreviated LF or NL, and has ASCII value 10 or 0x0A. CRLF (but not CRNL) is used for the pair \r\n.

Form feed means to advance downward to the next "page". It was commonly used as a page separator, but is now also used as a section separator. (It's uncommonly used in source code to divide logically independent functions or groups of functions.) Text editors can use this character when you "insert a page break". This is commonly escaped as \f, abbreviated FF, and has ASCII value 12 or 0x0C.
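For reference, the three characters map to their escapes and values as follows; control_char_name is a hypothetical helper, and the numeric values assume ASCII.

```c
#include <stddef.h>

/* Map the three control characters discussed above to their usual
 * abbreviations; returns NULL for any other character. */
static const char *control_char_name(int c)
{
    switch (c) {
    case '\r': return "CR";   /* 13, 0x0D: carriage return */
    case '\n': return "LF";   /* 10, 0x0A: line feed / newline */
    case '\f': return "FF";   /* 12, 0x0C: form feed */
    default:   return NULL;
    }
}
```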

As control characters, they may be interpreted in various ways.

The most common difference (and probably the only one worth worrying about) is lines end with CRLF on Windows, NL on Unix-likes, and CR on older Macs (the situation has changed with OS X to be like Unix). Note the shift in meaning from LF to NL, for the exact same character, gives the differences between Windows and Unix. (Windows is, of course, newer than Unix, so it didn’t adopt this semantic shift. I don’t know the history of Macs using CR.) Many text editors can read files in any of these three formats and convert between them, but not all utilities can.
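Since the three conventions differ only in which bytes end a line, a tool can classify a buffer by scanning for them. This is a sketch with hypothetical names (detect_eol, enum eol_style); it reports a mixed result when more than one convention appears in the same buffer.

```c
#include <stddef.h>

enum eol_style { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Classify line endings: CRLF (Windows), LF (Unix, OS X), CR (older Macs),
 * a mix of these, or none at all. */
static enum eol_style detect_eol(const char *buf, size_t len)
{
    int seen_lf = 0, seen_crlf = 0, seen_cr = 0;

    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                seen_crlf = 1;
                ++i;            /* skip the LF that completes the pair */
            } else {
                seen_cr = 1;
            }
        } else if (buf[i] == '\n') {
            seen_lf = 1;
        }
    }

    int kinds = seen_lf + seen_crlf + seen_cr;
    if (kinds == 0)
        return EOL_NONE;
    if (kinds > 1)
        return EOL_MIXED;
    return seen_crlf ? EOL_CRLF : seen_lf ? EOL_LF : EOL_CR;
}
```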

Carriage return required when printing to the console in Windows?

It seems like just putting a linefeed is good enough, but I know it is supposed to be carriage return + line feed. Does anything horrible happen if you don’t put the carriage return and only use line feeds?

This is in ANSI C and not going to be redirected to a file or anything else. Just a normal console app.

5 Answers

The Windows console follows the same line ending convention that is assumed for files, or for that matter for actual, physical terminals. It needs to see both CR and LF to properly move to the next line.

That said, there is a lot of software infrastructure between an ANSI C program and that console. In particular, any standard C library I/O function is going to try to do the right thing, assuming you've allowed it the chance. This is why fopen() has the b modifier for the mode parameter (standard C) and, in Microsoft's C runtime, the t modifier.

With t (text mode, the default for most streams, and in particular for stdin and stdout), any \n printed is converted to a CRLF sequence, and the reverse happens for reads. To turn off that behavior, use the b modifier.
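A minimal sketch of the b modifier in use: write_exact (a hypothetical helper) opens a file with "wb", so on Windows a 0x0A byte is written as-is instead of being expanded to 0D 0A; on POSIX systems "b" is accepted and has no effect.

```c
#include <stdio.h>

/* Write `len` bytes to `path` exactly as given (binary mode, no newline
 * translation). Returns 0 on success, -1 on any failure. */
static int write_exact(const char *path, const void *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int close_err = fclose(f);   /* fclose also flushes; check it too */
    return (written == len && close_err == 0) ? 0 : -1;
}
```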

Incidentally, the terminals traditionally hooked to *nix boxes, including the DEC VT100 emulated by XTerm, also need both CR and LF. However, in the *nix world, the conversion from a newline character to a CRLF sequence is handled in the tty device driver, so most programs don't need to know about it, and the t and b modifiers are both ignored. On those platforms, if you need to send and receive characters on a tty without that modification, look up stty(1) or the system calls it depends on.
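On Linux and other POSIX systems, the tty setting in question is the ONLCR output flag, which takes effect only while OPOST is also set (stty -onlcr turns it off). A minimal sketch, with hypothetical function names, that inspects it through the standard termios interface:

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <termios.h>

/* True if these terminal settings map NL to CR NL on output. */
bool maps_nl_to_crnl(const struct termios *t)
{
    return (t->c_oflag & OPOST) && (t->c_oflag & ONLCR);
}

/* 1 if the terminal on fd maps NL to CR NL, 0 if it does not,
 * -1 if fd is not a terminal or tcgetattr() fails. */
int fd_maps_nl_to_crnl(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;
    return maps_nl_to_crnl(&t) ? 1 : 0;
}
```

Turning the flag off would mean clearing ONLCR in c_oflag and applying the change with tcsetattr(), which is roughly what stty -onlcr does.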

If your otherwise ANSI C program is avoiding C library I/O to the console (perhaps because you need access to the console’s character color and other attributes) then whether you need to send CR or not will depend on which Win32 API calls you are using to send the characters.

Why does a carriage return creep in when this program runs on Windows?

I wrote a program to translate a hex string into the corresponding binary data.

I ran it on a small test file of hex digits. On a UNIX system the output is exactly as expected, but on a Windows system a carriage return creeps in after byte 7b.

The right sequence should be [...] 7b 0a [...] but it comes out as [...] 7b 0d 0a [...]. What's happening here?

1 Answer

Windows text files use the byte sequence 0D 0A to mark the end of a line (Unix uses only a single byte, 0A). The C standard library translates between this external encoding and the internal "virtual newline" character ('\n') that C uses.

That is, when a C program running on Windows writes '\n' to a text stream, it gets translated to 0D 0A. The inverse operation happens on input. Because '\n' is an ordinary char value (typically 10), any byte of binary data that happens to equal it is treated as a newline, which is why the 0a after 7b is written out as 0d 0a.
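The output-side translation can be emulated on a byte buffer to reproduce the symptom from the question. lf_to_crlf is a hypothetical name, and this is an illustration of the effect rather than the C runtime's actual code.

```c
#include <stddef.h>

/* Emulate a Windows text-mode stream on output: every 0x0A byte is
 * written as 0x0D 0x0A. `out` must have room for up to 2 * len bytes.
 * Returns the number of bytes produced. */
static size_t lf_to_crlf(const unsigned char *in, size_t len,
                         unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (in[i] == 0x0A)
            out[n++] = 0x0D;   /* the CR that "creeps in" */
        out[n++] = in[i];
    }
    return n;
}
```

Feeding it the bytes 7b 0a produces 7b 0d 0a, matching the Windows output described in the question.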

If you don’t want this behavior (e.g. because you’re writing or reading binary data, not text), you need to use a binary stream, not a text stream.

For normal files this is easy: just add "b" to the open mode when calling fopen. For the predefined streams (stdin / stdout / stderr) there is no portable solution as far as I'm aware, but Windows has an extra function, _setmode, that puts an existing stream's file descriptor into binary mode; it is described in the official Microsoft documentation.
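A hedged sketch of that approach, assuming Microsoft's C runtime, where _setmode is declared in <io.h>, _fileno in <stdio.h>, and _O_BINARY in <fcntl.h>. On other platforms the wrapper (a hypothetical name, stdout_to_binary) does nothing, because POSIX systems do not distinguish text and binary streams.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/* Put stdout into binary mode so '\n' bytes are written unchanged.
 * Returns 0 on success, -1 on failure. */
static int stdout_to_binary(void)
{
#ifdef _WIN32
    fflush(stdout);   /* flush anything already written in text mode */
    return _setmode(_fileno(stdout), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;         /* no text/binary distinction here */
#endif
}
```

Calling it once at the start of main, before any output, keeps the byte stream written to stdout identical on Windows and UNIX.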

There are a few bugs in your code:

The two if conditions are broken because bf[0] is a char . A char is not big enough to store EOF , which is a special non-character value returned by getchar to signal an error or end-of-file. In general, getchar will return a non-negative value for successful input and a negative value ( EOF , typically -1 ) on error. By assigning this value to a char , you’re truncating EOF and mapping it to some real character value.

The behavior of the bf[0] == EOF check depends on whether char is a signed type on your platform (it probably is). If so, it will confuse some other character (normally 255, which corresponds to ÿ in ISO-8859-1) for end-of-file. If char is unsigned, this condition is never true, so you’ll get an infinite loop.

Similarly, isspace(bf[0]) is broken if char is a signed type, because all the is* functions have undefined behavior if their argument does not fit inside an unsigned char (with one special exception: EOF is allowed).

The fix is to store the result of getchar in an int first:
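Because the original program is not shown, the following is a hypothetical reconstruction of a hex-to-binary loop that follows this advice. The result of getc is kept in an int, so EOF stays distinct from every byte value, and isspace/isxdigit only ever receive EOF or an unsigned char value. The function name hex_to_bin and its error convention are assumptions.

```c
#include <ctype.h>
#include <stdio.h>

/* Read pairs of hex digits from `in`, skipping whitespace, and write each
 * pair as one byte to `out`. Returns 0 on success, or -1 on a non-hex
 * character or an odd number of digits. */
static int hex_to_bin(FILE *in, FILE *out)
{
    int c;           /* int, not char: must be able to hold EOF */
    int high = -1;   /* value of a pending first digit, or -1 */

    while ((c = getc(in)) != EOF) {
        if (isspace(c))
            continue;
        if (!isxdigit(c))
            return -1;
        int v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        if (high < 0) {
            high = v;
        } else {
            putc(high * 16 + v, out);
            high = -1;
        }
    }
    return high < 0 ? 0 : -1;
}
```

To avoid the 0d 0a problem on Windows, `out` must be a binary stream: a file opened with "wb", a tmpfile(), or stdout after switching it to binary mode.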
