Double characters in Windows

Double-byte Character Sets

A double-byte character set (DBCS), also known as an "expanded 8-bit character set", is an extended single-byte character set (SBCS), implemented as a code page. DBCSs were originally developed to extend the SBCS design to handle languages such as Japanese and Chinese. Some characters in a DBCS, including the digits and letters used for writing English, have single-byte code values. Other characters, such as Chinese ideographs or Japanese kanji, have double-byte code values. A DBCS can correspond either to a Windows code page or an OEM code page. A DBCS code page can also include a non-native code page, for example, an EBCDIC code page. For definitions of these code pages, see Code Pages.

New Windows applications should use Unicode to avoid the inconsistencies of varied code pages and for ease of localization. However, some legacy protocols might require the use of DBCS code pages. Each DBCS code page supports different characters, but no page supports the full breadth of characters provided by Unicode. Each DBCS code page supports a different subset, differently encoded. Data converted from one DBCS code page to another is subject to corruption because the same data value on different code pages can encode a different character. Data converted from Unicode to DBCS is subject to data loss, because a given code page might not be able to represent every character used in that particular Unicode data.
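Both failure modes can be seen in a short Python sketch. It uses Python's built-in `cp932` (Japanese) and `cp936` (Simplified Chinese) codecs as stand-ins for two DBCS Windows code pages; the sample characters are chosen for illustration and do not come from the original text.

```python
# Corruption: the same two bytes mean different characters on different
# DBCS code pages.
raw = "表".encode("cp932")            # Japanese code page 932 -> bytes 95 5C
as_japanese = raw.decode("cp932")     # round-trips to the original character
as_chinese = raw.decode("cp936")      # same bytes read as code page 936 (GBK)

# Data loss: Hangul has no mapping in code page 932, so the encoder has to
# substitute a replacement character.
lossy = "한".encode("cp932", errors="replace")

print(raw.hex(), ascii(as_japanese), ascii(as_chinese), lossy)
```

On Windows itself the equivalent conversions would go through MultiByteToWideChar and WideCharToMultiByte; the Python codecs are only a convenient way to observe the same mappings.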

To interpret a DBCS string, an application must start at the beginning of the string and scan forward. It keeps track when it encounters a lead byte in the string, and treats the next byte as the trailing part of the same character. If the application simply scans the string one byte at a time and encounters a byte that appears to be the code value representing a backslash ("\"), that byte might simply be the trail byte of a two-byte character. The application cannot just back up one byte to see if the preceding byte is a lead byte, as that byte value might be eligible to be used as both a lead byte and a trail byte. Thus the application has essentially the same problem with it as with the possible backslash. In other words, substring searches are much more complicated with a DBCS than with either SBCSs or Unicode. Accordingly, applications that support a DBCS must use special functions, such as _mbsstr, instead of the StrStr function.
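The forward scan can be sketched in Python as follows. The lead-byte ranges are those of code page 932 and are hard-coded here as an assumption (a Windows program would ask IsDBCSLeadByte instead); `is_lead_byte` and `find_single_byte` are hypothetical helper names.

```python
def is_lead_byte(b):
    """Lead-byte ranges of code page 932 (assumed; other DBCS pages differ)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def find_single_byte(data, target):
    """Return the index of the first single-byte character equal to target.

    Scans forward from the start and skips the trail byte of every
    double-byte character, so a trail byte is never mistaken for target.
    Returns -1 when there is no match.
    """
    i = 0
    while i < len(data):
        if is_lead_byte(data[i]):
            i += 2          # lead byte + trail byte form one character
            continue
        if data[i] == target:
            return i
        i += 1
    return -1


# One kanji whose trail byte is 0x5C, then a real backslash, then "x".
data = "表\\x".encode("cp932")            # bytes: 95 5C 5C 78
naive = data.find(b"\\")                  # byte-at-a-time search
aware = find_single_byte(data, 0x5C)      # lead-byte-aware search
print(naive, aware)
```

The naive search stops inside the kanji, while the lead-byte-aware scan reports the position of the real backslash.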

Your applications use DBCS Windows code pages with the "A" versions of Windows functions. See Conventions for Function Prototypes and Code Pages. To help identify a DBCS code page, an application can use the GetCPInfo or GetCPInfoEx function. An application can use the IsDBCSLeadByte function to determine if a given value can be used as the lead byte of a 2-byte character. In addition, an application can use the MultiByteToWideChar and WideCharToMultiByte functions to map between Unicode and DBCS strings.

How-to: Escape Characters, Delimiters and Quotes at the Windows command line.

Using "Double Quotes"

If a single parameter contains spaces, you can still pass it as one item by surrounding it in "quotes". This works well for long filenames.

If a parameter is used to supply a filename like this:

This parameter will be:

To launch a batch script with spaces in the Program Path requiring "quotes"

In the FIND command, the " quote can be escaped by doubling it to ""

Removing Quotes

Several methods for removing quotes are listed on the dequote page.

Working without Quotes

Without surrounding quotes, a long filename will be passed as %1 %2 %3.

To refer to the pathname above use %* rather than %1 %2 %3 — the %* will cover all parameters — even if there are more than %9

You can apply Extended Filename syntax to %* with the following workaround:

Delimiters

Delimiters separate one parameter from the next — they split the command line up into words.

Parameters are most often separated by spaces, but any of the following are also valid delimiters:

Comma (,)
Semicolon (;)
Equals (=)
Space ( )
Tab ( )

If you are passing a parameter to a batch file that contains any of these delimiter characters, it will split the parameter into two parameters unless you surround the whole thing with double quotes: "this is;one=param,"
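The splitting rule can be modelled in a few lines of Python. This is a simplified sketch, not CMD.exe's actual parser: `split_batch_params` is a hypothetical helper, it assumes that runs of delimiters collapse into one separator, and it keeps the surrounding quotes as part of the parameter.

```python
DELIMITERS = " \t,;="   # the delimiter characters listed above


def split_batch_params(tail):
    """Split the text after the script name into %1, %2, ... (simplified model)."""
    params, current, in_quotes = [], [], False
    for ch in tail:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)            # quotes stay part of the parameter
        elif ch in DELIMITERS and not in_quotes:
            if current:                   # a run of delimiters separates once
                params.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        params.append("".join(current))
    return params


print(split_batch_params('this is;one=param,'))
print(split_batch_params('"this is;one=param,"'))
```

Unquoted, the sample text breaks into four parameters; quoted, it arrives as a single parameter with its quotes intact.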

Notice that although / and - are commonly used to separate command options, they are absent from the list above. This is because batch file parameters are passed to CMD.exe, which can accept its own parameters (which are invoked using / and -).

One exception to this standard list of delimiters is the FOR command where the default is just [space] and [tab] and you can use the delims= option to specify something different.

When using the TAB character as a delimiter be aware that many text editors will insert a TAB as a series of SPACEs.

When you use %* to refer to all parameters, the value returned will include the delimiters. Under NT 4.0 this will include the leading space; under Windows 2000 and above it won’t.

Escape Character

Adding the escape character (^) before a command symbol allows it to be treated as ordinary text.
When piping or redirecting any of these characters you should prefix with the escape character: & \ < > ^ |
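When a command line is generated by a program, the prefixing can be automated. The Python sketch below is only an illustration: `caret_escape` is a hypothetical helper, and its default character set is an assumption that can be overridden through the `specials` parameter.

```python
CMD_SPECIALS = "&<>^|"   # assumed default set; pass another string to change it


def caret_escape(text, specials=CMD_SPECIALS):
    """Prefix every character found in specials with the ^ escape character."""
    return "".join("^" + ch if ch in specials else ch for ch in text)


print(caret_escape("Tom & Jerry | friends"))
```

The helper works on unquoted text only; characters inside a double-quoted string generally do not need the caret.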

Escaping CR/LF line endings.

The ^ escape character can be used to make long commands more readable by splitting them into multiple lines and escaping the Carriage Return + Line Feed (CR/LF) at the end of a line:

Mark Yocom [MSFT] has more on this technique here.

A couple of things to be aware of:

  • A stray space at the end of a line (after the ^) will break the command. This can be hard to spot unless you have a text editor that displays spaces and tab characters.
  • If you want to comment something out with REM, then EVERY line needs to be prefixed with REM.
    Alternatively, if you use a double colon :: as a REM comment, that will still parse the caret at the end of a line, so in the example above changing the first line to :: ROBOCOPY… will comment out the whole multi-line command.

Some commands (e.g. REG and FINDSTR) use the standard escape character of \ (as used by C, Python, SQL, bash and many other languages.)
The \ escape can cause problems with quoted directory paths that contain a trailing backslash because the closing quote " at the end of the line will be escaped \" .

To save a directory path with a trailing backslash ( \ ) requires adding a second backslash to ‘escape the escape’
so for example instead of "C:\My Docs\" use "C:\My Docs\\"
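A program that builds such arguments can apply the same doubling automatically. A minimal Python sketch, where `quote_dir` is a hypothetical helper and the target command is assumed to treat \" as an escaped quote:

```python
def quote_dir(path):
    """Wrap path in double quotes, doubling any trailing backslashes.

    Without the doubling, the last backslash would escape the closing quote.
    """
    trailing = len(path) - len(path.rstrip("\\"))
    return '"' + path + "\\" * trailing + '"'


print(quote_dir("C:\\My Docs\\"))
print(quote_dir("C:\\My Docs"))
```

Backslashes elsewhere in the path are left alone, since only those directly before the closing quote are at risk.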

To be sure that a path includes a trailing backslash, you can test for it:

Set _prog=C:\Program Files\SS64 App
IF %_prog:

Escaping the pipeline

When a pipe is used, the expressions are parsed twice. First when the expression before the pipe is executed and a second time when the expression after the pipe is executed. So to escape any characters in the second expression double escaping is needed:
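The effect of two parsing passes can be modelled in Python. This is a deliberately simplified sketch: `parse_pass` is a hypothetical helper that only handles the caret and ignores quotes, variables and every other part of the real parser.

```python
def parse_pass(text):
    """One simplified parsing pass: drop each ^ and keep the next character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "^" and i + 1 < len(text):
            out.append(text[i + 1])   # the escaped character becomes literal
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


once = parse_pass("^^^&")    # after the first pass
twice = parse_pass(once)     # after the second pass
print(once, twice)
```

Three carets are needed in front of the & because the first pass consumes two of them and the second pass consumes the remaining one.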

The line below will echo a single `&` character:

Escaping Percents

The % character has a special meaning for command line parameters and FOR parameters.
To treat a percent as a regular character, double it:

Many characters such as \ = ( ) do not need to be escaped when they are used within a "quoted string". Typically these are characters you might find in a filename/path. The percent character is one exception to this rule, even though under NTFS % is a valid filename character.
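For text that a program writes into a batch file, the doubling is a one-line replacement. A small Python sketch, with `escape_percents` as a hypothetical helper name:

```python
def escape_percents(text):
    """Double every % so that a batch file treats it as a literal percent sign."""
    return text.replace("%", "%%")


line = "ECHO " + escape_percents("100% complete")
print(line)
```

The replacement has to be applied inside quoted strings as well, because quoting does not protect the percent character.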

Escaping Exclamation marks

When the shell is running in EnableDelayedExpansion mode the ! character is used to denote a variable and so must be escaped (twice) if you wish to treat it as a regular character:

Escape the Escape character

The escape character can be used to escape itself ^^ (meaning don’t treat the first ^ as an escape character), so you are escaping the escape character:

Special Cases

A small number of commands follow slightly different rules: FINDSTR, REG and RUNAS all use \ as an escape character instead of ^

“All the best stories in the world are but one story in reality — the story of escape. It is the only thing which interests us all and at all times, how to escape”

Escape double quotes in parameter

In Unix I could run myscript '"test"' and I would get "test" .

In Windows cmd I get 'test' .

How can I pass double-quotes as a parameter? I would like to know how to do this manually from a cmd window so I don’t have to write a program to test my program.

I cannot quickly reproduce the symptoms: if I try myscript '"test"' with a batch file myscript.bat containing just @echo.%1 or even @echo.%~1 , I get all quotes: '"test"'

Perhaps you can try the escape character ^ like this: myscript '^"test^"' ?

Another way to escape quotes (though probably not preferable), which I’ve found used in certain places is to use multiple double-quotes. For the purpose of making other people’s code legible, I’ll explain.

Here’s a set of basic rules:

  1. When not wrapped in double-quoted groups, spaces separate parameters:
    program param1 param2 param 3 will pass four parameters to program.exe:
    param1, param2, param, and 3.
  2. A double-quoted group ignores spaces as value separators when passing parameters to programs:
    program one two "three and more" will pass three parameters to program.exe:
    one, two, and three and more. Now to explain some of the confusion:
  3. Double-quoted groups that appear directly adjacent to text not wrapped with double-quotes join into one parameter:
    hello"to the entire"world acts as one parameter: helloto the entireworld. Note: The previous rule does NOT imply that two double-quoted groups can appear directly adjacent to one another.
  4. Any double-quote directly following a closing quote is treated as (or as part of) plain unwrapped text that is adjacent to the double-quoted group, but only one double-quote:
    "Tim says, ""Hi!""" will act as one parameter: Tim says, "Hi!"

Thus there are three different types of double-quotes: quotes that open, quotes that close, and quotes that act as plain-text.
Here’s the breakdown of that last confusing line:

Thus, the text effectively joins four groups of characters (one with nothing, however):
Tim says, is the first, wrapped to escape the spaces
"Hi! is the second, not wrapped (there are no spaces)
is the third, a double-quote group wrapping nothing
" is the fourth, the unwrapped close quote.

As you can see, the double-quote group wrapping nothing is still necessary since, without it, the following double-quote would open up a double-quoted group instead of acting as plain-text.

From this, it follows that, both inside and outside quotes, three double-quotes act as a plain-text unescaped double-quote:

will print Tim said to him, "What’s been happening lately?" as expected. Therefore, three quotes can always be reliably used as an escape.
However, in understanding it, you may note that the four quotes at the end can be reduced to a mere two since it technically is adding another unnecessary empty double-quoted group.
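The rules proposed in this answer are compact enough to model in Python. The sketch below implements only those four rules as stated; it is not the real Windows argument parser, it ignores backslash escapes entirely, and `split_args` is a hypothetical helper name.

```python
def split_args(line):
    """Split a parameter string using the four rules described above."""
    args, current = [], []
    in_quotes = False
    have_arg = False          # True once the current parameter has started
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == '"':
                in_quotes = False                     # this quote closes the group
                if i + 1 < n and line[i + 1] == '"':
                    current.append('"')               # rule 4: next quote is plain text
                    i += 1
            else:
                current.append(ch)                    # rule 2: spaces are kept
        elif ch == '"':
            in_quotes = True                          # rule 3: group joins adjacent text
            have_arg = True
        elif ch in " \t":
            if have_arg:                              # rule 1: space ends the parameter
                args.append("".join(current))
                current, have_arg = [], False
        else:
            current.append(ch)
            have_arg = True
        i += 1
    if have_arg:
        args.append("".join(current))
    return args


print(split_args('one two "three and more"'))
print(split_args('hello"to the entire"world'))
print(split_args('"Tim says, ""Hi!"""'))
```

In this model, ending a parameter with two quotes or with four quotes gives the same result, which matches the observation about the unnecessary empty group.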

Here are a few examples to close it off:

Final note: I did not read any of this from any tutorial — I came up with all of it by experimenting. Therefore, my explanation may not be true internally. Nonetheless all the examples above evaluate as given, thus validating (but not proving) my theory.

I tested this on Windows 7, 64bit using only *.exe calls with parameter passing (not *.bat, but I would suppose it works the same).

"" escapes to a single " in the parameter.

The 2nd document quoted by Peter Mortensen in his comment on the answer of Codesmith made things much clearer for me. That document was written by windowsinspired.com. The link repeated: A Better Way To Understand Quoting and Escaping of Windows Command Line Arguments.

Some further trial and error leads to the following guideline:

Escape every double quote " with a caret ^ . If you want other characters with special meaning to the Windows command shell (e.g., < , > , | , & ) to be interpreted as regular characters instead, then escape them with a caret, too.

If you want your program foo to receive the command line text "a\"b c" > d and redirect its output to file out.txt, then start your program as follows from the Windows command shell:

If foo interprets \" as a literal double quote and expects unescaped double quotes to delimit arguments that include whitespace, then foo interprets the command as specifying one argument a"b c , one argument > , and one argument d .
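A command line that follows this guideline can be generated in two steps. The Python sketch below is an illustration under stated assumptions: it relies on `subprocess.list2cmdline`, an undocumented helper in Python's standard library that applies the backslash-quote convention, and `cmd_quote` is a hypothetical wrapper that then caret-escapes every double quote and shell metacharacter.

```python
import subprocess

SHELL_SPECIALS = '"&<>^|'   # characters given a caret in this sketch


def cmd_quote(args):
    """Quote args for the program, then caret-escape the result for the shell."""
    # Step 1: backslash-quote convention expected by the target program.
    program_level = subprocess.list2cmdline(args)
    # Step 2: caret-escape so the command shell passes the text through.
    return "".join("^" + ch if ch in SHELL_SPECIALS else ch for ch in program_level)


# The redirection is appended unescaped so that the shell still acts on it.
command = "foo " + cmd_quote(['a"b c', ">", "d"]) + " > out.txt"
print(command)
```

Keeping the two layers separate makes it clear which escapes are meant for the program and which are meant for the shell.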

If instead foo interprets a doubled double quote "" as a literal double quote, then start your program as

The key insight from the quoted document is that, to the Windows command shell, an unescaped double quote triggers switching between two possible states.

Some further trial and error implies that in the initial state, redirection (to a file or pipe) is recognized and a caret ^ escapes a double quote and the caret is removed from the input. In the other state, redirection is not recognized and a caret does not escape a double quote and isn’t removed. Let’s refer to these states as ‘outside’ and ‘inside’, respectively.

If you want to redirect the output of your command, then the command shell must be in the outside state when it reaches the redirection, so there must be an even number of unescaped (by caret) double quotes preceding the redirection. foo "a\"b " > out.txt won’t work: the command shell passes the entire "a\"b " > out.txt to foo as its combined command line arguments, instead of passing only "a\"b " and redirecting the output to out.txt.

foo "a\^"b " > out.txt won’t work, either, because the caret ^ is encountered in the inside state where it is an ordinary character and not an escape character, so "a\^"b " > out.txt gets passed to foo.

The only way that (hopefully) always works is to keep the command shell always in the outside state, because then redirection works.
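The two-state behaviour described in this answer can be simulated with a small Python function. It models only what the answer states (quote toggling, caret handling and output redirection with `>`), so it is a sketch of the theory and not of the real command shell; `cmd_pass` is a hypothetical name.

```python
def cmd_pass(line):
    """Return (text passed to the program, redirection target or None)."""
    out = []
    target = None
    inside = False            # False = 'outside' state, True = 'inside' state
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == '"':
            inside = not inside           # an unescaped quote switches state
            out.append(ch)
        elif not inside and ch == "^" and i + 1 < len(line):
            out.append(line[i + 1])       # caret removed, next char is literal
            i += 1
        elif not inside and ch == ">":
            target = line[i + 1:].strip() # redirection recognized only outside
            break
        else:
            out.append(ch)
        i += 1
    return "".join(out).rstrip(), target


print(cmd_pass(r'foo ^"a\^"b c^" ^> d > out.txt'))
print(cmd_pass(r'foo "a\"b " > out.txt'))
print(cmd_pass(r'foo "a\^"b " > out.txt'))
```

Only the first call, where every double quote is caret-escaped, yields a redirection target; in the other two the shell ends up in the inside state and hands the whole text to foo.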

If you don’t need redirection (or other characters with special meaning to the command shell), then you can do without the carets. If foo interprets \" as a literal double quote, then you can call it as

Then foo receives "a\"b c" as its combined arguments text and can interpret it as a single argument equal to a"b c .

Now, finally, to the original question. myscript '"test"' called from the Windows command shell passes '"test"' to myscript. Apparently myscript interprets the single and double quotes as argument delimiters and removes them. You need to figure out what myscript accepts as a literal double quote and then specify that in your command, using ^ to escape any characters that have special meaning to the Windows command shell. Given that myscript is also available on Unix, perhaps \" does the trick. Try
