- Strings v2.53
- Introduction
- Using Strings
- Working with Strings
- Unicode and ANSI Functions
- TCHARs
- Strings in Windows API
- ANSI C string functions
- Security enhanced CRT functions
- Windows API kernel and user string functions
- The string length
- Concatenating strings
- Converting characters
- Comparing strings
- Filling a buffer
- Character types
- Windows API Shell Lightweight Utility functions
- Trimming a string
- Converting strings to integers
- Searching strings
- Windows API StrSafe functions
- The string length
- Reading standard input
- Copying strings
- Concatenating strings
- Formatting strings
Strings v2.53
By Mark Russinovich
Published: July 4, 2016
Download Strings (506 KB)
Introduction
Working on NT and Win2K means that executables and object files often have embedded Unicode strings that you cannot easily see with standard ASCII strings or grep programs. So we decided to roll our own. Strings scans the file you pass it for Unicode (or ASCII) strings with a default minimum length of 3 Unicode (or ASCII) characters. Note that it works under Windows 95 as well.
Using Strings
usage:
Strings takes wild-card expressions for file names, and additional command line parameters are defined as follows:
Parameter | Description |
---|---|
-a | Ascii-only search (Unicode and Ascii is default) |
-b | Bytes of file to scan |
-f | File offset at which to start scanning. |
-o | Print offset in file string was located |
-n | Minimum string length (default is 3) |
-s | Recurse subdirectories |
-u | Unicode-only search (Unicode and Ascii is default) |
-nobanner | Do not display the startup banner and copyright message. |
To search one or more files for the presence of a particular string using strings use a command like this:
Runs on:
- Client: Windows Vista and higher
- Server: Windows Server 2008 and higher
- Nano Server: 2016 and higher
Working with Strings
Windows natively supports Unicode strings for UI elements, file names, and so forth. Unicode is the preferred character encoding, because it supports all character sets and languages. Windows represents Unicode characters using UTF-16 encoding, in which each character is encoded as one or two 16-bit values. UTF-16 characters are called wide characters, to distinguish them from 8-bit ANSI characters. The Visual C++ compiler supports the built-in data type wchar_t for wide characters. The header file WinNT.h also defines WCHAR as a typedef for wchar_t.
You will see both versions in MSDN example code. To declare a wide-character literal or a wide-character string literal, put L before the literal.
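As a minimal sketch of the L prefix, the following portable C fragment declares a wide-character literal and a wide string literal. The alias WCHAR_SKETCH and the names kLetter, kGreeting, and greeting_length are hypothetical and exist only for this illustration; on Windows you would use WCHAR from the SDK headers instead.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical alias that mirrors the kind of typedef described above. */
typedef wchar_t WCHAR_SKETCH;

/* The L prefix on a character literal yields a wide character. */
const wchar_t kLetter = L'a';

/* The L prefix on a string literal yields an array of wide characters. */
const WCHAR_SKETCH kGreeting[] = L"hello";

/* Number of wide characters in kGreeting, not counting the terminator. */
size_t greeting_length(void)
{
    return wcslen(kGreeting);
}
```

Because WCHAR_SKETCH is only an alias, both spellings name the same type and can be mixed freely.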
Here are some other string-related typedefs that you will see:
Typedef | Definition |
---|---|
CHAR | char |
PSTR or LPSTR | char* |
PCSTR or LPCSTR | const char* |
PWSTR or LPWSTR | wchar_t* |
PCWSTR or LPCWSTR | const wchar_t* |
Unicode and ANSI Functions
When Microsoft introduced Unicode support to Windows, it eased the transition by providing two parallel sets of APIs, one for ANSI strings and the other for Unicode strings. For example, there are two functions to set the text of a window’s title bar:
- SetWindowTextA takes an ANSI string.
- SetWindowTextW takes a Unicode string.
Internally, the ANSI version translates the string to Unicode. The Windows headers also define a macro that resolves to the Unicode version when the preprocessor symbol UNICODE is defined or the ANSI version otherwise.
In MSDN, the function is documented under the name SetWindowText, even though that is really the macro name, not the actual function name.
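The macro mechanism can be sketched in portable C. This is an illustration of the pattern only, not the contents of the Windows headers: TitleLengthA, TitleLengthW, and TitleLength are hypothetical names that stand in for an A/W function pair and its generic macro.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "ANSI" variant: takes a narrow string. */
size_t TitleLengthA(const char *title)
{
    return strlen(title);
}

/* Hypothetical "Unicode" variant: takes a wide string. */
size_t TitleLengthW(const wchar_t *title)
{
    return wcslen(title);
}

/* The generic name is only a macro; the preprocessor picks the variant. */
#ifdef UNICODE
#define TitleLength TitleLengthW
#else
#define TitleLength TitleLengthA
#endif
```

Code that calls TitleLength compiles against one variant or the other depending on whether UNICODE is defined, which is why the generic name never appears as a real function in the binary.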
New applications should always call the Unicode versions. Many world languages require Unicode. If you use ANSI strings, it will be impossible to localize your application. The ANSI versions are also less efficient, because the operating system must convert the ANSI strings to Unicode at run time. Depending on your preference, you can call the Unicode functions explicitly, such as SetWindowTextW, or use the macros. The example code on MSDN typically calls the macros, but the two forms are exactly equivalent. Most newer APIs in Windows have just a Unicode version, with no corresponding ANSI version.
TCHARs
Back when applications needed to support both Windows NT as well as Windows 95, Windows 98, and Windows Me, it was useful to compile the same code for either ANSI or Unicode strings, depending on the target platform. To this end, the Windows SDK provides macros that map strings to Unicode or ANSI, depending on the platform.
Macro | Unicode | ANSI |
---|---|---|
TCHAR | wchar_t | char |
TEXT("x") | L"x" | "x" |
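For example, a call such as SetWindowText(TEXT("My Application")) resolves to SetWindowTextW(L"My Application") when UNICODE is defined, and to SetWindowTextA("My Application") otherwise.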
The TEXT and TCHAR macros are less useful today, because all applications should use Unicode. However, you might see them in older code and in some of the MSDN code examples.
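To show how such a mapping can work, here is a minimal portable sketch. SKETCH_TCHAR and SKETCH_TEXT are hypothetical stand-ins for TCHAR and TEXT, and the single-level token paste is a simplification chosen for this example rather than the SDK's actual definition.

```c
#include <stddef.h>
#include <wchar.h>

#ifdef UNICODE
typedef wchar_t SKETCH_TCHAR;
/* Paste an L in front of the literal to make it a wide string. */
#define SKETCH_TEXT(quote) L##quote
#else
typedef char SKETCH_TCHAR;
/* Leave the literal as a narrow string. */
#define SKETCH_TEXT(quote) quote
#endif

/* The same source line produces a wide or a narrow array. */
const SKETCH_TCHAR kTitle[] = SKETCH_TEXT("My Application");

/* Number of characters in kTitle, not counting the terminator. */
size_t title_units(void)
{
    return sizeof(kTitle) / sizeof(kTitle[0]) - 1;
}
```

The character count is the same in both builds; only the element type, and therefore the size in bytes, changes.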
The headers for the Microsoft C run-time libraries define a similar set of macros. For example, _tcslen resolves to strlen if _UNICODE is undefined; otherwise it resolves to wcslen, which is the wide-character version of strlen.
Be careful: Some headers use the preprocessor symbol UNICODE, others use _UNICODE with an underscore prefix. Always define both symbols. Visual C++ sets them both by default when you create a new project.
Strings in Windows API
last modified July 16, 2020
In the C language there is no string data type. A string literal in a program is an array of characters. Whenever we say string we mean an array of characters.
There are five sets of functions for working with strings, in both the C runtime library (CRT) and the Windows API:
- ANSI C standard functions
- Security enhanced CRT functions
- Windows API kernel and user functions
- Windows API Shell Lightweight Utility functions
- Windows API StrSafe functions
It is recommended to prefer either security enhanced standard functions or Windows API safe functions.
ANSI C string functions
The C Run-Time (CRT) library functions have some small overhead since they call Windows API functions underneath. These functions provide portability but have some limitations. When not used properly, they can cause security risks.
These functions do not return an error value when they fail.
In the example we present a few string functions from the CRT library.
The wcslen() returns the number of wide-characters in the string.
The wcscpy() copies a string to a string buffer.
The wcscat() function appends a string to a string buffer.
The wcscmp() function compares two strings.
This is the output of the example.
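The four functions described above can be combined as in the following minimal sketch. The helper names build_phrase and same_text and the BUF_LEN constant are hypothetical; the caller is assumed to pass a buffer of at least BUF_LEN wide characters, because these functions do no bounds checking.

```c
#include <stddef.h>
#include <wchar.h>

#define BUF_LEN 32

/* Builds "wolf pack" in out and returns its length in wide characters. */
size_t build_phrase(wchar_t out[BUF_LEN])
{
    wcscpy(out, L"wolf");   /* copy the first string into the buffer */
    wcscat(out, L" pack");  /* append the second string */
    return wcslen(out);     /* count characters, excluding the terminator */
}

/* Returns nonzero when the two strings are equal; wcscmp is case sensitive. */
int same_text(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```

Note that neither wcscpy() nor wcscat() knows the size of the destination; the fixed BUF_LEN is what keeps this sketch safe.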
Security enhanced CRT functions
Security CRT functions add additional security to the CRT functions. (They are not standard functions but an MS extension.) These functions validate parameters, take buffer sizes, check that strings are null-terminated, and provide enhanced error reporting.
Security CRT functions have an _s suffix.
In the example, we present four functions: wcsnlen_s(), wcscpy_s(), wcscat_s(), and wprintf_s().
The wcsnlen_s() computes the length of a wide string. The function only checks the first MAX_CHARS characters.
With the wcscpy_s() function, we copy the L"Wuthering" string into the buffer. The function takes the maximum number of characters in the buffer and it returns an error code if it fails. The function returns 0 on success.
The wcscat_s() is a secure extension of the wcscat() function.
There is even a security enhanced wprintf() function; it has some runtime constraints.
This is the output of the SecurityEnhanced.exe example.
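The _s functions are not available in every C library, so the following is a portable sketch of the idea behind a size-checked copy, not the CRT implementation. The helper checked_copy is hypothetical; like wcscpy_s() it takes the destination size in characters and returns 0 on success, but its error handling is a simplification assumed for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies src into dest only if it fits, terminator
   included, in dest_count wide characters. Returns 0 on success. */
int checked_copy(wchar_t *dest, size_t dest_count, const wchar_t *src)
{
    if (dest == NULL || dest_count == 0) {
        return EINVAL;               /* nothing usable to write into */
    }
    if (src == NULL) {
        dest[0] = L'\0';
        return EINVAL;
    }

    size_t len = wcslen(src);
    if (len >= dest_count) {
        dest[0] = L'\0';             /* leave an empty, terminated string */
        return ERANGE;               /* source does not fit */
    }

    wmemcpy(dest, src, len + 1);     /* copy characters and terminator */
    return 0;
}
```

The point of the pattern is that the caller gets an error code to act on instead of a silent buffer overrun.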
Windows API kernel and user string functions
These functions are specific to Windows OS; they are available in User32.lib and Kernel32.lib. They are lighter than their CRT counterparts.
Kernel string functions have their roots in the development of the Windows kernel. They are prefixed with the l letter.
The string length
One of the most common requirements is to figure out the length of the string. The lstrlen() function returns the length of the specified string in characters. It does not count the terminating null character.
The ANSI and the UNICODE functions take the string as a parameter and return the number of characters in the string.
We compute the length of two strings. The lstrlen() function is in fact a macro to either lstrlenA() or lstrlenW(). The first is used for ANSI strings, the second for wide strings.
We print the length of the L"Bratislava" string using the lstrlenW() function.
This is the output of the WinapiStringLength.exe program.
Concatenating strings
The lstrcatW() function appends one string to another string.
The first parameter is the buffer which should contain both strings. It must be large enough to contain both of them, including the NULL terminating character. The return value is a pointer to the buffer.
In the example, we concatenate four strings.
These are the strings that we are going to concatenate.
We compute the length of the four strings using the lstrlenW() function.
We create a buffer to hold the final string. Note that we add 1 to include the NULL character.
We copy the first string to the buffer using the lstrcpyW() function.
We append the remaining strings with the lstrcatW() function.
This is the output of the WinapiStringConcat.exe program.
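The buffer sizing described above (the sum of all lengths plus one for the terminating null character) can be sketched portably. This version swaps in the CRT functions wcslen() and wcscat() for lstrlenW() and lstrcatW() so that it compiles outside Windows; the helper name join_all is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Joins count strings into a newly allocated buffer.
   Returns NULL if allocation fails; the caller frees the result. */
wchar_t *join_all(const wchar_t *const parts[], size_t count)
{
    size_t total = 1;                       /* room for the terminating null */
    for (size_t i = 0; i < count; ++i) {
        total += wcslen(parts[i]);
    }

    wchar_t *buf = malloc(total * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }

    buf[0] = L'\0';                         /* start with an empty string */
    for (size_t i = 0; i < count; ++i) {
        wcscat(buf, parts[i]);              /* safe: total was precomputed */
    }
    return buf;
}
```

Forgetting the extra character for the terminator is the classic mistake here; the concatenation would then write one element past the end of the buffer.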
Converting characters
There are two functions for converting characters to uppercase or to lowercase. The CharLowerW() function converts a character string or a single character to lowercase. The CharUpperW() function converts a character string or a single character to uppercase. If the operand is a character string, the function converts the characters in place; the original string is modified.
The functions modify the strings in place and return a pointer to the modified string.
We have one string which we convert to lowercase and uppercase.
We convert the str string to lowercase with the CharLowerW() function. The string is modified in place.
This is the output of the UpperLower.exe program.
Comparing strings
The lstrcmpW() function compares two strings. It returns 0 if the strings are equal. The comparison is case sensitive. This means that "Cup" and "cup" are two different strings. The lstrcmpiW() yields case insensitive string comparison. For this function, "Cup" and "cup" are equal.
The functions take two strings as parameters. The return value indicates the equality of the strings. 0 value is returned for equal strings.
We have two strings. We compare them using both case sensitive and case insensitive string comparison.
If the lstrcmpW() function returns STR_EQUAL, which is defined to 0, then we print to the console that the two strings are equal. Otherwise we print that they are not equal.
The WinapiStringCompare.exe program gives the above output.
Filling a buffer
Filling a buffer with formatted data is essential in C programming. The wsprintfW() function writes formatted data to the specified buffer.
The function’s first parameter is the buffer that is to receive the formatted output. The second is a string containing format-control specifications. Then we have one or more optional arguments which correspond to format-control specifications.
We build a string which is filled with the current date.
In this particular case we can safely assume that the string will not exceed 128 characters.
The GetLocalTime() function retrieves the current local date and time.
The wsprintfW() fills the buffer with a wide string. Arguments are copied to the string according to the format specifier.
The content of the buffer is printed to the console.
This is the output of the WinapiStringFillBuffer.exe example.
Character types
Characters have various types. They can be digits, spaces, letters, punctuation, or control characters.
The GetStringTypeW() function retrieves character type information for the characters in the specified Unicode string. The first parameter is a flag specifying the info types.
Flag | Meaning |
---|---|
CT_CTYPE1 | Retrieve character type information. |
CT_CTYPE2 | Retrieve bidirectional layout information. |
CT_CTYPE3 | Retrieve text processing information. |
The second parameter is the Unicode string for which to retrieve the character types.
The third parameter is the size of the string. The final parameter is a pointer to an array of 16-bit values. The length of this array must be large enough to receive one 16-bit value for each character in the source string. The array will contain one word corresponding to each character in the source string.
For each character, the GetStringTypeW() function stores a value which is a combination of types. We can query a specific type with the & operator.
Value | Meaning |
---|---|
C1_DIGIT | Decimal digits |
C1_SPACE | Space characters |
C1_PUNCT | Punctuation |
C1_CNTRL | Control characters |
C1_ALPHA | Any linguistic character |
The function returns 0 on failure.
We have a short sentence. The GetStringTypeW() function is used to determine the character types of the string.
This is a short sentence consisting of various wide characters.
These variables will be used to count letters, digits, spaces, punctuation, and control characters.
We get the size of the string and create an array of values. The size does not include the NULL terminating character. We can add 1 to include it. It will be counted as a control character.
We get the character types of the sentence. The types array is filled with character type values.
If the value contains the C1_DIGIT flag, we increase the digits counter.
This is the output of the WinapiStringTypes.exe example.
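The flag test with the & operator can be illustrated without the Windows headers. In this portable sketch the standard wctype.h functions stand in for GetStringTypeW(), and the SK_ flag values are hypothetical; the real C1_ constants come from the Windows SDK and may differ.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical flag bits for this sketch only. */
enum {
    SK_DIGIT = 0x01,
    SK_SPACE = 0x02,
    SK_PUNCT = 0x04,
    SK_CNTRL = 0x08,
    SK_ALPHA = 0x10
};

/* Builds a combination of flags for one character. */
unsigned short classify(wchar_t ch)
{
    unsigned short type = 0;
    if (iswdigit((wint_t)ch)) type |= SK_DIGIT;
    if (iswspace((wint_t)ch)) type |= SK_SPACE;
    if (iswpunct((wint_t)ch)) type |= SK_PUNCT;
    if (iswcntrl((wint_t)ch)) type |= SK_CNTRL;
    if (iswalpha((wint_t)ch)) type |= SK_ALPHA;
    return type;
}

/* Counts the characters of s whose type value contains flag. */
size_t count_with_flag(const wchar_t *s, unsigned short flag)
{
    size_t n = 0;
    for (; *s != L'\0'; ++s) {
        if (classify(*s) & flag) {
            ++n;
        }
    }
    return n;
}
```

Because a type value is a combination of bits, one character can satisfy several tests, so each flag is checked independently rather than in an else-if chain.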
Windows API Shell Lightweight Utility functions
These functions are specific to Windows OS; they are available in the Shlwapi.lib.
Trimming a string
The StrTrimW() function removes specified leading and trailing characters from a string. It returns true if any characters were removed; otherwise, false.
The first parameter is a pointer to the string to be trimmed. When this function returns successfully, psz receives the trimmed string. The second parameter is a pointer to a string that contains the characters to trim from psz.
In the example, we remove leading and trailing digits from a string.
We will remove the leading and trailing digits from this string.
This string contains all characters to be removed.
With the StrTrimW() function, we trim digits from the buffer.
This is the output of the ShellTrimString.exe example.
Converting strings to integers
The StrToIntExW() converts a string representing a decimal or hexadecimal number to an integer. The function returns true on success.
The first parameter is a pointer to the string to be converted. The second parameter is one of the flags that specify how pszString should be parsed for its conversion to an integer. The third parameter is a pointer to an int that receives the converted string.
In the example, we convert two strings; one representing a decimal value and one a hexadecimal one.
The first string represents a decimal number; the second string represents a hexadecimal number.
With the StrToIntExW() function, we convert the first string into an integer. The STIF_DEFAULT flag tells the function to convert a decimal value.
With the STIF_SUPPORT_HEX flag, we tell the function to convert a hexadecimal value.
This is the output of the ShellConvertString.exe program.
Searching strings
The StrStrW() function finds the first occurrence of a substring within a string. The comparison is case-sensitive.
The first parameter is a pointer to the string to search. The second parameter is a pointer to the substring to search for. The function returns the address of the first occurrence of the matching substring if successful, or NULL otherwise.
In the code example, we search for a word within a sentence.
We search for a word from this sentence.
This is the word that we search for.
The StrStrW() function searches for a word within the sentence. If it succeeds, it returns a pointer to the matching substring.
This is the output of the ShellSearchString.exe program.
Windows API StrSafe functions
To increase application security, StrSafe functions were released. These functions require the size of the destination buffer as an input. The buffers are guaranteed to be null-terminated. The functions return error codes; this enables proper error handling.
Each of the functions is available in a corresponding character count Cch or byte count Cb version.
The string length
The StringCchLengthW() and StringCbLengthW() functions determine the length of the string in characters and bytes.
The first parameter of the functions is a string whose length is to be checked. The second parameter is the maximum number of characters (bytes) allowed in the psz parameter. This value cannot exceed STRSAFE_MAX_CCH. The third parameter is the number of characters (bytes) in psz, not including the terminating null character.
The functions return S_OK on success and STRSAFE_E_INVALID_PARAMETER on failure. The functions fail if the value in psz is NULL, cchMax is larger than STRSAFE_MAX_CCH, or psz is longer than cchMax. The SUCCEEDED and FAILED macros can be used to check the return values of the functions.
The code example determines the length of a given string in characters and bytes.
We are going to determine the length of this string.
The target_size variable is filled with the counted values when the functions return.
With the sizeof operator, we get the size of the array of characters in bytes. The value serves as the maximum allowable number of bytes in the string for the StringCbLengthW() function.
With the StringCbLengthW() function we determine the length of the string in bytes. The length is stored in the target_size variable.
We check the returned value with the SUCCEEDED macro. On success, we print the number of bytes in the string; on error, we print an error message.
Here we determine the maximum allowable characters in the string. The wchar_t is a type for wide characters; its size is compiler specific.
With the StringCchLengthW() function, we get the size of the string in characters.
On success, we print the number of characters in the string to the console. On error, we print an error message.
The string consists of 14 bytes or 7 characters.
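The relation between the character count and the byte count can be sketched in portable C. The helper names are hypothetical, and wcslen() stands in for the StrSafe length functions. The byte count depends on sizeof(wchar_t), which is 2 with the Windows compilers and commonly 4 on other platforms.

```c
#include <stddef.h>
#include <wchar.h>

/* Length in characters, not counting the terminating null. */
size_t length_in_chars(const wchar_t *s)
{
    return wcslen(s);
}

/* Length in bytes: each character occupies sizeof(wchar_t) bytes. */
size_t length_in_bytes(const wchar_t *s)
{
    return wcslen(s) * sizeof(wchar_t);
}

/* Capacity of a buffer in characters, given its size in bytes. */
size_t capacity_in_chars(size_t size_in_bytes)
{
    return size_in_bytes / sizeof(wchar_t);
}
```

With a 2-byte wchar_t, a 7-character string therefore occupies 14 bytes, which matches the figures above.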
Reading standard input
The StringCchGetsW() reads a line from the standard input, including the newline character.
The first parameter is the destination buffer, which receives the copied characters. The second parameter is the size of the destination buffer, in characters.
In the example we read a line from the standard input. The line is printed back to the console.
According to the MSDN documentation, the maximum input on command prompt cannot exceed 8191 characters.
We create a buffer for the input string.
The StringCchGetsW() reads a line from the stdin.
This is a sample run of the SafeGets.exe program.
Copying strings
The StringCchCopyW() copies one string to another.
The first parameter is the destination buffer, which receives the copied string. The second parameter is the size of the destination buffer, in characters. The third parameter is the source string.
In the code example, we copy one string with the StringCchCopyW() function.
This is the string to be copied.
We determine its length with the wcslen() function; one character is reserved for the terminating null character.
We create a buffer and fill it with zeros with the ZeroMemory() function.
With the StringCchCopyW(), we copy the string into the buffer. The size of the destination buffer is provided to ensure that it does not write past the end of this buffer.
This is the output of the SafeCopy.exe program.
Concatenating strings
The StringCchCatW() concatenates one string to another string.
The first parameter is the destination buffer. The second parameter is the size of the destination buffer, in characters. The third parameter is the source string that is to be concatenated to the end of the destination buffer.
In the code example, we concatenate two strings with the StringCchCatW() function.
The StringCchCatW() function adds the L"Hello " string to the buf array.
Later, the second string is added to the buffer.
This is the output of the SafeConcat.exe program.
Formatting strings
The StringCchPrintfW() function writes formatted data to the destination buffer.
The first parameter is the destination buffer, which receives the formatted string created from pszFormat and its arguments. The second parameter is the size of the destination buffer, in characters. The third parameter is the format string. The following arguments are inserted into the pszFormat string.
In the code example, we create a formatted string with the StringCchPrintfW() function.
This is the format string; it has two format specifiers: %d and %ls.
With the StringCchPrintfW() function, we insert two values into the destination buffer.
This is the output of the SafeFormat.exe program.
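Size-checked formatting can also be sketched with the standard swprintf() function, which likewise takes the destination size in characters. This is a portable stand-in for illustration, not StringCchPrintfW() itself: the helper format_stock is hypothetical, and it reports failure with 0 or 1 rather than with an HRESULT.

```c
#include <stddef.h>
#include <wchar.h>

/* Writes "There are <count> <item>" into dest.
   Returns 1 on success, 0 if the text did not fit in dest_count characters. */
int format_stock(wchar_t *dest, size_t dest_count, int count,
                 const wchar_t *item)
{
    /* The format string has two specifiers: %d and %ls. */
    int written = swprintf(dest, dest_count, L"There are %d %ls", count, item);

    if (written < 0) {
        if (dest_count > 0) {
            dest[0] = L'\0';   /* leave an empty, terminated string */
        }
        return 0;
    }
    return 1;
}
```

Passing the buffer size lets the function refuse output that does not fit instead of writing past the end of the buffer.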
In this part of the Windows API tutorial, we have worked with strings.