Encodings in Windows
This article is about encodings in Windows. Everyone has used and written console applications at least once, whatever the reason: whether it’s killing a process or simply printing "Hello. I can’t get the encoding right, so I’m reading this article!".
For those who don’t yet see what the problem is, here it is:
And this is what was actually written:
But nobody could make any sense of it.
In any case, on Windows versions before 10, BAT files and programs in other languages do not output text in an encoding that matches your language, so all Russian characters are displayed incorrectly. The console uses the OEM code page (CP866 on Russian systems), while editors such as Notepad save text in the ANSI code page (CP1251), and the two map Cyrillic letters to different bytes.
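A minimal Python sketch of this mismatch, assuming a typical Russian-locale setup where the file is saved in cp1251 and the console displays it using cp866 (the helper name `console_view` is hypothetical):

```python
def console_view(text: str, saved_as: str = "cp1251", console_cp: str = "cp866") -> str:
    """Return what a console using `console_cp` shows for `text` saved as `saved_as`."""
    # The console never sees characters, only bytes, which it decodes with its own code page.
    return text.encode(saved_as).decode(console_cp)


if __name__ == "__main__":
    # Saved by an ANSI editor, shown by an OEM-code-page console: mojibake.
    print(console_view("Привет"))                    # ╧ЁштхЄ
    # Saved in the console's own code page: displayed correctly.
    print(console_view("Привет", saved_as="cp866"))  # Привет
```

The same bytes produce different text depending only on which code page reads them, which is why the fixes below either change the file's encoding or change the console's code page.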
1. Configuring the console in a batch file
First, a note for those who write chcp 1251: it is better to write this instead:
The first way to fix the problem is Notepad++. Open your batch file with it like this:
Don’t worry: the code of your batch file will open, and then you need to do the following:
If none of this helped, convert the file to UTF-8 without BOM.
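Converting a file to UTF-8 without BOM amounts to decoding its bytes with the old encoding and re-encoding them. Below is a Python sketch, assuming the batch file was saved in cp1251; the helper names are hypothetical. The BOM matters because cmd.exe does not skip it: the three BOM bytes become part of the first command, and a cp866 console shows them as я╗┐. Note that the console then typically also needs chcp 65001 to display the UTF-8 text correctly.

```python
import codecs


def to_utf8_no_bom(data: bytes, source_encoding: str = "cp1251") -> bytes:
    """Re-encode file contents as UTF-8 without a byte order mark."""
    text = data.decode(source_encoding)
    # The plain "utf-8" codec never writes a BOM (the "utf-8-sig" codec would).
    return text.encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    original = "@echo off\r\necho Привет\r\n".encode("cp1251")
    converted = to_utf8_no_bom(original)
    print(has_utf8_bom(converted))          # False
    # What a cp866 console prints for a BOM left in front of "@echo":
    print(codecs.BOM_UTF8.decode("cp866"))  # я╗┐
```

Passing `source_encoding="utf-8-sig"` also works for a file that already is UTF-8 with a BOM: decoding with that codec drops the BOM, so the result is BOM-less.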
2. Writing console programs
People often write console programs (because in some languages you can’t write desktop programs), and encoding is a frequent problem.
The first way is Notepad++ itself, but what if you need one encoding first and then another?
Again, for those using chcp 1251, write this instead:
The second way is to write a desktop program or to use Visual Studio. If that doesn’t help either, there is the first option: changing the output encoding (example in C++).
If that doesn’t work:
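To illustrate what "changing the output encoding" means in code, here is a Python sketch (an analogue, not the C++ example referred to above): the program encodes its text in the console’s code page itself and writes the raw bytes. The helper names are hypothetical; mapping 65001 to Python’s "utf-8" codec, replacing unencodable characters with "?", and assuming 866 as the console code page are all choices of this sketch.

```python
import io
from typing import BinaryIO


def codec_for_codepage(codepage: int) -> str:
    """Map a Windows code page number (as shown by chcp) to a Python codec name."""
    return "utf-8" if codepage == 65001 else f"cp{codepage}"


def write_for_console(text: str, codepage: int, stream: BinaryIO) -> None:
    """Encode text in the console's code page and write the raw bytes."""
    # Characters the code page cannot represent become "?" instead of raising.
    stream.write(text.encode(codec_for_codepage(codepage), errors="replace"))
    stream.flush()


if __name__ == "__main__":
    # A BytesIO stands in for the console; on a real console you could pass
    # sys.stdout.buffer together with the code page the console actually uses.
    buffer = io.BytesIO()
    write_for_console("Привет", 866, buffer)
    print(buffer.getvalue())  # b'\x8f\xe0\xa8\xa2\xa5\xe2'
```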
3. Changing chcp 1251
If you have a batch file, write this at the beginning:
Now the console output will be correct. In other languages (C++):
4. Make life sweet as honey
If you use this method, you will no longer be able to:
- Develop applications on Windows versions earlier than 10
- Save the world from this problem
- Think about other people
- Develop desktop applications, because life will seem sweet as honey to you
- Switch to a Windows version earlier than 10
- And, well, understand people who run Windows earlier than 10
Install Windows 10. There the console encoding is specifically suited to your country’s language, and you will no longer have to worry about this problem. But you will get 6 new problems, and you will not be able to go back to your previous licensed version of Windows.
Setting UTF8 as default Character Encoding in Windows 7
Is there a way to set Windows 7 to use UTF-8 globally as the standard?
It’s really annoying to set every single text editor to use it.
The short answer is no, it is not possible.
To elaborate, I am afraid you won’t find a global encoding option in Windows 7 that both 1) sets a global default and 2) is obeyed by all the applications you listed.
Also, what is the problem you are actually trying to solve here?
It is up to each application to decide whether it uses Unicode internally to represent data. While the use of Unicode is encouraged, you can never be sure that all your applications actually support it internally.
What you can do, however, is change the default character encoding in each of the listed applications:
- For Eclipse, the default encoding for new files can be set from Window > Preferences > General > Content Types (see this post on the Eclipse Community Forums)
- For Notepad++, navigate to Settings > Preferences > New Document/Default/Directory and set Encoding to UTF-8
- As for Thunderbird, I am pretty sure it already uses UTF-8 as the default encoding (see these notes about character encoding)
- In the case of OpenOffice (and LibreOffice), you don’t even need to care about encoding: documents saved by OpenOffice are XML-based, the encoding is specified inside the XML files themselves, and UTF-8 is already the default there as well
- From a UTF-8 point of view, PowerShell is tricky: Windows PowerShell’s Out-File cmdlet and its > redirection operator write UTF-16LE by default.
- For outputting files from PowerShell to UTF-8, see this answer
- For changing default encoding see this answer
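To see why UTF-16LE output trips up tools that expect UTF-8, compare the bytes. Assuming a file produced by Windows PowerShell’s > redirection (UTF-16LE with an FF FE byte order mark), a Python sketch of converting it to BOM-less UTF-8 could look like this; the function name is hypothetical.

```python
import codecs


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE bytes (with or without an FF FE BOM) to BOM-less UTF-8."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    # Every ASCII character is followed by a zero byte in UTF-16LE.
    sample = codecs.BOM_UTF16_LE + "dir".encode("utf-16-le")
    print(sample)                   # b'\xff\xfed\x00i\x00r\x00'
    print(utf16le_to_utf8(sample))  # b'dir'
```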
Change encoding on a per file or per extension basis
I’m using Microsoft Visual Studio Express 2012 for Web. It seems that every file which I open with it gets encoded into UTF-8. For most files which are going to be web-facing, that’s fine. However, I have files in my projects that are specifically for build purposes (e.g., .bat files), which must be encoded in ANSI.
Are there any configuration settings in VS to designate the encoding, either per file or per extension? Or, if the encoding can’t be specified, can the automatic conversion to UTF-8 at least be disabled?
Open the problematic file in Visual Studio, and then:
- On the File menu, click Advanced Save Options.
- In the Encoding dropdown, select Unicode (UTF-8 … or the encoding you require.
- Click OK.
An option to choose the encoding of all files with a given extension each time such a file is opened can be configured in the Options dialog. See the MSDN page on Options, Text Editor, File Extension.
Navigate to Tools > Options > Text Editor > File Extension.
For the bat extension, I selected Source Code (Text) Editor with Encoding. The with Encoding part means that the user will be given options as to what encoding to use when opening the file. The default in this mode is Auto-detect, which preserves the ANSI encoding, if that is what the file already uses. Otherwise, one can explicitly designate it for the individual file.
Unfortunately, it doesn’t seem to remember the setting last used when opening a file, and will thus prompt for an encoding setting every time a file is opened.
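The Auto-detect behavior described above can be approximated with a simple heuristic. This is a hypothetical Python sketch, not Visual Studio’s actual algorithm: it checks for a BOM, then tries strict UTF-8 decoding, and otherwise falls back to an ANSI code page (cp1252 is an assumption here; the real ANSI code page depends on the system locale).

```python
import codecs

# BOMs checked in order; anything not listed (e.g. UTF-32) is out of scope here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, ansi: str = "cp1252") -> str:
    """Guess a text file's encoding: BOM first, then ASCII, then strict UTF-8, else ANSI."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("ascii")
        return "ascii"  # valid in UTF-8 and ANSI alike, so nothing needs converting
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return ansi  # not valid UTF-8: keep treating the file as ANSI


if __name__ == "__main__":
    print(guess_encoding(b"@echo off\r\n"))         # ascii
    print(guess_encoding("café".encode("cp1252")))  # cp1252
    print(guess_encoding("café".encode("utf-8")))   # utf-8
```

The fallback order is what lets an editor "preserve" ANSI: a file is only treated as UTF-8 if its bytes are actually valid UTF-8.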
Change default code page of Windows console to UTF-8
Currently I’m running Windows 7 x64, and I usually want all console tools to work with UTF-8 rather than with the default code page 850.
Running chcp 65001 in the command prompt before using any tools helps, but is there any way to set it as the default code page?
Update:
Changing the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage\OEMCP value to 65001 appears to make the system unable to boot in my case.
The proposed change of HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor\Autorun to @chcp 65001>nul worked just fine for my purposes (thanks to Ole_Brun).
To change the codepage for the console only, do the following:
- Start -> Run -> regedit
- Go to the key [HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor] and open its Autorun value
- Change the value to @chcp 65001>nul
If Autorun is not present, you can add it as a new String Value.
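For readers who prefer a script to clicking through regedit, here is a hedged Python sketch using the standard-library winreg module (Windows only). It writes the same @chcp 65001>nul command, but under HKEY_CURRENT_USER (cmd.exe reads AutoRun from both HKEY_LOCAL_MACHINE and HKEY_CURRENT_USER), and, as a design choice of this sketch, it appends to an existing AutoRun command with & instead of overwriting it. The helper names are hypothetical.

```python
AUTORUN_KEY = r"Software\Microsoft\Command Processor"
UTF8_COMMAND = "@chcp 65001>nul"


def merge_autorun(existing: str, command: str = UTF8_COMMAND) -> str:
    """Return an AutoRun string that runs `command` without dropping existing commands."""
    existing = existing.strip()
    if not existing:
        return command
    if command in existing:
        return existing  # already present: leave the value unchanged
    return f"{existing} & {command}"


def install_autorun(command: str = UTF8_COMMAND) -> str:
    """Write the merged AutoRun value for the current user (Windows only)."""
    import winreg  # standard library, available only on Windows

    access = winreg.KEY_QUERY_VALUE | winreg.KEY_SET_VALUE
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, AUTORUN_KEY, 0, access) as key:
        try:
            existing, _ = winreg.QueryValueEx(key, "AutoRun")
        except FileNotFoundError:
            existing = ""  # no AutoRun value yet
        value = merge_autorun(existing, command)
        winreg.SetValueEx(key, "AutoRun", 0, winreg.REG_SZ, value)
    return value
```

On Windows, calling install_autorun() once is enough; new cmd.exe windows then start in code page 65001 (cmd.exe /D skips AutoRun if you ever need a clean start).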
Personally, I don’t like changing the registry. This can cause a lot of problems. I created a batch file:
I saved it in C:\Windows\System32 as switch.bat.
I created a shortcut to cmd.exe on the desktop.
In the properties of the cmd shortcut, I changed the Target to: C:\Windows\System32\cmd.exe /k switch
Voilà: when I need to type in UTF-8, I use this shortcut.
Edit the Registry:
Then restart. With this fix, if you are using the Consolas font, it seems to lock PowerShell into a small font size; cmd.exe still works fine. As a workaround, you can use Lucida Console, or, as I did, switch to Cascadia Mono:
In the 1809 build of Windows 10, I’ve managed to solve this permanently by going to the system’s Language settings, selecting Administrative language settings, clicking Change system locale..., checking the Beta: Use Unicode UTF-8 for worldwide language support box, and then restarting my PC.
This way it applies to all applications, even those ones that I don’t start from a command prompt!
(Which was necessary for me, since I was trying to edit Agda code from Atom.)
This can be done by creating a PowerShell profile and adding the command "chcp 65001 >$null" to it:
This doesn’t require editing the registry and, unlike editing a shortcut, will work if PowerShell is started in a specific folder using the Windows Explorer context menu.
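Editing the profile by hand works; a script can do the same idempotently. Below is a hedged Python sketch that appends a line to a profile file only if that exact line is not already there. The helper name and the temporary profile path are hypothetical; in PowerShell, $PROFILE shows the real location for the current user and host.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # "utf-8-sig" reads files with or without a BOM.
    text = path.read_text(encoding="utf-8-sig") if path.exists() else ""
    if line in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    # Windows PowerShell reads BOM-less scripts using the ANSI code page,
    # so writing a UTF-8 BOM keeps any non-ASCII text in the profile intact.
    path.write_text(text + line + "\n", encoding="utf-8-sig")
    return True


if __name__ == "__main__":
    # Hypothetical profile location inside a temporary directory.
    profile = Path(tempfile.mkdtemp()) / "Microsoft.PowerShell_profile.ps1"
    print(ensure_line(profile, "chcp 65001 >$null"))  # True
    print(ensure_line(profile, "chcp 65001 >$null"))  # False
```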
The command to change the code page is chcp. Example: chcp 1252. You would type it in a PowerShell window. To avoid the hassle of typing it every time (if you always need to change the code page), you can append it to the program’s command line. To do so, follow these steps:
- Right-click the PowerShell icon on the Start menu and choose "More" > "Open file location".
- Right-click the PowerShell shortcut and select "Properties".
- Add the following to the end of the "Target" command line: -NoExit -Command "chcp 1252"
Be happy. Don’t fuss with Windows Registry unless you have no other option.
Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)
I’ve been forcing the use of chcp 65001 in Command Prompt and Windows PowerShell for some time now, but judging by Q&A posts on SO and several other communities, it seems to be a dangerous and inefficient solution. Does Microsoft provide an improved/complete alternative to chcp 65001 that can be saved permanently without manually altering the Registry? And if there isn’t one, is there a publicly announced timeline or agenda for supporting UTF-8 in the Windows CLI in the future?
Personally, I’ve been using chcp 949 for Korean character support, but the weird display of the backslash \, incorrect or incomprehensible output in several applications (like Neovim), and the lack of support for non-Korean characters in code page 949 seem to have become more of a problem lately.
This answer shows how to switch the character encoding in the Windows console to UTF-8 (code page 65001), so that shells such as cmd.exe and PowerShell properly encode and decode characters (text) when communicating with external (console) programs in PowerShell, and in cmd.exe also for file I/O. [1]
If, by contrast, your concern is about the separate aspect of the limitations of Unicode character rendering in console windows, see the middle and bottom sections of this answer, where alternative console (terminal) applications are discussed too.
Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry?
As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8, but the feature is in beta as of this writing.
- Run intl.cpl (which opens the regional settings in Control Panel)
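- Follow the instructions in the screenshot below (in short: on the Administrative tab, click Change system locale..., check Beta: Use Unicode UTF-8 for worldwide language support, and restart).
This will make all future console windows default to UTF-8 (chcp 65001).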
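To check whether the setting took effect, you can query the active code pages. This is a hedged Python sketch that uses ctypes to call the Win32 functions GetACP and GetOEMCP (Windows only); with the UTF-8 beta option enabled, both are expected to report 65001. The helper names are hypothetical.

```python
import sys
from typing import Tuple

UTF8_CODEPAGE = 65001


def is_utf8_locale(acp: int, oemcp: int) -> bool:
    """True if both the ANSI code page and the OEM (console) code page are UTF-8."""
    return acp == UTF8_CODEPAGE and oemcp == UTF8_CODEPAGE


def active_code_pages() -> Tuple[int, int]:
    """Return (ANSI code page, OEM code page) of the running system (Windows only)."""
    import ctypes

    kernel32 = ctypes.windll.kernel32
    return kernel32.GetACP(), kernel32.GetOEMCP()


if __name__ == "__main__" and sys.platform == "win32":
    acp, oemcp = active_code_pages()
    print(f"ANSI={acp} OEM={oemcp} UTF-8 locale: {is_utf8_locale(acp, oemcp)}")
```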
Caveats:
If you’re using Windows PowerShell, this will also make Get-Content and Set-Content (and possibly other contexts where Windows PowerShell defaults to the system’s active ANSI code page) default to UTF-8 (which PowerShell Core (v6+) always does). This means that, in the absence of an -Encoding argument, BOM-less files that are ANSI-encoded (which is historically common) will then be misread, and files created with Set-Content will be UTF-8 rather than ANSI-encoded.
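The misreading is easy to reproduce, because bytes written in an ANSI code page are usually not valid UTF-8. A Python sketch, assuming cp1252 as the ANSI code page; `read_as` is a hypothetical helper that decodes the way a reader assuming a given encoding would, substituting U+FFFD for invalid bytes.

```python
def read_as(data: bytes, encoding: str) -> str:
    """Decode like a reader that assumes `encoding`, using U+FFFD for invalid bytes."""
    return data.decode(encoding, errors="replace")


ansi_bytes = "Größe".encode("cp1252")  # b'Gr\xf6\xdfe', no BOM to signal the encoding

if __name__ == "__main__":
    print(read_as(ansi_bytes, "cp1252"))  # Größe  (reader and writer agree)
    print(read_as(ansi_bytes, "utf-8"))   # Gr��e  (a UTF-8 default misreads it)
```

Passing an explicit -Encoding argument in PowerShell plays the same role as passing the correct encoding to read_as here.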
[Fixed in PowerShell 7.1] Up to at least PowerShell 7.0, a bug in the underlying .NET version (.NET Core 3.1) causes follow-on bugs in PowerShell: a UTF-8 BOM is unexpectedly prepended to data sent to external processes via stdin (irrespective of what you set $OutputEncoding to), which notably breaks Start-Job — see this GitHub issue.
Not all fonts speak Unicode, so pick a TT (TrueType) font, but even they usually support only a subset of all characters, so you may have to experiment with specific fonts to see if all characters you care about are represented — see this answer for details, which also discusses alternative console (terminal) applications that have better Unicode rendering support.
As eryksun points out, legacy console applications that do not "speak" UTF-8 will be limited to ASCII-only input and will produce incorrect output when trying to output characters outside the (7-bit) ASCII range. (In the obsolescent Windows 7 and below, programs may even crash.)
If running legacy console applications is important to you, see eryksun’s recommendations in the comments.
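One reason legacy programs struggle is that UTF-8 uses more than one byte for every character outside ASCII, while the single-byte OEM and ANSI code pages such programs were written for use exactly one. A program that assumes one byte per character can therefore miscount text or split a character in half. A small Python comparison (the helper name is hypothetical):

```python
def byte_lengths(text: str, encodings=("cp866", "cp1251", "utf-8")) -> dict:
    """Number of bytes `text` occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    word = "Привет"
    print(len(word))           # 6
    print(byte_lengths(word))  # {'cp866': 6, 'cp1251': 6, 'utf-8': 12}
```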
However, for Windows PowerShell, that is not enough:
- You must additionally set the $OutputEncoding preference variable to UTF-8 as well: $OutputEncoding = [System.Text.UTF8Encoding]::new() [2] ; it’s simplest to add that command to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file.
- Fortunately, this is no longer necessary in PowerShell Core, which internally consistently defaults to BOM-less UTF-8.
If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:
Note: The caveat re legacy console applications mentioned above equally applies here. If running legacy console applications is important to you, see eryksun’s recommendations in the comments.
For PowerShell (both editions), add the following line to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file, which is the equivalent of chcp 65001, supplemented with setting preference variable $OutputEncoding to instruct PowerShell to send data to external programs via the pipeline in UTF-8:
- Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console’s output encoding on startup and is unaware of later changes made with chcp; additionally, as stated, Windows PowerShell requires $OutputEncoding to be set — see this answer for details.
- For example, here’s a quick-and-dirty approach to add this line to $PROFILE programmatically:
For cmd.exe, define an auto-run command via the registry, in value AutoRun of key HKEY_CURRENT_USER\Software\Microsoft\Command Processor (current user only) or HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor (all users):
- For instance, you can use PowerShell to create this value for you:
Optional reading: Why the Windows PowerShell ISE is a poor choice:
While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:
First and foremost, the ISE is obsolescent: it doesn’t support PowerShell Core, where all future development will go, and it isn’t cross-platform, unlike the new premier IDE for both PowerShell editions, Visual Studio Code, which already speaks UTF-8 by default for PowerShell Core and can be configured to do so for Windows PowerShell.
The ISE is generally an environment for developing scripts, not for running them in production (if you’re writing scripts (also) for others, you should assume that they’ll be run in the console); notably, the ISE’s behavior is not the same in all aspects when it comes to running scripts.
As eryksun points out, the ISE doesn’t support running interactive external console programs, namely those that require user input:
The problem is that it hides the console and redirects the process output (but not input) to a pipe. Most console applications switch to full buffering when a file is a pipe. Also, interactive applications require reading from stdin, which isn’t possible from a hidden console window. (It can be unhidden via ShowWindow, but a separate window for input is clunky.)
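The buffering point applies to console programs in general. In Python, for instance, sys.stdout is line-buffered only when attached to an interactive console and block-buffered otherwise, so a prompt without a trailing newline can sit unseen in the buffer when output goes to a pipe. Below is a hedged sketch of a prompt helper that flushes explicitly; the name `ask` is hypothetical, and the streams are parameters so it can be exercised without a console.

```python
import io
import sys
from typing import Optional, TextIO


def ask(prompt: str, stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None) -> str:
    """Write a prompt, flush it explicitly, and read one line of input."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    stdout.write(prompt)
    # The prompt has no trailing newline, so line buffering would not push it
    # out, and on a pipe stdout is block-buffered anyway: flush by hand.
    stdout.flush()
    return stdin.readline().rstrip("\r\n")


if __name__ == "__main__":
    # Simulated streams stand in for a console so the example runs anywhere.
    fake_out = io.StringIO()
    answer = ask("Continue? [y/n] ", io.StringIO("y\n"), fake_out)
    print(repr(answer), repr(fake_out.getvalue()))  # 'y' 'Continue? [y/n] '
```

Flushing fixes only the output side; as the quote notes, a program still cannot read input from a console window that is hidden.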
If you’re willing to live with that limitation, switching the active code page to 65001 (UTF-8) for proper communication with external programs requires an awkward workaround:
You must first force creation of the hidden console window by running any external program from the built-in console, e.g., chcp — you’ll see a console window flash briefly.
Only then can you set [console]::OutputEncoding (and $OutputEncoding) to UTF-8, as shown above (if the hidden console hasn’t been created yet, you’ll get a "handle is invalid" error).