By default, Java encodes Strings sent to System.out
in the default code page. On Windows XP, this means a lossy conversion
to an "ANSI" code page. This is unfortunate, because the Windows Command
Prompt (cmd.exe) can read and write Unicode characters. This post describes
how to use JNA to work round this limitation.
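To make the lossy conversion concrete, here is a minimal pure-Java sketch. It assumes windows-1252 as a stand-in for the "ANSI" code page (falling back to ISO-8859-1 if the JVM lacks it); the class and method names are made up for illustration and are not part of the original sources.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Assumption: windows-1252 stands in for the "ANSI" code page.
    // ISO-8859-1 is used if the JVM does not provide windows-1252.
    static Charset ansiStandIn() {
        return Charset.isSupported("windows-1252")
                ? Charset.forName("windows-1252")
                : StandardCharsets.ISO_8859_1;
    }

    // Encodes the text to bytes and decodes it again with the same charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for any character it cannot map, so the information is lost.
    static String roundTrip(String text) {
        Charset ansi = ansiStandIn();
        return new String(text.getBytes(ansi), ansi);
    }

    public static void main(String[] args) {
        // U+00E9 exists in both stand-in charsets and survives the trip.
        System.out.println("caf\u00E9".equals(roundTrip("caf\u00E9")));
        // Cyrillic U+0416 and CJK U+4E2D do not, and come back as "??".
        System.out.println("??".equals(roundTrip("\u0416\u4E2D")));
    }
}
```

The comparison is done on Strings rather than by printing the text, because printing would itself go through the console encoding being discussed.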
This post is a follow-up to I18N: Unicode at the Windows command prompt (C++; .Net; Java), so you might want to read that first.
To get this code to support Unicode on Windows XP, you'll need to switch your console to a Unicode font (e.g. Lucida Console on the English language version).
It isn't all a walk in the park, though. You'll need to understand the API you're calling and understand the mapping between Java and native types. Prior C/C++ experience is a definite plus. For native constant values, you may need to either read API header files or write a short application that emits them.
- JNA documentation
- Java primitive types (and their object equivalents) map directly to the native C type of the same size
- MSDN Library: Win32 and COM Development
Here is an example mapping for WriteConsole:
|C++ declaration from API doc|Java interface declaration|
|BOOL WINAPI WriteConsole(|public boolean WriteConsoleW(|
|__in HANDLE hConsoleOutput,|Pointer hConsoleOutput,|
|__in const VOID *lpBuffer,|char[] lpBuffer,|
|__in DWORD nNumberOfCharsToWrite,|int nNumberOfCharsToWrite,|
|__out LPDWORD lpNumberOfCharsWritten,|IntByReference lpNumberOfCharsWritten,|
|__reserved LPVOID lpReserved );|Pointer lpReserved );|
The C++ API offers three ways to call the function. We opt for
WriteConsoleW because we explicitly want to use Unicode (16-bit C++
wchar_t types). The alternative is to use an "ANSI" call (WriteConsoleA,
which takes Java byte arrays instead of Java char
arrays) or pass options to the library initialisation to declare which
variant WriteConsole should delegate to.
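One detail of the wide-character mapping is worth spelling out: the character count passed to the native call is a count of 16-bit code units, which is what a Java char is. The sketch below is pure Java with hypothetical helper names; it only prepares the argument values and does not call the native API.

```java
class WideCharArguments {
    // The buffer that would be passed as lpBuffer: one Java char per
    // 16-bit code unit, matching the native wide-character type in size.
    static char[] toBuffer(String text) {
        return text.toCharArray();
    }

    // The value that would be passed as nNumberOfCharsToWrite:
    // UTF-16 code units, not Unicode code points.
    static int charsToWrite(String text) {
        return text.length();
    }

    public static void main(String[] args) {
        // 'A' followed by U+1D538, which needs a surrogate pair in UTF-16.
        String text = "A\uD835\uDD38";
        System.out.println("bytes per char: " + Character.BYTES);
        System.out.println("code units:     " + charsToWrite(text));
        System.out.println("code points:    " + text.codePointCount(0, text.length()));
    }
}
```

Passing the code point count instead of the code unit count would truncate any text that contains characters outside the Basic Multilingual Plane.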
Note: all the examples below are single-threaded and do not synchronise access to the native API. JNA is licensed under the LGPL, which may not suit everyone. The examples that follow should still be useful in understanding the requirements for a JNI implementation.
Getting the console mode
The console mode defines the screen buffer's input/output modes (e.g. whether it is using insert or overwrite input). These examples don't change these modes, but it is still useful to call this function. It will return false if the handle has been redirected (e.g. if stdout is piped to a file).
Setting and getting the console code page
The console code page can be changed to match the input/output
encodings used by Java. Values are Windows
code page identifiers. Equivalent get functions can read existing
values. In some circumstances, like invoking the function in the absence
of a console, you might read a value of zero. It is probably no
coincidence that this is also the value of the constant CP_ACP
(system default Windows ANSI code page).
Writing Unicode to the console
Writing Unicode to the console is simple enough (assuming you've remembered to switch to a Unicode font and the font includes the graphemes you want to display). It doesn't matter which output code page has been set - that information is only required when working with "ANSI"/multibyte characters.
Working with "ANSI"/multibyte characters is something
you will have to think about. If the output is redirected to a file, you
can't use the
WriteConsole function. You will need to test
the console mode.
There are a number of circumstances in which WriteConsole cannot be
used to write to the console, for example when you run this code
under the Eclipse IDE.
Reading Unicode from the console
The ReadConsole function can be used to get Unicode
input. The default console mode will let the user enter characters until
the ENTER key is pressed. On Windows, line terminators are marked by a
carriage return followed by a linefeed (\r\n).
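As with WriteConsole, the ReadConsole function can only be used if the handle refers to a real console (i.e. the input has not been redirected).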
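Since a line read in the default mode ends with the terminator, the caller usually wants to drop it before using the text. A minimal sketch, assuming the native call has filled a char buffer and reported how many characters it read (the helper name is hypothetical):

```java
class ConsoleLine {
    // Builds a String from the first charsRead entries of buffer,
    // dropping one trailing "\r\n" (or a lone '\n' or '\r') if present.
    static String fromBuffer(char[] buffer, int charsRead) {
        int end = charsRead;
        if (end > 0 && buffer[end - 1] == '\n') {
            end--;
        }
        if (end > 0 && buffer[end - 1] == '\r') {
            end--;
        }
        return new String(buffer, 0, end);
    }

    public static void main(String[] args) {
        // Simulates a 16-char buffer in which only the first 7 chars were filled.
        char[] buffer = new char[16];
        "h\u00E9llo\r\n".getChars(0, 7, buffer, 0);
        System.out.println(fromBuffer(buffer, 7).length()); // 5 chars remain
    }
}
```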
Encoding to and decoding from the console code page
Encoding/decoding between wide chars and bytes could be done in Java, but that would require mapping the code page identifiers to Java encodings. It is more convenient to pass the code page to the native function.
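For comparison, this is roughly what the Java-side mapping would involve. The table below is a small, deliberately incomplete sketch: the class name is hypothetical, the identifiers are standard Windows code page numbers, and whether the JVM supports a given charset name depends on the installation, so the lookup returns an empty Optional when it does not.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class CodePages {
    // Windows code page identifier -> Java charset name (partial list).
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(437, "IBM437");
        NAMES.put(850, "IBM850");
        NAMES.put(1251, "windows-1251");
        NAMES.put(1252, "windows-1252");
        NAMES.put(20127, "US-ASCII");
        NAMES.put(28591, "ISO-8859-1");
        NAMES.put(65001, "UTF-8");
    }

    // Returns the charset for a code page, or empty if the identifier is
    // not in the table or this JVM does not provide the charset.
    static Optional<Charset> forCodePage(int codePage) {
        String name = NAMES.get(codePage);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }

    public static void main(String[] args) {
        System.out.println(forCodePage(65001)); // UTF-8 is always available
        System.out.println(forCodePage(437));   // depends on the JVM's charsets
    }
}
```

A complete table would be much longer, which is the maintenance burden the native approach avoids.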
The functions are invoked twice: the first time to calculate the output buffer size; the second to fill the buffer.
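The two-call convention can be sketched without the native library. In the example below, encode is a hypothetical pure-Java stand-in (it is not the native function and always encodes as UTF-8): when given no output buffer it only reports the size required, mirroring the size-query call; when given a buffer it fills it.

```java
import java.nio.charset.StandardCharsets;

class TwoCallEncoding {
    // Hypothetical stand-in for a native encoder.
    // out == null: return the number of bytes required.
    // out != null: fill out and return the number of bytes written.
    static int encode(char[] in, int inLength, byte[] out) {
        byte[] encoded = new String(in, 0, inLength).getBytes(StandardCharsets.UTF_8);
        if (out == null) {
            return encoded.length;
        }
        if (out.length < encoded.length) {
            throw new IllegalArgumentException("output buffer too small");
        }
        System.arraycopy(encoded, 0, out, 0, encoded.length);
        return encoded.length;
    }

    static byte[] encodeAll(String text) {
        char[] in = text.toCharArray();
        int needed = encode(in, in.length, null); // first call: size only
        byte[] out = new byte[needed];
        int written = encode(in, in.length, out); // second call: fill buffer
        if (written != needed) {
            throw new IllegalStateException("size changed between calls");
        }
        return out;
    }

    public static void main(String[] args) {
        // The euro sign U+20AC is one char but three bytes in UTF-8.
        System.out.println(encodeAll("\u20AC").length);
    }
}
```

The point of the first call is that the byte count cannot be derived from the char count alone; it depends on the target encoding and on the text.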
Printing characters as UTF-8
One other way to get the console to emit Unicode characters is to set its code page to UTF-8.
You can then encode and emit the bytes, treating the console handle like a file handle.
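The encoding half of this can be shown in plain Java. The sketch below writes UTF-8 bytes to any OutputStream; it uses Java streams rather than a native file-handle call, so it is an analogy for the approach and not the native implementation. It assumes the console code page has already been set to UTF-8, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8Emitter {
    // Encodes the text as UTF-8 and writes the raw bytes to the stream,
    // bypassing any default-code-page conversion.
    static void emit(OutputStream out, String text) {
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // System.out is a byte stream underneath; writing bytes directly
        // avoids the PrintStream's own character encoding step.
        emit(System.out, "\u0416\u4E2D\r\n");
    }
}
```

Whether the characters display correctly still depends on the console font and on the code page actually being UTF-8 at the time of the write.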
It doesn't appear to be possible to read Unicode characters in
UTF-8 mode (i.e. by calling SetConsoleCP with the UTF-8 code page).
This appears to be a limitation of cmd.exe (it doesn't work with the
.Net Console.ReadLine either).
All the sources are available in a public Subversion repository.
Because the code will usually operate differently when run from an IDE, you may want to look into remote debugging.