Internationalization Frequently Asked Questions


This page answers common questions about internationalization on the JDK software platform. For more information see the JDK software Internationalization home page.

General Questions

What is internationalization?

Internationalization allows software to be adapted to any language and cultural convention. During the internationalization process, the programmer isolates the parts of a program that are dependent on language and culture. For example, the programmer will isolate error messages because they must be translated during localization.

What is localization?

Localization is the process of adapting a program for use in a specific locale. A locale is a geographic or political region that shares the same language and customs. Localization includes the translation of text such as GUI labels, error messages, and online help. It also includes the culture-specific formatting of data items such as monetary values, times, dates, and numbers.

How do I go about internationalizing an existing program?

See the steps outlined in the Checklist section of The Java Tutorial.

Locales

What is a locale?

A locale is a geographic or political region that shares the same language and customs. In the Java programming language, a locale is represented by a Locale object. Locale-sensitive operations, such as collation and date formatting, vary according to Locale.
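
As a minimal sketch of what a Locale object carries, the following example prints the language and country of two predefined locales. The class name LocaleBasics and the helper describe are hypothetical names chosen for illustration; display names are requested in English so that the result does not depend on the default locale of the machine.

```java
import java.util.Locale;

class LocaleBasics {
    // Describes a locale using English display names, so the result
    // does not vary with the default locale of the host.
    static String describe(Locale locale) {
        return locale.toString() + " = "
                + locale.getDisplayLanguage(Locale.ENGLISH)
                + " (" + locale.getDisplayCountry(Locale.ENGLISH) + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(Locale.CANADA_FRENCH)); // fr_CA = French (Canada)
        System.out.println(describe(Locale.JAPAN));         // ja_JP = Japanese (Japan)
        // The default locale is taken from the host environment.
        System.out.println("Default locale: " + Locale.getDefault());
    }
}
```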

Where can I find some coding examples that use Locale objects?

See the Setting the Locale section of The Java Tutorial.

Which locales are supported?

The locales supported by the JDK software are listed at Supported Locales. A platform other than the JDK may support a different set of locales.

Can a Java application use multiple locales?

Yes. This capability allows you to create multi-lingual applications.
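
One way to picture this is a single program that formats the same number for several locales by passing an explicit Locale to each formatter instead of relying on the default. The sketch below assumes the NumberFormat class from java.text; the class name MultiLocaleNumbers and the helper format are illustrative only.

```java
import java.text.NumberFormat;
import java.util.Locale;

class MultiLocaleNumbers {
    // Formats a value with the conventions of the given locale,
    // independently of the default locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.5;
        System.out.println(format(value, Locale.US));      // 1,234,567.5
        System.out.println(format(value, Locale.GERMANY)); // 1.234.567,5
    }
}
```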

How does setting the default locale affect the results of sorting?

The Collator class and its subclasses are used for building sorting routines. These classes are locale-sensitive. A Collator obtained from the no-argument Collator.getInstance() factory method uses the collating sequence of the default locale, so changing the default locale can change the sort order.
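
The following sketch contrasts the natural ordering of String values with the ordering produced by a Collator. It passes Locale.US explicitly so that the result is reproducible; calling Collator.getInstance() with no argument would use the default locale instead. The class name CollatorSort and the helper sortFor are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollatorSort {
    // Returns a sorted copy of the words, ordered by the collation
    // rules of the given locale.
    static String[] sortFor(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple"};

        // Natural String ordering compares char values, so the
        // uppercase "Banana" sorts before the lowercase "apple".
        String[] natural = words.clone();
        Arrays.sort(natural);
        System.out.println(Arrays.toString(natural)); // [Banana, apple, cherry]

        // The collator applies language rules instead.
        System.out.println(Arrays.toString(sortFor(Locale.US, words))); // [apple, Banana, cherry]
    }
}
```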

Resource Bundles

What is a resource bundle?

A ResourceBundle object allows you to isolate localizable elements from the rest of the application. With all resources separated into a bundle, the application simply loads the appropriate bundle for the active locale. If the user switches locales, the application just loads a different bundle.

Where can I find some coding examples that use ResourceBundle objects?

See the Isolating Locale-Specific Data section of The Java Tutorial.

How do I specify non-ASCII strings in a properties file?

You can specify any Unicode character with the \uXXXX notation. (The XXXX denotes the 4 hexadecimal digits that comprise the Unicode value of a character.) For example, a properties file might have the following entries:

s1=hello there
s2=\uff2d\uff33\u30b4

If you have edited and saved the file in a non-ASCII encoding, you can convert it to ASCII with the native2ascii tool. For example, you might want to do this when editing a properties file in Shift JIS, a popular Japanese encoding.
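
To see how the \uXXXX notation is interpreted, the sketch below loads the two sample entries through a PropertyResourceBundle and prints the code points of the second value. To stay self-contained it reads the entries from an in-memory string rather than from a properties file on disk; the class name EscapedProperties is an assumption made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class EscapedProperties {
    // Parses the sample entries exactly as they would appear in a
    // properties file. The doubled backslashes are Java string
    // escapes; the parser sees a single backslash followed by uXXXX.
    static ResourceBundle load() {
        String text = "s1=hello there\n"
                + "s2=\\uff2d\\uff33\\u30b4\n";
        try {
            return new PropertyResourceBundle(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load();
        System.out.println(bundle.getString("s1"));
        // Print code points in hex to avoid console encoding issues.
        for (char c : bundle.getString("s2").toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```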

How do I compile a non-ASCII ListResourceBundle?

If your source file is in a non-ASCII encoding, you can direct the compiler to convert it into Unicode. For example, you would compile a Japanese resource bundle written in the Shift JIS encoding as follows:

javac -encoding SJIS LabelsResource_ja.java

Dates

How do I format a date?

You can use the SimpleDateFormat class to format and parse dates in a locale-sensitive manner. See the section on formatting Dates and Times in The Java Tutorial.
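
As a small illustration, the sketch below formats the same instant with the month name of two different locales. The pattern "d MMMM yyyy", the fixed UTC time zone, and the class name DateFormatting are choices made for this example so that the result is reproducible.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DateFormatting {
    // Formats a date as day, full month name, and year using the
    // month names of the given locale.
    static String format(Date date, Locale locale) {
        SimpleDateFormat formatter = new SimpleDateFormat("d MMMM yyyy", locale);
        // Pin the time zone so the result does not depend on the host.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return formatter.format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L); // 1970-01-01T00:00:00Z
        System.out.println(format(epoch, Locale.US));     // 1 January 1970
        System.out.println(format(epoch, Locale.FRANCE)); // 1 janvier 1970
    }
}
```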

My Java application has the wrong time zone. Why?

This was caused by a bug that was fixed in release 1.1.6 of the JDK software.

Fonts

What is a font.properties file?

The font.properties file maps the fonts of the host platform, such as Solaris or Win32, to Java virtual fonts. The font.properties file is in the $JAVAHOME/lib directory.

How do I add a font?

See the web page Adding Fonts to the Java Runtime Environment.

Why can't I see a particular character in my TextField and TextArea components?

The proper font is not installed on your platform.

I have installed a Unicode font, but my program cannot display all Unicode characters. What's the problem?

The characters that cannot be displayed may not be in the font.

What font types does the JDK software support for the Win32 and Solaris platforms?

The release for Win32 platforms supports TrueType fonts. The release for Solaris supports outline fonts that can be handled by an X11 server, such as F3, Type1, and TrueType.

What classes of fonts are supported by the Java Runtime Environment?

Version 1.0 of the JDK software included the font names TimesRoman, Courier, and Helvetica, which were very specific and do not apply to many locales. Version 1.1 supports the following classes of fonts:

What is the difference between a virtual font name and a platform font name?

The virtual font name is the name of the font recognized by the Java Runtime Environment. The platform font name is the actual name of the font on the host platform. For example, Dialog and Serif are virtual font names, and Times and Helvetica are the platform font names on a Win32 or Solaris platform.

Is it possible to display more than one language in the Java Runtime Environment?

Yes. To implement a multi-lingual display you make the necessary changes to the font.properties file and remove the language-specific font.properties.xx files. See the web page, Adding Fonts to the Java Runtime Environment, for details.

Why does my Chinese font with Big5 encoding work fine on Windows NT but not on Windows 95?

Windows NT's internal encoding is Unicode, so it can support Unicode Chinese characters if a Big5 font is installed. However, Windows 95 uses the ANSI codepage, which limits it to the 8859_1 code page. Therefore, on Windows 95 a TextArea component won't work correctly with Big5 encoded Chinese characters.

What are the default fonts for the CJK (Chinese, Japanese, Korean) environments in Solaris 2.7?

The default fonts are listed in the following table:

lang (locale)      screen-width        font-typefaces  font-size      font-encoding
korean (ko)        WIDTH > 1175        Round Gothic    18 (point)     ksc5601.1987-0
korean (ko)        850 < WIDTH < 1176  Round Gothic    16 (point)     ksc5601.1987-0
korean (ko)        851 > WIDTH         Round Gothic    14 (point)     ksc5601.1987-0
korean (ko.UTF-8)  same as above       same as above   same as above  ksc5601.1992-3
japanese (ja)      > 1175              Gothic          16 (point)     jisx0201.1976-0
japanese (ja)      < 1176              Gothic          14 (point)     jisx0201.1976-0
T-chinese (zh_TW)  > 1175              Sung            18 (point)     cns11643-[1..16]
T-chinese (zh_TW)  < 1176              Sung            16 (point)     cns11643-[1..16]
T-chinese (BIG5)   > 1175              Ming            18 (point)     big5-1
T-chinese (BIG5)   < 1176              Ming            16 (point)     big5-1
S-chinese (zh)     > 1175              Song            16 (point)     gb2312.1980-0
S-chinese (zh)     < 1176              Song            14 (point)     gb2312.1980-0

Character Encodings

What is a character encoding?

A character encoding is a mapping between characters and code values.

What is Unicode?

In the Java programming language, char values represent Unicode characters. Unicode is a 16-bit character encoding that supports the world's major languages. You can learn more about the Unicode standard at the Unicode Consortium web site.

How do I convert data between Unicode and other character encodings?

The Converting Non-Unicode Text section of The Java Tutorial explains how to perform the conversions within an application. To convert data files, use the native2ascii tool.

Which character encodings are supported when converting text to and from Unicode?

See the Supported Encodings web page.

I can't find the CharToByteConverter class. What should I use to convert character encodings?

The CharToByteConverter class is available only in the sun.io package. If you use this package, your program will be platform-dependent. Instead, try using the InputStreamReader and OutputStreamWriter classes, which belong to the java.io package.
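
A minimal sketch of that approach follows: bytes are decoded into Unicode characters with an InputStreamReader and encoded again with an OutputStreamWriter. The class name Transcoder, the helper transcode, and the choice of ISO-8859-1 and UTF-8 as the two encodings are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class Transcoder {
    // Decodes the input bytes from one encoding into Unicode chars,
    // then encodes those chars into the target encoding.
    static byte[] transcode(byte[] input, String fromEncoding, String toEncoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), fromEncoding);
             Writer out = new OutputStreamWriter(sink, toEncoding)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for unknown names.
            throw new UncheckedIOException(e);
        }
        // The writer has been closed, so all bytes are flushed to sink.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute (0xE9) in ISO-8859-1.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(utf8.length); // 5, since e-acute needs two bytes in UTF-8
    }
}
```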

Can I add a custom converter?

Yes, but this is typically done by licensees, not by application programmers. You'll need to extend the ByteToCharConverter and CharToByteConverter classes. See the Charset Converter section in the Adding Fonts to the Java Runtime web page.

What is UTF8 encoding?

UTF8 stands for UCS Transformation Format 8, where UCS is the Universal Character Set. It is a transmission format for Unicode that is safe for UNIX file systems.
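
The sketch below illustrates two properties of this format: ASCII characters keep their single-byte values, and every byte of a multi-byte sequence has its high bit set, so such a sequence can never contain a byte that looks like '/' or NUL. The class name Utf8Lengths and its helpers are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Utf8Lengths {
    // Number of bytes needed to encode the string in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every encoded byte has its high bit set (value >= 0x80).
    static boolean allBytesHigh(String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));        // 1
        System.out.println(utf8Length("\u00e9"));   // 2 (e-acute)
        System.out.println(utf8Length("\u20ac"));   // 3 (euro sign)
        System.out.println(allBytesHigh("\u20ac")); // true
    }
}
```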

What is a file encoding?

A file encoding is the standard used to encode character data in a file. A string identifying the file encoding is stored in the file.encoding property of the System class. The file encoding is significant because the Java programming language uses Unicode for characters, but the file system of the host platform probably uses some other encoding. This encoding varies with host platform and locale. If the encoding matches the file.encoding property, then the conversion of the character data into Unicode is transparent to the programmer.
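
To inspect the value on a given installation, a program can read the property directly, as in the sketch below. The Charset.defaultCharset() call belongs to the java.nio.charset package, which was added in releases later than those this page describes; the class name FileEncodingInfo is illustrative only.

```java
import java.nio.charset.Charset;

class FileEncodingInfo {
    // The raw value of the file.encoding system property.
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    // The name of the charset the runtime uses by default.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Both values depend on the host platform, locale, and release.
        System.out.println("file.encoding   = " + fileEncodingProperty());
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```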

What is the default file encoding for the JDK software?

For release 1.1.7 and 1.2.0, the default file encoding is CP1252 for Win32 and ISO8859_1 for Solaris.

Are the CP1252 and ISO8859_1 encodings identical?

No. CP1252 contains some additional characters in the range of \u0080 to \u009F.
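
The difference can be observed by decoding the same byte with both encodings. In the sketch below, the byte 0x93 becomes a left double quotation mark (\u201c) under CP1252 but a control character (\u0093) under ISO8859_1. The example assumes that the runtime provides CP1252 under the charset name "windows-1252"; the class name Cp1252VsLatin1 is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252VsLatin1 {
    // Decodes a single byte value with the given charset.
    static char decode(int byteValue, Charset charset) {
        return new String(new byte[] {(byte) byteValue}, charset).charAt(0);
    }

    public static void main(String[] args) {
        // Assumption: CP1252 is available as "windows-1252".
        Charset cp1252 = Charset.forName("windows-1252");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        System.out.printf("CP1252:    U+%04X%n", (int) decode(0x93, cp1252)); // U+201C
        System.out.printf("ISO8859_1: U+%04X%n", (int) decode(0x93, latin1)); // U+0093

        // Outside the 0x80 to 0x9F range the two encodings agree.
        System.out.println(decode(0xE9, cp1252) == decode(0xE9, latin1)); // true
    }
}
```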

Input Methods

What is the Input Method Framework?

The input method framework enables all text editing components to receive Japanese, Chinese,  or Korean text input through input methods. An input method lets users enter thousands of different characters using keyboards with far fewer keys. Typically a sequence of several characters needs to be typed and then converted to create one or more characters. For specifications and examples see the web page, Input Method Framework.

How do you switch between Chinese and English input modes?

Solaris:

Win32:

What does it mean to switch input methods?

A user may have multiple input methods available. For example, the user may have input methods for different languages or input methods that accept various types of input. Such a user must be able to select the input method used for a particular language or the input method that provides the fastest input.

Can an input method be activated programmatically?

In release 1.1 of the JDK software, an input method can be activated only by the user's keystrokes. The FCS (first customer shipment) version of release 1.2 permits programmatic activation of an input method.

Do the AWT and Swing (JFC) text components work with input methods?

See the Input Methods section of the JDK Software Internationalization Overview.

Miscellaneous

The Collator object supports different levels of decomposition and strength. How do I choose the right decomposition and strength in a locale?

Since decomposing takes time, turning decomposition off makes comparisons go faster. However, for Latin languages the NO_DECOMPOSITION mode is not useful if the text contains accents. You should use the default decomposition unless you really know what you're doing.

The strength property you choose depends on what your application is trying to accomplish. For example, when performing a text search you may allow a "weak" match, in which accents and differences in case (upper vs. lower) are ignored. This type of search employs the PRIMARY strength. If you are sorting a list of words, you might want to use the TERTIARY strength. In this mode the properties that must match are the base character, accent, and case.
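
The effect of the strength setting can be sketched as follows. With PRIMARY strength, two words that differ only in accents and case compare as equal; with TERTIARY strength they do not. The example fixes the locale to Locale.US for reproducibility; the class name CollatorStrength and the helper matches are illustrative.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrength {
    // Returns true if the two strings are considered equal by a US
    // English collator configured with the given strength.
    static boolean matches(String a, String b, int strength) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "R\u00e9sum\u00e9"; // capital R, e-acute twice

        // Weak match: accents and case are ignored.
        System.out.println(matches(plain, accented, Collator.PRIMARY));  // true
        // Strict match: base character, accent, and case must agree.
        System.out.println(matches(plain, accented, Collator.TERTIARY)); // false
    }
}
```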

Does the JDK software support the euro currency?

Support for the euro currency is available in version 1.2 and later of the Java 2 platform. For information about support in release 1.1 see the web page, EURO CURRENCY PROPOSAL FOR JDK 1.1.x.

This page was updated on 5 October 1998.

Copyright © 1996-1999 Sun Microsystems, Inc. All rights reserved.