Notes on kanji and computers, particularly on the problem of traditional and modern forms of Chinese characters used in Japan

A fragment, really, but I'll leave it here for anyone who is interested.

abbreviations [macrons omitted]
Toyo kanjihyo 当用漢字表
Joyo kanjihyo 常用漢字表
KK=kyukanji (traditional form) 旧漢字
SK=shinkanji (modernized form) 新漢字
JIS=Japanese Industrial Standard

outline history (from Seeley 1994)
1946: establishment of Toyo kanjihyo (list of kanji in ordinary use)
1978: JIS computer code for some 6000+ characters
level 1: 2965 characters arranged in order of reading
level 2: 3384 less common characters arranged in order of radical
1981: replacement of the Toyo kanjihyo with the Joyo kanjihyo
1982-1983: revision to JIS standards
1990: supplementary JIS character code for 5801 characters
[for more details see history in Japanese (Yasuoka, Kyoto U.)]
JIS computer codes for Chinese characters used in Japanese (kanji)
Simplified forms of hundreds of characters have been in official use in Japan since 1946. When JIS established character codes in 1978, the simplified forms were included in level 1, while some, but by no means all, of the traditional forms were included in level 2 along with rarer kanji.
Why are 6000+ JIS characters not enough?
JIS codes were established primarily for administrative and business use. The codes include some very rare characters needed in modern personal or place names, but not many of the characters required in academic works in fields like history or literature. The supplementary character code of 1990 includes many more kanji, but it is not available to most computer users, and in any case still falls far short of the 50,000+ characters in the Morohashi dictionary.

People who work with older texts sometimes resort to gaiji 外字, or hand-made characters. This solves the problem of display and printing only on a piecemeal basis, and only at the individual or workgroup level. The problem of universal accessibility has become more acute on the Internet, with the conflicting character sets of China, Korea and Japan. Unicode has sought to solve this, but is still not available to most personal computer users.
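The clash between national character sets is easy to see directly. The sketch below is in Python (an illustrative choice); its standard codecs `euc_jp`, `gb2312` and `euc_kr` stand in for the Japanese, Chinese and Korean national standards. It decodes one and the same two-byte sequence under each table, and the result is a Japanese kanji, a Chinese hanzi or a Korean hangul syllable depending on which table is assumed. Unicode avoids the clash by giving each of these characters its own code point.

```python
# The same two bytes, read under three East Asian double-byte encodings.
raw = b"\xb0\xa1"

readings = {
    codec: raw.decode(codec)
    for codec in ("euc_jp", "gb2312", "euc_kr")
}

for codec, char in readings.items():
    # Show each decoded character with its (distinct) Unicode code point.
    print(f"{codec:7} {char}  U+{ord(char):04X}")
```

Run as is, it should print 亜 (U+4E9C), 啊 (U+554A) and 가 (U+AC00): three different characters from identical bytes.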

What is a "doublet"?
Simplified forms of hundreds of characters are in official use. Some differ from the historical forms (kyukanji) in only small ways, but
Why I wanted a list of old/new kanji.
I have been editing an electronic text which uses the traditional forms of characters. While these are very nice and proper in their own way, they make word searches difficult and are less legible in some fonts and sizes. Moreover, the text was inconsistent in its use of old and new kanji. I wanted a way of producing a modernized text, so I have made a HyperCard stack of the work in which I substitute the modern forms as I edit. (Scripts here.)
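The substitution described above amounts to a character-for-character table lookup. The sketch below is in Python rather than HyperTalk, and its four-entry table is a hypothetical sample, not the list used in the stack. It also shows why old forms defeat a search for the modern spelling.

```python
# Hypothetical sample table of traditional (KK) -> modern (SK) forms.
KK_TO_SK = {
    "國": "国",
    "學": "学",
    "體": "体",
    "劍": "剣",
}

# str.maketrans turns the dict into a table that str.translate applies per character.
MODERNIZE = str.maketrans(KK_TO_SK)


def modernize(text: str) -> str:
    """Replace each traditional form found in the table with its modern form."""
    return text.translate(MODERNIZE)


old_text = "國學"
print("学" in old_text)             # False: searching for the modern form misses the old text
print(modernize(old_text))          # 国学
print("学" in modernize(old_text))  # True
```

Characters missing from the table pass through unchanged, so the table can grow gradually as new old forms turn up during editing.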

This includes some characters that are classified as itaiji (異体字), alternative forms: 壻/婿, 劔/剣 (the old form is 劍)
not included
Desktop reference
There are now a number of good electronic dictionaries for computers. For quick hard-disk access, I personally use JISPA (字スパ), kokugo, kanji, E-J, and J-E. The kanji dictionary is handy for checking readings and definitions, and also alternative forms. Searching is by kun/on pronunciation, stroke count, radical, element, or a combination of these. Element (buhin 部品) is more broadly defined than radical (bushu 部首): the characters 露 and 路 both contain the element 足, but 露 is not categorized under the radical 足. This distinction is handy when you are unsure of the correct radical (search by element instead), or want to narrow a search further (example: radical 木 and element 区 together identify a single character).
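The difference between the two search keys is the difference between an equality test (a character is filed under exactly one radical) and a membership test (it may contain many elements). The Python sketch below uses a hypothetical three-entry index, not JISPA's data or interface, to show how combining the two criteria narrows a search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    char: str
    radical: str          # the single radical (bushu) the character is filed under
    elements: frozenset   # every component (buhin) visible in the character


# Hypothetical mini-index; a real dictionary holds thousands of entries.
INDEX = [
    Entry("枢", radical="木", elements=frozenset({"木", "区"})),
    Entry("欧", radical="欠", elements=frozenset({"区", "欠"})),
    Entry("杉", radical="木", elements=frozenset({"木", "彡"})),
]


def search(radical=None, element=None):
    """Return the characters that match every criterion given."""
    hits = []
    for e in INDEX:
        if radical is not None and e.radical != radical:
            continue  # radical must match exactly
        if element is not None and element not in e.elements:
            continue  # element only has to be present somewhere
        hits.append(e.char)
    return hits


print(search(element="区"))               # ['枢', '欧']
print(search(radical="木"))               # ['枢', '杉']
print(search(radical="木", element="区"))  # ['枢']
```

Either criterion alone returns two candidates here; only the combination narrows the result to one.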
What characters are included in JIS?
For literary and historical studies, not nearly enough. JIS (Japanese Industrial Standards) defines character lists (first level, second level, names) that cover all the characters for "educational" and "general use", as well as those needed for most modern personal names and place names.

Characters are produced by typing one of the readings and hitting the space bar, right?
All first level characters, yes. It's best to write in "words" rather than try to produce individual characters, particularly when there are many characters with that reading (e.g. KAN, KEI). Even if you want a character out of context, it is often quicker to think of a two-character compound in which the character appears, then delete the other character.

Why would anyone want to enter kanji by numbers?
Thousands of rarer characters can only be entered by number. Each character is given a unique number, which can be entered directly, in the same way as a pronunciation. There are actually a number of competing systems (kuten code 区点コード, JIS code), and dictionaries give one or both. It does not matter which you use when inputting, although kuten is easier: it consists of four decimal digits, whereas the JIS code is a four-digit hexadecimal number that often includes letters. Type in the four digits, then hit the space bar. The character should appear as one of the choices in the small window above the cursor. If there are two characters, one is the kuten reading of the number and the other the JIS reading. Just be sure to select the right one.
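The relationship between the two numbering systems is simple arithmetic: the JIS code adds 0x20 to the row (ku) and to the cell (ten) and writes each as a hexadecimal byte, so kuten 16-01 is JIS 3021. Because a JIS code may happen to use only the digits 0-9, the same four digits can be read both ways, which is why two candidates can appear. A minimal Python sketch; the function names are illustrative, and looking characters up through the standard `euc_jp` codec is a convenience assumption, not how any input method actually works.

```python
def kuten_to_jis(ku: int, ten: int) -> int:
    """Convert a kuten (row-cell) number to a JIS code: add 0x20 to each half."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must each be between 1 and 94")
    return ((ku + 0x20) << 8) | (ten + 0x20)


def is_valid_jis(code: int) -> bool:
    """A JIS code is two bytes, each in the range 0x21-0x7E."""
    hi, lo = code >> 8, code & 0xFF
    return 0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E


def jis_to_char(code: int):
    """Look up a JIS X 0208 code; EUC-JP is the JIS code with the top bit of each byte set."""
    try:
        return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80]).decode("euc_jp")
    except UnicodeDecodeError:
        return None  # well-formed code with no character assigned


def read_four_digits(digits: str):
    """Read a four-digit entry both as kuten and as hexadecimal JIS.

    Returns (kuten_char, jis_char); either is None if that reading is invalid.
    """
    ku, ten = int(digits[:2]), int(digits[2:])
    kuten_ok = 1 <= ku <= 94 and 1 <= ten <= 94
    kuten_char = jis_to_char(kuten_to_jis(ku, ten)) if kuten_ok else None
    code = int(digits, 16)
    jis_char = jis_to_char(code) if is_valid_jis(code) else None
    return kuten_char, jis_char


print(f"{kuten_to_jis(16, 1):04X}")  # 3021
print(read_four_digits("1601"))      # ('亜', None): 0x1601 is not a valid JIS code
print(read_four_digits("3021"))      # kuten 30-21 and JIS 3021 give different kanji
```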

Remember that more than one KK may correspond to one SK.
i™ž@•Ω jiαA@•Ωji燁@•Ωj

In what order do computers sort Chinese characters?
Computers sort kanji by character code, so KK, most of which are in JIS level 2, come out automatically in radical order, while SK in level 1 are sorted by on-yomi where one exists, otherwise (as in the case of kokuji) by kun-yomi. Any good application should be able to do this. On the Mac, Nisus will "sort paragraphs" by the first character in each line (Edit menu); the command for HyperCard fields is "Sort lines of field fieldName". Printed lists of the characters by radical are usually given in the manuals, though the radical assigned may not always be the traditional one.
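Sorting by character code reproduces the JIS arrangement: level 1 by reading, level 2 by radical. A minimal Python sketch, assuming Shift-JIS (the encoding used on Japanese Macintosh and Windows systems) as the sort key; for comparison it also sorts by Unicode code point, which orders the unified ideographs by radical and stroke count instead.

```python
def sjis_key(ch: str) -> bytes:
    """Sort key: the character's Shift-JIS bytes, i.e. its position in the JIS table."""
    return ch.encode("shift_jis")


pair = ["一", "亜"]  # ichi and a: both level 1 kanji

print(sorted(pair, key=sjis_key))  # ['亜', '一']  JIS level 1: by reading, a before ichi
print(sorted(pair))                # ['一', '亜']  Unicode: radical 一 before radical 二

# A level 2 character sorts after every level 1 character.
print(sorted(["辨", "弁", "亜"], key=sjis_key))  # ['亜', '弁', '辨']
```

Byte-wise comparison of Shift-JIS works here because the encoding preserves JIS row and cell order for double-byte characters; mixing in single-byte characters would need more care.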
Christopher Seeley, "The Japanese Script and Computers: The JIS Character Codes and Their Periphery", Japan Forum, Vol. 6, No. 1, April 1994, pp. 89-101.

