Quick and Dirty Introduction to Traditional and Simplified Chinese Transliteration

The simplest case of transliteration is when you have two alphabetic systems, Latin and Cyrillic for example, and there is more or less a one-to-one correspondence between the letters of those alphabets. Take the Serbian Cyrillic and Latin titles of Philosopher’s Stone, for instance:

Хари Потер и камен мудрости
Hari Poter i kamen mudrosti

You can see that for every Cyrillic letter there’s a corresponding Latin letter, and the transliteration is symmetrical: you can go back and forth very easily. OK, that’s not entirely true. There are exceptions for sure, and there are many competing Latin<>Cyrillic transliteration systems designed for different purposes, but that symmetry is often a goal.
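To make the "one letter in, one letter out" idea concrete, here is a minimal Python sketch of Serbian Cyrillic to Latin transliteration. The names `CYR_TO_LAT` and `to_latin` are made up for illustration, and the capitalization handling is a simplifying assumption rather than a rule from any particular standard.

```python
# Serbian Cyrillic -> Latin, one table entry per letter (lowercase only).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate letter by letter; anything not in the table passes through."""
    out = []
    for ch in text:
        lat = CYR_TO_LAT.get(ch.lower())
        if lat is None:
            out.append(ch)                # spaces, punctuation, other scripts
        elif ch.isupper():
            out.append(lat.capitalize())  # assumption: "Љ" becomes "Lj", not "LJ"
        else:
            out.append(lat)
    return "".join(out)


print(to_latin("Хари Потер и камен мудрости"))  # Hari Poter i kamen mudrosti
```

The three two-letter outputs (lj, nj, dž) are where the symmetry gets a little strained: going back from Latin to Cyrillic, a converter has to decide whether "nj" is one letter or two.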

This is not at all what Chinese transliteration is like. There is no really authoritative count of Chinese characters, just lists created for different purposes. The Taiwanese Ministry of Education has a “Chart of Standard Forms of Common National Characters” that lists 4,808 characters, and it’s estimated that the average person knows and uses 3,000 to 4,000 on a regular basis. However, Unicode supports nearly 78,000 characters and Taiwan’s national standard for encoding characters supports more than 96,000. There are a lot of them. Too many for some people, and so the People’s Republic of China went about reducing that list and simplifying how many of the characters are written. The result is the current “Table of General Standard Chinese Characters”, which has 8,105 characters, of which 3,500 are “Tier 1”, the “frequently used characters”. It’s still a lot, but 8,105 is the Simplified maximum, which is a heck of a lot fewer than Taiwan’s 96,000.

What that results in is an asymmetrical system with a many-to-one mapping from Traditional Chinese (HANT¹) to Simplified Chinese (HANS¹). There is a lot of overlap, and many characters are the same in both HANT and HANS. But as I mentioned, the writing of many characters was simplified, so there are also many new characters in HANS that are derived from HANT. For example, the character that means “gate”:

HANT: 門 (8 strokes) > HANS: 门 (3 strokes)

The Venn diagram of the two character sets looks like this:

[Image: Venn diagram of HANS and HANT]

Every HANT character can be mapped to one and only one HANS character, making HANT > HANS transliteration trivial. And even though each HANS character can potentially be mapped to many HANT characters, transliterating from HANS to HANT isn’t so difficult a task; it just requires a little context, and for the most part both transliterations can be performed algorithmically (i.e. by a computer).
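As a rough illustration of that asymmetry, here is a toy Python sketch. The five-entry table, the two-entry phrase list, and the function names are all invented for this example, and the sketch simply assumes the one-HANS-character-per-HANT-character claim holds for the characters shown. A real converter would need far larger tables.

```python
# Toy HANT -> HANS table: many-to-one (two Traditional characters share 发).
HANT_TO_HANS = {
    "門": "门",  # gate
    "發": "发",  # to send out, to emit
    "髮": "发",  # hair
    "頭": "头",  # head
    "現": "现",  # to appear
}

# Invert the table: each HANS character gets a list of HANT candidates.
HANS_TO_HANT = {}
for hant, hans in HANT_TO_HANS.items():
    HANS_TO_HANT.setdefault(hans, []).append(hant)

# Hypothetical phrase table supplying the "little context" for HANS -> HANT.
PHRASES = {"头发": "頭髮", "发现": "發現"}


def to_hans(text: str) -> str:
    """HANT -> HANS: a plain per-character lookup is enough."""
    return "".join(HANT_TO_HANS.get(ch, ch) for ch in text)


def to_hant(text: str) -> str:
    """HANS -> HANT: try known two-character phrases before single characters."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in PHRASES:
            out.append(PHRASES[pair])
            i += 2
            continue
        candidates = HANS_TO_HANT.get(text[i], [text[i]])
        out.append(candidates[0])  # no context: fall back to the first candidate
        i += 1
    return "".join(out)


print(to_hans("頭髮"), to_hans("發現"))  # 头发 发现
print(to_hant("头发"), to_hant("发现"))  # 頭髮 發現
```

Without the phrase table, `to_hant("头发")` would fall back to the first candidate for 发 and produce 頭發, which is the wrong character for "hair". That is the kind of mistake the context is there to prevent.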

Notes

1 The ISO 15924 standard assigns four-letter IDs to writing systems: Traditional Chinese is “HANT” and Simplified is “HANS”. “Han” in the context of writing systems refers to the entirety of Chinese characters, even those borrowed into Japanese and Korean. I’m sure you can figure out where the “T” and “S” come from!


3 responses

  1. Lone says:

    “Every HANT character can be mapped to one and only one HANS character” -> Not entirely true. For example, 乾 in HANT can be either 乾 or 干 in HANS according to its pronunciation.

    • PotterGlot says:

      Well that’s annoying! There can’t be many of those though right? Is it likely to be true of the handful of HANT characters that have multiple pronunciations? And would you say it’d be correct to qualify that “a HANT character + pronunciation always maps to one and only one HANS character”?

  2. Lone says:

    “a HANT character + pronunciation always maps to one and only one HANS character” -> still not entirely true. In fact there is a wikipedia page for this problem: https://zh.wikipedia.org/wiki/%E7%B9%81%E7%B0%A1%E8%BD%89%E6%8F%9B%E4%B8%80%E5%B0%8D%E5%A4%9A%E5%88%97%E8%A1%A8
