Danny's Weblog
This section is for articles which relate mainly to the Cambodian language, often referred to as Khmer. As with the rest of my site, the articles are presented in *reverse* chronological order. Also, they tend to represent things which I have discovered or speculated about which *supplement* the standard materials: this is not intended to present a free teach-yourself-Khmer course.
In particular, note that I originally focused on using the "Limon-type" fonts for Cambodian, as they were far more commonly used than Unicode. Although I believe Limon is still much more common, support for Unicode is so much better these days that more recent posts focus on Unicode. To get a balanced picture, you should read the entire folder.
You may also be interested in articles which refer to Asian languages in general: Asia/Language-misc
How to say the Cambodian (Khmer) word for "go"
The "tl;dr" version is: You say the Cambodian word for "go" kinda like the English word "dove" (meaning "bird", not the American past tense of "dive").
(For the purposes of this article I decided to implement Khmer text and phonetics as a graphic, in case the user's PC does not properly support the Unicode used in the text. Numbers in the graphic below correspond to any references in the text.)
1. Khmer script: ទៅ 2. IPA transcription: tɨʋ̒̚
The word for "go" is obviously important in any language and should be learned very early on, but it happens to be particularly hard for the foreign learner of Cambodian. I remember my first arrival on my own in Cambodia, when I ran to escape overpriced taxi touts and jumped on the back seat of a motorbike, telling the driver (I hoped) to "go! go!". He had no idea what I was saying until I said "go market!" The word "market" is much easier for the Westerner to pronounce understandably. This allowed the driver to guess that I did not mean "even if, scrub, polish, rub, exchange, barter, change, grow, spring up, bud, breasts, take off, remove, solve, attempt to solve, lion, bent over, unit of dry measure equivalent to about two pecks... market".
I had some idea of how bad my pronunciation was, because I had found the explanations of the elements of this word to be particularly vague. I am writing this posting to record my current understanding of the issue. I would have posted it much earlier, except I was trying to fix representation of Khmer in text on this site; but I've decided not to wait. Maybe people will find this and improve the pronunciation guides in textbooks.
It took me nearly two years of actually living in Cambodia to get fairly confident of my grasp of this word, and my pronunciation of it is still rather labored. I think the writers of the textbooks I relied on managed to write those books with only a partial grasp of the problems.
The basic problem is that none of the three phonemes of this word exists in English. This makes it impossible to use a simple transcription like you might find in a travel guide. In the description below I have had to use a number of fairly specialized phonetics terms, which few Westerners are familiar with. You will need to Google them. Additionally, the final phoneme is pronounced "unreleased" (marked by that unusual diacritic), which changes the sound drastically for the Western ear, and that feature is poorly explained in most available texts on phonetics. Incidentally, I did not find a single hit on Google for the IPA transcription I am using here: to put it another way, this transcription of the word is my own. Being still rather hesitant about it, I have not marked it as either a broad or narrow transcription.
The Wikipedia entry on the IPA phonetic system is worth looking at as a reference, although not very helpful to start learning from if you know nothing about phonetics:
[http://en.wikipedia.org/wiki/International_Phonetic_Alphabet]
An additional problem is that Cambodians do not have any phonological training, much less the ability to communicate such information to the hapless foreign learner.
We have to go through each of those difficult phonemes in turn.
The first phoneme is an unvoiced, unaspirated dental, represented by Huffman as a simple "t". Perhaps I should say explicitly that although English and American speakers utter this sound on occasion, they do not know how to create it in an arbitrary location in an utterance. In English, the "d" sound is always voiced, and both the "d" and the "t" are sometimes aspirated and sometimes not, depending on one's regional accent as well as the surrounding sounds. But these variations correspond to *different words* in Cambodian.
Although the phoneme is not very hard to pronounce adequately (at least to the Western ear), it is not trivial to clearly maintain the differences between this and the voiced and aspirated versions of the dental phonemes, especially while you have many other things to think about – either when you are listening or speaking. Additionally, the best position of articulation of this sound (with the tongue curled up against the palate) is unnatural for the Westerner, so he is not used to incidental sounds which are routinely produced in different environments (see below).
The second phoneme is described by Huffman like this: "High central unrounded vowel made by raising the center of the tongue toward the soft palate while keeping the lips flat or spread". (Also, I hear the sound as nasal, and other sources agree.) Now you can see this is going to be hard to make compatible with the articulation of the preceding consonant. The result is that at the end of that consonant/beginning of the vowel the tip of the tongue has to flick sharply towards the lower front teeth to allow the vowel to be articulated clearly, resulting in a sharp click at the front of the vowel; and this motion is so sharp that there is also a sort of audible slap as the tongue falls to the lower jaw at the end of the vowel. The overall effect, for the Westerner who has not subconsciously learned how Khmer sounds interact, is that the "click" features (reminiscent of African click languages) overwhelm the true phonemes.
But the worst part is the final consonant. This is considered to be part of a single vowel sound in the word as written in the Cambodian spelling system, and perhaps because of this it is often represented as some sort of vowel in phonetic transcriptions, causing endless confusion.
Although it may well sound like a vowel to the Western ear (and even in English, the exact definition of the difference between vowels and consonants is not entirely agreed), Huffman and my favorite dictionary (whose title in Khmer is "Modern Khmer-English Dictionary", and which I believe to be a ripoff version of Headley's "Modern Cambodian-English Dictionary", 1997) agree that the final sound is a consonant. But what is it?
It turns out to be a variant of an English "v" sound, except that instead of the lower lip being clamped between the upper and lower teeth, the two lips are simply placed (loosely) together. (The symbol for this consonant is in the right position in the charts showing phonetic features, but actually I haven't found a good specific description of it and I just hope it's correct.)
Additionally the consonant sound, as usual at the end of Khmer words, is pronounced unreleased, ie very short and without opening the mouth again at the end, so that you can hardly hear what we would think of as the actual consonant sound, but instead the result (including the actual vowel) sounds rather like an "o" sound (as in "hello") to an English person: the articulation of the "v" sounds like the articulation of the English diphthong. (That's indicated in my transcription by the wacky diacritic over the last letter.)
Huffman gives a rather vague description including "with more lip-rounding", but I believe that is quite false.
Headley says that in the final position, this phoneme is pronounced like "w". This is largely true of the *result*, but gives the wrong impression of how the sound is actually articulated, and if you try to utter the sound that way it distorts surrounding phonemes.
Perhaps one day I will meet a Cambodian who has studied phonetics sufficiently to confirm all this. Or laugh at me.
2009 Jan 20 [ Tue ]
A couple more comments on Wiktionary and Cambodian (Khmer)
In my previous posting on the Wiktionary's features for Khmer: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/wiktionary01.html] I sounded perhaps a little harsh. Evidently the slightly lame features result from basing the system on the existing features of the Wikipedia. The Wikipedia has no features for automatically generating links back and forth so that the information provided in the version set up for English-speaking users is automatically provided to the version for Cambodian users, so the Wiktionary doesn't have them either. Likewise, the fact that X means A and B should automatically allow Wiktionary to display that A can (sometimes) be translated as X, but that won't happen until someone who can program realizes that it's necessary (and implements the user interface and documentation to let people actually enter the data cleanly).
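To make the "X means A and B, so A can be translated as X" point concrete, here is a minimal Python sketch of the reverse lookup. It says nothing about how Wiktionary actually stores its data. The function name and the sample entries are made up for illustration, with romanized spellings standing in for Khmer script.

```python
from collections import defaultdict


def build_reverse_index(forward):
    """Invert a headword -> glosses mapping into gloss -> headwords."""
    reverse = defaultdict(list)
    for headword, glosses in forward.items():
        for gloss in glosses:
            reverse[gloss].append(headword)
    return dict(reverse)


# Hypothetical sample data, not taken from any real dictionary dump.
forward_entries = {
    "X": ["A", "B"],
    "pnum": ["hill"],
    "tortuel": ["receive", "welcome"],
}

reverse_entries = build_reverse_index(forward_entries)
print(reverse_entries["A"])        # headwords that can translate "A"
print(reverse_entries["welcome"])  # headwords that can translate "welcome"
```

The hard part is presumably not this loop but the user interface and data-entry conventions around it, which the sketch does not touch.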
Another point about Khmer and Unicode: I happened to view my previous posting in Opera under Linux, and was surprised to see that Opera did not handle the jerng consonants properly: it displayed the regular consonant with a + sign underneath it, and then the jerng consonant. However, the link worked – and then the Wiktionary page had the same problem. It's very strange; I thought the handling of Unicode was determined by the OS, and it's certainly handled OK by Firefox under Linux and IE6 under Windows (if Office 2003 is installed under XP at least).
Presumably this means that Opera needs some sort of feature to be implemented in the code to work properly with Khmer Unicode. Hmm.
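For anyone who wants to see what the browser has to deal with, here is a small Python sketch that lists the code points in a Khmer word and picks out the jerng (subscript) pairs. In Unicode a subscript consonant is stored as the sign COENG (U+17D2) followed by an ordinary consonant, and the rendering software has to combine the two into one stacked shape. My guess, and it is only a guess, is that the "+" Opera showed was the unshaped COENG sign. The helper names are my own. The sample word is the one for "hill", written with escapes so it survives any editor.

```python
import unicodedata

COENG = "\u17d2"  # KHMER SIGN COENG, the marker that introduces a subscript consonant


def describe(text):
    """Return (code point, Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def subscript_pairs(text):
    """Return (base, subscript) consonant pairs that are joined by COENG."""
    pairs = []
    for i, ch in enumerate(text):
        if ch == COENG and 0 < i < len(text) - 1:
            pairs.append((text[i - 1], text[i + 1]))
    return pairs


word = "\u1797\u17d2\u1793\u17c6"  # "pnum" (hill)
for code, name in describe(word):
    print(code, name)
print(subscript_pairs(word))
```

If a browser shows the base consonant, then a stray mark, then the second consonant at full size, it is drawing these code points one by one instead of shaping them as a cluster.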
2009 Jan 18 [ Sun ]
Wiktionary now has some Cambodian (Khmer) language entries
Today I saw a link to a Russian word in Wiktionary. I was interested because it had the Russian characters for the word as part of the url. On a whim I tried entering Khmer Unicode characters for a Khmer word as part of the url. The first word I tried ("bdey", husband) didn't work, but the next one I tried, "pnum" or hill (as in Phnom Penh) worked.
Here is a link to that entry. Unless you have your machine set up for Khmer Unicode, it will be displayed badly, but I'm guessing it will still work for you as a link, assuming your machine handles Unicode at all: [http://en.wiktionary.org/wiki/%E1%9E%97%E1%9F%92%E1%9E%93%E1%9F%86]
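The long string of percent signs in that link is just the UTF-8 bytes of the Khmer word, written out in hex. A short Python sketch, using only the standard library, shows the round trip. The variable names are mine, and the word is given as escapes for the four code points of "pnum".

```python
from urllib.parse import quote, unquote

word = "\u1797\u17d2\u1793\u17c6"  # "pnum" (hill) as four Khmer code points

# quote() encodes the text as UTF-8 and percent-escapes each byte.
encoded = quote(word)
url = "http://en.wiktionary.org/wiki/" + encoded
print(url)

# unquote() reverses the process and recovers the original characters.
decoded = unquote(encoded)
print(decoded == word)
```

Each Khmer character takes three bytes in UTF-8, which is why a four-character word turns into twelve escaped bytes.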
Disappointingly, the Wiktionary does not seem to be set up to automatically work backwards. For instance, the link shown on that page to the English word "hill" does not display a link back to the Khmer word!
Here is a link to the intro to Khmer: [http://en.wiktionary.org/wiki/Category:Khmer_language]
This page is rather bare but I could not find anything better. The subheadings have a number after them, but I don't know what it means: it does not mean the number of entries under that heading, because several headings noted as "0" had multiple entries.
However, in total there are not many entries. If you go to the main page for the Wiktionary, you will see that Cambodian is in the 100-1000 entry group. Hopefully more entries will be added.
The link I gave above was for the "English dictionary" section: ie, in English, for English people (that is, the vanishingly tiny number who can read and write Khmer, as no way is provided to look up a word by phonetic transcription – to be fair, that's not as easy as it might sound). ...Oops, actually, you can enter *their* phonetic transcription, if you know what it is. But I did not see any explanation of how to *create* a phonetic transcription that is compatible with their database.
If you want to look up an English word and find the Khmer for that word, you can enter the word in the search box, and then pick out the Khmer link in the long list of languages below *if you're lucky*. (I think I may have looked at the Wiktionary a couple of times before, and seeing no entry for the Khmer for the current word concluded Khmer was not supported at all!)
If the word is not in the Wiktionary for even a single language (quite likely), you arrive at the results of a regular Wikipedia search for that word, with no explanation: suboptimal.
There is a Khmer version of the Wiktionary, ie a version *in* Khmer, for Khmer-speaking people: [http://km.wiktionary.org/wiki/] but as far as I could see it only gives Japanese translations... no, it does have some English. Interestingly, the translation it gives for "welcome" clearly has no connection to the page provided for English users: [http://km.wiktionary.org/wiki/welcome]
I did not see any obvious errors in the few words I checked out. Clearly, far fewer secondary meanings are shown than you would find in a good dictionary. On the other hand, one word "tortuel", whose basic meaning is "receive", was shown to mean "welcome!" (as an interjection), which I was not aware of, but not in its fairly common sense of "eat" in the phrase "tortuel tian" (polite, intransitive). (dordūul is the Wiktionary's phonetic transcription).
All in all, it's quite impressive that it all works with Unicode, including the urls. I get the impression from the way that the data does not link together very well that the site designer does not actually speak any foreign languages himself. On the other hand, I probably won't be making any contributions to the Wiktionary personally, so I should feel guilty about carping criticism.
2008 Nov 15 [ Sat ]
Nice file explaining Khmer Unicode and text entry
I have a vague feeling I found the Cambodian-language version of this a couple of years ago and couldn't find an English-language version. Anyway here it is: [http://www.cambodia.org/fonts/KhmerUnicode_Keyboard_Layout.pdf]
I found it via this page: [http://www.cambodia.org/fonts/] which is also well worth a shot.
Now that I can actually read it instead of gloomily puzzling out an approximation of the meaning word by word, there is a lot of useful info, for instance the order you need to enter diacritics when they pile up. Maybe next time I will remember this instead of having to go by trial and error.
2008 Jul 02 [ Wed ]
Translation of Khmer text used for khmerconverter test file
On 2008-07-01 I posted some text in Limon and Unicode. I didn't provide any translation or phonetic transcription.
I had not really understood the text at all because it used several words in a string which seemed to contradict each other. It reminded me of a survey question that I noticed back in the seventies which was something like "Are you in favour of, or do you oppose, the Government's intention to stop the European Community's plan to prevent the prohibition of schemes to restrict the denial of non-reportable medications?". I think people design such questions carefully to get the answer *they want*, which bears no relation to what the actual opinion of the respondents might be. In the case of that survey question, for some reason, they wanted the answers to be split 50/50, apparently.
Anyway, here's my shot at a sentence-by-sentence translation, followed by a free translation. (I apologize about the trailing sentence at the end: "baan" often ends a sentence in Cambodian and I took a wrong guess.) There's no phonetic transcription because I had to guess at several words.
Once again the text is in Unicode.
| សមាជិកព្រឹទ្ធសភាថៃ | Thai senators |
| មួយក្រុមដែល មានគ្នា ៧៧នាក់ | a group of 77 people |
| កាលពីថ្ងៃចន្ទបានចាប់ ផ្តើមដំណើរការបញ្ឈប់ | on Monday started progress in stopping |
| ការគាំទ្ររដ្ឋា ភិបាលថៃចំពោះសំណើរបស់រដ្ឋាភិបាល កម្ពុជា | support of the Thai government in relation to the [moment??] of the government of Cambodia |
| ដែលស្នើដាក់ប្រាសាទព្រះវិហារ | which proposes to put the temple of Preah Vihar |
| ចូលទៅក្នុងបញ្ជីបេតិកភណ្ឌពិភពលោក ។ | to enter the register [?] of belongings of the world [world heritage sites?] |
| កាសែតឌឹណេស្ហិនរបស់ថៃរាយការណ៍ | The newspaper "The Nation" of Thailand announced |
| ថាសមាជិកព្រឹទ្ធសភាទាំងនោះបាន | that these senators were able to... |
On Monday, a group of 77 senators in the Thai parliament began a bid to withdraw Thai government support for the Cambodian government's proposal that the temple of Preah Vihar should be registered as a World Heritage site. "The Nation", a Thai newspaper, reported that these senators were able to...
Incidentally, there are two newspapers called "The Nation" in Thailand. One is published in English; the name of the newspaper in the Khmer text is given in a phonetic form "der neyshun", so I'm assuming it's the English version that's being referred to.
Incidentally, this report seems to reflect the general attempt of the Thai and Cambodian governments and government-controlled media (ie practically all of it) to whip up tension between their citizens.
The "khmerconverter" utility to convert eg Limon to Unicode
The description sounded interesting: Limon and similar non-Unicode ("USA International") fonts to Khmer Unicode and vice versa. I've written about the Limon issues before, eg here: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/windowssetup01.html]
I had found khmerconverter while looking around in Ubuntu Synaptic Package Manager. I had installed it a couple of weeks ago, but I couldn't see where the installer had put the launcher and didn't bother proceeding. Today I happened to see the launcher (in Applications - Accessories) and tried it, but it appeared to do nothing.
I found the name of the executable in the launcher and was able to do "man khmerconverter", which helped by showing command-line options, but not enough (the spec for the formats is not clear). On the web I found: [http://www.khmeros.info/drupal/?q=en/download/converter] which suggested that the app had a gui wrapper.
After a while it occurred to me that I should try running the app from the console instead of the desktop. This revealed that it was complaining about the absence of the "tix" library for Tk. I found tix in Synaptic and installed it (no DVD necessary): clicking the launcher then brought up the gui. (It seems to me that if an app fails with an error message, the launcher, or the windowing environment, or something, should detect that and wait for you to read the error message instead of immediately closing the window. Oh well.)
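The kind of launcher behavior wished for above can be sketched in a few lines of Python: run the program, and if it exits with an error, keep whatever it wrote to stderr on screen instead of letting it vanish. This is only an illustration, not how the Ubuntu launcher works. The function name is invented, and a one-line Python command that fails with a made-up message stands in for the real application.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit_code, stderr_text).

    If the command fails, print its error output so it is not lost.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"{command[0]} exited with code {result.returncode}")
        print(result.stderr.strip() or "(no error output)")
    return result.returncode, result.stderr


# Stand-in for a GUI app that dies at startup; the message is invented.
demo_command = [sys.executable, "-c", "import sys; sys.exit('missing library: tix')"]
code, err = run_and_report(demo_command)
```

A desktop launcher would show the captured text in a dialog instead of printing it, but the principle is the same.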
Hmm. This is the first time I've seen a package installed through Synaptic clearly fail to pull in a package it needs.
So how can you try it out? You can download Limon and ABC "legacy" fonts here: [http://www.everyday.com.kh/khmerfont/khmerfont.asp]
This page is also useful: [http://www.cambodia.org/fonts/] with eg "How to type Khmer Unicode", a PDF document, unfortunately in Khmer and without any keyboard layout diagram for people trying to use a non-Khmer-Unicode keyboard. (There may be some reference to such a thing, but I was barely able to puzzle out more than a few words here and there.)
After I had installed the fonts (by unzipping them to my /home/dannyw/.fonts folder), Firefox was able to view www.everyday.com.kh properly. When I checked the HTML source, it does indeed handle fonts in css, and the css specifies EOT fonts (ie the special downloadable font format for IE). So although Firefox can't handle those, it apparently knows it can default to the (newly-installed) TTF fonts by name. OTOH, the page layout was still all screwed up: all the text was scrunched into the right column. I was able to set Firefox to View - Page style - No style. This made it possible to select a block of several sentences of text from everyday.com, and I could copy it into OpenOffice.
Then I could save as an OpenOffice .odt file, which is apparently the native format for khmerconverter. The output looked OK as far as I could see, ie the glyphs appeared to match – I'm not claiming to be able to *edit* Khmer text!
So while I've hardly tested khmerconverter exhaustively, it does appear to be useful.
Here are some blocks of test text so you can judge the performance of khmerconverter (and check whether my page and your browser setup work together – in particular check whether your browser is set to override font specs – d'oh!)
Original Limon (only looks right if "Limon S1" font is installed on your system – I'm not bothering to set up an EOT font spec here):
smaCikRBwT§sPaéf mYyRkumEdl manKña 77nak;kalBIéf¶cnÞ)ancab; epþImdMeNIrkarbBaÄb; karKaMRTrdæa Pi)aléfcMeBaHsMeNIrbs;rdæaPi)al km<úCaEdlesñIdak;R)asaTRBHvihar cUleTAkñúgbBa¢IebtikPNÐBiPBelak .kaEstDweNsðinrbs;éfraykarN_ fasmaCikRBwT§sPaTaMgenaH)an
Unicode version (should display OK if *any* Unicode font on your system can handle the Khmer group of Unicode codes): សមាជិកព្រឹទ្ធសភាថៃ មួយក្រុមដែល មានគ្នា ៧៧នាក់កាលពីថ្ងៃចន្ទបានចាប់ ផ្តើមដំណើរការបញ្ឈប់ ការគាំទ្ររដ្ឋា ភិបាលថៃចំពោះសំណើរបស់រដ្ឋាភិបាល កម្ពុជាដែលស្នើដាក់ប្រាសាទព្រះវិហារ ចូលទៅក្នុងបញ្ជីបេតិកភណ្ឌពិភពលោក ។កាសែតឌឹណេស្ហិនរបស់ថៃរាយការណ៍ ថាសមាជិកព្រឹទ្ធសភាទាំងនោះបាន
PKD example (just so you can see if you have PKD installed – I was too lazy to figure out the phonetics for the whole of the above text): kNom At dIG te
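To give a feel for what a Limon-to-Unicode converter has to do, here is a toy Python sketch. It is not khmerconverter's code. The nine-key table is only what I could infer by lining up two words from the Limon and Unicode samples above ("smaCik" and "Edl"). The one rule it shows is reordering: Limon is typed in visual order, so a vowel drawn to the left of its consonant is typed first, whereas Unicode stores it after the consonant.

```python
# Toy table inferred from the samples above. A real converter needs the
# full font table plus rules for subscripts and multi-part vowels.
LIMON_TO_UNICODE = {
    "s": "\u179f",  # consonant SA
    "m": "\u1798",  # consonant MO
    "a": "\u17b6",  # vowel sign AA
    "C": "\u1787",  # consonant CO
    "i": "\u17b7",  # vowel sign I
    "k": "\u1780",  # consonant KA
    "d": "\u178a",  # consonant DA
    "l": "\u179b",  # consonant LO
    "E": "\u17c2",  # vowel sign AE, drawn to the left of its consonant
}
PRE_VOWELS = {"E"}  # keys typed before the consonant they belong to


def limon_to_unicode(text):
    """Convert a Limon-encoded ASCII string using the toy table."""
    out = []
    pending = ""  # a left-side vowel waiting for its consonant
    for key in text:
        if key not in LIMON_TO_UNICODE:
            raise KeyError(f"no mapping for {key!r} in the toy table")
        char = LIMON_TO_UNICODE[key]
        if key in PRE_VOWELS:
            pending = char
            continue
        out.append(char)
        if pending:
            out.append(pending)  # Unicode order: consonant first, then vowel
            pending = ""
    return "".join(out) + pending


print(limon_to_unicode("smaCik"))
print(limon_to_unicode("Edl"))
```

Anything outside those nine keys raises an error on purpose, so nobody mistakes this for a usable converter.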
2008 Jun 26 [ Thu ]
Test file for UTF-8 support of Khmer Unicode
[2009-08-05 Late edit to this file]
[I realized I had absent-mindedly left the HTML and meta codes inside a test page when I copied it into the blog, so I have deleted all of that stuff and provided a link to the original page as standalone HTML instead of sitting inside the blog.]
[Link to standalone version:] [http://www.panix.com/~dannyw/blog-misc/utf_test01.html]
I wondered what the result is of providing UTF-8 bytes inside a webpage defined as iso-8859-en. It turns out that the browser, at least Firefox, believes the 8859 and displays the Cambodian as junk. So I've changed my meta charset spec to UTF-8 and it seems to work (even though vi at panix is showing the UTF-8 characters as a bunch of hex escape codes).
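The junk is easy to reproduce outside a browser. In this Python sketch a short Khmer greeting is encoded as UTF-8 and then deliberately decoded with the wrong charset. Python's "latin-1" codec is my stand-in for whatever the browser does with an ISO-8859 declaration, which may differ in detail.

```python
text = "\u179f\u17bd\u179f\u17d2\u178f\u17b8"  # a six-character Khmer greeting

utf8_bytes = text.encode("utf-8")

# Wrong: treat every byte as its own Latin-1 character.
misread = utf8_bytes.decode("latin-1")

# Right: decode with the charset the bytes were written in.
correct = utf8_bytes.decode("utf-8")

print(len(text), len(utf8_bytes), len(misread))
print(correct == text)
```

Each Khmer character becomes three bytes, so the misread version is three times as long as the original and contains no Khmer at all.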
Khmer unicode sample
សួស្តី
រីករាយណាស់ដែលបានជួបអ្នកទាំងអស់គ្នា ។
If the above shows as a bunch of junk for you, you presumably don't have a font available which handles those Unicode character codes. I haven't yet set up a font spec to try and let your browser know which font to try.
Phonetics version using IPA Unicode character codes
suə'sdəy
riik 'riɜy nah dail bɑɑn 'juəp 'neək ti'aɳ ɑh kniə
I should probably put in an example of my own PKD font as well, but as nobody has reported using it I feel too lazy.
2007 Mar 30 [ Fri ]
My PKD font has succumbed to mission creep
I guess I wound up trying to put too much into it – I guess I jumped the shark when I put in the Vietnamese tone marks. My most recent version has added English prosody symbols (rising and falling tone unit symbols), but in order to access them you have to use the US International keyboard – ie the font is no longer by design 7-bit safe. Aargh. Incidentally, another reason for not progressing was that I found out about Microsoft's downloadable keyboard editor software MSKLC: [http://www.microsoft.com/globaldev/tools/msklc.mspx] They demand that you allow them to scan the machine you're downloading it to to establish that the software licence is kosher. This of course requires that you access microsoft.com using Internet Explorer. I did try using a web-cafe machine, but the scanning program just gave a non-committal error msg (who knows – maybe it never comes right out and calls you a pirate).
Also, it needs .NET Framework installed – a large irritating download but not (the last time I tried) as heavily restricted as MSKLC itself.
After a little googling I found MSKLC at another site, but it still wouldn't run on my kosher Windows 2000 machine – again the error message meant nothing to me. I have just started downloading MSKLC from another site: [http://www.zdnet.be/downloads.cfm?id=36567] but I have a vague feeling that's where I found it before – the one that doesn't work.
If anyone has actually managed to run MSKLC please contact me. Evidently an easily-installable keyboard config with easy access to non-7-bit codes would make a tremendous difference to the design of a phonetic font. For instance, you could just switch to a PKD keyboard config and use the keypresses I set up for the special PKD font to access the Unicode characters! Also, you could create a new, simplified version of the US International keyboard that would allow you to avoid the nasty bewildering glitches you sometimes get when you're trying to enter Cambodian using a Limon font and hit one of the extended key sequences by mistake.
Oh well. As usual, the best is the enemy of the good.
2007 Jan 26 [ Fri ]
PKD now works automatically in IE
PKD is my phonetics font for Khmer, Thai, English and Vietnamese. I had nearly gotten it ready a couple of days ago, but the .EOT version of the font did not work.
I took another shot at Microsoft's WEFT utility – the thing that converts a .TTF file into an .EOT file that MS IE can download automatically – and the new .EOT appears to work, at least on this machine.
A few notes from the struggle:
1. When I checked the blog today I was surprised to find that the link to the PKD test file – pkdtst01.html – did not work, and indeed I could not find the file at all. I uploaded it again. I don't know why it vanished.
2. In WEFT, "expert font creation" allows you to create an .EOT without having to point to a dummy .html file. It even allows you to add "offline" fonts which are not yet loaded into Windows. But I could not figure out how to enter multiple "bindings" – the locations which are allowed to host .html files pointing to the .eot. I wound up using the braindead "Wizard" mode. Remember that it insists on writing to the .html file you point it at.
3. When you are checking the behavior of the .eot in IE, sometimes the Dynamic Fonts Usage window doesn't come up, even though there is a binding problem. Other times it brings up a nice list of the allowed bindings. I don't know why it sometimes worked and sometimes didn't.
4. If you try adding "off-line" fonts, the only way to do it is to point to a directory, and then it *also* searches all the *subdirectories* without warning.
5. This might not have been a problem except I had a bunch of old versions of PKD in a subdir, all claiming to be the one and only true "PKD". WEFT picked one without any error message; it was of course the *wrong* one. Generally the "offline" feature seemed more trouble than it was worth, unless you have dozens of fonts to deal with I suppose.
6. Note that the bindings do not specify the allowed location of the .EOT file. You can put that anywhere. The allowed bindings only set the possible locations for .html.
7. In the .css which specifies the filename and path for the .EOT file, case is significant.
I am now going to get ready to announce PKD on Usenet.
2007 Jan 24 [ Wed ]
Progress on my phonetic font PKD for Khmer, Thai, English and Vietnamese
I am still not ready to release it, but I have made a lot of progress.
I gave in and decided to add most of the additional glyphs needed to provide a phonological transcription of English, similar to major dictionaries: like the dictionaries, I did not trouble to provide the upside-down "r" officially needed to support the English "r" sound; I also did not add the glyphs needed to support the new special representations of eg the final syllables of "little" and "rotten" and "father" because I think they are based on foolish and inconsistent principles. (Also, I have run out of upper-case letters.)
I have also added Vietnamese tone symbols.
Here's a PDF (6 pages) showing examples of how PKD can be used for teaching Cambodian, Thai, English and Vietnamese: [http://www.panix.com/~dannyw/pkd/pkdsample01.pdf]
Version 0.99 can be downloaded from here: [http://www.panix.com/~dannyw/fonts/]
The .TTF version can be installed like any other .TTF file. If you get the message "...is currently being used and cannot be replaced" it probably means that your machine has been locked down so that you cannot write to the needed directory. Try unclicking the option "copy to fonts folder".
Currently the .eot version (which provides autoloading for Internet Explorer users) does not work across the internet, although it does work when the .html and .ttf are stored locally (on the C: drive). I need to fix it and upload a new version. When I've done that I'll add some more info and publicize it. Another issue is that Windows WordPad does not correctly handle the character widths, although MS Word does. (Because all the characters are actually ASCII, you can edit PKD in Notepad if you want to.)
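The "all the characters are actually ASCII" property is easy to check mechanically. Here is a small Python sketch. The function names are mine, the PKD sample is the short phrase used elsewhere on this site, and the IPA string is just an arbitrary counter-example containing a schwa.

```python
def non_ascii_chars(text):
    """Return the characters in text that fall outside 7-bit ASCII."""
    return [ch for ch in text if ord(ch) > 127]


def is_seven_bit_safe(text):
    """True if every character fits in 7 bits, so a plain ASCII editor is enough."""
    return not non_ascii_chars(text)


pkd_sample = "kNom At dIG te"  # PKD text: ordinary ASCII letters, special glyphs come from the font
ipa_sample = "su\u0259"         # IPA text: includes U+0259 (schwa), outside ASCII

print(is_seven_bit_safe(pkd_sample))
print(is_seven_bit_safe(ipa_sample), non_ascii_chars(ipa_sample))
```

Text that passes this check survives any editor or mail system unchanged. The price is that it only looks right when the special font is applied.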
Once you have installed the .TTF you can try this test file: [http://www.panix.com/~dannyw/pkdtest01.html]
The test page optimistically assumes that the .EOT will autoload in MS IE, which as stated above is not yet true.
2007 Jan 04 [ Thu ]
Still not ready to release new version of my phonetic font PKD
I had previously intended to write one document just explaining how to use PKD, and another document explaining the reasoning behind it. Well, after a while I realized that I couldn't really disentangle those two goals without resulting in a lot of duplication, so I decided to finish some research into phonetics and the International Phonetic Association's alphabet, also called the IPA, which seems to have changed a lot in the 40 years since I first used a phonetic alphabet.
Well it turns out that the IPA's standards are just amazingly arbitrary and inconsistent. I thought I could support most people who would like to use IPA glyphs with a small subset, but it turns out that the "conventional" system for English alone now includes a lot of symbols which I consider worse than useless, especially for the learner, especially the symbols for the unstressed syllables in "rotten" and "little", and the symbol (consisting of two characters) for the diphthong in "fight".
I had been intending to throw in the IPA section of DejaVu plus the tone symbols so that people could use PKD both for my own super-easy stopgap system or for all-singing all-dancing IPA, but now I have reconsidered and feel like stripping PKD back down again. Anyway, I don't feel like I can produce the explanatory document till I've absorbed the IPA scheme and been able to rebut it, and I can't finalize the font till then, so please continue to bate your breath.
2006 Dec 23 [ Sat ]
Google.com now available in Khmer -- and works with Firefox
A couple of weeks ago I saw that Google now worked in Khmer. Indeed, they had done the same irritating thing as they did for Thailand, and made it come up in Khmer automatically if they detect your IP is in Cambodia.
It worked in IE but it didn't seem to work in Firefox: it just displayed the Khmer text as boxes, so I always had to click on the Google in English link. (I don't use non-session cookies.)
Today I noticed that it actually worked. The machine I'm running on right now has a KhmerOS font installed, although the keyboard driver is absent; I'm guessing that's all Google needs. I have to say I was none the wiser after I looked at the source for the page, however. Presumably Google has to detect the browser type and offer downloads of the .EOT version of the font for IE, or just send a call for the font to Firefox.
Conceivably also they only just made this fix for Firefox.
Incidentally when I suggested yesterday that my gf ask the staff in the internet cafe for help in getting Google to work in Khmer they said something like "huh? Google doesn't work in Khmer".
2006 Dec 21 [ Thu ]
Good links for using MS Word, eg setting up normal.dot for Khmer
One of the major pains in setting up Word for Khmer is disabling all the keyboard macros which use ctrl-alt and ctrl-alt-shift combinations which are necessary for Limon fonts (and others). I don't trust the pre-rolled normal.dot files you find, because there's no indication of which version of Word they were created from and mixing the .dot version and your executable version is guaranteed to cause a lifetime of regret, so I have to laboriously go through all the bazillion possible options manually (and every few weeks I find another one I missed).
The following is a great user's guide for using MS Word for legal documents, but the advice is applicable to anyone using Word for long structured documents: [http://addbalance.com/usersguide/index.htm]
That page says it was updated as of 2001 (I guess when Microsoft still struggled to make headway against Word Perfect in legal offices), but the templates page is dated 2005: [http://addbalance.com/usersguide/templates.htm] and is an *excellent, excellent* guide to how normal.dot and the other templates work, far more informative than any other Office/Word docs I have ever seen.
This page includes a link to a "Shortcut Organizer", a bunch of Word Basic routines dated 2003 which apparently makes it easy to organize your keyboard macros between templates: [http://www.chriswoodman.co.uk/Shortcut%20Organizer.htm]
If you don't trust macros, here's an explanation of the manual procedure: [http://addbalance.com/word/movetotemplate.htm]
Here's the Wikipedia explanation of normal.dot: [http://en.wikipedia.org/wiki/Normal.dot] which links to the following lengthier description: [http://pubs.logicalexpressions.com/pub0009/LPMArticle.asp?ID=151] The latter includes the following tip, which would have saved me some teeth-gnashing:
And when you're attempting to hunt down your Normal.dot
template, the fastest way to figure out where it's located
is to click Tools/Options/File Locations. There you'll find
a path to your default template directory. The path may be
long and truncated so you can't view the full path. But
you can click Modify to move to another dialog that fully
displays the path.
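If you would rather not click through the dialog, the usual default location can be guessed from the environment. This is only a sketch: it assumes the APPDATA environment variable is set (as it normally is on Windows 2000 and XP) and that nobody has changed the template directory under File Locations, in which case the dialog remains the authority. The function name is made up for the example.

```python
import ntpath
import os
from typing import Mapping, Optional


def default_template_path(env: Mapping[str, str]) -> Optional[str]:
    """Guess where Normal.dot lives, based on the APPDATA variable.

    Assumption: Word keeps its default templates in
    <APPDATA>\\Microsoft\\Templates. Returns None if APPDATA is not set.
    """
    appdata = env.get("APPDATA")
    if not appdata:
        return None
    # ntpath joins with backslashes regardless of the OS running the script.
    return ntpath.join(appdata, "Microsoft", "Templates", "Normal.dot")


if __name__ == "__main__":
    guess = default_template_path(os.environ)
    if guess is None:
        print("APPDATA is not set; use Tools/Options/File Locations instead")
    else:
        print(guess, "(exists)" if os.path.exists(guess) else "(not found)")
```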
The following view of templates from the "Dummies" range of books may also be helpful: [http://www.dummies.com/WileyCDA/DummiesArticle/id-333.html]
I didn't know that the behavior of Word has changed in recent versions. It used to automatically re-create normal.dot if you deleted it; it no longer does: [http://wordtips.vitalnews.com/Pages/T1229_How_Word_Treats_Normaldot.html]
You may find these articles illuminating also: [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/FileProperties.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/PaperSize.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/Styles.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/OtherThings.htm]
Note: One of the word.mvps.org documents above describes setting the paper size, but somewhere else I remember seeing the remark that normal.dot does *not* set paper size. Oh well.
My updated PKD font not ready yet
You can still download the 0.90 version: [http://www.panix.com/~dannyw/pkd/test2/pkd-v0p90.TTF]
but I haven't produced the improved version yet. For one thing, I noticed an error on one of the characters used for English. For another thing, the whole issue of phonetic transcription in English is rather fraught. My original version was, I thought, quite adequate for people to use, but I have gotten caught up in general considerations on phonetic systems. Not only are there umpteen candidate character sets used in different dictionaries, but the "official" IPA system is, to my ear at least, inconsistent and misleading.
My intention was to provide an *easy* way to enter phonetic characters, so I don't want to provide a full set of everything possible; anyway that already exists, in the IPA section of full Unicode fonts. So I need to pick a set, and satisfy myself that that set makes sense relative to other candidates, which is not easy, and makes me understand why dictionary editors each seem to choose a different system.
Incidentally PKD is based on the DejaVu font: [http://dejavu.sourceforge.net/wiki/index.php/Main_Page]
2006 Dec 11 [ Mon ]Progress with my PKD phonetic font for Khmer
I have made various changes and figured out a lot of weird inconsistencies and misleading docs and right now I have made a version of PKD which fixes a lot of the problems with the old version.
1. The "embeddable" flag is correctly set, so Microsoft WEFT and Adobe allow you to embed the font in webpages and pdfs.
If you're using IE, this link to an HTML file should download the .eot version of my font needed for it automagically, perhaps after a prompt if you've set IE not to do automatic font downloads: [http://www.panix.com/~dannyw/pkd/test2/pkdtest09-01.html]
It won't do anything for Mozilla/Firefox because they don't work with .eot files. There used to be another system for Mozilla, I think using .prf files, but it died and Firefox no longer supports it.
2. I have changed the name of the font to PKD, instead of including the version number in the name. I had done the latter because for testing I wanted to keep multiple versions installed simultaneously, but I think having a single name for multiple versions makes it easier for users to upgrade.
3. Here's an example of a .pdf file with the font in it. It's not a very *good* example; it's just something I was working on. Most of it's ordinary English text; the Pkd is just a few lines right at the end: [http://www.panix.com/~dannyw/pkd/test2/phonetics01.pdf]
The final sentence shows off using PKD to represent English phonetics, which is not as easy as you might think, as there is no standard to follow.
...Aagh, I've noticed a couple of errors in the Khmer; oh well.
4. Here's the font itself (version 0.90). It's free: [http://www.panix.com/~dannyw/pkd/test2/pkd-v0p90.TTF]
I haven't posted this info in my special PKD folder yet because I want to do some more tweaks and document stuff better first, and then announce the upgraded version (something like 0.95) on Usenet.
Responses: 2
Name/Blog: John
URL: John@pdscambodia.com
Title: Khmer Language Font
Comment/Excerpt: They use the Khmer OS font with the Khmer Language program on www.wsslanguage.com
Name/Blog: The Boss
URL: http://www.panix.com/~dannyw/weblog/
Title: What my PKD font is for
Comment/Excerpt: 2008-01-27 John seems to have misunderstood what my PKD font is for. It allows Cambodian (Khmer) to be *transcribed phonetically* (for the benefit of people learning Cambodian). The Khmer OS font package is a set of Unicode fonts for allowing people who can read Cambodian to enter Cambodian using Cambodian spelling order/conventions, instead of a sort of visual order (an effective kludge which has various drawbacks), which I have blogged about before.
Font embedding problem
A while back I created a font which allows you to easily create the phonetic characters used to romanize Khmer text in Huffman (along with other useful stuff like Thai and even English): [http://www.panix.com/~dannyw/weblog/nolist/pkd/installing02.html]
However, it had a problem: even after I fixed a bug which prevented the font from being embedded in PDFs, Microsoft's WEFT tool still refused to allow the font to be embedded.
I am going to take another shot at this *this week* and even if I can't figure out the WEFT problem I am going to reissue the font with the PDF problem fixed (along with a couple of other slight fixes). (I apologize for the long delay.)
The following is an overview of the embedding issue.
Fonts can be embedded either in a PDF or in a website. In either case software is supposed to check magic bits in the font, presumably set by the font creator, which define whether he wants the font to be embedded or not.
Good intro to the problem, basically a bug in Fontographer: [http://www.politechbot.com/p-03506.html]
Tom7 wrote embed.exe to twiddle a bit in your font files to allow embedding: [http://www.andrew.cmu.edu/user/twm/embed/]
He also wrote a very readable intro to font creation in Windows with Fontographer, although funnily enough it makes it sound like there's no embed problem: [http://www.andrew.cmu.edu/user/twm/makefont/]
A description of the issue on webpages which has a clear definition of the magic bits: [http://members.tripod.com/~bhaavana/embedded/faq.html]
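For anyone who wants to look at the magic bits directly, here is a minimal sketch in Python. It assumes the bits in question are the fsType field of the font's OS/2 table (a 16-bit value stored 8 bytes into that table in the TrueType/OpenType layout), and it only reads the value; it does not change anything. The function names are made up for this example, and this is not how WEFT or embed.exe are implemented.

```python
import struct

# Meanings of the low fsType bits; a value of 0 means "installable embedding".
FS_TYPE_MEANINGS = {
    0x0002: "restricted-licence embedding (embedding not allowed)",
    0x0004: "preview and print embedding",
    0x0008: "editable embedding",
}


def read_fs_type(font: bytes) -> int:
    """Return the fsType value from the OS/2 table of a TrueType font."""
    # Offset table: 4-byte version, then a 16-bit table count at byte 4.
    num_tables = struct.unpack(">H", font[4:6])[0]
    for i in range(num_tables):
        # Table records start at byte 12 and are 16 bytes each:
        # tag, checksum, offset, length.
        start = 12 + 16 * i
        tag, _checksum, offset, _length = struct.unpack(
            ">4sIII", font[start:start + 16])
        if tag == b"OS/2":
            # fsType follows version, xAvgCharWidth, usWeightClass and
            # usWidthClass, each 2 bytes, so it sits at offset + 8.
            return struct.unpack(">H", font[offset + 8:offset + 10])[0]
    raise ValueError("no OS/2 table found")


def describe_fs_type(fs_type: int) -> list:
    """Translate the fsType bits into readable descriptions."""
    names = [text for bit, text in FS_TYPE_MEANINGS.items() if fs_type & bit]
    return names or ["installable embedding (no restriction bits set)"]
```

To try it on a real file, read the font with `open(path, "rb").read()` and pass the bytes to `read_fs_type`.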
2006 May 04 [ Thu ]Using Khmer in Windows: Limon, USA International, Unicode, usp10.dll
I recently upgraded to a new drive on my laptop, and have tried to keep notes of what I needed to do to enable Khmer, both using the Limon-type fonts and Unicode. The following is a list of what I had to do. I have made a lot of postings about related issues, eg creating web pages that use Limon fonts, which are contained in the folder for this article (in reverse chronological order): [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/index.html]
Several of these stages are laborious and tedious, especially setting up the normal.dot template for MS Word, but once that has been done you can copy it to any other machine (with the same version of Word). More significantly, you need some familiarity with Cambodian, so if you know very little Cambodian and need to set up your computer for your girlfriend several steps will be very difficult.
On the other hand I think much of the information in this document will be useful to anyone who needs to set up any language using the "US International" keyboard.
1. I'm assuming you start from a clean copy of Windows 2000 or XP with service packs and whatnot already installed.
2. In order to handle Khmer Unicode, you must have a recent copy of the usp10.dll file installed. My understanding is that this is provided with XP SP2. I use W2000 myself, so I needed to get usp10.dll from somewhere. The easy way is to install MS Office 2003. Another way is to copy it from an existing installation, like an internet cafe. Another way is to join MS Volt. I tried to do so, and it appeared to work, but then some problem happened: I vaguely recall it wanted me to set up a MS Passport account, which is not something I want to do on an internet cafe machine.
3. For Limon-style fonts, you need to install the fonts themselves. They're easy to find in Cambodia of course; somewhat less easy if you're outside the country. There is nothing special about the font installation itself. (The "magic" needed to, for instance, position the vowels above or below the consonant is already built into the standard font system, and does not need the extra features in usp10.dll, which is only for Unicode.)
4. To actually use the Limon fonts, you need to install the "USA-International" keyboard. (As seems to be standard in Windows-speak, a term which normally refers to a piece of hardware is used to refer to a software driver for such hardware. Death to Microsoft!)
Go to Control panel – Regional and language options – Languages – Details – Text services and input languages – Settings. Select Add, and select Input language: English (United States) and keyboard layout/IME: United States-International. Click OK, going back to the Text services... window. Under Preferences – Language bar, I like to make sure that the task bar shows a button for the language type (eg EN for English, CA for Cambodian (Unicode), etc). This appears to be the default when you install a second keyboard, but you may be starting from a different setting.
5. Now click on Key settings. By default, every language you install goes ahead and grabs some key combinations to allow you to flip keyboards without using the mouse. I think this is an absolutely terrible idea. I advise you to check the list carefully and delete any ctrl-alt or ctrl-alt-shift combinations that are needed for Limon. (MS seems to specify only left-alt combinations for this purpose, so if you run into problems entering Khmer, try using the right-alt key.) Internet cafes in Cambodia tend to install keyboard layouts for Chinese, Japanese and Korean, so the machine I am currently typing on has a long list. At a minimum, delete the "switch between languages" function, left alt-shift, which can cause *extreme* confusion if you sometimes hit the keys in different order.
6. Now you need to change the current keyboard layout to US International. Unfortunately Microsoft made the taskbar display *more confusing* in XP. In Windows 2000, you can left-click on the language symbol (EN), and it will show English (United States) – US as one option and English (United States) – US International as another option. In XP, the menu just shows English (United States). To get a different keyboard you need to right-click the EN and select "settings". Even then, I cannot find a way to just change the *current window* to US-International. Instead, you have to change the *default* language to English (United States) – United States-International. Then click Apply, then OK. (Maybe *this* is why they provide the blasted keyboard shortcuts.) Be aware that if you have "US International" selected when typing English text, it will do strange things when you try to use the single and double quote characters, because they are intended to start multikey sequences for European accented characters; to get the ordinary quote characters, type a space immediately afterwards.
7. You should be ready to try entering text in Limon now. I personally use a Limon keyboard at home, but it's not really necessary: the only Limon keyboards I've seen at Internet cafes are ones I donated. However, you do need a "cheat sheet" to check where the characters are. In Cambodia it's easy to get a printout of this from computer and cd stores, eg PTC on Monivong, but I haven't been able to find a downloadable version. I have some blurry photos of an actual physical Limon keyboard on this blog somewhere.
Load Wordpad and try entering some text. If any ctrl-alt combinations produce usable characters – eg ctrl-alt-z produces a jerng thaa – then US International is functional.
8. A huge problem however is that many kinds of software grab the key combinations that you need. Some – like programmer's editors – simply discard ctrl-alt, eg producing the character for "g" instead of ctrl-alt-g. Other software binds many key combinations for other uses. I will describe what to do for MS Word below.
9. Note that when you make these changes to Word, it saves them in its template file: normal.dot. This is stored in a hard-to-find location. On my current machine it's at: c:\Documents and Settings\QW02\Application Data\Microsoft\Templates
You can check the location with Tools – Options – File locations.
You need to understand that when you change the template, it affects *new* files – not *old* ones created with a different template! If you want to add Khmer text to existing files, you'll have to create a *new* file and paste the old file into it.
You can find pre-fixed versions of normal.dot included with the khmer fonts in many cases, but these appear to have been created under old versions of Word, and may contain all kinds of undesired settings, and macros. If only because of the version issue – MS Word is notorious for flaky behavior when you mix versions – I think it's worth the effort to create your own clean normal.dot so I describe the process here (as far as I know this info is available nowhere else on the web, and is probably useful for any language using US-International).
Additionally you may have important settings stored in your existing normal.dot file which you want to preserve.
10. You will have to disable *many, many* "keyboard shortcuts". In addition, Word has many automation features enabled by default which work *very badly* when you're entering text in Khmer. (The first one you will probably notice is the one that automatically capitalizes words at the start of a sentence!)
-1. First go to Tools – Automation and turn off *all* the autocorrect options
-2. Then Tools – Options – Spelling and Grammar – turn off all options (Some of these sound like the same as the Automation options but apparently they aren't)
-3. Then Tools – Customize – Keyboard. I could not find a quick way to do this: you have to laboriously click through every single blasted function checking for ctrl-alt and ctrl-alt-shift combinations. This took me about 30-60 mins of work. I advise you to quit out of Word after just a couple of minutes and restart it to check that your changes have been saved in normal.dot as expected, before going on to complete the job.
-4. However, this interface does *not* allow you to change every keyboard shortcut! There are still several which need a different trick. This was shown to me by Piseth, seemingly the most clueful guy in this internet cafe. You can use the "symbol" feature to override *any* shortcut (as far as I've checked). Props to Piseth!
– 1. Go to Insert Symbol (the route to this seems to be different on this machine to the route on my laptop, so I'm not sure what the default route is). In the Symbols tab, select a Limon font, eg Limon S1. (Word may display the font name in its own font, which makes it very hard to read at the character size in the dialog).
– 2. *Then* select "(normal text)" as the font. In my tests, this has the (non-obvious) effect of allowing you to select the desired glyph from the Limon S1 font at this time, but subsequently – in use – the system will grab the character with the corresponding *character code* from the current font. So the keyboard shortcuts we set up will work even when we're in the middle of typing text in Limon F3, or whatever.
– 3. Now you have to *eyeball* the font table shown, and compare it with a keyboard layout and the list of undesired shortcuts below to locate which shortcuts need to be reassigned. Regrettably, the font table is very hard to read at the font size MS used – diacritics are particularly bad. I found myself using guesswork sometimes.
– 4. At the moment I have only found two undesired shortcuts in Limon: ctrl-alt-shift-hyphen, and ctrl-alt-equals. By contrast, the Unicode keyboard seems to have an assigned function for *every* key combination. I have not yet keyboarded much in Unicode so I don't have a good list. However, certainly ctrl-alt-5 – which produces the "euro" character *and* switches to the Times font – needs to be fixed. (It's not necessary for Limon.)
– 5. Piseth actually created his own normal.dot casually, simply by keying in Khmer text over a period of about a month and fixing each undesired shortcut as he encountered it.
11. The rest of this info is for Unicode only.
12. The files needed can be downloaded here: [http://www.khmeros.info/drupal/?q=en/download]
You probably want:
-1. Khmer Unicode Installer for Windows
-2. Documents
The "installer" includes the following features:
-1. Searches for the usp10.dll file on your system and makes it available to all programs (more difficult than it sounds, because this is a protected system file)
-2. Installs several Unicode Khmer fonts
-3. The Khmer unicode keyboard driver (has to be designated CA, which officially stands for Catalan, not Cambodian)
The documents include a PDF of the keyboard layout and a PDF describing how to use it (distinctly different from the Limon layout).
13. If you are using the Symbol trick to reassign shortcuts while inside Word using the Unicode keyboard, remember to switch back to the "US English" keyboard when you're entering the key combination! If you don't the Unicode keyboard driver will dutifully send not the key combination, but the Unicode character code to Word. Confusion will ensue.
14. The display of characters while you're actually entering them is a little disconcerting. Suppose you enter a consonant, then press the jerng key to get a jerng consonant, then the key for the desired jerng consonant. At that point the jerng consonant *still* displays above the line. Probably if you understand Khmer better than me this makes sense. Anyhow, once you enter *another* consonant the one you wanted to be jerng will be shoved under the line, as desired.
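The entry order described in item 14 matches the way Khmer Unicode stores a subscript: the base consonant, then a marker character (U+17D2, which Unicode calls COENG and which is presumably what the jerng key sends), then the consonant to be placed under the line. The Python sketch below just lists the code points of one sample word so the stored order can be seen; the sample word and the function names are chosen for the example, and the sketch says nothing about how Word or usp10.dll actually draw the text.

```python
import unicodedata

COENG = "\u17d2"  # the subscript ("jerng") marker in Khmer Unicode

# Sample word, commonly romanized "khnhom" ("I"):
# consonant, COENG, consonant, vowel sign, final sign.
SAMPLE = "\u1781\u17d2\u1789\u17bb\u17c6"


def describe(text: str) -> list:
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for ch in text]


def subscript_consonants(text: str) -> list:
    """Return the characters that immediately follow a COENG marker."""
    return [text[i + 1] for i, ch in enumerate(text[:-1]) if ch == COENG]


if __name__ == "__main__":
    for code, name in describe(SAMPLE):
        print(code, name)
```

The marker comes before the consonant it modifies, so the software has already been told "subscript next" by the time the consonant is typed; why the display waits for one more keystroke before dropping it below the line is not something this sketch explains.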
2006 Jan 15 [ Sun ]Unicode Khmer keyboard now available in Phnom Penh
I had been wondering when these keyboards would show up. I spotted one today at Pacific Data Systems on Street 63 (a good place – it also has the Limon-layout keyboards). They said they arrived in the last couple of weeks.
It was just 7 USD. I happily bought it. It seems low quality (vague key action and low case rigidity) but the keycap printing seems adequate.
I haven't played with it yet. The shop provided a free setup disk for it. A few weeks ago I found what appeared to be a very easy-to-use setup program downloadable from the "Khmer Software Initiative": [http://prdownloads.sourceforge.net/khmer/KhmerUnicode1.2.FindUsp.exe?download]
From the description, it sounds like it does *not* include the all-important updated "usp10.dll" file, which is only really available as part of MS Office 2003. *However*, it claims that it can *transfer* this file from MS Office into the system32 folder to make it available for all applications, which is essential and *not easy* to do manually.
I tried running that update on my laptop and it did not barf, although since I had *already* gotten unicode working the fact that it *still* works is not a strong recommendation.
I tried to show the keyboard off to my local internet cafe but they didn't seem very interested. They do have Unicode installed – on *one* machine.
Incidentally, I could not see an announcement of the availability of the new keyboard on khmeros.info. However, they do have an announcement of locale files for khmer. I was all excited because I thought it might provide Perl locale features, but seemingly not. However, they are interesting.
This one: [http://www.khmeros.info/download/km.xml] has the names of various *other* languages *in Khmer*, as well as date formats, the "riel" symbol etc. *Very* interestingly, the Khmer displayed correctly! When I checked, another Khmer page at khmeros.info displayed fine! It looks like this machine (the one I am now using at the internet cafe) *can* use Unicode! It appears to accept the language designation "km", at least in Firefox. Woo! Have to try a few other machines.
The other locale file was a broken link, but the correct link is here (my advanced hacking skills tee-hee): [http://www.khmeros.info/download/km_KH.xml] It just shows number formats, including currency.
Incidentally, they have a help file *in Khmer* for how to do email. I'm sure it's aimed at people trying to use Unicode. It's probably beyond me, but I may get my girlfriend to check it out: [http://prdownloads.sourceforge.net/khmer/Moyura1.0.7-km-KH.exe?download]
Later: I tried this other website: [http://www.forum.org.kh/en/index.shtml]
It provides versions in "Khmer" and "Khmer Unicode". The "Khmer" version worked and the "Khmer" didn't. The "Khmer" was set up in .css I suppose – anyhow, I'm not sure why it failed. But certainly this machine (PC11) seems happy with Khmer Unicode.
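One way to tell the two kinds of page apart without relying on what the screen shows is to look at the character codes in the page text. Khmer Unicode text uses code points in the Khmer block (U+1780 to U+17FF), whereas a page that depends on a Limon-type font would be expected to contain ordinary Latin-range codes that only look like Khmer once the font is applied. That the "Khmer" version of this site works that way is an assumption. A minimal Python sketch, with a made-up function name:

```python
# Code points of the Unicode "Khmer" block, U+1780 through U+17FF.
KHMER_BLOCK = range(0x1780, 0x1800)


def khmer_fraction(text: str) -> float:
    """Return the share of non-space characters that are in the Khmer block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for ch in chars if ord(ch) in KHMER_BLOCK)
    return in_block / len(chars)


if __name__ == "__main__":
    print(khmer_fraction("\u1791\u17c5"))        # the word for "go" in Khmer Unicode
    print(khmer_fraction("plain Latin text"))   # legacy-font pages look like this
```

A page scoring near zero but claiming to be Khmer is probably relying on a font trick rather than Unicode.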
Here's another Khmer Unicode install procedure. It needs a lot more manual steps: [http://www.khmer.ws/unicode/windows.asp]
Also, interestingly, it seems to set up Khmer with the designation "ca", which is officially Catalan. I don't think Firefox is working that way.
2005 Nov 10 [ Thu ]My new font for Khmer phonetics: "PKD"
I have created a new font, based on a public-domain font, which contains the phonetic characters you need to represent Khmer as used in Huffman's books. I've named it "PKD" – phonetic Khmer Danny.
I had tried to use existing phonetic fonts, but they were very hard to use. My font is easy, because you access the funky characters with nothing more than the shift key – it's not like Limon, where you also need to install a special keyboard handler, and the non-ASCII characters get mangled by most programs: with PKD, programs just see the phonetic characters as upper-case ASCII characters, so any existing editor can be used. (Limon has to use umpteen characters to represent the enormous Khmer character set; my font only needs to identify the *sounds*, which are far fewer.)
If you don't have my font enabled, a sample of text will look something like this:
kNOm jOG tIv bAntup tIk
which will not be mangled by anything – unlike the "1/2" signs, degree signs and whatnot that you get from Limon. (Can anyone guess what the above sentence means?)
When you do install my font and set that string of characters to use it, it'll show the right phonetics (assuming I gave the correct pronunciation myself).
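The "will not be mangled" claim comes down to the sample being plain 7-bit ASCII, which can be checked mechanically. The Python sketch below is only that check; it does not know anything about which phonetic symbol PKD draws for each capital letter, and the function name is made up for the example.

```python
SAMPLE = "kNOm jOG tIv bAntup tIk"  # the sample string from this post


def survives_ascii_round_trip(text: str) -> bool:
    """True if the text can be encoded as 7-bit ASCII and decoded unchanged."""
    try:
        return text.encode("ascii").decode("ascii") == text
    except UnicodeEncodeError:
        # At least one character falls outside ASCII.
        return False


if __name__ == "__main__":
    print(survives_ascii_round_trip(SAMPLE))
    # Characters such as the half sign and degree sign are outside ASCII.
    print(survives_ascii_round_trip("\u00bd\u00b0"))
```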
It's a free download, about 35 kB. I hope a lot of people start using it.
[http://www.panix.com/~dannyw/weblog/nolist/pkd/]
Incidentally, it also contains English phonetics and Thai tone marks.
2005 Nov 06 [ Sun ]I'm working on a new phonetic font for discussing Khmer
I've naturally been considering setting up this site to display Khmer characters ever since I started learning Khmer. I was put off by several things:
1. The Unicode system for Khmer which has never quite taken off
2. Possible lossage for users who do not have Khmer fonts installed already
3. No suitable phonetic font without copyright restrictions
I think I now have a way around at least the last issue.
Recently it occurred to me that I could just hack a public-domain font for the few characters I need. Also, since my intention is just to use the font for phonetics, I don't need upper-case characters – which means I can use the shift key as an ultra-simple way to access the funky characters needed (which also means that the character set will pass through any system designed for standard ASCII).
I roughed up a font last night which can (I think) handle all the characters needed for Khmer *and* English – I wanted the latter because many questions of Khmer pronunciation have to consider dialects such as American English.
I want to play with it for a while to minimize obvious blunders (for instance I want to make sure random browsers can download it automagically), and then I'll make it available here – and start using it.
2005 Jun 24 [ Fri ]
Open-source internationalization report discusses progress on Khmer and computers
This page describes the report and dates it as 2005-06-06:
[http://www.iosn.net/l10n/foss-localization-primer/]
This is the report itself in PDF:
[http://www.iosn.net/l10n/foss-localization-primer/foss-localization-primer.pdf]
It does refer to the "official" status of Khmer Unicode but I am now wary of treating *anything* as official where Khmer Unicode is concerned.
Responses: 1
Name/Blog: Beth
URL: http://beth.typepad.com/cambodia4kidsorg
Title:
Comment/Excerpt: Thanks for this!
Problems singing in Khmer
I posted a story about the pronunciation of Thai tones which pointed out that *singing* is hard to explain if you go by the standard explanations of what Thai tones are.
It recently occurred to me that singing must be quite hard in Khmer too, for several reasons:
1. Like English, words usually end in a consonant, not a vowel (which is why opera has always sounded better in Italian)
2. Worse, many words end in a glottal stop, which is an important *phoneme* in Cambodian, rather than happenstance as in English
3. Girls seem to be encouraged to both speak and sing in an unnatural falsetto
4. The sheer number of phonemes makes it even more difficult to find a rhyme than in English, so they tend to be unnatural and/or to break the metre
It is certainly the case that glissandos are much more common than in English, and are frequently coerced to fit within a single beat. This tends to give Cambodian popular music an antique, Margaret Dumont-esque flavor – especially combined with the shrill female parts.
2005 May 29 [ Sun ]
A news announcement about the status of Khmer Unicode
I have unofficially heard various things, so the following announcement: [http://www.camnet.com.kh/akp/english_news.htm] (search for "National Information Communications Technology Development Authority") was very interesting. (The above link text appears identical to the story I saw in the Cambodia Daily 2005-05-27 p18.)
1. The government has *still* not made a decision on Unicode: "draft plan for designing a Khmer-language font, pending approval"
2. "A keyboard in Khmer is not yet available"
3. Open Office is already being used in some govt offices. There's a download available here: [http://www.khmeros.info/drupal/?q=en/download/apps] but they seem very hesitant about its status.
4. "The software has been developed with the cooperation of local NGO Open Forum of Cambodia". Hmmm. I should check them out.
2005 Mar 31 [ Thu ]Review of "Modern Spoken Cambodian" by Franklin E. Huffman
I have referred to this book, somewhat disparagingly, in previous comparisons of books available for people learning Cambodian. Over many months of daily use I have formed a more positive opinion and would like to add some comments.
1. Many potential users must be put off by the phonetic alphabet used to represent Cambodian. My own copy (used) was clearly scribbled on by some hapless soul who had never used a phonetic alphabet before: I can see his laborious "translations", eg of "haw" (phonetics) as "how", and I really feel for him. In fact the original intention of the book was to use recordings of actual Cambodian speakers to provide models of pronunciation, but as I have never come across such recordings on sale in PP I cannot recommend them. More seriously, I do not feel it is reasonable to expect adults to reproduce sounds merely by hearing them. Just as jerng letters chor and baa look indistinguishable to the beginner (or me, after one beer), the sounds will blur together in the ear. The detailed discussion of how to *produce* Cambodian phonemes in the companion volume "Cambodian System of Writing and Beginning Reader" strikes me as essential, but of course it is embedded in the forbidding study of the Cambodian alphabet, which put me off reading it entirely for months.
Furthermore, I would be surprised if any Cambodian can read these phonetics by any more than guesswork. This makes it almost completely impractical to use these books as intended, ie with the rote training administered and monitored by a Cambodian national.
On the other hand, I am convinced that in the beginning stages Cambodian *must* be taught using phonetics, and the rote training as listed in the book seems well thought out. So I wish somebody would update the book by adding Cambodian script for all text (including explanations), and recreate the audio recordings, and strip out the rote exercises (which add tremendously to the bulk) for a teacher's version. (If the student sees the answers they do not properly have to dredge them from short-term memory.)
I would also like to see a short version of the information on producing phonemes extracted from the companion volume.
2. Likewise, I wish someone would reorganize the dictionary sections, discarding the ludicrous alphabetization scheme and adding a Cambodian-script to English section. As well, they might include listings for all the words shown in elided versions in the text. I am not sure whether Huffman's strategy of routinely providing elided versions in the text is actually reasonable, but if that is maintained I think it is necessary for the beginner to be able to look them up. Incidentally, I do appreciate Huffman including proper names in the dictionary, although I think it would do no harm to have an additional section listing most common Cambodian names whether listed in the dictionary or not, since when they appear in Cambodian there is no capitalization or other hint that a name is involved (well, there is usually a space).
3. The lack of typographical variation available to Huffman (of course, at that time, having phonetics was state-of-the-art, and it appears to have been typeset using a Varitype) makes the text layout impenetrable to a modern reader. If the text were reorganized, one could use the opportunity to make it very much more attractive.
4. In addition, many items would need to be updated:
-1. Although I have still very little Sprachgefuehl in Cambodian, I get the impression that many personal pronoun usages have shifted considerably since 1970. Specifically, it seems to me that "loak" is now simply a polite term which anyone can use to anybody, whereas "neak" is a more "informal" word which has lost the demeaning associations of that time. I could certainly be wrong (as a foreigner I live in a bubble outside normal usages), but would hope that this matter could be addressed.
-2. Clearly much vocabulary needs to be added: cellphones, dvd players, CDs, internet cafes, NGOs, mines, corruption, etc.
-3. The content of the texts should be checked. Many landmarks referred to are gone, or the references are now misleading.
-4. As so much tourism is related to Angkor Wat, multiple sections could be added relating to the history, travel to and from the temples, etc.
-5. A few illustrations might be nice: a map of the world, Cambodia and Phnom Penh, for instance; names of parts of the body; names of clothing items, etc; a few official forms with translations; a Karaoke CD cover; some cuttings from newspapers...
-6. Generally, I have grown more aware of shortcomings in other texts, and have started to see that when viewed as a unit with the "Cambodian System of Writing" and the tapes and the native instructor system, it does make sense.
Monty Python-style entry in the "Right" English-Cambodian dictionary
Many people still remember the old Monty Python sketch about the hapless tourist in England who has placed his trust in a phrasebook with incredibly bad translations.
I have never encountered one quite that bad, though I often see books which recommend words or expressions which evoke serious double entendres.
I often use the "Right" dictionary from Norton University. Its name reminds me of flash memory chips, most of which are the reverse of the facts, eg "Compact Flash" which is the largest available, "Smart Media" which had the fewest features, "Secure Digital" which will lock out your own data against you, etc. Ie, it is riddled with mistakes in the English. This does not usually bother me, as it reassures me that the text was probably originally created in Cambodia, and I am looking for subtleties in the Cambodian usage.
I think I noticed some slightly surprisingly vulgar word, so on a whim I searched for "fuck" (I guess I should apologize to anyone whose web search for "Cambodian fuck" leads him to this harmless page). Stap me if they didn't have it: the verb is listed as "ruam reaksaa". However, they did *not* have any marker for "obscene". (I just checked their introduction, and they do not mention any such marker.) This made alarm bells ring, and when I checked it in my "Modern" Khmer-English dictionary it lists this expression as meaning "to take care of each other".
Obviously a Khmer person may have no idea of the level of obscenity of this word (a surprising number of young people will say the word "fucking", meaning "bad", in the midst of everyday chatter in English or Khmer; they apparently know that it is deprecated but have little idea of how much). So he might conceivably look up the word, get the wrong meaning, and later come out with something like "if we feel tired after we look at Angkor Wat, we can go back to the hotel and fuck".
At any rate, such is my whimsy. I did wonder whether it is simply some sort of prudishness in not being eager to print an equally obscene word in Cambodian; I just searched for "joii" (listed in "Modern" as "to have sexual intercourse, to copulate (of humans) (indec.)") and it is not listed at all in the "Learner Oxford", which despite its name (no book name signifies anything in Cambodia) was apparently compiled by Cambodians (it lists "jia" as a "copulation relater...", an attempt to refer to the "copula" which sounds to me more like a porno book).
Incidentally, the "Right" has the peculiar feature of including many thesaurus-like entries (set off by grey shading). These seem very dangerous to me because the thesaurus concept rather relies on the user *already* knowing the connotations and needing only to be *reminded* of the possibilities. For instance, the word "childish" has the following words listed as synonyms: "adolescent, childlike, foolish, immature, infantile, juvenile, youthful". In order to get much out of this listing a typical Khmer user would probably have to laboriously look up every one of them, and might well decide on one too early.
A strange error in "Right" is printing "gymnast" as "sgymnast" as a page heading (p378), but in the correct place for "gymnast". It seems strange to me that such things can happen: if I were assembling a dictionary I would certainly try to use a computer to put everything in alphabetical order, certainly the English words. At a guess they may have started to use a computer system to create the headings, but then found it was buggy and had to hand-edit many or most pages.
This expenditure of effort may be what prevented them noticing that a hundred or so pages were printed out of order in my copy. That, and providing several colour pages showing flags of all nations. (Wtf?)
Responses: 2
Name/Blog: Lobin
URL:
Title:
Comment/Excerpt: This translation of "fuck" into the Khmer "ruam reaksaa" and back into the English "to take care of each other" puts a new light on the personal ads in the Phnom Penh Advertiser. If my memory serves me correctly, most of them are after someone "kind, polite and can take care of each others".
Name/Blog: The Boss
URL: http://www.panix.com/~dannyw/weblog/
Title:
Comment/Excerpt: I don't remember seeing those particular ads. But I have certainly concluded that ads use many interesting euphemisms. For instance, in guides to PP bars: "friendly" girls.
The "zero-space" fonts from www.forum.org.kh
Although the Unicode system is apparently the new standard, it is difficult to install unless you happen to be running XP with MS Office. (I have finally gotten it running under W2K, and it is quite impressive, but I am not ready to write this up yet as I want to try a couple of real projects using it first.)
One of the great benefits of the Unicode system is the zero-width space idea: that is, to allow existing software to handle linebreaks intelligently, you provide a character in the ASCII space bytecode position which *displays* as a zero width (and you also provide some other bytecode which displays as a regular space). Incidentally, it has occurred to me that software must have some definition of which bytecodes are text and which are non-text (whitespace, punctuation, escape sequences, etc) but I was unable to decipher the impenetrable Truetype docs to see whether it is embedded in the TTF definition.
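For comparison, in Unicode proper the zero-width space is a code point of its own, U+200B, rather than a glyph swapped into the ASCII space position as the zero-space legacy fonts do. Here is a minimal Python sketch of how software might use it, assuming the typist has put a U+200B between words. The sample words are Latin placeholders, not Khmer, and the function names are made up for illustration.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but marks a legal line-break point

def break_points(text):
    """Split text at zero-width spaces; a line may be broken between pieces."""
    return text.split(ZWSP)

def visible(text):
    """What the reader sees: the same text with the invisible marks removed."""
    return text.replace(ZWSP, "")

# Placeholder words standing in for a script written without visible spaces.
sample = ZWSP.join(["knyom", "tiw", "psar"])
zwsp_name = unicodedata.name(ZWSP)
```

The point of the design is that the break information travels inside the text itself, so a renderer does not need a word list to wrap lines sensibly.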
The standard Limon fonts are not zero-space, but such fonts are easily available from here: [http://www.forum.org.kh/eng/zero-space_fonts.htm]
ABC computer also has a very nicely-arranged page with various options (not including Unicode): [http://www.abc.com.kh/khmer-fonts.asp]
It includes a link to a version of the MS Word template file "normal.dot" which seems to be necessary. I have not figured out what this actually does, but it is certainly necessary to turn off various text-entry-checking features in MS Word, or they will utterly gum up the Kh text – auto capitalization springs to mind, but it all has to go.
In theory it ought to be possible to mix sections of English text and Kh text and persuade Word to automagically turn those features on and off in the right places, but I have found Word's style and language-switching features to be misleadingly documented and buggy. (And I have a vague recollection of similar bugs in the style feature back as far as Word 95 or so.)
One little warning: I think forum.org.kh are connected to the French, and are probably the reason my "USA International" keyboard option now displays as "Etats-Unis International".
2004 Nov 25 [ Thu ]
Great website for using Cambodian on computers
I was chatting to a friend of mine the other day and I asked him about progress with Unicode fonts for Cambodian (Khmer). He said that the spec for a Unicode font has finally been produced and his company has been rushing to implement it. At present, he has hit a snag: for some reason MS SQL is not recognizing Khmer numeric characters as numeric, ie it is not allowing them to be entered into numeric fields; he hasn't figured out whether the problem is in the Unicode spec or in MS SQL.
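One way to narrow down which side such a problem is on is to ask what the Unicode character database itself says about the Khmer digits. The sketch below uses Python's standard unicodedata module for that. It says nothing about how MS SQL treats the same characters, which is a separate question.

```python
import unicodedata

khmer_number = "\u17e1\u17e2\u17e3"  # KHMER DIGIT ONE, TWO, THREE

# General category "Nd" is Unicode's label for a decimal digit.
categories = [unicodedata.category(ch) for ch in khmer_number]

# Each digit also carries its numeric value in the character database.
values = [unicodedata.digit(ch) for ch in khmer_number]

# Python's int() accepts any Unicode decimal digits, so this parses directly.
parsed = int(khmer_number)
```

If the database already classifies the characters as decimal digits, as it does here, a program that rejects them in a numeric field is presumably applying a narrower rule of its own, such as accepting only ASCII 0-9.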
He pointed me to a very good site for those interested in the topic: [http://khmeros.info/]
They have a confidence-inspiring page on installing Khmer Unicode. Unfortunately for Windows 2000 users, their recommendation for installing the all-important usp10.dll file is to install MS Office 2003 Final Edition! I can see why they say that as trying to manually install it filled me with gloom and terror, but I hate the idea of trying to install that bloated buggy dog just for one damned file. I wonder if I can delete Office 2003 and retain the .dll. I also wonder if previous versions of Office will still work with Unicode.
They have several references to Unix support, but it's not clear to me whether a stock version of Linux is easy to upgrade with the Khmer Unicode font. It does seem that individual Linux applications need to be upgraded to work in Khmer, but it wasn't clear whether that was just to add Khmer text labels to the controls, etc.
2004 Sep 14 [ Tue ]
What does the word for "bed" really mean?
The more I try to learn Khmer the more I wish I could read a definitive text on the grammar. For instance, the word "mian" is typically defined as "have" or "to have", but is often used to express "with", bereft of any of the structure you would think a verb would need (eg "dail").
So you have to wonder what the dictionary really means when it confidently asserts that a certain word is "p" (ie a verb or a verb/adjective), especially when I notice that a Khmer book on English usage, which supplies Khmer equivalents for many other parts of speech, uses the English word "infinitive" in the text without translation.
Another point that occurs to me is how to establish the real meaning of a word. I always want to be able to refer to an English/Khmer section to look up a word I need in Khmer, and immediately look up all its *other* meanings in a companion Khmer/English section. (That's the *real* reason you need a Khmer/English section as a beginner – there's not much chance you'll be able to look up a Khmer word just by hearing it.) But the only book I know where the sections really somewhat *match* is the Tuttle "Practical". Others that I have checked – even if they have both sections in one volume – clearly have grabbed the two sections from completely different sources.
The dictionary just says that the word for "bed" is "grair deyk", but when you *check* that, it means "platform sleep". Now what that suggests to me is not so much a Western-style bed, but rather the literal crude platforms that you often see people sleeping on along the sidewalk. So now I wonder what they *really* call a Western-style bed. (My caution is based on a time when I was taking a trip from PP to Hat Lek: I asked for a "bontop dteuk" – bathroom – and was shown to a shack over the river with a hole in the floor.)
This issue even occurs between European languages. In German for instance, a "Bett" is really a "set of bedclothes". The store "Betten Rid", for instance, in Munich, sells bedlinen, and no beds, if I remember rightly. (Incidentally, one of the few reasons I have to hate the Germans is that the plural of "Das Bett" is "Die Betten".) Likewise, a "Federbett" means a duvet with a feather filling (which they often use *under* the sleeper as well as above).
You can imagine what happened when I went looking for a store that sold beds...
2004 Sep 04 [ Sat ]
Maurice Bauhahn's "FAQ and Resources" page on Khmer revisited.
I've linked to this page before: [http://www.bauhahnm.clara.net/Khmer/Welcome.html]
I took another look at it tonight and was even more impressed, especially as he's been keeping it up to date (sooner or later Unicode *has* to get standardized...).
In particular I found a page about doing word breaks for Thai in Perl (hmm... maybe I should put this file under Thai): [http://www.bauhahnm.clara.net/Khmer/WordBreaks.html]
He says it would be easy to do the same thing for Khmer... hmmm... I don't even know of a word list for Khmer, much less one encoded in Unicode...
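To show why the word list is the crux, here is a sketch of one common dictionary-based technique, greedy longest-match segmentation. It is not a reproduction of the Perl on that page. The three-entry word list is hypothetical and exists only to make the example run; a real list would need thousands of entries.

```python
def segment(text, words):
    """Greedy longest-match segmentation of text against a word list."""
    longest = max(len(w) for w in words)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word matches.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                pieces.append(candidate)
                i += size
                break
        else:
            # No word matched: emit one character so the loop always advances.
            pieces.append(text[i])
            i += 1
    return pieces

# Hypothetical word list, glossed roughly as "I", "go", "market".
WORDS = ["ខ្ញុំ", "ទៅ", "ផ្សារ"]
sentence = "".join(WORDS)  # running text has no spaces between words
result = segment(sentence, WORDS)
```

Greedy matching can pick the wrong split when one word is a prefix of another, so this is a starting point at best.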
2004 Sep 01 [ Wed ]
Some evidence for my doubling theory in Cambodian
My previous article on Khmer needing two words to say the same thing: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/double01.html]
Tonight, while chatting to a bewitching woman in a bar, I tried to say that I was not fluent, using a word I had carefully looked up in the dictionary that evening: jrerl.
Puzzled glances and rapid jibber-jabber between the staff led me to conclude that something had gone wrong with my scheme.
Even after I pointed to the word in one of my dictionaries they did not know what I intended. Eventually they figured it out, and offered a different word: "stoat". One girl very clearly said that the word I had used *could* be used in my sense, but *on its own* was utterly insufficient. She very clearly said that a lot of Khmer words were the same way. She also (groan) urged me to get a Khmer teacher instead of trying to rely on books. (I would happily do so if I could find one who spoke good enough English; perhaps I should give up that criterion.)
Actually, I have been feeling that need for some time: I had already observed that even the best book I have gives very little idea of the connotations of the words, or whether they have become old-fashioned or even dropped out of use entirely. So even when I am not saying something actually wrong, I am probably saying something like "What-ho, cobber, how about it then, is that swell or what?" I was hoping that *reading* would allow me to bypass the inconvenient step of actually talking to people, but that's still coming along very slowly (although I am encouraged – it's a lot easier than reading Thai already, which I can really only do with some ease in cartoon books where there's a lot of visual cues).
It reminds me of the situation with the English lessons printed weekly in the Cambodia Daily. As well as many errors which make me think the lessons are created by a Khmer, the individual exercises give no explicit indication to the unwary Khmer student that they vary wildly in difficulty. One section involved nonstandard poetic usages which it seems to me are *cruel* to present to an earnest student who has not studied at Oxford.
Responses: 2
Name/Blog: Scott Pierson
URL: http://www.wsslanguage.com
Title: Help learning Khmer Language
Comment/Excerpt:
Name/Blog: Scott Pierson
URL: http://www.wsslanguage.com
Title: Help Learning Khmer Language
Comment/Excerpt: I know what you mean. You might want to check out this computer program. www.wsslanguage.com
Why Khmer often uses two words that mean the same thing
Struggling to read Khmer (a single paragraph leaves me aching for a beer), I have become conscious of how often they use two terms which according to the dictionary mean the same thing, eg "jung bomphut" (end). I haven't seen a good explanation for this (other than "it happens"), but it seems to me that the reason is a practical one: Khmer has a heck of a lot of words with multiple and very different meanings, but by taking the intersection of those two sets of meanings one (usually) can reach an unambiguous sense.
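The intersection idea can be made concrete with sets. In the sketch below the two "words" and their sense lists are invented placeholders, not real dictionary entries; only the mechanism is the point.

```python
# Invented sense lists for two placeholder words; NOT real dictionary data.
senses = {
    "word_a": {"end", "tip", "edge"},
    "word_b": {"end", "finish", "limit"},
}

def shared_senses(first, second):
    """Return the senses that both words have in common."""
    return senses[first] & senses[second]

pair_meaning = shared_senses("word_a", "word_b")
```

Each word alone leaves three readings open, while the pair leaves one.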
Of course there are plenty of words in English with multiple meanings. But it seems to me that in many cases one can distinguish between them by incidental grammatical context. For instance "plate" meaning "dish" and "to add a metallic layer" (not to mention "provide a certain pleasurable service"). In addition, English has plenty of near synonyms which a careful writer can select to avoid ambiguity.
In Khmer, of course, not only are there no declensions/conjugations etc to give a backup to the intended sense, but in addition the syntax – it seems to me – seems highly lacking in logical cues. It may well be that someone with my paltry Khmer is missing a lot of subtleties, but at least to the learner the "sentences" seem to just run on and on, without conjunctions to clarify the intended logic (ie words like "so", "but", "yet", "then" seem much less frequent in Khmer, even if the dictionary says they exist).
It reminds me that what I really like about English is that it naturally favours complete verbs. I *like* sentences which have a subject, a tense, a mood and an aspect. President Bush the Elder was famous for his omission of the subject, but also he tended to omit tense and mood, tending to prefer the "ing" form. So when he was asked "What are you doing about the Iran-Contra scandal?" he would reply "Makin' progress", leaving you with no idea what he meant. Had he said "My chief of staff has destroyed all incriminating documents" there might have been trouble.
I believe this issue is consciously understood by tabloid headline writers. They wish to make the most meretriciously salacious headline possible, even though the actual story inside is tame. So by eschewing actual verbs, they leave the headline unclear enough to lead the hapless buyer to expect more than is provided. For instance: "Michael Jackson in pedophile admission" can be safely used on the cover, even though the inside story is about some groping highschool teacher who complains that he only confessed to the cops because he didn't have MJ's high-priced legal team. (This kind of ambiguity is the basic reason for the frequent advice to avoid the passive voice, in which of course just the agent is left undefined. How much less clear is it to use a noun, which results in the omission of tense, mood and aspect also?)
2004 Aug 19 [ Thu ]
The Windows "On-Screen Keyboard" useful for Khmer
A few months ago I wrote about the availability, at least in Phnom Penh, of special PC keyboards with the Limon font setup (using the "US-International" keyboard setup).
However, I just realized today that Windows 2000 and XP come with the "On-Screen Keyboard" utility. You can start it with Start – Programs – Accessories – Accessibility – On-Screen Keyboard.
Then under "Settings" you can change the font and the font size. I suggest Limon S1 16pt. Unfortunately, I could not find a way to increase the size of the on-screen keyboard; even 16 pts is definitely too small for many characters to be legible, at least to a learner.
Note that you will not be able to access the ctrl-alt and ctrl-alt-shift characters unless you have set your keyboard to "United-States International". In this mode, normally intended for use with European languages, pressing certain characters before vowels allows easy access to European accented vowels. For instance, pressing the double-quote character then u results in the u-umlaut character. Nothing is displayed when you press the quote key; Windows is waiting for the second character to decide what to do. If you press space, it produces the normal quote char. If you press the quote char again, it produces *two* quote chars.
I can't find a reference for the explanation above. I *especially* can't find a reference for what happens when you're in the middle of typing *Khmer*. I get the impression that the vowel keys in European character sets are the vowel keys in Khmer, and the quote/colon/circumflex keys in European are either independent vowels or Western digit "6", so not normally followed by a vowel, but still.
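The dead-key behaviour described for the double-quote key can be modelled as a tiny state machine. This is a sketch of the behaviour as described, not of anything Windows documents. The compose table holds only the one combination mentioned, and the handling of any other following key is an assumption.

```python
# Only the one combination the text mentions; a real layout has many more.
COMPOSE = {('"', "u"): "\u00fc"}  # double-quote then u gives u-umlaut
DEAD_KEYS = {'"'}

def type_keys(keys):
    """Return the text produced by a sequence of key presses."""
    out = []
    pending = None  # a dead key that has been pressed but not yet resolved
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # display nothing yet; wait for the next key
            else:
                out.append(key)
            continue
        if (pending, key) in COMPOSE:
            out.append(COMPOSE[(pending, key)])
        elif key == " ":
            out.append(pending)  # dead key then space gives the plain character
        else:
            # Assumption: any other key yields the dead character, then the key.
            out.append(pending + key)
        pending = None
    return "".join(out)
```

Under this model pressing the quote key twice falls into the last branch and produces two quote characters, which matches what is observed.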
I've been tinkering with it; haven't tried to use it for a real text. One thing I ran into is that irritatingly it sends the characters to the current application *in the font set by the current application*; it would be much handier if it sent back Limon while typing on the physical keyboard could still produce English.
Microsoft info: [http://www.microsoft.com/enable/training/windows2000/onscreenkeyboard.aspx]
2004 Jul 18 [ Sun ]
The inevitable progression of the Khmer language
Ever since I first observed a Cambodian rapidly typing away in English on an ICQ chat session, I have been fascinated by their ability to adapt to a medium which was hostile to their native language. Indeed, I have for a long time had in mind a project of collecting such chat sessions in order to analyze the communication format. (Clearly there are ethical blocks. I don't want to grab private stuff, but what else gets discussed? And if I tell somebody *in advance* that I want to capture the thread, it's going to make them self-conscious.)
Here's a Slashdot posting about the general issue of minority languages on the internet:
Unless you have a majority multilingual ... (Score:5, Interesting) by kbahey (102895) on Saturday July 17, @11:49PM (#9728677) [http://slashdot.org/comments.pl?sid=114846&cid=9728677] ([http://baheyeldin.com/khalid])
Unless you have a majority of the visitors / participants that are multilingual capable, you have to separate the content of a web site by language.
I say this from experience on several newsgroups, then forums over the years.
It starts out simple: people who are early adopters often speak English, and can read English (e.g. programmers, ...etc. who know English anyway). Then as technology spreads among the less techno-elite, people who do not know English well want to express themselves in their native language.
In languages that use a non Latin character set, there is a phase where internet communication uses Latin characters to represent their own language. I have seen at least Hindi and Arabic written in Latin alphabet, with some modifiers. (Even some Euro languages lost some characters, like Scandinavian and Germanic languages, where the "O" in Torvalds lacks the stroke in the middle, and the "A" with the small circle, ..etc.)
There are various "dialects" used in these Latinized alphabets, and people learn one version or the other depending on where they learn it first.
This becomes a transitionary phase on these forums, where people will express themselves using this Latin based alphabet to represent their own language.
Then later, as their own language becomes more wide spread and accepted, more people get to use computers and the internet, and they perhaps do not know any language other than their own. This leads to them demanding that only their native language be used in forums that are about their country/society/language/...etc.
Anyone who speaks a "foreign" language in those forums is reminded that the primary language is such and such, and not to confuse others. Some take this as a matter of national pride, some take it as mere courtesy, others take it as common sense, and yet others take it as a mere form of communication. Depends on who you are, your outlook, and your biases.
That is what I have seen in several newsgroups/forums over the years.
So, this is the phase that Orkut is at right now.
Eventually, they may have to separate the content by language. Although there are barriers here, because Orkut is about "networking", and not just "discussions".
It would be interesting to see how this turf war gets resolved eventually, at least for those who are like me who like to observe the new frontiers that the internet have defined/merged/melted/setup.
P. S. In Canada for example, where there are two large groups speaking two languages, a majority of web sites give the option on what language to use at the very beginning. Forums are separated into two languages on many sites. There is a minority who are bilingual and can (and do) participate in the two camps. I imagine Hispanics in the USA, and Spanish speaking Anglos do the same on some forums.
2004 Jul 15 [ Thu ]
The "Wikipedia" has an interesting entry on Khmer
The "Wikipedia" is a server set up to allow anyone to contribute to information on any topic, resulting in a sort of free encyclopedia.
I had not thought about checking it for info on Cambodia – it happened to occur to me while I was looking at their entry on Spetsnaz! (I'm pretty sure that's pronounced "spetsnass", by the way, but I could be wrong.) Anyway, although it's not very comprehensive it has some interesting new stuff, and links – eg it gives the language codenames under various standards, like "km" for ISO 639-1.
[http://en.wikipedia.org/wiki/Khmer_language]
I had certainly never heard the term "abugida" before but I'm sure it will make me a hit at parties.
One good link I found via the Wikipedia page: [http://www.omniglot.com/writing/khmer.htm]
I think the "inherent" vowels, represented in the chart of Khmer characters given in that link, should really be more like "aw" and "oh" than "a" and "o". I suspect the original reference used International Phonetic Alphabet characters.
It reminds me that I have been remiss in assembling a cheat sheet of the various fonts. Oh well, it's still on my list.
The fonts links given above seem mostly quite outdated. The following seems more or less up-to-date: [http://www.seasite.niu.edu/khmer/] and includes a useful link to an IPA font.
2004 Jul 11 [ Sun ]
Weird similarities between Cambodian and French
There are a lot of French people in Phnom Penh, probably because of the lingering effects of the French colonial period – many of the older people still speak French, and there are many (dilapidated) signs in French rather than English.
But it's struck me that there are also some weird similarities in the languages.
1. Neither language pronounces "s" or "r" at the end of a word.
2. The adjective is usually after the noun.
3. You usually say "loak" at the end of every sentence like you say "monsieur" in French
4. The negative is formed in two parts "awt... dte" like "ne... pas"
5. Both languages have a variety of nasal vowels (absent in English)
6. Both languages have "d's" and "t's" which elide to the "dt" sound (also found in Irish and American eg "Paddy" for Patrick)
7. Both languages have the palatal n sound like "mañana" in Spanish ("oignon" in French)
8. The plural sounds the same as the singular (including pronouns, eg "il/ils" in French and "gey" in Cambodian)
All of which does not exactly add up to any mutual comprehension, but it does seem like it might be easier for a Frenchman to learn the language than an Anglo.
2004 Jun 22 [ Tue ]
Website selling Cambodian-language books and course material
[http://www.101language.com/khmer.html]
I ran into this when I was trying to find the actual name of an excellent Cambodian-to-English dictionary I found some months ago. (Bootleg editions typically strip off original copyright info, although not always).
It appears to be "Modern Cambodian-English Dictionary" by Robert K. Headley Jr, reissued 1997 (original 1977). Their price (hardback): USD 158. (I paid less.) I intend to review this book separately.
The site also (at last) has some information on the hybrid players I've occasionally seen for making it easier to repeat the material on tapes or CDs: [http://www.101language.com/instant-replay.html]
However, it still seems to me the material would need to be preformatted to suit the player. Would it really reliably detect the beginning and end of sentences automagically?
2004 Jun 08 [ Tue ]
Dictionary order for the Khmer "sanyook sanyaa" diacritic/vowel
The other day I wanted to look up a word – I think it was the word for "umbrella" – in the Khmer section of a dictionary. I couldn't find it in the vowels section for the first letter, so I thought "I'm doing this wrong – I should check the dictionary order for this vowel (the sanyook sanyaa)." Stap me, I couldn't even find the vowel in the list of vowels at the beginning of the dictionary!
Wondering if I was losing my eyesight or sanity, I struggled on and finally found the word among the *consonant* section for that consonant. (I was starting to think that this symbol only existed in Thai loan-words.)
When I checked the Huffman book, it turned out that for inscrutable reasons the Cambodians don't *classify* this symbol as a vowel, even though it's certainly pronounced like one. They classify it as a diacritic like the "kill" sign. Funnily enough, there's no section of my dictionary that gives alphabetic order for the diacritic marks.
This sort of thing makes me despair of my goal of producing some useful documents for people learning Khmer. Obviously, in making up my charts of the consonants and vowels, I *never noticed* that the sanyook sanyaa wasn't among the vowels. How many similar lacunae might there be?
2004 May 19 [ Wed ]
Webpage with links to umpteen Cambodian fonts
I kinda stumbled across this. It looks a little out-of-date but it's interesting to see so many sources of fonts.
I also get the impression the maintainer of the page hasn't actually tried them, because there are no notes about compatibility with keyboards, etc.
[http://cgm.cs.mcgill.ca/~luc/cambodia.html]
2004 May 02 [ Sun ]
Example website using Unicode for Khmer
[http://tech.khmerknowledge.com/]
This seems to be basically for Linux users, and refers, fortunately in English, to "KhmerUnicode Open Type fonts" as well as a converter program for Linux and Windows that converts texts in "legacy" fonts (presumably Limon) into Unicode:
Legacy Khmer to Unicode Converter Update 2004-02-03 17:52:49
by ...
An updated version of the legacy Khmer to Khmer Unicode utility is available. This release contains a few bug fixes mainly for the Windows system.
Windows Download
cvt.zip
Usage
cvt infile.txt > outfile.html
The HTML source for sections of Khmer text just shows: <font face="Khmer OS" size=+1> I poked around and found a link for font setup here: [http://www.khmer.cc/community/t.c?b=16&t=907] but I've tried that before and it didn't seem to work. Aside from anything else, one of the files is protected under Windows 2000 so you can copy over the new version till you're blue in the face and nothing happens. Worse, when I used a utility to do the copy before Windows starts protecting the file, it *still* didn't work... as well as being very scary.
Responses: 1
Name/Blog: John
URL: www.wsslanguage.com
Title: Khmer OS font
Comment/Excerpt: The website at www.wsslanguage.com for learning Khmer/Cambodian uses the Khmer unicode font.
US-based Cambodian translator's web page
[http://www.magma.ca/~sary/tips.htm]
He has quite a few interesting pieces of information but I get the impression it is now out of date. He thoughtfully includes a gif of his keyboard layout, and it is interestingly not the same as the Limon layout; it's labelled Windows 3.1 though!
He doesn't say anything about Khmer in Unicode, or how to do Khmer on webpages; he uses PDF himself where necessary.
He discusses the character set here: [http://www.magma.ca/~sary/KhmerCharacters.pdf]
He counts *12* independent vowel characters, which is actually what I found just looking through the dictionary, but Huffman specifies 14.
The following site is interesting but demands you download their font and install it manually before you can enter. Oh well. Here's their idea of the keyboard layout for their font: [http://www.khmerlanguage.com/keyboard.php] which doesn't match anything either.
2004 Apr 28 [ Wed ]
Dictionary order in Cambodian
A while ago I thought I had found an explanation of Cambodian dictionary order, and I was going to take the time to understand it sometime soon.
A few weeks ago I tried to find that explanation, and darned if I can.
By looking through a kosher Cambodian dictionary, I think I now know the rules, although they're insanely difficult. (When I was talking to Thais about language problems I would sometimes offer them an English dictionary and apologize for making them look up an English word, but they always said "oh no! it's so easy to look up words in English!" – and I don't think that was just Kreng Jai either.)
The following is a description of the sort sequence. I haven't figured out a good way to express it and this description is probably ambiguous in a lot of ways.
1. Is the first letter an independent vowel? if so that's the first search element "A".
2. Is the (next) letter a dependent vowel? If so, then check for any *trailing parts* of a multi-element dependent vowel. Then *set the vowel aside* for a while: "B".
3. If the previous letter was a dependent vowel then this letter has to be a consonant. "C".
4. Is the next letter a consonant? "D"
5. Is the next letter a dependent vowel for a *subsequent* consonant? (We already stripped off the trailing element of a multi-element dependent vowel in "2".) If so then check for any trailing parts: "E".
6. Is the next letter a *subscript* consonant? "F"
For subsequent characters, repeat steps 1-6.
Now the sort order is as follows: ACDBEF (ACDBEF, ACBDEF. ..)
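Here is a rough Python sketch of how I read the ACDBEF rule. It assumes each word has already been split by hand into the roles A to F from the steps above. The element names and their ranks are made-up romanized placeholders, not real Khmer dictionary data, and treating a missing element as sorting first is my own assumption. Accents are ignored.

```python
# Comparison order of the roles from steps 1-6 above.
SORT_ROLE_ORDER = "ACDBEF"

def group_key(group, rank):
    """Build the sort key for one pass through steps 1-6.

    group maps a role letter ("A".."F") to the element found in that role;
    rank maps an element to its position in the dictionary's own tables.
    A missing role becomes 0 so that it sorts before any element that is
    present (an assumption, not something the dictionary states).
    """
    return tuple(
        rank[group[role]] + 1 if role in group else 0
        for role in SORT_ROLE_ORDER
    )

def word_key(groups, rank):
    """A word is a sequence of groups; compare it group by group."""
    return tuple(group_key(g, rank) for g in groups)

# Toy romanized elements with made-up ranks, NOT real Khmer collation data.
TOY_RANK = {"k": 0, "n": 1, "aa": 0, "i": 1}

# Each word is hand-tagged: C = consonant, B = its vowel, D = next consonant.
WORDS = {
    "k+aa": [{"C": "k", "B": "aa"}],
    "k+i": [{"C": "k", "B": "i"}],
    "k+n": [{"C": "k", "D": "n"}],
    "k+aa+n": [{"C": "k", "B": "aa", "D": "n"}],
}

ordered = sorted(WORDS, key=lambda w: word_key(WORDS[w], TOY_RANK))
print(ordered)
```

In this toy data "k+i" comes out ahead of "k+aa+n" even though "aa" ranks before "i", because the following consonant (D) is compared before the vowel (B).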
The dictionary will tell you the sort order for consonants and the sort order for vowels *separately* but does not give the explanation above. Many dictionaries also leave out the sort order for the independent vowels, but you can find it.
Likewise the sort order for the accents is a little hard to find, but is available. Basically everything else sorts first, and then if an element is otherwise identical it sorts in accent order.
Personally I found it very wacky and counterintuitive that a succeeding consonant is sorted before a vowel which belongs to the preceding consonant. It's also wacky that a subscript consonant sorts after a vowel... but oh well.
I intend to give a fuller and hopefully both more accurate and consistent version of this when I can sit down next to a good dictionary (for element sort order and examples) *and* a computer which handles Khmer text properly (and output to pdf).
"Khmer Grammar" book
Yesterday I finally got around to assembling a cheat sheet with the English values of Khmer characters in uksor mool as well as uksor jriang. Just hours later I found a book about Khmer, in Khmer, that has comparison tables of several Khmer typefaces, as well as a great deal of other useful info. Still, I plan to post the cheat sheet as soon as I've edited it.
I wish I could read the book more easily. Right now it takes me a couple of minutes to puzzle out a single sentence (even if I can figure it out at all). Fortunately he provides English translations, for some reason, of most titles and headings.
The book was 0.75 USD at the International Bookstore on Sihanouk Boulevard, just W of Norodom. The title, "Khmer Grammar", is shown in English as well as Khmer.
The author's name is listed on the rear cover as Chon Chiang; it also lists his birthdate, the names of both parents, and a cv! I don't know if that's common in Khmer textbooks. I'm not sure what the name of the publisher is: there's a word or phrase "roksaa seyt-tee" on the cover which might be it. It was published in 2002.
I can't understand his intro, but he certainly references several grammar texts in English. I wish I could figure out what his intentions were. I get the impression he may have tried to shoehorn Khmer grammar into English concepts, which of course would make interesting reading anyway.
He lists complete consonant sets in five typefaces, plus a few characters that are some sort of ornamental woodcut face.
1. Uksor jriang: this is the "slanted" style, otherwise similar to the standard typefaces in the textbooks.
2. Uksor chor: the "standing" style, standard in textbooks in various versions, including Limon S1. Other than being upright, it's the same as the slanted style.
3. Uksor khawm: the "Cambodian" style (khawm is another word for Cambodian). Huffman says this is more common than the true uksor mool (see below) and is commonly referred to incorrectly as uksor mool. I would say it is *far* more common than uksor mool.
4. "True" uksor mool: this is significantly different in several ways. In particular, the first character "g" has an extra middle leg that makes it look peculiarly like a bp. In addition, the subscript "r" is a large rounded thing. This is one of the (many) letters that have confused me in handwriting. The books say uksor jriang is based on handwriting, but certainly everybody writes the subscript r from uksor mool.
5. An ornamental version of uksor khawm that has a sort of drop shadow effect. I'm not sure why he includes it because it's basically identical to uksor khawm. However it reminds me that for some reason when Cambodian TV broadcasts karaoke they always use a sort of white surround to the characters that makes it amazingly hard to make them out.
There are a couple of interesting general points about the typefaces.
1. There's no real listing of dependent vowels. Is it really true that the differences are not significant? I certainly find it hard to distinguish between the "ey" and "aa" vowels in uksor mool.
2. The independent vowels are listed for only one font. Also, the order does not seem to match my dictionary, and there are a few dependent vowels sprinkled in for no reason I understand.
3. I can't see a listing of diacritics or punctuation (but maybe I could if I could read Khmer better). Numbers are also missing. There is no discussion of handwriting (not that that fits under the English word "grammar", although I don't know if typefaces do either).
4. When I asked a Cambodian guy to check my cheat sheet, he found it very hard to look through the independent vowels, and for a couple of minutes was trying to figure out even *how many* there should be, until I took pity on him. Apparently this issue is neglected among Cambodians, just as it is neglected in books on Cambodian for foreign learners.
5. I had made the point before that I was not aware of the font variations being an issue for Cambodians when they learn to read, but presumably the writer doesn't throw those pages in just to fill out the end of the book.
2004 Apr 20 [ Tue ]Phrases that are used by the waiter
It would be nice if phrasebooks had space for some of the expressions that are going to be used by your waiter. For instance, I now know how to ask for separate checks, but I don't know the phrase the waiter uses to ask *me* about it.
The other day I finally noted down "please enjoy your meal":
Soam piisaa dooii riik riay
"Please dine with happy"
Now I can say "Thank you" instead of "What the hell was that you said, buddy? And quit with the mumbling."
2004 Apr 13 [ Tue ]A step forward in my understanding of Khmer
The other day I was strolling down St 63 on my way to the Lucky Market and a tout approached me and addressed me in Khmer. I'm pretty sure what he said was "many pretty girl, twer knia".
I was so gratified by being able to understand the Khmer, and sort of flattered that he should assume I would understand it, that I almost took him up on his offer.
As it was, I just did my Oliver Hardy impression (where he beams and his hands flutter nervously at his tie) and sauntered on.
2004 Apr 11 [ Sun ]Using the PDF format to display Cambodian text
I'm still not sure what procedure is most practical to allow this site to produce webpages displaying Cambodian characters.
Anyway, here is another test: a link to a PDF file with English and Khmer text.
...oops! I see the Open Office pdf output contains various embedded info in some sort of Unicode, so I'd better not include that link unless I zero that out. Oh well. The .pdf seemed to work, but of course I wanted to check on a machine which does not have Limon fonts installed.
2004 Mar 25 [ Thu ]Cambodian-character keyboards available
Having realized that the fonts that Cambodians actually use are from the Limon package, and that that package uses the "USA International" keyboard layout, I wondered if there were keyboards available for this relatively standard layout.
Although I have never seen such a keyboard in an internet cafe, I did find one without too much searching. "Pacific Systems", a small but seemingly well-organized store at 176 Street 63 (south of Norodom) (855) 23 219 289 (pp@pacific.com.kh) has a model in stock for 8 USD. I haven't used it much because the PS2 socket on my laptop disintegrated around that time (aargh) but I plugged it into a PC at the internet cafe and it seemed OK. I have a couple of closeups of the relevant (qwerty) keycaps. They don't look particularly classy but they're not bad – I mean they're not decals. In an internet cafe they would probably erode like the qwerty designations in most cases.
I have pictures of the keyboard here – the left half and the right half (to avoid wasting pixels):
[http://www.panix.com/~dannyw/images/khkb/khkb-left03.jpg]
[http://www.panix.com/~dannyw/images/khkb/khkb-right03.jpg]
It's called the "Diamond Deluxe Membrane Keyboard" on the box. As usual, there's no real info about the manufacturer. The photos on the box just show English keycaps; there's a hand-scrawled "KH" on the side of the box. The connector is PS/2. No model number on the box.
FCC ID IZ/TK-106M Made in China No 041132931
Has an extra "Fn" key next to R shift: Fn-F12 to lock and unlock, Fn-F1 through Fn-F8 to adjust typematic speed (is this standard now for keyboards? I haven't paid attention).
There are a few slight differences from the printed layout I picked up from a couple of places for Limon, having to do with the top left key, ie the one that normally has the backtick and tilde, as well as the keys next to the Enter key. This seems to be just a mechanical difference between the layouts of the Diamond keyboard and whichever keyboard was used for my diagram (with a large Enter key).
Incidentally it occurs to me that since the US-International layout is relatively widely used, it may be that there are relatively few conflicts with other software. Ie, hopefully, there are no Ctrl-C's, Ctrl-D's and Ctrl-Z's etc in the stream. Haven't checked this yet.
2004 Mar 23 [ Tue ]Sample webpage using Cambodian font
I haven't figured out how to integrate Cambodian text with my Blosxom content management software, but here is a plain HTML page which displays some English and Cambodian text.
It sets up the fonts with the same technique that the www.cambodia.gov.kh site uses: a .css page which tells the browser it can pull in the fonts it needs from www.cambodia.gov.kh. So it should work. But if it doesn't work, and you think you have enabled IE to download fonts automatically, let me know. (It's hard for me to test that because all the machines I have access to already have the Limon fonts installed.)
My test page: [http://www.panix.com/~dannyw/limontest/test1.html]
2004 Mar 21 [ Sun ]How to create a webpage using Khmer fonts
I've written about this before but right now I seem to have all the info ready. Btw, as you may have noticed I haven't actually *tried* this yet. (I'm not looking forward to debugging the Blosxom software trying to work with Cambodian text.)
My previous posting, mentioning automatic font installs for your browser: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/fonts03.html]
Here's what I figured out this time. First I went to a webpage in Cambodian: [http://www.cambodia.gov.kh/unisql2/egov/khmer/home.view.html]
The browser I was using in an internet cafe immediately displayed the page using Khmer fonts, but your machine probably will not, because it probably doesn't have Limon fonts loaded. But Internet Explorer is usually configured so it will download the fonts for you, possibly after asking permission.
Now how can we write a webpage to work like that for your users?
First, the original webpage (above) calls a .css file to define its appearance:
<head>
<title>Cambodia e-Gov Homepage</title>
<META Name="Keyword" Content="cambodia, khmer, royal government, king, premier, ministry">
<meta http-equiv="Content-Type" content="text/html; charset=x-user-defined">
<link rel="stylesheet" href="/egov/init/khmer.basic.css" type="text/css">
...
</head>
Note also the "charset" definition above. I'm not sure what effect the "user-defined" has. It makes more sense when it's set to something like "IS0-133T Eskimo", because then your browser would know that it could use its built-in Eskimo fonts without needing to download the particular fonts the page specifies. But it wouldn't make much sense to try and use the same "user-defined" fonts for say Cambodian and Tattooinian.
Then at the end of the .css file:
<STYLE TYPE="text/css">
<!--
@font-face {
font-family: Limon S1;
font-style: normal;
font-weight: normal;
src: url(http://www.cambodia.gov.kh/egov/font/LIMONS0.eot);
}
-->
<!--
@font-face {
font-family: Limon R1;
font-style: normal;
font-weight: normal;
src: url(http://www.cambodia.gov.kh/egov/font/LIMONR0.eot);
}
-->
</STYLE>
In other words, this tells the browser "if you get told to use the Limon S1 font, or the R1 font, you can download them here..."
It may be worth pointing out that the standard distribution of Limon fonts comes with about 6 variations of each of the S, F and R series, but the S1 and R1 mentioned above seem to be very standardized as the uksor jriang (book text) and mool (rounded, used on signs and headings) fonts. Btw, I'm guessing that the Cambodian government wouldn't be too upset if you linked your page to their fonts. But if you're worried about it, they do have a page explicitly for downloading their fonts, so you could provide copies locally on your server.
[http://www.cambodia.gov.kh/unisql1/egov/english/material.view.html?doc_oid=@110|1|1]
Now elsewhere in the .css file – actually near the beginning – it specifies what font to use for standard body text etc:
BODY {
font-size : 18pt;
font-family : "Limon S1", "Verdana";
color : #595959;
}
TD {
font-size : 18pt;
font-family : "Limon S1", "Verdana";
color : #595959;
}
FORM {font:18pt "Limon S1"}
INPUT {font:18pt "Limon S1"}
TEXTAREA {font:18pt "Limon S1"}
/*//(Heng append): Contents of each title */
.tt {
font-size : 18pt;
font-family : "Limon S1", "Verdana";
color : #3C3A2B;
}
I leave adding a definition for text blocks in English to the reader.
Now the main webpages – like /unisql2/egov/khmer/home.view.html – don't need to specify the Limon fonts anywhere explicitly.
Incidentally, a general issue for me is that Cambodian is a small part of my site, but if I start using the above tricks almost everyone who looks at my site will be interrogated about downloading fonts. That certainly puts me off browsing such sites (especially Chinese, blast it). It may be less of an issue for you if your site is organized solely thematically rather than chronologically, so people only wind up trying to view a Cambodian-font page if they're interested in that subject.
Btw, what does that ".eot" extension on the fonts mean? A little googling led me to here: [http://www.microsoft.com/typography/web/embedding/weft3/default.htm?fname=%20&fsize=]
This is a Microsoft page about their "Web Embedding Fonts Tool WEFT" which is available for free download. I haven't tried it. Why in heck couldn't they just use a .ttf file? I looked at their "overview" page and the WEFT tool looks astonishingly badly designed. It actually takes the time to look at *every bleeping page on your site* in order to try to provide *only the fonts and character sets you actually use*. Whoopee – now every time you add a page to your site you have to rerun WEFT and tell everyone to download the new .eot file. Maroons.
If I ever actually use this <sack of soup>WEFT</sack of soup> I will just create a page with all the characters on it in both fonts and tell WEFT to analyze that.
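Such a page could be generated with a few lines of Python. This is only a sketch under an assumption I haven't verified: that the Limon fonts hang all their glyphs on the printable ASCII range (0x21 to 0x7E). If they also use higher code points, the range would need widening. The function name and its parameters are my own invention.

```python
import html

def all_chars_page(fonts=("Limon S1", "Limon R1"), first=0x21, last=0x7E):
    """Return an HTML page showing every character in [first, last] once per font.

    The default range (printable ASCII) is an assumption about where the
    Limon fonts keep their glyphs; widen it if that turns out to be wrong.
    """
    # Escape <, > and & so the characters show up as text, not markup.
    chars = html.escape("".join(chr(c) for c in range(first, last + 1)))
    blocks = "\n".join(
        '<p style="font-family: \'{}\'">{}</p>'.format(font, chars)
        for font in fonts
    )
    return "<html><body>\n{}\n</body></html>\n".format(blocks)

page = all_chars_page()
print(page)
```

Saving that output as a single .html file would give WEFT one page to analyze instead of the whole site.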
Sheesh – I just noticed – you have to *enter your name and email address* every time you use it! "Maroons" seems inadequate somehow.
2004 Mar 06 [ Sat ]How to say "Sure!" in Cambodian
I couldn't find this in my dictionaries, so I asked someone.
He said one thing to say was "Soam enjeuyn", ie "please go ahead". Another thing to say is "Baat – awt ay dte", ie "yes – it's no big deal".
Useful phrases for shoeshine boys
They tend to ask for a dollar, I bargain them down to 1000 R, and then I pay them 1500 or 2000. So I suspect someone with actual negotiating skills could get a shine for 300 R.
They tend to use an absolute minimum of polish, because the polish must cost a lot relative to their margin. So a shine lasts very poorly. I have found that using an old sock to wipe dust off the shoe will keep shoes looking OK for perhaps 3 or 4 wearings. Still, I usually walk 3-4 km per day.
The books are adequate for most parts of the transaction, but here are some phrases I needed, because I always wear laceup shoes, and I intensely dislike the slovenly American-style lacing pattern which is seemingly all the boys know.
Please remove the laces: Soam yoak ksai jeuyn
I will put the laces back: Knyom dak ksai weuyn
Copyright © 2003-2009 Alternate Worlds Publishing, Boston MA USA