Danny's Weblog
Copyright © 2003-2007 Alternate Worlds Publishing, Boston MA USA
| សមាជិកព្រឹទ្ធសភាថៃ | Thai senators |
| មួយក្រុមដែល មានគ្នា ៧៧នាក់ | a group of 77 people |
| កាលពីថ្ងៃចន្ទបានចាប់ ផ្តើមដំណើរការបញ្ឈប់ | on Monday started progress in stopping |
| ការគាំទ្ររដ្ឋា ភិបាលថៃចំពោះសំណើរបស់រដ្ឋាភិបាល កម្ពុជា | support of the Thai government in relation to the proposal of the government of Cambodia |
| ដែលស្នើដាក់ប្រាសាទព្រះវិហារ | which proposes to put the temple of Preah Vihar |
| ចូលទៅក្នុងបញ្ជីបេតិកភណ្ឌពិភពលោក ។ | to enter the register [?] of belongings of the world [world heritage sites?] |
| កាសែតឌឹណេស្ហិនរបស់ថៃរាយការណ៍ | The newspaper "The Nation" of Thailand announced |
| ថាសមាជិកព្រឹទ្ធសភាទាំងនោះបាន | that these senators were able to... |
On Monday, a group of 77 senators in the Thai parliament began a bid to withdraw Thai government support for the Cambodian government's proposal that the temple of Preah Vihar should be registered as a World Heritage site. "The Nation", a Thai newspaper, reported that these senators were able to...
Incidentally, there are two newspapers called "The Nation" in Thailand. One is published in English; the name of the newspaper in the Khmer text is given in a phonetic form "der neyshun", so I'm assuming it's the English version that's being referred to.
More generally, this report seems to reflect the attempts of the Thai and Cambodian governments and government-controlled media (ie practically all of it) to whip up tension between their citizens.
2008 Jul 01 [ Tue ]
The description sounded interesting: a tool that converts text in Limon and similar non-Unicode ("USA International") fonts to Khmer Unicode and vice versa. I've written about the Limon issues before, eg here: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/windowssetup01.html]
I had found khmerconverter while looking around in Ubuntu Synaptic Package Manager. I had installed it a couple of weeks ago, but I couldn't see where the installer had put the launcher and didn't bother proceeding. Today I happened to see the launcher (in Applications - Accessories) and tried it, but it appeared to do nothing.
I found the name of the executable in the launcher and was able to do "man khmerconverter", which helped by showing command-line options, but not enough (the spec for the formats is not clear). On the web I found: [http://www.khmeros.info/drupal/?q=en/download/converter] which suggested that the app had a gui wrapper.
After a while it occurred to me that I should try running the app from the console instead of the desktop. This revealed that it was complaining about the absence of the "tix" library for Tk. I found tix in Synaptic and installed it (no DVD necessary): clicking the launcher then brought up the gui. (It seems to me that if an app fails with an error message, the launcher, or the windowing environment, or something, should detect that and wait for you to read the error message instead of immediately closing the window. Oh well.)
Hmm. This is the first time I've seen a package installed through Synaptic clearly fail to pull in a package it needs.
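The trick of running the program from a console so the error stays readable can also be scripted. This is only a rough sketch: the failing command below is a stand-in I made up (a one-line Python child process that prints a made-up message and exits with an error), not the real khmerconverter launcher or its real error text. The function name `run_and_report` is likewise just for illustration.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command, wait for it to finish, and return (exit_code, stderr_text)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    # Hypothetical stand-in for a launcher that dies because a library is missing.
    failing = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('missing library: tix\\n'); sys.exit(1)",
    ]
    code, err = run_and_report(failing)
    if code != 0:
        # Unlike a desktop launcher, this keeps the message on screen.
        print(f"exited with status {code}: {err}")
```

To try it on a real program, replace `failing` with the command line taken from the launcher's properties.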
So how can you try it out? You can download Limon and ABC "legacy" fonts here: [http://www.everyday.com.kh/khmerfont/khmerfont.asp]
This page is also useful: [http://www.cambodia.org/fonts/] with eg "How to type Khmer Unicode", a PDF document, unfortunately in Khmer and without any keyboard layout diagram for people trying to use a non-Khmer-Unicode keyboard. (There may be some reference to such a thing, but I was barely able to puzzle out more than a few words here and there.)
After I had installed the fonts (by unzipping them to my /home/dannyw/.fonts folder), Firefox was able to view www.everyday.com.kh properly. When I checked the HTML source, it does indeed handle fonts in css, and the css specifies EOT fonts (ie the special downloadable font format for IE). So although Firefox can't handle those, it apparently knows it can default to the (newly-installed) TTF fonts by name. OTOH, the page layout was still all screwed up: all the text was scrunched into the right column. I was able to set Firefox to View - Page style - No style. This made it possible to select a block of several sentences of text from everyday.com, and I could copy it into OpenOffice.
Then I could save as an OpenOffice .odt file, which is apparently the native format for khmerconverter. The output looked OK as far as I could see, ie the glyphs appeared to match – I'm not claiming to be able to *edit* Khmer text!
So while I've hardly tested khmerconverter exhaustively, it does appear to be useful.
Here are some blocks of test text so you can judge the performance of khmerconverter (and check whether my page and your browser setup work together – in particular check whether your browser is set to override font specs – d'oh!)
Original Limon (only looks right if "Limon S1" font is installed on your system – I'm not bothering to set up an EOT font spec here):
smaCikRBwT§sPaéf mYyRkumEdl manKña 77nak;kalBIéf¶cnÞ)ancab; epþImdMeNIrkarbBaÄb; karKaMRTrdæa Pi)aléfcMeBaHsMeNIrbs;rdæaPi)al km<úCaEdlesñIdak;R)asaTRBHvihar cUleTAkñúgbBa¢IebtikPNÐBiPBelak .kaEstDweNsðinrbs;éfraykarN_ fasmaCikRBwT§sPaTaMgenaH)an
Unicode version (should display OK if *any* Unicode font on your system can handle the Khmer group of Unicode codes): សមាជិកព្រឹទ្ធសភាថៃ មួយក្រុមដែល មានគ្នា ៧៧នាក់កាលពីថ្ងៃចន្ទបានចាប់ ផ្តើមដំណើរការបញ្ឈប់ ការគាំទ្ររដ្ឋា ភិបាលថៃចំពោះសំណើរបស់រដ្ឋាភិបាល កម្ពុជាដែលស្នើដាក់ប្រាសាទព្រះវិហារ ចូលទៅក្នុងបញ្ជីបេតិកភណ្ឌពិភពលោក ។កាសែតឌឹណេស្ហិនរបស់ថៃរាយការណ៍ ថាសមាជិកព្រឹទ្ធសភាទាំងនោះបាន
PKD example (just so you can see if you have PKD installed – I was too lazy to figure out the phonetics for the whole of the above text): kNom At dIG te
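To give a feel for what a converter of this kind has to do, here is a toy sketch built only from the first word of the two samples above ("smaCikRBwT§sPa" in Limon against the same word in Unicode). The mapping table is inferred by lining those two strings up character by character; it is not khmerconverter's own table, and the names `LIMON_TO_UNICODE`, `PRE_SUBSCRIPT` and `limon_to_unicode` are made up for the example. The one non-trivial point it shows is reordering: in the Limon sample the subscript "r" glyph is keyed before its consonant, whereas in Unicode it follows the consonant.

```python
COENG = "\u17d2"  # Khmer sign that makes the following consonant a subscript

# Limon S1 keystroke -> Khmer Unicode, inferred from the sample text only.
LIMON_TO_UNICODE = {
    "s": "\u179f",  # letter SA
    "m": "\u1798",  # letter MO
    "a": "\u17b6",  # vowel sign AA
    "C": "\u1787",  # letter CO
    "i": "\u17b7",  # vowel sign I
    "k": "\u1780",  # letter KA
    "B": "\u1796",  # letter PO
    "w": "\u17b9",  # vowel sign Y
    "T": "\u1791",  # letter TO
    "\u00a7": COENG + "\u1792",  # section sign: subscript THO
    "P": "\u1797",  # letter PHO
}

# Glyphs that are keyed BEFORE their base consonant in Limon
# but are stored AFTER it in Unicode.
PRE_SUBSCRIPT = {"R": COENG + "\u179a"}  # subscript RO


def limon_to_unicode(text):
    """Convert a Limon-encoded string using the toy tables above."""
    out = []
    pending = ""  # subscript waiting for the next mapped character
    for ch in text:
        if ch in PRE_SUBSCRIPT:
            pending = PRE_SUBSCRIPT[ch]
            continue
        if ch not in LIMON_TO_UNICODE:
            raise KeyError(f"no mapping for {ch!r} in this toy table")
        out.append(LIMON_TO_UNICODE[ch])
        if pending:
            # Toy rule: attach the held subscript right after the next character.
            out.append(pending)
            pending = ""
    return "".join(out)


if __name__ == "__main__":
    print(limon_to_unicode("smaCikRBwT\u00a7sPa"))
```

A real converter needs a full table per font plus more reordering rules (the prefixed vowel in the last syllable of the sample is one case this sketch does not handle).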
2008 Jun 26 [ Thu ]
I wondered what the result is of providing UTF-8 bytes inside a webpage defined as iso-8859-en. It turns out that the browser, at least Firefox, believes the 8859 and displays the Cambodian as junk. So I've changed my meta charset spec to UTF-8 and it seems to work (even though vi at panix is showing the UTF-8 characters as a bunch of hex escape codes).
រីករាយណាស់ដែលបានជួបអ្នកទាំងអស់គ្នា ។
If the above shows as a bunch of junk for you, you presumably don't have a font available which handles those Unicode character codes. I haven't yet set up a font spec to try and let your browser know which font to try.
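The junk you get when a browser believes the wrong charset is easy to reproduce. The sketch below assumes the page was effectively being read as ISO-8859-1 (my guess at what the original 8859 declaration amounted to); the function names are made up for the example. It encodes Khmer text as UTF-8, decodes the bytes the wrong way, and then shows that the damage is reversible as long as nothing has altered the bytes in between.

```python
def misread_as_latin1(text):
    """Return what a reader sees when UTF-8 bytes are decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(misread):
    """Undo the mistake: recover the original bytes, then decode them as UTF-8."""
    return misread.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    sample = "\u179a\u17b8\u1780"  # the first three code points of the greeting above
    junk = misread_as_latin1(sample)
    # Each Khmer code point takes three bytes in UTF-8, so the junk is three times as long.
    print(len(sample), len(junk))
    print(repair(junk) == sample)
```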
suə'sdəy
riik 'riɜy nah dail bɑɑn 'juəp 'neək ti'aɳ ɑh kniə
I should probably put in an example of my own PKD font as well, but as nobody has reported using it I feel too lazy.
2007 Mar 30 [ Fri ]
I guess I wound up trying to put too much into it – I guess I jumped the shark when I put in the Vietnamese tone marks. My most recent version has added English prosody symbols (rising and falling tone unit symbols), but in order to access them you have to use the US International keyboard – ie the font is no longer by design 7-bit safe. Aargh. Incidentally, another reason for not progressing was that I found out about Microsoft's downloadable keyboard editor software MSKLC: [http://www.microsoft.com/globaldev/tools/msklc.mspx] They demand that you allow them to scan the machine you're downloading it to to establish that the software licence is kosher. This of course requires that you access microsoft.com using Internet Explorer. I did try using a web-cafe machine, but the scanning program just gave a non-committal error msg (who knows – maybe it never comes right out and calls you a pirate).
Also, it needs .NET Framework installed – a large irritating download but not (the last time I tried) as heavily restricted as MSKLC itself.
After a little googling I found MSKLC at another site, but it still wouldn't run on my kosher Windows 2000 machine – again the error message meant nothing to me. I have just started downloading MSKLC from another site: [http://www.zdnet.be/downloads.cfm?id=36567] but I have a vague feeling that's where I found it before – the one that doesn't work.
If anyone has actually managed to run MSKLC please contact me. Evidently an easily-installable keyboard config with easy access to non-7-bit codes would make a tremendous difference to the design of a phonetic font. For instance, you could just switch to a PKD keyboard config and use the keypresses I set up for the special PKD font to access the Unicode characters! Also, you could create a new, simplified version of the US International keyboard that would allow you to avoid the nasty bewildering glitches you sometimes get when you're trying to enter Cambodian using a Limon font and hit one of the extended key sequences by mistake.
Oh well. As usual, the best is the enemy of the good.
2007 Jan 26 [ Fri ]
PKD is my phonetics font for Khmer, Thai, English and Vietnamese. I had nearly gotten it ready a couple of days ago, but the .EOT version of the font did not work.
I took another shot at Microsoft's WEFT utility – the thing that converts a .TTF file into an .EOT file that MS IE can download automatically – and the new .EOT appears to work, at least on this machine.
A few notes from the struggle:
1. When I checked the blog today I was surprised to find that the link to the PKD test file – pkdtst01.html – did not work, and indeed I could not find the file at all. I uploaded it again. I don't know why it vanished.
2. In WEFT, "expert font creation" allows you to create an .EOT without having to point to a dummy .html file. It even allows you to add "offline" fonts which are not yet loaded into Windows. But I could not figure out how to enter multiple "bindings" – the locations which are allowed to host .html files pointing to the .eot. I wound up using the braindead "Wizard" mode. Remember that it insists on writing to the .html file you point it at.
3. When you are checking the behavior of the .eot in IE, sometimes the Dynamic Fonts Usage window doesn't come up, even though there is a binding problem. Other times it brings up a nice list of the allowed bindings. I don't know why it sometimes worked and sometimes didn't.
4. If you try adding "off-line" fonts, the only way to do it is to point to a directory, and then it *also* searches all the *subdirectories* without warning.
5. This might not have been a problem except I had a bunch of old versions of PKD in a subdir, all claiming to be the one and only true "PKD". WEFT picked one without any error message; it was of course the *wrong* one. Generally the "offline" feature seemed more trouble than it was worth, unless you have dozens of fonts to deal with I suppose.
6. Note that the bindings do not specify the allowed location of the .EOT file. You can put that anywhere. The allowed bindings only set the possible locations for .html.
7. In the .css which specifies the filename and path for the .EOT file, case is significant.
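Point 7 is the kind of thing that is easy to check mechanically before uploading. Here is a small sketch, under my own assumptions: the stylesheet refers to the font with a `url(...)` value, and you have a list of the file names as they actually exist on the server. The file name `PKD0.eot` and the function name `case_mismatches` are hypothetical.

```python
import re

# Matches url(name), url('name') and url("name") in a stylesheet.
URL_PATTERN = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)


def case_mismatches(css_text, uploaded_names):
    """Return (name_in_css, name_on_server) pairs that differ only by letter case."""
    by_lower = {name.lower(): name for name in uploaded_names}
    mismatches = []
    for ref in URL_PATTERN.findall(css_text):
        filename = ref.rsplit("/", 1)[-1]  # ignore any directory part
        actual = by_lower.get(filename.lower())
        if actual is not None and actual != filename:
            mismatches.append((filename, actual))
    return mismatches


if __name__ == "__main__":
    css = "@font-face { font-family: PKD; src: url(PKD0.eot); }"
    print(case_mismatches(css, ["pkd0.eot", "pkdtst01.html"]))
```

An empty list means every referenced file that exists on the server also matches in case; it says nothing about files that are missing altogether.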
I am now going to get ready to announce PKD on Usenet.
2007 Jan 24 [ Wed ]
I am still not ready to release it, but I have made a lot of progress.
I gave in and decided to add most of the additional glyphs needed to provide a phonological transcription of English, similar to major dictionaries: like the dictionaries, I did not trouble to provide the upside-down "r" officially needed to support the English "r" sound; I also did not add the glyphs needed to support the new special representations of eg the final syllables of "little" and "rotten" and "father" because I think they are based on foolish and inconsistent principles. (Also, I have run out of upper-case letters.)
I have also added Vietnamese tone symbols.
Here's a PDF (6 pages) showing examples of how PKD can be used for teaching Cambodian, Thai, English and Vietnamese: [http://www.panix.com/~dannyw/pkd/pkdsample01.pdf]
Version 0.99 can be downloaded from here: [http://www.panix.com/~dannyw/fonts/]
The .TTF version can be installed like any other .TTF file. If you get the message "...is currently being used and cannot be replaced" it probably means that your machine has been locked down so that you cannot write to the needed directory. Try unclicking the option "copy to fonts folder".
Currently the .eot version (which provides autoloading for Internet Explorer users) does not work across the internet, although it does work when the .html and .ttf are stored locally (on the C: drive). I need to fix it and upload a new version. When I've done that I'll add some more info and publicize it. Another issue is that Windows WordPad does not correctly handle the character widths, although MS Word does. (Because all the characters are actually ASCII, you can edit PKD in Notepad if you want to.)
Once you have installed the .TTF you can try this test file: [http://www.panix.com/~dannyw/pkdtest01.html]
The test page optimistically assumes that the .EOT will autoload in MS IE, which as stated above is not yet true.
2007 Jan 04 [ Thu ]
I had previously intended to write one document just explaining how to use PKD, and another document explaining the reasoning behind it. Well, after a while I realized that I couldn't really disentangle those two goals without resulting in a lot of duplication, so I decided to finish some research into phonetics and the International Phonetic Association's alphabet, also called the IPA, which seems to have changed a lot in the 40 years since I first used a phonetic alphabet.
Well it turns out that the IPA's standards are just amazingly arbitrary and inconsistent. I thought I could support most people who would like to use IPA glyphs with a small subset, but it turns out that the "conventional" system for English alone now includes a lot of symbols which I consider worse than useless, especially for the learner, especially the symbols for the unstressed syllables in "rotten" and "little", and the symbol (consisting of two characters) for the diphthong in "fight".
I had been intending to throw in the IPA section of DejaVu plus the tone symbols so that people could use PKD both for my own super-easy stopgap system or for all-singing all-dancing IPA, but now I have reconsidered and feel like stripping PKD back down again. Anyway, I don't feel like I can produce the explanatory document till I've absorbed the IPA scheme and been able to rebut it, and I can't finalize the font till then, so please continue to bate your breath.
2006 Dec 23 [ Sat ]
A couple of weeks ago I saw that Google now worked in Khmer. Indeed, they had done the same irritating thing as they did for Thailand, and made it come up in Khmer automatically if they detect your IP is in Cambodia.
It worked in IE but it didn't seem to work in Firefox: it just displayed the Khmer text as boxes, so I always had to click on the Google in English link. (I don't use non-session cookies.)
Today I noticed that it actually worked. The machine I'm running on right now has a KhmerOS font installed, although the keyboard driver is absent; I'm guessing that's all Google needs. I have to say I was none the wiser after I looked at the source for the page, however. Presumably Google has to detect the browser type and offer downloads of the .EOT version of the font for IE, or just send a call for the font to Firefox.
Conceivably also they only just made this fix for Firefox.
Incidentally when I suggested yesterday that my gf ask the staff in the internet cafe for help in getting Google to work in Khmer they said something like "huh? Google doesn't work in Khmer".
2006 Dec 21 [ Thu ]
One of the major pains in setting up Word for Khmer is disabling all the keyboard macros which use ctrl-alt and ctrl-alt-shift combinations which are necessary for Limon fonts (and others). I don't trust the pre-rolled normal.dot files you find, because there's no indication of which version of Word they were created from and mixing the .dot version and your executable version is guaranteed to cause a lifetime of regret, so I have to laboriously go through all the bazillion possible options manually (and every few weeks I find another one I missed).
The following is a great user's guide for using MS Word for legal documents, but the advice is applicable to anyone using Word for long structured documents: [http://addbalance.com/usersguide/index.htm]
That page says it was updated as of 2001 (I guess when Microsoft still struggled to make headway against Word Perfect in legal offices), but the templates page is dated 2005: [http://addbalance.com/usersguide/templates.htm] and is an *excellent, excellent* guide to how normal.dot and the other templates work, far more informative than any other Office/Word docs I have ever seen.
This page includes a link to a "Shortcut Organizer", a bunch of Word Basic routines dated 2003 which apparently makes it easy to organize your keyboard macros between templates: [http://www.chriswoodman.co.uk/Shortcut%20Organizer.htm]
If you don't trust macros, here's an explanation of the manual procedure: [http://addbalance.com/word/movetotemplate.htm]
Here's the Wikipedia explanation of normal.dot: [http://en.wikipedia.org/wiki/Normal.dot] which links to the following lengthier description: [http://pubs.logicalexpressions.com/pub0009/LPMArticle.asp?ID=151] The latter includes the following tip, which would have saved me some teeth-gnashing:
And when you're attempting to hunt down your Normal.dot
template, the fastest way to figure out where it's located
is to click Tools/Options/File Locations. There you'll find
a path to your default template directory. The path may be
long and truncated so you can't view the full path. But
you can click Modify to move to another dialog that fully
displays the path.
The following view of templates from the "Dummies" range of books may also be helpful: [http://www.dummies.com/WileyCDA/DummiesArticle/id-333.html]
I didn't know that the behavior of Word has changed in recent versions. It used to automatically re-create normal.dot if you deleted it; it no longer does: [http://wordtips.vitalnews.com/Pages/T1229_How_Word_Treats_Normaldot.html]
You may find these articles illuminating also: [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/FileProperties.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/PaperSize.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/Styles.htm] [http://word.mvps.org/FAQs/Customization/CreateATemplatePart2/OtherThings.htm]
Note: One of the word.mvps.org documents above describes setting the paper size, but somewhere else I remember seeing the remark that normal.dot does *not* set paper size. Oh well.
You can still download the 0.90 version: [http://www.panix.com/~dannyw/pkd/test2/pkd-v0p90.TTF]
but I haven't produced the improved version yet. For one thing, I noticed an error on one of the characters used for English. For another thing, the whole issue of phonetic transcription in English is rather fraught. My original version was, I thought, quite adequate for people to use, but I have gotten caught up in general considerations on phonetic systems. Not only are there umpteen candidate character sets used in different dictionaries, but the "official" IPA system is, to my ear at least, inconsistent and misleading.
My intention was to provide an *easy* way to enter phonetic characters, so I don't want to provide a full set of everything possible; anyway that already exists, in the IPA section of full Unicode fonts. So I need to pick a set, and satisfy myself that that set makes sense relative to other candidates, which is not easy, and makes me understand why dictionary editors each seem to choose a different system.
Incidentally PKD is based on the DejaVu font: [http://dejavu.sourceforge.net/wiki/index.php/Main_Page]
2006 Dec 11 [ Mon ]
I have made various changes and figured out a lot of weird inconsistencies and misleading docs and right now I have made a version of PKD which fixes a lot of the problems with the old version.
1. The "embeddable" flag is correctly set, so Microsoft WEFT and Adobe allow you to embed the font in webpages and pdfs.
If you're using IE, this link to an HTML file should download the .eot version of my font needed for it automagically, perhaps after a prompt if you've set IE not to do automatic font downloads: [http://www.panix.com/~dannyw/pkd/test2/pkdtest09-01.html]
It won't do anything for Mozilla/Firefox because they don't work with .eot files. There used to be another system for Mozilla, I think using .prf files, but it died and Firefox no longer supports it.
2. I have changed the name of the font to PKD, instead of including the version number in the name. I had done the latter because for testing I wanted to keep multiple versions installed simultaneously, but I think having a single name for multiple versions makes it easier for users to upgrade.
3. Here's an example of a .pdf file with the font in it. It's not a very *good* example; it's just something I was working on. Most of it's ordinary English text; the PKD is just a few lines right at the end: [http://www.panix.com/~dannyw/pkd/test2/phonetics01.pdf]
The final sentence shows off using PKD to represent English phonetics, which is not as easy as you might think as there is no standard to follow.
...Aagh, I've noticed a couple of errors in the Khmer; oh well.
4. Here's the font itself (version 0.90). It's free: [http://www.panix.com/~dannyw/pkd/test2/pkd-v0p90.TTF]
I haven't posted this info in my special PKD folder yet because I want to do some more tweaks and document stuff better first, and then announce the upgraded version (something like 0.95) on Usenet.
A while back I created a font which allows you to easily create the phonetic characters used to romanize Khmer text in Huffman (along with other useful stuff like Thai and even English): [http://www.panix.com/~dannyw/weblog/nolist/pkd/installing02.html]
However, it had a problem: even after I fixed a bug which prevented the font from being embedded in PDFs, Microsoft's WEFT tool still refused to allow the font to be embedded.
I am going to take another shot at this *this week* and even if I can't figure out the WEFT problem I am going to reissue the font with the PDF problem fixed (along with a couple of other slight fixes). (I apologize for the long delay.)
The following is an overview of the embedding issue.
Fonts can be embedded either in a PDF or in a website. In either case software is supposed to check magic bits in the font, presumably set by the font creator, which define whether he wants the font to be embedded or not.
Good intro to the problem, basically a bug in Fontographer: [http://www.politechbot.com/p-03506.html]
Tom7 wrote embed.exe to twiddle a bit in your font files to allow embedding: [http://www.andrew.cmu.edu/user/twm/embed/]
He also wrote a very readable intro to font creation in Windows with Fontographer, although funnily enough it makes it sound like there's no embed problem: [http://www.andrew.cmu.edu/user/twm/makefont/]
A description of the issue on webpages which has a clear definition of the magic bits: [http://members.tripod.com/~bhaavana/embedded/faq.html]
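If you want to look at the bits yourself, here is a rough sketch. I am assuming the "magic bits" are the `fsType` field of the font's OS/2 table, which is where the TrueType/OpenType format keeps its embedding permissions; the function names are made up for the example. It walks the table directory at the start of a .TTF file, finds the OS/2 table and reads the 16-bit `fsType` value, which sits 8 bytes into that table.

```python
import struct

# OS/2 fsType embedding bits (zero means installable embedding, no restrictions).
FS_TYPE_FLAGS = {
    0x0002: "restricted license (no embedding)",
    0x0004: "preview and print embedding",
    0x0008: "editable embedding",
}


def read_fs_type(font_bytes):
    """Return the fsType value from the OS/2 table of a TrueType font."""
    # Offset table: 4-byte version, then the number of tables as a 16-bit value.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    for index in range(num_tables):
        record = 12 + 16 * index  # table records start after the 12-byte header
        tag, _checksum, offset, _length = struct.unpack_from(">4sIII", font_bytes, record)
        if tag == b"OS/2":
            # version, xAvgCharWidth, usWeightClass, usWidthClass, then fsType
            return struct.unpack_from(">H", font_bytes, offset + 8)[0]
    raise ValueError("no OS/2 table found")


def describe(fs_type):
    """Translate an fsType value into readable labels."""
    if fs_type == 0:
        return ["installable embedding (no restrictions)"]
    return [label for bit, label in FS_TYPE_FLAGS.items() if fs_type & bit]
```

To use it, read a font file in binary mode and pass the bytes in, for example `describe(read_fs_type(open(path, "rb").read()))` with `path` pointing at the .TTF. This only reads the value; it does not change the font.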
2006 May 04 [ Thu ]
I recently upgraded to a new drive on my laptop, and have tried to keep notes of what I needed to do to enable Khmer, both using the Limon-type fonts and Unicode. The following is a list of what I had to do. I have made a lot of postings about related issues, eg creating web pages that use Limon fonts, which are contained in the folder for this article (in reverse chronological order): [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/index.html]
Several of these stages are laborious and tedious, especially setting up the normal.dot template for MS Word, but once that has been done you can copy it to any other machine (with the same version of Word). More significantly, you need some familiarity with Cambodian, so if you know very little Cambodian and need to set up your computer for your girlfriend several steps will be very difficult.
On the other hand I think much of the information in this document will be useful to anyone who needs to set up any language using the "US International" keyboard.
1. I'm assuming you start from a clean copy of Windows 2000 or XP with service packs and whatnot already installed.
2. In order to handle Khmer Unicode, you must have a recent copy of the usp10.dll file installed. My understanding is that this is provided with XP SP2. I use W2000 myself, so I needed to get usp10.dll from somewhere. The easy way is to install MS Office 2003. Another way is to copy it from an existing installation, like an internet cafe. Another way is to join MS Volt. I tried to do so, and it appeared to work, but then some problem happened: I vaguely recall it wanted me to set up a MS Passport account, which is not something I want to do on an internet cafe machine.
3. For Limon-style fonts, you need to install the fonts themselves. They're easy to find in Cambodia of course; somewhat less easy if you're outside the country. There is nothing special about the font installation itself. (The "magic" needed to, for instance, position the vowels above or below the consonant is already built into the standard font system, and does not need the extra features in usp10.dll, which is only for Unicode.)
4. To actually use the Limon fonts, you need to install the "USA-International" keyboard. (As seems to be standard in Windows-speak, a term which normally refers to a piece of hardware is used to refer to a software driver for such hardware. Death to Microsoft!)
Go to Control panel – Regional and language options – Languages – Details – Text services and input languages – Settings. Select Add, and select Input language: English (United States) and keyboard layout/IME: United States-International. Click OK, going back to the Text services... window. Under Preferences – Language bar, I like to make sure that the task bar shows a button for the language type (eg EN for English, CA for Cambodian (Unicode), etc). This appears to be the default when you install a second keyboard, but you may be starting from a different setting.
5. Now click on Key settings. By default, every language you install goes ahead and grabs some key combinations to allow you to flip keyboards without using the mouse. I think this is an absolutely terrible idea. I advise you to check the list carefully and delete any ctrl-alt or ctrl-alt-shift combinations that are needed for Limon. (MS seems to specify only left-alt combinations for this purpose, so if you run into problems entering Khmer, try using the right-alt key.) Internet cafes in Cambodia tend to install keyboard layouts for Chinese, Japanese and Korean, so the machine I am currently typing on has a long list. At a minimum, delete the "switch between languages" function, left alt-shift, which can cause *extreme* confusion if you sometimes hit the keys in different order.
6. Now you need to change the current keyboard layout to US International. Unfortunately Microsoft made the taskbar display *more confusing* in XP. In Windows 2000, you can left-click on the language symbol (EN), and it will show English (United States) – US as one option and English (United States) – US International as another option. In XP, the menu just shows English (United States). To get a different keyboard you need to right-click the EN and select "settings". Even then, I cannot find a way to just change the *current window* to US-International. Instead, you have to change the *default* language to English (United States) – United States-International. Then click Apply, then OK. (Maybe *this* is why they provide the blasted keyboard shortcuts.) Be aware that if you have "US International" selected when typing English text, it will do strange things when you try to use the single and double quote characters, because they are intended to start multikey sequences for European accented characters; to get the ordinary quote characters, type a space immediately afterwards.
7. You should be ready to try entering text in Limon now. I personally use a Limon keyboard at home, but it's not really necessary: the only Limon keyboards I've seen at Internet cafes are ones I donated. However, you do need a "cheat sheet" to check where the characters are. In Cambodia it's easy to get a printout of this from computer and cd stores, eg PTC on Monivong, but I haven't been able to find a downloadable version. I have some blurry photos of an actual physical Limon keyboard on this blog somewhere.
Load Wordpad and try entering some text. If any ctrl-alt combinations produce usable characters – eg ctrl-alt-z produces a jerng thaa – then US International is functional.
8. A huge problem however is that many kinds of software grab the key combinations that you need. Some – like programmer's editors – simply discard ctrl-alt, eg producing the character for "g" instead of ctrl-alt-g. Other software binds many key combinations for other uses. I will describe what to do for MS Word below.
9. Note that when you make these changes to Word, it saves them in its template file: normal.dot. This is stored in a hard-to-find location. On my current machine it's at: c:\Documents and Settings\QW02\Application Data\Microsoft\Templates
You can check the location with Tools – Options – File locations.
You need to understand that when you change the template, it affects *new* files – not *old* ones created with a different template! If you want to add Khmer text to existing files, you'll have to create a *new* file and paste the old file into it.
You can find pre-fixed versions of normal.dot included with the khmer fonts in many cases, but these appear to have been created under old versions of Word, and may contain all kinds of undesired settings, and macros. If only because of the version issue – MS Word is notorious for flaky behavior when you mix versions – I think it's worth the effort to create your own clean normal.dot so I describe the process here (as far as I know this info is available nowhere else on the web, and is probably useful for any language using US-International).
Additionally you may have important settings stored in your existing normal.dot file which you want to preserve.
10. You will have to disable *many, many* "keyboard shortcuts". In addition, Word has many automation features enabled by default which work *very badly* when you're entering text in Khmer. (The first one you will probably notice is the one that automatically capitalizes words at the start of a sentence!)
-1. First go to Tools – Automation and turn off *all* the autocorrect options
-2. Then Tools – Options – Spelling and Grammar – turn off all options (Some of these sound like the same as the Automation options but apparently they aren't)
-3. Then Tools – Customize – Keyboard. I could not find a quick way to do this: you have to laboriously click through every single blasted function checking for ctrl-alt and ctrl-alt-shift combinations. This took me about 30-60 mins of work. I advise you to quit out of Word after just a couple of minutes and restart it to check that your changes have been saved in normal.dot as expected, before going on to complete the job.
-4. However, this interface does *not* allow you to change every keyboard shortcut! There are still several which need a different trick. This was shown to me by Piseth, seemingly the most clueful guy in this internet cafe. You can use the "symbol" feature to override *any* shortcut (as far as I've checked). Props to Piseth!
– 1. Go to Insert Symbol (the route to this seems to be different on this machine to the route on my laptop, so I'm not sure what the default route is). In the Symbols tab, select a Limon font, eg Limon S1. (Word may display the font name in its own font, which makes it very hard to read at the character size in the dialog).
– 2. *Then* select "(normal text)" as the font. In my tests, this has the (non-obvious) effect of allowing you to select the desired glyph from the Limon S1 font at this time, but subsequently – in use – the system will grab the character with the corresponding *character code* from the current font. So the keyboard shortcuts we set up will work even when we're in the middle of typing text in Limon F3, or whatever.
– 3. Now you have to *eyeball* the font table shown, and compare it with a keyboard layout and the list of undesired shortcuts below to locate which shortcuts need to be reassigned. Regrettably, the font table is very hard to read at the font size MS used – diacritics are particularly bad. I found myself using guesswork sometimes.
– 4. At the moment I have only found two undesired shortcuts in Limon: ctrl-alt-shift-hyphen, and ctrl-alt-equals. By contrast, the Unicode keyboard seems to have an assigned function for *every* key combination. I have not yet keyboarded much in Unicode so I don't have a good list. However, certainly ctrl-alt-5 – which produces the "euro" character *and* switches to the Times font – needs to be fixed. (It's not necessary for Limon.)
– 5. Piseth actually created his own normal.dot casually, simply by keying in Khmer text over a period of about a month and fixing each undesired shortcut as he encountered it.
11. The rest of this info is for Unicode only.
12. The files needed can be downloaded here: [http://www.khmeros.info/drupal/?q=en/download]
You probably want:
-1. Khmer Unicode Installer for Windows
-2. Documents
The "installer" includes the following features:
-1. Searches for the usp10.dll file on your system and makes it available to all programs (more difficult than it sounds, because this is a protected system file)
-2. Installs several Unicode Khmer fonts
-3. The Khmer unicode keyboard driver (has to be designated CA, which officially stands for Catalan, not Cambodian)
The documents include a PDF of the keyboard layout and a PDF describing how to use it (distinctly different from the Limon layout).
13. If you are using the Symbol trick to reassign shortcuts while inside Word using the Unicode keyboard, remember to switch back to the "US English" keyboard when you're entering the key combination! If you don't, the Unicode keyboard driver will dutifully send Word the Unicode character code rather than the key combination. Confusion will ensue.
14. The display of characters while you're actually entering them is a little disconcerting. Suppose you enter a consonant, then press the jerng key to get a jerng consonant, then the key for the desired jerng consonant. At that point the jerng consonant *still* displays above the line. Probably if you understand Khmer better than me this makes sense. Anyhow, once you enter *another* consonant the one you wanted to be jerng will be shoved under the line, as desired.
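Going back to step 9: since every change lands in normal.dot, it seems prudent to keep a dated copy of that file before starting on the shortcuts. Here is a minimal sketch in Python. It assumes that the APPDATA environment variable points at the "Application Data" folder shown in step 9 (Tools – Options – File locations remains the authoritative check), and the function names are made up for illustration.

```python
import ntpath
import os
import shutil
import time


def normal_dot_path(appdata):
    """Build the expected template path under a given Application Data folder.

    Assumption: Word keeps normal.dot in <appdata>\\Microsoft\\Templates,
    as on the machine described in step 9. ntpath is used so the result
    always has Windows-style separators.
    """
    return ntpath.join(appdata, "Microsoft", "Templates", "normal.dot")


def backup(path):
    """Copy a file next to itself with a timestamped .bak suffix."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = f"{path}.{stamp}.bak"
    shutil.copy2(path, dest)  # copy2 also preserves the modification time
    return dest


if __name__ == "__main__":
    appdata = os.environ.get("APPDATA")  # normally set only on Windows
    if appdata:
        path = normal_dot_path(appdata)
        if os.path.exists(path):
            print("backed up to", backup(path))
        else:
            print("not found:", path)
```

Run it with Word closed, so that the copy is not taken while Word still has unsaved changes to the template.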
2006 Jan 15 [ Sun ]
I had been wondering when these keyboards would show up. I spotted one today at Pacific Data Systems on Street 63 (a good place – it also has the Limon-layout keyboards). They said they arrived in the last couple of weeks.
It was just 7 USD. I happily bought it. It seems low quality (vague key action and low case rigidity) but the keycap printing seems adequate.
I haven't played with it yet. The shop provided a free setup disk for it. A few weeks ago I found what appeared to be a very easy-to-use setup program downloadable from the "Khmer Software Initiative": [http://prdownloads.sourceforge.net/khmer/KhmerUnicode1.2.FindUsp.exe?download]
From the description, it sounds like it does *not* include the all-important updated "usp10.dll" file, which is only really available as part of MS Office 2003. *However*, it claims that it can *transfer* this file from MS Office into the system32 folder to make it available for all applications, which is essential and *not easy* to do manually.
I tried running that update on my laptop and it did not barf, although since I had *already* gotten unicode working the fact that it *still* works is not a strong recommendation.
I tried to show the keyboard off to my local internet cafe but they didn't seem very interested. They do have Unicode installed – on *one* machine.
Incidentally, I could not see an announcement of the availability of the new keyboard on khmeros.info. However, they do have an announcement of locale files for Khmer. I was all excited because I thought it might provide Perl locale features, but seemingly not. However, they are interesting.
This one: [http://www.khmeros.info/download/km.xml] has the names of various *other* languages *in Khmer*, as well as date formats, the "riel" symbol etc. *Very* interestingly, the Khmer displayed correctly! When I checked, another Khmer page at khmeros.info displayed fine! It looks like this machine (the one I am now using at the internet cafe) *can* use Unicode! It appears to accept the language designation "km", at least in Firefox. Woo! I'll have to try a few other machines.
The other locale file was a broken link, but the correct link is here (my advanced hacking skills tee-hee): [http://www.khmeros.info/download/km_KH.xml] It just shows number formats, including currency.
Incidentally, they have a help file *in Khmer* for how to do email. I'm sure it's aimed at people trying to use Unicode. It's probably beyond me, but I may get my girlfriend to check it out: [http://prdownloads.sourceforge.net/khmer/Moyura1.0.7-km-KH.exe?download]
Later: I tried this other website: [http://www.forum.org.kh/en/index.shtml]
It provides versions in "Khmer" and "Khmer Unicode". The "Khmer Unicode" version worked and the "Khmer" didn't. The "Khmer" was set up in .css I suppose – anyhow, I'm not sure why it failed. But certainly this machine (PC11) seems happy with Khmer Unicode.
Here's another Khmer Unicode install procedure. It needs a lot more manual steps: [http://www.khmer.ws/unicode/windows.asp]
Also, interestingly, it seems to set up Khmer with the designation "ca", which is officially Catalan. I don't think Firefox is working that way.
2005 Nov 10 [ Thu ]
I have created a new font, based on a public-domain font, which contains the phonetic characters you need to represent Khmer as used in Huffman's books. I've named it "PKD" – phonetic Khmer Danny.
I had tried to use existing phonetic fonts, but they were very hard to use. My font is easy, because you access the funky characters with nothing more than the shift key – it's not like Limon, where you also need to install a special keyboard handler, and the non-ASCII characters get mangled by most programs: with PKD, programs just see the phonetic characters as upper-case ASCII characters, so any existing editor can be used. (Limon has to use umpteen characters to represent the enormous Khmer character set; my font only needs to identify the *sounds*, which are far fewer.)
If you don't have my font enabled, a sample of text will look something like this:
kNOm jOG tIv bAntup tIk
which will not be mangled by anything – unlike the "1/2" signs, degree signs and whatnot that you get from Limon. (Can anyone guess what the above sentence means?)
When you do install my font and set that string of characters to use it, it'll show the right phonetics (assuming I gave the correct pronunciation myself).
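The "will not be mangled" claim comes down to the sample containing nothing outside plain ASCII, which is easy to check. A small sketch in Python (the helper name is invented for illustration, and the half and degree signs stand in for the kind of characters mentioned above):

```python
def is_plain_ascii(text):
    """Return True if every character fits in 7-bit ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


sample = "kNOm jOG tIv bAntup tIk"   # the PKD sample shown above
print(is_plain_ascii(sample))        # the PKD sample
print(is_plain_ascii("\u00bd\u00b0"))  # a half sign and a degree sign
```

Anything that passes this check should survive any editor, mailer or web form that handles ordinary English text.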
It's a free download, about 35 kB. I hope a lot of people start using it.
[http://www.panix.com/~dannyw/weblog/nolist/pkd/]
Incidentally, it also contains English phonetics and Thai tone marks.
2005 Nov 06 [ Sun ]
I've naturally been considering setting up this site to display Khmer characters ever since I started learning Khmer. I was put off by several things:
1. The Unicode system for Khmer, which has never quite taken off
2. Possible lossage for users who do not have Khmer fonts installed already
3. No suitable phonetic font without copyright restrictions
I think I now have a way around at least the last issue.
Recently it occurred to me that I could just hack a public-domain font for the few characters I need. Also, since my intention is just to use the font for phonetics, I don't need upper-case characters – which means I can use the shift key as an ultra-simple way to access the funky characters needed (which also means that the character set will pass through any system designed for standard ASCII).
I roughed up a font last night which can (I think) handle all the characters needed for Khmer *and* English – I wanted the latter because many questions of Khmer pronunciation have to consider dialects such as American English.
I want to play with it for a while to minimize obvious blunders (for instance I want to make sure random browsers can download it automagically), and then I'll make it available here – and start using it.
2005 Jun 24 [ Fri ]
This page describes the report and dates it as 2005-06-06:
[http://www.iosn.net/l10n/foss-localization-primer/]
This is the report itself in PDF:
[http://www.iosn.net/l10n/foss-localization-primer/foss-localization-primer.pdf]
It does refer to the "official" status of Khmer Unicode but I am now wary of treating *anything* as official where Khmer Unicode is concerned.
I posted a story about the pronunciation of Thai tones which pointed out that *singing* is hard to explain if you go by the standard explanations of what Thai tones are.
It recently occurred to me that singing must be quite hard in Khmer too, for several reasons:
1. Like English, words usually end in a consonant, not a vowel (which is why opera has always sounded better in Italian)
2. Worse, many words end in a glottal stop, which is an important *phoneme* in Cambodian, rather than happenstance as in English
3. Girls seem to be encouraged to both speak and sing in an unnatural falsetto
4. The sheer number of phonemes makes it even more difficult to find a rhyme than in English, so they tend to be unnatural and/or to break the metre
It is certainly the case that glissandos are much more common than in English, and are frequently coerced to fit within a single beat. This tends to give Cambodian popular music an antique, Margaret Dumont-esque flavor – especially combined with the shrill female parts.
2005 May 29 [ Sun ]
I have unofficially heard various things, so the following announcement: [http://www.camnet.com.kh/akp/english_news.htm] (search for "National Information Communications Technology Development Authority") was very interesting. (The above link text appears identical to the story I saw in the Cambodia Daily 2005-05-27 p18.)
1. The government has *still* not made a decision on Unicode: "draft plan for designing a Khmer-language font, pending approval"
2. "A keyboard in Khmer is not yet available"
3. Open Office is already being used in some govt offices. There's a download available here: [http://www.khmeros.info/drupal/?q=en/download/apps] but they seem very hesitant about its status.
4. "The software has been developed with the cooperation of local NGO Open Forum of Cambodia". Hmmm. I should check them out.
2005 Mar 31 [ Thu ]
I have referred to this book, somewhat disparagingly, in previous comparisons of books available for people learning Cambodian. Over many months of daily use I have formed a more positive opinion and would like to add some comments.
1. Many potential users must be put off by the phonetic alphabet used to represent Cambodian. My own copy (used) was clearly scribbled on by some hapless soul who had never used a phonetic alphabet before: I can see his laborious "translations", eg of "haw" (phonetics) as "how", and I really feel for him. In fact the original intention of the book was to use recordings of actual Cambodian speakers to provide models of pronunciation, but as I have never come across such recordings on sale in PP I cannot recommend them. More seriously, I do not feel it is reasonable to expect adults to reproduce sounds merely by hearing them. Just as jerng letters chor and baa look indistinguishable to the beginner (or me, after one beer), the sounds will blur together in the ear. The detailed discussion of how to *produce* Cambodian phonemes in the companion volume "Cambodian System of Writing and Beginning Reader" strikes me as essential, but of course it is embedded in the forbidding study of the Cambodian alphabet, which put me off reading it entirely for months.
Furthermore, I would be surprised if any Cambodian can read these phonetics by any more than guesswork. This makes it almost completely impractical to use these books as intended, ie with the rote training administered and monitored by a Cambodian national.
On the other hand, I am convinced that in the beginning stages Cambodian *must* be taught using phonetics, and the rote training as listed in the book seems well thought out. So I wish somebody would update the book by adding Cambodian script for all text (including explanations), and recreate the audio recordings, and strip out the rote exercises (which add tremendously to the bulk) for a teacher's version. (If the student sees the answers they do not properly have to dredge them from short-term memory.)
I would also like to see a short version of the information on producing phonemes extracted from the companion volume.
2. Likewise, I wish someone would reorganize the dictionary sections, discarding the ludicrous alphabetization scheme and adding a Cambodian-script to English section. As well, they might include listings for all the words shown in elided versions in the text. I am not sure whether Huffman's strategy of routinely providing elided versions in the text is actually reasonable, but if that is maintained I think it is necessary for the beginner to be able to look them up. Incidentally, I do appreciate Huffman including proper names in the dictionary, although I think it would do no harm to have an additional section listing most common Cambodian names whether listed in the dictionary or not, since when they appear in Cambodian there is no capitalization or other hint that a name is involved (well, there is usually a space).
3. The lack of typographical variation available to Huffman (of course, at that time, having phonetics was state-of-the-art, and it appears to have been typeset using a Varitype) makes the text layout impenetrable to a modern reader. If the text were reorganized, one could use the opportunity to make it very much more attractive.
4. In addition, many items would need to be updated:
-1. Although I have still very little Sprachgefuehl in Cambodian, I get the impression that many personal pronoun usages have shifted considerably since 1970. Specifically, it seems to me that "loak" is now simply a polite term which anyone can use to anybody, whereas "neak" is a more "informal" word which has lost the demeaning associations of that time. I could certainly be wrong (as a foreigner I live in a bubble outside normal usages), but would hope that this matter could be addressed.
-2. Clearly much vocabulary needs to be added: cellphones, dvd players, CDs, internet cafes, NGOs, mines, corruption, etc.
-3. The content of the texts should be checked. Many landmarks referred to are gone, or the references are now misleading.
-4. As so much tourism is related to Angkor Wat, multiple sections could be added relating to the history, travel to and from the temples, etc.
-5. A few illustrations might be nice: a map of the world, Cambodia and Phnom Penh, for instance; names of parts of the body; names of clothing items, etc; a few official forms with translations; a Karaoke CD cover; some cuttings from newspapers...
-6. Generally, I have grown more aware of shortcomings in other texts, and have started to see that when viewed as a unit with the "Cambodian System of Writing" and the tapes and the native instructor system, it does make sense.
Many people still remember the old Monty Python sketch about the hapless tourist in England who has placed his trust in a phrasebook with incredibly bad translations.
I have never encountered one quite that bad, though I often see books which recommend words or expressions which evoke serious double entendres.
I often use the "Right" dictionary from Norton University. Its name reminds me of the names of flash memory cards, most of which are the reverse of the facts, eg "Compact Flash" which is the largest available, "Smart Media" which had the fewest features, "Secure Digital" which will lock out your own data against you, etc. Ie, it is riddled with mistakes in the English. This does not usually bother me, as it reassures me that the text was probably originally created in Cambodia, and I am looking for subtleties in the Cambodian usage.
I think I noticed some slightly surprisingly vulgar word, so on a whim I searched for "fuck" (I guess I should apologize to anyone whose web search for "Cambodian fuck" leads him to this harmless page). Stap me if they didn't have it: the verb is listed as "ruam reaksaa". However, they did *not* have any marker for "obscene". (I just checked their introduction, and they do not mention any such marker.) This made alarm bells ring, and when I checked it in my "Modern" Khmer-English dictionary it lists this expression as meaning "to take care of each other".
Obviously a Khmer person may have no idea of the level of obscenity of this word (a surprising number of young people will say the word "fucking", meaning "bad", in the midst of everyday chatter in English or Khmer; they apparently know that it is deprecated but have little idea of how much). So he might conceivably look up the word, get the wrong meaning, and later come out with something like "if we feel tired after we look at Angkor Wat, we can go back to the hotel and fuck".
At any rate, such is my whimsy. I did wonder whether it is simply some sort of prudishness in not being eager to print an equally obscene word in Cambodian; I just searched for "joii" (listed in "Modern" as "to have sexual intercourse, to copulate (of humans) (indec.)") and it is not listed at all in the "Learner Oxford", which despite its name (no book name signifies anything in Cambodia) was apparently compiled by Cambodians (it lists "jia" as a "copulation relater...", an attempt to refer to the "copula" which sounds to me more like a porno book).
Incidentally, the "Right" has the peculiar feature of including many thesaurus-like entries (set off by grey shading). These seem very dangerous to me because the thesaurus concept rather relies on the user *already* knowing the connotations and needing only to be *reminded* of the possibilities. For instance, the word "childish" has the following words listed as synonyms: "adolescent, childlike, foolish, immature, infantile, juvenile, youthful". In order to get much out of this listing a typical Khmer user would probably have to laboriously look up every one of them, and might well decide on one too early.
A strange error in "Right" is printing "gymnast" as "sgymnast" as a page heading (p378), but in the correct place for "gymnast". It seems strange to me that such things can happen: if I were assembling a dictionary I would certainly try to use a computer to put everything in alphabetical order, certainly the English words. At a guess they may have started to use a computer system to create the headings, but then found it was buggy and had to hand-edit many or most pages.
This expenditure of effort may be what prevented them noticing that a hundred or so pages were printed out of order in my copy. That, and providing several colour pages showing flags of all nations. (Wtf?)
Although the Unicode system is apparently the new standard, it is difficult to install unless you happen to be running XP with MS Office. (I have finally gotten it running under W2K, and it is quite impressive, but I am not ready to write this up yet as I want to try a couple of real projects using it first.)
One of the great benefits of the Unicode system is the zero-width space idea: that is, to allow existing software to handle linebreaks intelligently, you provide a character in the ASCII space bytecode position which *displays* as a zero width (and you also provide some other bytecode which displays as a regular space). Incidentally, it has occurred to me that software must have some definition of which bytecodes are text and which are non-text (whitespace, punctuation, escape sequences, etc) but I was unable to decipher the impenetrable Truetype docs to see whether it is embedded in the TTF definition.
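To make the idea concrete, here is a small Python sketch. It assumes the character in question is Unicode's ZERO WIDTH SPACE (U+200B) placed between words. The function names are invented, and counting characters is only a crude stand-in for how a real layout engine measures width.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but a legal place to break a line


def mark_breaks(words):
    """Join words with zero-width spaces so software can see the boundaries."""
    return ZWSP.join(words)


def wrap(text, width):
    """Break text into lines at zero-width spaces only.

    'width' is counted in characters, which is a simplification:
    Khmer clusters do not map one-to-one onto display columns.
    """
    lines, current = [], ""
    for word in text.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


marked = mark_breaks(["aaa", "bbb", "cc"])
print(wrap(marked, 6))  # the text prints with no visible gaps, yet wraps between words
```

The marked text looks like one unbroken run on screen, but any program that knows about U+200B can still find the word boundaries.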
The standard Limon fonts are not zero-space, but such fonts are easily available from here: [http://www.forum.org.kh/eng/zero-space_fonts.htm]
ABC computer also has a very nicely-arranged page with various options (not including Unicode): [http://www.abc.com.kh/khmer-fonts.asp]
It includes a link to a version of the MS Word template file "normal.dot" which seems to be necessary. I have not figured out what this actually does, but it is certainly necessary to turn off various text-entry-checking features in MS Word, or they will utterly gum up the Kh text – auto capitalization springs to mind, but it all has to go.
In theory it ought to be possible to mix sections of English text and Kh text and persuade Word to automagically turn those features on and off in the right places, but I have found Word's style and language-switching features to be misleadingly documented and buggy. (And I have a vague recollection of similar bugs in the style feature back as far as Word 95 or so.)
One little warning: I think forum.org.kh are connected to the French, and are probably the reason my "USA International" keyboard option now displays as "Etats-Unis International".
2004 Nov 25 [ Thu ]
I was chatting to a friend of mine the other day and I asked him about progress with Unicode fonts for Cambodian (Khmer). He said that the spec for a Unicode font has finally been produced and his company has been rushing to implement it. At present, he has hit a snag: for some reason MS SQL is not recognizing Khmer numeric characters as numeric, ie it is not allowing them to be entered into numeric fields; he hasn't figured out whether the problem is in the Unicode spec or in MS SQL.
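For what it's worth, the Unicode character database does classify the Khmer digits (U+17E0 to U+17E9) as decimal digits, which can be checked with Python's standard unicodedata module. This sketch says nothing about what MS SQL does internally. The helper to_ascii_digits is invented for illustration, as one possible workaround: convert the digits before they reach a numeric field.

```python
import unicodedata

# Khmer digits zero to nine, built from code points to avoid typing mistakes.
KHMER_DIGITS = "".join(chr(0x17E0 + i) for i in range(10))


def to_ascii_digits(text):
    """Replace any Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        out.append(ch if value is None else str(value))
    return "".join(out)


for ch in KHMER_DIGITS:
    # Each line shows the code point, its general category and its digit value.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.decimal(ch))

sample = "\u17e1\u17e2\u17e3"   # Khmer digits one, two, three
print(to_ascii_digits(sample))
print(int(sample))              # Python's int() accepts Unicode decimal digits
```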
He pointed me to a very good site for those interested in the topic: [http://khmeros.info/]
They have a confidence-inspiring page on installing Khmer Unicode. Unfortunately for Windows 2000 users, their recommendation for installing the all-important usp10.dll file is to install MS Office 2003 Final Edition! I can see why they say that as trying to manually install it filled me with gloom and terror, but I hate the idea of trying to install that bloated buggy dog just for one damned file. I wonder if I can delete Office 2003 and retain the .dll. I also wonder if previous versions of Office will still work with Unicode.
They have several references to Unix support, but it's not clear to me whether a stock version of Linux is easy to upgrade with the Khmer Unicode font. It does seem that individual Linux applications need to be upgraded to work in Khmer, but it wasn't clear whether that was just to add Khmer text labels to the controls, etc.
2004 Sep 14 [ Tue ]
The more I try to learn Khmer the more I wish I could read a definitive text on the grammar. For instance, the word "mian" is typically defined as "have" or "to have", but is often used to express "with", bereft of any of the structure you would think a verb would need (eg "dail").
So you have to wonder what the dictionary really means when it confidently asserts that a certain word is "p" (ie a verb or a verb/adjective), especially when I notice that a Khmer book on English usage, which supplies Khmer equivalents for many other parts of speech, uses the English word "infinitive" in the text without translation.
Another point that occurs to me is how to establish the real meaning of a word. I always want to be able to refer to an English/Khmer section to look up a word I need in Khmer, and immediately look up all its *other* meanings in a companion Khmer/English section. (That's the *real* reason you need a Khmer/English section as a beginner – there's not much chance you'll be able to look up a Khmer word just by hearing it.) But the only book I know where the sections really somewhat *match* is the Tuttle "Practical". Others that I have checked – even if they have both sections in one volume – clearly have grabbed the two sections from completely different sources.
The dictionary just says that the word for "bed" is "grair deyk", but when you *check* that, it means "platform sleep". Now what that suggests to me is not so much a Western-style bed, but rather the literal crude platforms that you often see people sleeping on along the sidewalk. So now I wonder what they *really* call a Western-style bed. (My caution is based on a time when I was taking a trip from PP to Hat Lek: I asked for a "bontop dteuk" – bathroom – and was shown to a shack over the river with a hole in the floor.)
This issue even occurs between European languages. In German for instance, a "Bett" is really a "set of bedclothes". The store "Betten Rid", for instance, in Munich, sells bedlinen, and no beds, if I remember rightly. (Incidentally, one of the few reasons I have to hate the Germans is that the plural of "Das Bett" is "Die Betten".) Likewise, a "Federbett" means a duvet with a feather filling (which they often use *under* the sleeper as well as above).
You can imagine what happened when I went looking for a store that sold beds...
2004 Sep 04 [ Sat ]
I've linked to this page before: [http://www.bauhahnm.clara.net/Khmer/Welcome.html]
I took another look at it tonight and was even more impressed, especially as he's been keeping it up to date (sooner or later Unicode *has* to get standardized...).
In particular I found a page about doing word breaks for Thai in Perl (hmm... maybe I should put this file under Thai): [http://www.bauhahnm.clara.net/Khmer/WordBreaks.html]
He says it would be easy to do the same thing for Khmer... hmmm... I don't even know of a word list for Khmer, much less one encoded in Unicode...
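I haven't reproduced the Perl from that page. As a rough idea of what dictionary-based word breaking involves, here is a generic greedy longest-match sketch in Python. The word list is a toy one, borrowed from the phonetic sample sentence in the PKD article above; real Khmer would need a proper Unicode word list, and probably smarter handling of unknown words than the one-character fallback assumed here.

```python
def segment(text, words):
    """Split unspaced text into words by greedy longest match.

    At each position, try the longest candidate first. If nothing in
    the word list matches, emit a single character and move on
    (an assumption; other fallbacks are possible).
    """
    max_len = max(map(len, words), default=1)
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in words:
                out.append(piece)
                i += size
                break
        else:
            out.append(text[i])
            i += 1
    return out


WORDS = {"kNOm", "jOG", "tIv", "bAntup", "tIk"}  # toy word list
print(segment("kNOmjOGtIvbAntuptIk", WORDS))
```

Greedy matching can pick the wrong split when one word is a prefix of another, which is one reason the quality of the word list matters so much.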
2004 Sep 01 [ Wed ]
My previous article on Khmer needing two words to say the same thing: [http://www.panix.com/~dannyw/weblog/Asia/Cambodia/Khmer-language/double01.html]
Tonight, while chatting to a bewitching woman in a bar, I tried to say that I was not fluent, using a word I had carefully looked up in the dictionary that evening: jrerl.
Puzzled glances and rapid jibber-jabber between the staff led me to conclude that something had gone wrong with my scheme.
Even after I pointed to the word in one of my dictionaries they did not know what I intended. Eventually they figured it out, and offered a different word: "stoat". One girl very clearly said that the word I had used *could* be used in my sense, but *on its own* was utterly insufficient. She very clearly said that a lot of Khmer words were the same way. She also (groan) urged me to get a Khmer teacher instead of trying to rely on books. (I would happily do so if I could find one who spoke good enough English; perhaps I should give up that criterion.)
Actually, I have been feeling that need for some time: I had already observed that even the best book I have gives very little idea of the connotations of the words, or whether they have become old-fashioned or even dropped out of use entirely. So even when I am not saying something actually wrong, I am probably saying something like "What-ho, cobber, how about it then, is that swell or what?" I was hoping that *reading* would allow me to bypass the inconvenient step of actually talking to people, but that's still coming along very slowly (although I am encouraged – it's a lot easier than reading Thai already, which I can really only do with some ease in cartoon books where there's a lot of visual cues).
It reminds me of the situation with the English lessons printed weekly in the Cambodia Daily. As well as many errors which make me think the lessons are created by a Khmer, the individual exercises give no explicit indication to the unwary Khmer student that they vary wildly in difficulty. One section involved nonstandard poetic usages which it seems to me are *cruel* to present to an earnest student who has not studied at Oxford.
Struggling to read Khmer (a single paragraph leaves me aching for a beer), I have become conscious of how often they use two terms which according to the dictionary mean the same thing, eg "jung bomphut" (end). I haven't seen a good explanation for this (other than "it happens"), but it seems to me that the reason is a practical one: Khmer has a heck of a lot of words with multiple and very different meanings, but by taking the intersection of those two sets of meanings one (usually) can reach an unambiguous sense.
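The intersection idea can be written down directly with sets. The sense lists below are invented placeholders, not dictionary entries for any real Khmer word; only the mechanism is the point.

```python
# Hypothetical sense lists for two ambiguous words (made up for illustration).
SENSES = {
    "word_a": {"end", "tip", "foot"},
    "word_b": {"end", "extreme", "most"},
}


def shared_senses(first, *rest):
    """Return the senses that every one of the given words can carry."""
    result = set(SENSES[first])
    for word in rest:
        result &= SENSES[word]
    return result


print(shared_senses("word_a"))            # three candidate senses on its own
print(shared_senses("word_a", "word_b"))  # only the shared sense survives
```

Each word alone leaves the listener with several candidates; the pair leaves just the sense they have in common.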
Of course there are plenty of words in English with multiple meanings. But it seems to me that in many cases one can distinguish between them by incidental grammatical context. For instance "plate" meaning "dish" and "to add a metallic layer" (not to mention "provide a certain pleasurable service"). In addition, English has plenty of near synonyms which a careful writer can select to avoid ambiguity.
In Khmer, of course, not only are there no declensions/conjugations etc to give a backup to the intended sense, but in addition the syntax – it seems to me – seems highly lacking in logical cues. It may well be that someone with my paltry Khmer is missing a lot of subtleties, but at least to the learner the "sentences" seem to just run on and on, without conjunctions to clarify the intended logic (ie words like "so", "but", "yet", "then" seem much less frequent in Khmer, even if the dictionary says they exist).
It reminds me that what I really like about English is that it naturally favours complete verbs. I *like* sentences which have a subject, a tense, a mood and an aspect. President Bush the Elder was famous for his omission of the subject, but also he tended to omit tense and mood, tending to prefer the "ing" form. So when he was asked "What are you doing about the Iran-Contra scandal?" he would reply "Makin' progress", leaving you with no idea what he meant. Had he said "My chief of staff has destroyed all incriminating documents" there might have been trouble.
I believe this issue is consciously understood by tabloid headline writers. They wish to make the most meretriciously salacious headline possible, even though the actual story inside is tame. So by eschewing actual verbs, they leave the headline unclear enough to lead the hapless buyer to expect more than is provided. For instance: "Michael Jackson in pedophile admission" can be safely used on the cover, even though the inside story is about some groping highschool teacher who complains that he only confessed to the cops because he didn't have MJ's high-priced legal team. (This kind of ambiguity is the basic reason for the frequent advice to avoid the passive voice, in which of course just the agent is left undefined. How much less clear is it to use a noun, which results in the omission of tense, mood and aspect also?)
2004 Aug 19 [ Thu ]
A few months ago I wrote about the availability, at least in Phnom Penh, of special PC keyboards with the Limon font setup (using the "US-International" keyboard setup).
However, I just realized today that Windows 2000 and XP come with the "On-Screen Keyboard" utility. You can start it with Start – Programs – Accessories – Accessibility – On-Screen Keyboard. (I've been tinkering with it; haven't tried to use it for a real text.)
Then under "Settings" you can change the font and the font size. I suggest Limon S1 16pt. Unfortunately, I could not find a way to increase the size of the on-screen keyboard; even 16 pts is definitely too small for many characters to be legible, at least to a learner.
Note that you will not be able to access the ctrl-alt and ctrl-alt-shift characters unless you have set your keyboard to "United-States International". In this mode, normally intended for use with European languages, pressing certain characters before vowels allows easy access to European accented vowels. For instance, pressing the double-quote character then u results in the u-umlaut character. Nothing is displayed when you press the quote key; Windows is waiting for the second character to decide what to do. If you press space, it produces the normal quote char. If you press the quote char again, it produces *two* quote chars.
I can't find a reference for the explanation above. I *especially* can't find a reference for what happens when you're in the middle of typing *Khmer*. I get the impression that the vowel keys in European character sets are the vowel keys in Khmer, and the quote/colon/circumflex keys in European are either independent vowels or Western digit "6", so not normally followed by a vowel, but still.
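For what it's worth, the dead-key rules described above for the double-quote key can be sketched in a few lines of Python. This is only a toy model, not how Windows actually implements it: the names `DEAD_KEYS` and `type_keys` are made up for illustration, only the double-quote key and lowercase vowels are covered, and the handling of a quote followed by a letter that takes no umlaut (emit both characters) is an assumption. It says nothing about what happens when typing Khmer.

```python
# Toy model of "United-States International" dead-key handling.
# Hypothetical names; only the double-quote dead key is modelled.
DEAD_KEYS = {
    '"': {"a": "ä", "e": "ë", "i": "ï", "o": "ö", "u": "ü"},
}


def type_keys(keys: str) -> str:
    """Return the text produced by pressing the given keys in order."""
    out = []
    pending = None  # dead key pressed but not yet resolved
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                # Nothing is displayed yet; wait for the next key.
                pending = key
            else:
                out.append(key)
            continue
        combos = DEAD_KEYS[pending]
        if key in combos:
            # Dead key + vowel gives the accented vowel.
            out.append(combos[key])
        elif key == " ":
            # Dead key + space gives the plain quote character.
            out.append(pending)
        elif key == pending:
            # Dead key pressed twice gives two quote characters.
            out.append(pending * 2)
        else:
            # Assumption (not stated above): emit both characters unchanged.
            out.append(pending + key)
        pending = None
    # A dead key still pending at the end displays nothing.
    return "".join(out)


if __name__ == "__main__":
    print(type_keys('say "uber'))
```

The design choice here is simply one variable of pending state: the quote key on its own produces no output, and the next key decides what, if anything, appears.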
One thing I ran into is that, irritatingly, it sends the characters to the current application *in the font set by the current application*; it would be much handier if it sent back Limon, while typing on the physical keyboard could still produce English.
Microsoft info: [http://www.microsoft.com/enable/training/windows2000/onscreenkeyboard.aspx]
2004 Jul 18 [ Sun ]
Ever since I first observed a Cambodian rapidly typing away in English on an ICQ chat session, I have been fascinated by their ability to adapt to a medium which was hostile to their native language. Indeed, I have for a long time had in mind a project of collecting such chat sessions in order to analyze the communication format. (Clearly there are ethical blocks. I don't want to grab private stuff, but what else gets discussed? And if I tell somebody *in advance* that I want to capture the thread, it's going to make them self-conscious.)
Here's a Slashdot posting about the general issue of minority languages on the internet:
Unless you have a majority multilingual ... (Score:5, Interesting) by kbahey (102895) on Saturday July 17, @11:49PM (#9728677) [http://slashdot.org/comments.pl?sid=114846&cid=9728677] ( [http://baheyeldin.com/khalid] )
Unless you have a majority of the visitors / participants that are multilingual capable, you have to separate the content of a web site by language.
I say this from experience on several newsgroups, then forums over the years.
It starts out simple: people who are early adopters often speak English, and can read English (e.g. programmers, ...etc. who know English anyway). Then as technology spreads among the less techno-elite, people who do not know English well want to express themselves in their native language.
In languages that use a non Latin character set, there is a phase where internet communication uses Latin characters to represent their own language. I have seen at least Hindi and Arabic written in Latin alphabet, with some modifiers. (Even some Euro languages lost some characters, like Scandinavian and Germanic languages, where the "O" in Torvalds lacks the stroke in the middle, and the "A" with the small circle, ..etc.)
There are various "dialects" used in these Latinized alphabets, and people learn one version or the other depending on where they learn it first.
This becomes a transitionary phase on these forums, where people will express themselves using this Latin based alphabet to represent their own language.
Then later, as their own language becomes more wide spread and accepted, more people get to use computers and the internet, and they perhaps do not know any language other than their own. This leads to them demanding that only their native language be used in forums that are about their country/society/language/...etc.
Anyone who speaks a "foreign" language in those forums is reminded that the primary language is such and such, and not to confuse others. Some take this as a matter of national pride, some take it as mere courtsey, others take it as common sense, and yet others take it as a mere form of communication. Depends on who you are, your outlook, and your biases.
That is what I have seen in several newsgroups/forums over the years.
So, this is the phase that Orkut is at right now.
Eventually, they may have to separate the content by language. Although there are barriers here, because Orkut is about "networking", and not just "discussions".
It would be interesting to see how this turf war gets resolved eventually, at least for those who are like me who like to observe the new frontiers that the internet have defined/merged/melted/setup.
P. S. In Canada for example, where there are two large groups speaking two languages, a majority of web sites give the option on what language to use at the very beginning. Forums are separated into two languages on many sites. There is a minority who are bilingual and can (and do) participate in the two camps. I imagine Hispanics in the USA, and Spanish speaking Anglos do the same on some forums.
2004 Jul 15 [ Thu ]
The "Wikipedia" is a server set up to allow anyone to contribute to information on any topic, resulting in a sort of free encyclopedia.
I had not thought about checking it for info on Cambodia – it happened to occur to me while I was looking at their entry on Spetsnaz! (I'm pretty sure that's pronounced "spetsnass", by the way, but I could be wrong.) Anyway, although it's not very comprehensive, it has some interesting new stuff and links; e.g. it gives the language codenames under various standards, like "km" for ISO 639-1.
[http://en.wikipedia.org/wiki/Khmer_language]
I had certainly never heard the term "abugida" before but I'm sure it will make me a hit at parties.
One good link I found via the Wikipedia page: [http://www.omniglot.com/writing/khmer.htm]
I think the "inherent" vowels, represented in the chart of Khmer characters given in that link, should really be more like "aw" and "oh" than "a" and "o". I suspect the original reference used International Phonetic Alphabet characters.
It reminds me that I have been remiss in assembling a cheat sheet of the various fonts. Oh well, it's still on my list.
The font links given above seem mostly quite outdated. The following seems more or less up to date: [http://www.seasite.niu.edu/khmer/] and it includes a useful link to an IPA font.
2004 Jul 11 [ Sun ]
There are a lot of French people in Phnom Penh, probably because of the lingering effects of the French colonial period – many of the older people still speak French, and there are many (dilapidated) signs in French rather than English.
But it's struck me that there are also some weird similarities in the languages.
1. Neither language pronounces "s" or "r" at the end of a word.
2. The adjective is usually after the noun.
3. You usually say "loak" at the end of every sentence, like you say "monsieur" in French.
4. The negative is formed in two parts, "awt... dte", like "ne... pas".
5. Both languages have a variety of nasal vowels (absent in English).
6. Both languages have "d's" and "t's" which elide to the "dt" sound (also found in Irish and American, e.g. "Paddy" for Patrick).
7. Both languages have the palatal n sound, as in Spanish "mañana" ("oignon" in French).
8. The plural sounds the same as the singular (including pronouns, e.g. "il/ils" in French and "gey" in Cambodian).
All of which does not exactly add up to any mutual comprehension, but it does seem like it might be easier for a Frenchman to learn the language than an Anglo.