UTF-8 tips

From Wikitech

using xterm

With recent installs of XFree86 and Xorg-x11, xterm works with UTF-8 for any "normal use".

There are other UTF-8 terminals that may handle some very advanced Unicode features, such as right-to-left writing, better than xterm. For normal European and Asian characters, though, the xterm of a recent X11 install works just fine if a proper UTF-8 locale is installed and the system is properly set up.

xterm -en utf-8 sort of works, but you should only use it if your system does not have a proper UTF-8 locale.

xterm -u8 is better if you have a UTF-8 locale, but it still does not change the locale environment of the shell and other subprocesses inside xterm to a UTF-8 locale, so you would have to run something like export LC_ALL=en_US.UTF-8 as the first thing inside the xterm.

If you are starting xterm from a shell, it is best to simply set the locale environment to a UTF-8 locale; xterm will then switch to UTF-8 mode automatically. Example: LC_ALL=en_US.UTF-8 xterm
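Whether a UTF-8 locale is installed at all can be checked with locale -a before choosing between the options above. The following is a minimal sketch; the function name first_utf8_locale is made up for illustration, and matching on the locale name's suffix is only an approximation of what the C library and xterm actually check.

```shell
# Hypothetical helper: read locale names on stdin (for example from
# `locale -a`) and print the first one whose name ends in a UTF-8 codeset,
# such as en_US.UTF-8, de_DE.utf8 or ca_ES.UTF-8@valencia.
first_utf8_locale() {
    grep -i -E '\.utf-?8(@.*)?$' | head -n 1
}

# Usage (left as comments so that sourcing this file opens no window):
#   loc=$(locale -a | first_utf8_locale)
#   if [ -n "$loc" ]; then
#       LC_ALL=$loc xterm &      # a UTF-8 locale exists, let xterm follow it
#   else
#       xterm -en utf-8 &        # fallback for systems without such a locale
#   fi
```

The fallback branch matches the advice above that -en utf-8 is only for systems without a proper UTF-8 locale.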

editors

Before starting an editor, be sure that the LC_CTYPE category of your locale environment is set to a UTF-8 locale. You can check which locale is in force for processes newly forked from a shell with the locale command.
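As a rough sketch of that check, locale charmap prints the character encoding that results from the current LC_CTYPE setting (with LC_ALL and LANG taken into account). The helper name is_utf8_charmap is an assumption made for this example, not a standard command.

```shell
# Hypothetical helper: succeed if the codeset name passed as $1 is UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the system which codeset is in force; fall back to "unknown" if the
# locale command is missing or fails.
charmap=$(locale charmap 2>/dev/null || echo unknown)

if is_utf8_charmap "$charmap"; then
    echo "LC_CTYPE uses UTF-8, the editor can be started"
else
    echo "LC_CTYPE uses $charmap, set a UTF-8 locale first" >&2
fi
```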

joe

UTF-8 does not work in joe versions before 3.1. Version 3.1, at least the package that comes with SuSE 9.2, works fine with UTF-8.

vim

vim has supported UTF-8 for well over a year, so if you have a recent release (version 6) and everything (xterm, locale) is set up correctly, it should just work. Otherwise you might need a newer software install.

To bypass the xterm issue for the editor, you can just set the locale to a UTF-8 locale as described above and use the graphical version of vim, gvim. It works like the text-only vim, but opens a new X11 window and provides a nice menu for people who would rather use the mouse. Nobody forces you to use the mouse, though: you can work with it just as with vim in xterm, using only the keyboard.

vim and gvim also have the nice feature that if the file contains a byte sequence that is not valid UTF-8, that is, one that cannot represent a valid Unicode character, they assume the file is encoded in latin1 (ISO-8859-1) rather than UTF-8, convert it to UTF-8 in memory, and convert it back to latin1 on save. You just have to be aware of it: the conversion is indicated by a message line containing "[converted]" after reading and writing the file.
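The same decision can be approximated from the shell to predict how a file will be treated. This is only a sketch under the assumption that iconv is installed and exits with a non-zero status on an invalid input sequence; it is not how vim implements the check internally, and the function names are made up for illustration.

```shell
# Hypothetical helper: succeed if the file named by $1 decodes as UTF-8.
# Relies on iconv failing when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the encoding that the fallback rule described
# above would pick for the file: utf-8 if it decodes cleanly, else latin1.
guess_encoding() {
    if is_valid_utf8 "$1"; then
        echo utf-8
    else
        echo latin1
    fi
}

# Usage:
#   guess_encoding notes.txt
```

A file reported as latin1 here is one where the conversion message should be expected on read and write.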