UTF-8 Character Encoding


Last Updated on 2017-02-27 by Sture

Description

The LANG=xx_YY.ZZZZ environment variable sets the system locale to language code xx, country code YY, and character encoding ZZZZ. Language and country code affect default application language, number formatting, date and time formatting, string collation, currency settings, and more.
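Here is a minimal POSIX sh sketch that splits a locale name into the three parts described above. parse_locale is a hypothetical helper used only for illustration. It does not handle the optional @modifier suffix that some locale names carry.

```sh
# parse_locale NAME: hypothetical helper that splits xx_YY.ZZZZ into
# language, country and encoding. An @modifier suffix is not handled.
parse_locale() {
    name=$1
    case $name in
        *.*) base=${name%%.*}; enc=${name#*.} ;;
        *)   base=$name;       enc= ;;
    esac
    language=${base%%_*}
    case $base in
        *_*) country=${base#*_} ;;
        *)   country= ;;
    esac
    printf 'language=%s country=%s encoding=%s\n' "$language" "$country" "$enc"
}

parse_locale en_SE.UTF-8   # language=en country=SE encoding=UTF-8
parse_locale C             # language=C country= encoding=
```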

By enabling a locale that uses the UTF-8 character encoding, the system can handle any of the 1,112,064 code points in the Unicode character set, instead of only the US-ASCII characters that are the default with LANG=C.
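The number 1,112,064 is the count of Unicode code points from U+0000 to U+10FFFF (0x110000 values), minus the 2,048 surrogate code points U+D800 to U+DFFF, which UTF-8 never encodes. The shell arithmetic below checks that count. The printf lines show that UTF-8 stores the Swedish letter å (U+00E5) as two bytes, while an ASCII letter takes only one. This is a small illustration and is not part of the setup.

```sh
# Valid Unicode code points: 0x110000 in total, minus 0x800 surrogates.
echo $(( 0x110000 - 0x800 ))          # 1112064

# UTF-8 encodes å (U+00E5) as the two bytes c3 a5. They are written here
# as octal escapes, so the example does not depend on the terminal encoding.
printf '\303\245' | od -An -tx1
printf '\303\245' | wc -c             # 2 bytes

# An ASCII letter is a single byte in UTF-8.
printf 'A' | wc -c                    # 1 byte
```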

Preparation for Installation

Start PuTTY on a Windows PC, Terminal on a Mac, or a similar terminal application on a Linux PC.

In this example Terminal on a Mac is used.

Open a remote SSH session to the server with:

Mac:~ user$ ssh user@192.168.1.4 [enter]
N.B.: Replace user@192.168.1.4 with the user ID and IP address of your server!
[user@server ~]$

Enable superuser privileges with:

[user@server ~]$ sudo -s [enter]
Password: <-- passwd [enter]
[root@server /usr/home/user]#

N.B.: Enter your user password, not the root password!

Available UTF-8 Locales

Display a list of every available UTF-8 locale on your computer with:

[root@server /usr/home/user]# locale -a | grep '\.UTF-8$' [enter]
af_ZA.UTF-8
am_ET.UTF-8
.
.
.
sv_SE.UTF-8
tr_TR.UTF-8
uk_UA.UTF-8
zh_CN.UTF-8
zh_HK.UTF-8
zh_TW.UTF-8
[root@server /usr/home/user]#
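Sometimes you only need to know whether one locale is installed, for example sv_SE.UTF-8 before copying settings from it below. In that case the listing can be narrowed to an exact match. has_locale is a hypothetical helper name, and it uses only locale -a and grep.

```sh
# has_locale NAME: hypothetical helper. It succeeds only if `locale -a`
# lists exactly NAME (-x matches the whole line, -F makes the dot literal).
has_locale() {
    locale -a | grep -qxF -- "$1"
}
```

For example: has_locale sv_SE.UTF-8 && echo 'sv_SE.UTF-8 is available'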

Customize UTF-8 Locale

In this example, as a Swede, I will use English as the default language with Swedish monetary, numeric and time settings.

Create a copy of the en_US.UTF-8 directory with:

[root@server /usr/home/user]# cp -R /usr/share/locale/en_US.UTF-8 /usr/share/locale/en_SE.UTF-8 [enter]
[root@server /usr/home/user]#

…then copy the Swedish monetary and numeric settings into it with:

[root@server /usr/home/user]# cp /usr/share/locale/sv_SE.UTF-8/LC_MONETARY /usr/share/locale/en_SE.UTF-8/ [enter]
[root@server /usr/home/user]# cp /usr/share/locale/sv_SE.UTF-8/LC_NUMERIC /usr/share/locale/en_SE.UTF-8/ [enter]
[root@server /usr/home/user]#
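The copy steps above can also be wrapped in a small script. This is a minimal sketch that assumes the FreeBSD layout under /usr/share/locale, and make_custom_locale is a hypothetical name. The script refuses to run if the target directory already exists, because cp -R would then create a nested copy inside it. It also removes each category before copying it, in case the copied entry is a symbolic link (whether it is depends on the FreeBSD release). This prevents cp from writing through the link into another locale's file.

```sh
# make_custom_locale DIR BASE DONOR NEW CATEGORY...: hypothetical helper.
# Copies locale BASE to NEW, then replaces CATEGORY files with DONOR's.
make_custom_locale() {
    localedir=$1 base=$2 donor=$3 new=$4
    shift 4
    if [ -e "$localedir/$new" ]; then
        echo "make_custom_locale: $localedir/$new already exists" >&2
        return 1
    fi
    cp -R "$localedir/$base" "$localedir/$new" || return 1
    for category in "$@"; do
        # Remove the copied entry first, so a symlink is replaced
        # instead of being followed.
        rm -f "$localedir/$new/$category"
        cp "$localedir/$donor/$category" "$localedir/$new/$category" || return 1
    done
}
```

Run as root, for example: make_custom_locale /usr/share/locale en_US.UTF-8 sv_SE.UTF-8 en_SE.UTF-8 LC_MONETARY LC_NUMERIC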

Switch to a 24-hour clock in uptime, w, etc. with:

[root@server /usr/home/user]# ee /usr/share/locale/en_SE.UTF-8/LC_TIME [enter]

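Edit lines 40–44 and line 58 of the LC_TIME file as in this example. Lines 42 and 43 (the AM and PM strings) and line 58 (the 12-hour time format) are left empty. Do NOT delete the empty lines 42, 43 and 58! The file is read line by line, so every value must stay on its own line number.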

Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
January
February
March
April
May
June
July
August
September
October
November
December
Sun
Mon
Tue
Wed
Thu
Fri
Sat
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
%H:%M:%S
%Y-%m-%d
%a %e %b %X %Y


%a %e %b %Y %X %Z
January
February
March
April
May
June
July
August
September
October
November
December
md
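If you prefer not to edit LC_TIME by hand, awk can apply the same line changes. This sketch assumes the file has the 58-line layout shown above, and patch_lc_time is a hypothetical helper. It writes the result to a temporary file and then moves it into place. If the file has fewer than 58 lines, it leaves the file untouched.

```sh
# patch_lc_time FILE: hypothetical helper that applies the edits to
# lines 40-44 and 58 of an LC_TIME file without an interactive editor.
patch_lc_time() {
    file=$1
    awk '
        NR == 40 { $0 = "%Y-%m-%d" }
        NR == 41 { $0 = "%a %e %b %X %Y" }
        NR == 42 || NR == 43 { $0 = "" }   # AM and PM strings left empty
        NR == 44 { $0 = "%a %e %b %Y %X %Z" }
        NR == 58 { $0 = "" }               # 12-hour time format left empty
        { print }
        END { if (NR < 58) exit 1 }        # refuse files that are too short
    ' "$file" > "$file.new" || { rm -f "$file.new"; return 1; }
    mv "$file.new" "$file"
}
```

Usage: patch_lc_time /usr/share/locale/en_SE.UTF-8/LC_TIME. Because the result is moved into place, a symbolic link at that path is replaced by a regular file rather than written through.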

Edit the login class capability database in /etc/login.conf with:

[root@server /usr/home/user]# ee /etc/login.conf [enter]

…and add a default character set and locale as in this example. Setting LC_COLLATE to C is also recommended, because some programs still require ASCII ordering to function correctly:

default:\
:passwd_format=sha512:\
:copyright=/etc/COPYRIGHT:\
:welcome=/etc/motd:\
:setenv=MAIL=/var/mail/$,BLOCKSIZE=K,LC_COLLATE=C:\
:path=/sbin /bin /usr/sbin /usr/bin /usr/local/sbin /usr/local/bin ~/bin:\
:nologin=/var/run/nologin:\
:cputime=unlimited:\
:datasize=unlimited:\
:stacksize=unlimited:\
:memorylocked=64K:\
:memoryuse=unlimited:\
:filesize=unlimited:\
:coredumpsize=unlimited:\
:openfiles=unlimited:\
:maxproc=unlimited:\
:sbsize=unlimited:\
:vmemoryuse=unlimited:\
:swapuse=unlimited:\
:pseudoterminals=unlimited:\
:kqueues=unlimited:\
:umtxp=unlimited:\
:priority=0:\
:ignoretime@:\
:charset=UTF-8:\
:lang=en_SE.UTF-8:\
:umask=022:
.
.
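To see what ASCII (byte) ordering means in practice, sort a few letters in the C locale. Every uppercase ASCII letter (0x41 to 0x5A) sorts before every lowercase one (0x61 to 0x7A). Language locales usually interleave uppercase and lowercase instead, but the exact order depends on each locale's collation rules, so only the C result is shown here.

```sh
# LC_ALL overrides LC_COLLATE and LANG, so this forces byte-order collation.
printf '%s\n' b B a A | LC_ALL=C sort
# Output:
# A
# B
# a
# b
```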

Login shells inherit the environment variables defined here, either from the default class or from a narrower class if the user is assigned to one.
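One way to set up such a narrower class, similar to the class examples in the FreeBSD Handbook, is to add a separate class to /etc/login.conf. The class inherits everything else from default through tc=default, and users are then assigned to it. The helper below only prints such an entry. class_entry and the class name swedish are hypothetical examples and are not part of this setup.

```sh
# class_entry NAME LANG: hypothetical helper that prints a login.conf class
# which sets a UTF-8 locale and inherits all other capabilities from default.
class_entry() {
    name=$1 lang=$2
    printf '%s:\\\n' "$name"
    printf '\t:charset=UTF-8:\\\n'
    printf '\t:lang=%s:\\\n' "$lang"
    printf '\t:tc=default:\n'
}
```

For example, as root: class_entry swedish sv_SE.UTF-8 >> /etc/login.conf, then cap_mkdb /etc/login.conf, and finally pw usermod user -L swedish to put the user in that login class.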

After making these changes, rebuild the login database with:

[root@server /usr/home/user]# cap_mkdb /etc/login.conf [enter]
[user@server /usr/home/user]#

You may also have to set the new locale elsewhere, such as in /etc/profile, for non-login shells and for login managers such as GDM:

[root@server /usr/home/user]# echo 'export LANG=en_SE.UTF-8' >> /etc/profile; echo 'export CHARSET=UTF-8' >> /etc/profile [enter]
[root@server /usr/home/user]#
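The echo commands above append their lines every time they run, so running them twice leaves duplicates in /etc/profile. A minimal idempotent sketch checks for an identical line first. append_once is a hypothetical helper.

```sh
# append_once FILE LINE: hypothetical helper that appends LINE to FILE
# only if an identical line is not already there (or FILE does not exist).
append_once() {
    file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example: append_once /etc/profile 'export LANG=en_SE.UTF-8' and append_once /etc/profile 'export CHARSET=UTF-8'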


You can read more in the Using Localization chapter of the FreeBSD Handbook.

After logging in again, check your work by running:

[root@server /usr/home/user]# locale [enter]
LANG=en_SE.UTF-8
LC_CTYPE="en_SE.UTF-8"
LC_COLLATE="en_SE.UTF-8"
LC_TIME="en_SE.UTF-8"
LC_NUMERIC="en_SE.UTF-8"
LC_MONETARY="en_SE.UTF-8"
LC_MESSAGES="en_SE.UTF-8"
LC_ALL=
[root@server /usr/home/user]#
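In the output above, LC_ALL is empty, and the quoted LC_* values are derived from LANG rather than set individually. POSIX fixes the precedence: a non-empty LC_ALL overrides everything, and otherwise each LC_* variable overrides LANG. The sketch below applies that rule to the character-encoding category. effective_ctype and is_utf8_ctype are hypothetical helpers. They only inspect the variables and do not check whether the locale is actually installed.

```sh
# effective_ctype: hypothetical helper that prints the locale name deciding
# the character encoding (precedence LC_ALL > LC_CTYPE > LANG).
# Empty variables count as unset; with none set, the default is C.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# is_utf8_ctype: succeeds if that locale name carries a UTF-8 codeset.
is_utf8_ctype() {
    case $(effective_ctype) in
        *.UTF-8|*.UTF-8@*|*.utf8|*.UTF8|*.utf-8) return 0 ;;
        *) return 1 ;;
    esac
}
```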
