Clarify undisplayable characters, --unibyte, locales.

Clarify self-insertion of non-ASCII 8-bit chars. Clarify coding system detection of escape sequences. Clarify keyboard input methods and coding systems. Comment out the commands to inquire about character sets. Misc cleanups.
Date: 2001-02-17 18:12:07 +00:00
commit 4b40407a71
parent 8e375db276
1 changed file with 160 additions and 150 deletions
--- a/man/mule.texi
+++ b/man/mule.texi
@@ -42,7 +42,7 @@ have been merged from the modified version of Emacs known as MULE (for
 ``MULti-lingual Enhancement to GNU Emacs'')
  Emacs also supports various encodings of these characters used by
-internationalized software, such as word processors, mailers, etc.
+other internationalized software, such as word processors and mailers.
@menu
 * International Intro::     Basic concepts of multibyte characters.
@@ -80,16 +80,31 @@ cases) in the @kbd{C-q} command (@pxref{Multibyte Conversion}).
@kindex C-h h
@findex view-hello-file
@cindex undisplayable characters
-@cindex ?
+@cindex @samp{?} in display
@cindex ??
  The command @kbd{C-h h} (@code{view-hello-file}) displays the file
@file{etc/HELLO}, which shows how to say ``hello'' in many languages.
-This illustrates various scripts.  If the font you're using doesn't have
+This illustrates various scripts.  If some characters can't be
-characters for all those different languages, you will see some hollow
+displayed on your terminal, they appear as @samp{?} or as hollow boxes
-boxes instead of characters; see @ref{Fontsets}.  On non-windowing
+(@pxref{Undisplayable Characters}).
-displays, @samp{?} is displayed in place of the hollow box.  More than
+
-one @samp{?} is displayed for undisplayable characters that are wider
+  Keyboards, even in the countries where these character sets are used,
-than one column.
+generally don't have keys for all the characters in them.  So Emacs
+supports various @dfn{input methods}, typically one for each script or
+language, to make it convenient to type them.
+@kindex C-x RET
+  The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
+to multibyte characters, coding systems, and input methods.
+@ignore
+@c This is commented out because it doesn't fit here, or anywhere.
+@c This manual does not discuss "character sets" as they
+@c are used in Mule, and it makes no sense to mention these commands
+@c except as part of a larger discussion of the topic.
+@c But it is not clear that topic is worth mentioning here,
+@c since that is more of an implementation concept
+@c than a user-level concept.  And when we switch to Unicode,
+@c character sets in the current sense may not even exist.
@findex list-charset-chars
@cindex characters in a certain charset
@@ -101,15 +116,7 @@ character set, and displays all the characters in that character set.
  The command @kbd{M-x describe-character-set} prompts for a character
 set name and displays information about that character set, including
 its internal representation within Emacs.
-
+@end ignore
-  Keyboards, even in the countries where these character sets are used,
-generally don't have keys for all the characters in them.  So Emacs
-supports various @dfn{input methods}, typically one for each script or
-language, to make it convenient to type them.
-@kindex C-x RET
-  The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
-to multibyte characters, coding systems, and input methods.
@node Enabling Multibyte
@section Enabling Multibyte Characters
@@ -153,16 +160,22 @@ have basically the same effect as @samp{--unibyte}.
@cindex unibyte operation, and Lisp files
@cindex init file, and non-ASCII characters
@cindex environment variables, and non-ASCII characters
-  Multibyte strings are not created during initialization from the
+  With @samp{--unibyte}, multibyte strings are not created during
-values of environment variables, @file{/etc/passwd} entries etc.@: that
+initialization from the values of environment variables,
-contain non-ASCII 8-bit characters.  However, Lisp files, when they are
+@file{/etc/passwd} entries etc.@: that contain non-ASCII 8-bit
-loaded for running, and in particular the initialization file
+characters.
-@file{.emacs}, are normally read as multibyte---even with
+
-@samp{--unibyte}.  To avoid multibyte strings being generated by
+  Emacs normally loads Lisp files as multibyte, regardless of whether
-non-ASCII characters in Lisp files, put @samp{-*-unibyte: t;-*-} in a
+you used @samp{--unibyte}.  This includes the Emacs initialization
-comment on the first line, or specify the coding system @samp{raw-text}
+file, @file{.emacs}, and the initialization files of Emacs packages
-with @kbd{C-x @key{RET} c}.  Do the same for initialization files for
+such as Gnus.  However, you can specify unibyte loading for a
-packages like Gnus.
+particular Lisp file, by putting @samp{-*-unibyte: t;-*-} in a comment
+on the first line.  Then that file is always loaded as unibyte text,
+even if you did not start Emacs with @samp{--unibyte}.  The motivation
+for these conventions is that it is more reliable to always load any
+particular Lisp file in the same way.  However, you can load a Lisp
+file as unibyte, on any one occasion, by typing @kbd{C-x @key{RET} c
+raw-text @key{RET}} immediately before loading it.
  The mode line indicates whether multibyte character support is enabled
 in the current buffer.  If it is, there are two or more characters (most
@@ -206,13 +219,12 @@ sign), Polish, Romanian, Slovak, Slovenian, Thai, Tibetan, Turkish,
 Dutch, Spanish, and Vietnamese.
@end quotation
-@cindex fonts, for displaying different languages
+@cindex fonts for various scripts
-  To be able to display the script(s) used by your language environment
+  To display the script(s) used by your language environment on a
-on a windowed display, you need to have a suitable font installed.  If
+graphical display, you need to have a suitable font.  If some of the
-some of the characters appear as empty boxes, download and install the
+characters appear as empty boxes, you should install the GNU Intlfonts
-GNU Intlfonts distribution, which includes fonts for all supported
+package, which includes fonts for all supported scripts.
-scripts.  @xref{Fontsets}, for more details about setting up your
+@xref{Fontsets}, for more details about setting up your fonts.
-fonts.
@findex set-locale-environment
@vindex locale-language-names
@@ -220,31 +232,21 @@ fonts.
@cindex locales
  Some operating systems let you specify the language you are using by
 setting the locale environment variables @env{LC_ALL}, @env{LC_CTYPE},
-and @env{LANG}; the first of these which is nonempty specifies your
+or @env{LANG}.@footnote{If more than one of these is set, the first
-locale.  Emacs handles this during startup by invoking the
+one that is nonempty specifies your locale for this purpose.}  Emacs
-@code{set-locale-environment} function, which matches your locale
+handles this during startup by matching your locale against entries in
-against entries in the value of the variable
+the value of the variables @code{locale-charset-language-names} and
@code{locale-language-names} and selects the corresponding language
-environment if a match is found.  But if your locale also matches an
+environment if a match is found.  (The former variable overrides the
-entry in the variable @code{locale-charset-language-names}, this entry
+latter.)  It also adjusts the display table and terminal coding
-is preferred if its character set disagrees.  For example, suppose the
+system, the locale coding system, and the preferred coding system as
-locale @samp{en_GB.ISO8859-15} matches @code{"Latin-1"} in
+needed for the locale.
-@code{locale-language-names} and @code{"Latin-9"} in
-@code{locale-charset-language-names}; since these two language
-environments' character sets disagree, Emacs uses @code{"Latin-9"}.
-  If all goes well, the @code{set-locale-environment} function selects
+  If you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG}
-the language environment, since language is part of locale.  It also
+environment variables while running Emacs, you may want to invoke the
-adjusts the display table and terminal coding system, the locale coding
+@code{set-locale-environment} function afterwards to readjust the
-system, and the preferred coding system as needed for the locale.
+language environment from the new locale.
-  Since the @code{set-locale-environment} function is automatically
-invoked during startup, you normally do not need to invoke it yourself.
-However, if you modify the @env{LC_ALL}, @env{LC_CTYPE}, or @env{LANG}
-environment variables, you may want to invoke the
-@code{set-locale-environment} function afterwards.
@findex set-locale-environment
@vindex locale-preferred-coding-systems
  The @code{set-locale-environment} function normally uses the preferred
 coding system established by the language environment to decode system
@@ -255,10 +257,10 @@ matches @code{japanese-shift-jis} in
@code{locale-preferred-coding-systems}, Emacs uses that encoding even
 though it might normally use @code{japanese-iso-8bit}.
-  The environment chosen from the locale when Emacs starts is
+  You can override the language environment chosen at startup with
-overidden by any explicit use of the command
+explicit use of the command @code{set-language-environment}, or with
-@code{set-language-environment} or customization of
+customization of @code{current-language-environment} in your init
-@code{current-language-environment} in your init file.
+file.
@kindex C-h L
@findex describe-language-environment
@@ -369,8 +371,10 @@ characters to type next is displayed in the echo area (but not when you
 are in the minibuffer).
@cindex Leim package
-Input methods are implemented in the separate Leim package, which must
+  Input methods are implemented in the separate Leim package: they are
-be installed with Emacs.
+available only if the system administrator used Leim when building
+Emacs.  If Emacs was built without Leim, you will find that no input
+methods are defined.
@node Select Input Method
@section Selecting an Input Method
@@ -443,11 +447,12 @@ method, including the string that stands for it in the mode line.
 through 0377 (octal) are not really legitimate in the buffer.  The valid
 non-ASCII printing characters have codes that start from 0400.
-  If you type a self-inserting character in the range 0240
+  If you type a self-inserting character in the range 0240 through
-through 0377, Emacs assumes you intended to use one of the ISO
+0377, or if you use @kbd{C-q} to insert one, Emacs assumes you
-Latin-@var{n} character sets, and converts it to the Emacs code
+intended to use one of the ISO Latin-@var{n} character sets, and
-representing that Latin-@var{n} character.  You select @emph{which} ISO
+converts it to the Emacs code representing that Latin-@var{n}
-Latin character set to use through your choice of language environment
+character.  You select @emph{which} ISO Latin character set to use
+through your choice of language environment
@iftex
 (see above).
@end iftex
@@ -456,13 +461,12 @@ Latin character set to use through your choice of language environment
@end ifinfo
 If you do not specify a choice, the default is Latin-1.
-  The same thing happens when you use @kbd{C-q} to enter an octal code
+  If you insert a character in the range 0200 through 0237, which
-in this range.  If you enter a code in the range 0200 through 0237,
+forms the @code{eight-bit-control} character set, it is inserted
-which forms the @code{eight-bit-control} character set, it is inserted
 literally.  You should normally avoid doing this since buffers
 containing such characters have to be written out in either the
-@code{emacs-mule} or @code{raw-text} coding system, which is usually not
+@code{emacs-mule} or @code{raw-text} coding system, which is usually
-what you want.
+not what you want.
@node Coding Systems
@section Coding Systems
@@ -652,24 +656,24 @@ to non-@code{nil}.
@cindex escape sequences in files
  By default, the automatic detection of coding system is sensitive to
 escape sequences.  If Emacs sees a sequence of characters that begin
-with an @key{ESC} character, and the sequence is valid as an ISO-2022
+with an escape character, and the sequence is valid as an ISO-2022
-code, the code is determined as one of ISO-2022 encoding, and the file
+code, that tells Emacs to use one of the ISO-2022 encodings to decode
-is decoded by the corresponding coding system
+the file.
-(e.g. @code{iso-2022-7bit}).
-  However, there may be cases that you want to read escape sequences in
+  However, there may be cases in which you want to read escape sequences
-a file as is.  In such a case, you can set th variable
+in a file as is.  In such a case, you can set the variable
@code{inhibit-iso-escape-detection} to non-@code{nil}.  Then the code
-detection will ignore any escape sequences, and so no file is detected
+detection ignores any escape sequences, and never uses an ISO-2022
-as being encoded in some of ISO-2022 encoding.  The result is that all
+encoding.  The result is that all escape sequences become visible in
-escape sequences become visible in a buffer.
+the buffer.
  The default value of @code{inhibit-iso-escape-detection} is
-@code{nil}, and it is strongly recommended not to change it.  That's
+@code{nil}.  We recommend that you not change it permanently, only for
-because many Emacs Lisp source files that contain non-ASCII characters
+one specific operation.  That's because many Emacs Lisp source files
-are encoded in the coding system @code{iso-2022-7bit} in the Emacs
+that contain non-ASCII characters are encoded in the coding system
-distribution, and they won't be decoded correctly when you visit those
+@code{iso-2022-7bit} in the Emacs distribution, and they won't be
-files if you suppress the escape sequence detection.
+decoded correctly when you visit those files if you suppress the
+escape sequence detection.
@vindex coding
  You can specify the coding system for a particular file using the
@@ -700,33 +704,34 @@ a different coding system, you can specify a different coding system for
 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify
 Coding}).
-  While editing a file, you will sometimes insert characters which
+  You can insert any possible character into any Emacs buffer, but
-cannot be encoded with the coding system stored in
+most coding systems can only handle some of the possible characters.
-@code{buffer-file-coding-system}.  For example, suppose you start with
+This means that you can insert characters that cannot be encoded with
-an ASCII file and insert a few Latin-1 characters into it.  Or you could
+the coding system that will be used to save the buffer.  For example,
-edit a text file in Polish encoded in @code{iso-8859-2} and add to it
+you could start with an ASCII file and insert a few Latin-1 characters
-translations of several Polish words into Russian.  When you save the
+into it, or you could edit a text file in Polish encoded in
-buffer, Emacs can no longer use the previous value of the buffer's
+@code{iso-8859-2} and add to it translations of several Polish words
-coding system, because the characters you added cannot be encoded by
+into Russian.  When you save the buffer, Emacs cannot use the current
-that coding system.
+value of @code{buffer-file-coding-system}, because the characters you
+added cannot be encoded by that coding system.
  When that happens, Emacs tries the most-preferred coding system (set
 by @kbd{M-x prefer-coding-system} or @kbd{M-x
-set-language-environment}), and if that coding system can safely encode
+set-language-environment}), and if that coding system can safely
-all of the characters in the buffer, Emacs uses it, and stores its value
+encode all of the characters in the buffer, Emacs uses it, and stores
-in @code{buffer-file-coding-system}.  Otherwise, Emacs pops up a window
+its value in @code{buffer-file-coding-system}.  Otherwise, Emacs
-with a list of coding systems suitable for encoding the buffer, and
+displays a list of coding systems suitable for encoding the buffer's
-prompts you to choose one of those coding systems.
+contents, and asks you to choose one of those coding systems.
-  If you insert characters which cannot be encoded by the buffer's
+  If you insert the unsuitable characters in a mail message, Emacs
-coding system while editing a mail message, Emacs behaves a bit
+behaves a bit differently.  It additionally checks whether the
-differently.  It additionally checks whether the most-preferred coding
+most-preferred coding system is recommended for use in MIME messages;
-system is recommended for use in MIME messages; if it isn't, Emacs tells
+if it isn't, Emacs tells you that the most-preferred coding system is
-you that the most-preferred coding system is not recommended and prompts
+not recommended and prompts you for another coding system.  This is so
-you for another coding system.  This is so you won't inadvertently send
+you won't inadvertently send a message encoded in a way that your
-a message encoded in a way that your recipient's mail software will have
+recipient's mail software will have difficulty decoding.  (If you do
-difficulty decoding.  (If you do want to use the most-preferred coding
+want to use the most-preferred coding system, you can type its name to
-system, you can type its name to Emacs prompt anyway.)
+the Emacs prompt anyway.)
@vindex sendmail-coding-system
  When you send a message with Mail mode (@pxref{Sending Mail}), Emacs has
@@ -916,13 +921,14 @@ name, or it may get an error.  If such a problem happens, use @kbd{C-x
 C-w} to specify a new file name for that buffer.
@vindex locale-coding-system
-  The variable @code{locale-coding-system} specifies a coding system to
+  The variable @code{locale-coding-system} specifies a coding system
-use when encoding and decoding system strings such as system error
+to use when encoding and decoding system strings such as system error
-messages and @code{format-time-string} formats and time stamps.  This
+messages and @code{format-time-string} formats and time stamps.  You
-coding system should be compatible with the underlying system's coding
+should choose a coding system that is compatible with the underlying
-system, which is normally specified by the first environment variable in
+system's text representation, which is normally specified by one of
-the list @env{LC_ALL}, @env{LC_CTYPE}, @env{LANG} whose value is
+the environment variables @env{LC_ALL}, @env{LC_CTYPE}, and
-nonempty.
+@env{LANG}.  (The first one whose value is nonempty is the one that
+determines the text representation.)
@node Fontsets
@section Fontsets
@@ -941,7 +947,7 @@ specifying its name, anywhere that you could use a single font.  Of
 course, Emacs fontsets can use only the fonts that the X server
 supports; if certain characters appear on the screen as hollow boxes,
 this means that the fontset in use for them has no font for those
-characters.@footnote{The installation instructions have information on
+characters.@footnote{The Emacs installation instructions have information on
 additional font support.}
  Emacs creates two fontsets automatically: the @dfn{standard fontset}
@@ -1099,23 +1105,27 @@ call this function explicitly to create a fontset.
@node Undisplayable Characters
@section Undisplayable Characters
-Your terminal may not be able to display some non-@sc{ascii} characters.
+  Your terminal may be unable to display some non-@sc{ascii}
-Most non-windowing terminals can only use a single character set,
+characters.  Most non-windowing terminals can only use a single
-specified by the variable @code{default-terminal-coding-system}
+character set (use the variable @code{default-terminal-coding-system}
-(@pxref{Specify Coding}) and characters which can't be encoded in it are
+(@pxref{Specify Coding}) to tell Emacs which one); characters which
-displayed as @samp{?} by default.  Windowing terminals may not have the
+can't be encoded in that coding system are displayed as @samp{?} by
-necessary font available to display a given character and display a
+default.
-hollow box instead.  You can change the default behavior.
-If you use Latin-1 characters but your terminal can't display Latin-1,
+  Windowing terminals can display a broader range of characters, but
-you can arrange to display mnemonic @sc{ascii} sequences instead, e.g.@:
+you may not have fonts installed for all of them; characters that have
-@samp{"o} for o-umlaut.  Load the library @file{iso-ascii} to do this.
+no font appear as a hollow box.
-If your terminal can display Latin-1, you can display characters from
+  If you use Latin-1 characters but your terminal can't display
-other European character sets using a mixture of equivalent Latin-1
+Latin-1, you can arrange to display mnemonic @sc{ascii} sequences
-characters and @sc{ascii} mnemonics.  Use the Custom option
+instead, e.g.@: @samp{"o} for o-umlaut.  Load the library
-@code{latin1-display} to enable this.  The mnemonic @sc{ascii} sequences
+@file{iso-ascii} to do this.
-mostly correspond to those of the prefix input methods.
+
+  If your terminal can display Latin-1, you can display characters
+from other European character sets using a mixture of equivalent
+Latin-1 characters and @sc{ascii} mnemonics.  Use the Custom option
+@code{latin1-display} to enable this.  The mnemonic @sc{ascii}
+sequences mostly correspond to those of the prefix input methods.
@node Single-Byte Character Support
@section Single-byte Character Set Support
@@ -1172,18 +1182,18 @@ characters:
@findex set-keyboard-coding-system
@vindex keyboard-coding-system
 If your keyboard can generate character codes 128 and up, representing
-non-ASCII characters, use the command @code{M-x
+non-ASCII characters, you can type those character codes directly.
-set-keyboard-coding-system} or the Custom option
-@code{keyboard-coding-system} to specify this in the same way as for
-multibyte usage (@pxref{Specify Coding}).
-It is not necessary to do this under a window system which can
+On a windowing terminal, you should not need to do anything special to
-distinguish 8-bit characters and Meta keys.  If you do this on a normal
+use these keys; they should simply work.  On a text-only terminal, you
-terminal, you will probably need to use @kbd{ESC} to type Meta
+should use the command @code{M-x set-keyboard-coding-system} or the
-characters.@footnote{In some cases, such as the Linux console and
+Custom option @code{keyboard-coding-system} to specify which coding
-@code{xterm}, you can arrange for Meta to be converted to @kbd{ESC} and
+system your keyboard uses (@pxref{Specify Coding}).  Enabling this
-still be able type 8-bit characters present directly on the keyboard or
+feature will probably require you to use @kbd{ESC} to type Meta
-using @kbd{Compose} or @kbd{AltGr} keys.}  @xref{User Input}.
+characters; however, on a Linux console or in @code{xterm}, you can
+arrange for Meta to be converted to @kbd{ESC} and still be able to type
+8-bit characters present directly on the keyboard or using
+@kbd{Compose} or @kbd{AltGr} keys.  @xref{User Input}.
@item
 You can use an input method for the selected language environment.
@@ -1205,7 +1215,7 @@ and in any other context where a key sequence is allowed.
 library is loaded, the @key{ALT} modifier key, if you have one, serves
 the same purpose as @kbd{C-x 8}; use @key{ALT} together with an accent
 character to modify the following letter.  In addition, if you have keys
-for the Latin-1 ``dead accent characters'', they too are defined to
+for the Latin-1 ``dead accent characters,'' they too are defined to
 compose with the following character, once @code{iso-transl} is loaded.
 Use @kbd{C-x 8 C-h} to list the available translations as mnemonic
 command names.
@@ -1215,9 +1225,9 @@ command names.
@cindex ISO Accents mode
@findex iso-accents-mode
@cindex Latin-1, Latin-2 and Latin-3 input mode
-For Latin-1, Latin-2 and Latin-3, @kbd{M-x iso-accents-mode} installs a
+For Latin-1, Latin-2 and Latin-3, @kbd{M-x iso-accents-mode} installs
-minor mode which provides a facility like the @code{latin-1-prefix}
+a minor mode which works much like the @code{latin-1-prefix} input
-input method but independent of the Leim package.  This mode is
+method but does not depend on having the input methods installed.  This
-buffer-local.  It can be customized for various languages with @kbd{M-x
+mode is buffer-local.  It can be customized for various languages with
-iso-accents-customize}.
+@kbd{M-x iso-accents-customize}.
@end itemize