mirror of https://git.savannah.gnu.org/git/emacs.git synced 2025-01-16 17:19:41 +00:00

Non-ASCII in regexp ranges.

Dave Love 2000-10-13 16:36:35 +00:00
parent 40ad3db491
commit 6cc089d2ad


@@ -311,10 +311,17 @@ matches both @samp{]} and @samp{-}.
To include @samp{^} in a character alternative, put it anywhere but at
the beginning.
-The beginning and end of a range must be in the same character set
-(@pxref{Character Sets}). Thus, @samp{[a-\x8e0]} is invalid because
-@samp{a} is in the @sc{ascii} character set but the character 0x8e0
-(@samp{a} with grave accent) is in the Emacs character set for Latin-1.
+The beginning and end of a range of multibyte characters must be in the
+same character set (@pxref{Character Sets}). Thus, @samp{[\x8e0-\x97c]}
+is invalid because character 0x8e0 (@samp{a} with grave accent) is in
+the Emacs character set for Latin-1 but the character 0x97c (@samp{u}
+with diaeresis) is in the Emacs character set for Latin-2.
+
+If a range starts with a unibyte character @var{c} and ends with a
+multibyte character @var{c2}, the range is divided into two parts: one
+is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
+@var{c1} is the first character of the charset to which @var{c2}
+belongs.
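The range rules in the added paragraphs can be modelled roughly as follows. This is a minimal sketch, not Emacs's implementation. The names `CHARSETS`, `charset_of`, and `split_range` are hypothetical, and the charset boundaries in the table are assumptions chosen only so that 0x8e0 falls in Latin-1 and 0x97c in Latin-2, as the text states.

```python
# Sketch of the regexp range rules described in the manual text.
# All names and the charset boundaries below are illustrative assumptions.

UNIBYTE_MAX = 0o377  # ?\377, the last unibyte character

# Hypothetical table: (first, last) character code of each multibyte charset.
CHARSETS = {
    "latin-1": (0x8A0, 0x8FF),  # assumed bounds; contains 0x8e0
    "latin-2": (0x920, 0x97F),  # assumed bounds; contains 0x97c
}


def charset_of(code):
    """Return the name of the multibyte charset containing code, or None."""
    for name, (first, last) in CHARSETS.items():
        if first <= code <= last:
            return name
    return None


def split_range(c, c2):
    """Return the (start, end) subranges that the range c-c2 stands for.

    Raises ValueError for ranges the text describes as invalid.
    """
    if c > c2:
        raise ValueError("range start is greater than range end")
    start_cs, end_cs = charset_of(c), charset_of(c2)
    if c <= UNIBYTE_MAX and end_cs is not None:
        # Unibyte start, multibyte end: c..?\377 plus c1..c2, where c1 is
        # the first character of the charset that c2 belongs to.
        c1 = CHARSETS[end_cs][0]
        return [(c, UNIBYTE_MAX), (c1, c2)]
    if start_cs is not None and end_cs is not None:
        # Both ends multibyte: they must share one charset.
        if start_cs != end_cs:
            raise ValueError("range ends are in different character sets")
        return [(c, c2)]
    if c2 <= UNIBYTE_MAX:
        # Plain unibyte range.
        return [(c, c2)]
    raise ValueError("character is not in any known charset")
```

Under these assumptions, `split_range(ord("a"), 0x8E0)` yields the two parts described above, while `split_range(0x8E0, 0x97C)` raises `ValueError` because the two ends belong to different charsets.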
You cannot always match all non-@sc{ascii} characters with the regular
expression @samp{[\200-\377]}. This works when searching a unibyte