mirror of https://git.savannah.gnu.org/git/emacs.git synced 2024-11-24 07:20:37 +00:00

Update the bidirectional reordering engine for Unicode 6.3 and 7.0.

src/bidi.c (bidi_ignore_explicit_marks_for_paragraph_level): Remove
 variable.
 (bidi_get_type): Return the isolate initiators and terminator
 types.
 (bidi_isolate_fmt_char, bidi_paired_bracket_type)
 (bidi_fetch_char_skip_isolates, find_first_strong_char)
 (bidi_find_bracket_pairs, bidi_resolve_brackets): New functions.
 (bidi_set_sos_type): Renamed from bidi_set_sor_type and updated
 for the new features.
 (bidi_push_embedding_level, bidi_pop_embedding_level): Update to
 push and pop correctly for isolates.
 (bidi_remember_char): Modified to accept an additional argument
 and record the bidi type according to its value.
 (bidi_cache_iterator_state): Accept an additional argument to only
 update an existing state.  Handle the new members of struct bidi_it.
 (bidi_cache_find): Arguments changed: no longer accepts a level,
 instead accepts a flag telling it whether it is okay to return
 unresolved neutrals.
 (bidi_initialize): Initiate and staticpro the bracket-type uniprop
 table.  Initialize new isolate-related members.
 (bidi_paragraph_init): Some code factored out into
 find_first_strong_char.
 (bidi_resolve_explicit_1): Function deleted, its code incorporated
 into bidi_resolve_explicit.
 (bidi_resolve_explicit): Support the isolate initiators and
 terminator.  Fix handling of embeddings and overrides according to
 new UBA requirements.  Record information about previously seen
 characters here (moved from bidi_level_of_next_char).
 (bidi_resolve_weak): Adapt to changes in struct members.
 (FLAG_EMBEDDING_INSIDE, FLAG_OPPOSITE_INSIDE, MAX_BPA_STACK)
 (STORE_BRACKET_CHARPOS, PUSH_BPA_STACK): New macros.
 (bidi_resolve_neutral): Call bidi_resolve_brackets to handle the
 paired bracket resolution.  Handle isolate initiators and
 terminator.
 (bidi_type_of_next_char): Remove unneeded code for BN limit.
 (bidi_level_of_next_char): Move the code that records information
 about previous characters to bidi_resolve_explicit.  Fix logic of
 resolving neutrals and make sure their cache entries are updated.
 Remove now unneeded special handling of PDF level.
 src/dispextern.h (struct glyph): Enlarge the width of resolved_level.
 (BIDI_MAXDEPTH): New macro, renamed from BIDI_MAXLEVEL and
 enlarged per Unicode 6.3.
 (enum bidi_bracket_type_t): New data type.
 (struct bidi_saved_info): Leave only 2 type members out of 4.
 Remove bytepos.
 (struct bidi_stack): Add members necessary to support isolating
 sequences.
 (struct bidi_it): Add new members necessary to support isolating
 sequences and bracket pair resolution.
 src/xdisp.c (Fbidi_resolved_levels): New function.
 (syms_of_xdisp): Defsubr it.
 (append_glyph, append_composite_glyph, produce_image_glyph)
 (append_stretch_glyph, append_glyphless_glyph): Convert aborts to
 assertions.
 (syms_of_xdisp) <inhibit-bidi-mirroring>: New variable.
 src/term.c (append_glyph, append_composite_glyph)
 (append_glyphless_glyph): Convert aborts to assertions.
 src/.gdbinit (pgx): Display the character codepoint, resolved level,
 and bidi type also for glyphless glyphs.

 lisp/simple.el (what-cursor-position): Update to support the new bidi
 characters.
 lisp/descr-text.el (describe-char): Update to support the new bidi
 characters.

 admin/unidata/unidata-gen.el (unidata-prop-alist): New properties
 'paired-bracket' and 'bracket-type', in support of the UBA 6.3.
 (unidata-gen-table): Support PROP-IDX being a function.
 (unidata-describe-bidi-bracket-type, unidata-gen-brackets-list)
 (unidata-gen-bracket-type-list): New functions.
 (unidata-check): Support checking the 'bracket-type' attribute.
 (unidata-gen-files): Don't create backups for uni-*.el files.
 admin/unidata/Makefile.in (${unidir}/charprop.el): Depend on
 BidiMirroring.txt and BidiBrackets.txt.
 admin/unidata/BidiBrackets.txt: New file, from Unicode.

 etc/NEWS: Mention the UBA implementation update.
 etc/HELLO: Remove now unneeded directional control characters.

 doc/lispref/nonascii.texi (Character Properties): Document the new
 properties 'bracket-type' and 'paired-bracket'.
 doc/lispref/display.texi (Bidirectional Display): Update the version of the
 UBA to which we are conforming.

 test/BidiCharacterTest.txt: New file, from Unicode.
 test/biditest.el: New file.
Eli Zaretskii 2014-10-15 17:11:25 +03:00
commit ed7ebd933a
23 changed files with 98173 additions and 633 deletions

.gitignore vendored
@ -21,3 +21,4 @@ etc/refcards/*.aux
etc/refcards/*.log
info/dir
info/*.info
test/biditest.txt

@ -1,3 +1,18 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
* unidata/unidata-gen.el (unidata-prop-alist): New properties
'paired-bracket' and 'bracket-type', in support of the UBA 6.3.
(unidata-gen-table): Support PROP-IDX being a function.
(unidata-describe-bidi-bracket-type, unidata-gen-brackets-list)
(unidata-gen-bracket-type-list): New functions.
(unidata-check): Support checking the 'bracket-type' attribute.
(unidata-gen-files): Don't create backups for uni-*.el files.
* unidata/Makefile.in (${unidir}/charprop.el): Depend on
BidiMirroring.txt and BidiBrackets.txt.
* unidata/BidiBrackets.txt: New file, from Unicode.
2014-10-13 Glenn Morris <rgm@gnu.org>
* authors.el (authors-aliases, authors-fixed-case)

@ -0,0 +1,176 @@
# BidiBrackets-7.0.0.txt
# Date: 2014-01-21, 02:30:00 GMT [AG, LI, KW]
#
# Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type Properties
#
# This file is a normative contributory data file in the Unicode
# Character Database.
#
# Copyright (c) 1991-2014 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# Bidi_Paired_Bracket is a normative property of type Miscellaneous,
# which establishes a mapping between characters that are treated as
# bracket pairs by the Unicode Bidirectional Algorithm.
#
# Bidi_Paired_Bracket_Type is a normative property of type Enumeration,
# which classifies characters into opening and closing paired brackets
# for the purposes of the Unicode Bidirectional Algorithm.
#
# This file lists the set of code points with Bidi_Paired_Bracket_Type
# property values Open and Close. The set is derived from the character
# properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
# and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
# form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
# Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
# vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
# Open (o) and Close (c), respectively.
#
# For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
# U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
# and therefore do not form a bracket pair.
#
# The Unicode property value stability policy guarantees that characters
# which have bpt=o or bpt=c also have bc=ON and Bidi_M=Y. As a result, an
# implementation can optimize the lookup of the Bidi_Paired_Bracket_Type
# property values Open and Close by restricting the processing to characters
# with bc=ON.
#
# The format of the file is three fields separated by a semicolon.
# Field 0: Unicode code point value, represented as a hexadecimal value
# Field 1: Bidi_Paired_Bracket property value, a code point value or <none>
# Field 2: Bidi_Paired_Bracket_Type property value, one of the following:
# o Open
# c Close
# n None
# The names of the characters in field 0 are given in comments at the end
# of each line.
#
# For information on bidirectional paired brackets, see UAX #9: Unicode
# Bidirectional Algorithm, at http://www.unicode.org/unicode/reports/tr9/
#
# This file was originally created by Andrew Glass and Laurentiu Iancu
# for Unicode 6.3.
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
007B; 007D; o # LEFT CURLY BRACKET
007D; 007B; c # RIGHT CURLY BRACKET
0F3A; 0F3B; o # TIBETAN MARK GUG RTAGS GYON
0F3B; 0F3A; c # TIBETAN MARK GUG RTAGS GYAS
0F3C; 0F3D; o # TIBETAN MARK ANG KHANG GYON
0F3D; 0F3C; c # TIBETAN MARK ANG KHANG GYAS
169B; 169C; o # OGHAM FEATHER MARK
169C; 169B; c # OGHAM REVERSED FEATHER MARK
2045; 2046; o # LEFT SQUARE BRACKET WITH QUILL
2046; 2045; c # RIGHT SQUARE BRACKET WITH QUILL
207D; 207E; o # SUPERSCRIPT LEFT PARENTHESIS
207E; 207D; c # SUPERSCRIPT RIGHT PARENTHESIS
208D; 208E; o # SUBSCRIPT LEFT PARENTHESIS
208E; 208D; c # SUBSCRIPT RIGHT PARENTHESIS
2308; 2309; o # LEFT CEILING
2309; 2308; c # RIGHT CEILING
230A; 230B; o # LEFT FLOOR
230B; 230A; c # RIGHT FLOOR
2329; 232A; o # LEFT-POINTING ANGLE BRACKET
232A; 2329; c # RIGHT-POINTING ANGLE BRACKET
2768; 2769; o # MEDIUM LEFT PARENTHESIS ORNAMENT
2769; 2768; c # MEDIUM RIGHT PARENTHESIS ORNAMENT
276A; 276B; o # MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
276B; 276A; c # MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
276C; 276D; o # MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
276D; 276C; c # MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
276E; 276F; o # HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
276F; 276E; c # HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
2770; 2771; o # HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
2771; 2770; c # HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
2772; 2773; o # LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
2773; 2772; c # LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
2774; 2775; o # MEDIUM LEFT CURLY BRACKET ORNAMENT
2775; 2774; c # MEDIUM RIGHT CURLY BRACKET ORNAMENT
27C5; 27C6; o # LEFT S-SHAPED BAG DELIMITER
27C6; 27C5; c # RIGHT S-SHAPED BAG DELIMITER
27E6; 27E7; o # MATHEMATICAL LEFT WHITE SQUARE BRACKET
27E7; 27E6; c # MATHEMATICAL RIGHT WHITE SQUARE BRACKET
27E8; 27E9; o # MATHEMATICAL LEFT ANGLE BRACKET
27E9; 27E8; c # MATHEMATICAL RIGHT ANGLE BRACKET
27EA; 27EB; o # MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
27EB; 27EA; c # MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
27EC; 27ED; o # MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
27ED; 27EC; c # MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
27EE; 27EF; o # MATHEMATICAL LEFT FLATTENED PARENTHESIS
27EF; 27EE; c # MATHEMATICAL RIGHT FLATTENED PARENTHESIS
2983; 2984; o # LEFT WHITE CURLY BRACKET
2984; 2983; c # RIGHT WHITE CURLY BRACKET
2985; 2986; o # LEFT WHITE PARENTHESIS
2986; 2985; c # RIGHT WHITE PARENTHESIS
2987; 2988; o # Z NOTATION LEFT IMAGE BRACKET
2988; 2987; c # Z NOTATION RIGHT IMAGE BRACKET
2989; 298A; o # Z NOTATION LEFT BINDING BRACKET
298A; 2989; c # Z NOTATION RIGHT BINDING BRACKET
298B; 298C; o # LEFT SQUARE BRACKET WITH UNDERBAR
298C; 298B; c # RIGHT SQUARE BRACKET WITH UNDERBAR
298D; 2990; o # LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
298E; 298F; c # RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
298F; 298E; o # LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
2990; 298D; c # RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
2991; 2992; o # LEFT ANGLE BRACKET WITH DOT
2992; 2991; c # RIGHT ANGLE BRACKET WITH DOT
2993; 2994; o # LEFT ARC LESS-THAN BRACKET
2994; 2993; c # RIGHT ARC GREATER-THAN BRACKET
2995; 2996; o # DOUBLE LEFT ARC GREATER-THAN BRACKET
2996; 2995; c # DOUBLE RIGHT ARC LESS-THAN BRACKET
2997; 2998; o # LEFT BLACK TORTOISE SHELL BRACKET
2998; 2997; c # RIGHT BLACK TORTOISE SHELL BRACKET
29D8; 29D9; o # LEFT WIGGLY FENCE
29D9; 29D8; c # RIGHT WIGGLY FENCE
29DA; 29DB; o # LEFT DOUBLE WIGGLY FENCE
29DB; 29DA; c # RIGHT DOUBLE WIGGLY FENCE
29FC; 29FD; o # LEFT-POINTING CURVED ANGLE BRACKET
29FD; 29FC; c # RIGHT-POINTING CURVED ANGLE BRACKET
2E22; 2E23; o # TOP LEFT HALF BRACKET
2E23; 2E22; c # TOP RIGHT HALF BRACKET
2E24; 2E25; o # BOTTOM LEFT HALF BRACKET
2E25; 2E24; c # BOTTOM RIGHT HALF BRACKET
2E26; 2E27; o # LEFT SIDEWAYS U BRACKET
2E27; 2E26; c # RIGHT SIDEWAYS U BRACKET
2E28; 2E29; o # LEFT DOUBLE PARENTHESIS
2E29; 2E28; c # RIGHT DOUBLE PARENTHESIS
3008; 3009; o # LEFT ANGLE BRACKET
3009; 3008; c # RIGHT ANGLE BRACKET
300A; 300B; o # LEFT DOUBLE ANGLE BRACKET
300B; 300A; c # RIGHT DOUBLE ANGLE BRACKET
300C; 300D; o # LEFT CORNER BRACKET
300D; 300C; c # RIGHT CORNER BRACKET
300E; 300F; o # LEFT WHITE CORNER BRACKET
300F; 300E; c # RIGHT WHITE CORNER BRACKET
3010; 3011; o # LEFT BLACK LENTICULAR BRACKET
3011; 3010; c # RIGHT BLACK LENTICULAR BRACKET
3014; 3015; o # LEFT TORTOISE SHELL BRACKET
3015; 3014; c # RIGHT TORTOISE SHELL BRACKET
3016; 3017; o # LEFT WHITE LENTICULAR BRACKET
3017; 3016; c # RIGHT WHITE LENTICULAR BRACKET
3018; 3019; o # LEFT WHITE TORTOISE SHELL BRACKET
3019; 3018; c # RIGHT WHITE TORTOISE SHELL BRACKET
301A; 301B; o # LEFT WHITE SQUARE BRACKET
301B; 301A; c # RIGHT WHITE SQUARE BRACKET
FE59; FE5A; o # SMALL LEFT PARENTHESIS
FE5A; FE59; c # SMALL RIGHT PARENTHESIS
FE5B; FE5C; o # SMALL LEFT CURLY BRACKET
FE5C; FE5B; c # SMALL RIGHT CURLY BRACKET
FE5D; FE5E; o # SMALL LEFT TORTOISE SHELL BRACKET
FE5E; FE5D; c # SMALL RIGHT TORTOISE SHELL BRACKET
FF08; FF09; o # FULLWIDTH LEFT PARENTHESIS
FF09; FF08; c # FULLWIDTH RIGHT PARENTHESIS
FF3B; FF3D; o # FULLWIDTH LEFT SQUARE BRACKET
FF3D; FF3B; c # FULLWIDTH RIGHT SQUARE BRACKET
FF5B; FF5D; o # FULLWIDTH LEFT CURLY BRACKET
FF5D; FF5B; c # FULLWIDTH RIGHT CURLY BRACKET
FF5F; FF60; o # FULLWIDTH LEFT WHITE PARENTHESIS
FF60; FF5F; c # FULLWIDTH RIGHT WHITE PARENTHESIS
FF62; FF63; o # HALFWIDTH LEFT CORNER BRACKET
FF63; FF62; c # HALFWIDTH RIGHT CORNER BRACKET
# EOF
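
The three-field format described in the header of this file can be read with a few lines of C. The following is a minimal sketch, not the code Emacs uses (Emacs reads the file from Emacs Lisp in unidata-gen.el); `struct bracket_entry` and `parse_bracket_line` are hypothetical names introduced only for illustration, while the enum mirrors `bidi_bracket_type_t` from src/dispextern.h in this commit.

```c
#include <assert.h>
#include <stdio.h>

/* Names and values copied from enum bidi_bracket_type_t in
   src/dispextern.h as changed by this commit.  */
typedef enum {
  BIDI_BRACKET_NONE = 1,
  BIDI_BRACKET_OPEN,
  BIDI_BRACKET_CLOSE
} bidi_bracket_type_t;

/* Hypothetical record for one data line; not an Emacs type.  */
struct bracket_entry
{
  unsigned ch;			/* field 0: code point */
  unsigned paired;		/* field 1: its paired bracket */
  bidi_bracket_type_t type;	/* field 2: o, c, or n */
};

/* Parse one line of the form "XXXX; YYYY; t # NAME".
   Return 1 and fill *OUT on success; return 0 for comment lines,
   empty lines, and lines that do not match the format (including
   a <none> value in field 1).  */
static int
parse_bracket_line (const char *line, struct bracket_entry *out)
{
  unsigned ch, paired;
  char type;

  assert (line != NULL && out != NULL);
  if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
    return 0;
  if (sscanf (line, "%x; %x; %c", &ch, &paired, &type) != 3)
    return 0;
  switch (type)
    {
    case 'o': out->type = BIDI_BRACKET_OPEN; break;
    case 'c': out->type = BIDI_BRACKET_CLOSE; break;
    case 'n': out->type = BIDI_BRACKET_NONE; break;
    default: return 0;
    }
  out->ch = ch;
  out->paired = paired;
  return 1;
}
```

A caller would feed each line obtained from `fgets` to `parse_bracket_line` and skip the lines for which it returns 0.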

@ -54,7 +54,9 @@ FORCE =
FORCE:
.PHONY: FORCE
${unidir}/charprop.el: ${FORCE} ${srcdir}/unidata-gen.el ${srcdir}/UnicodeData.txt | \
${unidir}/charprop.el: ${FORCE} ${srcdir}/unidata-gen.el \
${srcdir}/UnicodeData.txt ${srcdir}/BidiMirroring.txt \
${srcdir}/BidiBrackets.txt | \
${srcdir}/unidata-gen.elc unidata.txt
-if [ -f "$@" ]; then \
cd ${unidir} && chmod +w charprop.el `sed -n 's/^;; FILE: //p' < charprop.el`; \

@ -154,7 +154,8 @@
;; PROP: character property
;; INDEX: index to each element of unidata-list for PROP.
;; It may be a function that generates an alist of character codes
;; vs. the corresponding property values.
;; vs. the corresponding property values. Currently, only character
;; codepoints or symbol values are supported in this case.
;; GENERATOR: function to generate a char-table
;; FILENAME: filename to store the char-table
;; DOCSTRING: docstring for the property
@ -273,7 +274,23 @@ is the character itself."
"Unicode bidi-mirroring characters.
Property value is a character that has the corresponding mirroring image or nil.
The value nil means that the actual property value of a character
is the character itself.")))
is the character itself.")
(paired-bracket
unidata-gen-brackets-list unidata-gen-table-character "uni-brackets.el"
"Unicode bidi paired-bracket characters.
Property value is the paired bracket character, or nil.
The value nil means that the character is neither an opening nor
a closing paired bracket."
string)
(bracket-type
unidata-gen-bracket-type-list unidata-gen-table-symbol "uni-brackets.el"
"Unicode bidi paired-bracket type.
Property value is a symbol `o' (Open), `c' (Close), or `n' (None)."
unidata-describe-bidi-bracket-type
n
;; The order of elements must be in sync with bidi_bracket_type_t
;; in src/dispextern.h.
(n o c))))
;; Functions to access the above data.
(defsubst unidata-prop-index (prop) (nth 1 (assq prop unidata-prop-alist)))
@ -451,7 +468,10 @@ is the character itself.")))
(unidata-encode-val val-list (nth 2 elm)))
(set-char-table-range table (cons (car elm) (nth 1 elm)) (nth 2 elm)))
(setq tail unidata-list)
(if (functionp prop-idx)
(setq tail (funcall prop-idx)
prop-idx 1)
(setq tail unidata-list))
(while tail
(setq elt (car tail) tail (cdr tail))
(setq range (car elt)
@ -1157,6 +1177,12 @@ is the character itself.")))
(string ?'))))
val " "))
(defun unidata-describe-bidi-bracket-type (val)
(cdr (assq val
'((n . "Not a paired bracket character.")
(o . "Opening paired bracket character.")
(c . "Closing paired bracket character.")))))
(defun unidata-gen-mirroring-list ()
(let ((head (list nil))
tail)
@ -1170,6 +1196,36 @@ is the character itself.")))
(setq tail (setcdr tail (list (list char mirror)))))))
(cdr head)))
(defun unidata-gen-brackets-list ()
(let ((head (list nil))
tail)
(with-temp-buffer
(insert-file-contents (expand-file-name "BidiBrackets.txt" unidata-dir))
(goto-char (point-min))
(setq tail head)
(while (re-search-forward
"^\\([0-9A-F]+\\);\\s +\\([0-9A-F]+\\);\\s +\\([oc]\\)"
nil t)
(let ((char (string-to-number (match-string 1) 16))
(paired (match-string 2)))
(setq tail (setcdr tail (list (list char paired)))))))
(cdr head)))
(defun unidata-gen-bracket-type-list ()
(let ((head (list nil))
tail)
(with-temp-buffer
(insert-file-contents (expand-file-name "BidiBrackets.txt" unidata-dir))
(goto-char (point-min))
(setq tail head)
(while (re-search-forward
"^\\([0-9A-F]+\\);\\s +\\([0-9A-F]+\\);\\s +\\([oc]\\)"
nil t)
(let ((char (string-to-number (match-string 1) 16))
(type (match-string 3)))
(setq tail (setcdr tail (list (list char type)))))))
(cdr head)))
;; Verify if we can retrieve correct values from the generated
;; char-tables.
;;
@ -1218,7 +1274,9 @@ is the character itself.")))
((eq generator 'unidata-gen-table-decomposition)
(setq val1 (unidata-split-decomposition val1))))
(cond ((eq prop 'decomposition)
(setq val1 (list char)))))
(setq val1 (list char)))
((eq prop 'bracket-type)
(setq val1 'n))))
(when (>= char check)
(message "%S %04X" prop check)
(setq check (+ check #x400)))
@ -1261,6 +1319,9 @@ is the character itself.")))
(describer (unidata-prop-describer prop))
(default-value (unidata-prop-default prop))
(val-list (unidata-prop-val-list prop))
;; Avoid creating backup files for those uni-*.el files
;; that hold more than one table.
(backup-inhibited t)
table)
;; Filename in this comment line is extracted by sed in
;; Makefile.

@ -1,3 +1,11 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
* nonascii.texi (Character Properties): Document the new
properties 'bracket-type' and 'paired-bracket'.
* display.texi (Bidirectional Display): Update the version of the
UBA to which we are conforming.
2014-10-13 Glenn Morris <rgm@gnu.org>
* Makefile.in (dist): Update for new output variables.

@ -6613,10 +6613,9 @@ positions do not increase monotonically with string or buffer
position. In performing this @dfn{bidirectional reordering}, Emacs
follows the Unicode Bidirectional Algorithm (a.k.a.@: @acronym{UBA}),
which is described in Annex #9 of the Unicode standard
(@url{http://www.unicode.org/reports/tr9/}). Emacs currently provides
a ``Non-isolate Bidirectionality'' class implementation of the
@acronym{UBA}: it does not yet support the isolate directional
formatting characters introduced with Unicode Standard v6.3.0.
(@url{http://www.unicode.org/reports/tr9/}). Emacs provides a ``Full
Bidirectionality'' class implementation of the @acronym{UBA},
consistent with the requirements of the Unicode Standard v7.0.
@defvar bidi-display-reordering
If the value of this buffer-local variable is non-@code{nil} (the

@ -520,6 +520,24 @@ property to display mirror images of characters when appropriate
(@pxref{Bidirectional Display}). For unassigned codepoints, the value
is @code{nil}.
@item paired-bracket
Corresponds to the Unicode @code{Bidi_Paired_Bracket} property. The
value of this property is the codepoint of a character's @dfn{paired
bracket}, or @code{nil} if the character is not a bracket character.
This establishes a mapping between characters that are treated as
bracket pairs by the Unicode Bidirectional Algorithm; Emacs uses this
property when it decides how to reorder for display parentheses,
braces, and other similar characters (@pxref{Bidirectional Display}).
@item bracket-type
Corresponds to the Unicode @code{Bidi_Paired_Bracket_Type} property.
For characters whose @code{paired-bracket} property is non-@code{nil},
the value of this property is a symbol, either @code{o} (for opening
bracket characters) or @code{c} (for closing bracket characters). For
characters whose @code{paired-bracket} property is @code{nil}, the
value is the symbol @code{n} (None). Like @code{paired-bracket}, this
property is used for bidirectional display.
@item old-name
Corresponds to the Unicode @code{Unicode_1_Name} property. The value
is a string. Unassigned codepoints, and characters that have no value
@ -574,6 +592,14 @@ This function returns the value of @var{char}'s @var{propname} property.
(get-char-code-property ?\u2163 'numeric-value)
@result{} 4
@end group
@group
(get-char-code-property ?\( 'paired-bracket)
@result{} 41 ;; closing parenthesis
@end group
@group
(get-char-code-property ?\) 'bracket-type)
@result{} c
@end group
@end example
@end defun

@ -1,3 +1,9 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
* NEWS: Mention the UBA implementation update.
* HELLO: Remove now unneeded directional control characters.
2014-10-13 Jan Djärv <jan.h.d@swipnet.se>
* NEWS: Move and clarify OSX >= 10.6.

@ -18,7 +18,7 @@ Non-ASCII examples:
LANGUAGE (NATIVE NAME) HELLO
---------------------- -----
Amharic ($,1O M[MmN{(B) $,1M`MKM](B
Arabic $,1ro(B($,1-g.$-y-q-h.*.1-i(B) $,1-g.$-s.1.$-g.%(B $,1-y.$.*.#.%(B
Arabic ($,1-g.$-y-q-h.*.1-i(B) $,1-g.$-s.1.$-g.%(B $,1-y.$.*.#.%(B
Bengali ($,17,7>6b727>(B) $,17(7.787M6u7>70(B
Braille $,2(3(1('('(5(B
Burmese ($,1H9H\H4HZH9HL(B) $,1H9H$HZHYH"H<HLH5HK(B
@ -37,7 +37,7 @@ German (Deutsch) Guten Tag / Gr,A|_(B Gott
Greek (,Fekkgmij\(B) ,FCei\(B ,Fsar(B
Greek, ancient ($,1p1,Fkkgmij^(B) ,FO$,1pv,Fk](B ,Fte(B ,Fja$,1q6(B ,Fl]ca(B ,Fwa$,1r6,Fqe(B
Gujarati ($,19W:!9\9p9~9d: (B) $,19h9n9x:-9d:'(B
Hebrew $,1ro(B($,1-",q-(,y-*(B) ,Hylem(B
Hebrew ($,1-",q-(,y-*(B) ,Hylem(B
Hungarian (magyar) Sz,Bi(Bp j,Bs(B napot!
Hindi ($,15y55B5f6 (B) $,15h5n5x6-5d6'(B / $,15h5n5x6-5U5~5p(B $,16D(B
Italian (italiano) Ciao / Buon giorno

@ -110,6 +110,14 @@ character in the pasted text as actual user input. This results in a
paste experience similar to that under a window system, and significant
performance improvements when pasting large amounts of text.
** Emacs now supports the latest version of the UBA.
The Emacs implementation of the Unicode Bidirectional Algorithm (UBA)
was updated to support all the latest additions and changes introduced
in Unicode Standard versions 6.3 and 7.0, and a few changes suggested
for Unicode 8.0. This includes full support for directional isolates
and the Bidirectional Parentheses Algorithm (BPA) specified by these
Unicode standards.
* Changes in Specialized Modes and Packages in Emacs 25.1
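
The NEWS entry above mentions the Bidirectional Parentheses Algorithm. Its first step, identifying bracket pairs with a fixed-size stack, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/bidi.c (whose diff is not shown here): it handles only the three ASCII bracket pairs, works on a plain C string rather than an isolating run sequence, and simply stops scanning when the stack is full. The stack depth of 63 is the figure given for this step in UAX#9; the exact overflow behavior should be taken from UAX#9 itself. All function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BPA_STACK_SIZE 63	/* stack depth used for pair identification */

/* Hypothetical result type: positions of one matched pair.  */
struct bracket_pair { size_t open_pos, close_pos; };

/* ASCII-only stand-in for a paired-bracket lookup: return the
   closing counterpart of an opening bracket, or 0.  */
static char
closing_for (char c)
{
  switch (c)
    {
    case '(': return ')';
    case '[': return ']';
    case '{': return '}';
    default: return 0;
    }
}

static int
is_closing (char c)
{
  return c == ')' || c == ']' || c == '}';
}

/* Find bracket pairs in TEXT[0..LEN-1].  Store at most MAX_PAIRS
   pairs in PAIRS, sorted by the position of the opening bracket,
   and return how many were stored.  */
static size_t
find_bracket_pairs (const char *text, size_t len,
		    struct bracket_pair *pairs, size_t max_pairs)
{
  struct { char closer; size_t pos; } stack[BPA_STACK_SIZE];
  size_t sp = 0, npairs = 0;

  assert (text != NULL && pairs != NULL);
  for (size_t i = 0; i < len; i++)
    {
      char closer = closing_for (text[i]);
      if (closer)
	{
	  if (sp == BPA_STACK_SIZE)
	    break;		/* simplification: stop on overflow */
	  stack[sp].closer = closer;
	  stack[sp].pos = i;
	  sp++;
	}
      else if (is_closing (text[i]))
	{
	  /* Search from the top of the stack for a matching opener.  */
	  for (size_t j = sp; j-- > 0; )
	    if (stack[j].closer == text[i])
	      {
		if (npairs < max_pairs)
		  {
		    pairs[npairs].open_pos = stack[j].pos;
		    pairs[npairs].close_pos = i;
		    npairs++;
		  }
		sp = j;		/* pop the match and everything above it */
		break;
	      }
	}
    }

  /* Pairs were found in order of closing position; insertion-sort
     them by opening position.  */
  for (size_t i = 1; i < npairs; i++)
    {
      struct bracket_pair key = pairs[i];
      size_t j = i;
      while (j > 0 && pairs[j - 1].open_pos > key.open_pos)
	{
	  pairs[j] = pairs[j - 1];
	  j--;
	}
      pairs[j] = key;
    }
  return npairs;
}
```

For the string "([)]" this sketch reports a single pair, the parentheses at positions 0 and 2: matching the closing parenthesis discards the unmatched square bracket above it on the stack, so the later "]" finds nothing to pair with.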

@ -1,5 +1,11 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
* simple.el (what-cursor-position): Update to support the new bidi
characters.
* descr-text.el (describe-char): Update to support the new bidi
characters.
* emacs-lisp/tabulated-list.el (tabulated-list-mode): Force
bidi-paragraph-direction to 'left-to-right'. This fixes
buffer-menu display when the first buffer happens to start with

@ -434,13 +434,26 @@ relevant to POS."
code (encode-char char charset)))
(setq code char))
(cond
;; Append a PDF character to directional embeddings and
;; overrides, to prevent potential messup of the following
;; text.
((memq char '(?\x202a ?\x202b ?\x202d ?\x202e))
;; Append a PDF character to left-to-right directional
;; embeddings and overrides, to prevent potential messup of the
;; following text.
((memq char '(?\x202a ?\x202d))
(setq char-description
(concat char-description
(propertize (string ?\x202c) 'invisible t))))
;; Append a PDF character followed by LRM to right-to-left
;; directional embeddings and overrides, to prevent potential
;; messup of the following numerical text.
((memq char '(?\x202b ?\x202e))
(setq char-description
(concat char-description
(propertize (string ?\x202c ?\x200e) 'invisible t))))
;; Append a PDI character to directional isolate initiators, to
;; prevent potential messup of the following numerical text
((memq char '(?\x2066 ?\x2067 ?\x2068))
(setq char-description
(concat char-description
(propertize (string ?\x2069) 'invisible t))))
;; Append a LRM character to any strong character to avoid
;; messing up the numerical codepoint.
((memq (get-char-code-property char 'bidi-class) '(R AL))

@ -1223,15 +1223,21 @@ in *Help* buffer. See also the command `describe-char'."
(interactive "P")
(let* ((char (following-char))
(bidi-fixer
(cond ((memq char '(?\x202a ?\x202b ?\x202d ?\x202e))
;; If the character is one of LRE, LRO, RLE, RLO, it
;; will start a directional embedding, which could
;; completely disrupt the rest of the line (e.g., RLO
;; will display the rest of the line right-to-left).
;; So we put an invisible PDF character after these
;; characters, to end the embedding, which eliminates
;; any effects on the rest of the line.
;; If the character is one of LRE, LRO, RLE, RLO, it will
;; start a directional embedding, which could completely
;; disrupt the rest of the line (e.g., RLO will display the
;; rest of the line right-to-left). So we put an invisible
;; PDF character after these characters, to end the
;; embedding, which eliminates any effects on the rest of
;; the line. For RLE and RLO we also append an invisible
;; LRM, to avoid reordering the following numerical
;; characters. For LRI/RLI/FSI we append a PDI.
(cond ((memq char '(?\x202a ?\x202d))
(propertize (string ?\x202c) 'invisible t))
((memq char '(?\x202b ?\x202e))
(propertize (string ?\x202c ?\x200e) 'invisible t))
((memq char '(?\x2066 ?\x2067 ?\x2068))
(propertize (string ?\x2069) 'invisible t))
;; Strong right-to-left characters cause reordering of
;; the following numerical characters which show the
;; codepoint, so append LRM to countermand that.
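
The rule that both describe-char and what-cursor-position now apply to directional formatting characters can be restated compactly. The sketch below is written in C purely for illustration; the real code is the Emacs Lisp shown in these hunks, and `bidi_fixer_for` is a hypothetical name. It covers only the three explicit cases (embeddings and overrides, split by direction, and isolate initiators); the additional case for strong right-to-left characters needs a bidi-class lookup and is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Write to OUT the invisible characters that should follow CH so
   that it does not disturb the text displayed after it, and return
   how many code points were written (0, 1, or 2).  */
static size_t
bidi_fixer_for (unsigned ch, unsigned out[2])
{
  assert (out != NULL);
  switch (ch)
    {
    case 0x202A:		/* LRE */
    case 0x202D:		/* LRO */
      out[0] = 0x202C;		/* PDF ends the embedding/override */
      return 1;
    case 0x202B:		/* RLE */
    case 0x202E:		/* RLO */
      out[0] = 0x202C;		/* PDF */
      out[1] = 0x200E;		/* LRM protects the following digits */
      return 2;
    case 0x2066:		/* LRI */
    case 0x2067:		/* RLI */
    case 0x2068:		/* FSI */
      out[0] = 0x2069;		/* PDI ends the isolate */
      return 1;
    default:
      return 0;
    }
}
```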

@ -468,18 +468,18 @@ define pgx
end
# GLYPHLESS_GLYPH
if ($g.type == 2)
printf "GLYPHLESS["
printf "G-LESS["
if ($g.u.glyphless.method == 0)
printf "THIN]"
printf "THIN;0x%x]", $g.u.glyphless.ch
end
if ($g.u.glyphless.method == 1)
printf "EMPTY]"
printf "EMPTY;0x%x]", $g.u.glyphless.ch
end
if ($g.u.glyphless.method == 2)
printf "ACRO]"
printf "ACRO;0x%x]", $g.u.glyphless.ch
end
if ($g.u.glyphless.method == 3)
printf "HEX]"
printf "HEX;0x%x]", $g.u.glyphless.ch
end
end
# IMAGE_GLYPH
@ -498,7 +498,7 @@ define pgx
printf " pos=%d", $g.charpos
end
# For characters, print their resolved level and bidi type
if ($g.type == 0)
if ($g.type == 0 || $g.type == 2)
printf " blev=%d,btyp=", $g.resolved_level
pbiditype $g.bidi_type
end

@ -1,3 +1,70 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
Update the bidirectional reordering engine for Unicode 6.3 and 7.0.
* bidi.c (bidi_ignore_explicit_marks_for_paragraph_level): Remove
variable.
(bidi_get_type): Return the isolate initiators and terminator
types.
(bidi_isolate_fmt_char, bidi_paired_bracket_type)
(bidi_fetch_char_skip_isolates, find_first_strong_char)
(bidi_find_bracket_pairs, bidi_resolve_brackets): New functions.
(bidi_set_sos_type): Renamed from bidi_set_sor_type and updated
for the new features.
(bidi_push_embedding_level, bidi_pop_embedding_level): Update to
push and pop correctly for isolates.
(bidi_remember_char): Modified to accept an additional argument
and record the bidi type according to its value.
(bidi_cache_iterator_state): Accept an additional argument to only
update an existing state. Handle the new members of struct bidi_it.
(bidi_cache_find): Arguments changed: no longer accepts a level,
instead accepts a flag telling it whether it is okay to return
unresolved neutrals.
(bidi_initialize): Initiate and staticpro the bracket-type uniprop
table. Initialize new isolate-related members.
(bidi_paragraph_init): Some code factored out into
find_first_strong_char.
(bidi_resolve_explicit_1): Function deleted, its code incorporated
into bidi_resolve_explicit.
(bidi_resolve_explicit): Support the isolate initiators and
terminator. Fix handling of embeddings and overrides according to
new UBA requirements. Record information about previously seen
characters here (moved from bidi_level_of_next_char).
(bidi_resolve_weak): Adapt to changes in struct members.
(FLAG_EMBEDDING_INSIDE, FLAG_OPPOSITE_INSIDE, MAX_BPA_STACK)
(STORE_BRACKET_CHARPOS, PUSH_BPA_STACK): New macros.
(bidi_resolve_neutral): Call bidi_resolve_brackets to handle the
paired bracket resolution. Handle isolate initiators and
terminator.
(bidi_type_of_next_char): Remove unneeded code for BN limit.
(bidi_level_of_next_char): Move the code that records information
about previous characters to bidi_resolve_explicit. Fix logic of
resolving neutrals and make sure their cache entries are updated.
Remove now unneeded special handling of PDF level.
* dispextern.h (struct glyph): Enlarge the width of resolved_level.
(BIDI_MAXDEPTH): New macro, renamed from BIDI_MAXLEVEL and
enlarged per Unicode 6.3.
(enum bidi_bracket_type_t): New data type.
(struct bidi_saved_info): Leave only 2 type members out of 4.
Remove bytepos.
(struct bidi_stack): Add members necessary to support isolating
sequences.
(struct bidi_it): Add new members necessary to support isolating
sequences and bracket pair resolution.
* xdisp.c (Fbidi_resolved_levels): New function.
(syms_of_xdisp): Defsubr it.
(append_glyph, append_composite_glyph, produce_image_glyph)
(append_stretch_glyph, append_glyphless_glyph): Convert aborts to
assertions.
(syms_of_xdisp) <inhibit-bidi-mirroring>: New variable.
* term.c (append_glyph, append_composite_glyph)
(append_glyphless_glyph): Convert aborts to assertions.
* .gdbinit (pgx): Display the character codepoint, resolved level,
and bidi type also for glyphless glyphs.
2014-10-15 Dmitry Antipov <dmantipov@yandex.ru>
Avoid unwanted point motion in Fline_beginning_position.
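
The dispextern.h entries above widen `resolved_level` in struct glyph to 7 bits and introduce `BIDI_MAXDEPTH` with the value 125. A small check shows why 7 bits are enough: per UAX#9, resolved levels can reach max_depth + 1, that is 126, and a 7-bit unsigned field holds values up to 127. The sketch below is illustrative only; `struct level_holder`, `max_value_for_width`, and `level_fits` are hypothetical names, and only the macro value and the field width are taken from this commit.

```c
#include <assert.h>

#define BIDI_MAXDEPTH 125	/* value from src/dispextern.h in this commit */

/* Hypothetical stand-in for the bit-field in struct glyph.  */
struct level_holder
{
  unsigned resolved_level : 7;
};

/* Largest value an unsigned bit-field of BITS bits can hold.  */
static unsigned
max_value_for_width (unsigned bits)
{
  assert (bits > 0 && bits < 32);
  return (1u << bits) - 1;
}

/* Return nonzero if LEVEL survives a round trip through the
   7-bit field without being truncated.  */
static int
level_fits (unsigned level)
{
  struct level_holder h;

  h.resolved_level = level;	/* stored modulo 2^7 */
  return h.resolved_level == level;
}
```

With these definitions, `level_fits (BIDI_MAXDEPTH + 1)` is true, while `level_fits (128)` is false because the value wraps around to 0.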

1634
src/bidi.c

File diff suppressed because it is too large.

@ -445,8 +445,8 @@ struct glyph
/* True means don't display cursor here. */
bool_bf avoid_cursor_p : 1;
/* Resolved bidirectional level of this character [0..63]. */
unsigned resolved_level : 5;
/* Resolved bidirectional level of this character [0..127]. */
unsigned resolved_level : 7;
/* Resolved bidirectional type of this character, see enum
bidi_type_t below. Note that according to UAX#9, only some
@ -1857,7 +1857,9 @@ GLYPH_CODE_P (Lisp_Object gc)
extern int face_change_count;
/* For reordering of bidirectional text. */
#define BIDI_MAXLEVEL 64
/* UAX#9's max_depth value. */
#define BIDI_MAXDEPTH 125
/* Data type for describing the bidirectional character types. The
first 7 must be at the beginning, because they are the only values
@ -1894,23 +1896,39 @@ typedef enum {
NEUTRAL_ON /* other neutrals */
} bidi_type_t;
/* Data type for describing the Bidi Paired Bracket Type of a character.
The order of members must be in sync with the 8th element of the
member of unidata-prop-alist (in admin/unidata/unidata-gen.el) for
Unicode character property `bracket-type'. */
typedef enum {
BIDI_BRACKET_NONE = 1,
BIDI_BRACKET_OPEN,
BIDI_BRACKET_CLOSE
} bidi_bracket_type_t;
/* The basic directionality data type. */
typedef enum { NEUTRAL_DIR, L2R, R2L } bidi_dir_t;
/* Data type for storing information about characters we need to
remember. */
struct bidi_saved_info {
ptrdiff_t bytepos, charpos; /* character's buffer position */
ptrdiff_t charpos; /* character's buffer position */
bidi_type_t type; /* character's resolved bidi type */
bidi_type_t type_after_w1; /* original type of the character, after W1 */
bidi_type_t orig_type; /* type as we found it in the buffer */
bidi_type_t orig_type; /* bidi type as we found it in the buffer */
};
/* Data type for keeping track of saved embedding levels and override
status information. */
/* Data type for keeping track of information about saved embedding
levels, override status, isolate status, and isolating sequence
runs. */
struct bidi_stack {
int level;
bidi_dir_t override;
struct bidi_saved_info last_strong;
struct bidi_saved_info next_for_neutral;
struct bidi_saved_info prev_for_neutral;
unsigned level : 7;
bool_bf isolate_status : 1;
unsigned override : 2;
unsigned sos : 2;
};
/* Data type for storing information about a string being iterated on. */
@ -1935,22 +1953,24 @@ struct bidi_it {
ptrdiff_t nchars; /* its "length", usually 1; it's > 1 for a run
of characters covered by a display string */
ptrdiff_t ch_len; /* its length in bytes */
bidi_type_t type; /* bidi type of this character, after
bidi_type_t type; /* final bidi type of this character, after
resolving weak and neutral types */
bidi_type_t type_after_w1; /* original type, after overrides and W1 */
bidi_type_t orig_type; /* original type, as found in the buffer */
int resolved_level; /* final resolved level of this character */
int invalid_levels; /* how many PDFs to ignore */
int invalid_rl_levels; /* how many PDFs from RLE/RLO to ignore */
bidi_type_t type_after_wn; /* bidi type after overrides and Wn */
bidi_type_t orig_type; /* original bidi type, as found in the buffer */
char resolved_level; /* final resolved level of this character */
char isolate_level; /* count of isolate initiators unmatched by PDI */
ptrdiff_t invalid_levels; /* how many PDFs to ignore */
ptrdiff_t invalid_isolates; /* how many PDIs to ignore */
struct bidi_saved_info prev; /* info about previous character */
struct bidi_saved_info last_strong; /* last-seen strong directional char */
struct bidi_saved_info next_for_neutral; /* surrounding characters for... */
struct bidi_saved_info prev_for_neutral; /* ...resolving neutrals */
struct bidi_saved_info next_for_ws; /* character after sequence of ws */
ptrdiff_t bracket_pairing_pos; /* position of pairing bracket */
bidi_type_t bracket_enclosed_type; /* type for bracket resolution */
ptrdiff_t next_en_pos; /* pos. of next char for determining ET type */
bidi_type_t next_en_type; /* type of char at next_en_pos */
ptrdiff_t ignore_bn_limit; /* position until which to ignore BNs */
bidi_dir_t sor; /* direction of start-of-run in effect */
bidi_dir_t sos; /* direction of start-of-sequence in effect */
int scan_dir; /* direction of text scan, 1: forw, -1: back */
ptrdiff_t disp_pos; /* position of display string after ch */
int disp_prop; /* if non-zero, there really is a
@ -1960,12 +1980,11 @@ struct bidi_it {
/* Note: Everything from here on is not copied/saved when the bidi
iterator state is saved, pushed, or popped. So only put here
stuff that is not part of the bidi iterator's state! */
struct bidi_stack level_stack[BIDI_MAXLEVEL]; /* stack of embedding levels */
struct bidi_stack level_stack[BIDI_MAXDEPTH+2+1]; /* directional status stack */
struct bidi_string_data string; /* string to reorder */
struct window *w; /* the window being displayed */
bidi_dir_t paragraph_dir; /* current paragraph direction */
ptrdiff_t separator_limit; /* where paragraph separator should end */
bool_bf prev_was_pdf : 1; /* if true, previous char was PDF */
bool_bf first_elt : 1; /* if true, examine current char first */
bool_bf new_paragraph : 1; /* if true, we expect a new paragraph */
bool_bf frame_window_p : 1; /* true if displaying on a GUI frame */
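
The hunks above shrink `struct bidi_stack` to bit-fields and size the stack by `BIDI_MAXDEPTH` instead of `BIDI_MAXLEVEL`. The following is a minimal sketch of why a 7-bit `level` field is wide enough, assuming the largest level ever stored is at most `BIDI_MAXDEPTH + 1`. `demo_bidi_stack` and `level_fits` are hypothetical names invented for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BIDI_MAXDEPTH 125	/* UAX#9's max_depth, as in the patch */

/* Hypothetical stand-in for the patched struct bidi_stack; plain
   unsigned replaces Emacs's bool_bf for the one-bit flag.  */
struct demo_bidi_stack {
  unsigned level : 7;		/* holds 0..127 */
  unsigned isolate_status : 1;
  unsigned override : 2;	/* room for NEUTRAL_DIR, L2R, R2L */
  unsigned sos : 2;
};

/* Store LEVEL in the 7-bit field and report whether it survived the
   round trip unchanged.  */
static bool
level_fits (unsigned level)
{
  struct demo_bidi_stack s = { 0 };
  s.level = level;
  return s.level == level;
}
```

Under these assumptions `level_fits (BIDI_MAXDEPTH + 1)` holds, while a value of 128 would wrap around to 0 and be reported as not fitting.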


@ -1513,8 +1513,7 @@ append_glyph (struct it *it)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
else
@ -1710,8 +1709,7 @@ append_composite_glyph (struct it *it)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
else
@ -1795,8 +1793,7 @@ append_glyphless_glyph (struct it *it, int face_id, const char *str)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
else
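
Each hunk above replaces an unconditional `emacs_abort` with an `eassert` on the same condition, `(type & 7) == type`. A minimal sketch of what that condition checks, assuming the glyph stores the type in a 3-bit bit-field (an assumption made for this illustration; `demo_type_glyph`, `type_fits_in_glyph` and `store_type` are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical glyph with a 3-bit type field.  */
struct demo_type_glyph {
  unsigned bidi_type : 3;
};

/* True when TYPE is unchanged by masking to its low three bits,
   i.e. when it lies in the range 0..7.  */
static bool
type_fits_in_glyph (unsigned type)
{
  return (type & 7) == type;
}

/* Copy TYPE into G only if it fits; return false otherwise.  */
static bool
store_type (struct demo_type_glyph *g, unsigned type)
{
  if (!type_fits_in_glyph (type))
    return false;
  g->bidi_type = type;
  return true;
}
```

A type value of 8 or more would be silently truncated by the bit-field assignment, which is the situation the assertion is there to catch.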


@ -6935,7 +6935,8 @@ get_next_display_element (struct it *it)
is R..." */
/* FIXME: Do we need an exception for characters from display
tables? */
if (it->bidi_p && it->bidi_it.type == STRONG_R)
if (it->bidi_p && it->bidi_it.type == STRONG_R
&& !inhibit_bidi_mirroring)
it->c = bidi_mirror_char (it->c);
/* Map via display table or translate control characters.
IT->c, IT->len etc. have been set to the next character by
@ -21468,6 +21469,114 @@ Value is the new character position of point. */)
#undef ROW_GLYPH_NEWLINE_P
}
DEFUN ("bidi-resolved-levels", Fbidi_resolved_levels,
Sbidi_resolved_levels, 0, 1, 0,
doc: /* Return the resolved bidirectional levels of characters at VPOS.
The resolved levels are produced by the Emacs bidi reordering engine
that implements the UBA, the Unicode Bidirectional Algorithm. Please
read the Unicode Standard Annex 9 (UAX#9) for background information
about these levels.
VPOS is the zero-based number of the current window's screen line
for which to produce the resolved levels. If VPOS is nil or omitted,
it defaults to the screen line of point. If the window displays a
header line, VPOS of zero will report on the header line, and first
line of text in the window will have VPOS of 1.
Value is an array of resolved levels, indexed by glyph number.
Glyphs are numbered from zero starting from the beginning of the
screen line, i.e. the left edge of the window for left-to-right lines
and from the right edge for right-to-left lines. The resolved levels
are produced only for the window's text area; text in display margins
is not included.
If the selected window's display is not up-to-date, or if the specified
screen line does not display text, this function returns nil. It is
highly recommended to bind this function to some simple key, like F8,
in order to avoid these problems.
This function exists mainly for testing the correctness of the
Emacs UBA implementation, in particular with the test suite. */)
(Lisp_Object vpos)
{
struct window *w = XWINDOW (selected_window);
struct buffer *b = XBUFFER (w->contents);
int nrow;
struct glyph_row *row;
if (NILP (vpos))
{
int d1, d2, d3, d4, d5;
pos_visible_p (w, PT, &d1, &d2, &d3, &d4, &d5, &nrow);
}
else
{
CHECK_NUMBER_COERCE_MARKER (vpos);
nrow = XINT (vpos);
}
/* We require up-to-date glyph matrix for this window. */
if (w->window_end_valid
&& !windows_or_buffers_changed
&& b
&& !b->clip_changed
&& !b->prevent_redisplay_optimizations_p
&& !window_outdated (w)
&& nrow >= 0
&& nrow < w->current_matrix->nrows
&& (row = MATRIX_ROW (w->current_matrix, nrow))->enabled_p
&& MATRIX_ROW_DISPLAYS_TEXT_P (row))
{
struct glyph *g, *e, *g1;
int nglyphs, i;
Lisp_Object levels;
if (!row->reversed_p) /* Left-to-right glyph row. */
{
g = g1 = row->glyphs[TEXT_AREA];
e = g + row->used[TEXT_AREA];
/* Skip over glyphs at the start of the row that were
generated by redisplay for its own needs. */
while (g < e
&& INTEGERP (g->object)
&& g->charpos < 0)
g++;
g1 = g;
/* Count the "interesting" glyphs in this row. */
for (nglyphs = 0; g < e && !INTEGERP (g->object); g++)
nglyphs++;
/* Create and fill the array. */
levels = make_uninit_vector (nglyphs);
for (i = 0; g1 < g; i++, g1++)
ASET (levels, i, make_number (g1->resolved_level));
}
else /* Right-to-left glyph row. */
{
g = row->glyphs[TEXT_AREA] + row->used[TEXT_AREA] - 1;
e = row->glyphs[TEXT_AREA] - 1;
while (g > e
&& INTEGERP (g->object)
&& g->charpos < 0)
g--;
g1 = g;
for (nglyphs = 0; g > e && !INTEGERP (g->object); g--)
nglyphs++;
levels = make_uninit_vector (nglyphs);
for (i = 0; g1 > g; i++, g1--)
ASET (levels, i, make_number (g1->resolved_level));
}
return levels;
}
else
return Qnil;
}
/***********************************************************************
Menu Bar
@ -25198,8 +25307,7 @@ append_glyph (struct it *it)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
else
@ -25282,8 +25390,7 @@ append_composite_glyph (struct it *it)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
++it->glyph_row->used[area];
@ -25471,8 +25578,7 @@ produce_image_glyph (struct it *it)
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
++it->glyph_row->used[area];
@ -25560,8 +25666,7 @@ append_stretch_glyph (struct it *it, Lisp_Object object,
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
else
@ -26020,8 +26125,7 @@ append_glyphless_glyph (struct it *it, int face_id, int for_no_font, int len,
if (it->bidi_p)
{
glyph->resolved_level = it->bidi_it.resolved_level;
if ((it->bidi_it.type & 7) != it->bidi_it.type)
emacs_abort ();
eassert ((it->bidi_it.type & 7) == it->bidi_it.type);
glyph->bidi_type = it->bidi_it.type;
}
++it->glyph_row->used[area];
@ -30437,6 +30541,7 @@ syms_of_xdisp (void)
DEFSYM (Qright_to_left, "right-to-left");
DEFSYM (Qleft_to_right, "left-to-right");
defsubr (&Sbidi_resolved_levels);
#ifdef HAVE_WINDOW_SYSTEM
DEFVAR_BOOL ("x-stretch-cursor", x_stretch_cursor_p,
@ -30843,6 +30948,12 @@ To add a prefix to continuation lines, use `wrap-prefix'. */);
doc: /* Non-nil means don't free realized faces. Internal use only. */);
inhibit_free_realized_faces = 0;
DEFVAR_BOOL ("inhibit-bidi-mirroring", inhibit_bidi_mirroring,
doc: /* Non-nil means don't mirror characters even when bidi context requires that.
Intended for use during debugging and for testing bidi display;
see biditest.el in the test suite. */);
inhibit_bidi_mirroring = 0;
#ifdef GLYPH_DEBUG
DEFVAR_BOOL ("inhibit-try-window-id", inhibit_try_window_id,
doc: /* Inhibit try_window_id display optimization. */);
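
The left-to-right branch of `bidi-resolved-levels` in the hunk above skips the glyphs that redisplay inserted for its own needs, then copies levels up to the first non-text glyph. A simplified sketch of that scan follows. `demo_row_glyph` and `collect_levels_l2r` are hypothetical, and the `is_text` flag stands in for the patch's `!INTEGERP (g->object)` test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified glyph.  IS_TEXT is nonzero for glyphs
   that come from buffer or string text.  */
struct demo_row_glyph {
  int is_text;
  long charpos;
  int resolved_level;
};

/* Collect the resolved levels of the text glyphs of a left-to-right
   row of N glyphs into OUT, which must have room for N entries.
   Return the number of levels stored.  */
static size_t
collect_levels_l2r (const struct demo_row_glyph *row, size_t n, int *out)
{
  const struct demo_row_glyph *g = row, *e = row + n;
  size_t count = 0;

  /* Skip glyphs at the start of the row that are not text and have
     a negative character position.  */
  while (g < e && !g->is_text && g->charpos < 0)
    g++;
  /* Copy levels until the first non-text glyph or the end of row.  */
  for (; g < e && g->is_text; g++)
    out[count++] = g->resolved_level;
  return count;
}
```

The right-to-left branch in the patch performs the same scan from the other end of the glyph array, so that index zero still refers to the glyph at the start of the screen line.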

96392
test/BidiCharacterTest.txt Normal file

File diff suppressed because it is too large.


@ -1,3 +1,9 @@
2014-10-15 Eli Zaretskii <eliz@gnu.org>
* BidiCharacterTest.txt: New file, from Unicode.
* biditest.el: New file.
2014-10-08 Leo Liu <sdl.web@gmail.com>
* automated/print-tests.el: New file.

121
test/biditest.el Normal file

@ -0,0 +1,121 @@
;;; biditest.el --- test bidi reordering in GNU Emacs display engine.
;; Copyright (C) 2013-2014 Free Software Foundation, Inc.
;; Author: Eli Zaretskii
;; Maintainer: FSF
;; Package: emacs
;; This program is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <http://www.gnu.org/licenses/>.
;;; Commentary:
;; Produce a specially-formatted text file from BidiCharacterTest.txt
;; file that is part of the Unicode Standard's UCD package. The file
;; shows the expected results of reordering according to the UBA. The
;; file is supposed to be visited in Emacs, and the resulting display
;; compared with the expected one.
;;; Code:
(defun biditest-generate-testfile (input-file output-file)
"Generate a bidi test file OUTPUT-FILE from data in INPUT-FILE.
INPUT-FILE should be in the format of the BidiCharacterTest.txt file
available from the Unicode site as part of the UCD; see
http://www.unicode.org/Public/UCD/latest/ucd/BidiCharacterTest.txt.
The resulting file should be viewed with `inhibit-bidi-mirroring' set to t."
(let ((output-buf (get-buffer-create "*biditest-output*"))
(lnum 1)
tbuf)
(with-temp-buffer
(message "Generating output in %s ..." output-file)
(setq tbuf (current-buffer))
(insert-file-contents input-file)
(goto-char (point-min))
(while (not (eobp))
(when (looking-at "^\\([0-9A-F ]+\\);\\([012]\\);\\([01]\\);\\([0-9 ]+\\);\\([0-9 ]+\\)$")
(let ((codes (match-string 1))
(default-paragraph (match-string 2))
(resolved-paragraph (match-string 3))
;; FIXME: Should compare LEVELS with what the display
;; engine actually produced.
(levels (match-string 4))
(indices (match-string 5)))
(setq codes (split-string codes " ")
indices (split-string indices " "))
(switch-to-buffer output-buf)
(insert (format "Test on line %d:\n\n" lnum))
;; Force paragraph direction to what the UCD test
;; specifies.
(insert (cond
((string= default-paragraph "0") ;L2R
#x200e)
((string= default-paragraph "1") ;R2L
#x200f)
(t ""))) ; dynamic
;; Insert the characters
(mapc (lambda (code)
(insert (string-to-number code 16)))
codes)
(insert "\n\n")
;; Insert the expected results
(insert "Expected result:\n\n")
;; We want the expected results displayed exactly as
;; specified in the test file, without any reordering, so
;; we override the directional properties of all of the
;; characters in the expected result by prepending
;; LRO/RLO.
(cond ((string= resolved-paragraph "0")
(insert #x200e #x202d))
((string= resolved-paragraph "1")
(insert #x200f #x202e)
;; We need to reverse the list of indices for R2L
;; paragraphs, so that their logical order on
;; display matches user expectations.
(setq indices (nreverse indices))))
(mapc (lambda (index)
(insert (string-to-number
(nth (string-to-number index 10) codes)
16)))
indices)
(insert #x202c) ; end the embedding
(insert "\n\n"))
(switch-to-buffer tbuf))
(forward-line 1)
(setq lnum (1+ lnum)))
(switch-to-buffer output-buf)
(let ((coding-system-for-write 'utf-8-unix))
(write-file output-file))
(message "Generating output in %s ... done" output-file))))
(defun biditest-create-test ()
"Create a test file for testing the Emacs bidirectional display.
The resulting file should be viewed with `inhibit-bidi-mirroring' set to t."
(biditest-generate-testfile (pop command-line-args-left)
(or (pop command-line-args-left)
"biditest.txt")))
;; A handy function for displaying the resolved bidi levels.
(defun bidi-levels ()
"Display the resolved bidirectional levels of characters on current line.
The results can be compared with the levels stated in the
BidiCharacterTest.txt file."
(interactive)
(message "%s" (bidi-resolved-levels)))
(define-key global-map [f8] 'bidi-levels)
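
`biditest-generate-testfile` above builds its "Expected result" text by looking up each entry of the test line's index field in the list of hex code points. The sketch below shows that lookup in C, assuming the two fields have already been split off at the semicolons. `parse_numbers` and `expected_visual_order` are hypothetical helpers written for this illustration, and the reversal of indices that the Lisp code performs for R2L paragraphs is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse whitespace-separated numbers in BASE from S into OUT, which
   has room for MAX entries.  Return how many were parsed.  */
static size_t
parse_numbers (const char *s, int base, long *out, size_t max)
{
  size_t n = 0;
  char *end;

  while (n < max)
    {
      long v = strtol (s, &end, base);
      if (end == s)		/* no further number found */
	break;
      out[n++] = v;
      s = end;
    }
  return n;
}

/* Fill VISUAL (capacity MAX) with the code points of CODES, a field
   of hex numbers, in the order given by INDICES, a field of decimal
   zero-based positions.  Return the number of code points stored,
   or 0 if an index is out of range or VISUAL is too small.  */
static size_t
expected_visual_order (const char *codes, const char *indices,
		       long *visual, size_t max)
{
  long cp[64], idx[64];
  size_t ncp = parse_numbers (codes, 16, cp, 64);
  size_t nidx = parse_numbers (indices, 10, idx, 64);
  size_t i;

  if (nidx > max)
    return 0;
  for (i = 0; i < nidx; i++)
    {
      if (idx[i] < 0 || (size_t) idx[i] >= ncp)
	return 0;
      visual[i] = cp[idx[i]];
    }
  return nidx;
}
```

For example, the code points `0061 05D0 05D1` with indices `0 2 1` yield U+0061, U+05D1, U+05D0 in that order.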