The latest version of UTS #46 in Unicode 16 has changed the way it
indicates which codepoints are invalid in domain names, causing
'idna-mapping-table' to contain incorrect information, which then breaks
'textsec-domain-suspicious-p' and our test suite. (Bug#73312)
* admin/unidata/unidata-gen.el (unidata-gen-idna-mapping): Check the
IDNA validity field in "IdnaMappingTable.txt" in addition to checking
the status field, as the latter can now be 'valid' for disallowed
codepoints.
* admin/notes/unicode: Point people at "admin/unidata/README" for URLs
for Unicode files.
* admin/unidata/README: Use stable URLs for the various files. Remove
dates, the files self-describe their dates anyway.
* lisp/international/characters.el: Assign 'digit' category to all
the characters whose Unicode 'general-category' is Nd.
* admin/unidata/blocks.awk: Add code to assign 'symbol' category
to all characters belonging to the 'symbol' script.
* etc/NEWS: Announce the above changes
Hoist the `eval-when-compile` to encompass the entire list, since
backquote forms aren't automatically evaluated at compile time.
This results in a single constant list in the generated code, and
much less actual code.
* admin/unidata/Makefile.in (${unidir}/charscript.el,
${unidir}/emoji-zwj.el): Define variables for the required sources and
pass those to awk instead of $^, since the latter includes the awk
script itself.
There is a conflict between forward matching and backward matching
composition rules involving the same codepoint, which can cause the
backward matching ones not to be invoked. Ensure that VS-15 (U+FE0E)
and VS-16 (U+FE0F) are composed by forward matching rules instead in
order to avoid this issue.
* admin/unidata/emoji-zwj.awk: Add rules for CHAR+VS-15 and CHAR+VS-16.
* lisp/composite.el: remove backward matching rule for VS-15. (Bug#63731)
* admin/notes/unicode: Add instructions for emoji-variation-sequences.txt
* admin/unidata/emoji-variation-sequences.txt: New file, imported from
Unicode 15.
* lisp/url/url-cookie.el (url-cookie-write-file):
* lisp/international/titdic-cnv.el (tit-process-header):
* lisp/international/ja-dic-cnv.el (skkdic-convert):
* lisp/international/emoji.el (emoji--generate-file):
* lisp/emacs-lisp/loaddefs-gen.el (loaddefs-generate--rubric)
* admin/unidata/unidata-gen.el (unidata-gen-file)
(unidata-gen-charprop): Use the new functions.
* lisp/emacs-lisp/generate-file.el: New file to provide
convenience functions for generated files. It's not always
trivial to know which parts of the trailer that has to be
obfuscated to avoid not getting byte-compiled etc, and some parts
of the headers/trailers are usually forgotten when hand-coding
these.
* admin/unidata/unidata-gen.el (unidata-file-alist): Update the
default values of 'bidi-class' according to the latest Unicode
Standard.
* admin/notes/unicode: Mention possible changes in
DerivedBidiClass.txt that need to be reflected in unidata-gen.el.
* lisp/international/characters.el (#xfb50, #xfdf0): Fix the
Arabic block characters. (Bug#55256)
* admin/unidata/blocks.awk (name2alias): Map some obscure blocks
to their native scripts, to follow Scripts.txt.
* lisp/international/characters.el (char-script-table): Add
few exceptions.
* lisp/international/fontset.el (script-representative-chars):
Remove scripts no longer used.
* admin/unidata/Makefile.in (${unidir}/uni-scripts.el): Build
uni-scripts.el.
* admin/unidata/Scripts.txt:
* admin/unidata/ScriptExtensions.txt:
* admin/unidata/PropertyValueAliases.txt: New files from Unicode.
* admin/unidata/README: Update.
* admin/unidata/unidata-gen.el (unidata-gen-charprop): Allow
writing other data, too.
(unidata-gen-scripts, unidata-gen--read-script-aliases)
(unidata-gen--insert-file): New functions to parse the Script* files.
* lisp/international/textsec.el: Implement some functions that
work on scripts.
With the recent changes to src/verbose.mk.in, it’s more important
to be consistent about putting AM_V_GEN and similar macros at the
start of a rule’s recipe, since ‘make’ now outputs the diagnostic
before it executes the recipe rather than the shell outputting it.
Most of the uses were already this way, but there were a few
outliers. Problem reported by Pip Cet.
* Makefile.in (${srcdir}/info/dir):
* admin/unidata/Makefile.in (${unidir}/charprop.el, ${unifiles})
(${unidir}/emoji-labels.el):
* lib/Makefile.in (libgnu.a, libegnu.a):
* lisp/Makefile.in (TAGS):
* src/Makefile.in (lisp.mk, Emacs):
* test/Makefile.in (%.log, $(test_module)):
Put AM_V_GEN and similar macros first.
* .gitignore: Ignore the generated emoji-labels.el file.
* admin/unidata/Makefile.in (${unidir}/emoji-labels.el): Generate
the emoji-labels.el file.
(gen-clean): Delete it.
* admin/unidata/README (https): Note the source for the Unicode
file that has emoji categorisations.
* admin/unidata/emoji-test.txt: Import another Unicode file.
* doc/emacs/mule.texi (Input Methods): Document the new key bindings.
* lisp/international/emoji.el: New file.
* lisp/international/mule-cmds.el (ctl-x-map): Bind the emoji
commands.
* admin/unidata/blocks.awk: Remove emoji overrides for codepoints with
Emoji_Presentation = No, they're no longer necessary.
* lisp/composite.el: Remove #xFE0F (VS-16) from the range handled by
`compose-gstring-for-variation-glyph' so it can be handled by
`font_range'.
* src/composite.c (syms_of_composite): New variable
`auto-composition-emoji-eligible-codepoints'.
* admin/unidata/emoji-zwj.awk: Generate value for
`auto-composition-emoji-eligible-codepoints'. Add
`composition-function-table' entries for 'codepoint + U+FE0F' for
them.
* src/font.c (codepoint_is_emoji_eligible): New function to check if
we should try to use the emoji font for a codepoint.
(font_range): Use it.
If the codepoint that triggered composition is from the emoji script,
use the emoji font to check the string being composed, rather than the
font of the first character of the string. This makes e.g.
"emoji codepoint with Emoji_Presentation = No followed by VS-16 (FE0F)"
display the emoji version of the glyph for that codepoint.
* admin/unidata/blocks.awk: Add VS-1 through VS-16 to the emoji
script.
* src/composite.c (autocmp_chars): Accept additional argument CH for
the codepoint that triggered composition, pass it to font_range.
(composition_reseat_it, find_automatic_composition): Pass codepoint
that triggered composition to autocmp_chars.
* src/font.c (font_range): Accept additional argument CH for the
triggering codepoint. If the codepoint is from the 'emoji' script,
use Vscript_representative_chars to find the font to use for the
composition attempt.
(syms_of_font): Add Qemoji symbol.
* src/font.h: Update font_range prototype for argument CH.
* etc/NEWS: Announce change.
Read skin tone modifier sequences from emoji-sequences.txt and add
them to the per-character composition rules derived from
emoji-zwj-sequences.txt, rather than adding them as lookback rules for
the skin tone modifiers. This avoids an issue with the application of
such lookback rules. See Bug#39799,
specifically
<https://lists.gnu.org/archive/html/bug-gnu-emacs/2021-09/msg01878.html>
for more details.
* admin/unidata/Makefile.in (zwj): Add emoji-sequences.txt as a dependency.
* admin/unidata/emoji-zwj.awk: Match RGI_Emoji_ZWJ_Sequence and
RGI_Emoji_Modifier_Sequence. Remove manual addition of skin tone
composition rules with lookback.