mirror of
https://git.FreeBSD.org/ports.git
synced 2025-01-19 08:13:21 +00:00
3a114f05a3
This release adds Unisurrogate, a utility that computes the UTF-16 surrogate decomposition of characters outside the BMP.
25 lines
1.2 KiB
Plaintext
Uniutils consists of six programs for finding out what is in a Unicode file.
They are useful when working with Unicode files when one doesn't know the
writing system, doesn't have the necessary font, needs to inspect invisible
characters, needs to find out whether characters have been combined or in what
order they occur, or needs statistics on which characters occur.

uniname defaults to printing the character offset of each character, its byte
offset, its hex code value, its encoding, the glyph itself, and its name.

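As a rough illustration of the fields listed for uniname, the following Python
sketch collects the same six pieces of information for each character of a
string. The function name describe_characters and the row layout are
hypothetical and chosen for this example; uniname's real output format is not
reproduced here.

```python
import unicodedata


def describe_characters(text):
    """Return one row per character: (char offset, byte offset, hex code
    value, UTF-8 bytes, glyph, Unicode name). Hypothetical helper."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            char_offset,
            byte_offset,
            f"{ord(ch):06X}",
            encoded.hex(" ").upper(),
            ch,
            # Fall back to a placeholder for characters without a name
            # (for example control characters).
            unicodedata.name(ch, "<unnamed>"),
        ))
        byte_offset += len(encoded)
    return rows


if __name__ == "__main__":
    for row in describe_characters("a\u00e9"):
        print(*row, sep="\t")
```

The byte offset advances by the encoded length of each character, which is why
the two offsets diverge as soon as a non-ASCII character appears.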
unidesc reports the character ranges to which different portions of the text
belong. It can also be used to identify Unicode encodings (e.g. UTF-16be)
flagged by magic numbers.

unihist generates a histogram of the characters in its input, which must be
encoded in UTF-8 Unicode.

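The idea of a character histogram over UTF-8 input can be sketched in a few
lines of Python. The function name char_histogram is hypothetical, and the
sketch makes no claim about how unihist sorts or formats its counts.

```python
from collections import Counter


def char_histogram(data: bytes):
    """Count characters (code points) in UTF-8 input, most frequent first.
    Hypothetical helper; raises UnicodeDecodeError on invalid UTF-8."""
    return Counter(data.decode("utf-8")).most_common()


if __name__ == "__main__":
    for ch, count in char_histogram("na\u00efve banana".encode("utf-8")):
        print(f"U+{ord(ch):04X}\t{count}")
```

Decoding first means the counts are per character rather than per byte, so a
multi-byte character such as U+00EF is counted once.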
ExplicateUTF8 is intended for debugging or for learning about Unicode. It
determines and explains the validity of a sequence of bytes as a UTF-8 encoding.

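To give a feel for what a validity check on a byte sequence involves, this
Python sketch leans on the standard decoder and reports where and why decoding
fails. The function name explain_utf8 and the message wording are hypothetical;
ExplicateUTF8 gives its own, more detailed explanation.

```python
def explain_utf8(data: bytes) -> str:
    """Say whether data is valid UTF-8 and, if not, where decoding fails.
    Hypothetical helper built on Python's own UTF-8 decoder."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first offending byte.
        return f"invalid at byte offset {err.start}: {err.reason}"
    return f"valid: {len(text)} code point(s)"


if __name__ == "__main__":
    print(explain_utf8(b"\xc3\xa9"))   # well-formed two-byte sequence
    print(explain_utf8(b"a\xc3"))      # truncated two-byte sequence
```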
Unirev is a filter that reverses UTF-8 strings character-by-character (as
opposed to byte-by-byte).

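The difference between the two kinds of reversal can be shown with a small
Python sketch. The function name reverse_utf8 is hypothetical, and the sketch
reverses code points only; whether Unirev treats combining sequences specially
is not stated here, so that is left out.

```python
def reverse_utf8(data: bytes) -> bytes:
    """Reverse UTF-8 input by code point rather than by byte.
    Hypothetical helper; combining marks are not kept with their base."""
    return data.decode("utf-8")[::-1].encode("utf-8")


if __name__ == "__main__":
    original = "a\u00f1b".encode("utf-8")        # bytes 61 C3 B1 62
    print(reverse_utf8(original).decode("utf-8"))
    # Reversing the raw bytes instead splits the two-byte character and
    # no longer yields valid UTF-8.
    try:
        original[::-1].decode("utf-8")
    except UnicodeDecodeError as err:
        print("byte-by-byte reversal is invalid:", err.reason)
```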
Unisurrogate takes a codepoint on the command line and, if it falls outside the
BMP, reports its surrogate decomposition.
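
The surrogate decomposition itself is standard UTF-16 arithmetic and can be
sketched in Python as follows. The function name surrogate_pair is hypothetical,
and returning None for BMP code points is a choice made for this example rather
than a description of Unisurrogate's behavior.

```python
def surrogate_pair(codepoint: int):
    """Return the (high, low) UTF-16 surrogates for a code point above
    U+FFFF, or None if the code point lies inside the BMP.
    Hypothetical helper."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if codepoint <= 0xFFFF:
        return None
    offset = codepoint - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = surrogate_pair(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
```

Subtracting 0x10000 leaves a 20-bit value, which is split into two 10-bit
halves that are added to the bases of the high and low surrogate ranges.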