call EXTEND_RANGE_TABLE and return a proper value.
(set_image_of_range): Don't call set_image_of_range_1
if no TRANSLATE or if range includes all of Latin-1.
Only call it for the Latin-1 part of the range.
For other cases, make two separate ranges,
one for the original specified characters and one for
their case-conversions.
(set_image_of_range): Use set_image_of_range_1 for Latin-1.
Return a value to indicate running out of memory.
(SET_RANGE_TABLE_WORK_AREA): Check value from set_image_of_range.
(extend_range_table_work_area): New subroutine.
(EXTEND_RANGE_TABLE): Replaces EXTEND_RANGE_TABLE_WORK_AREA.
Different calling conventions, and used from set_image_of_range{,_1}.
(IMMEDIATE_QUIT_CHECK): Definitions moved.
(PATFETCH_RAW): Rename to PATFETCH.
(set_image_of_range): New function.
(SET_RANGE_TABLE_WORK_AREA): Use it.
(regex_compile): Don't translate the pattern chars so eagerly.
Only do it when inserting an `exactn' bytecode or when handling a char-range.
(mutually_exclusive_p): Avoid empty statement.
(CHECK_INFINITE_LOOP): Use DISCARD_FAILURE_REG_OR_COUNT
when jumping to `fail' to avoid undoing reg changes in the
last iteration of the loop.
(GET_UNSIGNED_NUMBER): Skip spaces around the number.
Also change several `int' into `re_wchar_t'.
(PATTERN_STACK_EMPTY, PUSH_PATTERN_OP, POP_PATTERN_OP): Remove.
(PUSH_FAILURE_POINTER): Don't cast any more.
(POP_FAILURE_REG_OR_COUNT): Remove the cast that strips `const'.
We want GCC to complain, since this piece of code makes
re_match non-reentrant, which *should* be fixed.
(GET_BUFFER_SPACE): Use size_t rather than unsigned long.
(EXTEND_BUFFER): Use RETALLOC.
(SET_LIST_BIT): Don't cast.
(re_wchar_t): New type.
(re_iswctype, re_wctype_to_bit): Make it crystal clear to GCC
that those two functions will always properly return.
(IMMEDIATE_QUIT_CHECK): Cast to void.
(analyse_first): Use recursion rather than an explicit stack.
(re_compile_fastmap): Can't fail anymore.
(re_search_2): Don't check re_compile_fastmap for failure.
(PUSH_NUMBER): Renamed from PUSH_FAILURE_COUNT.
Now also sets the new value (passed in a new argument).
(re_match_2_internal): Use it.
Also, use a new var `reg' of type size_t when looping through regs
rather than reuse the inappropriate `mcnt'.
(btowc, iswctype, wctype) [_LIBC]: Redefine to __<fun>.
(BIT_ALPHA, BIT_ALNUM, BIT_ASCII, BIT_NONASCII, BIT_GRAPH, BIT_PRINT)
(BIT_UNIBYTE): Remove.
(re_match_2_internal): Delete corresponding code and streamline the
BIT_MULTIBYTE case to not bother checking ISUNIBYTE.
(CHAR_CLASS_MAX_LENGTH) [!WIDE_CHAR_SUPPORT]: Set to 9 rather than 6.
(re_wctype_t): New type.
(re_wctype, re_iswctype, re_wctype_to_bit): New functions.
(regex_compile): Use them and fix handling of overly long char classes.
(struct re_pattern_buffer): Remove newline_anchor.
* regex.c: Keep namespace clean for GNU libc by renaming <fun>
to __<fun> and using `weak_alias (__<fun>, <fun>)'.
(re_max_failures, fail_stack): Use size_t rather than unsigned.
(regex_compile): For ^ and $, choose between buffer and line (beg|end)
depending on the new RE_NO_NEWLINE_ANCHOR syntax flag.
(print_compiled_pattern, re_search_2, mutually_exclusive_p)
(re_match_2_internal, re_compile_pattern, re_comp, regcomp):
Get rid of references to newline_anchor.
(regcomp): Allocate and precompute a fastmap.
(bcopy, bcmp, REGEX_REALLOCATE, re_match_2_internal):
Use memcmp and memcpy instead of bcopy and bcmp.
(init_syntax_once): Use ISALNUM.
(PUSH_FAILURE_POINT, re_match_2_internal): Remove failure_id.
(REG_UNSET_VALUE): Remove. Use NULL instead.
(REG_UNSET, re_match_2_internal): Use NULL.
(SET_HIGH_BOUND, MOVE_BUFFER_POINTER, ELSE_EXTEND_BUFFER_HIGH_BOUND):
New macros.
(EXTEND_BUFFER): Use them (to work with BOUNDED_POINTERS).
(GET_UNSIGNED_NUMBER): Don't use ISDIGIT.
(regex_compile): In handle_interval, return an error rather than try to
unfetch the interval if we can't find the closing brace.
Obey the RE_NO_GNU_OPS syntax bit.
(TOLOWER): New macro.
(regcomp): Use it.
(regexec): Allocate regs.start and regs.end as one block.
(PTR_TO_OFFSET, POS_AS_IN_BUFFER): Move to a better place.
(ISDIGIT, ISCNTRL, ISXDIGIT) [!emacs]: Remove duplicate definition.
(regex_compile): Use RE_FRUGAL instead of RE_ALL_GREEDY.
(re_compile_pattern): Use size_t for length.
(init_syntax_once): Move to a better place.
* regex.h: Merge changes from GNU libc. Indent cpp directives.
(RE_FRUGAL): Replaces RE_ALL_GREEDY (inverted meaning).
(POP_FAILURE_REG_OR_COUNT): Renamed from POP_FAILURE_REG.
Handle popping of a register's or a counter's data.
(POP_FAILURE_POINT): Use the new name.
(re_match_2_internal): Push counter data on the stack for succeed_n,
jump_n and set_number_at and remove misleading dead code in succeed_n.
(RE_MULTIBYTE_P, RE_STRING_CHAR_AND_LENGTH): New macros.
(GET_CHAR_BEFORE_2): Moved from charset.h plus fixed minor bug when
we are between str1 and str2.
(MAX_MULTIBYTE_LENGTH, CHAR_STRING) [!emacs]: Provide trivial default.
(PATFETCH): Use `TRANSLATE'.
(PATFETCH_RAW): Fetch multibyte char if applicable.
(PATUNFETCH): Remove.
(regex_compile): Rely on PATFETCH to do most of the multibyte magic.
When writing a char, write it directly into the pattern buffer rather
than going needlessly through a temp char-array.
(re_match_2_internal): Similarly, rely on RE_STRING_CHAR to do the
multibyte magic and remove the useless `#ifdef emacs'.
(bcmp_translate): Don't compare as multibyte chars when in a unibyte
buffer.
* regex.h (struct re_pattern_buffer): Make field `multibyte'
conditional on `emacs'.
* charset.h (GET_CHAR_BEFORE_2): Moved to regex.c.
of re_compile_fastmap and generalizing it a little bit so that it
can also just return whether a given (sub)pattern can match the empty
string or not.
(regex_compile): Use `analyse_first' to decide whether the loop-check
needs to be done or not for *, +, *? and +? (the loop check is costly
for non-greedy repetition).
(re_compile_fastmap): Delegate the actual work to `analyse_first'.
(enum re_opcode_t): Update description of succeed_n.
(PATFETCH): Always define.
(regex_compile): Use lookahead rather than PATUNFETCH (for repetition
operators, char classes, shy-groups and intervals).
Optimize special cases of intervals so as to only use succeed_n and
jump_n when really needed.
(re_compile_fastmap): Simplify handling of jump_n and succeed_n now
that we don't have to handle the special cases any more.
Simplify on_failure_jump handling as well.
(print_partial_compiled_pattern, re_compile_fastmap): Handle new opcode.
(regex_compile): Use on_failure_jump_nastyloop for non-greedy loops.
(re_match_2_internal): Add code for on_failure_jump_nastyloop when
executing it as well as when popping it off the stack to find infinite
loops in non-greedy repetition operators.
definitions for non-Emacs compilation.
(enum re_opcode_t): Remove (not)wordchar and move (not)syntaxspec
outside of `#ifdef emacs'.
(print_partial_compiled_pattern): Update.
(regex_compile): Use (not)syntaxspec(Sword) instead of (not)wordchar.
(re_compile_fastmap): Merge handling of charset and charset_not (for
emacs and non-emacs compilation as well).
Similarly for (not)categoryspec and (not)syntaxspec.
Don't use the fastmap when reaching `anychar' since the added
complexity is not justified.
(re_match_2_internal): Merge (not)wordchar (emacs and non-emacs)
and (not)syntaxspec. Merge (not)categoryspec.
(GET_CHAR_AFTER_2): Remove.
(RE_TRANSLATE, RE_TRANSLATE_P): New macros moved from regex.h.
(enum re_opcode_t): Remove on_failure_jump_exclusive.
(print_partial_compiled_pattern, re_compile_fastmap)
(re_match_2_internal): Remove on_failure_jump_exclusive.
(regex_compile): Turn optimizable P+ loops into PP*, so that the
optimization only needs to work for * (i.e. can use
on_failure_keep_string_jump).
Remove the special case for .*\n since it is now covered by the general
optimization.
(re_search_2): Don't bother with `room'.
(skip_one_char): New function.
(skip_noops): Simplify since `memory' is not needed any more.
(mutually_exclusive_p): Restructure slightly to use `switch' and
add handling for "all" remaining cases.
(re_match_2_internal): Change on_failure_jump_smart to use
on_failure_keep_string_jump (and redirect the end-of-loop jump)
rather than on_failure_jump_exclusive.
POINTER_TO_OFFSET gives the same value before and after PREFETCH.
Use `dfail' to guarantee "atomic" matching.
(PTR_TO_OFFSET): Use POINTER_TO_OFFSET.
(debug): Now only active if > 0 rather than if != 0.
(DEBUG_*): Update for the new meaning of `debug'.
(print_partial_compiled_pattern): Add missing `succeed' case.
Use CHARSET_* macros in the charset(_not) branch.
Fix off-by-two bugs in `succeed_n', `jump_n' and `set_number_at'.
(store_op1, store_op2, insert_op1, insert_op2)
(at_begline_loc_p, at_endline_loc_p): Add prototype.
(group_in_compile_stack): Move to after its arg's types are declared
and add a prototype.
(PATFETCH): Define in terms of PATFETCH_RAW.
(GET_UNSIGNED_NUMBER): Add the usual `do { ... } while(0)' wrapper.
(QUIT): Redefine as a nop except for NTemacs.
(regex_compile): Handle an interval {,M} as if it were {0,M}.
Fix indentation of the greedy-op and shy-group code.
(at_(beg|end)line_loc_p): Fix argument's types.
(re_compile_fastmap): Ifdef out failure_stack_ptr to shut up gcc.
(re_search_2): Use POS_AS_IN_BUFFER. Simplify `room' computation.
(MATCHING_IN_FIRST_STRING): Remove.
(re_match_2): Use POS_AS_IN_BUFFER.
Ifdef out failure_stack_ptr to shut up gcc.
Use FIRST_STRING_P and POINTER_TO_OFFSET.
Use QUIT unconditionally.
string char type. It's `const unsigned char' to match the rest of Emacs.
Consistently make sure all pointers to strings use it and make sure all
pointers into the pattern use `unsigned char'.
(re_match_2_internal): Use `PREFETCH+STRING_CHAR' instead of
GET_CHAR_AFTER_2.
Also merge wordbound and notwordbound to reduce code duplication.
* charset.h (GET_CHAR_AFTER_2): Remove.
(GET_CHAR_BEFORE_2): Use unsigned chars, like everywhere else.
by bugs revealed when trying to add shy-groups. Overall, what happened
is that loops are now structured a little differently, groups can be
shy and the code is a little simpler.
(enum re_opcode_t): Remove jump_past_alt, maybe_pop_jump,
push_dummy_failure and dummy_failure_jump.
Add on_failure_jump_(exclusive, loop and smart).
Also fix the comment for (start|stop)_memory since they now only take
one argument (the second has become unnecessary).
(print_partial_compiled_pattern): Adjust for changes in re_opcode_t.
(print_compiled_pattern): Use %ld to printf long ints and flush to make
debugging a little easier.
(union fail_stack_elt): Make the integer unsigned.
(struct fail_stack_type): Add a `frame' element.
(INIT_FAIL_STACK): Init `frame' as well.
(POP_PATTERN_OP): New macro for re_compile_fastmap.
(DEBUG_PUSH, DEBUG_POP): Remove.
(NUM_REG_ITEMS): Remove.
(NUM_NONREG_ITEMS): Adjust.
(FAILURE_PAT, FAILURE_STR, NEXT_FAILURE_HANDLE, TOP_FAILURE_HANDLE):
New macros for the cycle detection.
(ENSURE_FAIL_STACK): New macro for PUSH_FAILURE_(REG|POINT).
(PUSH_FAILURE_REG, POP_FAILURE_REG, CHECK_INFINITE_LOOP): New macros.
(PUSH_FAILURE_POINT): Don't push registers any more.
The pattern address pushed is not the destination of the jump
but the source of it instead.
(NUM_FAILURE_ITEMS): Remove.
(POP_FAILURE_POINT): Adapt to the new stack structure (i.e. pop
registers before the actual failure point).
Don't hardcode any meaning for str==NULL anymore.
(union register_info_type, REG_MATCH_NULL_STRING_P, IS_ACTIVE)
(MATCHED_SOMETHING, EVER_MATCHED_SOMETHING, SET_REGS_MATCHED): Remove.
(REG_UNSET_VALUE): Use NULL (why not?).
(compile_range): Remove declaration since it doesn't exist.
(struct compile_stack_elt_t): Remove inner_group_offset.
(old_reg(start|end), reg_info, reg_dummy, reg_info_dummy): Remove.
(regex_grow_registers): Remove dead code.
(FIXUP_ALT_JUMP): New macro.
(regex_compile): Add shy-groups.
Change loops to use on_failure_jump_smart&jump instead of
on_failure_jump&maybe_pop_jump.
Change + loops to eliminate the initial (dummy_failure_)jump.
Remove c1_base (looks like an unused variable to me).
Use `jump' instead of `jump_past_alt' and don't bother with
push_dummy_failure in alternatives since it is now unnecessary.
Use FIXUP_ALT_JUMP.
Eliminate a useless `#ifdef emacs' for (re)allocating the stack.
(re_compile_fastmap): Remove dead variables i and num_regs.
Exit from loop when bufp->can_be_null rather than jumping to `done'.
Avoid jumping backwards so as to ensure termination.
Use PATTERN_STACK_EMPTY and POP_PATTERN_OP.
Improved handling of backreferences.
Remove dead code in handling of `anychar'.
(skip_noops, mutually_exclusive_p): New functions taken from the
handling of `maybe_pop_jump' in re_match_2_internal.
Slightly improve mutually_exclusive_p to handle ".+\n".
((lowest|highest)_active_reg, NO_(LOWEST|HIGHEST)_ACTIVE_REG):
Remove.
(re_match_2_internal): Use %p instead of 0x%x when printf'ing ptrs.
Don't SET_REGS_MATCHED anymore. Remove many dead variables.
Push register (in `start_memory') on the stack rather than storing it
in old_reg(start|end).
Remove the cycle detection from `stop_memory', replaced by the use
of on_failure_jump_loop for greedy loops.
Add code for the new on_failure_jump_<foo>.
Remove ad-hoc code in `on_failure_jump' to push more registers
in the case of a loop.
Take out code from `maybe_pop_jump' into separate functions and
adapt it to the semantics of `on_failure_jump_smart'.
Remove jump_past_alt, dummy_failure_jump and push_dummy_failure.
Remove dummy_failure handling and handling of `failures to jump
to on_failure_jump' (this last one was already dead code, it seems).
((group|alt|common_op)_match_null_string_p): Remove.
* regex.c (regex_compile): Adjusted for the change of CHAR_STRING.
1999-12-04 Stefan Monnier <monnier@cs.yale.edu>
* regex.c (regex_compile): Recognize *?, +? and ?? as non-greedy
operators and handle them properly.
* regex.h (RE_ALL_GREEDY): New option.
(RE_UNMATCHED_RIGHT_PAREN_ORD): Moved to the end where alphabetic
sorting would put it.
(RE_SYNTAX_AWK, RE_SYNTAX_GREP, RE_SYNTAX_EGREP)
(_RE_SYNTAX_POSIX_COMMON): Use the new option to keep old behavior.