organic/notes/optimization_ideas.org

* Analysis
** Parse start per character
It might help analysis to record how often we start a specific type of parse for each character. For example, at the start of a plain list, if we had a count of how often each character was the start of a parse of a list we could use that to see how often that list is getting re-parsed.
* Optimizations
** Edit whitespace for list items
Whether or not a list item owns the trailing whitespace depends on if it is the last list item in that list. Since we do not know ahead of time if an item is the last item in the list, we have to either re-parse the list item or modify it after parsing.

*** For
We already are modifying the source of some elements after-the-fact with src_rust{set_source()} so this would be more of the same.
*** Against
I'd like to phase out such modifications because they seem hacky and fragile.
** Make detect element function
Some exit matchers are based on when the next element is found. Some elements do not need to be fully parsed to identify that they are a valid element. For example, src_org{1. foo} can already be identified as the start of a plain list (in the right context) without needing to parse the entire element.
*** For
Avoiding parsing the entire element for an exit matcher would reduce redundant parses.
*** Against
This adds code complexity and introduces the potential for bugs.

How many elements can be reasonably early-detected? For example, src_org{#+begin_src foo} is not enough to detect the start of a source block because without the src_org{#+end_src} it is just plain text.
** Grab multiple characters in plaintext parser before checking exit matcher
Currently we check the exit matcher after each character inside the plain text parser (and many others). Are there character sequences we can assume no exit matcher will trigger between? For example, a contiguous string of latin-alphabet letters?
*** For
This could significantly reduce our calls to exit matchers.
*** Against
I think targets would break this.

The exit matchers are already implicitly building this behavior since they should all exit very early when the starting character is wrong. Putting this logic in a centralized place, far away from where those characters are actually going to be used, is unfortunate for readability.
** Use exit matcher to cut off trailing whitespace instead of re-matching in plain lists.
Add notes about optimization ideas. 2023-08-15 02:45:16 +00:00			`* Analysis`
			`** Parse start per character`
			`It might help analysis to record how often we start a specific type of parse for each character. For example, at the start of a plain list, if we had a count of how often each character was the start of a parse of a list we could use that to see how often that list is getting re-parsed.`
			`* Optimizations`
			`** Edit whitespace for list items`
			`Whether or not a list item owns the trailing whitespace depends on if it is the last list item in that list. Since we do not know ahead of time if an item is the last item in the list, we have to either re-parse the list item or modify it after parsing.`

			`*** For`
			`We already are modifying the source of some elements after-the-fact with src_rust{set_source()} so this would be more of the same.`
			`*** Against`
			`I'd like to phase out such modifications because they seem hacky and fragile.`
			`** Make detect element function`
			`Some exit matchers are based on when the next element is found. Some elements do not need to be fully parsed to identify that they are a valid element. For example, src_org{1. foo} can already be identified as the start of a plain list (in the right context) without needing to parse the entire element.`
			`*** For`
			`Avoiding parsing the entire element for an exit matcher would reduce redundant parses.`
			`*** Against`
			`This adds code complexity and introduces the potential for bugs.`

			`How many elements can be reasonably early-detected? For example, src_org{#+begin_src foo} is not enough to detect the start of a source block because without the src_org{#+end_src} it is just plain text.`
			`** Grab multiple characters in plaintext parser before checking exit matcher`
			`Currently we check the exit matcher after each character inside the plain text parser (and many others). Are there character sequences we can assume no exit matcher will trigger between? For example, a contiguous string of latin-alphabet letters?`
			`*** For`
			`This could significantly reduce our calls to exit matchers.`
			`*** Against`
			`I think targets would break this.`

			`The exit matchers are already implicitly building this behavior since they should all exit very early when the starting character is wrong. Putting this logic in a centralized place, far away from where those characters are actually going to be used, is unfortunate for readability.`
Add an optimization idea. 2023-09-14 07:25:12 +00:00			`** Use exit matcher to cut off trailing whitespace instead of re-matching in plain lists.`