It is currently unknown if this will produce a performance increase, but unless it has a significant performance penalty we are going to go forward with this change because it makes it more explicit which values need to be read deeply from other elements (therefore needing to be in the context) vs values that can be bound to the exit matcher since they are only used within the confines of the current element.
I suspect we will get a performance boost since it will be reducing the nodes that need to be walked in the context but maintaining bracket depth count over the entire document instead of only inside elements that need balanced brackets could cost us.
The documentation currently states that the body for these cannot span more than three lines but that is not the behavior I am seeing from emacs in practice. Waiting on a mailing list response to tell me if this is a documentation error or a parser error.
This was previously using the standard docker makefile I use as a starting point for all of my docker makefiles. Now it will properly mount the source directory.
This is not meant to produce publishable or comparable benchmarks. Such a script would have to run many iterations with the input already loaded into memory, proper prioritization via nice/ionice, and have a warm-up phase. This is just automating a basic test I am frequently running to compare parse times when investigating performance issues.
This is an optimization. When you have something like plain text which ends when it hits the next element, we only need to parse enough to detect that an element is about to occur. For elements like plain lists, this is as simple as parsing a line starting with optional whitespace and then a bullet, which avoids parsing the entire plain list tree. The benefit is most noticeable in deeply nested plain lists.
Paragraph's exit matcher which detects elements was causing the plain list parser to exit after the first item was parsed which was causing significant amounts of re-parsing.
This can be done a lot more efficiently now that we are keeping track of this information in the wrapped input type instead of having to fetch to the original document out of the context tree.
I used the wrong word. This is referring to the priority between paragraphs ending vs export snippets ending, not a reference to something occurring in the past.
This is to be a good citizen by not downloading all the rust dependencies every time I run the tests locally. Unfortunately, it will still compile all the dependencies each time, but that is a local operation.
The behavior in emacs does not match the description in the org-mode documentation. I have sent an email to the org-mode mailing list and I am waiting their response so I can adjust (or not adjust) my parser accordingly.
We are including a bunch of files that are not needed for running the rust code. This excludes them to be a better citizen to both crates.io and all users of this package.
This is a relic from the early development days in this repo. When I first started this repo, it was a clean-slate playground to test ideas for solving the road blocks I hit with my previous attempt at an org-mode parser. To keep things simple, I originally only had a very basic set of syntax rules that only vaguely looked similar to org-mode. Once I had things figured out, I kept developing in this repo, morphing it into a full org-mode parser. A couple of references to those early days still remained, and this patch should get rid of the last of them.
Organic is an emacs-less implementation of an [org-mode](https://orgmode.org/) parser.
## Project Status
This project is a personal learning project to grow my experience in [rust](https://www.rust-lang.org/). It is under development and at this time I would not recommend anyone use this code. The goal is to turn this into a project others can use, at which point more information will appear in this README.
## License
This project is released under the public-domain-equivalent [0BSD license](https://www.tldrlegal.com/license/bsd-0-clause-license). This license puts no restrictions on the use of this code (you do not even have to include the copyright notice or license text when using it). HOWEVER, this project has a couple permissively licensed dependencies which do require their copyright notices and/or license texts to be included. I am not a lawyer and this is not legal advice but it is my layperson's understanding that if you distribute a binary with this library linked in, you will need to abide by their terms since their code will also be linked in your binary. I try to keep the dependencies to a minimum and the most restrictive dependency I will ever include is a permissively licensed one.
Organic is an emacs-less implementation of an [[https://orgmode.org/][org-mode]] parser.
* Project Status
This project is a personal learning project to grow my experience in [[https://www.rust-lang.org/][rust]]. It is under development and at this time I would not recommend anyone use this code. The goal is to turn this into a project others can use, at which point more information will appear in this README.
* License
This project is released under the public-domain-equivalent [[https://www.tldrlegal.com/license/bsd-0-clause-license][0BSD license]]. This license puts no restrictions on the use of this code (you do not even have to include the copyright notice or license text when using it). HOWEVER, this project has a couple permissively licensed dependencies which do require their copyright notices and/or license texts to be included. I am not a lawyer and this is not legal advice but it is my layperson's understanding that if you distribute a binary with this library linked in, you will need to abide by their terms since their code will also be linked in your binary. I try to keep the dependencies to a minimum and the most restrictive dependency I will ever include is a permissively licensed one.
@@ -71,10 +72,10 @@ use organic::parser::sexp::sexp_with_padding;
fnis_expect_fail(name: &str)-> Option<&str>{
matchname{
"drawer_drawer_with_headline_inside"=>Some("Apparently lines with :end: become their own paragraph. This odd behavior needs to be investigated more."),
"element_container_priority_footnote_definition_dynamic_block"=>Some("Apparently broken begin lines become their own paragraph."),
"paragraphs_paragraph_with_backslash_line_breaks"=>Some("The text we're getting out of the parse tree is already processed to remove line breaks, so our comparison needs to take that into account."),
"export_snippet_paragraph_break_precedent"=>Some("Emacs 28 has broken behavior so the tests in the CI fail."),
"autogen_greater_element_drawer_drawer_with_headline_inside"=>Some("Apparently lines with :end: become their own paragraph. This odd behavior needs to be investigated more."),
"autogen_element_container_priority_footnote_definition_dynamic_block"=>Some("Apparently broken begin lines become their own paragraph."),
"autogen_lesser_element_paragraphs_paragraph_with_backslash_line_breaks"=>Some("The text we're getting out of the parse tree is already processed to remove line breaks, so our comparison needs to take that into account."),
"autogen_unicode_hearts"=>Some("Unicode is coming out of emacs strange."),
# Savannah does not allow fetching specific revisions, so we're going to have to put unnecessary load on their server by cloning main and then checking out the revision we want.
It might help analysis to record how often we start a specific type of parse for each character. For example, at the start of a plain list, if we had a count of how often each character was the start of a parse of a list we could use that to see how often that list is getting re-parsed.
* Optimizations
** Edit whitespace for list items
Whether or not a list item owns the trailing whitespace depends on if it is the last list item in that list. Since we do not know ahead of time if an item is the last item in the list, we have to either re-parse the list item or modify it after parsing.
*** For
We already are modifying the source of some elements after-the-fact with src_rust{set_source()} so this would be more of the same.
*** Against
I'd like to phase out such modifications because they seem hacky and fragile.
** Make detect element function
Some exit matchers are based on when the next element is found. Some elements do not need to be fully parsed to identify that they are a valid element. For example, src_org{1. foo} can already be identified as the start of a plain list (in the right context) without needing to parse the entire element.
*** For
Avoiding parsing the entire element for an exit matcher would reduce redundant parses.
*** Against
This adds code complexity and introduces the potential for bugs.
How many elements can be reasonably early-detected? For example, src_org{#+begin_src foo} is not enough to detect the start of a source block because without the src_org{#+end_src} it is just plain text.
** Grab multiple characters in plaintext parser before checking exit matcher
Currently we check the exit matcher after each character inside the plain text parser (and many others). Are there character sequences we can assume no exit matcher will trigger between? For example, a contiguous string of latin-alphabet letters?
*** For
This could significantly reduce our calls to exit matchers.
*** Against
I think targets would break this.
The exit matchers are already implicitly building this behavior since they should all exit very early when the starting character is wrong. Putting this logic in a centralized place, far away from where those characters are actually going to be used, is unfortunate for readability.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.