org-mode parser in rust

Organic - Free Range Org-Mode

Organic is an Emacs-less implementation of an org-mode parser.

Project Status

This project is still under HEAVY development. While the version remains v0.1.x, the API will change often. Once we reach v0.2.x, we will start following semver.

Currently, the parser correctly identifies the start and end bounds of all org-mode objects and elements (except table.el tables; org-mode tables are supported), but many of their interior properties are not yet populated.
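Org-mode has two table syntaxes: native Org tables, whose rows start with `|`, and table.el tables, which are drawn with `+---+` border rules. As a rough illustration of what telling them apart involves, here is a hypothetical Rust sketch that classifies the first line of a table. The rule is simplified and only approximates how Emacs's org-element detects tables. `TableKind` and `classify_table_line` are illustrative names, not part of Organic.

```rust
/// Which of org-mode's two table syntaxes a line starts.
#[derive(Debug, PartialEq)]
enum TableKind {
    Org,
    TableEl,
}

/// Simplified, hypothetical check: an Org table row starts with '|' after optional
/// spaces/tabs; a table.el border rule looks like "+---+--+" (a '+', then one or more
/// runs of '-' each closed by '+'), optionally surrounded by spaces/tabs.
fn classify_table_line(line: &str) -> Option<TableKind> {
    let is_blank = |c: char| c == ' ' || c == '\t';
    let trimmed = line.trim_start_matches(is_blank);
    if trimmed.starts_with('|') {
        return Some(TableKind::Org);
    }
    let mut rest = trimmed.trim_end_matches(is_blank).strip_prefix('+')?;
    if rest.is_empty() {
        return None; // a lone "+" is not a border rule
    }
    while !rest.is_empty() {
        // Count the run of '-' characters before the next '+'.
        let dashes = rest.len() - rest.trim_start_matches('-').len();
        if dashes == 0 {
            return None; // e.g. "+ item" is a plain-list item, not a table
        }
        rest = rest[dashes..].strip_prefix('+')?;
    }
    Some(TableKind::TableEl)
}
```

A real parser must also check the following lines before committing to a table, which this sketch does not attempt.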

Project Goals

  • We aim to provide perfect parity with the Emacs org-mode parser. To that end, any document that parses differently in Emacs and in Organic is considered a bug.
  • The parser should be fast. We're not doing anything special, but since it is written in Rust and compiled to native code, we should be able to outperform the existing parsers.
  • The parser should have minimal dependencies. This should reduce the effort required for security audits, legal compliance, and portability.
  • The parser should be usable everywhere. In the interest of getting org-mode used in as many places as possible, anyone should be able to use it, in any environment. This means:
    • It must have a permissive license for use in proprietary code bases.
    • We will investigate compiling to WASM. This is an important goal of the project and will definitely happen, but only after the parser has a more stable API.
    • We will investigate compiling to a C library for native linking to other code. This is more of a maybe-goal for the project.

Project Non-Goals

  • This project will not include an elisp engine, since that would drastically increase the complexity of the code. Any features requiring an elisp engine will not be implemented (for example, Emacs supports embedded eval expressions in documents, but this parser will never support them).
  • This project is exclusively an org-mode parser. This limits its scope to roughly the output of (org-element-parse-buffer). It will not render org-mode documents in other formats like HTML or LaTeX.

Project Maybe-Goals

  • table.el support. Currently we support org-mode tables, but org-mode also allows table.el tables. So far, they seem rather uncommon in org-mode documents, so this is a low-priority feature.
  • Document editing support. I do not anticipate any advanced features to make editing ergonomic, but it should be relatively easy to parse an org-mode document and serialize it back into org-mode. This would enable useful tools, such as auto-formatters, to be built on top of the library. To accomplish this, we would have to capture all of the separators and whitespace that we currently throw away. That would add many fields to the parsed structs and more noise to the parsers themselves, so I do not want to tackle this feature until the parser is more complete, since it would make modifications and refactoring more difficult.
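Capturing the separators and whitespace means storing every byte of input somewhere in the parsed structure, so that serializing the structure reproduces the original text exactly. The hypothetical sketch below shows that idea at the smallest scale: each word keeps the whitespace that preceded it, and serialization concatenates everything back together. The `Token` and `Lossless` types are illustrative only and are not part of Organic.

```rust
/// Hypothetical: a word together with the whitespace that preceded it in the source.
#[derive(Debug, PartialEq)]
struct Token {
    leading: String,
    text: String,
}

/// Hypothetical: a token list plus any whitespace after the last word,
/// so that no byte of the input is thrown away.
#[derive(Debug, PartialEq)]
struct Lossless {
    tokens: Vec<Token>,
    trailing: String,
}

fn tokenize(input: &str) -> Lossless {
    let mut tokens = Vec::new();
    let mut leading = String::new();
    let mut text = String::new();
    for c in input.chars() {
        if c.is_whitespace() {
            // A whitespace character ends the current word, if there is one.
            if !text.is_empty() {
                tokens.push(Token {
                    leading: std::mem::take(&mut leading),
                    text: std::mem::take(&mut text),
                });
            }
            leading.push(c);
        } else {
            text.push(c);
        }
    }
    if !text.is_empty() {
        tokens.push(Token { leading: std::mem::take(&mut leading), text });
    }
    // Whatever whitespace is still pending followed the last word.
    Lossless { tokens, trailing: leading }
}

/// Concatenating every captured piece in order reproduces the input exactly.
fn serialize(doc: &Lossless) -> String {
    let mut out = String::new();
    for token in &doc.tokens {
        out.push_str(&token.leading);
        out.push_str(&token.text);
    }
    out.push_str(&doc.trailing);
    out
}
```

An auto-formatter built on this shape would rewrite the `leading` strings (for example, normalizing indentation) before calling `serialize`. The cost mentioned above is visible even here: every struct gains extra fields that exist only to preserve layout.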

Supported Versions

This project targets the version of Emacs and Org-mode that are built into the organic-test docker image. This is newer than the version of Org-mode that shipped with Emacs 29.1. The parser itself does not depend on Emacs or Org-mode though, so this only matters for development purposes when running the automated tests that compare against upstream Org-mode.

Using this library

TODO: Add section on using Organic as a library (which is the intended use for this project). This will be added when we have a bit more API stability since currently the library is under heavy development.

Development

The parse binary

This program takes org-mode input either streamed in on stdin or as paths to files passed in as arguments. It then parses them using Organic and dumps the result to stdout. This program is intended solely as a development tool. Examples:

cat /foo/bar.org | cargo run --bin parse
cargo build --profile release-lto
./target/release-lto/parse /foo/bar.org /lorem/ipsum.org
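The stdin-or-paths convention described above can be sketched in a few lines of Rust. This is a hypothetical illustration of how a development tool might gather its input, not the actual `parse` binary's source; `collect_inputs` and the `<stdin>` label are illustrative names. Taking the reader as a parameter (instead of calling `io::stdin()` directly) is simply a choice that keeps the function testable.

```rust
use std::fs;
use std::io::{self, Read};
use std::path::PathBuf;

/// Hypothetical helper: returns one (label, contents) pair per input.
/// With no paths, the whole of `stdin` is read as a single document labeled "<stdin>";
/// otherwise each path is read from disk and labeled with its display form.
fn collect_inputs<R: Read>(paths: &[PathBuf], mut stdin: R) -> io::Result<Vec<(String, String)>> {
    if paths.is_empty() {
        let mut buf = String::new();
        stdin.read_to_string(&mut buf)?;
        return Ok(vec![("<stdin>".to_string(), buf)]);
    }
    let mut inputs = Vec::with_capacity(paths.len());
    for path in paths {
        // Stop at the first unreadable file and report its I/O error.
        inputs.push((path.display().to_string(), fs::read_to_string(path)?));
    }
    Ok(inputs)
}
```

In a `main` function it would be called as `collect_inputs(&paths, std::io::stdin())`, with `paths` built from `std::env::args_os().skip(1).map(PathBuf::from)`, and each returned document handed to the parser.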

The compare binary

This program takes org-mode input either streamed in on stdin or as paths to files passed in as arguments. It then parses them using both Organic and the official Emacs Org-mode parser and compares the parse results. This program is intended solely as a development tool. Since org-mode is a moving target, it is recommended that you run it through Docker, where we pin the version of org-mode to a specific revision. Examples:

cat /foo/bar.org | ./scripts/run_docker_compare.bash
./scripts/run_docker_compare.bash /foo/bar.org /lorem/ipsum.org

Without Docker (not recommended, since the version of org-mode is not pinned):

cat /foo/bar.org | cargo run --features compare --bin compare
cargo build --profile release-lto --features compare
./target/release-lto/compare /foo/bar.org /lorem/ipsum.org
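At its core, a comparison like this walks two parse trees in parallel and reports every place they disagree. The hypothetical sketch below uses a simplified `Node` type (element kind plus start/end offsets) and a recursive diff; the real compare binary works on Organic's own types and on Emacs's output, neither of which is modeled here.

```rust
/// Simplified stand-in for a parse-tree node: what kind of element or object it is,
/// its start/end offsets in the source, and its children.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    kind: String,
    begin: usize,
    end: usize,
    children: Vec<Node>,
}

/// Returns a human-readable description of every mismatch; empty means the trees agree.
fn diff_trees(expected: &Node, actual: &Node) -> Vec<String> {
    let mut problems = Vec::new();
    diff_node(expected, actual, "", &mut problems);
    problems
}

fn diff_node(expected: &Node, actual: &Node, parent: &str, problems: &mut Vec<String>) {
    let here = format!("{}/{}", parent, expected.kind);
    if expected.kind != actual.kind {
        problems.push(format!("{}: kind differs ({} vs {})", here, expected.kind, actual.kind));
        // Different node types: comparing their children would only add noise.
        return;
    }
    if (expected.begin, expected.end) != (actual.begin, actual.end) {
        problems.push(format!(
            "{}: bounds differ ({}..{} vs {}..{})",
            here, expected.begin, expected.end, actual.begin, actual.end
        ));
    }
    if expected.children.len() != actual.children.len() {
        problems.push(format!(
            "{}: child count differs ({} vs {})",
            here,
            expected.children.len(),
            actual.children.len()
        ));
    }
    // zip stops at the shorter list; the count mismatch above already covers the rest.
    for (e, a) in expected.children.iter().zip(&actual.children) {
        diff_node(e, a, &here, problems);
    }
}
```

Reporting every mismatch with its path, rather than stopping at the first one, makes it easier to see which element a parse divergence starts from.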

Running the tests

There are three levels of tests for this repository: the standard tests, the autogenerated tests, and the foreign document tests.

The standard tests

These are regular hand-written Rust tests. They can be run with:

make unittest

The auto-generated tests

These tests are automatically generated from the files in the org_mode_samples directory, and they are still integrated with the Rust/Cargo testing framework. For each org-mode document in that directory, a test is generated that parses the document with both Organic and the official Emacs Org-mode parser and then compares the parse results. Any deviation is considered a failure. Since org-mode is a moving target, it is recommended that you run these tests inside Docker: the organic-test Docker image is pinned to a specific revision of org-mode. These can be run with:

make dockertest

The foreign document tests

These tests work the same way as the auto-generated tests, except that they are not integrated with the Rust/Cargo testing framework and they compare the parses of org-mode documents that live outside this repository. This lets us test against a far greater variety of org-mode documents without pulling massive sets of them into this repository. The recommended way to run these tests is still through Docker, because it pins org-mode and the test documents to specific git revisions. These can be run with:

make foreign_document_test

License

This project is released under the public-domain-equivalent 0BSD license. However, it has a couple of permissively licensed, non-public-domain-equivalent dependencies whose copyright notices and/or license texts must be included. I am not a lawyer and this is not legal advice, but it is my layperson's understanding that if you distribute a binary that statically links this library, you will need to abide by their terms, since their code will also be linked into your binary.