Compile the elisp ahead of time so it is not done on every docker container launch.

Update org-mode version.
2023-09-20 02:30:58 -04:00 · 2023-09-16 14:46:02 -04:00 · 2023-09-08 13:11:28 -04:00 · 2023-08-31 21:21:14 -04:00 · 2023-08-31 20:23:00 -04:00 · 2023-08-31 15:18:25 -04:00
17 changed files with 946 additions and 159 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+**/.git
+target/
+docker/
+LICENSE
+readme/
+README.md
+notes/
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -394,6 +394,7 @@ dependencies = [
 "nom",
 "serde",
 "tokio",
+ "tower",
 "tower-http",
 ]

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -8,4 +8,10 @@ axum = { git = "https://github.com/tokio-rs/axum.git", rev = "52a90390195e884bcc
 nom = "7.1.1"
 serde = { version = "1.0.183", features = ["derive"] }
 tokio = { version = "1.30.0", default-features = false, features = ["macros", "process", "rt", "rt-multi-thread"] }
-tower-http = { version = "0.4.3", features = ["fs"] }
+tower = "0.4.13"
+tower-http = { version = "0.4.3", features = ["fs", "set-header"] }
+
+[profile.release-lto]
+inherits = "release"
+lto = true
+strip = "symbols"
--- a/README.md
+++ b/README.md
@@ -0,0 +1,33 @@
+# Org-Mode AST Investigation Tool
+This repository contains a slapdash tool to make visualizing the abstract syntax tree of an org-mode document easier. Write your org-mode source into the top text box and a clickable tree of the AST appears below it on the right. Clicking a node highlights the contents of that node in the source shown on the left.
+
+![Screenshot showing the interface to the org-mode abstract syntax tree investigation tool.](readme/screenshot.png?raw=true "Org-mode investigation tool interface")
+
+## Running
+Running in docker is the recommended way to run this. It creates a consistent working environment, without impacting (or requiring you to install) emacs, org-mode, or rust.
+### Docker
+First we need to build the docker image. The first build pulls and compiles the emacs and org-mode source code, so it will take a while. Subsequent builds should be fast because docker caches the layers.
+
+```bash
+# from the root of this repository:
+make --directory=docker
+```
+
+Next we need to launch the server:
+```bash
+docker run --init --rm --publish 3000:3000/tcp --read-only --mount type=tmpfs,destination=/tmp org-investigation
+```
+
+This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
+
+Alternatively, you can run the `scripts/launch_docker.bash` script, which performs both of these steps.
+### No docker
+You will need a working rust toolchain with nightly installed (the code uses the unstable `exit_status_error` feature). Then, from the root of this repo, you can launch the server by running:
+
+```bash
+cargo run --release
+```
+
+This uses your installed versions of emacs and org-mode, which may differ from the versions in the docker image.
+
+This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -0,0 +1,44 @@
+FROM alpine:3.17 AS build
+RUN apk add --no-cache build-base musl-dev git autoconf make texinfo gnutls-dev ncurses-dev gawk libgccjit-dev
+
+
+FROM build AS build-emacs
+ARG EMACS_VERSION=emacs-29.1
+RUN git clone --depth 1 --branch $EMACS_VERSION https://git.savannah.gnu.org/git/emacs.git /root/emacs
+WORKDIR /root/emacs
+RUN mkdir /root/dist
+RUN ./autogen.sh
+RUN ./configure --prefix /usr --without-x --without-sound --with-native-compilation=aot
+RUN make
+RUN make DESTDIR="/root/dist" install
+
+
+FROM build AS build-org-mode
+ARG ORG_VERSION=c703541ffcc14965e3567f928de1683a1c1e33f6
+COPY --from=build-emacs /root/dist/ /
+RUN mkdir /root/dist
+# Savannah does not allow fetching specific revisions, so we're going to have to put unnecessary load on their server by cloning main and then checking out the revision we want.
+RUN git clone https://git.savannah.gnu.org/git/emacs/org-mode.git /root/org-mode && git -C /root/org-mode checkout $ORG_VERSION
+# RUN mkdir /root/org-mode && git -C /root/org-mode init --initial-branch=main && git -C /root/org-mode remote add origin https://git.savannah.gnu.org/git/emacs/org-mode.git && git -C /root/org-mode fetch origin $ORG_VERSION && git -C /root/org-mode checkout FETCH_HEAD
+WORKDIR /root/org-mode
+RUN make compile
+RUN make DESTDIR="/root/dist" install
+
+
+FROM rustlang/rust:nightly-alpine3.17 AS build-org-investigation
+RUN apk add --no-cache musl-dev
+RUN mkdir /root/org-investigation
+WORKDIR /root/org-investigation
+COPY . .
+RUN CARGO_TARGET_DIR=/target cargo build --profile release-lto
+
+
+FROM alpine:3.17 AS run
+ENV LANG=en_US.UTF-8
+RUN apk add --no-cache ncurses gnutls libgccjit
+COPY --from=build-emacs /root/dist/ /
+COPY --from=build-org-mode /root/dist/ /
+COPY --from=build-org-investigation /target/release-lto/org_ownership_investigation /usr/bin/
+COPY static /opt/org-investigation/static
+WORKDIR /opt/org-investigation
+CMD ["/usr/bin/org_ownership_investigation"]
--- a/docker/Makefile
+++ b/docker/Makefile
@@ -0,0 +1,9 @@
+IMAGE_NAME:=org-investigation
+
+.PHONY: build
+build:
+	docker build -t $(IMAGE_NAME) -f Dockerfile ../
+
+.PHONY: clean
+clean:
+	docker rmi $(IMAGE_NAME)
--- a/notes/plain_list_ownership_notes.org
+++ b/notes/plain_list_ownership_notes.org
@@ -0,0 +1,130 @@
+* Test 1
+** Source
+#+begin_src org
+  1. foo
+
+     1. bar
+
+     2. baz
+
+  2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+This table only shows ownership for the plain list items, not for the containing plain list or the elements inside each item.
+
+| Plain List *Item*      | Owns trailing blank lines |
+|------------------------+---------------------------|
+| foo (includes bar baz) | Yes                       |
+| bar                    | Yes                       |
+| baz                    | Yes                       |
+| lorem                  | No                        |
+** Analysis
+In this test case, we see that the only list item that doesn't own its trailing blank lines is "lorem", the final list item of the outer-most list.
+* Test 2
+We add "cat" as a paragraph at the end of foo which makes "baz" lose its trailing blank lines.
+** Source
+#+begin_src org
+  1. foo
+
+     1. bar
+
+     2. baz
+
+     cat
+
+  2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*             | Owns trailing blank lines |
+|-------------------------------+---------------------------|
+| foo -> cat (includes bar baz) | Yes                       |
+| bar                           | Yes                       |
+| baz                           | No                        |
+| lorem                         | No                        |
+** Analysis
+Taken in isolation, this implies that the final item of a plain list does not own its trailing blank lines, which conflicts with "baz" owning its trailing blank lines in test 1.
+
+New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a list item.
+
+| Plain List *Item*             | Owns trailing blank lines | Why                                                       |
+|-------------------------------+---------------------------+-----------------------------------------------------------|
+| foo -> cat (includes bar baz) | Yes                       | Not the final list item                                   |
+| bar                           | Yes                       | Not the final list item                                   |
+| baz                           | No                        | Final item of bar->baz and not the final element of "foo" |
+| lorem                         | No                        | Final item of foo->lorem and not contained in a list item |
+* Test 3
+So if that theory is true, taking the entire (foo -> lorem) list from test 1 and nesting it inside a list should coerce "lorem" to own its trailing blank lines, since it would then be the final list item (of foo -> lorem) and the final element of the new list item ("cat").
+** Source
+#+begin_src org
+  1. cat
+     1. foo
+
+        1. bar
+
+        2. baz
+
+     2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*           | Owns trailing blank lines |
+|-----------------------------+---------------------------|
+| cat (includes foo -> lorem) | No                        |
+| foo (includes bar baz)      | Yes                       |
+| bar                         | Yes                       |
+| baz                         | Yes                       |
+| lorem                       | No                        |
+** Analysis
+Against expectations, we did not coerce lorem to consume its trailing blank lines. What is different between "baz" and "lorem"? Well, "baz" is contained within "foo" which has a "lorem" after it, whereas "lorem" is contained within "cat" which does not have any list items after it.
+
+New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
+| Plain List *Item*           | Owns trailing blank lines | Why                                                  |
+|-----------------------------+---------------------------+------------------------------------------------------|
+| cat (includes foo -> lorem) | No                        | Final list item and not contained in a list item     |
+| foo (includes bar baz)      | Yes                       | Not the final list item                              |
+| bar                         | Yes                       | Not the final list item                              |
+| baz                         | Yes                       | Final element of non-final list item                 |
+| lorem                       | No                        | Final list item and final element of final list item |
+* Test 4
+So if that theory is true, then we should be able to coerce lorem to consume its trailing blank lines by adding a second item to the cat list.
+** Source
+#+begin_src org
+  1. cat
+     1. foo
+
+        1. bar
+
+        2. baz
+
+     2. lorem
+
+  2. dog
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*           | Owns trailing blank lines |
+|-----------------------------+---------------------------|
+| cat (includes foo -> lorem) | Yes                       |
+| foo (includes bar baz)      | Yes                       |
+| bar                         | Yes                       |
+| baz                         | Yes                       |
+| lorem                       | Yes                       |
+| dog                         | No                        |
+** Analysis
+For the first time our expectations were met!
+
+Enduring theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
+| Plain List *Item*           | Owns trailing blank lines | Why                                              |
+|-----------------------------+---------------------------+--------------------------------------------------|
+| cat (includes foo -> lorem) | Yes                       | Not the final list item                          |
+| foo (includes bar baz)      | Yes                       | Not the final list item                          |
+| bar                         | Yes                       | Not the final list item                          |
+| baz                         | Yes                       | Final element of non-final list item             |
+| lorem                       | Yes                       | Final element of non-final list item             |
+| dog                         | No                        | Final list item and not contained in a list item |
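The enduring theory from these notes can be written down as a small predicate. The following is only a sketch of the rule as worded above, not code from this repository: the `Enclosing` type and the `owns_trailing_blank_lines` function are hypothetical names, and the sketch only looks at the immediately enclosing list item because the four tests do not cover deeper nesting.

```rust
/// Where a plain list sits relative to an enclosing list item.
/// Hypothetical type, introduced only for this illustration.
#[derive(Clone, Copy, Debug)]
pub enum Enclosing {
    /// The list is not contained in any list item.
    TopLevel,
    /// The list is nested inside a list item.
    Item {
        /// True when the nested list is the last element of the enclosing item.
        list_is_final_element: bool,
        /// True when the enclosing item is the last item of its own list.
        item_is_final: bool,
    },
}

/// Applies the "enduring theory": a list item owns its trailing blank lines
/// unless it is the final item of its list and is not the final element of
/// a non-final list item.
pub fn owns_trailing_blank_lines(is_final_item: bool, enclosing: Enclosing) -> bool {
    if !is_final_item {
        // Non-final items always own their trailing blank lines.
        return true;
    }
    match enclosing {
        // A final item that is not contained in a list item never owns them.
        Enclosing::TopLevel => false,
        // A final item owns them only when its list closes a non-final item.
        Enclosing::Item {
            list_is_final_element,
            item_is_final,
        } => list_is_final_element && !item_is_final,
    }
}
```

Under these assumptions, "lorem" in test 4 is a final item whose list is the final element of the non-final item "cat", so the predicate returns true, while "lorem" in test 3 sits inside a final "cat" and gets false.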
--- a/readme/screenshot.png
+++ b/readme/screenshot.png
--- a/scripts/launch_docker.bash
+++ b/scripts/launch_docker.bash
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+#
+set -euo pipefail
+IFS=$'\n\t'
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+function main {
+    make --directory "$DIR/../docker"
+    exec docker run --init --rm --read-only --mount type=tmpfs,destination=/tmp --publish 3000:3000/tcp org-investigation
+}
+
+main "${@}"
--- a/src/main.rs
+++ b/src/main.rs
@@ -1,36 +1,57 @@
 #![feature(exit_status_error)]
+use axum::http::header::CACHE_CONTROL;
+use axum::http::HeaderValue;
+use axum::response::IntoResponse;
 use axum::{http::StatusCode, routing::post, Json, Router};
-use owner_tree::{build_owner_tree, OwnerTree};
-use parse::emacs_parse_org_document;
+use owner_tree::build_owner_tree;
+use parse::{emacs_parse_org_document, get_emacs_version};
+use tower::ServiceBuilder;
 use tower_http::services::{ServeDir, ServeFile};
+use tower_http::set_header::SetResponseHeaderLayer;
+
+use crate::parse::get_org_mode_version;

 mod error;
 mod owner_tree;
 mod parse;
+mod rtrim_iterator;
 mod sexp;

 #[tokio::main]
-async fn main() {
-    let serve_dir = ServeDir::new("static").not_found_service(ServeFile::new("static/index.html"));
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let static_files_service = {
+        let serve_dir =
+            ServeDir::new("static").not_found_service(ServeFile::new("static/index.html"));
+
+        ServiceBuilder::new()
+            .layer(SetResponseHeaderLayer::if_not_present(
+                CACHE_CONTROL,
+                HeaderValue::from_static("public, max-age=120"),
+            ))
+            .service(serve_dir)
+    };
    let app = Router::new()
        .route("/parse", post(parse_org_mode))
-        .fallback_service(serve_dir);
+        .fallback_service(static_files_service);

-    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
-    axum::serve(listener, app).await.unwrap();
+    let (emacs_version, org_mode_version) =
+        tokio::join!(get_emacs_version(), get_org_mode_version());
+    println!("Using emacs version: {}", emacs_version?.trim());
+    println!("Using org-mode version: {}", org_mode_version?.trim());
+
+    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
+    println!("Listening on port 3000. Pop open your browser to http://127.0.0.1:3000/ .");
+    axum::serve(listener, app).await?;
+    Ok(())
 }

-async fn parse_org_mode(
-    body: String,
-) -> Result<(StatusCode, Json<OwnerTree>), (StatusCode, String)> {
+async fn parse_org_mode(body: String) -> Result<impl IntoResponse, (StatusCode, String)> {
    _parse_org_mode(body)
        .await
        .map_err(|e| (StatusCode::BAD_REQUEST, e.to_string()))
 }

-async fn _parse_org_mode(
-    body: String,
-) -> Result<(StatusCode, Json<OwnerTree>), Box<dyn std::error::Error>> {
+async fn _parse_org_mode(body: String) -> Result<impl IntoResponse, Box<dyn std::error::Error>> {
    let ast = emacs_parse_org_document(&body).await?;
    let owner_tree = build_owner_tree(body.as_str(), ast.as_str()).map_err(|e| e.to_string())?;
    Ok((StatusCode::OK, Json(owner_tree)))
--- a/src/owner_tree.rs
+++ b/src/owner_tree.rs
@@ -1,18 +1,22 @@
 use serde::Serialize;

-use crate::sexp::{sexp_with_padding, Token};
+use crate::{
+    rtrim_iterator::RTrimIterator,
+    sexp::{sexp_with_padding, Token},
+};

 pub fn build_owner_tree<'a>(
    body: &'a str,
    ast_raw: &'a str,
 ) -> Result<OwnerTree, Box<dyn std::error::Error + 'a>> {
    let (_remaining, parsed_sexp) = sexp_with_padding(ast_raw)?;
-    let lists = find_lists_in_document(body, &parsed_sexp)?;
+    assert_name(&parsed_sexp, "org-data")?;
+    let ast_node = build_ast_node(body, None, &parsed_sexp)?;

    Ok(OwnerTree {
        input: body.to_owned(),
        ast: ast_raw.to_owned(),
-        lists,
+        tree: ast_node,
    })
 }

@@ -20,7 +24,14 @@ pub fn build_owner_tree<'a>(
 pub struct OwnerTree {
    input: String,
    ast: String,
-    lists: Vec<PlainList>,
+    tree: AstNode,
+}
+
+#[derive(Serialize)]
+pub struct AstNode {
+    name: String,
+    position: SourceRange,
+    children: Vec<AstNode>,
 }

 #[derive(Serialize)]
@@ -37,108 +48,79 @@ pub struct PlainListItem {

 #[derive(Serialize)]
 pub struct SourceRange {
-    start_line: u32,
-    end_line: u32, // Exclusive
-    start_character: u32,
-    end_character: u32, // Exclusive
+    start_line: usize,
+    end_line: usize, // Exclusive
+    start_character: usize,
+    end_character: usize, // Exclusive
 }

-fn find_lists_in_document<'a>(
+fn build_ast_node<'a>(
    original_source: &str,
+    parent_contents_begin: Option<usize>,
    current_token: &Token<'a>,
-) -> Result<Vec<PlainList>, Box<dyn std::error::Error>> {
-    // DFS looking for top-level lists
-
-    let mut found_lists = Vec::new();
-    let children = current_token.as_list()?;
-    let token_name = "org-data";
-    assert_name(current_token, token_name)?;
-
-    // skip 2 to skip token name and standard properties
-    for child_token in children.iter().skip(2) {
-        found_lists.extend(recurse_token(original_source, child_token)?);
-    }
-
-    Ok(found_lists)
-}
-
-fn recurse_token<'a>(
-    original_source: &str,
-    current_token: &Token<'a>,
-) -> Result<Vec<PlainList>, Box<dyn std::error::Error>> {
-    match current_token {
-        Token::Atom(_) | Token::TextWithProperties(_) => Ok(Vec::new()),
-        Token::List(_) => {
-            let new_lists = find_lists_in_list(original_source, current_token)?;
-            Ok(new_lists)
+) -> Result<AstNode, Box<dyn std::error::Error>> {
+    let maybe_plain_text = current_token.as_text();
+    let ast_node = match maybe_plain_text {
+        Ok(plain_text) => {
+            let parent_contents_begin = parent_contents_begin
+                .ok_or("parent_contents_begin should be set for all plain text nodes.")?;
+            let mut parameters = plain_text.properties.iter();
+            let begin = parent_contents_begin
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing first element past the text.")?;
+            let end = parent_contents_begin
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing second element past the text.")?;
+            let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
+            AstNode {
+                name: "plain-text".to_owned(),
+                position: SourceRange {
+                    start_line,
+                    end_line,
+                    start_character: begin,
+                    end_character: end,
+                },
+                children: Vec::new(),
+            }
        }
-        Token::Vector(_) => {
-            let new_lists = find_lists_in_vector(original_source, current_token)?;
-            Ok(new_lists)
+        Err(_) => {
+            // Not plain text, so it must be a list
+            let parameters = current_token.as_list()?;
+            let name = parameters
+                .first()
+                .ok_or("Should have at least one child.")?
+                .as_atom()?;
+            let position = get_bounds(original_source, current_token)?;
+            let mut children = Vec::new();
+            let original_contents_begin = get_contents_begin(current_token);
+            match original_contents_begin {
+                Ok(original_contents_begin) => {
+                    let mut contents_begin = original_contents_begin;
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node =
+                            build_ast_node(original_source, Some(contents_begin), child)?;
+                        contents_begin = new_ast_node.position.end_character;
+                        children.push(new_ast_node);
+                    }
+                }
+                Err(_) => {
+                    // Some nodes don't have a contents begin, so hopefully plain text can't be inside them.
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node = build_ast_node(original_source, None, child)?;
+                        children.push(new_ast_node);
+                    }
+                }
+            };
+
+            AstNode {
+                name: name.to_owned(),
+                position,
+                children,
+            }
        }
-    }
-}
+    };

-fn find_lists_in_list<'a>(
-    original_source: &str,
-    current_token: &Token<'a>,
-) -> Result<Vec<PlainList>, Box<dyn std::error::Error>> {
-    let mut found_lists = Vec::new();
-    let children = current_token.as_list()?;
-    if assert_name(current_token, "plain-list").is_ok() {
-        // Found a list!
-        let mut found_items = Vec::new();
-        // skip 2 to skip token name and standard properties
-        for child_token in children.iter().skip(2) {
-            found_items.push(get_item_in_list(original_source, child_token)?);
-        }
-
-        found_lists.push(PlainList {
-            position: get_bounds(original_source, current_token)?,
-            items: found_items,
-        });
-    } else {
-        // skip 2 to skip token name and standard properties
-        for child_token in children.iter().skip(2) {
-            found_lists.extend(recurse_token(original_source, child_token)?);
-        }
-    }
-
-    Ok(found_lists)
-}
-
-fn find_lists_in_vector<'a>(
-    original_source: &str,
-    current_token: &Token<'a>,
-) -> Result<Vec<PlainList>, Box<dyn std::error::Error>> {
-    let mut found_lists = Vec::new();
-    let children = current_token.as_vector()?;
-
-    for child_token in children.iter() {
-        found_lists.extend(recurse_token(original_source, child_token)?);
-    }
-
-    Ok(found_lists)
-}
-
-fn get_item_in_list<'a>(
-    original_source: &str,
-    current_token: &Token<'a>,
-) -> Result<PlainListItem, Box<dyn std::error::Error>> {
-    let mut found_lists = Vec::new();
-    let children = current_token.as_list()?;
-    let token_name = "item";
-    assert_name(current_token, token_name)?;
-
-    // skip 2 to skip token name and standard properties
-    for child_token in children.iter().skip(2) {
-        found_lists.extend(recurse_token(original_source, child_token)?);
-    }
-
-    Ok(PlainListItem {
-        position: get_bounds(original_source, current_token)?,
-        lists: found_lists,
-    })
+    Ok(ast_node)
 }

 fn assert_name<'s>(emacs: &'s Token<'s>, name: &str) -> Result<(), Box<dyn std::error::Error>> {
@@ -161,39 +143,35 @@ fn get_bounds<'s>(
    original_source: &'s str,
    emacs: &'s Token<'s>,
 ) -> Result<SourceRange, Box<dyn std::error::Error>> {
-    let children = emacs.as_list()?;
-    let attributes_child = children
-        .iter()
-        .nth(1)
-        .ok_or("Should have an attributes child.")?;
-    let attributes_map = attributes_child.as_map()?;
-    let standard_properties = attributes_map.get(":standard-properties");
-    let (begin, end) = if standard_properties.is_some() {
-        let std_props = standard_properties
-            .expect("if statement proves its Some")
-            .as_vector()?;
-        let begin = std_props
-            .get(0)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        let end = std_props
-            .get(1)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        (begin, end)
-    } else {
-        let begin = attributes_map
-            .get(":begin")
-            .ok_or("Missing :begin attribute.")?
-            .as_atom()?;
-        let end = attributes_map
-            .get(":end")
-            .ok_or("Missing :end attribute.")?
-            .as_atom()?;
-        (begin, end)
-    };
-    let begin = begin.parse::<u32>()?;
-    let end = end.parse::<u32>()?;
+    let standard_properties = get_standard_properties(emacs)?;
+    let (begin, end) = (
+        standard_properties
+            .begin
+            .ok_or("Token should have a begin.")?,
+        standard_properties.end.ok_or("Token should have an end.")?,
+    );
+    let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
+    Ok(SourceRange {
+        start_line,
+        end_line,
+        start_character: begin,
+        end_character: end,
+    })
+}
+
+fn get_contents_begin<'s>(emacs: &'s Token<'s>) -> Result<usize, Box<dyn std::error::Error>> {
+    let standard_properties = get_standard_properties(emacs)?;
+    Ok(standard_properties
+        .contents_begin
+        .ok_or("Token should have a contents-begin.")?)
+}
+
+fn get_line_numbers<'s>(
+    original_source: &'s str,
+    begin: usize,
+    end: usize,
+) -> Result<(usize, usize), Box<dyn std::error::Error>> {
+    // This is used for highlighting which lines contain text relevant to the token, so even if a token does not extend all the way to the end of the line, the end_line figure will be the following line number (since the range is exclusive, not inclusive).
    let start_line = original_source
        .chars()
        .into_iter()
@@ -201,17 +179,96 @@ fn get_bounds<'s>(
        .filter(|x| *x == '\n')
        .count()
        + 1;
-    let end_line = original_source
-        .chars()
-        .into_iter()
-        .take(usize::try_from(end)? - 1)
-        .filter(|x| *x == '\n')
-        .count()
-        + 1;
-    Ok(SourceRange {
-        start_line: u32::try_from(start_line)?,
-        end_line: u32::try_from(end_line)?,
-        start_character: begin,
-        end_character: end,
+    let end_line = {
+        let content_up_to_and_including_token = original_source
+            .chars()
+            .into_iter()
+            .take(usize::try_from(end)? - 1);
+        // Remove the trailing newline (if there is one) because we're going to add an extra line regardless of whether or not this ends with a new line.
+        let without_trailing_newline = RTrimIterator::new(content_up_to_and_including_token, '\n');
+        without_trailing_newline.filter(|x| *x == '\n').count() + 2
+    };
+
+    Ok((start_line, end_line))
+}
+
+struct StandardProperties {
+    begin: Option<usize>,
+    #[allow(dead_code)]
+    post_affiliated: Option<usize>,
+    #[allow(dead_code)]
+    contents_begin: Option<usize>,
+    #[allow(dead_code)]
+    contents_end: Option<usize>,
+    end: Option<usize>,
+    #[allow(dead_code)]
+    post_blank: Option<usize>,
+}
+
+fn get_standard_properties<'s>(
+    emacs: &'s Token<'s>,
+) -> Result<StandardProperties, Box<dyn std::error::Error>> {
+    let children = emacs.as_list()?;
+    let attributes_child = children
+        .iter()
+        .nth(1)
+        .ok_or("Should have an attributes child.")?;
+    let attributes_map = attributes_child.as_map()?;
+    let standard_properties = attributes_map.get(":standard-properties");
+    Ok(if standard_properties.is_some() {
+        let mut std_props = standard_properties
+            .expect("if statement proves it's Some")
+            .as_vector()?
+            .into_iter();
+        let begin = maybe_token_to_usize(std_props.next())?;
+        let post_affiliated = maybe_token_to_usize(std_props.next())?;
+        let contents_begin = maybe_token_to_usize(std_props.next())?;
+        let contents_end = maybe_token_to_usize(std_props.next())?;
+        let end = maybe_token_to_usize(std_props.next())?;
+        let post_blank = maybe_token_to_usize(std_props.next())?;
+        StandardProperties {
+            begin,
+            post_affiliated,
+            contents_begin,
+            contents_end,
+            end,
+            post_blank,
+        }
+    } else {
+        let begin = maybe_token_to_usize(attributes_map.get(":begin").map(|token| *token))?;
+        let end = maybe_token_to_usize(attributes_map.get(":end").map(|token| *token))?;
+        let contents_begin =
+            maybe_token_to_usize(attributes_map.get(":contents-begin").map(|token| *token))?;
+        let contents_end =
+            maybe_token_to_usize(attributes_map.get(":contents-end").map(|token| *token))?;
+        let post_blank =
+            maybe_token_to_usize(attributes_map.get(":post-blank").map(|token| *token))?;
+        let post_affiliated =
+            maybe_token_to_usize(attributes_map.get(":post-affiliated").map(|token| *token))?;
+        StandardProperties {
+            begin,
+            post_affiliated,
+            contents_begin,
+            contents_end,
+            end,
+            post_blank,
+        }
    })
 }
+
+fn maybe_token_to_usize(
+    token: Option<&Token<'_>>,
+) -> Result<Option<usize>, Box<dyn std::error::Error>> {
+    Ok(token
+        .map(|token| token.as_atom())
+        .map_or(Ok(None), |r| r.map(Some))?
+        .map(|val| {
+            if val == "nil" {
+                None
+            } else {
+                Some(val.parse::<usize>())
+            }
+        })
+        .flatten() // Outer option is whether or not the param exists, inner option is whether or not it is nil
+        .map_or(Ok(None), |r| r.map(Some))?)
+}
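The nil handling in `maybe_token_to_usize` comes down to three cases: the property is absent, it is the atom `nil`, or it is a number. As a rough illustration, assuming the atom text has already been extracted as a `&str` (the `Token` type is not shown in this diff), the same mapping can be sketched with `Option::filter` and `Option::transpose`. The function name `parse_optional_position` is hypothetical.

```rust
use std::num::ParseIntError;

/// Converts the text of an optional atom into an optional position.
///
/// * `None` (property missing) becomes `Ok(None)`.
/// * `Some("nil")` becomes `Ok(None)`.
/// * `Some("42")` becomes `Ok(Some(42))`.
/// * Anything else that fails to parse is returned as an error.
pub fn parse_optional_position(raw: Option<&str>) -> Result<Option<usize>, ParseIntError> {
    raw
        // Treat the elisp atom `nil` the same as a missing value.
        .filter(|value| *value != "nil")
        // Parse what is left, giving Option<Result<usize, _>>.
        .map(|value| value.parse::<usize>())
        // Flip it into Result<Option<usize>, _> so `?` works at the call site.
        .transpose()
}
```

This keeps the distinction the original comment describes (the outer option for existence, the inner one for nil) but collapses both into a single `None`, which is what the caller ends up with as well.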
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -51,3 +51,40 @@ where
    }
    output
 }
+
+pub async fn get_emacs_version() -> Result<String, Box<dyn std::error::Error>> {
+    let elisp_script = r#"(progn
+     (message "%s" (version))
+)"#;
+    let mut cmd = Command::new("emacs");
+    let proc = cmd
+        .arg("-q")
+        .arg("--no-site-file")
+        .arg("--no-splash")
+        .arg("--batch")
+        .arg("--eval")
+        .arg(elisp_script);
+
+    let out = proc.output().await?;
+    out.status.exit_ok()?;
+    Ok(String::from_utf8(out.stderr)?)
+}
+
+pub async fn get_org_mode_version() -> Result<String, Box<dyn std::error::Error>> {
+    let elisp_script = r#"(progn
+     (org-mode)
+     (message "%s" (org-version nil t nil))
+)"#;
+    let mut cmd = Command::new("emacs");
+    let proc = cmd
+        .arg("-q")
+        .arg("--no-site-file")
+        .arg("--no-splash")
+        .arg("--batch")
+        .arg("--eval")
+        .arg(elisp_script);
+
+    let out = proc.output().await?;
+    out.status.exit_ok()?;
+    Ok(String::from_utf8(out.stderr)?)
+}
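`get_emacs_version` and `get_org_mode_version` pass the same batch-mode flags to emacs and differ only in the elisp they evaluate. One way this could be factored, shown here as a sketch rather than as the project's code, is a helper that builds the command. The name `emacs_batch_command` is hypothetical, and the sketch uses `std::process::Command` to stay self-contained; the functions in the diff await the command's output, so the project would need its async equivalent.

```rust
use std::process::Command;

/// Builds an emacs invocation that evaluates `elisp_script` in batch mode
/// without loading user or site configuration.
pub fn emacs_batch_command(elisp_script: &str) -> Command {
    let mut cmd = Command::new("emacs");
    cmd.arg("-q") // skip the user's init file
        .arg("--no-site-file") // skip site-wide startup files
        .arg("--no-splash")
        .arg("--batch") // run non-interactively; `message` output goes to stderr
        .arg("--eval")
        .arg(elisp_script);
    cmd
}
```

A caller would then run the command and read the version string from stderr, as both functions in the diff do.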
--- a/src/rtrim_iterator.rs
+++ b/src/rtrim_iterator.rs
@@ -0,0 +1,86 @@
+/// Removes 1 character from the end of an iterator if it matches needle
+pub struct RTrimIterator<I> {
+    iter: I,
+    needle: char,
+    buffer: Option<char>,
+}
+
+impl<I> Iterator for RTrimIterator<I>
+where
+    I: Iterator<Item = char>,
+{
+    type Item = char;
+
+    fn next(&mut self) -> Option<I::Item> {
+        loop {
+            match (self.buffer, self.iter.next()) {
+                (None, None) => {
+                    // We reached the end of the list and have an empty buffer, meaning the string did not end with the needle character.
+                    return None;
+                }
+                (None, Some(chr)) if chr == self.needle => {
+                    // We came across an instance of needle, buffer it and loop again because we do not know if this is the end of the string.
+                    self.buffer = Some(chr);
+                }
+                (None, Some(chr)) => {
+                    // We have an empty buffer and the next character is not the needle character, return it immediately.
+                    return Some(chr);
+                }
+                (Some(buf), None) if buf == self.needle => {
+                    // We reached the end of the list and have the specified needle in the buffer where it will stay forever.
+                    return None;
+                }
+                (Some(_), None) => {
+                    // We reached the end of the list and the buffered character is not the needle character, so write it out.
+                    return self.buffer.take();
+                }
+                (Some(_), Some(chr)) => {
+                    // We have a buffered character, but it is not the end of the string, so regardless of its contents we can write it out.
+                    return self.buffer.replace(chr);
+                }
+            };
+        }
+    }
+}
+
+impl<I> RTrimIterator<I> {
+    pub fn new(iter: I, needle: char) -> RTrimIterator<I> {
+        RTrimIterator {
+            iter,
+            needle,
+            buffer: None,
+        }
+    }
+}
+
+mod tests {
+    use super::*;
+
+    #[test]
+    fn no_match() {
+        let input = "abcd";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, input);
+    }
+
+    #[test]
+    fn middle_match() {
+        let input = "ab\ncd";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, input);
+    }
+
+    #[test]
+    fn end_match() {
+        let input = "abcd\n";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, "abcd");
+    }
+
+    #[test]
+    fn double_match() {
+        let input = "abcd\n\n";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, "abcd\n");
+    }
+}
--- a/static/index.html
+++ b/static/index.html
@@ -1,5 +1,21 @@
+<!doctype html>
 <html>
+  <head>
+    <link rel="stylesheet" href="reset.css">
+    <link rel="stylesheet" href="style.css">
+    <script type="text/javascript" src="script.js" defer></script>
+  </head>
  <body>
-    Test html file.
+    <h2>Input org-mode source:</h2>
+    <textarea id="org-input" rows="24" cols="80"></textarea>
+    <hr/>
+    <div class="output_container">
+      <div>
+        <div id="parse-output" class="code_block" style="counter-set: code_line_number 0;"></div>
+      </div>
+      <div>
+        <div id="ast-tree" class="ast_tree"></div>
+      </div>
+    </div>
  </body>
 </html>
--- a/static/reset.css
+++ b/static/reset.css
@@ -0,0 +1,48 @@
+/* http://meyerweb.com/eric/tools/css/reset/
+   v2.0 | 20110126
+   License: none (public domain)
+*/
+
+html, body, div, span, applet, object, iframe,
+h1, h2, h3, h4, h5, h6, p, blockquote, pre,
+a, abbr, acronym, address, big, cite, code,
+del, dfn, em, img, ins, kbd, q, s, samp,
+small, strike, strong, sub, sup, tt, var,
+b, u, i, center,
+dl, dt, dd, ol, ul, li,
+fieldset, form, label, legend,
+table, caption, tbody, tfoot, thead, tr, th, td,
+article, aside, canvas, details, embed,
+figure, figcaption, footer, header, hgroup,
+menu, nav, output, ruby, section, summary,
+time, mark, audio, video {
+        margin: 0;
+        padding: 0;
+        border: 0;
+        font-size: 100%;
+        font: inherit;
+        vertical-align: baseline;
+}
+/* HTML5 display-role reset for older browsers */
+article, aside, details, figcaption, figure,
+footer, header, hgroup, menu, nav, section {
+        display: block;
+}
+body {
+        line-height: 1;
+}
+ol, ul {
+        list-style: none;
+}
+blockquote, q {
+        quotes: none;
+}
+blockquote:before, blockquote:after,
+q:before, q:after {
+        content: '';
+        content: none;
+}
+table {
+        border-collapse: collapse;
+        border-spacing: 0;
+}
--- a/static/script.js
+++ b/static/script.js
@@ -0,0 +1,194 @@
+let inFlightRequest = null;
+const inputElement = document.querySelector("#org-input");
+const outputElement = document.querySelector("#parse-output");
+const astTreeElement = document.querySelector("#ast-tree");
+
+function abortableFetch(request, options) {
+    const controller = new AbortController();
+    const signal = controller.signal;
+
+    return {
+        abort: () => controller.abort(),
+        ready: fetch(request, { ...options, signal })
+    };
+}
+
+function clearOutput() {
+    clearActiveAstNode();
+    outputElement.innerHTML = "";
+    astTreeElement.innerHTML = "";
+}
+
+function renderParseResponse(response) {
+    clearOutput();
+    renderSourceBox(response);
+    renderAstTree(response);
+}
+
+function renderSourceBox(response) {
+    const lines = response.input.split(/\r?\n/);
+    const numLines = lines.length;
+    const numDigits = Math.floor(Math.log10(numLines)) + 1;
+
+    outputElement.style.paddingLeft = `calc(${numDigits + 1}ch + 10px)`;
+
+    for (let line of lines) {
+        let wrappedLine = document.createElement("code");
+        if (line !== "" && line !== null) {
+            for (let chr of line) {
+                // Please forgive me
+                let wrappedCharacter = document.createElement("span");
+                wrappedCharacter.textContent = chr;
+                wrappedLine.appendChild(wrappedCharacter);
+            }
+        } else {
+            let wrappedCharacter = document.createElement("span");
+            wrappedCharacter.textContent = "\n";
+            wrappedLine.appendChild(wrappedCharacter);
+        }
+        outputElement.appendChild(wrappedLine);
+    }
+}
+
+function renderAstTree(response) {
+    renderAstNode(response.input, 0, response.tree);
+}
+
+function renderAstNode(originalSource, depth, astNode) {
+    const nodeElem = document.createElement("div");
+    nodeElem.classList.add("ast_node");
+
+    let sourceForNode = unicodeAwareSlice(originalSource, astNode.position.start_character - 1, astNode.position.end_character - 1);
+    // Since sourceForNode is a string, JSON.stringify escapes it with backslashes and wraps it in quotation marks, which keeps the text on a single line. Coincidentally, this is the behavior we want.
+    let escapedSource = JSON.stringify(sourceForNode);
+
+    nodeElem.innerText = `${astNode.name}: ${escapedSource}`;
+    nodeElem.style.marginLeft = `${depth * 20}px`;
+    nodeElem.dataset.startLine = astNode.position.start_line;
+    nodeElem.dataset.endLine = astNode.position.end_line;
+    nodeElem.dataset.startCharacter = astNode.position.start_character;
+    nodeElem.dataset.endCharacter = astNode.position.end_character;
+
+    nodeElem.addEventListener("click", () => {
+        setActiveAstNode(nodeElem, originalSource);
+    });
+
+    astTreeElement.appendChild(nodeElem);
+    for (let child of astNode.children) {
+        renderAstNode(originalSource, depth + 1, child);
+    }
+}
+
+function clearActiveAstNode() {
+    for (let elem of document.querySelectorAll("#ast-tree .ast_node.highlighted")) {
+        elem.classList.remove("highlighted");
+    }
+    for (let elem of document.querySelectorAll("#parse-output > code.highlighted")) {
+        elem.classList.remove("highlighted");
+    }
+    for (let elem of document.querySelectorAll("#parse-output > code > span.highlighted")) {
+        elem.classList.remove("highlighted");
+    }
+}
+
+function setActiveAstNode(elem, originalSource) {
+    clearActiveAstNode();
+    elem.classList.add("highlighted");
+    let startLine = parseInt(elem.dataset.startLine, 10);
+    let endLine = parseInt(elem.dataset.endLine, 10);
+    let startCharacter = parseInt(elem.dataset.startCharacter, 10);
+    let endCharacter = parseInt(elem.dataset.endCharacter, 10);
+    for (let line = startLine; line < endLine; ++line) {
+        highlightLine("parse-output", line - 1);
+    }
+    highlightCharacters("parse-output", originalSource, startCharacter, endCharacter);
+}
+
+inputElement.addEventListener("input", async () => {
+    let orgSource = inputElement.value;
+    if (inFlightRequest != null) {
+        inFlightRequest.abort();
+        inFlightRequest = null;
+    }
+    clearOutput();
+
+    let newRequest = abortableFetch("/parse", {
+        method: "POST",
+        cache: "no-cache",
+        body: orgSource,
+    });
+    inFlightRequest = newRequest;
+
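+    try {
+        const response = await newRequest.ready;
+        renderParseResponse(await response.json());
+    }
+    catch (err) {
+        // An aborted request was superseded by newer input, so only report other failures.
+        if (err.name !== "AbortError") console.error("Parse request failed:", err);
+    }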
+});
+
+function highlightLine(htmlName, lineOffset) {
+    const childOffset = lineOffset + 1;
+    const codeLineElement = document.querySelector(`#${htmlName} > code:nth-child(${childOffset})`);
+    codeLineElement?.classList.add("highlighted");
+}
+
+function highlightCharacters(htmlName, originalSource, startCharacter, endCharacter) {
+    let sourceBefore = unicodeAwareSlice(originalSource, 0, startCharacter - 1);
+    let precedingLineBreak = unicodeAwareLastIndexOfCharacter(sourceBefore, "\n");
+    let characterIndexOnLine = precedingLineBreak !== -1 ? startCharacter - precedingLineBreak - 1 : startCharacter;
+    let lineNumber = (sourceBefore.match(/\r?\n/g) || '').length + 1;
+
+    for (let characterIndex = startCharacter; characterIndex < endCharacter; ++characterIndex) {
+        document.querySelector(`#${htmlName} > code:nth-child(${lineNumber}) > span:nth-child(${characterIndexOnLine})`)?.classList.add("highlighted");
+        if (unicodeAwareCharAtOffset(originalSource, characterIndex - 1) == "\n") {
+            ++lineNumber;
+            characterIndexOnLine = 1;
+        } else {
+            ++characterIndexOnLine;
+        }
+    }
+
+}
+
+function unicodeAwareSlice(text, start, end) {
+    // Boooo javascript
+    let i = 0;
+    let output = "";
+    for (const chr of text) {
+        if (i >= end) {
+            break;
+        }
+        if (i >= start) {
+            output += chr;
+        }
+        ++i;
+    }
+    return output;
+}
+
+function unicodeAwareLastIndexOfCharacter(haystack, needle) {
+    // Boooo javascript
+    let i = 0;
+    let found = -1;
+    for (const chr of haystack) {
+        if (chr == needle) {
+            found = i;
+        }
+        ++i;
+    }
+    return found;
+}
+
+function unicodeAwareCharAtOffset(text, offset) {
+    // Boooo javascript
+    let i = offset;
+    for (const chr of text) {
+        if (i == 0) {
+            return chr;
+        }
+        --i;
+    }
+}
--- a/static/style.css
+++ b/static/style.css
@@ -0,0 +1,86 @@
+h1, h2, h3, h4, h5, h6, h7 {
+    font-weight: 700;
+}
+h1 {
+    font-size: 28px;
+}
+h2 {
+    font-size: 24px;
+}
+h3 {
+    font-size: 22px;
+}
+h4 {
+    font-size: 20px;
+}
+h5 {
+    font-size: 18px;
+}
+h6 {
+    font-size: 18px;
+}
+h7 {
+    font-size: 18px;
+}
+
+.code_block {
+    font: 14px/1.4 "Cascadia Mono", monospace;
+    background: #272822ff;
+    color: #f8f8f2ff;
+    display: table;
+    white-space: break-spaces;
+    padding: 5px;
+}
+
+.code_block > code {
+    display: table;
+    counter-increment: code_line_number;
+}
+
+.code_block > code::before {
+    content: counter(code_line_number) " ";
+    display: inline-block;
+    position: absolute;
+    transform: translateX(-100%);
+    padding-right: 5px;
+    color: #eeeeee;
+}
+
+.code_block > code.highlighted {
+    /* We aren't using this because we are going to highlight individual characters, but we still need to set the highlighted class on the code elem so the line numbers on the left get highlighted to make empty lines more obvious. */
+    /* background: #307351ff; */
+}
+
+.code_block > code.highlighted::before {
+    background: #307351ff;
+}
+
+.code_block > code > span.highlighted {
+    background: #307351ff;
+}
+
+.output_container {
+    display: flex;
+    flex-direction: row;
+}
+
+.output_container > * {
+    flex: 1 0 0;
+}
+
+.ast_tree {
+    padding: 5px;
+}
+
+.ast_node {
+    cursor: pointer;
+    background: #eeeeee;
+    margin-bottom: 5px;
+    border: 1px solid #000000;
+    padding: 2px;
+}
+
+.ast_node.highlighted {
+    background: #307351ff;
+    color: #ffffff;
+}
Author	SHA1	Message	Date
Tom Alexander	3760358783	Compile the elisp ahead of time so it is not done on every docker container launch.	2023-09-20 02:30:58 -04:00
Tom Alexander	024b2ade03	Update org-mode version.	2023-09-16 14:46:02 -04:00
Tom Alexander	55e5c31368	Update org-mode version.	2023-09-08 13:11:28 -04:00
Tom Alexander	4a556bc84f	Use read-only root for docker containers.	2023-08-31 21:21:14 -04:00
Tom Alexander	9bf2a912d6	Enable unicode in the docker container.	2023-08-31 20:23:00 -04:00
Tom Alexander	e8f262727d	Add a script to build and launch the docker container in one step.	2023-08-31 15:18:25 -04:00
Tom Alexander	b4170dda1f	Update org-mode version.	2023-08-25 04:58:45 -04:00
Tom Alexander	bd99fbc4c4	Get the versions of emacs and org-mode and write them to stdout.	2023-08-25 02:30:52 -04:00
Tom Alexander	79c834a1e6	Add the init flag to the docker run command.	2023-08-25 02:10:03 -04:00
Tom Alexander	2505a10275	Parameterize the emacs and org-mode versions in the dockerfile.	2023-08-25 02:02:57 -04:00
Tom Alexander	cfc9153c28	Handle nodes that do not have a contents begin like fixed width areas.	2023-08-20 16:25:57 -04:00
Tom Alexander	13a73efdcf	Handle line numbers properly when selected node does not end in a line break.	2023-08-20 15:55:54 -04:00
Tom Alexander	cba1d1e988	Add notes from plain list ownership investigation.	2023-08-19 01:10:42 -04:00
Tom Alexander	e8a89dfeca	Remove log statement.	2023-08-18 23:33:48 -04:00
Tom Alexander	367dfaa146	Update org-mode version.	2023-08-18 23:32:44 -04:00
Tom Alexander	c4762510f4	Handle unicode. Turns out javascript iterates over strings by character, but all the string functions like slicing, lastIndexOf, and indexing with [] are all based on codepoints without taking into account surrogate pairs like orange heart. It would have been nice if that was mentioned in the documentation...	2023-08-18 23:32:21 -04:00
Tom Alexander	372542d914	Add a print to announce the server is running.	2023-08-18 22:32:01 -04:00
Tom Alexander	0d6621d389	Add docker.	2023-08-18 22:26:42 -04:00
Tom Alexander	e96c39e3e0	Add a README.	2023-08-18 21:35:39 -04:00
Tom Alexander	9032b00e1b	Fix handling of plain text.	2023-08-18 21:22:53 -04:00
Tom Alexander	acdc8b8993	Highlighting characters.	2023-08-18 21:06:43 -04:00
Tom Alexander	676dffa15f	Rendering ast tree.	2023-08-18 19:23:31 -04:00
Tom Alexander	ab836f2794	Switch to returning the whole tree from rust instead of just the lists.	2023-08-18 19:11:51 -04:00
Tom Alexander	0ee33949e9	Beginning of rendering the ast list.	2023-08-18 18:32:23 -04:00
Tom Alexander	27a2bea705	Split the output so I can have a tree.	2023-08-18 17:40:19 -04:00
Tom Alexander	4fb203c1db	Putting in new-line characters in the empty lines has fixed copy+paste and made the min-height css unnecessary.	2023-08-18 17:20:45 -04:00
Tom Alexander	51b4eed034	Beginning the render the parsed content.	2023-08-18 17:10:55 -04:00
Tom Alexander	c3be0f249d	Minor style improvements.	2023-08-18 16:26:05 -04:00
Tom Alexander	13fab742e5	Add a sample output code block.	2023-08-18 16:19:37 -04:00
Tom Alexander	893de9a65e	Set cache control headers for the static files.	2023-08-18 15:50:22 -04:00
Tom Alexander	bff0a62291	Change response to impl IntoResponse.	2023-08-18 15:41:23 -04:00
Tom Alexander	c24c5ee54e	POSTing the body to the server.	2023-08-18 15:41:06 -04:00
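
The "Handle unicode" commit above (c4762510f4) describes a mismatch between iterating over a JavaScript string and indexing into it. A minimal sketch of that mismatch follows, assuming a sample string that contains the orange heart (U+1F9E1). In precise terms, index-based string APIs such as `length` and `slice` count UTF-16 code units, while iteration with `for...of` or `Array.from` walks code points. The name `sliceByCodePoint` is hypothetical and is not part of this change set; the PR's own helper is `unicodeAwareSlice` in `static/script.js`.

```javascript
// Illustrative sketch only. sliceByCodePoint is a hypothetical helper,
// not the code from this PR.

// The orange heart (U+1F9E1) lies outside the Basic Multilingual Plane,
// so it is one code point stored as two UTF-16 code units.
const text = "a\u{1F9E1}b";

// Index-based APIs count UTF-16 code units.
const unitLength = text.length;
// Slicing at a code unit boundary can cut a surrogate pair in half.
const brokenSlice = text.slice(0, 2);

// Iteration (for...of, spread, Array.from) walks whole code points.
const codePoints = Array.from(text);

// Slice by code point offsets over the half-open range [start, end).
function sliceByCodePoint(str, start, end) {
    return Array.from(str).slice(start, end).join("");
}

console.log(unitLength, codePoints.length);
console.log(sliceByCodePoint(text, 0, 2));
```

Because the server reports positions as character offsets, any lookup on the client side has to count the same way, which is why the script walks the string with an iterator instead of calling `slice` directly.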