Compile the elisp ahead of time so it is not done on every docker container launch.

Update org-mode version.
2023-09-20 02:30:58 -04:00 · 2023-09-16 14:46:02 -04:00 · 2023-09-08 13:11:28 -04:00 · 2023-08-31 21:21:14 -04:00 · 2023-08-31 20:23:00 -04:00 · 2023-08-31 15:18:25 -04:00
13 changed files with 562 additions and 101 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,7 @@
+**/.git
+target/
+docker/
+LICENSE
+readme/
+README.md
+notes/
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -10,3 +10,8 @@ serde = { version = "1.0.183", features = ["derive"] }
 tokio = { version = "1.30.0", default-features = false, features = ["macros", "process", "rt", "rt-multi-thread"] }
 tower = "0.4.13"
 tower-http = { version = "0.4.3", features = ["fs", "set-header"] }
+
+[profile.release-lto]
+inherits = "release"
+lto = true
+strip = "symbols"
--- a/README.md
+++ b/README.md
@@ -0,0 +1,33 @@
+# Org-Mode AST Investigation Tool
+This repository contains a slapdash tool that makes it easier to visualize the abstract syntax tree (AST) of an org-mode document. Write your org-mode source into the top text box, and the tool renders a clickable tree of the AST below it on the right. Clicking a node highlights that node's contents in the source on the left.
+
+![Screenshot showing the interface to the org-mode abstract syntax tree investigation tool.](readme/screenshot.png?raw=true "Org-mode investigation tool interface")
+
+## Running
+Running in docker is the recommended way to run this. It creates a consistent working environment, without impacting (or requiring you to install) emacs, org-mode, or rust.
+### Docker
+First we need to build the docker image. The first build clones and compiles the emacs and org-mode source code, so it will take a while. After that, subsequent builds should be fast because docker caches the layers.
+
+```bash
+# from the root of this repository:
+make --directory=docker
+```
+
+Next we need to launch the server:
+```bash
+docker run --init --rm --publish 3000:3000/tcp --read-only --mount type=tmpfs,destination=/tmp org-investigation
+```
+
+This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
+
+(Alternatively, you can run the `scripts/launch_docker.bash` script, which performs both of these steps.)
+### No docker
+You will need a fully functional rust setup with the nightly toolchain installed (due to the use of the unstable `exit_status_error` feature). Then, from the root of this repo, you can launch the server by running:
+
+```bash
+cargo run --release
+```
+
+It will use your installed versions of emacs and org-mode, which may differ from the versions built into the docker image.
+
+This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -0,0 +1,44 @@
+FROM alpine:3.17 AS build
+RUN apk add --no-cache build-base musl-dev git autoconf make texinfo gnutls-dev ncurses-dev gawk libgccjit-dev
+
+
+FROM build AS build-emacs
+ARG EMACS_VERSION=emacs-29.1
+RUN git clone --depth 1 --branch $EMACS_VERSION https://git.savannah.gnu.org/git/emacs.git /root/emacs
+WORKDIR /root/emacs
+RUN mkdir /root/dist
+RUN ./autogen.sh
+RUN ./configure --prefix /usr --without-x --without-sound --with-native-compilation=aot
+RUN make
+RUN make DESTDIR="/root/dist" install
+
+
+FROM build AS build-org-mode
+ARG ORG_VERSION=c703541ffcc14965e3567f928de1683a1c1e33f6
+COPY --from=build-emacs /root/dist/ /
+RUN mkdir /root/dist
+# Savannah does not allow fetching specific revisions, so we're going to have to put unnecessary load on their server by cloning main and then checking out the revision we want.
+RUN git clone https://git.savannah.gnu.org/git/emacs/org-mode.git /root/org-mode && git -C /root/org-mode checkout $ORG_VERSION
+# RUN mkdir /root/org-mode && git -C /root/org-mode init --initial-branch=main && git -C /root/org-mode remote add origin https://git.savannah.gnu.org/git/emacs/org-mode.git && git -C /root/org-mode fetch origin $ORG_VERSION && git -C /root/org-mode checkout FETCH_HEAD
+WORKDIR /root/org-mode
+RUN make compile
+RUN make DESTDIR="/root/dist" install
+
+
+FROM rustlang/rust:nightly-alpine3.17 AS build-org-investigation
+RUN apk add --no-cache musl-dev
+RUN mkdir /root/org-investigation
+WORKDIR /root/org-investigation
+COPY . .
+RUN CARGO_TARGET_DIR=/target cargo build --profile release-lto
+
+
+FROM alpine:3.17 AS run
+ENV LANG=en_US.UTF-8
+RUN apk add --no-cache ncurses gnutls libgccjit
+COPY --from=build-emacs /root/dist/ /
+COPY --from=build-org-mode /root/dist/ /
+COPY --from=build-org-investigation /target/release-lto/org_ownership_investigation /usr/bin/
+COPY static /opt/org-investigation/static
+WORKDIR /opt/org-investigation
+CMD ["/usr/bin/org_ownership_investigation"]
--- a/docker/Makefile
+++ b/docker/Makefile
@@ -0,0 +1,9 @@
+IMAGE_NAME:=org-investigation
+
+.PHONY: build
+build:
+	docker build -t $(IMAGE_NAME) -f Dockerfile ../
+
+.PHONY: clean
+clean:
+	docker rmi $(IMAGE_NAME)
--- a/notes/plain_list_ownership_notes.org
+++ b/notes/plain_list_ownership_notes.org
@@ -0,0 +1,130 @@
+* Test 1
+** Source
+#+begin_src org
+  1. foo
+
+     1. bar
+
+     2. baz
+
+  2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+This table only shows ownership for the plain list items, not for the containing plain list or the elements inside each item.
+
+| Plain List *Item*      | Owns trailing blank lines |
+|------------------------+---------------------------|
+| foo (includes bar baz) | Yes                       |
+| bar                    | Yes                       |
+| baz                    | Yes                       |
+| lorem                  | No                        |
+** Analysis
+In this test case, we see that the only list item that doesn't own its trailing blank lines is "lorem", the final list item of the outermost list.
+* Test 2
+We add "cat" as a paragraph at the end of foo which makes "baz" lose its trailing blank lines.
+** Source
+#+begin_src org
+  1. foo
+
+     1. bar
+
+     2. baz
+
+     cat
+
+  2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*             | Owns trailing blank lines |
+|-------------------------------+---------------------------|
+| foo -> cat (includes bar baz) | Yes                       |
+| bar                           | Yes                       |
+| baz                           | No                        |
+| lorem                         | No                        |
+** Analysis
+In isolation, this implies that the final plain list item does not own its trailing blank lines, which conflicts with "baz" from test 1.
+
+New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a list item.
+
+| Plain List *Item*             | Owns trailing blank lines | Why                                                       |
+|-------------------------------+---------------------------+-----------------------------------------------------------|
+| foo -> cat (includes bar baz) | Yes                       | Not the final list item                                   |
+| bar                           | Yes                       | Not the final list item                                   |
+| baz                           | No                        | Final item of bar->baz and not the final element of "foo" |
+| lorem                         | No                        | Final item of foo->lorem and not contained in a list item |
+* Test 3
+So if that theory is true, taking the entire (foo -> lorem) list from test 1 and nesting it inside a list item should coerce "lorem" to own its trailing blank lines, since it would then be the final list item (of foo -> lorem) and the final element of a list item ("cat").
+** Source
+#+begin_src org
+  1. cat
+     1. foo
+
+        1. bar
+
+        2. baz
+
+     2. lorem
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*           | Owns trailing blank lines |
+|-----------------------------+---------------------------|
+| cat (includes foo -> lorem) | No                        |
+| foo (includes bar baz)      | Yes                       |
+| bar                         | Yes                       |
+| baz                         | Yes                       |
+| lorem                       | No                        |
+** Analysis
+Against expectations, we did not coerce lorem to consume its trailing blank lines. What is different between "baz" and "lorem"? Well, "baz" is contained within "foo" which has a "lorem" after it, whereas "lorem" is contained within "cat" which does not have any list items after it.
+
+New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
+| Plain List *Item*           | Owns trailing blank lines | Why                                                  |
+|-----------------------------+---------------------------+------------------------------------------------------|
+| cat (includes foo -> lorem) | No                        | Final list item and not contained in a list item     |
+| foo (includes bar baz)      | Yes                       | Not the final list item                              |
+| bar                         | Yes                       | Not the final list item                              |
+| baz                         | Yes                       | Final element of non-final list item                 |
+| lorem                       | No                        | Final list item and final element of final list item |
+* Test 4
+So if that theory is true, then we should be able to coerce lorem to consume its trailing blank lines by adding a second item to the cat list.
+** Source
+#+begin_src org
+  1. cat
+     1. foo
+
+        1. bar
+
+        2. baz
+
+     2. lorem
+
+  2. dog
+
+  ipsum
+#+end_src
+** Ownership
+| Plain List *Item*           | Owns trailing blank lines |
+|-----------------------------+---------------------------|
+| cat (includes foo -> lorem) | Yes                       |
+| foo (includes bar baz)      | Yes                       |
+| bar                         | Yes                       |
+| baz                         | Yes                       |
+| lorem                       | Yes                       |
+| dog                         | No                        |
+** Analysis
+For the first time our expectations were met!
+
+Enduring theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
+| Plain List *Item*           | Owns trailing blank lines | Why                                              |
+|-----------------------------+---------------------------+--------------------------------------------------|
+| cat (includes foo -> lorem) | Yes                       | Not the final list item                          |
+| foo (includes bar baz)      | Yes                       | Not the final list item                          |
+| bar                         | Yes                       | Not the final list item                          |
+| baz                         | Yes                       | Final element of non-final list item             |
+| lorem                       | Yes                       | Final element of non-final list item             |
+| dog                         | No                        | Final list item and not contained in a list item |
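The "enduring theory" can be written as a small predicate, which makes it easy to check against each ownership table. This is a sketch under an interpretation the notes leave implicit: "final element of a non-final list item" is read as "the list containing this item is the last element of its parent item, and that parent item is not the last item of its own list". Only the immediate parent item is considered, as in the tables. `ItemContext`, `ParentItem`, and `owns_trailing_blank_lines` are hypothetical names and are not part of the tool.

```rust
/// Hypothetical model of one plain list item's position, capturing only the
/// facts the enduring theory depends on.
struct ItemContext {
    /// Is this the final item of the list that contains it?
    is_final_item: bool,
    /// `Some(..)` when the containing list is nested inside a plain list item.
    parent: Option<ParentItem>,
}

/// Hypothetical description of the list item that contains this item's list.
struct ParentItem {
    /// Is the containing list the final element of the parent item?
    list_is_final_element: bool,
    /// Is the parent item the final item of its own list?
    parent_is_final_item: bool,
}

/// Enduring theory: list items own their trailing blank lines unless they are
/// both the final list item and not the final element of a non-final list item.
fn owns_trailing_blank_lines(item: &ItemContext) -> bool {
    let final_element_of_non_final_item = match &item.parent {
        Some(parent) => parent.list_is_final_element && !parent.parent_is_final_item,
        // Items of a top-level list are not inside any list item.
        None => false,
    };
    !(item.is_final_item && !final_element_of_non_final_item)
}
```

For example, "lorem" in test 4 is `ItemContext { is_final_item: true, parent: Some(ParentItem { list_is_final_element: true, parent_is_final_item: false }) }`, which the predicate says owns its trailing blank lines; flipping `parent_is_final_item` to `true` gives test 3, where it does not.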
--- a/readme/screenshot.png
+++ b/readme/screenshot.png
--- a/scripts/launch_docker.bash
+++ b/scripts/launch_docker.bash
@@ -0,0 +1,12 @@
+#!/usr/bin/env bash
+#
+set -euo pipefail
+IFS=$'\n\t'
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+function main {
+    make --directory "$DIR/../docker"
+    exec docker run --init --rm --read-only --mount type=tmpfs,destination=/tmp --publish 3000:3000/tcp org-investigation
+}
+
+main "${@}"
--- a/src/main.rs
+++ b/src/main.rs
@@ -4,18 +4,21 @@ use axum::http::HeaderValue;
 use axum::response::IntoResponse;
 use axum::{http::StatusCode, routing::post, Json, Router};
 use owner_tree::build_owner_tree;
-use parse::emacs_parse_org_document;
+use parse::{emacs_parse_org_document, get_emacs_version};
 use tower::ServiceBuilder;
 use tower_http::services::{ServeDir, ServeFile};
 use tower_http::set_header::SetResponseHeaderLayer;

+use crate::parse::get_org_mode_version;
+
 mod error;
 mod owner_tree;
 mod parse;
+mod rtrim_iterator;
 mod sexp;

 #[tokio::main]
-async fn main() {
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let static_files_service = {
        let serve_dir =
            ServeDir::new("static").not_found_service(ServeFile::new("static/index.html"));
@@ -31,8 +34,15 @@ async fn main() {
        .route("/parse", post(parse_org_mode))
        .fallback_service(static_files_service);

-    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
-    axum::serve(listener, app).await.unwrap();
+    let (emacs_version, org_mode_version) =
+        tokio::join!(get_emacs_version(), get_org_mode_version());
+    println!("Using emacs version: {}", emacs_version?.trim());
+    println!("Using org-mode version: {}", org_mode_version?.trim());
+
+    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
+    println!("Listening on port 3000. Pop open your browser to http://127.0.0.1:3000/ .");
+    axum::serve(listener, app).await?;
+    Ok(())
 }

 async fn parse_org_mode(body: String) -> Result<impl IntoResponse, (StatusCode, String)> {
--- a/src/owner_tree.rs
+++ b/src/owner_tree.rs
@@ -1,6 +1,9 @@
 use serde::Serialize;

-use crate::sexp::{sexp_with_padding, Token};
+use crate::{
+    rtrim_iterator::RTrimIterator,
+    sexp::{sexp_with_padding, Token},
+};

 pub fn build_owner_tree<'a>(
    body: &'a str,
@@ -45,15 +48,15 @@ pub struct PlainListItem {

 #[derive(Serialize)]
 pub struct SourceRange {
-    start_line: u32,
-    end_line: u32, // Exclusive
-    start_character: u32,
-    end_character: u32, // Exclusive
+    start_line: usize,
+    end_line: usize, // Exclusive
+    start_character: usize,
+    end_character: usize, // Exclusive
 }

 fn build_ast_node<'a>(
    original_source: &str,
-    parent_contents_begin: Option<u32>,
+    parent_contents_begin: Option<usize>,
    current_token: &Token<'a>,
 ) -> Result<AstNode, Box<dyn std::error::Error>> {
    let maybe_plain_text = current_token.as_text();
@@ -61,19 +64,13 @@ fn build_ast_node<'a>(
        Ok(plain_text) => {
            let parent_contents_begin = parent_contents_begin
                .ok_or("parent_contents_begin should be set for all plain text nodes.")?;
-            let parameters = &plain_text.properties;
+            let mut parameters = plain_text.properties.iter();
            let begin = parent_contents_begin
-                + parameters
-                    .get(0)
-                    .ok_or("Missing first element past the text.")?
-                    .as_atom()?
-                    .parse::<u32>()?;
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing first element past the text.")?;
            let end = parent_contents_begin
-                + parameters
-                    .get(1)
-                    .ok_or("Missing second element past the text.")?
-                    .as_atom()?
-                    .parse::<u32>()?;
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing second element past the text.")?;
            let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
            AstNode {
                name: "plain-text".to_owned(),
@@ -95,12 +92,25 @@ fn build_ast_node<'a>(
                .as_atom()?;
            let position = get_bounds(original_source, current_token)?;
            let mut children = Vec::new();
-            let mut contents_begin = get_contents_begin(current_token)?;
-            for child in parameters.into_iter().skip(2) {
-                let new_ast_node = build_ast_node(original_source, Some(contents_begin), child)?;
-                contents_begin = new_ast_node.position.end_character;
-                children.push(new_ast_node);
-            }
+            let original_contents_begin = get_contents_begin(current_token);
+            match original_contents_begin {
+                Ok(original_contents_begin) => {
+                    let mut contents_begin = original_contents_begin;
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node =
+                            build_ast_node(original_source, Some(contents_begin), child)?;
+                        contents_begin = new_ast_node.position.end_character;
+                        children.push(new_ast_node);
+                    }
+                }
+                Err(_) => {
+                    // Some nodes don't have a contents begin, so hopefully plain text can't be inside them.
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node = build_ast_node(original_source, None, child)?;
+                        children.push(new_ast_node);
+                    }
+                }
+            };

            AstNode {
                name: name.to_owned(),
@@ -133,39 +143,13 @@ fn get_bounds<'s>(
    original_source: &'s str,
    emacs: &'s Token<'s>,
 ) -> Result<SourceRange, Box<dyn std::error::Error>> {
-    let children = emacs.as_list()?;
-    let attributes_child = children
-        .iter()
-        .nth(1)
-        .ok_or("Should have an attributes child.")?;
-    let attributes_map = attributes_child.as_map()?;
-    let standard_properties = attributes_map.get(":standard-properties");
-    let (begin, end) = if standard_properties.is_some() {
-        let std_props = standard_properties
-            .expect("if statement proves its Some")
-            .as_vector()?;
-        let begin = std_props
-            .get(0)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        let end = std_props
-            .get(1)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        (begin, end)
-    } else {
-        let begin = attributes_map
-            .get(":begin")
-            .ok_or("Missing :begin attribute.")?
-            .as_atom()?;
-        let end = attributes_map
-            .get(":end")
-            .ok_or("Missing :end attribute.")?
-            .as_atom()?;
-        (begin, end)
-    };
-    let begin = begin.parse::<u32>()?;
-    let end = end.parse::<u32>()?;
+    let standard_properties = get_standard_properties(emacs)?;
+    let (begin, end) = (
+        standard_properties
+            .begin
+            .ok_or("Token should have a begin.")?,
+        standard_properties.end.ok_or("Token should have an end.")?,
+    );
    let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
    Ok(SourceRange {
        start_line,
@@ -175,38 +159,19 @@ fn get_bounds<'s>(
    })
 }

-fn get_contents_begin<'s>(emacs: &'s Token<'s>) -> Result<u32, Box<dyn std::error::Error>> {
-    let children = emacs.as_list()?;
-    let attributes_child = children
-        .iter()
-        .nth(1)
-        .ok_or("Should have an attributes child.")?;
-    let attributes_map = attributes_child.as_map()?;
-    let standard_properties = attributes_map.get(":standard-properties");
-    let contents_begin = if standard_properties.is_some() {
-        let std_props = standard_properties
-            .expect("if statement proves its Some")
-            .as_vector()?;
-        let contents_begin = std_props
-            .get(2)
-            .ok_or("Missing third element in standard properties")?
-            .as_atom()?;
-        contents_begin
-    } else {
-        let contents_begin = attributes_map
-            .get(":contents-begin")
-            .ok_or("Missing :contents-begin attribute.")?
-            .as_atom()?;
-        contents_begin
-    };
-    Ok(contents_begin.parse::<u32>()?)
+fn get_contents_begin<'s>(emacs: &'s Token<'s>) -> Result<usize, Box<dyn std::error::Error>> {
+    let standard_properties = get_standard_properties(emacs)?;
+    Ok(standard_properties
+        .contents_begin
+        .ok_or("Token should have a contents-begin.")?)
 }

 fn get_line_numbers<'s>(
    original_source: &'s str,
-    begin: u32,
-    end: u32,
-) -> Result<(u32, u32), Box<dyn std::error::Error>> {
+    begin: usize,
+    end: usize,
+) -> Result<(usize, usize), Box<dyn std::error::Error>> {
+    // This is used for highlighting which lines contain text relevant to the token, so even if a token does not extend all the way to the end of the line, the end_line figure will be the following line number (since the range is exclusive, not inclusive).
    let start_line = original_source
        .chars()
        .into_iter()
@@ -214,12 +179,96 @@ fn get_line_numbers<'s>(
        .filter(|x| *x == '\n')
        .count()
        + 1;
-    let end_line = original_source
-        .chars()
-        .into_iter()
-        .take(usize::try_from(end)? - 1)
-        .filter(|x| *x == '\n')
-        .count()
-        + 1;
-    Ok((u32::try_from(start_line)?, u32::try_from(end_line)?))
+    let end_line = {
+        let content_up_to_and_including_token = original_source
+            .chars()
+            .into_iter()
+            .take(usize::try_from(end)? - 1);
+        // Remove the trailing newline (if there is one) because we're going to add an extra line regardless of whether or not this ends with a new line.
+        let without_trailing_newline = RTrimIterator::new(content_up_to_and_including_token, '\n');
+        without_trailing_newline.filter(|x| *x == '\n').count() + 2
+    };
+
+    Ok((usize::try_from(start_line)?, usize::try_from(end_line)?))
+}
+
+struct StandardProperties {
+    begin: Option<usize>,
+    #[allow(dead_code)]
+    post_affiliated: Option<usize>,
+    #[allow(dead_code)]
+    contents_begin: Option<usize>,
+    #[allow(dead_code)]
+    contents_end: Option<usize>,
+    end: Option<usize>,
+    #[allow(dead_code)]
+    post_blank: Option<usize>,
+}
+
+fn get_standard_properties<'s>(
+    emacs: &'s Token<'s>,
+) -> Result<StandardProperties, Box<dyn std::error::Error>> {
+    let children = emacs.as_list()?;
+    let attributes_child = children
+        .iter()
+        .nth(1)
+        .ok_or("Should have an attributes child.")?;
+    let attributes_map = attributes_child.as_map()?;
+    let standard_properties = attributes_map.get(":standard-properties");
+    Ok(if standard_properties.is_some() {
+        let mut std_props = standard_properties
+            .expect("if statement proves its Some")
+            .as_vector()?
+            .into_iter();
+        let begin = maybe_token_to_usize(std_props.next())?;
+        let post_affiliated = maybe_token_to_usize(std_props.next())?;
+        let contents_begin = maybe_token_to_usize(std_props.next())?;
+        let contents_end = maybe_token_to_usize(std_props.next())?;
+        let end = maybe_token_to_usize(std_props.next())?;
+        let post_blank = maybe_token_to_usize(std_props.next())?;
+        StandardProperties {
+            begin,
+            post_affiliated,
+            contents_begin,
+            contents_end,
+            end,
+            post_blank,
+        }
+    } else {
+        let begin = maybe_token_to_usize(attributes_map.get(":begin").map(|token| *token))?;
+        let end = maybe_token_to_usize(attributes_map.get(":end").map(|token| *token))?;
+        let contents_begin =
+            maybe_token_to_usize(attributes_map.get(":contents-begin").map(|token| *token))?;
+        let contents_end =
+            maybe_token_to_usize(attributes_map.get(":contents-end").map(|token| *token))?;
+        let post_blank =
+            maybe_token_to_usize(attributes_map.get(":post-blank").map(|token| *token))?;
+        let post_affiliated =
+            maybe_token_to_usize(attributes_map.get(":post-affiliated").map(|token| *token))?;
+        StandardProperties {
+            begin,
+            post_affiliated,
+            contents_begin,
+            contents_end,
+            end,
+            post_blank,
+        }
+    })
+}
+
+fn maybe_token_to_usize(
+    token: Option<&Token<'_>>,
+) -> Result<Option<usize>, Box<dyn std::error::Error>> {
+    Ok(token
+        .map(|token| token.as_atom())
+        .map_or(Ok(None), |r| r.map(Some))?
+        .map(|val| {
+            if val == "nil" {
+                None
+            } else {
+                Some(val.parse::<usize>())
+            }
+        })
+        .flatten() // Outer option is whether or not the param exists, inner option is whether or not it is nil
+        .map_or(Ok(None), |r| r.map(Some))?)
 }
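`maybe_token_to_usize` chains two `map_or` calls and a `flatten` to turn an optional atom into `Result<Option<usize>, _>`, treating both an absent property and the atom `nil` as `None`. Assuming the atom text has already been extracted (the real function also calls `Token::as_atom`, which can fail and is left out of this sketch), the same logic can be expressed with `Option::filter` and `Option::transpose`. The name `maybe_atom_to_usize` is hypothetical.

```rust
use std::num::ParseIntError;

/// Hypothetical variant of `maybe_token_to_usize` that starts from the atom
/// text, i.e. after `Token::as_atom` has already succeeded.
fn maybe_atom_to_usize(atom: Option<&str>) -> Result<Option<usize>, ParseIntError> {
    atom.filter(|val| *val != "nil") // absent and `nil` both become None
        .map(|val| val.parse::<usize>()) // Option<Result<usize, ParseIntError>>
        .transpose() // Result<Option<usize>, ParseIntError>
}
```

Because `ParseIntError` converts into `Box<dyn std::error::Error>`, callers that use `?` inside a function returning that boxed error type would not need to change.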
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -51,3 +51,40 @@ where
    }
    output
 }
+
+pub async fn get_emacs_version() -> Result<String, Box<dyn std::error::Error>> {
+    let elisp_script = r#"(progn
+     (message "%s" (version))
+)"#;
+    let mut cmd = Command::new("emacs");
+    let proc = cmd
+        .arg("-q")
+        .arg("--no-site-file")
+        .arg("--no-splash")
+        .arg("--batch")
+        .arg("--eval")
+        .arg(elisp_script);
+
+    let out = proc.output().await?;
+    out.status.exit_ok()?;
+    Ok(String::from_utf8(out.stderr)?)
+}
+
+pub async fn get_org_mode_version() -> Result<String, Box<dyn std::error::Error>> {
+    let elisp_script = r#"(progn
+     (org-mode)
+     (message "%s" (org-version nil t nil))
+)"#;
+    let mut cmd = Command::new("emacs");
+    let proc = cmd
+        .arg("-q")
+        .arg("--no-site-file")
+        .arg("--no-splash")
+        .arg("--batch")
+        .arg("--eval")
+        .arg(elisp_script);
+
+    let out = proc.output().await?;
+    out.status.exit_ok()?;
+    Ok(String::from_utf8(out.stderr)?)
+}
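`get_emacs_version` and `get_org_mode_version` build the same batch-mode emacs invocation and differ only in the elisp they evaluate. A shared builder is one way to remove that duplication. This sketch uses `std::process::Command` so it stands alone; tokio's `Command` offers the same `arg`/`args` builder methods, so the shape should carry over to the async code. `emacs_batch_command` is a hypothetical name.

```rust
use std::process::Command;

/// Hypothetical helper: builds the batch-mode emacs invocation that both
/// version functions currently construct by hand.
fn emacs_batch_command(elisp_script: &str) -> Command {
    let mut cmd = Command::new("emacs");
    // Same flags as the originals: skip init and site files, run non-interactively,
    // and evaluate the given elisp.
    cmd.args(["-q", "--no-site-file", "--no-splash", "--batch", "--eval", elisp_script]);
    cmd
}
```

Each version function would then only supply its own elisp script and read the output from stderr, where `message` writes in batch mode.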
--- a/src/rtrim_iterator.rs
+++ b/src/rtrim_iterator.rs
@@ -0,0 +1,86 @@
+/// Removes 1 character from the end of an iterator if it matches needle
+pub struct RTrimIterator<I> {
+    iter: I,
+    needle: char,
+    buffer: Option<char>,
+}
+
+impl<I> Iterator for RTrimIterator<I>
+where
+    I: Iterator<Item = char>,
+{
+    type Item = char;
+
+    fn next(&mut self) -> Option<I::Item> {
+        loop {
+            match (self.buffer, self.iter.next()) {
+                (None, None) => {
+                    // We reached the end of the list and have an empty buffer, meaning the string did not end with the needle character.
+                    return None;
+                }
+                (None, Some(chr)) if chr == self.needle => {
+                    // We came across an instance of needle, buffer it and loop again because we do not know if this is the end of the string.
+                    self.buffer = Some(chr);
+                }
+                (None, Some(chr)) => {
+                    // We have an empty buffer and the next character is not the needle character, return it immediately.
+                    return Some(chr);
+                }
+                (Some(buf), None) if buf == self.needle => {
+                    // We reached the end of the list and have the specified needle in the buffer where it will stay forever.
+                    return None;
+                }
+                (Some(_), None) => {
+                    // We reached the end of the list and the buffered character is not the needle character, so write it out.
+                    return self.buffer.take();
+                }
+                (Some(_), Some(chr)) => {
+                    // We have a buffered character, but it is not the end of the string, so regardless of its contents we can write it out.
+                    return self.buffer.replace(chr);
+                }
+            };
+        }
+    }
+}
+
+impl<I> RTrimIterator<I> {
+    pub fn new(iter: I, needle: char) -> RTrimIterator<I> {
+        RTrimIterator {
+            iter,
+            needle,
+            buffer: None,
+        }
+    }
+}
+
+mod tests {
+    use super::*;
+
+    #[test]
+    fn no_match() {
+        let input = "abcd";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, input);
+    }
+
+    #[test]
+    fn middle_match() {
+        let input = "ab\ncd";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, input);
+    }
+
+    #[test]
+    fn end_match() {
+        let input = "abcd\n";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, "abcd");
+    }
+
+    #[test]
+    fn double_match() {
+        let input = "abcd\n\n";
+        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
+        assert_eq!(output, "abcd\n");
+    }
+}
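`RTrimIterator` lets the `end_line` computation in `get_line_numbers` drop at most one trailing newline without collecting the source into a `String`. A collected-string version using `str::strip_suffix` may make the arithmetic easier to follow. It assumes, as the surrounding code does, that `end` is a 1-based, exclusive character position; `end_line_exclusive` is a hypothetical name.

```rust
/// Hypothetical string-based equivalent of the `end_line` block in
/// `get_line_numbers`. Returns an exclusive, 1-based line number.
fn end_line_exclusive(source: &str, end: usize) -> usize {
    // Everything up to and including the token's final character.
    let through_token: String = source.chars().take(end - 1).collect();
    // Drop at most one trailing newline, as RTrimIterator does.
    let trimmed = through_token
        .strip_suffix('\n')
        .unwrap_or(through_token.as_str());
    // +1 turns a newline count into a line number, +1 more makes it exclusive.
    trimmed.matches('\n').count() + 2
}
```

With the source `"foo\nbar\n"`, a token covering `"foo"` and one covering `"foo\n"` both end before line 2, so only line 1 is highlighted either way.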
--- a/static/script.js
+++ b/static/script.js
@@ -21,7 +21,6 @@ function clearOutput() {

 function renderParseResponse(response) {
    clearOutput();
-    console.log(response);
    renderSourceBox(response);
    renderAstTree(response);
 }
@@ -59,7 +58,7 @@ function renderAstNode(originalSource, depth, astNode) {
    const nodeElem = document.createElement("div");
    nodeElem.classList.add("ast_node");

-    let sourceForNode = originalSource.slice(astNode.position.start_character - 1, astNode.position.end_character - 1);
+    let sourceForNode = unicodeAwareSlice(originalSource, astNode.position.start_character - 1, astNode.position.end_character - 1);
    // Since sourceForList is a string, JSON.stringify will escape with backslashes and wrap the text in quotation marks, ensuring that the string ends up on a single line. Coincidentally, this is the behavior we want.
    let escapedSource = JSON.stringify(sourceForNode);

@@ -137,14 +136,14 @@ function highlightLine(htmlName, lineOffset) {
 }

 function highlightCharacters(htmlName, originalSource, startCharacter, endCharacter) {
-    let sourceBefore = originalSource.slice(0, startCharacter - 1);
-    let precedingLineBreak = sourceBefore.lastIndexOf("\n");
+    let sourceBefore = unicodeAwareSlice(originalSource, 0, startCharacter - 1);
+    let precedingLineBreak = unicodeAwareLastIndexOfCharacter(sourceBefore, "\n");
    let characterIndexOnLine = precedingLineBreak !== -1 ? startCharacter - precedingLineBreak - 1 : startCharacter;
    let lineNumber = (sourceBefore.match(/\r?\n/g) || '').length + 1;

    for (let characterIndex = startCharacter; characterIndex < endCharacter; ++characterIndex) {
        document.querySelector(`#${htmlName} > code:nth-child(${lineNumber}) > span:nth-child(${characterIndexOnLine})`)?.classList.add("highlighted");
-        if (originalSource[characterIndex - 1] == "\n") {
+        if (unicodeAwareCharAtOffset(originalSource, characterIndex - 1) == "\n") {
            ++lineNumber;
            characterIndexOnLine = 1;
        } else {
@@ -153,3 +152,43 @@ function highlightCharacters(htmlName, originalSource, startCharacter, endCharac
    }

 }
+
+function unicodeAwareSlice(text, start, end) {
+    // Boooo javascript
+    let i = 0;
+    let output = "";
+    for (const chr of text) {
+        if (i >= end) {
+            break;
+        }
+        if (i >= start) {
+            output += chr;
+        }
+        ++i;
+    }
+    return output;
+}
+
+function unicodeAwareLastIndexOfCharacter(haystack, needle) {
+    // Boooo javascript
+    let i = 0;
+    let found = -1;
+    for (const chr of haystack) {
+        if (chr == needle) {
+            found = i;
+        }
+        ++i;
+    }
+    return found;
+}
+
+function unicodeAwareCharAtOffset(text, offset) {
+    // Boooo javascript
+    let i = offset;
+    for (const chr of text) {
+        if (i == 0) {
+            return chr;
+        }
+        --i;
+    }
+}
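These helpers exist because JavaScript's `slice`, `lastIndexOf`, and `[]` index by UTF-16 code unit, so a character outside the Basic Multilingual Plane such as 🧡 (U+1F9E1) counts as two units, whereas `for...of` walks whole code points. Emacs reports positions in characters, and the server counts them with Rust's `chars()`, so the front end has to count the same way. A minimal Rust illustration of the two counting schemes, with hypothetical helper names:

```rust
/// Slices `text` by Unicode scalar value offsets (start inclusive, end exclusive),
/// mirroring what `unicodeAwareSlice` does with `for...of`.
fn char_slice(text: &str, start: usize, end: usize) -> String {
    text.chars()
        .skip(start)
        .take(end.saturating_sub(start))
        .collect()
}

/// Length in UTF-16 code units, which is what JavaScript's `length` reports.
fn utf16_len(text: &str) -> usize {
    text.encode_utf16().count()
}
```

For `"a🧡b"`, `chars().count()` is 3 while `utf16_len` is 4, which is exactly the off-by-one that made the original `slice`-based highlighting drift after an emoji.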
Author	SHA1	Message	Date
Tom Alexander	3760358783	Compile the elisp ahead of time so it is not done on every docker container launch.	2023-09-20 02:30:58 -04:00
Tom Alexander	024b2ade03	Update org-mode version.	2023-09-16 14:46:02 -04:00
Tom Alexander	55e5c31368	Update org-mode version.	2023-09-08 13:11:28 -04:00
Tom Alexander	4a556bc84f	Use read-only root for docker containers.	2023-08-31 21:21:14 -04:00
Tom Alexander	9bf2a912d6	Enable unicode in the docker container.	2023-08-31 20:23:00 -04:00
Tom Alexander	e8f262727d	Add a script to build and launch the docker container in one step.	2023-08-31 15:18:25 -04:00
Tom Alexander	b4170dda1f	Update org-mode version.	2023-08-25 04:58:45 -04:00
Tom Alexander	bd99fbc4c4	Get the versions of emacs and org-mode and write them to stdout.	2023-08-25 02:30:52 -04:00
Tom Alexander	79c834a1e6	Add the init flag to the docker run command.	2023-08-25 02:10:03 -04:00
Tom Alexander	2505a10275	Parameterize the emacs and org-mode versions in the dockerfile.	2023-08-25 02:02:57 -04:00
Tom Alexander	cfc9153c28	Handle nodes that do not have a contents begin like fixed width areas.	2023-08-20 16:25:57 -04:00
Tom Alexander	13a73efdcf	Handle line numbers properly when selected node does not end in a line break.	2023-08-20 15:55:54 -04:00
Tom Alexander	cba1d1e988	Add notes from plain list ownership investigation.	2023-08-19 01:10:42 -04:00
Tom Alexander	e8a89dfeca	Remove log statement.	2023-08-18 23:33:48 -04:00
Tom Alexander	367dfaa146	Update org-mode version.	2023-08-18 23:32:44 -04:00
Tom Alexander	c4762510f4	Handle unicode. Turns out javascript iterates over strings by character, but all the string functions like slicing, lastIndexOf, and indexing with [] are all based on codepoints without taking into account surrogate pairs like orange heart. It would have been nice if that was mentioned in the documentation...	2023-08-18 23:32:21 -04:00
Tom Alexander	372542d914	Add a print to announce the server is running.	2023-08-18 22:32:01 -04:00
Tom Alexander	0d6621d389	Add docker.	2023-08-18 22:26:42 -04:00
Tom Alexander	e96c39e3e0	Add a README.	2023-08-18 21:35:39 -04:00