Compile the elisp ahead of time so it is not done on every docker container launch.

Update org-mode version.
13 changed files with 562 additions and 101 deletions
--- a/.dockerignore
+++ b/.dockerignore
@@ -0,0 +1,7 @@
 **/.git
 target/
 docker/
 LICENSE
 readme/
 README.md
 notes/
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -10,3 +10,8 @@ serde = { version = "1.0.183", features = ["derive"] }
 tokio = { version = "1.30.0", default-features = false, features = ["macros", "process", "rt", "rt-multi-thread"] }
 tower = "0.4.13"
 tower-http = { version = "0.4.3", features = ["fs", "set-header"] }
 [profile.release-lto]
 inherits = "release"
 lto = true
 strip = "symbols"
--- a/README.md
+++ b/README.md
@@ -0,0 +1,33 @@
 # Org-Mode AST Investigation Tool
 This repository contains a slapdash tool to make visualizing the abstract syntax tree of an org-mode document easier. Write your org-mode source into the top text box, and below on the right it will create a clickable tree of the AST. When you click on a node, the contents of that node will be highlighted on the left.
 ![Screenshot showing the interface to the org-mode abstract syntax tree investigation tool.](readme/screenshot.png?raw=true "Org-mode investigation tool interface")
 ## Running
 Running in docker is the recommended way to run this. It creates a consistent working environment, without impacting (or requiring you to install) emacs, org-mode, or rust.
 ### Docker
 First we need to build the docker container. On the first run, this will pull the emacs and org-mode source code so this build will take a while the first time. After that, subsequent builds should be fast because docker caches the layers.
 ```bash
 # from the root of this repository:
 make --directory=docker
 ```
 Next we need to launch the server:
 ```bash
 docker run --init --rm --publish 3000:3000/tcp --read-only --mount type=tmpfs,destination=/tmp org-investigation
 ```
 This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
 (alternatively, you can run the `scripts/launch_docker.bash` script which performs these two steps.)
 ### No docker
 You will need a fully functional rust setup with nightly installed (due to the use of exit_status_error). Then from the root of this repo you can launch the server by running:
 ```bash
 cargo run --release
 ```
 It will use your installed version of emacs and org-mode which may differ from what the docker users are using.
 This launches a server listening on port 3000, so pop open your browser to http://127.0.0.1:3000/ to access the web interface.
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -0,0 +1,44 @@
 FROM alpine:3.17 AS build
 RUN apk add --no-cache build-base musl-dev git autoconf make texinfo gnutls-dev ncurses-dev gawk libgccjit-dev
 FROM build AS build-emacs
 ARG EMACS_VERSION=emacs-29.1
 RUN git clone --depth 1 --branch $EMACS_VERSION https://git.savannah.gnu.org/git/emacs.git /root/emacs
 WORKDIR /root/emacs
 RUN mkdir /root/dist
 RUN ./autogen.sh
 RUN ./configure --prefix /usr --without-x --without-sound --with-native-compilation=aot
 RUN make
 RUN make DESTDIR="/root/dist" install
 FROM build AS build-org-mode
 ARG ORG_VERSION=c703541ffcc14965e3567f928de1683a1c1e33f6
 COPY --from=build-emacs /root/dist/ /
 RUN mkdir /root/dist
 # Savannah does not allow fetching specific revisions, so we're going to have to put unnecessary load on their server by cloning main and then checking out the revision we want.
 RUN git clone https://git.savannah.gnu.org/git/emacs/org-mode.git /root/org-mode && git -C /root/org-mode checkout $ORG_VERSION
 # RUN mkdir /root/org-mode && git -C /root/org-mode init --initial-branch=main && git -C /root/org-mode remote add origin https://git.savannah.gnu.org/git/emacs/org-mode.git && git -C /root/org-mode fetch origin $ORG_VERSION && git -C /root/org-mode checkout FETCH_HEAD
 WORKDIR /root/org-mode
 RUN make compile
 RUN make DESTDIR="/root/dist" install
 FROM rustlang/rust:nightly-alpine3.17 AS build-org-investigation
 RUN apk add --no-cache musl-dev
 RUN mkdir /root/org-investigation
 WORKDIR /root/org-investigation
 COPY . .
 RUN CARGO_TARGET_DIR=/target cargo build --profile release-lto
 FROM alpine:3.17 AS run
 ENV LANG=en_US.UTF-8
 RUN apk add --no-cache ncurses gnutls libgccjit
 COPY --from=build-emacs /root/dist/ /
 COPY --from=build-org-mode /root/dist/ /
 COPY --from=build-org-investigation /target/release-lto/org_ownership_investigation /usr/bin/
 COPY static /opt/org-investigation/static
 WORKDIR /opt/org-investigation
 CMD ["/usr/bin/org_ownership_investigation"]
--- a/docker/Makefile
+++ b/docker/Makefile
@@ -0,0 +1,9 @@
 IMAGE_NAME:=org-investigation
 .PHONY: build
 build:
 	docker build -t $(IMAGE_NAME) -f Dockerfile ../
 .PHONY: clean
 clean:
 	docker rmi $(IMAGE_NAME)
--- a/notes/plain_list_ownership_notes.org
+++ b/notes/plain_list_ownership_notes.org
@@ -0,0 +1,130 @@
 * Test 1
 ** Source
 #+begin_src org
  1. foo
     1. bar
     2. baz
  2. lorem
  ipsum
 #+end_src
 ** Ownership
 This table is just showing ownership for the plain list items, not the containing plain list nor the elements inside each item.
 | Plain List *Item*      | Owns trailing blank lines |
 |------------------------+---------------------------|
 | foo (includes bar baz) | Yes                       |
 | bar                    | Yes                       |
 | baz                    | Yes                       |
 | lorem                  | No                        |
 ** Analysis
 In this test case, we see that the only list item that doesn't own its trailing blank lines is "lorem", the final list item of the outer-most list.
 * Test 2
 We add "cat" as a paragraph at the end of foo which makes "baz" lose its trailing blank lines.
 ** Source
 #+begin_src org
  1. foo
     1. bar
     2. baz
     cat
  2. lorem
  ipsum
 #+end_src
 ** Ownership
 | Plain List *Item*             | Owns trailing blank lines |
 |-------------------------------+---------------------------|
 | foo -> cat (includes bar baz) | Yes                       |
 | bar                           | Yes                       |
 | baz                           | No                        |
 | lorem                         | No                        |
 ** Analysis
 In isolation, this implies that the final plain list item does not own its trailing blank lines, which conflicts with "baz" from test 1.
 New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a list item.
 | Plain List *Item*             | Owns trailing blank lines | Why                                                       |
 |-------------------------------+---------------------------+-----------------------------------------------------------|
 | foo -> cat (includes bar baz) | Yes                       | Not the final list item                                   |
 | bar                           | Yes                       | Not the final list item                                   |
 | baz                           | No                        | Final item of bar->baz and not the final element of "foo" |
 | lorem                         | No                        | Final item of foo->lorem and not contained in a list item |
 * Test 3
 So if that theory is true, taking the entire (foo -> lorem) list from test 1 and nesting it inside a list should coerce "lorem" to own its trailing blank lines since it would then be a final list item (of foo -> lorem) and the final element of the new list.
 ** Source
 #+begin_src org
  1. cat
     1. foo
        1. bar
        2. baz
     2. lorem
  ipsum
 #+end_src
 ** Ownership
 | Plain List *Item*           | Owns trailing blank lines |
 |-----------------------------+---------------------------|
 | cat (includes foo -> lorem) | No                        |
 | foo (includes bar baz)      | Yes                       |
 | bar                         | Yes                       |
 | baz                         | Yes                       |
 | lorem                       | No                        |
 ** Analysis
 Against expectations, we did not coerce lorem to consume its trailing blank lines. What is different between "baz" and "lorem"? Well, "baz" is contained within "foo" which has a "lorem" after it, whereas "lorem" is contained within "cat" which does not have any list items after it.
 New theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
 | Plain List *Item*           | Owns trailing blank lines | Why                                                  |
 |-----------------------------+---------------------------+------------------------------------------------------|
 | cat (includes foo -> lorem) | No                        | Final list item and not contained in a list item     |
 | foo (includes bar baz)      | Yes                       | Not the final list item                              |
 | bar                         | Yes                       | Not the final list item                              |
 | baz                         | Yes                       | Final element of non-final list item                 |
 | lorem                       | No                        | Final list item and final element of final list item |
 * Test 4
 So if that theory is true, then we should be able to coerce lorem to consume its trailing blank lines by adding a second item to the cat list.
 ** Source
 #+begin_src org
  1. cat
     1. foo
        1. bar
        2. baz
     2. lorem
  2. dog
  ipsum
 #+end_src
 ** Ownership
 | Plain List *Item*           | Owns trailing blank lines |
 |-----------------------------+---------------------------|
 | cat (includes foo -> lorem) | Yes                       |
 | foo (includes bar baz)      | Yes                       |
 | bar                         | Yes                       |
 | baz                         | Yes                       |
 | lorem                       | Yes                       |
 | dog                         | No                        |
 ** Analysis
 For the first time our expectations were met!
 Enduring theory: List items own their trailing blank lines unless they are both the final list item and not the final element of a non-final list item.
 | Plain List *Item*           | Owns trailing blank lines | Why                                              |
 |-----------------------------+---------------------------+--------------------------------------------------|
 | cat (includes foo -> lorem) | Yes                       | Not the final list item                          |
 | foo (includes bar baz)      | Yes                       | Not the final list item                          |
 | bar                         | Yes                       | Not the final list item                          |
 | baz                         | Yes                       | Final element of non-final list item             |
 | lorem                       | Yes                       | Final element of non-final list item             |
 | dog                         | No                        | Final list item and not contained in a list item |
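The enduring theory from these notes can be written as a small predicate. This is only a sketch of the rule as stated, not code from this repository: `Container`, `ItemContext`, and `owns_trailing_blank_lines` are hypothetical names, and it assumes that only the immediately containing list item matters, which is all the four tests exercise.

```rust
/// The list item that contains the list a given item belongs to.
/// Hypothetical type, for illustration only.
struct Container {
    /// The nested list is the last element inside the containing item.
    list_is_final_element: bool,
    /// The containing item is the last item of its own list.
    is_final_item: bool,
}

/// Position of a plain list item. Hypothetical type, for illustration only.
struct ItemContext {
    /// The item is the last item of its list.
    is_final_item: bool,
    /// `None` when the item's list is not nested inside another list item.
    container: Option<Container>,
}

/// Applies the rule from the notes: an item owns its trailing blank lines
/// unless it is the final item of its list and is not the final element of
/// a non-final list item.
fn owns_trailing_blank_lines(item: &ItemContext) -> bool {
    if !item.is_final_item {
        // Non-final items always owned their blank lines in the tests.
        return true;
    }
    match &item.container {
        // Final item: it owns them only when its list closes a non-final item.
        Some(container) => container.list_is_final_element && !container.is_final_item,
        // Final item of a top-level list.
        None => false,
    }
}
```

Under these assumptions, "lorem" in test 4 (final item, inside the non-final "cat") evaluates to true, while "lorem" in test 3 (inside the final "cat") and "baz" in test 2 (its list is followed by the "cat" paragraph) evaluate to false, matching the tables.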
--- a/readme/screenshot.png
+++ b/readme/screenshot.png
--- a/scripts/launch_docker.bash
+++ b/scripts/launch_docker.bash
@@ -0,0 +1,12 @@
 #!/usr/bin/env bash
 #
 set -euo pipefail
 IFS=$'\n\t'
 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 function main {
    make --directory "$DIR/../docker"
    exec docker run --init --rm --read-only --mount type=tmpfs,destination=/tmp --publish 3000:3000/tcp org-investigation
 }
 main "${@}"
--- a/src/main.rs
+++ b/src/main.rs
@@ -4,18 +4,21 @@ use axum::http::HeaderValue;
 use axum::response::IntoResponse;
 use axum::{http::StatusCode, routing::post, Json, Router};
 use owner_tree::build_owner_tree;
-use parse::emacs_parse_org_document;
+use parse::{emacs_parse_org_document, get_emacs_version};
 use tower::ServiceBuilder;
 use tower_http::services::{ServeDir, ServeFile};
 use tower_http::set_header::SetResponseHeaderLayer;
 use crate::parse::get_org_mode_version;
 mod error;
 mod owner_tree;
 mod parse;
 mod rtrim_iterator;
 mod sexp;
 #[tokio::main]
-async fn main() {
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let static_files_service = {
        let serve_dir =
            ServeDir::new("static").not_found_service(ServeFile::new("static/index.html"));
@@ -31,8 +34,15 @@ async fn main() {
        .route("/parse", post(parse_org_mode))
        .fallback_service(static_files_service);
-    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
-    axum::serve(listener, app).await.unwrap();
+    let (emacs_version, org_mode_version) =
+        tokio::join!(get_emacs_version(), get_org_mode_version());
+    println!("Using emacs version: {}", emacs_version?.trim());
+    println!("Using org-mode version: {}", org_mode_version?.trim());
+    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
+    println!("Listening on port 3000. Pop open your browser to http://127.0.0.1:3000/ .");
+    axum::serve(listener, app).await?;
+    Ok(())
 }
 async fn parse_org_mode(body: String) -> Result<impl IntoResponse, (StatusCode, String)> {
--- a/src/owner_tree.rs
+++ b/src/owner_tree.rs
@@ -1,6 +1,9 @@
 use serde::Serialize;
-use crate::sexp::{sexp_with_padding, Token};
+use crate::{
+    rtrim_iterator::RTrimIterator,
+    sexp::{sexp_with_padding, Token},
+};
 pub fn build_owner_tree<'a>(
    body: &'a str,
@@ -45,15 +48,15 @@ pub struct PlainListItem {
 #[derive(Serialize)]
 pub struct SourceRange {
-    start_line: u32,
+    start_line: usize,
-    end_line: u32, // Exclusive
+    end_line: usize, // Exclusive
-    start_character: u32,
+    start_character: usize,
-    end_character: u32, // Exclusive
+    end_character: usize, // Exclusive
 }
 fn build_ast_node<'a>(
    original_source: &str,
-    parent_contents_begin: Option<u32>,
+    parent_contents_begin: Option<usize>,
    current_token: &Token<'a>,
 ) -> Result<AstNode, Box<dyn std::error::Error>> {
    let maybe_plain_text = current_token.as_text();
@@ -61,19 +64,13 @@ fn build_ast_node<'a>(
        Ok(plain_text) => {
            let parent_contents_begin = parent_contents_begin
                .ok_or("parent_contents_begin should be set for all plain text nodes.")?;
-            let parameters = &plain_text.properties;
+            let mut parameters = plain_text.properties.iter();
             let begin = parent_contents_begin
-                + parameters
-                    .get(0)
-                    .ok_or("Missing first element past the text.")?
-                    .as_atom()?
-                    .parse::<u32>()?;
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing first element past the text.")?;
             let end = parent_contents_begin
-                + parameters
-                    .get(1)
-                    .ok_or("Missing second element past the text.")?
-                    .as_atom()?
-                    .parse::<u32>()?;
+                + maybe_token_to_usize(parameters.next())?
+                    .ok_or("Missing second element past the text.")?;
            let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
            AstNode {
                name: "plain-text".to_owned(),
@@ -95,12 +92,25 @@ fn build_ast_node<'a>(
                .as_atom()?;
            let position = get_bounds(original_source, current_token)?;
            let mut children = Vec::new();
-            let mut contents_begin = get_contents_begin(current_token)?;
-            for child in parameters.into_iter().skip(2) {
-                let new_ast_node = build_ast_node(original_source, Some(contents_begin), child)?;
-                contents_begin = new_ast_node.position.end_character;
-                children.push(new_ast_node);
-            }
+            let original_contents_begin = get_contents_begin(current_token);
+            match original_contents_begin {
+                Ok(original_contents_begin) => {
+                    let mut contents_begin = original_contents_begin;
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node =
+                            build_ast_node(original_source, Some(contents_begin), child)?;
+                        contents_begin = new_ast_node.position.end_character;
+                        children.push(new_ast_node);
+                    }
+                }
+                Err(_) => {
+                    // Some nodes don't have a contents begin, so hopefully plain text can't be inside them.
+                    for child in parameters.into_iter().skip(2) {
+                        let new_ast_node = build_ast_node(original_source, None, child)?;
+                        children.push(new_ast_node);
+                    }
+                }
+            };
            AstNode {
                name: name.to_owned(),
@@ -133,39 +143,13 @@ fn get_bounds<'s>(
    original_source: &'s str,
    emacs: &'s Token<'s>,
 ) -> Result<SourceRange, Box<dyn std::error::Error>> {
-    let children = emacs.as_list()?;
-    let attributes_child = children
-        .iter()
-        .nth(1)
-        .ok_or("Should have an attributes child.")?;
-    let attributes_map = attributes_child.as_map()?;
-    let standard_properties = attributes_map.get(":standard-properties");
-    let (begin, end) = if standard_properties.is_some() {
-        let std_props = standard_properties
-            .expect("if statement proves its Some")
-            .as_vector()?;
-        let begin = std_props
-            .get(0)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        let end = std_props
-            .get(1)
-            .ok_or("Missing first element in standard properties")?
-            .as_atom()?;
-        (begin, end)
-    } else {
-        let begin = attributes_map
-            .get(":begin")
-            .ok_or("Missing :begin attribute.")?
-            .as_atom()?;
-        let end = attributes_map
-            .get(":end")
-            .ok_or("Missing :end attribute.")?
-            .as_atom()?;
-        (begin, end)
-    };
-    let begin = begin.parse::<u32>()?;
-    let end = end.parse::<u32>()?;
+    let standard_properties = get_standard_properties(emacs)?;
+    let (begin, end) = (
+        standard_properties
+            .begin
+            .ok_or("Token should have a begin.")?,
+        standard_properties.end.ok_or("Token should have an end.")?,
+    );
    let (start_line, end_line) = get_line_numbers(original_source, begin, end)?;
    Ok(SourceRange {
        start_line,
@@ -175,38 +159,19 @@ fn get_bounds<'s>(
    })
 }
-fn get_contents_begin<'s>(emacs: &'s Token<'s>) -> Result<u32, Box<dyn std::error::Error>> {
-    let children = emacs.as_list()?;
-    let attributes_child = children
-        .iter()
-        .nth(1)
-        .ok_or("Should have an attributes child.")?;
-    let attributes_map = attributes_child.as_map()?;
-    let standard_properties = attributes_map.get(":standard-properties");
-    let contents_begin = if standard_properties.is_some() {
-        let std_props = standard_properties
-            .expect("if statement proves its Some")
-            .as_vector()?;
-        let contents_begin = std_props
-            .get(2)
-            .ok_or("Missing third element in standard properties")?
-            .as_atom()?;
-        contents_begin
-    } else {
-        let contents_begin = attributes_map
-            .get(":contents-begin")
-            .ok_or("Missing :contents-begin attribute.")?
-            .as_atom()?;
-        contents_begin
-    };
-    Ok(contents_begin.parse::<u32>()?)
+fn get_contents_begin<'s>(emacs: &'s Token<'s>) -> Result<usize, Box<dyn std::error::Error>> {
+    let standard_properties = get_standard_properties(emacs)?;
+    Ok(standard_properties
+        .contents_begin
+        .ok_or("Token should have a contents-begin.")?)
 }
 fn get_line_numbers<'s>(
    original_source: &'s str,
-    begin: u32,
+    begin: usize,
-    end: u32,
+    end: usize,
-) -> Result<(u32, u32), Box<dyn std::error::Error>> {
+) -> Result<(usize, usize), Box<dyn std::error::Error>> {
    // This is used for highlighting which lines contain text relevant to the token, so even if a token does not extend all the way to the end of the line, the end_line figure will be the following line number (since the range is exclusive, not inclusive).
    let start_line = original_source
        .chars()
        .into_iter()
@@ -214,12 +179,96 @@ fn get_line_numbers<'s>(
        .filter(|x| *x == '\n')
        .count()
        + 1;
-    let end_line = original_source
-        .chars()
-        .into_iter()
-        .take(usize::try_from(end)? - 1)
-        .filter(|x| *x == '\n')
-        .count()
-        + 1;
-    Ok((u32::try_from(start_line)?, u32::try_from(end_line)?))
+    let end_line = {
+        let content_up_to_and_including_token = original_source
+            .chars()
+            .into_iter()
+            .take(usize::try_from(end)? - 1);
+        // Remove the trailing newline (if there is one) because we're going to add an extra line regardless of whether or not this ends with a new line.
+        let without_trailing_newline = RTrimIterator::new(content_up_to_and_including_token, '\n');
+        without_trailing_newline.filter(|x| *x == '\n').count() + 2
+    };
+    Ok((usize::try_from(start_line)?, usize::try_from(end_line)?))
 }
 struct StandardProperties {
    begin: Option<usize>,
    #[allow(dead_code)]
    post_affiliated: Option<usize>,
    #[allow(dead_code)]
    contents_begin: Option<usize>,
    #[allow(dead_code)]
    contents_end: Option<usize>,
    end: Option<usize>,
    #[allow(dead_code)]
    post_blank: Option<usize>,
 }
 fn get_standard_properties<'s>(
    emacs: &'s Token<'s>,
 ) -> Result<StandardProperties, Box<dyn std::error::Error>> {
    let children = emacs.as_list()?;
    let attributes_child = children
        .iter()
        .nth(1)
        .ok_or("Should have an attributes child.")?;
    let attributes_map = attributes_child.as_map()?;
    let standard_properties = attributes_map.get(":standard-properties");
    Ok(if standard_properties.is_some() {
        let mut std_props = standard_properties
            .expect("if statement proves its Some")
            .as_vector()?
            .into_iter();
        let begin = maybe_token_to_usize(std_props.next())?;
        let post_affiliated = maybe_token_to_usize(std_props.next())?;
        let contents_begin = maybe_token_to_usize(std_props.next())?;
        let contents_end = maybe_token_to_usize(std_props.next())?;
        let end = maybe_token_to_usize(std_props.next())?;
        let post_blank = maybe_token_to_usize(std_props.next())?;
        StandardProperties {
            begin,
            post_affiliated,
            contents_begin,
            contents_end,
            end,
            post_blank,
        }
    } else {
        let begin = maybe_token_to_usize(attributes_map.get(":begin").map(|token| *token))?;
        let end = maybe_token_to_usize(attributes_map.get(":end").map(|token| *token))?;
        let contents_begin =
            maybe_token_to_usize(attributes_map.get(":contents-begin").map(|token| *token))?;
        let contents_end =
            maybe_token_to_usize(attributes_map.get(":contents-end").map(|token| *token))?;
        let post_blank =
            maybe_token_to_usize(attributes_map.get(":post-blank").map(|token| *token))?;
        let post_affiliated =
            maybe_token_to_usize(attributes_map.get(":post-affiliated").map(|token| *token))?;
        StandardProperties {
            begin,
            post_affiliated,
            contents_begin,
            contents_end,
            end,
            post_blank,
        }
    })
 }
 fn maybe_token_to_usize(
    token: Option<&Token<'_>>,
 ) -> Result<Option<usize>, Box<dyn std::error::Error>> {
    Ok(token
        .map(|token| token.as_atom())
        .map_or(Ok(None), |r| r.map(Some))?
        .map(|val| {
            if val == "nil" {
                None
            } else {
                Some(val.parse::<usize>())
            }
        })
        .flatten() // Outer option is whether or not the param exists, inner option is whether or not it is nil
        .map_or(Ok(None), |r| r.map(Some))?)
 }
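`maybe_token_to_usize` distinguishes three outcomes: the property is absent, it is the atom `nil`, or it holds a number that may fail to parse. A minimal sketch of the same conversion over plain `&str` atoms, assuming the `Token::as_atom` step has already succeeded, shows how `Option::transpose` can express the flip from `Option<Result<_, _>>` to `Result<Option<_>, _>`. The name `atom_to_usize` is hypothetical and not part of this change.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for `maybe_token_to_usize` that works on string atoms.
/// Returns Ok(None) for a missing or `nil` atom, Ok(Some(n)) for a number,
/// and Err(..) when the atom is neither `nil` nor a valid usize.
fn atom_to_usize(atom: Option<&str>) -> Result<Option<usize>, ParseIntError> {
    atom.filter(|val| *val != "nil") // treat `nil` the same as "not present"
        .map(|val| val.parse::<usize>()) // Option<Result<usize, ParseIntError>>
        .transpose() // Result<Option<usize>, ParseIntError>
}
```

For example, `atom_to_usize(Some("nil"))` and `atom_to_usize(None)` both yield `Ok(None)`, while `atom_to_usize(Some("42"))` yields `Ok(Some(42))`.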
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -51,3 +51,40 @@ where
    }
    output
 }
 pub async fn get_emacs_version() -> Result<String, Box<dyn std::error::Error>> {
    let elisp_script = r#"(progn
     (message "%s" (version))
 )"#;
    let mut cmd = Command::new("emacs");
    let proc = cmd
        .arg("-q")
        .arg("--no-site-file")
        .arg("--no-splash")
        .arg("--batch")
        .arg("--eval")
        .arg(elisp_script);
    let out = proc.output().await?;
    out.status.exit_ok()?;
    Ok(String::from_utf8(out.stderr)?)
 }
 pub async fn get_org_mode_version() -> Result<String, Box<dyn std::error::Error>> {
    let elisp_script = r#"(progn
     (org-mode)
     (message "%s" (org-version nil t nil))
 )"#;
    let mut cmd = Command::new("emacs");
    let proc = cmd
        .arg("-q")
        .arg("--no-site-file")
        .arg("--no-splash")
        .arg("--batch")
        .arg("--eval")
        .arg(elisp_script);
    let out = proc.output().await?;
    out.status.exit_ok()?;
    Ok(String::from_utf8(out.stderr)?)
 }
--- a/src/rtrim_iterator.rs
+++ b/src/rtrim_iterator.rs
@@ -0,0 +1,86 @@
 /// Removes 1 character from the end of an iterator if it matches needle
 pub struct RTrimIterator<I> {
    iter: I,
    needle: char,
    buffer: Option<char>,
 }
 impl<I> Iterator for RTrimIterator<I>
 where
    I: Iterator<Item = char>,
 {
    type Item = char;
    fn next(&mut self) -> Option<I::Item> {
        loop {
            match (self.buffer, self.iter.next()) {
                (None, None) => {
                    // We reached the end of the list and have an empty buffer, meaning the string did not end with the needle character.
                    return None;
                }
                (None, Some(chr)) if chr == self.needle => {
                    // We came across an instance of needle, buffer it and loop again because we do not know if this is the end of the string.
                    self.buffer = Some(chr);
                }
                (None, Some(chr)) => {
                    // We have an empty buffer and the next character is not the needle character, return it immediately.
                    return Some(chr);
                }
                (Some(buf), None) if buf == self.needle => {
                    // We reached the end of the list and have the specified needle in the buffer where it will stay forever.
                    return None;
                }
                (Some(_), None) => {
                    // We reached the end of the list and the buffered character is not the needle character, so write it out.
                    return self.buffer.take();
                }
                (Some(_), Some(chr)) => {
                    // We have a buffered character, but it is not the end of the string, so regardless of its contents we can write it out.
                    return self.buffer.replace(chr);
                }
            };
        }
    }
 }
 impl<I> RTrimIterator<I> {
    pub fn new(iter: I, needle: char) -> RTrimIterator<I> {
        RTrimIterator {
            iter,
            needle,
            buffer: None,
        }
    }
 }
 mod tests {
    use super::*;
    #[test]
    fn no_match() {
        let input = "abcd";
        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
        assert_eq!(output, input);
    }
    #[test]
    fn middle_match() {
        let input = "ab\ncd";
        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
        assert_eq!(output, input);
    }
    #[test]
    fn end_match() {
        let input = "abcd\n";
        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
        assert_eq!(output, "abcd");
    }
    #[test]
    fn double_match() {
        let input = "abcd\n\n";
        let output: String = RTrimIterator::new(input.chars(), '\n').collect();
        assert_eq!(output, "abcd\n");
    }
 }
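The trimming matters for the exclusive `end_line` computed in `get_line_numbers`: whether or not a token ends with a line break, the range should stop at the line after the token's last line. The sketch below reproduces that arithmetic with `str::strip_suffix` swapped in for `RTrimIterator`, assuming 1-based character offsets with an exclusive `end`. `line_range` is a hypothetical helper, not code from this change.

```rust
/// Returns (start_line, end_line) for a token, both 1-based, end exclusive.
/// `begin` and `end` are 1-based character offsets; `end` is exclusive.
fn line_range(source: &str, begin: usize, end: usize) -> (usize, usize) {
    // Count the line breaks before the token to find its first line.
    let start_line = source
        .chars()
        .take(begin.saturating_sub(1))
        .filter(|c| *c == '\n')
        .count()
        + 1;
    // Everything up to and including the token.
    let covered: String = source.chars().take(end.saturating_sub(1)).collect();
    // Drop at most one trailing line break so both cases count the same.
    let trimmed = covered.strip_suffix('\n').unwrap_or(covered.as_str());
    // +1 turns a line-break count into a line number, +1 more makes it exclusive.
    let end_line = trimmed.chars().filter(|c| *c == '\n').count() + 2;
    (start_line, end_line)
}
```

With the source `"foo\nbar\nbaz\n"`, the token `bar\n` (offsets 5 to 9) and the token `bar` (offsets 5 to 8) both map to lines `(2, 3)`.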
--- a/static/script.js
+++ b/static/script.js
@@ -21,7 +21,6 @@ function clearOutput() {
 function renderParseResponse(response) {
    clearOutput();
-    console.log(response);
    renderSourceBox(response);
    renderAstTree(response);
 }
@@ -59,7 +58,7 @@ function renderAstNode(originalSource, depth, astNode) {
    const nodeElem = document.createElement("div");
    nodeElem.classList.add("ast_node");
-    let sourceForNode = originalSource.slice(astNode.position.start_character - 1, astNode.position.end_character - 1);
+    let sourceForNode = unicodeAwareSlice(originalSource, astNode.position.start_character - 1, astNode.position.end_character - 1);
    // Since sourceForList is a string, JSON.stringify will escape with backslashes and wrap the text in quotation marks, ensuring that the string ends up on a single line. Coincidentally, this is the behavior we want.
    let escapedSource = JSON.stringify(sourceForNode);
@@ -137,14 +136,14 @@ function highlightLine(htmlName, lineOffset) {
 }
 function highlightCharacters(htmlName, originalSource, startCharacter, endCharacter) {
-    let sourceBefore = originalSource.slice(0, startCharacter - 1);
+    let sourceBefore = unicodeAwareSlice(originalSource, 0, startCharacter - 1);
-    let precedingLineBreak = sourceBefore.lastIndexOf("\n");
+    let precedingLineBreak = unicodeAwareLastIndexOfCharacter(sourceBefore, "\n");
    let characterIndexOnLine = precedingLineBreak !== -1 ? startCharacter - precedingLineBreak - 1 : startCharacter;
    let lineNumber = (sourceBefore.match(/\r?\n/g) || '').length + 1;
    for (let characterIndex = startCharacter; characterIndex < endCharacter; ++characterIndex) {
        document.querySelector(`#${htmlName} > code:nth-child(${lineNumber}) > span:nth-child(${characterIndexOnLine})`)?.classList.add("highlighted");
-        if (originalSource[characterIndex - 1] == "\n") {
+        if (unicodeAwareCharAtOffset(originalSource, characterIndex - 1) == "\n") {
            ++lineNumber;
            characterIndexOnLine = 1;
        } else {
@@ -153,3 +152,43 @@ function highlightCharacters(htmlName, originalSource, startCharacter, endCharac
    }
 }
 function unicodeAwareSlice(text, start, end) {
    // Boooo javascript
    let i = 0;
    let output = "";
    for (chr of text) {
        if (i >= end) {
            break;
        }
        if (i >= start) {
            output += chr;
        }
        ++i;
    }
    return output;
 }
 function unicodeAwareLastIndexOfCharacter(haystack, needle) {
    // Boooo javascript
    let i = 0;
    let found = -1;
    for (chr of haystack) {
        if (chr == needle) {
            found = i;
        }
        ++i;
    }
    return found;
 }
 function unicodeAwareCharAtOffset(text, offset) {
    // Boooo javascript
    let i = offset;
    for (chr of text) {
        if (i == 0) {
            return chr;
        }
        --i;
    }
 }
Author	SHA1	Message	Date
Tom Alexander	3760358783	Compile the elisp ahead of time so it is not done on every docker container launch.	2023-09-20 02:30:58 -04:00
Tom Alexander	024b2ade03	Update org-mode version.	2023-09-16 14:46:02 -04:00
Tom Alexander	55e5c31368	Update org-mode version.	2023-09-08 13:11:28 -04:00
Tom Alexander	4a556bc84f	Use read-only root for docker containers.	2023-08-31 21:21:14 -04:00
Tom Alexander	9bf2a912d6	Enable unicode in the docker container.	2023-08-31 20:23:00 -04:00
Tom Alexander	e8f262727d	Add a script to build and launch the docker container in one step.	2023-08-31 15:18:25 -04:00
Tom Alexander	b4170dda1f	Update org-mode version.	2023-08-25 04:58:45 -04:00
Tom Alexander	bd99fbc4c4	Get the versions of emacs and org-mode and write them to stdout.	2023-08-25 02:30:52 -04:00
Tom Alexander	79c834a1e6	Add the init flag to the docker run command.	2023-08-25 02:10:03 -04:00
Tom Alexander	2505a10275	Parameterize the emacs and org-mode versions in the dockerfile.	2023-08-25 02:02:57 -04:00
Tom Alexander	cfc9153c28	Handle nodes that do not have a contents begin like fixed width areas.	2023-08-20 16:25:57 -04:00
Tom Alexander	13a73efdcf	Handle line numbers properly when selected node does not end in a line break.	2023-08-20 15:55:54 -04:00
Tom Alexander	cba1d1e988	Add notes from plain list ownership investigation.	2023-08-19 01:10:42 -04:00
Tom Alexander	e8a89dfeca	Remove log statement.	2023-08-18 23:33:48 -04:00
Tom Alexander	367dfaa146	Update org-mode version.	2023-08-18 23:32:44 -04:00
Tom Alexander	c4762510f4	Handle unicode. Turns out javascript iterates over strings by character, but all the string functions like slicing, lastIndexOf, and indexing with [] are all based on codepoints without taking into account surrogate pairs like orange heart. It would have been nice if that was mentioned in the documentation...	2023-08-18 23:32:21 -04:00
Tom Alexander	372542d914	Add a print to announce the server is running.	2023-08-18 22:32:01 -04:00
Tom Alexander	0d6621d389	Add docker.	2023-08-18 22:26:42 -04:00
Tom Alexander	e96c39e3e0	Add a README.	2023-08-18 21:35:39 -04:00
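
Commit c4762510f4 works around a mismatch in units: JavaScript's `slice`, `lastIndexOf`, and `[]` indexing operate on UTF-16 code units, while the Rust side of this tool counts positions with `.chars()`. The Rust sketch below illustrates the mismatch and a character-based slice analogous to `unicodeAwareSlice`. `char_slice` and `utf16_len` are hypothetical helpers, not part of this repository.

```rust
/// Slices by characters (Unicode scalar values), like `unicodeAwareSlice`.
/// `start` is inclusive and `end` is exclusive, both 0-based.
fn char_slice(text: &str, start: usize, end: usize) -> String {
    text.chars()
        .skip(start)
        .take(end.saturating_sub(start))
        .collect()
}

/// Length in UTF-16 code units, which is what JavaScript's `length` reports.
fn utf16_len(text: &str) -> usize {
    text.encode_utf16().count()
}
```

For `"a🧡b"`, `text.chars().count()` is 3 but `utf16_len(text)` is 4, because the orange heart needs a surrogate pair. `char_slice(text, 1, 2)` still returns the whole heart.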