some thoughts on referencing data from R

2024-11-26 07:33:39 +00:00 · 2009-02-12 11:07:25 -05:00
commit a81fe7e15b
parent 1dd3e1c330
2 changed files with 79 additions and 144 deletions
--- a/nogit-rorg-dan.org
+++ b/nogit-rorg-dan.org
@@ -1,113 +0,0 @@
-#+TITLE: rorg --- R and org-mode
-
-* Objectives
-** Send data to R from org
-   Org-mode includes orgtbl-mode, an extremely convenient way of using
-   tabular data in a plain text file.  Currently, spreadsheet
-   functionality is available in org tables using the emacs package
-   calc.  It would be a boon both to org users and R users to allow
-   org tables to be manipulated with the R programming language.  Org
-   tables give R users an easy way to enter and display data; R gives
-   org users a powerful way to perform vector operations, statistical
-   tests, and visualization on their tables.
-
-*** Implementations
-**** naive
-     Naive implementation would be to use =(org-export-table "tmp.csv")=
-     and =(ess-execute "read.csv('tmp.csv')")=.  
-**** org-R
-     org-R passes data to R from two sources: org tables, or csv
-     files. Org tables are first exported to a temporary csv file
-     using [[file:existing_tools/org-R.el::defun%20org%20R%20export%20to%20csv%20csv%20file%20options][org-R-export-to-csv]].
-**** org-exp-blocks
-**** RweaveOrg
-     NA
-
-** Evaluate R code in org and deal with output appropriately
-*** vector output
-    When R code evaluation generates vectors and 2-dimensional arrays,
-    this should be formatted appropriately in org buffers
-    (orgtbl-mode) as well as in export targets (html, latex). Values
-    assigned to in the global environment should be available to
-    blocks of R code elsewhere in the org buffer.
-**** Implementations
-***** org-R
-     org-R converts R output (vectors, or matrices / 2d-arrays) to an
-     org table and stores it in the org buffer, or in a separate org
-     file (csv output would also be perfectly possible).
-***** org-exp-blocks
-***** RweaveOrg
-*** graphical output
-    R can generate graphical output on a screen graphics device
-    (e.g. X11, quartz), and in various standard image file formats
-    (png, jpg, ps, pdf, etc). When graphical output is generated by
-    evaluation of R code in Org, at least the following two things are desirable:
-    1. output to screen for immediate viewing is possible
-    2. graphical output to file is linked to appropriately from the
-     org file This should have the automatic consequence that it is
-     included appropriately in subsequent export targets (html,
-     latex).
-**** Implementations
-***** org-R
-      org-R does (1) if no output file is specified and (2) otherwise
-***** org-exp-blocks
-***** RweaveOrg
-
-
-* Notes
-** Special editing and evaluation of source code in R blocks
-   Unfortunately org-mode how two different block types, both useful.
-   In developing RweaveOrg, a third was introduced.
-
-   Eric is leaning towards using the =#+begin_src= blocks, as that is
-   really what these blocks contain: source code.  Austin believes
-   that specifying export options at the beginning of a block is
-   useful functionality, to be preserved if possible.
-
-   Note that upper and lower case are not relevant in block headings.
-
-*** Source code blocks 
-    Org has an extremely useful method of editing source code and
-    examples in their native modes.  In the case of R code, we want to
-    be able to use the full functionality of ESS mode, including
-    interactive evaluation of code.
-
-    Source code blocks look like the following and allow for the
-    special editing of code inside of the block through
-    `org-edit-special'.
-
-#+BEGIN_SRC r
-
-,## hit C-c ' within this block to enter a temporary buffer in r-mode.
-
-,## while in the temporary buffer, hit C-c C-c on this comment to
-,## evaluate this block
-a <- 3
-a
-
-,## hit C-c ' to exit the temporary buffer
-#+END_SRC     
-
-*** dblocks
-    dblocks are useful because org-mode will automatically call
-    `org-dblock-write:dblock-type' where dblock-type is the string
-    following the =#+BEGIN:= portion of the line.
-
-    dblocks look like the following and allow for evaluation of the
-    code inside of the block by calling =\C-c\C-c= on the header of
-    the block.  
-
-#+BEGIN: dblock-type
-#+END:
-
-*** R blocks
-    In developing RweaveOrg, Austin created [[file:existing_tools/RweaveOrg/org-sweave.el][org-sweave.el]].  This
-    allows for the kind of blocks shown in [[file:existing_tools/RweaveOrg/testing.Rorg][testing.Rorg]].  These blocks
-    have the advantage of accepting options to the Sweave preprocessor
-    following the #+BEGIN_R declaration.
-
-
-* tasks
-
-* buffer dictionary
- LocalWords:  DBlocks dblocks
--- a/rorg.org
+++ b/rorg.org
@@ -190,20 +190,20 @@ Are there side-effects which need to be considered aside from those
 internal to the source-code evaluation process?

 ** reference to data and evaluation results
-I think this will be very important.  I would suggest that since we
-are using lisp we use lists as our medium of exchange.  Then all we
-need are functions going converting all of our target formats to and
-from lists.  These functions are already provided by for org tables.
+   I think this will be very important.  I would suggest that since we
+   are using lisp we use lists as our medium of exchange.  Then all we
+   need are functions converting all of our target formats to and
+   from lists.  These functions are already provided for org tables.

-It would be a boon both to org users and R users to allow org tables
-to be manipulated with the R programming language.  Org tables give R
-users an easy way to enter and display data; R gives org users a
-powerful way to perform vector operations, statistical tests, and
-visualization on their tables.
+   It would be a boon both to org users and R users to allow org tables
+   to be manipulated with the R programming language.  Org tables give R
+   users an easy way to enter and display data; R gives org users a
+   powerful way to perform vector operations, statistical tests, and
+   visualization on their tables.

-This means that we will need to consider unique id's for source
-blocks, as well as for org tables, and for any other data source or
-target.
+   This means that we will need to consider unique IDs for source
+   blocks, as well as for org tables, and for any other data source or
+   target.

 *** Implementations
 **** naive
@@ -214,25 +214,71 @@ target.
     files. Org tables are first exported to a temporary csv file
     using [[file:existing_tools/org-R.el::defun%20org%20R%20export%20to%20csv%20csv%20file%20options][org-R-export-to-csv]].
 **** org-exp-blocks
-org-exp-blocks uses [[org-interblock-R-command-to-string]] to send
-commands to an R process running in a comint buffer through ESS.
-org-exp-blocks has no support for dumping table data to R process, or
-vice versa.
+     org-exp-blocks uses [[org-interblock-R-command-to-string]] to send
+     commands to an R process running in a comint buffer through ESS.
+     org-exp-blocks has no support for dumping table data to an R process, or
+     vice versa.

 **** RweaveOrg
     NA

 *** reference format
-This will be tricky, Dan has already come up with a solution for R, I
-need to look more closely at that and we should try to come up with a
-formats for referencing data from source-code in such a way that it
-will be as source-code-language independent as possible.
+    This will be tricky.  Dan has already come up with a solution for R; I
+    need to look more closely at that, and we should try to come up with a
+    format for referencing data from source code in such a way that it
+    will be as source-code-language independent as possible.
+
+**** Dan: thinking aloud re: referencing data from R
+     Suppose in some R code, we want to reference data in an org
+     table. I think that requires the use of 'header arguments', since
+     otherwise, under pure evaluation of a code block without header
+     args, R has no way to locate the data in the org buffer. So that
+     suggests a mechanism like that used by org-R whereby table names
+     or unique entry IDs are used to reference org tables (and indeed
+     potentially row/column ranges within org tables, although that
+     subsetting could also be done in R).
+
+     Specifically what org-R does is write the table to a temp csv
+     file, and tell R the name of that file. However:
+
+     1. We are not limited to a single source of input; the same sort
+        of thing could be done for several sources of input
+
+     2. I don't think we even have to use temp files. An alternative
+        would be to have org pass the table contents as a csv-format
+        string to textConnection() in R, thus creating an arbitrary
+        number of input objects in the appropriate R environment
+        (scope) from which the R code can read data when necessary.
+
+	That suggests a header option syntax something like
+    
+#+begin_src emacs-lisp
+'(:R-obj-name-1 tbl-name-or-id-1 :R-obj-name-2 tbl-name-or-id-2)
+#+end_src
+
+As a result of passing that option, the code would be able to access
+the data referenced by tbl-name-or-id-1 via read.table(R-obj-name-1).
+
+An extension of that idea would be to allow remote files to be used as
+data sources. In this case one might need just the remote file (if
+it's a csv file), or if it's an org file then the name of the file
+plus a table reference within that org file. Thus maybe something like
+
+#+begin_src emacs-lisp
+'((R-obj-name-1 . (:tblref tbl-name-or-id-1 :file file-1))
+  (R-obj-name-2 . (:tblref tbl-name-or-id-2 :file file-2)))
+#+end_src
+

 *** source-target pairs

-The following can be used for special considerations based on
-source-target pairs
+    The following can be used for special considerations based on
+    source-target pairs

+    Dan: I don't quite understand this subtree; Eric -- could you give
+    a little more explanation of this and of your comment above
+    regarding using [[lists as our medium of exchange]]?
+    
 **** source block output from org tables
 **** source block output from other source block
 **** source block output from org list
@@ -240,15 +286,16 @@ source-target pairs
 **** org table from org table
 **** org properties from source block
 **** org properties from org table
-
+     
+     
 ** export
-once the previous objectives are met export should be fairly simple.
-Basically it will consist of triggering the evaluation of source code
-blocks with the org-export-preprocess-hook.
+   Once the previous objectives are met, export should be fairly simple.
+   Basically it will consist of triggering the evaluation of source code
+   blocks with the org-export-preprocess-hook.

-This block export evaluation will be aware of the target format
-through the htmlp and latexp variables, and can then create quoted
-=#+begin_html= and =#+begin_latex= blocks appropriately.
+   This block export evaluation will be aware of the target format
+   through the htmlp and latexp variables, and can then create quoted
+   =#+begin_html= and =#+begin_latex= blocks appropriately.


 * Notes
@@ -395,7 +442,7 @@ a
     following the #+BEGIN_R declaration.

 *** block headers/parameters
-regardless of the syntax/format chosen for the source blocks, we will
+Regardless of the syntax/format chosen for the source blocks, we will
 need to be able to pass a list of parameters to these blocks.  These
 should include (but should certainly not be limited to)
 - label or id :: Label of the block, should we provide facilities for
@@ -488,6 +535,7 @@ through the process of fleshing out objectives, and cashing those
 objectives out into tasks.  That said, please feel free to make any
 changes that you see fit.

-
+** Dan <2009-02-12 Thu 10:23>
+   Good job, Eric, with the major work on this file.
 * Buffer Dictionary
 LocalWords:  DBlocks dblocks