A Multi-Language Computing Environment for Literate Programming and Reproducible Research

The paper, A Multi-Language Computing Environment for Literate Programming and Reproducible Research, gives an introduction to Org-mode. To communicate a data analyst’s work to others, it is often important to mix prose and code in the same document. Many tools do this job, but Org-mode is one that serves literate programming as well as reproducible research. Be it a research environment or a pedagogical one, the need for mixing code and prose is always present. The paper presents Org-mode as probably one of the most powerful tools for combining prose with code from many languages.

There are typically two approaches to combining prose and code:

Literate Programming: embed code in an explanatory essay
Reproducible Research: embed code in research reports with the aim of allowing readers to re-run the analyses described

In the case of Literate Programming, there are two types of views into the document: articles of typeset prose with marked-up code blocks intended for human consumption, and computer-readable documents of pure source code. The literate programming terms for generating these views are weaving and tangling, respectively. A common feature of literate programming tools is the ability to organize code blocks differently when tangling and weaving, thereby allowing the programmer to introduce material to humans in a different order than code is introduced to the computer. Here is a timeline of the development of these tools:

Year	Developments
1984	Knuth developed WEB, which consisted of TANGLE and WEAVE and supported Pascal
1994	Knuth and Levy produced a C version, `cweb`
2000	`noweb`, a modern descendant of `cweb`, is language agnostic; its primary programs are `notangle` and `noweave`

The above tools enable authoring of both prose and code but do not provide facilities for the execution of code from within documents.
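The tangling step described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation of any of the tools listed: the sample document `DOC`, the function names `parse_chunks` and `tangle`, and the simplified handling of the noweb-style `<<name>>=` / `@` chunk markers are all assumptions made for the example. It shows how code presented to the reader in one order (main program first, helper later) is emitted to the computer in another.

```python
import re

# Hypothetical noweb-style document: a chunk starts with "<<name>>=" and
# ends with "@". The prose introduces "main" before the helper it uses.
DOC = """\
First we show the main program.
<<main>>=
<<helper>>
print(square(9))
@
Only afterwards do we explain the helper it relies on.
<<helper>>=
def square(x):
    return x * x
@
"""

CHUNK_DEF = re.compile(r"^<<(.+?)>>=$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>$")


def parse_chunks(text):
    """Collect the code lines of every named chunk, ignoring the prose."""
    chunks, current = {}, None
    for line in text.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk references recursively into pure source lines.

    No cycle detection: a chunk that references itself would recurse forever.
    """
    lines = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            lines.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            lines.append(indent + line)
    return lines


source = "\n".join(tangle(parse_chunks(DOC), "main"))
print(source)
```

The tangled output places the definition of `square` before the call that uses it, even though the document explains them in the opposite order.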

In the case of Reproducible Research, the concept of a compendium is followed: the document serves both as a container for the different elements that make up the document and its computations, and as a means for distributing, managing and updating the collection.

What’s the basic difference between Reproducible Research and Literate Programming?

Reproducible research approaches mixed natural and computational language documents from a different direction than literate programming. Rather than adding prose to computational projects, reproducible research seeks to augment publications of scientific research with the computer code used to carry out the research. Whereas literate programming extracts embedded code into an external file used as input to a compiler or an interpreter, code embedded in a reproducible research document is intended to be executed as part of the document generation process. In this way the data, analysis, and figures supporting a publication can be generated from the publication itself.
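To make the contrast concrete, here is a minimal Python sketch of code being executed as part of document generation. It is an illustration under stated assumptions, not how Sweave or Org-mode work internally: the `REPORT` structure, the `weave_and_run` function and the `=>` result marker are all made up for this example.

```python
import contextlib
import io

# Hypothetical report: a sequence of ("prose", text) and ("code", source) parts.
REPORT = [
    ("prose", "We square the sample value."),
    ("code", "x = 9\nprint(x ** 2)"),
    ("prose", "Later blocks can reuse variables from earlier ones."),
    ("code", "print(x + 1)"),
]


def weave_and_run(parts):
    """Render the report, running each code part and inlining its output."""
    namespace = {}  # shared state, so later blocks see earlier variables
    rendered = []
    for kind, text in parts:
        rendered.append(text)
        if kind == "code":
            buffer = io.StringIO()
            # Capture whatever the block prints while it runs.
            with contextlib.redirect_stdout(buffer):
                exec(text, namespace)
            rendered.append("=> " + buffer.getvalue().strip())
    return "\n".join(rendered)


document = weave_and_run(REPORT)
print(document)
```

The printed document contains the results (`=> 81` and `=> 10`) directly beneath the code that produced them, so the numbers in the report cannot drift away from the analysis.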

Year	Developments
2002	`Sweave` was created
2004	Gentleman and Temple Lang proposed the idea of compendium
2007	`SASSweave` was created
2009	`Statweave` was created
2009	`Scribble` was created

Sweave and its descendants do not support code block reorganization during tangling and thus only partially support literate programming. Out of all the tools mentioned, Org-mode is the only one that provides full support for literate programming and reproducible research across a wide range of languages.

Source code and data are located in active blocks, distinct from text sections, where active means that code and data blocks can be evaluated to return their contents or their computational results. The results of code block evaluation can be written to a named data block in the document, where they can be referred to by other code blocks, any one of which can be written in a different computing language. In this way, an Org-mode buffer becomes a place where different computer languages can communicate with one another.
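The idea of named results feeding other blocks can be imitated in plain Python. This is only a sketch of the data flow: the `results` table and the `run_block` helper are hypothetical, and every "block" here is a Python function, whereas in Org-mode each block could be written in a different language and would receive named data through header arguments such as `:var`.

```python
# Named results live in one table. Blocks exchange only plain data
# (numbers, strings, lists of rows), which is what allows blocks written
# in different languages to pass values to one another.
results = {}


def run_block(name, func, **inputs):
    """Run a block, binding its parameters to named results, and store its value."""
    args = {param: results[source] for param, source in inputs.items()}
    results[name] = func(**args)
    return results[name]


# Block 1: produce a small data table (a list of rows).
run_block("raw", lambda: [[1, 2], [3, 4], [5, 6]])

# Block 2: read the table named "raw" through its parameter "data".
run_block("row_sums", lambda data: [sum(row) for row in data], data="raw")

# Block 3: read the result of block 2.
run_block("total", lambda sums: sum(sums), sums="row_sums")

print(results["row_sums"])  # [3, 7, 11]
print(results["total"])     # 21
```

Because each block sees only the named values it asks for, any block could be swapped for one in another language as long as it accepts and returns the same plain data.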

Embedding R snippet

    x <- 9
    x^2

    x <- 9
    print(paste0("the value is ", x))

    [1] "the value is 9"

    library(ascii)
    a <- runif(100)                          # 100 uniform random numbers
    c <- "Quantiles of 100 random numbers"   # table caption
    b <- ascii(quantile(a), header = TRUE, include.colnames = TRUE, caption = c)
    print(b, type = "org")                   # emit the table in Org syntax
    rm(a, b, c)

    #+CAPTION: Quantiles of 100 random numbers
    | 0%   | 25%  | 50%  | 75%  | 100% |
    |------+------+------+------+------|
    | 0.01 | 0.18 | 0.57 | 0.77 | 1.00 |

Embedding Python snippet

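    def test():
        x = 10
        return x ** 2  # ** is exponentiation; ^ is bitwise XOR in Python

    print(test())

    def test():
        x = 10
        return x ** 2

    # A top-level return works only inside an Org-mode code block that
    # returns its value, where the block body is wrapped in a function.
    return test()

    def pascals_triangle(n):
        """Return rows 0..n of Pascal's triangle as a list of lists."""
        if n == 0:
            return [[1]]
        prev_triangle = pascals_triangle(n - 1)
        prev_row = prev_triangle[n - 1]
        # Each entry is the sum of the two entries above it.
        this_row = list(map(sum, zip([0] + prev_row, prev_row + [0])))
        return prev_triangle + [this_row]

    # n is passed to the block from outside; the six rows below correspond to n = 5.
    return pascals_triangle(n)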

1
1	1
1	2	1
1	3	3	1
1	4	6	4	1
1	5	10	10	5	1

Takeaway

The paper discusses examples of embedding snippets from various programming languages and shows the versatility of Org-mode, which can combine prose and code into a single document. It can be used for literate programming as well as reproducible research. Simply amazing and priceless for anyone who uses multiple programming languages.