Choosing Workflow applications
I stumbled upon a very interesting write-up on the software needed for managing one's projects, written by Kieran Healy of Duke University. The takeaways from the paper are as follows:
- Three principles of workflow:
  - Keep a coherent record of actions. Instead of doing a bit of statistical work and keeping only the resulting table or graphic, write down what you did as a documented piece of code (a toy sketch follows this list). Rather than figuring out but not recording a solution to a problem you might face again, write down the answer as an explicit procedure.
  - A document, file or folder should always be able to tell you what it is. Beyond making your work reproducible, you will also need some method for organizing and documenting your draft papers, code, field notes, datasets, output files, or whatever it is you're working with.
  - The DRY (Don't Repeat Yourself) principle: repetitive and error-prone processes should be automated if possible.
- Learnt about Dropbox. Found it very interesting as it gives an instant backup of everything that you consider important.
- Git can be used for version control.
- Sweave your files so that you always have a record of the code, syntax, libraries and datasets used for a specific research project.
- I guess one can use the ProjectTemplate library to generate a Rails-like structure for statistical projects.
- Create a beamer presentation of any project you do, no matter how trivial it is. The discipline will be useful when you need to produce a non-trivial presentation.
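As a toy illustration of the first principle, recording a step as a documented, re-runnable script might look like this (the file names and variables here are made up):

```r
# Record the analysis step as code instead of keeping only its output.
# "data/heights.csv" and the column name "cm" are hypothetical.
heights <- read.csv("data/heights.csv")
mean_height <- mean(heights$cm, na.rm = TRUE)

# Save the result so the table can always be regenerated from this script.
write.csv(data.frame(mean_height), "output/table1.csv", row.names = FALSE)
```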
After reading this paper, I guess the following can be a rough guideline for a project using R (minimal sketches for several of these steps follow the list):
- Use the ProjectTemplate library to create a directory structure.
- Identify the datasets you want to work with and document them.
- Code your scripts/functions and document them.
- Do your analysis, deciding wisely between R scripts and Rnw files. Maybe do the exploratory trials in .R files and the finalized steps in .Rnw files.
- Create Rnw documents that capture the essentials of your project and use Sweave to convert them into reproducible research documents such as PDFs.
- Use git to do version control of the project.
- Use beamer to produce a sample presentation of the project: its purpose, learnings, applications, etc.
- If need be, pick out the functions and datasets, convert them into an R package, and install it on your machine.
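A minimal sketch of the ProjectTemplate step; the project name below is a placeholder:

```r
# Assumes ProjectTemplate is installed: install.packages("ProjectTemplate")
library(ProjectTemplate)

create.project("my-analysis")  # generates data/, munge/, src/, reports/ etc.
setwd("my-analysis")
load.project()                 # reads config, loads data/, runs munge/ scripts
```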
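For documenting datasets, even recording the source and a structural summary next to the reading code goes a long way; the file here is hypothetical:

```r
# data/sales.csv is an illustrative file; note its source, retrieval date and
# column meanings in comments like these, right next to the code that reads it.
sales <- read.csv("data/sales.csv", stringsAsFactors = FALSE)

str(sales)      # inspect the structure once and record it
summary(sales)  # sanity-check ranges and missing values
```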
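Documenting functions can be as light as a comment header stating purpose, inputs and output; the function below is an invented example:

```r
# winsorize: cap the extreme tails of a numeric vector.
#   x     - numeric vector
#   probs - lower and upper quantiles at which to cap
# Returns a numeric vector of the same length as x.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  stopifnot(is.numeric(x))
  q <- quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

winsorize(c(1, 2, 3, 100))  # the outliers get pulled in to the 5th/95th percentiles
```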
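For the Sweave step, here is a minimal .Rnw file and the commands that weave it into a PDF; report.Rnw is a made-up name, and the last step needs a LaTeX installation:

```
% report.Rnw -- LaTeX text with R chunks delimited by <<>>= ... @
\documentclass{article}
\begin{document}

<<summary>>=
x <- rnorm(100)
summary(x)
@

The sample mean is \Sexpr{round(mean(x), 2)}.

\end{document}
```

```r
Sweave("report.Rnw")           # weaves code and output into report.tex
tools::texi2pdf("report.tex")  # compiles report.tex into report.pdf
```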
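A beamer deck can be woven with Sweave the same way; this is only a skeletal sketch:

```
% slides.Rnw -- a minimal beamer deck, compiled with Sweave as above
\documentclass{beamer}
\begin{document}

\begin{frame}{Key result}
<<density, fig=TRUE, echo=FALSE>>=
plot(density(rnorm(500)), main = "A sample density")
@
\end{frame}

\end{document}
```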
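Finally, a sketch of the packaging step; the package and object names are placeholders:

```r
# Collect the project's functions and datasets from the workspace
# into a package skeleton, then fill in the generated man/ pages.
package.skeleton(name = "myproject", list = c("winsorize", "sales"))

# Then, from the shell:
#   R CMD build myproject
#   R CMD INSTALL myproject_1.0.tar.gz
```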
As one can see, the learning curve is steep here. Each step in the above process takes time and effort to learn and execute, but the effort is worth it, as the projects will stay with you for as long as you code. If you ever want to look back at a project, you can start with the beamer presentation, proceed to the Sweave output, and then to the package developed; it will all make sense very quickly. Doing work without this workflow makes it difficult to revisit your projects and research. I guess it's like the difference between short-term and long-term memory: following this workflow gives you a schema of the project, so it gets stored in your long-term memory, whereas ad hoc work sits in short-term memory and then just vanishes.