
This book helps one “get up to speed” with git. With GitHub being the go-to repository host for most of the R projects out there, it has become imperative for any R programmer to have a decent knowledge of git and to know the ways to interact with GitHub. As of today GitHub has 1.1 million programmers hosting close to 3.2 million repositories, and it surpassed SourceForge long ago. So, an R programmer cannot be ignorant of this distributed version control system. Personally, I used to find it a stretch to have a version control system. But once I started using it for a few projects along with the ProjectTemplate directory structure, I found that it helped me carry out analysis in a better way. So, how do you go about using it? Well, there are many plug-ins to get going with git. Before you start using a plug-in, you need to get the vocabulary of git.

There are a lot of terms that might be overwhelming for a newbie to begin with. Here is a sample:
tag, stage, tracked, untracked, branch, working tree, master, head, treeish, SHA-1, caret parent, tilde parent, patch, stash, diff, clone, push, rebase

Even if you use a plug-in like EGit (which I use), you have to understand the basics of git and the terminology that goes with it. Git is supremely elegant to use, but at the same time, understanding the internals takes some time. My objective was mainly to use git effectively, and hence this book was kind of ideal to begin with. The book starts off by giving some basic principles of distributed version control systems (git is an example of a DVCS). One of the main uses of git for me is that I am tired of maintaining v_1, v_2 files, and my working directory has become huge. Git forces me to think in versions and thus relieves me of maintaining the versions manually. Git works with snapshots, and every clone holds the entire repository on your local machine. You no longer check files in and out of a central server. This is a crucial difference between git and other systems. Here’s some history on git. Git was developed by Linus Torvalds after the relationship between the Linux community and BitKeeper, a proprietary DVCS, broke down. The purpose was to create an open source DVCS with the following goals:

  • Simple design

  • Strong support for non-linear development (thousands of parallel branches)

  • Fully distributed

  • Able to handle large projects like the Linux kernel efficiently (speed and data size)

The first version was released in 2005, and in the 7-odd years since then its popularity has grown far and wide.

So, what are the advantages of using git?

  • Git is fast (you can do everything from a simple command line)

  • Easy to learn (I don’t know about this, as it depends on what you want to learn about git!)

  • Git offers a staging area (this is an immensely useful feature if you have gone through other versioning systems where a file is checked in or checked out by a user)

  • GitHub is available for sharing. I guess this is one of the main reasons for its growing popularity.

The book then gives a quick tour of commands for DOS and Linux. It then dives into the bare minimum that you need to know about git, which is the following:

  • git init

  • git add

  • git commit

  • git status

  • git diff : shows the changes in the working directory that have not been staged yet

  • git log

  • git diff SHA-1-a SHA-1-b

  • git describe

  • git tag

  • gitk

  • git branch
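To make the list above a little more concrete, here is a minimal sketch of how these commands might fit together in a throwaway repository. The file name `analysis.R`, the tag `v1` and the user identity are placeholders made up for illustration, not something taken from the book. `gitk` is left out because it opens a graphical window.

```shell
# Work in a temporary directory so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"

git init -q                                   # create an empty repository
git config user.name "Example User"           # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "x <- 1" > analysis.R                    # hypothetical analysis script
git status --short                            # lists analysis.R as untracked
git add analysis.R                            # stage the file
git commit -q -m "First version of analysis"  # record the first snapshot
git tag -a v1 -m "First version"              # attach an annotated tag to it

echo "y <- 2" >> analysis.R
git diff                                      # the edit that is not staged yet
git add analysis.R
git commit -q -m "Add y"

git log --oneline                             # history, one line per commit

# git diff SHA-1-a SHA-1-b: compare two commits by their SHA-1 ids
first="$(git rev-parse "v1^{commit}")"
second="$(git rev-parse HEAD)"
git diff "$first" "$second"

git describe                                  # nearest tag plus distance from it
git branch                                    # list local branches
```

Instead of v_1 and v_2 copies of the file, there is one `analysis.R` and the history keeps both versions.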

The book then moves on to some advanced git commands that will be helpful in big multi-user projects. The following illustration (via Scott Chacon’s Pro Git) gives an idea of the life cycle of a file in git.

[Figure: life cycle of a file in git, via Scott Chacon’s Pro Git]

Each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else - any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because you just checked them out and haven’t edited anything.
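The states described above can be watched with `git status --short`, which prints a two-letter code per file. What follows is only a sketch in a scratch repository; the file name `notes.txt` and the user identity are assumptions for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git status --short        # "?? notes.txt"  -> untracked

git add notes.txt
git status --short        # "A  notes.txt"  -> staged (new file)

git commit -q -m "Add notes"
git status --short        # no output       -> tracked and unmodified

echo "second draft" >> notes.txt
git status --short        # " M notes.txt"  -> tracked and modified

git add notes.txt
git status --short        # "M  notes.txt"  -> modified and staged
```

The left column of the code refers to the staging area and the right column to the working directory, which is why the "M" moves from right to left after `git add`.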

The book shows the various states a file can be in and the appropriate git commands to use for each. One way to explore the above life cycle is to add files in interactive mode. In interactive mode, git shows options like status, update, revert, add untracked, patch, diff, quit and help. Each of these options is useful in a specific scenario depending on the file status.

Some of the commands explained in this section are

  • git diff : shows the changes in the working directory that have not been staged yet

  • git diff --cached : shows the difference between the staged files and the last commit

  • git commit -a -m : the -a option lets you skip the staging area and commits every modified file that is already tracked

  • git reset

  • git stash

  • git push

  • git rebase
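A rough sketch of how some of these commands behave, again in a scratch repository. The file names, the branch name `experiment` and the commit messages are made up for the example, and `git push` is left out because it needs a remote to talk to.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local to this repo
git config user.email "user@example.com"

printf 'line 1\n' > report.txt
git add report.txt
git commit -q -m "Initial report"

printf 'line 2\n' >> report.txt
git diff --stat             # the unstaged edit shows up here
git add report.txt
git diff --stat             # empty now: nothing is left unstaged
git diff --cached --stat    # the staged edit shows up here instead

git reset -q -- report.txt  # unstage it; the edit stays in the working directory
git stash                   # shelve the edit; the working directory is clean again
git status --short          # no output
git stash pop               # bring the edit back

git commit -q -a -m "Add line 2"   # -a stages modified tracked files, then commits

# A small rebase: replay a side branch on top of newer work.
main_branch="$(git symbolic-ref --short HEAD)"   # master or main, depending on configuration
git checkout -q -b experiment
printf 'idea\n' > idea.txt
git add idea.txt
git commit -q -m "Add idea"
git checkout -q "$main_branch"
printf 'line 3\n' >> report.txt
git commit -q -a -m "Add line 3"
git checkout -q experiment
git rebase -q "$main_branch"       # "Add idea" now sits on top of "Add line 3"
git log --oneline
```

Note that `git commit -a` only picks up files git already tracks, so a brand new file such as `idea.txt` still needs an explicit `git add`.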

The book ends with a nice tutorial on a) setting up a repository on GitHub, b) cloning the repository to your local machine and c) using various git commands to sync your working directory with the remote repository.
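The same three steps can be tried offline. In the sketch below a local bare repository stands in for the one hosted on GitHub; that substitution is an assumption of this example, since on GitHub you would create the repository through the website and clone from its URL. The directory names `remote.git`, `local` and `other` are placeholders.

```shell
work="$(mktemp -d)"
cd "$work"

# a) a bare repository plays the role of the hosted remote
git init -q --bare remote.git

# b) clone it to get a local working copy
#    (git warns that the repository is empty, which is expected here)
git clone -q remote.git local
cd local
git config user.name "Example User"        # placeholder identity, local to this clone
git config user.email "user@example.com"

# c) commit locally, then publish the current branch to the remote
echo "# demo" > README.md
git add README.md
git commit -q -m "Add README"
git push -q origin HEAD

# A second clone, as a collaborator would make, sees the pushed commit.
cd "$work"
git clone -q remote.git other
git -C other log --oneline
```

From here on, `git pull` in either clone fetches whatever the other one has pushed.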

Takeaway:

With screencasts and a large font size, this book is a quick read. Perfect for a newbie who wants to understand just enough to start experimenting with git.