Version control

Version control software tracks modifications to a collection of files over time. The goal is not only to determine exactly who made which modifications, but also to have access to the full history of the project at any time.

This page is not meant to be a tutorial on using any particular version control system; there are excellent tutorials available online, along with your system's man pages, to consult.

There are many version control systems (VCS) available today, and they can largely be categorized as either centralized or distributed version control systems. While we will focus our attention on one particular distributed version control system (DVCS) – Git – we will discuss several here that enjoy common use.

General terminology

There are several basic operations that are common to most version control systems. Note: I use Git almost exclusively, and so there may be a bias towards Git terminology.

Repository

A collection of files under version control.

Cloning

Making a (local) copy of a (remote) repository.

Checkout

Moving your repository contents to a point in the project history.

Commit (noun)

A pointer to a checkpoint in the revision history.

Commit (verb)

Creating a checkpoint in the revision history. If you have created a set of changes to a collection of files and want to mark this work in the history, you commit your changes.

Branch (noun)

A line of development within the repository. There can be many branches in a single repository. For example, a new feature might be developed on a feature branch, while a stable copy of the working project might live on the master branch.

Branch (verb)

Creating a new branch in the repository.

Pulling

Fetching changes from a remote repository and merging them into your local repository. This action should be performed whenever you want to get changes from someone else.

Pushing

Publishing changes from your local repository to a remote repository.

Merging

Bringing changes from one repository or branch into a local branch.

Version control as a graph

The history in a version control system has a very nice representation as a directed acyclic graph. In this graph each node is a commit representing the full state of the repository at some point in the history. There is a root node representing the start of the project. Every other node has one or more outgoing arcs, one to each of its parents; a merge commit is a node with more than one parent. An arc between two nodes indicates that the head of the arc is a parent of the tail of the arc. In this model a branch is simply a selected node along with all nodes reachable from this node back to the root.
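
As a rough sketch of this model, the history can be stored as a mapping from each commit to its parents, and a branch recovered by walking the parent arcs. The commit names and the ancestors helper below are made up for illustration; this is not how Git stores its history internally.

```python
# Toy revision history as a directed acyclic graph.
# Each key is a commit; its value lists that commit's parents.
# All commit names here are hypothetical.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["a"],        # continues the main line of development
    "c": ["a"],        # starts a side line of development
    "m": ["b", "c"],   # a merge commit has two parents
}


def ancestors(tip, parents):
    """Return the tip plus every commit reachable by following parent arcs."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


# A branch is a selected node together with everything reachable from it.
branches = {"master": "m", "feature": "c"}
master_history = ancestors(branches["master"], parents)
feature_history = ancestors(branches["feature"], parents)

print(sorted(master_history))
print(sorted(feature_history))
```

Note that the two branches share the commits root, a and c; a branch is a view of the graph, not a separate copy of it.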

In Git, you can see your revision history as a graph with the command git log --graph. At the time of this writing, the command git log --graph --oneline produces the following output:

* bc630a3 Minor language changes.
*   50787e8 Merge branch 'master' of ssh://bitbucket.org/prsteele/orie-6125-sp2016
|\
* | 291163d Re-ordered the syllabus.
|/
*   cf0424e Merge branch 'vc'
|\
| * d749d42 Updated the commit hash.
| * 889c83c Updated the version control writeup.
* | 582b13f Improved the IEE 754 section.
|/
* 8b4b680 Added a commit reference.
* d2bd989 Added venv to the gitignore.
* 0d4bea5 Working on the version control section.
* a555334 Working on IEEE754.
* a2f463f Splitting off architecture and IEEE 754.
* 339a649 Improved the architecture writeup.
* 57450c5 Working on arch.
* ce83730 Removed the remainder of the old build-systems.
* c967e14 Added links to the syllabus.
* f025200 Added a bit on benchmarks.
* a3e8952 Progress on build systems.
* be0904e Working on build systems.
* ab6accf Working on getting a skeleton up there.
* 941dc18 Testing seems to be somewhat finished.
* 07c1979 Added a bit on hypothesis testing.
* 3f79e77 Progress on the testing writeup.
* 495f803 Cleaned up the testing section.
* 26e45a5 Moving to Sphinx.
*   3e5018e Merge remote-tracking branch 'origin/master'
|\
| *   7ddeee9 Merge branch 'build-systems'
| |\
| * | a884188 Initial commit.
|  /
* | 9060111 Progress on the syllabus.
* | 774846e Messy.
* | ac05c19 Working on the testing writeup.
* | 80ea2d0 Minor fixes.
|/
* 04ded6c Reasonable first pass.
* 095e8c9 Fixed a silly bug in main.c.
* 0872b41 Added an ignore for SCons.
* cbff3d0 Working on the writeup.
* c7b19e7 Working on a lecture for build systems.

As you can see, the history is mostly linear except for some branching and merging. (If you are interested, the commit at the top of this history has the hash bc630a313a193b045eeb98c0fad620d894db02ef; you can see the first 7 characters of this hash in the first line of the log.)

Centralized version control systems

A centralized version control system has a notion of a privileged copy of the repository. Each person working on the project can check out copies of files from the privileged repository, make changes, and commit those changes back to the repository. Anyone checking out a copy of the modified files later will receive the updates.

There is an obvious problem here, which is what happens when two or more people try to modify the same file. As an example, Alice and Bob both check out the README file for a project they are working on at roughly the same time. Alice adds her name to the list of authors, and commits this change. This commit succeeds, because no one has modified the file since she checked out her copy. Bob also adds his name to the authors list, but when he tries to commit he will get an error, since he has not yet merged changes from the privileged repository made by Alice. Bob will need to somehow merge these changes.
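
The Alice and Bob scenario can be sketched with a toy privileged repository that remembers a revision number for each file and rejects commits based on a stale revision. The CentralRepository class and OutOfDateError exception are hypothetical names for illustration; real systems such as SVN are considerably more involved.

```python
class OutOfDateError(Exception):
    """Raised when a commit is based on a stale revision of a file."""


class CentralRepository:
    """Hypothetical privileged repository storing (revision, contents) per file."""

    def __init__(self):
        self.files = {}

    def add(self, path, contents):
        self.files[path] = (1, contents)

    def checkout(self, path):
        # Returns the current revision number together with the contents.
        return self.files[path]

    def commit(self, path, base_revision, contents):
        current_revision, _ = self.files[path]
        if base_revision != current_revision:
            raise OutOfDateError(f"{path} has changed since revision {base_revision}")
        self.files[path] = (current_revision + 1, contents)


repo = CentralRepository()
repo.add("README", ["Authors:"])

# Alice and Bob check out the same revision at roughly the same time.
alice_rev, alice_copy = repo.checkout("README")
bob_rev, bob_copy = repo.checkout("README")

# Alice's commit succeeds: nobody has changed the file since her checkout.
repo.commit("README", alice_rev, alice_copy + ["Alice"])

# Bob's commit is rejected because his copy is based on a stale revision.
try:
    repo.commit("README", bob_rev, bob_copy + ["Bob"])
    bob_rejected = False
except OutOfDateError:
    bob_rejected = True

# Bob merges by checking out the latest revision and reapplying his change.
bob_rev, bob_copy = repo.checkout("README")
repo.commit("README", bob_rev, bob_copy + ["Bob"])

print(repo.checkout("README"))
```

Here Bob's merge is trivial because the two changes do not interfere; in general, merging may require resolving conflicting edits by hand.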

One way that a centralized version control system can prevent this situation is file locking. When you want to make a change to a file, you check out a copy of this file and lock it; this only succeeds if the file is currently unlocked. The privileged repository will no longer allow anyone else to commit to the locked file until the first user commits her changes and relinquishes the lock. This can make working in large teams difficult, since only one person can work on a given file at a time.
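
A minimal sketch of file locking, assuming a hypothetical LockingRepository that only tracks who holds the lock on each file (file contents are omitted to keep the example short):

```python
class LockingRepository:
    """Hypothetical privileged repository that tracks one lock per file."""

    def __init__(self):
        self.locks = {}  # path -> name of the user holding the lock

    def lock(self, path, user):
        # Locking only succeeds if the file is currently unlocked
        # (or already locked by the same user).
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return False
        self.locks[path] = user
        return True

    def commit(self, path, user):
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the lock on {path}")
        # Committing relinquishes the lock.
        del self.locks[path]


repo = LockingRepository()

alice_locked = repo.lock("README", "alice")       # succeeds: file was unlocked
bob_locked = repo.lock("README", "bob")           # fails: Alice holds the lock

repo.commit("README", "alice")                    # Alice commits and unlocks
bob_locked_later = repo.lock("README", "bob")     # now Bob can take the lock

print(alice_locked, bob_locked, bob_locked_later)
```

Bob cannot begin his change until Alice is finished, which is exactly the cost of this approach.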

One advantage of centralized version control systems is that for large projects each team member need not have all the files on their system at once; whenever a file is needed it can be requested on-the-fly.

Notable centralized version control systems are CVS, SVN and Perforce. SVN is intended to replace CVS, and largely has; both are free, open-source software (CVS is under the GPL, while SVN is under the permissive Apache License). Perforce is a proprietary system.

There is a free book discussing how to utilize SVN.

Distributed version control systems

Unlike centralized version control systems, distributed version control systems have no notion of a privileged repository. Rather, each team member maintains a private copy of the full repository, and makes any changes they desire locally. When they are ready they can push changes to, or pull changes from, peer repositories. Note that it is still possible to have a centrally networked repository; however, this repository has no special status compared to any personal repository.

Advantages of distributed version control systems include speed and reliability. Since all changes are being performed locally, there is no network access required to commit changes (unless you are publishing said changes to a remote repository). Since each team member has a full copy of the repository, there is also a small amount of protection against data loss.
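
To make the local nature of commits concrete, here is a toy model in which every repository holds a full copy of the history. The Repository class and its clone, commit and push methods are hypothetical and only loosely mirror what Git or Mercurial do; in particular, history is simplified to a linear list of commit messages.

```python
class Repository:
    """Hypothetical repository holding a full copy of a linear history."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def clone(self):
        # A clone receives the entire history, not just the latest files.
        return Repository(self.commits)

    def commit(self, message):
        # Purely local: no other repository is contacted.
        self.commits.append(message)

    def push(self, remote):
        # Only allow the push if our history extends the remote's history.
        if remote.commits != self.commits[:len(remote.commits)]:
            raise RuntimeError("remote has changes you have not pulled")
        remote.commits = list(self.commits)


shared = Repository(["Initial commit."])
alice = shared.clone()

alice.commit("Added a README.")
shared_before_push = list(shared.commits)   # the shared copy is untouched

alice.push(shared)
shared_after_push = list(shared.commits)    # now it has Alice's commit

# If the shared repository were lost, any full clone could replace it.
restored = alice.clone()

print(shared_before_push)
print(shared_after_push)
```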

The most well-known distributed version control systems are Git and Mercurial. To first order, these systems are comparable, and choosing which to use might come down to personal preference (there are even extensions allowing Mercurial and Git repositories to interact).

There are many tutorials for Git online, but the standard reference materials are quite good. The man pages can be a bit intimidating if you don’t already know what you are doing, but are very useful if you are trying to remember an infrequently used command or flag.

Workflows

There are many ways to use version control effectively. You can choose any one you like, but I would suggest the “feature branch workflow” as it works nicely for collaboration in small groups.