Git or how to keep track of your changes#

In this section we will discuss version control of your projects. We will focus on git.

Learning outcomes#

You will learn about:

  • what is version control and

  • why you should use it

  • the most common version control systems

  • why to use git

  • a basic git workflow including

    • creating a repository

    • adding files

    • committing changes

    • branching

    • merging

  • hosting repositories

Resources and further reading#

There are many resources about version control in general and git in particular online. Some good resources are

What is version control#

Version control or revision control or source control management (SCM) in the most general sense is a system of managing changes to documents, files or any kind of information on a computer. In the context of software engineering and programming Version Control Systems (VCS) manage revisions of source code. While SCM can be part of other software (e.g. track changes in Word), for revision control of source code VCS are typically standalone programs.

Some of the most common ones are:

  • CVS (once very common but now largely obsolete)

  • Subversion (spiritual successor of CVS)

  • Mercurial

  • Git

These systems can be roughly divided into two classes:

Centralized#

All changes are tracked in a central repository on a central server

Distributed#

Changes are tracked within local repositories

Functionality#

Good version control offers a way to

  • keep snapshots of the project

  • document and track changes

  • branch projects, so that

    • different features can be worked on separately but at the same time

    • different people can work on the same code without interfering with each other

    • experiments are easy to make and easy to undo

  • merge changes from different branches

Why use version control#

Even if you do not use VCS software you are likely doing some sort of manual version control. The directory of a project which has been manually managed often looks like this:

 mpyproject_final_final.py
 myproject_1.py
 myproject_2.1.py
 myproject_2.py
 myproject_3.1.py
 myproject_3.2.py
 myproject_3.py
 myproject_final_corrected_1.py
 myproject_final_corrected.py
 myproject_final.py

What is the problem with this?#

  • it is difficult to know which version one should use

  • what is the difference between the versions?

  • what if I realise I need to go back to an older version but still want to apply some of the changes from newer versions?

  • how can I work on several things at the same time?

This becomes much worse if several people work on the same project.

Why git?#

Why should you use git? It is:

  • Easy to set up - use even by yourself with no server needed.

  • Very popular: chances are high you will need to contribute to somebody else’s code which is tracked with Git.

  • Distributed: good backup, no single point of failure, you can track and clean-up changes offline, simplifies collaboration model for open-source projects.

  • Important platforms such as GitHub, GitLab, and Bitbucket build on top of Git.

  • Many platforms build on top of GitHub.

  • Sharing software and data is becoming popular and is increasingly required in a research context, and GitHub is a popular platform for sharing software.

However, “Git is a four-handle, dual boiler espresso machine, not instant coffee.” [citation needed]. Git isn’t the most user-friendly tool and has its design quirks, but its underlying design is great, and it is definitely the most popular system and the one you are most likely to need to know. So we teach it.

Installing git#

Git is available for Linux, Windows and OSX.

Linux#

Git is available in the repositories of all major distributions, so you can generally install using your package manager, e.g. apt install git.

Windows#

Installers for the official Windows releases can be found at https://git-scm.com/. Note that there is also the Git for Windows project, which is essentially a fork of the official git version with a focus on providing a native Windows toolset.

OSX#

On OSX git can be installed with the official packages from https://git-scm.com/ or using homebrew.

GUIs#

A large number of graphical user interfaces exist for all operating systems; a list can be found at https://git-scm.com/downloads/guis. Git is also integrated into the workflow of many IDEs including PyCharm, Visual Studio Code and Eclipse.
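Whichever installation route you choose, a quick way to check that git is available on your command line is to ask for its version. This is only a minimal check, and the exact version string will differ between systems.

```shell
# Print the installed git version, e.g. "git version 2.x.y"
git --version
```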

Using git#

The central element where git stores all information is the repository. A local git repository is simply a working directory with an additional .git folder. In this folder git stores a snapshot of the work contained in the working directory every time you commit a change. Note that git stores files that did not change only once and compresses its data, so the snapshots are very space-efficient.

Create a repository#

To create a repository use the git init command, which creates an empty repository. Alternatively you can “clone” another repository, i.e. create a working copy of it, using git clone [location], where location can be a local path or a remote URL. For example, to clone the shortcourse notebooks you would use git clone https://gitlab.com/python4photonics/ofcshortcourse.git
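As a minimal sketch of both commands, the following creates an empty repository and then clones it from a local path. The directory names myproject and mycopy are made up for illustration, and everything happens in a temporary directory so nothing else on your system is touched.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"

# Create a new, empty repository in the (hypothetical) folder "myproject"
git init myproject

# The repository data lives in the hidden .git folder
ls -a myproject

# Cloning also works with a local path: create a working copy called "mycopy"
# (git warns that the cloned repository is empty, which is expected here)
git clone myproject mycopy
```

Cloning from a remote URL works the same way, with the URL in place of the local path.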

Basic git workflow#

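The local repository contains three “trees” that git manages to keep track of your changes: the working directory, the index (or stage) and the HEAD, which points to the last commit you have made.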

[Figure: the three git trees, from https://rogerdudler.github.io/git-guide/]

Working directory#

The working directory holds your actual files that you edit using your favourite editor.

Index/Stage#

The index or stage is an area where you add or stage changes before committing them to the repository. This is useful to group changes in several files that belong together in a single commit.
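The idea of grouping related changes can be sketched as follows. The file names and the identity are placeholders chosen for this illustration. Two files that belong together are staged and committed as one unit, while an unrelated scratch file stays in the working directory only.

```shell
# Throwaway repository for the illustration
repo=$(mktemp -d)
cd "$repo"
git init
# Identity used for commits in this repository only (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"

# Three new files: two belong together, one is unrelated
echo "x = 1" > model.py
echo "import model" > run_model.py
echo "scratch notes" > notes.txt

# Stage only the two files that belong together
git add model.py run_model.py

# Short status: "A " marks staged new files, "??" marks untracked files
git status --short

# Commit what is staged; notes.txt is not part of this commit
git commit -m "Add model and script to run it"
```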

Example#

Let’s work on an example, if you have git installed you can follow along.
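One possible session is sketched here. It is an illustration rather than the original course example: the file name myproject.py, its contents and the identity are assumptions, and the repository is created in a temporary directory.

```shell
# Throwaway repository for the example
repo=$(mktemp -d)
cd "$repo"
git init
# Identity used for commits in this repository only (placeholder values)
git config user.name "Your Name"
git config user.email "you@example.com"

# 1. Create a file, stage it and record it in a first commit
echo "x = 1" > myproject.py
git add myproject.py
git commit -m "Add first version of myproject"

# 2. Edit the file, then inspect what has changed
echo "y = 2" >> myproject.py
git status   # lists myproject.py as modified
git diff     # shows the added line

# 3. Stage the change and commit it
git add myproject.py
git commit -m "Define y in myproject"

# 4. Look at the history, one line per commit
git log --oneline
```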

Send your changes to a remote repository#

While your changes are now committed to the HEAD of your local repository, it is generally a good idea to keep a remote copy of your changes, for example on GitHub. If you cloned your repository from a remote, you can send your changes using the push command git push origin master. If you have not cloned from an existing repository but want to connect your repository to a remote server, you first need to connect the two with git remote add origin <server url> before you push your changes.
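To try this without an account on a hosting service, the sketch below uses a local bare repository as a stand-in for the server. That stand-in, the directory names and the identity are assumptions for this illustration. With a real service you would pass its URL to git remote add instead. The branch name is looked up rather than hard-coded, because depending on your git configuration the default branch may be called master or main.

```shell
base=$(mktemp -d)
cd "$base"

# Stand-in for a server: a bare repository (history only, no working directory)
git init --bare remote.git

# A local repository with one commit
git init local
cd local
git config user.name "Your Name"
git config user.email "you@example.com"
echo "x = 1" > myproject.py
git add myproject.py
git commit -m "Add first version of myproject"

# Connect the local repository to the "server" under the name origin
git remote add origin "$base/remote.git"

# Push the current branch to origin
branch=$(git symbolic-ref --short HEAD)
git push origin "$branch"
```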

Branching#

The real power of version control is the ability to work with branches. Branches allow you to develop different features simultaneously without changing the working state of the main part of the repository, i.e. to isolate different tracks of work. Because git is so powerful, there are many different workflows around branching, and a whole course could be taught on this topic. Here we will cover only the basics, which are sufficient for many users.

The basic concept behind branching is illustrated below: [Figure: octopus]

Branching enables us to:

  • keep one working branch (by convention this branch is often called the “master” branch)

  • work on a new feature, often several, that might remain unfinished

  • separate the work on different lines well

The point where lines separate is called branching; the point where they reconnect is called merging.

To create a new branch#

New branches can be created with git branch [branchname]. However, after that command we are still on the starting branch. To switch to the new branch use git checkout [branchname]. There is a shortcut to create and switch to a new branch in one step: git checkout -b [branchname].
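The three commands can be tried in a throwaway repository, as sketched here. The branch names feature-a and feature-b, the file name and the identity are made up for this illustration.

```shell
# Throwaway repository with one commit (a branch needs a commit to start from)
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"
git config user.email "you@example.com"
echo "x = 1" > myproject.py
git add myproject.py
git commit -m "Add first version of myproject"

# Create a branch; we are still on the starting branch afterwards
git branch feature-a
git branch               # lists all branches, the current one is marked with *

# Switch to the new branch
git checkout feature-a

# Create another branch and switch to it in one step
git checkout -b feature-b
```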

Merging branches#

To merge a branch back into the current branch use git merge [branchname]. Assuming your current branch is master, this would merge the changes from branchname into master. If there have been changes that conflict with each other, i.e. changes to the same lines in the same file on both branches, then the conflicts need to be resolved manually. However, we will not cover that here.

Example#

Let’s do another example.
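A possible session is sketched here. It is an illustration, not the original course example: the branch name new-feature, the file names and the identity are assumptions. Work happens on a feature branch and on the starting branch in parallel, in different files so that no conflict arises, and the feature branch is then merged back. The --no-edit option accepts git's default merge message instead of opening an editor.

```shell
# Throwaway repository with a first commit
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"
git config user.email "you@example.com"
echo "x = 1" > myproject.py
git add myproject.py
git commit -m "Add first version of myproject"

# Remember the name of the starting branch ("master" or "main")
start=$(git symbolic-ref --short HEAD)

# Work on a new feature in its own branch
git checkout -b new-feature
echo "y = 2" >> myproject.py
git add myproject.py
git commit -m "Define y in myproject"

# Meanwhile, a different change is made on the starting branch
git checkout "$start"
echo "Notes on myproject" > README.txt
git add README.txt
git commit -m "Add README"

# Merge the feature branch into the starting branch
git merge --no-edit new-feature

# Show the history as a graph: the two lines of work separate and rejoin
git log --oneline --graph
```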

Viewing changes#

To view a log of changes you can use the git log command. An example output for the QAMpy project is shown below.

commit 39261102c5db3ecc25b4227d528dc59b275b0c39 (HEAD -> master, origin/master, origin/HEAD)
Merge: 72ca866 783d5c1
Author: Jochen Schroeder <jochen.schroeder@gmail.com>
Date:   Tue Jan 21 22:57:27 2020 +0100

    Merge remote-tracking branch 'origin/master'

commit 783d5c1b7347eb9d104b3e412c118e59656c1aba
Author: Jochen Schroeder <jochen.schroeder@gmail.com>
Date:   Mon Jan 20 23:36:41 2020 +0100

    Add TODO item on the selected modes

commit 0e4af633e4381ecbd7a1e69c17f2e540df3a189b
Author: Jochen Schroeder <jochen.schroeder@gmail.com>
Date:   Mon Jan 20 23:35:37 2020 +0100

    Return only the selected modes from equalise_signal
    
    When selected modes are given by equalise signal only return those when applying the filter. Current implementation is suboptimal, because the filter is still applied to all modes.

The number after commit is the hash, which identifies a specific commit. You can also see the commit messages, which describe the changes being made.

The convention for commit messages is a short single-line sentence written in the imperative, describing what the commit does. If necessary, this is followed by a longer description of the changes, such as why the commit was made …
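One way to write such a message directly on the command line is to pass -m twice: git joins the values as separate paragraphs, so the first becomes the one-line summary and the second the longer description. The file name, message text and identity in this sketch are made up for illustration.

```shell
# Throwaway repository for the illustration
repo=$(mktemp -d)
cd "$repo"
git init
git config user.name "Your Name"
git config user.email "you@example.com"

echo "x = 1" > myproject.py
git add myproject.py

# First -m: short imperative summary; second -m: longer explanation
git commit -m "Add first version of myproject" \
           -m "Start with a single variable so that later commits have something to build on."

# Full log shows summary and description, --oneline only the summary
git log
git log --oneline
```

If you run git commit without -m, git opens your editor instead, which is usually more comfortable for longer descriptions.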

Graph#

It is also possible to view a graph of changes across multiple branches using git log --all --graph --decorate --oneline

* b6f6b98 (origin/feature/pythran_equalizer) Remove manual loop collapse to avoid confusion
* ddb0048 Add note about error in number of returned modes
*   101b657 Merge branch 'master' into feature/pythran_equalizer
|\  
* | da0defd Fix typo, that caused very long runs when doing foe
* | bf1c071 (origin/test/transceiver_impairments) Rename script to avoid triggering pytest
* | b62bbd4 Cast 1j to complex64 to avoid conversion to complex128
* | f68c560 Wrong return arguments after recent api update
* | e55f6f0 Add comment on wrong type generation for complex64
* | 011abf0 (feature/pythran_equalizer) Update setup.py to compile pythran under windows
| | *   3926110 (HEAD -> master, origin/master, origin/HEAD) Merge remote-tracking branch 'origin/master'
| | |\  
| | |/  
| |/|   
| * | 783d5c1 Add TODO item on the selected modes
| * | 0e4af63 Return only the selected modes from equalise_signal
| * | bb9c42b Fix recursive list generation
| * | 1b5a967 Fix error when synced and there are more dims in symbols
| * | 840c5c1 Fix test for #27 issue to be more general
| | | *   14673bd (test/zonglong) Merge branch 'UngerAndreas/PyCommunication-master' into test/zonglong
| | | |\  
| | |/ /  
| | * | 72ca866 Rename test to reflect long ago change
| | * | 63d198d Fix missing parametrize for test
| | * | c471990 Fix signal object test which could fail if a did not contain all symbols
| | * | 64e4207 Fix wrong return argument number if searching offset in real-valued signals
| | * | 6c371cd Fix find_sequence_offset_complex test to changed api
| | | | * ff63adb (bugfix/fix_tests) Fix wrong return argument number if searching offset in real-valued signals
| | | | * f649334 Fix find_sequence_offset_complex test to changed api
| |_|_|/  
|/| | |   
| | | | *   7086756 (refs/stash) On realvalued4d: real_valued
| | | | |\  
| | | | | * b0f1f97 index on realvalued4d: 673c5bc Missing changes to pythran export
| | | | |/  
| | | | * 673c5bc (origin/feature/pythran_4d, feature/realvalued4d, feature/pythran_4d) Missing changes to pythran export
| | | | * 59e9b00 More fixes on type to adjust to input precision
| | | | *   60ae998 Merge branch 'feature/pythran_equalizer' into feature/pythran_4d
| | | | |\  
| |_|_|_|/  
|/| | | |   

Repository hosting (supplementary material)#

There are a number of services that allow for hosting of git repositories. Most of these offer additional services such as issue tracking, forking etc. Below is a by no means complete list of ways to host git repositories.

Github#

GitHub is the biggest hosting service for git repositories. Even projects that do not use GitHub for their work often mirror their project to GitHub for visibility.

Advantages#

  • many “social” features, forking, starring …

  • largest community

  • issue tracker, wiki …

  • free of charge for public repositories

  • continuous integration

  • rendering of jupyter notebooks

Disadvantages#

  • some features for private repositories require a paid plan

  • issue tracker has proprietary format, can make migration difficult

  • lagging behind some of the others in features

Gitlab#

GitLab is probably the second biggest service. Its feature set is very similar to GitHub’s, however GitLab has been a bit quicker in implementing new features.

Advantages#

  • issue tracker, wiki, …

  • free of charge for public and private projects

  • continuous integration

  • open source version available for self-hosting

  • Groups

  • rendering of jupyter notebooks

Disadvantages#

  • community not as large as GitHub’s

  • open source version lags behind enterprise version in features (if you are self-hosting)

Bitbucket#

Bitbucket is Atlassian’s hosting service. One of its selling points is the integration with Atlassian’s other tools such as Trello, together with its team functionality.

Advantages#

  • similar feature set to GitHub/GitLab

  • team functionality

  • integration with Trello and Jira

  • free private repositories for teams of up to 5

  • very extensive documentation (see link above)

Disadvantages#

  • costs money for teams with more than 5 members

  • smaller community than GitHub and GitLab

A note on Jupyter notebooks and git (supplementary material)#

Both Jupyter notebooks and git are incredibly useful and powerful tools. Unfortunately, working with notebooks and git is less than ideal. The reason for this is that notebooks store some of the information about their state, e.g. which cells have been executed, inside the notebook file. As a result, even just opening a notebook can change the notebook document, and this change will be registered by git. These changes make finding the important edits to notebooks difficult. Even more importantly, the metadata changes will typically result in conflicts when branching and merging, which makes using branches extremely cumbersome. Some discussions on the issues and possible ways around them can be found here: