Layik Hama

Git/GitHub workshop

L Hama 2020-10-05

Workshop date: 5th October 2020
Estimated time: One hour
Location: MS Teams

Introduction

Git is really easy to learn.

Official docs:

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

The “Pro Git” book by Scott Chacon and Ben Straub is free to read online.

But this part is important:

Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.

Visualized: (figure: git visualized. Image from Chacon and Straub 2014)
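The snapshot idea can be checked from the terminal. The following is a minimal sketch, assuming git is installed; the temporary directory, the file names `a.R` and `b.R`, and the placeholder identity are all made up for illustration. It commits two files, changes only one of them, and compares the object IDs git stored for each file in the two snapshots.

```shell
# Throwaway repository in a temporary directory (all names are illustrative).
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity

echo "print('a')" > a.R
echo "print('b')" > b.R
git add a.R b.R
git commit -q -m "first snapshot"

# Change only b.R and take a second snapshot.
echo "# edited" >> b.R
git commit -q -am "second snapshot"

# Object IDs of each file as recorded in the two snapshots.
a_first=$(git rev-parse HEAD~1:a.R)
a_second=$(git rev-parse HEAD:a.R)
b_first=$(git rev-parse HEAD~1:b.R)
b_second=$(git rev-parse HEAD:b.R)

echo "a.R: $a_first -> $a_second"   # same ID: the stored file is reused
echo "b.R: $b_first -> $b_second"   # different ID: a new version was stored
```

The unchanged file keeps the same ID in both commits, which is the "link to the previous identical file" from the quote above.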

Prerequisites

Terminal

I am happy to spend some time convincing you to use the terminal for this instead of a GUI. Therefore, this tutorial assumes you have access to a Unix terminal (Linux or macOS) or Windows 10 PowerShell, with git installed.

For those of you on Windows, please look at this or this guide to get git installed on your machine.

Those of you who use Unix can easily install it on your machines if it did not already come with the OS.

No terminal? No git?

There are terminal simulators with git command support to play with. One of these is this one: https://www.katacoda.com/courses/git

Hands on

Let us run this session with each of you running at least one command while the rest of us follow/lead/watch.

I have just copied the section titles of chapter two of the book here, but we will do it our way:

  2. Git Basics

2.1 Getting a Git Repository

Creating on our machine

mkdir repo # anywhere on your machine
cd repo    # git init works on the current directory
git init   # creates the hidden .git directory
git status

Or show us how you can do this using GitHub Desktop? I found this link but have never tried the application.

Cloning from a remote?

git clone https://github.com/layik/eAtlas

2.2 Recording Changes to the Repository

# write some R code
echo "print('Hello world')" >> hello-world.R
git status
git add *.R
git status
git commit -m "my first file added"
git status
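What the repeated `git status` calls are meant to show is a file moving from untracked, to staged, to committed. A minimal sketch of the same steps, assuming a throwaway temporary directory and a placeholder identity (both made up here), using the compact `git status --short` form:

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity

echo "print('Hello world')" > hello-world.R
untracked=$(git status --short)   # "?? hello-world.R": git sees the file but does not track it

git add hello-world.R
staged=$(git status --short)      # "A  hello-world.R": queued for the next commit

git commit -q -m "my first file added"
clean=$(git status --short)       # empty: nothing left to record

printf 'untracked: %s\nstaged:    %s\nclean:     [%s]\n' "$untracked" "$staged" "$clean"
```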

2.3 Viewing the Commit History

git log
git log --oneline

2.4 Undoing Things

# edit the file hello-world.R
git status
# undo: discard the uncommitted edit
git checkout -- hello-world.R
git status
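Since Git 2.23 there is also a dedicated `git restore` command for discarding uncommitted edits. A minimal sketch, assuming Git 2.23 or newer and a throwaway repository with a placeholder identity (both made up for illustration):

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity

echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"

echo "# an edit we regret" >> hello-world.R
git status --short             # " M hello-world.R": modified, not staged

git restore hello-world.R      # Git 2.23+; discards the edit in the working tree
remaining=$(git status --short)
restored=$(cat hello-world.R)
echo "$restored"
```

Either command throws the edit away for good, so it is worth running `git status` first to see what will be lost.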

2.5 Working with Remotes

git remote -v 
# none?
# time to create our first github repository!
# www.github.com
# new private repo or if brave enough make it public
# come back and bring the instructions shown on github
# git remote add ...

Creating a repo on GitHub? (screenshot: create repo on github)

Let's be brave and send the current commits to the remote.

# try 
git push # error message?
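A bare `git push` typically fails here because the local branch has no upstream branch configured yet. The sketch below shows one way through it, assuming a local bare repository as a stand-in for the GitHub remote so it can run offline; the paths and the identity are made up, and on GitHub the URL would be the one shown on the new repository page.

```shell
work=$(mktemp -d)
remote="$work/remote.git"                 # stand-in for the GitHub repository URL
git -c init.defaultBranch=master init -q --bare "$remote"

git -c init.defaultBranch=master init -q "$work/repo"
cd "$work/repo"
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"

git remote add origin "$remote"
git remote -v                             # origin now listed for fetch and push

# -u records origin/master as the upstream, so later a plain "git push" works.
git push -q -u origin master

upstream=$(git rev-parse --abbrev-ref '@{upstream}')
pushed=$(git --git-dir="$remote" rev-parse master)
local_head=$(git rev-parse HEAD)
echo "upstream: $upstream"
```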

2.6 Tagging

2.7 Git Aliases

2.8 Summary

GitHub

There are great interactive GitHub Guides pages.

README

The index file of a GitHub repository. Just open a repository and compare what you see on the landing page with the file README.md.

GH pages

https://USER.github.io/repo

A repo named USER.github.io is served at https://USER.github.io. For example, layik.github.io actually points to: https://github.com/layik/layik.github.io

Worth mentioning R packages:

* pkgdown
* bookdown
* devtools::install_github
* covr?

DOI

Check out this short tutorial to get one for the repo.

Branch

  3. Git Branching

3.1 Branches in a Nutshell

A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master.

Read the rest of the section in the book.

3.2 Basic Branching and Merging

# branch or no branch, you can always branch
git branch <name>
git checkout <name>
# combine those two 
git checkout -b <name>
git branch 
git status

Let's edit hello-world.R

# this will append the comment to the file
echo "# some R comment" >> hello-world.R
# or just
vim hello-world.R 
# and add some changes
git status
# -a stages every modified file that git already tracks
# -m supplies the message required for commits
git commit -am "added one-line comment to hello-world.R"

Or add or change something else on your branch:

echo "File to merge" >> fix.txt
git status
git add fix.txt
git status
git commit -am "add fix.txt file to branch <name>"

Go back to master and merge the branch to see one or both of those changes there:

git checkout master
git status
git merge <name>
# voila!
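The whole branch-and-merge round trip can be sketched in a throwaway repository. This assumes a temporary directory, a placeholder identity, and a branch called `feature` standing in for `<name>`; none of these names come from the workshop itself.

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"

# Work on a branch ("feature" stands in for <name>).
git checkout -q -b feature
echo "File to merge" > fix.txt
git add fix.txt
git commit -q -m "add fix.txt file to branch feature"

# Back on master the new file is not there yet.
git checkout -q master
before=$(ls)

# master has not moved since the branch was created, so this is a fast-forward.
git merge -q feature
after=$(ls)

echo "before merge: $before"
echo "after merge:  $after"
```

Because master had no new commits of its own, git only moves the master pointer forward to the branch tip; no merge commit is needed.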

Create a branch on GitHub? (not recommended :)) (animation: create branch on GitHub)

3.3 Branch Management

A whole section from the book which is great. Picks for this one hour tutorial:

git branch
# notice the asterisk
git branch -v 
# productive!

When working with GitHub and you want to create your first PR (pull request):

git push origin <name>
# this just created a branch called <name> on the remote; go and check.

Delete locally and remotely?

git branch -D <name>
# did that work?
git branch
# now this beauty
git push origin --delete <name>
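The capital `-D` forces the deletion. The lowercase `-d` is the cautious variant and refuses to delete a branch whose commits have not been merged. A minimal sketch of the difference, assuming a throwaway repository, a placeholder identity, and a branch called `scratch` (all made up for illustration):

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"

# A branch with a commit that master does not have.
git checkout -q -b scratch
echo "experiment" > scratch.txt
git add scratch.txt
git commit -q -m "unmerged work on scratch"
git checkout -q master        # a branch cannot be deleted while it is checked out

if git branch -d scratch 2>/dev/null; then
  safe_delete="deleted"
else
  safe_delete="refused"       # -d protects unmerged commits
fi

git branch -D scratch         # -D deletes regardless
branches=$(git branch --format='%(refname:short)')
echo "-d: $safe_delete; branches left: $branches"
```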

3.4 Branching Workflows

You will want to read this in future and will no doubt end up with your own way of doing things.

3.5 Remote Branches

In this section I just want to highlight “branch tracking”: your colleague has just created a branch and you want to edit something and send it back to them.

git fetch origin   # make sure your repo knows about the new remote branch
git checkout --track origin/<name>
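Branch tracking can be tried without a colleague or a network connection. The sketch below assumes a local bare repository playing the role of the shared remote, a second repository playing the colleague, and a branch called `feature` standing in for `<name>`; all paths and identities are made up.

```shell
work=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$work/remote.git"   # stand-in remote

# The colleague: one commit on master, one more on a branch, both pushed.
git -c init.defaultBranch=master init -q "$work/colleague"
cd "$work/colleague"
git config user.name "Colleague"              # placeholder identity
git config user.email "colleague@example.com" # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"
git checkout -q -b feature
echo "# work in progress" >> hello-world.R
git commit -q -am "start feature"
git remote add origin "$work/remote.git"
git push -q origin master feature

# You: clone, then create a local branch that tracks the colleague's branch.
git clone -q "$work/remote.git" "$work/me"
cd "$work/me"
git checkout -q --track origin/feature

current=$(git rev-parse --abbrev-ref HEAD)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "on $current, tracking $upstream"
```

With the upstream recorded, a plain `git pull` or `git push` on that branch talks to the colleague's branch without further arguments.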

3.6 Rebasing

3.7 Summary

Conflict resolution

What looks really scary is this: you have been writing some code/R/Python workflow, a colleague edits the same line that you edit, and when you try to merge your work you come across a conflict.

Git does help you get out of this, and there are various options which you can pass to your commands to automatically decide which version should be committed into the current branch.

So let's do this: on the master branch add one line to the hello-world.R file.
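One such option is the `-X ours` / `-X theirs` strategy option of `git merge`, which settles conflicting hunks in favour of one side automatically. A minimal sketch, assuming a throwaway repository, a placeholder identity, and a branch called `feature` (all made up here); whether auto-resolving is a good idea depends on the project.

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"
git branch feature

# Both branches append a different line at the same place.
echo "# another comment line" >> hello-world.R
git commit -q -am "prep for conflict"
git checkout -q feature
echo "# a different comment line" >> hello-world.R
git commit -q -am "prep on branch for conflict"

# On feature: merge master, keeping "our" (feature's) side of conflicting hunks.
git merge -q -X ours -m "merge master, keep our lines on conflict" master

cat hello-world.R
leftover=$(git status --short)   # empty: the merge was committed without stopping
```

Using `-X theirs` instead would keep master's line. Only the conflicting hunks are decided this way; changes that do not clash are merged as usual.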

git checkout master
echo "# another comment line" >> hello-world.R
git status
git commit -am "prep for conflict"

And then on the branch we can do the same but with a different comment (feel free to see if the same line would cause a conflict).

git checkout <name>
echo "# a different comment line" >> hello-world.R
git status
git commit -am "prep on branch for conflict"

And now you can try to merge master with:

git merge master
# fail?

Let's inspect the tiny file and see what git has done to it. Open the file in your favourite editor. Git has marked the clashing lines with <<<<<<<, ======= and >>>>>>> markers; remove the markers and whichever lines should not stay. Then add the file and commit your changes.
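The conflict and its manual resolution can be replayed end to end. This is a sketch under assumptions: a throwaway repository, a placeholder identity, a branch called `feature` standing in for `<name>`, and `printf` standing in for the editing you would normally do by hand.

```shell
demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q
git config user.name "Workshop User"          # placeholder identity
git config user.email "workshop@example.com"  # placeholder identity
echo "print('Hello world')" > hello-world.R
git add hello-world.R
git commit -q -m "my first file added"
git branch feature

echo "# another comment line" >> hello-world.R
git commit -q -am "prep for conflict"
git checkout -q feature
echo "# a different comment line" >> hello-world.R
git commit -q -am "prep on branch for conflict"

# The merge stops with a non-zero exit status and leaves markers in the file.
if git merge master > /dev/null; then
  merge_result="clean"
else
  merge_result="conflict"
fi
cat hello-world.R                              # shows the <<<<<<< ======= >>>>>>> markers
markers=$(grep -c '^<<<<<<<' hello-world.R)

# Resolve by hand: here we decide to keep both comment lines.
printf '%s\n' "print('Hello world')" \
  "# another comment line" "# a different comment line" > hello-world.R
git add hello-world.R                          # tells git the conflict is resolved
git commit -q -m "merge master into feature, keep both comments"

parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # commit ID plus two parent IDs
```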

R package showcase

Data science relies on git and git repository hosting services such as GitHub, and there are great tools made for data scientists to use. One example is R packages. Not only does RStudio come with built-in support to “initialize” your new project/package with a git repo, there is also support for building documentation that is ready to be deployed as GitHub pages, which look great.

There are also tools which assist with various tasks such as CI, uploading and downloading files from a repository, and even writing a whole book using markdown and hosting it on git hosting services. There are hosting services which are built entirely on git and they integrate with GitHub and similar repository hosting services.

If you would like to know more, one source could be this; it takes a more GUI-based approach than this tutorial.

Beginner advice

Awesomeness

References

Chacon, Scott, and Ben Straub. 2014. *Pro Git*. Springer Nature.