L Hama 2020-10-05
Workshop date: 5th October 2020
Estimated time: one hour
Location: MS Teams
Git is really easy to learn.
From the official docs:
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
The “Pro Git” book by Scott Chacon and Ben Straub is free to read.
But this part is important:
Git thinks of its data more like a series of snapshots of a miniature filesystem. With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored. Git thinks about its data more like a stream of snapshots.
Visualized: Image from (Chacon and Straub 2014)
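To see the “link to the previous identical file” idea in action, here is a minimal POSIX-shell sketch (the file names and the commit identity are made up for the demo). It makes two commits, changes only one file in the second, and asks Git for the object ID of each file in each snapshot: the unchanged file has the same ID in both, so Git stored it only once.

```shell
# Throwaway repository in a temporary folder
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity for this demo repo
git config user.email "user@example.com"

# Snapshot 1: two files
echo "print('Hello world')" > hello-world.R
echo "x <- 1" > other.R
git add hello-world.R other.R
git commit -q -m "first snapshot"

# Snapshot 2: only hello-world.R changes
echo "# a comment" >> hello-world.R
git commit -q -am "second snapshot"

# <commit>:<path> names a file inside a snapshot; rev-parse prints its object ID
git rev-parse HEAD~1:other.R HEAD:other.R              # same ID twice: stored once
git rev-parse HEAD~1:hello-world.R HEAD:hello-world.R  # two different IDs
```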
I am happy to spend some time convincing you to use the terminal rather than a GUI, so this tutorial assumes you have access to a Unix shell (Linux or macOS) or Windows 10 PowerShell, with Git installed.
If you are on Windows, please follow this or this guide to install Git.
On Unix systems Git is often installed already; if not, it is easy to install.
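Before the first commit Git also needs to know who you are, a one-off step these notes do not otherwise cover. A minimal sketch, assuming Git is on your PATH; the name and email are placeholders to replace with your own, and `--global` writes them to your user-wide Git configuration:

```shell
# Is git installed? This prints something like "git version 2.x.y"
git --version

# One-off setup: every commit records an author name and email (placeholders here)
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Check what git will use
git config --global --get user.name
git config --global --get user.email
```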
There are also terminal simulators with Git support to play with, for example: https://www.katacoda.com/courses/git
Let us run this session with each of you running at least one command while the rest of us follow, lead or watch.
I have copied the section titles from chapter 2 of the book here, but we will do things our own way:
2.1 Getting a Git Repository
Creating one on our machine:
mkdir repo    # anywhere on your machine
cd repo
git init      # turn this folder into a git repository (creates a hidden .git folder)
git status
Or can someone show us how to do this with GitHub Desktop? I found this link but have never tried the application.
Cloning from a remote?
git clone https://github.com/layik/eAtlas
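`git clone` accepts any repository location, not only a GitHub URL. A sketch that needs no network, assuming a local repository stands in for the GitHub project (the folder names `upstream` and `my-copy` are hypothetical): clone it and note that Git names the source remote `origin` for you.

```shell
# A small local repository that plays the role of the GitHub project
cd "$(mktemp -d)"
git init -q upstream
git -C upstream config user.name "Workshop User"     # placeholder identity
git -C upstream config user.email "user@example.com"
echo "# demo" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "initial commit"

# Clone it into a folder called my-copy (by default the folder is named after the source)
git clone -q upstream my-copy
cd my-copy
git remote -v      # origin points back at .../upstream
ls                 # README.md came along, together with the full history
```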
2.2 Recording Changes to the Repository
# write some R code
echo "print('Hello world')" >> hello-world.R
git status
git add *.R
git status
git commit -m "my first file added"
git status
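The three `git status` calls above walk a file through Git's states: untracked, staged, committed. A compact way to watch that, sketched on a throwaway repository, is `git status --short`, which prints `??` for untracked files and `A` in the first column for new files that are staged:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"

echo "print('Hello world')" >> hello-world.R
git status --short     # ?? hello-world.R   (untracked)
git add hello-world.R
git status --short     # A  hello-world.R   (staged, will go into the next commit)
git commit -q -m "my first file added"
git status --short     # prints nothing: the folder matches the last snapshot
```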
2.3 Viewing the Commit History
git log
git log --oneline
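Two more `git log` forms that come in handy, sketched on a throwaway repository with made-up commit messages: `-n` limits how many commits are shown, `--format` picks the fields (here `%h`, the abbreviated hash, and `%s`, the subject), and `-p` adds the diff of each commit.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "my first file added"
echo "# a comment" >> hello-world.R
git commit -q -am "add a comment"

git log --oneline                 # newest first, one line per commit
git log -n 1 --format="%h %s"     # only the latest commit: hash and message
git log -p -n 1                   # latest commit together with its diff
```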
2.4 Undoing Things
# edit the file hello-world.R
git status
# undo: throw away your uncommitted edits (they cannot be recovered!)
git checkout -- hello-world.R
git status
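`git checkout -- <file>` (the `--` just marks the end of options) replaces the working copy with the version from the last commit, so uncommitted edits are gone for good. A sketch on a throwaway repository; on Git 2.23 or later, `git restore <file>` does the same job, and `git restore --staged <file>` unstages a file you added by mistake.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "my first file added"

echo "oops" >> hello-world.R      # an edit we regret
git status --short                #  M hello-world.R   (modified, not staged)
git checkout -- hello-world.R     # back to the committed version
cat hello-world.R                 # print('Hello world')
```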
2.5 Working with Remotes
git remote -v
# none?
# time to create our first github repository!
# www.github.com
# new private repo or if brave enough make it public
# come back with the commands GitHub shows you
# git remote add ...
Creating a repo on GitHub?
Let's be brave and send the current commits to the remote.
# try
git push # error message?
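The bare `git push` above fails because there is no remote yet, or, once `git remote add` has run, because the branch has no upstream. A sketch of the fix without GitHub, assuming a local bare repository stands in for the empty GitHub one (the folder names are hypothetical): `-u` (`--set-upstream`) pushes and records `origin/master` as the upstream, so later plain `git push` calls know where to go.

```shell
cd "$(mktemp -d)"
remote_dir="$(pwd)/github-standin.git"
git init -q --bare "$remote_dir"            # plays the role of the empty GitHub repo

git init -q repo && cd repo
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "my first file added"
git branch -M master                        # GitHub's own instructions show this step (with main)

git remote add origin "$remote_dir"
git push -u origin master                   # first push: create master on the remote and track it
echo "# more" >> hello-world.R
git commit -q -am "second commit"
git push                                    # a plain push now works
```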
2.6 Tagging
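A minimal tagging sketch on a throwaway repository (the tag name `v0.1` and its message are made up): an annotated tag gives a commit a permanent, human-readable name. Tags are not pushed by a plain `git push`; you push them explicitly.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "my first file added"

git tag -a v0.1 -m "first workshop version"   # annotated tag on the current commit
git tag                                       # lists: v0.1
git show --no-patch v0.1                      # tag message plus the commit it points at
# with a remote configured: git push origin v0.1
```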
2.7 Git Aliases
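A minimal alias sketch: `git config alias.<name>` defines a shortcut. The names `st` and `lg` are arbitrary choices; they are stored in this throwaway repository's config here, and adding `--global` would make them available everywhere.

```shell
cd "$(mktemp -d)"
git init -q
# Define shortcuts (add --global to use them in every repository)
git config alias.st "status --short"
git config alias.lg "log --oneline --graph --all"

echo "x <- 1" > a.R
git st        # same as: git status --short   ->   ?? a.R
```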
2.8 Summary
GitHub has a great set of interactive guides.
README.md is the “index” file of a GitHub repository: open any repository and compare what you see on its landing page with the contents of README.md.
A repository named USER.github.io is published at https://USER.github.io; for example, layik.github.io actually points to https://github.com/layik/layik.github.io.
Worth mentioning R packages:
* pkgdown
* bookdown
* devtools::install_github
* covr
Check out this short tutorial to get one set up on your repository.
3.1 Branches in a Nutshell
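A branch in Git is simply a lightweight movable pointer to a commit. The default branch name in Git is master (since Git 2.28 this can be changed with the init.defaultBranch setting, and GitHub now names the default branch of new repositories main).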
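One way to convince yourself that a branch is only a pointer, sketched on a throwaway repository (the branch name `experiment` is made up): a new branch points at the same commit as master, and only the branch you commit on moves forward.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git branch -M master

git branch experiment                 # a new pointer; no files are copied
git rev-parse master experiment       # the same commit ID, printed twice

echo "# more" >> hello-world.R
git commit -q -am "second"            # still on master, so only master moves
git rev-parse master experiment       # now two different IDs
```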
Read the rest of the section in the book.
3.2 Basic Branching and Merging
# branch or no branch, you can always branch
git branch <name>
git checkout <name>
# combine those two
git checkout -b <name>
git branch
git status
Let's edit hello-world.R:
# this will append the comment to the file
echo "# some R comment" >> hello-world.R
# or just
vim hello-world.R
# and add some changes
git status
# -a stages every modified tracked file (brand-new files still need git add)
# -m gives the commit message, which every commit needs
git commit -am "added oneline comment to hello-world.R"
Or add something new on your branch:
echo "File to merge" >> fix.txt
git status
git add fix.txt
git status
git commit -am "add fix.txt file to branch <name>"
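Why `git add fix.txt` is still needed although the commit uses `-a`: `-a` only stages changes to files Git already tracks, so brand-new files stay untracked. A sketch on a throwaway repository (`notes.txt` is a made-up file):

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"

echo "# some R comment" >> hello-world.R   # change to a tracked file
echo "File to merge" >> notes.txt          # brand-new, untracked file
git commit -q -am "commit with -a"
git status --short                         # ?? notes.txt   (left out of the commit)
```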
Go back to master and merge your branch to bring in one or both of those changes:
git checkout master
git status
git merge <name>
# voila!
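When master has not moved since you branched, merging the branch into master is a fast-forward: Git simply slides master up to the branch's latest commit, with no merge commit. A sketch on a throwaway repository, where the hypothetical branch `feature` stands in for `<name>`:

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git branch -M master

git checkout -q -b feature
echo "File to merge" >> fix.txt
git add fix.txt && git commit -q -m "add fix.txt file to branch feature"

git checkout -q master
git merge feature                 # reports "Fast-forward"
git rev-parse master feature      # the same ID twice
```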
Create a branch on GitHub? (not recommended :))
3.3 Branch Management
A whole section of the book, and a great one. My picks for this one-hour tutorial:
git branch
# notice the asterisk: it marks the branch you are on
git branch -v
# productive! also shows the last commit on each branch
When working with GitHub and you want to open your first PR (pull request), push your branch first:
git push origin <name>
# this creates a branch called <name> on the remote; go and check on GitHub
Delete locally and remotely?
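git branch -D <name>   # -D force-deletes even unmerged work; -d only deletes merged branches
# (you cannot delete the branch you are currently on)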
# did that work?
git branch
# now this beauty
git push origin --delete <name>
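The two deletions are separate on purpose: deleting a branch locally does not touch the remote. A sketch with a local bare repository standing in for GitHub (the paths and the branch name `feature` are hypothetical):

```shell
cd "$(mktemp -d)"
remote_dir="$(pwd)/github-standin.git"
git init -q --bare "$remote_dir"
git init -q repo && cd repo
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git branch -M master
git remote add origin "$remote_dir"
git push -q -u origin master

git branch feature
git push -q origin feature               # the branch now exists on the remote too
git branch -D feature                    # gone locally...
git ls-remote --heads origin feature     # ...but still listed on the remote
git push -q origin --delete feature      # now gone on the remote as well
git ls-remote --heads origin             # only master is left
```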
3.4 Branching Workflows
You will want to read this in future, and no doubt you will develop your own way of doing things.
3.5 Remote Branches
In this section I just want to highlight “branch tracking”: your colleague has just created a branch, and you want to edit something on it and send it back to them.
git fetch origin   # download the latest branches from the remote
git checkout --track origin/<name>
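A sketch of that round trip, with two clones of a local bare repository standing in for you and your colleague (all names, including the branch `fix-typo`, are hypothetical). Note that `origin/<name>` only exists on your machine after a `git fetch`; `--track` then creates a local branch that knows its upstream, so plain `git pull` and `git push` work on it.

```shell
cd "$(mktemp -d)"
git init -q --bare shared.git
for who in colleague me; do
  git clone -q shared.git "$who"
  git -C "$who" config user.name "$who"               # placeholder identities
  git -C "$who" config user.email "$who@example.com"
done

# Colleague: first commit, then a new branch pushed to the shared remote
cd colleague
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git checkout -q -b fix-typo
git push -q origin fix-typo
cd ..

# You: fetch, then create a local branch that tracks theirs
cd me
git fetch -q origin
git checkout -q --track origin/fix-typo
git rev-parse --abbrev-ref '@{u}'       # origin/fix-typo
echo "# reviewed" >> hello-world.R
git commit -q -am "review comment"
git push -q                             # goes straight back to origin/fix-typo
```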
3.6 Rebasing
3.7 Summary
What looks really scary is when you have been working on some code or an R/Python workflow, a colleague edits the same line you edited, and when you try to merge your work you run into a conflict.
Git helps you get out of this, and there are options you can pass to your commands to decide automatically which version should be committed to the current branch.
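For the automatic route, one such set of options is `git merge`'s strategy options: `-X ours` keeps your side of every conflicting hunk and `-X theirs` keeps the incoming side, while non-conflicting changes from both sides are still merged as usual. A sketch on a throwaway repository (the branch name `colleague` is made up); use it only when you are sure one side should win.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git branch -M master

git checkout -q -b colleague
echo "print('Hello, colleague')" > hello-world.R
git commit -q -am "colleague's edit"

git checkout -q master
echo "print('Hello, me')" > hello-world.R
git commit -q -am "my edit"

# Same line changed on both sides: a plain merge would stop with a conflict
git merge --no-edit -X theirs colleague
cat hello-world.R       # print('Hello, colleague')
```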
So let's do this: on the master branch, add one line to the hello-world.R file.
git checkout master
echo "# another comment line" >> hello-world.R
git status
git commit -am "prep for conflict"
And then, on your branch, add a different line in the same place:
git checkout <name>
echo "# a different comment line" >> hello-world.R
git status
git commit -am "prep on branch for conflict"
Now try to merge master into your branch:
git merge master
# fail?
Let's inspect the tiny file and see what Git has done to it. Open the file in your favourite editor, keep the lines that should stay, delete the rest together with the <<<<<<<, ======= and >>>>>>> conflict markers, then stage and commit your changes.
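What Git does to the file is write both versions between conflict markers. A sketch of the whole cycle on a throwaway repository (the branch name `mybranch` stands in for `<name>`), ending with a resolution that keeps both comment lines; if you would rather back out halfway, `git merge --abort` returns to the state before the merge.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "first"
git branch -M master
git branch mybranch

echo "# another comment line" >> hello-world.R
git commit -q -am "prep for conflict"                # on master
git checkout -q mybranch
echo "# a different comment line" >> hello-world.R
git commit -q -am "prep on branch for conflict"

git merge master || true     # CONFLICT: the merge stops halfway
cat hello-world.R            # <<<<<<< HEAD ... ======= ... >>>>>>> master

# Resolve: write the version we want (both lines kept, markers removed)
printf '%s\n' "print('Hello world')" "# a different comment line" "# another comment line" > hello-world.R
git add hello-world.R        # mark the file as resolved
git commit -q --no-edit      # finish the merge with the default message
```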
Data science relies heavily on Git and on Git hosting services such as GitHub, and there are great tools built for data scientists. R packages are a good example: not only does RStudio have built-in support to “initialize” your new project/package with a Git repository, there is also support for building documentation that is ready to be deployed as GitHub Pages and looks great.
There are also tools that help with tasks such as CI (continuous integration), uploading and downloading files from a repository, and even writing a whole book in Markdown and hosting it on a Git hosting service. Some hosting services are built entirely on Git and integrate with GitHub and similar repository hosting services.
If you would like to know more, one source is this; it takes a more GUI-based approach than this tutorial.
Do not commit large files to Git; chances are they need to live somewhere else. What counts as large? Anything over 1 MB? Video? Audio? PDFs?
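One way to keep such files out, sketched on a throwaway repository with made-up file names and patterns: list them in a `.gitignore` file, confirm with `git check-ignore`, and use `find` to spot big files before you add them (`-size +1M` means roughly “larger than 1 MB”).

```shell
cd "$(mktemp -d)"
git init -q

# Patterns for files that should never be committed
printf '%s\n' "*.mp4" "*.wav" "*.pdf" > .gitignore

touch talk.mp4 slides.pdf analysis.R
git status --short              # only .gitignore and analysis.R show up
git check-ignore -v talk.mp4    # prints the .gitignore line that matches

# List files over ~1 MB in the project (skipping .git) before adding them
find . -path ./.git -prune -o -type f -size +1M -print
```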
Work with a remote origin. If you only ever work on your local machine, you may not see the benefit of Git for the reproducibility of your code and workflows.
Commit frequently. No commit is too small: every commit is a snapshot of your current work that you may want to come back to.
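Coming back to a snapshot can be as small as one file. A sketch on a throwaway repository (the file contents are made up): `HEAD~1` means “one commit before the current one”, `git show <commit>:<file>` prints that old version without changing anything, and `git checkout <commit> -- <file>` copies it back into your folder (and stages it).

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R && git commit -q -m "working version"
echo "x <- broken(" > analysis.R
git commit -q -am "oops"

git show HEAD~1:analysis.R           # look at the old version
git checkout HEAD~1 -- analysis.R    # bring it back into the working folder
git commit -q -m "restore working analysis.R"
```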
Always leave your working directory in a functioning state. If you have some breaking changes, you can use git stash to keep them hidden, and once you are ready to work on them again you can bring them back with git stash pop.
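A sketch of that stash round trip on a throwaway repository (the half-finished line is made up): `git stash` shelves uncommitted changes and leaves the last committed, working version in place; `git stash pop` brings them back and drops the stash entry.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Workshop User"        # placeholder identity
git config user.email "user@example.com"
echo "print('Hello world')" > hello-world.R
git add hello-world.R && git commit -q -m "working version"

echo "print(half_finished" >> hello-world.R   # breaking, unfinished change
git stash                                     # shelve it; the folder is back to the commit
git status --short                            # prints nothing
git stash list                                # stash@{0}: WIP on ...
git stash pop                                 # bring the change back
```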