Tutorial: Git on Ubuntu and OS X - 2020
In this chapter, we'll set up Git and learn how to use it.
In Git, every checkout is really a full backup of all the data. We can copy an existing repository; in a distributed version control system this copying is typically called cloning, and the resulting repository is referred to as a clone. Every clone contains the full history of the collection of files, and a cloned repository has the same functionality as the original repository.
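To see that a clone carries the whole history and not just the latest files, here is a minimal sketch, assuming a POSIX shell and Git on the PATH. The directory names original and copy and the file notes.txt are hypothetical; everything lives in a throwaway directory created by mktemp -d, and the identity is set per repository so ~/.gitconfig is left alone.

```shell
#!/bin/sh
# Sketch: a clone contains every commit of the repository it was cloned from.
# "original", "copy" and notes.txt are hypothetical names for this demo.
set -e
work=$(mktemp -d)                        # throwaway workspace
git init -q "$work/original"
cd "$work/original"
git config user.name "k"                 # repository-level identity, ~/.gitconfig untouched
git config user.email "k@bogotobogo.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first"
echo "v2" >> notes.txt
git commit -q -a -m "second"

# Clone the local repository by path.
git clone -q "$work/original" "$work/copy"

# Both repositories report the same number of commits.
orig_count=$(git -C "$work/original" rev-list --count HEAD)
copy_count=$(git -C "$work/copy" rev-list --count HEAD)
echo "original: $orig_count commits, clone: $copy_count commits"
```

Deleting either directory removes only that repository; the other one keeps its full history.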
If we want to delete a Git repository, we can simply delete the folder which contains the repository.
We can install the Git command line tool using the command below:
$ sudo apt-get install git
Git stores global settings in the .gitconfig file in the user's home directory (~). Git records the committer and author of a change in each commit; this and other information can be stored in the global settings.
These values can be set up with the git config command.
We can also configure settings for a specific repository: with the --global flag a setting is global, otherwise it applies only to the current Git repository.
We need to configure the user name and email address Git will record in our commits, user.name and user.email:
$ git config --global user.name "k"
$ git config --global user.email "k@bogotobogo.com"
$ git config --list
user.name=k
user.email=k@bogotobogo.com
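Leaving out --global writes the same keys to the current repository only. A minimal sketch, assuming a POSIX shell and Git on the PATH; it runs inside a throwaway repository so the real ~/.gitconfig is not modified.

```shell
#!/bin/sh
# Sketch: repository-level settings, stored in .git/config instead of ~/.gitconfig.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k"                 # no --global: written to .git/config
git config user.email "k@bogotobogo.com"

# Read the values back from the repository-level file only.
name=$(git config --local --get user.name)
email=$(git config --local --get user.email)
echo "user.name=$name"
echo "user.email=$email"

# The repository-level settings are plain text in .git/config.
cat .git/config
```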
The commands below enable color highlighting for Git in the console:
$ git config --global color.ui true
$ git config --global color.status auto
$ git config --global color.branch auto
Next, we configure the default text editor Git opens when it needs us to type in a message. By default, Git uses our system's default editor, which is usually vi. To use vim as the default editor for Git:
$ git config --global core.editor vim
Another useful option is the default merge tool used to resolve merge conflicts. Git has no merge tool configured out of the box, so we can set our own tool for integrating conflicting changes into our working tree. For example, to use kdiff3:
$ git config --global merge.tool kdiff3
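To check which editor Git will actually launch, git var GIT_EDITOR prints the resolved value; the GIT_EDITOR environment variable takes precedence over core.editor, followed by VISUAL and EDITOR. A minimal sketch, assuming a POSIX shell, an env command that supports -u, and Git on the PATH; the settings are written at repository level in a throwaway repository rather than with --global.

```shell
#!/bin/sh
# Sketch: confirm the editor and merge tool Git will use.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config core.editor vim               # repository-level, not --global
git config merge.tool kdiff3

# GIT_EDITOR in the environment would override core.editor, so unset it here.
editor=$(env -u GIT_EDITOR git var GIT_EDITOR)
tool=$(git config --get merge.tool)
echo "editor: $editor"
echo "merge tool: $tool"
```

With merge.tool set, git mergetool opens that tool for each conflicted file after a merge stops on conflicts.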
To list every setting Git sees from inside the current repository (system, global and repository-level values combined):
$ git config --list
user.name=k
user.email=k@bogotobogo.com
color.ui=true
color.status=auto
color.branch=auto
core.editor=vim
To query the global settings we can use:
$ git config --global --list
user.name=k
user.email=k@bogotobogo.com
color.ui=true
color.status=auto
color.branch=auto
core.editor=vim
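When the same key is set in several places, git config --list prints every occurrence, and adding --show-origin (Git 2.8 and later) shows the file each value came from. A minimal sketch, assuming a POSIX shell and Git on the PATH; the value k-local is a hypothetical repository-level override set in a throwaway repository.

```shell
#!/bin/sh
# Sketch: find out where a setting comes from and which value wins.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k-local"           # hypothetical repository-level value

# One line per occurrence, prefixed with its source file.
git config --list --show-origin | grep 'user.name'

# Repository-level values override global ones; --get returns the winner.
effective=$(git config --get user.name)
echo "effective user.name: $effective"
```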
Now it's time to create a local Git repository and commit our files into that repository.
$ mkdir ~/Repository1
$ cd ~/Repository1
$ mkdir MyFiles
The following command creates a Git repository in the current directory:
$ git init
Initialized empty Git repository in /home/khong/Repository1/.git/
Every Git repository is stored in the .git folder of the directory in which the Git repository has been created. This directory contains the complete history of the repository. The .git/config file contains the configuration for the repository.
All files inside the repository folder excluding the .git folder are the working tree for a Git repository.
$ ls -la
total 16
drwxr-xr-x  4 khong khong 4096 Nov 12 00:21 .
drwxr-xr-x 43 khong khong 4096 Nov 12 00:06 ..
drwxr-xr-x  7 khong khong 4096 Nov 12 00:21 .git
drwxr-xr-x  2 khong khong 4096 Nov 12 00:06 MyFiles
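A minimal sketch of the same steps in a throwaway directory (instead of ~/Repository1), assuming a POSIX shell and Git on the PATH, to peek at what git init creates inside .git.

```shell
#!/bin/sh
# Sketch: inspect the repository directory created by git init.
set -e
repo=$(mktemp -d)                        # stands in for ~/Repository1
cd "$repo"
mkdir MyFiles
git init -q
ls -a .git                               # HEAD, config, objects/, refs/ and other files
cat .git/HEAD                            # the branch the next commit will be recorded on
git rev-parse --git-dir                  # prints .git at the top of the working tree
```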
Let's create some files:
$ touch MyFiles/simple.txt
$ echo "file1" > file1
$ echo "file2" > file2
$ echo "file3" > file3
Let's check what we've done:
$ tree -a
The git status command shows the working tree status, i.e. which files have changed, which are staged and which are not part of the staging area.
$ git status
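For a compact view, git status --short prints one line per path, with ?? marking untracked files (an untracked directory shows up once, as MyFiles/). A minimal sketch recreating the files above in a throwaway repository, assuming a POSIX shell and Git on the PATH:

```shell
#!/bin/sh
# Sketch: short status of freshly created, untracked files.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
mkdir MyFiles
touch MyFiles/simple.txt
echo "file1" > file1
echo "file2" > file2
echo "file3" > file3

git status --short                       # "??" = not yet known to Git
untracked=$(git status --short | grep -c '^??')
echo "untracked entries: $untracked"
```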
Before committing to a Git repository, we need to mark the changes that should be committed. We do this by adding the new and changed files to the staging area, which records a snapshot of the affected files.
Now we add all files to the index (the staging area) of the Git repository:
$ git add .
$ tree -a
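After git add, the short status shows each path with A (added to the index), and git ls-files lists what the index now holds. A minimal sketch in a throwaway repository, assuming a POSIX shell and Git on the PATH:

```shell
#!/bin/sh
# Sketch: what the staging area contains after "git add .".
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
mkdir MyFiles
touch MyFiles/simple.txt
echo "file1" > file1
echo "file2" > file2
echo "file3" > file3

git add .
git status --short                       # "A " = staged as a new file
git ls-files                             # paths recorded in the index
staged=$(git ls-files | wc -l | tr -d ' ')
echo "files in index: $staged"
```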
After adding the files to the Git staging area, we can commit them to the Git repository. This creates a new commit object with the staged changes in the Git repository and the HEAD reference points to the new commit. The -m parameter allows us to specify the commit message.
Let's commit our files to the local repository:
$ git commit -m "Initial commit"
[master (root-commit) 4324d18] Initial commit
 4 files changed, 3 insertions(+)
 create mode 100644 MyFiles/simple.txt
 create mode 100644 file1
 create mode 100644 file2
 create mode 100644 file3
$ git status
# On branch master
nothing to commit, working directory clean
The Git operations we performed have created a local Git repository in the .git folder and added all files to it in one commit. We can see the changes with the git log command:
$ git log
commit 4324d189e5996c8c442a2a284852a4750e1ca829
Author: k <k@bogotobogo.com>
Date:   Tue Nov 12 00:55:50 2013 -0800

    Initial commit
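A minimal sketch of the whole add-and-commit sequence in a throwaway repository, assuming a POSIX shell and Git on the PATH, with the identity set at repository level. It uses git log --oneline for a compact history and git log --format to pull out single fields.

```shell
#!/bin/sh
# Sketch: first commit, then query it.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k"
git config user.email "k@bogotobogo.com"
mkdir MyFiles
touch MyFiles/simple.txt
echo "file1" > file1
echo "file2" > file2
echo "file3" > file3
git add .
git commit -q -m "Initial commit"

git log --oneline                        # short hash + subject, one line per commit
commits=$(git rev-list --count HEAD)
subject=$(git log -1 --format=%s)
author=$(git log -1 --format='%an <%ae>')
echo "$commits commit(s); last: $subject by $author"
```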
To remove a file from Git, we have to remove it from our tracked files (more accurately, remove it from our staging area) and then commit.
We can use the git rm command to delete the file from our working tree and record the deletion of the file in the staging area.
$ touch rm_file
$ git add .
$ git commit -m "to_be_removed"
[master 1bb66ef] to_be_removed
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 rm_file
$ git rm rm_file
rm 'rm_file'
$ git commit -m "removing rm_file"
[master 92c1778] removing rm_file
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 rm_file
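A minimal sketch of the same sequence in a throwaway repository, assuming a POSIX shell and Git on the PATH, which also checks that the deleted file is still reachable in history. If we only want Git to stop tracking a file but keep it on disk, git rm --cached removes it from the staging area only.

```shell
#!/bin/sh
# Sketch: remove a tracked file and confirm history still has it.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k"
git config user.email "k@bogotobogo.com"
touch rm_file
git add .
git commit -q -m "to_be_removed"
git rm -q rm_file                        # delete from working tree and stage the deletion
git commit -q -m "removing rm_file"

git log --oneline --name-status          # D rm_file in the newer commit, A rm_file in the older
```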
How does Git do branching?
How does Git store its data?
Git doesn't store data as a series of changesets or deltas, but instead as a series of snapshots.
When we commit in Git, Git stores a commit object that contains a pointer to the snapshot of the content we staged, the author and message metadata, and zero or more pointers to the commit or commits that were the direct parents of this commit: zero parents for the first commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.
To see how Git does branching, let's take a directory containing three files, stage them all and commit. Staging the files computes a checksum of each one, stores that version of the file in the Git repository (Git calls these blobs), and adds that checksum to the staging area:
$ echo "README" >> README
$ echo "test.rb" >> test.rb
$ echo "LICENSE" >> LICENSE
$ git add README test.rb LICENSE
$ git commit -m 'initial commit'
Running git commit checksums all project directories and stores them as tree objects in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree object so it can re-create that snapshot when needed.
Our Git repository now contains five objects: one blob for the contents of each of our three files, one tree that lists the contents of the directory and specifies which file names are stored as which blobs, and one commit with the pointer to that root tree and all the commit metadata. Conceptually, the data in our Git repository looks something like this:
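The five objects can be listed in a scratch repository. A minimal sketch, assuming a POSIX shell and Git on the PATH: git rev-list --objects --all prints every object reachable from the branches, and git cat-file -p shows what a commit and a tree contain.

```shell
#!/bin/sh
# Sketch: one commit over three files = 1 commit + 1 tree + 3 blobs.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "README" >> README
echo "test.rb" >> test.rb
echo "LICENSE" >> LICENSE
git add README test.rb LICENSE
git commit -q -m 'initial commit'

git rev-list --objects --all             # one line per reachable object
objects=$(git rev-list --objects --all | wc -l | tr -d ' ')
echo "reachable objects: $objects"

git cat-file -p HEAD                     # commit: tree pointer + author/message metadata
git cat-file -p 'HEAD^{tree}'            # tree: file names mapped to blob hashes
```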
If we make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it.
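The parent pointer is visible in the raw commit object: the root commit has no parent line, the next commit has exactly one. A minimal sketch, assuming a POSIX shell and Git on the PATH; the file contents v1/v2 and the commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: each new commit records the commit before it as its parent.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "v1" > test.rb
git add test.rb
git commit -q -m 'first'
first=$(git rev-parse HEAD)
echo "v2" >> test.rb
git commit -q -a -m 'second'

git cat-file -p HEAD                     # note the "parent <hash>" line
parent=$(git rev-parse 'HEAD^')          # HEAD^ = first parent of HEAD
root_parents=$(git cat-file -p "$first" | grep -c '^parent' || true)
echo "parent of HEAD: $parent"
echo "parent lines in root commit: $root_parents"
```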
After two more commits, our history might look something like the picture below:
A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master. As we initially make commits, we're given a master branch that points to the last commit we made. Every time we commit, it moves forward automatically.
What happens if we create a new branch?
Doing so creates a new pointer for us to move around. Let's say we create a new branch called testing. We do this with the git branch command:
$ git branch testing
This creates a new pointer at the same commit we're currently on.
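A minimal sketch, assuming a POSIX shell and Git on the PATH: right after git branch testing, both branch names resolve to the same commit hash and no new commit exists. The git symbolic-ref line only pins the initial branch name to master, since newer Git may be configured to use another default.

```shell
#!/bin/sh
# Sketch: a new branch is just another name for the current commit.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used in the text
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "test.rb" > test.rb
git add test.rb
git commit -q -m 'initial commit'

git branch testing
git show-ref --heads                     # both refs list the same commit hash
master_tip=$(git rev-parse master)
testing_tip=$(git rev-parse testing)
```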
How does Git know what branch we're currently on?
It keeps a special pointer called HEAD. Note that this is very different from the concept of HEAD in other VCSs we may be used to, such as Subversion or CVS.
In Git, this is a pointer to the local branch we're currently on. In this case, we're still on master. The git branch command only created a new branch - it didn't switch to that branch as shown in the picture below.
The HEAD file (.git/HEAD) still points to master, the branch we're on.
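A minimal sketch, assuming a POSIX shell and Git on the PATH, that reads HEAD after creating testing: it still names master. The git symbolic-ref HEAD refs/heads/master line only pins the initial branch name used in the text.

```shell
#!/bin/sh
# Sketch: git branch creates a branch but leaves HEAD where it was.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used in the text
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "test.rb" > test.rb
git add test.rb
git commit -q -m 'initial commit'
git branch testing

git branch                               # "*" marks the branch HEAD points to
cat .git/HEAD                            # ref: refs/heads/master
current=$(git symbolic-ref --short HEAD)
echo "HEAD -> $current"
```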
To switch to an existing branch, we run the git checkout command. Let's switch to the new testing branch:
$ git checkout testing
Switched to branch 'testing'
This moves HEAD to point to the testing branch.
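A minimal sketch, assuming a POSIX shell and Git on the PATH, showing that git checkout rewrites HEAD to name the other branch while the commit stays the same. Git 2.23 and later also offer git switch testing for the same job.

```shell
#!/bin/sh
# Sketch: checkout moves HEAD to another branch.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used in the text
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "test.rb" > test.rb
git add test.rb
git commit -q -m 'initial commit'
git branch testing

git checkout -q testing
current=$(git symbolic-ref --short HEAD)
echo "HEAD -> $current"
```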
What is the significance of that?
Well, let's do another commit:
$ vim test.rb
$ git commit -a -m 'made a change'
The picture below shows the outcome:
This is interesting, because now our testing branch has moved forward, but our master branch still points to the commit we were on when we ran git checkout to switch branches. Let's switch back to the master branch:
$ git checkout master
Switched to branch 'master'
We can see the result from the picture below:
That command did two things. It moved the HEAD pointer back to point to the master branch, and it reverted the files in our working directory back to the snapshot that master points to. This also means the changes we make from this point forward will diverge from an older version of the project. It essentially rewinds the work we've done in our testing branch temporarily so we can go in a different direction.
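Both effects can be checked in a scratch repository: the commit on testing moves only testing, and switching back to master restores master's version of test.rb. A minimal sketch, assuming a POSIX shell and Git on the PATH; echo stands in for editing test.rb in vim, and git checkout -b (create and switch in one step) replaces the separate branch and checkout commands.

```shell
#!/bin/sh
# Sketch: commit on testing, then switch back to master.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used in the text
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "v1" > test.rb
git add test.rb
git commit -q -m 'initial commit'

git checkout -q -b testing               # create testing and switch to it
echo "v2" >> test.rb                     # stands in for editing test.rb in vim
git commit -q -a -m 'made a change'

git checkout -q master
cat test.rb                              # master's snapshot: only "v1"
content=$(cat test.rb)
```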
Let's make a few changes and commit again:
$ vim test.rb
$ git commit -a -m 'made other changes'
Now our project history has diverged as shown in the picture below. We created and switched to a branch, did some work on it, and then switched back to our main branch and did other work. Both of those changes are isolated in separate branches: we can switch back and forth between the branches and merge them together when we're ready. And we did all that with simple branch and checkout commands.
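git log --graph --all draws the split, and git merge-base finds the commit where the two branches diverged. A minimal sketch reproducing this history in a throwaway repository, assuming a POSIX shell and Git on the PATH; echo stands in for the vim edits.

```shell
#!/bin/sh
# Sketch: two branches that each gained a commit after a shared start.
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the branch name used in the text
git config user.name "k"
git config user.email "k@bogotobogo.com"
echo "v1" > test.rb
git add test.rb
git commit -q -m 'initial commit'

git checkout -q -b testing
echo "testing work" >> test.rb
git commit -q -a -m 'made a change'
git checkout -q master
echo "master work" >> test.rb
git commit -q -a -m 'made other changes'

git log --oneline --graph --all          # two tips sharing one ancestor
base=$(git merge-base master testing)    # where the histories split
root=$(git rev-list --max-parents=0 master)
```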
This branch section is largely based on Git Branching - What a Branch Is.