The Git version control system abstracts code changes in a repository by storing those changes inside of a Directed Acyclic Graph. While this abstraction might sound complex, it is actually an elegant way to visualize the history of a project and it gives us powerful tools to manipulate that history. Once a Git user starts visualizing the graph they are building while running the Git commands, they are able to manipulate the history of the code in complex ways. Let’s break down what a Directed Acyclic Graph is and how it is used by Git to build out the history.
A Graph is a set of objects (vertices) and the relationships between those objects (edges), and it is usually drawn as a picture. In Git the vertices represent a change in the code and the edges represent the order in which those changes happened. Here is a simple graph with 5 vertices, labeled v1 through v5, and 5 edges between those vertices:
A Directed Graph is a graph where each edge has a direction. For example, if we were graphing out a family tree we would use a directed graph to show a parent-child relationship. In the Git world the direction shows that one commit is ‘on top’ of another commit, meaning any code that changed in commit v2 happened after the code change in commit v1. Because each commit records which commit it sits on top of, Git can always work out the difference between the two commits. Here is an example graph as a Directed Graph:
A Directed Acyclic Graph is a directed graph whose edges never form a cycle. Our above example is not Acyclic because you are able to travel in a cycle from v2 → v3 → v5 → v4 → v2. Git’s graph needs to be acyclic because Git is representing the history of a code base. A circular path in the graph would mean that a commit comes both before and after itself, and we would have a Back to the Future situation on our hands. So Git doesn’t allow this and all of Git’s Graphs are Acyclic. To fix the above graph we need to change the direction of one of the edges causing this cycle. Let’s change the edge between v2 and v4.
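If you want to check a small graph like this without drawing it, the standard `tsort` utility can help: it reads edges as "from to" pairs and complains on standard error when the input contains a loop. The sketch below is only an illustration. The edge lists are my reading of the description above (including the assumption that the first edge runs v1 → v2), and `check_graph` is a hypothetical helper name.

```shell
# Edges are written as "from to" pairs, one pair per line.
# Before the fix: the edge between v2 and v4 runs v4 -> v2, closing the cycle.
cyclic_edges='v1 v2
v2 v3
v3 v5
v5 v4
v4 v2'

# After the fix: the same edge is flipped so it runs v2 -> v4.
fixed_edges='v1 v2
v2 v3
v3 v5
v5 v4
v2 v4'

# Hypothetical helper: prints "acyclic" or "has a cycle" for an edge list.
check_graph() {
    # Keep only what tsort writes to stderr; the sorted output is discarded.
    errors=$(printf '%s\n' "$1" | tsort 2>&1 >/dev/null)
    if [ -z "$errors" ]; then
        echo "acyclic"
    else
        echo "has a cycle"
    fi
}

echo "before the fix: $(check_graph "$cyclic_edges")"
echo "after the fix:  $(check_graph "$fixed_edges")"
```

The exact wording of the loop warning differs between `tsort` implementations, which is why the helper only tests whether anything was written to standard error.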
And that’s it! We now know what a Directed Acyclic Graph is, woot! Let’s try and use that knowledge to run a few Git commands to reproduce different graphs.
Merging Two Branches
We can start off by trying to reproduce the above graph. Here are the steps we need to follow:
- Initialize a Git project
- Create commits v1 and v2 on the current branch (the default branch, master in this walkthrough)
- Create a new branch for commit v3
- Jump back to our initial branch and create commit v4
- Bring it all back together with a merge commit v5
Here are the commands to accomplish this:
    git init
    echo "First File Content" >> first_file.txt
    git add first_file.txt
    git commit -m 'v1'
    echo "second master commit" >> first_file.txt
    git add first_file.txt
    git commit -m 'v2'
    git checkout -b new_branch_for_v3
    echo "on new_branch_for_v3 branch" >> second_file.txt
    git add second_file.txt
    git commit -m 'v3'
    git checkout master
    echo "3rd master commit" >> first_file.txt
    git add first_file.txt
    git commit -m 'v4'
    git merge new_branch_for_v3 -m 'v5'
We can look at what we have done with the git log --graph --oneline --decorate command.
And there we have it! We reproduced our sample graph using Git commands. A few interesting takeaways about this image are 1) each commit has a sha, a unique string of numbers and letters, and 2) HEAD points to master, meaning that our currently checked out branch and master are at the same spot. I really like the git log --graph --oneline --decorate command because it helps me visually understand the history of the repository, so I have a Git alias for it.
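To see the edges behind that picture, we can ask Git which parents each commit points to. The following is a rough sketch rather than a definitive recipe: it rebuilds the same history in a throwaway directory, uses a placeholder identity so the commits succeed on an unconfigured machine, and defines a repository-local alias. The alias name `graph` is my own assumption for illustration, not the name used by the author.

```shell
# Work in a throwaway directory so no existing repository is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

# Assumed placeholder identity so commits work without any Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master

echo "First File Content" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v1'
echo "second master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v2'

git checkout --quiet -b new_branch_for_v3
echo "on new_branch_for_v3 branch" >> second_file.txt
git add second_file.txt && git commit --quiet -m 'v3'

git checkout --quiet master
echo "3rd master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v4'
git merge --quiet -m 'v5' new_branch_for_v3

# %s is the subject, %h the abbreviated sha, %p the abbreviated parent shas.
git log --format='%s  commit=%h  parents=%p'

# rev-list --parents prints "sha parent1 parent2 ...", so words minus one = parents.
merge_parents=$(git rev-list --parents -n 1 HEAD | wc -w)
merge_parents=$((merge_parents - 1))
echo "v5 has $merge_parents parents"

# Hypothetical alias name, stored only in this repository's config.
git config alias.graph 'log --graph --oneline --decorate'
git graph
```

The merge commit v5 is the only commit that lists two parents, which is exactly where the two lines of the drawn graph join back together.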
Rebase Two Branches
Now that we understand the basics of the graph that is being drawn we can ask interesting questions like: What will happen to our graph if, instead of merging in the last step, we were to rebase?
    git init
    echo "First File Content" >> first_file.txt
    git add first_file.txt
    git commit -m 'v1'
    echo "second master commit" >> first_file.txt
    git add first_file.txt
    git commit -m 'v2'
    git checkout -b new_branch_for_v3
    echo "on new_branch_for_v3 branch" >> second_file.txt
    git add second_file.txt
    git commit -m 'v3'
    git checkout master
    echo "3rd master commit" >> first_file.txt
    git add first_file.txt
    git commit -m 'v4'
    git rebase new_branch_for_v3
This time let’s look at the git log before and after we run the rebase.
There are a few interesting things about the above image:
- We didn’t end up with 5 commits - we only have v1 through v4. Rebasing doesn’t make an extra merge commit.
- The graph never branches out. After having rebased, the history of the extra branch is gone.
- The sha for commit v4 has changed. Since v4 is now based off of v3 instead of v2, the content of that commit had to change, so effectively it is a new commit with a new sha.
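These three observations can also be checked from the command line instead of read off the picture. Here is a minimal sketch, assuming a throwaway directory and a placeholder commit identity; the variable names `before` and `after` are mine.

```shell
# Work in a throwaway directory so no existing repository is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

# Assumed placeholder identity so commits work without any Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master

echo "First File Content" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v1'
echo "second master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v2'

git checkout --quiet -b new_branch_for_v3
echo "on new_branch_for_v3 branch" >> second_file.txt
git add second_file.txt && git commit --quiet -m 'v3'

git checkout --quiet master
echo "3rd master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v4'

before=$(git rev-parse HEAD)   # sha of v4 while it still sits on top of v2
git rebase --quiet new_branch_for_v3
after=$(git rev-parse HEAD)    # sha of v4 once it sits on top of v3

echo "v4 before rebase: $before"
echo "v4 after rebase:  $after"
echo "commits on master: $(git rev-list --count HEAD)"
echo "merge commits:     $(git rev-list --count --merges HEAD)"

# One subject per line, newest first; a straight line with no branching.
git log --format='%s'
```

The two shas differ even though the change to first_file.txt is the same, because a commit's sha also covers which parent it points to.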
We can visualize the rebase process by drawing out the graphs that made this happen.
Here is the graph before we run the git rebase command:
Here is the rebase in action:
Here is what we are left with once the rebase is over.
Merge vs Rebase
Now that we have analyzed the graphs generated by the two different Git commands we can come up with some key differences between them. Merge keeps the branching in history and records when a branch diverged and when it was merged back in. Merge never changes history; it only adds new commits to the graph. Rebase, on the other hand, rewrites history as if we never branched. Rewriting history requires the graph to be rewound and played forward, making new shas along the way. This means that if you push changes that have been rebased, the push will affect anyone else working on that branch, since their copy of the history will no longer match.
Most teams don’t allow rebased changes to be pushed to their main line branches, but they will encourage rebases on ‘feature’ branches. Doing so allows the Git history to be a clean set of merges for different features, while the extra clutter is rebased away.
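One way to make the "adds commits" versus "rewrites history" difference concrete is to ask whether the old tip of master is still an ancestor of the result. The sketch below tries both options side by side in a throwaway repository. The branch names `merged` and `rebased` and the helper `report_old_tip` are hypothetical names chosen for this illustration, and the commit identity is a placeholder.

```shell
# Work in a throwaway directory so no existing repository is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

# Assumed placeholder identity so commits work without any Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is master

echo "First File Content" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v1'
echo "second master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v2'

git checkout --quiet -b new_branch_for_v3
echo "on new_branch_for_v3 branch" >> second_file.txt
git add second_file.txt && git commit --quiet -m 'v3'

git checkout --quiet master
echo "3rd master commit" >> first_file.txt
git add first_file.txt && git commit --quiet -m 'v4'

old_tip=$(git rev-parse master)   # v4 as a teammate might already have it

# Option 1: merge, on a copy of master.
git checkout --quiet -b merged master
git merge --quiet -m 'v5' new_branch_for_v3

# Option 2: rebase, on another copy of master.
git checkout --quiet -b rebased master
git rebase --quiet new_branch_for_v3

# Hypothetical helper: is the old tip still part of the branch's history?
report_old_tip() {
    if git merge-base --is-ancestor "$old_tip" "$1"; then
        echo "$1: old v4 is still in the history"
    else
        echo "$1: old v4 has been replaced"
    fi
}

report_old_tip merged
report_old_tip rebased
```

On the merged branch the old v4 is still reachable, so a teammate's copy simply falls behind. On the rebased branch it has been replaced by a new commit, which is why pushing it disrupts everyone who still has the old one.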