Every time I see someone teach git, they use confusing diagrams which just don’t help. At all.
Here’s my take on git, without the stupid diagrams that make no sense.
Git is a tool created by Linus Torvalds to manage different versions of the linux kernel source code. It has since become the most used version control system in the world.
The most core aspect of git is called a commit. A commit is a series of changes to some code, which can be analogous to a videogame save file. Essentially, a single commit stores a series of changes and a parent commit. The parent commit is the previous commit made, thus through a chain of parent commits, you can store every sequence of changes made to get to a specific version of code. Thus, in a lot of situations, a git commit acts like a save file, which you can go back to at any point in time.
You can create a git “repository” in a directory by running git init
. More on
that later, including what a repository is. For now, just know you need to do it
before you can start working with git for a project.
The other useful command you might want is git status
. This allows you to see
a lot of useful information about the current state of your local git
repository, including changes to the “working tree” (the stuff tracked by git).
One last thing is to read through git’s output. It often contains useful information and hints on what you might want to do.
To create a commit, simply run git commit
in the shell. You may wish to
specify a number of command line flags, such as -m
which will let you specify
a message from the command line instead of being dropped into vim.
By default, git won’t commit any changes when you run git commit
. This is
because you need to specify which changes you want to commit. You do this with
the git add
command. For example, to add every change in a file called
README.md
, run git add README.md
At this point, I recommend you open up a terminal and make a few commits. These will be useful for the rest of this tutorial.
Remember, if you’re having trouble, don’t hesitate to google things and consult the git man pages.
Let’s say you want to go back in time to a previous commit. This is where the
git reset
command comes in. You simply provide it with enough of the commit
hash to uniquely identify whatever commit you want to go back to and git will do
the rest.
But what is a commit hash and how do I get it? A commit hash is a series of
hexadecimal characters which can uniquely identify any commit ever created using
the magic of cryptography. You can find a commit hash via a couple different
ways, however the main one is git log
. If you run git log
it will open up
the less
command with a history of everything. You can instead pipe git log
into other tools like head
or use the -n
flag to reduce the number of
commits shown.
Next to each commit will be a long string of seemingly random letters and numbers. This string is your commit hash. Now go back and read this section again.
Another useful tool for finding commits is git bisect
. I suggest you do your
own research on this as I will not explain it here. Bisecting can be useful if
you don’t know which commit broke something and you want to find said commit.
Using git reset
won’t always be helpful, as there are certain issues with time
travel. Those mainly being that it’s rather hard to communicate back and forth
with someone in a different timeline. That’s where git revert
comes in. Git
revert allows you to undo the changes in one commit with another commit. This
ensures time only travels forwards. It’s like building a house and then
demolishing it instead of travelling back it time to before it was built. This
ensures everyone stays on the same timeline.
You may also want to use git revert
if you have a few commits in between the
one you are undoing and the current git head (a shorthand for the latest
commit).
Sometimes you will also end up with a situation where git log
isn’t enough.
I’d encourage you to also look at git reflog
for these niche situations.
Before we can properly introduce remotes (i.e. cloud storage for git), we first need the concept of branches.
Going back to our analogies of timelines and videogame saves, a branch is like going back in time to some previous save and continuing development in a different path, thus creating effectively a new timeline with it’s own unique set of saves.
Git has this idea of “merging” two branches, which means changing the state of any tracked files (i.e. the worlds of the two timelines you you are combining) so that they become the same.
Let’s say in one timeline someone builds a house with a pool and a shed and in another timeline someone builds a mansion in that same spot. Merging would be the process of finding some common point between the house and the mansion. This could be just getting rid of the mansion and building an identical house and pool, or it could be to remove the shed and build a wing, doing the equivalent to the mansion to end up with a smaller mansion that also has a pool.
One thing to note about git is that in a new git repository created on the command line, the default branch is named “master”. This can of course be configured, and due to various reasons many people use an alternative such as “main” or “trunk” instead. A number of online git providers such as github and gitlab also use one of these alternative names. There is nothing wrong with any of these options (despite what some people online might tell you), but you should be aware that this discrepancy does exist.
The git commands related to dealing with branches are as follows:
git checkout
git branch
git merge
There are also a couple of other commands that I won’t cover, those being
git rebase
and git cherry-pick
. I encourage you to research these on your
own, as they are both very useful tools.The checkout subcommand allows you to switch branches. As a byproduct, it can also be used to create a new branch from the current commit and switch to it.
The branch subcommand allows you to list, create and delete branches. You can
list branches with the --list
option. Listing is also the default option, so
you can list branches with just git branch
. Creating can be done with the -c
and -C
flags and deleting with the -d
and -D
flags. Have a read through
the man pages if you want to learn more.
The merge subcommand allows you to merge changes from one branch into the current branch. You may have to deal with merge conflicts, which occur when git cannot automatically deal with the changes, such as if the same piece of code has been altered differently, such as in the mansion and house analogy above.
If, continuing with the analogy, a public park was built in one timeline and a house was built in another, git may be able to automatically merge these changes if the public park and the house weren’t built in the same place.
It is at this point that I suggest you go back to your command line and tinker around with your previous commits. Try adding a new branch. Try merging it with another one. Try inducing a merge conflict and dealing with that. Once you feel you are familiar with this process, come back and continue on to the next section.
Keep in mind that the git man pages and google are your friend. If something hasn’t clicked, have another read over it here. If it still doesn’t make sense, ask google or read through the relevant sections in the git man pages.
One of the great advantages of the way git is architected is that it allows for people to collaborate on work without any kind of real time synchronisation, which can be annoying if someone doesn’t have internet, a pain if someone is using a text editor which doesn’t support it and expensive if someone has a slow computer.
Git has a concept of remotes. Remotes are branches that aren’t stored locally. They can be stored somewhere else on your computer if you have the same “repository” twice (keeping in mind that a git repository is an instance of a codebase tracked by git), or somewhere on your network or the internet. Really anywhere that can store files can also store a remote git repository.
I also won’t continue with the analogies, because remotes are another layer on
top of the core git functionality and so the analogies won’t help. If you are
still struggling to understand something, it may be because some of the previous
parts of this document don’t make sense. Have another read over it and maybe try
google or the git man pages (see man git
). It is also possible this tutorial
is simply not for you and you would be better off with another one. I strongly
encourage you to try things and use every resource you have available to you.
An “upstream” repository is a repository whose primary purpose is to serve as a hub for changes, which will then trickle down to every other repository. This means any changes “upstream” will eventually be reflected “downstream” on your local machine.
I won’t go over setting up a remote, as most of the time you will be cloning a
git repository with git clone
, which will automatically set the cloned url as
an upstream repository, however, as usual, google is your friend, as are the git
man pages.
There is usually only one remote, and it is usually called “origin”, but this may not always be the case.
The important thing to know about remotes is that they have their own set of
branches. These branches are kept in sync with git pull
and git push
. There
is also a git fetch
command, but I won’t go over it here.
The git pull
command will download any changes on the remote repository and
pull them into the current branch. This may require dealing with merge
conflicts. There are a few useful options, including enabling pull.rebase
with
git config
or running git pull --rebase
to avoid dealing with many merge
issues. I won’t go into these here, however I suggest you do your own research
on the git rebase
subcommand.
The git push
command will upload any changes in the local repository to the
remote one. It is important to note that this only affects commits and not the
“working tree”. The working tree is the current state of code in the repository,
regardless of if it has been committed or not, hence the term “working”. The
“tree” part refers to the tree-like structure of directories.
There is one thing I haven’t mentioned. Each repository has a number of
branches. Git doesn’t know how to correlate these branches together, as they
could have different remote repositories they need to connect to. In order to
tell git which branches should go where, git push
and git pull
take some
extra arguments. The first one is the name of the remote or the remote branch to
push to. The second is the local branch to push changes from. Hopefully the
following examples will clear things up a bit.
git push origin master
git pull origin master
To make life easier, git can store upstream remotes for a branch. This can
either be done with git branch
or while pushing with the --set-upstream
flag
as follows:
# When creating a branch
git branch -c my-branch --track=inherit
# Leaving out my-branch will use the current branch
git branch --set-upstream-to=origin my-branch
git branch -u origin my-branch
# While pushing
git push --set-upstream origin master
git push -u origin master
Keeping in mind that the -u
option is just shorthand for the relevant longer
form option.
Once this has been done, future pushes and pulls will use the previously set
upstream and you can simply use git push
.
I strongly encourage you to set up a remote, either locally on using a website
such as github or gitlab. There
are a number of good tutorials online on how to use these websites. The git man
pages for git
and git clone
have good information on the different ways to
clone a repository, including locally.
Now start playing around and seeing whats possible. Please don’t do this on an actual project until you are comfortable with git, as you may end up in a state you don’t know out to get out of.
The last thing to mention about working with remote repositories is the idea of “forks”. Forks are a dedicated branch, which usually serves as an upstream for developers, however are typically downstream of the original project. Typically forks are merged back into the upstream repository at some point using what is known as a “pull request”.
The primary purpose of forks is to enable a developer to work on their own remote copy of a project, with all the benefits of having said remote, without allowing them direct write access to the original project’s source code. Their changes are then discussed and reviewed in a pull request, which may be merged back into the original project. This enables a developer to create a feature or fix a bug in their own fork, before sending the changes back upstream to the original project.
There are also hard forks, which are effectively just a copy of the codebase and are not considered to be downstream of the original project. Hard forks are not intended to be merged into the original project and are instead used for creating a new project based on the original.
This tutorial is missing a few crucial aspects of git which I have left out intentionally.
The first is working with git cloud providers[^1]. This is left out because this tutorial is about git, not a specific git infrastructure provider.
The second thing is configuring git. There are a lot of options you can
configure with git using the git config
subcommand. I haven’t covered these
because I believe they are up to the individual git user and are subjective.
These config options allow you to change the look and feel of git as well as the
default behaviour of commands like git status
and git pull
. There are a lot
of options and I’d recommend going through the man pages for a list.
I’ve also not covered git diff
. This is a very useful command which allows you
to view the changes made by one or more commits and/or the working tree.
Lastly, I haven’t covered editor integrations. This is because git is text editor agnostic, and therefore this tutorial should be too. For anyone who uses vim or a derivative thereof, I can strongly suggest tpope’s fugitive vim plugin for working with git inside the editor.
There are a few commands which are very useful but I haven’t properly covered here. If you want to learn more on your own, here’s a list:
git bisect
git reflog
git rebase
git cherry-pick
git fetch
git diff
This page is Copyright (C) Cyuria 2024.