Turning Source Code into a Program

Before diving into Makefiles, let's briefly cover how source code gets turned into an actual program that can run on a computer. Source code consists of a set of files and folders that contain code. This source code usually needs to be converted into a form that the computer can understand. This process is called compilation or compiling. A program that performs this conversion is called a compiler.

Sometimes the compiler needs to be given certain pieces of information so it can properly do its job. This information may include:

  1. The names and locations of the source code (input) files to compile
  2. The set of compiled (output) programs to create
  3. The names and locations to put the compiled (output) programs
  4. Whether or not to apply any special options in the compilation process

The process of choosing a compiler, identifying the set of source code files to be included, performing preparation steps, and compiling the code into its final form is called building, or the build process.

What is Make?

Make is a build automation tool. It would be very tedious for a developer to manually run all of the build steps in sequence each time they want to build their program. Build automation tools like Make allow developers to describe the build steps and execute them all at once.

What is a Makefile?

Makefiles are text files that developers use to describe the build process for their programs. The make command can then be used to conveniently run the instructions in the Makefile.
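As a minimal sketch of what such a file can look like, the hypothetical Makefile below has nothing to do with Git. It defines a single target, greeting.txt, and the shell command that creates it. The file name and message are made up for illustration. Note that the command line must begin with a tab character, not spaces.

```makefile
# Hypothetical example, not part of Git's Makefile.
# Running `make` in this directory creates greeting.txt.
greeting.txt:
	echo "Hello from Make" > greeting.txt
```

Running make a second time does nothing, because Make sees that greeting.txt already exists and considers it up to date.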

Baby Git Makefile

Below is the original Makefile for Git. It is used to invoke the gcc C compiler to build binary executable files for each of the original 7 git commands:

  1. init-db
  2. update-cache
  3. cat-file
  4. show-diff
  5. write-tree
  6. read-tree
  7. commit-tree

This Makefile can be invoked in 3 variations (referred to as 3 targets), by running the 3 following commands from the command line inside the same directory as the Makefile:

  1. make clean: This removes all previously built executables and build files from the working directory.
  2. make backup: This first runs make clean and then backs up the current directory into a tar archive.
  3. make: This builds the codebase and creates the 7 git executables.

Enough talk - here is the code from Git's first Makefile:

CFLAGS=-g # The `-g` compiler flag tells gcc to generate source-level debug information.
CC=gcc # Use the `gcc` C compiler.

# Specify the names of all executables to make.
PROG=update-cache show-diff init-db write-tree read-tree commit-tree cat-file

all: $(PROG)

install: $(PROG)
	install $(PROG) $(HOME)/bin/

# Include the following dependencies in the build.
LIBS= -lssl

# Specify which compiled output (.o files) to use for each executable.
init-db: init-db.o

update-cache: update-cache.o read-cache.o
	$(CC) $(CFLAGS) -o update-cache update-cache.o read-cache.o $(LIBS)

show-diff: show-diff.o read-cache.o
	$(CC) $(CFLAGS) -o show-diff show-diff.o read-cache.o $(LIBS)

write-tree: write-tree.o read-cache.o
	$(CC) $(CFLAGS) -o write-tree write-tree.o read-cache.o $(LIBS)

read-tree: read-tree.o read-cache.o
	$(CC) $(CFLAGS) -o read-tree read-tree.o read-cache.o $(LIBS)

commit-tree: commit-tree.o read-cache.o
	$(CC) $(CFLAGS) -o commit-tree commit-tree.o read-cache.o $(LIBS)

cat-file: cat-file.o read-cache.o
	$(CC) $(CFLAGS) -o cat-file cat-file.o read-cache.o $(LIBS)

# Specify which C header files to include in compilation/linking.
read-cache.o: cache.h
show-diff.o: cache.h

# Define the steps to run during the `make clean` command.
clean:
	rm -f *.o $(PROG) temp_git_file_* # Remove these files from the current directory.

# Define the steps to run during the `make backup` command.
backup: clean
	cd .. ; tar czvf babygit.tar.gz baby-git # Backup the current directory into a tar archive.

Build Variables

Build variables are variables that can be defined in the Makefile to hold specific values. In the Makefile above, names such as PROG and LIBS are not special in any way. They are just variable names used to store the values that come after the equals sign. CFLAGS and CC are ordinary variables too, but their names are conventional: Make's built-in rules for compiling C code use $(CC) as the compiler and $(CFLAGS) as its flags. A variable reference like $(CFLAGS) can be used later in the Makefile to substitute in the variable's value where needed. This is convenient since we can use a variable in multiple places, while only updating it in one place if the value changes.
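To make the substitution concrete, here is a small sketch that is separate from Git's Makefile. The variable GREETING and the target show are hypothetical names chosen for this example. The leading @ on each command only stops Make from printing the command before running it.

```makefile
# Hypothetical variable, defined once and referenced twice below.
GREETING = Hello

show:
	@echo "$(GREETING), world"
	@echo "$(GREETING) again"
```

Running make show prints two lines that both start with Hello. Changing the one assignment changes both lines, and a value can also be supplied for a single run from the command line, as in make show GREETING=Goodbye.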

Specifying the Compiler

Git is written in C, so this Makefile is tailored to a C build process.

The first line, CFLAGS=-g, specifies the compiler flags (special compiler options) to use during compilation. In this case, the -g flag tells the compiler to include source-level debug information in the compiled output, which debuggers such as gdb can then use.

The second line, CC=gcc, identifies the actual compiler to use. GCC is the GNU Compiler Collection. It supports compilation of code in several programming languages including C, C++, Fortran, and more.

Specifying the Executables

The third line defines a build variable called PROG which contains the names of the executables we'll be creating.

Linking External Libraries

We'll quickly skip ahead to the line which defines the LIBS variable. This stores the external libraries that we want to link into the executables. In this case, -lssl links in the OpenSSL library, which gives Git access to cryptographic functions such as SHA-1 hashing.

Make Targets and Commands

Throughout the Makefile, there are multiple lines that start with a name followed by a colon, such as all:, install:, init-db:, etc. Each of these names is called a target. Any names listed after the colon are the target's prerequisites, and the tab-indented lines below it are the commands used to build it. Each target can be named when running Make, in the form make target.

For example, if you open a terminal window and browse to this Makefile's directory, you could run the make all command to run Make on the all target. Similarly, you could run make install to run Make on the install target. If no target is specified, Make uses the first target in the Makefile, which in this case is all.

When Make runs a target, it executes the instructions associated with that target in the Makefile.
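The sketch below shows how a target name on the command line selects which instructions run. The target names first and second are hypothetical and exist only for this example.

```makefile
# Hypothetical targets for illustration only.
first:
	@echo "running the first target"

second:
	@echo "running the second target"
```

Running make second executes only the second target's command. Running plain make picks the first target that appears in the file, which is why all is placed ahead of the other targets in Git's Makefile.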

The All Target

Back to the Makefile, the all: $(PROG) line states that the all target depends on every target listed in $(PROG). The all target has no commands of its own, so running make all (or just make, since all is the first target) simply builds each of its prerequisites. Since $(PROG) lists all 7 of the Baby Git executables, each of them will be built.
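The same pattern can be tried in isolation. In this sketch, alpha and beta are hypothetical stand-ins for real executables, and echo stands in for the compiler.

```makefile
# Hypothetical stand-ins for real executables.
PROG = alpha beta

# `all` has no commands of its own; it only lists prerequisites.
all: $(PROG)

alpha:
	echo "built alpha" > alpha

beta:
	echo "built beta" > beta
```

Running make here creates both files, because Make has to bring every prerequisite of all up to date before all itself is considered done.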

The Install Target

The next target in the Makefile is install. It is run at the command line using make install. This starts the same way as the all target, by listing the executables in $(PROG) as prerequisites so that they are built first. But then it uses the install command to copy those built executables into the bin directory inside the user's home directory.
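Here is a small sketch of an install-style target that can be run without touching the home directory. The program name hello and the destination DEST are assumptions made for this example. Unlike the original, it first runs install -d to create the destination directory in case it does not exist yet.

```makefile
# Hypothetical program and destination, for illustration only.
PROG = hello
DEST = ./demo-bin

# Stand-in for a compiled executable: a one-line shell script.
hello:
	echo 'echo hello' > hello

install: $(PROG)
	install -d $(DEST)
	install $(PROG) $(DEST)/
```

Running make install builds hello first, because it is a prerequisite, and then copies it into demo-bin. The original file stays where it was built, since install copies rather than moves.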

Baby Git Program Targets

Now for the targets corresponding to the executable names:

  • init-db:
  • update-cache:
  • show-diff:
  • write-tree:
  • read-tree:
  • commit-tree:
  • cat-file:

Each one of these targets specifies which compiled C object (.o) files we want in each of our executables. Below that, every target except init-db specifies the compiler command to run, based on the build variables specified earlier in the file.

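The first executable, init-db, is very simple since it is built from only 1 source file: init-db: init-db.o. No command is given for it, so Make falls back on its built-in rules, which know how to compile a .c file into a .o file and how to link a single .o file into an executable of the same name.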
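A rule with no command, like the one for init-db, relies on the rules built into GNU Make. The sketch below reproduces that situation with a hypothetical program named hello. So that the example is self-contained, hello.c is generated by the Makefile itself. It assumes gcc is installed.

```makefile
# Hypothetical single-file program, for illustration only.
CC = gcc
CFLAGS = -g

# No command here: Make's built-in rules compile hello.c into hello.o
# and then link hello.o into hello.
hello: hello.o

# Generate a trivial source file so the example needs no other files.
hello.c:
	printf 'int main(void) { return 0; }\n' > hello.c
```

Running make should show a compile step that uses $(CC) with $(CFLAGS) and the -c option, followed by a link step, even though neither command appears in the file.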

The other executables (we'll take update-cache as an example) link together multiple C object (.o) files:

update-cache: update-cache.o read-cache.o
     $(CC) $(CFLAGS) -o update-cache update-cache.o read-cache.o $(LIBS)

The second line above gets converted to the following after variable substitution:

gcc -g -o update-cache update-cache.o read-cache.o -lssl
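The original rules repeat the target and object file names inside each command. Make also provides automatic variables that avoid this repetition: $@ expands to the target name and $^ to the list of prerequisites. The original Makefile does not use them, so the following is only a sketch, with hypothetical file names and touch standing in for the compiler.

```makefile
# Hypothetical file names; `touch` stands in for a real compiler.
# With automatic variables, the update-cache command could be written as:
#   $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
out: a.o b.o
	@echo "target: $@"
	@echo "prerequisites: $^"
	touch $@

a.o b.o:
	touch $@
```

Running make creates a.o and b.o, then reports out as the target and a.o b.o as the prerequisites before creating out.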

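Header File Dependencies

After the program targets, there are two lines that list a C header (.h) file as a prerequisite of an object (.o) file. Header files are not linked; they are pulled in by #include when a source file is compiled. These lines tell Make to rebuild the object file whenever the header changes. The only header file in the Baby Git codebase is cache.h, which is listed as a prerequisite of read-cache.o and show-diff.o.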
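A line such as read-cache.o: cache.h has no command of its own. It only adds a prerequisite to a target that is built by another rule. The sketch below shows the effect with hypothetical file names, using cp in place of compilation.

```makefile
# Hypothetical files; `cp` stands in for compilation.
data.o: data.c
	cp data.c data.o

# Extra prerequisite with no command, in the style of `read-cache.o: cache.h`.
data.o: config.h

# Create empty input files so the example needs nothing else.
data.c config.h:
	touch $@
```

After a first make, running touch config.h and then make again should rebuild data.o, because one of its prerequisites is now newer than the target.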

The clean Target

This target is invoked using make clean and simply deletes all object files, executables, and temporary files matching temp_git_file_* from the working directory. It leaves the source files alone so that the program can be built again.
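One caveat with targets like clean is that they never create a file called clean. If a file with that name ever appeared in the directory, Make would consider the target up to date and skip it. A common safeguard, which the original Makefile does not use, is to declare the target as phony. The file names below are hypothetical.

```makefile
# Sketch only: the original Makefile does not declare phony targets.
.PHONY: clean

clean:
	rm -f example.o example
```

With the .PHONY line in place, make clean runs its command every time, regardless of what files exist.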

The backup Target

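This target is invoked using make backup. First it invokes the clean target, since clean is listed as its prerequisite. Then it moves up to the parent directory and archives the baby-git directory into a compressed tar archive named babygit.tar.gz. This assumes the working directory is named baby-git.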
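The backup command depends on a fixed directory name. One possible variation, shown here only as a sketch, derives the name from GNU Make's CURDIR variable and notdir function. DIRNAME is a hypothetical variable introduced for this example, and the approach assumes the path contains no spaces.

```makefile
# Sketch only; DIRNAME and the archive name are assumptions.
DIRNAME = $(notdir $(CURDIR))

.PHONY: clean backup

clean:
	rm -f *.o

# `clean` is a prerequisite, so it always runs before the archive is made.
backup: clean
	cd .. ; tar czvf $(DIRNAME).tar.gz $(DIRNAME)
```

Running make backup from a directory named my-project would first remove its object files and then write my-project.tar.gz next to that directory.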

Conclusion

In this article we described how Git's first Makefile works line by line. We hope it helped you understand how Makefiles work and how they are implemented in practice.