README
======

The script analyzes the file dependencies of C or C++ source code as reported
by the -H option of gcc[1]. It writes the dependencies as a graph in the
.dot[2] file format. The script includes various filters to postprocess the
output, e.g., removing all nodes not matching a given regular expression.

Section 1 briefly discusses the advantages and disadvantages of various
approaches to generating such dependencies. Sections 2 to 4 explain how to use
this script.


1. Different approaches to collect dependency information
=========================================================

There are several approaches to collect the dependency information. Three
possible approaches are parsing the source files directly, using the -MD option
of gcc, and using the -H option of gcc.

The first approach is to parse the source files and search for #include
statements. This approach is used by the cinclude2dot[3] script. An advantage
of this approach is that it is not necessary to modify the build system in any
way (the build system is not used at all; the script parses the source files
directly). A disadvantage is that you have to set up the include paths to
resolve the include directives. This can be very difficult in a large project
with different include paths per module or even per file. Another disadvantage
is the handling of conditional includes. Most parsers, including the
cinclude2dot script, just scan for #include statements without taking #ifdef
statements into account.

Another approach is to use the -MD option (or related options) of gcc. Since
these options are already used by some build systems (e.g., autoconf/automake),
the dependency information they generate is sometimes available for free. This
approach removes the need to set up any include paths; it simply uses the same
include paths as the compiler. Moreover, conditional includes are handled
correctly as well. A disadvantage is that these options generate only a *set*
of all direct and indirect dependencies. Assume, for example, a file a.cpp that
includes b.h, which in turn includes c.h. Then the -MD option lists both b.h and
c.h as dependencies of a.cpp. But there is no information about why c.h is a
dependency of a.cpp, nor is the dependency of b.h on c.h mentioned.
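
To make this limitation concrete, here is a minimal Python sketch that parses the make rule gcc writes for the example above. The helper parse_make_rule is hypothetical (not part of this package), the sample rule text is illustrative, and the sketch assumes a single rule, no escaped spaces, and no colons inside paths. The result is only a flat set: nothing records that c.h was reached through b.h.

```python
def parse_make_rule(text):
    """Split one make rule, as written by gcc -MD, into target and prerequisites.

    Assumes a single rule, no escaped spaces, and no colons inside paths.
    """
    # gcc breaks long rules with a backslash-newline; join them first.
    joined = text.replace("\\\n", " ")
    target, _, prerequisites = joined.partition(":")
    # The prerequisites come back as an unordered set: the include
    # hierarchy (a.cpp -> b.h -> c.h) is no longer recoverable.
    return target.strip(), set(prerequisites.split())


target, deps = parse_make_rule("a.o: a.cpp b.h \\\n  c.h\n")
print(target, sorted(deps))  # a.o ['a.cpp', 'b.h', 'c.h']
```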

The last approach is to use the -H option of gcc. This approach is used by the
script in this package. Since common build systems usually do not use this
option to generate dependency information for their own purposes, it is
necessary to patch the build system. As with the previous approach, there is no
need to set up include paths, and conditional includes are handled correctly.
Moreover, this approach generates the most accurate dependency information.
Consider again the example of a.cpp, b.h, and c.h. The -H option not only
reports b.h and c.h as dependencies of a.cpp; it also records that c.h is an
indirect dependency of a.cpp, because c.h is included by b.h, which in turn is
included by a.cpp.

These advantages have been the motivation to write this software.
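
The extra information comes from the layout of the -H output: gcc prints each header it opens on its own line, prefixed with one dot per level of include nesting, so compiling a.cpp from the example yields lines similar to ". b.h" and ".. c.h". The Python sketch below shows how a stack turns these lines into includer/included edges. The function parse_h_output is a hypothetical name, and this is not necessarily how dependencies.py implements it; it simply skips every line that does not look like a header entry.

```python
import re

# One header entry of gcc -H output: one dot per nesting level, a space, the path.
H_LINE = re.compile(r"^(\.+) (.+)$")


def parse_h_output(source, lines):
    """Turn the gcc -H output of one translation unit into (includer, included) edges.

    Lines that do not look like header entries (compiler warnings, the
    "Multiple include guards may be useful for:" list, ...) are skipped.
    """
    edges = []
    stack = [source]  # stack[d] is the file open at nesting depth d
    for line in lines:
        match = H_LINE.match(line.rstrip("\n"))
        if not match:
            continue
        depth, header = len(match.group(1)), match.group(2)
        del stack[depth:]  # unwind to the file that included this header
        edges.append((stack[-1], header))
        stack.append(header)
    return edges


sample = [". b.h", ".. c.h", "Multiple include guards may be useful for:", "x.h"]
print(parse_h_output("a.cpp", sample))  # [('a.cpp', 'b.h'), ('b.h', 'c.h')]
```

Each edge records which file included which, i.e., exactly the b.h to c.h relationship that the -MD output loses.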


2. Preparation of the build system
==================================

This section explains how to patch the build system of your project to
generate the required dependency information. Since the patch redirects
stderr, you should make sure that your project builds without any errors
first. The script should be able to cope with unexpected output on stderr, but
you will not see the compiler's error messages if something goes wrong.

After patching the build system as described below, you should rebuild your
project from scratch so that the dependency information is actually generated.
You should obtain an additional file with dependency information for each of
your source files.

The examples below use ".d" as the filename suffix for the files with
dependency information. The suffix can be arbitrary; ".d" is used here because
it is the default of the script.

Note that the -H option does not report a (direct) dependency if a given header
file has already been included indirectly through other include directives and
is therefore effectively empty due to its include guards. If you want to see
such dependencies as well, add some dummy statements to your header files
(outside the include guards), e.g.,

  #undef FOO
  #define FOO 42


2.1 Custom build systems
========================

The -H option needs to be added to each command that generates an object file
from a C or C++ source file.

For example, a pattern rule to compile C++ source files in your makefile might
look like this:

%.o: %.cpp
	$(CXX) -o $@ -c $< $(CXXFLAGS)

Simply add "-H 2> $<.d" to the build command. The modified rule looks like
this:

%.o: %.cpp
	$(CXX) -o $@ -c $< $(CXXFLAGS) -H 2> $<.d

As a byproduct of the compilation process you will obtain a file with
suffix .cpp.d for each source file.

Note that relative paths in the output generated by the -H option are relative
to the current working directory during compilation of the file in question.
This is fine as long as the current working directory during compilation of a
particular file is the same directory as the one that contains the file (or, in
out-of-source builds, the same as the corresponding directory in the build
tree). If this is not the case, you need to record the working directory,
which is needed later to resolve the relative paths. You can do so by adding

  "; echo $(PWD) > $<.d.cwd"

at the end of the compile command. The script looks for optional files with
extension .d.cwd and interprets their content as the base directory for
relative paths in the corresponding .d file. If such a .d.cwd file does not
exist, the directory that contains the .d file is used as the base directory.
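
The lookup rule just described can be sketched in Python as follows. The names base_directory and resolve are hypothetical, and the sketch assumes that the .d.cwd file is named after the .d file plus ".cwd" (as with the $<.d.cwd pattern) and contains a single absolute path.

```python
import os


def base_directory(d_file):
    """Return the directory against which relative paths in d_file are resolved.

    Uses the content of "<d_file>.cwd" if that file exists (assumed to hold
    one absolute path), otherwise the directory that contains d_file.
    """
    cwd_file = d_file + ".cwd"
    if os.path.isfile(cwd_file):
        with open(cwd_file) as f:
            return f.read().strip()
    return os.path.dirname(os.path.abspath(d_file))


def resolve(d_file, header):
    """Turn a header path found in d_file into a normalized absolute path."""
    # os.path.join leaves absolute header paths (e.g. system headers) untouched.
    return os.path.normpath(os.path.join(base_directory(d_file), header))
```

Normalizing the paths matters because the same header may be reached as "../include/bar.h" from one directory and "include/bar.h" from another; both should end up as a single node.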


2.2 Build systems based on autoconf/automake, cmake, or qmake
=============================================================

Below you can find commands to patch and unpatch the makefiles that are used by
build systems based on autoconf/automake, cmake, or qmake. Note that these
commands depend on the structure of the makefiles generated by these tools.
The commands are known to work for automake 1.10, autoconf 2.61, cmake 2.6.0,
and qmake 4.4.3. Changes might be necessary to support other versions.

You might want to use "-i.bak" instead of "-i" to create backups. In any case,
check that the suffix (" -H 2> $<.d" or its cmake variant) has actually been
added to the build rules.

You can avoid xargs by passing the output of the find command directly as
arguments to sed. However, for large projects the resulting command line might
exceed the system's maximum command-line length.

autoconf/automake:

  find . -name Makefile | xargs -r sed -i '/COMPILE)/s/$/ -H 2> $<.d/'
  find . -name Makefile | xargs -r sed -i 's/ -H 2> $<\.d//'

cmake:

  find . -name build.make | xargs -r sed -i '/_FLAGS) -o/s/$/ -H 2> $(PWD)\/$<.d/'
  find . -name build.make | xargs -r sed -i 's/ -H 2> $(PWD)\/$<\.d//'

qmake:

  find . -name Makefile | xargs -r sed -i '/FLAGS) $(INCPATH) /s/$/ -H 2> $<.d/'
  find . -name Makefile | xargs -r sed -i 's/ -H 2> $<\.d//'

The note about the working directory in the previous section applies here as
well. So far, the problem has been observed for build systems based on qmake,
e.g., for Qt[4] itself. The workaround mentioned in the previous section works
fine.


3. Generation of the dependency graph
=====================================

After preparing the build system and building your project as usual, you
should have a file with suffix .d for each of your source files, and possibly
also files with suffix .d.cwd.

Now simply run the script. Use the -r option to point it to the root directory
of your project containing all the files with the dependency information.

  ./dependencies.py -r /path/to/root/of/your/project

If you have chosen a filename suffix other than .d, you have to use the -s
option. For example, if you have chosen the filename suffix .foo, call the
script as follows.

  ./dependencies.py -r /path/to/root/of/your/project -s .foo
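
Together, -r and -s select the input files. A minimal Python sketch of that selection, assuming the script searches the root directory recursively for files whose names end with the suffix, is shown below; find_dependency_files is a hypothetical helper, not the script's actual code. Note that .d.cwd files do not end in .d and are therefore not picked up as dependency files.

```python
import os


def find_dependency_files(root, suffix=".d"):
    """Yield the path of every file below root whose name ends with suffix.

    Assumes a recursive search of the whole tree; the order is made
    deterministic by sorting directory and file names.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # os.walk honors in-place changes to dirnames
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


for path in find_dependency_files("/path/to/root/of/your/project", ".foo"):
    print(path)
```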

The generated graph is written to stdout. You can redirect the output using the
-o option. See the output of the -h or --help option for further options, in
particular for the various filters to postprocess the output.
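
To illustrate what a node filter and the .dot output amount to, here is a small Python sketch operating on (includer, included) edges. filter_nodes and to_dot are hypothetical names; the script's real filters are listed by --help and may behave differently. Because the sketch stores only edges, removing a node simply drops every edge that touches it.

```python
import re


def filter_nodes(edges, pattern):
    """Drop every edge that has an endpoint not matching the regular expression.

    Since this sketch stores the graph as a list of edges only, removing a
    node amounts to removing all edges attached to it.
    """
    rx = re.compile(pattern)
    return [(a, b) for a, b in edges if rx.search(a) and rx.search(b)]


def to_dot(edges):
    """Render (includer, included) edges as a Graphviz digraph."""

    def quote(name):
        # In a quoted DOT string, \" is the only escape sequence.
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph dependencies {"]
    for a, b in sorted(set(edges)):  # dedupe and sort for stable output
        lines.append("  %s -> %s;" % (quote(a), quote(b)))
    lines.append("}")
    return "\n".join(lines) + "\n"


edges = [("src/a.cpp", "src/b.h"), ("src/b.h", "/usr/include/stdio.h")]
print(to_dot(filter_nodes(edges, r"^src/")), end="")
```

For these sample edges, only the src/a.cpp -> src/b.h edge survives the filter, so the system header disappears from the graph.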


4. Visualization of the dependency graph
========================================

You will most likely want to visualize the generated dependency graph. This can
be done with the dot[2] command from the Graphviz software. Assuming you have
written the output of the script to a file named graph.dot, you can call dot as
follows.

  dot graph.dot -Tpng -o graph.png -Granksep=2.0

You might prefer an SVG image, which typically consumes much less disk space
than a PNG image.

  dot graph.dot -Tsvg -o graph.svg -Granksep=2.0

The option -Granksep=2.0 increases the vertical spacing between ranks from the
default of 0.5 inches to 2.0 inches; the default spacing is often too small.



[1] http://gcc.gnu.org/
[2] http://www.graphviz.org/
[3] http://flourish.org/cinclude2dot/
[4] ftp://ftp.trolltech.com/qt/
