A Practical Approach to Debugging Apache Modules

These days the Apache webserver is seen as rather unfashionable and out of favour compared to nginx. Despite this, it's still the most widely deployed, powering 43% of the active sites on the internet, and 37% of the busiest.

By default, httpd is commonly built with a very small number of statically linked modules:

$ httpd -l
Compiled in modules:
  core.c
  mod_so.c
  http_core.c

The modules core.c and http_core.c are, as might be expected, the core of the webserver. mod_so.c is what allows additional modules to be loaded.

The Apache documentation lists 128 modules which can be included, providing functionality ranging from authentication, through forensic logging, to session encryption. The chances are that for most use cases there's an open source module available that can be dynamically loaded and will meet your needs.

However, sometimes there isn't an appropriate module, and one needs to be developed in house, or perhaps an in-house one already exists. Also, not all of those 128 modules are commonly shipped by distributions and vendors, so it's important to be able to build modules yourself.

Dynamic shared objects (DSOs) are built with the helper tool apxs - the Apache Extension Tool. The DSO is built from one or more source or object files, and can then be loaded into the Apache server at runtime via the LoadModule directive from mod_so. In most cases this is pretty straightforward and 'just works', but when it doesn't, it can be tricky to debug.

I recently had to build a custom module which had little or no documentation, and experienced a few wrinkles on the way. Based on that experience, here are some suggestions to consider if you find yourself needing to debug the building and loading of custom modules.

Watch out for apxs order

The first gotcha to watch out for is the order in which arguments are passed to apxs. apxs has a number of modes - it can query the variables and environment used when httpd was built, it can generate scaffolding for developing one's own modules, and it can build DSOs. When invoked to build a module, it can also install the module and configure the webserver, by placing the DSO in the server's modules directory and modifying httpd.conf.

The simplest case of all is simply to build the DSO, with the apxs -c command. The order is important.

apxs itself is really just a Perl script which generates libtool commands. In some respects it's very clever, but in others it's rather stupid. If you mix up the order, apxs won't complain: it will generate the wrong libtool commands, and you will get unexpected results. So make sure you follow the right order. In simple terms:

apxs -c -o YOUR_DSO YOUR_MODULE_SRC

In detail:

 apxs -c [ -S name=value ] [ -o dsofile ] [ -I incdir ] [ -D name=value ] [ -L libdir ] [ -l libname ] [ -Wc,compiler-flags ] [ -Wl,linker-flags ] files ...

I've been burned by getting this wrong, so make sure it's right. Consult the documentation at https://httpd.apache.org/docs/2.4/programs/apxs.html for more information.
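
One way to avoid tripping over the order is to assemble the command in a fixed sequence and look at it before running it. The sketch below does only that: it prints an apxs build command with the options first and the source files last. The function name apxs_build_cmd and the file names in the usage line are hypothetical, and it covers just the -c, -o and -I parts of the synopsis.

```shell
# Print (do not run) an apxs build command in the documented order:
# -c first, then -o dsofile, then an optional -I incdir, then the sources.
# Usage: apxs_build_cmd DSO_FILE INCLUDE_DIR SOURCE...
# Pass an empty string as INCLUDE_DIR to leave out -I.
apxs_build_cmd() {
    dso=$1
    incdir=$2
    shift 2
    cmd="apxs -c -o $dso"
    if [ -n "$incdir" ]; then
        cmd="$cmd -I $incdir"
    fi
    # Source files always come last.
    printf '%s %s\n' "$cmd" "$*"
}
```

For example, apxs_build_cmd mod_example.so "" mod_example.c prints a command that can be compared against the synopsis before being run for real.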

Using readelf

Once the DSO has been built, we can check it by using the readelf command.

The first thing to do is check the ELF header:

➜  ~ readelf -h mod_redacted.so
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x540
  Start of program headers:          64 (bytes into file)
  Start of section headers:          2416 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         4
  Size of section headers:           64 (bytes)
  Number of section headers:         23
  Section header string table index: 22

We're looking here to verify that the Type is DYN. If it isn't, there's something badly wrong, and the first thing to check, again, is how you called apxs.
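
If you're rebuilding repeatedly, this check is easy to script. The following is a minimal sketch, assuming the readelf -h output layout shown above; elf_type is a hypothetical helper name, and it reads the header text on standard input so that it doesn't care where the text came from.

```shell
# Print the ELF type token (for example DYN, REL or EXEC) found in
# `readelf -h` output supplied on standard input.
elf_type() {
    awk '$1 == "Type:" { print $2; exit }'
}
```

It would be used as readelf -h mod_redacted.so | elf_type, and anything other than DYN in the output is a reason to go back and look at the apxs invocation.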

Next we can look at the symbol table:

➜  ~ readelf -s mod_redacted.so

Symbol table '.dynsym' contains 12 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000000004f0     0 SECTION LOCAL  DEFAULT    8
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     3: 00000000002008a8     0 NOTYPE  GLOBAL DEFAULT   19 _edata
     4: 0000000000000640     0 FUNC    GLOBAL DEFAULT   11 _fini
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     6: 00000000002008b0     0 NOTYPE  GLOBAL DEFAULT   20 _end
     7: 00000000002008a8     0 NOTYPE  GLOBAL DEFAULT   20 __bss_start
     8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
     9: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    10: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (2)
    11: 00000000000004f0     0 FUNC    GLOBAL DEFAULT    8 _init

There are a couple of clues that something's not right here - there are only 12 symbols, and none of them come from APR, the Apache Portable Runtime.

Now, let's have a look at a build of the same module which has been successfully linked:

➜  ~ readelf -s mod_redacted.so | grep entries
Symbol table '.dynsym' contains 85 entries:
➜  ~ readelf -s mod_redacted.so | grep apr | head
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_table_add
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_md5_init
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_md5_update
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_global_mutex_lock
    15: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_global_mutex_create
    16: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_pool_create_ex
    19: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_table_get
    22: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_itoa
    33: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_palloc
    34: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND apr_shm_create

This looks much better! We have a healthy-looking DSO.
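
The comparison between the two symbol tables can be reduced to a single number. This is only a sketch, assuming the eight-column readelf -s layout shown above; count_undefined_apr is a hypothetical helper name. It counts the undefined symbols whose names start with apr_, reading the table on standard input.

```shell
# Count the undefined (UND) symbols whose names start with apr_ in
# `readelf -s` output supplied on standard input.
# Columns: Num, Value, Size, Type, Bind, Vis, Ndx, Name.
count_undefined_apr() {
    awk '$7 == "UND" && $8 ~ /^apr_/ { n++ } END { print n + 0 }'
}
```

Run as readelf -s mod_redacted.so | count_undefined_apr. A result of zero for a module that is known to call APR functions suggests a build like the first one above.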

Using gdb

Now that we have a DSO, which seems to be appropriately linked, the next step is to make the module available to httpd. When apxs speaks of 'installing' and 'configuring', all it means is putting the DSO into a canonical place, and inserting a LoadModule directive into the configuration. We can do that manually, or with our configuration management tool of choice.

Once httpd has been restarted/reloaded, hopefully we'll see some log output indicating a successful load. However, with custom modules, especially those that perhaps haven't been maintained, or those where the author is no longer involved, things might not go as smoothly as this.

The most likely thing to happen, if all isn't quite right, is for the httpd process to die with a segmentation fault. This might happen immediately, or after a bit of time. Either way, the next step is going to be to break out gdb and try to find out what's wrong.

This can be done in one of two ways - by running the httpd process under the debugger, or by analysing a coredump.

Before running httpd under gdb, we need to consider a number of important things. We will be running the webserver as a real user with a real login shell. This introduces attack vectors normally not exposed when running httpd. This is an even more important consideration if the process is started as the root user. Additionally, because we're running the process as a different user, any issues related to filesystem privileges may well look quite different, not arise, or arise as a consequence and side-effect of our debugging. For these reasons, although it's obvious, I strongly recommend making sure that wherever you're running the testing is isolated and protected before beginning the debugging process.

To run httpd under the debugger, call gdb httpd, and then within the debugger itself start the service with r -X - that is, run, with the -X flag. This makes sure the process doesn't background itself or fork, which makes debugging much easier.

In my case, I configured Apache to generate a coredump, which I then investigated with gdb.

Apache httpd will not generate core files unless explicitly instructed to, with the CoreDumpDirectory directive. By default the core would be written to the ServerRoot directory, but that should not be writable by the user under which the service runs, so nothing gets written there. The simplest solution is to tell the service to write coredumps to /tmp.

CoreDumpDirectory /tmp

Next we need to lift the restriction on the size of a coredump file, in the same shell that will start httpd:

ulimit -c unlimited

Now when httpd segfaults, we'll get a coredump to analyse.
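
Since the limit only applies to the shell in which it was set, and to the processes started from it, it's worth confirming it before starting httpd. A small sketch follows; core_limit_allows_dump is a hypothetical helper name, and it takes the value printed by ulimit -c as its argument.

```shell
# Succeed if the given `ulimit -c` value allows a core file to be written.
# Usage: core_limit_allows_dump "$(ulimit -c)"
core_limit_allows_dump() {
    case "$1" in
        unlimited) return 0 ;;
        '' | *[!0-9]*) return 1 ;;  # empty or not a number: treat as unusable
        *) [ "$1" -gt 0 ] ;;        # numeric: any non-zero limit allows a dump
    esac
}
```

For example, core_limit_allows_dump "$(ulimit -c)" || echo "core dumps are disabled in this shell". A small non-zero limit still passes this check, but may give a truncated core.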

Again, we need to make sure httpd is started with -X, so it doesn't fork or background itself, and then start the webserver. When it crashes, a coredump will appear.

Once we have the coredump, we can investigate by passing the path to the webserver binary and the path to the coredump to gdb:

gdb /path/to/httpd /tmp/core.1234

Learning to use gdb is a subject in itself, and my knowledge on the subject is very slender. However, in essence, what we're trying to find out is where in the program the execution stopped, and how it got to that point.

If you think about the anatomy of an Apache module, it basically consists of a module definition, which tells httpd how to load the module and creates a namespace for configuration, followed by a hook into the request-handling logic, and a handler, which receives a callback when a request is made, together with the request itself. The handler calls various functions to do whatever it is that our module does. Some of these will be library functions provided by the Apache Portable Runtime, some may come from other sources, and some will be in the module code itself.

Each time the module carries out a function call, information about that call is generated, which includes the location of the call in the module, its arguments, and any local variables belonging to the function being called. All this information is saved in a block of data called a stack frame. The stack frames are allocated in a region of memory called the call stack.

When a program crashes, all that data is lost, unless you're running under a debugger, or you're able to dump the memory (or core) to disk and then inspect it.

Once the dump is loaded into gdb, we can use bt full (bt is short for backtrace) to get a stack trace from the time of the crash. The backtrace shows one numbered line per stack frame, starting with the currently executing frame (frame zero), followed by its caller (frame one), and on up the stack.
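
If you'd rather capture the backtrace to a file than read it in an interactive session, gdb can be run non-interactively with its -batch and -ex options. The wrapper below is a sketch only; core_backtrace is a hypothetical name, and the paths in the usage line are the placeholders used earlier.

```shell
# Print a full backtrace from a core file without an interactive session.
# Usage: core_backtrace /path/to/httpd /tmp/core.1234
core_backtrace() {
    # -batch: exit once the commands have run; -ex: run one gdb command.
    gdb -batch -ex 'bt full' "$1" "$2"
}
```

For example, core_backtrace /path/to/httpd /tmp/core.1234 > backtrace.txt leaves a file that can be attached to a bug report.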

This gives us important information about what caused the crash. Of course, each case will vary, and it's not possible to give guidance or advice on what to look out for. In my case, the trace looked a bit like this:

(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x00007f25868b7b19 in redacted_rw_init () from /path/to/mod_redacted.so
No symbol table info available.
#2  0x000000000045eae2 in ap_run_pre_config ()
No symbol table info available.
#3  0x000000000042e0fb in main ()
No symbol table info available.

Frame #0 has an address of 0x0000000000000000, which means execution jumped to address zero: redacted_rw_init () called a function through a NULL pointer. Looking inside that function in the source code revealed the following:

redacted_fn = APR_RETRIEVE_OPTIONAL_FN(ap_register_rewrite_mapfunc);
redacted_fn("redacted", rewrite_mapfunc_redacted);

The redacted_fn assignment uses the APR_RETRIEVE_OPTIONAL_FN macro from APR, which looks up a function that another loaded module has registered under that name, and yields NULL if none has. So redacted_fn was NULL, and calling it on the next line crashed the process. ap_register_rewrite_mapfunc comes from mod_rewrite, but the macro couldn't find it. Why not? Because at the time of loading the custom module, mod_rewrite wasn't available. Solution: load mod_rewrite. A more defensive module would check the result for NULL and log an error, rather than calling it blindly.
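
A quick check for this class of problem is whether the configuration contains an active LoadModule line for the module that provides the missing function. The sketch below is a hypothetical helper (has_loadmodule is not an httpd tool) which reads configuration text on standard input. rewrite_module is the identifier mod_rewrite is loaded under, and the httpd.conf path in the usage line is an assumption. It only sees the text it is given, so LoadModule lines pulled in via Include would need to be fed to it as well.

```shell
# Succeed if the configuration text on standard input contains an active
# (uncommented) LoadModule line for the given module identifier.
# Usage: has_loadmodule rewrite_module < /path/to/httpd.conf
has_loadmodule() {
    awk -v want="$1" '
        tolower($1) == "loadmodule" && $2 == want { found = 1 }
        END { exit !found }
    '
}
```

For example, has_loadmodule rewrite_module < /path/to/httpd.conf || echo "mod_rewrite is not loaded". A commented-out line such as #LoadModule rewrite_module ... doesn't count.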

Conclusion

I've only scratched the surface of troubleshooting and debugging here, but my objective is to provide the beginning of a framework. Whenever troubleshooting, it's vital to keep a calm head, and proceed logically. Information is power, and thankfully, there are excellent tools available which give us the opportunity to extract information and drill right down into the source of the problem.

Don't be disheartened if, at some point, you finally find yourself out of your depth. If you're able to reproduce the issue, and have clearly documented steps to reproduce it, a stack trace, and a function block in the code that is breaking, the chances of you getting a positive and helpful response from either the maintainer, someone in the open source community, or even just another more experienced engineer are massively stacked in your favour.
