Memory leak & invalid memory deallocation detection in the Linux kernel for modules using exclusively Operating System Abstraction Layer
The post processing script
Complimenting the header file, the post processing script needs to be
developed to parse the log file (/var/log/messages) and match the
allocations and deallocations based on the address, size and type.
If any address is allocated it must match a following deallocation
of the same size and type. Similarly for any deallocation that happens,
the script will need to match it with an allocation of the same size
and type.
If any mismatches are found, the script will need to generate the
appropriate message that there is a possible incorrect
allocation/deallocation. For the above set of logs the script will
match the memory allocations with the frees.
Assuming the condition that the first for loop (commented as LOOP
1) changes and from i<5 becomes i<=5 instead, and keeping the
second loop (commented LOOP 2) the same, a memory leak is induced and
one extra log will be created:
MEM_ALLOCATION:osal_mem_alloc():e2ac1200:1024:3:/home/myuser/allocations.c:8423
The script will not match it with a free, causing an error message
to be produced.
Multithread warning
This methodology can have a pitfall in multithreading applications.
Assuming the above macro implementation of osal_mem_alloc() and a
similar for deallocation, a race condition can occur where the
allocation in one thread can be preempted in this stage where the
actual allocation occurred but before the logging occurs.
If at that stage the deallocation thread is called that completes
before its own preemption. That would result to a proper allocation
followed by a deallocation, but the logs of the deallocation will be
appearing first, and followed by the allocation logs. This scenario is
displayed in Figure 3 below:
 |
| Figure
3 " Logs in wrong order due to multithreading |
A way to accommodate for this is to have the parsing script to check
the PID in case of a deallocation preceding an allocation for the same
memory address and type.
If the PID is the same it is definitely a malpractice and if the
illegal free hasn't caused a kernel panic, the script should report it
as an error. If the PID is different, the script can give out a warning
for a possible invalid free rather than an error.
Another possible solution is to statically define a locking
primitive (such as a mutex) in the OsalMemLeak.h, and surround all the
inline functions with that locking primitive.
Care in this case should be taken if there is a possibility of
sleeping or running in the context of an interrupt[9]. Also using this
primitive will result to non-portable code. Although one obvious way to
recover this is to log as soon as entering the inline function and then
have extra logic in the script to determine what happened first, this
is a wrong approach.
Being preempted right after the first logging in the allocation and
before doing the actual allocation cause the logs to appear in reverse
order than the events occurred in case the deallocation gets scheduled
right after and completes before the actual memory allocation.
SEC4
Using the method as part of dynamic analysis
Our proposed method of identifying memory leaks and corruptions can
easily be used for dynamic analysis in a nightly build and/or test
system. The production code can be built with the header file, and then
the unit test binaries can be run to produce logs that can be analyzed
from the post processing script.
The output of the script can be kept as a log and the script will
return zero if no memory allocation warnings and/or errors were found,
whereas on the opposite case, it will exit with an error code.
Evaluating the results
Using the methodology described in this article, a header file needs to
be produced. It must replace functions that allocate and deallocate
memory with macros that call the actual functions and log data such as
memory address and size.
Also, a Perl script that parses the logs generated by the binary
when it runs, and analyzes the data producing alerts when possible
leaks and memory corruptions occur needs to be created. This
methodology has a number of effects on the system being developed, both
pro and con:
Pros:
* Does not require any kernel patches or any special setup.
* Works efficiently in a virtualized environment which is one of the
most common methods for debugging in kernel space.
* Suitable for RT Linux.
* Not hardware specific.
* Easily portable to any OS.
* Can easily integrate with unit tests and nightly build.
Cons:
* As described earlier in the earlier multithreaded implementation, a
thread that allocates memory, for example, can be pre-emptied before
logging the allocation with the result that the de-allocation might be
logged before the initial allocation.
* In case the logging needs to be disabled (for example when syslogd
needs to be disabled) there's no way to get the logs and further use
the script to debug.
* Negative (but very small) performance impact with the printk(). Might
be crucial in some systems.
Future work
As a result of the work we have done so far, there are a number of
directions we could go to extend its functionality, including:
* Add all the kernel functions in order to cover not only OSAL but
also software that uses kernel API as well.
* Extend the work for other operating systems.
Joseph Gasparakis is a senior software engineer working
for Intel Corp. in Shannon,
Ireland for the Performance Product Division of the Embedded Computing
Group. He has a 9 year long work experience in telecoms industry
working on several proprietary and open source software stacks. He has
also worked as system administrator, and he is a Certified Ethical
Hacker (CEH). Joseph is currently specializing in optimizations of
multicore and network related technologies.
Rajalekha, Kilar Rajaram is working as network software
engineer for Intel in Shannon, Ireland for the Performance Product
Division of the Embedded Computing Group. She has extensive experience
in 3G and 4G wireless protocol stack development on a variety of real
time operating systems. She is currently focusing her energy in the
development of Media Access Control layer for Wimax Base stations.
Mario Nicolas is a software engineer working for Intel,
in Shannon, Ireland for the Performance Product Division of the
Embedded Computing Group. Having a mathematical background, his domain
of expertise is the implementation and optimization of algorithms,
including search, cryptographic and compression/decompression.
References
[1] http://lwn.net/Articles/187979/
[2] http://en.wikipedia.org/wiki/OSAL
[3] http://oss.sgi.com/projects/kdb
[4] http://kgdb.linsyssoft.com/
[5] Pichai Raghavan, Amol Lad, Sriram Neelakandan " Embedded
Linux System Design and Development " paragraph 3.7.1, p. 81-82
[6] http://freshmeat.net/projects/efence/
[7] http://duma.sourceforge.net/
[8] http://duma.sourceforge.net/
[9] Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman " Linux Device Drivers
" p. 121-123.