Using static analysis to make open source Web applications more secure

Editor’s Note: In this excerpt from their book Embedded Systems Security, the authors demonstrate how static analysis can be used to find and eliminate coding errors. As their case study, they take three popular safety-critical open source applications – Apache, OpenSSL, and sendmail – and analyze them using Green Hills’ DoubleCheck analyzer.

The Apache open source hypertext transfer protocol (HTTP) server is the most popular web server in the world, powering a majority of the websites on the Internet. Given the ubiquity of Apache and the world’s dependence on the Internet, the reliability and security of Apache represent an important concern for all of us. A serious flaw in Apache could cause widespread inconvenience, financial loss, or worse. The Apache web server consists of approximately 200,000 lines of code, 80,000 individual executable statements, and 2,000 functions.

OpenSSL is an open source implementation of Secure Sockets Layer (SSL) and Transport Layer Security (TLS) as well as a comprehensive cryptographic algorithm library. TLS is the modern reimplementation of SSL, although SSL is often used as a general term covering both protocols.

SSL forms the basis of much of the secure communication on the Internet. For example, SSL is what enables users to send private credit card information securely from their browsers to an online merchant’s remote server. In addition to being intimately involved with data communication, OpenSSL contains implementations of a variety of cryptographic algorithms used to secure the data in transit.

Although OpenSSL is available for Windows, it is best known as the standard SSL implementation for Linux and UNIX worldwide. In addition, because of its liberal licensing terms (not GPL), OpenSSL has been used as a basis for a number of commercial offerings. Like Apache, OpenSSL is a keystone of worldwide secure Internet communication.

Flaws in this software could have widespread deleterious consequences. OpenSSL consists of approximately 175,000 lines of code, 85,000 individual executable statements, and 5,000 functions.

Although its use is in decline, sendmail is among the most popular electronic mail server software used on the Internet. Sendmail has been the de facto electronic mail transfer agent for UNIX (and subsequently, Linux) systems since the early 1980s.

Given the dependence on electronic mail, the stability and security of sendmail are certainly an important concern for many. The name sendmail might lead one to think that this application is not very complicated. Anyone who has ever tried to configure a sendmail server knows otherwise. Sendmail consists of approximately 70,000 lines of code, 32,000 individual executable statements, and 750 functions.

Output of a Static Source Code Analyzer
Many leading source code analyzers generate an intuitive set of web pages, powered by an integrated web server. The developer can browse high-level summaries of the different flaws found by the analyzer and then click on hyperlinks to investigate specific problems.

Within a specific problem display, the error is displayed inline with the surrounding code, making it easy to understand. Function names and other objects are hyperlinked for convenient browsing of the source code. Since the web pages are running under a web server, the results can easily be shared and browsed by any member of the development team.

The following sections provide examples of actual flaws in Apache, OpenSSL, and sendmail that were discovered by DoubleCheck. The results are grouped by error type, with one or more examples of each error type per section:

  1. potential NULL pointer access;
  2. buffer underflow; and
  3. resource leaks.

Potential NULL Pointer Access
By far the most common flaw found by the analyzer in all three suites under testing was potential NULL pointer access. Many cases involved calls to memory allocation subroutines that were followed by accesses of the returned pointer without first checking for a NULL return.

This is a robustness issue. Ideally, all memory allocation failures are handled gracefully. If there is temporary memory exhaustion, service may falter but not terminate. This is of particular importance to server programs such as Apache and sendmail. Algorithms can be introduced that prevent denial of service in overload conditions such as that caused by a malicious attack.

The Apache web server, sendmail, and OpenSSL all make profligate use of C runtime library dynamic memory allocation. Unlike Java, which performs automatic garbage collection, dynamic memory allocation using the standard C runtime requires that the application itself handle potential memory exhaustion errors. If a memory allocation fails and returns a NULL pointer, a subsequent unguarded reference of the pointer is all but guaranteed to cause a fatal crash.
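
As a rough illustration of handling allocation failure gracefully, the sketch below checks every calloc result before use and reports failure to the caller instead of crashing. The table_t type and the table_create and table_destroy functions are hypothetical names invented for this example; they are not taken from Apache, sendmail, or OpenSSL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a large table; not a type from any real project. */
typedef struct {
    size_t slots;
    int *values;
} table_t;

/* Returns a zeroed table, or NULL if either allocation fails.
 * The caller decides how to degrade (reject a request, retry later, ...). */
table_t *table_create(size_t slots)
{
    table_t *t = calloc(1, sizeof *t);
    if (t == NULL) {
        return NULL;            /* first allocation failed: nothing to undo */
    }

    t->values = calloc(slots, sizeof *t->values);
    if (t->values == NULL && slots != 0) {
        free(t);                /* undo the partial construction */
        return NULL;
    }

    t->slots = slots;
    return t;
}

/* Safe to call with NULL, mirroring free(). */
void table_destroy(table_t *t)
{
    if (t != NULL) {
        free(t->values);
        free(t);
    }
}
```

A caller would test the return value of table_create before touching any field, so temporary memory exhaustion becomes a recoverable error rather than a fault.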

In the Apache source file scoreboard.c, we have the following memory allocation statement:

ap_scoreboard_image =
  calloc(1,sizeof(scoreboard) + server_limit *
  sizeof(worker_score *) + server_limit *
  lb_limit * sizeof(lb_score *));

Clearly, the size of this memory allocation could be substantial. It would be a good idea to make sure that the allocation succeeds before referencing the contents of ap_scoreboard_image. However, soon after the allocation statement, we have this use:

ap_scoreboard_image->global = (global_score *)more_storage;

The dereference is unguarded, making the application susceptible to a fatal crash. Another example from Apache can be found in the file mod_auth_digest.c:

entry = client_list->table[idx];
prev = NULL;

while (entry->next) { /* find last entry */
  prev = entry;
  entry = entry->next;
}

The variable entry is unconditionally dereferenced at the beginning of the loop. This alone would not cause the analyzer to report an error. At this point in the execution path, the analyzer has no specific evidence or hint that entry could be NULL or otherwise invalid. However, the following statement occurs after the loop:

if (entry) {
  …
}

By checking for a NULL entry pointer, the programmer has indicated that entry could be NULL. Tracing backward, the analyzer now sees that the previous dereference to entry at the top of the loop is a possible NULL reference.
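
One way to make the intent consistent is to test the pointer itself before reading through it, so that the loop and the later check agree. The following is only a sketch: struct entry and find_last are hypothetical, simplified names for illustration and do not reproduce the actual mod_auth_digest code.

```c
#include <stddef.h>

/* Hypothetical list node; the real structure has more fields. */
struct entry {
    struct entry *next;
    int id;
};

/* Returns the last node of the chain, or NULL for an empty chain.
 * If prev_out is non-NULL, it receives the node before the last one
 * (or NULL when there is no such node). */
struct entry *find_last(struct entry *head, struct entry **prev_out)
{
    struct entry *prev = NULL;
    struct entry *entry = head;

    /* Test entry itself before touching entry->next. */
    while (entry != NULL && entry->next != NULL) {
        prev = entry;
        entry = entry->next;
    }

    if (prev_out != NULL) {
        *prev_out = prev;
    }
    return entry;
}
```

With the guard in the loop condition, a later if (entry) test is no longer contradicted by an earlier unconditional dereference.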

The following similar example was detected in the sendmail application, in the file queue.c, where the code unconditionally dereferences the pointer variable tempqfp :

errno = sm_io_error(tempqfp);

sm_io_error is a macro that resolves to a read of the tempqfp->f_flags field. Later in the same function, we have this NULL check:

if (tempqfp != NULL) sm_io_close(tempqfp,
  SM_TIME_DEFAULT);

In addition, there are no intervening writes to tempqfp after the previously noted dereference. The NULL check, of course, implies that tempqfp could be NULL; if that were ever the case, the code would fault. If the pointer can never in practice be NULL, then the extra check is unnecessary and misleading. What may seem harmless sloppiness can translate into catastrophic failure under certain conditions.

In sendmail, there are many other examples of unguarded pointer dereferences that are either preceded or followed by NULL checks.

The final example in this category comes from OpenSSL, in file ssl_lib.c:

if (s->handshake_func == 0) {     
  SSLerr(SSL_F_SSL_SHUTDOWN, SSL_R_UNINITIALIZED);
}

Shortly thereafter, we have a NULL check of the pointer s:

if ((s != NULL) && !SSL_in_init(s))

Again, the programmer is telling us that s could be NULL, yet the preceding dereference is not guarded.

Buffer Underflow
A buffer underflow is defined as an attempt to access memory before an allocated buffer or array. Similar to buffer overflows, buffer underflows cause insidious problems due to the unexpected corruption of memory. The following flaw in file queue.c in sendmail was discovered by static analysis:

if ((qd == -1 || qg == -1) && type != 120)
  …
  else {

  switch (type) {
  …
  case 120:
    if (bitset(QP_SUBXF,
      Queue[qg]->qg_qpaths[qd].qp_subdirs))
        …
  }
}

As you can see, the if statement implies that it is possible for qd or qg to be -1 when type is 120. But in the subsequent switch statement, always executed when type is 120, the Queue array is unconditionally indexed through the variable qg. If qg were -1, this would be an underflow.

The program was not studied exhaustively to determine whether qg can indeed be -1 when type is 120 and hence reach the fault. However, if qg can’t be -1 when type is 120, then the initial if check is incorrect, misleading, and/or unnecessary.
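
A defensive alternative is to validate a possibly negative index immediately before it is used, regardless of what earlier conditions appear to guarantee. The sketch below assumes a hypothetical fixed-size table (NUM_GROUPS, group_flags) and a hypothetical accessor group_flag; none of these names come from sendmail.

```c
#include <stddef.h>

#define NUM_GROUPS 4   /* hypothetical table size */

static int group_flags[NUM_GROUPS] = { 1, 0, 1, 0 };

/* Returns the flag for group qg, or -1 if qg is outside the table.
 * A negative qg (such as a -1 "not found" sentinel) is rejected
 * before it can be used as an array index. */
int group_flag(int qg)
{
    if (qg < 0 || qg >= NUM_GROUPS) {
        return -1;
    }
    return group_flags[qg];
}
```

Placing the range check next to the array access keeps the guard and the use together, so a later change to the surrounding conditions cannot silently reintroduce the underflow.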

Another example of buffer underflow is found in file ssl_lib.c in OpenSSL:

p = buf;
sk = s->session->ciphers;
for (i = 0; i < sk_SSL_CIPHER_num(sk); i++) {
  …
  *(p++)=':';
}
p[-1] = '\0';

The analyzer informs us that the underflow occurs when this code is called from file s_server.c. From a look at the call site in s_server.c, it is clear that the analyzer has detected that buf points to the beginning of a statically allocated buffer. Therefore, in the ssl_lib.c code, if there are no ciphers in the cipher stack sk, then the access p[-1] is an underflow. This demonstrates the need for an inter-module analysis, since there would be no way of knowing what buf referenced without examining the caller.

If it is the case that the number of ciphers cannot actually be 0 in practice, then the for loop should be converted to a do loop to make it clear that the loop must always be executed at least once (ensuring that p[-1] does not underflow).

Another problem is a potential buffer overflow. No check is made in the ssl_lib.c code to ensure that the number of ciphers does not exceed the size of the buf parameter. Instead of relying on convention, a better programming practice would be to pass in the length of buf and then add code to check that overflow does not occur.
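
To make both points concrete, here is a minimal sketch of a separator-joined list builder that takes the buffer length as a parameter and never writes before the start of the buffer. The function join_names and its parameters are hypothetical and written for this illustration; this is not the OpenSSL code or a proposed patch to it.

```c
#include <stddef.h>
#include <string.h>

/* Joins count names with ':' into buf, which holds buflen bytes.
 * Returns buf on success, or NULL if buf is NULL, buflen is 0, or the
 * result would not fit. An empty list yields an empty string. */
char *join_names(const char *const *names, size_t count,
                 char *buf, size_t buflen)
{
    size_t used = 0;

    if (buf == NULL || buflen == 0) {
        return NULL;
    }

    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(names[i]);
        size_t sep = (i > 0) ? 1 : 0;  /* separator goes before, never after */

        /* Need room for separator + name + terminating NUL. */
        if (n + sep >= buflen - used) {
            buf[0] = '\0';
            return NULL;
        }
        if (sep) {
            buf[used++] = ':';
        }
        memcpy(buf + used, names[i], n);
        used += n;
    }

    buf[used] = '\0';  /* always in bounds; no p[-1] fix-up is needed */
    return buf;
}
```

Because the separator is written before each name after the first, there is no trailing ':' to overwrite, so the empty-list case needs no special handling, and the explicit length makes the overflow check local to the function.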

Resource Leaks
In file speed.c in OpenSSL:

fds=malloc(multi*sizeof *fds);

fds is a local pointer and is never used to free the allocated memory prior to return from the subroutine. Furthermore, fds is not saved in another variable where it could be later freed. Clearly, this is a memory leak. A simple denial-of-service attack on OpenSSL would be to invoke or cause to be invoked the speed command until all of memory is exhausted.
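
The usual remedy is to pair each allocation with a free on every path out of the function. The sketch below shows that pattern on a hypothetical helper, sum_squares, that uses a temporary scratch array; it is an illustration of the idiom, not the speed.c code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1) via a scratch array.
 * Returns 0 and stores the sum in *out on success; returns -1 on bad
 * arguments, size overflow, or allocation failure. */
int sum_squares(size_t n, long *out)
{
    long *scratch;
    long total = 0;

    if (out == NULL || n == 0 || n > SIZE_MAX / sizeof *scratch) {
        return -1;               /* nothing allocated yet, nothing to free */
    }

    scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL) {
        return -1;
    }

    for (size_t i = 0; i < n; i++) {
        scratch[i] = (long)(i * i);
    }
    for (size_t i = 0; i < n; i++) {
        total += scratch[i];
    }

    free(scratch);               /* released before the only exit after malloc */
    *out = total;
    return 0;
}
```

Keeping a single exit after the allocation, or routing all later exits through one cleanup point, makes it easy for both reviewers and analyzers to confirm that the memory is always released.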

Many would argue that the code quality of such popular open source applications is expected to be relatively high. As one person put it, “By sharing source code, open source developers make software more robust. Programs get used and tested in a wider variety of contexts than one programmer could generate, and bugs get uncovered that otherwise would not be found.”

Unfortunately, in a complex software application such as Apache, it is simply not feasible for all flaws to be found by manual inspection. In addition to this case study, other commercial static code analyzers have been used successfully on large open source applications, including the Linux operating system, to locate numerous latent security vulnerabilities.

Numerous mechanisms are available to help in the struggle to improve software quality, including improved testing and design paradigms. But automated source code analyzers are one of the most promising technologies.

David Kleidermacher, Chief Technology Officer of Green Hills Software, joined the company in 1991 and is responsible for technology strategy, platform planning, and solutions design. He is an authority in systems software and security, including secure operating systems, virtualization technology, and the application of high robustness security engineering principles to solve computing infrastructure problems. Mr. Kleidermacher earned his bachelor of science in computer science from Cornell University.

This article is excerpted from Embedded Systems Security by David and Mike Kleidermacher, and is used with permission from Newnes, a division of Elsevier. Copyright 2012. All rights reserved.
