Geospatial visualization helps manage million-line-plus embedded code bases
The architectural building blocks
The present implementation of our geospatial visualization has three major components: an analysis process, a special-purpose HTTP server, and a client. The analysis process extracts the call graph from the source code of the target project and sends it to the server as XML. The server component parses the XML, populates an SQL database with information about the call graph, and serves this data to the client via HTTP. The client retrieves data from the server and renders it in a Java applet. These components are described in more detail below.
Call-graph extraction. We extract the call graph and other information using CodeSonar, a commercial static-analysis tool. The visualization technology has been integrated with CodeSonar so that extraction happens seamlessly as part of the analysis.
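The article says only that the call graph reaches the server as XML; it does not describe the schema. As a minimal sketch under that uncertainty, the Java code below reads a hypothetical layout (a callgraph root holding procedure elements with id and file attributes, and call elements with from and to attributes) into caller-to-callee lists using only the JDK's built-in DOM parser. Every element and attribute name is an assumption for illustration, not CodeSonar's actual export format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch of reading an exported call graph with the JDK's DOM parser.
// HYPOTHETICAL schema: <callgraph>, <procedure id file>, <call from to>.
// The real CodeSonar export format is not described in the article.
class CallGraphXml {
    /** Procedure id -> source file, in document order. */
    final Map<String, String> fileOf = new LinkedHashMap<>();
    /** Caller id -> callee ids (one entry per call element). */
    final Map<String, List<String>> callees = new LinkedHashMap<>();

    static CallGraphXml parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable call-graph XML", e);
        }
        CallGraphXml g = new CallGraphXml();

        // Register every procedure first so leaf procedures get an empty callee list.
        NodeList procs = doc.getElementsByTagName("procedure");
        for (int i = 0; i < procs.getLength(); i++) {
            Element p = (Element) procs.item(i);
            g.fileOf.put(p.getAttribute("id"), p.getAttribute("file"));
            g.callees.putIfAbsent(p.getAttribute("id"), new ArrayList<>());
        }

        // Each call element becomes one caller -> callee edge.
        NodeList calls = doc.getElementsByTagName("call");
        for (int i = 0; i < calls.getLength(); i++) {
            Element c = (Element) calls.item(i);
            g.callees.computeIfAbsent(c.getAttribute("from"), k -> new ArrayList<>())
                     .add(c.getAttribute("to"));
        }
        return g;
    }

    public static void main(String[] args) {
        String xml = "<callgraph>"
            + "<procedure id='p1' file='src/main.c'/>"
            + "<procedure id='p2' file='lib/util.c'/>"
            + "<call from='p1' to='p2'/>"
            + "</callgraph>";
        CallGraphXml g = parse(xml);
        System.out.println(g.callees);  // {p1=[p2], p2=[]}
        System.out.println(g.fileOf);   // {p1=src/main.c, p2=lib/util.c}
    }
}
```

A DOM parser holds the whole document in memory; for a call graph extracted from tens of millions of lines, a streaming parser such as the JDK's StAX XMLStreamReader would be the safer choice.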
HTTP server. The server component has two phases: an offline import phase, in which it parses the XML-encoded call graph and computes the bundled and extension edges of the Map view; and a serve phase, in which it answers queries sent by the client. Wherever possible, information is precomputed and stored during the import phase to minimize the time required to respond to user actions during the serve phase. The call graph is stored in a SQLite database. In the serve phase, a session begins with the server sending an HTML page with an embedded Java applet; once the applet starts, the server responds to its queries. Query responses are encoded in JSON.
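This section does not spell out how the bundled edges are computed. One simple reading, assumed here purely for illustration, is that each procedure-level call is lifted to the pair of components containing its caller and callee at a given depth of the hierarchy, and that calls between the same pair are counted together. The sketch computes such counts once, as the import phase would, and encodes a stored table as JSON, as the serve phase would. The slash-separated path naming, the Bundle type, and the JSON field names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Sketch of import-time precomputation plus a JSON reply for the serve phase.
// ASSUMPTIONS: procedures are named by slash-separated hierarchy paths
// ("lib/ftp.c/f"), and a "bundle" is simply the count of calls between two
// distinct components at one depth. Not the tool's actual algorithm or format.
class EdgeBundles {
    record Bundle(String from, String to) {}

    /** Enclosing component at the given depth, e.g. ("lib/ftp.c/f", 1) -> "lib". */
    static String ancestor(String path, int depth) {
        List<String> parts = Arrays.asList(path.split("/"));
        return String.join("/", parts.subList(0, Math.min(depth, parts.size())));
    }

    /** Import phase: run once per depth and store the result. */
    static Map<Bundle, Integer> bundle(List<String[]> calls, int depth) {
        Map<Bundle, Integer> counts = new LinkedHashMap<>();
        for (String[] call : calls) {
            String from = ancestor(call[0], depth);
            String to = ancestor(call[1], depth);
            if (!from.equals(to)) {  // calls inside one component are not drawn at this depth
                counts.merge(new Bundle(from, to), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Escapes only backslash and double quote, enough for ordinary paths. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serve phase: turn a stored table into a JSON array. */
    static String toJson(Map<Bundle, Integer> bundles) {
        StringJoiner out = new StringJoiner(",", "[", "]");
        bundles.forEach((b, n) -> out.add(
            "{\"from\":" + quote(b.from()) + ",\"to\":" + quote(b.to()) + ",\"calls\":" + n + "}"));
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> calls = List.of(
            new String[] {"lib/ftp.c/f", "lib/sendf.c/g"},
            new String[] {"lib/ftp.c/f", "lib/ftp.c/h"},
            new String[] {"src/main.c/main", "lib/ftp.c/f"});
        Map<Bundle, Integer> depth1 = bundle(calls, 1);  // computed during import
        System.out.println(toJson(depth1));              // [{"from":"src","to":"lib","calls":1}]
    }
}
```

Because the counts are computed once per depth at import time, answering a query reduces to a lookup plus serialization, which is the point of moving work out of the serve phase.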
The client component. The client runs as a web page inside a web browser (Figure 2). The client component is a combination of Java (running in an applet) and JavaScript. The Java applet handles the sophisticated graph visualization tasks: retrieving graph data, computing layouts, and working with the graphics hardware via the OpenGL API. The JavaScript controls the HTML elements that surround the applet.

Figure 2. Screenshot of the tool
Rendering and manipulating large graphs can be time- and memory-intensive. We avoid many scalability issues by retrieving and rendering graph data only when it is required by the position and depth of the virtual camera.
When the user is zoomed out, the client only needs to retrieve and lay out the top one or two layers of the component hierarchy. As the user zooms into a subcomponent, the client will retrieve deeper layers of the hierarchy, but only those parts that will appear on screen. We use background threads to handle time-intensive tasks like asynchronous requests to the server and graph layout; this allows the user to continue working with the tool while new data is retrieved.
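A minimal sketch of camera-driven loading follows, assuming a simple rule for how much of the hierarchy to show: two levels at zoom factor 1, plus one more level for each doubling of the zoom. Only nodes that pass that depth test and whose bounds overlap the viewport are requested, and the requests run on a background thread pool via CompletableFuture so the user interface stays responsive. The Rect and Node types, the zoom rule, and the fetch function (a stand-in for an HTTP request to the server) are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of on-demand loading driven by the virtual camera.
// Node/Rect, the zoom-to-depth rule, and the fetch function are hypothetical.
class LazyLoader {
    record Rect(double x, double y, double w, double h) {
        boolean intersects(Rect o) {
            return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
        }
    }

    record Node(String id, int depth, Rect bounds) {}

    /** Assumed rule: zoom <= 1 shows two levels; each doubling adds one more. */
    static int maxDepth(double zoom) {
        int depth = 2;
        for (double z = zoom; z >= 2; z /= 2) {
            depth++;
        }
        return depth;
    }

    /** Nodes worth requesting: shallow enough for this zoom and on screen. */
    static List<Node> visible(List<Node> nodes, Rect viewport, double zoom) {
        int limit = maxDepth(zoom);
        return nodes.stream()
                .filter(n -> n.depth() <= limit && n.bounds().intersects(viewport))
                .collect(Collectors.toList());
    }

    /** Runs every fetch on the pool; results keep the order of 'wanted'. */
    static CompletableFuture<List<String>> fetchAsync(
            List<Node> wanted, Function<Node, String> fetch, ExecutorService pool) {
        List<CompletableFuture<String>> parts = wanted.stream()
                .map(n -> CompletableFuture.supplyAsync(() -> fetch.apply(n), pool))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(parts.toArray(new CompletableFuture[0]))
                .thenApply(v -> parts.stream().map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
            new Node("lib", 1, new Rect(0, 0, 50, 100)),
            new Node("src", 1, new Rect(50, 0, 50, 100)),
            new Node("lib/ftp.c", 2, new Rect(0, 0, 20, 20)),
            new Node("lib/ftp.c/f", 3, new Rect(1, 1, 2, 2)));
        Rect viewport = new Rect(0, 0, 40, 40);  // camera looking at the lib corner
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Node> wanted = visible(nodes, viewport, 1.0);
            List<String> replies = fetchAsync(wanted, n -> "data for " + n.id(), pool).join();
            System.out.println(replies);  // [data for lib, data for lib/ftp.c]
        } finally {
            pool.shutdown();
        }
    }
}
```

The join call in main is only for the demonstration; an interactive client would instead hand completed results back to its rendering thread and keep accepting input while requests are in flight.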
Figures 3a-3d show the Map view of some components present in an analysis of curl-7.19.7. Figure 3a is a composite image that shows views at three different levels of detail; these levels of detail are shown individually, at higher resolution, in Figures 3b-3d. Figure 3b shows the lib and src components at a broad level of detail. Both lib and src contain a hierarchy of files and procedures. Figure 3c shows the file ftp.c, a child of lib, in finer detail. Figure 3d shows ftp.c in greater detail still, including its procedures and their associated calls.

Figure 3a. Map view of some components present in an analysis of curl-7.19.7

Figure 3b. Broad view of the lib and src components of curl-7.19.7

Figure 3c. Detailed view of the file ftp.c, a child of lib

Figure 3d. Detailed view of the ftp.c file showing its procedures and their associated calls
We use the JOGL library to access OpenGL from a Java applet. Using OpenGL allows us to render graphs (with smooth zooming and panning) that are much larger than those we could render with the Java 2D drawing packages.
Some of the graph layout duties are handled by the commercial yFiles layout engine, and some by layout algorithms we developed in house. The client also leverages some of the user interface features that come with CodeSonar: we link to and embed CodeSonar’s sophisticated source listings so users can see both textual and graphical representations of their program.
Putting geospatial visualization to work
The tool we’ve developed has been tested on a proprietary project with 20 million lines of code (MLOC), as well as on several large open source projects, including Wireshark with its X11 dependencies (3.5 MLOC) and Firefox (1.8 MLOC).
Extracting the call graph from a large project adds negligible time to a CodeSonar analysis. On a modern Linux workstation, the import phase for a 20-MLOC project takes approximately 2 hours. Once the import is finished, a user can connect to the server and browse the project interactively, with latencies similar to those of a typical web-browsing session.
The primary feature of the Map view is that it shows the entire project in a single picture. Of course, the user does not see all the details at once, but because the tool supports smooth zooming and panning, it feels as though one is seeing everything, much as Google Earth and NASA World Wind give one the sense of seeing the whole planet in detail.
This ‘whole world’ property makes the Map view a useful canvas for projecting additional layers of data. For example, we can color zones of the view according to metrics like code complexity or rate of change. Or we can color those zones of the Map view that contain matches to a search engine query.
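Coloring zones by a metric amounts to mapping each component's value onto a color scale. The sketch below assumes a linear green-to-red ramp between a chosen minimum and maximum, clamps values outside that range to the ends, and returns a packed 0xRRGGBB integer; the palette the tool actually uses is not described in the article.

```java
// Sketch: map a metric value (e.g. complexity or rate of change) onto a
// green-to-red ramp. The ramp, the clamping, and the packed 0xRRGGBB
// format are illustrative assumptions, not the tool's documented scheme.
class MetricColor {
    /** Returns 0xRRGGBB: pure green at lo, pure red at hi, linear in between. */
    static int color(double value, double lo, double hi) {
        double t = hi > lo ? (value - lo) / (hi - lo) : 0.0;
        t = Math.max(0.0, Math.min(1.0, t));  // clamp values outside [lo, hi]
        int red = (int) Math.round(255 * t);
        int green = 255 - red;
        return (red << 16) | (green << 8);    // blue channel stays 0
    }

    public static void main(String[] args) {
        // A value of 5 on a 0..20 scale is a quarter of the way to red.
        System.out.printf("%06X%n", color(5, 0, 20));  // 40BF00
    }
}
```

Setting hi below the true maximum (for example, at a high percentile) and relying on the clamp keeps a handful of extreme values from pushing every other zone toward the green end.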
The Map view does have limitations. It requires a project to be organized into a component hierarchy, though this can be implicit in the file system layout. Also, if the hierarchy is too flat (for example, if all the procedures are in one file), the Map view offers few advantages over other node-and-edge visualizations.
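When the component hierarchy is implicit in the file system layout, it can be recovered by treating every directory on a file's path as a component. The minimal sketch below assumes slash-separated relative paths like those in its example; procedures could be attached the same way, as one more path segment under their file.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: derive an implicit component hierarchy from file paths, treating
// every directory as a component. The example paths are hypothetical.
class PathHierarchy {
    /** Maps each component ("" is the project root) to its direct children. */
    static Map<String, SortedSet<String>> build(List<String> files) {
        Map<String, SortedSet<String>> children = new TreeMap<>();
        for (String file : files) {
            String parent = "";
            for (String segment : file.split("/")) {
                String node = parent.isEmpty() ? segment : parent + "/" + segment;
                children.computeIfAbsent(parent, k -> new TreeSet<>()).add(node);
                parent = node;  // descend one level
            }
        }
        return children;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<String>> tree =
            build(List.of("lib/ftp.c", "lib/http.c", "src/main.c"));
        System.out.println(tree.get(""));     // [lib, src]
        System.out.println(tree.get("lib"));  // [lib/ftp.c, lib/http.c]
    }
}
```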
In addition, if the hierarchy does not group procedures into logical components with limited inter-component dependencies, the Map view may display a tangled mess of nodes and edges, though arguably this indicates that the component structure should be changed.
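One way to quantify how far a hierarchy falls short of grouping procedures into components with limited inter-component dependencies is to measure the share of calls that cross component boundaries at a given depth. This metric is offered only as an illustration, not as something the tool is documented to compute; the sketch assumes procedures are named by slash-separated hierarchy paths.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of a "tangle" indicator: the share of calls that cross component
// boundaries at a given depth. Path naming and the metric are illustrative.
class Tangle {
    /** Enclosing component at the given depth, e.g. ("lib/ftp.c/f", 1) -> "lib". */
    static String componentAt(String path, int depth) {
        List<String> parts = Arrays.asList(path.split("/"));
        return String.join("/", parts.subList(0, Math.min(depth, parts.size())));
    }

    /** Fraction of calls whose caller and callee lie in different components. */
    static double crossingRatio(List<String[]> calls, int depth) {
        if (calls.isEmpty()) {
            return 0.0;
        }
        long crossing = calls.stream()
                .filter(c -> !componentAt(c[0], depth).equals(componentAt(c[1], depth)))
                .count();
        return (double) crossing / calls.size();
    }

    public static void main(String[] args) {
        List<String[]> calls = List.of(
            new String[] {"lib/ftp.c/f", "lib/ftp.c/g"},
            new String[] {"lib/ftp.c/f", "lib/http.c/h"},
            new String[] {"src/main.c/main", "lib/ftp.c/f"},
            new String[] {"src/main.c/main", "src/main.c/parse"});
        System.out.println(crossingRatio(calls, 1));  // 0.25
        System.out.println(crossingRatio(calls, 2));  // 0.5
    }
}
```

A ratio that rises sharply from one depth to the next points to the level at which the map is most likely to look tangled.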
Finally, the Map view is not suited for studying the relationship between components that are distant from each other in the layout. To address this weakness, the tool supports custom maps where components can be selectively hidden and shown, as discussed earlier in this article.
Other ways of visualizing code. Several other tools under development elsewhere take an explicitly cartographic approach to software visualization. Codemap [3] visualizes a software project as a set of mountainous islands. CodeCity [4] renders software as a set of columns that resemble buildings in a city. Steinbrückner and Lewerentz [5] show software evolution as a growing city. CodeCanvas [6] supports deep zooming in and out of software in a manner similar to modern mapping applications, though it does not use cartographic metaphors to represent the software.
By reusing techniques originally employed to display large geographic data sets, we’ve been able to display large software projects (tens of millions of lines of code) in a single view with smooth zooming and panning.
Where we’re going next
We recently extended the visualization tool to overlay information about code metrics and defect density, and we are exploring ways to display additional information about software projects. Adding new information raises two primary, related challenges: 1) the technical challenge of managing and collating large datasets so they can be retrieved quickly, and 2) the usability challenge of presenting those datasets so that users can interpret them in projects with hundreds of thousands of procedures.
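One way to keep an overlay such as defect density quick to retrieve is to collate the per-procedure data up the component hierarchy once, so that coloring any zone at any zoom level is a single lookup. The sketch below assumes per-procedure line and defect counts keyed by slash-separated paths; the Measure and Stats types and the defects-per-KLOC measure are illustrative choices, not the tool's data model.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: roll per-procedure line and defect counts up the hierarchy once,
// so any component's defect density is a single map lookup at query time.
// Slash-separated paths and the Measure/Stats types are hypothetical.
class DefectRollup {
    record Measure(long lines, long defects) {}

    static final class Stats {
        long lines;
        long defects;

        /** Defects per thousand lines of code. */
        double perKloc() {
            return lines == 0 ? 0.0 : 1000.0 * defects / lines;
        }
    }

    /** Adds each procedure's numbers to itself and every enclosing component. */
    static Map<String, Stats> rollUp(Map<String, Measure> procedures) {
        Map<String, Stats> totals = new HashMap<>();
        procedures.forEach((path, m) -> {
            String prefix = path;
            while (true) {
                Stats s = totals.computeIfAbsent(prefix, k -> new Stats());
                s.lines += m.lines();
                s.defects += m.defects();
                int cut = prefix.lastIndexOf('/');
                if (cut < 0) {
                    break;  // reached a top-level component such as "lib"
                }
                prefix = prefix.substring(0, cut);
            }
        });
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Measure> procs = Map.of(
            "lib/ftp.c/f", new Measure(400, 2),
            "lib/ftp.c/g", new Measure(600, 1),
            "lib/http.c/h", new Measure(1000, 0));
        Map<String, Stats> totals = rollUp(procs);
        System.out.println(totals.get("lib/ftp.c").perKloc());  // 3.0
        System.out.println(totals.get("lib").perKloc());        // 1.5
    }
}
```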
References
[1] Google Earth
[2] World Wind
[3] A. Kuhn, D. Erni, and O. Nierstrasz, “Embedding spatial software visualization in the IDE: an exploratory study”, in Proceedings of SOFTVIS 2010, pp. 113–122.
[4] R. Wettel and M. Lanza, “Visual exploration of large-scale system evolution”, in Proceedings of the 15th Working Conference on Reverse Engineering, 2008, pp. 219–228.
[5] F. Steinbrückner and C. Lewerentz, “Representing development history in software cities”, in Proceedings of SOFTVIS 2010, pp. 193–202.
[6] R. DeLine, G. Venolia, and K. Rowan, “Software development with code maps”, in Communications of the ACM, vol. 53.
Paul Anderson is VP of Engineering at GrammaTech. He received his B.Sc. from King’s College, University of London, and his Ph.D. in computer science from City University, London. Paul manages GrammaTech’s engineering team and is the architect of the company’s static analysis tools. Paul has worked in the software industry for 20 years, with most of his experience focused on developing static analysis, automated testing, and program transformation tools. He can be contacted at paul@grammatech.com.
David Cok is Associate Vice President of Technology at GrammaTech. He graduated with honors from Calvin College with A.B. degrees in Physics and Mathematics. He also earned an A.M. and Ph.D. in Physics from Harvard University. Dr. Cok's career includes research in static analysis with an emphasis on practical application to industrial software development. Before joining GrammaTech, Dr. Cok was a Senior Research Associate and leader of research and product development groups at Eastman Kodak Company.
Michael McDougall is a GrammaTech Senior Scientist. He received a B.Sc. in Mathematics and Computer Science from McGill University in 1997 and a Ph.D. in Computer Science from the University of Pennsylvania in May 2005. At GrammaTech, he has led and contributed to a variety of research projects in software engineering and security. He works on tools for finding and mitigating security flaws and other software defects, and he is the lead developer on the software visualization project for large systems and on the effort to improve tools used by NASA for software quality.
John Von Seggern joined GrammaTech as a software engineer in 2007. He received his B.A. in Computer Science at the University of Chicago in 2007. In his time at GrammaTech, John has played an integral role in implementing many of the enterprise features of CodeSonar, a tool for detecting bugs and security vulnerabilities in software. These features include CodeSonar's search language, charting wizard, and architecture visualization platform.
Ben Fleis joined GrammaTech in 2006 as a software engineer and was affiliated with the company until 2010. He received his B.S. in Computer Science from Michigan State University in 1998, followed by his M.S. in Computer Science from the University of Michigan in 2000. In addition to a decade of experience as a computer scientist, Fleis, who is currently a consultant in computer science, is also a successful designer and maker of elegant furniture.

