
Testing the security of connected industrial infrastructure systems and devices

Nate Kube, Wurldtech Security Technologies

December 16, 2012


Editor’s Note: Nate Kube of Wurldtech details the security pitfalls in the supervisory control and data acquisition (SCADA) systems in industrial infrastructure. In this Product How-To, he describes how methodologies such as his company’s Achilles Vulnerability Database can be used to automate security testing of these systems.

The rash of recent cyber attacks on several of the biggest owner/operators of critical infrastructure and manufacturers of industrial control systems has generated much conversation on this subject. Even the president found it a subject worthy of addressing during recent debates.

The STUXNET computer virus wreaked havoc on Iran’s nuclear program for nearly two years and sparked worldwide controversy and concern about the threat of cyber attacks and cyber crime within critical infrastructures around the world. Today, the U.S. Department of Homeland Security is warning that a number of recent events make it increasingly likely that politically or ideologically motivated hackers may launch digital attacks against industrial control systems.

Both anecdotal evidence and research indicate that Supervisory Control and Data Acquisition (SCADA) applications, particularly those running services on top of transport protocols such as TCP/IP, have vulnerabilities that could be exploited by network hackers or terrorists to cause considerable disruption to our critical infrastructures. Little is known about these vulnerabilities, and there are few security tools or methodologies available for vendors or users to detect these flaws prior to equipment deployment.

Embedded systems and industrial control systems (ICS) have traditionally been protected within the silos of gas and oil refineries, nuclear facilities, and wastewater management and utility plants. However, the introduction of information technologies such as Windows, Ethernet, TCP/IP and wireless technologies within industrial control devices has resulted in significantly less isolation from the outside world.

The lack of industrial security data puts operators in a difficult position, as they are unable to make informed decisions regarding their security policies. Without data, it is impossible to answer questions, such as:

  • What vulnerabilities are present in our systems?
  • What is our security risk?
  • What is the return-on-investment for security improvements?

Furthermore, if significant vulnerabilities do exist within a plant, operators need to know how they can be mitigated, preferably without significant interruption to normal plant operation.

In this article, we discuss how one approach to automated security testing developed by Wurldtech addresses the needs of the industrial cyber-security community. The Wurldtech approach has the following main characteristics:

  • An automated test suite capable of generating thousands of test inputs and millions of attack packets. The tests must handle significant variation across devices, including specialized protocols for industrial devices.
  • A test platform capable of supplying the test inputs at precisely controllable packet rates. The platform must also monitor the functional health of the device under test, watching for failures in the protocol stack and in the process control functions.

As highly integrated control systems are relatively new, there is shockingly little data on network security for these industrial devices. The current methodologies for security testing focus on business systems and their dependence on common operating systems such as Windows and UNIX. Similarly, past vulnerability reporting methods, such as CERT or BugTraq, primarily address IT products and rarely include issues with industrial control products. In order to determine the security robustness of integrated control systems, new testing methodologies are required.

The Achilles Vulnerability Database
The Achilles Vulnerability Database (AVD), part of Wurldtech's Achilles Threat Intelligence Software, is a database designed to centralize current industrial automation vulnerability information and distribute effective mitigations. Over the past several years, with the aid of many of the world’s largest equipment vendors and operators, the database has been populated through extensive testing of industrial control systems and components.

There are two primary reasons why the AVD data is useful: to distribute vulnerability information to vendors and to provide mitigation strategies to operators. With detailed knowledge of which vulnerabilities are present, vendors can produce more robust devices and operators can construct business cases and calculate return-on-investment when considering investments in security measures.

With many thousands of zero-day vulnerabilities identified in industrial systems to date, AVD provides a unique view into the risk posture of today’s global infrastructure. Here we present a view of one of AVD’s sets of vulnerabilities, their common attack vectors, and their severity.

Through identification and mitigation, this vulnerability data allows vendors to produce more robust devices and operators to make informed security decisions. For instance, perhaps Asset Owner Z is considering the purchase of an expensive firewall. Will this firewall mitigate the vulnerabilities that are present in Z's network? Would an optional and expensive firewall security feature be worth purchasing?

A compensating control strategy is a way to prevent existing system vulnerabilities from being exploited without applying a patch to the device. For instance, such a mitigation strategy could take the form of a firewall rule set or an Intrusion Prevention System/Intrusion Detection System (IPS/IDS) signature. Patches from the system vendor are excluded because the primary utility of the AVD’s mitigation strategies is that they can be deployed without taking a system offline or shutting down a plant, which is typically a prerequisite for patch application.

The vulnerability data within AVD has been gathered primarily through system and component testing with the Achilles Test Platform (ATP), a product designed specifically to test industrial devices. The ATP runs network security tests against a device-under-test (DUT) and monitors the behavior of the DUT as the tests are run. The ATP tests the link, network, transport, and application layers of a device’s network stack. Test cases for industrial control-specific protocols, such as Modbus/TCP, DNP3, and MMS, are also supported.

Test cases fall into three general categories: storms, fuzzers/grammars, and known vulnerabilities. Storms are denial-of-service tests that send packets at high rates; fuzzers/grammars send packets that do not conform to protocol specifications to determine how the DUT handles invalid packets; known vulnerabilities send specific packets which are known to exploit particular vulnerabilities. To determine test results, the ATP monitors the DUT’s network stack and process control functionality as the tests are run.

The resulting test data is used to create resilience profiles, which consist of a device’s compensating control strategies for mitigating existing vulnerabilities. The vulnerabilities are ranked according to a modified version of the Common Vulnerability Scoring System (CVSS). Furthermore, the gathered test data is consolidated to determine critical device performance parameters, such as the packet rates at which problems are observed for each layer in the protocol stack.

AVD statistics
This section presents a sanitized view of the vulnerabilities in AVD’s embedded device category, restricted to network stack layers 2 through 4, along with their ranking based on the Achilles Test Platform results and an industrial adaptation of CVSS. The results shown here do not include application layer vulnerabilities.

Embedded devices, as defined in ISA99, are special-purpose devices running embedded software designed to directly monitor, control, or actuate an industrial process. Examples of embedded devices are Programmable Logic Controllers (PLCs), Distributed Control System (DCS) controllers, and Safety Instrumented Systems (SIS) controllers. For the data presented in this section, embedded devices are placed into one of two categories: DCS controllers and SIS controllers.

Achilles Test Platform results
The ATP determines test results through automated functional monitors, that is, software programs that observe the behavior of the DUT during a test. There are two general categories of monitors: those that monitor the DUT's process control functionality and those that monitor the DUT's network stack. The ATP discrete monitor observes the DUT's process control functionality, while the Internet Control Message Protocol (ICMP) Monitor and the Address Resolution Protocol (ARP) Monitor observe the DUT's network stack.

When there is a disruption to the DUT's process control functionality, a loss-of-control (LoC) vulnerability has been exposed. Similarly, when the DUT's network stack is no longer sending or processing legitimate network traffic, this is termed a loss-of-view (LoV) vulnerability. If loss-of-control or loss-of-view persists and manual intervention is required to return the DUT to a normal operating state, then the vulnerability is categorized as a permanent loss-of-control (PLoC) or permanent loss-of-view (PLoV).
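
The definitions above can be sketched as a small decision function. The names `MonitorObservation` and `classify` and the three boolean inputs are hypothetical and are not part of the ATP; the sketch also assumes, for simplicity, that a single "needed manual intervention" flag applies to whatever was disrupted during the test.

```python
from dataclasses import dataclass


@dataclass
class MonitorObservation:
    """Hypothetical summary of what the monitors saw during one test."""
    control_disrupted: bool    # a process control monitor reported a disruption
    network_disrupted: bool    # a network stack monitor (e.g. ICMP or ARP) reported a disruption
    needed_manual_reset: bool  # the device did not return to normal on its own


def classify(obs: MonitorObservation) -> list[str]:
    """Map one observation to the labels LoC, LoV, PLoC, PLoV."""
    prefix = "P" if obs.needed_manual_reset else ""
    labels = []
    if obs.control_disrupted:
        labels.append(prefix + "LoC")
    if obs.network_disrupted:
        labels.append(prefix + "LoV")
    return labels
```

An observation with no disruption yields an empty list, which corresponds to a test that exposed no vulnerability.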

Achilles vulnerability entries
A simplified vulnerability entry within AVD consists of nine characteristics, as shown in Table 1:

1. Device: the device under test
2. Test: the test case that elicited anomalous behavior
3. CVSS: the CVSS score associated with this vulnerability
4. Monitors impacted: the satellite monitors that were affected during the test
5. Rate: the rate at which the test was run, in frames per second
6. Packet size: the size of the test packets, in bytes
7. Recovery time: the amount of time required for the device to return to a normal state
8. Device type: DCS or SIS
9. Industries: the industries in which the device operates

Table 1 - An Achilles Vulnerability is unique if it has a unique 3-tuple: (Device, Test, Monitors Impacted). Therefore, if the same test produces two different monitor behaviors at different packet rates on the same device, then these two test results are considered unique vulnerabilities within Achilles.
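
The uniqueness rule can be illustrated with a short sketch. The record type `EntryResult`, its field names, and the sample values are invented for illustration and do not reflect the actual AVD schema; only the 3-tuple key comes from the text.

```python
from typing import NamedTuple


class EntryResult(NamedTuple):
    """Hypothetical, reduced test record (not the real AVD schema)."""
    device: str
    test: str
    monitors_impacted: frozenset
    rate_fps: int


def unique_vulnerabilities(results):
    """Keep the first result seen for each (device, test, monitors impacted) key."""
    seen = {}
    for r in results:
        key = (r.device, r.test, r.monitors_impacted)
        seen.setdefault(key, r)
    return list(seen.values())


results = [
    EntryResult("PLC-A", "ARP storm", frozenset({"ICMP"}), 1000),
    # Same device and test, but a different set of monitors: a separate vulnerability.
    EntryResult("PLC-A", "ARP storm", frozenset({"ICMP", "discrete"}), 5000),
    # Same 3-tuple as the first record at a different rate: not a new vulnerability.
    EntryResult("PLC-A", "ARP storm", frozenset({"ICMP"}), 2000),
]
print(len(unique_vulnerabilities(results)))  # 2
```

Note that the packet rate is deliberately left out of the key, so a rate change only creates a new entry when it also changes which monitors are impacted.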

Embedded device vulnerability distribution
The AVD embedded device layer 2-through-4 category is populated with vulnerabilities detected in OSI layers 2 through 4 of current generation DCS controllers and SIS devices. A total of 31 common devices have vulnerabilities in this category: 17 DCS controllers and 14 SIS controllers. Of the 505 layer 2-through-4 vulnerabilities detected, 298 reside in DCS controllers and 207 in SIS controllers. These vulnerabilities were found in the link, network, and transport layers (layers 2-4) of the network stack.


Figure 1 - Distribution of vulnerabilities by monitor result

Figure 1 shows the total distribution of vulnerabilities across monitor results. In Figure 1, observe how 91% (68%+16%+4%+3%) of the discovered vulnerabilities cause some form of loss-of-view, whereas 28% (9%+16%+3%) cause some form of loss-of-control. These results are to be expected, as these test cases target layers 2 through 4 of the network stack. If the robustness of the network stack is indicative of the robustness of the system’s other key components, there is reason to assume that negative input on other key software components could elicit analogous results. Regardless of the distribution, there are a large number of loss-of-view and loss-of-control vulnerabilities that, if exploited, can severely disrupt the operation of a plant.
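
The two percentages add up to more than 100% because some results cause both loss-of-view and loss-of-control. The short check below makes the overlap explicit. The assignment of each slice to loss-of-view, loss-of-control, or both is inferred from the sums quoted above, since the 16% and 3% slices appear in both; it is an assumption rather than a reading of the figure itself.

```python
# (share in percent, causes loss-of-view, causes loss-of-control)
# Slice membership is inferred from the sums in the text.
slices = [
    (68, True, False),
    (16, True, True),
    (4, True, False),
    (3, True, True),
    (9, False, True),
]

lov = sum(p for p, view, ctrl in slices if view)
loc = sum(p for p, view, ctrl in slices if ctrl)
both = sum(p for p, view, ctrl in slices if view and ctrl)
total = sum(p for p, _, _ in slices)

print(lov, loc, both, total)  # 91 28 19 100
```

Under this reading, 91% + 28% - 19% = 100%, so the five slices account for every vulnerability exactly once.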


Figure 2 - Distribution of vulnerabilities by CVSS score

Figure 2 depicts the same set of vulnerabilities as in Figure 1, except that the vulnerabilities are ranked by CVSS scores. Note how each of the CVSS score ranges is represented, even the most severe category. This representation across the CVSS score range illustrates the diversity of the discovered vulnerabilities, indicating that the vulnerabilities vary in their severity and their attack vectors.

Of the score ranges, 6.0-6.9 is the best represented. In Figure 1 we saw that the majority of vulnerabilities were loss-of-view. Loss-of-view vulnerabilities receive lower scores than their loss-of-control counterparts, due to the much more serious nature of losing process control functionality. Therefore, for there to be a large number of CVSS scores in the 6.0-6.9 range, there must have been a correspondingly large number of loss-of-view vulnerabilities that were exposed by routable packets, meaning that the attack can be launched remotely over the Internet. Of course, remotely executable attacks are far more dangerous than those that must be executed on the local network, as the latter require a direct physical connection to the plant network.

DCS vulnerability distribution

We now consider the DCS and SIS vulnerabilities individually. Figure 3 shows the distribution of discovered layer 2-through-4 DCS vulnerabilities across monitor results. Compared to Figure 1, Figure 3 contains more loss-of-control and fewer loss-of-view vulnerabilities.


Figure 3 - Distribution of DCS vulnerabilities by monitor result

Figure 4 shows the CVSS score distribution for the same set of DCS vulnerabilities. We see that 60% of the vulnerabilities are below 5.0. Vulnerabilities with scores in this range should be noted, but they are not particularly urgent. As in Figure 2, however, we still see each CVSS score range represented, with 5% of vulnerabilities over 8.0 and 23% of vulnerabilities between 6.0 and 7.9.


Figure 4: Distribution of DCS vulnerabilities by CVSS score

SIS vulnerability distribution
Figure 5 shows the distribution of discovered layer 2 through 4 SIS vulnerabilities across monitor results. Note that there is a higher percentage of loss-of-view vulnerabilities than in Figure 3 and Figure 1. This result is not surprising as safety systems are generally designed and tested more carefully, with greater attention paid to I/O failure modes. Therefore, on average, one would expect fewer loss-of-control vulnerabilities.


Figure 5: Distribution of SIS vulnerabilities by ATP result

Figure 6 shows the distribution of CVSS scores for SIS vulnerabilities. Note that there are no scores below 5.0 due to the manner in which our modified version of CVSS ranks vulnerabilities found in SIS devices. CVSS for ICS considers the device type when calculating scores. Generally, similar vulnerabilities for an SIS device are assigned higher scores than for a DCS due to the SIS device’s critical role in safe plant operation.

The vast majority of CVSS scores (83%) are in the 5.0-6.9 range, due to the high number of loss-of-view vulnerabilities and the relatively small number of loss-of-control issues. However, over 10% of the vulnerabilities have a CVSS score over 8.0. For safety-critical SIS devices, these are urgent issues that must be addressed.

Figure 6: Distribution of SIS vulnerabilities by CVSS score

Test case statistics
We now take a look at the test cases that revealed the vulnerabilities. Figure 7 shows the distribution of vulnerabilities across layer 2-through-4 protocols. The most vulnerabilities were found in IP, whereas the fewest vulnerabilities were found in ICMP. Notice how 17% of all vulnerabilities were found in ARP, which is surprising given the relative simplicity of the protocol. This example illustrates that if the protocol implementer does not constantly think about robustness, even seemingly benign aspects of the network stack may be vulnerable.

Figure 7: Vulnerability distribution across protocol levels

Figure 8 shows the role of frame rate in finding vulnerabilities. As shown below, the vast majority of vulnerabilities were found by rate-dependent tests. This result is encouraging, as it implies that so-called “killer packet” attacks, where one or a small number of packets exploit device vulnerabilities, are uncommon. The high number of rate-dependent vulnerabilities is likely due to the nature of industrial control devices. These devices are resource constrained, so high traffic rates consume their limited resources and lead to undesirable behavior.


Figure 8: The importance of rate in exploiting vulnerabilities

Impact across industries
We now consider the distribution of layer 2-through-4 vulnerabilities across industries. Each tested device has been associated with up to eight separate industries. Figure 9 shows the number of such vulnerabilities that have been found for each industry. There are a very large number of oil and gas vulnerabilities because we initially focused our efforts on devices in this industry. Therefore, these results do not imply that Water/Waste Water is the safest industry or that Oil and Gas is the most vulnerable. The lesson to be learned from this graph is that every industry has vulnerabilities.


Figure 9: Vulnerability distribution across industries

Figure 10 shows an in-depth look at the vulnerabilities found in the oil and gas industry, ranked by CVSS scores. Note the almost normal distribution of vulnerabilities, with the mean in the 6.0-6.9 range.


Figure 10: Vulnerability distribution for oil and gas

Mitigation and prevention

Depending on whether one is a vendor or an asset owner, there are two distinct ways in which vulnerabilities can be addressed. Vendors can prevent the vulnerabilities from being present (vulnerability prevention), whereas operators can mitigate them after they are discovered (mitigation strategies). Based on the data in the previous section, we offer the following high-level suggestions for implementing each of these approaches.

Top five operator mitigation strategies

Rate-limit network traffic. As observed in the previous section, rate-dependent tests trigger the most vulnerabilities. Rate-based problems are present across the entire network stack. The implication is obvious: rate-limit traffic to a level that will maintain plant operation but will not impair your controllers.

Block the TCP/IP LAND attack. The TCP/IP LAND attack is derived from the decade-old IT LAND attack that still has the capacity to wreak havoc on industrial control systems. A TCP/IP LAND attack consists of TCP packets that have identical source and destination IP addresses, as well as identical source and destination TCP ports. These packets should never be found in legitimate traffic; they can be safely blocked.
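
As a minimal sketch of the check a filter would apply, the function below flags a packet whose source and destination addresses and ports match. The `TcpSegmentInfo` type is a hypothetical, already-parsed view of a packet; a real firewall would express this as a rule in its own rule language.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpSegmentInfo:
    """Hypothetical parsed view of the fields needed for the check."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def is_land_packet(pkt: TcpSegmentInfo) -> bool:
    """True when source and destination address and port are identical."""
    same_ip = ipaddress.ip_address(pkt.src_ip) == ipaddress.ip_address(pkt.dst_ip)
    return same_ip and pkt.src_port == pkt.dst_port
```

Comparing parsed addresses rather than raw strings avoids missing a match when the same address is written in two textual forms.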

Prevent port scans. Port scans have been found to disrupt the operation of industrial control devices. Furthermore, port scans are often precursors to a more sophisticated network attack, as they are used to perform reconnaissance on target devices.
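
One simple heuristic for spotting a scan is to count how many distinct destination ports a single source touches within a short time window. The sketch below illustrates that idea only; the class name, the thresholds, and the choice of heuristic are assumptions, and production IDS/IPS products use more elaborate detection.

```python
from collections import defaultdict, deque


class PortScanDetector:
    """Flag a source that touches too many distinct ports within a time window."""

    def __init__(self, max_ports: int = 10, window_s: float = 5.0):
        self.max_ports = max_ports
        self.window_s = window_s
        self._events = defaultdict(deque)  # source -> deque of (timestamp, dst_port)

    def observe(self, src: str, dst_port: int, now: float) -> bool:
        """Record one connection attempt; return True if src looks like a scanner."""
        events = self._events[src]
        events.append((now, dst_port))
        # Forget attempts that have fallen out of the window.
        while events and now - events[0][0] > self.window_s:
            events.popleft()
        distinct_ports = {port for _, port in events}
        return len(distinct_ports) > self.max_ports
```

A source that repeatedly contacts the same port, as a normal control client would, never exceeds the threshold.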

Be wary of fragmented packets. The IP fragmentation attacks, particularly the IP Fragmented Storm, are common triggers of vulnerabilities. At the very least, fragmented packets should be handled carefully by firewalls. For instance, they should be limited to a slower rate than their non-fragmented counterparts. Ideally, fragmented packets should not be allowed onto the control network. This strategy is perhaps unrealistic, although there are other, arguably superior alternatives to packet fragmentation, such as path MTU (Maximum Transmission Unit) discovery, which involves discovering the smallest allowed MTU on the network path, thus obviating the need for packet fragmentation.
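
To treat fragments differently, a filter first has to recognize them. In an IPv4 header, a packet is a fragment if the "More Fragments" flag is set or the fragment offset is non-zero. The helper below is a minimal sketch of that test on a raw header; the function name is hypothetical.

```python
import struct

MF_FLAG = 0x2000      # "More Fragments" bit in the 16-bit flags/offset field
OFFSET_MASK = 0x1FFF  # low 13 bits hold the fragment offset


def is_ipv4_fragment(header: bytes) -> bool:
    """True if the IPv4 header belongs to any fragment of a fragmented datagram."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Bytes 6-7 of the header carry the flags and the fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", header, 6)
    more_fragments = bool(flags_and_offset & MF_FLAG)
    nonzero_offset = (flags_and_offset & OFFSET_MASK) != 0
    return more_fragments or nonzero_offset
```

Packets identified this way could then be dropped or passed to a stricter rate limit than unfragmented traffic.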

Block impossible packet header field combinations. There are packet header field value combinations that should never occur in valid traffic. For instance, within the TCP header, the TCP SYN flag and FIN flag should never both be enabled. Within the IP header, the “Don’t Fragment” bit and “More Fragments” bit should never both be enabled. These packets are almost certainly being transmitted for a nefarious purpose and should be blocked.
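
The two combinations named above reduce to simple bit tests. The sketch below assumes the caller has already extracted the TCP flags byte and the 16-bit IPv4 flags/fragment-offset field; the function name is hypothetical, and the list of checks is limited to the two examples in the text.

```python
TCP_FIN = 0x01
TCP_SYN = 0x02
IP_DF = 0x4000  # "Don't Fragment" bit in the 16-bit flags/offset field
IP_MF = 0x2000  # "More Fragments" bit in the same field


def has_impossible_flags(tcp_flags: int, ip_flags_and_offset: int) -> bool:
    """True if SYN+FIN or DF+MF are set together."""
    syn_and_fin = bool(tcp_flags & TCP_SYN) and bool(tcp_flags & TCP_FIN)
    df_and_mf = bool(ip_flags_and_offset & IP_DF) and bool(ip_flags_and_offset & IP_MF)
    return syn_and_fin or df_and_mf
```

For example, `has_impossible_flags(0x03, 0)` is true because both SYN and FIN are set, while an ordinary SYN with "Don’t Fragment" set, `has_impossible_flags(0x02, 0x4000)`, is not flagged.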

Top five vendor prevention methods
Drop unsolicited ARP replies. The ARP Cache Saturation Storm sends unsolicited ARP replies to the DUT, filling up its ARP cache. This test has been found to disrupt the operation of many industrial control devices. To prevent vulnerabilities of this type, a device’s network stack should add entries to its ARP cache only when it has sent a corresponding ARP request; all other replies should be ignored.
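
The rule can be sketched as a cache that tracks its own outstanding requests. The class and method names below are hypothetical; a real network stack would also expire pending requests and bound the size of both tables.

```python
class ArpCache:
    """Accept an ARP reply only if a matching request is outstanding."""

    def __init__(self):
        self._pending = set()  # IP addresses we have sent a request for
        self._entries = {}     # IP address -> MAC address

    def request_sent(self, ip: str) -> None:
        self._pending.add(ip)

    def reply_received(self, ip: str, mac: str) -> bool:
        """Return True if the reply was accepted into the cache."""
        if ip not in self._pending:
            return False  # unsolicited reply: ignore it
        self._pending.discard(ip)
        self._entries[ip] = mac
        return True

    def lookup(self, ip: str):
        return self._entries.get(ip)
```

With this policy, a storm of unsolicited replies cannot grow the cache, because each accepted entry must be paired with a request the device itself issued.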

Carefully manage memory/CPU utilization. Most operating systems have the ability to place limits on the memory and CPU time allocated to a process. Limits should be placed on non-critical processes, such as a web server, to ensure that they do not take memory and processing time away from more critical processes, such as those that manage process control functionality. All processes running on a controller should be ranked on how essential they are to a controller’s operation and limited accordingly.

Assume the worst about network input. All network input must be considered untrustworthy and thus checked for errors and formatting correctness. For instance, data lengths and types should be checked and verified before data is processed. If the data does not fall into any of the expected valid formats, it should be discarded. Testing a device with both valid and invalid data can help find input processing issues.

Properly manage data buffers. As network input may not be what is expected, proper buffer management is essential. Before any data is moved to a buffer, the size of the data and the buffer must be compared. If the data is larger than the buffer, a buffer overflow may occur, which can lead to a host of security vulnerabilities.
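
As an illustration of checking lengths before processing, the sketch below validates a Modbus/TCP frame before handing its payload on. The function name is hypothetical. The header layout and the 260-byte upper bound reflect our understanding of the Modbus/TCP framing and should be confirmed against the specification before being relied upon.

```python
import struct

MBAP_HEADER_LEN = 7  # transaction id (2), protocol id (2), length (2), unit id (1)
MAX_ADU_LEN = 260    # assumed maximum frame size; verify against the specification


def parse_modbus_tcp(frame: bytes) -> tuple[int, int, bytes]:
    """Return (transaction_id, unit_id, pdu), or raise ValueError on bad input."""
    # Reject frames that are too short to hold a function code or too long to be valid.
    if not MBAP_HEADER_LEN < len(frame) <= MAX_ADU_LEN:
        raise ValueError("frame length out of range")
    transaction_id, protocol_id, length, unit_id = struct.unpack_from("!HHHB", frame, 0)
    if protocol_id != 0:
        raise ValueError("unexpected protocol identifier")
    # The length field counts the unit id plus the PDU that follows it.
    if length != len(frame) - 6:
        raise ValueError("length field does not match received data")
    return transaction_id, unit_id, frame[MBAP_HEADER_LEN:]
```

The key point is that the declared length is compared with the amount of data actually received, so a forged length field can never drive a later copy past the end of a buffer.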

Rate-limit network traffic. Handling high-speed network traffic can require significant processing power. Control protocols rarely need large amounts of bandwidth, so the rate at which network traffic is received should be limited to an appropriate level. Rate limiting should be applied to all basic network protocols such as ARP, ICMP, TCP and UDP. Rate limiting can be performed with internal firewall software that essentially drops packets, or with an external firewall that always accompanies the controller.
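
A token bucket is one common way to implement such a limit; it is shown here as an illustration, not as the method used in any particular product. The class name, the per-protocol table, and all rates are hypothetical and would have to be tuned so that legitimate control traffic is never dropped.

```python
class TokenBucket:
    """Allow up to `rate_pps` packets per second with bursts of up to `burst` packets."""

    def __init__(self, rate_pps: float, burst: float):
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = burst
        self.last = None  # time of the previous packet, in seconds

    def allow(self, now: float) -> bool:
        """Return True to accept the packet arriving at time `now`, False to drop it."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical per-protocol limits (packets per second, burst size).
limits = {
    "arp": TokenBucket(rate_pps=50, burst=10),
    "icmp": TokenBucket(rate_pps=20, burst=5),
    "tcp": TokenBucket(rate_pps=2000, burst=200),
    "udp": TokenBucket(rate_pps=2000, burst=200),
}
```

Keeping a separate bucket per protocol means that a storm on one protocol, such as ARP, cannot use up the allowance of the others.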

Conclusion

Because highly integrated control systems typically consist of many different devices, and because these devices may contain implementations of a variety of protocols, a truly valuable vulnerability-testing tool must be easily applicable to a wide variety of protocols. In addition, a valuable tool must be usable by people with varying skill sets. For example, the tool should be usable by the vendor, by a field engineer, or by a plant floor worker.

However, no amount of testing guarantees correct device behavior in the field; only running all possible tests could do that, and there are generally far too many possible tests to run them all. Worse, due to timing variations, a device may pass a test once and fail when the test is rerun later. The pioneering software engineer E.W. Dijkstra summarizes the situation well: “program testing can show the presence of bugs but never their absence.”

Nonetheless, testing is the most common approach for finding bugs; a failed test definitively proves that a device has a bug. Well-designed tests are able to exercise a device in near real-world conditions, demonstrating device capabilities and limitations, qualitatively and quantitatively. Such tests increase confidence in device performance, even though absolute confidence is not achievable.

Testing provides valuable data to support comparisons between devices, including different versions of the same device and devices from different vendors. Additionally, testing provides legal and regulatory protection, and as systematic testing of networked devices becomes common engineering practice, it will become increasingly risky to omit.

The singular aim of any vulnerability protocol test is protecting critical infrastructure and “keeping the lights on.” In support of this goal, testing is not just critically important, it is imperative. This imperative extends from the plant floor to every point where a facility’s systems touch or are touched by the Internet.

In the critical infrastructure environment, we can no longer count on “security through obscurity” as more and more devices become Ethernet- and wireless-enabled. Automated testing of SCADA protocols is key to helping achieve that goal.

Nate Kube founded Wurldtech Security Technologies in 2006 and as the company’s Chief Technology Officer is responsible for technology direction, strategic alliances, OEMs, and thought leadership. Nate is an internationally recognized subject matter expert in embedded device protection for high-availability industrial automation, medical and health care industries. Nate has created an extensive Intellectual Property portfolio including numerous patents in formal test methods and critical systems protection.

Nate has also co-authored many security publications for the embedded device security market, and frequently presents on cyber security issues. Nate has testified on smart grid interoperability standards for the US Federal Energy Regulatory Commission (FERC) and serves as an expert for the TC65 working group on the IEC 62443-2-4 international standards project. For a selection of his patents and publications please see Wurldtech’s Patents and Publications.
