Packet classification, as described in Part 1, ensures that the system processes only what it needs to, and therefore goes a long way toward minimizing power consumption in networked devices. However, further optimization is not only possible but desirable.
Real-world applications can contain extensive software footprints, and the time to boot is non-trivial. In systems where clock cycles are measured in nanoseconds, the time for software to boot may be measured in seconds.
Y. Agarwal lists a 10-second boot as the shortest achievable today for a PC, although Microsoft claims that Windows 8 can resume an active state from its S3 sleep state in 2 seconds. Even lower boot times are achievable in embedded applications; for example, the Lineo Warp website lists an optimized boot of 1.07 seconds for X-Windows on the Armadillo 500-FX platform, and 1.9 seconds for Android on the same platform.
Even with packet classification, the interval between packets that require further processing may approach the time it takes for the system to wake and process a packet. In such a scenario, the system will be continually waking and sleeping, without ever being able to spend any significant time in the lowest power dormant state.
This situation can be improved through packet accumulation, which allows multiple packets to be buffered until such time as the system is ready to wake to process them. This minimizes the overhead of waking and sleeping, allowing the system to efficiently process a group of packets in bulk.
There are several caveats to be aware of when using packet accumulation:
- The system must guarantee that its packet accumulation buffer does not overflow, regardless of whether that buffer is in dedicated on-chip SRAM in an SoC or in external DRAM. This implies that while performing packet accumulation, the system maintains a count of the number or total size of received packets, and wakes before that count exceeds the available buffer.
- The system must be able to respond to packets within a defined maximum time, regardless of network traffic. This means that as the system starts to accumulate packets, a timer must be started. When the timer expires, the system needs to wake up, regardless of how many packets have been accumulated. This prevents network protocols from timing out if a packet is received and accumulated, but the network subsequently carries too little relevant traffic to fill the packet accumulation buffer.
- There may be certain types of packets for which it is desirable to wake immediately and process, rather than accumulate multiple packets to process. For example, a networked printer may accumulate multiple ARP requests and respond to them together after waking, but if a packet that looks like the beginning of a print job arrives, then the system should wake immediately in order to print as soon as possible. S. Gobriel et al. use heuristics to differentiate between packets that are “idle” (bufferable) and packets that are “active” (need fast response), but relatively simple deep packet inspection is also a workable solution.
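The three caveats above can be combined into a single wake decision. The following is a minimal sketch rather than any vendor's implementation: the class name `PacketAccumulator`, its parameters, and the one-frame headroom are all assumptions made for illustration, and the `urgent` flag stands in for whatever classification step marks a packet as needing an immediate wake.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PacketAccumulator:
    """Buffers packets while the host sleeps and decides when to wake it.

    All names and default values here are hypothetical, for illustration.
    """
    buffer_bytes: int               # capacity of the accumulation buffer
    max_latency_s: float            # longest a buffered packet may wait
    wake_margin_bytes: int = 1518   # headroom: one maximum-size Ethernet frame
    _packets: List[bytes] = field(default_factory=list, init=False)
    _used: int = field(default=0, init=False)
    _first_arrival: Optional[float] = field(default=None, init=False)

    def on_packet(self, pkt: bytes, now: float, urgent: bool = False) -> bool:
        """Store a packet; return True if the host should wake now."""
        if self._first_arrival is None:
            self._first_arrival = now   # first buffered packet starts the timer
        self._packets.append(pkt)
        self._used += len(pkt)
        # Caveat 1: wake while there is still room for one more full frame.
        nearly_full = self._used + self.wake_margin_bytes > self.buffer_bytes
        # Caveat 3 (urgent) and caveat 2 (timer) also force a wake.
        return urgent or nearly_full or self.timer_expired(now)

    def timer_expired(self, now: float) -> bool:
        """True once the oldest buffered packet has waited max_latency_s."""
        return (self._first_arrival is not None
                and now - self._first_arrival >= self.max_latency_s)

    def drain(self) -> List[bytes]:
        """Hand all buffered packets to the woken host and reset the state."""
        pkts, self._packets = self._packets, []
        self._used = 0
        self._first_arrival = None
        return pkts
```

In a real system the timer would be a hardware timer that fires on its own rather than being polled on packet arrival, and the headroom would need to cover every packet that can arrive during the wake-up latency, not just one frame.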
Freescale’s QorIQ P1022 Communications Processor can classify packets with its eTSEC controller, as well as accumulate packets as needed, storing them in external DRAM, while maintaining counters in its eTSEC and timers in its interrupt controller to guarantee packet response within predefined maximum times.
The shortcoming of packet classification and accumulation on larger networks is the amount of time spent servicing protocols such as ARP and SNMP that are required to maintain network connectivity. As an example, if it takes a system 500 ms to go through a cycle of wake-up, message processing, and return-to-sleep, then even a modest message rate (one message every 500 ms or less) could force a system to stay permanently in a high power state.
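The arithmetic behind this example can be made explicit with a deliberately simple model. The sketch below assumes evenly spaced messages, a fixed cycle time, and negligible extra processing time per accumulated message; the function name `awake_fraction` and the `batch` parameter are illustrative and do not come from the referenced work.

```python
def awake_fraction(cycle_s: float, interval_s: float, batch: int = 1) -> float:
    """Fraction of time the system spends outside its sleep state.

    cycle_s:    duration of one wake-up, processing, return-to-sleep cycle
    interval_s: time between messages that need servicing
    batch:      messages accumulated per wake-up (1 = no accumulation)
    """
    if cycle_s <= 0 or interval_s <= 0 or batch < 1:
        raise ValueError("cycle_s and interval_s must be positive, batch >= 1")
    # One cycle is paid for every `batch` messages; the result is capped at
    # 1.0 because the system cannot be awake more than all of the time.
    return min(1.0, cycle_s / (interval_s * batch))


# A 500 ms cycle with a message every 500 ms keeps the system awake always.
print(awake_fraction(0.5, 0.5))            # 1.0
# Accumulating 8 messages arriving every 250 ms leaves it awake 25% of the time.
print(awake_fraction(0.5, 0.25, batch=8))  # 0.25
```

Under this model, accumulation only helps as long as the protocol's response deadline tolerates the batching delay, which is the motivation for the proxying approach described next.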
Microsoft and B. Combs provide a framework for protocol offload. In particular, they standardize the way that systems running Microsoft Windows 7 can allow IPv4 address resolution (ARP) and IPv6 neighbor solicitation (NS) to be offloaded to an external Network Interface Controller (NIC), rather than handled by the primary Windows host.
The ECMA-393 standard is not tied to the Windows 7 operating system and is therefore more suitable for a wide range of embedded applications. Like the Microsoft and B. Combs framework, it requires IPv4 ARP and IPv6 NS proxying. It goes further by providing options for proxying other protocols such as IGMP, DHCP, IPv4 SIP, IPv6 Teredo tunneling, SNMP, mDNS, and LLMNR.
Fundamentally, however, the concept behind all such proxying is similar: to maintain what the ECMA-393 standard calls “Full Network Connectivity” by using hardware that is distinct from the primary processor that would otherwise maintain network connectivity. The intent is for the proxying hardware to be much lower power than the primary processor, thereby allowing the primary processor to be in a low power state, or potentially even off, for extended periods of time.
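To make the idea of a proxied response concrete, the sketch below builds an ARP reply on behalf of a sleeping host. It is an illustration only, not code from ECMA-393 or from any NIC firmware: the function name `proxy_arp_reply` is hypothetical, only untagged Ethernet frames carrying IPv4 ARP requests are handled, and the frame layout follows the standard ARP packet format.

```python
import struct
from typing import Optional

ETH_HDR = struct.Struct("!6s6sH")            # destination MAC, source MAC, EtherType
ARP_BODY = struct.Struct("!HHBBH6s4s6s4s")   # htype, ptype, hlen, plen, oper,
                                             # sender MAC/IP, target MAC/IP
ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2


def proxy_arp_reply(frame: bytes, my_mac: bytes, my_ip: bytes) -> Optional[bytes]:
    """Return an ARP reply frame if `frame` asks for my_ip, otherwise None."""
    if len(frame) < ETH_HDR.size + ARP_BODY.size:
        return None
    _dst, _src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, tpa = ARP_BODY.unpack_from(
        frame, ETH_HDR.size)
    # Only answer Ethernet/IPv4 requests that target the proxied address.
    if (htype, ptype, hlen, plen, oper) != (1, 0x0800, 6, 4, ARP_REQUEST):
        return None
    if tpa != my_ip:
        return None
    # Reply is unicast to the requester, with sender and target fields swapped.
    eth = ETH_HDR.pack(sha, my_mac, ETHERTYPE_ARP)
    arp = ARP_BODY.pack(1, 0x0800, 6, 4, ARP_REPLY, my_mac, my_ip, sha, spa)
    return eth + arp
```

Because the reply is built entirely from fields in the request plus the host's own addresses, it needs no state from the primary processor, which is what makes ARP a natural first candidate for proxying.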
S. Nedevschi et al. also implement several types of autorespond proxy in their “proxy_2” through “proxy_4” definitions, although there is no mention of packet accumulation. They analyse in detail the types of incoming packets and provide information as to which protocols may be the best “low-hanging fruit” to proxy. Their data shows that the best protocol to proxy is ARP, as it accounts for the highest percentage of incoming broadcast traffic, and packets destined for the host cannot be ignored.
However, other protocols that they classify as both “Don’t wake”, because their packets may occur frequently, and “Mechanical response”, for which a proxy autoresponse may be possible, include SSDP, IGMP, ICMP, and NBDGM as well as ARP. All of these, as well as others, may be considered as candidates for an autorespond proxy, although a definitive list is highly network-dependent and an optimized proxy should be tuned to specific use cases.
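As a rough illustration of how a proxy might recognize these candidate protocols, the sketch below inspects an untagged Ethernet frame and names the protocol if it is one of the five listed above. The function name `proxy_candidate` and the lookup tables are assumptions made for this example; the port and protocol numbers are the standard assignments (UDP 1900 for SSDP, UDP 138 for the NetBIOS datagram service). IPv4 fragments, VLAN tags, and IPv6 are not handled.

```python
import struct
from typing import Optional

ETH_LEN = 14                          # untagged Ethernet header length
IP_PROTOS = {1: "ICMP", 2: "IGMP"}    # IPv4 protocol numbers
UDP_PORTS = {1900: "SSDP", 138: "NBDGM"}


def proxy_candidate(frame: bytes) -> Optional[str]:
    """Name the protocol if the frame is an autorespond candidate, else None."""
    if len(frame) < ETH_LEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == 0x0806:
        return "ARP"
    if ethertype != 0x0800 or len(frame) < ETH_LEN + 20:
        return None                   # not IPv4, or too short for an IPv4 header
    ihl = (frame[ETH_LEN] & 0x0F) * 4         # IPv4 header length in bytes
    if ihl < 20:
        return None                   # malformed header
    proto = frame[ETH_LEN + 9]
    if proto in IP_PROTOS:
        return IP_PROTOS[proto]
    if proto == 17 and len(frame) >= ETH_LEN + ihl + 4:   # UDP
        (dport,) = struct.unpack_from("!H", frame, ETH_LEN + ihl + 2)
        return UDP_PORTS.get(dport)
    return None
```

Since the right list is network-dependent, keeping the protocols in small tables like these makes it straightforward to tune the proxy for a particular deployment.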
While classification and accumulation of packets can be handled by an intelligent Ethernet controller with scratch pad memory, an autoresponding proxy typically requires an additional processing element capable of running a networking stack.
In a quiet network, classification and accumulation methods save a similar amount of power to an autorespond proxy. However, on real networks the cumulative power saving for an autorespond proxy can be an order of magnitude greater if the packet classification and accumulation techniques fail to keep the system in an idle state most of the time.
Another key consideration when implementing an autorespond proxy is that on a network the system must behave similarly to a fully powered version of itself. In the printer example, the PC sending a print job must not see noticeable differences in its communication with the printer; otherwise driver compatibility across diverse OSes and platforms could break.
A printer with an autorespond proxy cannot assume that the devices it communicates with are able to deal with its special “low power modes”. Many devices such as Network Attached Storage (NAS) and set-top boxes, which have a similar workload profile of being online while their main function is idle most of the time, would see similar power reduction benefits from implementing an autorespond proxy.
An autorespond proxy can be implemented either externally in a “smart” NIC, or embedded as an equivalent integrated function in an SoC. The Freescale QorIQ T1042 Communications Processor implements such an integrated autorespond proxy function through its Frame Manager. The T1042’s Frame Manager integrates both an Ethernet controller and a small processor running firmware capable of handling or terminating ECMA-393 protocols without intervention from the main CPUs in the SoC. The result is that any ECMA-393 protocol packet can be processed while the entire SoC apart from the Frame Manager stays in a low power idle state. This capability enables systems to achieve less than ½ W, as measured from the AC wall plug, while maintaining full network connectivity.
Ben Eckermann is a Senior Member of Technical Staff and Systems Architect for Digital Networking at Freescale. He has designed and architected low-power products for Freescale (and formerly Motorola) for more than 12 years. He holds a Bachelor of Engineering (Computer Systems) with First Class Honours from the University of Adelaide, Australia.