Implementing SSL on 8-bit micros - Embedded.com

Implementing SSL on 8-bit micros

The Secure Sockets Layer protocol is used on every web browser and web server to encrypt secure transactions. But SSL is not just the province of 32-bit microprocessors. It can be used on low-cost 8-bitters as well.

They're out there: Internet-enabled refrigerators and washing machines are here, although they're not yet common household items and might take a while to catch on. A large, fast-growing market for Internet-enabled embedded devices is already here, however, and it includes vending machines, security and access devices, building control, utility monitoring, point of sale, and remote data acquisition.

The ability to remotely monitor system usage and health, update firmware, or monitor stock creates huge savings in operational costs by eliminating or reducing the need to send technicians, meter readers, or vending machine stockers to multiple sites, or sometimes by alerting them of the need to go to sites quickly. At the same time, there's huge risk if sensitive data, control commands, or firmware updates are sent in an unsecured fashion over the Internet.

SSL (Secure Sockets Layer) is the de facto standard security protocol for securing transactions over the Internet. Other protocols exist but are less widely used and supported and don't have as many application possibilities as SSL. SSL was designed to complement the TCP/IP sockets model, making any TCP/IP application a candidate for SSL encryption.

Perhaps the most common and effective way to interface with a remote embedded device over the Internet is with a web browser on a PC. If the embedded device can act as a web server, it can serve standard HTML pages, HTML forms, or Java applets that eliminate the need to develop complex, proprietary PC software and the accompanying maintenance, testing, and support headaches. Every Internet user knows how to use a web browser, and every modern PC on the Internet has one. SSL security is built into every modern commercial web browser in the form of a secure HTTP client—HTTPS. Whenever your browser shows you the message, “you are about to enter a secure web site,” you're using HTTPS and SSL. You will see that the URL displayed in your browser begins with “https://” rather than “http://” and you'll probably see a padlock icon somewhere on the browser's status bar.

SSL is a computationally intensive protocol, particularly during the initial handshake to open a session. While many low-end microprocessors and microcontrollers can handle running an HTTP server, a secure HTTPS server requires something more than a Z180 or a plain-vanilla 8051 to work, but it doesn't necessarily need a fast 32-bit processor or even a 16-bit one. With some careful coding and some hardware tricks, a robust, secure web server can easily be made for under $35 with an off-the-shelf, Ethernet-enabled controller.

This article discusses the basics of SSL and Transport Layer Security (TLS, the Internet Engineering Task Force standard for SSL) and the problems inherent in constructing an SSL implementation for a system with very limited resources.

Cryptography basics
Cryptography is the science of encoding data such that the data can't be easily recovered without knowledge of some secret key. Cryptography is as old as writing; there is evidence ancient Romans and Egyptians had notions of cryptography and used it to protect military and political correspondence. Cryptography also forms the basis for all computer-communications security today. We'll cover some basic cryptographic concepts here without delving into algorithmic details, which are thoroughly covered in the listed references.

Symmetric key cryptography is so named because the sender and receiver must have identical copies of the key and like algorithms in order to encrypt and decrypt messages. Figure 1 illustrates the symmetric nature of this type of encryption; both sides must have the key k to exchange the message M (encrypted using k to get the encrypted message E).


FIGURE 1: Cryptography—private key


FIGURE 2: Cryptography—public key


FIGURE 3: RSA authentication

Some common algorithms for symmetric-key encryption include DES (data-encryption standard), 3DES (triple DES), AES (advanced encryption standard), and RC4 (Rivest Cipher 4). AES is a newer algorithm intended to replace the aging DES and 3DES. RC4 is used extensively by SSL implementations due to its simple structure and excellent performance.

Public key encryption
The counterpart to symmetric cryptography is asymmetric cryptography, commonly referred to as public key encryption. The idea behind public key encryption is that each user has a different key, and each of these keys is split into two parts, a public key and a private key. The public key is shared with everyone; the private key is kept secret and provides the security. The most well known public key algorithm is RSA, which is used in almost all SSL implementations. Figure 2 shows public key encryption with RSA, where M is a plaintext message, E is the encrypted version of M with n, the public key, and d is the private key used to extract M from E. As Figure 2 shows, the encryption is not symmetric; a different key is used in each direction.

Each part of the key is one-way (asymmetric). A message encrypted with the public key can't be decrypted using the public key. However, the message may be decrypted using the private key. Therefore, if someone wants to send you a message secretly, you can give her your public key and she encrypts the message with it. If you have kept your private key secret, then only you can decrypt the message.

The process also works in reverse for some public-key algorithms, and can be used to authenticate data that you send. If you encrypt some data using your private key, then that data can only be decrypted using your public key. If someone knows your correct public key, then they can be sure that the data you provide has not been tampered with in transit. Figure 3 shows authentication using RSA. It is essentially a mirror image of the RSA encryption from Figure 2.
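
As a toy illustration of the two directions described above, the following C sketch performs the core RSA operation, modular exponentiation, on deliberately tiny textbook numbers. The function name modexp and the key values (modulus 3233, public exponent 17, private exponent 2753) are assumptions chosen for illustration and do not come from any real implementation; real keys are hundreds of digits long and real RSA also pads the message.

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation: returns (base^exp) mod n.
 * Valid for n < 2^32, because every intermediate product then fits
 * in 64 bits. A real RSA modulus needs multiprecision arithmetic. */
uint32_t modexp(uint32_t base, uint32_t exp, uint32_t n)
{
    uint64_t result = 1 % n;
    uint64_t b = base % n;

    while (exp > 0) {
        if (exp & 1)                    /* this bit of the exponent is set */
            result = (result * b) % n;
        b = (b * b) % n;                /* square for the next bit */
        exp >>= 1;
    }
    return (uint32_t)result;
}
```

Encrypting would be modexp(M, 17, 3233) and decrypting modexp(E, 2753, 3233); swapping the two exponents gives the authentication direction of Figure 3.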

We could build an entire secure system using only public-key algorithms. The advantage would be not having to ever pass a secret key around, since the public keys are safe to transmit, and private keys remain private. However, public-key algorithms are based on difficult math problems and are many orders of magnitude slower than their symmetric counterparts, even with hardware assistance (as we'll see later). This is a serious problem for high-performance systems, and becomes an even bigger problem for limited-resource embedded systems. We'll see how SSL solves this problem, and we'll look at some techniques that can help speed up these algorithms when we discuss how SSL is implemented for an 8-bit machine.

Message digests
Cryptographic algorithms are useful for securing data against eavesdropping, but they don't prevent loss of data integrity from communications problems or tampering. SSL uses message digest algorithms to protect the integrity of data. Message digests generate secure hashes of data, which are used to verify that data has not changed. A secure hash works by mapping some arbitrary amount of data into a fixed-length value. The key properties of hashes are that the message cannot be generated from the hash, and that the probability of two messages generating the same hash is extremely low. With these properties, it's fairly certain that a message hasn't been changed in transit if the message and the hash match. SSL uses two algorithms for message digests, MD5 and SHA-1. Two hashes are used because the chances of both hashes being compromised are extremely low.

Digital certificates
Every secure web site has a certificate that must be presented to the client when establishing an SSL connection. The certificate contains the public key of the web server and the identification information (such as name and address) used by the client to authenticate the validity of the web server. The certificate is hashed using either SHA-1 or MD5, and the hash is signed using the authentication procedure previously described in the public-key cryptography section, either by the owner of the certificate or by a trusted third party.

SSL needs these certificates to prove the authenticity of a server or client so that an attacker cannot hijack the connection. To see why this is important, imagine you have a remote monitoring device from which you occasionally download a security log of people entering a building. If someone could spoof the address of your device they could give you a dummy log. The digital certificate protects against this type of attack.

The question of whether a browser can trust a digital certificate is addressed using digital signatures. Due to the nature of public key authentication, a private key can be used to sign a certificate (that is, generate a hash of the certificate and encrypt it using a private key). In practice this is done through a certificate authority (CA), an organization or company such as VeriSign that provides certificate signing services for a fee. The fee pays for the authority to do the appropriate checks to make sure someone is who he or she claims to be; therefore, VeriSign can vouch for the identity of that individual. The advantage for the certificate owner is a universally trusted certificate. Manufacturers of web browsers and SSL implementations include the public certificates of CAs with their applications, and these certificates are used to check the signatures of any received certificates.

For the manufacturer of an embedded device, the alternative to using a CA is to use self-signed certificates. The disadvantage is that a big, scary-looking warning dialog will pop up when users begin a session on the embedded device. The warning will allow them to say they'll trust the self-signed certificate and continue, or quit. It will usually allow the user to install the certificate if they choose to, so they don't get the same warning next time. The advantage of this is, of course, no CA fee.

SSL protocol overview
The SSL protocol describes a framework for encryption to work within, rather than defining any of the actual cryptographic algorithms. SSL works by establishing a session using public-key cryptography to exchange a secret value, which is then used to generate session keys for the symmetric cryptographic algorithms used for the bulk SSL transfer. This hybrid approach gives SSL the flexibility of public key encryption and the performance advantage of symmetric encryption.

The first version of SSL was internal to Netscape and was never released. SSL version 2 was publicly released but has some known vulnerabilities. SSL version 3 was developed by Netscape to fix the vulnerabilities in SSL 2. Finally, Transport Layer Security (TLS 1.0, also known as SSL version 3.1) was developed by the Internet Engineering Task Force (IETF) to be the first official “standard” SSL. SSL version 3 and TLS are very similar protocols, but due to the stringent standards of the IETF, TLS was released a couple of years later, allowing SSL version 3 to become the de facto standard Internet security protocol.

SSL version 3 is now considered obsolete for new development (in favor of TLS), but it's still supported by so many applications that it will certainly be around for a while. Although SSL version 3 has no known vulnerabilities, TLS is assumed to be more secure thanks to IETF's strict design standards. It's a good idea, then, for embedded implementations of SSL to support both SSL version 3.0 and TLS 1.0. Backward compatibility with SSL 2.0 is not built into the SSL 3.0 and TLS 1.0 specifications, but most browsers still use an SSL 2.0 message to initiate a session. Internet Explorer 6.0 supports SSL 2.0 and SSL 3.0 by default, and TLS 1.0 support can be enabled in the advanced options.


FIGURE 4: SSL record structure

SSL records
SSL communicates using discrete messages called records, as shown in Figure 4. The record is the SSL equivalent of a TCP segment. A record is defined by a header (containing protocol information, message type and length), a body (containing the message data itself), and a message authentication code (MAC), which is a keyed hash of all the data in the message. The MAC is used to detect tampering or data corruption of the message after it's received. All SSL data is wrapped in these records; the record layer is a vital component of any SSL implementation.
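
To make the record header concrete, here is a minimal C sketch of parsing the five-byte header that SSL 3.0 and TLS 1.0 put in front of every record: one byte of content type, two bytes of protocol version, and a two-byte big-endian length. The struct and function names are hypothetical, chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Record content types defined by SSL 3.0 and TLS 1.0. */
enum {
    REC_CHANGE_CIPHER_SPEC = 20,
    REC_ALERT              = 21,
    REC_HANDSHAKE          = 22,
    REC_APPLICATION_DATA   = 23
};

#define REC_HEADER_LEN 5u
/* 2^14 bytes of data plus the expansion TLS 1.0 allows for the MAC
 * and padding of an encrypted record. */
#define REC_MAX_BODY   (16384u + 2048u)

typedef struct {
    uint8_t  type;    /* one of the REC_* values */
    uint8_t  major;   /* 3 for both SSL 3.0 and TLS 1.0 */
    uint8_t  minor;   /* 0 = SSL 3.0, 1 = TLS 1.0 */
    uint16_t length;  /* bytes of body and MAC that follow the header */
} record_header;

/* Returns 0 on success, -1 if the header is short or implausible. */
int parse_record_header(const uint8_t *buf, size_t n, record_header *h)
{
    if (n < REC_HEADER_LEN)
        return -1;

    h->type   = buf[0];
    h->major  = buf[1];
    h->minor  = buf[2];
    h->length = (uint16_t)(((uint16_t)buf[3] << 8) | buf[4]);

    if (h->type < REC_CHANGE_CIPHER_SPEC || h->type > REC_APPLICATION_DATA)
        return -1;
    if (h->major != 3 || h->length > REC_MAX_BODY)
        return -1;
    return 0;
}
```

Rejecting an implausible header early matters on a small device, because the length field decides how much buffer space the rest of the record will claim.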

The SSL handshake
A new SSL session begins with a handshake consisting of a series of messages sent between the client and the server containing negotiable values for the session. The entire handshake is shown in Figure 5.


FIGURE 5: SSL handshake

The first message sent, Client Hello, contains the client's protocol version, what ciphersuites it supports, and some data to be used in the key derivation process. A ciphersuite is a predetermined combination of ciphers to be used in the session establishment process and in the session itself. A common example of a ciphersuite is TLS_RSA_WITH_RC4_128_MD5, which indicates the use of the TLS protocol with RSA for the public key operation, 128-bit RC4 for the symmetric cipher, and MD5 for generating the verification hash for each record.

The server responds to the Client Hello message with its chosen ciphersuite and some data for the key derivation in a message called the Server Hello. This is followed by the server's digital certificate and a message indicating the Server Hello is complete (this is to allow multiple certificates to be sent).
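
The server's choice of ciphersuite can be sketched in a few lines of C. On the wire each suite is a two-byte identifier; the two values below are the ones the TLS 1.0 specification assigns to the RSA/RC4 suites. The function name, the server's preference order, and the convention of returning -1 for "no match" are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Suite identifiers as assigned in the TLS 1.0 specification. */
#define TLS_RSA_WITH_RC4_128_MD5 0x0004u
#define TLS_RSA_WITH_RC4_128_SHA 0x0005u

/* Server preference order (a hypothetical choice for this sketch). */
static const uint16_t server_suites[] = {
    TLS_RSA_WITH_RC4_128_SHA,
    TLS_RSA_WITH_RC4_128_MD5
};

/* offered: the list from the Client Hello, two big-endian bytes per suite.
 * Returns the first server-preferred suite the client also offered,
 * or -1 if the two sides have nothing in common. */
long choose_ciphersuite(const uint8_t *offered, size_t len)
{
    size_t p, i;

    for (p = 0; p < sizeof server_suites / sizeof server_suites[0]; p++) {
        for (i = 0; i + 1 < len; i += 2) {
            uint16_t id = (uint16_t)(((uint16_t)offered[i] << 8) | offered[i + 1]);
            if (id == server_suites[p])
                return (long)server_suites[p];
        }
    }
    return -1;
}
```

A server that finds no common suite would normally answer with a fatal alert rather than a Server Hello.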

The client then uses the chosen public-key algorithm to encrypt a random chunk of data called the pre-master secret, which is used to generate the symmetric encryption keys later. The encrypted pre-master secret is sent to the server in the Client Key Exchange message, followed by the Change Cipher Spec message, which indicates that all further records from the client will be encrypted. Next, the server and client generate the session keys simultaneously. These keys are simply symmetric encryption keys used by the chosen symmetric algorithm. The pre-master secret and the data from the Client Hello and Server Hello messages are used to generate the master secret, which is then used to generate two separate keys to encrypt outgoing messages (one for the server, and one for the client). Using two keys assures that if one key is compromised, the entire communications channel will not be compromised; only messages in one direction will be able to be decrypted using that key.

The client uses the newly generated keys to encrypt the last client handshake message, called Finished, which contains a hash of all the previous handshake messages. This hash is used to verify that no tampering or corruption occurred during the handshake. The Finished message also notifies the server that the client is ready to send and receive application data, encrypted using the symmetric algorithm.

Upon receiving the Finished message from the client, the server decrypts it, verifies the handshake message hash against its own hash of the handshake messages, and sends to the client its own Change Cipher Spec and Finished messages (which are handled by the client in the same manner as the client versions of these messages were handled by the server).

At this point, the session has begun and all data is hashed and encrypted using the chosen hash function and encryption algorithm. The MAC is used to validate the data, assuring with high confidence that the data was not corrupted or tampered with in transit.
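
From the server's point of view, the handshake just described is a short sequence of expected messages, which maps naturally onto a state machine. The following C sketch shows only that ordering; the state and message names are hypothetical, the cryptographic work is reduced to comments, and the flow assumes a full handshake with no client certificate.

```c
typedef enum {
    HS_WAIT_CLIENT_HELLO,
    HS_WAIT_KEY_EXCHANGE,
    HS_WAIT_CHANGE_CIPHER,
    HS_WAIT_FINISHED,
    HS_ESTABLISHED,
    HS_FAILED
} hs_state;

typedef enum {
    MSG_CLIENT_HELLO,
    MSG_CLIENT_KEY_EXCHANGE,
    MSG_CHANGE_CIPHER_SPEC,
    MSG_FINISHED
} hs_msg;

/* Advances the server-side handshake by one received message.
 * Any message that arrives out of order ends the handshake. */
hs_state hs_server_step(hs_state s, hs_msg m)
{
    switch (s) {
    case HS_WAIT_CLIENT_HELLO:
        /* Reply: Server Hello, certificate, and the "hello complete" message. */
        return (m == MSG_CLIENT_HELLO) ? HS_WAIT_KEY_EXCHANGE : HS_FAILED;
    case HS_WAIT_KEY_EXCHANGE:
        /* Private-key operation recovers the pre-master secret;
         * the master secret and session keys are derived from it. */
        return (m == MSG_CLIENT_KEY_EXCHANGE) ? HS_WAIT_CHANGE_CIPHER : HS_FAILED;
    case HS_WAIT_CHANGE_CIPHER:
        /* Records from the client are encrypted from here on. */
        return (m == MSG_CHANGE_CIPHER_SPEC) ? HS_WAIT_FINISHED : HS_FAILED;
    case HS_WAIT_FINISHED:
        /* Verify the client's handshake hash, then send our own
         * Change Cipher Spec and Finished. */
        return (m == MSG_FINISHED) ? HS_ESTABLISHED : HS_FAILED;
    default:
        /* This sketch accepts no handshake messages once the session
         * is established or has failed. */
        return HS_FAILED;
    }
}
```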

SSL alerts, session closure
When an error occurs during the handshake or session, that error needs to be transmitted to the other side of the communication channel. SSL does this with what are called Alerts: specialized messages encapsulated in SSL records that indicate what error occurred and the severity of the error. Most errors are considered fatal; the connection is terminated immediately upon receiving a fatal alert.

Once the application is finished transmitting data, either the client or the server sends a special alert called Close Notify, indicating that it's done communicating and the connection will close. This final step protects against what is called a Truncation Attack, where an attacker prevents all the data that was sent from being received. This is especially important in situations such as banking transactions, where all the data must be received.
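
An alert body is only two bytes long: a severity level followed by a description code. The C sketch below lays out a Close Notify alert inside a record as it looks before the MAC is appended and the record is encrypted. The numeric values are those defined by SSL 3.0 and TLS 1.0; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define REC_ALERT            21u  /* record content type for alerts */
#define ALERT_LEVEL_WARNING   1u
#define ALERT_LEVEL_FATAL     2u
#define ALERT_CLOSE_NOTIFY    0u  /* description code for Close Notify */

#define ALERT_RECORD_LEN      7u  /* 5-byte header + 2-byte alert body */

/* Fills out[] with an unprotected alert record.
 * minor selects the protocol: 0 = SSL 3.0, 1 = TLS 1.0. */
void build_alert(uint8_t out[ALERT_RECORD_LEN], uint8_t minor,
                 uint8_t level, uint8_t description)
{
    out[0] = REC_ALERT;
    out[1] = 3;            /* major version */
    out[2] = minor;
    out[3] = 0;            /* body length, high byte */
    out[4] = 2;            /* body length, low byte */
    out[5] = level;
    out[6] = description;
}
```

Close Notify is sent at warning level, for example build_alert(buf, 1, ALERT_LEVEL_WARNING, ALERT_CLOSE_NOTIFY) for a TLS 1.0 session.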

Implementing SSL on 8-bit
Developing SSL implementations for smaller embedded systems is no small challenge, since there is no “embedded SSL” protocol. Encryption operations are expensive, both in terms of processor cycles and memory usage. The protocol was designed for powerful machines without resource constraints. However, a carefully implemented, viable SSL implementation can add as little as 50KB to the code footprint of an application, and less than 20KB of static data space.

Many embedded systems will need only server-side SSL, since they will be accessed using a web browser (a client). This allows us to eliminate the client-side SSL code, including the resource-intensive certificate-authentication code and root CA certificates. At 1 to 2KB for each certificate, this is a significant constant data space savings.

We can also take advantage of the structural similarities between SSL version 3 and TLS by writing cross-protocol code that works with both protocols. Almost 90% of the code can be shared between these two protocols.

The choice of supported algorithms also affects our code size. AES and 3DES are relatively complex cipher algorithms, whereas RC4 is very simple and requires only a few lines of C code. Not coincidentally, RC4 also has much better performance than either AES or DES. The TLS specification (RFC 2246) requires 3DES for TLS compliance, but in practice, all major commercial web browsers and SSL implementations support RC4, so by deviating from the spec a little bit and supporting only RC4, we can save several kilobytes of code space with no practical functional impact.
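
To show how little code RC4 needs, here is a minimal C sketch of the cipher: a key-scheduling pass that shuffles a 256-byte table, and a keystream loop that XORs the data in place. The type and function names are hypothetical and this is not the code of any particular SSL library; a production version for an 8-bit part would likely be hand-tuned in assembly.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t s[256];   /* permutation table */
    uint8_t i, j;     /* keystream indices */
} rc4_state;

/* Key scheduling: keylen must be at least 1. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned n;
    uint8_t j = 0, t;

    for (n = 0; n < 256; n++)
        st->s[n] = (uint8_t)n;
    for (n = 0; n < 256; n++) {
        j = (uint8_t)(j + st->s[n] + key[n % keylen]);
        t = st->s[n]; st->s[n] = st->s[j]; st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* Encrypts or decrypts buf in place; the two operations are identical.
 * The state carries over between calls, so a session can process its
 * records one after another with the same rc4_state. */
void rc4_crypt(rc4_state *st, uint8_t *buf, size_t len)
{
    uint8_t i = st->i, j = st->j, t;
    size_t n;

    for (n = 0; n < len; n++) {
        i = (uint8_t)(i + 1);
        j = (uint8_t)(j + st->s[i]);
        t = st->s[i]; st->s[i] = st->s[j]; st->s[j] = t;
        buf[n] ^= st->s[(uint8_t)(st->s[i] + st->s[j])];
    }
    st->i = i;
    st->j = j;
}
```

Because every index is a single byte, the arithmetic wraps naturally at 256, which is one reason the algorithm suits 8-bit processors so well.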

Obviously, writing in assembly code both saves code space and improves performance. The cipher and digest algorithms are prime candidates for assembly coding since they're typically the performance bottleneck of the protocol, and the algorithms are fairly straightforward to port to assembly.

Digital certificates are needed for SSL communication, since they contain both the public key and authentication information. Unfortunately, all this information makes the certificates fairly large, on the order of 1 to 2KB each. The SSL protocol specifies that any SSL client or server should be able to support multiple certificates. However, in practice only one certificate is actually needed for any communication. By restricting certificate storage to a single certificate per device, we can save some space in constant data space or a flash file system.

Another possible savings is in certificate parsing. Certificates are stored in a derivative of the Abstract Syntax Notation (referred to as ASN.1). If we're only using server-side SSL, we can eliminate the ASN.1 parsing code (which would be needed for client-side SSL) by separating out the public key from the certificate and storing it separately. Public keys are typically only 64 to 128 bytes, so we can save quite a bit of code space with only a small loss of constant data space.

We can also limit the information that is stored in the certificate; ASN.1 supports extensions to the certificate for additional information. If we don't include these extensions, we can save some constant space by making the certificate smaller.

Minimizing data space
The best ways to reduce data size are to carefully construct message buffering algorithms and minimize buffer sizes. We need to store an entire message to be able to digest or verify it before passing it along to the TCP connection or the application, respectively. This means that the incoming message buffer must be at least 16KB in size (the maximum size of an SSL record, according to the SSL specification) so we can be sure to get an entire message.

The outgoing message buffer, however, can be reduced in size significantly, since we can control the size of records that our implementation sends out. It needs to be at least 2KB to hold the certificate message, which is between 1KB and 2KB. A minimal buffer sacrifices some performance, since smaller records can be less efficient for throughput, but the RAM savings are significant. Buffer sizes can be easily parameterized, giving the programmer (and even the user) control over memory usage to fine tune the implementation.

The initial handshake requires more buffer space than the actual session does (it needs to derive the session keys and do the public-key operation), so we can temporarily allocate some memory for it. We can do this efficiently by implementing a simple stack in static memory from which we can allocate buffer space. The stack-based approach gives a dynamic memory allocation scheme without the need for the costly overhead of malloc() and free().
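
One way to build such a stack is sketched below in C: a static byte pool, a top-of-stack index, and a mark/release pair that frees everything allocated since the mark. The names and the 2KB pool size are assumptions for illustration. The sketch hands out byte buffers only, so it does no alignment, which is harmless on an 8-bit processor.

```c
#include <stddef.h>
#include <stdint.h>

#define HS_POOL_SIZE 2048u   /* hypothetical size; tune per application */

static uint8_t hs_pool[HS_POOL_SIZE];
static size_t  hs_top = 0;   /* index of the first free byte */

/* Returns n bytes from the pool, or NULL if the pool is exhausted. */
void *hs_alloc(size_t n)
{
    void *p;

    if (n > HS_POOL_SIZE - hs_top)
        return NULL;
    p = &hs_pool[hs_top];
    hs_top += n;
    return p;
}

/* Remembers the current top of the stack. */
size_t hs_mark(void)
{
    return hs_top;
}

/* Frees everything allocated since the matching hs_mark(). */
void hs_release(size_t mark)
{
    if (mark <= hs_top)
        hs_top = mark;
}
```

A handshake routine would call hs_mark() on entry, allocate its temporary buffers, and call hs_release() with the saved mark on exit, so the whole pool is available again for the next connection.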

By encrypting and decrypting messages in-place, we can use just one buffer each for input and output rather than keeping separate buffers for encrypted and decrypted messages. This is possible because all SSL-supported cryptographic algorithms only work on one fixed-size block of data at a time, and the output data is the same size as the input (as well as being independent of all other output).

We can also assume that the encrypted and decrypted data are the same size for any given algorithm, so there will be no overflow of the buffer (all currently supported algorithms have this property). Figure 6 shows a circular buffer scheme that works well for decrypting received messages in place.


FIGURE 6: Read buffer layout


FIGURE 7: Write buffer layout

For incoming messages, we read the header into the buffer first (1), then we write the incoming encrypted data over the header (the footer containing the MAC is part of the encrypted data) (2). After the entire record is read into the buffer it is decrypted and verified (3), and finally the next record being read overwrites the record footer (4).

For outgoing messages, we reserve a number of bytes for both the header and the footer; these numbers are calculated from the ciphersuite chosen during the SSL handshake (and remain constant throughout the entire session). We need to reserve space for the header at the end of completed records so that we can build the header later. The header contains the length of the record, which cannot be known until the record data is complete. We reserve space for the footer (MAC), which also cannot be generated until the data is complete. Figure 7 shows the layout of the write buffer for outgoing messages. The record to the left is complete, consisting of a header, the data, and a footer, which contains the MAC. The record on the right is not yet finished, but contains the application data to be written. Once a completed record is written to the network layer, the buffer space is released.

Performance
Cryptography is notoriously computationally expensive and SSL was designed for maximum security, not optimal performance. Here we'll look at some of the tricks you can apply to make SSL run faster and more smoothly on low-end CPUs.

By far the most “expensive” operation in any SSL session is the public-key operation. Public-key algorithms such as RSA require thousands of operations on large multibyte numbers (512 to 2048 bits each). Beyond efficient coding, the only way to speed up these operations is with some type of hardware assist. A few companies make dedicated public-key encryption hardware, such as Atmel's Trusted Platform Module, which provides an RSA-specific cryptography accelerator in hardware.

Another possibility is to use a CPU with special instructions to accelerate the RSA algorithm. Multiple precision multiplication instructions can speed up RSA by as much as 10 times. Examples are the Rabbit 3000A and Rabbit 4000 processors, which have UMA (unsigned multiply add) and UMS (unsigned multiply subtract), which are used to greatly speed up exponentiation and modulus operations. The details of applying these instructions to RSA are beyond the scope of this article. Alfred Menezes, et al, Handbook of Applied Cryptography (CRC Press, 1996) is a good reference for implementing RSA efficiently.
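
To give a feel for the kind of inner loop that multiply-add instructions accelerate, here is a portable C sketch that multiplies a multibyte number by a single byte and adds the result into an accumulator, propagating the carry byte by byte. It is an illustration of the general technique only: the function name is hypothetical, and the sketch does not claim to reproduce the exact semantics of the Rabbit UMA instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes acc += a * b, where acc and a are len-byte numbers stored
 * least significant byte first and b is a single byte.
 * Returns the carry out of the most significant byte. */
uint8_t mp_mul_add(uint8_t *acc, const uint8_t *a, size_t len, uint8_t b)
{
    uint16_t carry = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* Worst case is 255*255 + 255 + 255 = 65535, so the sum
         * always fits in 16 bits. */
        uint16_t t = (uint16_t)((uint16_t)(a[i] * b) + acc[i] + carry);
        acc[i] = (uint8_t)t;
        carry  = (uint16_t)(t >> 8);
    }
    return (uint8_t)carry;
}
```

A full multiprecision multiply calls a loop like this once for every byte of the second operand, which is why shaving cycles off it has such a large effect on RSA performance.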

With instructions like these, the initial handshake time can easily be reduced from as much as 30 seconds to less than 3 seconds on a 44MHz processor.

If performance is an issue, strong consideration must be given to some type of hardware assistance for public-key cryptography. Performance should be an issue if someone is interfacing to an embedded web server, because the user is likely to think the device is down if it takes more than a few seconds for initial authentication before the device can serve a web page.

Session renegotiation
One of the most important advanced features of SSL for optimizing embedded SSL is session renegotiation (the SSL and TLS specifications call this resuming a session). During session establishment, the client and server negotiate a Session ID, a large integer number used to uniquely identify the current session. The server and client both cache the session keys and other session information, and, after a session has ended, keep that information for some time. This information can be used to reestablish a previously completed session, avoiding the expensive public-key operation. The reduced overhead is definitely an advantage on embedded systems.
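
A server-side session cache for this purpose can be very small. The C sketch below keeps a handful of entries in static memory, indexed by Session ID, and overwrites the oldest entry when full. It stores the 48-byte master secret, from which the session keys can be derived again. The names, the slot count, and the simplification that every Session ID is a full 32 bytes are assumptions of this example; expiry of old entries is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SESSION_ID_LEN    32  /* this sketch assumes full-length IDs */
#define MASTER_SECRET_LEN 48  /* master secret size in SSL 3.0 and TLS 1.0 */
#define CACHE_SLOTS       4   /* hypothetical; each slot costs 81 bytes of RAM */

typedef struct {
    uint8_t in_use;
    uint8_t id[SESSION_ID_LEN];
    uint8_t master_secret[MASTER_SECRET_LEN];
} session_entry;

static session_entry session_cache[CACHE_SLOTS];
static uint8_t next_slot;     /* slot to overwrite next (round robin) */

/* Remembers a completed session, replacing the oldest entry. */
void session_store(const uint8_t *id, const uint8_t *master_secret)
{
    session_entry *e = &session_cache[next_slot];

    memcpy(e->id, id, SESSION_ID_LEN);
    memcpy(e->master_secret, master_secret, MASTER_SECRET_LEN);
    e->in_use = 1;
    next_slot = (uint8_t)((next_slot + 1) % CACHE_SLOTS);
}

/* Returns the cached master secret for id, or NULL if it is unknown,
 * in which case a full handshake is required. */
const uint8_t *session_lookup(const uint8_t *id)
{
    uint8_t i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (session_cache[i].in_use &&
            memcmp(session_cache[i].id, id, SESSION_ID_LEN) == 0)
            return session_cache[i].master_secret;
    }
    return NULL;
}
```

When a Client Hello carries a Session ID that session_lookup() recognizes, the server can skip the certificate and key exchange messages and go straight to the Change Cipher Spec and Finished exchange.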

Unlike public-key cryptography, hashing algorithms lend themselves to easy programmatic optimization. Algorithms such as MD5 and SHA-1 can easily be coded completely in assembly, and much of the code is redundant, allowing for fairly good code factoring. These algorithms are also comparatively faster than other cryptographic operations, so they will rarely be the cause of any performance bottleneck.

Random numbers are also important in the generation of session keys. Seeding a pseudo-random number generator (PRNG) with a constant number or the current time is not considered adequate in high-end security applications because it creates vulnerability to certain forms of attack (see “Generating Random Numbers,” Eric Uner, June 2004, p.14). It may be adequate 999 of 1,000 times for lower-end applications, but for that 1,000th time when a skilled and motivated hacker attacks your web-enabled embedded system, something a bit more random, like network entropy, should be used. This works by using received TCP/IP or UDP packets to generate entropy from a real-time clock (the time at which a network packet is received is considered fairly random).

Very high security applications will go further and use real hardware random number generators (RNGs), but if a good PRNG is good enough for your PC, it ought to be good enough for most embedded applications as well.

A preemptive real-time operating system (RTOS) may use more resources than is desirable for a system that already must stretch its resources to handle a TCP/IP stack, a web server, SSL, stored web pages, and, of course, the application itself, so a cooperative multitasking method such as state machines may be preferable. HTTPS, SSL, and a TCP/IP stack can all be driven on an RTOS-less system using a single tick-type function, called as often as possible in “big loop” programs. If an RTOS is used, the same tick function could also be put in a high-priority task. There are a few complications, since HTTP is inherently a stateless protocol and SSL requires state to be preserved in order to function. These problems are not difficult to overcome and are mostly explained in RFC 2818.
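
The tick-function idea can be sketched as a small state machine in C. Each call does one bounded slice of work and returns, so the application's big loop is never blocked for long. Everything here is hypothetical: the state names, the notion of splitting the handshake into a fixed number of slices, and the placeholder comments where real network and cryptographic work would go.

```c
typedef enum { SRV_LISTEN, SRV_HANDSHAKE, SRV_SERVE, SRV_CLOSE } srv_state;

#define HANDSHAKE_SLICES 8u   /* hypothetical: handshake work split in 8 */

typedef struct {
    srv_state state;
    unsigned  slices_left;    /* remaining slices of handshake work */
} srv_ctx;

void srv_init(srv_ctx *c)
{
    c->state = SRV_LISTEN;
    c->slices_left = 0;
}

/* Performs one short, non-blocking slice of work and returns the
 * state the server is in afterwards. */
srv_state srv_tick(srv_ctx *c, int client_waiting)
{
    switch (c->state) {
    case SRV_LISTEN:
        if (client_waiting) {
            c->state = SRV_HANDSHAKE;
            c->slices_left = HANDSHAKE_SLICES;
        }
        break;
    case SRV_HANDSHAKE:
        /* One slice of handshake work (for example, part of the
         * public-key operation) would run here. */
        if (--c->slices_left == 0)
            c->state = SRV_SERVE;
        break;
    case SRV_SERVE:
        /* Send the requested page as encrypted records. */
        c->state = SRV_CLOSE;
        break;
    case SRV_CLOSE:
        /* Send Close Notify and release the session buffers. */
        c->state = SRV_LISTEN;
        break;
    }
    return c->state;
}
```

A big-loop program would call srv_tick() once per pass alongside the application's own work; with an RTOS, the same function could be called from a task instead.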

Checklist
A viable 8-bit implementation of SSL will include:

  • SSL 2.0, at least the session-initiation portion
  • SSL 3.0, and preferably TLS 1.0
  • SHA-1
  • MD5
  • RC4
  • RSA
  • Multiprecision arithmetic for RSA, preferably with hardware acceleration.

Not just for 32-bitters
The market for secure web-enabled embedded applications is growing. SSL is generally perceived as feasible only for desktops and higher-end 32-bit embedded systems, but with some creativity and careful coding it's perfectly usable on 8-bit microprocessors.

Timothy Stapko is a software engineer for Z-World/Rabbit Semiconductor, where he develops networking applications and programs embedded devices. Tim has a master's in computer science from the University of California at Davis.
