Miniature Web Server

Just how small can a usable HTTP server be? The answer might surprise you!

The phenomenal growth of the Internet, and its entry into many aspects of daily life, has led to the suggestion that TCP/IP will find its way into the most humble of domestic devices. Leaving aside the marketing question as to which appliances should be web-enabled, we're faced with a fundamental technical question: how small can a web server get?

It is not my intention to get caught up in a battle to create the world's smallest Web server, since this would require the use of highly optimized machine code, and result in an implementation so inflexible as to be of little practical use. Instead, we will be looking at the underlying techniques for miniaturizing a web server, based on a microcontroller implementation in the C language. Furthermore, the web server must have the ability to control and monitor real-world I/O signals, to pave the way for its use as a net appliance.

To do this, I'll need to re-evaluate the web server from scratch. I'll be taking a fundamental look at TCP/IP from first principles, with a view to establishing the core elements that are required, and the simplest way of implementing them. I'll review:

  • Typical microcontroller hardware and its constraints
  • The network protocols needed for a web server
  • Implementation techniques to minimize resource usage

It is important to remember that a web server is just a method of delivering web pages (web content) to the user; there's little point in creating a server that can only deliver the most rudimentary of pages. Discussion of the web content is outside the scope of this article.

Microcontroller hardware
A microcontroller is a computer-on-a-chip that is designed for high-volume low-cost applications. When compared with a conventional CPU, there are notable differences.

Program memory
The program code is usually stored in a dedicated on-chip ROM, which is programmed by inserting the microcontroller into an external system (such as an EPROM programmer), or is programmed with the device already fitted into the final system, using a dedicated serial interface. There may be two distinct memory spaces: program memory and data memory, with severe limitations as to their usage; the data memory may be unusable for program storage, and vice-versa.

Until recently, the size of the on-chip program memory (1,000 bytes or so) would have been a problem for the kind of protocols I will be discussing, necessitating the use of an external ROM and its associated support circuitry. However, on-chip ROM sizes of 8KB and greater are now widely available, so it is possible to embed complex applications without additional external memory.

Similarly, it used to be the case that all microcontrollers had to be programmed in assembly language to make the best possible use of their slender resources. The increase in on-chip RAM size, coupled with improvements in compiler efficiency, means that a high-level language can be used, with the attendant benefits of source-code readability, maintainability, and portability.

Data memory
At the time of writing, there seems to be an unfortunate tendency amongst the microcontroller manufacturers to restrict the amount of data memory (RAM) on devices with low pin counts; that is, it is necessary to use a device with a large number of I/O pins if one requires a few kilobytes of RAM. To a certain extent, the reverse is true when dealing with protocols. A protocol handler chip needs lots of RAM (for data buffering and state-machine storage), yet may only need a few pins for data I/O and diagnostic indicators.

To fulfill the objective of being able to embed the microcontroller into an appliance, the decision was taken to use a device with a low pin count, and hence a small amount of RAM. This has profound effects on the design philosophy, as will be discussed later; for example, the total RAM size is considerably smaller than the size of the messages that will actually be sent and received. Nevertheless, the discipline of carefully scrutinizing RAM usage is a sensible one, and will still be of use when dealing with large RAM sizes.

CPU limitations
To minimize the complexity of the microcontroller CPU core, significant compromises have to be made in the instructions it implements, such as:

  • Native word-length restrictions. Arithmetic and logical operations may be restricted to eight bits only. The high-level language compiler will support longer word-lengths by chaining instructions, but the programmer must be aware of the speed and code-size penalties this carries.
  • Stack size. The processor may have a call/return stack of limited size implemented in hardware, which will restrict the depth to which function calls can be nested.
  • Local variables. The high-level language compiler must support local variables, even though the CPU may have no suitable hardware for this; it may have no provision for stack-based data storage or index-plus-offset addressing. The compiler can work around this by allocating a fixed memory location for each local variable, but this will make the code non-reentrant, and prevent the use of recursive calls.
  • Pointers. C programmers are accustomed to using pointers for a wide variety of purposes, and they are particularly useful when encoding or decoding data streams. The separation of the microcontroller memory space into special-purpose areas, and the possible segmentation of those areas, can lead to considerable inefficiencies in the use of pointers, and make it impossible to use them for certain tasks. This reinforces the need for careful planning of memory usage, with particular reference to the buffering of incoming and outgoing data.

Choice of microcontroller
A very large number of microcontrollers are on the market, and I don't claim to have performed an exhaustive analysis to find the best one. The choice of the Microchip PICmicro family was dictated by personal preference based on past experience, and the PIC16C76 device (shown in Figure 1) was chosen as a good compromise between a small physical size (low pin count), and adequate on-chip peripherals, such as a bi-directional serial port that can use interrupts.

PIC16C76/16F876
The PIC16C76 has the following on-chip hardware:

  • 8,192 14-bit words of EPROM program memory
  • 368 bytes of RAM data memory
  • Eight-level hardware stack
  • Interrupt capability
  • 8-bit analogue-to-digital converter (ADC) with input multiplexer
  • One 16-bit timer and two 8-bit timers
  • Two capture/compare/PWM modules
  • Synchronous serial port (SSP)
  • Universal synchronous asynchronous receiver transmitter (USART)
  • 22 input/output pins (parallel I/O, shared with the above functions)

The PIC16F876 is essentially an enhanced version of the PIC16C76, with flash memory in place of EPROM and additional EEPROM for non-volatile data storage. The flash version is more convenient, as it can be programmed and erased in-circuit.

The memory architecture will seem fairly strange to a PC programmer. It has three completely separate memory spaces:

  • Read-only program memory
  • Read/write data memory
  • Dedicated hardware stack

I must stress that these spaces are absolutely separate, and are accessed using completely different addressing schemes.

Program memory
The program memory is 14 bits wide, so that every instruction can fit in a single program word, and take a single CPU cycle (four external clock cycles) to execute, with the exception of branches, which take two CPU cycles. The program memory only contains program instructions and no data, not even constant data, because there is no mechanism for accessing it. The program memory is segmented into four banks of 2,048 words each. There are specific bit-manipulation operations to switch between banks, which are automatically inserted by the compiler, with the limitation that one function cannot straddle two banks.

The inability to put constant data in the program space (or, more specifically, the inability of the CPU to read any such data) makes it difficult to store large amounts of constant data, such as constant strings. There is a “return with a byte value” instruction, which has to be used repeatedly to form the string from a series of character-value instructions. Such strings have awkward properties; they can't be accessed by pointers, and must be copied into RAM before use. Although the Custom Computer Services PCM compiler provides some support for this, mistakes can easily go undetected, so string constants should be avoided if possible.
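As a rough illustration of the "return with a byte value" approach, the sketch below models a constant string as a function that hands back one character per call, plus a helper that copies the result into a RAM buffer so it can be used through a pointer. The names `greeting_char` and `copy_greeting` are hypothetical, and the code is ordinary portable C rather than output from any particular PIC compiler.

```c
#include <stddef.h>

/* Hypothetical stand-in for a table of "return with a byte value" instructions:
 * one character per index, and 0 once the index runs past the end. */
char greeting_char(unsigned char index)
{
    switch (index) {
    case 0: return 'O';
    case 1: return 'K';
    case 2: return '\r';
    case 3: return '\n';
    default: return 0;
    }
}

/* Copy the table-driven string into a RAM buffer of maxlen bytes.
 * Returns the number of characters copied, excluding the terminating null. */
size_t copy_greeting(char *dest, size_t maxlen)
{
    size_t n = 0;

    if (maxlen == 0)
        return 0;
    while (n < maxlen - 1) {
        char c = greeting_char((unsigned char)n);
        if (c == 0)
            break;
        dest[n++] = c;
    }
    dest[n] = 0;
    return n;
}
```

Every constant string handled this way costs RAM for the copy as well as program words for the table, which is one more reason to keep such strings few and short.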

Data memory
The data memory space is eight bits wide and is shared between the I/O registers and the workspace RAM. It is segmented into four banks, using bank-switching bits that are completely independent of the code-space switching. The workspace RAM occupies the memory locations that aren't taken up by I/O registers (of which there are a large number), so the 368 bytes of RAM is fragmented into the following address ranges:

Bank 0: 96 bytes 20h – 7Fh
Bank 1: 80 bytes A0h – EFh
Bank 2: 96 bytes 110h – 16Fh
Bank 3: 96 bytes 190h – 1EFh

In addition, the following areas are common to all banks; data written in one bank can be read in all others:

70h – 7Fh
F0h – FFh
170h – 17Fh
1F0h – 1FFh

It is fortunate that we don't have to work our way around this strange map, but can offload the job onto the compiler. However, it is possible to confuse the compiler into generating wrong code, so it is best to be cautious in the use of data pointers.

Hardware stack
There is a hardware stack for machine-code calls and returns; it is 13 bits wide (to accommodate the full 8,192-word address range) and eight levels deep. There is no provision for extending the stack into RAM, or detecting overruns. Nesting function calls more than eight deep will have unforeseen consequences, though this should be detected at compile-time.

Data values cannot be stored in the stack, so how are function arguments and local variables handled? The compiler assigns fixed locations in RAM for these variables, having carefully assessed the function nesting, to ensure one function won't destroy another's variables.

External memory
A web server needs ample storage for web pages, and the on-chip ROM is clearly inadequate for this. The I2C bus is a simple two-wire synchronous interface that can be used to add external devices, such as a 32KB EEPROM, which is adequate for a miniature web server.

Network interface
Mindful of the large numbers of laptop and palmtop computers with an infrared interface, and the fragility of sub-miniature serial connectors, I decided to implement an infrared interface, using the IrDA (Infrared Data Association) standards for low-level communications.

After considerable work, it became clear that the IrDA protocols weren't as low-level as I thought, and the simple task of sending IP datagrams over an infrared link demanded a significant additional programming effort, which threatened to eclipse the rest of the project. In view of this, I decided to revert to an RS-232 SLIP interface, which may be a lowest-common-denominator interface, but is still very useful for a wide range of applications.

The PIC micro has an on-chip asynchronous serial interface (USART), so the only extra network components are the level-shifters for the RS-232 voltage-levels. To allow modem interfacing, a three-wire interface is implemented, using a general-purpose output line for the output handshake.

Web server protocols
From the top down, the protocols we need for our web server are:

  • HTTP: document request/response
  • TCP: reliable communications
  • IP: low-level data transport
  • ICMP: diagnostics (ping)
  • SLIP: serial interface
  • Modem emulation

I will now review each of these, with the aim of creating a small, yet fully-functional, web server implementation.

HTTP request
The Hypertext Transfer Protocol (HTTP) defines a request-response mechanism for obtaining documents from a web server. The web browser sends a request to the server in the form of a multi-line string, each line being terminated with Carriage Return and Line Feed (CR and LF) characters. The first line specifies an upper-case “method” (that is, command), followed by an argument string. The most common method is “GET,” followed by a filename to be fetched, and a protocol version identifier. Subsequent lines contain additional information about the browser configuration:

GET /index.htm HTTP/1.0
User-Agent: Mozilla/4.5 [en] (Win95; I)
Pragma: no-cache
Host: 10.1.1.11
Accept: image/gif, image/x-xbitmap, image/jpeg, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

If we're keen to keep memory usage to a minimum, the question must be asked: what use is the additional information? We don't care about the type of browser, our server hasn't got sufficient resources to maintain a cache or an access log, and if the file we're sending has an unacceptable character set, there's nothing that can be done about it. Even the HTTP version number on the first line isn't needed; we're planning to use the simplest HTTP interface anyway.

It would seem that we can chop off the remainder of the command after the filename, without losing any functionality, but what is the maximum length of the filename? Surprisingly, this is largely under the control of the server, for the following reason: when the user wishes to access the server for the first time, an IP address is entered into a browser window, such as:

http://172.16.1.2

The web client (browser) locates the given address, and submits the request to that server using a null filename:

GET / HTTP/1.0

By convention, most web servers interpret this as a request for the default index file, which is INDEX.HTM or INDEX.HTML. This, in turn, contains pointers to other files on the server. While the user clicks on pages we've provided, they will only be requesting filenames we've defined, so if we keep these short, we won't have to handle any long filenames.
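The request handling described above can be sketched in a few lines of C: check for the GET method, copy the filename up to the first space, and ignore everything else. This is a minimal sketch; the function name `get_filename`, the 12-character limit, and the lower-case default name `index.htm` are assumptions made for illustration.

```c
#include <string.h>

#define MAX_FNAME 12    /* assumed limit; we choose the filenames, so they can be short */

/* Extract the filename from the first line of an HTTP request.
 * fname must have room for MAX_FNAME + 1 characters.
 * Returns 1 on success, 0 if the method isn't GET or the name is too long.
 * A null filename ("GET / ...") is mapped to the default index page. */
int get_filename(const char *req, char *fname)
{
    size_t n = 0;

    if (strncmp(req, "GET /", 5) != 0)
        return 0;
    req += 5;
    /* Copy up to the first space or line end; the rest of the request is ignored */
    while (*req && *req != ' ' && *req != '\r' && *req != '\n') {
        if (n >= MAX_FNAME)
            return 0;
        fname[n++] = *req++;
    }
    fname[n] = 0;
    if (n == 0)
        strcpy(fname, "index.htm");
    return 1;
}
```

Because only the first few tens of bytes are ever examined, the receive buffer needs to hold no more than the method, the filename, and a delimiter.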

A possible exception to this rule would occur if we included any HTML forms on our server. When a form is submitted, all the state-information is appended to the filename, making a much longer string. Various other difficulties are associated with forms-handling on a very small web server, so for the time being, we'll assume that forms aren't being used.

HTTP response
The response from the server to the client consists of an HTTP header, and, if the request succeeded, the document itself. The header consists of several text lines, each terminated with a CR LF delimiter. The header is separated from the document's contents by a single blank line.

As a minimum, the header must identify the HTTP protocol version, the success or failure status, and the content-type of the document (plain text, HTML, GIF graphic). For example:

HTTP/1.0 200 OK
Content-type: text/html

<html><head><title>Test page</title></head>
<body>This is a test page</body></html>

Unfortunately, it isn't possible to send out the same HTTP header for all files; it must be adapted to reflect the file's contents. A minimum list of file formats would be:

text/plain
text/html
image/gif

though it would be highly desirable to add other formats to the list.

If the client's request fails, an appropriate HTTP error message must be sent out, and it is also desirable to send out a document explaining the problem, so that the browser has something to display. The simplest explanation could be in the form of a plain text document, which has no fancy formatting:

HTTP/1.0 404 Not found
Content-type: text/plain

File 'abc.htm' not found
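One way to adapt the header to the file is to pick the content-type from the filename extension. The sketch below does this for the three formats listed above and also produces the 404 header; the function names, and the choice of plain text as the fallback for unknown extensions, are assumptions rather than part of the design described here.

```c
#include <string.h>

/* Return the content-type string for a filename, based on its extension.
 * Unknown extensions fall back to plain text (an assumption of this sketch). */
const char *content_type(const char *fname)
{
    const char *ext = strrchr(fname, '.');

    if (ext != NULL) {
        if (strcmp(ext, ".htm") == 0 || strcmp(ext, ".html") == 0)
            return "text/html";
        if (strcmp(ext, ".gif") == 0)
            return "image/gif";
    }
    return "text/plain";
}

/* Build the HTTP header in buf; 'found' selects the 200 or 404 status line.
 * Returns the header length, or 0 if buf is too small. */
size_t make_http_header(char *buf, size_t buflen, const char *fname, int found)
{
    const char *status = found ? "HTTP/1.0 200 OK\r\n" : "HTTP/1.0 404 Not found\r\n";
    const char *type = found ? content_type(fname) : "text/plain";
    size_t need = strlen(status) + strlen("Content-type: ") + strlen(type) + 4;

    if (need + 1 > buflen)
        return 0;
    strcpy(buf, status);
    strcat(buf, "Content-type: ");
    strcat(buf, type);
    strcat(buf, "\r\n\r\n");    /* end of line, then the blank line before the document */
    return need;
}
```

On the PIC itself, string constants like these are awkward to store, so a real implementation might keep a ready-made header alongside each file in external memory instead of building it at run time.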

TCP
To convey the HTTP request and response between client and server, a reliable communications channel is required. This is provided by Transmission Control Protocol (TCP), which provides a reliable logical connection between two endpoints on the network, known as sockets. The objective of TCP is to make the network connection appear as transparent as possible. Regardless of the network type, or the distances involved, data should be transferred between sockets in as timely and error-free a fashion as possible.

TCP sockets
A socket is an endpoint of a network connection that acts as a source and sink of connection data. Each active socket is implicitly linked to an application that sends and receives this data. In the case of a web client, the application is a browser; in the case of a web server, the application is an HTTP server as described in the previous section.

Aside from the IP (Internet Protocol) addresses of client and server, the other parameters that define a socket are the port numbers. In the case of servers, a port number defines the service being offered; for example, a web server should only respond to incoming requests on port number 80.

At any one time, a server may support several simultaneous transactions, each of which involves a unique client-server socket pair. Clients will frequently open up several simultaneous connections to the same server, in order to fetch several items in parallel, such as the graphical items on a web page. To save on resources, a web server can restrict the number of simultaneous connections to one, but this leads to a very sluggish response, even when in use by only one client. If the client attempts to fetch, say, a page of text and three graphic images simultaneously, the server can do one of two things:

  • Ignore the request. The TCP client will retry after about 1.5 seconds; the next retry time doubles to three seconds, the one after that to six seconds, and so on. If several images are being requested, there can be an unacceptably long wait until the last one is successfully obtained.
  • Reject the request. If a TCP reset is sent, the client will quickly retry the request; I have observed retry rates of around two per second, for 40 seconds. This is a lot of extra traffic for the serial link to handle, and will only succeed in slowing up the data transfer yet further.

Ideally, we would respond to as many simultaneous network requests as the network bandwidth would permit; the problem is how to do this, without requiring a large amount of socket storage.

Passive open
Convention dictates that a server application must passively open a TCP socket, before exchanging data through it. This model is derived from standard implementations on multi-user systems, where there is a strong separation between the system's TCP code and the user's application code. To fit in the microcontroller, our application code will have to be tightly coupled to the TCP stack, so the distinction between the two becomes blurred.

There is no point in maintaining the fiction of passive open; if a network frame arrives, and the server has the resources to handle it, then it should do so.

Sequence space
To control and monitor the establishment of a connection, transfer of data, and closure of a connection, each TCP transmission (segment) is identified by a TCP sequence number, which refers to its position in an imaginary sequence space. The start and end of a transaction (known as SYN and FIN) can be seen as fixed points in this space; using the sequence number, the recipient can place an incoming segment in its rightful location within that space, and detect whether it forms a logical progression from the last segment received, is a duplicate of a previous segment, or is a future segment received out of sequence.

The sequencing process is symmetrical; both client and server use sequence numbers to place their transmitted segments in the outgoing sequence space and send acknowledgment numbers to confirm the point they have reached in the (completely separate) incoming sequence space. Figure 2 shows a sample data transfer, with 120 hex bytes being sent in two unequal-sized blocks. In addition to the actual 32-bit sequence number, the relative sequence number is shown in brackets; I've used the convention that the first data byte has a relative sequence number of zero, which means that the first synchronizing byte has a value of –1.

At the start of any transaction, the 32-bit sequence number must be set to a completely new value, to avoid confusion with past transactions. The good news is that, within certain constraints, the server can choose any 32-bit starting sequence value it likes. This suggests it might be possible to use the sequence number as a kind of file pointer, indicating the current location of the file (in ROM) being sent. The bad news is that the sequence value must be chosen before the first segment of the new transaction is sent. At this time, the client hasn't yet revealed the filename to be accessed, so it isn't possible to choose a value that is convenient for accesses to that file.

A lesser option, which is still very useful, is to choose a sequence value that reflects the relative position within the file. This has already been done in Figure 2; the least-significant word of the sequence number has the following hex values:

FFFF: Initial SYN marker
0000: First data byte of file
xxxx: Offset of xxxx into the file

So long as the file size is less than 64KB (a reasonable assumption, given the ROM size limitations of our miniature web server), this technique will result in a useful simplification to our code.
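The mapping between sequence values and file positions can be sketched as follows. The function names are hypothetical, and the upper 16 bits of the initial value are left to the caller, since only the least-significant word carries meaning in this scheme.

```c
#include <stdint.h>

#define SEQ_SYN 0xFFFFu     /* low word of the sequence number carried by our SYN */

/* Initial sequence number for the server's SYN: arbitrary upper word, low word FFFF.
 * The SYN occupies one sequence value, so the first data byte lands on xxxx0000. */
uint32_t initial_seq(uint16_t upper)
{
    return ((uint32_t)upper << 16) | SEQ_SYN;
}

/* Given the client's acknowledgment number, return the file offset of the
 * next byte the client expects; valid while the file is smaller than 64KB. */
uint16_t ack_to_file_offset(uint32_t ack)
{
    return (uint16_t)(ack & 0xFFFFu);
}
```

For example, an initial value of 1234FFFF hex is acknowledged as 12350000 once the SYN has been received (file offset 0), and as 12350120 once 120 hex bytes of the file have arrived.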

Of course, we have no control over the client's choice of sequence number, which makes it impossible to use a similar trick with that. However, the client's request string is sufficiently short that it should fit within a single frame, so we have only the one incoming data frame to worry about; if any more arrive, they can be discarded.

Managing connections
It is traditional to view the opening and closing of a TCP connection in terms of the state diagram transitions. This implies we need to maintain an individual state machine for each simultaneous connection (that is, each socket), which will consume a lot of RAM.

To solve this problem, it is worth bearing in mind that the web server always responds to client HTTP requests; it never takes the initiative. If we could also guarantee that it only ever responded to TCP segments, rather than initiating them, then a massive simplification in the TCP stack would result. We wouldn't have to store any node addresses or port numbers, since these could just be copied from the incoming to the outgoing message. By careful choice of initial sequence number (as discussed in the previous section), we can deduce the location in the current file by examining the incoming acknowledgment value, so we'd always know what action to take next. We'd be creating an implicit, rather than explicit TCP state machine, using the incoming acknowledgment value as a state variable. Such an implementation is known as “stateless” TCP, because the server doesn't keep any state information about the client.

This looks promising, but two more problems need to be solved:

  • Current filename. If the server doesn't keep a record of which client requested which file, how does it know what to send next? It may know the relative position within the file from the sequence number, but if it doesn't know the filename, this isn't a lot of use.
  • Retransmissions. A normal TCP stack will retransmit a TCP segment if it doesn't receive an acknowledgement within a certain time. If our stack doesn't store any information about its clients, it won't ever be able to retransmit anything without being prompted.

My initial attempt to solve the first problem was a fiendishly clever plan to use the least-significant bits of the sequence number to indicate the filename (or more precisely, the index number in a file directory). So long as the file is sent in blocks of, say, 64 bytes and the front is padded with a variable-length HTTP message that depends on the filename, then…well, work it out for yourself. The disadvantage of this technique is the inflexibility of having to send out fixed-length blocks, which is a real nuisance when generating web pages dynamically.

To solve the second problem, it is tempting to rely on the client's retry mechanisms, but this doesn't work. The server can't rely on receiving an acknowledgment for every segment, since some may have been lost in transit. If the server stops sending data (because it failed to see an acknowledgement), it will be a long time (two hours) before the client's “keepalive” timer triggers it to send a “keepalive probe” to see if the server has crashed. That is a long time to wait; the client application will have abandoned the connection long before.

A simple solution to both problems is to restrict the outgoing page to one TCP data block (segment), and to send it out as soon as the client's HTTP request has been received.

One-segment pages
We are dealing with a small web server, with extremely limited resources, so the idea of fitting a web page into a single TCP segment isn't quite as crazy as it sounds. True, this may force web designers to be less lavish in their use of page embellishments, but is that necessarily a bad thing? A small web server has a small amount of information to convey, and padding it out unnecessarily is pointless.

We have adopted the usual maximum SLIP size of 1,006 bytes. The IP and TCP headers are a minimum of 20 bytes each, so our maximum TCP data size is 1,006 – 40 = 966 bytes. It is remarkable how much can be achieved within this limitation.

The key advantage of one-segment pages is that a one-to-one relationship exists between the actions of the client and the server; this relationship is reinforced if the closing FIN is piggybacked onto the page data.
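The one-to-one relationship means each reply can be derived entirely from the segment that provoked it. The sketch below shows the idea under some assumptions: `TCPINFO` is a hypothetical structure holding already-decoded header fields in host byte order, and `isn` is whatever initial sequence value the server has chosen.

```c
#include <stdint.h>

#define TFIN 0x01
#define TSYN 0x02
#define TPSH 0x08
#define TACK 0x10

/* Decoded TCP header fields of interest; not a wire-format structure */
typedef struct {
    uint16_t sport, dport;
    uint32_t seq, ack;
    uint8_t flags;
} TCPINFO;

/* Build the reply to an incoming segment using only the segment itself.
 * rxlen is the number of data bytes received; txflags are the flags to send. */
void make_reply(const TCPINFO *rx, uint16_t rxlen, uint8_t txflags,
                uint32_t isn, TCPINFO *tx)
{
    uint32_t consumed = rxlen;

    if (rx->flags & TSYN)               /* SYN and FIN each occupy one sequence value */
        consumed++;
    if (rx->flags & TFIN)
        consumed++;
    tx->sport = rx->dport;              /* swap the port numbers */
    tx->dport = rx->sport;
    tx->ack = rx->seq + consumed;       /* acknowledge everything received */
    /* Our own sequence number: the chosen initial value when answering a SYN,
     * otherwise whatever the client has just acknowledged */
    tx->seq = (rx->flags & TSYN) ? isn : rx->ack;
    tx->flags = txflags;
}
```

A SYN would be answered with SYN plus ACK, and the HTTP request with the page data carrying FIN, PUSH, and ACK together; in neither case does anything need to be remembered between segments.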

TCP segment format
A TCP header plus data block is known as a “segment.” The format is shown in Figure 4.

Destination port. We'll be checking the destination port field of every incoming TCP segment to see if it refers to a service we support; at present there are only two of these:

Port 13: daytime service
Port 80: HTTP server

The daytime service returns a simple string giving the current date and time. It is by no means essential for a web server to provide this, but is nevertheless a useful step on the road to creating an HTTP server, since the TCP transaction is simpler and easier to debug.

Sequence and acknowledgement numbers. The 32-bit sequence and acknowledgment numbers have already been discussed. 32-bit arithmetic is a problem on the PIC, since the CPU only supports 8-bit operations directly. Our chosen compiler provides no support for 32-bit data types, so we have to create all our own functions. If we assume that the incoming and outgoing data is less than 64KB, then a useful simplification is to only perform 16-bit operations on the 32-bit values, propagating the carry value to the upper 16 bits.
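A minimal sketch of this simplification is shown below: the 32-bit value is held as two 16-bit halves, and an addition touches the upper half only when the lower half wraps round. The type name `LWORD` and function name `lword_add16` are hypothetical.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves */
typedef struct {
    uint16_t hi, lo;
} LWORD;

/* Add a 16-bit value to a 32-bit value using only 16-bit operations */
void lword_add16(LWORD *v, uint16_t n)
{
    uint16_t old = v->lo;

    v->lo = (uint16_t)(v->lo + n);
    if (v->lo < old)        /* low half wrapped round, so carry into the high half */
        v->hi++;
}
```

Adding 1 to 1234FFFF hex gives 12350000, which is exactly the step from the SYN marker to the first data byte.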

Header length. The header length reflects the length of the standard header plus the options; there is no length value for the data, since this can be deduced from the value in the lower protocol layer (IP).

Flags. There are six 1-bit Boolean flags:

FIN 0x01
SYN 0x02
RESET 0x04
PUSH 0x08
ACK 0x10
URGENT 0x20

I have defined the header length and flags as two single-byte values for simplicity, but it should be pointed out that the standard defines these as a 4-bit header length, a 6-bit reserved field, and then six code bits.
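Treating the two fields as single bytes makes decoding simple, as the following sketch suggests. The function names are hypothetical; the byte positions (offsets 12 and 13 of the TCP header) follow from the standard layout.

```c
#include <stdint.h>

#define TFIN 0x01
#define TSYN 0x02
#define TRST 0x04
#define TPSH 0x08
#define TACK 0x10
#define TURG 0x20

/* Header length in bytes, from the byte at offset 12 of the TCP header:
 * the top four bits give the length in 32-bit words */
uint8_t tcp_header_len(uint8_t hlen_byte)
{
    return (uint8_t)((hlen_byte >> 4) * 4);
}

/* Flags from the byte at offset 13, with the two reserved bits masked off */
uint8_t tcp_flags(uint8_t flags_byte)
{
    return (uint8_t)(flags_byte & 0x3F);
}
```

A header-length byte of 50 hex therefore means a 20-byte header with no options, and any larger value tells us how many option bytes to skip before the data.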

Window size. The window size indicates the amount of receive buffer space that is available, and is used for flow control. It can be set to a fixed size on transmit, and ignored on receive. We can safely assume that any client contacting our server has sufficient buffer space for our humble pages. (If they don't, it was pretty stupid of them to send the request in the first place!)

Urgent pointer. This can be ignored, since all data can be treated with equal priority.

Options. In addition to a variable-length data field, there is a variable-length header options field. Mercifully, we don't have to generate any options, and can safely discard any incoming options.

Checksum. The only awkward point to note about this header is that it must include a valid checksum value, which is computed across the whole TCP segment, plus a pseudo-header containing parts of the IP header. See Figure 5.

The usual checksum computation technique is to scan the TCP segment image in memory, but we don't have sufficient RAM to do this. If the checksum came at the end of the data, it would be easy to compute it on-the-fly as the header plus data was being sent out, then append the resulting value to the data. Unfortunately, the checksum is part of the header, so its value must be known before the first data byte is sent. TCP checksum generation is therefore a major issue in our small implementation, particularly when attempting to include dynamic data on the web pages.
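For reference, the usual in-memory technique looks something like the sketch below, which is exactly what a RAM-starved microcontroller cannot afford for a full-size segment. The function names are hypothetical, and the 32-bit accumulator is a convenience of desktop C rather than something the PIC would use.

```c
#include <stdint.h>
#include <stddef.h>

/* Add a block of bytes to a running 16-bit one's-complement sum.
 * Bytes are taken as big-endian 16-bit words; an odd final byte is padded with zero,
 * so only the last block of a series may have an odd length. */
uint16_t csum_add(uint16_t sum, const uint8_t *data, size_t len)
{
    uint32_t total = sum;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        total += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        total += (uint32_t)data[len - 1] << 8;
    while (total >> 16)                         /* fold carries back in */
        total = (total & 0xFFFF) + (total >> 16);
    return (uint16_t)total;
}

/* TCP checksum over the pseudo-header plus the whole segment held in memory.
 * src and dst are 4-byte IP addresses; the checksum field in seg must be zero
 * when generating, and the result is zero when verifying a valid segment. */
uint16_t tcp_checksum(const uint8_t *src, const uint8_t *dst,
                      const uint8_t *seg, uint16_t seglen)
{
    uint8_t pseudo[4];
    uint16_t sum;

    pseudo[0] = 0;                          /* zero byte */
    pseudo[1] = 6;                          /* protocol 6 = TCP */
    pseudo[2] = (uint8_t)(seglen >> 8);     /* TCP length: header plus data */
    pseudo[3] = (uint8_t)seglen;
    sum = csum_add(0, src, 4);
    sum = csum_add(sum, dst, 4);
    sum = csum_add(sum, pseudo, 4);
    sum = csum_add(sum, seg, seglen);
    return (uint16_t)~sum;
}
```

Because one's-complement addition can be done in any order, the same total can be built from separately computed pieces, which is the property the small implementation relies on.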

Long segments
If we are to make any headway creating a web server, we'll have to handle TCP segments that are larger than the available RAM. In the case of transmitted segments, the bulk of data will reside in external ROM, and so will be copied directly from there to the RS-232 output. We'll also be receiving long HTTP requests, where the only items of interest are in the first few tens of bytes.

Transmit. The IP and TCP headers must be created in RAM, so that their checksums can be computed. For normal (short) segments, these images are then SLIP encoded (by inserting escape sequences) while they are being sent down the serial line. If the segment is long (that is, includes ROM data), then a flag is set such that the ROM-to-SLIP transmission takes over when the RAM-to-SLIP transmission stops. This depends heavily on the TCP checksum being known in advance, that is, pre-computed for the ROM image, and added in when the TCP header and pseudo-header checksum is being calculated.
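Combining the two partial sums on transmit can be sketched as follows. The function names are hypothetical, and the sketch assumes the page data starts on an even byte offset within the segment, as it does after a 20-byte TCP header.

```c
#include <stdint.h>

/* One's-complement addition of two 16-bit partial sums, with end-around carry */
uint16_t csum_combine(uint16_t a, uint16_t b)
{
    uint32_t total = (uint32_t)a + b;

    return (uint16_t)((total & 0xFFFF) + (total >> 16));
}

/* Final TCP checksum from the sum over pseudo-header and TCP header (computed in RAM)
 * and the sum over the page data (pre-computed and stored with the page in ROM) */
uint16_t tcp_checksum_final(uint16_t header_sum, uint16_t rom_data_sum)
{
    return (uint16_t)~csum_combine(header_sum, rom_data_sum);
}
```

The pre-computed value costs two bytes of ROM per page, in exchange for never having to read the page data twice.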

Receive. We're only interested in the start of an HTTP request, and can happily discard the rest. TCP doesn't possess any mechanism for discarding data; if we don't acknowledge it, it will simply be resent until we do! There are two possible solutions: we could reduce the TCP window size so that the request is sent in two or more chunks, and discard all but the first chunk. A simpler method is to only store the start of each segment data in RAM and discard the rest, and this is what we'll do. It is tempting to ignore the checksum on the incoming TCP segment, and assume it is correct, but this is rightly frowned on in the TCP community. Instead, we could compute a checksum for the discarded portion of the segment, and add it on after the complete segment is received. The approach I have adopted is a minor variation of this, whereby the checksum of all the incoming TCP data is computed separately, irrespective of how much is stored in RAM or discarded. This is added to the value computed from the TCP header and pseudo-header, which are always stored in RAM.
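The receive side can be sketched as a byte-at-a-time routine that always updates the checksum but only stores a byte while buffer space remains. The structure `RXDATA`, the function name, and the 32-byte buffer size are assumptions made for illustration.

```c
#include <stdint.h>

#define RXBUF_SIZE 32   /* assumed size: enough for the method and a short filename */

typedef struct {
    uint8_t buf[RXBUF_SIZE];    /* start of the segment data */
    uint8_t stored;             /* number of bytes actually kept */
    uint16_t count;             /* total number of data bytes seen */
    uint16_t sum;               /* running one's-complement sum of all data bytes */
} RXDATA;

/* Process one incoming data byte: always checksum it, store it only if there is room */
void rx_data_byte(RXDATA *rx, uint8_t b)
{
    uint32_t total;

    /* Even-numbered bytes form the high half of a 16-bit word, odd-numbered the low half */
    total = (uint32_t)rx->sum + ((rx->count & 1) ? (uint32_t)b : (uint32_t)b << 8);
    rx->sum = (uint16_t)((total & 0xFFFF) + (total >> 16));
    if (rx->stored < RXBUF_SIZE)
        rx->buf[rx->stored++] = b;
    rx->count++;
}
```

When the segment ends, `sum` is added to the value computed over the TCP header and pseudo-header, and the segment is accepted only if the complemented total is zero.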

IP

To convey the TCP segments between hosts, the Internet Protocol (IP) is used. After the difficulties of TCP, IP is relatively easy to implement.

Datagram format
An IP header plus data block is known as a datagram. See Figure 6.

Version and header length. We're using IPv4, and the default header size (measured in 32-bit words) is 5, so we'll be assuming a value of 0x45 for both transmit and receive.

Service. This field is used to prioritize datagrams, and is set to zero, which is normal precedence.

Length. The total datagram length in bytes, including the IP header.

Ident. A value that is incremented for each datagram sent.

Fragmentation. IP allows a large datagram to be split into two or more smaller datagrams, using a process called fragmentation. Considering the acute lack of RAM on our microcontroller, it is impossible to support fragmentation. This is unlikely to be a problem in practice, since it carries a very significant performance penalty, so is generally avoided wherever possible.

Time to live. An expiry time for the datagram, to prevent it from endlessly circulating the Internet. A constant value generated on transmit, and ignored on receive.

Protocol. Indicator of which protocol is used in the data area of the datagram. We'll only be using two values: 1 (ICMP) and 6 (TCP).

Checksum. A simple checksum of the IP header only.

Source and destination addresses. These are IP addresses, expressed as 32-bit values. An important question is what IP address to assign to our system, and how to program it with that address. This issue can be side-stepped by assuming that, as we're using a point-to-point serial link, there can only be one intended recipient for all the network traffic, namely our web server. Hence we can accept whatever destination IP address arrives without checking it, but must be careful to use that same value in the source address field of our outgoing datagrams.

Options. Header options are occasionally used to give tighter control over datagram routing; for simplicity, we won't be accepting or transmitting any options.
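Because every field described above is either a constant or known before transmission starts, the header checksum can be worked out before the first byte is sent. The sketch below shows one way this might be done; the function name ip_hdr_checksum and its parameter list are assumptions for illustration, and the version/length byte and service byte are fixed at 0x45 and 0 as described above.

```c
#include <stdint.h>

/* Compute the IP header checksum directly from the field values,
   without building a header image in RAM (hypothetical helper). */
uint16_t ip_hdr_checksum(uint16_t len, uint16_t ident, uint16_t frags,
                         uint8_t ttl, uint8_t pcol,
                         uint32_t sip, uint32_t dip)
{
    uint32_t sum = 0x4500;              /* Version 4, header len 5, service 0 */

    sum += len;                         /* Total datagram length */
    sum += ident;                       /* Identification value */
    sum += frags;                       /* Flags & fragment offset */
    sum += ((uint16_t)ttl << 8) | pcol; /* Time to live, protocol */
    sum += sip >> 16;                   /* Source address, high word */
    sum += sip & 0xffff;                /* Source address, low word */
    sum += dip >> 16;                   /* Destination address, high word */
    sum += dip & 0xffff;                /* Destination address, low word */

    while (sum >> 16)                   /* End-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;              /* One's complement of the sum */
}
```

The checksum field itself is treated as zero while summing, which is why it does not appear in the calculation.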

Long datagrams
To accommodate long TCP segments, we have to accept IP datagrams that are longer than the available RAM. Unlike TCP, this causes no checksum problems: the IP checksum does not include the data area, so we can discard excess input data, or add extra output data, without affecting it.

ICMP
Internet Control Message Protocol (ICMP) is very useful for performing network diagnostics. An ICMP message is contained within the data field of an IP datagram.

Ping
The most commonly used facility is the ICMP Echo Request, or “ping.” We don't have to implement this on our web server, but it will be very useful to check the lower protocol layers prior to implementing the web server itself. The echo request is type 8 code 0, and the reply is type 0 code 0. The checksum covers the complete ICMP header and data area. The ident and sequence numbers, and all the data, are echoed back to the sender as a check of network integrity.
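A minimal sketch of the echo exchange is shown below, assuming the whole ICMP message fits in a small RAM buffer. The function names icmp_check and icmp_echo_reply are hypothetical; the sketch verifies the request, changes the type from 8 to 0, and recomputes the checksum, leaving the ident, sequence number, and data untouched.

```c
#include <stdint.h>
#include <stddef.h>

#define ICMP_ECHO_REQ   8
#define ICMP_ECHO_REPLY 0

/* One's-complement checksum of a block. Returns 0 when the block
   already contains a correct checksum field. */
uint16_t icmp_check(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)                        /* Odd length: pad with a zero byte */
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Turn an echo request into a reply, in place.
   Returns 1 on success, 0 if the message is not a valid echo request. */
int icmp_echo_reply(uint8_t *msg, size_t len)
{
    uint16_t check;

    if (len < 8 || msg[0] != ICMP_ECHO_REQ || msg[1] != 0)
        return 0;
    if (icmp_check(msg, len) != 0)      /* Reject a corrupted request */
        return 0;
    msg[0] = ICMP_ECHO_REPLY;           /* Type 8 becomes type 0 */
    msg[2] = msg[3] = 0;                /* Clear checksum before recomputing */
    check = icmp_check(msg, len);
    msg[2] = (uint8_t)(check >> 8);
    msg[3] = (uint8_t)(check & 0xff);
    return 1;
}
```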

Buffer Size
The default data size for a Unix ping is 64 bytes, which is too large for our available buffer RAM. Fortunately, the ping utility has an argument to specify the data size, so it can be reduced to, say, 32 bytes, which is the default size for DOS systems. This requires a buffer size of 60 bytes (including 20-byte IP header and 8-byte ICMP header), which is more realistic.

SLIP
Serial Line Internet Protocol (SLIP) is a simple method of converting a stream of serial data characters into a defined block, which we're calling a frame. It's easy to implement: a delimiter character is put at the end of each frame (and also, by convention, at the start). If the delimiter character, or the escape character itself, is encountered in the data stream, a two-character escape sequence is substituted.

#define SLIP_END 0xc0
/* SLIP escape codes */
#define SLIP_ESC 0xdb
#define ESC_END 0xdc
#define ESC_ESC 0xdd

/* Start a transmission */
void tx_start(void)
{
    putchar(SLIP_END);
}

/* Encode and transmit a single SLIP byte */
void tx_byte(BYTE b)
{
    if (b==SLIP_END || b==SLIP_ESC)
    {
        putchar(SLIP_ESC);
        putchar(b==SLIP_END ? ESC_END : ESC_ESC);
    }
    else
        putchar(b);
}
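
The transmit routines above have a natural counterpart on the receive side. The following sketch shows how the escape sequences might be undone one character at a time; the names rx_byte and slip_decode, and the return codes, are assumptions for illustration rather than the article's own receive code.

```c
#include <stdint.h>
#include <stddef.h>

#define SLIP_END 0xc0
#define SLIP_ESC 0xdb
#define ESC_END  0xdc
#define ESC_ESC  0xdd

enum { SLIP_NONE, SLIP_DATA, SLIP_FRAME };  /* hypothetical return codes */

/* Decode one received character. *escaped holds the state between calls.
   Returns SLIP_DATA with the byte in *out, SLIP_FRAME at a delimiter,
   or SLIP_NONE when the character was the first half of an escape. */
int rx_byte(uint8_t c, int *escaped, uint8_t *out)
{
    if (c == SLIP_END) {
        *escaped = 0;
        return SLIP_FRAME;
    }
    if (*escaped) {
        *escaped = 0;
        if (c == ESC_END)
            c = SLIP_END;
        else if (c == ESC_ESC)
            c = SLIP_ESC;
        /* Any other value is a protocol error; passed through here */
    } else if (c == SLIP_ESC) {
        *escaped = 1;
        return SLIP_NONE;
    }
    *out = c;
    return SLIP_DATA;
}

/* Decode the first non-empty frame in a buffer.
   Bytes that do not fit in out[] are decoded but dropped. */
size_t slip_decode(const uint8_t *in, size_t inlen,
                   uint8_t *out, size_t outmax)
{
    size_t i, n = 0, total = 0;
    int escaped = 0;
    uint8_t b;

    for (i = 0; i < inlen; i++) {
        int r = rx_byte(in[i], &escaped, &b);
        if (r == SLIP_FRAME && total > 0)
            break;                      /* Delimiter after data: frame done */
        if (r == SLIP_DATA) {
            if (n < outmax)
                out[n++] = b;
            total++;
        }
    }
    return n;
}
```

A leading delimiter produces an empty frame, which the wrapper simply ignores; this is why sending the delimiter at the start of each frame is harmless.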

Modem driver
The implicit assumption in most PC communications software is that serial networking should be configured for access via a modem and telephone line. We would like to be able to link a PC directly to our PIC server's serial port, so the server will have to impersonate a modem to keep the PC happy.

Fortunately, this only involves accepting modem command strings, which are prefixed by “AT” and delimited by a Carriage Return character, and returning an “OK” string. This is usually sufficient, though it is wise to assert the Data Carrier Detect (DCD) hardware handshake line to the PC as well, in case its software uses this to check that the (emulated) phone link is still functioning correctly.

Table 1: A typical PC-to-modem interaction
PC          Modem   Description
AT          OK      Check that modem is responding.
ATE0V1      OK      Disable command echo, enable text-message responses.
AT          OK      Check that modem is responding.
ATDT12345   OK      Tone-dial telephone number 12345.

A typical PC-to-modem interaction is shown in Table 1. Some modem scripts look for a CONNECT message after dialling, but this does not appear necessary when using the standard Windows modem types. Disconnection follows a similar pattern, as shown in Table 2.

Table 2: Disconnection
PC          Modem   Description
ATH         OK      Disconnect from line.
ATZ         OK      Reset modem.
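
The modem emulation described above can be sketched in a few lines of C. This is only an illustration: the MODEM structure and modem_rx_char function are hypothetical names, and the framing of the reply with carriage return and line feed is an assumption based on common modem behaviour. To save RAM, only the first two characters of each command line are kept.

```c
#include <stddef.h>

typedef struct {
    char first[2];          /* First two characters of the current line */
    unsigned char len;      /* Characters stored so far (at most 2) */
} MODEM;

/* Feed one character received from the PC.
   Returns the reply string to send, or NULL if there is nothing to send. */
const char *modem_rx_char(MODEM *m, char c)
{
    const char *reply = NULL;

    if (c == '\r') {                        /* End of command line */
        if (m->len == 2 &&
            (m->first[0] == 'A' || m->first[0] == 'a') &&
            (m->first[1] == 'T' || m->first[1] == 't'))
            reply = "\r\nOK\r\n";           /* Assumed reply framing */
        m->len = 0;                         /* Ready for the next command */
    } else if (c != '\n' && m->len < 2) {
        m->first[m->len++] = c;             /* Rest of the line is ignored */
    }
    return reply;
}
```

Every command in the two tables gets the same answer, so the emulator never needs to parse beyond the "AT" prefix.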

Software techniques
Having discussed the limitations of microcontrollers, we need to work out how we can squeeze the complete TCP/IP stack into one of them.

RAM limitation
The most acute problem, by far, is the lack of RAM. The usual assumption is that the incoming and outgoing frames are stored in RAM, and structures are overlaid onto this RAM, so that specific values can be checked, read, or modified, for example:

/**** IP packet ('datagram') ****/
typedef struct
{
    IPHDR    i;              /* IP header */
    BYTE     data[MAXIP];    /* Data area */
}    IPKT;

If we were to use this technique, the only way of cramming these structures into 368 bytes of RAM is to severely restrict the maximum frame size we can send or receive. At best, we might be able to accommodate a 128-byte frame, which is unacceptable. To permit the use of full-size frames, they must be decoded on the fly as they are received, and created on the fly as they are transmitted.

Handling frames on the fly, as they are received and transmitted, also helps to achieve a good response time.

Creating protocols on-the-fly
Consider the code fragment shown in Listing 1. It isn't too hard to imagine the same data being created on the fly as it is being transmitted, using code such as that in Listing 2, where the “put” functions send the values via a SLIP driver to the serial port. The abolition of the structure reduces RAM consumption significantly, and also reduces the code size slightly; the complicated indexed-addressing assignment operation is replaced by a single-word function call. Unfortunately, the new code is harder to debug, since there isn't a convenient memory image to browse when we want to check the last frame sent. To compensate, we may have to employ the services of a protocol analyzer to monitor the external communications when debugging.

Listing 1: Packet assembly using structures

/* ***** IP (Internet Protocol) header ***** */
typedef struct
{
    BYTE  vhl,      /* Version and header len */
          service;  /* Quality of IP service */
    WORD  len,      /* Total len of IP datagram */
          ident,    /* Identification value */
          frags;    /* Flags & fragment offset */
    BYTE  ttl,      /* Time to live */
          pcol;     /* Protocol used in data area */
    WORD  check;    /* Header checksum */
    LWORD sip,      /* IP source addr */
          dip;      /* IP dest addr */
} IPHDR;

IPKT *ip;

ip->i.vhl = 0x40+(sizeof(IPHDR)>>2); /* Version 4, header len 5 LWORDs */
ip->i.service = 0;                   /* Routine message */
ip->i.len = len + sizeof(IPHDR);     /* Total length: data plus header */

Listing 2: On-the-fly packet assembly

put_byte(0x40+(sizeof(IPHDR)>>2)); /* Version 4, header len 5 LWORDs */
put_byte(0);                       /* Routine message */
put_word(len + sizeof(IPHDR));     /* Total length: data plus header */

Checksums
Another problem occurs with the calculation of protocol checksums. Usually a memory image is scanned to compute these, but we don't have a memory image to scan, so we'll have to get the “put” functions to do the job, for example:

/* Send a byte out to the SLIP link, then add to checksum */
void put_byte(BYTE b)
{
    putchar(b);
    check_byte(b);
}

It would have been really helpful if the checksum were the last word transmitted, because then we could just send out the calculated value as part of the end-of-frame sequence, as shown in Listing 3.

Listing 3: Checksum after data (wishful thinking)

put_byte(0x40+(sizeof(IPHDR)>>2)); /* Version 4, header len 5 LWORDs */
put_byte(0);                       /* Routine message */
put_word(len + sizeof(IPHDR));     /* Total length: data plus header */
put_word(sum);                     /* Checksum value */

Unfortunately, the TCP checksum is in the header, where the data to be checked hasn't been transmitted yet. If the data is being fetched from ROM, then its checksum can be pre-computed and stored in the file directory, and just added on to the TCP header fields to produce the final value.
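
The idea of a pre-computed checksum can be sketched as follows. The ROMFILE directory entry, and the functions csum_block and tcp_checksum, are hypothetical names invented for this example; the sketch assumes that the sum of each file's data is calculated when the ROM image is built, and that the TCP header and pseudo-header have already been summed into hdr_sum.

```c
#include <stdint.h>
#include <stddef.h>

/* One's-complement 16-bit addition with end-around carry */
static uint16_t csum_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Sum of a data block, as it might be computed at ROM build time */
uint16_t csum_block(const uint8_t *p, size_t len)
{
    uint16_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = csum_add(sum, (uint16_t)((p[i] << 8) | p[i + 1]));
    if (len & 1)                        /* Odd length: pad with a zero byte */
        sum = csum_add(sum, (uint16_t)(p[len - 1] << 8));
    return sum;
}

/* Hypothetical ROM directory entry with its pre-computed data sum */
typedef struct {
    const uint8_t *data;    /* File contents in ROM */
    uint16_t len;           /* File length in bytes */
    uint16_t sum;           /* Pre-computed sum of the contents */
} ROMFILE;

/* Final TCP checksum: header sum plus stored data sum, complemented.
   hdr_sum is assumed to cover the TCP header and pseudo-header. */
uint16_t tcp_checksum(uint16_t hdr_sum, const ROMFILE *f)
{
    return (uint16_t)~csum_add(hdr_sum, f->sum);
}
```

This works because the TCP header is always an even number of bytes long, so the data starts on a 16-bit word boundary and its sum can be added to the header sum without any byte swapping.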

Reception
Just as the transmit structures can be replaced by a string of function calls, so can those used for receive. Rather than storing the complete frame and then using structure references to pick out the elements of interest, the frame is scanned on input, with only the important information being stored (for example, the code in Listing 4 scans the start of the IP header).

Listing 4: On-the-fly packet decoding

if (match_byte(0x45) && skip_byte() &&   // Version, service
    get_word(iplen) && skip_word() &&    // Len, ID
    skip_word() && skip_byte() &&        // Frags, TTL

I'm using three basic types of input functions:

  • match: ensures that the specified value is present
  • skip: checks that the byte(s) are present, then discards them
  • get: checks that the byte(s) are present, and saves them for re-use later

These functions are surprisingly versatile, in that they allow us to indicate which values are unimportant, those that must be checked but need not be retained, and those that must be saved for later use. If these values are separated out when the frame is received, we can free up a significant amount of storage space.

All these functions return a Boolean true/false value, so the condition of the surrounding if statement will only be true if all the required data has been obtained and matched correctly. The obvious disadvantage with this method is that the processor will tend to lock up if communications fail in mid-message; it will wait forever for a byte that never comes. This can be avoided by including a timeout in the function that fetches the bytes from the serial link, which makes all subsequent input calls fail (return “false”) until communications are restored.
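
The three function types, and the way a failure makes all later calls fail, might look something like the sketch below. It is not the article's implementation: the byte source is an array standing in for the serial link, running out of bytes stands in for a timeout, rx_open and get_raw are hypothetical helpers, and get_word takes a pointer rather than the reference-style argument used in Listing 4.

```c
#include <stdint.h>
#include <stddef.h>

static const uint8_t *rx_ptr;   /* Stand-in for the serial input stream */
static size_t rx_left;          /* Bytes remaining before "timeout" */
static int rx_fail;             /* Set on failure; blocks all later input */

/* Start reading from a new source and clear the failure flag */
void rx_open(const uint8_t *p, size_t n)
{
    rx_ptr = p;
    rx_left = n;
    rx_fail = 0;
}

/* Fetch one byte; on real hardware this would wait with a timeout */
static int get_raw(uint8_t *b)
{
    if (rx_fail || rx_left == 0) {
        rx_fail = 1;            /* Latch the failure */
        return 0;
    }
    *b = *rx_ptr++;
    rx_left--;
    return 1;
}

/* match: ensure the specified value is present */
int match_byte(uint8_t v)
{
    uint8_t b;
    return get_raw(&b) && b == v;
}

/* skip: check the byte(s) are present, then discard them */
int skip_byte(void)
{
    uint8_t b;
    return get_raw(&b);
}

int skip_word(void)
{
    return skip_byte() && skip_byte();
}

/* get: check the bytes are present, and save them for later */
int get_word(uint16_t *w)
{
    uint8_t hi, lo;
    if (!get_raw(&hi) || !get_raw(&lo))
        return 0;
    *w = (uint16_t)((hi << 8) | lo);
    return 1;
}

/* Scan the start of an IP header, keeping only the datagram length */
int rx_ip_start(uint16_t *iplen)
{
    return match_byte(0x45) && skip_byte() &&   /* Version, service */
           get_word(iplen) && skip_word() &&    /* Len, ID */
           skip_word() && skip_byte();          /* Frags, TTL */
}
```

Because the && operator stops at the first false result, a mismatch or timeout abandons the rest of the scan without reading any further bytes.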

Source code

In this article, I have reviewed the key areas associated with creating a microcontroller-based web server implementation. The implementation has sufficient resources to support the creation of useful web pages, including dynamic data. Full source code for this project (and various PC-based network utilities) is included in my book TCP/IP Lean: Web servers for Embedded Systems (CMP Books, 2000).

Jeremy Bentham cofounded the industrial networking company Io Ltd., as well as the software consulting offshoot IoSoft Ltd., where he develops embedded TCP/IP solutions. He was software manager at Arcom Control Systems, a UK manufacturer of boards and systems for industrial applications.
