It goes without saying that Internet-enabled devices are all the rage these days. A few short years ago, the only mainstream embedded users of the Internet were set-top boxes and network infrastructure equipment. Today, on the other hand, everybody wants to interact with every gadget they own via a web browser, and most of them can provide rational reasons for doing so.
But to most embedded devices, the Internet doesn't come easy. What do you do when your company's main product doesn't have a network port? How can critical applications like industrial controllers be placed online, without disrupting their primary functions? And what about the hordes of existing devices that don't have the resources to support TCP/IP and other Internet protocols?
In some of these situations, an HTTP proxy may come to the rescue.
What is a proxy?
Simply put, in a networking context, a proxy is any program that provides a communications bridge that other applications can use to exchange data. Proxies are widely used to help protect applications from each other, as in the case of a network firewall. Our situation, however, illustrates another popular use for proxies: as translators between applications with seemingly incompatible communications strategies. Such a proxy can bring the Internet to an embedded system, while allowing the embedded target to speak its native tongue.
Proxy implementations come in a variety of shapes and sizes, which makes them difficult to present comprehensively in a single article. The fundamental concepts are the same in almost all cases, however, so even the relatively limited treatment I provide here will be useful in a more general setting.
For the remainder of this article, I will assume that the motivation for web-enabling a legacy device is to allow a customer to interact with the product using an ordinary web browser. This assumption allows us to focus on a single kind of proxy, one that can translate between HTTP and the target device's own, proprietary protocol.
What is an HTTP proxy?
An HTTP proxy, as I present it here, is a program that implements a browser's HTTP requests for data using one or more proprietary message exchanges with the target embedded device. Once this exchange is complete, the proxy returns the result to the client as an HTML document, or some other kind of browser-friendly format like PNG, JPG, or even raw ASCII text.
The proxy executable is placed at the most convenient point between the client and the target, depending on the desired capabilities of the overall solution. In most cases, the best location for the proxy is on the PC running the browser. This is especially the case when the target doesn't support Ethernet, or when access is needed only while the client is standing next to the product.
When something resembling true Internet-wide connectivity is necessary, however, the proxy can be installed on an inexpensive, single-board computer located between the target and the target's link to the network.
Why a proxy?
The traditional approach to putting a device "on the Internet" is to add TCP/IP and various other capabilities to the target itself. While this approach has its advantages, it is usually an unreasonable option for mature embedded products, particularly those that lack the necessary hardware interfaces, spare memory, or processor cycles.
Proxies enable Internet-style communications with legacy hardware without modification of the target application (of course, the target must support some kind of communications capability beforehand). The proxy application runs on a computer located somewhere between the client's browser and the target, and uses the target's native tongue to extract information to send back to the web browser. As a result, the target device has no idea that it has been Internet-enabled.
A proxy-based solution is more flexible than an embedded system that speaks IP directly. Because it doesn't need to physically coexist with the target application, a proxy can support the overhead necessary to present a uniform user interface for multiple target versions. In addition, the target device's visual interface, as shown on the client's browser, can be changed without taking the target system out of service simply by upgrading the proxy application.
A proxy also permits communication with targets that don't offer a connection medium normally associated with IP protocols. For example, an HTTP-to-CAN proxy could be used to provide browser access to a target that had only a CAN port.
HTTP 101
Obviously, an understanding of how web browsers communicate is needed before we can use a browser to interact with an embedded target via its HTTP proxy. I'm not going to try to train you for a new career as a web server designer in this section, but I will try to cover all the basics.
Contrary to popular belief, your web browser's primary language is actually HTTP, not HTML. When you type in a URL like http://www.embedded.com/index.html, for example, your browser sends the following HTTP message to the web server on the machine named www.embedded.com [1]:
GET /index.html
A typical web server's response to this message is to return the contents of an HTML file named index.html, but this isn't always the case. In fact, the particulars of the response are left entirely to the server, and sophisticated ones like Zope (www.zope.org) routinely break the conventional notion of a one-to-one mapping between URLs and file names on the serving machine (in Zope's case, this divergence is a good thing).
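To make the shape of the request concrete, here is a minimal sketch of pulling the method and path out of the first line of a request. The helper name split_request_line() and the buffer sizes are my own assumptions for illustration; they do not come from proxy.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from proxy.c): copy the method and path out of
   the first line of an HTTP request without modifying the request.
   Returns 0 on success, -1 if the line is malformed or a buffer is too small. */
int split_request_line(const char *request,
                       char *method, size_t method_len,
                       char *path, size_t path_len)
{
    char m[16], p[256];

    /* read two whitespace-delimited tokens, each with a bounded width */
    if (sscanf(request, "%15s %255s", m, p) != 2)
        return -1;
    if (strlen(m) >= method_len || strlen(p) >= path_len)
        return -1;

    strcpy(method, m);
    strcpy(path, p);
    return 0;
}
```

Given a request that begins with GET /index.html, the helper fills in GET as the method and /index.html as the path, and it ignores any header lines that follow.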
Moving on, when you fill in some text and then click on a button in an HTML form (the home page for an Internet search engine, for example), your browser sends a slightly different HTTP message to the server:
GET /query?textfield=textdata&pressme=press_here
This message tells the server that you typed the word textdata into a field called textfield, and then clicked button pressme (which was showing the text press_here at the time) in an HTML form whose action is query. As with the previous message, what happens next is entirely up to the server. Often the result is that the web server passes the message to a standalone application that performs a host-specific function (a database lookup, for example), and then returns HTML to the browser.
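A proxy usually wants individual field values out of such a message. The following is a sketch only: get_query_value() is a hypothetical helper of my own, and it makes no attempt to decode the %XX and + escapes that browsers apply to special characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find "name=value" in the query part of a request
   path and copy the value into out. Returns 0 if found, -1 otherwise.
   No %XX or '+' decoding is attempted. */
int get_query_value(const char *path, const char *name,
                    char *out, size_t out_len)
{
    size_t name_len = strlen(name);
    const char *p = strchr(path, '?');

    if (p == NULL || out_len == 0)
        return -1;
    p++;    /* skip the '?' */

    while (*p != '\0') {
        size_t field_len = strcspn(p, "&");    /* length of this name=value pair */

        if (field_len > name_len &&
            strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
            size_t value_len = field_len - name_len - 1;

            if (value_len >= out_len)
                return -1;    /* value does not fit in the caller's buffer */
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return 0;
        }
        p += field_len;
        if (*p == '&')
            p++;    /* step over the separator to the next pair */
    }
    return -1;
}
```

Called on the message above with the name textfield, the helper yields textdata; with pressme, it yields press_here.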
The HTTP protocol contains several other messages, including ones for PUTting and POSTing data. We don't need to consider those for our simple proxy, however, so in the interest of space I'll include references at the end of the article for further reading.
A basic example
With a proxy-based solution, the key to connecting an embedded device to a browser lies in the ability to translate between HTTP and whatever language and media the target system supports.
To illustrate one way to do this, I have developed a very basic HTTP proxy (supplied in proxy.c, available at www.embedded.com/code/2000code.htm). To use this code, you must enhance the included parse_http_request() function to decode an HTTP message in a manner most suited to your needs, and then use the information the message contains to decide what to do next.
Listing 1: A simple "home page" for your product
int
parse_http_request(char *http_request, int connfd)
{
    /* status line and headers end in CRLF; a blank line ends the headers */
    const char *header = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>";
    const char *footer = "</html>";
    char *target_timestr;

    (void)http_request;    /* the response is the same for every request */

    write( connfd, header, strlen( header ));
    target_timestr = proprietary_localtime();
    write( connfd, target_timestr, strlen( target_timestr ));
    write( connfd, footer, strlen( footer ));
    return 0;
}
For example, let's say that all you want to do is provide a simple "home page" for your product that shows calendar time at the target device. To do this, you don't need to look at the arguments supplied with the HTTP request at all, because the response will be the same in all cases. Listing 1 shows how to do this, assuming you can use a function called proprietary_localtime() to get time information from the target.
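If you want to try Listing 1 before any target hardware is connected, a stand-in for proprietary_localtime() is enough. The version below is purely an assumption for experimentation: it reports the host's clock rather than talking to a target, and its signature is inferred only from the way Listing 1 calls the function.

```c
#include <string.h>
#include <time.h>

/* Stand-in for the target query (an assumption, not the real
   proprietary_localtime()): reports the host's clock instead. */
char *proprietary_localtime(void)
{
    static char timestr[32];    /* static, so the pointer stays valid after return */
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);

    if (tm == NULL ||
        strftime(timestr, sizeof timestr, "%Y-%m-%d %H:%M:%S", tm) == 0)
        strcpy(timestr, "time unavailable");
    return timestr;
}
```

A real implementation would replace the body with whatever message exchange the target's native protocol requires.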
To see this example in action, simply compile proxy.c, launch the resulting executable, and then supply the following URL to your browser:
http://localhost/
If your workstation already has a web server installed, try changing the definition of LISTENPORT in the example code to an unused port number (for example, 8000). Recompile, then connect using this URL instead [2]:
http://localhost:8000/
In any case, here is what the code in Listing 1 does:
- Provides an initial "okay" response to the client's browser
- Calls the function that gets the local time from the target
- Builds an HTML page that contains the response, and
- Sends that response back to the client's browser
A more sophisticated example
Let's now suppose that we want the target's home page to contain a button that the user can click to get more information from the target. This requires more intelligence in parse_http_request(), because we have to:
- Send the client the home page with the button, and
- Determine which button the user pressed and respond accordingly
Listing 2: A more sophisticated page
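typedef struct
{
    char *method;
    char *object;
} http_request_T;

int
parse_http_request(char *http_request_buf, int connfd)
{
    http_request_T http_request;

    /* split "GET /object ..." into its space-separated parts */
    http_request.method = strtok( http_request_buf, " " );
    http_request.object = strtok( 0, " " );

    /* was the button named "press" (showing the text "value") clicked? */
    if ( http_request.object &&
         strcmp( http_request.object, "/query?press=value" ) == 0 )
    {
        send_other_page( connfd );
    }
    else
    {
        send_home_page( connfd );
    }
    return 0;
}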
Code to demonstrate this is included in proxy.c as well. The general idea is shown in Listing 2.
This code is doing essentially the same thing as the previous example, except that it is choosing which page to return based on whether the HTTP message says the user clicked on the button labeled value or not.
Hidden values and proxy simplification
The previous examples are straightforward, but they won't scale very well to applications with more than a handful of pages. The reason is that parse_http_request() requires specific parsing code for each page, something that quickly becomes tedious and error-prone for anything beyond the simplest functionality.
When an application with many pages is desired, HTML's hidden values are the preferred way to manage the complexity without the drudgery of lots and lots of parsing code.
Listing 3: An HTML page with hidden values
Listing 3 shows the source for an HTML page with two forms, each containing a unique hidden value and a single button. When the user clicks one of the buttons, the browser includes the associated hidden value in the HTTP message, which makes it a convenient way for the proxy to determine what to do next.
When the user clicks on Help!, the browser sends state_id=1234 to the proxy. Likewise, when the user clicks Conclusions, the browser sends state_id=5678. The code in Listing 4 extracts the value of state_id from the HTTP message, and then looks up and invokes the state's associated function to generate the proper response. This code is included in proxy2.c (also available on www.embedded.com/code/2000code.htm).
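A page of the kind just described could be generated by the proxy itself. The sketch below is an illustration under stated assumptions: the function name build_hidden_value_page() and the form action /query are hypothetical, while the state_id values and button labels are the ones discussed above.

```c
#include <stdio.h>

/* Hypothetical page builder: formats a page with two forms into buf.
   The action "/query" and the function name are illustrative only.
   Returns the page length, or -1 if buf is too small. */
int build_hidden_value_page(char *buf, size_t buf_len)
{
    int n = snprintf(buf, buf_len,
        "<html><body>\n"
        "<form action=\"/query\" method=\"get\">\n"
        "<input type=\"hidden\" name=\"state_id\" value=\"1234\">\n"
        "<input type=\"submit\" value=\"Help!\">\n"
        "</form>\n"
        "<form action=\"/query\" method=\"get\">\n"
        "<input type=\"hidden\" name=\"state_id\" value=\"5678\">\n"
        "<input type=\"submit\" value=\"Conclusions\">\n"
        "</form>\n"
        "</body></html>\n");

    /* snprintf reports the length it wanted; treat truncation as failure */
    if (n < 0 || (size_t)n >= buf_len)
        return -1;
    return n;
}
```

Because the submit buttons carry no name attribute, only the hidden field travels in the request, so the proxy sees just the state_id value for the form that was submitted.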
Listing 4: A state based implementation
/* http_request_T here also carries a protocol field; a state
   function takes the connection and the requested object */
const http_state_T http_states[] =
{
    { 1, home_state },
    { 1234, page_1234 },
    { 5678, page_5678 },
    { 0, 0 }    /* end of table */
};

void
parse_http_request(int connfd, char *http_request_buf)
{
    http_request_T http_request;
    char *state_idstr;
    int state_id = 0;
    int wstate;

    /* make sure it's a "GET" message; if it isn't,
       we don't know what to do with it right now */
    http_request.method = strtok( http_request_buf, " " );
    if( http_request.method && strcmp( http_request.method, "GET" ) == 0 )
    {
        /* crack apart the rest of the request */
        http_request.object = strtok( 0, " " );
        http_request.protocol = strtok( 0, " \r\n" );

        /* find the "state_id=" portion of the message */
        state_idstr = http_request.object ?
            strstr( http_request.object, "state_id=" ) : 0;
        if( state_idstr )
        {
            /* get the number that follows "state_id=" */
            state_idstr = strchr( state_idstr, '=' ) + 1;
            sscanf( state_idstr, "%d", &state_id );

            /* look it up */
            for( wstate = 0; http_states[wstate].id; wstate++ )
            {
                if( http_states[wstate].id == state_id )
                {
                    /* found it! invoke the state function */
                    http_states[wstate].state( connfd, http_request.object );
                    break;
                }
            }

            /* reached the end of the table without a match */
            if( http_states[wstate].id == 0 )
            {
                error_state( connfd );
            }
        }
        /* there wasn't a "state_id=" in the message;
           default to the home page */
        else home_state( connfd, 0 );
    }
    return;
}
To add a new page to the application, just add its associated state_id value and function to http_states[], and then adjust the contents of the referring page to deliver this value to the proxy at the proper time (when the user clicks on a button, for example). In other words, you no longer need to modify parse_http_request() when a page is added.
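As a sketch of what adding a page involves, consider the table below. The http_state_T layout is my assumption, inferred from the way Listing 4 uses the table; page_9012, the value 9012, and find_state() are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

/* Assumed shape of a state function, inferred from how Listing 4 calls it */
typedef void (*state_fn_T)(int connfd, char *object);

typedef struct
{
    int id;              /* state_id carried in the request; 0 ends the table */
    state_fn_T state;    /* function that generates the page for this state */
} http_state_T;

/* Placeholder handlers; real ones would write a page to connfd */
static void home_state(int connfd, char *object) { (void)connfd; (void)object; }
static void page_1234(int connfd, char *object)  { (void)connfd; (void)object; }
static void page_5678(int connfd, char *object)  { (void)connfd; (void)object; }
static void page_9012(int connfd, char *object)  { (void)connfd; (void)object; }

static const http_state_T http_states[] =
{
    { 1,    home_state },
    { 1234, page_1234 },
    { 5678, page_5678 },
    { 9012, page_9012 },    /* hypothetical new page: the only table change */
    { 0, 0 }
};

/* Returns the handler for state_id, or NULL if the id is not in the table */
state_fn_T find_state(int state_id)
{
    int i;

    for (i = 0; http_states[i].id != 0; i++)
        if (http_states[i].id == state_id)
            return http_states[i].state;
    return NULL;
}
```

The lookup code never changes; only the table row and the new handler are added, along with a hidden state_id value of 9012 on whichever page links to it.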
The http_states[] table is a kind of "site map" for the entire application that parse_http_request() uses to move the client through pages in the appropriate order. From another perspective, http_states[] is a state machine that drives the behavior of the proxy in response to user events encoded in state_id values.
Whatever your interpretation, it should be clear that a state-driven proxy architecture makes it far easier to manage applications with a lot of pages than anything else I've shown you so far.
But can't I do all of this with CGI?
Yes and no. The examples shown here include portions of web server functionality that most CGI applications don't have, in particular the ability to receive HTTP requests from an IP port via bind, accept, and read. As such, our proxies can run on machines that lack a web server, which would be the case for most of your clients' PCs.
On the other hand, a CGI-based approach makes sense when you need a proxy that can run on different kinds of hosts, or there is the possibility that the proxy will run on a host that is already running a web server. In the event that you produce a CGI proxy but a web server isn't available, the example proxy in this article can serve as a minimal web server that forwards HTTP to the proxy via an exec or similar system call.
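For comparison, a CGI version of the proxy never touches a socket. The web server hands it the text after the ? in the QUERY_STRING environment variable and relays whatever the program writes to standard output. The following is a minimal sketch of that arrangement; the function names are hypothetical and are not part of proxy.c or proxy2.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the state_id found in a query string such
   as "state_id=1234", or 0 if the string or the field is missing. */
int state_id_from_query(const char *query)
{
    const char *p;
    int state_id = 0;

    if (query == NULL)
        return 0;
    p = strstr(query, "state_id=");
    if (p == NULL || sscanf(p, "state_id=%d", &state_id) != 1)
        return 0;
    return state_id;
}

/* Under CGI, the web server places the query text in QUERY_STRING */
int cgi_state_id(void)
{
    return state_id_from_query(getenv("QUERY_STRING"));
}

/* Writes the CGI response header; the web server supplies the status line */
void cgi_write_header(FILE *out)
{
    fputs("Content-Type: text/html\r\n\r\n", out);
}
```

A CGI main() would call cgi_state_id(), write the header to stdout, and then hand off to the same table of state functions that the standalone proxy uses.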
Disadvantages of proxies
HTTP proxies are a simple and powerful way to get a legacy product onto the Internet, but they do have their limitations. To begin with, the proxy must be properly installed and running somewhere before communication with the target system is possible. In contrast, for targets with integrated Ethernet and HTTP/TCP/IP capabilities, the user only needs to plug in a cable and type in a URL.
A standalone proxy also does nothing to assure that the target interfaces it uses are properly maintained. A compiled-in HTTP server, in contrast, will likely produce compilation or link errors if a function it needs is accidentally removed from a new version of the target's application.
Finally, successful proxies require some knowledge of the host's networking and other APIs, which may present problems for developers with no skills in this area. I consider this an item of minimal concern, however, given the number of excellent TCP/IP and other networking books available in the mainstream press today.
Flexibility and frugality
When you need to get a legacy system talking to the Internet, a proxy is probably the best way to go about it. In addition to their simplicity, proxies offer flexibility and frugality that's tough to match using any other approach.
HTTP proxies are not difficult to implement, and they don't require modification of target software. As a result, the Internet appliances your customers want tomorrow could very well be the devices you are already building today.
Bill Gatliff is an independent consultant who specializes in solving difficult embedded problems using open-source tools and techniques. He is also a member of the Embedded Systems Conference Advisory Panel and a frequent contributor to Embedded Systems Programming. Comments and questions are always welcome at bgat@billgatliff.com.
References
1. The actual HTTP message is a bit longer than this because it also includes information on the type of browser and operating system you are using and the identity of your machine. The GET is the essential text, however.
2. In some cases, you'll need to use raw IP addresses, for example, http://127.0.0.1/ or http://127.0.0.1:8000/.
Resources
Gundavaram, Shishir. CGI Programming on the World Wide Web. This book is only available on-line now, at www.oreilly.com/openbook/cgi.
Guelich, Scott, Shishir Gundavaram, and Gunther Birznieks. CGI Programming with Perl, 2nd ed. This book will be published by O'Reilly & Associates in July.
Stevens, W. Richard. Unix Network Programming. Upper Saddle River, NJ: Prentice-Hall, 1997.
www.webmonkey.com/backend/protocols/
Just about everything on www.w3.org, if you want the gory details.