Tiny File System - Embedded.com


Embedded Systems Programming
Internet Appliance Design

Tiny File System

Sometimes when flash memory is present, you want to refer to a chunk of code or data by name, as though it is a file. This capability need not come with all the complexity of most commercial flash file systems.
By: Ed Sutter

If you've ever worked on an embedded system project that uses flash, then you've probably contemplated using a flash file system (FFS). Like the RTOS discussions a few years back, the FFS for current embedded systems is usually considered to be an unnecessary luxury or something you can write yourself. Also like the RTOS, an FFS certainly makes the embedded system easier to manage, but to do it right, writing one from scratch is likely to be more complicated than originally expected.

This article presents tiny file system (TFS), a package that allows the memory space allocated to flash to be treated as name space instead of address space. TFS attempts to solve some of the basic problems of interfacing to flash. It provides an API that is independent of the underlying flash device, yet does not inhibit the programmer from directly accessing the raw memory if necessary. It bridges the gap between the implementation that tries to go without using an FFS because it's not really necessary, and those applications that absolutely need a sophisticated FFS with wear-leveling, directory hierarchy, and DOS-file-format compatibility.

TFS is part of a generic embedded system boot platform, so other components exist (outside of TFS, but within the platform) to exploit some of the described functionality in a way that may not be useful for your specific application. TFS is a command, an API, and an integral part of an embedded system boot platform. As a command, it provides a user interface with functions like list, delete, create, display, copy, decompress/load/execute, and cleanup of files. As an API, it provides application code with functions like read(), write(), open(), close(), stat(), and seek(). Finally, as part of an embedded system boot platform, it provides the environment with the ability to automatically boot one or more application files and allows other portions of that boot platform to assume the existence of an FFS.

The boot platform

Before getting into the meat of the file system, I'd like to briefly describe the environment in which I designed TFS to run. This is not a requirement for TFS, but it is helpful for understanding some of the discussion that follows. TFS is part of a generic boot monitor platform for embedded systems called MicroMonitor.[1] The platform assumes the embedded system flash is broken up into two major chunks: the flash used by the boot monitor executable and the flash used by TFS. It is assumed that the boot monitor is deployed as part of the product. The actual application is a file in TFS that the boot monitor automatically starts up. TFS is a major part of the monitor. In fact, the monitor configures itself based on the content of a file in TFS (to retrieve IP address, console baud rate, and so on). The boot monitor contains other facilities that depend heavily on TFS, but they are beyond the scope of this discussion.

The TFS design criteria

The initial goal of TFS was to provide my firmware with the ability to treat system flash as name space instead of address space. This eliminates the need for each new application to deal with the flash in some unfamiliar or clumsy way. At the same time, I didn't want to eliminate the ability to access the flash as simple memory; hence, I needed an API to support the namespace model, but I wanted a hook into the raw flash to support the basic address/data model. Other goals in the design were to make TFS somewhat device- and RTOS-independent, and to ensure that it does not require any system interrupts to run. The only restriction on the underlying flash is that its sector size be larger than the TFS header size (currently 76 bytes). The result is an FFS implementation that supports the needs of a high-level application that wants a file system model as well as a real-time application that needs quick memory access.

TFS is a linear file system that gives a typical embedded system project all of the file-system-like capabilities it will ever need. TFS does not support any sophisticated wear-leveling algorithm, it doesn't have a directory hierarchy, and it is not compatible with any other file system. I have yet to work on a project that accessed the flash frequently enough to need wear leveling. As for DOS compatibility, if the media is not removable, there's no need for it anyway. The TFS implementation is independent of the RTOS used (it doesn't need one) and it is easily hooked into an application.

The user interface

At the user interface (typically an RS-232 console, but optionally a UDP port), TFS is a command. All of the capabilities within TFS are made available as sub-commands under the TFS command. For example, to list the files currently in the flash, the command would be “tfs ls”, not just “ls”. The general syntax of the TFS command is:

tfs [options] {sub-command} [arguments dependent on sub-command]

TFS command options

  • -d {device prefix}
    Apply the command to the specified TFS device (assumes more than one flash device is covered by TFS space)

  • -f {flags}
    Flags (or attributes) applied to the TFS file

  • -i {info}
    Information field included with the file created

  • -m
    Enable “more”-style output throttling when displaying a file or list of files. This is an additive flag. Multiple -m options on the command line (for example, -mm) will increase the pagesize

  • -v
    Enable verbosity level (-v=lvl 1, -vv=lvl 2, -vvv=lvl 3)

TFS subcommands

  • add {name} {src_address} {size}
    Create the file 'name' to contain the data starting at location 'src_address' of size 'size'. Options -f and -i can be used to specify the flags and information field associated with the newly created file

  • cat {filename}
    Dump the content of the specified file to the user. Use of the -m option will throttle the output in bursts of eight lines

  • check
    Check the sanity of the files stored in TFS by running various tests (like a CRC32 on the data and header). If the -d option is specified, only check files in the specified device

  • clean[r]
    Clean up (defragment) the file system to free-up flash space. If 'r' is appended, the system will automatically restart after the cleanup. If the -d option is present, only clean up the device specified

  • cp {name} {newfilename | hex address}
    Copy the named file to the new named file. If the destination begins with '0x', then it is assumed to be a hex address pointing to RAM

  • freemem [varname]
    Return (or store in 'varname') the amount of flash memory that is still available for use by TFS. If the -d option is specified, then list the memory that is available for that device only. Since there is per-file overhead, the value returned here is the amount of data space available if one additional file is stored in TFS. If additional files are to be stored, the user must take into account the TFS overhead

  • info {filename} {varname}
    Load the shell variable 'varname' with the information field stored with the file 'filename'

  • init
    Initialize the file system (remove all files and erase flash)

  • ld
    Load an executable COFF, ELF, or a.out file from flash to RAM

  • ldv
    Verify that the executable COFF, ELF, or a.out file in flash space matches what was copied to RAM space

  • log [{on|off} {message}]
    Turn on or off (or determine the current state of) the change-log facility. With no args, the current state is returned. If “on or off” is specified, then it must be followed by a message string that is appended to the change log as an information field to log the reason for the change in the logging state

  • ls [filter] [filter…]
    List the current set of files in the file system. The -m option can be used to throttle the output of the listing. There are four different levels of verbosity for ls:

      • lvl 0: No verbosity, short list of all active files
      • lvl 1: Display “hidden” files (beginning with '.') in short format
      • lvl 2: Display active files in long format
      • lvl 3: Display active and deleted files in long format

    Specifying the filter can limit the number of files listed:

      • *filter indicates a suffix match
      • filter* indicates a prefix match
      • filter indicates a full filename match

  • run {filename}
    Execute the specified file based on the creation attributes (see below)

  • rm {filter} [filter …]
    Remove the specified file(s). See the previous discussion on “ls” for details on the filter. Note that the file is not actually removed; it is simply marked as removed

  • size {filename} {varname}
    Load the shell variable 'varname' with the size of the file 'filename'

  • stat
    Display file system statistics

  • trace {lvl}
    Turn on one of three different levels of trace so that as an application uses the TFS API, output is generated (printed to console) indicating what is going on
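
The three filter forms accepted by “ls” and “rm” (suffix, prefix, and full match) can be illustrated with a few lines of C. This is only a sketch of the matching rule as described above; the function name tfs_filter_match() is hypothetical and is not part of the TFS command or API.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if 'name' matches 'filter', else 0.
 * "*abc" is a suffix match, "abc*" is a prefix match, and "abc" must
 * match the whole filename. Hypothetical helper, not a TFS function. */
int tfs_filter_match(const char *filter, const char *name)
{
    size_t flen, nlen;

    assert(filter != NULL && name != NULL);
    flen = strlen(filter);
    nlen = strlen(name);

    if (flen > 0 && filter[0] == '*') {
        /* Suffix match: compare the tail of the name with the text after '*'. */
        const char *suffix = filter + 1;
        size_t slen = flen - 1;
        return nlen >= slen && strcmp(name + (nlen - slen), suffix) == 0;
    }
    if (flen > 0 && filter[flen - 1] == '*') {
        /* Prefix match: compare the head of the name with the text before '*'. */
        return strncmp(name, filter, flen - 1) == 0;
    }
    /* No wildcard: the whole filename must match. */
    return strcmp(name, filter) == 0;
}
```

With this rule, a filter of “*.gif” selects every file whose name ends in “.gif”, while “boot*” selects every file whose name starts with “boot”.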

The TFS API

This section provides a brief introduction to the API that TFS presents to the application code with which it interfaces. Complete details are omitted at this point, but the reader should get a good feel for the capabilities provided.

  • int tfsadd(char *name, char *info, char *flags, uchar *src, int size)
    Add a file to the file storage space

  • int tfsinit(void)
    Initialize the flash used by TFS

  • int tfsclose(int tfd, char *info)
    Similar to standard close() on a TFS file

  • int tfsctrl(int rqst, long arg1, long arg2)
    Perform various control functions on TFS

  • int tfsgetline(int tfd, char *buffer, int max)
    Retrieve next line of an opened ASCII file in TFS

  • int tfsipmod(char *name, char *buffer, int offset, int size)
    Do an in-place-modification on a TFS file

  • struct tfshdr *tfsnext(struct tfshdr *tfp)
    Step through the list of file headers in TFS

  • int tfsopen(char *filename, long flagmode, char *buffer)
    Similar to standard open() on a TFS file

  • int tfsread(int tfd, char *buffer, int size)
    Similar to standard read() on a TFS file

  • int tfsrun(char *arglist[], int verbose)
    Run some executable file in TFS

  • int tfsseek(int tfd, int offset, int whence)
    Similar to standard seek() on a TFS file

  • int tfseof(int tfd)
    Return EOF state of an opened file

  • int tfsfstat(char *filename, struct tfshdr *hdr)
    Return the status of a file (1 if present, else -1) and populate a local structure with the current file header

  • struct tfshdr *tfsstat(char *filename)
    Return the status of a file (pointer to the file's header) in TFS

  • int tfsunlink(char *filename)
    Remove a file from TFS

  • int tfswrite(int tfd, char *buffer, int size)
    Similar to standard write() on a TFS file
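
To give a feel for how application code might use these calls, here is a minimal sketch that opens a file by name and reads it a line at a time. The tfsopen(), tfsgetline(), and tfsclose() signatures are the ones listed above; everything else is an assumption made so the sketch can be compiled on a host: the TFS_RDONLY value, the return conventions (negative on failure, 0 from tfsgetline() at end of file), the stand-in function bodies, and the sample file content.

```c
#include <assert.h>
#include <string.h>

/* Assumed flag value; check the real TFS header file before reusing it. */
#define TFS_RDONLY 0x1L

/* Host-side stand-ins with the signatures listed above. They serve one
 * fixed in-memory "file" so the sketch runs without the monitor. */
static const char mock_file[] = "IPADD=192.168.1.10\nBAUD=19200\n";
static int mock_pos = -1;                   /* -1 means "not open" */

int tfsopen(char *filename, long flagmode, char *buffer)
{
    (void)flagmode;
    (void)buffer;
    if (strcmp(filename, "monrc") != 0)
        return -1;                          /* assumed: negative on failure */
    mock_pos = 0;
    return 0;                               /* descriptor of the one mock file */
}

int tfsgetline(int tfd, char *buffer, int max)
{
    int n = 0;

    if (tfd != 0 || mock_pos < 0)
        return -1;
    /* Copy up to and including the newline, leaving room for the NUL. */
    while (mock_file[mock_pos] != '\0' && n < max - 1) {
        buffer[n++] = mock_file[mock_pos++];
        if (buffer[n - 1] == '\n')
            break;
    }
    buffer[n] = '\0';
    return n;                               /* assumed: 0 at end of file */
}

int tfsclose(int tfd, char *info)
{
    (void)info;
    if (tfd != 0 || mock_pos < 0)
        return -1;
    mock_pos = -1;
    return 0;
}

/* Application-side code: count the lines in a named file. */
int count_lines(char *filename)
{
    char line[80];
    int tfd, lines = 0;

    assert(filename != NULL);
    tfd = tfsopen(filename, TFS_RDONLY, NULL);
    if (tfd < 0)
        return -1;
    while (tfsgetline(tfd, line, (int)sizeof line) > 0)
        lines++;
    tfsclose(tfd, NULL);
    return lines;
}
```

On a real target the three stand-ins would be dropped and the application would link against the monitor's own implementations; only count_lines() represents application code.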

Implementation details

The following details assume a basic system: CPU, IO, RAM, and one flash device. Note that TFS can span across several different devices (where each device is considered a directory) and the devices need not be in contiguous memory space. For this discussion we will assume just one flash device.

The system flash memory has three defined sections: flash space used by the monitor code itself, the flash space used to store the files, and space dedicated to interruptible defragmentation. The flash that is used by TFS for file storage begins on a sector boundary and the “spare” sector is located immediately after the last sector in which files are stored. The spare sector must be at least as large as any other sector in the device and the sector prior to the spare is assumed to be of equal size.

TFS organizes the files within the flash in a contiguous one-way linked list. The initial portion of the file is a file header, which contains information about the file, pointer to the next file, and 32-bit CRCs of the header and data portion of the file. Maintaining unique CRC checks for header and data allows TFS to more accurately detect corruption. File size is limited only by the amount of flash allocated to TFS. There is no restriction with regard to sector boundaries.
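
The linked-list organization can be sketched on a host by treating a RAM buffer as erased flash. The header layout below is hypothetical: it is merely one arrangement of the fields mentioned above that totals 76 bytes, and the real struct tfshdr may differ. The flag bit, the use of an offset for the next pointer, and the rule that the header CRC is computed with its own field zeroed are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAMESIZE    23
#define INFOSIZE    23
#define FLAG_ACTIVE 0x1u      /* hypothetical bit: cleared when a file is deleted */
#define ERASED16    0xFFFFu   /* erased flash reads as all ones */

/* Hypothetical layout that totals 76 bytes; the real struct tfshdr may differ. */
struct hdr {
    uint16_t hdrsize;         /* size of this header */
    uint16_t hdrvrsn;         /* header version */
    uint32_t filsize;         /* bytes of data following the header */
    uint32_t flags;           /* attribute bits */
    uint32_t filcrc;          /* CRC-32 of the data */
    uint32_t hdrcrc;          /* CRC-32 of the header with this field as 0 */
    uint32_t modtime;         /* creation time, if the application supplies one */
    uint32_t next;            /* offset of the next header within TFS space */
    char     name[NAMESIZE + 1];
    char     info[INFOSIZE + 1];
};
typedef char hdr_is_76_bytes[(sizeof(struct hdr) == 76) ? 1 : -1];

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_calc(const uint8_t *p, uint32_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    int k;

    while (len--) {
        crc ^= *p++;
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Append a file at offset 'off'; returns the offset where the next file starts. */
uint32_t put_file(uint8_t *flash, uint32_t off, const char *name,
                  const uint8_t *data, uint32_t len)
{
    struct hdr h;

    memset(&h, 0, sizeof h);
    h.hdrsize = (uint16_t)sizeof h;
    h.filsize = len;
    h.flags   = FLAG_ACTIVE;
    h.filcrc  = crc32_calc(data, len);
    h.next    = off + (uint32_t)sizeof h + len;
    strncpy(h.name, name, NAMESIZE);
    h.hdrcrc  = crc32_calc((const uint8_t *)&h, (uint32_t)sizeof h);
    memcpy(flash + off, &h, sizeof h);
    memcpy(flash + off + sizeof h, data, len);
    return h.next;
}

/* Mark the file at 'off' as deleted by clearing its active bit in place. */
void mark_deleted(uint8_t *flash, uint32_t off)
{
    struct hdr h;

    memcpy(&h, flash + off, sizeof h);
    h.flags &= ~(uint32_t)FLAG_ACTIVE;
    memcpy(flash + off, &h, sizeof h);
}

/* Walk the list; return the number of active files whose header and data
 * CRCs both check out, or -1 at the first broken link or CRC mismatch. */
int count_active(const uint8_t *flash, uint32_t size)
{
    uint32_t off = 0, stored;
    struct hdr h;
    int count = 0;

    assert(flash != NULL);
    while (off + sizeof h <= size) {
        memcpy(&h, flash + off, sizeof h);
        if (h.hdrsize == ERASED16)
            break;                              /* erased flash: end of list */
        if (h.next < off + sizeof h || h.next > size ||
            h.filsize > h.next - off - sizeof h)
            return -1;                          /* broken link or size */
        if (h.flags & FLAG_ACTIVE) {
            stored = h.hdrcrc;
            h.hdrcrc = 0;
            if (crc32_calc((const uint8_t *)&h, (uint32_t)sizeof h) != stored ||
                crc32_calc(flash + off + sizeof h, h.filsize) != h.filcrc)
                return -1;                      /* corrupt header or data */
            count++;
        }
        off = h.next;
    }
    return count;
}
```

Because the two CRCs are kept apart, the walk can tell a damaged header (where the link to the next file can no longer be trusted) from damaged data (where only that one file is suspect), although this sketch simply reports -1 in both cases.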

When the system is first built, TFS must be initialized. This means that the flash space allocated to TFS must be erased. From that point on, as a file is created it is appended to the end of the linked list of files. If a file is deleted from the list, it is simply marked as deleted. At some point, after several files have been deleted, it becomes necessary to clean up the TFS flash space by running a defragmentation. This requires that a sector be dedicated to the defragmentation process and it also uses a small block of flash at the end of the TFS flash space for maintaining a non-volatile state that can be retrieved in the event of an interrupted defragmentation (power hit or reset).

Note that the spare sector cannot reside within the space used by TFS. It must be at the end because TFS assumes that all files are contiguous within the flash space it occupies. This, by the way, is a nice feature for extremely time-critical applications. A data file can be stored in flash, accessed by name to retrieve the starting point of the data, and from that point on, simple (and more efficient) memory accesses can be made to read data from the memory space. It can be assumed by the application that the file data is in contiguous memory space.

Figure 1

Flash space overhead required by TFS

Overlaying TFS onto a flash device is not free. TFS requires a certain amount of space; some of that space is fixed; other space is based on the number of files stored. Referring to Figure 1, four portions of overhead must be considered:

  • TFS header: this is a per-file overhead of 76 bytes. The value of 76 assumes that the sizes of the filename and info field are each set to 23 (+1 for NULL termination) characters

  • Post-defragmentation header table: this is a table of file headers (plus defrag information) that is created at the end of the TFS space. One entry is created in this table for each “active” (non-deleted) file that exists at the time of a defragmentation. Each entry in the table is 100 bytes (assuming a header size of 76 bytes)

  • Defragmentation state table: this is a table of states (bit fields) used by defragmentation. This overhead is equal to four times the number of sectors that are covered by TFS plus 16

  • The spare sector: this is a sector of the flash that is used during defragmentation to copy into. It must be at least as large as any other sector within TFS space

Use the following equation to compute the total overhead introduced by TFS:

overhead = (FTOT * ((HDRSIZE * 2) + 24)) + SPARESIZE + (SECTORCOUNT * 4) + 16

Where:

  • FTOT is the number of files to be stored

  • HDRSIZE is the size of a TFS file header (currently 76 bytes)

  • SPARESIZE is the size of the spare sector

  • SECTORCOUNT is the number of sectors allocated to TFS (not including spare)

Note that a file that is marked as deleted in TFS requires less overhead than a file that is “living.” This is because when a file is dead, there is no need to allocate a defragmentation header to that file. This means that removal of a file (even though it is not actually erasing the flash) still frees up some memory space for new file storage.
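
As a worked example, the overhead equation can be written as a small C function. The function name and the device geometry used in the note that follows (20 files, fifteen 64KB sectors, and a 64KB spare) are made up for illustration; only the formula and the 76-byte header size come from the discussion above.

```c
#include <assert.h>

#define TFS_HDRSIZE 76UL    /* per-file header size, as stated above */

/* Total TFS overhead in bytes for 'ftot' active files, a spare sector of
 * 'sparesize' bytes, and 'sectorcount' sectors (not including the spare). */
unsigned long tfs_overhead(unsigned long ftot, unsigned long sparesize,
                           unsigned long sectorcount)
{
    assert(sectorcount > 0);
    return (ftot * ((TFS_HDRSIZE * 2) + 24))    /* header + defrag table entry */
           + sparesize                          /* the spare sector */
           + (sectorcount * 4) + 16;            /* defragmentation state table */
}
```

For 20 files, tfs_overhead(20, 65536, 15) comes to (20 * 176) + 65536 + 60 + 16 = 69132 bytes. The spare sector accounts for 65536 of those bytes, so in a layout like this the per-file cost is small next to the fixed cost of the spare.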

Figure 2

Defragmentation

As files are created within TFS they require memory space; this is fairly obvious. As files are deleted within TFS they still require memory space; maybe this isn't as obvious. The reason for this is that the underlying flash technology does not allow any random range of memory within the device to just be arbitrarily erased; hence, when a file is deleted, it is simply marked that way in the header and left in the flash. At some point, these deleted files can take up a significant amount of space in the device, so we need some way to occasionally clean up (or defragment) the flash space to make the space that is taken up by deleted files available for use again. Figure 2 illustrates the defragmentation process. The heavily shaded areas represent the file headers and the lightly shaded areas represent the active files. Files F1, F2, and F5 have been deleted and will be removed from the flash space as a result of the defragmentation.

You can perform the cleanup or defragmentation process in several different ways. Like most embedded system issues, the chosen technique depends largely on the environment in which the system will exist, the hardware that is available on board, and the trade-off between complexity and capability. Certainly, more complex ways of handling this exist, but usually, that added complexity makes the implementation of the entire file system much more complicated. The goal here was to keep the file storage simple and linear, guaranteeing that any one file can be assumed by the application to exist in contiguous space in the flash. The following discussion covers two different approaches. The first is quite simple, but has its flaws (which, depending on the system, may be acceptable); the second is more robust and, with a few added options, can be quite flash friendly.

The easiest solution is to concatenate each of the “non-deleted” files into RAM space, erase all the flash space allocated for TFS file storage, then copy the entire concatenation back into the flash. On the positive side, this is simple and there is no need to pre-allocate one of the flash sectors as a spare. On the negative side, it assumes that a statically allocated block of RAM, as big as the flash space allocated to TFS, is available for this job. Worse, if the system is reset or takes a power hit after the erase has begun and before the copy-back has completed, the file system will be corrupted, because the only complete copy of the files is in volatile RAM.
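
The simple approach can be sketched in a few lines of C. To keep the sketch short it uses a deliberately simplified record format (one length byte, one flag byte, then the data) instead of the real 76-byte TFS header, and it simulates the flash with an ordinary buffer; the names and constants are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TFS_SIZE 256u       /* simulated TFS flash space, in bytes */
#define REC_LIVE 0x01u      /* hypothetical flag bit: cleared when deleted */
#define REC_END  0xFFu      /* erased byte where the next record would start */

/* Simplified record: [len][flags][len bytes of data], with len <= 254. */

static uint8_t ram_copy[TFS_SIZE];  /* the statically allocated RAM block */

/* Compact the live records; returns the offset of the first free byte. */
uint32_t defrag_via_ram(uint8_t *flash)
{
    uint32_t src = 0, dst = 0, reclen;

    assert(flash != NULL);

    /* 1. Concatenate every live record into RAM. */
    while (src + 2 <= TFS_SIZE && flash[src] != REC_END) {
        reclen = 2u + flash[src];
        if (src + reclen > TFS_SIZE)
            break;                          /* truncated record: stop */
        if (flash[src + 1] & REC_LIVE) {
            memcpy(ram_copy + dst, flash + src, reclen);
            dst += reclen;
        }
        src += reclen;
    }

    /* 2. Erase the whole TFS space (erased flash reads as 0xFF). */
    memset(flash, 0xFF, TFS_SIZE);

    /* 3. Copy the concatenation back. A reset between steps 2 and 3
     *    loses every file, which is the weakness described above. */
    memcpy(flash, ram_copy, dst);
    return dst;
}
```

On real hardware, steps 2 and 3 would be sector erases and flash programming operations, not memset() and memcpy(), and they would take long enough for the exposure to a reset to matter.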

A more practical approach is to provide a means of defragmentation that is robust enough to deal with the possibility of system reset during defragmentation. For this approach, no large block of RAM is needed, but one non-volatile block of memory (at least as large as the largest sector that TFS covers) must be pre-allocated to the defragmentation process. In TFS, this block of memory is the sector in the flash space immediately following the last sector used by TFS for file storage. Because this space is likely to be smaller than the amount of space needed to store copies of all “non-deleted” files, the defragmentation process gets quite a bit more complicated because it must now be done in chunks instead of all at once. The advantage of this is that the defragmentation process can be restarted at any time; hence, there is no corruption as a result of a power hit or reset. The one disadvantage of this technique is that the spare sector is hit hard. It is the spare sector that is likely to reach the technological limit (number of erase cycles) and begin to fail. The point at which this limit is reached depends on the device used, the number of sectors dedicated to TFS, and the rate at which files are deleted and recreated. One major improvement to this scheme (to enhance the overall lifetime of the flash) would be to use battery-backed RAM instead of the spare sector, but obviously this is a luxury that is unlikely to be approved for a budget-sensitive application.

Multiple storage devices

Some hardware designs may have more than one device that could be used for file storage. TFS supports this. A basic system has a boot monitor in the base of the flash; all remaining flash in that device is used by TFS and that's it. A more complicated system may contain battery-backed RAM, a boot flash device, a secondary storage flash device, and so on. TFS supports multiple devices that are not necessarily in contiguous address space. Each device appears to the user as a directory, so any file can be stored in any device (limited by the size of the device, of course), but a file cannot span across multiple non-contiguous devices. For each device, the same power-safe defragmentation method is used; hence, if battery-backed RAM was on-board, it could be used to reduce the problem of flash-life expectancy (see below) if there is a need to modify files at a high frequency.

To “steer” a file to a particular device, each device has a unique prefix that, when made part of the file name, tells TFS that the file is destined for that device. If the prefix is omitted from the filename, the default device is used for storage. Similarly, the file-system maintenance commands (tfs check, tfs clean, tfs freemem, and so on) can also be pointed to a particular device by specifying the device prefix.
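
A minimal sketch of prefix-based steering follows. The prefix strings, the device table, and the function name are all invented for illustration; the only behavior taken from the text is that a recognized prefix selects a device and a name without a prefix goes to the default device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device table; the prefixes are made up for illustration. */
static const char *const dev_prefix[] = {
    "//FLASH/",             /* index 0 doubles as the default device */
    "//BBRAM/"
};
#define NDEV (sizeof dev_prefix / sizeof dev_prefix[0])

/* Return the index of the device a filename is steered to and, through
 * 'rest', the part of the name that follows the prefix. */
int tfs_device_for(const char *filename, const char **rest)
{
    size_t i, plen;

    assert(filename != NULL && rest != NULL);
    for (i = 0; i < NDEV; i++) {
        plen = strlen(dev_prefix[i]);
        if (strncmp(filename, dev_prefix[i], plen) == 0) {
            *rest = filename + plen;
            return (int)i;
        }
    }
    *rest = filename;       /* no prefix: use the default device */
    return 0;
}
```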

Figure 3

File attributes

TFS supports file attributes. The attribute describes the file to TFS. It lets TFS know if the file is to be autobooted, whether it is executable or just data, the user level of the file, and so on. It is part of the file header created when the file is added to the flash. An attribute in the file header is simply a bit setting. At the command line, each attribute is assigned a letter that is used to display (in a non-verbose mode) or create the files in TFS. Table 1 is a list of all the file attributes, including a brief description of each.

Table 1 TFS file attributes

Auto-bootable files

At some point in the monitor startup, the system looks to TFS for auto-bootable files. Three different types of autobootable files exist: two types established by the attributes assigned to the file and one special case, the monitor run-control (monrc) file.

For the monrc file to automatically run, it must exist and be marked as executable. It will be run prior to any other autobootable file in TFS. Actually, it is run prior to the monitor having completed its own initialization. This is done so that the execution of monrc can be used to configure the monitor as it starts up. Refer to the source code for further details on the monrc file.

The remaining autobootable files are run after the monitor has completed initialization. They will be run in alphabetical order, so the order in which they are placed in TFS doesn't matter. The order in which they are listed by “tfs ls” (alphabetically) is the order in which they will be executed, as shown in Table 2.

Table 2 Output of “tfs ls” command

In Table 2, the order of autoboot execution would be monrc, boot_diag, and ias_app. All other files listed are simply data files used by the application. The two autobootable attributes supported are “B” and “b,” both of which will run at startup; but the “B” type will query the user at the console port, providing an opportunity to abort the autoboot of that file. Both scripts and binary executables can be configured with autoboot flags enabled.

What is an “in-place-modifiable” file?

Typically, when a file is modified, the original file is marked as deleted, and the new version of the file is appended to the end of the list of files currently stored in TFS. This can involve a relatively large amount of overhead if the modification to be made is trivial. As an alternative, a file can be created as an “in-place-modifiable” file, which means that the API provides a means by which a file can be modified without the typical deletion/re-creation step. Creating the file as in-place-modifiable and specifying the file to be of some size does this. The space is then allocated in TFS for this file, but the flash is all left in an erased state. This usually means that the bytes in the flash are all 0xff (usually, bits in flash can be cleared on a bit-by-bit basis, but to reset them, an entire sector must be erased). All subsequent writes to this file, then, are done directly to the currently allocated flash instead of to a new block of flash. Obviously this puts some responsibility back on the programmer, but it can potentially save quite a bit of overhead if necessary.
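
The responsibility this places on the programmer comes down to one rule: a write in place succeeds only if it never needs to turn a 0 bit back into a 1. The following sketch checks that rule before a write is attempted. It assumes the bit-clearing behavior described above; the function name is hypothetical and is not part of the TFS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if 'new_data' can be programmed over 'current' without an
 * erase, that is, if every bit that must end up as 1 is still 1 in the
 * flash. Return 0 if at least one bit would have to go from 0 to 1. */
int can_modify_in_place(const uint8_t *current, const uint8_t *new_data,
                        size_t len)
{
    size_t i;

    assert(current != NULL && new_data != NULL);
    for (i = 0; i < len; i++) {
        if ((current[i] & new_data[i]) != new_data[i])
            return 0;
    }
    return 1;
}
```

Freshly allocated space (all 0xff) accepts any data, and a byte holding 0x12 can later become 0x10, but it cannot become 0x13 without erasing the sector.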

User levels

The monitor supports the concept of user levels. At any given time, there is an active user level. TFS supports the ability to store a file at a particular user level, then limit access of that file based on the user level at the time of the access. The access can be limited to read-only or not even readable. This means that certain files (and executables) can be configured to be accessible only at certain user levels. Since each user level is attainable only via password, a system can be built at user level 3, then lowered to user level 2, 1, or 0 and provide a certain degree of protection from unauthorized access.

File decompression

A typical application file will be COFF, ELF, or a.out. Each of these executable file formats has multiple sections of text and data that must be transferred out of the flash space in which the file resides, and into some RAM space in which the application was built to run. These files are inflated by decompressing each of the sections within the file from flash, directly into the RAM space for which the section is destined. This “section-at-a-time” decompression eliminates the need to decompress the entire file into some block of memory, then load from that block into the actual memory space for which the image was built.

This mechanism requires some post-processing of the final application file built on a host using the ELF, COFF, and a.out support tools. Two different decompression methods are supported: Huffman and zlib. Each method has its own set of advantages and disadvantages. Using the Huffman decompressor requires just a small extension to the monitor footprint (3KB to 5KB) and no additional memory for malloc, but typical compression is only about 15% to 20%. Using the freely available zlib utilities requires a larger extension to the monitor footprint (35KB to 40KB) and additional heap space (see appnote on heap space expansion for file decompression) for the monitor's malloc, but compression with zlib can result in a 70% reduction in space needed for file storage.

Time of day

The basic model of the monitor is to run without the need of any interrupts from the host processor, so how is it that TFS can keep track of time of day? Actually, it doesn't. It depends on the application code to provide it with two functions that will support this: getLtime() and getAtime(). The first one, (long)getLtime(void), must return a long that is stored in the header of the TFS file when it is created. The second one, (char *)getAtime((long *)tval, (char *)buf, (int)buflen), can be used to simply return an ASCII string representing the current time (if tval is 0) or it can return an ASCII string representing the value stored in tval. The value in tval will typically be the value that was previously returned from getLtime(). With this interface, TFS really doesn't have a clue about time-of-day, but it uses the capabilities given to it by the application to make it look like it does.

Note that this is a feature used by TFS to populate an entry in the header of the file being written at the time. If the two above functions are not supplied to TFS, then the header entry is left blank, and the file simply has no recollection of its time of creation.
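
An application with access to a clock might supply the two functions along the following lines. This is a host-side sketch built on the standard C library; an embedded application would read its own real-time clock instead of calling time(). The prototypes follow the description above, with tval taken as a pointer as printed there. Treating a null pointer or a stored value of 0 as “now”, the choice of UTC, and the output format are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Return the current time as a long, for storage in the file header. */
long getLtime(void)
{
    return (long)time(NULL);
}

/* Format the time in *tval (or the current time if tval is null or
 * points to 0) into buf as an ASCII string. Returns buf, which is left
 * empty if the string does not fit. */
char *getAtime(long *tval, char *buf, int buflen)
{
    time_t t;
    struct tm *tm;

    assert(buf != NULL && buflen > 0);
    t = (tval == NULL || *tval == 0) ? time(NULL) : (time_t)*tval;
    tm = gmtime(&t);
    if (tm == NULL ||
        strftime(buf, (size_t)buflen, "%Y-%m-%d %H:%M:%S", tm) == 0)
        buf[0] = '\0';
    return buf;
}
```

Passing a stored value of 86400 (one day after the epoch used by time()) yields the string “1970-01-02 00:00:00” on systems that use the usual 1970 epoch.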

Change log

TFS supports the ability to keep track of all modifications made in the file system. This is done by logging an action (add, delete, or in-place-modify) and a filename to a file in TFS. By default this is not enabled, but can be enabled/disabled at any time using the tfs log command. The command will only run at the MAX user level and the file is created at the maximum user level. If the tfsctrl(TFS_TIMEFUNCS) command has been called to establish time-of-day functions in TFS, then the log will also reflect the time at which the change was made. Following is an example of the content of the TFS change-log (.tfschlog) file:

  • ON: startup

  • ADD: cardtilt.gif

  • ADD: construction.gif

  • ADD: lucentlogo.gif

  • ADD: .httpadmin

  • ADD: .dhcpsadmin

  • DEL: cardtilt.gif

  • ADD: cardtilt.gif

  • DEL: construction.gif

  • ADD: construction.gif

  • DEL: lucentlogo.gif

  • ADD: lucentlogo.gif

Flash life expectancy

It is important to be aware of the fact that the underlying technology (flash) has a limited number of erase cycles. Current flash devices typically support 100,000 to one million erases per sector. Applications will use TFS in different ways, so it is impossible to draw any conclusions here with regard to how long the flash will last in a
