Certainly the high-level intent of an application can be at least partially deduced by simply using the product. The user interface, system inputs and outputs, and general product behavior can be readily observed by any end user. Additionally, the architectures and instruction sets for commercial microprocessors are publicly available. Therefore, we can assume that someone who is familiar with your product and the microprocessor it uses already knows a good deal about your application, whether or not they have access to the software. This person is familiar with the processor's addressing modes, on-chip peripherals, address space, and so forth. So even though real-time interactions and algorithms are "hidden" in the object code, they can be recovered to some degree (maybe to a large degree) by decompiling the executable object code, which is often stored in a read-only memory (ROM) chip. This information could then be used to your competitive disadvantage.
What follows is a survey of possible techniques to discourage the would-be reverse engineer. Almost all of these techniques will only slow down the decompiling effort; they will not keep the reverse engineer from succeeding, given enough time, resources, and desire.
Methods of protection
Use a copyright notice in the binary file.
This is certainly not a technical protection scheme, only a legal one. But including an ASCII copyright notice in the binary image at least notifies anyone reading the image that the software is legally protected and should not be tampered with. A copyright notice will not hinder an ethically challenged competitor or hacker, but it only takes up a handful of bytes in ROM, so little reason exists not to include it.
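As a minimal sketch in C, the notice can be a constant string that ends up in the ROM image. The notice text, the array name copyright_notice, and the accessor get_copyright_notice() are all hypothetical. Whether an otherwise unreferenced constant survives optimization and linking depends on the toolchain, so the accessor is there only to give the program a reference to the string.

```c
#include <stddef.h>

/* Hypothetical notice text; substitute your own owner name and year. */
static const char copyright_notice[] =
    "Copyright (c) 2024 Example Corp. All rights reserved.";

/*
 * Returning a pointer to the string gives the program a reference to it,
 * which makes it less likely that the toolchain discards it as unused data.
 * This is not a guarantee; check the map file or the image to confirm that
 * the notice is actually present.
 */
const char *get_copyright_notice(void)
{
    return copyright_notice;
}
```

Calling get_copyright_notice() from a version or "about" display is one way to make sure the string stays referenced.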
Use a checksum value over the whole application.
This method allows you to detect modifications to the executable image. You'll need a boot program that stores the correct checksum for your binary image. The boot program itself should reside in nonvolatile, protected memory, preferably inside the microprocessor itself if it has on-board ROM of some type.
The boot program must also be able to calculate a checksum from the ROM image and compare the result to the stored value. If the two values differ, the boot program inhibits execution of the application. This method offers no protection against reading and decompiling the code; it only protects against putting an altered ROM in the product and having the product still function.
Note that neither the boot program nor the stored checksum should be part of the ROM image that you're trying to protect. If the boot program can be located and decompiled, then this method offers no protection at all. That's why locating the boot program within the microprocessor itself is a good idea: the program is much more difficult to retrieve that way. Many of today's microcontrollers offer some on-board ROM or flash that can be used for this purpose. The checksum can be stored in EEPROM or flash.
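The boot-time check described above might look something like the following sketch. It assumes a simple 16-bit additive checksum, which is only one possible choice (a CRC would catch more kinds of modification). The function names rom_checksum() and rom_image_is_valid() are hypothetical, and how the image address, its length, and the stored value are obtained is specific to your hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum every byte of the image, keeping only the low 16 bits of the total. */
uint16_t rom_checksum(const uint8_t *image, size_t length)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < length; i++) {
        sum = (uint16_t)(sum + image[i]);
    }
    return sum;
}

/*
 * Returns 1 if the computed checksum matches the stored value, 0 otherwise.
 * The stored value is assumed to live outside the image being checked,
 * for example in on-chip EEPROM or flash.
 */
int rom_image_is_valid(const uint8_t *image, size_t length, uint16_t stored)
{
    return rom_checksum(image, length) == stored;
}
```

A boot program would call rom_image_is_valid() before jumping to the application entry point, and halt or signal a fault if it returns 0.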
Encrypt string data.
If your product has any sort of human user interface, the software likely includes text strings. Compilers typically store the string data in ASCII format, and by default place this string data into constant data sections in the binary image. Many binary file editors will display the ASCII interpretation of the raw hex data, so visually scanning the binary file and finding the address of each string is a simple matter. Once you know the addresses, you can search the executable code to see where they're used and draw some conclusions about what each section of code does.
One protection technique, then, is to encode text strings so they don't show up as readable ASCII in the binary image. (The encoding takes place outside of your application software, so that only the encoded representations appear in the binary image.) On a character-by-character basis, each character in the string can be mathematically manipulated in some reversible way (for example, XORing with a constant eight-bit "key") before being stored. The character must then be decoded before being sent to an output device. Alternatively, the encoding can operate on groups of characters.
In either case, you'll have to modify your language's output facilities, or perhaps write your own from scratch. In C, for instance, printf() has no inherent decryption capability, so you might choose to write your own version of printf() that knows how to turn an encoded string into legitimate ASCII data at run time.
If you are going to use string encryption, it's good practice to store the critical information required to decode strings in the microprocessor's internal memory. For example, if you XOR each character with an eight-bit key, the key should be stored within the microprocessor, so that it does not appear anywhere in the binary image.
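A character-by-character scheme could be sketched as follows. XOR is assumed here as the encoding operation because applying it twice with the same key restores the original byte; this is an illustrative choice, and it is obfuscation rather than strong encryption. The names xor_buffer() and decode_string() are hypothetical. The string length is passed explicitly because a character equal to the key encodes to a zero byte, so the encoded data cannot rely on NUL termination.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * XOR every byte of buf with key, in place. Run by an offline build tool
 * to encode strings; running it again with the same key decodes them.
 */
void xor_buffer(uint8_t *buf, size_t length, uint8_t key)
{
    for (size_t i = 0; i < length; i++) {
        buf[i] ^= key;
    }
}

/*
 * Decode an encoded string of known length into out and NUL-terminate it.
 * out must have room for length + 1 bytes.
 */
void decode_string(const uint8_t *encoded, size_t length, uint8_t key,
                   char *out)
{
    for (size_t i = 0; i < length; i++) {
        out[i] = (char)(encoded[i] ^ key);
    }
    out[length] = '\0';
}
```

A custom output routine would call decode_string() into a small RAM buffer just before sending the text to the display or serial port, reading the key from on-chip memory rather than from a constant in the ROM image.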
Write your own operating system.
If your application includes a commercial OS, certain "signatures" appear in the binary image.
The OS must include a start-up sequence, which is probably well documented by the OS vendor. Other run-time facilities will show up in the executable, and once these are found their interaction with your application code can be more quickly deciphered. And if the OS code is well designed and well coded, a decompiler program might have little trouble reconstructing the source code for OS functions.
If you use your own OS, a reverse engineer might need more time to figure out how your run-time environment works. However, writing your own run-time environment may not always be an option. Project schedules or legacy designs may dictate that you use a commercial OS. A home-grown kernel may only be a viable option for small, lower-memory projects that don't require accelerated design cycles.
Scramble address lines through extra logic.
Another technique to hamper decompilation is to scramble address and/or data lines on your product's PC board(s). Instead of routing the address and data lines directly from the microprocessor to the ROM, you can insert some extra logic in between. An EPLD or FPGA can be a good source for such logic. So the address put out by the microprocessor, for example, will not be the physical ROM address to which an access is made. The ROM image will appear nonstandard, since the addresses have been essentially encoded. This fact alone might tip off a reverse engineer that something has been done to protect the code.
This method has other drawbacks, though. If your product can be instrumented with a logic analyzer, a reverse engineer can observe the "real" data being read or written by the microprocessor. So while a decompiler may have a hard time with the ROM image, instruction sequences can still be determined at run time. In addition, you have the overhead of designing around the address/data line scrambling logic yourself, which may require some type of special utility to create your ROM-able image, since linkers generally won't support this type of operation.
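The special utility mentioned above could be a small post-link step along these lines. The particular scrambling used here (swap address bits 0 and 3, then invert bit 1) is invented purely for illustration, as are the names scramble_address() and build_rom_image(); a real tool would have to mirror exactly what the board's logic does between the processor and the ROM.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical address mapping: swap address bits 0 and 3, then invert
 * bit 1. Any mapping used must be one-to-one so no two logical addresses
 * land on the same physical ROM location.
 */
uint32_t scramble_address(uint32_t addr)
{
    uint32_t bit0 = (addr >> 0) & 1u;
    uint32_t bit3 = (addr >> 3) & 1u;

    addr &= ~((1u << 0) | (1u << 3));    /* clear bits 0 and 3 */
    addr |= (bit0 << 3) | (bit3 << 0);   /* write them back swapped */
    return addr ^ (1u << 1);             /* invert bit 1 */
}

/*
 * Build the physical ROM image from the linker's logical image.
 * size must be a multiple of 16, because this mapping rearranges
 * addresses within each aligned 16-byte block.
 */
void build_rom_image(const uint8_t *logical, uint8_t *physical, size_t size)
{
    for (size_t a = 0; a < size; a++) {
        physical[scramble_address((uint32_t)a)] = logical[a];
    }
}
```

The byte the processor expects at logical address a is stored at the physical location that the board logic will actually select when the processor drives a onto the bus.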
Replace library functions.
This method follows the same philosophy as using your own kernel. Since programming languages have standard libraries of functions, you could choose to not use these functions and instead write your own. This would make searching for common functions, such as printf(), more difficult. But this adds a significant overhead to your design cycle, since you have to reinvent the wheel.
Write lousy code.
This option may seem ridiculous, but in fact might slow down the potential reverse engineer. The more convoluted a software design is, the more difficult it is to figure out, even at the source-code level (especially if it is also poorly documented). I'm certain there are a myriad of ways that code can be made to be "lousy," and I won't begin to offer suggestions for writing lousy code here.
I'm not really suggesting that this is a viable protection scheme, since it flies in the face of every software methodology written.