Parlez-vous Francais?

February 26, 2001

MurphysLaw-February 26, 2001


Most of us don't speak very many human languages. But sometimes the software we write needs to.

Everyone has their favorite mistranslation. Mine comes from the world of lung ventilators, in which the phrase "pressure support" means applying more air pressure. One Spanish translator converted it to "pressure brassiere." While this translation might be uplifting, mistakes such as these must be caught before the product is released. If you are selling into a certain country, your business will most likely have access to at least one native speaker who is familiar enough with your product family to pass judgement on translations. However, if the product features a graphical or text-based interface, the number of possible strings could easily run into the hundreds. The local expert may not have the time or tools to perform the complete translation.

Frequently, a professional translation company performs the translation and the local expert approves it, based on his knowledge of the product and the local market. The professional translators are often the same people who will translate your user manuals. This is as it should be, because consistency of terminology between the product and the manual is important. You might think such consistency is easily achievable, but when you start managing a dozen languages and a few versions of the software, the headaches accumulate quickly.

Often, the local expert reviews the final product to ensure that the professional translator used appropriate terminology in the product-specific terms. So if we want to translate into 10 languages, we have 20 people to organize: 10 translators and 10 local experts. By definition it is unlikely that any of these people will ever meet, so someone has the task of trying to keep track of them all. As the engineer responsible for putting the final version of the translations into the product, I have performed this coordination task more than once, and it involves numerous personnel and technical issues.

String management

Before proceeding, let's define a couple of terms. The code file is a source-code file (often a set of C string declarations) containing the strings in a particular language and is usually not suitable for distribution to the translators. The translation file is a file that contains the English strings and a column for the strings in the target language. (For the purposes of this column, I assume that the original version is being developed in English.) When the translation file is created, the other-language column is empty. The objective is to have a translator fill that column; the new strings can then be inserted into your source code. I usually use a spreadsheet for the translation file, which gives me columns for English, the foreign language, comments, and sometimes flags indicating special properties such as fonts. Spreadsheets also make it easy to calculate the length of strings, which is often an important consideration.
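As a rough illustration of the relationship between the two files, the following sketch pulls strings out of a code file and writes an empty translation file as CSV, which any spreadsheet can open. The declaration format, the identifiers, and the column names are all assumptions made for the example, not a description of any particular product.

```python
import csv
import io
import re

# Hypothetical code-file format: one C string declaration per line.
CODE_FILE = '''\
const char *STR_ALARM_HIGH = "High pressure";
const char *STR_MENU_SETUP = "Setup";
'''

# Matches: const char *NAME = "text";  (escaped characters allowed in the text)
DECLARATION = re.compile(r'const char \*(\w+) = "((?:[^"\\]|\\.)*)";')

COLUMNS = ["id", "english", "translation", "english_length", "comments"]


def make_translation_rows(code_text):
    """Return one row per string, with the target-language column left empty."""
    rows = []
    for name, english in DECLARATION.findall(code_text):
        rows.append({
            "id": name,
            "english": english,
            "translation": "",            # filled in by the translator
            "english_length": len(english),
            "comments": "",
        })
    return rows


def write_translation_file(rows, out):
    """Write the rows as CSV to any file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


rows = make_translation_rows(CODE_FILE)
buffer = io.StringIO()
write_translation_file(rows, buffer)
print(buffer.getvalue())
```

Because the file is generated rather than typed, it can be regenerated whenever the English strings change.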

Some graphical toolkits include features to facilitate translation, and if you are using such a product, many of the features discussed here will be looked after for you. More likely, you will have to craft your own solution. For one possible solution to the string management problem see Nigel Jones' article.

My first experience in translation involved using an HD44780-based LCD.[1] On another project, we used a graphics display, which allowed more flexibility, because we could control the font as well. But the graphics display created more challenges, because the amount of space available for each string was no longer constant. We also implemented a Kanji character set for the Japanese market, which meant that double-byte characters had to be used.

You need a clear process by which to move pieces of text from code to a translator and back again, at which point the translations need to be re-inserted into the code. If your process requires the engineer to type in any text from a foreign language, it is likely to be full of errors. If any portion of the conversion from source file to translation file is done manually, you will have to consider what will happen if you have to do it all again. Ah, but under what circumstance would you have to do it again? Obviously, you would need to if the product were updated with new strings. But other reasons exist. The first pass of the translation cycle is with professional translators. The second pass is with the local expert. The ideal way to present the product to the local expert is to use the initial translations to produce a prototype product so that he may view the strings in their proper contexts. When the local expert decides he wants a change he can edit the translation file, but that edit should be based on observing the product, not just on reading the translation file. Sometimes, the text gets shuffled between the professional translator and the local expert a number of times before it is finalized.

Finding a good translation service is important. Such companies are usually experienced in translating user manuals and PC-based applications or web pages. The notion of translating strings that are going to appear on the type of limited user interface used in many embedded products is novel to them. Restrictions on the length of strings, or restrictions on the character set, have to be explained. Things get even more difficult if the limit on the length of the string varies. If I have a 2 x 20 character display (the sort typically driven by an HD44780) then I may be able to tell the translator to keep all strings at 20 characters or fewer. If I switch to a small-format graphics screen and I use a proportional font, the number of characters I can fit will depend on the characters themselves: "ill" will be narrower than "wow," just as it is in this magazine.

Explaining the context in which certain strings will be used also helps, but sometimes this requires a product demonstration, and providing one for someone on the other side of the world is rarely a trivial matter.

In addition, many translation companies discourage communication between the client and the translator. They fear that protracted e-mails or phone calls between the translator and the client will lead to the job taking up more time than was budgeted. This means that any special instructions for your job need to be in writing so that the translation company account manager can pass them on to the translator. Any queries from the translator back to you will also come through the translation company representative. More than likely they will not even reveal the name of the person doing the translation! Since the process and the product are likely to be complex, I usually apply the rule that any translator that does not send back any questions about the job probably made a wild guess whenever he was confused, and the local expert is going to have to pick up the pieces during the final review.

Some of the preceding problems can be alleviated if your translation process has the following properties:

  • Each language is managed independently.
  • All source and translation files contain version numbers.
  • Each source and translation file contains the original English string.
  • The conversion from source to translation file and back again handles any character encoding issues your product might have.
  • Some manual or automatic means is available to step through enough states of the product that every string can be observed in its appropriate context.
  • If possible, a PC-based prototype of the product is used to review translations.

Let's look at each of these properties in turn and discuss why they are important.

Language independence

We want to manage each translation independently because the different people involved will not be available at the same time. Some will receive their piece of work and return it within hours; others will take weeks. It is likely that the translators and reviewers report into a different part of the company tree (or another company's tree) and it will be hard to pressure them to give this translation work priority over their other duties. Another reason the timelines will vary widely is that you may not choose to do all of your translations at once. Some languages represent bigger slices of the market and smaller segments may not follow until several months later.

Version numbers

Your translation process should use version numbers. If an update to the software leads to a few English strings being changed, you will have to update the translation file. The translation file might be mailed back to you a few weeks later, by which time a few other strings might have been updated. Now you might be confused as to which version you last sent to the translator. Or worse, the translator may have updated and sent an older version of the translation file by mistake. It is unlikely that the various translators use the same source code control system as the development team, so make sure that the version number appears in the source file and the translation file, and make sure that number is updated for each new set of changes.
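One way to make the version number do some work is to archive the English strings exactly as they were sent, keyed by version, and compare them with the current strings when a translation file comes back. The sketch below assumes such an archive exists; the version labels, identifiers, and strings are invented for the example.

```python
# Hypothetical archive: the English strings exactly as sent, keyed by version.
SENT = {
    "1.2": {"STR_ALARM_HIGH": "High pressure", "STR_MENU_SETUP": "Setup"},
}

# Hypothetical current state of the English strings (version 1.3).
CURRENT = {
    "STR_ALARM_HIGH": "Pressure high",
    "STR_MENU_SETUP": "Setup",
    "STR_MENU_LANG": "Language",
}


def strings_to_resend(returned_version, sent=SENT, current=CURRENT):
    """List the strings whose English changed since the returned file was sent."""
    if returned_version not in sent:
        raise ValueError(f"no record of sending version {returned_version!r}")
    old = sent[returned_version]
    return {
        "changed": sorted(k for k in current if k in old and old[k] != current[k]),
        "added": sorted(k for k in current if k not in old),
        "removed": sorted(k for k in old if k not in current),
    }


report = strings_to_resend("1.2")
print(report)
```

A returned file whose version is not in the archive is rejected outright, which catches the case where the translator worked on the wrong copy.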

Ideally you would not be working on translations while the product is in such a state of flux, but if you wait until the product is completely frozen before doing any translation, you may miss a window for certain markets. Also remember that it is often later in the life of the product that many people within the organization see it for the first time. A senior marketing person or an influential customer may see the prototype and say, "I love it, but can you just change that string?" They are unlikely to ask you to change the control constants used in your PID loop, but everyone will have an opinion on the text that appears on the front panel.

English everywhere

The original string should appear in the translation file and also in the source file. I have had translators take a translation file and replace the English, probably because this is exactly what would have been the correct behavior if I had given them a web page for translation. For this reason, I often lock the English column so that it can't be edited. When you are troubleshooting the translation file, it will be a lot easier if the English is right in front of you rather than having to check the equivalent position in a separate file, especially if the English file is now at a newer revision than the translation file.

When the translation file is converted to the code file for a specific language, I believe that the English should follow it. I usually insert the English string as a comment at the end of the line where the foreign language string is defined. If you have to search for a string within the final source, it is usually easier to search for the English string, especially if you do not know how to type some of the characters in the foreign language.
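A conversion script might emit each declaration along the following lines. This is only a sketch: the declaration style and the identifier are assumptions, and the German word is purely illustrative.

```python
def c_string_literal(text):
    """Quote text as a C string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def c_comment(text):
    """Wrap text in a C comment; a "*/" in the text would end the comment early."""
    return "/* " + text.replace("*/", "* /") + " */"


def code_line(name, translated, english):
    """One declaration per string, with the English original as a trailing comment."""
    return f"const char *{name} = {c_string_literal(translated)};  {c_comment(english)}"


line = code_line("STR_MENU_SETUP", "Einstellungen", "Setup")
print(line)  # const char *STR_MENU_SETUP = "Einstellungen";  /* Setup */
```

With the English on the same line, a plain text search for "Setup" lands on the right declaration in every language's code file.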

Character encoding issues

This brings us to the next important issue: character sets. You have to consider the character set in three places: in the application that edits the translation file (typically a spreadsheet), in the application that edits the source file (typically your code editor), and on the product itself. For most European languages, the Latin alphabet covers the language, but unfortunately, no single standard eight-bit mapping contains every Latin character those languages use. The Latin mappings defined by ISO standards each cover some subset of the total set of European languages.[2] Ideally, you will be able to find one variation that covers the combination of languages with which you are concerned. Otherwise, you will have to change character encoding for some languages.

It is worth pointing out that font and character encoding differ in a significant way. The character encoding dictates which numerical values correspond to characters of a particular alphabet, while the font decides what those characters will look like. The implementation of each font must use some character mapping to order its characters. In English, ASCII defines the character mapping normally used, but life gets more difficult when other languages are involved. While your translators are working on a spreadsheet file, changing the font may cause the letters to be represented differently if the new font uses a different encoding. Unfortunately, I have not found a way to determine the encoding used for a TrueType font, apart from comparing the full set to a table of standard encodings.[2]

If the spreadsheet and code editor use the same fonts, characters will appear consistently in both. Ideally, your embedded program will use the same encoding and no characters will need to be converted. If you have control of the source of the font you are using for a graphics-based display, you could change the font order to ensure that it conforms to the font used by your editor. In other cases you may have less flexibility. On an HD44780-based controller, the character encoding will be fixed. Some manufacturers give you a small number of choices, and may agree to implement a font to your specification if the volume is high enough. If you are not dealing with a volume product, you have to be a bit more resourceful. The HD44780 allows you to define your own glyphs for the first eight character codes. This may corner you into a situation in which the value 3 represents "ö," while in the Latin 1 character set that symbol should be the value 0xF6. So in your code you will want to represent the word "iöu" as "i\x03u". It is the job of the program that converts the translation file into the source file to map the character in this way. It is this sort of mapping change that makes it difficult to come up with a single tool to meet the needs of all embedded systems, or to coerce a tool intended for PC applications into solving our embedded development problems.
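The mapping step in such a conversion program could be sketched as follows. The table of custom characters is hypothetical, and the sketch assumes that printable ASCII passes through to the display unchanged, which should be checked against the character table of the controller actually used. It emits three-digit octal escapes rather than \x03, because a C hex escape absorbs any hex digits that follow it: "\x03a" would be read as the single character 0x3A, whereas an octal escape stops after three digits.

```python
# Hypothetical assignment of user-defined display codes to accented letters.
CUSTOM_CHARS = {"ö": 0x03, "ü": 0x04}


def to_display_literal(text, custom=CUSTOM_CHARS):
    """Return text as a C string literal using the display's character codes."""
    out = []
    for ch in text:
        if ch in custom:
            out.append(f"\\{custom[ch]:03o}")   # e.g. code 3 becomes \003
        elif ch in '"\\':
            out.append("\\" + ch)               # escape quote and backslash
        elif 0x20 <= ord(ch) < 0x7F:
            out.append(ch)                      # assumed identical on the display
        else:
            # Fail loudly rather than ship a character the display cannot show.
            raise ValueError(f"no display code defined for {ch!r}")
    return '"' + "".join(out) + '"'


print(to_display_literal("iöu"))  # "i\003u"
```

Raising an error for an unmapped character means the problem surfaces when the script runs, not when a reviewer notices a wrong glyph on the display.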

Once we move to a language like Russian, which uses the Cyrillic alphabet, we represent many characters as hex values because none of them are likely to be readable in our source-code editor. This is where the advantage of the English strings in the comments becomes even more apparent. The source code for a given string will be a series of numbers which is not recognizable when compared to the translation file, because the translation file contains the readable text, assuming you read Russian.
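To show what such a line of source might look like, here is a sketch that assumes the product's font follows ISO 8859-5, one of the standard Cyrillic mappings. That assumption will not hold for every display, and the identifier is invented. The escapes are written in octal; hex escapes would serve equally well as long as the character after the escape is not itself a hex digit.

```python
def encode_for_source(name, translated, english, codec="iso8859_5"):
    """Emit a C declaration whose non-ASCII bytes are written as octal escapes."""
    raw = translated.encode(codec)   # raises UnicodeEncodeError if unmappable
    parts = []
    for byte in raw:
        ch = chr(byte)
        if 0x20 <= byte < 0x7F and ch not in '"\\':
            parts.append(ch)                  # printable ASCII stays readable
        else:
            parts.append(f"\\{byte:03o}")     # everything else becomes \ooo
    body = "".join(parts)
    return f'const char *{name} = "{body}";  /* {english} */'


line = encode_for_source("STR_YES", "ДА", "Yes")
print(line)  # const char *STR_YES = "\264\260";  /* Yes */
```

The trailing comment is the only part of the line that a reader without Russian can match back to the translation file.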

It is vital that all character mapping issues are hidden from the translators. If you expect the translator to insert even a single character as a hexadecimal value then you will get into trouble. It is not just characters of the alphabet you have to consider here. I am typing this article in a font that does not have the Euro currency symbol, but many products currently under development will have to represent it on their user interface.

On the HD44780, I have managed to implement a Russian translation by restricting it to all capitals, then defining the eight most important characters in the user-definable area. While this provides a workable solution, the final mapping bears no direct relationship to any standard encoding of Cyrillic. Again, the mapping has to be handled by the script that converts the translation file to a source file.

Prototyping and reviewing

Once the translations have been performed and inserted into the source code, it is important to present them to the reviewer in a form that is close to their final context. If the entire user interface can be implemented as a PC simulation, this step gets a lot easier. You may not be able to ship prototype hardware to 10 different countries for review, but a .EXE file can be e-mailed at no cost. I will be discussing PC prototypes at length in future columns. It is not a trivial task, but translation is one of the areas where such a prototype pays for itself many times over.

Before you send the product to the local expert for review, you should review it yourself for any obvious mistakes. You may not understand the language but many mistakes can still be spotted. The alignment between certain strings on the display may be corrupted. If there is a problem with the font, the appearance of certain characters on the display will not match the text in the translation file. On a graphics display, strings may overlap each other, or they may overlap graphical elements on the display. String length is often the biggest issue that needs to be checked.

In the case of a text controller such as the HD44780, it is possible to tell the reviewers that all strings have a fixed maximum length. Calculating those lengths on a graphical display may be difficult. Proportional fonts, positions of line breaks, and the lengths of nearby text may all affect the ideal length. In that situation I do not give the translator a guideline length at all. I wait until they have done a first pass, and then, using the PC prototype, identify the strings that do not fit. I then request that those strings be retranslated at the required length. I find this is less work than trying to calculate the ideal length for every single string in advance.
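If the prototype has access to the font's glyph widths, the check for strings that do not fit can be automated. The sketch below uses an invented width table, a fixed one-pixel gap between glyphs, and a per-string pixel budget; a real font and layout would supply their own figures, and line breaks are ignored here.

```python
# Hypothetical glyph widths in pixels for a small proportional font.
GLYPH_WIDTH = {"i": 2, "l": 2, "w": 7, "o": 5, " ": 3}
DEFAULT_WIDTH = 5   # assumed width for any glyph not listed above
SPACING = 1         # assumed gap in pixels between adjacent glyphs


def pixel_width(text):
    """Width of a single-line string in pixels."""
    if not text:
        return 0
    glyphs = sum(GLYPH_WIDTH.get(ch, DEFAULT_WIDTH) for ch in text)
    return glyphs + SPACING * (len(text) - 1)


def strings_that_do_not_fit(translations, space):
    """Return (id, width, available) for each translation wider than its space."""
    too_wide = []
    for string_id, text in translations.items():
        width = pixel_width(text)
        if width > space[string_id]:
            too_wide.append((string_id, width, space[string_id]))
    return too_wide


print(pixel_width("ill"), pixel_width("wow"))  # 8 21
print(strings_that_do_not_fit({"A": "ill", "B": "wow"}, {"A": 20, "B": 20}))
```

The resulting list is what goes back to the translator with a request for shorter alternatives.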

For the engineer to review lengths and for the local expert to review the strings in context, a procedure that allows the reviewer to view every string should be in place. One way is to create a checklist that tells the reviewer which button to press to get to a certain state. At each state the reviewer will see some new strings. On a PC prototype, you can make it easier by programming the sequence into a test version of the program. In this test version, every time the user presses Return they are presented with another screen. If all screens are programmed into the sequence, the reviewer can view all strings without having to fully understand the sequence of keys to get from one state to another.
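The review sequence can be little more than a list of screens and a loop. In this sketch the screen names and strings are invented, and the pause between screens is passed in as a function so that a test build can wait for Return while an automated run does not. It also reports any strings in the translation file that no screen in the sequence displays.

```python
# Hypothetical review sequence: each entry is a screen title and its strings.
SCREENS = [
    ("Main menu", ["Setup", "Alarms"]),
    ("Alarm limits", ["High pressure", "Low pressure"]),
]


def review(screens, show=print, wait=lambda: None):
    """Show each screen in turn and return every string that was displayed."""
    seen = []
    for title, strings in screens:
        show(f"--- {title} ---")
        for text in strings:
            show(text)
            seen.append(text)
        wait()   # pass wait=input to pause until the reviewer presses Return
    return seen


def never_shown(all_strings, seen):
    """Strings in the translation file that no screen in the sequence displays."""
    return sorted(set(all_strings) - set(seen))


shown = review(SCREENS)
missing = never_shown(["Setup", "Alarms", "High pressure", "Low pressure", "Battery low"], shown)
print("Not covered by the sequence:", missing)
```

Anything reported as not covered needs either another screen in the sequence or an entry on the manual checklist.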

Ideally, the reviewer should tick off each string on a copy of the translation file as they see it on the product. This marked list is your evidence that the reviewer actually read each string. It is far too easy for a busy regional sales manager to glance at the prototype and nod his approval. Any strings that do not get careful attention at this point will be expensive to fix later. It is also worth mentioning that this is one of the few areas in which product quality depends on someone outside of the product development team. No amount of in-house testing will discover whether the product contains incorrect statements in some languages.

And finally

The translation files remain useful after the product has been translated. Alphabetical listings are helpful when troubleshooting. From the translation file, produce a file where the English strings are in alphabetical order, and a separate listing that is alphabetical by foreign language string. These prove useful when service technicians or factory staff have to deal with a machine that is reporting an error message in a foreign language.
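Producing the two listings takes only a few lines once the translation file has been read into memory. The rows below are invented, with French used purely as an illustration. Note that this sketch sorts by case-folded code point, which is not true dictionary order for every language; a proper collation would need locale support.

```python
# Hypothetical rows as read from a translation file.
ROWS = [
    {"english": "Setup", "translation": "Réglage"},
    {"english": "Alarm", "translation": "Alarme"},
    {"english": "Language", "translation": "Langue"},
]


def listing(rows, key):
    """Return the rows sorted alphabetically by the given column."""
    return sorted(rows, key=lambda row: row[key].casefold())


for row in listing(ROWS, "english"):
    print(f'{row["english"]:<12}{row["translation"]}')

print()

for row in listing(ROWS, "translation"):
    print(f'{row["translation"]:<12}{row["english"]}')
```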

This column has asked more questions than it has answered, so next month we will discuss the translation-to-source conversion scripts in more detail. We will also discuss the thorny issue of double-byte character sets, which are necessary to implement Japanese and other Asian languages. The most important lesson from this month's piece is that the people issues are at least as tricky as the technical issues, and you have to hide enough of the engineering details that the translators can get on with their job without having to be explicitly aware of the character mappings that you have chosen to implement.

Niall Murphy has been writing software for user interfaces and medical systems for 10 years. He is the author of Front Panel: Designing Software for Embedded User Interfaces. Murphy's training and consulting business is based in Galway, Ireland. He welcomes feedback and can be reached at nmurphy@panelsoft.com. Reader feedback to this column can be found at www.panelsoft.com/murphyslaw.

References

  1. The HD44780 FAQ is available online at www.teleport.com/~raybutts/lcd-faq.htm.

  2. The ISO 8859 Alphabet Soup, by Roman Czyborra is available online at http://czyborra.com/charsets/iso8859.html.
