Video technology is playing an increasing role in a number of real-time systems. While many early systems were tied in to standards developed for the broadcast industry, video is now moving out on its own. One of the areas where this is most evident is in machine vision, where systems today use frame sizes and frame rates totally distinct from common broadcast industry standards.
The video camera represents the interface electronics between the optical imaging system and the viewing or image analysis system, which, in machine vision applications, is the computer.
In industrial applications, there is a wide variety of problems to be solved, depending on the type and size of object under study, whether the object is moving or stationary, the type and size of measurement to be made or defect to be detected, and so on. Hence the vision industry is populated with a wide variety of lighting, optics, cameras, and computer interface options.
The photosensitive devices used inside video cameras can be solid state or non-solid state technology. Non-solid state devices are the classical vacuum tube-based photosensors such as vidicons and image orthicons found in older cameras. These have gradually been replaced by solid state devices which are continually improving in resolution, uniformity (pixel to pixel variations), and cost. Solid state devices, as their name implies, are based on silicon technology and offer many advantages, including smaller size and higher reliability over their tube counterparts. Solid-state photosensor devices include photodiodes, phototransistors, charge injection devices (CIDs), and charge-coupled devices (CCDs). CCDs have advantages today in terms of both cost and performance, and are now almost the exclusive technology used in machine vision cameras.
CCD-based cameras come in two basic formats, area arrays and linear arrays. Each comes in different sizes to meet different resolution needs. Common area cameras are 256×256, 512×512, and 1k×1k. Linear cameras typically range from 512×1 to 4k×1. When scenes are not linear or orthogonal, special camera formats, such as circular line cameras, may be available to meet special applications.
The CCD photosensor is a sampled device. The CCD itself is composed of discrete photosites or pixels that accumulate (analog) electrical charges based on the quantity of light hitting each one. For area cameras, each photosite is scanned out of the CCD device in a pixel-by-pixel, line-by-line sequence, thereby creating an analog video signal. Typically the analog voltage levels are sequenced in accordance with the RS-170 or CCIR video standard and the appropriate synchronization signals are added by other pieces of the camera's electronics. The result is a standard video signal that is compatible with other standardized video devices.
For linear-array-based cameras, known as line-scan cameras, the 2-D image is produced a line at a time, typically by moving objects past a stationary camera, but occasionally by moving the camera over stationary objects. Line-scan cameras generally produce nonstandard video signals.
Video signals have long adhered to standards, a necessity in the world of broadcast TV, where content providers and broadcasting companies needed the support of multiple, independent video equipment vendors and the availability of consumer-level plug-and-play capabilities. With standards, cameras, monitors, recorders, and signal generators can all handle the same signals. Standards specify the specific scan-rate timing, the number and order of lines in an image frame, the image aspect ratio, synchronization signals that indicate the beginning of each line and each frame, color signal encoding, if any, and image brightness and color signal voltage levels. Hence, until recently, most cameras available in the market adhered to one of the video standards. Today, a wide variety of non-standard cameras are also available to meet various applications needs, but the low-cost segment of the market is dominated by standards-compliant units. While numerous video signal standards have evolved, the most common are those that have been adopted as national standards for commercial broadcast television use.
In the United States, RS-170, produced by the Electronic Industries Association, embodies the technical specifications that were originally defined in the late 1930s to standardize the black and white TV industry. RS-170 defines an aspect ratio of 4:3, a 2:1 interlaced scan technique, and horizontal and vertical sync pulses. An entire RS-170 frame is made up of 525 lines; each frame is sequenced out every 33.33 milliseconds (ms). Hence each field contains 262.5 lines sequenced every 16.67 ms, for a line time of 63.49 µs. This divides out into a line frequency of 15.75 kHz, a commonly referenced RS-170 line rate. The 262.5 lines/field, however, are not all for video information. The vertical sync interval and settling period consume 20 line times, leaving 242.5 for image. Similarly, the horizontal sync process uses up 10.9 µs, leaving 52.59 µs of active line time, which determines the sampling rate needed to achieve any given number of pixels per line. RS-170 also specifies electrical voltages. The overall range is a 1V swing from -0.286V to +0.714V. The zero voltage level is the blanking level. Sync pulses go from 0V to -0.286V.
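The timing figures above follow directly from the 525-line, 30-frame-per-second interlaced structure; a quick sketch of the arithmetic:

```python
# RS-170 timing arithmetic, derived from the standard's 525-line,
# 30-frame-per-second, 2:1 interlaced structure.
FRAME_TIME_MS = 1000 / 30            # one frame every ~33.33 ms
FIELD_TIME_MS = FRAME_TIME_MS / 2    # two interlaced fields -> ~16.67 ms
LINES_PER_FRAME = 525
LINES_PER_FIELD = LINES_PER_FRAME / 2            # 262.5 lines per field

line_time_us = FIELD_TIME_MS * 1000 / LINES_PER_FIELD  # ~63.49 us per line
line_rate_khz = 1000 / line_time_us                    # ~15.75 kHz line rate

HSYNC_US = 10.9                                  # horizontal sync overhead
active_line_us = line_time_us - HSYNC_US         # ~52.59 us of active video

print(f"line time   : {line_time_us:.2f} us")
print(f"line rate   : {line_rate_khz:.2f} kHz")
print(f"active line : {active_line_us:.2f} us")
```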
With 242.5 active lines per field or 485 active lines per frame and a specified 4:3 aspect ratio, RS-170 yields 646.66 square pixels per line. (Square pixels are easier to deal with in machine vision applications, but not absolutely a requirement.) The actual computer-based implementation is typically 640×480. The brightness level of any pixel is represented by a number whose range reflects the resolution of the A/D converter, which is not part of the RS-170 specification. Eight bits is typical, though 12, 16, and 32 bits are also used in specialty applications.
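A short sketch of the square-pixel arithmetic, plus the sampling clock a frame grabber would need for a 640-pixel line (an illustrative calculation, not part of the standard itself):

```python
# Square pixels per line implied by the RS-170 geometry.
ACTIVE_LINES_PER_FRAME = 485     # 242.5 active lines per field x 2 fields
ASPECT = 4 / 3                   # specified image aspect ratio

square_pixels_per_line = ACTIVE_LINES_PER_FRAME * ASPECT   # ~646.67

# Sampling clock needed to place 640 pixels across the active line time
# (52.59 us of active video per line, from the timing figures above).
ACTIVE_LINE_US = 52.59
sample_rate_mhz = 640 / ACTIVE_LINE_US   # ~12.17 MHz pixel clock
```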
The RS-170 standard has several extensions. RS-343A defines video signals of higher resolution containing between 675 and 1023 lines per image frame. This is based on the RS-170 specification with modified timing waveforms. Similarly, RS-330 defines additional video signal electrical performance characteristics for the RS-170 signal.
In the 1950s, the National Television Systems Committee adopted a color standard widely known as NTSC, also known as RS-170A, since it is a modification or superset of the RS-170 standard. NTSC modifies the RS-170 standard to work with color video by adding color information to the existing monochrome brightness signal. The NTSC signal is a composite color signal because it is created by combining color and brightness information on a single signal. The alternative is to separate the components onto separate signals (R-G-B for example). Color signals representing hue and saturation are combined using phase and amplitude modulation techniques into a single chrominance signal. The chrominance signal is added to the RS-170 brightness signal (luminance), together with a color reference signal called color burst at the start of each line.
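The luminance/chrominance split can be illustrated with the standard RGB-to-YIQ conversion used in NTSC encoding (the coefficients below are the commonly published values):

```python
# NTSC luminance/chrominance split: RGB -> YIQ.
# Y is the monochrome (RS-170 compatible) brightness signal; I and Q
# together form the chrominance that is phase/amplitude modulated
# onto the color subcarrier.
def rgb_to_yiq(r, g, b):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

# A pure white pixel carries full luminance and essentially zero
# chrominance, which is why a monochrome receiver can simply ignore
# the added color information.
y, i, q = rgb_to_yiq(1.0, 1.0, 1.0)
```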
The NTSC system allows the coexistence of monochrome and color television, an important constraint at the time it was introduced. Other schemes being considered at the time would have required broadcasters to broadcast two signals, one for monochrome, one for color. Today, NTSC is often derided as a kludge. For many machine vision applications, NTSC is the low price-low performance configuration.
The CCIR video standard is the European equivalent of RS-170. CCIR specifies a 625-line image with a frame period of 40 ms (25 frames per second), a 2:1 interlaced scan, and a 4:3 aspect ratio. The 50 field per second rate, like the 60 field per second rate in RS-170, matches the frequency of the electrical power system used in the respective countries. The CCIR standard was also adapted for color. This is known as PAL, phase alternation line. However, France and a few other countries use a third color standard called SECAM.
CCIR-601 is a standard for the digital encoding of component color TV. It uses a 4:2:2 sampling scheme for Y, U, and V with luminance (Y) sampled at 13.5 MHz and chrominance (U and V) sampled at 6.75 MHz. These frequencies work for both 525/60 NTSC and 625/50 SECAM and PAL systems. CCIR-601 specifies that 720 pixels be displayed on each line of video. CCIR-601 is a digital video standard, different from the other standards discussed above, which are analog. It also deals with component signals rather than composite signals.
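The 4:2:2 sampling scheme fixes the aggregate sample rate of the digital stream; a quick sketch of the arithmetic for the common 8-bit case:

```python
# CCIR-601 4:2:2 sampling: luminance at 13.5 MHz, each chrominance
# component at half that rate.
Y_MHZ = 13.5     # luminance (Y) sample rate
C_MHZ = 6.75     # each chrominance component (U, V) sample rate
BITS = 8         # bits per sample in the common 8-bit case

# Aggregate sample stream: Y plus two chrominance components.
rate_mbps = (Y_MHZ + 2 * C_MHZ) * BITS   # 216 Mbit/s

samples_per_line_y = 720   # active luminance samples per line (per 601)
```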
While component systems carry the R-G-B color information on separate signals, and composite signals carry all the information on one signal, an intermediate standard has evolved. The Y/C component color standard conveys the color video signal as a luminance (Y) signal identical to the standard RS-170 monochrome video signal and a chrominance (C) signal identical to the chrominance subcarrier defined in the NTSC standard. However, by using separate signals, a higher quality level is achieved. Y/C video is also known as S-Video, super-video, and S-VHS.
Some specialized nonstandard video formats have also emerged over the years. These are especially relevant to the vision industry. Some of these nonstandard formats use synchronization timing established by the RS-170 or CCIR standards. For example, some digital video cameras conform to RS-170 timing, but transfer their image as a digital data stream rather than as an analog signal. The video signal is typically of superior quality.
RS-170 was optimized for the human perceptual system and the technology available to the broadcast TV industry decades ago. Interlaced video reduces flicker for the human eye and 30 frames per second eliminates many noise problems associated with 60 Hz power supplies. The 4:3 aspect ratio makes for a pleasing TV image. But for machine vision applications like metrology and inspection, where a computer and not a human eye is the image recipient, these specifications make little sense. For example, with interlaced lines, the computer has to reorder the data to make a sensible image while in the human eye this is done automatically (through persistence). And the non-square aspect ratio typically results in non-square pixels. This complicates calculations, since 4 pixels in the x direction would represent a distance different from 4 pixels in the y direction. And calculating the length of a line at an arbitrary angle is more complicated still. For machine vision, square pixels are a great advantage. Finally, for applications where motion is involved, being locked in to the 30 frames or 60 fields per second specified in RS-170 has no logical basis and is a distinct disadvantage.
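The non-square-pixel complication described above can be sketched as follows; the pixel dimensions used here are made-up illustrative values:

```python
import math

# Converting a pixel-space distance to a physical distance. With
# non-square pixels, each axis needs its own scale factor.
# (pixel_width_mm / pixel_height_mm are hypothetical calibration values.)
def distance_mm(dx_px, dy_px, pixel_width_mm, pixel_height_mm):
    return math.hypot(dx_px * pixel_width_mm, dy_px * pixel_height_mm)

# With square pixels, one scale factor serves both axes:
square = distance_mm(3, 4, 0.01, 0.01)       # 3-4-5 triangle -> 0.05 mm

# With non-square pixels, the same pixel displacement maps to a
# different physical length, so every angled measurement must be
# corrected axis by axis.
non_square = distance_mm(3, 4, 0.012, 0.01)
```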
Hence, the standard for vision cameras today is the “nonstandard” variable scan camera. With variable scan there are no fixed restrictions on the organization of the pixels or the timing of the video. Rather, these are user defined and application dependent. Variable scan cameras provide a level of flexibility not afforded by RS-170 cameras; this avoids acquiring large numbers of unwanted pixels and the need for complicated lighting and exposure workarounds. Progressive scanning allows for full frame resolution when images of moving objects are grabbed, and images can be processed as they are acquired, rather than having to wait until the entire image is available.
Almost all machine vision cameras today are based on CCD sensor technology. The CCDs in turn can be either array (2-D) or line scan (1-D) designs. A variation on line scan cameras is TDI (time delay integration), which does multistage integration for enhanced sensitivity.
Following are some factors to consider when evaluating a camera technology for a machine vision application.
- Area Cameras Versus Line-Scan Cameras
Both architectures are readily available. Area cameras are essentially made up of many line scan sensors stacked to form a 2-D matrix. Linear sensors are easier to fabricate to very tight tolerances than are matrix sensors and for applications where absolute uniformity is a concern, line scan CCDs have an advantage. In general, line scan cameras offer higher resolution and speed than is possible with matrix cameras.
Another major architectural concern is the use of multiple output arrays which provide maximum access (see speed discussion below) to the pixels but increase the processing complexity. Line-scan cameras are available with linear output, in which case all pixels are read out from a single serial CCD; bilinear output, which divides the image into even and odd pixels; and multi-tapped output.
Area-scan cameras can be based on interline transfer, full frame, and frame transfer architectures. With interline, each column of pixels has a transfer gate and vertical shift register to transfer the charge to the horizontal CCD for read out. This format provides excellent image smear characteristics (and fast shutter times) without an external shutter. The full frame architecture utilizes the pixels both to collect charge and to shift charge to the horizontal CCD for read out. Because of the dual function, an external shutter is required to block incident light during the transfer period. Frame transfer is similar to full frame with the addition of a light-shielded frame storage region. While one set of pixels is actively imaging, the other set is busy transferring the previous frame; hence this format requires twice the silicon area of a full frame device for a given pixel size. Several protected pixel elements never see light and are used for dark field calibration.
Matrix cameras typically offer 512 x 512 resolution, though several vendors now offer 1K x 1K resolution. Line scan cameras typically have 512 or 1K pixels per line, but can go as high as 4K or even 8K in some high end applications. Resolution in the other dimension is a function of how often the lines can be grabbed and how fast the object is moving past the camera lens. Practical resolution of course is a function of the number of pixels and the size of the object imaged.
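Resolution along the motion axis follows from the line rate and the object speed; a sketch with illustrative numbers:

```python
# Line-scan resolution along the direction of motion (illustrative
# numbers, not from any particular camera).
line_rate_hz = 10_000        # lines grabbed per second
object_speed_mm_s = 500      # object speed past the camera

mm_per_line = object_speed_mm_s / line_rate_hz   # 0.05 mm of travel per line
lines_per_mm = line_rate_hz / object_speed_mm_s  # 20 lines imaged per mm

# Faster objects or slower line rates coarsen this axis; the cross-web
# axis is fixed by the sensor's pixel count (e.g. 512, 1K, up to 8K).
```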
Another measure of resolution is gray scale depth or tonal resolution, rather than spatial resolution. Most machine vision systems work with 8-bit grays (256 shades), but some may work with 1-bit data (black and white only). Other applications in medicine, biology, and astronomy may require special cameras with far higher tonal resolution.
- Pixel Aperture
This is the ratio of the length of the two sides of a pixel. Square pixels have an aperture of 1:1 and are greatly favored for machine vision applications.
Non-standard line scan (and area array) cameras can operate at high speeds. Frame rates for RS-170 are fixed at 30 per second; line scan cameras have variable scan rates and can produce thousands of lines per second, each one of which is available for processing (in the order in which it is received) almost immediately. This is in direct contrast to the RS-170 situation, where interlacing makes line ordering more difficult. With RS-170, the computer typically waits until the entire image is available before it begins any processing. It is common to align a line scan camera with the pixels perpendicular to the direction of motion for applications where object motion is present. The integration time can then be adjusted to suit the application.
Speed generally refers to how quickly the machine can get at a given pixel in order to make some calculation or decision. Typically, all the pixels have to be read out before any pixel value can be accessed (some cameras now have ways around this). For any given pixel output rate (say 40 million pixels per second), speed then increases as the number of pixels in an image decreases. Pixel output rate is one of the fundamental bottlenecks in machine vision. If the camera can deliver only 5 megapixels per second (or if the computer interface can handle only 5 megapixels per second), then a superfast imaging bus and imaging processor will be starved for data. Parallel and multi-tap camera outputs are now available to provide higher pixel output rates.
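The relationship between pixel output rate, image size, and achievable frame rate can be sketched as:

```python
# Achievable frame rate is bounded by pixel output rate divided by
# pixels per frame (illustrative numbers).
def max_frame_rate(pixel_rate_hz, width, height):
    return pixel_rate_hz / (width * height)

# 40 Mpixel/s through a 512 x 512 image:
rate = max_frame_rate(40_000_000, 512, 512)

# Halving the image size in each dimension quadruples the frame rate,
# which is why speed rises as the pixel count falls.
rate_small = max_frame_rate(40_000_000, 256, 256)
```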
If the camera is being used as a measurement tool, then the geometric repeatability will be of concern, as it would be for any measurement device.
CCD cameras can (electronically) make short term exposures without mechanical shutter devices. Short exposures may require brighter lighting in order to be able to capture an image. Most cameras have asynchronous shuttering, the ability to integrate or shutter at any time based on an external signal. In essence, a trigger pulse resets the vertical sync. With RS-170 compatible devices, shuttering often means obtaining data from only one field, reducing the vertical resolution by a factor of two. This problem is not present with progressive scan cameras. Mechanical shutters are not reliable enough for industrial camera use. An electronic shutter allows a user to select the exact time to expose.
- Rugged Mechanical Design
Shake and bake industrial applications may involve hot, wet, oily, dirty environments. The camera needs to survive in all of this. Size may also be of importance, not only for reliability, but also because a camera may have to fit into some tight spaces, like inside another machine. Video cameras have moved far in the miniaturization realm in the past few years. The availability of standard lenses and optics for the camera may also be of concern, though most vendors are moving towards standard optical interfaces.
- Rugged Electronic Design
Noise is the enemy of all test and measurement devices and of all sensors. Camera electronics should be noise immune. New designs also have some measure of antiblooming capabilities which prevents bright spots on one array element from corrupting the output of an adjacent element. An electronic iris can account for overall brightness by sensing when to increase or decrease exposure time.
- Time Delay Integration (TDI)
A technique designed into some cameras. For example, a 1024×64 line scan array would take 64 snapshots with a set time delay between each snapshot and then read out the line as a single image (with a light gain of 64). This “averaging” process greatly improves the signal-to-noise ratio without causing the blurring problems that would occur with large integration times and moving objects. TDI is useful for creating images of high speed processes with low light levels.
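The gain figure follows from summing N exposures of the same object line: signal grows by N, while uncorrelated noise grows only by √N, so the signal-to-noise ratio improves by √N:

```python
import math

# TDI sums N exposures of the same line of the object. Signal grows
# linearly with N; uncorrelated noise grows as sqrt(N).
N_STAGES = 64

light_gain = N_STAGES               # 64x more collected charge
snr_gain = math.sqrt(N_STAGES)      # 8x signal-to-noise improvement
```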
- Dynamic Range
All CCD sensors have a sigmoidal response curve when their output is plotted against their input. This is also true for the human eye, photographic film, and display monitors.
Every CCD sensor has a fundamental noise source (thermally generated electrons, a problem which can be mitigated by cooling the CCD) that produces some level of output even when input is absent. Saturation occurs when the output hits a ceiling and the device cannot provide any more output, no matter how high the input rises. In between is a useful range where the inputs and the outputs correspond nearly linearly. This is known as the dynamic range, and for a typical CCD it might be 1000:1. A key challenge in machine vision is keeping the lighting level of the application within the useful range of the vision system. Techniques for matching the available light to the dynamic range include adding more or brighter lights or an image intensifier; adjusting the mechanical aperture on the lens to admit more or less light; adding a filter to cut down the light level; and adjusting the shutter time, which on a CCD is the integration time (this will be limited by the motion of the object under study).
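A quick sketch of what a 1000:1 dynamic range implies for the digitizing chain:

```python
import math

# What a 1000:1 dynamic range means in dB and in A/D resolution
# (one digital code per distinguishable level).
dyn_range = 1000

db = 20 * math.log10(dyn_range)          # 60 dB
bits = math.ceil(math.log2(dyn_range))   # 10 bits to cover 1000 levels

# An 8-bit converter (256 levels) would waste part of such a sensor's
# range, which is one reason deeper converters appear in specialty work.
```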
Each type of camera requires its own type of computer interface in order to meet the signal requirements for sync, strobe, and data control. Cameras can be grouped into approximately four families, and board level vendors (companies selling frame grabbers) typically produce products to support these different camera types:
- Variable Scan Camera Interfaces
These usually support both area-scan and line-scan cameras. The low end of the market is served by interfaces that digitize at rates up to 25 MHz (8-bit pixel resolution), while the top end reaches 50 MHz and requires more expensive interface electronics. Interface boards provide the signals that define the pixel, line, and frame timing. Most of the products in this market also support RS-170 and CCIR cameras and can either synchronize to the timing of the camera or generate timing to synchronize multiple cameras.
- Color Camera Interfaces
These digitize true color images in real time from video sources compatible with NTSC, PAL, RGB, and S-VHS. Many of the boards perform on-board color space conversion (HSI, YUV, YIQ, YCrCb) with a 3×3 matrix multiplier.
- Digital Camera Interfaces
These boards provide a direct interface to RS-422 or TTL video sources. They provide the same function as traditional frame grabbers, except that there is no need for A/D conversion, since the camera is already digital. Flexibility is very high, with support for line-scan and area-scan cameras with 8-, 16-, or 24-bit single-ended TTL and 8- or 16-bit differential inputs being very common.
- Multi-Tap Cameras
Multi-tap cameras provide a performance (frames per second) boost at the expense of some downstream programming complexity, since the pixels have to be put back together to form an image. Some cameras are available with up to eight taps that can work simultaneously at 10 or 20 MHz each.
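The reassembly step can be sketched for the simple two-tap even/odd case; this is an illustrative sketch, not any particular vendor's interface:

```python
# Reassembling a line from a two-tap (even/odd) camera, the bilinear
# output format mentioned earlier. Tap A delivers pixels 0, 2, 4, ...
# and tap B delivers pixels 1, 3, 5, ...
def merge_two_taps(even_tap, odd_tap):
    line = [0] * (len(even_tap) + len(odd_tap))
    line[0::2] = even_tap   # interleave even-numbered pixels
    line[1::2] = odd_tap    # interleave odd-numbered pixels
    return line

# Two taps each running at half the line length double the effective
# pixel output rate, at the cost of this merge step downstream.
line = merge_two_taps([10, 30, 50], [20, 40, 60])
```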
Vendors today seem to be following one of two different approaches to camera interface design. One school of thought is that the lowest cost, optimized design is a frame grabber board designed specifically for one type of camera or application. Others design basic grabber boards with a modular approach to the acquisition front-end: a family of acquisition modules supporting different camera types is available to plug into one or more “motherboards.” The “dedicated” school argues that the additional components and connectors required by a modular approach reduce reliability and increase cost. The counterpoint is that modularity optimizes flexibility, reduces time-to-market, and minimizes troubleshooting, since time-proven modules can migrate across product lines. It also provides OEMs, integrators, and end users with flexibility and with a high level of insurance against obsolescence as new cameras come into the market.