Going from GDSII to OASIS
By Dr Philippe Morey-Chaisemartin, Xyalis
EDA DesignLine
(08/04/08, 09:05:00 AM EDT)
Why switch to OASIS?
It is a commonplace that databases for today's digital chips are huge. The physical description of an SoC, encoded in the classical GDSII format, now often exceeds 20 Gbytes, and mask houses have reported files of up to 200 Gbytes. Even if storage systems and data transfer links can handle such sizes, files this big are obviously difficult to manipulate.

GDSII was introduced by Calma in 1978 as the successor to the GDS format created in 1971. In almost 30 years, no major change has been made to this de facto standard, while chip complexity has been multiplied by as much as 10^6. In addition to the file size issue, the numerical values needed to describe the geometries of nanoscale structures on 300 mm wafers will soon reach the 32-bit limit of the GDSII format.

The OASIS format was developed to address such issues and its first official specification was released in 2004 [1].

This article describes how the size and precision limitations are handled in the OASIS format. It also singles out some critical points of the format and finally suggests ways to get the full benefit of OASIS and to avoid the potential pitfalls of this new standard.

How data size reduction works
The primary goal of the OASIS format is to reduce database size. This is done in several ways: optimizing the file structure, removing redundancies, and compacting the values.

1) Reduction of geometric description size
All geometries contained in the physical description of a chip are made of polygons, themselves described as lists of coordinates. The optimization can be done by reducing the number of coordinates needed to describe a polygon, and by reducing the size (in terms of used bytes) of each coordinate.


Figure 1. OASIS can support up to twenty-five types of trapezoids.
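
The reduction in bytes per coordinate relies on variable-length integers. The following is a minimal sketch, assuming a base-128 layout (seven payload bits per byte, least-significant group first, high bit set when more bytes follow) and, for signed values, a sign flag stored in the lowest bit. The function names are illustrative, and the exact byte layout should be checked against the SEMI P39 specification.

```python
def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)
            return bytes(out)


def encode_signed(value: int) -> bytes:
    """Encode a signed integer: magnitude shifted left, sign in the lowest bit."""
    magnitude = abs(value)
    return encode_unsigned((magnitude << 1) | (1 if value < 0 else 0))


def decode_unsigned(data: bytes) -> int:
    """Inverse of encode_unsigned for a single encoded integer."""
    result = 0
    for position, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * position)
        if not byte & 0x80:
            break
    return result
```

With this scheme a coordinate such as 100 occupies a single byte, where a fixed 32-bit field always occupies four.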

2) Optimization of geometric repetitions
Statistically, many geometries are repeated in any design. For example, a simple contact may appear tens of times in a single small library cell. OASIS makes it possible to instantiate multiple occurrences of the same geometry [2].


Figure 2. There are 11 ways to describe repetition in OASIS.
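
To illustrate why repetitions save space, here is a rough sketch of one shape combined with a regular grid repetition. The `Rect` and `GridRepetition` classes are hypothetical and do not reproduce the OASIS record layout; they only show that the stored data stays constant while the number of placements grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class GridRepetition:
    """Hypothetical regular array: columns x rows copies at fixed pitches."""
    columns: int
    rows: int
    pitch_x: int
    pitch_y: int


def expand(shape: Rect, rep: GridRepetition) -> list[Rect]:
    """Return every placement described by one shape plus one repetition."""
    return [
        Rect(shape.x + i * rep.pitch_x, shape.y + j * rep.pitch_y,
             shape.width, shape.height)
        for j in range(rep.rows)
        for i in range(rep.columns)
    ]


contact = Rect(x=0, y=0, width=40, height=40)
array = GridRepetition(columns=10, rows=4, pitch_x=100, pitch_y=100)
placements = expand(contact, array)

# Explicit storage needs 4 integers per rectangle; the repeated form needs
# 4 integers for the shape plus 4 for the repetition, whatever the count.
explicit_ints = 4 * len(placements)
repeated_ints = 4 + 4
```

In this example, 40 contacts are described by 8 integers instead of 160.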

3) Optimization of cells call
The physical description of a chip is always hierarchical: a top cell calls subcells, which are described separately.

4) Embedded compression
Another possibility offered by OASIS is to compress some blocks directly inside the file (with a gzip-like method). Usually a block is a full cell description. Each cell is then compressed independently, which keeps random access to the file possible even when some of its components are compressed.
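
The idea of per-cell compression with preserved random access can be sketched as follows. This is an illustration only: it uses Python's `zlib` module and a hypothetical in-memory index of (offset, length) pairs, and it does not reproduce the actual compressed-block record of the OASIS format.

```python
import zlib


def write_cells(cells: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Compress each cell on its own and record (offset, length) per cell."""
    blob = bytearray()
    index: dict[str, tuple[int, int]] = {}
    for name, payload in cells.items():
        packed = zlib.compress(payload)
        index[name] = (len(blob), len(packed))
        blob += packed
    return bytes(blob), index


def read_cell(blob: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Decompress one cell without touching the others."""
    offset, length = index[name]
    return zlib.decompress(blob[offset:offset + length])


# Placeholder payloads standing in for real cell descriptions.
cells = {
    "INV": b"rect 0 0 40 40 " * 50,
    "NAND2": b"rect 10 10 60 20 " * 80,
}
blob, index = write_cells(cells)
nand2 = read_cell(blob, index, "NAND2")
```

Because each cell is an independent compressed stream, reading `NAND2` never requires decompressing `INV`.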

5) Performance
Depending on the database structure and on the chosen optimization, an OASIS file is between 5 and 20 times smaller than the equivalent GDSII file [3][4].

That is a big improvement, but the comparison should also be made on compressed files, since compression is now standard practice. The following diagram gives average compression ratios relative to the original GDSII file, which is assigned the reference value of 1.


Figure 3. Comparison between GDSII and OASIS with various compression methods.

The optimized GDSII data is obtained by replacing each repeated polygon with a cell containing this polygon and multiple calls to that cell. Cell names are chosen to be as short as possible to reduce file size, since references are made only through names in GDSII [5].

Note that compressing an OASIS file with gzip or bzip2 is less effective than compressing a GDSII file. This mostly comes from the fact that all numerical values are already compacted, i.e. reduced to the minimum number of bytes thanks to variable-size coding. All the unnecessary "0" bytes present in the GDSII format (almost 50% of the file) have already been removed from OASIS before this compression scheme is applied.
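
The weight of these padding bytes is easy to see on a toy example. The sketch below packs a few arbitrary small coordinates as 4-byte big-endian signed integers, the fixed-width form used for GDSII coordinates, and counts the zero bytes. The sample values are made up for illustration and say nothing about the statistics of a real layout.

```python
import struct

# Arbitrary small coordinates, typical of values that fit in one or two bytes.
coords = [0, 40, 100, 1250, 20000, 65535]

# Fixed-width encoding: every value occupies four bytes, whatever its size.
fixed = b"".join(struct.pack(">i", c) for c in coords)

zero_bytes = fixed.count(0)
zero_fraction = zero_bytes / len(fixed)
```

A general-purpose compressor removes much of this padding from a GDSII file, which is why it gains less on an OASIS file where the padding is already gone.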

Potential problems
Although OASIS offers capabilities needed by new technologies and highly optimizes database size, it's far from being free of issues.

1) No restrictions means no limit
The first dramatic impact of removing all the restrictions due to precision limits (i.e. the 32-bit length of coordinates) is that anything is allowed. Any value can have unlimited precision. This is an interesting feature for fundamental mathematics but has no meaning when describing a circuit.
Describing a value is one thing; computing on infinite-precision values is another. All the tools that manipulate OASIS files will have an internal limit (due to the hardware architecture). This makes them not 100% OASIS compliant, even though they will be able to handle all practical OASIS files, which should never use values of more than 64 bits.

If we consider that 10^3 is almost the same as 2^10, a 32-bit value can describe a coordinate of +/- 2 x 10^9, that is, a precision of 0.1 nm on a 20 cm wafer. We are close to the limit of current process needs, but with 64 bits we are far from any limit expected in the future.
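
The arithmetic behind these figures can be checked with a short calculation. This sketch assumes a database unit of 0.1 nm per integer step, as in the text; the function name is illustrative.

```python
NM_PER_M = 10**9
UNIT_NM = 0.1  # assumed database unit: 0.1 nm per integer step


def reach_in_metres(bits: int, unit_nm: float = UNIT_NM) -> float:
    """Largest positive coordinate of a signed integer, converted to metres."""
    return (2 ** (bits - 1) - 1) * unit_nm / NM_PER_M


reach_32 = reach_in_metres(32)  # about 0.215 m, i.e. roughly +/- 21 cm
reach_64 = reach_in_metres(64)  # about 9.2e8 m
```

At 0.1 nm resolution a signed 32-bit coordinate barely spans a wafer, while a signed 64-bit coordinate exceeds any conceivable layout by many orders of magnitude.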

Setting an internal limit of 64 bits on coordinates is certainly safe, but some tools running on 32-bit architectures may have a limit of 32 bits. This makes a file created on a 64-bit platform unreadable on a 32-bit platform or, worse, readable but with overflows that turn positive coordinates into negative ones.
The risk is not very high with coordinates, but it becomes dramatic for other integer values such as cell indexes or layer numbers, which can exceed what standard computer architectures are able to manage.
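
The overflow scenario can be reproduced in a few lines. This sketch contrasts a careless reader that silently keeps the low 32 bits with a reader that rejects values it cannot represent; both function names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def wrap_to_int32(value: int) -> int:
    """Mimic a careless 32-bit reader: keep only the low 32 bits."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low > INT32_MAX else low


def checked_int32(value: int) -> int:
    """Safer behaviour: refuse values that do not fit instead of wrapping."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in 32 bits")
    return value


wrapped = wrap_to_int32(3_000_000_000)  # positive input, negative result
```

A reader built around the second function fails loudly on such a file, which is far preferable to geometry silently mirrored into negative coordinates.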

2) Tables and indexes
As described above, all the cells may be referenced through indexes. Each index is an entry in a table containing the cell names. This makes referencing quite easy, except that the table entries may be stored in different places: at the beginning of the file, at the end of the file, or spread throughout the file. Worse still, references can also be made by name. Even though not all the combinations can be mixed in the same file, all these possibilities exist. So an OASIS reader must accept any kind of reference and must not be optimized for one option or the other.
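
One way for a reader to stay neutral is to defer resolution until the whole name table is known. The sketch below is a simplified illustration with a hypothetical `CellNameResolver` class; it ignores the actual record types and validity rules of the OASIS specification.

```python
from typing import Union

CellRef = Union[int, str]  # a reference number or an explicit cell name


class CellNameResolver:
    """Collects name-table entries and resolves references on demand."""

    def __init__(self) -> None:
        self._names: dict[int, str] = {}

    def add_table_entry(self, number: int, name: str) -> None:
        self._names[number] = name

    def resolve(self, ref: CellRef) -> str:
        if isinstance(ref, str):
            return ref  # reference made directly by name
        try:
            return self._names[ref]
        except KeyError:
            raise LookupError(f"no cell name registered for reference {ref}") from None


# File A references cells by number; its name table arrives after the placements.
placements_a: list[CellRef] = [1, 0, 1]
resolver_a = CellNameResolver()
resolver_a.add_table_entry(0, "INV")
resolver_a.add_table_entry(1, "NAND2")
resolved_a = [resolver_a.resolve(ref) for ref in placements_a]

# File B references cells by name and needs no table at all.
placements_b: list[CellRef] = ["PAD", "INV"]
resolved_b = [CellNameResolver().resolve(ref) for ref in placements_b]
```

Keeping the placements unresolved until the end costs memory, but it makes the reader indifferent to where the table sits in the file.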

A commonly used solution is to build the reference table at the end of the file, which makes an OASIS writer quite easy to implement. The OASIS standard states that, when reading an OASIS file, it is very convenient for this table (when present) to be at a fixed position, preferably at the end of the file. This should make it very easy to access the table before parsing the full file. That is true when the file is not compressed. Unfortunately, most users still compress their files in order to minimize the size of the database.

Uncompressing a file can only be done sequentially. With the GDSII format, which was originally developed to be read and written on tapes, this is not a problem. The OASIS format takes advantage of the fact that all storage now uses random-access media and allows direct access to any location in the file. When the whole file is compressed and the OASIS reader uses this feature, the impact on read access time is dramatic.

3) Equivalence with GDSII
OASIS is intended to replace the GDSII format, but both formats will coexist for many years. Managing heterogeneous environments and translating data between GDSII and OASIS is therefore mandatory and will remain a constraint for a long time.

4) Vulnerabilities
Despite its enhancements over GDSII, an OASIS file may still contain inconsistent data. Using a checksum at the end of the file reduces the problem of data corruption during transfers, but the OASIS standard by itself doesn't specify how to interpret specific shapes. Worse still, OASIS files may contain unidentified binary data.
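
The protection offered by a trailing checksum, and its limits, can be sketched as follows. The layout used here (a 4-byte little-endian CRC32 appended to the payload) is an assumption made for illustration and is not the exact trailer defined by the OASIS specification.

```python
import zlib


def append_crc32(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 of the payload (illustrative layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def verify_crc32(data: bytes) -> bool:
    """Check that the trailing 4 bytes match the CRC32 of everything before them."""
    payload, stored = data[:-4], data[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


good = append_crc32(b"placeholder layout bytes")

# Simulate a transfer error by flipping one bit in the first byte.
corrupted = bytes([good[0] ^ 0x01]) + good[1:]

ok_before = verify_crc32(good)
ok_after = verify_crc32(corrupted)
```

The check catches the flipped bit, but a file that passes it can still describe a malformed polygon: the checksum validates the bytes, not their meaning.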

How to get real benefits from OASIS format
Here are some basic rules when using a GDSII to OASIS converter or developing your own OASIS writer:

Conclusion
The OASIS format lifts the restriction on precision in numbers but doesn't correct all the limitations of GDSII and brings some new sources of errors and problems.

Depending on the method used for the optimization, the results in terms of file size and analysis time may vary significantly. Many different optimization methods are available, but none of them gives the best result on every type of database. Trying all the methods and choosing the best is almost impossible, or at least too time-consuming. So each CAD vendor will define a strategy and generate its OASIS files using a given method.

Some companies are starting to switch to OASIS format, while others remain on the GDSII format. For them, the only real issue related to GDSII is the file size, which is not considered as a blocking point. Extending disk and RAM capacities is still estimated to be a better deal than changing a qualified flow based on GDSII to a new one based on OASIS.

Due to the complexity of the OASIS standard, and to the fact that many different options are available to store the same data, the number of possible errors in an OASIS file increases dramatically compared to GDSII (by a factor of at least 4). If we also consider that the weakness of GDSII regarding the interpretation of polygon shapes has not been corrected in OASIS, it seems important to carefully validate all the databases using this new format.

Experience has shown that it took many years to correct all the errors in GDSII files generated by different tools. We are just at the beginning of OASIS, so detailed checks should be performed in order to achieve the same level of confidence as for existing GDSII-based flows.

After many years developing tools based on the GDSII format, Xyalis has released an OASIS format reader. It checks all critical points in an OASIS file, including full specification compliance. It also validates compatibility between 32-bit and 64-bit platforms, detects badly formed polygons, checks for the presence of unidentified binary code, and more.

References
1) SEMI P39-1105 - OASIS -- Open Artwork System Interchange Standard. Abstract IEEE standards
2) Evaluation of the New OASIS Format for Layout Fill Compression, Yu Chen & al.
3) GDSII to OASIS Converter - Performance and Analysis, Nageswara Rao G., Softjin Technologies, white paper.
4) OASIS vs. GDSII stream format efficiency, A.Reich & al., Proceedings of SPIE -- Volume 5256
5) Improved file sizes and cycle times through optimization of GDSII Stream, Chin Le & al., Proceedings of SPIE -- Volume 5992

About the author
Philippe Morey-Chaisemartin is Xyalis' CTO. He earned a master's degree in computer science in 1983 and a PhD in microelectronics in 1986. He can be reached at his Xyalis address.