Graphics startup aims to disrupt mobile raytracing with 1w core - Embedded.com

Seemingly out of nowhere (well, South Korea, actually), a four-year-old startup recently burst onto the scene with a ray-tracing chip, the RayCore. The company, SiliconArts (http://www.siliconarts.com/), was founded by Dr. Hyung Min Yoon, formerly at Samsung; Hee Jin Shin from LG; Byoung Ok Lee from MtekVision; and Woo Chan Park from Sejong University.

What the founders demonstrated in simultaneous talks at Hot Chips and Siggraph was a proof of concept for an IP block, not a product. However it is delivered, it is an interesting and impressive piece of work. In fact, the company has been licensing its ray-tracing IP to OEM partners since 2011, and is currently working with a vendor of mobile application processors on a next-generation SoC.

RayCore consists of two major components: ray-tracing units (RTUs) and a tree-building unit (TBU) for dynamic scenes. The RTUs' unified traversal and intersection (T&I) pipelines use a MIMD execution model, chosen to meet the power-efficiency and silicon-area constraints of mobile devices.
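To make the "intersection" half of a T&I pipeline concrete, here is a minimal software sketch of a ray/triangle test. It uses the well-known Moller-Trumbore method purely as an illustration; the article does not say which intersection algorithm RayCore implements in hardware, and the function and variable names are hypothetical.

```python
# Illustrative only: Moller-Trumbore ray/triangle test in plain Python.
# Assumption: this is NOT claimed to be RayCore's hardware algorithm.

EPS = 1e-9  # tolerance for parallel rays and hits behind the origin


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:            # ray is parallel to the triangle's plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv           # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None  # ignore hits behind the ray origin


if __name__ == "__main__":
    tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    print(intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri))
    print(intersect((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri))
```

In a MIMD design each pipeline can run a test like this on a different ray independently of its neighbors, which helps with incoherent secondary rays.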

A look inside the RayCore device.

According to SiliconArts, its 28 nm evaluation ASIC, which measures 18 mm² and contains six RTUs, can achieve up to 239 Mrays/second while consuming just one watt. The TBU uses k-d tree construction and has been tested on models with up to 64,000 triangles; the company says it can run such a test in 20 ms.
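A quick back-of-envelope calculation helps put those figures in perspective. The sketch below uses only the numbers quoted above, plus two assumptions that are not from the article: an 800 x 480 display refreshed at 30 frames per second, and a reading of the 20 ms figure as the time to build one tree for a 64,000-triangle model.

```python
# Back-of-envelope arithmetic from the quoted figures.
RAYS_PER_SEC = 239e6      # peak throughput with six RTUs (from the article)
NUM_RTUS = 6
POWER_W = 1.0
TBU_TRIANGLES = 64_000
TBU_TIME_S = 0.020        # assumed to be the time for one tree build

# Assumed display target; NOT from the article.
WIDTH, HEIGHT, FPS = 800, 480, 30

rays_per_rtu = RAYS_PER_SEC / NUM_RTUS                    # rays/s per RTU
energy_per_ray_nj = POWER_W / RAYS_PER_SEC * 1e9          # nanojoules per ray
rays_per_pixel = RAYS_PER_SEC / (WIDTH * HEIGHT * FPS)    # ray budget per pixel per frame
tbu_triangles_per_sec = TBU_TRIANGLES / TBU_TIME_S
tbu_builds_per_sec = 1.0 / TBU_TIME_S

if __name__ == "__main__":
    print(f"per-RTU throughput : {rays_per_rtu / 1e6:.1f} Mrays/s")
    print(f"energy per ray     : {energy_per_ray_nj:.2f} nJ")
    print(f"rays per pixel     : {rays_per_pixel:.1f} at {WIDTH}x{HEIGHT}, {FPS} fps")
    print(f"TBU triangle rate  : {tbu_triangles_per_sec / 1e6:.1f} Mtriangles/s")
    print(f"TBU builds per sec : {tbu_builds_per_sec:.0f}")
```

Under these assumptions the peak rate works out to roughly 40 Mrays/s per RTU, about 4 nJ per ray, and about 20 rays per pixel per frame. Since the 239 Mrays/second figure is a peak, sustained budgets would be lower.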

SiliconArts uses a novel latency-hiding technique to reduce performance degradation from off-chip memory accesses. The technique, which runs on the T&I pipelines and is combined with the TBU and texture mip-mapping, is called “looping for the next chance.” SiliconArts shows benchmarks of the approach delivering real-time Whitted ray tracing with six RTUs and k-d tree construction with one TBU using less than 1.1 GByte/s of memory bandwidth, far less than the 12.8 GByte/s provided by today's mobile LPDDR3 memory.
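The bandwidth comparison can be worked through in a few lines. The helper below is a hypothetical illustration, not vendor code. It treats 1.1 GByte/s as an upper bound on what the benchmarks used, and the 30 frames-per-second rate used to derive a per-frame figure is an assumption, since the article says only "real-time."

```python
def bandwidth_budget(used_gb_s, available_gb_s, fps):
    """Return (share of bus used, headroom in GB/s, MB moved per frame)."""
    share = used_gb_s / available_gb_s
    headroom = available_gb_s - used_gb_s
    mb_per_frame = used_gb_s * 1000.0 / fps   # decimal units: 1 GB = 1000 MB
    return share, headroom, mb_per_frame


# 1.1 and 12.8 GByte/s come from the article; fps=30 is an assumption.
share, headroom, mb_per_frame = bandwidth_budget(1.1, 12.8, fps=30)

if __name__ == "__main__":
    print(f"share of LPDDR3 bandwidth: {share:.1%}")
    print(f"headroom                 : {headroom:.1f} GB/s")
    print(f"traffic per frame        : {mb_per_frame:.1f} MB")
```

On these numbers the ray tracer would occupy under 9 percent of the memory bus, leaving most of it for the CPU, display, and other blocks that share memory in a mobile SoC.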

Remember, these are test results using an FPGA. A tightly coupled IP block in a 22 nm or smaller SoC would run significantly faster.

The startup provides OpenGL ES 1.1-like API extensions to separate static and dynamic objects. Static objects are retained for subsequent frames while dynamic objects are transferred to the tree builder via vertex arrays to reconstruct dynamic sub-trees during each frame.
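The static/dynamic split can be sketched in software as follows. This is a minimal illustration of the idea, not the RayCore API: `Scene`, `begin_frame`, and `build` are hypothetical names, and the builder is a simplified median split over triangle centroids rather than a production k-d tree (which would also handle triangles that straddle a split plane).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Node:
    triangles: List[Triangle] = field(default_factory=list)  # leaf payload
    axis: int = 0
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def centroid(tri: Triangle, axis: int) -> float:
    return (tri[0][axis] + tri[1][axis] + tri[2][axis]) / 3.0


def build(tris: List[Triangle], depth: int = 0, leaf_size: int = 4) -> Node:
    """Simplified builder: median split on centroids, cycling x, y, z."""
    if len(tris) <= max(leaf_size, 1):
        return Node(triangles=list(tris))
    axis = depth % 3
    ordered = sorted(tris, key=lambda t: centroid(t, axis))
    mid = len(ordered) // 2
    return Node(axis=axis, split=centroid(ordered[mid], axis),
                left=build(ordered[:mid], depth + 1, leaf_size),
                right=build(ordered[mid:], depth + 1, leaf_size))


def count(node: Optional[Node]) -> int:
    """Total triangles stored in the leaves of a tree."""
    if node is None:
        return 0
    return len(node.triangles) + count(node.left) + count(node.right)


class Scene:
    """Hypothetical container: static tree retained, dynamic tree rebuilt."""

    def __init__(self, static_tris: List[Triangle]):
        self.static_tree = build(static_tris)    # built once, kept across frames
        self.dynamic_tree: Optional[Node] = None
        self.dynamic_builds = 0

    def begin_frame(self, dynamic_tris: List[Triangle]) -> None:
        # Dynamic geometry is resubmitted and its sub-tree rebuilt every frame.
        self.dynamic_tree = build(dynamic_tris)
        self.dynamic_builds += 1


if __name__ == "__main__":
    def strip(n, z):
        return [((i, 0.0, z), (i + 1.0, 0.0, z), (i, 1.0, z)) for i in range(n)]

    scene = Scene(strip(10, 0.0))
    for frame in range(3):
        scene.begin_frame(strip(6, float(frame)))   # moving object
    print(count(scene.static_tree), count(scene.dynamic_tree), scene.dynamic_builds)
```

The point of the split is cost: only the dynamic geometry pays the per-frame rebuild, so the tree builder's time budget scales with the moving parts of the scene rather than the whole model.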

To read more of this external content and to leave a comment, go to “Fast but not first.” 
