Using sub-RISC processors in next generation customizable multi-core designs: Part 3 - Embedded.com


With TIPI PEs, we can create architectures that are good matches for the application's concurrency. Process-level, data-level, and datatype-level concurrency are all considered. However, because we are building unusual architectures, we cannot use the usual compilation techniques to deploy application software.

TIPI PEs are not standalone elements that execute sequential programs from an instruction memory; they are peers that pass the flow of control among themselves using a signal-with-data abstraction. A C-based programming abstraction will not work for this style of architecture.
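
To make the signal-with-data idea concrete, here is a minimal Python sketch; it is not the TIPI implementation. It assumes hypothetical names (`Transfer`, `run`, and the PE names in the usage line): each PE is modeled as a handler that reacts to an incoming transfer, a control word plus data, and returns the transfers it sends on. Nothing fetches instructions from a memory; control moves only when data moves.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transfer:
    dest: str      # hypothetical: name of the PE that receives the transfer
    control: str   # control word: selects what the receiving PE does
    data: int      # data that travels together with the control


# A PE reacts to (control, data) and emits zero or more outgoing transfers.
Handler = Callable[[str, int], list[Transfer]]


def run(pes: dict[str, Handler], initial: Transfer,
        max_steps: int = 1000) -> list[tuple[str, str, int]]:
    """Deliver transfers until none are pending; return the delivery trace."""
    pending = deque([initial])
    trace: list[tuple[str, str, int]] = []
    while pending and len(trace) < max_steps:  # max_steps guards cyclic flows
        t = pending.popleft()
        trace.append((t.dest, t.control, t.data))
        # The receiving PE, not a program counter, decides where control goes next.
        pending.extend(pes[t.dest](t.control, t.data))
    return trace


if __name__ == "__main__":
    pes: dict[str, Handler] = {
        "pe_a": lambda c, d: [Transfer("pe_b", "double", d + 1)],
        "pe_b": lambda c, d: [Transfer("sink", "store", d * 2)],
        "sink": lambda c, d: [],
    }
    # prints [('pe_a', 'inc', 4), ('pe_b', 'double', 5), ('sink', 'store', 10)]
    print(run(pes, Transfer("pe_a", "inc", 4)))
```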

Simply programming at the level of TIPI operations is not sufficient either. This amounts to micromanaging the datapaths at the pipeline level. Like writing assembly language, it is error prone, difficult to debug, and has poor productivity overall. Designers would much rather work at a higher level, where application concepts like Click push and pull connections are the primary abstractions.

Approaches like NP-Click [4] provide a “programming model” that combines architectural concepts with DSL-like features that are intuitive for the application domain. This is a delicate balance between hiding unimportant architectural details, exposing features that are crucial to performance, and remaining understandable to domain experts.

For one application and one architecture, this is an acceptable compromise. However, it does not adapt easily to architectural design space exploration or to evolving architectural families (e.g., from the IXP1200 to the IXP2800). It is also difficult to add further abstractions for multifaceted applications.

These shortcomings motivate a more general approach. A fundamental tenet of Cairn is that no single abstraction or language covers all aspects of system design. Instead, designers should have access to different abstractions for different facets of the system.

Figure 13.13. Three abstractions for system design.

Cairn is built around not one but three major abstractions, as shown in Figure 13.13 above. These abstractions are designed to support a Y-chart-based design methodology, shown in Figure 13.14 below.

Application developers and hardware architects can work in parallel, using separate abstractions for their tasks. The application and the architecture are brought together in a mapping step to create an implementation. Here, a third abstraction is used to manage the concurrency implementation gap.

Figure 13.14. Cairn Y-chart design flow.

On the architecture side, designers use TIPI to construct PEs. TIPI operations capture the capabilities of individual PEs, and the signal-with-data abstraction captures the process-level concurrency of the platform as a whole. In this section, we describe the Cairn application and mapping abstractions.

The Cairn Application Abstraction
Inspired by systems like Ptolemy II, Cairn uses models of computation and the concept of actor-oriented design as the basis for application abstractions [20]. An actor is a modular component that describes a particular computation, such as a checksum calculation. A model of computation is a set of rules that governs the control and communication within a composition of actors.

Entire applications are described by building compositions of actors, with different models of computation governing different subgroups of actors. Being able to use different models of computation for different facets of the application is important, because no single abstraction is likely to be a good choice for the entire application.

To use high-level application models for implementation, and not just for early exploration, the models must make a precise statement about the requirements of the application. One problem with the Ptolemy approach is that the actors are written in Java. Actors frequently make use of JVM features that are undesirable to implement on embedded platforms for reasons of efficiency and performance. Also, the Java language cannot strictly enforce the rules imposed by a model of computation.

Figure 13.15. Simplified Click IP forwarding application.

A solution to this is to use an actor description language such as Cal [21]. Cal provides a formalism for designing actors based on firing rules. However, Cal actors can still use control structures like conditional statements and loops. As described earlier, TIPI PEs do not necessarily support such control mechanisms.

Therefore, Cairn uses an actor language in which firing rules are restricted to primitive mathematical functions. More complex computational models, such as general recursion, are built using compositions of actors. The Cairn actor language also supports arbitrary bit types for describing datatype-level concurrency within an actor.
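
As an illustration of a firing rule that is a primitive mathematical function over explicit bit widths, the following Python sketch checks an IPv4 header checksum, the job of the CheckIPChecksum actor, using 16-bit ones'-complement arithmetic. It is not Cairn actor-language syntax; the function names are hypothetical, and it assumes a 20-byte header without options, so the computation is a fixed-depth tree of additions with no loops or data-dependent branches.

```python
MASK16 = 0xFFFF


def add16(a: int, b: int) -> int:
    """16-bit ones'-complement add: a 17-bit sum with the carry folded back in."""
    s = (a & MASK16) + (b & MASK16)
    return (s & MASK16) + (s >> 16)


def check_ip_checksum(header: tuple[int, ...]) -> bool:
    """True if the ten 16-bit words of an option-free IPv4 header sum to 0xFFFF."""
    # Unpacking fixes the input shape: exactly ten 16-bit words (20 bytes).
    w0, w1, w2, w3, w4, w5, w6, w7, w8, w9 = header
    # Fixed-depth reduction tree; ones'-complement addition is commutative and
    # associative, so this grouping gives the same sum as a sequential one.
    s = add16(add16(add16(w0, w1), add16(w2, w3)),
              add16(add16(w4, w5), add16(w6, w7)))
    s = add16(s, add16(w8, w9))
    return s == MASK16
```

For the textbook header 4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7 the function returns True; changing the TTL/protocol word from 4011 to 3f11 without updating the checksum makes it return False.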

For network processing applications, Cairn implements a model of computation based on Click. In this model, actors are equivalent to Click elements, and interaction between actors uses Click's push and pull communication semantics. This abstraction is well suited to modeling the data plane of a networking application.
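
The difference between push and pull can be sketched in a few lines of Python. The classes below are loosely modeled on Click's Queue, Counter and ToDevice elements but are hypothetical simplifications, not Click's C++ API or Cairn actors: on a push connection the upstream element initiates the transfer, on a pull connection the downstream element asks for a packet, and a queue converts one into the other.

```python
from collections import deque
from typing import Optional


class Queue:
    """Push input, pull output: holds packets until downstream asks for one."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: deque[bytes] = deque()
        self.drops = 0

    def push(self, pkt: bytes) -> None:
        # Upstream initiates the transfer; a full queue drops the packet.
        if len(self.packets) < self.capacity:
            self.packets.append(pkt)
        else:
            self.drops += 1

    def pull(self) -> Optional[bytes]:
        # Downstream initiates the transfer; an empty queue yields nothing.
        return self.packets.popleft() if self.packets else None


class Counter:
    """Push element: counts each packet, then pushes it downstream."""

    def __init__(self, downstream: Queue) -> None:
        self.downstream = downstream
        self.count = 0

    def push(self, pkt: bytes) -> None:
        self.count += 1
        self.downstream.push(pkt)


class ToDevice:
    """Pull element: fetches a packet from upstream whenever the device is ready."""

    def __init__(self, upstream: Queue) -> None:
        self.upstream = upstream
        self.sent: list[bytes] = []

    def run_once(self) -> bool:
        pkt = self.upstream.pull()
        if pkt is None:
            return False
        self.sent.append(pkt)
        return True
```

Pushing three packets through a Counter into a Queue of capacity 2 drops one; ToDevice then pulls the two stored packets, and its third pull finds the queue empty.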

Cairn provides a graphical editor for assembling actor-oriented models. Just as in the C++ implementation of Click, designers assemble actors from a preconstructed library. To date, Cairn has actors for only a small portion of the Click library. Converting Click elements from C++ to Cairn is straightforward, but an important concern is to correctly model the data-level and datatype-level concurrency of each element, which is difficult to see in the C++ implementation. However, most users are not expected to need to extend the actor library.

A simplified IPv4 routing application model using the Click model of computation is shown in Figure 13.15 above. This figure shows one input chain and one output chain, where packets enter and leave a router. Normally, several such chains would be connected to the central route lookup actor. This actor classifies packets according to a routing table and sends them along the proper output chains. The graphical syntax in this model matches that given in the Click documentation.
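
In conventional IP forwarding, the route lookup actor's classification step is a longest-prefix match. The following Python sketch is a functional reference model only, using the standard `ipaddress` module and a linear scan over a hypothetical table format of (prefix, output port) pairs; it says nothing about how Cairn implements or maps this actor, and dedicated lookup engines such as those in [23] and [24] use far more efficient structures.

```python
import ipaddress
from typing import Optional

RouteTable = list[tuple[ipaddress.IPv4Network, int]]  # (prefix, output port)


def make_table(entries: list[tuple[str, int]]) -> RouteTable:
    return [(ipaddress.IPv4Network(prefix), port) for prefix, port in entries]


def route_lookup(table: RouteTable, dst: str) -> Optional[int]:
    """Return the output port of the longest matching prefix, or None."""
    addr = ipaddress.IPv4Address(dst)
    best: Optional[tuple[ipaddress.IPv4Network, int]] = None
    for net, port in table:
        # Keep the matching entry with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best is not None else None
```

With entries 0.0.0.0/0 → 0, 10.0.0.0/8 → 1 and 10.1.0.0/16 → 2, address 10.1.2.3 goes to port 2, 10.2.0.1 to port 1, and anything else to the default port 0.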

Designers can perform functional simulation and debugging of the application model as it is being built. Cairn provides an automatic code generation tool that converts the application model into a bit-true C++ simulator that runs on the development workstation. Design space exploration can be done at the application level to make sure that functional requirements are met.

Model Transforms
As in Cal, Cairn uses model transforms to prepare high-level application models for implementation. Model transforms perform syntax checking and type resolution, and check that a composition of actors is valid under the rules of a particular model of computation. The transforms then replace the abstract communication and control semantics with concrete implementations based on transfers.

Push and pull connections are transformed into transfer-passing connections. Each firing rule of each actor is inspected to determine where the flow of control can come from and where it goes next. This information can be determined statically from the structure of the Click diagram. The model transform then extends the firing rules so that they compute outgoing transfers in addition to performing packet header computations.

For example, in Figure 13.15 above, the FromDevice and CheckIPChecksum actors are connected by a push link. The FromDevice actor has one firing rule that receives a packet from off-chip and writes the header onto its output port.

After this, computation is supposed to continue by executing the CheckIPChecksum actor's firing rule. FromDevice's firing rule is therefore extended to compute the corresponding control word for CheckIPChecksum. This control word and the header data make up a transfer that will be sent to the PE that implements CheckIPChecksum.
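
The transformation just described can be sketched as a higher-order function in Python. This is a minimal illustration under stated assumptions, not Cairn's model transform: firing rules are modeled as functions on an integer header, push links are a hypothetical dictionary from each actor to its single push successor, and the control word is represented simply by the successor's name.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transfer:
    control: str   # hypothetical control word: names the next firing rule to run
    data: int      # packet header bits carried with the control word


Rule = Callable[[int], int]                       # header in, header out
ExtendedRule = Callable[[int], list[Transfer]]    # header in, transfers out


def transform_push_links(rules: dict[str, Rule],
                         push_links: dict[str, str]) -> dict[str, ExtendedRule]:
    """Extend each firing rule so it also emits the transfer for its push successor."""

    def extend(name: str, rule: Rule) -> ExtendedRule:
        successor = push_links.get(name)  # known statically from the diagram

        def extended(header: int) -> list[Transfer]:
            result = rule(header)
            if successor is None:
                return []  # no push successor in this graph, so no transfer
            return [Transfer(successor, result)]

        return extended

    # extend() binds each name/rule pair separately, avoiding late-binding bugs.
    return {name: extend(name, rule) for name, rule in rules.items()}
```

Given rules for FromDevice and CheckIPChecksum and the single link FromDevice → CheckIPChecksum, the extended FromDevice rule returns one transfer addressed to CheckIPChecksum that carries the header.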

The reappearance of the signal-with-data abstraction at the application level is not a coincidence. The concept of blocks that communicate by passing transfers describes a fundamental capability of the architecture. By describing the application in the same way, we can make a formal comparison between the capabilities of the architecture and the requirements of the application.

Mapping Models
Designers use a mapping abstraction to assign application components to PEs in the architecture. In Cairn, this is done by making mapping models. Every PE in the architecture has one mapping model. After the appropriate model transform is applied to the application model, the resulting transformed actors are simply assigned to PEs using a drag-and-drop interface. One-to-one and many-to-one mappings are supported. Mappings are currently made manually, but in the future an automated optimization process may be used to help find a good assignment [22].

A mapping model is a high-level description of the software that is to run on each PE. The goal is to make the PE behave like the union of the actors that are mapped to it. When a PE receives a transfer, it begins executing a program that corresponds to one of the firing rules of one of the mapped actors. Since a firing rule is a primitive mathematical function, these programs are finite-length schedules of TIPI operations without loops or jumps.
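
A mapping model can be pictured as a per-PE dispatch table. In this hypothetical Python sketch, which is not Cairn's mapping format, the mapping assigns each actor to a PE (several actors may share one PE), and the control word in an incoming transfer selects which mapped firing rule runs, so each PE behaves like the union of its actors.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

FiringRule = Callable[[int], int]
RuleId = tuple[str, str]  # (actor name, firing rule name)


@dataclass(frozen=True)
class Transfer:
    control: RuleId  # hypothetical control word: which mapped firing rule to run
    data: int


def build_pe_programs(actors: dict[str, dict[str, FiringRule]],
                      mapping: dict[str, str]) -> dict[str, dict[RuleId, FiringRule]]:
    """Collect every firing rule of every actor mapped to each PE."""
    programs: defaultdict[str, dict[RuleId, FiringRule]] = defaultdict(dict)
    for actor, rules in actors.items():
        pe = mapping[actor]  # one-to-one or many-to-one
        for rule_name, rule in rules.items():
            programs[pe][(actor, rule_name)] = rule
    return dict(programs)


def pe_receive(programs: dict[str, dict[RuleId, FiringRule]], pe: str,
               transfer: Transfer) -> int:
    """On a transfer, the PE runs the program selected by its control word."""
    return programs[pe][transfer.control](transfer.data)
```

Mapping actors A and B to pe0 and C to pe1 gives pe0 two programs; a transfer to pe0 whose control word is ("B", "fire") runs only B's rule.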

Thus, each program can be described as a TIPI macro operation. Programming a TIPI PE is the process of converting actor firing rules into macro operations and placing the resulting machine code into the PE's microcode memory.
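
The following Python sketch shows what a macro operation looks like as data: a fixed list of cycles, each issuing a few operations in parallel, executed start to finish with no branches. The PE here is hypothetical (an 8-bit register file with `add`, `xor` and `and` operations) and does not correspond to any TIPI PE from the text.

```python
from typing import Callable

MASK8 = 0xFF  # hypothetical 8-bit datapath

# Primitive operations of a hypothetical PE: two register reads, one write.
OPS: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK8,
    "xor": lambda a, b: a ^ b,
    "and": lambda a, b: a & b,
}

Op = tuple[str, int, int, int]   # (operation, dst, src1, src2)
Cycle = list[Op]                 # operations issued together in one cycle


def run_macro_op(schedule: list[Cycle], regs: list[int]) -> list[int]:
    """Execute a straight-line schedule: every cycle runs once, in order."""
    regs = list(regs)
    for cycle in schedule:
        # All operations in a cycle read register values from the start of
        # that cycle, then write back together (dst registers assumed distinct).
        results = [(dst, OPS[op](regs[s1], regs[s2])) for op, dst, s1, s2 in cycle]
        for dst, value in results:
            regs[dst] = value
    return regs
```

For example, the two-cycle schedule [[("add", 3, 0, 1), ("and", 4, 0, 1)], [("xor", 3, 3, 2)]] applied to registers [3, 5, 6, 0, 0] computes r3 = (r0 + r1) xor r2 and, in parallel during the first cycle, r4 = r0 and r1, yielding [3, 5, 6, 14, 1].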

Code Generation
Designers do not convert actors into machine code by hand. This would be slow and error prone, making effective design space exploration impossible.

Instead, an automatic code generation tool is provided. Actor firing rules are processed one at a time. The tool takes as input a netlist of the PE architecture, a netlist representing the firing rule computations, and a total cycle count constraint that bounds the length of the program schedule. A combination of symbolic simulation and Boolean satisfiability is used to find a solution.

A solution is not guaranteed to exist, since TIPI PEs can be highly irregular and may simply be incapable of performing the specified computations. If this occurs, there is a serious mismatch between the application and the architecture, and designers must explore alternative mappings or try modifying the PE architecture. Once the mapping models are made and code generation is performed, designers have a complete model of the hardware and software of the system.
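
The shape of this code generation problem, finding a schedule within a cycle bound that makes a PE compute a given function, can be illustrated at toy scale. The Python sketch below substitutes plain exhaustive enumeration for the symbolic simulation and Boolean satisfiability used by the actual tool, and the PE (4-bit registers, one `and`/`xor`/`add` operation per cycle) is hypothetical. Returning `None` corresponds to the case where no solution exists within the bound.

```python
from itertools import product
from typing import Callable, Optional

WIDTH = 4                       # hypothetical 4-bit datapath (keeps search tiny)
MASK = (1 << WIDTH) - 1
OPS: dict[str, Callable[[int, int], int]] = {
    "and": lambda x, y: x & y,
    "xor": lambda x, y: x ^ y,
    "add": lambda x, y: (x + y) & MASK,
}
Instr = tuple[str, int, int, int]   # (op, dst, src1, src2); one per cycle

# r0 = a and r1 = b are inputs; r2 and r3 are scratch; r2 holds the result.
INSTRS: list[Instr] = [
    (op, dst, s1, s2)
    for op in OPS
    for dst in (2, 3)
    for s1, s2 in product(range(4), repeat=2)
]


def simulate(schedule: tuple[Instr, ...], a: int, b: int) -> int:
    regs = [a, b, 0, 0]
    for op, dst, s1, s2 in schedule:
        regs[dst] = OPS[op](regs[s1], regs[s2])
    return regs[2]


def implements(schedule: tuple[Instr, ...], target: Callable[[int, int], int]) -> bool:
    # Check every possible input pair (feasible only at toy bit widths).
    return all(simulate(schedule, a, b) == target(a, b)
               for a, b in product(range(1 << WIDTH), repeat=2))


def find_schedule(target: Callable[[int, int], int],
                  max_cycles: int) -> Optional[tuple[Instr, ...]]:
    """Shortest schedule of at most max_cycles cycles computing target, or None."""
    for length in range(1, max_cycles + 1):
        for schedule in product(INSTRS, repeat=length):
            if implements(schedule, target):
                return schedule
    return None
```

The toy PE has no OR operation, yet a search for a | b with a bound of three cycles succeeds, because (a & b) ^ (a ^ b) equals a | b; with a bound of one cycle the search returns None, mirroring a mismatch between the requested computation and the PE.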

The concept of having independent models of the application, the architecture, and the mapping is the core of the Cairn approach. A clear model of the application's requirements and the architecture's capabilities helps designers understand what needs to be done to cross the implementation gap. This comparison also provides insight into how the application or the architecture can be changed to make the implementation gap smaller.

Potential improvements can be explored using the feedback paths in the Y-chart. The application and the architecture can be modified separately, and different mapping strategies can be tried and evaluated quickly. Performing this design space exploration is critical to finding an implementation that meets performance goals.


Next in Part 4: IPv4 Forwarding Design Example
To read Part 1, go to Concurrent architectures, concurrent applications
To read Part 2, go to Generating the architecture from the instruction set

Used with the permission of the publisher, Newnes/Elsevier, this series of four articles is based on copyrighted material from “Sub-RISC Processors,” by Andrew Mihal, Scott Weber and Kurt Keutzer in Customizable Embedded Processors, edited by Paolo Ienne and Rainer Leupers. The book can be purchased online.

Paolo Ienne is a professor at the Ecole Polytechnique Federale de Lausanne (EPFL), and Rainer Leupers is professor for software for systems on silicon at RWTH Aachen University. Kurt Keutzer is professor of electrical engineering and computer science at the University of California, Berkeley, where Andrew Mihal was a Ph.D. candidate and Scott Weber received a Ph.D. for work related to the design and simulation of sub-RISC programmable processing elements.

References:
[1] N. Shah. Understanding Network Processors. Master's thesis, University of California, Berkeley, California, 2001.

[2] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click modular router. ACM Transactions on Computer Systems (TOCS), 18(3):263-297, Aug. 2000.

[3] B. Chen and R. Morris. Flexible control of parallelism in a multiprocessor PC router. In Proceedings of the 2002 USENIX Annual Technical Conference, 2002, pp. 333-346.

[4] N. Shah, W. Plishker, K. Ravindran, and K. Keutzer. NP-Click: A productive software development approach for network processors. IEEE Micro, 24(5):45-54, Sep.-Oct. 2004.

[5] C. Sauer, M. Gries, and S. Sonntag. Modular domain-specific implementation and exploration framework for embedded software platforms. In Proceedings of the Design Automation Conference, 2005, pp. 254-259.

[6] P. Paulin, C. Pilkington, E. Bensoudane, M. Langevin, and D. Lyonnard. Application of a multi-processor SoC platform to high-speed packet forwarding. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (Designer Forum), 2004, vol. 3, pp. 58-63.

[7] K. Ravindran, N. Satish, Y. Jin, and K. Keutzer. An FPGA-based soft multiprocessor for IPv4 packet forwarding. In Proceedings of the 15th International Conference on Field Programmable Logic and Applications, 2005, pp. 487-492.

[8] C. Kulkarni, G. Brebner, and G. Schelle. Mapping a domain specific language to a platform FPGA. In Proceedings of the 41st Design Automation Conference, 2004, pp. 924-927.

[9] Tensilica, Inc. Tensilica Xtensa LX processor tops EEMBC networking 2.0 benchmarks, May 2005.

[10] I. Sourdis, D. Pnevmatikatos, and K. Vlachos. An efficient and low-cost input/output subsystem for network processors. In Proceedings of the First Workshop on Application Specific Processors, 2002, pp. 56-64.

[11] G. Hadjiyiannis, S. Hanono, and S. Devadas. ISDL: An instruction set description language for retargetability. In Proceedings of the 34th Design Automation Conference, 1997, pp. 299-302.

[12] A. Fauth, J. Van Praet, and M. Freericks. Describing instruction set processors using nML. In Proceedings of the European Design and Test Conference, 1995, pp. 503-507.

[13] A. Hoffmann, H. Meyr, and R. Leupers. Architecture Exploration for Embedded Processors with LISA. Kluwer, 2002.

[14] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau. EXPRESSION: A language for architecture exploration through compiler/simulator retargetability. In Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, 1999, pp. 485-490.

[15] R. Leupers and P. Marwedel. Retargetable code generation based on structural processor descriptions. Design Automation for Embedded Systems, 3(1):75-108, Jan. 1998.

[16] M. Gries, K. Keutzer, H. Meyr, and G. Martin. Building ASIPs: The MESCAL Methodology. Springer, 2005.

[17] S. Weber and K. Keutzer. Using minimal minterms to represent programmability. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2005, pp. 63-66.

[18] Free Software Foundation, Inc. GNU Multiple Precision Arithmetic Library. http://www.swox.com/gmp.

[19] S. Weber, M. Moskewicz, M. Gries, C. Sauer, and K. Keutzer. Fast cycle-accurate simulation and instruction set generation for constraint-based descriptions of programmable architectures. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2004, pp. 18-23.

[20] E. Lee. Embedded software. In Advances in Computers, M. Zelkowitz, editor, Academic Press, 2002, pp. 56-99.

[21] E. Willink, J. Eker, and J. Janneck. Programming specifications in CAL. In Proceedings of the OOPSLA Workshop on Generative Techniques in the Context of Model Driven Architecture, 2002.

[22] Y. Jin, N. Satish, K. Ravindran, and K. Keutzer. An automated exploration framework for FPGA-based soft multiprocessor systems. In Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2005, pp. 273-278.

[23] T. Henriksson and I. Verbauwhede. Fast IP address lookup engine for SoC integration. In Proceedings of the IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, 2002, pp. 200-210.

[24] D. Taylor, J. Lockwood, T. Sproull, J. Turner, and D. Parlour. Scalable IP lookup for Internet routers. IEEE Journal on Selected Areas in Communications, 21:522-534, May 2003.
