UML versus Domain Specific Languages
By Mark Dalgarno and Matthew Fowler
Embedded.com
(09/19/08, 10:19:00 PM EDT)
With C++ and Java failing to deliver significantly improved developer productivity over their predecessors, it's no surprise that around 40% of developers are already using or are planning to use code generation approaches to tackle the problem of code complexity.

There are, by now, many case studies of the successful application of code generation tools and technologies. Allowing developers to raise the level of abstraction over that supported by general-purpose programming languages is the best bet for development organisations wishing to address the productivity problem.

Although there are other approaches to raising the level of abstraction - such as framework evolution, or even programming language evolution - code generation, because it is flexible and fast, has the advantage of being able to adapt to new environments relatively quickly.

Here we consider the two most popular starting points for code generation: UML for program modelling (part of the OMG's Model Driven Architecture (MDA) approach), and Domain-Specific Languages (little languages that are created specifically to model some problem domain).

As well as introducing both approaches, the aim is to offer advice on their usefulness for real-world development. We also ask whether they are mutually exclusive or if in some circumstances it can make sense to combine them.

UML and MDA
Experience of using UML as a modelling language is widespread, and so using UML to express what is required in a system and generating code from that is acceptable for many organisations. 'Code generation' in UML originally meant a very low level of generation - converting classes on a diagram into classes in C++ or Java.
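
As a rough illustration of this low level of generation, the sketch below turns a minimal in-memory class model into Java source text. The names `Attribute`, `ModelClass` and `generate_java_class` are hypothetical and are not taken from any UML tool; a real tool would read the model from a diagram or a model repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    type_name: str


@dataclass
class ModelClass:
    """Hypothetical stand-in for a class drawn on a class diagram."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def generate_java_class(model: ModelClass) -> str:
    """Emit one Java class with one private field per modelled attribute."""
    lines = [f"public class {model.name} {{"]
    for attr in model.attributes:
        lines.append(f"    private {attr.type_name} {attr.name};")
    lines.append("}")
    return "\n".join(lines)


customer = ModelClass(
    "Customer",
    [Attribute("name", "String"), Attribute("creditLimit", "int")],
)
java_source = generate_java_class(customer)
print(java_source)
```

The mapping is one-to-one: the model says nothing that the generated class does not also say, which is why this style of generation adds little on its own.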

Experience has shown that this level of modelling does not give any business benefit when applied to complete systems. However, by using more specialised or abstract modelling elements it is possible to increase the amount of generation.

This approach was adopted by the OMG in 2001 as part of its MDA standard. MDA was developed to enable organisations to protect their software investment in the face of continually changing technology 'platforms' (languages, operating systems, interoperability solutions, architecture frameworks and so on). If the design and implementation are tied to the platform, then a platform change means a complete rewrite of a software system.

To avoid this, MDA proposed to separate 'the specification of system functionality from the specification of the implementation of that functionality on a specific technology platform'.

The specification of system functionality is a Platform Independent Model (PIM); the specification on a particular platform is a Platform-Specific Model (PSM). The PSM can be annotated by developers to provide advice or guidance for the final code 'text' generation step - which creates the source code and configuration files.
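
The PIM-to-PSM-to-text chain can be sketched in a few lines. This is only an illustration under stated assumptions: the classes `PimEntity`, `PsmTable` and `PsmColumn`, the type map, and the choice of a relational database as the 'platform' are all hypothetical, and the code does not use any MDA standard transformation language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PimEntity:
    """Platform-independent: attributes use abstract types only."""
    name: str
    attributes: Dict[str, str]  # attribute name -> abstract type


@dataclass
class PsmColumn:
    name: str
    sql_type: str


@dataclass
class PsmTable:
    """Platform-specific: here the assumed platform is a SQL database."""
    name: str
    columns: List[PsmColumn] = field(default_factory=list)


# Assumed mapping from abstract PIM types to one platform's types.
SQL_TYPE_MAP = {"Text": "VARCHAR(255)", "Integer": "INTEGER", "Money": "DECIMAL(12,2)"}


def pim_to_psm(entity: PimEntity, type_map: Dict[str, str]) -> PsmTable:
    """Model-to-model step: map each abstract type to a platform type."""
    columns = [PsmColumn(name, type_map[abstract])
               for name, abstract in entity.attributes.items()]
    return PsmTable(entity.name.lower(), columns)


def psm_to_text(table: PsmTable) -> str:
    """Model-to-text step: emit the DDL for the platform-specific model."""
    body = ",\n".join(f"    {c.name} {c.sql_type}" for c in table.columns)
    return f"CREATE TABLE {table.name} (\n{body}\n);"


invoice = PimEntity("Invoice", {"reference": "Text", "total": "Money"})
invoice_table = pim_to_psm(invoice, SQL_TYPE_MAP)
# A developer annotates the PSM before the final text generation step.
invoice_table.columns[0].sql_type = "CHAR(10)"
ddl = psm_to_text(invoice_table)
print(ddl)
```

Moving to a different platform would mean supplying a different type map and text generator, while the `PimEntity` stays as it is.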

To reap the business benefits of this approach, the PIM must survive platform change and be reusable across platforms. The implications are twofold. The first is that models become first-class artifacts in the development process, rather than being ignored after a certain point; if you change the PIM, the functionality of the delivered system will change.

The second is that code generation becomes important; mapping the PIM to the PSM by hand is costly and error prone, whereas automatic mapping to the PSM can significantly reduce the cost of a transition to a new or upgraded platform.

MDA defines a set of standards for transforming models that was finally completed in 2007. These standards are well supported in the telecoms and defence sectors, where there is a history of investing in development tools as part of large projects. In the commercial world, the lack of standards led to companies supporting the 'model driven' approach (MDD - development, MDE - engineering etc.) using a variety of tools to transform UML models into working systems - 'pragmatic MDA', as it was called.

The industry position of UML also means that developers can choose from a wide variety of vendors for their MDA tooling. Furthermore, vendors typically provide additional products based on the MDA approach, reducing the investment for an individual company to adopt MDA.

However, there are some issues in the use of MDA. First is the expression of detailed business logic. While 90-95% of a commercial information system can be generated from a UML model, there is a point where the business logic is not general and so not amenable to a code generation solution. There are two approaches to expressing the business logic.

The 'purist' approach is to model the business logic; one of the MDA specifications covers this approach. The 'pragmatic' approach is to leave holes in the generated application for the hand-written business logic; this is most popular where there is a rich, standardised development environment, like Java or C# and .NET.
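
One common way to realise the 'pragmatic' approach is to generate an abstract base class and keep the hand-written logic in a subclass, so that regeneration never overwrites it. The sketch below assumes that arrangement; `OrderServiceBase`, `OrderService` and the discount rule are hypothetical examples, not the output of any particular MDA product.

```python
from abc import ABC, abstractmethod


# --- Generated code: recreated from the model, never edited by hand ---
class OrderServiceBase(ABC):
    def place_order(self, quantity: int, unit_price_pence: int) -> int:
        """Generated plumbing: validate, price, then call the hand-written hole."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        total = quantity * unit_price_pence
        return total - self.discount(quantity, total)

    @abstractmethod
    def discount(self, quantity: int, total_pence: int) -> int:
        """The 'hole': business logic too specific to generate."""


# --- Hand-written code: lives in a separate file and survives regeneration ---
class OrderService(OrderServiceBase):
    def discount(self, quantity: int, total_pence: int) -> int:
        # Assumed business rule: 10% off orders of ten items or more.
        return total_pence // 10 if quantity >= 10 else 0


service = OrderService()
print(service.place_order(10, 500))
```

Because the base class is abstract, forgetting to fill a hole is caught as soon as the service is instantiated rather than at some later call.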

Another issue is the low level of UML and the looseness (or generality, to put a positive slant on it) of its semantics: a common criticism is that UML is too big and vague to be effective. This assumes that the only 'code generation' possible is the very low-level code generation described earlier - the assumption is that UML can't express more abstract or specialised concepts.

But this criticism ignores UML's profile feature. 'Pragmatic MDA' vendors use this to specialise UML. To do this, they define profiles so developers can create models with a more specialised terminology and associated data. On top of that, vendors add their own validation to tighten up the UML semantics. The result is a domain-specific subset of UML, if you like.
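
The idea of a profile plus vendor-added validation can be sketched as follows. This is a simplified model for illustration only: `Stereotype`, `UmlClass` and `validate` are hypothetical names, and real profiles are defined inside a UML tool rather than in code like this.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Stereotype:
    """A profile entry: specialised terminology plus the data it requires."""
    name: str
    required_tags: List[str]


@dataclass
class UmlClass:
    name: str
    stereotype: Stereotype
    tagged_values: Dict[str, str] = field(default_factory=dict)


def validate(element: UmlClass) -> List[str]:
    """Extra validation layered on top of UML: every required tag must be set."""
    return [
        f"{element.name}: <<{element.stereotype.name}>> requires tagged value '{tag}'"
        for tag in element.stereotype.required_tags
        if tag not in element.tagged_values
    ]


ENTITY = Stereotype("Entity", ["table", "primaryKey"])
customer = UmlClass("Customer", ENTITY, {"table": "customers"})
errors = validate(customer)
print(errors)
```

A generator can then rely on the stereotype and its tagged values being present, which is what lets it produce more than a one-to-one class mapping.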

Using UML profiles gives as much expressive power as DSLs: stereotyped classes typically equate to the concepts of a DSL, and stereotyped relationships equate to the relationships of a graphical DSL. In other words, either approach can express concepts of arbitrary levels of abstraction.

There are two main problems with using UML with profiles to define new modelling languages: with current UML tools it is usually hard to remove parts of UML that are not relevant or need to be restricted in a specialised language; and all the diagram types have restrictions based on the UML semantics.

For example, New Technology/enterprise (NT/e) is in the process of building a graphical DSL for a novel middleware product. The key to this is being able to model methods as first-class model elements.

In theory we should be able to do this using action diagrams, but in practice too much other baggage is dragged along with them. As we will see below, DSLs are built from the ground up, so the modeller is not confronted with extraneous UML semantics or modelling elements.

Despite this, defining a high-level UML profile has historically been the best commercial approach to realising MDA. Producing a new profile is relatively cheap. On the marketing front, the installed base of UML tools and the understanding of the practice and benefits of modelling mean MDA products can be positioned as 'add-ons' rather than a completely new paradigm.

Domain-Specific Languages
Although DSLs and DSL tools have been around for a while now, it is only in the past few years that interest in this area has really taken off - partly in response to Microsoft's entry into this space with its DSL Tools for Visual Studio.

As noted above, DSLs are little languages that can be used to directly model concepts in specific problem domains. These languages can be textual, like most programming languages, or graphical. Underpinning each DSL is a domain-specific code generator that maps domain-specific models created with the DSL into the required code.
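
To make this concrete, here is a minimal sketch of a textual DSL and its generator. The state machine language, its syntax, and the functions `parse` and `generate_c` are all invented for this example; the generated C is likewise only one of many possible targets.

```python
DSL_SOURCE = """
state idle
state running
idle -> running on start
running -> idle on stop
"""


def parse(source):
    """Turn DSL text into a model: a list of states and a list of transitions."""
    states, transitions = [], []
    for raw in source.splitlines():
        words = raw.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "state" and len(words) == 2:
            states.append(words[1])
        elif len(words) == 5 and words[1] == "->" and words[3] == "on":
            # Stored as (source state, event, target state).
            transitions.append((words[0], words[4], words[2]))
        else:
            raise SyntaxError(f"cannot parse line: {raw!r}")
    return states, transitions


def generate_c(states, transitions):
    """Domain-specific generator: map the state machine model to C source."""
    lines = ["#include <string.h>", ""]
    lines.append("typedef enum { " + ", ".join(s.upper() for s in states) + " } state_t;")
    lines.append("")
    lines.append("state_t next_state(state_t current, const char *event) {")
    for source, event, target in transitions:
        lines.append(
            f'    if (current == {source.upper()} && strcmp(event, "{event}") == 0) '
            f"return {target.upper()};"
        )
    lines.append("    return current;")
    lines.append("}")
    return "\n".join(lines)


states, transitions = parse(DSL_SOURCE)
c_source = generate_c(states, transitions)
print(c_source)
```

The four DSL lines contain only domain vocabulary (states, events, transitions); everything about C is confined to the generator.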

One way to think of how to use a (graphical) DSL is to imagine a palette containing the boxes and lines that correspond to key concepts and relationships in your problem domain.

Modelling with this palette involves selecting the concepts you wish to model and 'painting' them onto your canvas. 'Painting' different types of lines between these concepts can then create different types of relationships between the concepts.

An advantage of the DSL approach is that the modelling environment can constrain and validate a created model for the domain's semantics, something that is not possible with UML profiles.
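
The palette-and-canvas picture, with the environment enforcing the domain's rules, might be sketched like this. The control-system domain, the `Canvas` class, and the `PALETTE` and `ALLOWED_LINKS` tables are assumptions made for illustration and do not describe any specific DSL tool.

```python
# The palette: the only concept types a modeller can place on the canvas.
PALETTE = {"Sensor", "Controller", "Actuator"}

# Domain semantics: which concept types may be connected, and in which direction.
ALLOWED_LINKS = {("Sensor", "Controller"), ("Controller", "Actuator")}


class Canvas:
    def __init__(self):
        self.concepts = {}  # concept name -> concept type
        self.links = []     # (source name, target name)

    def paint(self, concept_type, name):
        """Place a concept from the palette onto the canvas."""
        if concept_type not in PALETTE:
            raise ValueError(f"{concept_type} is not in the palette")
        self.concepts[name] = concept_type

    def connect(self, source, target):
        """Draw a line, but only if the domain allows this relationship."""
        pair = (self.concepts[source], self.concepts[target])
        if pair not in ALLOWED_LINKS:
            raise ValueError(f"a {pair[0]} cannot be connected to a {pair[1]}")
        self.links.append((source, target))


canvas = Canvas()
canvas.paint("Sensor", "thermometer")
canvas.paint("Controller", "thermostat")
canvas.paint("Actuator", "heater")
canvas.connect("thermometer", "thermostat")
canvas.connect("thermostat", "heater")
```

Here an invalid model cannot be drawn at all: an attempt to connect the thermometer directly to the heater is rejected at the moment the line is painted.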

Tools to support the definition of DSLs and domain-specific code generators have been around for a while now but have been far less commonly available than MDA-based toolsets, with only one or two vendors offering mature products.

Given this, many developers using DSLs have chosen to go down the road of implementing their own generators with varying degrees of success due to the complexity of this type of work.

This is now changing with the increasing availability of tooling to support DSL and generator creation, both from companies such as MetaCase and Microsoft and as part of the Eclipse Modeling Framework. To some extent these have reduced the skill levels required to create DSLs and domain-specific generators.

Which to use?
Given that both approaches now have momentum behind them in the form of vendor support, successful case studies and increasing industry awareness, the question arises for developers of which approach to adopt (assuming developers are completely open-minded!).

Perhaps the first thing to note is that developers in organisations, or supply-chains, where use of UML or Microsoft technologies is mandated may find it politically difficult to choose a 'competing' approach.

Modelling and code generation is just one part of the software life cycle, albeit an important part, and must fit in with the rest of the organisation's tooling and processes.

Similarly, in industry sectors such as real-time systems engineering, where intensive work has already been undertaken to support the particular modelling needs and constraints of the sector (with development of the SysML customisation of UML, for example), developers may not find it cost-effective to create their own unique UML profiles or DSLs that don't take advantage of this prior work.

As noted above, a basic DSL can be produced using UML profiles and this will often be a viable and relatively quick approach for a first-time code generator.

However, the baggage that UML brings to the problem can confuse novice modellers; to avoid this, generator developers may choose to directly proceed to building their own DSLs - either with tool support or in a completely bespoke manner.

It's also worth mentioning that in many cases software systems can only be implemented with multiple modelling languages and code generators addressing different aspects of the overall problem.

There's nothing to stop developers, who on the whole are a pragmatic bunch, from using a hybrid approach that combines UML with DSLs to create solutions that draw on the strengths of each approach, and indeed this is what some organisations, such as NT/e, have done very successfully.

So what is the outlook for the industry? It is our belief that as a basis for modelling for code generation, UML tools - in their current form - will gradually lose 'market share' to DSLs: DSLs can be more direct, appealing and easier to use for a wider range of users.

However, UML vendors, with their strong background in code generation approaches, can compete by adding a non-UML modelling and metamodelling 'surface'. Combined with their tool's existing UML features, this would make an attractive combination product for many companies.

Mark Dalgarno is a Partner at Software Acumen, and Matthew Fowler is founder and CEO of New Technology/enterprise.