On the difference between data and object models


A recurring challenge in the development of an object-oriented system is mapping the object model (in particular, the domain model) with a relational database. This is a well-known problem referred in literature with the “impedance mismatch” term [2,3]. It is a problem because we are trying to merge two paradigms which encourage very different programming styles. It is not surprising to see how often the communities of data modelers and object modelers clash when they talk each others about the importance/superiority of one model with respect to the other. Usually, these discussions don’t produce much value because (more often than not) they originated by several misunderstandings. If we do not understand the very different nature of the two paradigms, we will not be able to effectively integrate them in current software development methods and applications. Hence, the idea of this post is to compare each paradigm in order to better understand how they differ. My assumption is that the two models will coexist for many years (perhaps until either object-oriented databases will became mainstream or another paradigm will outpace the object-oriented development style).

Obviously the space in a blog is not sufficient to provide a complete treatment. The interested reader can deepen the arguments discussed here in the books suggested at the end of this post.

Relationships and navigability

Of course, the most evident difference between the two paradigms concerns how they handle the relationships between entities. The relational model is based on the set theory whereas the object model is quite close to the graph theory. This fact has several implications which affect how relationships are dealt with. As described by Jimmi Nilsson [1], in the relational model, relationships are formed by duplicated values. Consider, for example, the ORDER entity which has the primary key ORDER.Id: such key is duplicated as a foreign key in any children entity, letting the child rows “point” to the parents. Thus, everything in a relational model is data, even the relationships. In an object-oriented model, relationships are implemented using the built-in object identifiers, letting the parent have references to object identifiers of its children.

Navigability is also affected by the different nature of these two models. A graph can be oriented, so in an object model the navigation between two objects is one way. Bidirectional navigation is achieved by two separate links (relations). Conversely, in the relational model the navigation is always bidirectional. Moreover, the relational model is set-based, hence almost every operation on the database deals with sets. When we want to navigate in the relational model, we execute a query to retrieve all children that have a foreign key with the same value of the primary key of the parent. What we have in return is a set of data. Another way of navigating the relational model is to perform a joint between both the parent set and the children set. In an object model the typical way to navigate through the instances is to traverse the relationships between the objects (as in a graph). With objects, we deal always with only an object (instance) at a time.

Data encapsulation

This different way to implement relationships affects also the “privacy” of the data. Every data in the relational model is “global”, while in an object model we usually keep data encapsulated and protected in agglomerates called aggregates. Every aggregate should keep secret its internal state, exposing only the public services used to manipulate it.

Static vs. dynamic nature

Relational models concern only data, whereas object models take into account also the dynamic nature of object (objects, differently than entities, expose a behavior implemented by methods).

Type system

Using both a relational database and an object model, we deal with two different type systems. This is evident when we look at the physical models. Often the two types systems are located in different machines and even it is not so, they live in different address spaces. Moreover, not even the primitive types are the same. A string in .NET can have a variable length whereas in Microsoft SQL Server is typically a varchar or a text or a char. Again according to [1], a DateTime in .NET has the precision of 100 ns, whereas in SQL Server the precision is only of 3/1000 s. Yet another example: in the object type system it is not possible to assign a primitive type to an object instance, and viceversa. Thus it is not possible to set an int to null; however, it is perfectly legal to store a null value in an int in SQL Server.

Clearly, these differences show us that when we are developing a complete software system we are dealing with two distinct development environments: the application programming environment and the database programming environment. In the latter the typical programming language is a declarative query language, whereas in the former the languages are usually imperative (often either procedural or object-oriented). Declarative query languages have no control constructs, nor variables, thus they are not well suited for complete application development. Hence, programmers write applications in a programming language with its own data structures, and use a query language to transfer data back and forth between the application programming environment and the database programming environment. Because these two environments adopt different paradigms (with the differences highlighted above), we deal with the impedance mismatch problem.

Due to their intrinsic characteristics, the object model is usually the best place for coding the business logic. The translation between the object model and the data model can be done manually (e.g. with design patterns such as Repository or Query Object) or automatically, with an Object-Relational Mapping (ORM) tool such as Hibernate or Cayenne. Moving the business layer in the data model is typically an anti-pattern. We are forced to work in the database programming environment where the programming languages are not expressive enough to provide the support needed to code business rules and complex business logic. Moreover, we lack also the support of testing tools and techniques which is commonly available in the application programming environment (assertions, unit testing, integration testing, and system testing frameworks).

Bibliography

[1] Jimmy Nilsson, “Applying Domain-Driven Design and Patterns”, Addison Wesley, 2007

[2] Roderic Geoffrey Galton Cattell, “Object Data Management: Object-Oriented and Extended Relational Database Systems”, Addison Wesley, 1991

[3] François Bancilhon and David Maier, “Multilanguage Object-Oriented Systems: New Answers to Old Database Problems?”, in Future Generation Computer, K. Fuchi and L. Kott Eds, North-Holland, 1988

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s