Data modelling: start it early!

Whenever I mention data modelling on a project, everyone seems to assume I am talking about database design.

In my experience, on the vast majority of projects, both the Business and the service provider leave data modelling until they are at the screen design stage. At best, they might start modelling data at the System Use Case (SUC) level.

This approach only leads to problems. If you are lucky, the problems will arise during the design of the technical solution. If you are unlucky, they will arise six months after the system goes live and the database starts under-performing. Try fixing a badly designed DB on a live system! Have fun with that.

Some of you might be wondering what the problem is, so here is an example.

Suppose you are an analyst and you are modelling a SUC called “Amend Customer Details”. That SUC was identified at an earlier stage of the change initiative because on the business models there is a similarly named business process. During that SUC there will be a step where the System provides Customer data to the Actor and a step where the Actor amends some of that data. So we might record the following attributes:

  • Customer: Forename
  • Customer: Surname
  • Customer: Date of Birth

Now imagine a colleague is working on a SUC called “Amend Patient Details” and ends up recording the following attributes:

  • Patient: Forename
  • Patient: Surname
  • Patient: Date of Birth

Now suppose those data requirements are taken forward and a database is designed with a “Customer” table and a “Patient” table.

The problem in my imaginary scenario is that the business processes were modelled by interviewing Subject Matter Experts (SMEs) from different departments of the Business: Sales and Service Delivery. Neither department cares too much about what the other department does because they are all too busy. The Business is a health insurance provider. From the point of view of the Sales Department, you are a customer, but from the point of view of the Service Delivery Department, you are a patient because you only appear on their radar when you get sick and actually need the healthcare you have paid for.

But you are still you. You are a single entity.

Yet because the data analysis was left too late, and then approached in a silo fashion, the system designers think you are two separate entities and business data is unnecessarily duplicated across the database.

The Sales Department might not care about your status as a patient and the Service Delivery Department might not care about your status as a customer, but from the overall perspective of the Business as a whole, you are not two things. It is this overall business perspective that matters in data modelling, not the departmental view, or the individual SUC view, or (horror of horrors) the individual screen view.

Conceptual Data Modelling (modelling the data entities and the relationships between them) as a minimum should be completed alongside the business process models otherwise the overall model of the business is not complete. Yet most companies spend a fortune on business process models and nothing at all on CMDs.

So what?

So if you are modelling a business process called “Amend Contract”, how can you properly understand the process if you don’t know what a Contract is from the perspective of the Business and how a contract relates to other entities, such as Customer, Agent or Product?

The answer is that you cannot.

No “buts”. Stop it. Stop saying “But…” There are no “buts”.

You cannot fully understand a process if you do not understand the entities the process affects and the relationships between those entities. In order to gain that understanding, you need a Conceptual Data Model.

If you get to the stage of a change initiative where high level requirements are being drafted, or a Request for Proposal (RFP) is being issued and there is no CDM in place, something is horribly wrong and both the Business and the service provider will suffer for it later.

I know because in my less experienced years I was involved in projects where no CDM was in place and I know because as a more experienced analyst I have raised a flag and been ignored and watched people suffer the pain later.

Ideally, a Logical Data Model will also be in place. The LDM is the CDM taken to a lower level of granularity. A CDM only shows entity names (Customer, Contract, etc.) and how they relate to each other, but not their attributes. Attributes (Forename, Surname, etc.) appear on a Logical Data Model, which can then be normalised. If the LDM is not in place before the project starts, then it needs to be put in place early on.

With a LDM in place, all data requirements exposed during requirements gathering can be validated against it. If during the investigation of some particular functional requirement, a SME says that a certain attribute is required, that candidate data item should be validated against the LDM. If there is not a corresponding attribute on the LDM, then one of two things is likely:

  • The data item is unnecessary to the Business (despite what the individual SME might think)
  • The LDM is incorrect and must be reviewed and amended

There is a further level of data analysis that should be done during the requirements stage. For each required attribute, certain characteristics should be recorded in a data dictionary. All such characteristics relate to the business need only and not to any technical solution. Examples:

  • Type
  • Maximum length
  • Whether there are any pre-set values (e.g., in the case of Mr, Mrs, Dr, etc.)

Click here to download a simple example of such a data dictionary in Excel format (it’s on the worksheet titled ‘Data’).

With a robust CMD, LDM and data dictionary in place (in that order), you will provide the database designer with all the information needed to design an efficient and sturdy database.

With none of those things in place (or at best, just a data dictionary), you will end up either with a very shaky database design (in the case of lazy designers) orΒ  a lot of work for the designers as they try to reverse-engineer the entity relationships from the data requirements (and they will not get it 100% right).

To iterate: our understanding of data must start with the business perspective, not the technical solution. That seems obvious to me, and I hope it is obvious to you now, but I have lost count of the meetings I have been in when both the customer and the solution provider get into a deep discussion about database design without doing even rudimentary logical data analysis.

If you find yourself modelling business needs and you start talking about database design, stop, go down to the fishmonger, buy yourself a Flounder, or other equally sustainable fish, return to the meeting room, sit down and then smack yourself hard on the face with the fish. You will find that this will awaken you to the importance of modelling data from the Business perspective first.

A Flounder yesterday

Related posts:

2 comments to Data modelling: start it early!

  • Caitlin Dean

    Well said Declan πŸ™‚ And a good reminder for me that what may seem obvious and unquestionable (to me, to you…) may still encounter resistance in projects. Of course, the flounder brings it all home πŸ˜‰

  • Fred


    It’s somewhat depressing to hear that this even needs saying. But even as I’m typing this, I know it does.


Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>