Why can't I just expose entities via API?

After talking to hundreds of developers, I realised that there is huge frustration when additional levels of indirection and mappings are used to produce an API - we have to write extra code and extra tests, and it often looks like evil boilerplate. Booyhay!

Let’s look at a simple example of the kind you can usually find in online tutorials.

We have a User repository:

interface UserRepository extends CrudRepository<User, String> {
	
}

And a Controller that calls the repository. User is automatically exposed and marshalled to JSON:

@RequestMapping(method = RequestMethod.GET, produces = MediaType.APPLICATION_JSON_VALUE)
public Iterable<User> users() {
    // CrudRepository.findAll() returns Iterable<User>, not Collection<User>
    return userRepository.findAll();
}

So simple!

Then someone comes along and says that you have to explicitly map User to UserRepresentation and return a Collection of UserRepresentations. Or UserDTOs or UserBeans, if you like. Sadly, you have to map most of its fields - firstName, lastName, email etc. Hence the question in the title: why can’t I just expose entities via API?

Let me reveal a few architectural forces that justify indirection as a necessary evil.

API and data model change for different reasons

Let’s say you have a mobile client. The UI dictates the data that needs to be fetched from the API. Different screens may require different user representations - for the home screen you may need only firstName and lastName to say welcome; for protected screens - firstName, lastName and email; for the user profile - all user information. This happens in any non-trivial application.

I’ve mentioned protected screens intentionally. The trivial solution would be to produce a fat endpoint that exposes everything the screens may ever need. But there are a few problems with this approach:

  • You waste time on unnecessary data transfer;
  • You invite data leakage. It’s very common to see sensitive data like emails, personal IDs and agreement IDs exposed as part of a public API because it’s already exposed somewhere privately. Total disaster.

So you can’t really send a fat User once and for all; you have to produce different representations / endpoints depending on the context and the use case.
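
A minimal sketch of what context-specific representations could look like, assuming Java 16+ records. The personalId field and all type and method names here (WelcomeView, AccountView, UserViews) are hypothetical, and User is a plain stand-in for the persistent entity:

```java
// Hypothetical stand-in for the entity; personalId represents any sensitive field.
record User(String firstName, String lastName, String email, String personalId) {}

// Home screen: just enough to say welcome.
record WelcomeView(String firstName, String lastName) {}

// Protected screens: name plus email, still no personalId.
record AccountView(String firstName, String lastName, String email) {}

final class UserViews {
    private UserViews() {}

    static WelcomeView forHomeScreen(User user) {
        return new WelcomeView(user.firstName(), user.lastName());
    }

    static AccountView forProtectedScreen(User user) {
        return new AccountView(user.firstName(), user.lastName(), user.email());
    }
}
```

Each endpoint returns only the view it needs, so a field such as personalId cannot leak by accident: it would have to be added to a view explicitly.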

For some screens, a one-to-one mapping from screen to data storage will always be sufficient, and this observation can lead you to dangerous conclusions. You may even end up with two approaches - no mapping for simple screens and indirection for “corner cases”. But remember that consistency always wins, even if it comes with overhead.

  • What if you need to get some user information from cache?

  • What if user data is not stored solely in the primary DB?

  • What if you model your domain in a way that doesn’t really fit client screens?

That’s where the simplistic approach starts to crack, and you either start introducing workarounds such as UI-specific @Transient fields or bring an additional level of indirection to the table.
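
To illustrate the second option, here is a rough sketch of a representation assembled from more than one source. Everything in it is an assumption made for the example: ProfileView, ProfileAssembler and the lastSeen value are hypothetical, and the two maps merely stand in for a repository and a cache:

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the entity stored in the primary DB.
record User(String id, String firstName, String lastName) {}

// What the profile screen needs; lastSeen does not live in the User table.
record ProfileView(String fullName, String lastSeen) {}

final class ProfileAssembler {
    private final Map<String, User> primaryDb;       // stand-in for a repository
    private final Map<String, String> lastSeenCache; // stand-in for a cache

    ProfileAssembler(Map<String, User> primaryDb, Map<String, String> lastSeenCache) {
        this.primaryDb = primaryDb;
        this.lastSeenCache = lastSeenCache;
    }

    // Combines both sources into one client-facing representation.
    Optional<ProfileView> profileOf(String userId) {
        User user = primaryDb.get(userId);
        if (user == null) {
            return Optional.empty();
        }
        String lastSeen = lastSeenCache.getOrDefault(userId, "unknown");
        return Optional.of(new ProfileView(user.firstName() + " " + user.lastName(), lastSeen));
    }
}
```

The entity stays free of UI-specific fields, and the client never learns where each piece of data came from.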

Be conservative in what you expose

It’s so easy to expose all the data you have without considering client needs, without understanding the client app’s behaviour and without talking to client developers, especially in theory.

Remember what Postel’s law says:

Be conservative in what you send, be liberal in what you accept

Besides the official reasons, there is one more good reason for that - once you have exposed something, it’s very hard to undo. This starts to matter a lot when the client and the backend have different development and release cycles. Just think about an iOS client app developed by an outsourcing partner. You’re refactoring the User entity. You press Find Usages in your IDE on the phoneNumber field and find nothing, because the entity gets auto-marshalled to JSON. Can you rename the field or change its type? You can’t know until you figure out whether it’s used by the client app and how.

So now you have to either dive into Objective-C / Swift code or talk to the 3rd party folks.

OK, maybe I will rename it once I have more time.

In other words - never.

Wouldn’t it be cool to make the published interface explicit and be able to immediately distinguish between code that can be easily changed and code that requires extra care?

Obviously.

Is the extra mapping worth the effort?

Definitely.
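
As a small illustration of an explicit published interface, consider the sketch below. It assumes Java 16+ records; the mobileNumber rename and the UserMapper class are hypothetical, and UserRepresentation is the name used earlier in this article:

```java
// Internal entity (stand-in): the field was renamed from phoneNumber
// to mobileNumber during a refactoring.
record User(String firstName, String mobileNumber) {}

// Published representation: clients still see "phoneNumber".
// Everything declared here is part of the API and requires extra care.
record UserRepresentation(String firstName, String phoneNumber) {}

final class UserMapper {
    private UserMapper() {}

    // The single place where the entity meets the published contract.
    static UserRepresentation toRepresentation(User user) {
        return new UserRepresentation(user.firstName(), user.mobileNumber());
    }
}
```

The rename touches the entity and one line of the mapper; the JSON that clients receive keeps the same field name.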

By the way, if you apply Murphy’s law:

Anything that can go wrong, will go wrong

to the API, you can answer the question immediately, even without looking into the iOS client code:

Anything that can be used, will be used

And one more thing - Consumer-Driven Contracts help you understand which parts of your API are used by which clients. CDCs also make it easier to apply TDD during API development, but that’s another story.

Respect Transaction Boundaries

And the last thing - persistent entities usually rely on proxies to some extent, most often for lazy loading. That’s why you want to make sure that entities are accessed only within transaction boundaries. Once you have a transaction management strategy in place, define clear transaction boundaries and keep entities within them, you will never have to worry about lazy initialisation exceptions or Open Session in View (OSIV).
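
The idea can be sketched with a toy model. Note that this is not a real ORM: Session and LazyOrders are hypothetical classes that only imitate a lazy proxy that fails once its session is closed, and in a Spring application the boundary would more likely be a service method annotated with @Transactional:

```java
import java.util.List;
import java.util.function.Supplier;

// Toy stand-in for a persistence session / transaction; not a real ORM API.
final class Session {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Toy lazy association: loads on first access, but only while the session is open.
final class LazyOrders {
    private final Session session;
    private final Supplier<List<String>> loader;
    private List<String> loaded;

    LazyOrders(Session session, Supplier<List<String>> loader) {
        this.session = session;
        this.loader = loader;
    }

    List<String> get() {
        if (loaded == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException("lazy load outside of transaction");
            }
            loaded = loader.get();
        }
        return loaded;
    }
}

// Stand-in for an entity with a lazily loaded association.
record User(String firstName, LazyOrders orders) {}

// Plain data, safe to hand to the web layer after the transaction ends.
record UserOrdersView(String firstName, List<String> orders) {}

final class UserService {
    private UserService() {}

    // Everything that touches the entity happens before the session is closed.
    static UserOrdersView loadOrdersView() {
        Session session = new Session();
        try {
            User user = new User("Ada",
                    new LazyOrders(session, () -> List.of("order-1", "order-2")));
            return new UserOrdersView(user.firstName(), List.copyOf(user.orders().get()));
        } finally {
            session.close();
        }
    }
}
```

Because only UserOrdersView leaves the service method, serialising the response can never trigger a lazy load after the boundary has closed.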

Conclusion

I hope I gave you some food for thought and, like I always say, nothing is black and white. For example, some NoSQL solutions like Cassandra encourage denormalisation and expect you to store data exactly the way it’s queried. There is also the CQRS pattern, which suggests using a separate data model for reads. Both solutions reduce the need for levels of indirection, but do not completely eliminate it. So don’t be afraid of some “boilerplate”.
