Friday, June 22, 2012

Data (Repository) as a Service

In a couple of systems I support, the database is used not only as a persistence service, but also as the single source of truth and a trusted channel for inter-system communication. In the Java world, JDBC is the official passport to the database, no matter how many roles that database plays. But my question is: is there a better way to provide a data service with business value, rather than just a connection to the database?

That's the reason I'd like to explore the possibilities of Data as a Service in enterprise computing. Note that this is about software architecture and design, not infrastructure as a service offerings like Amazon RDS, Amazon DynamoDB, Amazon SimpleDB, Google BigQuery or OpenStack RedDwarf.

The data access layer hasn't changed much since the early days of Java. From a Java statement
DriverManager.getConnection("jdbc:oracle:thin:@//localhost:1521/myDB", "scott", "tiger");

to hibernate.properties
hibernate.connection.driver_class = org.postgresql.Driver
hibernate.connection.url = jdbc:postgresql://localhost/mydatabase
hibernate.connection.username = myuser
hibernate.connection.password = secret

to JNDI datasource
<local-tx-datasource>
  <jndi-name>TEST</jndi-name>
  <connection-url>jdbc:jtds:sqlserver://localhost:2423/TEST</connection-url>
  <driver-class>net.sourceforge.jtds.jdbc.Driver</driver-class>
  <user-name>user</user-name>
  <password>password</password>
</local-tx-datasource>

to Spring applicationContext.xml
<bean id="dataSource" destroy-method="close" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="${jdbc.driverClassName}"/>
  <property name="url" value="${jdbc.url}"/>
  <property name="username" value="${jdbc.username}"/>
  <property name="password" value="${jdbc.password}"/>
</bean>

to JPA persistence.xml
<property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver" />
<property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost/test" />
<property name="javax.persistence.jdbc.user" value="user" />
<property name="javax.persistence.jdbc.password" value="password" />

As you can see, for more than a decade we've simply handed applications a bump key to the database and let them do whatever they want. This reminds me of the setters in a JavaBean, or the Anaemic Domain Model. The database takes no business responsibility and depends entirely on how the data access layer is implemented. Thanks to generics, we get CRUD without implementing these operations for each entity, and we hand all of them to the service layer before we even know what it needs. Not to mention the methods in a generic DAO that can execute arbitrary SQL statements.

So what's Data (Repository) as a Service? It's the encapsulation of a database behind business-related services that are accessible via a REST interface (as opposed to SQL-like statements exposed RESTfully, as in Google BigQuery or JPA-RS). Applications, as clients, define the requirements for the data repository to implement, and use these services to achieve business-oriented objectives.
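To make the contrast with a generic DAO concrete, here is a minimal sketch of what such a business-level contract might look like. Everything in it is hypothetical: OrderDataService, its two operations and the in-memory implementation are names I made up for illustration, and a real service would sit behind REST endpoints rather than a Java interface.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business-level contract: clients see business operations,
// not tables, SQL, or generic CRUD methods.
interface OrderDataService {
    String placeOrder(String customerId, BigDecimal amount);
    List<String> findOrderIds(String customerId);
}

// In-memory stand-in for the repository that would live behind the REST endpoints.
class InMemoryOrderDataService implements OrderDataService {
    // Each element is {orderId, customerId}.
    private final List<String[]> orders = new ArrayList<>();

    @Override
    public String placeOrder(String customerId, BigDecimal amount) {
        // The business rule lives in the data service, not in every client.
        if (amount.signum() <= 0) {
            throw new IllegalArgumentException("Order amount must be positive");
        }
        String orderId = "ORD-" + (orders.size() + 1);
        orders.add(new String[] {orderId, customerId});
        return orderId;
    }

    @Override
    public List<String> findOrderIds(String customerId) {
        List<String> ids = new ArrayList<>();
        for (String[] order : orders) {
            if (order[1].equals(customerId)) {
                ids.add(order[0]);
            }
        }
        return ids;
    }
}
```

Note that there is no save(entity), no delete(id) and no executeQuery(sql): a client can only do what the business contract allows.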

What's the impact on application development? Well, if you already apply a multi-tiered software design, the only impact is in the service layer. There is no more data access; instead there is a trustworthy, highly available and fast-responding data service. More of the challenges, however, come from the data service itself. Here are some points I have in mind.

Transaction
Services provided by the data repository are transactional. The client receives the result of a committed or rolled-back transaction but has no control over the transaction. In other words, clients sit outside the transaction boundaries. The data repository may have operations that involve a Transaction Script or even two-phase commit, but clients have no knowledge of these operations either.
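A rough sketch of that boundary, assuming a hypothetical AccountDataService with an in-memory map in place of a real database: the "transaction" is simulated by working on a copy and swapping it in on success, and the client only ever sees a TransferResult.

```java
import java.util.HashMap;
import java.util.Map;

// What the client gets back: the outcome, never the transaction itself.
final class TransferResult {
    final boolean committed;
    final String message;

    TransferResult(boolean committed, String message) {
        this.committed = committed;
        this.message = message;
    }
}

// Hypothetical data service; the map stands in for the real database.
class AccountDataService {
    private Map<String, Long> balances = new HashMap<>();

    synchronized void openAccount(String id, long cents) {
        balances.put(id, cents);
    }

    synchronized long balanceOf(String id) {
        return balances.get(id);
    }

    // The whole operation is one unit of work owned by the service.
    synchronized TransferResult transfer(String from, String to, long cents) {
        Map<String, Long> working = new HashMap<>(balances); // "begin"
        try {
            adjust(working, from, -cents);
            adjust(working, to, cents);
            balances = working;                               // "commit"
            return new TransferResult(true, "committed");
        } catch (IllegalStateException e) {
            // "rollback": the working copy is simply discarded
            return new TransferResult(false, "rolled back: " + e.getMessage());
        }
    }

    private static void adjust(Map<String, Long> state, String id, long delta) {
        Long current = state.get(id);
        if (current == null) {
            throw new IllegalStateException("unknown account " + id);
        }
        if (current + delta < 0) {
            throw new IllegalStateException("insufficient funds in " + id);
        }
        state.put(id, current + delta);
    }
}
```

The client has no begin, commit or rollback to call; whether the service uses a local transaction or two-phase commit underneath is invisible from the outside.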

Timeout and Asynchronous Operations
Query services may have timeout control. If the underlying database fails to return a result set within a specified time (defined by the data repository service, but possibly overridden by the client), the client is notified. Persistence services, on the other hand, may be asynchronous. The client has several ways to get the result of a persistence request, such as a callback, server push or delayed polling.
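Both ideas can be sketched with the standard java.util.concurrent classes. TimedDataService, QueryTimedOutException and the method names below are my own assumptions; the Supplier arguments stand in for the real database calls.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical notification sent to the client when a query takes too long.
class QueryTimedOutException extends RuntimeException {
    QueryTimedOutException(long millis) {
        super("Query did not finish within " + millis + " ms");
    }
}

class TimedDataService {
    private final long defaultTimeoutMillis;
    // Daemon threads so an abandoned slow query cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    TimedDataService(long defaultTimeoutMillis) {
        this.defaultTimeoutMillis = defaultTimeoutMillis;
    }

    // Query with a service-defined timeout that the client may override (null = default).
    <T> T query(Supplier<T> databaseCall, Long clientTimeoutMillis) {
        long timeout = clientTimeoutMillis != null ? clientTimeoutMillis : defaultTimeoutMillis;
        Callable<T> task = databaseCall::get;
        Future<T> pending = pool.submit(task);
        try {
            return pending.get(timeout, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new QueryTimedOutException(timeout);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for query", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Query failed", e.getCause());
        }
    }

    // Asynchronous persistence: the caller gets a future and can attach a callback or poll it.
    <T> CompletableFuture<T> persistAsync(Supplier<T> databaseWrite) {
        return CompletableFuture.supplyAsync(databaseWrite, pool);
    }
}
```

A client could call persistAsync(...).thenAccept(...) for the callback style, or keep the future and check isDone() later for delayed polling. Server push would need a transport such as WebSocket and is not shown here.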

Authorisation
Roles can be used to provide business-level access control for each individual endpoint. These controls remain consistent across multiple applications. Instead of identifying an Administrator in System A and a Manager in System B, the data service defines roles based on the business characteristics and value of data manipulations. For example, no matter which system is interacting with the data repository, an InvoiceViewer is not allowed to request any service that changes an invoice, or anything that is not invoice-related, like placing an order. This is service-level control, as opposed to fine-grained access control solutions (at domain object level, or table level from the DB point of view) like JPA Security.
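The InvoiceViewer example might be sketched like this. Only InvoiceViewer comes from the text above; the other roles, the operation names and ServiceAccessPolicy are hypothetical, and a real service would take the role from the authenticated request rather than a method argument.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Business roles shared by every client system (INVOICE_EDITOR and ORDER_CLERK are made up).
enum BusinessRole { INVOICE_VIEWER, INVOICE_EDITOR, ORDER_CLERK }

// One entry per service endpoint, not per table or domain object.
enum ServiceOperation { VIEW_INVOICE, CHANGE_INVOICE, PLACE_ORDER }

final class ServiceAccessPolicy {
    private static final Map<BusinessRole, Set<ServiceOperation>> ALLOWED =
            new EnumMap<>(BusinessRole.class);

    static {
        ALLOWED.put(BusinessRole.INVOICE_VIEWER, EnumSet.of(ServiceOperation.VIEW_INVOICE));
        ALLOWED.put(BusinessRole.INVOICE_EDITOR,
                EnumSet.of(ServiceOperation.VIEW_INVOICE, ServiceOperation.CHANGE_INVOICE));
        ALLOWED.put(BusinessRole.ORDER_CLERK, EnumSet.of(ServiceOperation.PLACE_ORDER));
    }

    private ServiceAccessPolicy() {
    }

    static boolean isAllowed(BusinessRole role, ServiceOperation operation) {
        return ALLOWED.getOrDefault(role, Collections.<ServiceOperation>emptySet())
                .contains(operation);
    }

    // Called at the top of each endpoint, before any data is touched.
    static void require(BusinessRole role, ServiceOperation operation) {
        if (!isAllowed(role, operation)) {
            throw new SecurityException(role + " may not perform " + operation);
        }
    }
}
```

Because the table lives in the data service, System A and System B get the same answer for the same role.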

Continuous Delivery
One major change in development for applications that share a database is that there are no longer multiple copies or versions of domain objects and DAOs. Traditionally, each application has its own copy of the mapped objects and data operations, makes its own changes, and has to synchronise with the changes made by others. Any change to the data repository service, however, takes effect immediately for all applications. From the application's point of view, there is only one up-to-date database / data service, and no more mismatch between domain objects and the database, or between domain object / DAO versions. We may have multiple versions of the data service running at the same time, but the point here is to isolate detailed database changes behind a consistent service interface, so that no development has to stop just because others made a change that has nothing to do with you.

Performance
More and more applications are now designed with ease of development, testability and maintainability in mind. A web application may be divided into UI and backend, a separation that also accommodates the diversity of user interfaces, like desktop and mobile. That is performance in terms of development and response to change. When it comes to runtime performance, data services should be stateless and shaped like building blocks, especially query services, much like a RISC instruction set: small, efficient operations, with the assembly work left to applications. Because all applications share the same entity manager, caching becomes possible. Where extremely low latency is required, WebSocket can be used to avoid per-request HTTP overhead.

Entities and Value Objects
This is a side benefit of a data repository service. Because JSON is used as the data exchange format, no entity leaks outside the transaction boundaries. By the way, there are data exchange options other than XML and JSON; Google, for example, uses the more strongly typed Protocol Buffers.
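As an illustration of that boundary, here is a small sketch in which a mutable entity is copied into an immutable value object and serialised before it leaves the service. InvoiceEntity and InvoiceView are hypothetical names, and the hand-rolled toJson only escapes quotes and backslashes; a real service would use a proper JSON library.

```java
import java.math.BigDecimal;

// Hypothetical managed entity: mutable, and never sent to a client.
class InvoiceEntity {
    long id;
    String customer;
    BigDecimal total;
    long version; // internal bookkeeping the client has no business seeing

    InvoiceEntity(long id, String customer, BigDecimal total, long version) {
        this.id = id;
        this.customer = customer;
        this.total = total;
        this.version = version;
    }
}

// Immutable value object: a snapshot taken inside the transaction boundary.
final class InvoiceView {
    final long id;
    final String customer;
    final String total;

    InvoiceView(InvoiceEntity entity) {
        this.id = entity.id;
        this.customer = entity.customer;
        this.total = entity.total.toPlainString();
    }

    // Minimal JSON rendering for illustration only.
    String toJson() {
        return "{\"id\":" + id
                + ",\"customer\":\"" + escape(customer) + "\""
                + ",\"total\":\"" + total + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The client receives text, not an object attached to a persistence context, so changing it has no effect on the database unless the client calls another service.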

Database?
Once data services are provided, the database behind them becomes an internal detail. As long as the services respond to requests and return data in the agreed format, it doesn't matter whether the database is an RDBMS or NoSQL. Data services can even provide mock data to support fast prototyping.
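A tiny sketch of that substitutability, with hypothetical names throughout: the client code in CustomerGreeter depends only on the CustomerDataService contract, so a map-backed store (standing in for any real database) and a mock can be swapped freely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract; clients know nothing about what sits behind it.
interface CustomerDataService {
    String customerName(String customerId);
}

// Stand-in for a real backend; it could equally be an RDBMS or a NoSQL store.
class StoreBackedCustomerService implements CustomerDataService {
    private final Map<String, String> store = new HashMap<>();

    void put(String customerId, String name) {
        store.put(customerId, name);
    }

    @Override
    public String customerName(String customerId) {
        String name = store.get(customerId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown customer " + customerId);
        }
        return name;
    }
}

// Mock data for fast prototyping: same contract, canned answers.
class MockCustomerService implements CustomerDataService {
    @Override
    public String customerName(String customerId) {
        return "Test Customer " + customerId;
    }
}

// Client code that works unchanged against either implementation.
class CustomerGreeter {
    static String greet(CustomerDataService service, String customerId) {
        return "Hello, " + service.customerName(customerId);
    }
}
```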

HATEOAS?
No, the data service has no hypermedia controls (sorry Dean, ;-]). It belongs to Richardson Maturity Model Level 2 - HTTP Verbs. The main reason is that a service is not a resource. Placing an order simply persists order data; it has no valid subsequent states of its own, and any such state transitions should live at the application level.

If your database is serving SQL clients, make the change and let it provide a unified and controlled service to business clients.
