Sunday, December 23, 2012

Spring schema declaration

ERROR [ContextLoader] Context initialization failed
org.springframework.beans.factory.parsing.BeanDefinitionParsingException: Configuration problem: Failed to import bean definitions from relative location [Context.xml]
Offending resource: class path resource [Context.xml]; nested exception is org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line xx in XML document from class path resource [Context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'beans'. One of '{"":import, "":alias, "":bean, WC[##other:""]}' is expected.

Caused by: org.springframework.beans.factory.xml.XmlBeanDefinitionStoreException: Line xx in XML document from class path resource [Context.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'beans'. One of '{"":import, "":alias, "":bean, WC[##other:""]}' is expected.

Caused by: org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'beans'. One of '{"":import, "":alias, "":bean, WC[##other:""]}' is expected.

If you get this error, check your xsi:schemaLocation attribute and make sure the schema file has a version number in its location.

Friday, November 09, 2012

Different approach to prevent SQL Injection and XSS

From a security point of view, a system should prevent SQL injection, cross-site scripting and other potential vulnerabilities. But from a developer's or architect's point of view, the approaches to preventing them are quite different.

One might say that as long as we can stop any SQL injection and XSS strings from entering the system (an exhaustive prevention method?), we win. I'd prefer to defer the prevention until as late as possible, just before the code is about to execute. After all, the architect of a system shouldn't try to list all the ways information enters the system, or predict what new ways could appear in the future.

For SQL injection prevention, the only checkpoint should be in the DAO layer. For XSS prevention, the only checkpoint should be in the web presentation layer. These are the ultimate solutions.

Yes, you can check every user input for SQL injection, but what about all the inbound messages and documents from messaging systems, and all the responses from 3rd party web services? As long as the information won't go into a database, drop table is not a problem at all. Likewise, what's the point of preventing <script>alert('attacked')</script> from being printed on a printer?

Conversely, suppose you check all the input for XSS, but by accident your DBA (who knows every detail about SQL injection) inserts <script>alert('attacked')</script> into the database, and this field is presented in browsers. Can your system handle this?

That being said, dump the exhaustive prevention method. drop table is not a threat as long as it never hits your database; <script>alert('attacked')</script> isn't either as long as it's never presented in a script runtime.
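
As a rough sketch of what the two checkpoints could look like in Java: the DAO method binds the value as a statement parameter, and the presentation helper escapes it just before it is written into HTML. The class name LateDefenceSketch, the comments table and both method names are made up for illustration, and the escaping only covers HTML text and attribute values, not every output context.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class LateDefenceSketch {

    // DAO layer: the value is bound as a parameter, never concatenated into the SQL text.
    // The "comments" table is a hypothetical example.
    static void insertComment(Connection connection, String comment) throws SQLException {
        String sql = "INSERT INTO comments (body) VALUES (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, comment);
            statement.executeUpdate();
        }
    }

    // Presentation layer: escape just before the value is written into an HTML page.
    static String escapeHtml(String value) {
        StringBuilder escaped = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': escaped.append("&amp;"); break;
                case '<': escaped.append("&lt;"); break;
                case '>': escaped.append("&gt;"); break;
                case '"': escaped.append("&quot;"); break;
                case '\'': escaped.append("&#39;"); break;
                default: escaped.append(c);
            }
        }
        return escaped.toString();
    }
}
```

With this split, the same string can travel through messaging, web services or a DBA's console untouched; it only gets neutralised at the point where it could actually do harm.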

Wednesday, October 17, 2012

Major Bursary in Science Talent Search 2012

This is the 4th year the boy has joined the game, and he got his 3rd major bursary this time.

If you have an Android phone or tablet, you can try his application here.

Saturday, September 29, 2012

Btrfs is really slow

Although Btrfs is still not the default file system in any mainstream distro, I have two machines using it. I just love its subvolumes, snapshots and COW (copy-on-write).

I'm not talking about everyday operations here: apart from system boot, it's hard to tell the difference between Btrfs and Ext4. But when you do a major update or a distribution upgrade, be aware that it may take much longer than you expect.

I'm doing the upgrade from Ubuntu 12.04 Precise Pangolin to 12.10 Quantal Quetzal Beta 2. It took about half an hour to download the new packages, which depends entirely on your network bandwidth. But it's been 5 hours and the progress bar is just over halfway through the "installing the upgrades" step.

I can expect the upgrade to finish within another few hours, which is not as bad as another user's experience. I don't think this issue will get addressed before Btrfs becomes the default in Fedora or something, but it definitely should be solved before it becomes the default. The question is, chicken or egg?

Tuesday, September 18, 2012

Software imitates nature

In some software projects I have worked on, a web application is divided into two projects during development and two deployable artifacts at runtime. This design reflects the idea that the UI should only be the presentation of the application, while all the logic stays the same in the back-end even when the front-end UI changes. In this case, a lot of communication between front-end and back-end is required, and a lot of problems are caused by this communication. Let me give you an example.

There is a list of items in the browser. The user can change the list by adding or removing items. Each time the list changes, an asynchronous RESTful request is sent to the back-end to update the model. Everything seems fine, but sometimes it behaves strangely: the front-end and the back-end end up with different items in the same list.

We finally figured out that this happens because the items in the list might change before the previous request has finished, and the changes may or may not be sent to the back-end and processed in order. Solving the problem is easy once we know the reason. The developers decided to temporarily prevent the list from being modified until the previous request has finished. Problem solved, well, for this specific front-end.

But if we turn to real life for such a scenario, a helmsman repeating verbal commands is one example of how two parties avoid misunderstanding. Back to the problem: to keep a single source of truth, the back-end can also return the items in the list as the response and require every front-end to update its list to match.

You might wonder why we bother doing the same thing twice. It's because the two solutions have different points of view. Disabling changes in the front-end while waiting keeps one front-end from having a different state from the back-end. Replying with the state in the back-end tells every front-end that what's in the back-end is the only truth. Following the latter is the key, while the former is just a user-friendly add-on.
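
The "repeat the command back" idea can be sketched in Java roughly as follows. The back-end answers every change with its full list, and a front-end simply replaces its own copy with the reply. The class names and the version counter are my own assumptions for illustration; the counter is one possible way to let a front-end ignore replies that arrive out of order, and the original system may have handled that differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListSyncSketch {

    /** Reply sent after every change: the authoritative items plus a version counter. */
    static final class ListState {
        final long version;
        final List<String> items;

        ListState(long version, List<String> items) {
            this.version = version;
            // Defensive copy so later back-end changes do not leak into old replies.
            this.items = Collections.unmodifiableList(new ArrayList<>(items));
        }
    }

    /** Back-end: the single source of truth. Every update answers with the full state. */
    static final class BackEnd {
        private final List<String> items = new ArrayList<>();
        private long version = 0;

        synchronized ListState add(String item) {
            items.add(item);
            return new ListState(++version, items);
        }

        synchronized ListState remove(String item) {
            items.remove(item);
            return new ListState(++version, items);
        }
    }

    /** Front-end: adopts whatever the back-end replies, skipping stale replies. */
    static final class FrontEnd {
        private long version = 0;
        private List<String> items = Collections.emptyList();

        void onReply(ListState reply) {
            if (reply.version > version) {
                version = reply.version;
                items = reply.items;
            }
        }

        List<String> items() {
            return items;
        }
    }
}
```

Any number of front-ends can apply the same onReply rule, which is what makes the back-end reply the key part and the disabled list merely a convenience.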

This is just one example showing that many software problems are really projections of real-life problems that have already been solved. Don't try to re-solve them when you can simply follow patterns that are already proven in nature.

Tuesday, August 28, 2012

A better way to solve Refused Bequest

There is a bad smell called Refused Bequest, which means subclasses don't want or need what they are given by their superclass. According to Refactoring: Improving the Design of Existing Code (by Martin Fowler, Kent Beck, John Brant, William Opdyke and Don Roberts, p.87), there are two ways to solve it. The traditional way is to create a new sibling class and use Push Down Method and Push Down Field to push all the unused methods to the sibling. The other is to use Replace Inheritance with Delegation, if the subclass does not want to support the interface of the superclass.

I have to say I'm shocked by how easily this wording could mislead less experienced developers. How can bad smells in code be solved like taking OTC medicines, without even asking readers to focus on their software design?

Let me start with the 2nd solution. If there is no is-a relationship, there shouldn't be a class hierarchy. That said, removing a hierarchy should only be justified by evidence that the original is-a relationship is no longer valid. Once the hierarchy is removed, the two classes can only interact with each other in the OO way: by sending messages. This is the point. Readers shouldn't be misled into thinking that delegation is a tool for removing inheritance.

The 1st solution has potential problems in any working system. Pushing a method or field from the parent to a newly created sibling, just because one subclass in the inheritance tree doesn't want it? This definitely breaks the contract between the parent class and the rest of the world. A better-engineered way is to introduce a new base class, move all the methods and fields the subclass wants from the original base class to the new base class, and make both the subclass and the original base class extend the new base class. Again, the problem is solved by applying OO design, not by taking OTC medicines.
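
Here is a minimal Java sketch of the "new base class" approach, using made-up classes. PrintableDocument plays the original base class, DraftNote is the subclass that refused print(), and Document is the newly introduced base class. None of these names come from the book.

```java
// New base class: holds only what every subclass, including the reluctant one, wants.
abstract class Document {
    private final String title;

    Document(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }
}

// Original base class: keeps its full contract, so existing callers are unaffected.
class PrintableDocument extends Document {
    PrintableDocument(String title) {
        super(title);
    }

    String print() {
        return "Printing " + title();
    }
}

// The subclass that refused the bequest now extends the new base class instead.
class DraftNote extends Document {
    DraftNote(String title) {
        super(title);
    }
}

// Other subclasses stay exactly where they were.
class Invoice extends PrintableDocument {
    Invoice(String title) {
        super(title);
    }
}
```

Code that already works with PrintableDocument still sees print(), while DraftNote no longer inherits a method it cannot honour.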

I'm not saying that identifying bad smells and refactoring to solve them are of no value. I just want to highlight that a bad smell comes from bad (or outdated) design, and can only be solved by good (or updated) design. Remember, refactoring with books at hand but nothing in mind is also a bad smell.

Monday, July 09, 2012

What Uncle Bob won't tell you about TDD

If you never maintain code (written by yourself or others), please ignore this blog post.

Let's imagine you're about to make some changes to an existing product. The product was developed by strictly following The Three Laws of TDD (Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin, p.122).
First Law: You may not write production code until you have written a failing unit test.
Second Law: You may not write more of a unit test than is sufficient to fail, and not compiling is failing.
Third Law: You may not write more production code than is sufficient to pass the currently failing test.
And you decide to follow exactly the same laws to make the requested changes. The laws become The Three Laws of TDD for Changes:
You may not change production code until you have written a failing unit test, or turned an existing passing unit test into a failing one.
You may not write or change more of a unit test than is sufficient to fail, and not compiling is failing.
You may not change more production code than is sufficient to pass the currently failing test.
So do you check all the related somethingShouldHappenIfThisHappens tests and make them fail one by one for the changes you're working on? Or do you forget the TDD laws, change the production code directly, see which tests turn red, and then review those tests?

TDD helps the design of the outermost layer of an application: ask a junior and a senior developer to TDD the same story and you'll see almost the same public interfaces in production code, but very different implementations inside. But test cases are instances of requirements, and by no means the requirements themselves: ask the same two developers again and you'll very likely see them capture different instances of the same requirement. I can't agree more with Cédric that tests are not specs.

Requirements change every day, so it's easy to try out whether The Three Laws of TDD for Changes work.

Friday, June 22, 2012

Data (Repository) as a Service

In a couple of systems I'm supporting, the database is used not only as a persistence service, but also as the single source of truth and a trusted means of inter-system communication. In the Java world, JDBC is the official passport to the database, no matter how many roles your database has. But my question is: is there a better way to provide a data service with business value, rather than just a connection to the database?

That's basically the reason I'd like to explore the possibilities of Data as a Service in enterprise computing. Note that it's about software architecture and design, not infrastructure as a service like Amazon RDS, Amazon DynamoDB, Amazon SimpleDB, Google BigQuery or OpenStack RedDwarf.

The data access layer hasn't changed much since the beginning of Java. From a Java statement
DriverManager.getConnection("jdbc:oracle:thin:@//localhost:1521/myDB", "scott", "tiger");

to Hibernate properties
hibernate.connection.driver_class = org.postgresql.Driver
hibernate.connection.url = jdbc:postgresql://localhost/mydatabase
hibernate.connection.username = myuser
hibernate.connection.password = secret

to JNDI datasource

to Spring applicationContext.xml
<bean id="dataSource" destroy-method="close" class="org.apache.commons.dbcp.BasicDataSource">
  <property name="driverClassName" value="${jdbc.driverClassName}"/>
  <property name="url" value="${jdbc.url}"/>
  <property name="username" value="${jdbc.username}"/>
  <property name="password" value="${jdbc.password}"/>
</bean>

to JPA persistence.xml
<property name="javax.persistence.jdbc.driver" value="com.mysql.jdbc.Driver" />
<property name="javax.persistence.jdbc.url" value="jdbc:mysql://localhost/test" />
<property name="javax.persistence.jdbc.user" value="user" />
<property name="javax.persistence.jdbc.password" value="password" />

As you can see, for more than a decade we've simply given applications a bump key to the database and let them do whatever they want. This reminds me of the setters in a Java Bean, or the Anaemic Domain Model. The database takes no business responsibility and depends entirely on the implementation of the data access layer. Thanks to generics, we get CRUD without implementing all these operations for each entity. We don't even have to know what the service layer needs before providing all these operations to it. Let alone the methods defined in a generic DAO that can execute arbitrary SQL statements.

So what's Data (Repository) as a Service? It's the encapsulation of a database that provides business-related services (as opposed to RESTful-like SQL statements, as in Google BigQuery or JPA-RS) that are accessible via a REST interface. Applications (as clients) define the requirements they need the data repository to implement, and use these services to achieve business-oriented objectives.

What's the impact on application development? Well, if you already apply multi-tiered software design, the only impact is in the service layer. There is no more data access; instead there is a trustworthy, highly available and fast-responding data service. More challenges, however, come from the data service itself. Here are some points I have in mind.

Services provided by the data repository are transactional. A client receives the result of a committed or rolled-back transaction but has no control over the transaction. In other words, clients sit outside the transaction boundaries. The data repository may have operations that involve Transaction Script or even two-phase commit, but clients have no knowledge of these operations either.

Timeout and Asynchronous Operations
Query services may have timeout control. If the underlying database fails to return a result set within the specified time (defined by the data repository service, but possibly overridden by the client), the client is notified. On the other hand, persistence services may be asynchronous. A client has several ways to get the result of a persistence request, such as a callback, server push or delayed polling.
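
To make the timeout and callback ideas a bit more concrete, here is a small sketch built on java.util.concurrent.CompletableFuture (orTimeout needs Java 9 or later). The class and method names, and the "COMMITTED" / "ROLLED_BACK" outcome strings, are assumptions for illustration only. A real data service would sit behind REST rather than being called in-process like this.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class DataServiceCallSketch {

    /**
     * Query service: runs the query asynchronously and fails the future with a
     * TimeoutException if no result arrives within timeoutMillis.
     */
    static <T> CompletableFuture<T> query(Supplier<T> databaseQuery, long timeoutMillis) {
        return CompletableFuture.supplyAsync(databaseQuery)
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Persistence service: returns immediately and reports the outcome to a callback. */
    static CompletableFuture<Void> persist(Runnable databaseWrite, Consumer<String> callback) {
        return CompletableFuture.runAsync(databaseWrite)
                .handle((ignored, failure) -> failure == null ? "COMMITTED" : "ROLLED_BACK")
                .thenAccept(callback);
    }

    /** True when a failed query was caused by the timeout rather than by the query itself. */
    static boolean isTimeout(Throwable failure) {
        for (Throwable cause = failure; cause != null; cause = cause.getCause()) {
            if (cause instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }
}
```

The client only ever sees a result, a timeout notification or a final outcome; it never holds the transaction itself.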

Roles can be used to provide business-level access control to each individual endpoint. These controls remain consistent across multiple applications. Instead of identifying an Administrator in System A and a Manager in System B, the data service defines roles based on the business characteristics and value of data manipulations. For example, no matter which system is interacting with the data repository, an InvoiceViewer is not allowed to request any service that changes an invoice, or anything not invoice-related, like placing an order. This is service-level control, as opposed to fine-grained (domain object level, or table level from a DB point of view) access control solutions like JPA Security.
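
A service-level check like the InvoiceViewer example could be as simple as a table from roles to the services they may call. The sketch below is only an illustration: apart from the invoice viewer, the role and service names are invented, and Map.of requires Java 9 or later.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class ServiceLevelAccessSketch {

    // Business roles shared by every client application (names are hypothetical).
    enum Role { INVOICE_VIEWER, INVOICE_CLERK, ORDER_CLERK }

    // Services exposed by the data repository (names are hypothetical).
    enum Service { VIEW_INVOICE, UPDATE_INVOICE, PLACE_ORDER }

    // One table for all applications: which role may request which service.
    private static final Map<Role, Set<Service>> ALLOWED = Map.of(
            Role.INVOICE_VIEWER, EnumSet.of(Service.VIEW_INVOICE),
            Role.INVOICE_CLERK, EnumSet.of(Service.VIEW_INVOICE, Service.UPDATE_INVOICE),
            Role.ORDER_CLERK, EnumSet.of(Service.PLACE_ORDER));

    static boolean isAllowed(Role role, Service service) {
        return ALLOWED.getOrDefault(role, EnumSet.noneOf(Service.class)).contains(service);
    }
}
```

Because the table lives in the data service, System A and System B get the same answer for the same role without each defining its own notion of administrator or manager.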

Continuous Delivery
One major change in development when applications share a database is that there are no more multiple copies or versions of domain objects and DAOs. Traditionally, every application has its own copy of mapped objects and data operations. Each makes its own changes and synchronises the changes made by others. Any change to the data repository service, however, takes effect immediately for all applications. From an application's point of view, there is only one up-to-date database / data service, and no more mismatches between domain object and database, or between domain object / DAO versions. We may have multiple versions of the data service at the same time, but the point here is isolating detailed database changes behind a consistent service interface, so that no development has to stop just because others made a change that has nothing to do with you.

More and more applications are now designed with ease of development, testability and maintainability in mind. A web application may be divided into UI and back-end. This separation also addresses the diversity of user interfaces, like desktop and mobile. That is performance in terms of development and response to change. When it comes to runtime performance, data services should be stateless and building-block-like, especially query services, just like RISC's highly efficient instruction set, leaving the assembly tasks to applications. Having all applications share the same entity manager makes caching possible. If extremely low latency is required, use WebSocket to avoid HTTP protocol overhead.

Entities and Value Objects
This is a side benefit of the data repository service. By using JSON for data exchange, no entity gets leaked outside the transaction boundaries. By the way, there are options for data exchange other than XML and JSON; Google, for example, uses the more expressive Protocol Buffers.

By providing data services, the database used becomes an internal detail. As long as the services respond to requests and return data in a certain format, it doesn't matter whether the database is an RDBMS or NoSQL. Data services can even provide mock data to support fast prototyping.

No, the data service has no hypermedia controls (sorry Dean, ;-]). It belongs to Richardson Maturity Model Level 2 - HTTP Verbs. The main reason is that a service is not a resource. Placing an order just persists order data; it has no valid subsequent states of its own, and those should live at the application level.

If your database is serving SQL clients, make the change and let it provide a unified and controlled service to business clients.

Saturday, June 09, 2012

How to install Adobe Flash plugin on Ubuntu 12.04

I switched to 64-bit Precise Pangolin last night and couldn't install the Flash plugin from Firefox or Chrome. This is because flashplugin-installer has an older download URL that doesn't exist anymore.

To solve this, download the latest version of the plugin (adobe-flashplugin_11.2.202.236.orig.tar.gz at the time of writing) and move it to /var/cache/flashplugin-installer/. Then go to the /usr/lib/flashplugin-installer/ folder and execute
sudo ./install_plugin adobe-flashplugin_11.2.202.236.orig.tar.gz

Update: or remove flashplugin-installer and install adobe-flashplugin_11.2.202.236-0precise1_amd64.deb directly.

Thursday, May 17, 2012

One-to-one relationship using EclipseLink

Best Practice in JPA series:
Part 1 – JPA Caching in EclipseLink
Part 2 – One-to-one relationship using EclipseLink

Here are some tips on how to implement a high-performance one-to-one relationship using EclipseLink 2.3.2.

Identify the owning side (the side with the foreign key) and the inverse side (the side with the mappedBy attribute in its @OneToOne annotation). I will use an Owning entity and an Inverse entity as examples in the following tips.

If you define private Inverse inverse; in the Owning entity and you're satisfied with inverse_id as the column name for the foreign key, you don't need to specify @JoinColumn(name = "inverse_id") on it. However, you need to specify @OneToOne(cascade = CascadeType.ALL) on it, so that any operation on the Inverse entity can be performed from the Owning entity. You also need to specify @OneToOne(mappedBy = "inverse") on private Owning owning; in the Inverse entity.

To avoid the N + 1 select problem, specify @BatchFetch(BatchFetchType.EXISTS) on the owning property in the Inverse entity. You can also use @BatchFetch(BatchFetchType.IN) or @BatchFetch(BatchFetchType.JOIN). The following SQL statements will be used respectively for better performance.




If the Owning entity loads the Inverse entity eagerly and the Inverse entity also loads the Owning entity eagerly, the following SQL statement will still be executed N times. Otherwise the N + 1 problem is solved.


fetch = FetchType.LAZY can be set in the @OneToOne annotation on the owning side and / or the inverse side. But it's better to use it on the inverse side, because when the owning object loads the inverse object eagerly (FetchType.EAGER is the default in a one-to-one relationship) by INVERSE_ID, the result can be used to populate the inverse entity cache.

If you want lazy fetching to take effect outside of a Java EE 5/6 application server, the VM argument -javaagent:/home/jerry/.m2/repository/org/eclipse/persistence/eclipselink/2.3.2/eclipselink-2.3.2.jar needs to be set. Note that the full absolute path is used here. See Using EclipseLink JPA Weaving for more details.

Saturday, April 14, 2012

Wildcard Host doesn't work in MySQL 5.5.22

I haven't used MySQL for a while. After reinstalling my laptop with Ubuntu 12.04 Beta 2, I decided to use MySQL instead of PostgreSQL to continue my JPA study.

sudo apt-get install mysql-server

I used MySQL Administrator a long time ago, and it has now been replaced by the powerful MySQL Workbench. If you have difficulties installing it, please follow this tutorial.

Everything seems fine, just like when I developed myTunes 6 years ago, except that % for any host doesn't work any more. So if you get

ERROR 1045 (28000): Access denied for user 'user'@'localhost' (using password: YES)

updating the Host column from % to localhost for your user account and restarting MySQL will solve it.

Update: you can also remove anonymous accounts by executing delete from user where user=''; (Thank you, Dan)

Sunday, March 04, 2012

Everybody Loves FizzBuzz

One of my career objectives is implementing business requirements in a way that makes
  • customer / employer happy
  • developers happy, and
  • infrastructure happy
at the same time. I'd like to take the FizzBuzz puzzle as an example to illustrate how attention to detail helps me achieve this goal in Java.
Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".

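public class FizzBuzz {

    public static void main(String[] args) {
        // Tracks whether "Fizz" and/or "Buzz" was printed for the current number.
        boolean fizzOrBuzz;

        for (int i = 1; i <= 100; i++) {
            fizzOrBuzz = false;

            if (i % 3 == 0) {
                System.out.print("Fizz");
                fizzOrBuzz = true;
            }

            if (i % 5 == 0) {
                System.out.print("Buzz");
                fizzOrBuzz = true;
            }

            // Neither a multiple of three nor of five: print the number itself.
            if (!fizzOrBuzz) {
                System.out.print(i);
            }

            System.out.println();
        }
    }
}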

Update (06/12/2014):

Friday, February 10, 2012

JPA Caching in EclipseLink 2.3.2

Best Practice in JPA series:
Part 1 – JPA Caching in EclipseLink
Part 2 – One-to-one relationship using EclipseLink

Java Persistence API 2.0 defines Level 1 (L1) Cache (Entity Cache), Level 2 (L2) Cache (Shared Entity Cache) and Query (Result) Cache. Now I can take full advantage of the JPA cache, just like I did 5 or 6 years ago with the Hibernate cache. Although Hibernate also has its JPA implementation, I found EclipseLink has better default settings and makes it easier to enable advanced features. Here are some random tips.

The shared attribute of the @Cache annotation has been deprecated and replaced by isolation=CacheIsolationType.SHARED, which means sharing the entity cache between EntityManager objects and allowing the query cache (if enabled) to use the entity cache. Even if you don't set the @Cache annotation on a domain object, this is enabled by default.

CacheIsolationType.PROTECTED means sharing the entity cache between EntityManager objects but disallowing the query cache from using the entity cache, even when the query cache is enabled.

CacheIsolationType.ISOLATED means not sharing the entity cache between EntityManager objects and disallowing the query cache from using the entity cache, even when the query cache is enabled.

The default coordinationType=CacheCoordinationType.SEND_OBJECT_CHANGES in @Cache means any entity update to the database also updates the entity in the cache, which is great for performance.

Set hints = { @QueryHint(name = QueryHints.QUERY_RESULTS_CACHE, value = HintValues.TRUE) } in @NamedQuery if you want to enable the query cache. But unless you have a fixed set of domain objects, don't enable the query cache. Note that "eclipselink.query-results-cache" is not a standard JPA hint; you cannot set it to "True" for a javax.persistence.Query object.

Unless you have every domain object in the entity cache, don't use hints = { @QueryHint(name = QueryHints.CACHE_USAGE, value = CacheUsage.CheckCacheOnly) } in @NamedQuery.

Set <property name="eclipselink.logging.level" value="FINE" /> in META-INF/persistence.xml to show SQL statements.

Unless you want to change the default cache retrieve / store mode, you don't need to set the following properties on the EntityManager object.
em.setProperty(QueryHints.CACHE_RETRIEVE_MODE, CacheRetrieveMode.USE);
em.setProperty(QueryHints.CACHE_STORE_MODE, CacheStoreMode.USE);

Even if you don't set any hint in @NamedQuery, the query result will be used to populate the entity cache.

Only entityManager.find(entityClass, id) can use the entity cache. Getting an entity, or entities, by any field(s) other than id doesn't use the entity cache.

The default sizes of entity cache and query cache are 100 each.

If you get
Exception in thread "main" java.lang.IllegalArgumentException: An exception occurred while creating a query in EntityManager:
Exception Description: Syntax error parsing the query [from Entity], line 1, column 0: unexpected token [from].
Internal Exception: NoViableAltException(33@[])
at org.eclipse.persistence.internal.jpa.EntityManagerImpl.createQuery(
change your JPA query to entityManager.createQuery("select entity from Entity entity").getResultList();

Please feel free to let me know if you have any questions regarding EclipseLink cache.

Tuesday, February 07, 2012

Dependency Lock-in

Update 16/2/2012: I read Apache Camel 2.9 - Reduced dependency on Spring JARs after I wrote the following post, and I recommend reading it as well.

Like version lock-in, dependency lock-in is another thing developers should pay attention to. It happens both when you are developing your own product, and when you are using 3rd party products.

When you're planning your product, it's better not to make any assumptions about how it'll be used. Even if it IS a web application, focus on the core functions first, and later make these core functions accessible via a browser. By thinking and working this way, you will never make one of your functions depend on an HttpServletRequest. A request from the web is just one of the many ways into your product.
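
As a small illustration of keeping a core function free of web types, consider the sketch below. PriceCalculator and PriceWebAdapter are invented names, and a plain Map stands in for the servlet request so that the example needs nothing beyond the JDK. In a real web application the adapter would be the only class that touches HttpServletRequest.

```java
import java.util.Map;

// Core function: knows nothing about HTTP, servlets or any other transport.
final class PriceCalculator {
    static long totalInCents(long unitPriceInCents, int quantity) {
        if (quantity < 0) {
            throw new IllegalArgumentException("quantity must not be negative");
        }
        return unitPriceInCents * quantity;
    }
}

// Web adapter: the only place that translates request parameters into core calls.
// The Map is a stand-in for the request parameters of a real servlet request.
final class PriceWebAdapter {
    static String handle(Map<String, String> requestParameters) {
        long unitPrice = Long.parseLong(requestParameters.get("unitPrice"));
        int quantity = Integer.parseInt(requestParameters.get("quantity"));
        return String.valueOf(PriceCalculator.totalInCents(unitPrice, quantity));
    }
}
```

A messaging listener or a command-line tool could call PriceCalculator in exactly the same way, without dragging a web dependency along.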

On the other hand, things become a bit tricky when you include a 3rd party component, especially when adding a new dependency in Maven. What you can control is the groupId, the artifactId and the version. What that component depends on is usually out of your control. However, it's your responsibility to make sure all the dependencies (and their transitive dependencies) work together happily.

In one of my hobby projects, I want to use Apache CXF to expose core functions as RESTful web services in JSON format, so I included the latest version of Apache CXF Runtime JAX-RS Frontend, org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2. After running mvn eclipse:eclipse, I found my project was locked in to Spring Framework 3.0.6. org.springframework:spring-core:jar:3.0.6.RELEASE is directly included, and org.springframework:spring-web:jar:3.0.6.RELEASE is indirectly included by org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2. Please look at the following dependency tree.

+- org.apache.cxf:cxf-rt-frontend-jaxrs:jar:2.5.2:compile
|  +- org.springframework:spring-core:jar:3.0.6.RELEASE:compile
|  |  +- org.springframework:spring-asm:jar:3.0.6.RELEASE:compile

|  +- org.apache.cxf:cxf-rt-transports-http:jar:2.5.2:compile
|  |  +- org.apache.cxf:cxf-rt-transports-common:jar:2.5.2:compile
|  |  \- org.springframework:spring-web:jar:3.0.6.RELEASE:compile
|  |     +- org.springframework:spring-beans:jar:3.0.6.RELEASE:compile
|  |     \- org.springframework:spring-context:jar:3.0.6.RELEASE:compile
|  |        +- org.springframework:spring-aop:jar:3.0.6.RELEASE:compile
|  |        \- org.springframework:spring-expression:jar:3.0.6.RELEASE:compile

I did decide to use Spring, but I don't want that decision made for me by Apache CXF. Moreover, I'd like to use Spring 3.1.0. It's not hard to solve this problem by modifying pom.xml.



Don't dependency lock-in yourself, and don't dependency lock-in the users of your products.

Happy coding!

Saturday, January 21, 2012

What Ubuntu Unity means to Linux community

Being a long-term Ubuntu Gnome user, I can definitely understand why die-hard hackers hate Unity. But on the other hand, Unity makes some Ubuntu users re-think what they have taken for granted for the last 5 or 6 years and re-examine the world around them.

Let's look at the Linux users who accessed Wikipedia in September and October 2011 to see the huge impact of Unity.

September 2011

October 2011
A controversial product from Apple or Microsoft will hurt their users, while Linux users as a whole will benefit from such a product. And this is the spirit of FOSS.