Monday, November 04, 2013

multiple XML APIs in project

If you see exceptions like the ones below in tests or at runtime, they are caused by multiple XML APIs / implementations on your classpath. How to find and exclude them is beyond the scope of this post.

ERROR [org.apache.catalina.core.ContainerBase.[jboss.web].[localhost].[resteasy-servlet]] (http- Servlet.service() for servlet resteasy-servlet threw exception
java.lang.LinkageError: loader constraint violation: when resolving field "DATETIME" the class loader (instance of org/jboss/classloader/spi/base/BaseClassLoader) of the referring class, javax/xml/datatype/DatatypeConstants, and the class loader (instance of ) for the field's resolved type, javax/xml/namespace/QName, have different Class objects for that type
at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl.&lt;clinit&gt;(

java.lang.IllegalStateException: Failed to load ApplicationContext
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.checkOverrideProperties(
at com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.&lt;init&gt;(
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(
at com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(

Monday, October 07, 2013

Merit Certificate in Science Talent Search 2013

This is the 5th year the boy has attended the competition, and he got a merit certificate this time. He finished his STS journey in primary school with fantastic results. What's gonna happen next? Let's wait and see.

If you have an Android phone or tablet, you may want to try his application here.

Saturday, September 28, 2013

You're What You Depend On

People love 3rd party libraries, especially those from Apache, Google, etc. But I'd like to ask a question: do you really find the vanilla Java SDK that hard to use?

The reason I ask is that I found a dependency on Google Guava in a project, and the only usage of Guava throughout the whole project is:
import static com.google.common.base.Charsets.UTF_8;

Wow! I'm sure the programmer thought the same when he decided to make the project depend on Guava. I'm also sure he was so excited that he didn't even bother to check the magic implementation in Guava:
import java.nio.charset.Charset;
public static final Charset UTF_8 = Charset.forName("UTF-8");

I feel sorry for the programmer, not only because he introduced an unnecessary dependency, but also because he lost a chance to learn from third-party code and master the Java SDK better.
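In fact, since Java 7 the JDK itself ships these constants in java.nio.charset.StandardCharsets, so even the Guava one-liner is unnecessary. A minimal check:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // The JDK constant, available since Java 7, resolves to the same
        // Charset you would get from Charset.forName("UTF-8").
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println(utf8.name());                       // UTF-8
        System.out.println(utf8.equals(Charset.forName("UTF-8"))); // true
    }
}
```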

Thursday, July 11, 2013

MongoDB-based Cache Service

I've talked about using a REST web service to wrap a database and provide a managed repository service. This time, I'd like to discuss building a cache service on top of two convenient MongoDB features.

Before I start, I want to make it clear that the cache service here is not one that tries to push response time down to sub-millisecond. It's for results you hesitate to fetch again and would rather store somewhere: typically a call to a distributed web service (for example, Google's Maps API), or the result of an expensive SQL statement. You want to cache it not only because you don't want to wait a few seconds again, but also to save your usage quota or reduce the workload on a database. In this case, you'll be happy if the response time drops from X seconds to X milliseconds.

Depending on whether it's a single node or a clustered environment, and on the size and type (text or binary) of the cached data, there are quite a few products that can do the job. But when we ask whether the solution can scale up and scale out, the answer becomes less clear. Consider configuring something like Memcached in a four-node cluster and you'll get the idea: you have to explicitly tell each node, "you are in a group, so you share memory or disk with the others".

How about sharing nothing? As long as a node knows about the cache service, it doesn't matter how many other nodes also know about it; they all share the cached data, keyed the same way of course. The cache service then becomes just a couple of HTTP methods (POST and GET) backed by MongoDB. But why MongoDB?
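The share-nothing contract can be sketched as a trivial put/get interface. This is only an illustration with made-up names: a bounded LinkedHashMap stands in for the MongoDB-backed store, and post/get stand in for the HTTP methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the share-nothing cache contract: every node talks to the same
// backing store via POST (put) and GET (get). A size-bounded LinkedHashMap
// stands in here for the capped collection, evicting the oldest entry first.
public class CacheSketch {
    private static final int CAPACITY = 3;

    private final Map<String, String> store =
        new LinkedHashMap<String, String>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> e) {
                // Mimics a capped collection discarding its oldest documents.
                return size() > CAPACITY;
            }
        };

    public void post(String key, String value) { store.put(key, value); } // HTTP POST
    public String get(String key)              { return store.get(key); } // HTTP GET

    public static void main(String[] args) {
        CacheSketch cache = new CacheSketch();
        cache.post("a", "1");
        cache.post("b", "2");
        cache.post("c", "3");
        cache.post("d", "4"); // "a" is evicted, like the oldest capped document
        System.out.println(cache.get("a")); // null
        System.out.println(cache.get("d")); // 4
    }
}
```

The point is that no node needs to know about any other node; coordination lives entirely in the shared store.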

One aspect of a cache is its capacity, in bytes or in number of objects. In MongoDB, you can use a capped collection for this. Create one with
db.createCollection("mycoll", {capped:true, size:100000})
or convert an existing collection with
db.runCommand({"convertToCapped": "mycoll", size: 100000});

The value of the size parameter is in bytes. If you want to know how many documents the capped collection can hold, you need the average document size, which you may not know. If you already have a number of documents, you can run
db.mycoll.stats()
and check the avgObjSize value before converting the collection. Here is an example:
    "ns" : "mydb.mycoll",
    "count" : 7739,
    "size" : 42885120,
    "avgObjSize" : 5541.429125209976,
    "storageSize" : 65724416,
    "numExtents" : 8,
    "nindexes" : 1,
    "lastExtentSize" : 23224320,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 228928,
    "indexSizes" : {
        "_id_" : 228928
    "ok" : 1

If you run stats() on a capped collection, you'll see two more fields in the result:
    "capped" : true,
    "max" : NumberLong("9223372036854775807"),

Another common cache feature is Time To Live (TTL), which specifies when a cached item should be invalidated. In MongoDB, you can create an index on a date field with the expireAfterSeconds option to set the TTL for a collection.
db.mycoll.ensureIndex( { "created": 1 }, { expireAfterSeconds: 3600 } )

Note, however, that the background task that deletes expired documents runs once every 60 seconds, so don't expect this feature to be much more accurate than that. Also, you can't make a collection both size- and time-based (who needs both anyway?).

So next time you design a size-based or time-based cache, would you consider MongoDB?

Thursday, May 02, 2013

Setup VirtualBox after kernel upgrade

If you get something like
Unable to locate package kernel-header-3.5.0-25-generic
after a kernel upgrade, note that on Ubuntu the headers package is called linux-headers. Install the headers for the running kernel and the build tools, then rebuild the VirtualBox kernel modules:
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install build-essential dkms
sudo /etc/init.d/vboxdrv setup
A tip: select "Save the machine state", instead of "Power off the machine", whenever possible.

Wednesday, April 03, 2013

Enable Full Text Search for MongoDB

If you get this in the mongo console
{
    "err" : "text search not enabled",
    "code" : 16633,
    "n" : 0,
    "connectionId" : 1,
    "ok" : 1
}

and this in the mongod console
[conn1] insert test.system.indexes keyUpdates:0 exception: text search not enabled code:16633 locks(micros) w:336411 336ms

you need to enable full text search when starting MongoDB.
mongod --setParameter textSearchEnabled=true

Try again.

What's happening in the background?
[initandlisten] connection accepted from #1 (1 connection now open)
[conn1] build index test.coll { _fts: "text", _ftsx: 1 }
[conn1]     Index: (1/3) External Sort Progress: 3500/6245 56%
[conn1]     Index: (1/3) External Sort Progress: 5400/6245 86%
[conn1]  external sort used : 413 files in 25 secs
[conn1]     Index: (2/3) BTree Bottom Up Progress: 185800/2616966 7%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 401900/2616966 15%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 554200/2616966 21%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 769700/2616966 29%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 973700/2616966 37%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1175400/2616966 44%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1380700/2616966 52%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1588900/2616966 60%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1794900/2616966 68%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1936500/2616966 73%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2125800/2616966 81%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2315800/2616966 88%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2528700/2616966 96%
[conn1]  done building bottom layer, going to commit
[conn1] build index done. scanned 6245 total records. 160.617 secs
[conn1] insert test.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:160642416 160645ms

Check the indexes.
        "v" : 1,
        "key" : {
            "_id" : 1
        "ns" : "test.coll",
        "name" : "_id_"
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        "ns" : "test.coll",
        "name" : "content_text",
        "weights" : {
            "content" : 1
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 1

Check the index size. It's about half the size of the text it indexed.
    "ns" : "test.coll",
    "count" : 6245,
    "size" : 69054068,
    "avgObjSize" : 11057.496877502002,
    "storageSize" : 178520064,
    "numExtents" : 12,
    "nindexes" : 4,
    "lastExtentSize" : 49213440,
    "paddingFactor" : 1.0000000000003018,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 86314032,
    "indexSizes" : {
        "_id_" : 212576,
        "content_text" : 85381968
    "ok" : 1

Let's run a test.
db.coll.runCommand("text", {search:'Hello'})
    "queryDebugString" : "hello||||||",
    "language" : "english",
    "results" : []
    "stats" : {
        "nscanned" : 4,
        "nscannedObjects" : 0,
        "n" : 4,
        "nfound" : 4,
        "timeMicros" : 157
    "ok" : 1

Not bad.

Wednesday, March 06, 2013

Nobody is gonna stop Andrew Morton from singing "la la la"

I'm gonna stick my fingers in my ears and sing "la la la" until people tell me "I set swappiness to zero and it didn't do what I wanted it to do".

swappiness==0 doesn't mean swap is turned off. If you want to turn off swap, remove the swap partition or swap file. When there is no swap partition or file, the value of swappiness no longer matters.

Luckily, the misleading behavior of swappiness==0 will change soon.

... with current reclaim implementation, the kernel may swap out even if we set swappiness=0 and there is pagecache in RAM.
This patch changes the behavior with swappiness==0. If we set swappiness==0, the kernel does not swap out completely ...
;a=commitdiff;h=fe35004fbf9eaf67482b074a2e032abb9c89b1dd;hp=c50ac050811d6485616a193eb0f37bfbd191cc89

Andrew, soon you can stick your fingers in your ears and sing "la la la" forever, as no one will tell you "I set swappiness to zero and it didn't do what I wanted it to do".

Saturday, February 02, 2013

Dependency Hell

Like Dependency Lock-in, Dependency Hell is a big problem in modular software development. I have a very recent example.

 +- org.apache.httpcomponents:httpclient:jar:4.2.3:compile  
 | +- org.apache.httpcomponents:httpcore:jar:4.2.2:compile  

You may be familiar with both and ask why I don't have the same version of httpclient and httpcore. Because that's the way they work together: httpclient 4.2.3 depends on httpcore 4.2.2.

If you use httpclient and also declare httpcore in your pom.xml, congratulations, you're creating Dependency Hell. But what if you don't declare httpcore, and then httpclient gets updated and removes its dependency on httpcore, or starts depending on something else that provides the http core functionality? Good luck. Spring Framework 3.2.1 made such a mistake and broke one of my hobby projects. The hell has a different impact on component providers and consumers, but most of the time we're both.

You shouldn't have to care about a component that your dependency depends on, and you shouldn't make users of your component care about anything your component depends on. If you break this, you break a fundamental principle of software component design. But in practice, unless you use OSGi, there is no easy way to avoid exposing your dependencies to the components that directly, or indirectly, depend on yours.

Versioned jar files are the root of Dependency Hell in Java development. Check your pom file: how many dependencies are redundant and should be pulled in transitively by your direct dependencies? And how many should be upgraded to the latest version, but can't be? Consider replacing these outdated jar files with XaaS services.

Thursday, January 17, 2013

What does an average GWT team look like?

From a survey result, an average GWT team looks like this. If your team doesn't deliver and you're thinking of trying something else, compare this with your team first.

What do you think? Let me guess. "My mileage varies", right? :-)