Wednesday, December 29, 2010

Timezones, timezones and a new time zone library

I've had a bit of a thing about time zones and daylight saving rules for quite some time, i.e. they interest me. This first became apparent in an application I wrote for the Palm OS, originally named Time Traveler (it later became Titan Class). Its time zone database was inspired by zoneinfo but was nowhere near as sophisticated. The first few incarnations of the Palm OS had no time zone support at all; support was eventually incorporated but, IMHO, it was weak. I digress a little... the point is that I've long been fond of the zoneinfo database that underpins most Unix-based platforms.

Java supports zoneinfo in its implementation by use of the ICU library. There is also JSR-310, a JSR that's about 3 years old and aims to provide greater support for zoneinfo.

I recently needed to ensure that my application had the latest time zone rules available. In the Java world, Sun or Apple (or whoever supplies your JVM) provides a patch when time zone rules change, and you then typically restart your application. I didn't want that. My goals were:

  • to have the latest rules on a monthly basis; and
  • to be able to dynamically update the time zones without having to restart my application.
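One way to meet the second goal is to hand the application a single TimeZone object whose rules can be replaced underneath it, so nothing needs restarting. The sketch below assumes that approach; SwappableTimeZone and updateRules are hypothetical names, not the API of the library described later, and where the replacement rules come from (for example, a monthly zoneinfo release compiled into a TimeZone) is deliberately left out.

```java
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical facade: one TimeZone instance whose underlying rules can be replaced at runtime.
class SwappableTimeZone extends TimeZone {
    private final AtomicReference<TimeZone> rules;

    SwappableTimeZone(TimeZone initial) {
        rules = new AtomicReference<TimeZone>(initial);
        setID(initial.getID());
    }

    // Swap in freshly loaded rules for the same zone; subsequent calls see them immediately.
    void updateRules(TimeZone newRules) {
        if (!newRules.getID().equals(getID())) {
            throw new IllegalArgumentException("expected rules for " + getID());
        }
        rules.set(newRules);
    }

    @Override
    public int getOffset(int era, int year, int month, int day, int dayOfWeek, int millis) {
        return rules.get().getOffset(era, year, month, day, dayOfWeek, millis);
    }

    @Override
    public int getOffset(long date) {
        return rules.get().getOffset(date);
    }

    @Override
    public int getRawOffset() {
        return rules.get().getRawOffset();
    }

    @Override
    public void setRawOffset(int offsetMillis) {
        // Rules change only as a whole, through updateRules, never piecemeal.
        throw new UnsupportedOperationException("use updateRules");
    }

    @Override
    public int getDSTSavings() {
        return rules.get().getDSTSavings();
    }

    @Override
    public boolean useDaylightTime() {
        return rules.get().useDaylightTime();
    }

    @Override
    public boolean inDaylightTime(Date date) {
        return rules.get().inDaylightTime(date);
    }
}
```

The design choice here is delegation: callers keep holding the same object, and the AtomicReference makes the swap safe to perform from a background thread that, say, checks for a new zoneinfo release each month.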

JSR-310 could probably help me out here, but I had another nagging concern; in fact, a couple:

  • the JSR is 3 years old and doesn't appear to have progressed; and
  • I like the zoneinfo structure and wanted to use something that honoured it closely.

I might be a little unfair toward JSR-310, and if it is approved it'll be difficult to avoid. I'm also keenly aware of the "Not Invented Here" syndrome... not my style, though. Then there's the JCP-is-dead thingy...

So, what I've done is create a new Java time zone library that takes zoneinfo files and produces a JDK-compatible facade. The library uses ANTLR to parse the zoneinfo files; because ANTLR can generate parsers for many target languages, the grammar itself can be reused well beyond Java. I'll shortly be open-sourcing this library, probably at The Codehaus, depending on how well it is received there. Meanwhile, here is an overview of its structure:
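The library's ANTLR grammar isn't reproduced here, but to give a feel for what it has to recognise, here is a hand-written, simplified parser for a single zoneinfo Rule line (NAME FROM TO TYPE IN ON AT SAVE LETTER/S). It's a sketch under stated assumptions, not the library's code: ZoneRule is a hypothetical name, most fields are kept as raw strings, and only the exact, unabbreviated layout shown is handled.

```java
// Hypothetical holder for one zoneinfo "Rule" line; not the library's actual model.
final class ZoneRule {
    final String name;
    final int fromYear;
    final int toYear;      // Integer.MAX_VALUE stands in for "max"
    final String month;    // IN column, e.g. "Mar"
    final String day;      // ON column, e.g. "lastSun" or "Sun>=8"
    final String at;       // AT column, e.g. "2:00"
    final String save;     // SAVE column, e.g. "1:00"
    final String letters;  // LETTER/S column, e.g. "D"

    private ZoneRule(String[] f) {
        name = f[1];
        fromYear = Integer.parseInt(f[2]);
        if (f[3].equals("only")) {
            toYear = fromYear;            // rule applies in a single year
        } else if (f[3].equals("max")) {
            toYear = Integer.MAX_VALUE;   // rule applies indefinitely
        } else {
            toYear = Integer.parseInt(f[3]);
        }
        // f[4] is the TYPE column, normally "-", and is ignored here.
        month = f[5];
        day = f[6];
        at = f[7];
        save = f[8];
        letters = f[9];
    }

    // Parses a line such as "Rule US 2007 max - Mar Sun>=8 2:00 1:00 D".
    static ZoneRule parse(String line) {
        int hash = line.indexOf('#');     // strip a trailing comment, if any
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] fields = body.split("\\s+");
        if (fields.length != 10 || !fields[0].equals("Rule")) {
            throw new IllegalArgumentException("not a Rule line: " + line);
        }
        return new ZoneRule(fields);
    }
}
```

For example, ZoneRule.parse("Rule US 2007 max - Mar Sun>=8 2:00 1:00 D") yields a rule named US that starts in 2007 and never ends. A grammar-based parser earns its keep once Zone and Link lines, continuation lines and the various abbreviations come into play.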

[Figure: Zoneinfo TZ.png, an overview of the library's structure]

The library is effectively done and has reasonable test coverage. I hope you'll join me and help improve it. Meanwhile, any thoughts and ideas are most welcome.

Tuesday, December 7, 2010

Memory grids

There's a great podcast on Software Engineering Radio with Nati Shalom. It discusses memory grids, and I certainly found it insightful. In essence, memory grids are being regarded as the next disk.

The clincher for me was the revelation that the durability of data is not a matter of it being persisted to disk; it depends on the number of geographically dispersed copies of that data that exist at any one time. Taken to the extreme, this could mean that you don't need disk at all, but in practice the data is persisted to disk asynchronously. This is called "write-behind", and it can be performed on n of the nodes, if not all of them.
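To make write-behind concrete, here is a minimal single-node sketch, assuming an in-memory map as the primary copy and a hypothetical BackingStore interface as the persistence target. Writes return as soon as memory is updated, and a background thread persists them later. Replication across nodes, which is where the durability described above actually comes from, is out of scope.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical persistence target: could be a file, a disk log or a conventional RDBMS.
interface BackingStore<K, V> {
    void store(K key, V value);
}

// Minimal write-behind cache: memory is the primary copy; the store is updated asynchronously.
final class WriteBehindCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<K, V>();
    private final BlockingQueue<K> dirtyKeys = new LinkedBlockingQueue<K>();

    WriteBehindCache(final BackingStore<K, V> store) {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        K key = dirtyKeys.take();   // blocks until a write is pending
                        V latest = memory.get(key); // persist the most recent value for this key
                        if (latest != null) {
                            store.store(key, latest);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop when interrupted
                }
            }
        }, "write-behind");
        writer.setDaemon(true); // don't keep the JVM alive just for persistence
        writer.start();
    }

    // Returns once memory is updated; persistence happens later on the writer thread.
    void put(K key, V value) {
        memory.put(key, value);
        dirtyKeys.offer(key); // a key queued twice is simply written twice (no coalescing)
    }

    V get(K key) {
        return memory.get(key);
    }
}
```

Anything written but not yet persisted is lost if this single node dies, which is exactly why a grid keeps several copies on other nodes rather than relying on the disk write.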

I think memory grids are very interesting. My prediction is that memcached, a popular open source memory cache that can be distributed over many nodes, will become a memory grid offering write-behind persistence. The same goes for Ehcache/Terracotta, i.e. these memory caches will evolve beyond being just caches. There are, of course, commercial memory caches out there, including VMware's GemFire and Oracle's Coherence.

One reason I think memory grids are topical is that commodity hardware can now address one heck of a lot of memory-resident data. Since 64-bit computing reached the masses, a cheap processor can generally address about 256TB of data, more than enough for most databases! By contrast, a 32-bit processor can address only about 4GB, which is less than many databases need.
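Both figures are powers of two. Assuming the 256TB figure refers to the 48 bits of virtual address space that x86-64 processors commonly implement (rather than the full 64 bits), the arithmetic is 2^32 bytes = 4GiB and 2^48 bytes = 256TiB, as this small check shows; AddressSpace is just an illustrative name.

```java
// Worked arithmetic: bytes addressable with n address bits, expressed in binary units.
final class AddressSpace {
    // 2^n distinct byte addresses; valid for addressBits < 63 so the long doesn't overflow.
    static long bytes(int addressBits) {
        return 1L << addressBits;
    }

    public static void main(String[] args) {
        long gib = 1L << 30; // 2^30 bytes
        long tib = 1L << 40; // 2^40 bytes
        System.out.println("32-bit: " + bytes(32) / gib + " GiB"); // 2^32 / 2^30 = 4
        System.out.println("48-bit: " + bytes(48) / tib + " TiB"); // 2^48 / 2^40 = 256
    }
}
```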

I think something that can be overlooked with memory grids is persistence. As mentioned, write-behind appears to be typical, but less attention is paid to what actually performs the write-behind. There's no reason why it can't be done by a conventional RDBMS, and I understand that many memory grids support this. Thus, with memory grids, it appears that you can have the best of both worlds.

Bring on the memory grid (preferably open sourced!).