On February 20th, the first "Key-Value Storage" meeting, a meeting of Open Source database and search developers was held in Tokyo. The participants, with wide-ranging specializations from backend storage committers to algorithm researchers, had a casual time to discuss the future of KVS.
Key-Value Storage (KVS) is a new type of server indispensable to recent web services. It serves as a very fast "associative array" server for providing database cache data and session data. Starting from memcached, used in livejournal.com, it has been widely adopted in Web 2.0 services.
The session was started with the introduction of the Groonga search engine software, which incorporates a fast database engine inside. Written by Daijiro Mori and Tasuku Suenaga, authors of the popular Senna search engine, the new engine appealed to the audience. "We realized that we need a search engine with more flexibility," Mori said. It was amazing that their memcached clone, based on the Groonga database engine, overwhelmed the performance of the original memcached in almost every test case.
The guys from LuxIO, Tokyo-Cabinet, and Tx held a talk on database backend sessions. Their software has already outperformed Berkeley DB, the gold standard in this area. Hiroyuki Yamada has disclosed some of the I/O optimizations he has used in LuxIO and indicated its usefulness with SSD storage. Mikio Hirabayashi, the author of Tokyo-Cabinet and its distributed interface Tokyo-Tylant, discussed how to store a large number of values efficiently by considering BER compression and bit-shifts. Daisuke Okanohara showed that it is possible to express a binary tree using only 2 bits per node by using recent progress in algorithm research.
A Server session was held with authors and committers of various key-value storage implementations, including memcached, repcached, Kai, Cagra, kumofs, ROMA, and Flare. As the author of Cagra, an OSS media storage engine, I gave a presentation on recent academic progress in distributed hash table algorithms.
Various implementation details were discussed. Many were based on Consistent Hashing algorithms and single zero-hop routing, but it was interesting that each used different replication strategies. Also, we all agreed that the memcached protocol itself is very useful as a standard for key-value storage server interfaces. Already being ported to many languages and platforms, it reduces the cost of implementing custom interfaces.
Kazuyuki Shudo, an overlay algorithms researcher, explained the relationship of key-value storage algorithms and largely distributed P2P algorithms. As a demo, he has implemented a working key-value storage in his overlay algorithm testbed, Overlay Weaver. "Strictly speaking, zero-hop routing algorithms themselves are not classified as distributed hash table algorithms, but they can be considered in the same framework," he said.
The overall meeting was very informative. I myself had learned many new things as a key-value storage author. Big thanks to Kazuki Ohta for arranging such great chance, and GREE (a Japanese social network for mobile Internet) for providing the location.