This documentation project is the result of work by Shane R. Spencer and is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
Many of the techniques used in this documentation are not solely associated with Shane R. Spencer and for the most part are the intellectual property of the nameless horde known as humanity. If you feel as though you might like some attribution please make yourself known.
As previously mentioned, this work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
This license lets others remix, tweak, and build upon this project even for commercial purposes, as long as they credit the author and license their new creations under the identical terms. This license is often compared to “copyleft” free and open source software licenses. All new works based on yours will carry the same license, so any derivatives will also allow commercial use. This is the license used by Wikipedia, and is recommended for materials that would benefit from incorporating content from Wikipedia and similarly licensed projects.
Key-value (KV) stores have existed for a good long time and have served humanity, and its more technically inclined people in particular, very well.
The history of KV-based storage methods is well beyond the scope of this document. Briefly, however, the idea of fast access to a stored value by way of a named key is fundamental to how modern databases work. More recently we’ve all seen an increase in fun new storage techniques associated with the NoSQL genre, which, simply put, breaks away from the relational structure typically associated with database management systems that directly support the SQL standards for describing a query and processing results.
Relational database solutions typically do not concentrate on large amounts of data per “row”, hence the name of this document. Hopefully some benchmarks, storage comparisons, and network analysis will help developers understand the crazy method I’ve been calling “Big Fat Happy Data”!
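As a rough illustration of the difference between narrow relational rows and one large value per key, here is a minimal sketch in plain Python. It uses a `dict` as a stand-in for a KV store, and the key `entry:1`, the field names, and both helper functions are made up for this example. It is not the layout this document goes on to describe, only a way to picture the idea.

```python
import json

# Relational-style layout: one narrow row per field value (hypothetical data).
rows = [
    ("entry:1", "title", "Lost hat"),
    ("entry:1", "color", "red"),
    ("entry:1", "found", "false"),
]

# KV-style layout: the whole record lives under a single key as one "fat" value.
# A plain dict stands in for the KV store here.
store = {}
store["entry:1"] = json.dumps({"title": "Lost hat", "color": "red", "found": False})


def load_from_rows(rows, key):
    """Reassemble a record by scanning every row that belongs to the key."""
    return {name: value for k, name, value in rows if k == key}


def load_from_store(store, key):
    """Fetch the record with a single key lookup and one decode."""
    return json.loads(store[key])


record = load_from_store(store, "entry:1")
```

The row-based loader has to touch every row for the record and loses type information along the way (`"false"` stays a string), while the KV loader does one lookup and gets the structure back as it was stored.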
These methods are first being put to use in the project Informadiko, which is currently based around TornadoWeb, MongoDB, and Xapian. Mongrel2 and Brubeck are also being evaluated for the project.
This document focuses on using MongoDB and Redis as KV stores. MongoDB has a very rich query parser that helps the developer keep things simple while still using a KV-based solution, and Redis does a great job at keeping data at the ready, handling atomic operations, and allowing blazing fast access to key data. This is of course very general, since both projects have a huge list of pros and cons that the Internet is more than happy to point out. However, the developers of both solutions are technodweebs with a good nose for what to avoid when developing reliable database solutions.
Other technologies used in this document to help comparatively describe the data layout, as well as offer some fun benchmarks, include the popular relational database management system (RDBMS) MySQL and the object-relational mapper (ORM) used by Django. These two products may appear to get the shaft a bit; however, they should be considered very valuable projects. Both have inspired other projects and have been part of the foundation for a very high percentage of websites, custom applications, and large enterprise-scale solutions. Maybe not Django so much on the last two, since it’s a web framework; however, the ORM itself has powered many ideas completely unrelated to web interfaces.
The initial reference specification used for the data is for an arbitrary information storage and retrieval system of the kind you would use for collecting form data or storing searchable information for later queries. The project this was developed around originally started out as Django+MySQL, then moved to Django+MongoDB, and is currently using Tornado+MongoDB and the techniques described in this document.
The specification (not schema) is defined as follows and referenced in the next topic:
Stores information about a collection of data
Stores information about fields available to a specific collection
Stores global user information (username, password, email, hat color)
preferences when rendering this user’s content.
Has a single account reference
Has a single user reference
collection preferences for things like time zones, whether this collection is bookmarked, and other flags a user can have against a specific collection
Has a single account reference
Has a single collection reference
Has a single user reference
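To make the relationships in the specification easier to picture, here is a minimal sketch using Python dataclasses. Every class and attribute name below is an assumption made for illustration. The specification only describes what each record stores and which references it holds, so the two preference records in particular carry hypothetical names (`AccountUserPrefs`, `CollectionUserPrefs`), and references are modelled as plain string ids.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class Collection:
    """Stores information about a collection of data."""
    id: str
    name: str


@dataclass
class Field:
    """Stores information about a field available to a specific collection."""
    id: str
    collection_id: str
    name: str


@dataclass
class User:
    """Stores global user information."""
    id: str
    username: str
    password: str  # a real system would store a hash, not the password
    email: str
    hat_color: str


@dataclass
class AccountUserPrefs:
    """Preferences tied to one account and one user (hypothetical name)."""
    account_id: str
    user_id: str
    preferences: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CollectionUserPrefs:
    """Per-user flags against one collection (hypothetical name)."""
    account_id: str
    collection_id: str
    user_id: str
    preferences: Dict[str, Any] = field(default_factory=dict)


prefs = CollectionUserPrefs(
    account_id="a1",
    collection_id="c1",
    user_id="u1",
    preferences={"time_zone": "UTC", "bookmarked": True},
)
as_document = asdict(prefs)  # plain dict, ready to hand to a document store
```

Because `asdict` returns an ordinary dictionary, a record like this could be stored as a single value under one key, though how the records are actually laid out is a separate question from this sketch.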
O.K. good. Now we have a quick and dirty starting point.