PostgresNoSQL: The Hidden NoSQL in PostgreSQL

Oct 15, 2011

Recently, NoSQL data stores have been getting a lot of attention, as an alternative to using a relational database. They allow more complex data structures to be stored and queried than you find in the table-row-column model.

The PostgreSQL relational database server has several data features that you would expect to be found only in the NoSQL data stores.

Documents (Text Datatype)

While not a data structure itself, the ‘text’ datatype is a “clob” (character large object) of an “unlimited length” and can be used to store documents. This can be useful for specialty formats like YAML, XML, JSON, and serialized data from programming languages. Some of these document types are supported as specialty data types; more on that later.

PostgreSQL has string functions and operators and matching operators including regular expressions and its “similar to” hybrid of the SQL “like” and regular expressions.

The real power of PostgreSQL here is the full text search (a successor to earlier Tsearch and Tsearch2 extentions) features to search these documents. Its full text search offers stemming (removing pluarizations and conjugations of a word) and weights to make searching your documents as easy as a search engine.

The GiST and GIN index types are used to speed up full text searches by indexing the content as a standard search engine would. A GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

Arrays

A PostgreSQL column can be created as an array of values, a table, or other variable-length multidimentional arrays. Append the [] brackets at the end of the datatype to define the column as an array of that datatype.

create table lists  (id serial primary key, items text[]);
create table tables (id serial primary key, items text[][]);

The array representation syntax can be either:

'{1,2,"Hello, there",word}'
array[1,2,3]

The first version, is how psql will print out the array and how it will be returned to your program, as a string of comma-separated values. PostgreSQL does support different delimiters. The second version may only contain integers.

Access the elements of the array using the column_name[index] syntax. The first item in the array is at index 1, not 0. The split[start:end] syntax returns the array slice between the two given indexes, and is returned in the ‘{start,middle,end}’ syntax.

Also, PostgreSQL provides an intarray module with functions and operator for working with arrays of non-null integers.

To increase performance for lookup of array values, create a GIN/GiST index on the array column

create index lists_index on lists using gin (items);

Name-Value Pairs

The hstore datatype provides a column defined as a set of name-value pairs. This feature is not in core PostgreSQL, but is delivered as an extenstion in the “contrib” directory of the PostgreSQL distribution. The feature may already be compiled into your database, but may need to be enabled in the databases you need it.

create extension hstore;
create table catalog (id serial, specifications hstore);

NOTE: If you wish to have the hstore installed in all future databases you create, install it into the “template1” database.

The syntax for specifying the dictionary is like this:

'cores=>1, "graphics card=>"xyz graphics"'

So note that double-quotes have to be used for non-simple names and values, or when the contain special characters. Access done as follows:

select specifications->'graphics card' from catalog where specifications->['cores']=1;

To insert/replace or delete a key:

UPDATE tab SET h = h || ('c' => '3');
UPDATE tab SET h = delete(h, 'k1');

To increase performance for lookup of names or values, create a GIN/GiST index on the array column

create index catalog_specifications on catalog using gin (specifications);

XML

PostgreSQL comes with a specialized datatype for XML documents, including functions to parse, alter, and traverse the structure of the document.

CREATE TABLE test (a xml, b xml);
SELECT xmlelement(name test, xmlattributes(a, b)) FROM test;
SELECT xpath('/my:a/text()', 'test',
         ARRAY[ARRAY['my', 'http://example.com']]);

JSON

Soon, PostgreSQL will release a JSON datatype to handle JSON documents as it handles XML. There has been work done through a Google Summer of Code project in 2010, but the PostgreSQL team needs more time to merge it into the distribution.

Custom Data Types

PostgreSQL can be extended to handle new user-defined datatypes that can handle new types of data. Existing specialty type include:

* inet - Holds an IPv4 or IPv6 internet address with CIDR functions and operators
* money - hold curreny amounts with a fixed precision
* enum - holds a static, ordered set of values
* PostGIS - holds positions of geographic information systems

Caveats

With all these wonderful types, you still get the ACID complienace you expect from a relational database.

However the downside is that the datastructures are updated as an entire column and row in your table. It does not treat each element in the data structure atomically. As a result, the database is not appropriate to hold very large data structures where you intend to do frequent updates.

These extenstions are best used as a tool to encode and access more specific information in your row, rather than as a Data struture store like Redis and Riak, or document store like CouchDB or MongoDB.