Archive for

December 2011

Open Source Win and Fail


I have long enjoyed using Open Source Software (OSS) as a developer for many good reasons. It is not even an issue of being "free", but of having community access to the code to learn, troubleshoot, and enhance the functionality. Support for OSS by the community is fast and accurate. As the source is scrtinized by both the white hats and the black hats, it is also make more secure from attacks and vulerabilities.

The greatness of a full OSS system is best seen by operating systems such as GNU/Linus and FreeBSD, databases like PostgreSQL, and a host of other powerful software that competes as well as or better than many commercial offerings.

We watched Linux for years, expecting it to overtabke commeercial operating systems like Windows. Each year, it seems just as far away from that goal as it did the previous year. Why is that, I wondered?

The limit of Open Source Software

Recently, Linus Torvolds (the "creator" of Linux) spoke of Abandoning the GNOME 3 GUI interface. Before that, the lastest major version of KDE 4, an alternative to GNOME on the Linux operating system, was received to a damp reception. Of course, these things happen with commercial operating systems too, like the blunders of Windows Vista. But within a year, Microsoft released a beta of Windows 7, aiming to fix the shortcomings of a new design.

Microsoft, and certainly Apple Computer have a large interest in making the increasingly complex software easy to use. Beyond operating systems, applications like MS Office, Photoshop, etc., need to maintain a user-friendly veneer on top of a complex software application.

Why does OSS fail? It does not have the same demand for interface design, ultimately commercial success. There are no interface designers creating a navifation structure across the operating system, no reaction to interface failures. They seem to stick to their plan, and blame the users for their own issues.

The OSS community creates and delivers powerful software and appeals to other developers and geeks for use on servers, but until they can compete for users on interface design, theye will always loose the desktop wars.

-- PS -  This post was written in September 2011.

Posted

PostgresNoSQL: The NoSQL Hidden in PostgreSQL

Recently, NoSQL data stores have been getting a lot of attention, as an alternative to using a relational database. They allow more complex data structures to be stored and queried than you find in the table-row-column model.

The PostgreSQL relational database server has several data features that you would expect to be found only in the NoSQL data stores.

Documents (Text Datatype)

While not a data structure itself, the ‘text’ datatype is a “clob” (character large object) of an “unlimited length” and can be used to store documents. This can be useful for specialty formats like YAML, XML, JSON, and serialized data from programming languages. Some of these document types are supported as specialty data types; more on that later.

PostgreSQL has string functions and operators and matching operators including regular expressions and its “similar to” hybrid of the SQL “like” and regular expressions.

The real power of PostgreSQL here is the full text search (a successor to earlier Tsearch and Tsearch2 extentions) features to search these documents. Its full text search offers stemming (removing pluarizations and conjugations of a word) and weights to make searching your documents as easy as a search engine.

The GiST and GIN index types are used to speed up full text searches by indexing the content as a standard search engine would. A GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

Arrays

A PostgreSQL column can be created as an array of values, a table, or other variable-length multidimentional arrays. Append the [] brackets at the end of the datatype to define the column as an array of that datatype.

create table lists  (id serial primary key, items text[]);
create table tables (id serial primary key, items text[][]);

The array representation syntax can be either:

'{1,2,"Hello, there",word}'
array[1,2,3]

The first version, is how psql will print out the array and how it will be returned to your program, as a string of comma-separated values. PostgreSQL does support different delimiters. The second version may only contain integers.

Access the elements of the array using the column_name[index] syntax. The first item in the array is at index 1, not 0. The split[start:end] syntax returns the array slice between the two given indexes, and is returned in the ‘{start,middle,end}’ syntax.

Also, PostgreSQL provides an intarray module with functions and operator for working with arrays of non-null integers.

To increase performance for lookup of array values, create a GIN/GiST index on the array column

create index lists_index on lists using gin (items);

Name-Value Pairs

The hstore datatype provides a column defined as a set of name-value pairs. This feature is not in core PostgreSQL, but is delivered as an extenstion in the “contrib” directory of the PostgreSQL distribution. The feature may already be compiled into your database, but may need to be enabled in the databases you need it.

create extension hstore;
create table catalog (id serial, specifications hstore);

NOTE: If you wish to have the hstore installed in all future databases you create, install it into the “template1” database.

The syntax for specifying the dictionary is like this:

'cores=>1, "graphics card=>"xyz graphics"'

So note that double-quotes have to be used for non-simple names and values, or when the contain special characters. Access done as follows:

select specifications->'graphics card' from catalog where specifications->['cores']=1;

To insert/replace or delete a key:

UPDATE tab SET h = h || ('c' => '3');
UPDATE tab SET h = delete(h, 'k1');

To increase performance for lookup of names or values, create a GIN/GiST index on the array column

create index catalog_specifications on catalog using gin (specifications);

XML

PostgreSQL comes with a specialized datatype for XML documents, including functions to parse, alter, and traverse the structure of the document.

CREATE TABLE test (a xml, b xml);
SELECT xmlelement(name test, xmlattributes(a, b)) FROM test;
SELECT xpath('/my:a/text()', 'test',
         ARRAY[ARRAY['my', 'http://example.com']]);

JSON

Soon, PostgreSQL will release a JSON datatype to handle JSON documents as it handles XML. There has been work done through a Google Summer of Code project in 2010, but the PostgreSQL team needs more time to merge it into the distribution.

Custom Data Types

PostgreSQL can be extended to handle new user-defined datatypes that can handle new types of data. Existing specialty type include:

* inet - Holds an IPv4 or IPv6 internet address with CIDR functions and operators
* money - hold curreny amounts with a fixed precision
* enum - holds a static, ordered set of values
* PostGIS - holds positions of geographic information systems

Caveats

With all these wonderful types, you still get the ACID complienace you expect from a relational database.

However the downside is that the datastructures are updated as an entire column and row in your table. It does not treat each element in the data structure atomically. As a result, the database is not appropriate to hold very large data structures where you intend to do frequent updates.

These extenstions are best used as a tool to encode and access more specific information in your row, rather than as a Data struture store like Redis and Riak, or document store like CouchDB or MongoDB.

Posted