Open Source Win and Fail


I have long enjoyed using Open Source Software (OSS) as a developer, for many good reasons. It is not just a matter of being "free", but of having community access to the code to learn, troubleshoot, and enhance the functionality. Support for OSS by the community is fast and accurate. Because the source is scrutinized by both the white hats and the black hats, it is also made more secure against attacks and vulnerabilities.

The greatness of a full OSS system is best seen in operating systems such as GNU/Linux and FreeBSD, databases like PostgreSQL, and a host of other powerful software that competes as well as or better than many commercial offerings.

We watched Linux for years, expecting it to overtake commercial operating systems like Windows. Each year, it seems just as far from that goal as it did the year before. Why is that, I wondered?

The limit of Open Source Software

Recently, Linus Torvalds (the "creator" of Linux) spoke of abandoning the GNOME 3 GUI. Before that, KDE 4, the latest major version of an alternative to GNOME on the Linux operating system, met a damp reception. Of course, these things happen with commercial operating systems too, like the blunders of Windows Vista. But within a year, Microsoft released a beta of Windows 7, aiming to fix the shortcomings of the new design.

Microsoft, and certainly Apple, have a large interest in making increasingly complex software easy to use. Beyond operating systems, applications like MS Office and Photoshop need to maintain a user-friendly veneer on top of a complex software application.

Why does OSS fail? It does not face the same demand for interface design that ultimately drives commercial success. There are no interface designers creating a navigation structure across the operating system, and no reaction to interface failures. The projects seem to stick to their plan, and blame the users for their own issues.

The OSS community creates and delivers powerful software that appeals to other developers and geeks for use on servers, but until it can compete for users on interface design, it will always lose the desktop wars.

-- PS -  This post was written in September 2011.


PostgresNoSQL: The NoSQL Hidden in PostgreSQL

Recently, NoSQL data stores have been getting a lot of attention, as an alternative to using a relational database. They allow more complex data structures to be stored and queried than you find in the table-row-column model.

The PostgreSQL relational database server has several data features that you would expect to be found only in the NoSQL data stores.

Documents (Text Datatype)

While not a data structure itself, the ‘text’ datatype is a “clob” (character large object) of an “unlimited length” and can be used to store documents. This can be useful for specialty formats like YAML, XML, JSON, and serialized data from programming languages. Some of these document types are supported as specialty data types; more on that later.

PostgreSQL has string functions and operators and matching operators including regular expressions and its “similar to” hybrid of the SQL “like” and regular expressions.

The real power of PostgreSQL here is its full text search (a successor to the earlier Tsearch and Tsearch2 extensions) for searching these documents. Full text search offers stemming (removing pluralizations and conjugations of a word) and weights, to make searching your documents as easy as using a search engine.

The GiST and GIN index types are used to speed up full text searches by indexing the content as a standard search engine would. A GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.

Arrays

A PostgreSQL column can be created as an array of values, a table, or another variable-length multidimensional array. Append the [] brackets to the end of the datatype to define the column as an array of that datatype.

create table lists  (id serial primary key, items text[]);
create table tables (id serial primary key, items text[][]);

The array representation syntax can be either:

'{1,2,"Hello, there",word}'
array[1,2,3]

The first version is how psql prints the array and how it is returned to your program: as a string of comma-separated values. PostgreSQL does support different delimiters. The second version is the ARRAY constructor, whose elements are ordinary SQL expressions, so string values need single quotes.

Access the elements of the array using the column_name[index] syntax. The first item in the array is at index 1, not 0. The column_name[start:end] syntax returns the array slice between the two given indexes (inclusive), and it is returned in the ‘{start,middle,end}’ form.

Also, PostgreSQL provides an intarray module with functions and operators for working with arrays of non-null integers.

To speed up lookups of array values, create a GIN index on the array column:

create index lists_index on lists using gin (items);

Name-Value Pairs

The hstore datatype provides a column defined as a set of name-value pairs. This feature is not in core PostgreSQL, but is delivered as an extension in the “contrib” directory of the PostgreSQL distribution. The feature may already be compiled into your installation, but may need to be enabled in each database where you need it.

create extension hstore;
create table catalog (id serial, specifications hstore);

NOTE: If you wish to have the hstore installed in all future databases you create, install it into the “template1” database.

The syntax for specifying the dictionary is like this:

'cores=>1, "graphics card"=>"xyz graphics"'

Note that double quotes have to be used for names and values that are not simple words, or when they contain special characters. Values are accessed as follows:

select specifications->'graphics card' from catalog where specifications->'cores' = '1';
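
When the pairs come from application code, the quoting rule above is easy to get wrong by hand. Here is a minimal Ruby sketch that builds an hstore literal from a Hash by double-quoting every key and value. The helper names hstore_quote and to_hstore are my own inventions, not part of PostgreSQL or any gem, and the result should still be passed to the database as a bound parameter rather than spliced into the SQL text.

```ruby
# Hypothetical helpers (not part of PostgreSQL or any gem): build an hstore
# literal from a Ruby Hash, double-quoting every key and value.

# Wrap a string in double quotes, backslash-escaping embedded quotes and backslashes.
def hstore_quote(value)
  '"' + value.to_s.gsub(/["\\]/) { |char| "\\#{char}" } + '"'
end

# Join the quoted pairs with "=>"; a nil value becomes an unquoted NULL.
def to_hstore(hash)
  hash.map { |key, value|
    "#{hstore_quote(key)}=>#{value.nil? ? 'NULL' : hstore_quote(value)}"
  }.join(', ')
end

puts to_hstore({ 'cores' => 1, 'graphics card' => 'xyz graphics' })
```

Quoting everything, even simple names, keeps the helper short and avoids deciding which characters count as special.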

To insert/replace or delete a key:

UPDATE tab SET h = h || hstore('c', '3');
UPDATE tab SET h = delete(h, 'k1');

To speed up lookups of names or values, create a GIN/GiST index on the hstore column:

create index catalog_specifications on catalog using gin (specifications);

XML

PostgreSQL comes with a specialized datatype for XML documents, including functions to parse, alter, and traverse the structure of the document.

CREATE TABLE test (a xml, b xml);
SELECT xmlelement(name test, xmlattributes(a, b)) FROM test;
SELECT xpath('/my:a/text()', '<my:a xmlns:my="http://example.com">test</my:a>',
         ARRAY[ARRAY['my', 'http://example.com']]);

JSON

Soon, PostgreSQL will release a JSON datatype to handle JSON documents as it handles XML. There has been work done through a Google Summer of Code project in 2010, but the PostgreSQL team needs more time to merge it into the distribution.

Custom Data Types

PostgreSQL can be extended with user-defined datatypes to handle new kinds of data. Existing specialty types include:

* inet - Holds an IPv4 or IPv6 internet address with CIDR functions and operators
* money - holds currency amounts with a fixed precision
* enum - holds a static, ordered set of values
* PostGIS - holds geographic positions for geographic information systems (GIS)

Caveats

With all these wonderful types, you still get the ACID compliance you expect from a relational database.

However, the downside is that each data structure is updated as an entire column value and row in your table. PostgreSQL does not treat each element in the data structure atomically. As a result, the database is not appropriate for holding very large data structures that you intend to update frequently.

These extensions are best used as a tool to encode and access more specific information in your row, rather than as a data structure store like Redis and Riak, or a document store like CouchDB or MongoDB.


How Resque works

Understanding how Resque works

Resque is a fast, lightweight, and powerful message queuing system used to run Ruby jobs asynchronously (or in the background) from your on-line software for scalability and response. I needed to integrate software written in different languages and environments for processing, and this is my understanding of the implementation.

How Queuing with Redis works

Resque’s real power comes from the Redis “NoSQL” Key-Value store. While most other Key-Value stores use strings as keys and values, Redis can use hashes, lists, sets, and sorted sets as values, and operate on them atomically. Resque leans on the Redis list datatype, with each queue name as a key, and a list as the value.

Jobs are en-queued (the Redis RPUSH command to push onto the right side of the list) on the list, and workers de-queue a job (LPOP to pop off the left side of the list) to process it. As these operations are atomic, queuers and workers do not have to worry about locking and synchronizing access. Data structures are not nested in Redis, and each element of the list (or set, hash, etc.) must be a string.

Redis is a very fast, in-memory dataset, and can persist to disk (configurable by time or number of operations), or save operations to a log file for recovery after a re-start, and supports master-slave replication.

Redis does not use SQL to inspect its data, instead having its own command set to read and process the keys. It provides a command-line interface, redis-cli, to interactively view and manipulate the dataset. Here is a simple way to operate on a list in the CLI:

$ redis-cli
redis> rpush mylist "hello, redis"  # <= Adds the value to the right side of the list/queue
(integer) 1

redis> keys mylist*                 # <= Returns the matching key names
1) "mylist"

redis> type mylist                  # <= Returns the datatype of the value of this key
list

redis> lrange mylist 0 10           # <= Returns elements 0 through 10 from the list/queue
1) "hello, redis"

redis> llen mylist                  # <= Returns the number of elements in the list/queue
(integer) 1

redis> lpop mylist                  # <= Pops the leftmost element from the list/queue
"hello, redis"

redis> lrange mylist 0 10
(empty list or set)

How Queuing with Resque works

Resque stores a job queue in a Redis list named “resque:queue:name”, and each element in the list is a hash serialized as a JSON string. Resque also has its own management structures, including a “failed” job list.

$ redis-cli
redis> keys * 
1) "resque:stat:processed"          # <= Number of jobs successfully processed
2) "resque:failed"                  # <= This is the failed job list (not a queue)
3) "resque:queue:myqueue"           # <= This is your work queue!
4) "resque:queues"                  # <= The "Set" of work queues
5) "resque:stat:failed"             # <= The number of failed jobs
6) "resque:workers"                 # <= Set of workers
7) "resque:worker:host.example.com:79163:myqueue:started" # <= Timestamp of worker start
8) "resque:processed:host.example.com:79163:myqueue:started" # <= Count of jobs processed by worker

redis> get resque:stat:processed    # <= Returns the count of processed jobs 
"9"

redis> smembers resque:queues       # <= Prints the members of the set of queues
1) "myqueue"

redis> smembers resque:workers      # <= Prints the set of workers
1) "host.example.com:79163:myqueue"

Resque namespaces its data within redis with the “resque:” prefix, so it can be shared with other users.

Designed to work with Ruby on Rails, Resque jobs are submitted and processed like the following boilerplate:

class MyModel
  @queue = :myqueue                 # <= jobs will be placed in this queue name

  # defer the processing: queue a job in Resque to run later
  def defer(*args)
     Resque.enqueue(MyModel, self.id, *args)
  end

  # Resque calls this class method with the queued arguments. It must be named perform
  def self.perform(id, *args)
    model = MyModel.find(id)
    # Do something here, raise an exception to send job to failure list
    raise "Oh Noes!" if model.failed?   # failed? stands in for your own check
  end
end

This does not serialize an object to the queue. Instead, it saves the (ActiveRecord) model name and record id, and the record is re-instantiated from the database later. The additional arguments are saved in an array to pass along later. To keep the operation light, do not pass a lot of data to the job. Instead pass references to other records, files, etc.

Each job in Resque is a hash serialized as a JSON string (remember data structures can not be nested in Redis) of the format:

{"class":"MyModel", "args":[123, "arg1", "arg2", ...]}
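
To make that format concrete, here is a small Ruby sketch that builds and reads such a payload with the standard json library. The helper names encode_job and decode_job are made up for illustration; Resque has its own internal encoding code, which this does not reproduce.

```ruby
require 'json'

# Build the JSON string stored on the queue for one job.
# Assumption: only the class name and JSON-friendly arguments are sent.
def encode_job(klass_name, *args)
  JSON.generate({ 'class' => klass_name, 'args' => args })
end

# Turn a queue element back into a class name and an argument list.
def decode_job(payload)
  job = JSON.parse(payload)
  [job['class'], job['args']]
end

payload = encode_job('MyModel', 123, 'arg1', 'arg2')
puts payload

klass_name, args = decode_job(payload)
puts "#{klass_name} gets #{args.inspect}"
```

Anything that does not survive a trip through JSON (symbols, objects, open files) will not come back as it went in, which is one more reason to pass ids and references instead of data.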

When the job is popped from the queue, Resque looks up the class and calls its perform method, passing the queued arguments; that method then loads the ActiveRecord object by its id. Functionally, the worker code behaves something like this (simplified):

job = Resque.reserve(queue_name)   # a Resque::Job, or nil if the queue is empty
job.perform if job                 # calls MyModel.perform(*args)

If processing raises an exception, the job and relevant information are placed on the failed list as a JSON string of this format:

{ "failed_at":"2011/08/22 15:55:16 EDT",
  "payload":{"class":"MyModel","args":[123,"arg1","arg2"]},
  "exception":"NameError",
  "error":"uninitialized constant SalsaJob",
  "backtrace":[...],
  "worker":"host.example.com:56870:myqueue",
  "queue":"myqueue",
  "retried_at":"2011/08/22 16:07:50" }

A failed job can be retried (only once though) through the web interface started with the resque-web command.
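
To tie the pieces together, here is a toy model of the pop, perform, and fail cycle in plain Ruby. It is only a sketch of the flow described above: plain Arrays stand in for the Redis lists, EchoJob and work_one are hypothetical names, and the failure record carries only a few of the fields shown earlier. It is not Resque's actual worker code.

```ruby
require 'json'

# An example job class; the name and behavior are made up for illustration.
class EchoJob
  def self.perform(text)
    raise ArgumentError, 'nothing to echo' if text.to_s.empty?
    text.upcase
  end
end

# Pop one job from the left of the queue, run it, and record any failure.
# `queue` and `failed` are plain Arrays standing in for the Redis lists.
def work_one(queue, failed)
  payload = queue.shift              # plays the role of LPOP
  return nil if payload.nil?         # empty queue, nothing to do
  job = JSON.parse(payload)
  Object.const_get(job['class']).perform(*job['args'])
rescue StandardError => e
  failed.push(JSON.generate({ 'payload' => job,
                              'exception' => e.class.name,
                              'error' => e.message }))
  nil
end

queue  = []
failed = []
# push plays the role of RPUSH
queue.push(JSON.generate({ 'class' => 'EchoJob', 'args' => ['hello'] }))
queue.push(JSON.generate({ 'class' => 'EchoJob', 'args' => [''] }))

puts work_one(queue, failed).inspect   # first job succeeds
puts work_one(queue, failed).inspect   # second job raises and is recorded
puts failed.length
```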

Using Resque without Rails

Resque runs out of the box on Ruby on Rails. If you have a Ruby application that is not in Rails, you can still run the Resque workers with the rake command by adding

require 'resque/tasks'

to your Rakefile.

Calling external systems with Resque

There are ports of Resque to other languages such as Python, C, Java, .NET, node, PHP and Clojure. If your external system is written in one of these languages, then you can start workers listening to their queues. Since you are not talking to a Ruby class with arguments, you can set up a placeholder class with the proper queue name. This will allow Resque plugins to fire on enqueue. (I assume the other libraries work the same way as the original, though some of the languages are not object-oriented; I have not verified them.)

class ExternalClass
  @queue = :external_class
end

Resque.enqueue(ExternalClass, *args)

That class does not have to implement perform, since that will be called in the real class.

If you need to call an external system to perform the task, either that system can be written to accept Resque-style queuing requests (a hash of “class” and “args”), or you can push the expected format directly to the queue:

Resque.redis.rpush("queue:#{queue_name}", args.to_json)

The format does not have to be JSON, but it has to be a string in a format the external system expects. You cannot use the Resque workers to process jobs pushed this way.

Calling the Ruby Resque from an external system

Maybe your external system needs to trigger a job to run on your Ruby Resque system, but does not have a Resque implementation. You can drop your work (as a JSON hash of “class” and “args”) on the raw Redis list/queue yourself, from a Redis library or the command line:

redis-cli rpush "resque:queue:myqueue" '{"class":"MyModel","args":["arg1"]}'

Epilogue

I am new to Redis and Resque and wanted to dig into the Redis data structures used by Resque, and learned it more in depth while writing this. After understanding how it all fits together, I can now write some integration code! I hope you found this useful, and not too incorrect.


Loading system libraries in Ruby on Rails (v3) Applications

I just discovered this little surprise working with an app on Rails 3.0.0.rc. I have a class

# app/models/account.rb
require 'resolv'
class Account
  def verify_email
     mx = Resolv::DNS.open { |dns| dns.getresources(domain, Resolv::DNS::Resource::IN::MX) }
  end
end

This would work fine, but the first time only. After that, it threw an exception

  NameError: uninitialized constant Account::Resolv

Huh?

Running in the development environment, every time the web server reloads the class, or after the reload! command, it would remove the symbol and not require the library again. Or something like that.

What should you do?

Do not use require statements in your Rails class files. I moved all my require statements to the top of my config/application.rb file and all was better!


Rails Custom Logger Format and Production Log Location

The default Rails 2.3.5 logger does not print a timestamp or other useful background information. Also, you need to know how to change the log path when deploying to production environments where you don't want your logs mixed in with your app. Most Unix systems put the logs in /var/log, where they can be monitored for size and content and rotated.


The first parameter is the location of the log. If you do not need a customized logger, instantiate the standard ActiveSupport::BufferedLogger instead. The second parameter is the severity level: one of the DEBUG, INFO, WARN, ERROR, or FATAL constants.

config.logger = MyLogger.new("/var/log/rails.log", MyLogger::Severity::INFO)

Another problem I had was getting this to work. The config.logger line was not having any effect on my app, which I had started developing on an earlier 2.x release and then upgraded to 2.3.5. I managed to get it working by using a fresh config/environment.rb file from a 2.3.5 project.
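
The custom logger class itself is not shown in this post. As a rough idea of what such a formatter can look like, here is a minimal sketch built on Ruby's standard Logger rather than on ActiveSupport::BufferedLogger, so it is not the MyLogger class used above. The build_logger helper and the exact line layout are my own assumptions.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for a custom logger: Ruby's standard Logger with a
# formatter that prefixes each line with a timestamp, severity, and process id.
def build_logger(io, level = Logger::INFO)
  logger = Logger.new(io)
  logger.level = level
  logger.formatter = proc do |severity, time, _progname, message|
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{severity} [#{Process.pid}] #{message}\n"
  end
  logger
end

# Log to an in-memory buffer here; pass a file path such as the one above instead.
out = StringIO.new
log = build_logger(out)
log.debug('hidden at INFO level')
log.info('application started')
puts out.string
```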



Color Picker for a 256-Color XTerm (xterm, konsole, iTerm). Here is a simple Perl script that prints out this lovely chart. It reminds me of my early years on my first 1982 IBM PC with the Color Card and Monitor. I am using these colors for tweaking vim colorscheme data.

perl -e 'foreach $i (0..255) { printf("\e[38;5;$i"."m%03d\e[0m ", $i); }'


Changing postgres user info on OS X 10.6

(How to change a service user account home directory in Apple OS X 10.6 Snow Leopard)

I’m sure I did something differently than expected. After upgrading to Snow Leopard, my MacPorts weren’t working. After installing MacPorts for Snow Leopard, it told me to do a `port upgrade outdated`, but that didn’t work because of the variants. I added the --enforce-variants option but had other issues; I don’t recall which now. So I decided to just re-install the ports from scratch, and it worked very well. Until I tried to start postgres.

The postgres user runs the database, and I had been using 8.3 under Leopard, and now using 8.4. Whenever I tried to initdb or do any postgres command, it complained that the home directory for the postgres user, the old 8.3 directory, was not found. Where was this setting, and how can I change it?

My first stop was at ole /etc/passwd. A note at the top reminded me that it is unused except in single-user mode. The service is now provided by “Open Directory”. Oh joy, LDAP. There are no standard *NIX commands to add and modify the users, except on the Desktop GUI (what’s this called?), which wasn’t the answer. (Actually an old apple tutorial to install postgres says to create a log-in user to run your database instead of a system account.)

A friend suggested something called “netinfo”. A little google-fu led me to this post

http://forums.macosxhints.com/showthread.php?p=552033

which showed me some commands! The `dscl` (Directory Services Command Line) was the one I wanted. I read the man page and switched my brain into LDAP mode temporarily. Then I issued this command:

dscl localhost change /Local/Default/Users/postgres NFSHomeDirectory /opt/local/var/db/postgresql83 /opt/local/var/db/postgresql84

and then tried to be the postgres user

sudo -u postgres -i

And no errors! Then I was able to initialize the database and start it

sudo -u postgres /opt/local/lib/postgresql84/bin/initdb -D /opt/local/var/db/postgresql84/defaultdb -E UTF8
sudo -u postgres /opt/local/lib/postgresql84/bin/pg_ctl -D /opt/local/var/db/postgresql84/defaultdb start

I hope you find this post if you are in a similar situation, and it will help you resolve it.

Meow!


Ruby YAML loads leading-zero numbers as Octal

Watch out! I was innocently loading a YAML file in Ruby created by another system

value: 012

The Ruby YAML library interprets the 012 as octal, as it would in Ruby source code (and as Perl would do). To have this interpreted as a string, you must quote the value.

value: "012"
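
Here is a short Ruby sketch of the difference, using the yaml library from the standard distribution. The yaml_value helper is just a name I picked for the example.

```ruby
require 'yaml'

# Load a one-line YAML document and return what ended up under "value".
def yaml_value(text)
  YAML.load(text)['value']
end

unquoted = yaml_value('value: 012')     # read as an octal integer
quoted   = yaml_value('value: "012"')   # read as a string

puts "#{unquoted.inspect} (#{unquoted.class})"
puts "#{quoted.inspect} (#{quoted.class})"
```

Checking the class of a loaded value like this is a cheap test to add wherever a YAML file comes from another system.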

In case you didn’t know, any number starting with a leading zero in C, Ruby, Perl, or Python 2 source code is assumed to be octal. “Of course!”, you say, “Everyone thinks and codes in octal”. The only time I use octal is for the unix permissions on the “chmod” command, but now there are modern alternatives to that.

Hexadecimal numbers start with “0x”, such as “0x12”. That mistake is not hard to find, as it’s plainly not a decimal integer. But innocent leading-zero numbers can cause syntax errors if the number contains an 8 or 9 (non-octal digits), such as “09”, or if it is a floating point number like “012.345”.

You can test this in “irb”…

$ irb
>> 07
=> 7
>> 012
=> 10
>> 0x12
=> 18
>> 019
SyntaxError: compile error…
>> 012.345
SyntaxError: compile error…

So this is error-prone and annoying in these languages.

As it turns out, this behavior is part of the YAML specification, but I do not like it!  I doubt many people expect that behavior. Data encoding should not take these “short cuts”, as YAML files can be created by non-programmers who have never heard of octal and expect a leading-zero to be harmless.

I’m finding the rule of thumb in YAML is to use quoted values when possible.


Install Ubuntu 9.04 Virtual Server With Passenger and Rails Application


I referenced a few posts and some code to set up my server. See the gist for a raw install script, which you must adapt to your needs.

I’ll assume we are starting from a post-install state of the server, with the proper networking, etc. in good working order. Here are the basic commands I entered.

First, let’s get the system up to date

sudo ntpdate ntp.ubuntu.com
sudo apt-get update
sudo apt-get upgrade
sudo ssh-keygen

Now to install the basic needs for the server

sudo apt-get -y install git-core openssh-server openssh-client build-essential \
wget ntp-simple


This sets up the MRI 1.8.7 Ruby interpreter. I think you can skip this step if you want to run the Phusion Ruby Enterprise Edition instead (see next).

sudo apt-get -y install ruby rdoc irb libyaml-ruby libzlib-ruby ri libopenssl-ruby \
ruby1.8-dev libopenssl-ruby

Install Phusion Ruby Enterprise Edition. Check that page for the current version. This installs executables into /opt/ruby-enterprise/bin, so you should either add that to your path or link those commands into your path. I did the link: since Ubuntu ruby installs into /usr/bin, and my default PATH puts /usr/local/bin before that, I simply created links for each command in /usr/local/bin. This may not be the best way, but this was a proof of concept run.

wget http://rubyforge.org/frs/download.php/55510/ruby-enterprise_1.8.6-20090421_i386.deb
sudo dpkg -i ruby-enterprise_1.8.6-20090421_i386.deb
rm ruby-enterprise_1.8.6-20090421_i386.deb
sudo ln -s /opt/ruby-enterprise/bin/* /usr/local/bin/

Next, we need to install RubyGems, the heart of Ruby library package management. You can install via the Ubuntu ‘apt-get install rubygems’, but the version you get that way doesn’t like to upgrade itself. It is best to do it from the rubygems distribution package. Download the current release from the rubygems download page. Since a lot of gems are released through github these days, add that source to the gem sources.

wget "http://rubyforge.org/frs/download.php/45905/rubygems-1.3.1.tgz"
tar -xvzf rubygems-1.3.1.tgz
rm rubygems-1.3.1.tgz
cd rubygems-1.3.1
sudo ruby setup.rb
cd ..
rm -r rubygems-1.3.1
sudo gem sources -a http://gems.github.com

Install PostgreSQL. It’s my database of choice: fast, robust, friendly (…except for replication). Set it up to accept connections from this server. After that, create the user accounts your applications require. Consult https://help.ubuntu.com/9.04/serverguide/C/postgresql.html for more details.

sudo apt-get -y install postgresql libpq-dev
sudo vi /etc/postgresql/8.3/main/postgresql.conf
# enable this setting:
#   listen_addresses = 'localhost,127.0.0.1'
sudo vi /etc/postgresql/8.3/main/pg_hba.conf
# change md5 to trust in this line:
#   host all all 127.0.0.1/32 trust
sudo /etc/init.d/postgresql-8.3 restart
sudo -u postgres psql template1
# then, at the psql prompt:
#   ALTER USER postgres WITH ENCRYPTED PASSWORD 'your_password';
#   CREATE USER appuser CREATEDB CREATEUSER;

Alternatively, you can run MySQL. It’s also a nice database. You may want to verify this elsewhere to get everything up and running.

sudo apt-get install mysql-server mysql-client
sudo apt-get install libmysql-ruby libmysqlclient15-dev
sudo gem install mysql --no-rdoc --no-ri

Now to install the Apache2 web server and the Phusion Passenger (modrails) module. It instructs you to add some apache configurations, but this location is slightly different. Verify the passenger version numbering if using the “echo” command below.

sudo apt-get -y install apache2-mpm-prefork libapr1-dev apache2-prefork-dev
sudo gem install passenger --no-rdoc --no-ri
sudo /opt/ruby-enterprise/bin/passenger-install-apache2-module
echo "LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-2.2.1/ext/apache2/mod_passenger.so
PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-2.2.1
PassengerRuby /usr/bin/ruby1.8" | sudo tee /etc/apache2/mods-available/passenger.load
sudo a2enmod passenger
sudo a2enmod ssl
sudo a2enmod rewrite
sudo /etc/init.d/apache2 force-reload

Now we are on the road to deployment. Let’s install Rails and some other basic gems.

sudo apt-get -y install libxml2 libxml2-dev
sudo gem install rails rake rspec rspec-rails ruby-debug capistrano libxml-ruby fastercsv --no-rdoc --no-ri
sudo gem install mislav-will_paginate --no-rdoc --no-ri

Say we decide to host our app out of /var/app and set it up ready to go. I’m not sure about the last chown command below, but I saw it elsewhere and it didn’t hurt.

sudo mkdir /var/app
sudo chown allen:allen /var/app
cd /var/app
git clone git://github.com/account/appname.git
sudo chown www-data:www-data /var/app/appname/config/environment.rb

Chances are, you will deploy with a yummy capistrano or vlad recipe. That is out of scope at this time, so we’ll create the database and test to make sure the connection is right. Play with your app and fix settings before going on.

cd /var/app/appname
cp config/database.yml.example config/database.yml
rake db:create RAILS_ENV=production
rake db:migrate RAILS_ENV=production
script/console production

Now we install the app through Passenger.

echo "<VirtualHost *:80>
ServerName appname.com
DocumentRoot /var/app/appname/public
</VirtualHost>" | sudo tee -a /etc/apache2/sites-available/appname.com
sudo a2ensite appname.com
sudo /etc/init.d/apache2 reload

If we were successful, we should be able to load that site (assuming DNS is pointing here already).

Load http://appname.com on your workstation web browser.

I hope this helped you as much as it did me. I apologize for errors that crept into this script while converting it to a blog entry.


twitterdb - Using twitter as a database

I just had this crazy idea. If you already read the title, then you already know! ;-)

Twitter is crazy cool. You can tweet as much as you want (I think) and can create gigabytes of useless personal trivia… or you can store something kinda cool in it… DATA! As a result, you can get a free Small-Document-Oriented database you can access from the cloud.

Sure, this is nuts, but it’s fun to consider, right?

First, each “table” is a twitter “account”, call it “app.table” or whatever. Each “tweet” to the table is a record. Ok, twitter posts can only be 140 characters—a limit I otherwise appreciate. We can store the record as any parsable key/value store.

Perhaps we could use JSON: {"ircEvent": "PRIVMSG", "method": "newURI", "regex": "^http://.*"}

Or a Ruby Hash: {:key=>"value", …}

Or a Python Dictionary: {'jack': 4098, 'sape': 4139}

Insert the data by tweeting the record to the table. Now you are storing it. You can’t update it, only supersede a record. How would we access it? Feeds, API, searches, and mashups. My twitter-fu on this isn’t so strong, as I never had a need for it before.

But still… could be interesting, right? And you could even SHARE data by posting it to public databases like this. For sanity, you would validate and post this through a gateway service of your own devising.
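
As a thought experiment, the encode-and-validate step of such a gateway could look like this in Ruby. Nothing here talks to Twitter; encode_record, decode_record, and TWEET_LIMIT are names I made up, and JSON is just one of the formats floated above.

```ruby
require 'json'

TWEET_LIMIT = 140  # the character limit mentioned above

# Encode a record as compact JSON, refusing anything too long to tweet.
def encode_record(record)
  text = JSON.generate(record)
  if text.length > TWEET_LIMIT
    raise ArgumentError, "record is #{text.length} characters, limit is #{TWEET_LIMIT}"
  end
  text
end

# Decode a tweet back into a Hash; returns nil for tweets that are not JSON objects.
def decode_record(text)
  parsed = JSON.parse(text)
  parsed.is_a?(Hash) ? parsed : nil
rescue JSON::ParserError
  nil
end

tweet = encode_record({ 'ircEvent' => 'PRIVMSG', 'method' => 'newURI' })
puts tweet
puts decode_record(tweet).inspect
puts decode_record('lunch was great').inspect
```

Returning nil for anything that is not a record lets a reader skip ordinary chatter that ends up in the same account.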

I wonder how much this could be like CouchDB, SimpleDB, or the Google App Engine Datastore?

Perhaps this could be worth exploring. Maybe I could even write ActiveRecord (Ruby on Rails) bindings to emulate this. Someday.

I’ll keep you posted…
