In the last post i described how you can setup plproxy and create a basic horizontally partitioned cluster. Now we will take a look at another real life usage: building a read-only cluster for your database
Distributing read-only load
The simplest real world usage for plproxy would be it’s use for redirecting read-only queries to read-only replicas of master database. The replicated databases can be filled with data via Londiste that is part of the SkyTools package,setup tutorial can be found here or with Slony which is a more heavyweight solution but from my own experience also harder to setup and maintain though definitely at the time being better documented.
A typical read-only cluster could look like on the following schema. The databases with the letter (P) on them are connection poolers. We ourself use PgBouncer but pgpool is also a choice.
The poolers are needed to minimize the number of open connections to a database also execution plans are cached on a connection basis. Of course everything will work fine also without the poolers. Dashed bold arrows represent replicas.
In this setup the plproxy functions determine the database to which the query is redirected. Read&write queries go to master database and read-only queries are distributed based on the algorithm you define to read-only replicas.
Setting up replication itself is relatively easy once you have passed the painful skytools installation process.
First let us create both replicas from write database toward ro1 & ro2. ro1 configuration file looks like this:
replica1.ini
replica2.ini is basically the same only job name and database name need to be changed. Now let’s install Londiste on provider (write) and subscribers (ro1,ro2) and start the replication daemons:
@H_403_21@mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider install mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber install mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber install mbpro:~/temp kristokaiv$ londiste.py replica1.ini replay -d mbpro:~/temp kristokaiv$ londiste.py replica2.ini replay -dThe next thing you need to do is to setup the ticker process on the database where write is performed. The ticker creates sync events so running it with shorter intervals will reduce latency. My configuration file looks like this:
ticker_write.ini
To start the ticker as a daemon just run:
@H_403_21@mbpro:~/temp kristokaiv$ pgqadm.py ticker_write.ini ticker -dLets create a simple table that we will replicate from master to read-only’s
@H_403_21@mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key,password text);" write mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key,password text);" ro1 mbpro:~/temp kristokaiv$ psql -c "CREATE TABLE users (username text primary key,password text);" ro2And add it to replication
@H_403_21@mbpro:~/temp kristokaiv$ londiste.py replica1.ini provider add users mbpro:~/temp kristokaiv$ londiste.py replica1.ini subscriber add users mbpro:~/temp kristokaiv$ londiste.py replica2.ini subscriber add usersAfter some time the tables should be up to date. Insert a new record in the write database and check if it’s delivered to both read-only db’s.
The functions to insert and select from users table:
Just for the comfort of those actually trying to follow these steps,here is how the proxy databases
cluster config:
The last thing left to do is to create the plproxy function definitions that will redirect the login function calls against read-only databases and add_user calls against write database:
@H_403_21@CREATE OR REPLACE FUNCTION public.login( in i_username text,out status_code text ) AS $ CLUSTER 'readonly'; RUN ON ANY; $ LANGUAGE plproxy; CREATE OR REPLACE FUNCTION public.add_user( in i_username text,out status_code text ) AS $ CLUSTER 'write'; $ LANGUAGE plproxy;This is it,the read-only cluster is ready. Note that even though creating such a read-only cluster seems simple and a quick solution for your performance problems it is not a silver bullet solution. Asynchronous replication often creates more problems than it solves so be careful to replicate only non-timecritical data or guarantee a fallback solution when data is not found (eg. proxy function first checks readonly database and if data is not found looks the data up from write db)