Archive for the ‘Xen’ Category
China gets horny
Today I woke up to several alerts from Linode informing me that one of my VPS nodes was exceeding the Disk I/O threshold that I had set. Curiously, this VPS is used as an HTTP web proxy, and whilst it usually gets about 300-400 visitors per day (mainly from China), this morning I was seeing over 800 visitors in Google Analytics.
Attempting to ssh to the server failed with timeouts, although the PHP web application was still responding to requests over HTTP fine. I suspect sshd was failing to reverse-lookup my IP address in any reasonable amount of time, or perhaps iptables was to blame (note to self: look into why that happened). Thankfully Linode provide out-of-band console access via SSH and AJAX, so all was not lost.
The network rrdgraph showed that the server was approaching 7Mbit/s of HTTP traffic and that almost 50GB had been consumed today alone. Whilst the server seemed to handle the load without problem (minus ssh access), consuming 50GB+ per day would quickly max out my monthly data transfer allowance with Linode – this wasn’t acceptable. I modified the firewall to accept HTTP/HTTPS traffic from my IP only in order to investigate, and the load suddenly stopped and SSH was alive again.
Initially I had suspected that some sort of automated bot was using ehproxy.info to do automated scans and attacks, but a closer inspection of the traffic showed an even spread of distributed IPs (all from China, as Google Analytics confirms) all clicking various porn sites. I guess everyone in China was feeling horny this afternoon!
Further analysis of the access.log shows that the server (Linode XenU VPS with 720MB of RAM) was handling roughly 61 hits per second (2,428,863 hits / 39,600 seconds) and lighttpd was dealing with the load without any problem. Pretty good considering this is a pure PHP application utilising php-cgi.
For the record, the top five IP addresses were:
Hits : IP address
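For anyone wanting to produce the same kind of summary, here is a minimal Python sketch. It assumes the access log uses the common/combined format, where the client IP is the first whitespace-separated field. The sample lines, dates and IPs are made up (documentation-range addresses), not taken from the real log.

```python
from collections import Counter


def top_clients(log_lines, n=5):
    """Return the n most frequent client IPs as (ip, hits) pairs."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


def hits_per_second(total_hits, seconds):
    """Average request rate over the sampled window."""
    return total_hits / seconds


# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2000:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2000:10:00:01 +0000] "GET /a HTTP/1.1" 200 128',
    '198.51.100.2 - - [01/Jan/2000:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
]

print(top_clients(sample))
# Rate from the figures quoted in the post: total hits / seconds in the window.
print(round(hits_per_second(2428863, 39600), 1))
```

On a real server you would pass the open access.log file object as log_lines instead of the sample list.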
Error: Device 0 (vif) could not be connected. Hotplug scripts not working
Are you running Xen / “xm create” and you get this error?
Try this (RHEL/CentOS): service haldaemon start
- and have a nice day!
Highly Available Mail Cluster – v2
It has been some time since I last blogged about my quest to build a Highly Available Mail Cluster. If you recall, the last HA Mail Cluster architecture that I designed involved four identically configured servers, spread between two data centres utilizing DNS round-robin load balancing. The Maildirs were rsynced from the ‘master node’ to the three slaves every 5 minutes. MySQL master-slave replication was used for user authentication. This worked “well enough” but it wasn’t real High Availability and it meant that, at any given time, three out of four Mail nodes were passive (idle) – wasting resources.
I decided to design a brand new platform to deliver a Highly Available Mail Cluster. The platform consists of two domU nodes in an active/active configuration. Each node utilizes a DRBD block device in Primary/Primary mode. OCFS-2 is the clustered file system which sits on top of DRBD. This allows both nodes shared, concurrent access to the Maildir directories.
Both nodes run dovecot for pop3/imap and postfix for smtp. Each node has an equally weighted A/MX record for SMTP / POP3 / IMAP load balancing. The load balancing is still performed by DNS, utilizing two IP addresses in the A record. The beauty of this active/active heartbeat setup is that in the event of a server failure, the IP resource of the failed server will be taken over (via heartbeat) by the other mail server. This means there is virtually zero chance of a user hitting a stale IP address in the DNS A record. I have noticed that when users are constantly checking their inbox (pop3/imap) every 5 minutes or so, Outlook caches the DNS entry indefinitely, regardless of TTL.
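The IP takeover idea can be illustrated with a toy model in Python. This is only a sketch of the concept, not how Heartbeat itself is implemented or configured; the node names (mail1, mail2) and the documentation-range IP addresses are hypothetical.

```python
def takeover(assignments, failed_node):
    """Return a new IP -> node mapping with the failed node's IPs moved.

    assignments maps each service IP to the node currently holding it.
    IPs held by failed_node are spread across the surviving nodes.
    """
    survivors = sorted(set(assignments.values()) - {failed_node})
    if not survivors:
        raise RuntimeError("no surviving node to take over the IPs")
    result = {}
    moved = 0
    for ip, node in sorted(assignments.items()):
        if node == failed_node:
            result[ip] = survivors[moved % len(survivors)]
            moved += 1
        else:
            result[ip] = node
    return result


# Hypothetical two-node active/active cluster: one service IP per node,
# both IPs published in the DNS A record.
cluster = {"192.0.2.10": "mail1", "192.0.2.11": "mail2"}
after = takeover(cluster, "mail1")
print(after)
```

After mail1 fails, both published IPs are answered by mail2, so clients holding either cached address still reach a live server.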
The above solution is working very well. I did have a couple of initial concerns regarding the stability of DRBD/OCFS-2 within a Xen domU – but I have had no problems to date. Overall the entire solution appears to be very stable.
The diagram below (servers on the right) shows the architecture. The full size image can also be found at: http://napta2k.googlepages.com/linode-v2.png
The drawback of this solution is that it does not scale past 16 nodes. The scaling limitations are due to how many cluster members can be part of Heartbeat, DRBD and OCFS-2. Thankfully I only host 400 or so mailboxes and cannot see the need to scale any further. One domU handles my current load just fine. The two-node active/active setup could easily be active/passive and work just as well.
If I were going to implement a large, single-data-centre Highly Available Mail Cluster I would use the following architecture:
1. CARP IP address layer to distribute the load to the TCP load balancers (probably only needed for the largest setups). You may have 16 load balancers on your front line but you definitely do not want to have 16 IP addresses in your DNS A record. CARP masks this.
2. TCP load balancer layer such as HAProxy to load balance the IMAP/POP3 traffic to the IMAP/POP3 farm
3. A farm of dovecot servers all sharing a _resilient_ NFS backend of Maildirs
4. A resilient NFS architecture. This could be part of a SAN (e.g. EMC Celerra) or Linux iSCSI/DRBD.
x. SMTP can be load balanced via MX.
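To make layer 2 a little more concrete, here is a toy Python sketch of what a TCP load balancer does when it picks a backend: rotate through the farm and skip servers marked as down. It is an illustration of the idea only, not HAProxy's implementation or configuration, and the backend names are hypothetical.

```python
class RoundRobinPool:
    """Rotate through backends, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


# Hypothetical dovecot farm with one failed server.
pool = RoundRobinPool(["imap1", "imap2", "imap3"])
pool.mark_down("imap2")
picks = [pool.pick() for _ in range(4)]
print(picks)
```

A real balancer would drive mark_down and mark_up from periodic health checks against the IMAP/POP3 ports.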
Optimizing VBulletin for a VPS – part 1.5
I have modified my VBulletin config file and enabled the use of APC as a VBulletin datastore. Smokeping is now reporting latency of 90-95ms. Not an immediately noticeable improvement but the average load on the server is 0.00 0.00 0.00 even with 100,000 hits per day. The performance improvements should be more measurable as the load increases.
To configure APC as a VBulletin datastore I simply uncommented the following line from includes/config.php:
$config['Datastore']['class'] = 'vB_Datastore_APC';
Optimizing VBulletin for a VPS – part 1
I run a small-medium VBulletin based web forum that receives a modest 110,000 hits / 2,000 – 3,000 unique visitors per day. I run this forum from an even more modest Xen-based VPS from Linode. Originally the forum started out on a big dedicated machine with 2GB of RAM and a beefy processor, running VBulletin / Apache 2 mod_php / FreeBSD. I wasn’t happy with this solution: Apache 2 / mod_php could easily consume 2GB of RAM due to the prefork MPM and the notion of running a complete PHP interpreter in each Apache process. I was convinced that I could create a much more efficient platform to host the VBulletin forum.
I decided to move the forum to a more modern, efficient platform consisting of a Xen-based VPS from Linode, Debian GNU/Linux, and lighttpd. Not only is lighttpd measurably faster than Apache, using a VPS allows me to attain higher availability by default, since most VPS hosts (at least at Linode) are usually of better specification (quad-core, RAID-1, dual PSU) than your low-end single disk dedicated server. Linode also allows me to rapidly deploy VPS instances in different data centres, and create HA/failover solutions. A VPS is also substantially cheaper than a dedicated server. The average dedicated server costs $150-$250 per month – the average VPS costs $20 per month. Win!
Whilst lighttpd and its FastCGI-based PHP happily serve out over 110,000 hits per day, running ApacheBench against the forum revealed that the server would max out at about 12 requests per second. Due to the nature of HTTP and web surfing, though, users do not notice any problems. To get a better understanding of how long the page took to load I installed smokeping and echoping. Smokeping reported that the forum took 150ms to load. This accounted for the 12 requests per second that ApacheBench reported. Not good… but at least I had a clearer picture of how long things were taking to load.
In an attempt to further optimize the forum I installed the XCache PHP Accelerator. Smokeping showed a measurable improvement of 50ms, taking the forum load time from 150ms to 100ms. Although XCache was working perfectly out of the box I decided to compare it with PHP-APC. After I installed PHP-APC smokeping reported the same drop from 150ms to 100ms, and the APC admin URL reported a 98% cache hit rate within 20 minutes of running, and with a default setting of 30MB of cache. Overall, a satisfactory performance improvement.
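The latency figures above can be related to throughput with some simple arithmetic. This is a rough Python sketch that assumes throughput is simply concurrency divided by page latency. The concurrency value is an assumption, since the post does not say how many PHP workers or ApacheBench clients were in play.

```python
def requests_per_second(latency_ms, concurrency=1):
    """Idealised throughput: each in-flight request takes latency_ms."""
    return concurrency * 1000.0 / latency_ms


def improvement_percent(before_ms, after_ms):
    """Relative reduction in page load time."""
    return (before_ms - after_ms) / before_ms * 100.0


# One request at a time, before and after the PHP accelerator.
print(round(requests_per_second(150), 1))
print(round(requests_per_second(100), 1))

# Assumed concurrency of 2 (hypothetical), at the original 150ms latency.
print(round(requests_per_second(150, concurrency=2), 1))

# Relative speed-up from 150ms to 100ms.
print(round(improvement_percent(150, 100), 1))
```

Under these assumptions, two requests in flight at 150ms each works out to a little over 13 requests per second, which is in the same region as the ApacheBench figure.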
Below you can see the two ‘dips’ in the graph where adding a PHP accelerator improved VBulletin performance. The first dip is XCache, the second is APC:

Building a Highly Available Mail Cluster
Update: I have updated my Highly Available Mail Cluster Architecture.
Check out: http://ajclark.wordpress.com/2009/03/05/highly-available-mail-cluster-v2/
In my spare time I look after a mail cluster for a small-ish hosting / consulting company. The mail infrastructure used to consist of one physical dedicated server which was rented by the consulting company. It was my job to manage this server. The server was basically a FreeBSD box running Plesk 7.3 which worked very well for a year or so.
Unfortunately, with the rising costs of energy, most dedicated / colocated server ISPs are raising their prices. I know Layered Tech are one such company that did this and angered many of their customers. The consulting company were faced with a decision: either pay the increased ISP prices or migrate to a virtual Xen based platform which was a fraction of the cost and could be made Highly Available.
Design goals
Since this is a new project I felt it was important to have design goals.
- Keep it simple.
- Keep it consistent.
- Use stock software packages – All system builds should adhere to Debian release configurations.
- Platform must scale.
- KEEP IT SIMPLE!
The advantages of a Xen platform
The plan was to purchase four Xen domU “linodes” from the ISP linode.com. Linode are a great ISP who provide Xen (and previously UML) domUs across four different data centres in the US. Their ingenious control panel lets you quickly provision a Xen domU in a data centre of your choosing, and/or clone/migrate existing domUs. The advantages of using a Xen platform:
- Cost saving – Able to purchase four domU linodes for less than the cost of one dedicated server.
- High Availability – Linode assist in HA by placing your domUs on different physical servers. They also allow IP takeover.
- Simple management – Using Debian as opposed to FreeBSD allows us to quickly patch and update the OS.
- Efficient resource usage – A single “dom0” physical server hosting eight domUs is more efficient than a single dedicated server hosting Web / Mail. This also gives us that ‘Green Energy’ feel good factor.
The Mail Cluster Architecture
The Mail Cluster design was based on Linux-HA & Heartbeat. There would be a total of four Xen domUs in two different data centres. In each data centre there would be a primary (active) and secondary (inactive) domU. All domUs would have identical configurations. There would only be one Xen node serving Mail services at any one time. This means that three domUs are effectively inactive until they are required.
Failover Architecture
Should the primary server DC1-SVR1 fail, DC1-SVR2 will take over immediately using Linux Heartbeat, giving near sub-second failover. Should the entire DC1 go offline, DNS will be manually switched over to DC2-SVR1. Primary DNS is hosted on DC1-SVR1 and slave DNS on DC2-SVR1. All DNS TTLs are 300 seconds.
- DNS is kept in sync using standard master / slave configuration.
- MySQL is kept in sync with master / slave replication (1 master, 3 slaves)
- Web htdocs are kept in sync with rsync at regular intervals.
This architecture gives us protection against both a physical server failure and a complete data centre failure.
The failover architecture was designed on the principle that it is not acceptable to have a multi-hour server outage (e.g. physical server failure) but it is acceptable to have a 10 minute outage while DNS fails over (e.g. complete DC failure)
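As a back-of-the-envelope check on that 10 minute figure, here is a small Python sketch. The 300 second TTL comes from the design above. The detection and manual switch times are assumptions picked purely for illustration.

```python
def worst_case_dns_outage(ttl_seconds, detect_seconds, switch_seconds):
    """Worst case: notice the failure, update DNS, then wait out the TTL."""
    return detect_seconds + switch_seconds + ttl_seconds


ttl = 300  # from the design: all DNS TTLs are 300 seconds

# Hypothetical timings: 2 minutes to notice the data centre is down,
# 3 minutes to switch DNS over to DC2-SVR1 by hand.
outage = worst_case_dns_outage(ttl, detect_seconds=120, switch_seconds=180)
print(outage / 60, "minutes")
```

Clients or resolvers that ignore the TTL would stay on the old address for longer than this estimate.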
Mail Cluster Software Components
One of the key goals of the fail over design is to make as many applications as possible use MySQL to store data. This simplifies data synchronization.
- Debian stable
- Apache
- Postfix – Using MySQL for all maps
- Dovecot – Using MySQL Postfix database
- MySQL – Replication enabled. Master: DC1-SVR1, Slaves: DC1-SVR2, DC2-SVR1, DC2-SVR2.
- BIND
- Roundcube mail
- Postfix Admin
- Bindgraph
- Mailgraph
Future improvements to the Mail Cluster
The main thing that bugs me is that there are three domUs that are largely inactive, except for MySQL and htdocs replication. This complies with the number one design goal of keeping it simple; however, I can’t help but feel this is wasteful.
I would like to replace the rsync data syncs with iSCSI. Each set of servers could fail over to an iSCSI target within their respective data centres. This would be more efficient than periodic rsyncs from cron.


