Highly Available Mail Cluster – v2
It has been some time since I last blogged about my quest to build a Highly Available Mail Cluster. If you recall, the last HA Mail Cluster architecture that I designed involved four identically configured servers, spread between two data centres utilizing DNS round-robin load balancing. The Maildirs were rsynced from the ‘master node’ to the three slaves every 5 minutes. MySQL master-slave replication was used for user authentication. This worked “well enough” but it wasn’t real High Availability and it meant that, at any given time, three out of four Mail nodes were passive (idle) – wasting resources.
I decided to design a brand new platform to deliver a Highly Available Mail Cluster. The platform consists of two domU nodes in an active/active configuration. Each node utilizes a DRBD block device in Primary/Primary mode. OCFS-2 is the clustered file system which sits on top of DRBD. This gives both nodes concurrent, shared access to the Maildir directories.
Both nodes run dovecot for pop3/imap and postfix for smtp. Each node has an equally weighted A/MX record for SMTP / POP3 / IMAP load balancing. The load balancing is still performed by DNS, utilizing two IP addresses in the A record. The beauty of this active/active heartbeat setup is that in the event of a server failure, the IP resource of the failed server will be taken over (via heartbeat) by the other mail server. This means there is virtually zero chance of a user hitting a stale IP address in the DNS A record. That matters because I have noticed that when users check their inbox (pop3/imap) every 5 minutes or so, Outlook caches the DNS entry indefinitely, regardless of TTL.
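To make the IP takeover idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Heartbeat is implemented or configured. The node names and addresses are made up for illustration (the addresses come from the RFC 5737 documentation range).

```python
def assign_ips(home, alive):
    """Return {ip: node} for a toy active/active pair.

    home  -- maps each service IP to the node that normally holds it
    alive -- set of node names that are currently up
    """
    survivors = sorted(alive)
    if not survivors:
        raise RuntimeError("no mail node is alive")
    assignment = {}
    for ip, node in home.items():
        # An IP stays on its home node while that node is up. Otherwise a
        # surviving node takes it over, so the DNS A record never goes stale.
        assignment[ip] = node if node in alive else survivors[0]
    return assignment


# Hypothetical node names and addresses, for illustration only.
HOME = {"192.0.2.10": "mail1", "192.0.2.20": "mail2"}

if __name__ == "__main__":
    print(assign_ips(HOME, {"mail1", "mail2"}))  # normal operation
    print(assign_ips(HOME, {"mail2"}))           # mail1 has failed
```

With both nodes up, each IP sits on its own node. When mail1 fails, both IPs end up on mail2, so a client that cached either address still reaches a working server.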
The above solution is working very well. I did have a couple of initial concerns regarding the stability of DRBD/OCFS-2 within a Xen domU – but I have had no problems to date. Overall the entire solution appears to be very stable.
The diagram below (servers on the right) shows the architecture. The full size image can also be found at: http://napta2k.googlepages.com/linode-v2.png
The drawback of this solution is that it does not scale past 16 nodes. The scaling limit comes from how many cluster members Heartbeat, DRBD and OCFS-2 can support. Thankfully I only host 400 or so mailboxes and cannot see the need to scale much beyond my current load. One domU handles that load just fine. The two node active/active setup could easily be active/passive and work just as well.
If I were going to implement a large, single data centre Highly Available Mail Cluster I would use the following architecture:
1. CARP IP address layer to distribute the load to the TCP load balancers. This is probably only needed for the largest setups. You may have 16 load balancers on your front line but you definitely do not want to have 16 IP addresses in your DNS A record. CARP masquerades this.
2. TCP load balancer layer such as HAProxy to load balance the IMAP/POP3 traffic to the IMAP/POP3 farm
3. A farm of dovecot servers all sharing a _resilient_ NFS backend of Maildirs
4. A resilient NFS architecture. This could be as part of a SAN (e.g. EMC celerra) or Linux iSCSI/DRBD.
x. SMTP can be load balanced via MX.
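As a rough illustration of what the TCP load balancer layer in step 2 does, the sketch below rotates connections across a farm and skips any backend that fails a health check. It is a toy model, not HAProxy's actual algorithm or configuration. The backend names and the health check function are hypothetical.

```python
from itertools import cycle


def round_robin(backends, is_healthy, count):
    """Pick `count` backends in rotation, skipping unhealthy ones."""
    if not any(is_healthy(b) for b in backends):
        raise RuntimeError("no healthy backend")
    picks = []
    for backend in cycle(backends):
        if len(picks) == count:
            break
        if is_healthy(backend):
            picks.append(backend)
    return picks


if __name__ == "__main__":
    # Hypothetical IMAP/POP3 farm with one server marked as down.
    farm = ["imap1", "imap2", "imap3"]
    down = {"imap2"}
    print(round_robin(farm, lambda b: b not in down, 4))
```

Here imap2 is skipped and the four connections alternate between imap1 and imap3. Because every dovecot server shares the same Maildir backend, it does not matter which one a given client lands on.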
Building a Highly Available Mail Cluster
Update: I have updated my Highly Available Mail Cluster Architecture.
Check out: http://ajclark.wordpress.com/2009/03/05/highly-available-mail-cluster-v2/
In my spare time I look after a mail cluster for a small-ish hosting / consulting company. The mail infrastructure used to consist of one physical dedicated server which was rented by the consulting company. It was my job to manage this server. The server was basically a FreeBSD box running Plesk 7.3 which worked very well for a year or so.
Unfortunately, with the rising cost of energy, most dedicated / colocated server ISPs are raising their prices. I know Layered Tech are one such company that did this and angered many of their customers. The consulting company were faced with a decision: either pay the increased ISP prices or migrate to a virtual Xen based platform which was a fraction of the cost and could be made Highly Available.
Design goals
Since this is a new project I felt it was important to have design goals.
- Keep it simple.
- Keep it consistent.
- Use stock software packages – All system builds should adhere to Debian release configurations.
- Platform must scale.
- KEEP IT SIMPLE!
The advantages of a Xen platform
The plan was to purchase four Xen domU “linode” systems from the ISP linode.com. Linode are a great ISP who provide Xen (and previously UML) domUs across four different data centres in the US. Their ingenious control panel lets you quickly provision a Xen domU in a data centre of your choosing, and/or clone/migrate existing domUs. The advantages of using a Xen platform:
- Cost saving – Able to purchase four domU linodes for less than the cost of one dedicated server.
- High Availability – Linode assist in HA by placing your domUs on different physical servers. They also allow IP takeover.
- Simple management – Using Debian as opposed to FreeBSD allows us to quickly patch and update the OS.
- Efficient resource usage – A single “dom0” physical server hosting eight domUs is more efficient than a single dedicated server hosting Web / Mail. This also gives us that ‘Green Energy’ feel good factor.
The Mail Cluster Architecture
The Mail Cluster design was based on Linux-HA & Heartbeat. There would be a total of four Xen domUs in two different data centres. In each data centre there would be a primary (active) and secondary (inactive) domU. All domUs would have identical configurations. There would only be one Xen node serving Mail services at any one time. This means that three domUs are effectively inactive until they are required.
Failover Architecture
Should the primary server DC1-SVR1 fail, DC1-SVR2 will take over using Linux Heartbeat. This provides near sub-second failover. Should the entire DC1 go offline, DNS will be manually switched over to DC2-SVR1. Primary DNS is hosted on DC1-SVR1 and slave DNS on DC2-SVR1. All DNS TTLs are 300 seconds.
- DNS is kept in sync using standard master / slave configuration.
- MySQL is kept in sync with master / slave replication (1 master, 3 slaves)
- Web htdocs are kept in sync with rsync at regular intervals.
This architecture gives us protection against both a physical server failure and a complete data centre failure.
The failover architecture was designed on the principle that it is not acceptable to have a multi-hour server outage (e.g. physical server failure) but it is acceptable to have a 10 minute outage while DNS fails over (e.g. complete DC failure).
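The 10 minute figure can be sanity-checked with some simple arithmetic. The sketch below adds up the time to notice the failure, the time to make the manual DNS change, and the DNS TTL. Only the 300 second TTL comes from this setup. The detection and switch times are assumptions picked for illustration, and the bound only holds for resolvers that honour the TTL.

```python
def worst_case_dns_outage(detect_s, switch_s, ttl_s):
    """Upper bound in seconds: notice the failure, change DNS, wait out caches."""
    return detect_s + switch_s + ttl_s


TTL_S = 300     # DNS TTL used on all records
DETECT_S = 120  # assumption: time for monitoring to notice that DC1 is down
SWITCH_S = 180  # assumption: time to make the manual DNS change

if __name__ == "__main__":
    total = worst_case_dns_outage(DETECT_S, SWITCH_S, TTL_S)
    print(f"worst case outage: {total / 60:.0f} minutes")
```

With these numbers the worst case comes to 600 seconds, which is where a budget of roughly 10 minutes comes from. A slower reaction to the outage pushes the total up directly, since the TTL is the only fixed term.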
Mail Cluster Software Components
One of the key goals of the failover design is to make as many applications as possible use MySQL to store data. This simplifies data synchronization, because MySQL replication then keeps that data in sync across all four domUs.
- Debian stable
- Apache
- Postfix – Using MySQL for all maps
- Dovecot – Using MySQL Postfix database
- MySQL – Replication enabled. Master: DC1-SVR1, Slaves: DC1-SVR2, DC2-SVR1, DC2-SVR2.
- BIND
- Roundcube mail
- Postfix Admin
- Bindgraph
- Mailgraph
Future improvements to the Mail Cluster
The main thing that bugs me is that there are three domUs that are largely inactive, except for MySQL and htdocs replication. This complies with the number one design goal of keeping it simple. However, I can’t help but feel this is wasteful.
I would like to replace the rsync data syncs with iSCSI. Each set of servers could fail over to an iSCSI target within their respective data centres. This would be more efficient than periodic rsyncs from cron.
