How I manage my Linodes
In my free time I manage four Linode VPS servers on behalf of a small business. The servers have various production infrastructure roles that are critical to the business and its customers. Two of the servers function as mail servers in a high-availability configuration and the other two are dedicated to a couple of medium-traffic web applications. All servers have the standalone puppet client installed, but not the puppet master. I don’t use a puppet master largely for the sake of simplicity.
I manage the configuration of the Linodes by assigning each server a ‘role’, e.g. ‘sitexyz.com’ or ‘mail server’. In this instance, a ‘role’ means a puppet module containing the relevant configuration. I have a local git repository on my home iMac which contains the puppet module that I have created for each role. Each time I need to make a change on any Linode I first make that change in the relevant file of my local puppet module. I then commit the puppet changes into git. The next step is to clone the git repo to the target server and then run the server’s local puppet client against the relevant module in the git repository. The change is then applied on the remote server.
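As a rough sketch of the last two steps, the clone-and-apply cycle on a target server might look something like the following. The repository URL, checkout directory and role name are hypothetical placeholders, and the sketch assumes a Puppet release that provides the `puppet apply` subcommand and a repository with a top-level `modules/` directory.

```shell
#!/usr/bin/env bash
# Sketch only. REPO_URL, CHECKOUT_DIR and the role name are hypothetical
# placeholders; the layout (a top-level modules/ directory) is an assumption.
REPO_URL="${REPO_URL:-git@example.com:puppet-roles.git}"
CHECKOUT_DIR="${CHECKOUT_DIR:-/root/puppet-roles}"

# Print the commands needed to deploy one role to the server this runs on.
# Nothing is executed here; pipe the output to sh once you are happy with it.
deploy_commands() {
    local role="$1"
    echo "git clone '$REPO_URL' '$CHECKOUT_DIR'"
    echo "puppet apply --modulepath='$CHECKOUT_DIR/modules' -e 'include $role'"
}

# Example: deploy_commands mailserver
```

Printing the commands rather than running them keeps the sketch safe to try. Adding `--noop` to the `puppet apply` line gives a dry run that reports what would change without changing it.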
Naturally, the key to successful configuration management is to learn the discipline of making any required server change in puppet first, testing, and then deploying. You should never make an ad-hoc change to a server without (first) updating the necessary puppet configuration.
The beauty of configuration management is that it aids greatly in disaster recovery or capacity expansion. If I suddenly need to provision more Linodes I just have to replay the relevant puppet module on a new server and a new XYZ server is rapidly created. Just make sure you back up your git repository. I suggest something like Amazon S3 or a private GitHub account.
In my environment I am using puppet modules. However, if you have an even simpler environment you could just use simple classes, or even do without them. Astute readers will note that I do little or no testing of my puppet changes before I push them out. This is because the changes that are usually required are small, simple to implement and carry little risk. Below is a diagram that I created in Cacoo which (attempts to) illustrate my environment.

The little green helicopter!
I’ve recently bought a micro R/C helicopter from ThinkGeek as an early Christmas gift to myself. ThinkGeek stock several fairly inexpensive R/C helicopters (as well as other R/C toys) that are easy to fly and fun to use.
I say R/C, but I actually discovered that the controller uses IR (infrared) to talk to the helicopter instead of radio. This hasn’t proved to be an issue, although I do feel a little cheated. There is also a discrepancy between the flight/charge times on ThinkGeek’s web site and those on the helicopter’s packaging. ThinkGeek state a charge time of 10 minutes and a flight time of 7 minutes. The packaging that came with the helicopter says a 20-30 minute charge produces 5-6 minutes of flight time.
I seem to average about 20 min charge (via USB) to 5-7 minutes of flight. Either way it’s a really fun way to get involved in the world of R/C flying.
Fortunately the helicopter is made of metal and is very robust, as I crash it almost every time I fly it indoors. I highly recommend one of these things to… well, everyone!
http://www.thinkgeek.com/geektoys/rc/cb6c/
My thoughts on Carbonite.com
When I first found out about Carbonite I didn’t believe that they actually supported Mac backup. Their web site resembles Windows Vista in my opinion with bad graphics everywhere. A bit hesitant I signed up anyways since they had a “Risk Free” (sounds like the Shopping channel) 15 day trial.
The Carbonite Mac client is pretty impressive. It is as slick as the BackBlaze client and very simple to use. Unfortunately it inherits those awful graphics from their web site. They need to fix that. Using Carbonite is as simple as it gets. I chose to let Carbonite auto-manage my backups and it intelligently selected my home directory and everything under it, and it let me exclude the bits I didn’t want to back up.
Unfortunately Carbonite suffers from the same problem as Mozy. IT’S SO SLOW. When I say slow, I don’t mean “backing up 12GB is slow”, I mean slow as in it’s only using a fraction of the upload bandwidth available on my DSL connection. This is because Carbonite are in the US somewhere far far away. After a couple of hours it had backed up 300MB out of 12GB at a rate of about 300Kbit/s or less. This wasn’t going to work.
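To put that rate in perspective, here is a quick back-of-the-envelope calculation of how long the full upload would take. It assumes decimal units (1GB = 10^9 bytes, 1Kbit = 1000 bits) and a perfectly steady rate, neither of which is quite true in practice, and `upload_hours` is just a name made up for the sketch.

```shell
# upload_hours SIZE_GB RATE_KBIT: estimated hours to upload SIZE_GB gigabytes
# at a steady RATE_KBIT kilobits per second (decimal units assumed).
upload_hours() {
    awk -v gb="$1" -v kbit="$2" \
        'BEGIN { printf "%.1f\n", (gb * 8e9) / (kbit * 1000) / 3600 }'
}

upload_hours 12 300    # the full 12GB at about 300Kbit/s
```

Under those assumptions the answer comes out at a little under 89 hours, i.e. the best part of four days of continuous uploading.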
What is stopping me from purchasing Carbonite.com is that they are just too slow to use from the UK in my experience.
Pros:
- Slick UI, simple to use
- 15 day unlimited free trial
Cons:
- Slow
- Carbonite.com web site is AWFUL to use
My thoughts on Mozy.com
Mozy offers a unique approach to backing up your Mac. When the client is installed, instead of presenting a list of files or directories the Mozy client asks you if you’d like your Mail backing up, or your Keychain prefs, Safari Bookmarks, iCal, iPhoto etc. I thought this was a breath of fresh air. There is also a tab that lets you select individual directories.
Although not as slick as BackBlaze, the Mozy UI is pleasant to work with and I think the unique perspective of what to back up more than makes up for this. Mac users don’t so much care about which directories are backed up (how many people know what is in ~/Library?!) as about which application data is preserved.
The problem that stopped me from choosing Mozy.com as my backup solution is that it’s very slow to back up your files if you are in the UK. This is because Mozy, like virtually all online backup providers, is based in the US. The speed I achieved backing up to Mozy.com was about 300Kbit/s – with BackBlaze.com I actually hit 900Kbit/s according to the BackBlaze UI.
I may revisit using Mozy sometime in the future if the speed improves.
Update:
It seems I am unable to restore all my files from Mozy. This makes the service a bit of a failure in my eyes. Why can’t I restore all the files that I back up? A directory was successfully backed up with zero errors, yet there are errors restoring certain arbitrary files in the directory to an alternative location. Gah.
Pros:
- Can back up application data (Safari Bookmarks, Keychain, iCal etc)
- Free Trial
Cons:
- Too slow
- Backups are scheduled, no continuous data protection
My thoughts on BackBlaze.com
I really wanted to like this product. It has a very slick user interface which makes the software a joy to use. However, there is something very wrong with their approach to backups and protection.
The BackBlaze approach is to backup everything by default, excluding the OS and temporary files. Unfortunately they don’t quite get this right. BackBlaze let you select directories to exclude from backup but there is no way to not backup hidden files. The average Mac user may accumulate a lot of hidden files over a period of time, especially developers or even just VLC users.
Example hidden directories are ~/.cpan or ~/.dvdcss, which you get from VLC. There may be 30,000 files in my ~/.cpan directory and perhaps a hundred or so files in ~/.dvdcss.
Also, the whole model of “backup everything by default” I think is wasteful for an online backup system. I have 9GB of files in my Mac home directory yet somehow BackBlaze wants to back up 46GB of data, even though it excludes most OS directories. I noticed it did not exclude /var nor did it exclude /tmp. My /var contained 13GB of useless data.
The BackBlaze “backup everything by default” is what stopped me from using their product.
Pros:
- Only the changes in modified files are backed up
- Slick UI
- Fast upload speed
- Continuous Data Protection – no set backup schedule
- Free Trial
Cons:
- Cannot exclude hidden directories
- The “backup everything by default” approach causes unnecessary data to be backed up, wasting many gigabytes
- Does not exclude /tmp and /var by default
Update:
I’ve actually given BackBlaze another chance. I do want to like the product and for some reason I am able to max out my upload bandwidth to them despite their servers being located in San Jose somewhere. I’ve deleted all my large .dotfile directories and excluded enough directories so BackBlaze is only backing up my home directory.
Ideally I want to use BackBlaze for both computers in my home but I’m not entirely sure about paying $10/month for backups when I suspect others will let you back up an unlimited number of computers, or at least two or three, for the same price.
Why JungleDisk sucks
Recently I’ve been contemplating storing all my backups in The Cloud instead of using an external hard disk. On the Web Dev side I’ve been using Amazon S3 for some projects and I thought it would be cool to have automated backups from my Mac to S3. Unfortunately I couldn’t find any OS X S3 client that does automated backups and has a native UI, or a UI period. Everyone seems to use command-line s3backup Ruby scripts to get the job done – no thanks.
I decided to give JungleDisk a try since I recall they had a decent offering and used AWS S3. When you go to JungleDisk.com you’ll notice that they are now a division of Rackspace! Rackspace offer their own Cloud storage option called CloudFiles which is a direct competitor to Amazon S3/CloudFront. JungleDisk also use Amazon’s payment infrastructure to take your payment. All of this seemed very sketchy. I decided to give it a try anyways and I signed up for the $3/month desktop service. No free trial was available.
One advertised benefit of using JungleDisk is that they now give you the choice of either using S3 or CloudFiles as the backend storage for your files. CloudFiles (via JungleDisk) currently only charge for storage, not data in/out. This makes things cheaper than S3.
When installing the JungleDisk client on OS X the immediate impressions were not good. The client was clearly written in Qt and did not use the native Mac Cocoa UI. What’s more, the client was AWFUL to use and even had some built-in web server which you could access locally. Why?! There was no way I wanted to use this interface on a daily basis. Ugh.
One thing that occurred to me is that when you install the JungleDisk client you have no indication of which storage backend you are using. I didn’t want to use S3 if CloudFiles was available. Hunting around the JungleDisk knowledge base revealed that if you want to use CloudFiles you have to sign up to Rackspace CloudFiles independently and then it automatically gets associated with your JungleDisk account… somehow. So off I went…
Signing up to CloudFiles gave me the impression that the company was in an early beta stage. I had to wait for a telephone call from a person in the US (at 11pm GMT!) who would then activate my account over the telephone. On the plus side, the guy who I spoke to was very polite and the whole conversation only took a minute or two. After going through this whole process just to back up some data I was put off, and I decided to check out the alternatives to JungleDisk, such as Mozy.com, Carbonite and BackBlaze.
Google DNS vs OpenDNS – performance comparison
Earlier today I hastily switched my router’s DNS forwarders from OpenDNS to Google DNS. Everything went OK and there aren’t any problems but I decided to perform a benchmark between OpenDNS and Google DNS for peace of mind. The results are below.
imac:~ napta2k$ ./dns_benchmark.sh
twitter.com:
Google: ;; Query time: 18 msec
OpenDNS: ;; Query time: 16 msec
google.com:
Google: ;; Query time: 70 msec
OpenDNS: ;; Query time: 13 msec
bbc.co.uk:
Google: ;; Query time: 19 msec
OpenDNS: ;; Query time: 12 msec
itv.com:
Google: ;; Query time: 20 msec
OpenDNS: ;; Query time: 16 msec
tvguide.com:
Google: ;; Query time: 19 msec
OpenDNS: ;; Query time: 13 msec
piratebay.org:
Google: ;; Query time: 19 msec
OpenDNS: ;; Query time: 12 msec
yahoo.com:
Google: ;; Query time: 19 msec
OpenDNS: ;; Query time: 12 msec
wordpress.com:
Google: ;; Query time: 19 msec
OpenDNS: ;; Query time: 13 msec
playboy.com:
Google: ;; Query time: 184 msec
OpenDNS: ;; Query time: 15 msec
The test basically does a dig for each domain and records the returned Query time. The large difference in query time can be attributed to the fact that I’m physically close to the opendns.com network.
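The script itself isn’t reproduced here, but a minimal sketch of the same idea might look like this. It assumes `dig` is installed and uses one address per provider (8.8.8.8 for Google, 208.67.222.222 for OpenDNS); the function names are invented for the example and the output format only approximates the listing above.

```shell
#!/usr/bin/env bash
# Sketch of a dig-based resolver comparison; not the original script.

# Read dig output on stdin and print the query time in milliseconds.
parse_query_time() {
    awk '/Query time:/ { print $4 }'
}

# Query each domain given as an argument against both resolvers.
benchmark() {
    local domain
    for domain in "$@"; do
        echo "$domain:"
        echo "Google: $(dig @8.8.8.8 "$domain" | parse_query_time) msec"
        echo "OpenDNS: $(dig @208.67.222.222 "$domain" | parse_query_time) msec"
    done
}

# Example: benchmark twitter.com google.com bbc.co.uk
```

Bear in mind that a single query per domain is heavily influenced by whether the resolver already had the answer cached, so repeating each lookup a few times gives a fairer picture.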
A traceroute to the OpenDNS resolvers shows that they are merely 4 hops away from my router. A traceroute to Google DNS shows 20 hops. The tests were performed from London using Level3.com as my ISP/transit. Likewise a ping to the OpenDNS resolvers shows a round-trip latency of only 12ms, while a ping to the Google DNS resolvers shows a round-trip latency of 20ms.
Update:
I have performed the exact same test from a server in New Jersey, USA in a nac.net data centre and OpenDNS is still far faster than Google DNS. Again a traceroute to OpenDNS from NJ/USA shows that the OpenDNS network is merely 5 hops away from the server. A traceroute to Google DNS shows 10 hops. So network latency appears to be what contributes to the overall slower query times.
napta2k@svr1:~$ ./dns_benchmark.sh
twitter.com:
Google: ;; Query time: 10 msec
OpenDNS: ;; Query time: 1 msec
google.com:
Google: ;; Query time: 31 msec
OpenDNS: ;; Query time: 2 msec
bbc.co.uk:
Google: ;; Query time: 13 msec
OpenDNS: ;; Query time: 80 msec
itv.com:
Google: ;; Query time: 15 msec
OpenDNS: ;; Query time: 1 msec
tvguide.com:
Google: ;; Query time: 10 msec
OpenDNS: ;; Query time: 1 msec
piratebay.org:
Google: ;; Query time: 10 msec
OpenDNS: ;; Query time: 2 msec
yahoo.com:
Google: ;; Query time: 10 msec
OpenDNS: ;; Query time: 1 msec
wordpress.com:
Google: ;; Query time: 10 msec
OpenDNS: ;; Query time: 2 msec
The results of my ad-hoc testing seem to indicate that OpenDNS has a lower-latency, better-placed anycast platform than Google DNS.
Hello, NXDOMAIN!
So today I moved my home router from the OpenDNS DNS servers over to the new Google public DNS resolvers. One of the reasons that I moved from OpenDNS to Google is that I got tired of the NXDOMAIN “trickery” that OpenDNS employ. I sometimes find myself doing a fair amount of lookups with dig/nslookup and the fact that OpenDNS will return their own IP for a name that doesn’t exist is sometimes confusing.
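A quick way to see how a resolver treats a name that doesn’t exist is to look at the status field in the header of dig’s output. The sketch below pulls that field out; the function name and the test hostname are made up for illustration, and it assumes the usual `;; ->>HEADER<<-` line that dig prints.

```shell
# Read dig output on stdin and print the response status,
# e.g. NXDOMAIN or NOERROR.
dns_status() {
    awk -F'status: ' '/->>HEADER<<-/ { split($2, parts, ","); print parts[1] }'
}

# Example (hypothetical hostname under the reserved .invalid TLD):
#   dig @208.67.222.222 no-such-host.invalid | dns_status
```

A resolver that passes the answer through untouched reports NXDOMAIN for such a name; one that substitutes its own address reports NOERROR along with an A record.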
My router’s DNS table originally had a single external DNS server for OpenDNS. Whilst the OpenDNS IP address is probably anycasted for performance and resilience having a single IP address is still a bit of a concern – what if the upstream route to 208.67.222.222/32 was unavailable?
:dns server route list
DNS Server Source Domain Metric Intf State
208.67.222.222 10 RoutedEthoA UP
My new solution is to load balance across the Google DNS resolvers and use the OpenDNS server as a backup. The load balancing is achieved on my router by giving both Google resolvers an identical metric. Theoretically this should make the router “flip” between the DNS addresses that share a metric. The Google DNS offering is quite new (released yesterday) so who knows how stable it will be.
I also have concerns about load balancing DNS IPs. In a traditional DNS environment you sometimes see a DNS master and its slaves becoming out of sync when a zone file is modified, because the slave has yet to receive a NOTIFY or I/AXFR. Although I’m sure that the Google DNS resolvers don’t use the traditional concepts of DNS master/slaves, and instead they’re some Infrastructure 2.0 multi-master, federated, atomic entity which employs anycast.
My new DNS routing table looks like this:
:dns server route list
DNS Server Source Domain Metric Intf State
8.8.8.8 20 RoutedEthoA UP
8.8.4.4 20 RoutedEthoA UP
208.67.222.222 30 RoutedEthoA UP
As far as performance goes, I haven’t done any benchmarking between OpenDNS and Google DNS (hopefully someone will) but I am presuming that the Google offering is faster than its rivals, merely because it’s Google.
One concern that I have about every device on your network using the router as a DNS forwarder/proxy is reliability. You do gain things such as automatic rDNS for your home LAN (e.g. a DHCP client gets both an A and a PTR record registered on the router), but some routers experience performance problems acting as a busy DNS forwarder and sometimes time out or require a reboot, thus killing DNS on your network. The “safer” approach is to have your router’s DHCP server issue a set of DNS resolvers directly; any modifications you make will get picked up on the next DHCP lease.
A bonus point is that if you ever travel or are setting up new equipment the Google public DNS resolver IP addresses are incredibly simple to remember: 8.8.8.8. Never again will you scramble to look up the IP of the OpenDNS resolvers.
Further reading:
http://code.google.com/speed/public-dns/
http://blog.opendns.com/2009/12/03/opendns-google-dns/
China gets horny
Today I woke up to several alerts from Linode informing me that one of my VPS nodes was exceeding the Disk I/O threshold that I had set. Curiously this VPS is used as an HTTP web proxy and whilst it gets about 300-400 visitors per day (mainly from China) this morning I was seeing over 800 visitors in Google Analytics.
Attempting to ssh to the server failed with timeouts, although the PHP web application was still responding to requests over HTTP fine. I suspect sshd was failing to reverse-lookup my IP address in any reasonable amount of time, or perhaps iptables was to blame (note to self: look into why that happened). Thankfully Linode provide out-of-band console access via SSH and Ajax so all was not lost.
Looking at the Network rrdgraph it shows that the server was approaching 7Mbit/s of HTTP traffic and almost 50GB had been consumed today alone. Whilst the server seemed to handle the load without problem (minus ssh access) consuming 50GB+ per day would quickly max out my monthly data transfer allowance with Linode – this wasn’t acceptable. I modified the firewall to accept HTTP/HTTPS traffic from my IP only in order to investigate and the load suddenly stopped and SSH was alive again.
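For reference, a temporary lockdown of that sort could be expressed with iptables rules along these lines. This is a sketch rather than the exact rules that were applied: the function name is made up, 203.0.113.10 is a documentation address standing in for a real admin IP, and it assumes the rules should sit at the top of the INPUT chain.

```shell
# Print iptables commands that allow web traffic from one address only.
# The rules are printed, not applied; review them before piping to sh as root.
lockdown_rules() {
    local admin_ip="$1"
    # -I with an explicit position places these ahead of existing ACCEPT rules.
    echo "iptables -I INPUT 1 -p tcp -s $admin_ip -m multiport --dports 80,443 -j ACCEPT"
    echo "iptables -I INPUT 2 -p tcp -m multiport --dports 80,443 -j DROP"
}

lockdown_rules 203.0.113.10
```

Removing the two rules again (with `-D` in place of `-I` and without the position number) restores normal service once the investigation is over.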
Initially I had suspected that some sort of automated bot was using ehproxy.info to do automated scans and attacks, but a closer inspection of the traffic showed requests spread evenly across many different IPs (all from China, as Google Analytics confirms), all clicking various porn sites. I guess everyone in China was feeling horny this afternoon!
Further analysis of the access.log shows that the server (Linode XenU VPS with 720MB of RAM) was handling roughly 61 hits per second (2428863/39600) and lighttpd was dealing with the load with no problem. Pretty good considering this is a pure PHP application utilising php-cgi.
For the record, the top five IP addresses were:
Hits : IP address
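A tally like this, along with the hit rate quoted above, can be pulled out of an access log with a few standard tools. The sketch below assumes the client address is the first whitespace-separated field on each line (as in the common log format); the log path in the example is a placeholder and the function names are invented for illustration.

```shell
# Print the five busiest client addresses in an access log, busiest first.
top_clients() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n 5
}

# Print average hits per second given a hit count and a period in seconds.
hits_per_second() {
    awk -v hits="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", hits / secs }'
}

# Example (placeholder path): top_clients /var/log/lighttpd/access.log
hits_per_second 2428863 39600
```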
Show limits of a running process in Linux
A rather simple but often asked question was put forward to me today: How can I see the maximum number of file descriptors my running process can open? (without killing the process!)
Typically one would say ‘check ulimit -n’ but let’s say that a thread-driven or event-driven application like varnish or lighttpd is configured with its own limit on open file descriptors and you want to verify that the setting has taken effect before the application crashes.
A simple way to check this (at least on Linux 2.6.26-1 or later) is to run:
svr1:~# awk '/Max open files/{ print $4}' /proc/$(pgrep -n lighttpd)/limits
1024
As you can see the above command returned the value of max open files for the running process. This means you can be sure that your lighttpd or varnish application will not suddenly die after being starved of file descriptors!
I have included the entire output of the limits table for the lighttpd process for completeness:
svr1:~# cat /proc/$(pgrep -n lighttpd)/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            ms
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             5824                 5824                 processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       5824                 5824                 signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
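The one-liner earlier prints only the soft limit, which is the fourth field of the “Max open files” row. If you also want the hard limit, a small variation does the job; `open_file_limits` is a name invented for this sketch, and it simply reads a limits table on standard input.

```shell
# Read a /proc/<pid>/limits table on stdin and print the soft and hard
# open-file limits separated by a space.
open_file_limits() {
    awk '/^Max open files/ { print $4, $5 }'
}

# Example: open_file_limits < "/proc/$(pgrep -n lighttpd)/limits"
```

The soft limit is the one the process will actually run into; the hard limit is the ceiling up to which the soft limit may be raised.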


