The other day I was watching our Master Build server, running Hudson, send off about 50 different jobs to the 10 Hudson slaves we have. I just sat there and thought… “Wow that looks cool!”
When I first implemented the solution I admit that I was a little wary of how it would operate. After having it run for over a year I think I can safely say that it has performed very well.
The configuration we use is basically one master Hudson server acting as a director. This Hudson server queues up all of the build jobs and sends them off to the slave servers, which in turn perform the work. We currently have about 10 slaves, and we allow a maximum of 5-6 simultaneous jobs on each slave (depending on the hardware specs).
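To make the numbers concrete, here is a minimal sketch of the capacity arithmetic for a pool like this one. The slave names and the exact split between 5-executor and 6-executor boxes are made up for illustration; in Hudson the per-node limit is the executor count you set on each node.

```python
# Hypothetical pool: slave names and executor counts are illustrative only.
executors_per_slave = {f"slave-{n:02d}": 5 for n in range(1, 11)}
# Assume two of the boxes have better hardware and can take 6 jobs each.
executors_per_slave["slave-01"] = 6
executors_per_slave["slave-02"] = 6


def pool_capacity(executors):
    """Return how many jobs the whole pool can run at the same time."""
    return sum(executors.values())


def jobs_left_waiting(queued_jobs, executors):
    """Return how many queued jobs must wait for a free executor."""
    return max(0, queued_jobs - pool_capacity(executors))


capacity = pool_capacity(executors_per_slave)
print(f"Pool capacity: {capacity} simultaneous jobs")
print(f"Waiting with 50 queued: {jobs_left_waiting(50, executors_per_slave)}")
print(f"Waiting with 70 queued: {jobs_left_waiting(70, executors_per_slave)}")
```

With these assumed numbers the pool absorbs a burst of 50 jobs without anything sitting in the queue, which matches what we see in practice.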
You can accomplish this setup by going to “Manage Hudson” and then selecting “Manage Nodes”. There you add the various children (slaves) that you want to set up to create your Master/Slave Hudson pool. Within those settings you can then dictate how many jobs to send to each server, scheduling, and so forth.
We found that with this type of setup in the cloud we can basically throw all kinds of requests at this server configuration. PHP Code Sniffing against our code repo, JS Lint checking, production-based scraping utilities and health checks, even performance monitoring tools run through this system because of its centralized management and reporting.
About 4-6 jobs per server seems to be the sweet spot. You can run the servers a little hotter imo, but I like this number because it lets you ensure that a job does not take longer simply due to a lack of resources. This can come into play if you are using the servers for health monitoring or performance monitoring. You would not want those Hudson jobs to run slower simply because the server was overloaded.
This system, however, has not been without its troubles. The first issue we ran into was getting the start commands on the master and the slaves to line up properly. For example, the master server actually starts the service on the slave. You do not start each slave with its own Hudson instance; doing so will in fact create multiple independent Hudson instances in your environment. The JAR files for independent running and slave-based running are in different locations. The master Hudson server connects to the slave server and runs the slave-based JAR file.
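As a rough sketch of what "the master runs the slave JAR" looks like, the snippet below builds the kind of command the master could execute over SSH. It assumes the launch method where the master runs a command to start the agent. The user, host name, and JAR path are hypothetical placeholders, not our real values. The code only builds and prints the command; it does not run it.

```python
import shlex


def slave_launch_command(user, host, slave_jar):
    """Build the command the master would run to start the slave agent over SSH."""
    return ["ssh", f"{user}@{host}", "java", "-jar", slave_jar]


# Hypothetical user, host, and JAR location; substitute your own.
cmd = slave_launch_command("hudson", "slave-01.example.com", "/opt/hudson/slave.jar")
print(shlex.join(cmd))
```

The point is that nothing on the slave starts Hudson on its own. The only Hudson process there is the agent that the master launched.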
You also need to ensure that you have a very solid backup of the job configuration of your master Hudson server. Make this a requirement before even inputting any information. If you have a 10 server pool with 50 jobs going across at any given moment and that master fails, you had better be able to promote a slave into the master position and get it running quickly. We use both a plugin-based backup utility and a generic Linux script that does a file-by-file backup of the system at a given interval.
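Our actual backup script is not shown here, but a file-by-file backup of the job configuration could look something like the sketch below. It assumes the usual Hudson home layout, with a top-level config.xml and one config.xml per job under jobs/. The function name and the paths in the usage comment are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_job_configs(hudson_home, backup_root):
    """Copy the master config.xml and each jobs/<name>/config.xml into a timestamped folder."""
    home = Path(hudson_home)
    target = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    sources = [home / "config.xml"] + sorted(home.glob("jobs/*/config.xml"))
    copied = []
    for src in sources:
        if not src.is_file():
            continue  # skip anything missing rather than failing the whole run
        dest = target / src.relative_to(home)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied


# Example call with hypothetical paths, e.g. from a cron job on the master:
# backup_job_configs("/var/lib/hudson", "/backups/hudson")
```

This deliberately copies only the configuration and not the build history or workspaces, so it stays small enough to run often. Restoring onto a promoted slave would then mean copying the newest folder back into that machine's Hudson home.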
Another problem you may run into is permissions from the slave boxes to any other servers they may need to interact with. The master Hudson server sends the job information to the slave Hudson server for processing. If the job needs to ssh, curl, sftp, etc. to another server, then you had better be sure that each slave has the ability to talk to those destination servers. You do have the ability to designate which Hudson slave you want a particular job to run on, but I recommend not using that feature unless necessary, as it defeats the purpose of balanced load distribution of your build jobs.
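One way to catch this early is to run a small reachability check on every slave before you rely on it. The sketch below only tests whether a TCP connection can be opened. It does not prove that SSH keys or credentials are set up correctly. The host names and ports in the usage comment are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def unreachable(destinations, timeout=3.0):
    """Return the (host, port) pairs from destinations that could not be reached."""
    return [(host, port) for host, port in destinations
            if not can_connect(host, port, timeout)]


# Example with hypothetical destinations a job might ssh or curl to:
# print(unreachable([("deploy.example.com", 22), ("api.example.com", 443)]))
```

Running the same check as a Hudson job on each slave gives you a quick picture of which boxes can actually reach the servers your jobs depend on.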