I ran into an issue where every job was failing on one particular slave. Let's assume a simple Hudson master/slave environment: one Hudson master and two Hudson slaves, Slave A and Slave B.
Both boxes have the exact same hardware specs and the same configuration. Jobs are created on the master and handed to either Slave A or Slave B, depending on available resources.
When a job was handed off to Slave A, it would execute perfectly. When the same job was given to Slave B, it would fail every time it tried to connect to other servers and deploy code.
Since the job was actually running on Slave B and only failing when it reached out to other servers, we decided it had to be some sort of connectivity issue. Because the boxes were set up exactly the same, however, we wasted a lot of time testing various potential networking issues with no luck.
After re-examining the failed log statement, we decided to try the actual steps of the build job line by line. This gave us more information from the output of one particular command.
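Running the steps one at a time can be sketched roughly as follows. This is only an illustration of the idea, not the script we used: `run_steps` is a hypothetical helper, and the commands in the example are placeholders for whatever the real build job runs. It executes each command separately, keeps its exit code and output, and stops at the first failure so the failing command's output is easy to find.

```python
import subprocess


def run_steps(steps):
    """Run each shell command in order and stop at the first failure.

    Returns one dict per executed step with its exit code and output.
    """
    results = []
    for command in steps:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True
        )
        results.append(
            {
                "command": command,
                "returncode": proc.returncode,
                "stdout": proc.stdout,
                "stderr": proc.stderr,
            }
        )
        if proc.returncode != 0:
            # Later steps usually depend on this one, so stop here.
            break
    return results


if __name__ == "__main__":
    # Placeholder steps; substitute the commands from the real build job.
    for result in run_steps(["echo step one", "exit 3", "echo never reached"]):
        print(result["returncode"], result["command"], result["stderr"].strip())
```

Running something like this as the same user the Hudson job uses on the slave matters, since a different account may behave differently.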
What we found was that the user account the Hudson jobs were running as was not configured properly on Slave B. This resulted in various permission and connection errors when trying to connect from the Hudson slave to a downstream server.
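One way to catch this kind of difference sooner is to capture a few facts about the build user on each slave and compare them. The sketch below is an assumption about what might be worth checking (user name, home directory, whether a `.ssh` directory exists); the actual misconfiguration in our case is not spelled out here, and `snapshot_user` and `diff_snapshots` are hypothetical names.

```python
import os


def snapshot_user():
    """Collect a few facts about the account running this script."""
    home = os.path.expanduser("~")
    return {
        # USER on Unix-like systems, USERNAME on Windows; None if neither is set.
        "user": os.environ.get("USER") or os.environ.get("USERNAME"),
        "home": home,
        "home_writable": os.access(home, os.W_OK),
        # Assumption: deploys go over SSH, so a missing .ssh directory is suspicious.
        "ssh_dir_exists": os.path.isdir(os.path.join(home, ".ssh")),
    }


def diff_snapshots(good, bad):
    """Return {key: (good_value, bad_value)} for every key that differs."""
    keys = sorted(set(good) | set(bad))
    return {
        key: (good.get(key), bad.get(key))
        for key in keys
        if good.get(key) != bad.get(key)
    }


if __name__ == "__main__":
    # Run on each slave as the build user, save the output, then compare.
    print(snapshot_user())
```

Comparing the snapshot from Slave A against the one from Slave B with `diff_snapshots` lists only the settings that differ, which is a much shorter list to investigate than "the network".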