The occasional trials and tribulations of a jack-of-all-trades sysadmin in a startup in Silicon Valley
My boss received a phone call from our alarm company at about 8pm one evening. He was informed that the temperature sensor in the server room had tripped. They didn't know what the temperature in the room was; they only knew that the sensor had tripped and was still tripped. When he arrived, the room was at 86 degrees. It seems our newly installed AC unit had failed.
As the backup (original) AC unit could not even begin to keep up with our processing machines, he relocated all of those machines to the secondary server room. Thinking the problem temporarily solved, he retired for the night. When I arrived the next morning, that secondary room was almost 80. I arrived at about the same time as the engineers, who were beginning to use those processing machines more seriously. Within an hour, the room was over 90 and a few of the cases were quite hot to the touch. With the room once again at a critical temperature, I powered off the machines and investigated my options.
All of our servers are powered by Opterons, and nearly all of the motherboards support PowerNow. We double-checked that all of our processing machines were configured to take advantage of this dynamically adjusting clock speed (see HP's PowerNow for Linux instructions and the Gentoo instructions). With the primary room holding in the mid 70s and the secondary room staying above 80, we forcibly set all of the processors to the slowest speeds available.
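For anyone wanting to do the same, here is a minimal sketch of forcing every processor to its slowest speed. It is not the exact procedure we used; it assumes a Linux kernel that exposes the cpufreq sysfs interface under /sys/devices/system/cpu and offers the "powersave" governor, which pins each CPU at its lowest allowed frequency. The function name set_governor and its sysfs_root parameter are made up for this illustration.

```python
from pathlib import Path


def set_governor(governor="powersave", sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to each CPU's scaling_governor file.

    Returns the names of the CPUs that were changed (e.g. ["cpu0", "cpu1"]).
    `sysfs_root` is a parameter only so the function can be tried against
    a scratch directory; on a real machine the default path needs root.
    """
    changed = []
    # cpu[0-9]* matches cpu0, cpu1, ... but not sibling dirs such as cpufreq/
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_root).glob(pattern)):
        # Skip CPUs whose driver does not offer the requested governor.
        available = gov_file.with_name("scaling_available_governors")
        if available.exists() and governor not in available.read_text().split():
            continue
        gov_file.write_text(governor + "\n")
        changed.append(gov_file.parent.parent.name)
    return changed
```

Calling set_governor() as root applies the change to every CPU that supports it; calling set_governor("ondemand") afterwards would hand control back to dynamic scaling, assuming that governor is available.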
Later that day, we had new circuits installed in each of our server rooms to handle our two-ton Movincool. Our primary AC unit was also inspected, and the repair is slated for the next day, once the new parts arrive.
While a portable AC unit like the Movincool is a reasonable backup (it served as the primary cooler in a previous server room), it is not really the best solution. The correct solution is to install two HVAC systems in the server room, either of which could handle the full load of the room, and then run each of them at half utilization. One day.
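To put rough numbers on that redundancy idea, here is a small back-of-the-envelope sketch. It relies on the standard conversions of 12,000 BTU/hr per ton of cooling and about 3.412 BTU/hr per watt; the function names are made up for illustration, and the room load used in the usage note is a hypothetical worst case, not a measurement from our server room.

```python
BTU_PER_HR_PER_TON = 12_000   # one ton of cooling capacity
BTU_PER_HR_PER_WATT = 3.412   # heat produced by one watt of load


def tons_to_watts(tons):
    """Heat load, in watts, that a unit of the given tonnage can remove."""
    return tons * BTU_PER_HR_PER_TON / BTU_PER_HR_PER_WATT


def redundant_pair(load_watts, unit_tons):
    """Check a pair of identical units sharing one room's heat load."""
    capacity = tons_to_watts(unit_tons)
    return {
        # Can a single unit carry the whole room if the other one fails?
        "survives_one_failure": load_watts <= capacity,
        # Fraction of its capacity each unit uses when both are running.
        "shared_utilization": (load_watts / 2) / capacity,
    }
```

A two-ton unit works out to roughly 7 kW of heat removal. If the room's load were exactly that much, redundant_pair(tons_to_watts(2), 2) reports that one unit alone could still carry the room and that each unit would sit at half utilization when both are running.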
[2007/02/09 | /hardware | permanent link]