Monday, April 28, 2008

Oxford DPM Progressing Slowly

Removing zero-length files from the DPM storage pool with the rfrm command has helped the dpm-drain command start progressing again.
The command still fails after 10-20 files and the transfer speed is very low, but at least we are making progress.
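
For anyone wanting to check a pool for the same problem, here is a minimal sketch of how zero-length files could be listed. It is not the procedure we used: it assumes the pool filesystem is mounted locally, the function name find_zero_length_files and the example path are made up for illustration, and it only reports files. Removal would still be done by hand with rfrm.

```python
import os


def find_zero_length_files(root):
    """Return the sorted paths of regular files under root that are 0 bytes long."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only consider regular files; skip symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Hypothetical pool mount point; substitute the real filesystem path.
    for path in find_zero_length_files("/path/to/pool/filesystem"):
        print(path)
```

The script deliberately does nothing beyond printing paths, so the list can be reviewed before anything is deleted.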

Monday, April 14, 2008

SouthGrid Update

We installed half of the Oxford cluster at Begbroke last Tuesday, and the air conditioning failed during the night. A valve on the chillers failed, cutting off the water supply to the chillers, which in turn switched themselves off. The room rapidly heated up to over 40 degrees. After investigation and repairs the AC went back on and all has been well so far. More automated warning systems are required.

Cambridge have set up space tokens for both ATLASDATADISK and ATLASMCDISK. They have also started upgrading to SL4 (64-bit) worker nodes.

Bristol completed upgrading the worker nodes to SL4 on Monday. They had some problems caused by SELinux preventing normal logging, but all now appears well.

Tuesday, April 08, 2008

Oxford Update

Last week Ewan and I started the DIY move to Begbroke. We moved 40 1U servers over two mornings. One of the (now empty) Dell racks was moved on Wednesday afternoon.
The worker nodes were reinstalled in that rack on Thursday; we had one PSU failure out of 27 nodes. These nodes will be installed with SL4 shortly.

This week we emptied one of the Viglen racks and moved the servers yesterday.
We hope to move the rack this afternoon and get the worker nodes back online as soon as possible, as we are currently at half capacity.

On Friday 28th March one of our new 9TB file servers burnt out its backplane. This is very similar to the problems RAL have been seeing. The backplane was swapped out and the server is now back online.
We are in talks with the supplier.

Oxford Update