Monday, March 31, 2008

Oxford adds another space token

With some advice from Graeme, I have added another space token at Oxford.
Commands recorded for posterity:

dpm-reservespace --gspace 3T --lifetime Inf --group atlas/Role=production --token_desc ATLASMCDISK
dpns-mkdir /dpm/
dpns-chgrp atlas/Role=production /dpm/
dpns-chmod 775 /dpm/
dpns-setacl -m d:g:atlas/Role=production:7,m:7 /dpm/
dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/
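For anyone repeating this for another token, the sequence above could be wrapped in a small helper. This is only a sketch: the function name space_token_commands and its parameters are made up for illustration, the flags are exactly those recorded above, and the directory path is left as the truncated /dpm/ shown in the post rather than guessed.

```python
import shlex


def space_token_commands(path, group="atlas/Role=production",
                         size="3T", token="ATLASMCDISK"):
    """Return the command lines recorded in the post as argument lists.

    `path` is the DPM namespace directory backing the token; the post
    only shows it truncated, so the caller must supply the real value.
    """
    acl = f"g:{group}:7,m:7"
    return [
        # Reserve the space and attach the token description to it.
        ["dpm-reservespace", "--gspace", size, "--lifetime", "Inf",
         "--group", group, "--token_desc", token],
        # Create the directory and hand it to the production group.
        ["dpns-mkdir", path],
        ["dpns-chgrp", group, path],
        ["dpns-chmod", "775", path],
        # Default ACL first (inherited by new entries), then the ACL itself.
        ["dpns-setacl", "-m", f"d:{acl}", path],
        ["dpns-setacl", "-m", acl, path],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed. "/dpm/" is the truncated path from the post.
    for cmd in space_token_commands("/dpm/"):
        print(shlex.join(cmd))
```

The helper only prints the commands, so they can be reviewed before being pasted into a shell on the DPM head node.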

We started our DIY move today: two trips in the lab van, with eight 1U servers each time.
Tomorrow we plan to move more old WNs, and then an empty rack later in the week.

Friday, March 28, 2008

SouthGrid Update

The first stage of the HPC cluster is running LCG jobs, and is being correctly accounted for.

The HPC WNs have AMD 2218 cores at 2.6GHz; these are said to be rated at
1.745 kSI2K per core.
Currently GridPP can run a maximum of 32 jobs on this small stage 1 HPC cluster.
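As a rough check on what that job limit is worth, the figures above can be multiplied out. The sketch assumes one job per core, which the post does not state explicitly, and the names are illustrative.

```python
CORE_RATING_KSI2K = 1.745  # per-core rating quoted above
MAX_JOBS = 32              # current job limit on the stage 1 cluster


def usable_capacity_ksi2k(jobs, per_core):
    """Capacity in kSI2K, assuming each job occupies exactly one core."""
    return jobs * per_core


print(f"{usable_capacity_ksi2k(MAX_JOBS, CORE_RATING_KSI2K):.2f} kSI2K")
# prints "55.84 kSI2K"
```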

Santanu is continuing to work with LHCb to solve all the problems running their code at Cambridge.
The WNs will be upgraded to SL4 within the next few weeks.

When over 100 of the 120 nodes in the Babar cluster died after a power shutdown at the end of January, it was deemed not worth restoring the cluster. Two twin 1U servers have been bought to replace it, which will provide 32 cores and 78.4 kSI2K.
The old escience cluster is being set up as an SL4 CE and WN farm, as a template for the way they will drive the new University 'Blue Bear' HPC cluster. This cluster is made up of 31 dual 3GHz Xeons.
The main grid cluster (aka the atlas cluster) has been expanded to 60 cores.

The PPS is not being maintained at the moment.

Chris has got space tokens working at RALPP (updated to dCache 1.8-12p6, from p4, and also rebooted everything after putting in the srmSpaceManager enabled config files).
The new hardware has been installed:
8 boxes, 16 nodes, 32 CPUs so 128 cores.

CPUs are "E5410 @ 2.33GHz"; not sure of the kSI2K rating yet.

Running stable. WNs were updated to SL4 earlier this year.

Quotes to move the kit to Begbroke seem too high so we are going to adopt a DIY approach.
Draining t2se01 is taking forever. The dpm-drain command sometimes terminates after only 20 minutes (~6GB of data transferred). We did, however, have a good run on the night of the 26th which lasted over 10 hours.
Oddly, the error log files are often the same size, although not totally consistently:

-rw-r--r-- 1 root root 22519 Feb 22 14:32 dpm-drain-errorlog-se01-1
-rw-r--r-- 1 root root 22519 Feb 22 15:56 dpm-drain-errorlog-se01-2
-rw-r--r-- 1 root root 22564 Feb 22 17:05 dpm-drain-errorlog-se01-3
-rw-r--r-- 1 root root 22519 Feb 22 18:20 dpm-drain-errorlog-se01-4
-rw-r--r-- 1 root root 22519 Feb 25 12:35 dpm-drain-errorlog-se01-5
-rw-r--r-- 1 root root 22519 Feb 25 13:04 dpm-drain-errorlog-se01-6
-rw-r--r-- 1 root root 22519 Feb 25 16:53 dpm-drain-errorlog-se01-7
-rw-r--r-- 1 root root 22519 Mar 26 13:51 dpm-drain-errorlog-se01-8
-rw-r--r-- 1 root root 22519 Mar 26 14:17 dpm-drain-errorlog-se01-9
-rw-r--r-- 1 root root 1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10
-rw-r--r-- 1 root root 567836 Mar 27 13:59 dpm-drain-errorlog-se01-11
-rw-r--r-- 1 root root 25241 Mar 27 14:57 dpm-drain-errorlog-se01-12
-rw-r--r-- 1 root root 22598 Mar 27 15:37 dpm-drain-errorlog-se01-13
-rw-r--r-- 1 root root 22598 Mar 27 16:22 dpm-drain-errorlog-se01-14
-rw-r--r-- 1 root root 22598 Mar 27 17:11 dpm-drain-errorlog-se01-15
-rw-r--r-- 1 root root 22598 Mar 27 21:55 dpm-drain-errorlog-se01-16
-rw-r--r-- 1 root root 22598 Mar 27 23:15 dpm-drain-errorlog-se01-17
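To see how much the sizes really repeat, the listing above can be tallied by size. A minimal sketch: the function name is illustrative, and the parsing assumes the plain nine-column `ls -l` layout shown above.

```python
from collections import Counter

LISTING = """\
-rw-r--r-- 1 root root 22519 Feb 22 14:32 dpm-drain-errorlog-se01-1
-rw-r--r-- 1 root root 22519 Feb 22 15:56 dpm-drain-errorlog-se01-2
-rw-r--r-- 1 root root 22564 Feb 22 17:05 dpm-drain-errorlog-se01-3
-rw-r--r-- 1 root root 22519 Feb 22 18:20 dpm-drain-errorlog-se01-4
-rw-r--r-- 1 root root 22519 Feb 25 12:35 dpm-drain-errorlog-se01-5
-rw-r--r-- 1 root root 22519 Feb 25 13:04 dpm-drain-errorlog-se01-6
-rw-r--r-- 1 root root 22519 Feb 25 16:53 dpm-drain-errorlog-se01-7
-rw-r--r-- 1 root root 22519 Mar 26 13:51 dpm-drain-errorlog-se01-8
-rw-r--r-- 1 root root 22519 Mar 26 14:17 dpm-drain-errorlog-se01-9
-rw-r--r-- 1 root root 1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10
-rw-r--r-- 1 root root 567836 Mar 27 13:59 dpm-drain-errorlog-se01-11
-rw-r--r-- 1 root root 25241 Mar 27 14:57 dpm-drain-errorlog-se01-12
-rw-r--r-- 1 root root 22598 Mar 27 15:37 dpm-drain-errorlog-se01-13
-rw-r--r-- 1 root root 22598 Mar 27 16:22 dpm-drain-errorlog-se01-14
-rw-r--r-- 1 root root 22598 Mar 27 17:11 dpm-drain-errorlog-se01-15
-rw-r--r-- 1 root root 22598 Mar 27 21:55 dpm-drain-errorlog-se01-16
-rw-r--r-- 1 root root 22598 Mar 27 23:15 dpm-drain-errorlog-se01-17
"""


def sizes_by_log(listing):
    """Map file name -> size in bytes from `ls -l` style lines."""
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        # Columns: perms, links, owner, group, size, month, day, time, name
        sizes[fields[8]] = int(fields[4])
    return sizes


sizes = sizes_by_log(LISTING)
# Most frequent sizes first.
for size, count in Counter(sizes.values()).most_common():
    print(f"{size:>8} bytes: {count} log(s)")
```

On this listing, 8 of the 17 logs are 22519 bytes and 5 are 22598 bytes. That might point to the short runs stopping at the same place each time, but the logs themselves would need comparing to confirm it.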

Tuesday, March 25, 2008

Oxford Update

The original 74-CPU SL3 Dell worker nodes have been taken down in preparation for reinstallation as SL4 worker nodes.
We will maintain a separate CE to drive these but intend to separate out the torque server.
The new CE will be an SL4 CE.
The CE and torque server are likely to be virtual machines running under VMware.

The move of the 4 racks up to Begbroke is still uncertain.
We are awaiting quotes from companies to move the equipment for us. The DIY price of just paying for a truck and driver has also been looked into, but insurance and warranty issues may prevent use of this option.

The steps up to the new computer room have been made smaller to allow installation of a scissor lift, to lift racks to the false floor height. The date for the lift to be installed is still unclear.

The grand opening on the 15th April is all too close.