Friday, December 19, 2008

Automount problems on torque server

We've been having a few problems with our torque server randomly failing to automount disks.

Most of the time the mounts succeeded, but occasionally they would fail with just:

Dec 19 08:05:06 heplnx201 kernel: RPC: error 5 connecting to server nfsserver
Dec 19 08:05:06 heplnx201 automount[23438]: >> mount: nfsserver:/opt/ppd/mount: can't read superblock
Dec 19 08:05:06 heplnx201 automount[23438]: mount(nfs): nfs: mount failure nfsserver:/opt/ppd/mount on /net/mount
Dec 19 08:05:06 heplnx201 automount[23438]: failed to mount /net/mount
Dec 19 08:05:07 heplnx201 kernel: RPC: Can't bind to reserved port (98).
Dec 19 08:05:07 heplnx201 kernel: RPC: can't bind to reserved port.

With the wonders of Google I was able to find out that error 98 is "address in use": the client is unable to find a free port in its reserved port range from which to initiate the connection to the server.

The culprit seems to be torque which, when I checked with netstat -a, was using every single port from 600 to 1023, neatly covering the nfs client port range of 600-1023.
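A quick way to see how heavily that range is occupied, as a sketch (the awk field position assumes netstat output where the local host:port address is the fourth column; a two-line sample stands in for the real output here):

```shell
# Count sockets whose local port falls in the reserved range the NFS
# client draws from (600-1023). On the live server you would pipe the
# output of netstat straight in instead of this sample.
netstat_sample='tcp 0 0 10.0.0.1:601 10.0.0.2:15001 ESTABLISHED
tcp 0 0 10.0.0.1:2049 10.0.0.2:713 ESTABLISHED'

echo "$netstat_sample" | awk '{
    n = split($4, a, ":")         # local address is field 4, host:port
    p = a[n] + 0                  # port number, forced numeric
    if (p >= 600 && p <= 1023) count++
} END { print count+0 }'          # prints 1 for the sample above
```

If the number printed is close to 424 (the size of the 600-1023 range), the NFS client has nowhere left to bind.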

Here Google failed me and I was unable to find any way to limit the port range used by torque.

So for now I've taken the quick option of extending the nfs client port range down to port 300 with:

echo 300 > /proc/sys/sunrpc/min_resvport
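That echo only lasts until the next reboot. A sketch of making it permanent, assuming the sysctl key follows the usual mapping from the /proc/sys path (so /proc/sys/sunrpc/min_resvport becomes sunrpc.min_resvport):

```shell
# Persist the wider range across reboots; key name assumed from the
# /proc/sys/sunrpc/min_resvport path.
echo "sunrpc.min_resvport = 300" >> /etc/sysctl.conf
sysctl -p                      # apply now
sysctl sunrpc.min_resvport     # verify the running value
```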

I think I'd like to move the nfs client port range out of the privileged port range altogether. This should be possible: the RFC says the client SHOULD use a port below 1024 but MAY use a higher one. I'd like to test it a bit before configuring a major server like that, though.

static-file-Cluster.ldif edit required post yaim at Oxford

Every time we run yaim at Oxford we have to fix the number of CPUs in our cluster by hand.
On t2ce02:
diff static-file-Cluster.ldif-fixed /opt/glite/etc/gip/ldif/static-file-Cluster.ldif
< GlueSubClusterPhysicalCPUs: 384
> GlueSubClusterPhysicalCPUs: 2
[root@t2ce02 ~]# cp static-file-Cluster.ldif-fixed /opt/glite/etc/gip/ldif/static-file-Cluster.ldif

On t2ce04:
The physical CPU count needs to be 74. After the change the ldap query shows:
ldapsearch -x -H ldap:// -b Mds-vo-name=UKI-SOUTHGRID-OX-HEP,o=grid|grep -i physicalcpu
GlueSubClusterPhysicalCPUs: 74
GlueSubClusterPhysicalCPUs: 384

Tuesday, December 16, 2008

EFDA-JET Service nodes upgraded to glite 3.1

We upgraded our service nodes to Scientific Linux 4.7 and glite-3.1. The worker nodes had been upgraded earlier. The problems/issues we had while upgrading to Scientific Linux 4.7 are listed below:

Storage Element

While installing the SE glite middleware (glite-SE_dpm_mysql), there was
a missing dependency issue for the perl-SOAP-Lite package.

Error: Missing Dependency: perl-SOAP-Lite >= 0.67 is needed by package

Doing a

# yum install perl-SOAP-Lite

only installs perl-SOAP-Lite-0.65, which is lower than the version needed.

The perl-SOAP-Lite rpm therefore had to come from a different repository. We initially downloaded perl-SOAP-Lite-0.67.el4, but that one failed to install as it needed MQSeries and other packages. We finally downloaded perl-SOAP-Lite-0.67-1.1.fc1.rf.noarch.rpm, which installed without any problems.

When the node was configured by yaim, the following error appeared:

sed: can't read /opt/bdii/etc/schemas: No such file or directory

The file /opt/bdii/etc/schemas was missing. The fix is to copy the schemas.example file to schemas:

# cp -i /opt/bdii/doc/schemas.example /opt/bdii/etc/schemas

The first SAM test failed: lcg-lr was missing, so we needed to install lcg_util. This pulled in a newer version of lcg_util than was on the other nodes, so lcg_util was then updated on all the nodes.

Compute Element (& site BDII)

We run the compute element service and the site BDII service on the same node.

While installing the glite-BDII packages, we obtained the following dependency errors.

Error: Missing Dependency: glite-info-provider-ldap = 1.1.0-1 is needed by package glite-BDII
Error: Missing Dependency: glue-schema = 1.3.0-3 is needed by package glite-BDII
Error: Missing Dependency: bdii = 3.9.1-5 is needed by package glite-BDII

Using yum to install the missing packages installs them at a higher version, and the installation of glite-BDII still fails, as it needs the exact versions listed above. These packages were instead installed by hand. A GGUS ticket (Ticket-ID: 42456) suggested that this problem is fixed in the latest release (update 34).

As with the SE install above, we had the same problem with the missing schemas file. The fix above was repeated here.

When running yaim, we had the following errors,

grep: a: No such file or directory
grep: VO: No such file or directory
grep: or: No such file or directory
grep: a: No such file or directory
grep: VOMS: No such file or directory
grep: FQAN: No such file or directory
grep: as: No such file or directory
grep: an: No such file or directory
grep: argument: No such file or directory
qmgr: Syntax error - cannot locate attribute
set queue lhcb acl_groups += /opt/glite/yaim/bin/yaim: supply a VO or a VOMS FQAN as an argument

To fix it we edited the file /opt/glite/yaim/functions/utils/users_getvogroup and commented out

#echo "$0: supply a VO or a VOMS FQAN as an argument"

On the Gstat web monitoring page, it was being reported that the SE service was missing ('SE missing in Gstat service'). To fix this problem, we edited the file /opt/bdii/etc/bdii-update.conf and added the following line for our SE.

SE ldap://,o=grid

Mon Box

When running yaim, we had the following errors

Problem starting rgma-servicetool

Starting rgma-servicetool: [FAILED]
For more details check /var/log/glite/rgma-servicetool.log
Stopping rgma-gin: [ OK ]
Starting rgma-gin: [FAILED]

This was fixed by defining a new java location, adding the following to site-info.def:

if [ "$HOSTNAME" == "$MON_HOST" ] ; then

We had the same 'schemas' file missing problem here as well.


EFDA-JET has a slightly unusual setup, as we are restricted to a small number of external IP addresses. All nodes are on the same LAN with private IP addresses, whilst the service nodes also have external addresses. In the hosts files on the service nodes, all service nodes are referenced by their external addresses, whilst on the worker nodes, the service nodes are referenced by their private addresses.

This worked well for glite 3.0, but not for glite 3.1, where we saw clients on the worker nodes trying to contact the service nodes via their external addresses. It looks like glite 3.1 services pass out IP addresses for clients to call back on at a later time. The complete solution was to run iptables on the worker nodes and NAT-translate outgoing connections to the external addresses of the service nodes into their corresponding internal addresses. This was done by adding the following to /etc/rc.local on the worker nodes.

/sbin/service iptables start
/sbin/iptables -A OUTPUT -t nat -d <CE-ext-addr> -j DNAT \
--to-destination <CE-int-addr>
/sbin/iptables -A OUTPUT -t nat -d <SE-ext-addr> -j DNAT \
--to-destination <SE-int-addr>

Thursday, December 04, 2008

dCache Update

We updated dCache this morning to 1.9.0. That sounds like a major jump, but reading the release notes it is only a minor step up from the 1.8.0-15pX series of releases.

The upgrade itself was trivial: just installing the new dcache-server rpm and running the install script across all the nodes.

We also took the opportunity to update the version of Postgresql on the head node from 8.3.1 to 8.3.5 using externally packaged rpms. I'm hoping that I will now be able to use their prebuilt slony-1 rpm to set up master-slave mirroring of the databases from the dCache head node to a live mirror node.

Finally we updated the SL version of all the dCache nodes to SL4.6 from a mix of SL4.4, SL4.5 and SL4.6. We're now using the SL-Contrib xfs kernel modules on all nodes, and the Areca drivers compiled into the 2.6.9-78 series of kernels on the nodes with Areca raid cards rather than our own builds, and have had no issues.

Wednesday, October 15, 2008


Today I ran Graeme's script to fix the acls on the ATLASLOCALGROUPDISK space token.
Should have done this a few weeks ago, but...
There is nothing currently stored here yet.

[root@t2se01 ~]# ./
Debug: - - atlaslocalgroupdisk
Fixing permissions on /dpm/
Searching /dpm/

dpns-ls /dpm/
shows nothing

dpns-getacl /dpm/
# file: /dpm/
# owner: root
# group: atlas/uk
group::rwx #effective:rwx
group:atlas/Role=lcgadmin:rwx #effective:rwx
group:atlas/Role=production:rwx #effective:rwx
group:atlas/uk:rwx #effective:rwx

Thursday, October 09, 2008

SouthGrid update

The Birmingham site has suffered some reliability problems caused by Site Networking problems.
Physics are working with central IS to resolve these issues.

Bristol has been having problems with their SE. Despite work over the weekend to fsck all the partitions by hand, the array is still causing problems.

Oxford has recently been having a strange Maui problem where only about 60-70% of the available cores get allocated jobs. Manually 'qrun'ing the jobs makes them run OK.
More recently the maui process actually started crashing. Investigations are ongoing, although things seem a bit better just now.

RALPPD have installed the latest purchase of WNs into production, adding another 160 job slots worth 270 kSI2k and bringing us up to 1025 kSI2k in total. Further disk servers have arrived but will take a month to commission.

Thursday, August 28, 2008

Upgrade at RALPP

We had a downtime this morning to:
  1. Upgrade the kernel on the Grid Service and dCache pool nodes
  2. Install the latest gLite updates on the service nodes
  3. Upgrade dCache to the most recent patch version
Upgrading gLite and the kernel on the service nodes seems to have gone smoothly (still waiting for the SAM Admin jobs I submitted to get beyond "Waiting").

However, I had a bit more fun upgrading the kernel on the dCache pool nodes. This is supposed to be much easier now that the Areca drivers are in the stock SL4 kernel and the xfs kernel modules are in the SL Contrib yum repository, so we don't have to build our own rpms as we have in the past, and indeed both these parts worked fine. But five of the nodes with 3ware cards did not come back after I (remotely) shutdown -r now'd them. Of course these are the nodes in the Atlas Centre, so I had to walk across to find out what the problem was. They all seemed to have hung at the end of the shutdown at "Unmounting the filesystems". All came back cleanly after I hit the reset buttons.

The second problem (which had me worried for a time) was with one of the Areca nodes. I was checking them to see if the XFS kernel modules had installed correctly and that the raid partition was mounted; on this node it wasn't, though the kernel modules had installed correctly. Looking a bit harder I found that the whole device seemed to be missing. Connecting to the RAID card web interface, I found that instead of two RAID sets (system and data) it had the two system disks in a RAID0 pair and 22 free disks (cue heart palpitations). Looking (in a rather panicked fashion) through the admin interface options I found "Rescue RAID set" and gave it a go. After a reboot I connected to the web interface again and could see both RAID sets. Phew! It was too early to start the celebrations though, because when I logged in the partition wasn't mounted, and when I tried by hand it complained that the Logical Volume wasn't there. Uh oh; cue much googling and reading of man pages.

pvscan sees the physical volume, vgscan sees the volume group and lvscan sees the logical volume, but it's "NOT Available". I tried vgscan --mknodes; no, that didn't work. I finally got it working with:

vgchange --available y --verbose raid

Then I could mount the partition and all the data appeared to be there.
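For the record, the whole recovery sequence sketched in one place (volume group name "raid" as above; the logical volume and mount point names are placeholders):

```shell
pvscan                                   # physical volume still visible?
vgscan --mknodes                         # recreate /dev nodes for the VG
lvscan                                   # LV listed but "NOT Available"
vgchange --available y --verbose raid    # activate all LVs in VG "raid"
mount /dev/raid/<lv-name> <mount-point>  # then mount as normal
```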

After all that, the upgrade of dCache was very simple: just a case of adding the latest rpm to my yum repository, running yum update dcache-server and then the /opt/d-cache/install/ script. The latter complained of some deprecated config file options I'll have to look at, but dCache came up.

I'd qsig -s STOP'd all the jobs whilst doing the upgrade, obviously, and here's an interesting plot of the network traffic into the Worker Nodes over the last day.

As you can see, once I restarted the jobs they more or less picked up without missing a beat. And yes, they are reading data at 600 MB/sec, and the dCache is quite happily serving it to them at that rate.

Wednesday, August 27, 2008

More spacetokens at Oxford

Expanding the spacetokens at Oxford showed that the dpm-updatespace command only accepts integer values, so for 4.5T use 4500G:

/opt/lcg/bin/dpm-updatespace --token_desc ATLASMCDISK --gspace 4500G --lifetime Inf
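Since only integers are accepted, fractional terabyte sizes have to be converted to gigabytes by hand. A tiny helper sketch (the function name is mine):

```shell
# Convert a (possibly fractional) decimal-TB figure to the integer-GB
# string dpm-updatespace expects, e.g. 4.5 -> 4500G.
tb_to_gb() {
    awk -v t="$1" 'BEGIN { printf "%dG\n", t * 1000 }'
}
tb_to_gb 4.5   # prints 4500G
```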

I used Graeme's script to set up the ATLASGROUPDISK permissions after the reservespace command:

/opt/lcg/bin/dpm-reservespace --gspace 2T --lifetime Inf --group atlas/Role=production --token_desc ATLASGROUPDISK

Graeme's script:

[root@t2se01 ~]# more

DOMAIN=$(hostname -d)

dpns-mkdir /dpm/$DOMAIN/home/atlas/atlasgroupdisk/
dpns-chgrp atlas/Role=production /dpm/$DOMAIN/home/atlas/atlasgroupdisk/
dpns-setacl -m d:g:atlas/Role=production:7,d:m:7 /dpm/$DOMAIN/home/atlas/atlasgroupdisk/

for physgrp in exotics higgs susy beauty sm; do
dpns-entergrpmap --group atlas/phys-$physgrp/Role=production
dpns-mkdir /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp
dpns-chgrp atlas/phys-$physgrp/Role=production /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp
dpns-setacl -m d:g:atlas/phys-$physgrp/Role=production:7,d:m:7 /dpm/$DOMAIN/home/atlas/atlasgroupdisk/phys-$physgrp
done

ATLASDATADISK space was increased to 15TB:
dpm-updatespace --token_desc ATLASDATADISK --gspace 15T --lifetime Inf

ATLASLOCALGROUPDISK was created and set up:
/opt/lcg/bin/dpm-reservespace --gspace 1T --lifetime Inf --group atlas --token_desc ATLASLOCALGROUPDISK

dpns-mkdir /dpm/

dpns-chgrp atlas/uk /dpm/
dpns-setacl -m d:g:atlas/uk:7,m:7 /dpm/
dpns-setacl -m g:atlas/uk:7,m:7 /dpm/

Wednesday, August 20, 2008

Brief Bristol Update

Brief Bristol update: new hardware to replace the HPC CE received &
being built. New hardware for a StoRM SE & gridftp nodes received;
Dr Wakelin is building them.
Our 50TB of new storage should be ready in September.

New hardware to replace MON received, being built. Will replace small
cluster WN this fall (possibly increase number) & possibly also
its CE & DPM SE.

Both clusters mostly stable, except for occasional gpfs timeouts on
HPC & recent intermittent problems with SCSI resets on DPM SE.

Delays are due to Yves, Jon & Winnie being very busy with other very high priority work.

Monday, August 18, 2008

Setting up the Atlas Space Tokens on dCache

Well, the request from Atlas to have space tokens set up is quite complicated, but here's my first attempt at setting them up for dCache:

They want to have different permissions on different space tokens. I think the only way to do that is to create different LinkGroups to associate with the space tokens. Here is the section from my LinkGroupAuthorization.conf file for Atlas now:
LinkGroup atlas-link-group

LinkGroup atlas-group-link-group

LinkGroup atlas-user-link-group

LinkGroup atlas-localgroup-link-group
However, it appears a Link can only be associated with one LinkGroup, so we also have to create a Link for each of these. Luckily it appears that a PoolGroup can be associated with multiple Links, so we don't have to split up the Atlas space (phew).

So I created a bunch of Links and LinkGroups in the PoolManager like this:
psu create link atlas-localgroup-link world-net atlas
psu set link atlas-localgroup-link -readpref=20 -writepref=20 -cachepref=20 -p2ppref=-1
psu add link atlas-localgroup-link atlas-pgroup
psu add link atlas-localgroup-link atlas
psu create linkGroup atlas-localgroup-link-group
psu set linkGroup custodialAllowed atlas-localgroup-link-group false
psu set linkGroup replicaAllowed atlas-localgroup-link-group true
psu set linkGroup nearlineAllowed atlas-localgroup-link-group false
psu set linkGroup outputAllowed atlas-localgroup-link-group false
psu set linkGroup onlineAllowed atlas-localgroup-link-group true
psu addto linkGroup atlas-localgroup-link-group atlas-localgroup-link
Obviously repeated for each of the other extra LinkGroups

Then it's just a case of creating the space tokens in the SrmSpaceManager:
reserve -vog=/atlas -vor=NULL -acclat=ONLINE -retpol=REPLICA -desc=ATLASUSERDISK -lg=atlas-user-link-group 2500000000000 "-1"
reserve -vog=/atlas/uk -vor=NULL -acclat=ONLINE -retpol=REPLICA -desc=ATLASLOCALGROUPDISK -lg=atlas-localgroup-link-group 9000000000000 "-1"
reserve -vog=/atlas -vor=production -acclat=ONLINE -retpol=REPLICA -desc=ATLASGROUPDISK -lg=atlas-group-link-group 3000000000000 "-1"
I'm not sure the last one will work as expected; I don't know how the -vog=/atlas will map onto the multiple VOMS groups in the LinkGroupAuthorization.conf file. But I've no idea how to specify multiple VOMS groups there.
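The sizes in those reserve commands are raw bytes, which makes the zeros easy to miscount. A small helper sketch (decimal TB; the function name is mine):

```shell
# Decimal TB -> bytes, matching the reservation figures above.
tb_bytes() {
    awk -v t="$1" 'BEGIN { printf "%.0f\n", t * 1e12 }'
}
tb_bytes 2.5   # ATLASUSERDISK       -> 2500000000000
tb_bytes 9     # ATLASLOCALGROUPDISK -> 9000000000000
tb_bytes 3     # ATLASGROUPDISK      -> 3000000000000
```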

OK, that should get us the Space tokens, but Atlas are also requesting specific permissions on directories and that's completely orthogonal to the space tokens. All I've got to play with there are the normal UNIX users and groups.

So I start off by creating 6 extra groups and making each one the primary group of a single pool account (which is also in the main atlas group). I also add the atlasprd account to the physics group groups, since they want that to have write access to the group areas. Here's the relevant bit from /etc/group; you can work out the changes to /etc/passwd yourselves.
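The group-file excerpt itself didn't survive in this post, so here is a hypothetical reconstruction, using the group names from the directory listing below and the GIDs from the storage-authzdb entries further down; treat the names and GIDs as illustrative:

```
# Hypothetical /etc/group entries (names/GIDs illustrative), with
# atlasprd added as a secondary member of each group:
atl-exo:x:24358:atlasprd
atl-higg:x:24359:atlasprd
atl-susy:x:24360:atlasprd
atl-b:x:24361:atlasprd
atl-sm:x:24362:atlasprd
atl-uk:x:24365:atlasprd
```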
Now I've got the users and groups set up, I can create the directories:
mkdir /pnfs/
chown atlas007:atl-uk /pnfs/
chmod 755 /pnfs/
[root@heplnx204 etc]# ls -l /pnfs/
total 3
drwxrwxr-x 1 atlas005 atl-b 512 Aug 18 13:21 phys-beauty
drwxrwxr-x 1 atlas002 atl-exo 512 Aug 18 13:21 phys-exotics
drwxrwxr-x 1 atlas003 atl-higg 512 Aug 18 13:21 phys-higgs
drwxrwxr-x 1 atlas006 atl-sm 512 Aug 18 13:21 phys-sm
drwxrwxr-x 1 atlas004 atl-susy 512 Aug 18 13:21 phys-susy
But now I have to make sure dCache maps the right voms credentials to the correct account.
First off, in /etc/grid-security/storage-authzdb:
authorize atlas001 read-write 37101 24259 / / /
authorize atlas002 read-write 37102 24358 / / /
authorize atlas003 read-write 37103 24359 / / /
authorize atlas004 read-write 37104 24360 / / /
authorize atlas005 read-write 37105 24361 / / /
authorize atlas006 read-write 37106 24362 / / /
authorize atlas007 read-write 37107 24365 / / /
authorize atlasprd read-write 51000 24259 / / /
and in /etc/grid-security/grid-vorolemap
# Added role /alice/Role=production
"*" "/alice/Role=production" aliceprd

# Added role /atlas
"*" "/atlas" atlas001
"*" "/atlas/phys-exotics" atlas002
"*" "/atlas/phys-higgs" atlas003
"*" "/atlas/phys-susy" atlas004
"*" "/atlas/phys-beauty" atlas005
"*" "/atlas/phys-sm" atlas006
"*" "/atlas/uk" atlas007

# Added role /atlas/Role=lcgadmin
"*" "/atlas/Role=lcgadmin" atlas001

This has not been fully tested yet; in particular it's not clear that the ATLASGROUPDISK space token will behave the way I expect.

Oh, and doing this has once again made me realise that I don't really understand what Units and Links are and do in dCache, so I'm offering a beer to anyone who can explain this to me.

Update on 28/08/08

It looks like this doesn't work fully: dCache doesn't support secondary groups, so the atlasprd user, which is in group atlas, cannot write to the /pnfs/* areas even though it has secondary group membership of the groups which do have write access. I'm now waiting for feedback from atlas on how they want the permissions configured in view of this.

Friday, July 25, 2008

Adding multiple clusters to get different memory limit queues

I've been thinking about doing this for ages. The aim is to have different queues with different memory limits to better direct jobs with higher memory requirements to nodes with more memory.

The current method of doing this is to set up a separate queue for each level with the default memory requirement set, then publish separate clusters and subclusters for each of the queues in the information system.

So I first created grid500, grid1000 and grid2000 queues with no memory limits, configured them as normal using yaim so they would accept jobs from all my supported VOs, and checked that job submission to them worked as expected.

I then edited the static-file-Cluster.ldif file on the CE to add extra clusters and subclusters for each of the queues, setting the memory for each cluster to match its queue. So for example for the grid500 queue I created a cluster like so:

objectClass: GlueClusterTop
objectClass: GlueCluster
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueForeignKey: GlueSiteUniqueID=UKI-SOUTHGRID-RALPP
GlueInformationServiceURL: ldap://,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

objectClass: GlueClusterTop
objectClass: GlueSubCluster
objectClass: GlueHostApplicationSoftware
objectClass: GlueHostArchitecture
objectClass: GlueHostBenchmark
objectClass: GlueHostMainMemory
objectClass: GlueHostNetworkAdapter
objectClass: GlueHostOperatingSystem
objectClass: GlueHostProcessor
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_1_1
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_2_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_3_1
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_4_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_5_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_6_0
GlueHostApplicationSoftwareRunTimeEnvironment: LCG-2_7_0
GlueHostApplicationSoftwareRunTimeEnvironment: GLITE-3_0_0
GlueHostApplicationSoftwareRunTimeEnvironment: RALPP
GlueHostApplicationSoftwareRunTimeEnvironment: SOUTHHGRID
GlueHostApplicationSoftwareRunTimeEnvironment: GRIDPP
GlueHostApplicationSoftwareRunTimeEnvironment: R-GMA
GlueHostArchitectureSMPSize: 2
GlueHostArchitecturePlatformType: i586
GlueHostBenchmarkSF00: 0
GlueHostBenchmarkSI00: 1000
GlueHostMainMemoryRAMSize: 500
GlueHostMainMemoryVirtualSize: 1000
GlueHostNetworkAdapterInboundIP: FALSE
GlueHostNetworkAdapterOutboundIP: TRUE
GlueHostOperatingSystemName: ScientificSL
GlueHostOperatingSystemRelease: 4.4
GlueHostOperatingSystemVersion: Beryllium
GlueHostProcessorClockSpeed: 2800
GlueHostProcessorModel: PIV
GlueHostProcessorVendor: intel
GlueSubClusterPhysicalCPUs: 0
GlueSubClusterLogicalCPUs: 0
GlueSubClusterTmpDir: /tmp
GlueSubClusterWNTmpDir: /tmp
GlueInformationServiceURL: ldap://,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3

Note that there has to be a blank line in the file after the end of the subcluster definition, or else the gip script that adds the VO tags doesn't add them.
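A quick guard against losing that trailing blank line, as a sketch (written against a stand-in file here so it can be tried anywhere; point it at the real static-file-Cluster.ldif on the CE):

```shell
# Check the last line of the ldif is blank; the GIP VO-tag script
# silently skips the final subcluster otherwise. Stand-in file for demo:
ldif=/tmp/static-file-Cluster.ldif.demo
printf 'GlueSchemaVersionMinor: 3\n\n' > "$ldif"

if [ -z "$(tail -n 1 "$ldif")" ]; then
    echo "trailing blank line present"
else
    echo "missing trailing blank line"
fi
```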

I also had to edit the CE/Queue entries in static-file-CE.ldif to change these two entries for each of the grid500, grid1000 and grid2000 queues:


These clusters seem to have appeared correctly on the gStat pages.

So now when I edg-job-list-match a jdl with the following requirements I get:

Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 500);

The following CE(s) matching your job requirements have been found:


Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 1000);

The following CE(s) matching your job requirements have been found:


Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 1500);

The following CE(s) matching your job requirements have been found:


Requirements = ( RegExp("heplnx206\.pp\.rl\.ac\.uk.*", other.GlueCEUniqueID) && other.GlueHostMainMemoryRAMSize >= 2001);

===================== edg-job-list-match failure ======================
No Computing Element matching your job requirements has been found!

Which looks very much like what I want to do.

Then to let Torque/Maui know about the memory requirements for each of these new queues I set a default memory requirement for each with something like:
qmgr -c "set queue grid1000 resources_default.mem = 1000mb"
(I think this is a non-enforcing requirement so jobs will not be killed for going over it. To do that I think you need to set "resources_max.mem".)
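Since the three queues encode their memory figure in the name, the qmgr commands can be generated rather than typed; a sketch (pipe the output into qmgr on the torque server):

```shell
# Emit one "set queue" line per queue, deriving the mb figure from the
# queue name (grid500 -> 500mb, and so on); qmgr reads such commands
# on stdin.
for q in grid500 grid1000 grid2000; do
    echo "set queue $q resources_default.mem = ${q#grid}mb"
done
# e.g.:  ... | qmgr
```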

Now I just need to do the same configuration on heplnx207, phase out the "per VO" queues, and persuade users to put the memory requirements of their jobs in their JDL files.

Friday, July 11, 2008

Oxford adds the ATLAS proddisk space token

Using the same procedure as before I added the new space token for ATLAS:


/opt/lcg/bin/dpm-reservespace --gspace 2T --lifetime Inf --group atlas/Role=production --token_desc ATLASPRODDISK
/opt/lcg/bin/dpns-mkdir /dpm/
/opt/lcg/bin/dpns-chgrp atlas/Role=production /dpm/
dpns-chmod 775 /dpm/
dpns-setacl -m d:g:atlas/Role=production:7,m:7 /dpm/
dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/

Wednesday, July 02, 2008

SouthGrid Update

Just to update that the rebuild of the Bristol
DPM SE to SL4 on 19th June by Yves & Winnie was a smooth success.

Oxford had a strange error on its site BDII (still running SL3): after a re-run of yaim it stopped advertising the site web address and several other entries, so gstat showed a warning. The following entries were missing:
dn: GlueSiteUniqueID=UKI-SOUTHGRID-OX-HEP,mds-vo-name=UKI-SOUTHGRID-OX-HEP,o=g
objectClass: GlueSite
GlueSiteDescription: LCG Site
GlueSiteLocation: Oxford, UK
GlueSiteLatitude: 51.7595
GlueSiteLongitude: -1.2595
GlueSiteSponsor: none
GlueSiteOtherInfo: TIER-2

It transpired that the file /opt/bdii/etc/bdii-update.conf
had the GIP entry pointing to

> GIP file:///opt/glite/libexec/glite-info-wrapper

which did not exist, changing it to

< GIP file:///opt/lcg/libexec/lcg-info-wrapper

and restarting bdii fixed the missing web address and the other missing entries.

We were about to set up an SL4 based site bdii anyway, so Ewan did, and curiously got the same errors, even though the glite 3.1 files do all now exist in the /opt/glite directories.
Investigations continue but for now we are sticking with the working glite 3.0 based site bdii.

PS: A thread on LCG-ROLLOUT mentions the same errors.

Tuesday, June 17, 2008

SouthGrid update

Yves will be helping Bristol upgrade the SE to SL4 on Thursday.
They had problems with the Transtec raid array, specifically the battery-backed cache. New parts have now arrived.

The MESC cluster at Birmingham is working well now; the four LHC VOs are supported, but it is not full yet. (Both Atlas and LHCb are aware of it.)
We may have to advertise it to the VOs.

Yves hopes to be able to start configuring Blue Bear (the new HPC cluster) after the grand opening next week. There will also be a meeting between NGS people and Birmingham to see how they can work better together, probably by NGS enabling Blue Bear.

The 64 bit tarball was used at Bristol (and the 32 bit one on MESC). Some extra i386 rpms are required.

JET have had problems: failing SAM tests but OK for real jobs. They will reinstall the CE.

Stop Press: Now working since the reinstall.

Oxford have not published accounting data since the introduction of two SL4 based CEs. We will be working on fixing this so we have the data for the Quarterly report.

Wednesday, May 21, 2008

Second phase of Oxford's move

Over the last couple of days Ewan and I have moved five 9TB disk servers, five twin worker nodes, a couple of head nodes and two UPSs. We had help moving the rack, and then we were able to reassemble it all. The site was back up and passing SAM tests in time for us to come out of scheduled maintenance at 1700.
The current setup gives us three CEs: the original SL3 based CE drives most of the newer SL4 based WNs, while the two new SL4 based CEs send jobs to a new torque server and on to two subclusters, one for the 32 bit hardware (Dell 2.8GHz Xeons) and the other for the Intel Clovertown quads. We will migrate the workers off the old CE onto the new ones over the next few days, before decommissioning the original SL3 based CE.

The migration of data from our old SE head node is complete. There were three files that were listed in the database but did not exist on the physical storage. I used the scripts from , which matched the three file names left in my dpm-drain logs.

Two of the files could be removed with rfrm, but one refused to appear in the normal dpns-ls listing so could not be removed.
We decided to ignore this one and remove the filesystem from the pool with the command:
dpm-rmfs --server --fs /storage

The new DPM head node will be setup and then the mysql database dumped and restored on to it shortly.

Meanwhile we are waiting for the backplanes in our storage servers to be swapped out, to avoid the burnout issue we have suffered on one of them.

SouthGrid technical meeting will be held tomorrow at Birmingham.

Monday, April 28, 2008

Oxford DPM progressing Slowly

Removing zero-length files from the DPM storage pool with the rfrm command has helped the dpm-drain command start progressing again.
The command still fails after 10-20 files and the transfer speed is very low, but at least we are making progress.

Monday, April 14, 2008

SouthGrid Update

We installed half of the Oxford cluster at Begbroke last Tuesday. That night the air conditioning failed: a valve on the chillers failed, cutting off their water supply, which in turn made the chillers switch themselves off. The room rapidly heated up to over 40 degrees. After investigation and repairs the AC went back on and all has been well so far. More automated warning systems are required.

Cambridge have set up space tokens for both ATLASDATADISK and ATLASMCDISK. They have also started upgrading to SL4 (64bit) worker nodes.

Bristol completed upgrading the worker nodes to SL4 on Monday. They had some problems caused by SELinux preventing normal logins, but all now appears well.

Tuesday, April 08, 2008

Oxford Update

Last week Ewan and I started the DIY move to Begbroke. We moved forty 1U servers over two mornings. One of the (now empty) Dell racks was moved on Wednesday afternoon.
The worker nodes were reinstalled in that rack on Thursday; we had one PSU failure out of 27 nodes. These nodes will be installed with SL4 shortly.

This week we emptied one of the Viglen racks and moved the servers yesterday.
We hope to move the rack this afternoon and get the worker nodes back online asap, as we are currently at half capacity.

On Friday 28th March we had one of our new 9TB file servers burn out its backplane. This is very similar to the problems RAL have been seeing. The backplane was swapped out and the server is back online now.
We are in talks with the supplier.


Monday, March 31, 2008

Oxford adds another space token

With some advice from Graeme I have added another space token at Oxford
Commands recorded for posterity.

dpm-reservespace --gspace 3T --lifetime Inf --group atlas/Role=production --token_desc ATLASMCDISK
dpns-mkdir /dpm/
dpns-chgrp atlas/Role=production /dpm/
dpns-chmod 775 /dpm/
dpns-setacl -m d:g:atlas/Role=production:7,m:7 /dpm/
dpns-setacl -m g:atlas/Role=production:7,m:7 /dpm/

We started our DIY move today, two trips in the lab van with 8 1U servers each time.
Tomorrow we plan to move more old WN's and then an empty rack later in the week.

Friday, March 28, 2008

SouthGrid Update

The first stage of the HPC cluster is running LCG jobs, and is being correctly accounted for.

The HPC WNs have AMD 2218 cores at 2.6GHz; these are said to be
1.745 kSI2K per core.
Currently gridpp can run a maximum of 32 jobs on this small stage 1 HPC cluster.

Santanu is continuing to work with LHCb to solve all the problems running their code at Cambridge.
The WNs will be upgraded to SL4 within the next few weeks.

When over 100 of the 120 Babar cluster nodes died after a power shutdown at the end of January, it was deemed not worth restoring the cluster. Two twin 1U servers have been bought to replace it, which will provide 32 cores and 78.4 kSI2K.
The old escience cluster is being set up as an SL4 ce and WN farm, as a template for the way they will drive the new University 'Blue Bear' HPC cluster. This cluster is made up of 31 dual 3GHz Xeons.
The main grid cluster (aka the Atlas cluster) has been expanded to 60 cores.

The PPS is not being maintained at the moment.

Chris has got space tokens working at RALPP (updated to dCache 1.8-12p6, from p4, and also rebooted everything after putting in the srmSpaceManager-enabled config files).
The new hardware has been installed:
8 boxes, 16 nodes, 32 CPUs so 128 cores.

The CPUs are "E5410 @ 2.33GHz"; not sure of the kSI2K rating yet.
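For what it's worth, the counting above is consistent if each "box" is a twin chassis (two nodes) of dual-socket systems with quad-core E5410s. A sketch of that assumption:

```python
# Sketch of how the quoted numbers fit together, assuming each "box"
# is a twin chassis (2 nodes) with 2 CPUs per node; the E5410 is a
# quad-core part.
boxes = 8
nodes_per_box = 2
cpus_per_node = 2
cores_per_cpu = 4

nodes = boxes * nodes_per_box    # 16 nodes
cpus = nodes * cpus_per_node     # 32 CPUs
cores = cpus * cores_per_cpu     # 128 cores
print(nodes, cpus, cores)
```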

Running stable. WNs were updated to SL4 earlier this year.

Quotes to move the kit to Begbroke seem too high so we are going to adopt a DIY approach.
Draining t2se01 is taking forever. The dpm-drain command sometimes terminates after only 20 minutes (~6GB of data transferred). We did, however, have a good run on the night of the 26th which lasted over 10 hours.
Oddly, the error log files are often exactly the same size, although not totally consistently.

-rw-r--r-- 1 root root 22519 Feb 22 14:32 dpm-drain-errorlog-se01-1
-rw-r--r-- 1 root root 22519 Feb 22 15:56 dpm-drain-errorlog-se01-2
-rw-r--r-- 1 root root 22564 Feb 22 17:05 dpm-drain-errorlog-se01-3
-rw-r--r-- 1 root root 22519 Feb 22 18:20 dpm-drain-errorlog-se01-4
-rw-r--r-- 1 root root 22519 Feb 25 12:35 dpm-drain-errorlog-se01-5
-rw-r--r-- 1 root root 22519 Feb 25 13:04 dpm-drain-errorlog-se01-6
-rw-r--r-- 1 root root 22519 Feb 25 16:53 dpm-drain-errorlog-se01-7
-rw-r--r-- 1 root root 22519 Mar 26 13:51 dpm-drain-errorlog-se01-8
-rw-r--r-- 1 root root 22519 Mar 26 14:17 dpm-drain-errorlog-se01-9
-rw-r--r-- 1 root root 1193287 Mar 27 00:56 dpm-drain-errorlog-se01-10
-rw-r--r-- 1 root root 567836 Mar 27 13:59 dpm-drain-errorlog-se01-11
-rw-r--r-- 1 root root 25241 Mar 27 14:57 dpm-drain-errorlog-se01-12
-rw-r--r-- 1 root root 22598 Mar 27 15:37 dpm-drain-errorlog-se01-13
-rw-r--r-- 1 root root 22598 Mar 27 16:22 dpm-drain-errorlog-se01-14
-rw-r--r-- 1 root root 22598 Mar 27 17:11 dpm-drain-errorlog-se01-15
-rw-r--r-- 1 root root 22598 Mar 27 21:55 dpm-drain-errorlog-se01-16
-rw-r--r-- 1 root root 22598 Mar 27 23:15 dpm-drain-errorlog-se01-17
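A quick way to make the repeated sizes stand out is to group the logs by byte count, since identical-length logs are likely the same failure repeated. This is just an illustrative sketch with fabricated file names and sizes, not anything run during the drain:

```python
# Group files by size so identically-sized logs (probably the same
# error repeated) stand out. Uses fabricated stand-in files, not the
# real dpm-drain error logs.
import os
import tempfile
from collections import defaultdict

with tempfile.TemporaryDirectory() as d:
    for name, size in [("log-1", 22519), ("log-2", 22519), ("log-3", 22564)]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(b"x" * size)

    by_size = defaultdict(list)
    for name in sorted(os.listdir(d)):
        by_size[os.path.getsize(os.path.join(d, name))].append(name)

    # Report only the sizes that occur more than once.
    for size, names in sorted(by_size.items()):
        if len(names) > 1:
            print(size, names)  # prints: 22519 ['log-1', 'log-2']
```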

Tuesday, March 25, 2008

Oxford Update

The original 74-CPU SL3 Dell worker nodes have been taken down in preparation for reinstallation as SL4 worker nodes.
We will maintain a separate ce to drive these but intend to separate out the torque server.
The new ce will be an SL4 ce.
The ce and torque server are likely to be virtual machines running under VMware.

The move of the 4 racks up to Begbroke is still uncertain.
We are awaiting quotes from companies to move the equipment for us. The DIY option of just paying for a truck and driver has also been looked into, but issues of insurance and warranties may prevent use of this option.

The steps up to the new computer room have been made smaller to allow installation of a scissor lift, to lift racks to the false floor height. The date for the lift to be installed is still unclear.

The grand opening on the 15th April is all too close.

Tuesday, February 05, 2008

SouthGrid Technical Meeting at JET

The SouthGrid Technical Board met at JET.
All sites are moving towards SL4. The recent updates will be applied shortly.
The SouthGrid vo has been set up, and a central LFC is being provided for it at the RAL Tier 1.
The outstanding tickets were looked at and all found to be solved. One problem was found: tickets that had been closed in GGUS were still open in Footprints, so the link between the two systems may not be working properly.
Birmingham is setting up some ex-escience nodes to be the interface to the new HPC cluster. This will bring back some of the SPECint power lost due to hardware failures after recent electrical work.

Tuesday, January 22, 2008

Oxford Update

Plans to move the Oxford gridpp cluster up to Begbroke are being formulated.
The first part of the plan is to ensure that only these nodes are using the subnet in question. We did some tidying up over the last week or so, before having the subnet rerouted to both Physics and Begbroke. This change was made this morning at 8:50, and mostly went smoothly.
Our ui needs to be moved back on to the physics subnet to allow NFS mounting of home directories to work.
A new rack, PDU and network switch has been ordered to allow us to move a few test nodes up to Begbroke in advance of the main move.
We aim to complete the move late Jan/ early Feb.

The disk on our installation server which holds ganglia data and central syslog data failed today. We will restore from backups.
t2wn05 has a failed hard disk which may have been acting as a black hole over the weekend.

Working with the ZEUS and LHCb VOs to improve usage of our cluster uncovered some configuration problems.
  1. Not all the nodes had the latest DESY VOMS server certs applied (this stopped ZEUS jobs working).
  2. The sgm roles were not mapped correctly for LHCb.
Finally the APEL problems seem to be behind us.
  1. The configuration seemed to have changed at the last run of yaim before Christmas, which stopped any records getting published.
  2. Installing the latest development APEL rpms fixed the problem of not seeing the newer spec value for our new ce.

Friday, January 04, 2008

Scheduled Power outage at Birmingham causes problems

The scheduled power outage at Birmingham on Saturday 8th December caused 19 Babar SL4 systems to fail, and four bad disks appeared on the SL3 cluster. The age of this equipment is a cause for concern.

There has been some concern expressed at small sites such as Bristol that the number of ATLAS jobs submitted by Steve Lloyd's tests can overwhelm their sites.