Tuesday, May 13, 2014

Configuring ARC CE and Condor with puppet


We have started testing Condor and ARC CE with the intention of moving away from Torque.  Almost one third of the cluster has been moved to Condor and we are quite satisfied with it as a batch system.  The Condor setup was fairly easy, but configuring ARC CE was a bit challenging.  I believe that the new version of ARC CE has fixed most of the issues I faced.  Andrew Lahiff was of great help in troubleshooting our problems.  Our setup consists of:
1. CE: configured as ARC CE and Condor submit host; runs the Condor SCHEDD process
2. Central manager: Condor server; runs the Condor COLLECTOR and NEGOTIATOR processes
3. WNs: run the Condor STARTD process; also have the emi-wn and glexec metapackages installed
The CE, the central manager and the Condor part of the WNs were completely configured with puppet.  I had to run yaim on the WNs to configure emi-wn and glexec.
I used puppet modules from https://github.com/HEP-puppet, which were initially written by Luke Kreczko from Bristol.  We are using Hiera to pass parameters, but most of the modules work without Hiera as well.  I am not intending to go into the details of Condor or ARC CE, but rather the use of puppet modules to install and configure them.

Condor :
It was a pleasing experience to configure Condor with puppet.
Git clone https://github.com/HEP-Puppet/htcondor.git into the module directory on the puppet server, then put
     include htcondor
on the CE, central manager and WNs; Hiera then tells it which services have to be configured on a particular machine.
# Condor
- 't2condor01.physics.ox.ac.uk'
- 't2arc01.physics.ox.ac.uk'
- 't2wn*.physics.ox.ac.uk'

htcondor::uid_domain: 'physics.ox.ac.uk'
htcondor::collector_name: 'SOUTHGRID_OX'
htcondor::pool_password: 'puppet:///site_files/grid/condor_pool_password'

This configures a basic Condor cluster.  There are no user accounts at this stage, so a test user account can be created on all three machines and basic Condor jobs can be tested.  The HTCondor manual is here
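A quick smoke test with that test account might look like the following; the submit file is standard HTCondor vanilla-universe boilerplate (the file names here are made up):

```shell
# Minimal HTCondor submit description for a smoke-test job,
# written on the submit host (the CE in this setup).
cat > sleep.sub <<'EOF'
universe   = vanilla
executable = /bin/sleep
arguments  = 60
output     = sleep.out
error      = sleep.err
log        = sleep.log
queue
EOF

# On a working pool you would then run:
#   condor_submit sleep.sub   # submit the job
#   condor_q                  # watch it in the queue
#   condor_status             # check the WN slots advertised by the STARTDs
```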

Setting up user accounts :
I used this module to create user accounts only on the central manager and the CE.  Since I have to run yaim on the WNs to set up emi-wn and glexec, the accounts on the WNs were created through yaim.
This puppet module can parse a glite-style users.conf to create the accounts, or a range of ids can be passed to the module.
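For illustration, a glite-style users.conf fragment looks like this (every uid, login and group below is made up; the field layout is UID:LOGIN:GID[,GID2]:GROUP[,GROUP2]:VO:FLAG:):

```shell
# Write a small made-up users.conf fragment of the kind the module can parse.
cat > users.conf <<'EOF'
61001:atlas001:61000:atlas:atlas::
61002:atlas002:61000:atlas:atlas::
61101:atlsgm01:61000,61100:atlas,atlassgm:atlas:sgm:
EOF
```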

Setting up voms server :
It is used to set up the voms client on the central manager and the CE.  One way to use this module is to pass the name of each VO separately, as described in the readme file of the module:
     class { 'voms::atlas': }
I have used a small wrapper class to pass all the VOs as an array instead:
     include setup_grid_accounts
Then pass the names of the VOs through Hiera:

setup_grid_accounts::vo_list:
    - 'alice'
    - 'atlas'
    - 'cdf'
    - 'cms'
    - 'dteam'
    - 'dzero'

ARC CE :
Include arc_ce on the CE and then pass configuration parameters from Hiera.  It has a very long list of configurable parameters, and most of the default values work OK.  Since most of the values are passed through Hiera, the arc Hiera file is quite long; here are a few of the examples:
    targethostname: 'index1.gridpp.rl.ac.uk'
    targetport: '2135'
    targetsuffix: 'Mds-Vo-Name=UK,o=grid'
    regperiod: '120'

       default_memory: '2048'
         - '1cpu:4'
          OSFamily: 'linux'
          OSName: 'ScientificSL'
          OSVersion: '6.5'
          OSVersionName: 'Carbon'
          CPUVendor: 'GenuineIntel'
          CPUClockSpeed: '2334'
          CPUModel: 'xeon'
          NodeMemory: '2048'
          totalcpus: '168'

This pretty much sets up a Condor cluster with an ARC CE.  There are a few bits in the arc and condor puppet modules which are there as workarounds for things that have already been fixed upstream; they need some testing and clean-up.

WNs need a small runtime environment setting specific to ARC.  When a job arrives at a WN, it looks in the /etc/arc/runtime/ directory for ENV settings.
Our runtime tree looks like this:
├── APPS
│   └── HEP
│       └── ATLAS-SITE-LCG
└── ENV
    ├── GLITE
    └── PROXY
These can be just empty files.  SAM-Nagios doesn't submit jobs if the ARC CE is not publishing the GLITE runtime environment.
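The tree can be recreated with a few lines of shell; the sketch below uses a scratch base directory, while on a real WN the base would be /etc/arc/runtime:

```shell
# Recreate the ARC runtime environment tree shown above.
# Scratch base for trying it out; use BASE=/etc/arc/runtime on a real WN.
BASE="$(mktemp -d)"
mkdir -p "$BASE/APPS/HEP" "$BASE/ENV"
# Empty files are enough for the CE to advertise these runtime environments.
touch "$BASE/APPS/HEP/ATLAS-SITE-LCG" "$BASE/ENV/GLITE" "$BASE/ENV/PROXY"
```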

I may have missed a few things, so please feel free to point them out.



Wednesday, May 07, 2014

Configuring CVMFS for smaller VOs

We have just configured cvmfs for t2k, hone, mice and ilc after sitting on the request for a long time.  The main reason for the delay was the assumption that we would need to change the cvmfs puppet module to accommodate non-LHC VOs.  It turned out to be quite straightforward with little effort.
We are using the CERN cvmfs module; there was an update a month ago, so it is better to keep it up to date.

We use Hiera to pass parameters to the module; our Hiera bit for cvmfs:

      cvmfs_server_url: 'http://cvmfs-egi.gridpp.rl.ac.uk:8000/cvmfs/@org@.gridpp.ac.uk;http://cvmfs01.nikhef.nl/cvmfs/@org@.gridpp.ac.uk'
      cvmfs_server_url: 'http://grid-cvmfs-one.desy.de:8000/cvmfs/@fqrn@;http://cvmfs-stratum-one.cern.ch:8000/cvmfs/@fqrn@;http://cvmfs-egi.gridpp.rl.ac.uk:8000/cvmfs/@fqrn@'

One important bit is the name of the cvmfs repository, e.g. t2k.gridpp.ac.uk instead of t2k.org.
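Putting the two together, cvmfs ends up with a server URL per repository.  A rough sketch of what the client config for t2k could contain (the exact file the puppet module renders may differ; the @org@ macro is expanded by cvmfs itself):

```shell
# Sketch of a per-repository client config, e.g.
# /etc/cvmfs/config.d/t2k.gridpp.ac.uk.conf (exact path and contents may
# differ from what the puppet module renders).
CVMFS_SERVER_URL="http://cvmfs-egi.gridpp.rl.ac.uk:8000/cvmfs/@org@.gridpp.ac.uk;http://cvmfs01.nikhef.nl/cvmfs/@org@.gridpp.ac.uk"
```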

The other slight hitch is the public key distribution for the various cvmfs repositories.  Installation of cvmfs also fetches the cvmfs-keys-*.noarch rpm, which puts the keys for the CERN-based repositories into /etc/cvmfs/keys/.

I had to copy the public keys for gridpp.ac.uk and desy.de to /etc/cvmfs/keys.  The desy.de key can be fetched from the repository:
wget http://grid.desy.de/etc/cvmfs/keys/desy.de.pub -O desy.de.pub

We distributed the keys through puppet, but outside the cvmfs module.
It would be great if someone could convince CERN to include the public keys of other repositories in the cvmfs-keys-* rpm.  I am sure there are not going to be many cvmfs stratum 0s.

The last part of the configuration is to change SW_DIR in site-info.def or in the vo.d directory.

The WNs require a yaim re-run to configure SW_DIR in /etc/profile.d/grid-env.sh.  You can also edit the grid-env.sh file manually and distribute it through your favourite configuration management system.
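For illustration, the SW_DIR change for t2k is just one variable pointing at the cvmfs mount; the variable name below is assumed from the usual VO_&lt;VO&gt;_SW_DIR yaim convention, so check your own site-info.def / vo.d files for the exact name:

```shell
# Sketch of the SW_DIR change in site-info.def or vo.d/t2k.org
# (variable name assumed from the usual yaim VO_<VO>_SW_DIR convention).
VO_T2K_ORG_SW_DIR=/cvmfs/t2k.gridpp.ac.uk
```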