This section is aimed at those who administer the CDR scripts and
services and perform troubleshooting. The CDR software comprises a
number of C-shell scripts and binary executables. While the C-shell
has limitations as a programming language, it is available on all
variants of UNIX, is easily modified, and is appropriate for scripts
of up to several hundred lines of code.
Installation from the tar file distribution
The scripts are installed from a tar file, the latest version being
CDR2004.3.tar. An up-to-date copy of this file is kept in the public
area of the home directory for "gordon". The tar file is transferred
into the home directory of the CDR account on the DAQ machine and
unpacked into a sub-directory called cdr. In addition, there must be
a symbolic link named configuration in the home directory of the CDR
account, pointing to cdr/configuration.
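The installation steps above can be sketched as follows. This is only an illustration under stated assumptions: the function name install_cdr and the use of Python are not part of the CDR distribution, and the actual installation is normally done by hand with tar and ln.

```python
import os
import tarfile


def install_cdr(tar_path, home_dir):
    """Unpack the CDR tar file into <home_dir>/cdr and create the
    'configuration' symbolic link in <home_dir>.

    install_cdr is a hypothetical helper, not part of the CDR software.
    """
    cdr_dir = os.path.join(home_dir, "cdr")
    os.makedirs(cdr_dir, exist_ok=True)

    # Unpack the distribution into the cdr sub-directory.
    with tarfile.open(tar_path) as tar:
        tar.extractall(cdr_dir)

    # Create the link only if nothing with that name exists yet.
    link = os.path.join(home_dir, "configuration")
    if not os.path.lexists(link):
        # A relative target: <home_dir>/configuration -> cdr/configuration
        os.symlink(os.path.join("cdr", "configuration"), link)
    return cdr_dir
```

For example, install_cdr("CDR2004.3.tar", os.path.expanduser("~")) would unpack the file into ~/cdr and leave ~/configuration pointing at ~/cdr/configuration.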
The tar file contains the following:
- ./README
- ./acroncdr
- ./bin/get_castor_multitag
- ./bin/get_castor_path
- ./bin/get_castor_path_FIC
- ./bin/getRemainingFiles
- ./bin/file_done
- ./bin/file_done_finished
- ./bin/file_done_slast
- ./bin/file_done.ntof
- ./bin/nschmod
- ./bin/nschown
- ./bin/nsls
- ./bin/nsmkdir
- ./bin/nsrename
- ./bin/nsrm
- ./bin/rfcp
- ./bin/rfdir
- ./bin/rfrm
- ./bin/stagein
- ./bin/get_castor_path_basic
- ./bin/get_castor_path.4
- ./bin/file_done.multi
- ./bin/get_castor_fcal
- ./bin/etime.pl
- ./configuration
- ./configure
- ./croncdr
- ./execs/cdr_control
- ./execs/cdr_purge
- ./execs/cdr_status
- ./execs/cdr_watchdog
- ./execs/hsm_daemon
- ./execs/CDR2004.3_execs.tar
- ./locks
- ./log/cdr.log
- ./log/watchdog.log
- ./STATUS_2004
- ./tmp
CDR Configuration File
The CDR configuration file allows the service to be customised for
different experiments. Parameters in the file are extracted by line
number, a simple yet practical method. A sample file is given below,
including the in-line comments and some explanations. The format of
the configuration file was changed at the start of data taking in
2004 and is incompatible with the old file format.
    # CDR-2004 CONFIGURATION - parameters are identified by line number
    #
    #3 -- Working directory on local machine
    /home/ntofdaq/cdr

    #6 -- Temporary file directory
    tmp

    #9 -- Filesystems to look for the datafiles.
    /shift/lxshare012d/data01 /shift/lxshare012d/data02

    #12 -- Presence files
    locks/daemon_running

    #15 -- Maximum no of concurrent running hsm_daemons/rfcp processes
    1

    #18 -- cdr_control wait time in seconds
    180

    #21 -- Local machine name, cdr_account and os_type
    lxshare012d ntofdaq linux

    #24 -- 1: max number of retries for copy 2: reread frequency of config file
    2 3

    #27 -- stage_host machine name - one stage_host only
    stagentof

    #30 -- Tag to identify files which are currently copied
    _curr_copied
    _finished_hsmcopy

    #34 -- Log file name for the copied files and purge
    migrator.log


    #38 -- Procedure to say if a file is ready for transfer from DAQ
    bin/file_done

    #41 -- Stage pool name
    ntof_cdr1

    #44 -- Command to send messages to the DAQ
    echo

    #47 -- Data files are under ( afs / dunix / hp / ibm / linux / sgi / sun )
    linux

    #50 -- Switches for deleter | log_level
    # 0 = off, 1 = on
    1 0

    #54 -- Watermarks - % of filesystem full for start/stop of the purger
    70 50

    #57 -- Name tag to identify the datafiles
    run

    #60 -- CASTOR pathname - "get_castor_path" uses this value
    /castor/cern.ch/ntof/2004/TAC1
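To illustrate what extracting parameters by line number means in practice, here is a minimal sketch. The CDR scripts themselves are written in C-shell; Python is used here purely for illustration, and the helper name fields_at is hypothetical. The sketch assumes that each value sits on the line directly after its numbered comment, as in the first entries of the sample file.

```python
# First ten lines of the sample configuration file.
SAMPLE_CONFIG = """\
# CDR-2004 CONFIGURATION - parameters are identified by line number
#
#3 -- Working directory on local machine
/home/ntofdaq/cdr

#6 -- Temporary file directory
tmp

#9 -- Filesystems to look for the datafiles.
/shift/lxshare012d/data01 /shift/lxshare012d/data02
"""


def fields_at(config_text, line_number):
    """Return the whitespace-separated fields found on a 1-based line.

    fields_at is a hypothetical helper, not part of the CDR software.
    """
    lines = config_text.splitlines()
    return lines[line_number - 1].split()


# Assumption: the value follows its "#N --" comment, so the working
# directory announced on line 3 is read from line 4, and so on.
work_dir = fields_at(SAMPLE_CONFIG, 4)[0]
tmp_dir = fields_at(SAMPLE_CONFIG, 7)[0]
filesystems = fields_at(SAMPLE_CONFIG, 10)
```

Because the lookup is purely positional, adding or deleting a line anywhere in the file shifts every parameter after it. When editing the configuration, the number of lines must therefore stay the same.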
The execs directory contains the five principal scripts, which are
described in a later section. The bin directory contains some
auxiliary scripts and also executables of common CASTOR commands such
as nsls and rfcp. These are included because some DAQ machines are
quite old and may not have a local CASTOR installation.
crontab, croncdr, acroncdr
Main CDR Exec Scripts
cdr_watchdog
cdr_watchdog is normally started from the crontab entry and
periodically checks that the main script cdr_control is running. If
cdr_control has aborted for some reason, cdr_watchdog restarts the
process and notes the event in the watchdog log file. It also checks
that there is only one cdr_control daemon running on the system.
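The decision the watchdog makes can be sketched as a small function. This is an illustration only: the function name, the return values and the idea of passing in a list of process IDs are assumptions, and the text does not say what cdr_watchdog does when it finds more than one cdr_control process, so the sketch merely reports that case.

```python
def watchdog_action(cdr_control_pids):
    """Decide what to do given the PIDs of running cdr_control processes.

    Hypothetical helper; the real cdr_watchdog is a C-shell script.
    """
    if not cdr_control_pids:
        # cdr_control has aborted: restart it and note the event in the log.
        return "restart"
    if len(cdr_control_pids) > 1:
        # More than one daemon is running: report it (assumed behaviour).
        return "report_duplicates"
    # Exactly one cdr_control is running, nothing to do.
    return "ok"
```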
cdr_control
cdr_control is the main script for CDR and runs indefinitely in a
loop. At regular intervals, typically every 180 seconds, it executes
the following sequence:
- Extract the configuration file parameters.
- Check for file system purging according to the high and low
  watermarks given in the configuration file. The usage of a file
  system must be greater than the high watermark for purging to start.
- Check for new files ready to transfer to CASTOR. For each file, it
  invokes the hsm_daemon to perform the transfer. Files for transfer
  are identified by the “tag” parameter and, in addition, each file
  must pass the “file_done” criterion, which is experiment dependent.
  The simplest form of “file_done” is to take all files but the latest
  one.
- cdr_daemon builds a list of files ready to be copied to CASTOR and
  transfers the files in groups of 20. After transferring 20 files,
  the main loop is exited, allowing the purger to run and free up
  space before the next batch of files is copied to CASTOR.
- Sleep for an idle time; the default is 180 seconds.
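One pass of the sequence above can be sketched as follows. This is a simplified illustration, not the actual C-shell code: the names control_pass and BATCH_SIZE and the purge and transfer callbacks are assumptions. Re-reading the configuration and sleeping between passes are left to the caller.

```python
BATCH_SIZE = 20  # files transferred per pass, as described in the text


def control_pass(usage_percent, high_mark, ready_files, purge, transfer):
    """Run one pass of the control loop and return the files left over.

    usage_percent and high_mark are percentages of file system usage;
    purge and transfer are callables supplied by the caller (hypothetical
    stand-ins for cdr_purge and hsm_daemon).
    """
    # Purging starts only when usage is above the high watermark.
    if usage_percent > high_mark:
        purge()

    # Transfer at most one batch, then return so that the purger can
    # run again before the next batch is copied.
    for name in ready_files[:BATCH_SIZE]:
        transfer(name)
    return ready_files[BATCH_SIZE:]
```

A caller would invoke control_pass repeatedly, sleeping for the configured wait time (180 seconds in the sample configuration) between passes.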
cdr_purge
cdr_purge is called with a directory as a parameter and tries to
remove files until the low watermark for the file system is reached.
For a file to be a candidate for deletion, it must have a partner
flag file in the same directory with “finished_hsmcopy” appended to
the file name. In addition, the file must have the migrate bit set in
CASTOR, indicating that the file is on tape, and the sizes of the
file on the DAQ system and in CASTOR must match.
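The deletion rules can be sketched as below. This is an illustration under assumptions: the FileState record, the function names and the order in which files are examined (simply the order of the input list) are not taken from the CDR scripts. In the real script the flag file, the migrate bit and the sizes are obtained from the file system and from CASTOR.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Facts about one data file (hypothetical record for this sketch)."""
    name: str
    local_size: int      # size on the DAQ system, in bytes
    castor_size: int     # size recorded in CASTOR, in bytes
    has_flag_file: bool  # partner flag file present in the same directory
    migrated: bool       # migrate bit set in CASTOR (file is on tape)


def can_delete(f):
    """A file may be deleted only if all three conditions hold."""
    return f.has_flag_file and f.migrated and f.local_size == f.castor_size


def select_for_purge(files, used_bytes, total_bytes, low_mark):
    """Return names of files to delete until usage reaches the low watermark."""
    doomed = []
    for f in files:
        if 100.0 * used_bytes / total_bytes <= low_mark:
            break  # low watermark reached, stop purging
        if can_delete(f):
            doomed.append(f.name)
            used_bytes -= f.local_size
    return doomed
```

A file that fails any one of the checks is skipped and stays on disk, even if the file system remains above the low watermark.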
cdr_status
hsm_daemon
hsm_daemon is executed from the script cdr_control and handles the
transfer of one file.
Auxiliary Scripts and Executables
These are contained in the subdirectory cdr/bin.
get_castor_path
A single CASTOR destination directory is specified in the
configuration file. In the simplest case, all CDR files are copied
directly to this directory. However, some experiments require a
sub-structure into which files are mapped according to name or type.
This mapping is experiment dependent and is handled by
"get_castor_path", which is usually a symlink to the actual script.
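The two kinds of mapping can be sketched as follows. The file name pattern and the per-run sub-directory rule are invented for illustration, since the real rules are experiment dependent and are not described here. Only the base path is taken from the sample configuration.

```python
import posixpath

# CASTOR pathname from the sample configuration file.
CASTOR_BASE = "/castor/cern.ch/ntof/2004/TAC1"


def castor_path_simple(filename):
    """Simplest case: every file goes directly into the base directory."""
    return posixpath.join(CASTOR_BASE, posixpath.basename(filename))


def castor_path_by_run(filename):
    """Hypothetical rule: 'run1234_5.raw' goes into sub-directory 'run1234'.

    The naming scheme is an assumption made for this sketch.
    """
    base = posixpath.basename(filename)
    run = base.split("_", 1)[0]
    return posixpath.join(CASTOR_BASE, run, base)
```

For example, castor_path_by_run("/shift/lxshare012d/data01/run1234_5.raw") gives /castor/cern.ch/ntof/2004/TAC1/run1234/run1234_5.raw.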
file_done
Files for transfer via CDR are collected in a data directory on the
DAQ system. However, CDR must not transfer files which are open and
still being written. The file_done script is called for each file
and returns a status of 0 if the file is ready for transfer. In most
cases, all but the latest file can be transferred, but some
experiments explicitly mark files that are ready for transfer.
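The simplest rule, "all but the latest file", can be sketched as follows. This is an illustration only: the real file_done is a script in cdr/bin, and both the use of the modification time to find the latest file and the return value 1 for "not ready" are assumptions of this sketch.

```python
import os


def file_done(path, candidates):
    """Return 0 if path is ready for transfer, 1 otherwise.

    A file is treated as ready when it is not the most recently modified
    file among the candidates (assumed rule for this sketch).
    """
    newest = max(candidates, key=os.path.getmtime)
    return 0 if path != newest else 1
```

Experiments that mark finished files explicitly would replace the comparison with a check for their own marker.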