HSM Section - CASTOR

User Guide for CASTOR

Last Update: 23 November, 2005

Intended Audience

This user guide is targeted at both new and existing CASTOR users. See the bibliography for a list of documentation and further reading.

This document refers to Version 1.7.1.5 of CASTOR.

A Overview
B Terminology
C Managed Storage Aspects
D CASTOR Deployment at CERN
E Getting started with CASTOR
F Group Administrators
G Advanced Issues
H Common Questions
I Current CASTOR limitations
J Good Practices
K Data transfer to and from CERN
Z Bibliography

A Overview

CASTOR [1] is a hierarchical storage management (HSM) system developed at CERN for files which may be migrated between front-end disk and back-end tape storage hierarchies. Files in CASTOR are accessed using rfio (Remote File Input/Output) protocols [2] either at the command level or, for C programs, via function calls. An extensive set of CASTOR man pages exists [3]. The CERN central data recording service (CDR [4]) uses CASTOR for the transfer of raw data from experimental areas to central storage. CASTOR will evolve into the hierarchical storage management system for CERN for the LHC era and, as such, will be integrated with GRID technologies.

All registered users at CERN with PLUS accounts have access to CASTOR facilities without special action on their part.

A summary of CASTOR changes for the release Version 1.5.1 is available [5]. The main change of this version is the support for large (> 2GB) files (see section g3 below).

Please send questions and problems concerning CASTOR, including comments on this user guide, to castor.support@cern.ch.

B Terminology

b1 Disk pools, disk Servers and stagers

The migration of CASTOR files between disk and tape storage is managed by a stager daemon process. The stager manages one or more disk pools or groups of one or more UNIX filesystems, residing on one or more disk servers. The stager is responsible for space allocation and for file migration between the disk pool and tape storage. There is no communication between different stagers.

The function of disk servers is to provide disk space for disk pools. The majority of the disk servers used at CERN are PCs running RedHat Linux with EIDE disk drives. The disks are 70GB capacity and each pair of disks is mirrored using a 3ware RAID controller. A Linux EXT2 filesystem is created on the resulting volume and the filesystems are grouped together to define pools. The EIDE disk servers offer an excellent price/performance and the hardware disk mirroring protects the server against single disk failures. Users may not log on to disk servers nor may they directly write, modify or delete files residing in the disk pools.

Large CASTOR files are normally accessed remotely from a disk pool via rfio. Files may also be copied to and from disk pools and a local file system via rfcp or ftp.

b2 File segments on tape

On tape disk files are decomposed into segments. While CASTOR tries to write a file as a single segment, it may, due to reasons of file size or lack of space on tape volumes, be forced to decompose the file into multiple segments.

b3 CASTOR name server, ns list and rfio change commands

The CASTOR Name Server contains the file catalogue in a relational database with information on each file segment, file name, details of access permissions, file size, fileclass, physical location on tape etc. The CASTOR name space closely mimics the UNIX directory structure where the root of the filesystem is /castor. A set of utilities, optimised for querying and referred to as the name server (ns) [6] commands, is available. Some of the more common commands are:

nsls <file>	List the specified file or directory
nsls --class <file>	Show the fileclass for a given file
nslistclass --id <class-id>	Show attributes of a given fileclass
nsmkdir <directory>	Create a directory

Once a file has been copied to CASTOR, its status may be checked using the nsls command:

nsls -l /castor/cern.ch/user/s/smith/data.big
mrwxr-xr-x 1 smith zz 1234567890 Jun 12 09:56 data.big

The nsls output shows the file ownership and permission rights similar to the UNIX ls command. The leading m setting in the permission bits indicates that the file is migrated to tape media. If the file were not yet migrated then this field would be a dash: "-".

The name server is, by design, ignorant of the stager activity and thus the files in the disk pools. Thus a name server change to a file does not affect a disk pool copy of the file. For this reason file change operations (change mode, delete) should be done by the rfio commands (rfchmod and rfrm) which will, in a later version of CASTOR, modify the disk pool copy of the file as well as update the name server catalogue. Some of the more common rfio commands for changing file or directory parameters are:

rfrm <file/directory>	Remove (delete) a file or directory
rfrename <oldfile_name> <newfile_name>	Rename a file or directory
rfchmod <absolute_mode> <file>	Change file permissions

b4 Fileclasses and tape pools

CASTOR files belong to a fileclass which is inherited from the CASTOR directory level for all files in the directory and any subdirectories. The fileclass, identified by an id and classname, specifies a number of attributes relating to the migration scenario of the files: the tape pool(s) to which the files will be migrated, the desired retention period on disk, the migration time after which the file will become a candidate for migration and so on.

The fileclass of a file may be displayed by the command:

nsls --class /castor/cern.ch/atlas/testbeam
 34  /castor/cern.ch/atlas/testbeam

The output shows the fileclass to be 34. The nslistclass --id 34 command can be used to see the attributes of this fileclass. The command nslistclass (with no parameter) shows the full set of available file classes, most of which are experiment specific. More information on file classes may be found in the section on advanced issues (below).

C Managed Storage Aspects

CASTOR makes optimal use of the underlying hardware based on internal algorithms and end-user directives such as migration policies as defined via the fileclasses. These directives are part of the CASTOR setup for a given experiment and are agreed between the experiment's data managers and the CASTOR team.

As noted above, the stager is responsible for managing the disk space in a disk pool by migrating files to tape, recalling files from tape to disk pools and making sure that there is sufficient space for new files by invoking garbage collection. The default garbage collector removes files based on a last used and file size algorithm and is normally invoked when the free disk space in the pool falls below a low-water mark. Additional space is freed up to a given high-water mark. The water-marks are parameters of the disk pool.

The end-user does not need to know where his/her files reside in terms of the tape cartridge number or type. This knowledge is contained in the filename entry in the CASTOR name server and permits the transparent introduction of new tape media and the efficient use of existing tape media.

Re-packing is a process by which tapes that are not full, usually due to files having been deleted from CASTOR but not yet removed from tape, are condensed onto fewer new tapes. Media migration refers to the process of moving files from one media type to another. Re-packing and media migrations are CASTOR system functions which will be executed in background mode without the file owner's intervention. In either case tape volume and/or the file sequence number of any given file may change. The up-to-date information concerning file location on secondary tape storage is kept in the CASTOR name server.

D CASTOR Deployment at CERN

d1 Accounts and directories at CERN

All registered users with PLUS accounts have access to CASTOR facilities without special action on their part. Users may check their accounts via xwho. To obtain a new CERN account see [7]. Once a PLUS account is registered, a CASTOR user directory is created similar in construction to the AFS convention: e.g for userid smith:

/castor/cern.ch/user/s/smith

The intended use of user directories is to store personal files, assumed to be small (< 100MB) and which are migrated to a tape media able to efficiently handle a large number of files on a single tape volume and with fast recall. This type of tape media is more expensive and is intended for users who do not store more than 100GB.

User directories are normally in fileclass 2 which maps to the fast media but users storing more than 100GB will autmatically be switched to file class 95 which maps to less expensive but slower media.

User directories are for personal rather than experiment or project shared files. When a user PLUS account is deleted (e.g. /castor/cern.ch/user/s/smith) his directory is then moved (via nsrename) to /castor/cern.ch/user/deleted/smith_delete_id and 2 years after the change of user directory name, the directory and all its files will be removed. This can be compared e.g. with AFS where files are kept (in backups) for 10-12 months after account deletion.

For larger data files, separate experiment directories are created to hold raw event data, test beam data etc. These are of the form:

/castor/cern.ch/atlas/ . . .

Such directories are for experiment specific files that are frequently shared amongst many users and belong to specific fileclasses as agreed between the experiment and Castor support.

d2 Stagers and disk pools at CERN

A complete list of the stagers, disk pools and their capacity and free space may be found at [8]. Large experiments usually have dedicated stagers and disk pools. Smaller experiments are encouraged to share the public stager and its associated tape and disk pools.

d3 Media Costs at CERN

The media costs for experiment directories are charged back to the experiment at about 2SFR per GB. Currently costs associated with user directories are not charged to experiments or users.

d4 Tape drives and robotics at CERN

The underlying hardware for the back end storage consists of tape drives and associated robotics. Currently CASTOR employs two main types of linear recording devices. The STK 9840 unit has a capacity of 20 GBytes and a data transfer speed capability of ~10 Mbytes/sec. It has a fast mount/dismount time, and can thus provide rapid access to data due to its cartridge design which incorporates twin tape reels. The STK 9940A unit is based on 9840 technology and is also a linear recording device with a cartridge consisting of a single reel system. The unit contains the take-up spool and the single reel cartridge has a capacity of 60 GBytes together with a read/write capability of ~10 MBytes/sec. Currently the 9840 is used for user files whereas experiment files are written to the 9940. An overview of the current hardware may be found at [9 and 10].

d5 CASTOR tape pools at CERN

The current status of the CERN CASTOR tape pools [11] shows the number of specific media types and their capacities for each pool. In order to balance the tape mounting load between the two robot complexes which are located in different buildings (513, 613) for reasons of protection against catastrophes, tape pools are divided between the two complexes with even numbered VIDs (Visual IDentifiers) in building 513 and odd numbered VIDs in building 613. The command vmgrlistpool shows the permissions information for a tape pool.

vmgrlistpool -P na601
na601 - wf CAPACITY 720.00GB FREE 688.16GB ( 95.6%)

The permissions for the pool na601 are any uid (or '-') and group wf. The overall capacity and free space are also displayed.

E Getting Started

e1 CASTOR environment

Two essential environment variables must first be set. Often they are pre-defined in the experiment's group profile.

STAGE_HOST: defines the stager used e.g. cms001d
STAGE_POOL: defines the disk pool e.g. tbed011d

If these are not set then the default public stager (stagepublic) and disk pool (public) are used. See [8] for a list of stagers and disk pools.

e2 Accessing CASTOR Files from C Programs

The RFIO Application Program Interface (API) consists of the following functions calls which closely mimic UNIX I/O:

rfio_open(), rfio_write(), rfio_read(), rfio_close()

Thus to write a file:

#include <stdio.h>
#include <fcntl.h>
#include <shift.h>

int main(int argc, char *argv[]) {
     int rc, fd, buffer_size;

     char *buffer = NULL;
     char *CASTOR_file_name = NULL;

     CASTOR_file_name = argv[1];

     buffer = argv[2];
     buffer_size = strlen(argv[2]);

     fd = rfio_open(CASTOR_file_name,O_RDWR,0766);

     if ( fd == -1 ) {
         rfio_perror(CASTOR_file_name);
         exit(1);
     }

     rc = rfio_write(fd,buffer,buffer_size);

     if ( rc != buffer_size ) {
         rfio_perror(CASTOR_file_name);
         rfio_close(fd);
         exit(1);
     }
     rfio_close(fd);
 }

The program should be compiled and linked on a node where the CASTOR client software is installed (e.g. all 'plus' machines). To read a file, use rfio_open, rfio_read and rfio_close.

e3 Copying a local file to CASTOR and listing a CASTOR directory

> rfcp myfile1 /castor/cern.ch/user/t/tonyo/mycastorfile1
39605868 bytes ready for migration

> rfcp myfile2 /castor/cern.ch/user/t/tonyo/mycastorfile2
46206846 bytes ready for migration

And shortly after:

> nsls -l /castor/cern.ch/user/t/tonyo
mrw-r--r-- 1 tonyo c3 39605868 Mar 06 14:48 mycastorfile1
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2

The leading 'm' in the mode bits of mycastorfile1 indicates that the file has been migrated to tape. File mycastorfile2 has not yet been migrated to tape. rfdir may also be used to list directories although it does not show the 'm' flag indicating migration status.

> rfdir /castor/cern.ch/user/t/tonyo
-rw-r--r-- 1 tonyo c3 39605868 Mar 06 14:48 mycastorfile1
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2

To see which tape volume a file has been migrated to:

> nsls -T /castor/cern.ch/user/t/tonyo
- 1 1 R08706 516 000b6622 39605868 100 mycastorfile1

Shows the tape VID (R08706) and the file sequence number(516).

e4 Copying from CASTOR to local file via rfcp

> rfcp /castor/cern.ch/user/t/tonyo/mycastorfile1 mylocalfile1
39605868 bytes in 12 seconds through eth0 (in) and local (out) (3223 KB/sec)

Should rfcp not return within a reasonable time it means that the data is being recalled from the back-end tape media.

e5 Viewing the status of files on staging disks

While a file resides on a disk pool or is being recalled to disk its status may be seen by the stageqry command using the -M option.

>stageqry -M /castor/cern.ch/user/t/tonyo/mycastorfile2
File name State Nbacc. Size Pool
mycastorfile2 CAN_BE_MIGR 1 44.1/* public

e6 Renaming files and directories with rfrename

This command implements the rename part of the normal UNIX mv command, i.e. physical data movement cannot be performed (e.g. rename a UNIX file to CASTOR will not work). If files are renamed their fileclass remains unchanged. The syntax is: rfrename old_path new_path. e.g.

> rfrename /castor/cern.ch/user/t/tonyo/mycastorfile1 /castor/cern.ch/user/t/tonyo/myfile1

> nsls -l /castor/cern.ch/user/t/tonyo
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2
mrw-r--r-- 1 tonyo c3 39605868 Mar 06 14:48 myfile1

There is no restriction to when a CASTOR file can be renamed, e.g. the file does not need to have been migrated to tape.

e7 Creating new directories with nsmkdir

> nsmkdir /castor/cern.ch/user/t/tonyo/test-directory
> nsls -l /castor/cern.ch/user/t/tonyo
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2
mrw-r--r-- 1 tonyo c3 39605868 Mar 06 14:48 myfile1
drwxr-xr-x 0 tonyo c3 0 Mar 06 15:10 test-directory

Creating new user directories is the sole responsibility of the user. Creating experiment directories is sometimes done by the data manager of the experiment concerned; otherwise by castor.support.

e8 Removing files or directories with rfrm

For convenience, the rfrm command also supports recursive remove of directories (-r option).

> rfrm /castor/cern.ch/user/t/tonyo/myfile1
> nsls -l /castor/cern.ch/user/t/tonyo
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2
drwxr-xr-x 0 tonyo c3 0 Mar 06 15:10 test-directory

> rfrm -r /castor/cern.ch/user/t/tonyo/test-directory

> nsls -l /castor/cern.ch/user/t/tonyo
-rw-r--r-- 1 tonyo c3 46206846 Mar 06 14:52 mycastorfile2

e9 Changing file permissions

e.g.:

rfchmod 750 /castor/cern.ch/user/t/tonyo/mycastorfile

Note that absolute mode must be given in octal.

e10 Changing ownership of experiment directories

Group administrators, i.e. those users registered in the Castor User Privilege Validation (Cupv) scheme for this purpose, may change the ownership of any directory in their group by using the command:
> nschown userid:gg path.

A list of group administrations for a given group may be seen by : 'Cupvlist --priv GRP_ADMIN --group gg' where gg is the two character Computer group for your experiment.

Otherwise the use of nschown is a root privilege; thus if such an operation is required it must be requested to castor.support.

F Group Administrators

CASTOR has the concept of Group Administrators. To become one the experiment contact person(s) for CASTOR/data management should mail castor.support@cern.ch and request that they or other users be give the privilege by sending the necessary information (userid,hostname). Then the relevent commands may be issued by the userid on that hostname. More than one pair is allowed for a given userid. A list of group administrations for a given group may be seen by:
Cupvlist --priv GRP_ADMIN --group gg where gg is the two character computer group for your experiment. Currently the list of operations allowed by group administrators is:

f1 Change of ownership of a CASTOR directory

A group administrator may change the ownership of any CASTOR directory in his computer group:
> nschown userid:gg path

G Advanced Issues

g1 More on fileclasses

Migration policies are defined via the fileclass definitions which specify how long a file should remain on disk, how soon it should be migrated to tape, the number of copies and the tape pools to be used. The command nslistclass --id 34 shows, for example the attributes of class 34.

CLASS_ID	34	Unique Numeric Identifier
CLASS_NAME	atlastestbeam	Unique Mnemonic
CLASS_UID	-	uid filter ('-' means no restriction) Any numeric value will restrict migration to a specific uid or any alpha character string restricts to a specific userid (logon name)
CLASS_GID	zp	gg filter ('-' means no restriction); zp=Atlas or gid filter
FLAGS	0x0	Not used
MAXDRIVES	2	No. drives which can be simultaneously used during migration
MIN FILESIZE	0	Minimum file size for migration (Not used)
MAX FILESIZE	0	Maximum file size for migration (Not used)
MAX SEGSIZE	0	Not used
MIGR INTERVAL	600	Migration Interval (seconds)
MIN TIME	0	Minimum number of seconds after file is last updated before file is considered as a migration candidate
NBCOPIES	2	No. copies of file; each copy is written to a different tape pool
RETENP_ON_DISK	AS_LONG_AS_POSSIBLE	Maximum retention period of file on disk. Can be AS_LONG_AS_POSSIBLE (purge only when disk space is needed) or INFINITE_LIFETIME or a given number of seconds. If 0 file is purged immediately after migration. DEFAULT is AS_LONG_AS_POSSIBLE.
TAPE POOLS	atlascdr1:atlascdr2	tape pool(s) to which file is written; the number of pools must match NBCOPIES.

Some points to note:

User files always have fileclass class_id=2; normally fileclass choice is important only for experiment directories.
The class_uid and class_gid are filters i.e. in this case only files with gg=zp, but any uid in that group, may be migrated to tape.
'Pinned' files exist on disk either for a given specified time or forever (RETENP_ON_DISK=INFINITE_LIFETIME).
If NBCOPIES = 0 files are not migrated to tape.
If more than one tape pool is used it means that migrated files will be copied to tapes selected alternately from the two tapes pools. This can help with load balancing of physical tape mounts if the tapes in several robots, and/or to have the data split into different physical locations for safety reasons.
Use nschclass class path to change the class of a directory. A change to a directory will not change the fileclass of existing files but only subsequent (new) files.
New fileclasses cannot be created by users; please send such requests to castor.support.

To see all fileclasses for your experiment, use nslistclass (no parameters) and then look for classes where class_gid corresponds to your experiment's compute group code or gg.

The fileclass for user directories (ID 2) has the following specification:

CLASS_ID	2
CLASS_NAME	user
CLASS_UID	-	Any uid
CLASS_GID	-	Any group
FLAGS	0x0
MAXDRIVES	2
MIN FILESIZE	0
MAX FILESIZE	0
MAX SEGSIZE	0
MIGR INTERVAL	1800
MIN TIME	0
NBCOPIES	1
RETENP_ON_DISK	AS_LONG_AS_POSSIBLE
TAPE POOLS	default

g2 Co-location issues

Production managers in experiments are often concerned with data sets (groups of files) rather than individual files. Thus when migrating files to tape they would like to ensure that a set of files is written to as few tapes as possible in order to minimize the number of tape mounts on reading the datasets back from tape. Although there is a coarse grained control over which tapes are used via the selection of a specific tape pool, any finer grained control for co-location issues has currently to be achieved via stage commands, giving a list of files as input . i.e.

To migrate a list of files on the same tape (unless it becomes full):

stagewrt -M hsmfile1 [-M hsmfile2 [-M hsmfile3 [...]]] \
[--tppool tape_pool] [-K] \
diskfile1 [diskfile2 [diskfile3 [...]]]

To recall a list of files to disk:

stagein -M hsmfile1 [-M hsmfile2 [-M hsmfile3 [...]]]

In a future version, users will be able to migrate whole directories in one command to address the co-location issue.

g3 Large File support

Starting with version 1.5.1.0, CASTOR gives full support for Large Files, i.e. files larger than 2 GB. The RFIO command line interface supports Large Files by default now. For the RFIO API, there are 2 cases: if the machine where the user application runs is 64 bits native (DEC Alpha, SGI or IA-64), the API is unchanged while otherwise the standard RFIO API supports only files smaller than 2 GB. In this case to access larger files, one needs to use either rfio_open64 or the flag O_LARGEFILE on rfio_open.

There is a complete set of RFIO64 routines:

        rfio_open       rfio_open64
        rfio_stat       rfio_stat64
        rfio_lstat      rfio_lstat64
        rfio_lockf      rfio_lockf64
        rfio_lseek      rfio_lseek64
        rfio_preseek    rfio_preseek64
        rfio_fopen      rfio_fopen64
        rfio_fstat      rfio_fstat64
        rfio_fseek      rfio_fseeko64
        rfio_ftell      rfio_ftello64

The RFIO64 routines take off64_t instead of off_t arguments and use struct stat64 instead of struct stat. To use 64 bits routines under Linux, it is mandatory to define the flag _LARGEFILE64_SOURCE as shown in the example below.

#define _LARGEFILE64_SOURCE
#include 
#include 
#include 

#include 

#include  /* For O_CREAT .. */
#include   /* For SEEK_CUR */
#include  /* for strlen */


int main(int argc, char *argv[]) {

    int fd, rc;
    off64_t big_offset,tmp_off;
    char *text = "TEST_TEXT";

    /* Checking the arguments */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s \n", argv[0]);
        return EXIT_FAILURE;
    }

    /* Opening the file */
    fd = rfio_open(argv[1], O_CREAT | O_LARGEFILE | O_WRONLY, 0644);
    if (fd < 0) {
        rfio_perror("open");
        return EXIT_FAILURE;
    }

    /* Doing a fseek to 3GB */
    big_offset = 1024LL * 1024LL * 1024LL * 3LL;

    tmp_off = rfio_lseek64(fd, big_offset, SEEK_CUR);
   if (tmp_off < 0) {
        rfio_perror("seek");
        return EXIT_FAILURE;
    }

    /* Writing a byte */
    rc = rfio_write(fd, text,  strlen(text));
    if (rc < 0) {
        rfio_perror("write");
        return EXIT_FAILURE;
    }

    rc = rfio_close(fd);
    if (rc < 0) {
         rfio_perror("closing");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;

}

g4 Multi-file staging

If you have a large number of files to stagein then rather than stagein them in one by one (thus having n stager processes), it is much better to stage multiple files in one stagein request. This can drastically save on resources for the stager daemon. Such a stagein would look like .e.g

stagein -h my_stager -A deferred -M <castor_filename_1> -M <castor_filename_2> ...

The '-A deferred' option means that the actual disk space is not reserved when you submit the command, but only when needed. See here for details to the stagein command. One could think of submitting requests for several hundred files at a time like this - the only limit being a shell dependent limit on the total character length of commands.

Another large benefit from multi-file staging is that the total number tape requests is limited to the number of different tapes on which the files reside. Typically this is much less than the number of files. Very large numbers of tape requests risk to hit the limit of the tape queue manager.

H Common Questions

h1 Do I have a CASTOR user directory?

Use nsls to find out.

nsls -ld /castor/cern.ch/user/t/tonyo
drwxr-xr-x   1 tonyo    c3                        0 Mar 06 15:15 /castor/cern.ch/user/t/tonyo

nsls -ld /castor/cern.ch/user/t/tonyo3

/castor/cern.ch/user/t/tonyo3: No such file or directory

h2 How safe is my data in CASTOR?

Like any system involving hardware, software and human beings the absolute safety of data stored in Castor cannot be 100% guaranteed. Please note however:

Castor Disk servers use mirrored disks (except for some older machines which currently are being phased out) in order to minimize the effects of disk failure.
Tape media can be defective. Sometimes we are able to fix such problems at CERN; if not we are obliged to send the tapes back to the manufacturer for data recovery. This can mean that the tape is unavailable for some weeks or even that the data cannot be recovered. Thus very precious data should have more than one copy on tape (by using a fileclass with NBCOPIES=2 (see above). Concerning media errors see g6 below.
Should the migration of a file to tape fail for any reason (normally because of an inconsistency between file class and tape pool setup), the file will not be deleted even though it is in a 'PUT-FAILED' status.

Other potential sources are CASTOR software bugs and system administration errors, both of which we strive to eliminate. The CASTOR software has been very reliable since the end of 2001.

h3 What operations can I perform on a file that is not yet migrated?

You can update a file whose status in the pool is CAN_BE_MIGR but not a file whose status is BEING_MIGR. The status can be seen by the stageqry command (stageqry -M castor_file_name).

h4 Which CASTOR version am I using?

Use 'castor -v' to find out.

h5 On which tape is my file?

Use nsls with the -T option to see the tape number:

nsls -T /castor/cern.ch/user/m/myuserid/mycastorfile
- 1 1 R08706 530 000b9832 46206846 100 /castor/cern.ch/user/m/myuserid/mycastorfile

The output is: status ( '-' means OK, 'D' mean Disabled); Copy number (1); Segment Number within this copy number (1); Tape Vid; Tape File Sequence no (530); Tape Block Id (000b9832); Segment size (46206846) and Compression factor*100.

h6 Is there a problem with the tape on which my file resides?

If you observe the following error message:

stagein error : Required tape segments are not all accessible
STG98 - Invalid argument

this is an indication that the tape on which (at least one) of the segments that constitute your file are on a tape that cannot be mounted. Usually a file correspnonds to a single segment, although in ~1% of all cases files spans two tapes and thus have two segments. Most likely the tape is DISABLED due to a problem. For any castor file that is migrated to tape, one may check on the associated tape as noted above:

>nsls -T --deleted /castor/cern.ch/user/m/mysusrid/mycastorfile
- 1   1 P05485     338 0022beb8             46206896 100 /castor/cern.ch/user/m/myuserid/mycastorfile
>vmgrlisttape -V P05485
P05485   P05485 STK_ACS5 200GC    aul default                0B 20031129 FULL|RDONLY|DISABLED

If the first character of the nsls -T output is 'D' (rather than '-') then the individual segment has been disabled. This may be verified by consulting the disabled segment web page. The last element of the output from vmgrlisttape is the tape status. If this status contains DISABLED, EXPORTED or ARCHIVED then the tape cannot currently be mounted. Usually in such cases the tape status will include DISABLED which means a significant error has ocurred while reading or writing this tape. Such an error could be due to the last tape drive used or a problem with the tape itself. (Should the tape be EXPORTED or ARCHIVED please contact castor.support@cern.ch..) Media errors do occur,albeit at a low rate (in the order of 1 in 10**4 mounts). There is a list of CASTOR tapes that are DISABLED and the associated CASTOR files. As the status DISABLED indicates, such tapes, and thus any files on these tapes, are currently inaccessible by users. The two lists are refreshed every 50 minutes. Since the tape support service actively tracks all DISABLED tapes and tries to fix the problems as soon as possible, no action is required by the user. In the event that a tape has to be sent back to its manufacturer delays of a few days or even weeks are possible.

h7 What to do if my file has an incorrect status in a disk pool?

If stageqry shows that the file status is abnormal, for instance PUT_FAILED, contact castor support.

h8 What happens to my Castor user files if my PLUS account is deleted?

Any user files are moved to a hidden directory and kept for at least 2 years after deletion of the PLUS account. If there are no user files the Castor user directory is removed. Both the removal and changing to a hidden directory actions are necessary because userids maye be re-used for other users.

h9 Why does stagein/rfcp not respond quickly?

If rfcp or stagein commands do not return quickly it is likely that a file has been requested that needs to be recalled from tape. This may be observed by executing a showqueues command on e.g. lxplus. Thus for a given userid myuserid:

>showqueues -x | grep myuserid

> DA 994BR4 994B43AC@tpsrv127 RUNNING 39 (No_dedication) None P06180 R 1599 (myuserid,vl)@stage005

shows that the tape request is actually running on tpsrv125 for userid myuserid on stager stage05. If the request was in the queue the result would be:

>Q 994BR4 P06234 R 754869 (myuserid,vl)@stage005 46

which shows that the request is Queued and has been in the queue for 46 seconds. Tape queues are chaotic in nature and users are requested to be patient. It can happen at busy times that tape queues result in wait times of up to ~ 1 hour or even longer.

I Current CASTOR limitations

This section describes limitations of the current CASTOR release (1.7.1.5).

i1 Changing fileclasses

The fileclass is an attribute of a directory and, if changed, will only affect new files in that directory and not existing files. Change of fileclasses for individual files is not supported.

i2 Consistency between Castor directory permissions and tape pool,fileclass definitions

Normally the fileclass restrictions should not be more restrictive than those of any associated (via the fileclass definition) tape pool. Care must be taken in setting the permissions of the Castor directories. Inconsistencies can lead to a situation where files cannot be migrated to tape (and thus in status PUT_FAILED). Such problems can normally only be resolved by castor support.

i3 Migration and recall of directories

rcfp does not yet support migration and recall of directories. A workaround for this issue is described in the section on Advanced Issues.

J Good Practices

j1 Always use the same stager

The current CASTOR software does not support communication between stagers. Thus it is highly recommended that experimental groups use a single stager to manage all the pools for the experiment. The use of multiple stagers is confusing and can lead to erroneous situations where the same file is recalled by different stagers into different disk pools.

j2 Use only rfio rather than stage commands

Use rfio commands rather than the stagein/stagewrt commands. The former are more simple to use and thus less prone to user error. Stage commands will eventually be removed.

j3 Use name server (ns) list commands rather than their rfio equivalents

The name server commands have, in general, richer functionality and better performance. i.e. nsls and rfdir.

j4 Use rfio change commands rather than their name server equivalents

Name server change commands, nsrm and nschmod, do not update files residing in disk pools - only the name server catalogue. The rfio equivalents, rfrm and rfchmod should be used instead.

j5 Store only large files (> 20MB) in CASTOR experiment directories

CASTOR was designed for large files. The use of experiment directories for small files is discouraged.

j6 Large User files should be stored in experiment directories

User files larger than 100 MB should not be stored in user directories but rather in experiment directories. User files are stored on a more expensive tape medium which optimizes the recall of small files.

j7 Do not use CASTOR as a file backup system

Do your file backups using a backup system. CASTOR, a Hierarchical Storage Manager, does not have the ability to do incremental backups. Repeated use of CASTOR to backup user files will thus result in many obsolete files.

K Data transfer to and from CERN

For data transfers within CERN, users are recommended to use rfcp which should be quite sufficient.

wacdr and bbftp should no longer be used for transfers of data between CERN and external sites.

CERN IT is now running a GridFTP service for transfer of large amounts of data to/from CERN from an external location. For detailed information about this service please follow this set of links:

For any questions you might have concerning Wide Area Data transfers, please contact: Wan-Data.Operations@cern.ch

Z Bibliography and further reading

CASTOR : http://cern.ch/it-div-ds/HSM/CASTOR/
Remote File I/O (rfio): http://cern.ch/it-div-ds/HSM/CASTOR/DOCUMENTATION/MAN/#rfio
CASTOR man pages: http://cern.ch/it-div-ds/HSM/CASTOR/DOCUMENTATION/MAN/
Central Data Recording at CERN (CDR)
CASTOR changes for V1.5.1: http://cern.ch/it-div-ds/HSM/CASTOR/VERSIONS/1.4.1.2.html
Name Server commands: http://cern.ch/it-div-ds/HSM/CASTOR/DOCUMENTATION/MAN/#ns
To get a new CERN account: http://consult.cern.ch/service/registration/
Status of CERN disk pools: http://cern.ch/it-div-ds/HSM/CASTOR/MONITOR/DISKPOOL/
Overview (1) of current tape hardware at CERN
Overview (2) of current tape hardware at CERN
CASTOR tapes that are DISABLED due to media errors : http://castor.web.cern.ch/castor/DOCUMENTATION/END_USERS/disabled_tapes.html and the associated CASTOR files : http://castor.web.cern.ch/castor/DOCUMENTATION/END_USERS/disabled_segments.html. The list of disabled segments excluding those from disabled tapes maye be found here.