*****************************************
  Monitoring the ARC Information System
*****************************************

The main configuration section for these probes is ``arcinfosys``, see
:ref:`configuration-files`.


EGIIS Check
===========

*This probe will soon be deprecated. Do not use it for new deployments.*

To monitor an EGIIS service, use ::

    check_egiis -H <HOST> [-P <PORT>] --index=<INDEX-NAME>

This will do an LDAP query of the EGIIS service on ``<HOST>:<PORT>``.  The
default port is 2135.  The base DN of the query is ``Mds-Vo-name=<INDEX-NAME>,
o=grid``.  The probe will also fetch the subschema at ``cn=subschema`` and
check the presence of attributes against MAY and MUST specifications in the
schema.  In addition, some type conversions are attempted to catch invalid
data.

Any validation error will give a CRITICAL Nagios status.  If the index is
empty, a WARNING Nagios status is reported.  Otherwise, the status is OK and
counts for the different registration states are printed.


CE Health State using EMIES
===========================

The following probe contacts the EMIES service of the compute element and
checks the ``HealthState`` element in the reply::

    check_arcservice -u <url> [-k <key-file> -c <cert-file>] [-t <timeout>]

``arcinfo -c <host>`` shows whether a CE supports EMIES and which URL to use.
EMIES uses SSL client authentication.  By default the host certificate is
used.  To use a grid proxy, pass it as both key and certificate.  Example::

    check_arcservice -u https://arcce.example.org:443/arex \
                     -k /tmp/x509up_1000 -c /tmp/x509up_1000


CE Infosys Validation for the NorduGrid and GLUE 1 Schemas
==========================================================

*This probe will soon be deprecated. Do not use it for new deployments.*

The ARIS probe is invoked with ::

    check_aris -H <HOST> [-P <PORT>] [--cluster <CLUSTER>...] \
            [--cluster-test <testname>...] [--queue-test <testname>...] \
            [OTHER-OPTIONS...]

See ``check_aris --help`` for the full list of options.
The probe queries ``Mds-Vo-name=local, o=grid`` on ``<HOST>:<PORT>``.  The
default port is 2135.  If one or more clusters are specified with the
``--cluster`` option, only those are checked
(``nordugrid-cluster-name=<CLUSTER>``), and it is considered an error for any
of them to be missing.  The probe validates attributes of entries against the
MAY and MUST specifications of the schema, and attempts some type conversions.
For each cluster found, the probe queries and validates its queues.

If no clusters are found, or if no queues are found for a given cluster, it
will be reported as a warning.  You can change this by passing a Nagios status
to the option ``--if-no-clusters`` or ``--if-no-queues``, respectively.
Valid statuses are ``ok``, ``warning``, ``critical``, and ``unknown``, though
only the first three make sense here.
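
For reference, these status names follow the standard Nagios plugin
convention for exit codes.  The sketch below only illustrates that
convention.  The names ``NAGIOS_EXIT_CODES`` and ``exit_code`` are
hypothetical, and how the probe maps the option values internally is not
described here.

.. code-block:: python

    # Standard Nagios plugin exit codes, keyed by the status names accepted
    # by --if-no-clusters and --if-no-queues.
    NAGIOS_EXIT_CODES = {"ok": 0, "warning": 1, "critical": 2, "unknown": 3}

    def exit_code(status):
        """Return the Nagios exit code for a status name (hypothetical helper)."""
        return NAGIOS_EXIT_CODES[status.lower()]

    code = exit_code("warning")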

This probe can also do custom checks on the LDAP data, either numeric limits
or regular-expression matches.  A custom test is defined in the configuration
file under a section ``arcinfosys.aris.<testname>`` and enabled by passing
any number of ``--cluster-test <testname>`` and ``--queue-test <testname>``
options to the probe.  The tests are run on entries of the type
``nordugrid-cluster`` and ``nordugrid-queue``, respectively.

The ARIS infosystem contains an attribute ``nordugrid-cluster-contactstring``
which provides the interface for job submission.  You can check that this URL
is accessible by passing ``--check-contact``.  This will do a list operation
and, if the logging level is ``INFO`` or lower, report the number of
entries.  If the attribute is missing or the URL is inaccessible, the service
goes CRITICAL with an appropriate message.


Limit Checks
------------

A limit check takes the form:

.. code-block:: ini

    [arcinfosys.aris.<testname>]
    type = limit
    value = <expr>
    critical.min = <value>
    critical.max = <value>
    critical.message = <message>
    warning.min = <value>
    warning.max = <value>
    warning.message = <message>

The ``type`` and ``value`` variables are required, and at least one of the
``min`` or ``max`` variables should be given for the test to be useful.  There
are reasonable defaults for the messages, though if your ``<expr>`` is
complex, you may want to provide a more human-readable version.
The probe will:

* Evaluate ``<expr>`` using Python's ``eval`` function, in an environment
  which maps variable names derived from the LDAP attribute names to the
  corresponding converted values.  The variable names are obtained from the
  attribute names by replacing "``-``" with "``_``" and stripping common
  prefixes including "``nordugrid-cluster-``", "``nordugrid-queue-``", and
  "``Mds-``".

* If ``critical.min`` is given and the result is below this value, or if
  ``critical.max`` is given and the result is above this value, report it as a
  critical error.

* Similarly for ``warning.min`` and ``warning.max``, reported as a warning.
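
The steps above can be sketched in Python as follows.  This is a minimal
illustration, not the probe's actual code: the function names
``variable_name`` and ``limit_status`` are hypothetical, only the three
prefixes named above are stripped, the attribute values are made up, and
disabling builtins in ``eval`` is a choice made for this sketch.

.. code-block:: python

    # Only the prefixes named in the text; the probe may strip others as well.
    PREFIXES = ("nordugrid-cluster-", "nordugrid-queue-", "Mds-")

    def variable_name(attribute):
        """Derive a variable name from an LDAP attribute name."""
        for prefix in PREFIXES:
            if attribute.startswith(prefix):
                attribute = attribute[len(prefix):]
                break
        return attribute.replace("-", "_")

    def limit_status(expr, entry, critical_min=None, critical_max=None,
                     warning_min=None, warning_max=None):
        """Evaluate expr against an entry and compare it with the limits."""
        env = {variable_name(name): value for name, value in entry.items()}
        result = eval(expr, {"__builtins__": {}}, env)
        if critical_min is not None and result < critical_min:
            return "CRITICAL"
        if critical_max is not None and result > critical_max:
            return "CRITICAL"
        if warning_min is not None and result < warning_min:
            return "WARNING"
        if warning_max is not None and result > warning_max:
            return "WARNING"
        return "OK"

    # Made-up cluster entry with already converted (integer) values.
    entry = {"nordugrid-cluster-totalcpus": 128,
             "nordugrid-cluster-usedcpus": 120}
    status = limit_status("totalcpus - usedcpus", entry,
                          critical_min=1, warning_min=16)

Here the expression evaluates to 8, which is above ``critical_min`` but below
``warning_min``, so the sketch reports a warning.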


Regular Expression Checks
-------------------------

A regular expression check takes the form:

.. code-block:: ini

    [arcinfosys.aris.<testname>]
    type = regex
    variable = <varname>
    critical.pattern = <python-regex>
    critical.message = <message>
    warning.pattern = <python-regex>
    warning.message = <message>

The ``type`` and ``variable`` settings are required, and you should specify at
least one of ``critical.pattern`` and ``warning.pattern``.  The variable name
is obtained the same way as for the limit checks.  The probe will consider all
values for the LDAP attribute corresponding to ``<varname>``.

* If ``critical.pattern`` is specified and none of the values match it, then a
  critical condition is reported, else

* if ``warning.pattern`` is specified and none of the values match it, then a
  warning is reported.

The following example will issue a critical state if a queue is not active:

.. code-block:: ini

    [arcinfosys.aris.queue-active]
    type = regex
    variable = status
    critical.pattern = ^active$
    critical.message = Inactive queue
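
The "none of the values match" logic can be sketched in Python as follows.
This is a hypothetical illustration rather than the probe's own code: the
function name ``regex_status`` is invented, and the use of ``re.search`` is an
assumption (for an anchored pattern such as ``^active$`` it behaves the same
as ``re.match``).

.. code-block:: python

    import re

    def regex_status(values, critical_pattern=None, warning_pattern=None):
        """Return a status based on whether any value matches each pattern."""
        def any_match(pattern):
            return any(re.search(pattern, value) for value in values)

        # The critical pattern takes precedence over the warning pattern.
        if critical_pattern is not None and not any_match(critical_pattern):
            return "CRITICAL"
        if warning_pattern is not None and not any_match(warning_pattern):
            return "WARNING"
        return "OK"

    # The queue-active test above, applied to two made-up queue entries.
    active = regex_status(["active"], critical_pattern=r"^active$")
    inactive = regex_status(["inactive"], critical_pattern=r"^active$")

With these inputs, ``active`` is ``"OK"`` and ``inactive`` is ``"CRITICAL"``,
since ``inactive`` does not match the anchored pattern.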


Glue Schema Checks
------------------

Some CEs publish cluster and queue information in the Glue schema in addition
to the NorduGrid schema.  You can enable schema checks for these if present by
passing ``--enable-glue``.

The information in the Glue entries should match information in the ARC
entries as described in [ARCIS2011]_.  You can enable a partial comparison of
GlueCE, GlueCluster, and GlueSubCluster records by passing ``--compare-glue``.


.. [ARCIS2011]
    "The NorduGrid-ARC Information System";
    Balázs Kónya and Daniel Johansson;
    NORDUGRID-TECH-4;
    http://www.nordugrid.org/documents/arc_infosys.pdf
