ports//sysutils/ganglia-webfrontend/work/ganglia-3.0.6/ganglia.pod

=pod

=for comment The ganglia documentation is written in POD (Plain Old Documentation)
format. If you want to edit this file but don't know the POD format, it is very
easy to learn. Visit http://www.linuxgazette.com/issue73/spiel.html for a nice intro
or read the http://www.perldoc.com/perl5.6.1/pod/perlpod.html perl pod documentation.

=head1 Name

B<ganglia> - distributed monitoring system

=head1 Version

B<ganglia> GANGLIA_VERSION

The latest version of this software and document will always be found at
http://ganglia.sourceforge.net/. You are currently reading
$Revision: 682 $ of this document.

=head1 Synopsis

______ ___
/ ____/___ _____ ____ _/ (_)___ _
/ / __/ __ `/ __ \/ __ `/ / / __ `/
/ /_/ / /_/ / / / / /_/ / / / /_/ /
\____/\__,_/_/ /_/\__, /_/_/\__,_/
/____/ Distributed Monitoring System

Ganglia is a scalable distributed monitoring system for high-performance computing systems
such as clusters and Grids. It is based on a hierarchical design targeted at federations of
clusters. It relies on a multicast-based listen/announce protocol to monitor state within
clusters and uses a tree of point-to-point connections amongst representative cluster nodes
to federate clusters and aggregate their state. It leverages widely used technologies such as
XML for data representation, XDR for compact, portable data transport, and RRDtool for data
storage and visualization. It uses carefully engineered data structures and algorithms to
achieve very low per-node overheads and high concurrency. The implementation is robust, has
been ported to an extensive set of operating systems and processor architectures, and is
currently in use on over 500 clusters around the world. It has been used to link clusters
across university campuses and around the world and can scale to handle clusters with 2000 nodes.

The ganglia system is comprised of two unique daemons, a PHP-based web frontend and a few other
small utility programs.

=over 4

=item B<Ganglia Monitoring Daemon (gmond)>

Gmond is a multi-threaded daemon which runs on each cluster node you want to monitor.
Installation is easy. You don't have to have a common NFS filesystem or a database backend,
install special accounts, maintain configuration files or other annoying hassles.

Gmond has four main responsibilities: monitor changes in host state, announce relevant changes,
listen to the state of all other ganglia nodes via a unicast or multicast channel and answer requests for
an XML description of the cluster state.

Each gmond transmits in information in two different ways: unicasting/multicasting host state in external data
representation (XDR) format using UDP messages or sending XML over a TCP connection.

=item B<Ganglia Meta Daemon (gmetad)>

Federation in Ganglia is achieved using a tree of point-to-point
connections amongst representative cluster nodes to aggregate the
state of multiple clusters. At each node in the tree, a Ganglia Meta
Daemon (C<gmetad>) periodically polls a collection of child data
sources, parses the collected XML, saves all numeric, volatile metrics
to round-robin databases and exports
the aggregated XML over a TCP sockets to clients. Data sources may be either
C<gmond> daemons, representing specific clusters, or other
C<gmetad> daemons, representing sets of clusters. Data sources
use source IP addresses for access control and can be specified using
multiple IP addresses for failover. The latter capability is natural
for aggregating data from clusters since each C<gmond> daemon
contains the entire state of its cluster.

=item B<Ganglia PHP Web Frontend>

The Ganglia web frontend provides a view of the gathered information
via real-time dynamic web pages. Most importantly, it displays Ganglia
data in a meaningful way for system administrators and computer users.
Although the web frontend to ganglia started as a simple HTML view of the
XML tree, it has evolved into a system that keeps a colorful history of
all collected data.

The Ganglia web frontend caters to system administrators and users. For
example, one can view the CPU utilization over the past hour, day, week,
month, or year. The web frontend shows similar graphs for Memory usage,
disk usage, network statistics, number of running processes, and all other
Ganglia metrics.

The web frontend depends on the existence of the C<gmetad> which provides it
with data from several Ganglia sources. Specifically, the web frontend will
open the local port 8651 (by default) and expects to receive a Ganglia XML tree.
The web pages themselves are highly dynamic; any change to the Ganglia data appears
immediately on the site. This behavior leads to a very responsive site, but requires
that the full XML tree be parsed on every page access. Therefore, the Ganglia web
frontend should run on a fairly powerful, dedicated machine if it presents a large
amount of data.

The Ganglia web frontend is written in the PHP scripting language, and uses graphs
generated by C<gmetad> to display history information. It has been tested on many
flavours of Unix (primarily Linux) with the Apache webserver and the PHP 4.1 module.

=back

=head1 Installation

The latest version of all ganglia software can always be downloaded from http://ganglia.info/

Ganglia runs on Linux (i386, ia64, sparc, alpha, powerpc, m68k, mips, arm, hppa, s390),
Solaris, FreeBSD, AIX, IRIX, Tru64, HPUX, MacOS X and Windows NT/XP/2000 making it as
portable as it is scalable.

=head2 Monitoring Core Installation

If you use the Linux RPMs provided on the ganglia web site, you can skip
to the end of this section.

Ganglia uses the GNU autoconf so compilation and installation of the monitoring core is basically

% ./configure
% make
% make install

but there are some issues that you need to take a look at first.

=over 4

=item B<Kernel multicast support>

If you use the ganglia multicast support, you much have a kernel that
supports multicast. The
vast majority of machines have multicast support by default. If you have
problems with ganglia this is a core issue.

=item B<Gmetad is not installed by default>

Since C<gmetad> relies on the Round-Robin Database Tool ( see http://www.rrdtool.org/ )
it will not be compiled unless you explicit request it by using a B<--with-gmetad> flag.

% ./configure --with-gmetad

The configure script will fail if it cannot find the rrdtool library and header files.
By default, it expects to find them at /usr/include/rrd.h and /usr/lib/librrd.a. If
you installed them in different locations then you need to add the following configure
flags

% ./configure CFLAGS="-I/rrd/header/path" CPPFLAGS="-I/rrd/header/path" \
LDFLAGS="-L/rrd/library/path" --with-gmetad

of course, you need to substitute C</rrd/header/path> and C</rrd/library/path> with the
real location of the rrd tool header file and library respectively.

=item B<AIX should not be compiled with shared libraries>

You must add the C<--disable-shared> and C<--enable-static> configure flags if you running on AIX

% ./configure --disable-shared --enable-static

=item B<GEXEC confusion>

GEXEC is a scalable cluster remote execution system which provides fast, RSA authenticated
remote execution of parallel and distributed jobs. It provides transparent forwarding of
stdin, stdout, stderr, and signals to and from remote processes, provides local environment
propagation, and is designed to be robust and to scale to systems over 1000 nodes. Internally,
GEXEC operates by building an n-ary tree of TCP sockets and threads between gexec daemons and
propagating control information up and down the tree. By using hierarchical control, GEXEC
distributes both the work and resource usage associated with massive amounts of parallelism
across multiple nodes, thereby eliminating problems associated with single node resource limits
(e.g., limits on the number of file descriptors on front-end nodes).
(from http://www.theether.org/gexec )

C<gexec> is a great cluster execution tool but integrating it with ganglia is a bit clumsy.
GEXEC can run standalone without access to a ganglia C<gmond>. In standalone
mode gexec will use the hosts listed in your GEXEC_SVRS variable to run on. For example,
say I want to run C<hostname> on three machines in my cluster: C<host1>, C<host2> and C<host3>.
I use the following command line.

% GEXEC_SVRS="host1 host2 host3" gexec -n 3 hostname

and gexec would build an n-ary tree (binary tree by default) of TCP sockets to those machines
and run the command C<hostname>

As an added feature, you can have C<gexec> pull a host list from a locally running gmond and
use that as the host list instead of GEXEC_SVRS. The list is load balanced and C<gexec> will
start the job on the I<n> least-loaded machines.

For example..

% gexec -n 5 hostname

will run the command C<hostname> on the five least-loaded machines in a cluster.

To turn on the C<gexec> feature in ganglia you must configure ganglia with the C<--enable-gexec>
flag

% ./configure --enable-gexec

Enabling C<gexec> means that by default any host running gmond will send a special
message announcing that gexec is installed on it and open for requests.

Now the question is, what if I don't want gexec to run on every host in my cluster? For
example, you may not want to have C<gexec> run jobs on your cluster frontend nodes.

You simply add the following line to your C<gmond> configuration file (C</etc/gmond.conf> by
default)

no_gexec on

Simple huh? I know the configuration file option, C<no_gexec>, seems crazy (and it is). Why
have an option that says "yes to no gexec"? The early versions of gmond didn't use a configuration
file but instead commandline options. One of the commandline options was simply C<--no-gexec>
and the default was to announce gexec as on.

=back

Once you have successfully run

% ./configure
% make
% make install

you should find the following files installed in C</usr> (by default).

/usr/bin/gstat
/usr/bin/gmetric
/usr/sbin/gmond
/usr/sbin/gmetad

If you installed ganglia using RPMs then these files will be installed when you install
the RPM. The RPM is installed simply by running

% rpm -Uvh ganglia-gmond-GANGLIA_VERSION.i386.rpm
% rpm -Uvh ganglia-gmetad-GANGLIA_VERSION.i386.rpm

Once you have the necessary binaries installed, you can test your installation by running

% ./gmond

This will start the ganglia monitoring daemon. You should then be able to run

% telnet localhost 8649

And get an XML description of the state of your machine (and any other hosts running gmond
at the time).

If you are installing by source on Linux, scripts are provided to start C<gmetad> and C<gmond>
at system startup. They are easy to install from the source root.

% cp ./gmond/gmond.init /etc/rc.d/init.d/gmond
% chkconfig --add gmond
% chkconfig --list gmond
gmond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
% /etc/rc.d/init.d/gmond start
Starting GANGLIA gmond: [ OK ]

Repeat this step with gmetad.

=head2 PHP Web Frontend Installation

=over 4

=item 1.

The B<./web> directory of the ganglia distribution contains all the necessary PHP files for
running your web frontend. Copy those files to C</var/www/html>, however look for the variable
C<DocumentRoot> in your Apache configuration
files to be sure. All the PHP script files use relative URLs in their links, so you may
place the C<ganglia/> directory anywhere convenient.

=item 2.

Ensure your webserver understands how to process PHP script files. Currently, the
web frontend contains certain php language that requires PHP version 4 or greater.
Processing PHP script files usually requires a webserver module, such as the
C<mod_php> for the popular Apache webserver. In RedHat Linux, the RPM package that
provides this module is called simply "php".

For Apache, C<mod_php> module must be enabled. The following lines should appear
somewhere in Apache's *conf files. This example applies to RedHat and Mandrake
Linux. The actual filenames may vary on your system. If you installed the php
module using an RPM package, this work will have been done automatically.

<IfDefine HAVE_PHP4>
LoadModule php4_module extramodules/libphp4.so
AddModule mod_php4.c
</IfDefine>

AddType application/x-httpd-php .php .php4 .php3 .phtml
AddType application/x-httpd-php-source .phps

=item 3.

The webfrontend requires the existance of the gmetad package on the webserver.
Follow the installation instructions on the gmetad page. Specifically, the webfrontend
requires the rrdtool and the C<rrds/> directory from gmetad. If you are a power user,
you may use NFS to simulate the local existance of the rrds.

=item 4.

Test your installation. Visit the URL:

http://localhost/ganglia/

With a web-browser, where localhost is the address of your webserver.

=back

Installation of the web frontend is simplified on Linux by using rpm.

% rpm -Uvh ganglia-web-GANGLIA_VERSION-1.i386.rpm
Preparing... ########################################### [100%]
1:ganglia-web ########################################### [100%]

=head1 Configuration

=head2 Gmond Configuration

The configuration file format has changed between gmond version 2.5.x and version
2.6.x. The change was necessary in order to allow more complex configuration options.

Gmond has a default configuration it will use if it does not find the default configuration
file B</etc/gmond.conf>. To see the default configuration simply run the command:

% gmond --default_config

and gmond will output its default configuration to stdout. This default configuration can
serve as a good starting place for building a more custom configuration.

% gmond --default_config > gmond.conf

would create a file B<gmond.conf> which you can then edit to taste and copy to B</etc/gmond.conf>
or elsewhere.

To start gmond with a configuration file other then B</etc/gmond.conf>, simply specify the
configuration file location by running

% gmond --config /my/ganglia/configs/custom.conf

If you want to convert a 2.5.x configuration file to 2.6.x file format, run the following command

% gmond --convert ./old_25_config.conf

and gmond with output the equivalent 2.6.x configuration file to stdout. You can then redirect
that output to a new configuration file which can serve as a starting point for your configuration.

% gmond --convert ./old_25_config.conf > ./new_26_config.conf

For details about gmond configuration options, simply run

% man gmond.conf

for a complete listing of options with detailed explanations.

=head2 Gmetad Configuration

The behavior of the Ganglia Meta Daemon is completely controlled by a single configuration
file which is by default C</etc/gmetad.conf>. For gmetad to do anything useful you much
specify at least one C<data_source> in the configuration. The format of the data_source
line is as follows

data_source "Cluster A" 127.0.0.1 1.2.3.4:8655 1.2.3.5:8625
data_source "Cluster B" 1.2.4.4:8655

In this example, there are two unique data sources: "Cluster A" and "Cluster B". The
Cluster A data source has three redundant sources. If gmetad cannot pull the data
from the first source, it will continue trying the other sources in order.

If you do not specify a port number, gmetad will assume the default ganglia port which
is 8649 (U*N*I*X on a phone key pad)

Here is a sample gmetad configuration file with comments

# This is an example of a Ganglia Meta Daemon configuration file
# http://ganglia.sourceforge.net/
#
#-------------------------------------------------------------------------------
# Setting the debug_level to 1 will keep daemon in the forground and
# show only error messages. Setting this value higher than 1 will make
# gmetad output debugging information and stay in the foreground.
# default: 0
# debug_level 10
#
#-------------------------------------------------------------------------------
# What to monitor. The most important section of this file.
#
# The data_source tag specifies either a cluster or a grid to
# monitor. If we detect the source is a cluster, we will maintain a complete
# set of RRD databases for it, which can be used to create historical
# graphs of the metrics. If the source is a grid (it comes from another gmetad),
# we will only maintain summary RRDs for it.
#
# Format:
# data_source "my cluster" [polling interval] address1:port addreses2:port ...
#
# The keyword 'data_source' must immediately be followed by a unique
# string which identifies the source, then an optional polling interval in
# seconds. The source will be polled at this interval on average.
# If the polling interval is omitted, 15sec is asssumed.
#
# A list of machines which service the data source follows, in the
# format ip:port, or name:port. If a port is not specified then 8649
# (the default gmond port) is assumed.
# default: There is no default value
#
# data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
# data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
# data_source "another source" 1.3.4.7:8655 1.3.4.8

data_source "my cluster" localhost

#
#-------------------------------------------------------------------------------
# Scalability mode. If on, we summarize over downstream grids, and respect
# authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
# in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
# we are the "authority" on data source feeds. This approach does not scale to
# large groups of clusters, but is provided for backwards compatibility.
# default: on
# scalable off
#
#-------------------------------------------------------------------------------
# The name of this Grid. All the data sources above will be wrapped in a GRID
# tag with this name.
# default: Unspecified
# gridname "MyGrid"
#
#-------------------------------------------------------------------------------
# The authority URL for this grid. Used by other gmetads to locate graphs
# for our data sources. Generally points to a ganglia/
# website on this machine.
# default: "http://hostname/ganglia/",
# where hostname is the name of this machine, as defined by gethostname().
# authority "http://mycluster.org/newprefix/"
#
#-------------------------------------------------------------------------------
# List of machines this gmetad will share XML with. Localhost
# is always trusted.
# default: There is no default value
# trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
#
#-------------------------------------------------------------------------------
# If you want any host which connects to the gmetad XML to receive
# data, then set this value to "on"
# default: off
# all_trusted on
#
#-------------------------------------------------------------------------------
# If you don't want gmetad to setuid then set this to off
# default: on
# setuid off
#
#-------------------------------------------------------------------------------
# User gmetad will setuid to (defaults to "nobody")
# default: "nobody"
# setuid_username "nobody"
#
#-------------------------------------------------------------------------------
# The port gmetad will answer requests for XML
# default: 8651
# xml_port 8651
#
#-------------------------------------------------------------------------------
# The port gmetad will answer queries for XML. This facility allows
# simple subtree and summation views of the XML tree.
# default: 8652
# interactive_port 8652
#
#-------------------------------------------------------------------------------
# The number of threads answering XML requests
# default: 4
# server_threads 10
#
#-------------------------------------------------------------------------------
# Where gmetad stores its round-robin databases
# default: "/var/lib/ganglia/rrds"
# rrd_rootdir "/some/other/place"

C<gmetad> has a C<--conf> option to allow you to specify alternate configuration files

% ./gmetad -conf=/tmp/my_custom_config.conf

=head2 PHP Web Frontend Configuration

Most configuration parameters reside in the C<ganglia/conf.php> file. Here you
may alter the template, gmetad location, RRDtool location, and set the default time range
and metrics for graphs.

The static portions of the Ganglia website are themable. This means you can alter elements
such as section lables, some links, and images to suit your individual tastes and environment.
The C<template_name> variable names a directory containing the current theme. Ganglia uses
TemplatePower to implement themes. A user-defined skin must conform to the template interface
as defined by the default theme. Essentially, the variable names and START/END blocks in a
custom theme must remain the same as the default, but all other HTML elements may be changed.

Other configuration variables in C<conf.php> specify the location of gmetad's files, and where
to find the rrdtool program. These locations need only be changed if you do not run gmetad on
the webserver. Otherwise the default locations should work fine. The C<default_range> variable
specifies what range of time to show on the graphs by default, with possible values of hour,
day, week, month, year. The C<default_metric> parameter specifies which metric to show on
the cluster view page by default.

=head1 Commandline Tools

There are two commandline tools that work with C<gmond> to add custom metrics and
query the current state of a cluster: C<gmetric> and C<gstat> respectively.

=head2 Gmetric

The B<Ganglia Metric Tool (gmetric)> allows you to easily monitor any arbitrary
host metrics that you like expanding on the core metrics that gmond measures by default.

If you want help with the gmetric sytax, simply use the "help" commandline option

% gmetric --help
gmetric GANGLIA_VERSION

Purpose:
The Ganglia Metric Client (gmetric) announces a metric
on the list of defined send channels defined in a configuration file

Usage: gmetric [OPTIONS]...

-h, --help Print help and exit
-V, --version Print version and exit
-c, --conf=STRING The configuration file to use for finding send channels
(default=`/etc/gmond.conf')
-n, --name=STRING Name of the metric
-v, --value=STRING Value of the metric
-t, --type=STRING Either
string|int8|uint8|int16|uint16|int32|uint32|float|double
-u, --units=STRING Unit of measure for the value e.g. Kilobytes, Celcius
(default=`')
-s, --slope=STRING Either zero|positive|negative|both (default=`both')
-x, --tmax=INT The maximum time in seconds between gmetric calls
(default=`60')
-d, --dmax=INT The lifetime in seconds of this metric (default=`0')

Gmetric sends the metric specified on the commandline to all B<udp_send_channel>s specified
in the configuration file B</etc/gmond.conf> by default. If you want to send metric to
alternate B<udp_send_channel>s, you can specify a different configuration file as such:

% gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"

All metrics in ganglia have a name, value, type and optionally units. For example, say I wanted
to measure the temperature of my CPU (something gmond doesn't do by default) then I could
send this metric with name="temperature", value="63", type="int16" and units="Celcius".

Assume I have a program called C<cputemp> which outputs in text the temperature of the CPU

% cputemp
63

I could easily send this data to all listening gmonds by running

% gmetric --name temperature --value `cputemp` --type int16 --units Celcius

Check the exit value of gmetric to see if it successfully sent the data: 0 on success and -1 on failure.

To constantly sample this temperature metric, you just need too add this command to your cron table.

=head2 Gstat

The Ganglia Cluster Status Tool (gstat) is a commandline utility that allows you to get status
report for your cluster.

To get help with the commandline options, simply pass C<gstat> the C<--help> option

% gstat --help
gstat GANGLIA_VERSION

Purpose:
The Ganglia Status Client (gstat) connects with a
Ganglia Monitoring Daemon (gmond) and output a load-balanced list
of cluster hosts

Usage: gstat [OPTIONS]...
-h --help Print help and exit
-V --version Print version and exit
-a --all List all hosts. Not just hosts running gexec (default=off)
-d --dead Print only the hosts which are dead (default=off)
-m --mpifile Print a load-balanced mpifile (default=off)
-1 --single_line Print host and information all on one line (default=off)
-l --list Print ONLY the host list (default=off)
-iSTRING --gmond_ip=STRING Specify the ip address of the gmond to query (default='127.0.0.1')
-pINT --gmond_port=INT Specify the gmond port to query (default=8649)

Note: gstat with no option will only show gexec-enabled hosts. To see all hosts that are UP (regardless
of their gexec state) you need to add the B<--all> flag.

% gstat --all

=head1 Frequently Asked Questions (FAQ)

=over 4

=item B<What metrics does ganglia collect on platform x?>

To see a complete list of the metrics
that a particular gmond supports, run the command:

% gmond -m

and gmond will output all the metrics that it is capable of collecting
and sending.

This table describes all the metrics that ganglia collects and
shows what platforms the metric are supported on. (The following table
is only partially complete).

Metric Name Description Platforms
-----------------------------------------------------------------------
boottime System boot timestamp l,f
bread_sec
bwrite_sec
bytes_in Number of bytes in per second l,f
bytes_out Number of bytes out per second l,f
cpu_aidle Percent of time since boot idle CPU l
cpu_arm
cpu_avm
cpu_idle Percent CPU idle l,f
cpu_intr
cpu_nice Percent CPU nice l,f
cpu_num Number of CPUs l,f
cpu_rm
cpu_speed Speed in MHz of CPU l,f
cpu_ssys
cpu_system Percent CPU system l,f
cpu_user Percent CPU user l,f
cpu_vm
cpu_wait
cpu_wio
disk_free Total free disk space l,f
disk_total Total available disk space l,f
load_fifteen Fifteen minute load average l,f
load_five Five minute load average l,f
load_one One minute load average l,f
location GPS coordinates for host e
lread_sec
lwrite_sec
machine_type
mem_buffers Amount of buffered memory l,f
mem_cached Amount of cached memory l,f
mem_free Amount of available memory l,f
mem_shared Amount of shared memory l,f
mem_total Amount of available memory l,f
mtu Network maximum transmission unit l,f
os_name Operating system name l,f
os_release Operating system release (version) l,f
part_max_used Maximum percent used for all partitions l,f
phread_sec
phwrite_sec
pkts_in Packets in per second l,f
pkts_out Packets out per second l,f
proc_run Total number of running processes l,f
proc_total Total number of processes l,f
rcache
swap_free Amount of available swap memory l,f
swap_total Total amount of swap memory l,f
sys_clock Current time on host l,f
wcache

Platform key:
l = Linux, f = FreeBSD, a = AIX, c = Cygwin
m = MacOS, i = IRIX, h = HPUX, t = Tru64
e = Every Platform

If you are interested in B<how> the metrics are collected, just
take a look in directory C<./libmetrics> in the source distribution.
There is a single source file in the directory for each platform that is supported.

=item B<What does the error "Process XML (x): XML_ParseBuffer() error at line x: not well-formed">

This is an error that occurs when a ganglia components reads data from another ganglia component
and finds that the XML is not well-formed. The most common time this is a problem is when the PHP
web frontend tries to read the XML stream from gmetad.

To troubleshoot this problem, capture an XML from the ganglia component in question (gmetad/gmond).
This is easy to do if you have telnet installed. Simply login to the machine running the
component and run.

% telnet localhost 8651

By default, gmetad exports its XML on port 8651 and gmond exports its XML on port 8649. Modify
the port number above to suite your configuration.

When you connect to the port you should get an XML stream. If not, look in the process table on
the machine to ensure that the component is actually running.

Once you are getting an XML stream, capture it to a file by running.

% telnet localhost 8651 > XML.txt
Connection closed by foreign host.

If you open the file C<XML.txt>, you will see the captured XML stream. You will need to remove
the first three lines of the C<XML.txt> which will read...

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

Those lines are output from C<telnet> and not the ganglia component (I wish telnet would send
those messages to C<stderr> but they are send to C<stdout>).

There are many ways that XML can be misformed. The great tool for validating XML is C<xmllint>.
C<xmllint> will read the file and find the line containing the error.

% xmllint --valid --noout XML.txt

will read your captured XML stream, validate it against the ganglia DTD and check that it is
well-formed XML. C<xmllint> will quiet exit if there are no errors. If
there are errors they will be reported with line numbers. For example...

/tmp/XML.txt:3393: error: Opening and ending tag mismatch: HOST and CLUSTER
</CLUSTER>
^
/tmp/XML.txt:3394: error: Opening and ending tag mismatch: CLUSTER and GANGLIA_XML
</GANGLIA_XML>
^
/tmp/XML.txt:3395: error: Premature end of data in tag GANGLIA_XML

If you get errors, open C<XML.txt> and go to the line numbers in question. See if you can
understand based on your configuration how these errors could occur. If you cannot fix the
problem yourself, please email your C<XML.txt> and output from C<xmllint> to
C<ganglia-developers@lists.sourceforge.net>. Please include information about the version
of each component in question along with the operating system they are running on. The
more details we have about your configuration the more likely it is we will be able to help
you. Also, all mailing to C<ganglia-developers> is archiving and available to read on the
web. You may want to modify C<XML.txt> to remove any sensitive information.

=item B<How do I remove a host from the list?>

A common problem that people have is not being able to remove a host from the ganglia
web frontend.

Here is a common scenario

=over 4

=item 1.
All hosts in a cluster are send on the ganglia udp_send_channels.

=item 2.
One of the hosts fails or is moved for whatever reason.

=item 3.
All the hosts in the cluster report that the host is "dead" or "expired".

=item 4.
The sysadmin wants to removed this host from the "dead" list.

=back

Unfortunately there is currently no nice way to remove a single dead host from the list.
All data in gmond is soft state so you will need to restart all gmond and gmetad processes.
It is important to note that ALL dead hosts will be flushed from the record by restarting
the processes (since they have to hear the host at least once to know it is expired).

If you add the line

globals {
host_dmax = 3600
}

then hosts will be removed from host tables when they haven't been heard from in 3600 seconds.
See C<man gmond.conf> for details.

=item B<How good is Solaris, IRIX, Tru64 support?>

Here is an email from Steve Wagner about the state of the ganglia on Solaris,
IRIX and Tru64. Steve is to thank for porting ganglia to Solaris and Tru64. He also
helped with the IRIX port.

State of the IRIX port:

* CPU percentage stuff hasn't improved despite my efforts. I fear there
may be a flaw in the way I'm summing counters for all the CPUs.
* Auto-detection of network interfaces apparently segfaults.
* Memory and load reporting appear to be running properly.
* CPU speed is not being reported properly on multi-proc machines.
* Total/running processes are not reported.
* gmetad untested.
* Monitoring core apparently stable in foreground, background being tested
(had a segfault earlier).

State of the Tru64 port:

* CPU percentage stuff here works perfectly.
* Memory and swap usage stats are suspected to be inaccurate.
* Total/running processes are not reported.
* gmetad untested.
* Monitoring core apparently stable in foreground and background.

State of the Solaris port:
* CPU percentages are slightly off, but correct enough for trending
purposes.
* Load, ncpus, CPU speed, breads/writes, lreads/writes, phreads/writes,
and rcache/wcache are all accurate.
* Memory/swap statistics are suspiciously flat, but local stats bear
this out (and they *are* being updated) so I haven't investigated
further.
* Total processes are counted, but not running ones.
* gmetad appears stable

Anyway, all three ports I've been messing with are usable and fairly
stable. Although there are areas for improvement I think we really can't
keep hogging all this good stuff - what I'm looking at is ready for
release.

=item B<Where are the debian packages?>

Here is an email message from Preston Smith for Debian users

Debian packages for Debian 3.0 (woody) are available at
http://www.physics.purdue.edu/~psmith/ganglia
(i386, sparc, and powerpc are there presently, more architectures will
appear when I get them built.)
Packages for "unstable" (sid) will be available in the main Debian
archive soon.

Also, a CVS note: I checked in the debian/ directory used to create
debian packages.

=item B<How should I configure multihomed machines?>

Here is an email that Matt Massie sent to a user having problems with
multihomed machines

i need to add a section in the documentation talking about this since it
seems to be a common question.

when you use...

mcast_if eth1

.. in /etc/gmond.conf that tells gmond to send its data out the "eth1"
network interface but that doesn't necessarily mean that the source
address of the packets will match the "eth1" interface. to make sure that
data sent out eth1 has the correct source address run the following...

% route add -host 239.2.11.71 dev eth1

... before starting gmond. that should do the trick for you.

-matt

> I have seen some post related to some issues
> with gmond + multicast running on a dual nic
> frontend.
>
> Currently I am experiencing a weird behavior
>
> I have the following setup:
>
> -----------------------
> | web server + gmetad |
> -----------------------
> |
> |
> |
> ----------------------
> | eth0 A.B.C.112 |
> | |
> | Frontend + gmond |
> | |
> | eth1 192.168.100.1 |
> ----------------------
> |
> |
>
> 26 nodes each
> gmond
>
> In the frontend /etc/gmond.conf I have the
> following statement: mcast_if eth1
>
> The 26 nodes are correctly reported.
>
> However the Frontend is never reported.
>
> I am running iptables on the Frontend, and I am seing
> things like:
>
> INPUT packet died: IN=eth1 OUT= MAC= SRC=A.B.C.112 DST=239.2.11.71
> LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=53740 DF PROTO=UDP SPT=41608 DPT=8649
> LEN=16
>
> I would have expected the source to be 192.168.100.1 with mcast_if eth1
>
> Any idea ?

=item B<How should I configure my Cisco Catalyst Switches?>

Perhaps information regarding gmond on networks set up through cisco
catalyst switches should be mentioned in the ganglia documentation. I think
by default multicast traffic on the catalyst will flood all devices unless
configured properly. Here is a relavent snipet from a message forum, with a
link to cisco document.

If what you are trying to do, is minimizing the impact on your network due
to a multicast application, this link may describe what you want to do:
http://www.cisco.com/warp/public/473/38.html

We set up our switches according to this after a consultant came in and
installed an application multicasting several hundred packets per second.
This made the network functional again.

=back

=head1 Getting Support

The tired and thirsty prospector threw himself down at the edge of the
watering hole and started to drink. But then he looked around and saw
skulls and bones everywhere. "Uh-oh," he thought. "This watering hole
is reserved for skeletons." --Jack Handey

There are three mailing lists available to you: C<ganglia-general>, C<ganglia-developers> and
C<ganglia-announce>. You can join these lists or read their archives by
visiting https://sourceforge.net/mail/?group_id=43021

C<All of the ganglia mailing lists are closed>. That means that in order to
post to the lists, you must be subscribed to the list. We're sorry for the
inconvenience however it is very easy to subscribe and unsubscribe from the
lists. We had to close the mailing lists because of SPAM problems.

When you need help please follow these steps until your problem is resolved.

=over 4

=item 1.

completely read the documentation

=item 2.

check the C<ganglia-general> archive to see if other people have had the same problem

=item 3.

post your support request to the C<ganglia-general> mailing list

=item 4.

check the C<ganglia-developers> archive

=item 5.

post your question to the C<ganglia-developers> list

=back

please send all bugs, patches, and feature requests to the C<ganglia-developers> list
after you have checked the C<ganglia-developers> archive to see if the question has
already been asked and answered.

=head1 Copyright

=head1 Authors

The B<Ganglia Development Team>...

Bas van der Vlies basv Developer basv at users.sourceforge.net
Neil T. Spring bluehal Developer bluehal at users.sourceforge.net
Brooks Davis brooks_en_davis Developer brooks_en_davis at users.sourceforge.net
Eric Fraser fraze Developer fraze at users.sourceforge.net
greg bruno gregbruno Developer gregbruno at users.sourceforge.net
Jeff Layton laytonjb Developer laytonjb at users.sourceforge.net
Doc Schneider maddocbuddha Developer maddocbuddha at users.sourceforge.net
Mason Katz masonkatz Developer masonkatz at users.sourceforge.net
Mike Howard mhoward Developer mhoward at users.sourceforge.net
Matt Massie massie Project Admin massie at users.sourceforge.net
Oliver Mössinger olivpass Developer olivpass at users.sourceforge.net
Preston Smith pmsmith Developer pmsmith at users.sourceforge.net
Federico David Sacerdoti sacerdoti Developer sacerdoti at users.sourceforge.net
Tim Cera timcera Developer timcera at users.sourceforge.net
Mathew Benson wintermute11 Developer wintermute11 at users.sourceforge.net

=head1 Contributors

There have been dozens of contributors who have provided patches and helpful bug reports.
We need to list them here later.

=cut

syntax highlighted by Code2HTML, v. 0.9.1