=pod

=head1 NAME

B<rwflowpack> - Collects flow data and stores it in binary SiLK Flow files

=head1 SYNOPSIS

  rwflowpack --sensor-configuration=FILE_PATH
        { --log-destination=DESTINATION
          | --log-directory=DIR_PATH [--log-basename=BASENAME]
          | --log-pathname=FILE_PATH }
        [--byte-order=ENDIAN] [--compression-method=COMP_METHOD]
        [--site-config-file=FILENAME] [--flush-timeout=VAL]
        [--pack-interfaces] [--no-file-locking]
        [--log-level=LEVEL] [--log-sysfacility=NUMBER]
        [--pidfile=FILE_PATH] [--no-daemon]
        [--input-mode=MODE] [--output-mode=MODE]
        MODE_SPECIFIC_SWITCHES

To collect NetFlow v5 or IPFIX data from the network (default):

  rwflowpack ... [--input-mode=stream] [--sensor-name=SENSOR] ...

To collect from files containing NetFlow v5 PDUs:

  rwflowpack ... --input-mode=file --netflow-file=FILE_PATH
        [--sensor-name=SENSOR] [--archive-directory=DIR_PATH] ...

To collect from a remote B<flowcap> process:

  rwflowpack ... --input-mode=flowcap [--flowcap-port=NUMBER]
        --flowcap-address=IP_ADDR[:NUMBER][,IP_ADDR[:NUMBER]]
        --work-directory=DIR_PATH --valid-directory=DIR_PATH
        [--archive-directory=DIR_PATH] ...

To collect from local files created by B<flowcap>:

  rwflowpack ... --input-mode=fcfiles --incoming-directory=DIR_PATH
        [--polling-interval=NUMBER] [--archive-directory=DIR_PATH] ...

To store the SiLK Flow files on the local machine (default):

  rwflowpack ... [--output-mode=local-storage]
        --root-directory=DIR_PATH ...

To forward the SiLK Flow files to a remote machine:

  rwflowpack ... --output-mode=sending --sender-directory=DIR_PATH
        --incremental-directory=DIR_PATH ...

=head1 DESCRIPTION

B<rwflowpack> is a daemon that collects NetFlow-like data, converts the
data to the SiLK Flow record format, categorizes each flow (e.g., as
incoming or outgoing), and stores the data in binary flat files within a
directory tree, with one file per hour-category-sensor tuple. See the
I<SiLK Installation Handbook> for an explanation of how SiLK categorizes
flows and converts data to the SiLK format.

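
As a rough illustration of the "one file per hour-category-sensor tuple"
layout, the following Python sketch builds the path of one hourly file
from the directory and file-name forms given in the FILES section. It is
not part of SiLK: the function name, the root directory /data, the sensor
name S0, and the file-name prefix ow are all hypothetical values chosen
for the example, and the real prefix that encodes each category is not
specified here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def packed_file_path(root, category, prefix, sensor, hour):
    """Return the path of the hourly file for one category/sensor pair.

    Layout taken from the FILES section:
        ROOT/CATEGORY/YYYY/MM/DD/PREFIX-SENSOR_YYYYMMDD.HH
    `prefix` stands in for the file-name component that encodes the
    category; its real values are an assumption left to the caller.
    """
    # Directory part: category, then year/month/day subdirectories.
    day_dir = PurePosixPath(root) / category / hour.strftime("%Y/%m/%d")
    # File part: prefix, sensor, date, and the two-digit hour.
    return day_dir / f"{prefix}-{sensor}_{hour:%Y%m%d.%H}"


if __name__ == "__main__":
    # Hypothetical values: root "/data", prefix "ow", sensor "S0".
    print(packed_file_path("/data", "outweb", "ow", "S0",
                           datetime(2003, 10, 4, 13)))
```

Passing the prefix as a parameter, rather than deriving it from the
category, keeps the sketch from guessing at values that only the site
configuration defines.
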

=head1 OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option. A parameter to an option may be specified as
B<--arg>=I<param> or B<--arg> I<param>, though the first form is required
for options that take optional parameters.

=head2 General Configuration

The following switch is required:

=over 4

=item B<--sensor-configuration>=I<FILE_PATH>

Give the path to the configuration file that B<rwflowpack> will consult
to determine whether a record represents an incoming or outgoing flow.
The complete syntax of the configuration file is described in the
B<sensor.conf(5)> manual page; see also the I<SiLK Installation Handbook>.

=back

One of the following switches is required:

=over 4

=item B<--log-destination>=I<DESTINATION>

Specify the destination where logging messages are written. When
I<DESTINATION> begins with a slash C</>, it is treated as a file system
path and all log messages are written to that file; there is no log
rotation. When I<DESTINATION> does not begin with C</>, it must be one of
the following strings:

=over 4

=item C<none>

Messages are not written anywhere.

=item C<stdout>

Messages are written to the standard output.

=item C<stderr>

Messages are written to the standard error.

=item C<syslog>

Messages are written using the B<syslog(3)> facility.

=item C<both>

Messages are written to the syslog facility and to the standard error
(this option is not available on all platforms).

=back

=item B<--log-directory>=I<DIR_PATH>

Use I<DIR_PATH> as the directory where the log files are written.
I<DIR_PATH> must be a complete directory path. The log files have the
form

 DIR_PATH/LOG_BASENAME-YYYYMMDD.log

where I<YYYYMMDD> is the current date and I<LOG_BASENAME> is the
application name or the value passed to the B<--log-basename> switch when
provided. The log files will be rotated: at midnight local time a new log
will be opened and the previous day's log file will be compressed using
B<gzip(1)>. (Old log files are not removed by B<rwflowpack>; the
administrator should use another tool to remove them.) When this switch
is provided, a process-ID file (PID) will also be written in this
directory unless the B<--pidfile> switch is provided.

B<NOTE>: Prior to SiLK 0.10.0, when the B<--sensor> switch was given, the
sensor name was automatically included as part of the log file name. As
of SiLK 0.10.0, the B<--log-basename> switch must be used to include this
information in the log file name.

=item B<--log-pathname>=I<FILE_PATH>

Use I<FILE_PATH> as the complete path to the log file. The log file will
not be rotated.

=back

The following set of switches is optional:

=over 4

=item B<--byte-order>=I<ENDIAN>

Set the byte order for newly created SiLK Flow files. When appending
records to an existing file, the byte order of the file is maintained.
The argument is one of the following:

=over 4

=item C<native>

Use the byte order of the machine where B<rwflowpack> is running. This is
the default.

=item C<big>

Use network byte order (big endian) for the flow files.

=item C<little>

Write the flow files in little endian format.

=back

=item B<--compression-method>=I<COMP_METHOD>

Set the compression method for newly created SiLK Flow files to
I<COMP_METHOD>. When appending records to an existing file, the
compression method of the file is maintained. In addition to the packing
(shrinking) of the flow records that SiLK normally does, B<rwflowpack>
can use an external library to further reduce the size of the records on
disk. The list of available compression methods and the default method
are set when SiLK is compiled (the B<--help> and B<--version> switches
print the available and default compression methods) and depend on which
supported libraries are found. SiLK can support the following:

=over 4

=item none

Do not compress the SiLK Flow records using an external library.

=item zlib

Use the B<zlib(3)> library for compressing the flow records.

=item lzo1x

Use the I<lzo1x> algorithm from the LZO real-time compression library for
compressing the flow records.

=item best

Use whichever available method gives the C<best> compression in general,
though not necessarily the C<best> for this particular file.

=back

=item B<--site-config-file>=I<FILENAME>

Read the SiLK site configuration from the named file I<FILENAME>.
When this switch is not provided, the location specified by the
C<$SILK_CONFIG_FILE> environment variable is used if that variable is not
empty. The value of C<$SILK_CONFIG_FILE> should include the name of the
file. Otherwise, the application looks for a file named F<silk.conf> in
the following directories: the directory specified in the
F<$SILK_DATA_ROOTDIR> environment variable; the data root directory that
is compiled into SiLK (use the B<--version> switch to view this value);
the directories F<$SILK_PATH/share/silk/>, F<$SILK_PATH/share/>, or
F<$SILK_PATH>; or in the F<share/silk/> and F<share/> directories
parallel to the application's directory.

=item B<--flush-timeout>=I<VAL>

Set the timeout for flushing any in-memory records to disk to I<VAL>
seconds. If not specified, the default is 5 minutes (600 seconds). When
using local storage mode, this value specifies how often the files are
flushed to disk to ensure that any records in memory are written to disk.
When using sending output mode with a stream input mode, this value
specifies how often to move the files from the incremental-directory to
the sender-directory.

=item B<--pack-interfaces>

Override the default output format of the packed SiLK Flow files that
B<rwflowpack> writes. When this switch is present, B<rwflowpack> writes
additional information into the packed files: the router's SNMP input and
output interfaces and the next-hop IP address. The extra data produced by
this switch is useful for determining why traffic is being stored in
certain files. Note that this switch only affects newly created files.
New records are always appended to an existing file in the file's current
output format to maintain file integrity.

=item B<--no-file-locking>

Do not use advisory write locks. Normally, B<rwflowpack> will attempt to
obtain a write lock on the data files prior to writing records to them;
these locks prevent two instances of B<rwflowpack> from writing to the
same data file.
However, not all file systems support advisory write locks, and this
switch must be used when writing data to such a file system.

=item B<--log-level>=I<LEVEL>

Set the severity of messages that will be logged. The levels from most
severe to least are: C<emerg>, C<alert>, C<crit>, C<err>, C<warning>,
C<notice>, C<info>, C<debug>. The default is C<info>.

=item B<--log-sysfacility>=I<NUMBER>

Set the facility that B<syslog(3)> uses for logging messages. This switch
takes a number as an argument. The default is a value that corresponds to
C<LOG_USER> on the system where B<rwflowpack> is running. This switch
produces an error unless B<--log-destination>=syslog is specified.

=item B<--log-basename>=I<BASENAME>

Use I<BASENAME> in place of the application name for the files in the log
directory. See the description of the B<--log-directory> switch.

=item B<--pidfile>=I<FILE_PATH>

Set the complete path to the file in which B<rwflowpack> writes its
process ID (PID) when it is running as a daemon. No PID file is written
when B<--no-daemon> is given. When this switch is not present, no PID
file is written unless the B<--log-directory> switch is specified, in
which case the PID is written to F<I<DIR_PATH>/rwflowpack.pid>.

B<NOTE>: Prior to SiLK 0.10.0, when the B<--sensor> switch was given, the
sensor name was automatically included as part of the PID-file name. As
of SiLK 0.10.0, the B<--pidfile> switch must be used to include this
information in the PID-file name.

=item B<--no-daemon>

Force B<rwflowpack> to stay in the foreground; it does not become a
daemon. Useful for debugging.

=item B<--reader-function>=I

This switch is deprecated and ignored. It is only present to maintain
backward compatibility.

=back

=head2 Input and Output Mode

B<rwflowpack> supports multiple ways of getting (the I<input> mode) and
storing (the I<output> mode) data.

=over 4

=item B<--input-mode>=I<MODE>

Determine how B<rwflowpack> will gather data. The default input I<MODE>
is C<stream>. The available modes are

=over 4

=item C<stream>

B<rwflowpack> opens a port for every network-listening probe specified in
the sensor configuration file. These ports expect to receive NetFlow v5
PDUs as generated by a router.

=item C<file>

B<rwflowpack> reads NetFlow v5 PDUs from a file.
The file's size must be an integer multiple of 1464 bytes, where each
1464-byte chunk contains a 24-byte NetFlow v5 header and space for thirty
48-byte NetFlow records. The number of valid records per chunk is
specified in the header.

=item C<flowcap>

B<rwflowpack> connects over a TCP socket to a machine running
B<flowcap>, and transfers specially compressed NetFlow v5 files (called
flowcap files) from the remote location to the local disk for processing.
Note that C<flowcap> mode may not be available depending on how
B<rwflowpack> was built.

=back

=item B<--output-mode>=I<MODE>

Determine what B<rwflowpack> will do with the data as it is packed into
SiLK binary files. The default output I<MODE> is C<local-storage>. The
available modes are

=over 4

=item C<local-storage>

B<rwflowpack> writes the data on the local machine into a directory tree
with a specific structure.

=item C<sending>

B<rwflowpack> writes the data into a temporary location on the local
disk. A separate program, B, moves the data from the local machine to
remote machines where B working in concert with the B will write the data
into a directory tree with a specific structure. Note that B<rwflowpack>
may have been built without support for C<sending> mode.

=back

=back

=head2 Stream Collection Switches (--input-mode=stream)

When the B<--input-mode> switch is set to C<stream> or when the switch is
not provided, B<rwflowpack> expects to receive stream(s) of NetFlow data
over the network. B<rwflowpack> will open a UDP port for every probe
given in the configuration file that has a C attribute.

The following switch is optional:

=over 4

=item B<--sensor-name>=I<SENSOR>

Cause B<rwflowpack> to ignore all probes in the sensor configuration file
except the probes for I<SENSOR>. Only data for this sensor will be
collected. This allows a common configuration file to be used by multiple
B<rwflowpack> invocations while each instance collects data for only a
single sensor. There must be a sensor-probe definition for I<SENSOR> in
the configuration file. When this switch is not present, B<rwflowpack>
will collect and pack data for all sensors.


=back

=head2 PDU File Collection Switches (--input-mode=file)

Instead of reading flows from the network, B<rwflowpack> can process
files containing NetFlow data. These files can be in one of two forms:
either files generated by a NetFlow collector, or files generated by
another SiLK program called B<flowcap>.

To make B<rwflowpack> process a file generated by a NetFlow collector,
pass it the B<--input-mode>=C<file> switch. The file must have a
particular format: the file's length should be an integer multiple of
1464 bytes, where 1464 is the maximum length of the NetFlow v5 PDU. Each
1464-byte block should contain the 24-byte NetFlow v5 header and space
for thirty 48-byte flow records, even if data for only one NetFlow record
is valid.

Although B<rwflowpack> can get the names of the NetFlow files from the C
attributes in the sensor configuration file, it is more common to set C
to F, and to pass the name of the NetFlow file on the command line with
the B<--netflow-file> switch. This simplifies scripting; otherwise, the
sensor configuration file would have to be rewritten for each run.

In this mode, B<rwflowpack> will not become a daemon; it will remain in
the foreground, process the NetFlow file(s), and exit.

The following switches are all optional:

=over 4

=item B<--sensor-name>=I<SENSOR>

Cause B<rwflowpack> to ignore all probes in the sensor configuration file
except the probes for I<SENSOR>. See above for a full description of this
switch.

=item B<--netflow-file>=I<FILE_PATH>

Name the full path of the file from which B<rwflowpack> reads NetFlow v5
PDUs.

=item B<--archive-directory>=I<DIR_PATH>

Name the full path of the directory to which NetFlow files will be moved
after they have been processed by B<rwflowpack>. If this switch is not
provided, the original NetFlow source files are not modified, moved, or
deleted. Removing files from the archive-directory is not the job of
B<rwflowpack>; the system administrator should implement a separate cron
job to clean this directory.


=back

=head2 Flowcap Collection Switches (--input-mode=flowcap)

When the B<--input-mode>=flowcap switch is provided, B<rwflowpack> will
process files created by another SiLK daemon called B<flowcap>.
B<flowcap> collects NetFlow records near the router generating the
NetFlow records, compresses the records, and stores them on the local
disk. B<rwflowpack> will contact the machine where flowcap is running and
transfer the files via TCP to its local disk; B<rwflowpack> then
processes the files to generate SiLK Flow files.

When operating in flowcap input mode, the first four of the following
switches are required:

=over 4

=item B<--flowcap-address>=I<IP_ADDR[:NUMBER][,IP_ADDR[:NUMBER]]>

Specify the host addresses of the flowcap servers. B<rwflowpack> will
attempt to contact these machines on the ports given. If no port is
specified for an address, it will use the port specified by the
B<--flowcap-port> switch.

=item B<--flowcap-port>=I<NUMBER>

Specify the default port on which B<rwflowpack> will attempt to contact
the flowcap servers.

=item B<--work-directory>=I<DIR_PATH>

Name the full path of the directory used to store files as they are being
received from flowcap. The files in this directory are incomplete; any
files in this directory will be removed when B<rwflowpack> is started.
Once complete, files are moved from this directory to the
valid-directory. The work-directory, valid-directory, and
archive-directory must be on the same file system mount point.

=item B<--valid-directory>=I<DIR_PATH>

Name the full path of the directory used to store files that have been
successfully received from flowcap but which have not yet been processed
by B<rwflowpack>. Once processed by B<rwflowpack>, files are moved from
this directory to the archive-directory, if it has been specified. The
work-directory, valid-directory, and archive-directory must be on the
same file system mount point.

=item B<--archive-directory>=I<DIR_PATH>

Name the full path of the directory used to store flowcap files after
B<rwflowpack> has processed them. If this switch is not provided, the
flowcap files are deleted.
Removing files from the archive-directory is not the job of
B<rwflowpack>; the system administrator should implement a separate cron
job to clean this directory. The work-directory, valid-directory, and
archive-directory must be on the same file system mount point.

=item B<--fc-address>=I<IP_ADDR[:NUMBER][,IP_ADDR[:NUMBER]]>

Deprecated alias for B<--flowcap-address>.

=item B<--fc-port>=I<NUMBER>

Deprecated alias for B<--flowcap-port>.

=back

=head2 Flowcap Files Collection Switches (--input-mode=fcfiles)

When the B<--input-mode>=fcfiles switch is provided, B<rwflowpack> will
process files created by another SiLK daemon called flowcap. flowcap
collects NetFlow records near the router that is generating the NetFlow
records, compresses the records, and stores them on the local disk.
B<rwflowpack> will poll its incoming directory for newly arrived flowcap
files. B<rwflowpack> then processes the files to generate SiLK Flow
files.

When operating in flowcap files input mode, the first of the following
switches is required:

=over 4

=item B<--incoming-directory>=I<DIR_PATH>

Name the full path of the directory which B<rwflowpack> will monitor for
files created by flowcap. Once processed by B<rwflowpack>, files are
moved from this directory to the archive-directory, if it has been
specified. The incoming-directory and archive-directory must be on the
same file system mount point.

=item B<--polling-interval>=I<NUMBER>

Set the number of seconds B<rwflowpack> waits between polls of the
incoming-directory for new files created by flowcap. This defaults to 15
seconds.

=item B<--archive-directory>=I<DIR_PATH>

Name the full path of the directory used to store flowcap files after
B<rwflowpack> has processed them. If this switch is not provided, the
flowcap files are deleted. Removing files from the archive-directory is
not the job of B<rwflowpack>; the system administrator should implement a
separate cron job to clean this directory. The incoming-directory and
archive-directory must be on the same file system mount point.


=back

=head2 Local Storage Mode Switches (--output-mode=local-storage)

Once B<rwflowpack> has collected data, categorized it, and written it
into files, it can do one of two things with the files:

=over 4

=item 1

Store the files on the local disk in a well-defined location.

=item 2

Transfer the files to another machine and store them in a well-defined
location.

=back

(The data files must be stored in a well-defined location so that
B<rwfilter> can find them. To see B<rwfilter>'s idea of the well-defined
location, run B<rwfilter> B<--help> and read the text after the
explanation of the B<--data-rootdir> switch.)

The default output-mode is to store the files on the local disk. When
operating in this mode, the following switch is required:

=over 4

=item B<--root-directory>=I<DIR_PATH>

Name the full path of the directory under which the files containing the
packed SiLK Flow records will be stored. B<rwflowpack> will create
subdirectories below I<DIR_PATH> based on the data received.

=back

=head2 Sending Mode Storage Switches (--output-mode=sending)

To transfer the packed SiLK Flow files to another machine, specify the
B<--output-mode>=sending switch and invoke the B to transfer the files.
When B is interoperating with B, the following switches must be
provided:

=over 4

=item B<--incremental-directory>=I<DIR_PATH>

Name the full path of the directory under which packed SiLK files will
initially be created. Files in this directory are considered to be
incomplete; any files in this directory will be removed when
B<rwflowpack> is started. Once complete, files are moved from this
directory to the sender-directory.

=item B<--sender-directory>=I<DIR_PATH>

Name the full path of the directory under which completed C files are
stored while awaiting action by B. The B is responsible for removing
files from this directory.

=back

=head1 FILES

The root of the directory tree that contains the packed, binary SiLK Flow
files is set by the B<--root-directory> switch; this directory is called
the SILK_DATA_ROOTDIR.
Immediately underneath it are subdirectories corresponding to the traffic
categories (directions) discussed above. Under these are directories
representing the year, month, and day in YYYY/MM/DD format. That is

  $SILK_DATA_ROOTDIR/in/{$YEAR}/{$MONTH}/{$DAY}/*
  $SILK_DATA_ROOTDIR/inweb/{$YEAR}/{$MONTH}/{$DAY}/*
  $SILK_DATA_ROOTDIR/innull/{$YEAR}/{$MONTH}/{$DAY}/*
  $SILK_DATA_ROOTDIR/out/{$YEAR}/{$MONTH}/{$DAY}/*
  $SILK_DATA_ROOTDIR/outweb/{$YEAR}/{$MONTH}/{$DAY}/*
  $SILK_DATA_ROOTDIR/outnull/{$YEAR}/{$MONTH}/{$DAY}/*

For example, output web files for October 4th, 2003 are recorded in
F<$SILK_DATA_ROOTDIR/outweb/2003/10/04/>

The names of the files in these directories include all of this
information, and are written in the form:

  I-I_YYYYMMDD.HH

where I encodes the category and I is the sensor on which the flow was
collected.

=head1 SEE ALSO

I, B, B, B, B, B, B, B

=cut

$SiLK: rwflowpack.pod 7148 2007-05-11 21:20:27Z mthomas $

Local variables:
mode:text
indent-tabs-mode:nil
End: