=pod

=head1 NAME

B<rwsettool> - Perform set operations on IPset files

=head1 SYNOPSIS

  rwsettool { --union | --intersect | --difference }
        [--output-path=OUTPUT_PATH] [INPUT_SET ...]

  rwsettool --sample { --size=SIZE | --ratio=RATIO } [--seed=SEED]
        [--output-path=OUTPUT_PATH] [INPUT_SET ...]

=head1 DESCRIPTION

B<rwsettool> performs a single operation on one or more IPset files
to produce a new IPset file.

B<rwsettool> reads the IPsets specified on the command line; if no
IPsets are listed, B<rwsettool> attempts to read an IPset from the
standard input.  The string C<stdin> can be used as the name of an
input file to force B<rwsettool> to read from the standard input.

The output is written to the specified I<OUTPUT_PATH>.  When no output
path is given, the output is written to the standard output, provided
the standard output is not connected to a terminal.  Passing the
string C<stdout> as the output path also causes B<rwsettool> to write
the IPset to the standard output.

=head1 OPTIONS

Option names may be abbreviated if the abbreviation is unique or is an
exact match for an option.  A parameter to an option may be specified
as B<--arg>=I<param> or B<--arg> I<param>, though the first form is
required for options that take optional parameters.

=head2 Operation Switches

Exactly one of the following operation switches must be provided:

=over 4

=item B<--union>

Perform the set union operation: The resulting IPset will contain the
IPs that exist in I<any> of the input IPsets.

=item B<--intersect>

Perform the set intersection operation: The resulting IPset will
contain the IPs that exist in I<all> of the input IPsets.

=item B<--difference>

Perform the set difference (relative complement) operation: The
resulting IPset will contain all IPs from the first input IPset that
do not exist in any of the subsequent input IPsets.

=item B<--sample>

Select a random sample of IPs from the input IPsets.  The size of the
subset must be specified by either the B<--size> or the B<--ratio>
switch described below.  In the case of multiple input IPsets, the
resulting IPset is the union of all IP addresses sampled from each of
the input IPsets.

=back

=head2 Sampling Switches

These switches control how records are sampled by the B<--sample>
operation.

=over 4

=item B<--size>=I<SIZE>

Select a random sample of I<SIZE> records from I<each> input IPset.
If an input IPset contains fewer than I<SIZE> records, every IP in
that IPset is selected.

=item B<--ratio>=I<RATIO>

Select a random sample where the selection probability for each record
of each input set is I<RATIO>, specified as a decimal number between
0.0 and 1.0.  The exact size of the subset selected from each file
will vary between different runs with the same data.

=item B<--seed>=I<SEED>

Seed the pseudo-random number generator with value I<SEED>.  By
default, the seed will vary between runs.  Seeding with specific
values will produce repeatable results given the same input sets.

=back

=head2 Output Switches

These switches control the output:

=over 4

=item B<--output-path>=I<OUTPUT_PATH>

Write the resulting IPset to I<OUTPUT_PATH>.  If this switch is not
provided, B<rwsettool> will attempt to write the IPset to the standard
output, unless it is connected to a terminal.

=item B<--compression-method>=I<COMP_METHOD>

Set the compression method of the output to I<COMP_METHOD>.  Some SiLK
tools can use an external library to compress their binary output.
The list of available compression methods and the default method are
set when SiLK is compiled (the B<--help> and B<--version> switches
print the available and default compression methods) and depend on
which supported libraries are found.  SiLK can support:

=over 4

=item none

Do not compress the output using an external library.

=item zlib

Use the B<zlib> library for compressing the output.

=item lzo1x

Use the I<lzo1x> algorithm from the LZO real time compression library
for compression.

=item best

Use whichever available method gives the C<best> compression in
general, though not necessarily the C<best> for this particular
output.

=back

=back

=head1 EXAMPLES

Assume the following IPsets:

  A.set = { 1, 2, 4, 6 }
  B.set = { 1, 3, 5, 7 }
  C.set = { 1, 3, 6, 8 }
  D.set = { }  (empty set)

Then the following commands will produce the following result IPsets:

  +---------------------------------+----------------------------+
  | OPTIONS                         | RESULT                     |
  +---------------------------------+----------------------------+
  | --union A.set B.set             | { 1, 2, 3, 4, 5, 6, 7 }    |
  | --union A.set C.set             | { 1, 2, 3, 4, 6, 8 }       |
  | --union A.set B.set C.set       | { 1, 2, 3, 4, 5, 6, 7, 8 } |
  | --union C.set D.set             | { 1, 3, 6, 8 }             |
  | --intersect A.set B.set         | { 1 }                      |
  | --intersect A.set C.set         | { 1, 6 }                   |
  | --intersect A.set B.set C.set   | { 1 }                      |
  | --intersect A.set D.set         | { }                        |
  | --difference A.set B.set        | { 2, 4, 6 }                |
  | --difference B.set A.set        | { 3, 5, 7 }                |
  | --difference A.set B.set C.set  | { 2, 4 }                   |
  | --difference C.set B.set A.set  | { 8 }                      |
  | --difference C.set D.set        | { 1, 3, 6, 8 }             |
  | --difference D.set C.set        | { }                        |
  +---------------------------------+----------------------------+

Sampling yields variable results, but here are some example runs:

  +----------------------------------+----------------------------+
  | COMMAND                          | RESULT                     |
  +----------------------------------+----------------------------+
  | --sample --size 2 A.set          | { 1, 4 }                   |
  | --sample --size 2 A.set          | { 1, 6 }                   |
  | --sample --size 3 A.set          | { 2, 4, 6 }                |
  | --sample --size 2 A.set B.set    | { 1, 2, 5, 7 }             |
  | --sample --size 2 A.set B.set    | { 3, 4, 5, 6 }             |
  | --sample --size 2 A.set B.set    | { 1, 4, 5 }                |
  | --sample --ratio 0.5 A.set       | { 2, 6 }                   |
  | --sample --ratio 0.5 A.set       | { 4 }                      |
  | --sample --ratio 0.5 A.set B.set | { 1 }                      |
  | --sample --ratio 0.5 A.set B.set | { 2, 3, 5, 6, 7 }          |
  +----------------------------------+----------------------------+

These examples demonstrate some important points about sampling from
IPsets:

=over 4

=item *

When using B<--size>, an exact number of items is selected from each
input set (or the whole set, when it contains fewer items than
requested).

=item *

When using B<--size> with multiple input sets, the number of records
in the output set is not always (num_input_sets*size): the same IP may
be drawn from more than one input set, and an input set smaller than
the requested size contributes only the IPs it has.

=item *

When using B<--ratio>, the number of items sampled is not stable
between runs.

=back

=head1 NOTES

B<rwsettool> supersedes the B and B tools.

=head1 SEE ALSO

B, B, B, B, B

=cut

$SiLK: rwsettool.pod 6652 2007-03-14 13:29:39Z mthomas $

Local Variables:
mode:text
indent-tabs-mode:nil
End:
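
=head1 SEMANTICS SKETCH

The set semantics tabulated in EXAMPLES can be modeled in a few lines
of Python.  This is only an illustrative sketch, not how B<rwsettool>
is implemented: the function names are hypothetical, small integers
stand in for IP addresses, and Python's C<random> module stands in for
the tool's pseudo-random number generator, so a given seed will not
select the same IPs that B<rwsettool> would.

```python
import random


def union(*sets):
    """IPs that exist in any of the input sets."""
    return set().union(*sets)


def intersect(first, *rest):
    """IPs that exist in all of the input sets."""
    return set(first).intersection(*rest)


def difference(first, *rest):
    """IPs of the first set that exist in none of the later sets."""
    return set(first).difference(*rest)


def sample_size(size, *sets, seed=None):
    """Draw up to `size` members from each set, then union the draws."""
    rng = random.Random(seed)  # assumption: stand-in for the tool's PRNG
    result = set()
    for s in sets:
        members = sorted(s)  # fixed order so a seed gives repeatable draws
        result.update(rng.sample(members, min(size, len(members))))
    return result


def sample_ratio(ratio, *sets, seed=None):
    """Keep each member of each set independently with probability `ratio`."""
    rng = random.Random(seed)  # assumption: stand-in for the tool's PRNG
    return {ip for s in sets for ip in sorted(s) if rng.random() < ratio}


# The example sets from the EXAMPLES section.
A = {1, 2, 4, 6}
B = {1, 3, 5, 7}
C = {1, 3, 6, 8}
D = set()
```

With these sets, C<difference(A, B, C)> evaluates to C<{2, 4}> and
C<intersect(A, D)> to the empty set, matching the first table.
C<sample_size(2, A, B)> returns three or four members, depending on
whether the shared IP 1 happens to be drawn from both sets, which is
why the output size is not always (num_input_sets*size).

=cut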