% % S T A T L I B % % by John Walker % http://www.fourmilab.ch/ % % What's all this, you ask? Well, this is a "literate program", % written in the CWEB language created by Donald E. Knuth and % Silvio Levy. This file includes both the C source code for % the program and internal documentation in TeX. Processing % this file with the CTANGLE utility produces the C source file, % while the CWEAVE program emits documentation in TeX. The % current version of these programs may be downloaded from: % % http://www-cs-faculty.stanford.edu/~knuth/cweb.html % % where you will find additional information on literate % programming and examples of other programs written in this % manner. % % If you don't want to wade through all these details, don't % worry; this distribution includes a .cc file already % extracted and ready to compile. If "make" complains that it % can't find "ctangle" or "cweave", just "touch *.cc" % and re-make--apparently the process of extracting the files % from the archive messed up the date and time, misleading % make into believing it needed to rebuild those files. @** Introduction. \vskip 15pt \centerline{\ttitlefont STATLIB} \vskip 10pt \centerline{\titlefont Probability and Statistics Library} \vskip 15pt %\centerline{by John Walker} %\centerline{\.{http://www.fourmilab.ch/}} \centerline{\pdfURL{John Walker}{http://www.fourmilab.ch/}} \vskip 15pt \centerline{This program is in the public domain.} \vskip 15pt This program provides facilities for computations involving probability and statistics. A variety of probability distributions, both discrete and continuous, are provided along with a template class |dataTable| which implements a variety of descriptive statistical measures on an arbitrary numeric data type. This program requires none of the other components of the analysis software suite and may be extracted as a general statistics library. It uses the \.{DCDFLIB} package for its low-level computations. \vskip 2ex \noindent $\underline{\hbox{\bf References}}$ \vskip 1ex \noindent Abramowitz, Milton and Irene A. Stegun. {\it Handbook of Mathematical Functions} (Chapter 26). New York: Dover, 1974. ISBN 0-486-61272-4. \vskip 1ex \noindent Montgomery, Douglas C. and George C. Runger. {\it Applied Statistics and Probability for Engineers.} New York: John Wiley \& Sons, 1994. ISBN 0-471-54041-2. \vskip 1ex \noindent Wolfram Research. {\it Mathematica 4.0 Standard Add-On Packages.} Champaign, IL: Wolfram Media, 1999. ISBN 1-57955-007-X. \vskip 30pt @(statlib_test.cc@>= #define REVDATE "15th February 2003" @** Program global context. @q Define standard C++ namespaces and classes @> @i "cweb/c++lib.w" @c #include "config.h" /* System-dependent configuration */ @h @@/ @@/ @ We export the class definitions for this package in the external file \.{statlib.h} that programs which use this library may include. @(statlib.h@>= #ifndef STATLIB_HEADER_DEFINES #define STATLIB_HEADER_DEFINES #include // Make sure \.{math.h} is available #include #include #include #include #include #include #include using namespace std;@# @ #endif @ The following include files provide access to external components of the program not defined herein. @= #include "dcdflib.h" // DCDFlib Cumulative Distribution Function library #include "statlib.h" // Class definitions for this package @ The following classes are defined and their implementations provided. @= @@/ @** Probability distributions. We provide the class definitions for computations involving random variables with the following distributions: \vskip 1ex \hskip 8em{\vbox{ \settabs 3 \columns \+{\bf normalDistribution}&Normal (Gaussian) distribution\cr \+{\bf poissonDistribution}&Poisson distribution\cr \+{\bf chiSquareDistribution}&Chi-Square ($\chi^2$) distribution\cr \+{\bf gammaDistribution}&Gamma distribution\cr \+{\bf betaDistribution}&Beta distribution\cr \+{\bf tDistribution}&Student's $t$ distribution\cr \+{\bf FDistribution}&$F$ distribution\cr \+{\bf binomialDistribution}&Binomial distribution\cr \+{\bf negativeBinomialDistribution}&Negative binomial distribution\cr } } \vskip 1ex Computations are performed using the \.{DCDFLIB} Double Precision Cumulative Distribution Function Library developed by the Section of Computer Science, Department of Biomathematics, University of Texas M. D. Anderson Hospital. Source code for \.{DCDFLIB} may be downloaded via:\hfill\break \centerline{\pdfURL{anonymous FTP}{ftp://odin.mda.uth.tmc.edu/pub/unix/dcdflib.c.tar.Z}} The abstract superclass |probabilityDistribution| is the parent of all of the specific distributions, discrete and continuous. It implements properties common to all distributions and defines pure virtual functions which all specific distributions must implement. Quantities named in the the following methods are as follows: \vskip 1ex \settabs 6 \columns \+&\.{P}&$P$&Cumulative probability distribution function\cr \+&\.{Q}&$Q$&``Upper tail'' probability distribution ($Q=1-P$)\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{mean}&$\mu$&Mean value of distribution\cr \+&\.{stdev}&$\sigma$&Standard deviation\cr \+&\.{variance}&$\sigma^2$&Variance\cr \+&\.{skewness}&$\gamma_1$&Coefficient of skewness\cr \+&\.{kurtosisExcess}&$\gamma_2$&Kurtosis excess\cr \+&\.{kurtosis}&$\beta_2$&Kurtosis\cr \vskip 1ex @= class probabilityDistribution { private:@/ virtual string distributionName(void) = 0; // Return distribution name public:@/ static double Q_from_P(double x) { // Upper tail probability: $Q=1-P$ return 1 - x; } static double P_from_Q(double x) { // CDF from upper tail probability: $P=1-Q$ return 1 - x; } virtual double mean(void) = 0; // Return mean of distribution virtual double stdev(void) = 0; // Return standard deviation $\sigma$ of distribution double variance(void) { // Return variance $\sigma^2$ of deviation return stdev() * stdev(); } virtual double skewness(void) = 0; // Skewness $\gamma_1$ of distribution virtual double kurtosisExcess(void) = 0; // Kurtosis excess $\gamma_2$ of distribution double kurtosis(void) { // Kurtosis $(\beta_2=\gamma_2+3$) of distribution return kurtosisExcess() + 3; } virtual double CDF_P(double x) = 0; // Compute CDF $P=\int_{-\infty}^{|x|}f(x)\,dx$ double CDF_Q(double x) { // $Q=1-P$ return Q_from_P(CDF_P(x)); } // Write description of distribution to output stream virtual void writeParameters(ostream &of) { of << "Distribution: " << distributionName() << "\n"; of << " Mean = " << mean() << " Stdev = " << stdev() << " Variance = " << variance() << " Skewness = " << skewness() << "\n"; of << " Kurtosis = " << kurtosis() << " KurtosisExcess = " << kurtosisExcess() << "\n"; } }; @*1 Normal (Gaussian) distribution. A {\it normal distribution} describes a random variable $x$ with probability distribution function $$f_x(x;\mu,\sigma)={1\over\sqrt{2\pi\sigma}}e^{-\Bigl({{(x-\mu)^2}\over{2\sigma^2}}\Bigr)}, \qquad -\infty= class normalDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "normal"; } double mu; // Mean value double sigma; // Standard deviation public:@/ /* The default constructor creates a standard normal distribution: $\mu=0$, $\sigma=1$. */ normalDistribution() { set_mu_sigma(); } /* The following constructor creates a normal distribution with a given mean and standard deviation. */ normalDistribution(double c_mu, double c_sigma) { set_mu_sigma(c_mu, c_sigma); } // Accessors void set_mu(double c_mu = 0) { mu = c_mu; } void set_sigma(double c_sigma = 1) { sigma = c_sigma; } void set_mu_sigma(double c_mu = 0, double c_sigma = 1) { set_mu(c_mu); set_sigma(c_sigma); } double get_mu(void) { return mu; } double get_sigma(void) { return sigma; } // Implementations of abstract virtual functions double mean(void) { return get_mu(); } double stdev(void) { return get_sigma(); } double skewness(void) { return 0; // $\gamma_1=0$ for normal distribution } double kurtosisExcess(void) { return 0; // $\gamma_2=0$ for normal distribution } // Auxiliary accessors @; @; // Static transformation functions static double p_from_mu_sigma_x(double mu, double sigma, double x); static double x_from_p_mu_sigma(double p, double mu, double sigma); static double mu_from_p_x_sigma(double p, double x, double sigma); static double sigma_from_p_x_mu(double p, double x, double mu); // Analysis methods @; double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ }; @ The standard deviation, $\sigma$, is the square root of the variance. You can set the standard deviation of the distribution from the variance with the |set_variance| method. @= void set_variance(double var) { set_sigma(sqrt(var)); } @ A binomial distribution for a large number of trials $n$ each with probability $0.5$ is approximated by a normal distribution with $\mu=n/2$ and $\sigma=\sqrt{n/4}$. The |approximateBinomial| method allows you to create such a normal distribution by specifying the number of trials $n$. @= void approximateBinomial(unsigned int n) { set_mu_sigma(n / 2, sqrt(n / 4.0)); } @ The $z$ score for a given value in a normal distribution is given by $$z={{m-\mu}\over{\sigma}}.$$ These $z$ values obey the $\chi^2$ distribution as does the summation of a number of independent $z$ scores. @= double z_score(double m) { return (m - mean()) / stdev(); } @ The following transformation functions are |static| methods of the |normalDistribution| class. They permit calculation of properties of the normal distribution without reference to a particular |normalDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{mu}&$\mu$&Mean value\cr \+&\.{sigma}&$\sigma$&Standard deviation\cr \vskip 1ex @= double normalDistribution::p_from_mu_sigma_x(double mu, double sigma, double x) { double p, q, bound; int which = 1, status; cdfnor(&which, &p, &q, &x, &mu, &sigma, &status, &bound); if (status != 0) { throw(out_of_range("normalDistribution::p_from_mu_sigma_x: Result out of bounds")); } return p; } double normalDistribution::x_from_p_mu_sigma(double p, double mu, double sigma) { double q, x, bound; int which = 2, status; q = 1 - p; cdfnor(&which, &p, &q, &x, &mu, &sigma, &status, &bound); if (status != 0) { throw(out_of_range("normalDistribution::x_from_p_mu_sigma: Result out of bounds")); } return x; } double normalDistribution::mu_from_p_x_sigma(double p, double x, double sigma) { double q, mu, bound; int which = 3, status; q = 1 - p; cdfnor(&which, &p, &q, &x, &mu, &sigma, &status, &bound); if (status != 0) { throw(out_of_range("normalDistribution::mu_from_p_x_sigma: Result out of bounds")); } return mu; } double normalDistribution::sigma_from_p_x_mu(double p, double x, double mu) { double q, sigma, bound; int which = 4, status; q = 1 - p; cdfnor(&which, &p, &q, &x, &mu, &sigma, &status, &bound); if (status != 0) { throw(out_of_range("normalDistribution::sigma_from_p_x_mu: Result out of bounds")); } return sigma; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability a given value $x$ of the random variable will occur. It is computed by integrating the probability density function $f(x)$ from $-\infty$ to $x$: $$P=\int_{-\infty}^{|x|}f(x)\,dx.$$ @= double normalDistribution::CDF_P(double x) { return p_from_mu_sigma_x(mu, sigma, x); } @*1 Chi-Square ($\chi^2$) distribution. A random variable has a {\it chi-square} ($\chi^2$) distribution if it follows the probability distribution function $$f(x)={1\over{2^{k/2}\Gamma{\Bigl({k\over2}\Bigr)}}}x^{(k/2)-1}e^{-x/2}, \qquad x>0$$ where $k$ is the {\it degrees of freedom} of the distribution and the $\Gamma$ is the gamma function---the generalisation of factorial to real arguments $$\Gamma(n)=\int_0^\infty x^{n-1}e^{-x}\,dx, \qquad n>0.$$ The $\chi^2$ distribution is a special case of the {\it gamma distribution} $$f_x(x; \lambda, r)={{\lambda^r x^{r-1} e^{-\lambda x}}\over{\Gamma(r)}}, \qquad x>0, \lambda>0, r>0$$ when $\lambda=1/2$ and $r$ is the degrees of freedom $k$ divided by 2. The $\chi^2$ distribution is applicable when combining the results of a series of independent experiments whose outcomes are normally distributed. If $Z_1, Z_2,\ldots Z_k$ are the results of $k$ independent experiments normalised so that their mean $\mu=0$ and standard deviation $\sigma=1$, then the sum $X$ of the squares of these outcomes: $$X=\sum_{i=1}^k {Z_i}^2$$ will be $\chi^2$ distributed with $k$ degrees of freedom, $\chi_k^2$. @= class chiSquareDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "chi-square"; } double k; // Degrees of freedom public:@/ /* The default constructor creates a $\chi^2$ distribution with $k=1$. */ chiSquareDistribution() { set_k(); } chiSquareDistribution(double c_k = 1) { set_k(c_k); } // Accessors void set_k(double c_k = 1) { k = c_k; } double get_k(void) { return k; } // Implementations of abstract virtual functions double mean(void) { return get_k(); } double stdev(void) { return sqrt(2 * k); // $\sigma=\sqrt{2 k}$ } double skewness(void) { return sqrt(8 / k); // $\gamma_1=\sqrt{8/k}$ } double kurtosisExcess(void) { return 12 / k; // $\gamma_2=12/k$ } // Static transformation functions static double p_from_k_x(double k, double x); static double x_from_p_k(double p, double k); static double k_from_p_x(double p, double x); // Analysis methods double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Degrees of freedom = " << k << "\n"; } }; @ The following transformation functions are |static| methods of the |chiSquareDistribution| class. They permit calculation of properties of the $\chi^2$ distribution without reference to a particular |chiSquareDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{k}&$k$&Degrees of freedom\cr \vskip 1ex @= double chiSquareDistribution::p_from_k_x(double k, double x) { double p, q, bound; int which = 1, status; cdfchi(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("chiSquareDistribution::p_from_k_x: Result out of bounds")); } return p; } double chiSquareDistribution::x_from_p_k(double p, double k) { double q, x, bound; int which = 2, status; q = 1 - p; cdfchi(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("chiSquareDistribution::x_from_p_k: Result out of bounds")); } return x; } double chiSquareDistribution::k_from_p_x(double p, double x) { double q, k, bound; int which = 3, status; q = 1 - p; cdfchi(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("chiSquareDistribution::k_from_p_x: Result out of bounds")); } return k; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability a given value $x$ of the random variable will occur. It is computed by integrating the probability density function $f(x)$ from $-\infty$ to $x$: $$P=\int_{-\infty}^{|x|}f(x)\,dx.$$ @= double chiSquareDistribution::CDF_P(double x) { return p_from_k_x(k, x); } @*1 Gamma distribution. The probability density function of the Gamma distribution is $$f_x(x;\alpha,\lambda)={{\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}}\over{\Gamma(\alpha)}} , \qquad x > 0 .$$ $\alpha$ is referred to as the {\it shape parameter} and $\lambda$ the {\it scale parameter}. Some references use $r$ instead of $\alpha$ for the shape parameter, and others specify the scale parameter as the reciprocal of the value used here. A chi-square distribution with $k$ degrees of freedom is a special case of a gamma distribution with $\lambda=1/2$ and $r=k/2$. @= class gammaDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "gamma"; } double alpha; // Shape parameter $\alpha$ double lambda; // Scale parameter $\lambda$ public:@/ /* The default constructor creates a gamma distribution with $\alpha=10$, $\lambda=10$. */ gammaDistribution() { set_alpha_lambda(); } gammaDistribution(double c_alpha = 10, double c_lambda = 10) { set_alpha_lambda(c_alpha, c_lambda); } // Accessors void set_alpha(double c_alpha = 10) { alpha = c_alpha; } double get_alpha(void) { return alpha; } void set_lambda(double c_lambda = 10) { lambda = c_lambda; } double get_lambda(void) { return lambda; } void set_alpha_lambda(double c_alpha = 10, double c_lambda = 10) { set_alpha(c_alpha); set_lambda(c_lambda); } // Implementations of abstract virtual functions double mean(void) { return alpha / lambda; // $\mu=\alpha/\lambda$ } double stdev(void) { return sqrt(alpha) / lambda; // $\sigma=\sqrt{\alpha}/\lambda$ } double skewness(void) { return 2 / sqrt(alpha); // $\gamma_1=2/sqrt{\alpha}$ } double kurtosisExcess(void) { return 6 / alpha; // $\gamma_2=6/\alpha$ } // Static transformation functions static double p_from_alpha_lambda_x(double alpha, double lambda, double x); static double x_from_p_alpha_lambda(double p, double alpha, double lambda); static double alpha_from_p_lambda_x(double p, double lambda, double x); static double lambda_from_p_alpha_x(double p, double alpha, double x); double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Shape (alpha) = " << alpha << " Scale (lambda) = " << lambda << "\n"; } }; @ The following transformation functions are |static| methods of the |gammaDistribution| class. They permit calculation of properties of the Gamma distribution without reference to a particular |gammaDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{alpha}&$\alpha$&Shape parameter\cr \+&\.{lambda}&$\lambda$&Scale parameter\cr \vskip 1ex @= double gammaDistribution::p_from_alpha_lambda_x(double alpha, double lambda, double x) { double p, q, bound; int which = 1, status; cdfgam(&which, &p, &q, &x, &alpha, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("gammaDistribution::p_from_alpha_lambda_x: Result out of bounds")); } return p; } double gammaDistribution::x_from_p_alpha_lambda(double p, double alpha, double lambda) { double q, x, bound; int which = 2, status; q = 1 - p; cdfgam(&which, &p, &q, &x, &alpha, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("gammaDistribution::x_from_p_alpha_lambda: Result out of bounds")); } return x; } double gammaDistribution::alpha_from_p_lambda_x(double p, double lambda, double x) { double q, alpha, bound; int which = 3, status; q = 1 - p; cdfgam(&which, &p, &q, &x, &alpha, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("gammaDistribution::alpha_from_p_lambda_x: Result out of bounds")); } return alpha; } double gammaDistribution::lambda_from_p_alpha_x(double p, double alpha, double x) { double q, lambda, bound; int which = 4, status; q = 1 - p; cdfgam(&which, &p, &q, &x, &alpha, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("gammaDistribution::lambda_from_p_alpha_x: Result out of bounds")); } return lambda; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability of $n$ successes in $x$ trials. @= double gammaDistribution::CDF_P(double x) { return p_from_alpha_lambda_x(alpha, lambda, x); } @*1 Beta distribution. If $X_a$ and $X_b$ are independent gamma distributions with shape ($\alpha$) parameters $a$ and $b$ respectively and identical scale ($\lambda$) parameters, then the random variable $${X_a}\over{X_a+X_b}$$ will obey the {\it beta distribution} whose probability density function is $$f(x;a,b)={{(1-x)^{b-1}x^{a-1}}\over{B(a,b)}}$$ where $B(a,b)$ is the {\it Euler beta function} $$B(a,b)={{\Gamma(a)\Gamma(b)}\over{\Gamma(a+b)}}.$$ @= class betaDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "beta"; } double a; // Shape parameter of gamma variable $X_1$ double b; // Scale parameter of gamma variable $X_2$ public:@/ /* The default constructor creates a beta distribution with $a=5$, $b=10$. */ betaDistribution() { set_a_b(); } betaDistribution(double c_a = 5, double c_b = 10) { set_a_b(c_a, c_b); } // Accessors void set_a(double c_a = 5) { a = c_a; } double get_a(void) { return a; } void set_b(double c_b = 10) { b = c_b; } double get_b(void) { return b; } void set_a_b(double c_a = 5, double c_b = 10) { set_a(c_a); set_b(c_b); } // Implementations of abstract virtual functions double mean(void) { return a / (a + b); // $\mu=a/(a+b)$ } double stdev(void) { return sqrt(a * b) / ((a + b) * sqrt(a + b + 1)); // $\sigma={{\sqrt{ab}}\over{(a+b)\sqrt{a+b+1}}}$ } double skewness(void) { return (2 * (b - a) * sqrt(a + b + 1)) / (sqrt(a * b) * (a + b + 2)); // $\gamma_1={{2(b-a)\sqrt{a+b+1}}\over{\sqrt{ab}(a+b+2)}}$ } double kurtosisExcess(void) { return ((3 * (a + b + 1) * ((a * b * (a + b - 6)) + (2 * (a + b) * (a + b))) / (a * b * (a + b + 2) * (a + b + 3)))) - 3; /* $\gamma_2={{3(a+b+1)((ab(a+b-6))+(2(a+b)^2))}\over {ab(a+b+2)(a+b+3)}}-3$ */ } // Static transformation functions static double p_from_a_b_x(double a, double b, double x); static double x_from_p_a_b(double p, double a, double b); static double a_from_p_b_x(double p, double b, double x); static double b_from_p_a_x(double p, double a, double x); double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " a = " << a << " b = " << b << "\n"; } }; @ The following transformation functions are |static| methods of the |betaDistribution| class. They permit calculation of properties of the Beta distribution without reference to a particular |betaDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{a}&$a$&Shape parameter of first Gamma distribution\cr \+&\.{b}&$b$&Shape parameter second Gamma distribution\cr \vskip 1ex @= double betaDistribution::p_from_a_b_x(double a, double b, double x) { double p, q, y, bound; int which = 1, status; y = 1 - x; cdfbet(&which, &p, &q, &x, &y, &a, &b, &status, &bound); if (status != 0) { throw(out_of_range("betaDistribution::p_from_a_b_x: Result out of bounds")); } return p; } double betaDistribution::x_from_p_a_b(double p, double a, double b) { double q, x, y, bound; int which = 2, status; q = 1 - p; cdfbet(&which, &p, &q, &x, &y, &a, &b, &status, &bound); if (status != 0) { throw(out_of_range("betaDistribution::x_from_p_a_b: Result out of bounds")); } return x; } double betaDistribution::a_from_p_b_x(double p, double b, double x) { double a, q, y, bound; int which = 3, status; q = 1 - p; y = 1 - x; cdfbet(&which, &p, &q, &x, &y, &a, &b, &status, &bound); if (status != 0) { throw(out_of_range("betaDistribution::a_from_p_b_x: Result out of bounds")); } return a; } double betaDistribution::b_from_p_a_x(double p, double a, double x) { double b, q, y, bound; int which = 4, status; q = 1 - p; y = 1 - x; cdfbet(&which, &p, &q, &x, &y, &a, &b, &status, &bound); if (status != 0) { throw(out_of_range("betaDistribution::b_from_p_a_x: Result out of bounds")); } return b; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability of $n$ successes in $x$ trials. @= double betaDistribution::CDF_P(double x) { return p_from_a_b_x(a, b, x); } @*1 Student's $t$ distribution. If $Z$ is a random variable with a standard normal distribution ($\mu=0, \sigma=1$) and $V$ is a chi-square random variable, the random variable $$T={Z\over\sqrt{V/k}}$$ will have the probability density function $$f(x)={{\Gamma((k+1)/2)}\over{\sqrt{\pi k}\Gamma(k/2)}}\cdot {1\over{((x^2/k)+1)^{(k+1)/2}}}, \qquad -\infty2$$ where $k$ is the {\it degrees of freedom} of the distribution $V$ and $\Gamma$ is the gamma function. This is referred to as a $t$ distribution with $k$ degrees of freedom. The $t$ distribution is useful in determining confidence intervals and testing hypotheses about the mean of samples taken from a normal distribution. @= class tDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "t"; } double k; // Degrees of freedom public:@/ /* The default constructor creates a $t$ distribution with $k=3$. */ tDistribution() { set_k(); } tDistribution(double c_k = 3) { set_k(c_k); } // Accessors void set_k(double c_k = 3) { k = c_k; } double get_k(void) { return k; } // Implementations of abstract virtual functions double mean(void) { return 0; } double stdev(void) { return sqrt(k /(k - 2)); // $\sigma=\sqrt{k/(k-2)}$ } double skewness(void) { return 0; // $\gamma_1=0$ } double kurtosisExcess(void) { return 6.0 / (k - 4); // $\gamma_2=6/(k - 4)$ } // Static transformation functions static double p_from_k_x(double k, double x); static double x_from_p_k(double p, double k); static double k_from_p_x(double p, double x); // Analysis methods double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Degrees of freedom = " << k << "\n"; } }; @ The following transformation functions are |static| methods of the |tDistribution| class. They permit calculation of properties of the $t$ distribution without reference to a particular |tDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{k}&$k$&Degrees of freedom\cr \vskip 1ex @= double tDistribution::p_from_k_x(double k, double x) { double p, q, bound; int which = 1, status; cdft(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("tDistribution::p_from_k_x: Result out of bounds")); } return p; } double tDistribution::x_from_p_k(double p, double k) { double q, x, bound; int which = 2, status; q = 1 - p; cdft(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("tDistribution::x_from_p_k: Result out of bounds")); } return x; } double tDistribution::k_from_p_x(double p, double x) { double q, k, bound; int which = 3, status; q = 1 - p; cdft(&which, &p, &q, &x, &k, &status, &bound); if (status != 0) { throw(out_of_range("tDistribution::k_from_p_x: Result out of bounds")); } return k; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability a given value $x$ of the random variable will occur. It is computed by integrating the probability density function $f(x)$ from $-\infty$ to $x$: $$P=\int_{-\infty}^{|x|}f(x)\,dx.$$ @= double tDistribution::CDF_P(double x) { return p_from_k_x(k, x); } @*1 $F$ distribution. If $A$ and $B$ are independent random variables with chi-square distributions with $u$ and $v$ degrees of freedom respectively, the ratio of these variables divided by their degrees of freedom $${{A/u}\over{B/v}}$$ follows the {\it F} distribution with probability density function $$f_x(x;u,v)={{\Gamma\Bigl({{u+v}\over 2}\Bigr)\Bigl({u\over v}\Bigr)^{u/2}x^{u/2-1}} \over{\Gamma\Bigl({u\over 2}\Bigr)\Gamma\Bigl({v\over 2}\Bigr) \Bigl[\Bigl({u\over v}\Bigr)x+1\Bigr]^{(u+v)/2}}}, \qquad 0 < x < \infty.$$ @= class FDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "F"; } double u; // Numerator degrees of freedom double v; // Denominator degrees of freedom public:@/ /* The default constructor creates an F distribution with $u=8$, $v=8$. */ FDistribution() { set_u_v(); } FDistribution(double c_u = 8, double c_v = 8) { set_u_v(c_u, c_v); } // Accessors void set_u(double c_u = 8) { u = c_u; } double get_u(void) { return u; } void set_v(double c_v = 8) { v = c_v; } double get_v(void) { return v; } void set_u_v(double c_u = 8, double c_v = 8) { set_u(c_u); set_v(c_v); } // Implementations of abstract virtual functions double mean(void) { return v / (v - 2); } double stdev(void) { return sqrt((2 * (v * v) * (u + v - 2)) / (u * ((v - 2) * (v - 2)) * (v - 4))); // $\sigma=\sqrt{{2v^2(u+v-2)}\over{u(v-2)^2(v-4)}}$ } double skewness(void) { return (2 * sqrt(2.0) * sqrt(v - 4) * (2 * u + v - 2)) / (sqrt(u) * (v - 6) * sqrt(u + v - 2)); // $\gamma_1={{2\sqrt{2}\sqrt{v-4}(2u+v-2)}\over{\sqrt{u}(v-6)\sqrt{u+v-2}}}$ } double kurtosisExcess(void) { return (12 * ((v - 4) * (v - 2) * (v - 2) + u * (u + v - 2) * (5 * v - 22))) / (u * (v - 8) * (v - 6) * (u + v - 2)); /* $\gamma_2={{12(v-4)(v-2)^2+u(u+v-2)(5v-22)}\over {u(v-8)(v-6)(u+v-2)}}$ */ } // Static transformation functions static double p_from_u_v_x(double u, double v, double x); static double x_from_p_u_v(double p, double u, double v); static double u_from_p_v_x(double p, double v, double x); static double v_from_p_u_x(double p, double u, double x); double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Numerator DF = " << u << " Denominator DF = " << v << "\n"; } }; @ The following transformation functions are |static| methods of the |FDistribution| class. They permit calculation of properties of the $F$ distribution without reference to a particular |FDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{u}&$u$&Numerator degrees of freedom\cr \+&\.{v}&$v$&Denominator degrees of freedom\cr \vskip 1ex @= double FDistribution::p_from_u_v_x(double u, double v, double x) { double p, q, bound; int which = 1, status; cdff(&which, &p, &q, &x, &u, &v, &status, &bound); if (status != 0) { throw(out_of_range("FDistribution::p_from_u_v_x: Result out of bounds")); } return p; } double FDistribution::x_from_p_u_v(double p, double u, double v) { double q, x, bound; int which = 2, status; q = 1 - p; cdff(&which, &p, &q, &x, &u, &v, &status, &bound); if (status != 0) { throw(out_of_range("FDistribution::x_from_p_u_v: Result out of bounds")); } return x; } double FDistribution::u_from_p_v_x(double p, double v, double x) { double q, u, bound; int which = 3, status; q = 1 - p; cdff(&which, &p, &q, &x, &u, &v, &status, &bound); if (status != 0) { throw(out_of_range("FDistribution::u_from_p_v_x: Result out of bounds")); } return u; } double FDistribution::v_from_p_u_x(double p, double u, double x) { double q, v, bound; int which = 4, status; q = 1 - p; cdff(&which, &p, &q, &x, &u, &v, &status, &bound); if (status != 0) { throw(out_of_range("FDistribution::v_from_p_u_x: Result out of bounds")); } return v; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability of $n$ successes in $x$ trials. @= double FDistribution::CDF_P(double x) { return p_from_u_v_x(u, v, x); } @*1 Poisson distribution. The {\it Poisson distribution} gives the probability of an event with an arrival rate $\lambda$ for a given interval of occurring in an period of $x$ intervals. The the probability density function of the Poisson distribution is $$f(x;\lambda)={{e^{-\lambda}\lambda^x}\over{x!}}, \qquad x=0,1,2\ldots.$$ @= class poissonDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "poisson"; } double lambda; // Arrival rate public:@/ /* The default constructor creates a poisson distribution with $\lambda=0.5$. */ poissonDistribution() { set_lambda(); } poissonDistribution(double c_lambda = 0.5) { set_lambda(c_lambda); } // Accessors void set_lambda(double c_lambda = 0.5) { lambda = c_lambda; } double get_lambda(void) { return lambda; } // Implementations of abstract virtual functions double mean(void) { return lambda; } double stdev(void) { return sqrt(lambda); // $\sigma=\sqrt{\lambda}$ } double skewness(void) { return 1 / sqrt(lambda); // $\gamma_1=1/\sqrt{\lambda}$ } double kurtosisExcess(void) { return 1 / lambda; // $\gamma_2=1/\lambda$ } // Static transformation functions static double p_from_lambda_x(double lambda, double x); static double x_from_p_lambda(double p, double lambda); static double lambda_from_p_x(double p, double x); // Analysis methods double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Arrival rate = " << lambda << "\n"; } }; @ The following transformation functions are |static| methods of the |poissonDistribution| class. They permit calculation of properties of the Poisson distribution without reference to a particular |poissonDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{x}&$x$&Value of random variable\cr \+&\.{lambda}&$\lambda$&Arrival rate\cr \vskip 1ex @= double poissonDistribution::p_from_lambda_x(double lambda, double x) { double p, q, bound; int which = 1, status; cdfpoi(&which, &p, &q, &x, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("poissonDistribution::p_from_lambda_x: Result out of bounds")); } return p; } double poissonDistribution::x_from_p_lambda(double p, double lambda) { double q, x, bound; int which = 2, status; q = 1 - p; cdfpoi(&which, &p, &q, &x, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("poissonDistribution::x_from_p_lambda: Result out of bounds")); } return x; } double poissonDistribution::lambda_from_p_x(double p, double x) { double q, lambda, bound; int which = 3, status; q = 1 - p; cdfpoi(&which, &p, &q, &x, &lambda, &status, &bound); if (status != 0) { throw(out_of_range("poissonDistribution::lambda_from_p_x: Result out of bounds")); } return lambda; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability a given value $x$ of the random variable will occur. It is computed by integrating the probability density function $f(x)$ from $-\infty$ to $x$: $$P=\int_{-\infty}^{|x|}f(x)\,dx.$$ @= double poissonDistribution::CDF_P(double x) { return p_from_lambda_x(lambda, x); } @*1 Binomial distribution. An experiment consisting of $n$ independent trials, each with probability $r$ of success, will follow a {\it binomial distribution} with probability density function $$f_x(x;r,n)={n\choose x}r^x(1-r)^{n-x}, \qquad n=1,2,\ldots \; x=0,1,\ldots,n$$ where the notation $${n\choose x}={{n!}\over{x!(n-x)!}}$$ represents the number of ways of choosing $n$ items from a set of $x$. (We use $r$ to represent the probability in the above equations instead of the customary $p$ to avoid confusion with $P$, the value of the cumulative distribution function.) For an experiment with equal probability of success or failure such as flipping a coin, the probability $r$ is 0.5. As the number of trials becomes large, the binomial distribution approaches the normal distribution; in such circumstances you may wish to use the |approximateBinomial| method of the |normalDistribution| class. @= class binomialDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "binomial"; } double n; // Number of trials double r; // Probability of success in each trial public:@/ /* The default constructor creates a binomial distribution with $n=2$, $r=0.5$. */ binomialDistribution() { set_n_r(); } binomialDistribution(double c_n = 2, double c_r = 0.5) { set_n_r(c_n, c_r); } // Accessors void set_n(double c_n = 2) { n = c_n; } double get_n(void) { return n; } void set_r(double c_r = 0.5) { r = c_r; } double get_r(void) { return r; } void set_n_r(double c_n = 2, double c_r = 0.5) { set_n(c_n); set_r(c_r); } // Implementations of abstract virtual functions double mean(void) { return n * r; } double stdev(void) { return sqrt(n * r * (1 - r)); // $\sigma=\sqrt{n r (1-r)}$ } double skewness(void) { return (1 - (2 * r)) / stdev(); // $\gamma_1={{1-2r}\over\sigma}$ } double kurtosisExcess(void) { return (1 - (6 * (1 - r) * r)) / (n * (1 - r) * r); // $\gamma_2={{1-6(1-r)r}\over{n(1-r)r}}$ } // Static transformation functions static double p_from_n_r_s(double n, double r, double s); static double s_from_p_r_n(double p, double r, double n); static double n_from_p_r_s(double p, double r, double s); static double r_from_p_n_s(double p, double n, double s); // Analysis methods @; double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Trials = " << n << " Probability = " << r << "\n"; } }; @ The following transformation functions are |static| methods of the |binomialDistribution| class. They permit calculation of properties of the binomial distribution without reference to a particular |binomialDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{n}&$n$&Number of trials\cr \+&\.{r}&$r$&Probability of success in each trial\cr \+&\.{s}&$s$&Number of successes\cr \vskip 1ex @= double binomialDistribution::p_from_n_r_s(double n, double r, double s) { double p, q, ri, bound; int which = 1, status; ri = 1 - r; cdfbin(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("binomialDistribution::p_from_n_r_s: Result out of bounds")); } return p; } double binomialDistribution::s_from_p_r_n(double p, double r, double n) { double q, ri, s, bound; int which = 2, status; q = 1 - p; ri = 1 - r; cdfbin(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("binomialDistribution::s_from_p_r_n: Result out of bounds")); } return s; } double binomialDistribution::n_from_p_r_s(double p, double r, double s) { double q, n, ri, bound; int which = 3, status; q = 1 - p; ri = 1 - r; cdfbin(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("binomialDistribution::n_from_p_r_s: Result out of bounds")); } return n; } double binomialDistribution::r_from_p_n_s(double p, double n, double s) { double q, r, ri, bound; int which = 4, status; q = 1 - p; cdfbin(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("binomialDistribution::r_from_p_n_s: Result out of bounds")); } return r; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability a given value $x$ of the random variable will occur. It is computed by integrating the probability density function $f(x)$ from $-\infty$ to $x$: $$P=\int_{-\infty}^{|x|}f(x)\,dx.$$ @= double binomialDistribution::CDF_P(double x) { return p_from_n_r_s(n, r, x); } @*1 Negative binomial distribution. The {\it negative binomial distribution} for a series of independent trials each with a probability of success $r$ gives the probability of $n$ successes in a sequence of $x$ trials. The probability density function is $$f_x(x;n,r)={x-1\choose n-1}(1-r)^{x-n}r^n, \qquad n=1,2,\ldots \; x=n, n+1,\ldots.$$ (We use $r$ to represent the probability in the above equations instead of the customary $p$ to avoid confusion with $P$, the value of the cumulative distribution function.) For an experiment with equal probability of success or failure such as flipping a coin, the probability $r$ is 0.5. Note that the probability is necessarily zero when $x= class negativeBinomialDistribution : public probabilityDistribution { private:@/ virtual string distributionName(void) { return "negative binomial"; } double n; // Number of successes double r; // Probability of success in each trial public:@/ /* The default constructor creates a negative binomial distribution with $n=2$, $r=0.5$. */ negativeBinomialDistribution() { set_n_r(); } negativeBinomialDistribution(double c_n = 2, double c_r = 0.5) { set_n_r(c_n, c_r); } // Accessors void set_n(double c_n = 2) { n = c_n; } double get_n(void) { return n; } void set_r(double c_r = 0.5) { r = c_r; } double get_r(void) { return r; } void set_n_r(double c_n = 2, double c_r = 0.5) { set_n(c_n); set_r(c_r); } // Implementations of abstract virtual functions double mean(void) { return (n * (1 - r)) / r; } double stdev(void) { return sqrt(n * (1 - r)) / r; // $\sigma={{\sqrt{n (1-r)}}\over{r}}$ } double skewness(void) { return (2 - r) / sqrt(n * (1 - r)); // $\gamma_1={{2-r}\over\sqrt{n (1-r)}}$ } double kurtosisExcess(void) { return (6 * (1 - r) + (r * r)) / (n * (1 - r)); // $\gamma_2={{6(1-r)+r^2}\over{n(1-r)}}$ } // Static transformation functions static double p_from_n_r_s(double n, double r, double s); static double s_from_p_r_n(double p, double r, double n); static double n_from_p_r_s(double p, double r, double s); static double r_from_p_n_s(double p, double n, double s); double CDF_P(double x); // $P=\int_{-\infty}^{|x|}f(x)\,dx$ // Write description of distribution to output stream void writeParameters(ostream &of) { probabilityDistribution::writeParameters(of); of << " Successes = " << n << " Probability = " << r << "\n"; } }; @ The following transformation functions are |static| methods of the |negativeBinomialDistribution| class. They permit calculation of properties of the negative binomial distribution without reference to a particular |negativeBinomialDistribution| object. The functions operate on the following quantities: \vskip 1ex \settabs 8 \columns \+&\.{p}&$P$&Cumulative probability distribution\cr \+&\.{s}&$s$&Number of successes\cr \+&\.{r}&$r$&Probability of success in each trial\cr \+&\.{n}&$n$&Number of trials\cr \vskip 1ex @= double negativeBinomialDistribution::p_from_n_r_s(double n, double r, double s) { double p, q, ri, bound; int which = 1, status; ri = 1 - r; cdfnbn(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("negativeBinomialDistribution::p_from_n_r_s: Result out of bounds")); } return p; } double negativeBinomialDistribution::s_from_p_r_n(double p, double r, double n) { double q, ri, s, bound; int which = 2, status; q = 1 - p; ri = 1 - r; cdfnbn(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("negativeBinomialDistribution::s_from_p_r_n: Result out of bounds")); } return s; } double negativeBinomialDistribution::n_from_p_r_s(double p, double r, double s) { double q, n, ri, bound; int which = 3, status; q = 1 - p; ri = 1 - r; cdfnbn(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("negativeBinomialDistribution::n_from_p_r_s: Result out of bounds")); } return n; } double negativeBinomialDistribution::r_from_p_n_s(double p, double n, double s) { double q, r, ri, bound; int which = 4, status; q = 1 - p; cdfnbn(&which, &p, &q, &s, &n, &r, &ri, &status, &bound); if (status != 0) { throw(out_of_range("negativeBinomialDistribution::r_from_p_n_s: Result out of bounds")); } return r; } @ The {\it cumulative distribution function} (CDF) $P$ value is the probability of $n$ successes in $x$ trials. @= double negativeBinomialDistribution::CDF_P(double x) { return p_from_n_r_s(n, r, x); } @** Descriptive statistics. Descriptive statistics refers to empirical tests applied to data sets, measuring quantities such as their mean value, variance, and shape. We implement this via the template class |dataTable|, which is a generalisation of the STL |vector| template with additional methods which compute statistical metrics. The elements in the |dataTable| may be any type which can be converted to a |double|. The elements of a |dataTable| with $n$ items are referred to as $x_1,\ldots x_n$. @= template class dataTable : public vector { public:@/ double mean(void); // Arithmetic mean value ($\mu$) double geometricMean(void); // Geometric mean double harmonicMean(void); // Harmonic mean double median(void); // Median (central value) double RMS(void); // Root mean square T mode(void); // Mode (most frequent value) double percentile(double k); // Percentile ($0\leq k\leq 1$) double quartile(int q) // Quartiles (implemented with |percentile|) { assert(q >= 1 && q <= 3); return percentile(0.25 * q); } double variance(void); // Variance ($\sigma^2$) double varianceMLE(void) // Variance Maximum Likelihood Estimate { return (variance() * (this->size() - 1)) / this->size(); } double stdev(void) // Standard deviation ($\sigma$) { return sqrt(variance()); } double stdevMLE(void) // Standard deviation Maximum Likelihood Estimate { return sqrt(varianceMLE()); } double centralMoment(int k); // $k$th central moment double skewness(void); // Skewness double kurtosis(void); // Kurtosis }; @*2 Arithmetic mean. The arithmetic {\bf mean} is the average value of the elements in the |dataTable| $${1\over n}\sum_{i=1}^n x_i.$$ The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::mean(void) { typename dataTable::iterator p; double m = 0; for (p = this->begin(); p != this->end(); p++) { m += *p; } return m / this->size(); } @*2 Geometric mean. The {\bf geometric mean} of a set of $n$ values $x_i$ is $$\prod_{i=1}^n x_i^{1\over n}.$$ The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::geometricMean(void) { typename dataTable::iterator p; double g = 1, ni = 1.0 / this->size(); for (p = this->begin(); p != this->end(); p++) { g *= pow(*p, ni); } return g; } @*2 Harmonic mean. The {\bf harmonic mean} of a set of $n$ values $x_i$ is $${n\over{\sum_{i=1}^n {1\over x_i}}}.$$ The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::harmonicMean(void) { typename dataTable::iterator p; double d = 0; for (p = this->begin(); p != this->end(); p++) { d += 1.0 / (*p); } return this->size() / d; } @*2 Root mean square. The {\bf root mean square} of a set of $n$ values $x_i$ is $$\sqrt{{1\over n}\sum_{i=1}^n x_i^2}$$ The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::RMS(void) { typename dataTable::iterator p; double sum = 0; for (p = this->begin(); p != this->end(); p++) { sum += (*p) * (*p); } return sqrt(sum / this->size()); } @*2 Median. The {\bf median} or central value of a distribution is the value for which half the elements are smaller and half are greater. For a distribution with an even number of elements, the median is defined as the arithmetic mean of the two elements in the middle of the sorted distribution. We implement the median by using the |sort| algorithm to create an ordered copy of the distribution, then extract the central value. The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::median(void) { dataTable v(*this); sort(v.begin(), v.end()); if (v.size() & 1) { return (double) v[(v.size() + 1) / 2]; } else { return (double) ((v[(v.size() / 2) - 1] + v[v.size() / 2]) / 2.0); } } @*2 Mode. The {\bf mode} is the value which occurs most frequently in the collection of data. This really only makes sense when the |dataTable| is instantiated with an integral data type, but we blither away regardless of the data type; the user may be employing |double|s to represent integers too large for the widest integral type. The type returned is the same as the the elements of the |dataTable|. We compute the mode by sorting the data table in ascending order, then scanning it linearly for repeated values, keeping track of the item with the most repeats. If two or more values share the largest number of occurrences, the largest value is returned as the mode. @= template T dataTable::mode(void) { dataTable v(*this); T cmode, bmode = 0; int n = 0, most = 0; typename dataTable::iterator p; sort(v.begin(), v.end()); p = v.begin(); cmode = *p++; while (p != v.end()) { T cval = *p++; if (cval == cmode) { n++; } else { if (n > most) { most = n; bmode = cmode; } n = 1; cmode = cval; } } if (n > most) { // Mode may be last item in table most = n; bmode = cmode; } return bmode; } @*2 Percentile. The $k$th {\bf percentile} is the value such that $100k$\% of the data are less than or equal to that value. The result returned is always a |double|. If the result of multiplying the size of the |dataTable| by $k$ is an integer, that item is returned. Otherwise, the mean of the two adjacent items is returned. Note that |percentile(0.5)| is a synonym for |median()|. @= template double dataTable::percentile(double k) { dataTable v(*this); double index, result; int i; assert(k >= 0 && k <= 1); sort(v.begin(), v.end()); index = v.size() * k; if (index != floor(index)) { i = ((int) index); result = v[i]; } else { i = ((int) index) - 1; result = (v[i] + v[i + 1]) / 2.0; } return result; } @*2 Variance. The {\bf variance} ($\sigma^2$) of a set of $n$ values $x_i$ is $${1\over {n-1}}\sum_{i=1}^n {(x_i-\overline{x})^2}$$ where $\overline{x}$ is the |mean()| value of the |dataTable|. The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::variance(void) { typename dataTable::iterator p; double mu = mean(), sum = 0; for (p = this->begin(); p != this->end(); p++) { double t = ((*p) - mu); sum += t * t; } return sum / (this->size() - 1); } @*2 Central moments. The $k$th {\bf central moment} of a set of $n$ values $x_i$ is $${1\over {n}}\sum_{i=1}^n {(x_i-\overline{x})^k}$$ where $\overline{x}$ is the |mean()| value of the |dataTable|. The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::centralMoment(int k) { typename dataTable::iterator p; double mu = mean(), sum = 0; for (p = this->begin(); p != this->end(); p++) { sum += pow((*p) - mu, (double) k); } return sum / this->size(); } @*2 Skewness. The {\bf skewness} of a distribution is a measure of its asymmetry with respect to the mean. It is defined as the third central moment of the distribution divided by the cube of the maximum likelihood estimate of the standard deviation. The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::skewness(void) { double sigma = stdevMLE(); return centralMoment(3) / (sigma * sigma * sigma); } @*2 Kurtosis. {\bf Kurtosis} is a measure of how sharply peaked the distribution is. It is defined as the fourth central moment of the distribution divided by the square of the maximum likelihood estimate of the variance. The result is always a |double| regardless of the data type of the |dataTable|. @= template double dataTable::kurtosis(void) { double v = varianceMLE(); return centralMoment(4) / (v * v); } @** Test program. @(statlib_test.cc@>= @; @; int main(int argc, char *argv[]) { extern char *optarg; /* Imported from |getopt| */ // extern int optind; int opt; @;@/ @; #if 1 // Statistical library tests { normalDistribution nd(100, 7.07); cout << "Normal dist: mu = " << nd.get_mu() << " sigma = " << nd.get_sigma() << " variance = " << nd.variance() << "\n"; cout << " P = " << nd.CDF_P(110) << " Q = " << nd.CDF_Q(110) << "\n"; nd.writeParameters(cout); nd.approximateBinomial(200); cout << "Normal dist: mu = " << nd.get_mu() << " sigma = " << nd.get_sigma() << " variance = " << nd.variance() << "\n"; cout << " P = " << nd.CDF_P(110) << " Q = " << nd.CDF_Q(110) << " z = " << nd.z_score(110) << "\n"; } { poissonDistribution pd(10); pd.writeParameters(cout); cout << "Poisson dist: lambda = " << pd.get_lambda() << "\n"; cout << " P = " << pd.CDF_P(5) << " Q = " << pd.CDF_Q(5) << "\n"; } { chiSquareDistribution xd(32); xd.writeParameters(cout); cout << "Chi-square dist: k = " << xd.get_k() << "\n"; cout << " P = " << xd.CDF_P(36) << " Q = " << xd.CDF_Q(36) << "\n"; } { gammaDistribution gd(7.5, 3.75); gd.writeParameters(cout); cout << "Gamma dist: alpha = " << gd.get_alpha() << " lambda = " << gd.get_lambda() << "\n"; cout << " P = " << gd.CDF_P(2.2) << " Q = " << gd.CDF_Q(2.2) << "\n"; } { betaDistribution bd(3, 4); bd.writeParameters(cout); cout << "Beta dist: a = " << bd.get_a() << " b = " << bd.get_b() << "\n"; cout << " P = " << bd.CDF_P(0.4) << " Q = " << bd.CDF_Q(0.4) << "\n"; } { tDistribution xd(32); xd.writeParameters(cout); cout << "t dist: k = " << xd.get_k() << "\n"; cout << " P = " << xd.CDF_P(0.8) << " Q = " << xd.CDF_Q(0.8) << "\n"; } { FDistribution fd(12, 16); fd.writeParameters(cout); cout << "F dist: u = " << fd.get_u() << " v = " << fd.get_v() << "\n"; cout << " P = " << fd.CDF_P(0.8) << " Q = " << fd.CDF_Q(0.8) << "\n"; } { binomialDistribution bd(200, 0.5); cout << "Binomial dist: n = " << bd.get_n() << " r = " << bd.get_r() << " variance = " << bd.variance() << "\n"; cout << " P = " << bd.CDF_P(110) << " Q = " << bd.CDF_Q(110) << " z = " << bd.z_score(110) << "\n"; bd.writeParameters(cout); } { negativeBinomialDistribution bd(10, 0.5); cout << "Negative binomial dist: n = " << bd.get_n() << " r = " << bd.get_r() << " variance = " << bd.variance() << "\n"; cout << " P = " << bd.CDF_P(15) << " Q = " << bd.CDF_Q(15) << "\n"; bd.writeParameters(cout); } #endif return 0; } @ We use |getopt| to process command line options. This permits aggregation of options without arguments and both \.{-d}{\it arg} and \.{-d} {\it arg} syntax. @= while ((opt = getopt(argc, argv, "nu-:")) != -1) { switch (opt) { case 'u': /* \.{-u} Print how-to-call information */ case '?': usage(); return 0; case '-': /* \.{--} Extended options */ switch (optarg[0]) { case 'c': /* \.{--copyright} */ cout << "This program is in the public domain.\n"; return 0; case 'h': /* \.{--help} */ usage(); return 0; case 'v': /* \.{--version} */ cout << PRODUCT << " " << VERSION << "\n"; cout << "Last revised: " << REVDATE << "\n"; cout << "The latest version is always available\n"; cout << "at http://www.fourmilab.ch/eggtools/eggshell\n"; return 0; } } } @ Procedure |usage| prints how-to-call information. @= static void usage(void) { cout << PRODUCT << " -- Analyse eggsummary files. Call:\n"; cout << " " << PRODUCT << " [options] [infile] [outfile]\n"; cout << "\n"; cout << "Options:\n"; cout << " --copyright Print copyright information\n"; cout << " -u, --help Print this message\n"; cout << " --version Print version number\n"; cout << "\n"; cout << "by John Walker\n"; cout << "http://www.fourmilab.ch/\n"; } @ The following test the descriptive statistics facilities implemented by the |dataTable| template class. @= { int i; dataTable zot; dataTable::iterator z; #define PR for (z = zot.begin(); z != zot.end(); z++) { cout << *z << " "; } cout << "\n"; for (i = 0; i < 10; i++) { zot.push_back(rand() & 0x3F); } PR; sort(zot.begin(), zot.end()); PR; reverse(zot.begin(), zot.end()); PR; cout << "Mean = " << zot.mean() << "\n"; cout << "Geometric mean = " << zot.geometricMean() << "\n"; cout << "Harmonic mean = " << zot.harmonicMean() << "\n"; cout << "RMS = " << zot.RMS() << "\n"; cout << "Median = " << zot.median() << "\n"; cout << "Mode = " << zot.mode() << "\n"; cout << "Percentile(0.5) = " << zot.percentile(0.5) << "\n"; cout << "Quartile(1) = " << zot.quartile(1) << "\n"; cout << "Quartile(3) = " << zot.quartile(3) << "\n"; cout << "Variance = " << zot.variance() << "\n"; cout << "Standard deviation = " << zot.stdev() << "\n"; cout << "CentralMoment(3) = " << zot.centralMoment(3) << "\n"; cout << "Skewness = " << zot.skewness() << "\n"; cout << "Kurtosis = " << zot.kurtosis() << "\n"; } @ We need the following definitions to compile the test program. @= #include "config.h" // Our configuration /* C++ include files */ #include #include #include #include using namespace std; #include #include #include #ifdef HAVE_GETOPT #ifdef HAVE_UNISTD_H #include #endif #else #include "getopt.h" /* No system \.{getopt}---use our own */ #endif #include "statlib.h" // Class definitions for this package @** Index. The following is a cross-reference table for \.{statlib}. Single-character identifiers are not indexed, nor are reserved words. Underlined entries indicate where an identifier was declared.