A class to perform basic 1D statistics. Each method is static.
#include <libinterpol1D.h>
Static Public Member Functions | |
static double | min_element (const std::vector< double > &X) |
static double | max_element (const std::vector< double > &X) |
static std::vector< double > | quantiles (const std::vector< double > &X, const std::vector< double > &quartiles) |
This function returns a vector of quantiles. The vector does not have to be sorted. See https://secure.wikimedia.org/wikipedia/en/wiki/Quartile for more. This code is heavily inspired by Ken Wilder, https://sites.google.com/site/jivsoft/Home/compute-ranks-of-elements-in-a-c—array-or-vector (quantile method, replacing the nth-element call by direct access to a sorted vector). | |
static std::vector< double > | quantiles_core (std::vector< double > X, const std::vector< double > &quartiles) |
This function returns a vector of quantiles, but does not filter out nodata values! The vector does not have to be sorted. See https://secure.wikimedia.org/wikipedia/en/wiki/Quartile for more. This code is heavily inspired by Ken Wilder, https://sites.google.com/site/jivsoft/Home/compute-ranks-of-elements-in-a-c—array-or-vector (quantile method, replacing the nth-element call by direct access to a sorted vector). | |
static std::vector< double > | derivative (const std::vector< double > &X, const std::vector< double > &Y) |
This function returns the vector of local derivatives, given a vector of abscissae and ordinates. The vectors must be sorted by ascending x. The derivatives will be centered if possible, left or right otherwise or nodata if nothing else can be computed. | |
static void | sort (std::vector< double > &X, std::vector< double > &Y, const bool &keep_nodata=true) |
This function sorts the X and Y vectors by increasing X. Depending on keep_nodata, the nodata values (both in X and Y) may be removed, in which case the vector length is not preserved. | |
static void | equalBin (const unsigned int k, std::vector< double > &X, std::vector< double > &Y) |
Data binning method. This bins the data into k classes of equal width (see https://en.wikipedia.org/wiki/Data_binning). | |
static void | equalCountBin (const unsigned int k, std::vector< double > &X, std::vector< double > &Y) |
Data binning method. This bins the data into k classes of equal number of elements (see https://en.wikipedia.org/wiki/Data_binning). The number of elements per class is adjusted in order to reduce unevenness between classes: for example, when distributing 100 elements in 8 classes, this will generate 4 classes of 13 elements and 4 classes of 12 elements. | |
static double | weightedMean (const double &d1, const double &d2, const double &weight=1.) |
This function returns the weighted arithmetic mean of two numbers. A weight of 0 returns d1, a weight of 1 returns d2, a weight of 0.5 returns a centered mean. See https://secure.wikimedia.org/wikipedia/en/wiki/Weighted_mean for more... | |
static double | weightedMean (const std::vector< double > &vecData, const std::vector< double > &weight) |
This function returns the weighted arithmetic mean of a vector. See https://secure.wikimedia.org/wikipedia/en/wiki/Weighted_mean for more... | |
static double | arithmeticMean (const std::vector< double > &vecData) |
static double | getMedian (const std::vector< double > &vecData, const bool &keep_nodata=true) |
static double | getMedianAverageDeviation (std::vector< double > vecData, const bool &keep_nodata=true) |
static double | variance (const std::vector< double > &X) |
Computes the variance of a vector of data. It is computed using a compensated variance algorithm (see https://secure.wikimedia.org/wikipedia/en/wiki/Algorithms_for_calculating_variance) in order to be more robust to small variations around the mean. | |
static double | std_dev (const std::vector< double > &X) |
static double | covariance (const std::vector< double > &z1, const std::vector< double > &z2) |
static double | corr (const std::vector< double > &z1, const std::vector< double > &z2) |
Computes the Pearson product-moment correlation coefficient. This should be equivalent to the default R "corr" method. | |
static double | Pearson (const std::vector< double > &X, const std::vector< double > &Y) |
Computes the Pearson product-moment correlation coefficient in a more numerically efficient manner than "corr". | |
static double | R2 (const std::vector< double > &obs, const std::vector< double > &sim) |
Computes the R2 coefficient of determination. See https://en.wikipedia.org/wiki/Coefficient_of_determination and https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained. | |
static double | NashSuttcliffe (const std::vector< double > &obs, const std::vector< double > &sim) |
Computes the Nash-Sutcliffe model efficiency coefficient for two vectors. It is assumed that the same indices contain the same timesteps. A value of 1 means a perfect match, a value of zero means that no variance is reproduced (see https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient). | |
static double | getBoxMuller () |
Box–Muller method for normally distributed random numbers. | |
static void | LinRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg, const bool &fixed_rate=false) |
Computes the linear regression coefficients fitting the points given as X and Y in two vectors. The linear regression has the form Y = aX + b with a regression coefficient r (it is nodata safe). | |
static void | NoisyLinRegression (const std::vector< double > &in_X, const std::vector< double > &in_Y, double &A, double &B, double &R, std::string &mesg, const bool &fixed_rate=false) |
Computes the linear regression coefficients fitting the points given as X and Y in two vectors. The linear regression has the form Y = aX + b with a regression coefficient r. If the regression coefficient is not good enough, it tries to remove bad points (up to 15% of the initial data set can be removed, keeping at least 4 points). | |
static void | twoLinRegression (const std::vector< double > &in_X, const std::vector< double > &in_Y, const double &bilin_inflection, std::vector< double > &coeffs) |
Computes the bi-linear regression coefficients fitting the points given as X and Y in two vectors. The regression is assumed to consist of 2 linear segments with a fixed inflection point. It relies on Interpol1D::NoisyLinRegression. | |
static void | LogRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg) |
Computes the log regression coefficients fitting the points given as X and Y in two vectors. The log regression has the form Y = a*ln(X) + b with a regression coefficient r (it is nodata safe). | |
static void | ExpRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg) |
Computes the power regression coefficients fitting the points given as X and Y in two vectors. The power regression has the form Y = b*X^a with a regression coefficient r (it is nodata safe). | |
|
static double | corr (const std::vector< double > &z1, const std::vector< double > &z2) |
Computes the Pearson product-moment correlation coefficient. This should be equivalent to the default R "corr" method.
X | first vector of data |
Y | second vector of data |
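As an illustration of the quantity described above, here is a minimal sketch of the Pearson coefficient computed from centered sums. The function name `pearson_sketch` is hypothetical, and the sketch assumes equally sized vectors with at least two non-constant values and no nodata; it is not the library's implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the documented class.
// Assumes x.size() == y.size() >= 2, no nodata, non-constant inputs.
double pearson_sketch(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double mean_x = 0., mean_y = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    // Centered sums of products and squares
    double sxy = 0., sxx = 0., syy = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```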
|
static std::vector< double > | derivative (const std::vector< double > &X, const std::vector< double > &Y) |
This function returns the vector of local derivatives, given a vector of abscissae and ordinates. The vectors must be sorted by ascending x. The derivatives will be centered if possible, left or right otherwise, or nodata if nothing else can be computed.
X | vector of abscissae |
Y | vector of ordinates |
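The centered / one-sided / nodata fallback described above can be sketched as follows. The names `derivative_sketch` and `kNodata` are hypothetical, the sentinel value is an assumption made for this example only, and the sketch only handles the vector boundaries (it does not look for nodata neighbours). X is assumed strictly increasing.

```cpp
#include <cstddef>
#include <vector>

const double kNodata = -999.; // assumed sentinel, for this sketch only

// Hypothetical helper: local derivatives dy/dx at each point.
std::vector<double> derivative_sketch(const std::vector<double>& x,
                                      const std::vector<double>& y)
{
    const std::size_t n = x.size();
    std::vector<double> der(n, kNodata);
    for (std::size_t i = 0; i < n; ++i) {
        const bool has_left = (i > 0);
        const bool has_right = (i + 1 < n);
        if (has_left && has_right)      // centered difference
            der[i] = (y[i + 1] - y[i - 1]) / (x[i + 1] - x[i - 1]);
        else if (has_right)             // right-sided difference
            der[i] = (y[i + 1] - y[i]) / (x[i + 1] - x[i]);
        else if (has_left)              // left-sided difference
            der[i] = (y[i] - y[i - 1]) / (x[i] - x[i - 1]);
        // otherwise: a single point, the value stays at kNodata
    }
    return der;
}
```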
|
static void | equalBin (const unsigned int k, std::vector< double > &X, std::vector< double > &Y) |
Data binning method. This bins the data into k classes of equal width (see https://en.wikipedia.org/wiki/Data_binning).
k | number of classes |
X | vector of abscissae |
Y | vector of ordinates |
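To make "classes of equal width" concrete, the sketch below maps a value to the index of its class. The function `equal_width_bin_index` is hypothetical and only shows the class assignment; how the documented method aggregates X and Y within each class is not covered. It assumes k > 0 and x_min <= x <= x_max with x_max > x_min.

```cpp
#include <cstddef>

// Hypothetical helper: index (0..k-1) of the equal-width class containing x.
std::size_t equal_width_bin_index(double x, double x_min, double x_max, unsigned int k)
{
    const double width = (x_max - x_min) / static_cast<double>(k);
    std::size_t idx = static_cast<std::size_t>((x - x_min) / width);
    if (idx >= k) idx = k - 1; // x == x_max belongs to the last class
    return idx;
}
```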
|
static void | equalCountBin (const unsigned int k, std::vector< double > &X, std::vector< double > &Y) |
Data binning method. This bins the data into k classes of equal number of elements (see https://en.wikipedia.org/wiki/Data_binning). The number of elements per class is adjusted in order to reduce unevenness between classes: for example, when distributing 100 elements in 8 classes, this will generate 4 classes of 13 elements and 4 classes of 12 elements.
k | number of classes |
X | vector of abscissae |
Y | vector of ordinates |
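The 100-elements-in-8-classes example above follows from integer division: every class gets n / k elements and the remainder n % k is spread one element at a time. A minimal sketch, where `equal_count_sizes` is a hypothetical name and giving the extra elements to the first classes is an assumption:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements in each of the k classes (k > 0).
std::vector<std::size_t> equal_count_sizes(std::size_t n, unsigned int k)
{
    std::vector<std::size_t> sizes(k, n / k);  // base size for every class
    for (std::size_t i = 0; i < n % k; ++i)
        ++sizes[i];                            // distribute the remainder
    return sizes;
}
```

For n = 100 and k = 8, the base size is 12 and the remainder is 4, which gives 4 classes of 13 elements and 4 classes of 12 elements.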
|
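static void | ExpRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg) |
Computes the power regression coefficients fitting the points given as X and Y in two vectors. The power regression has the form Y = b*X^a with a regression coefficient r (it is nodata safe).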
X | vector of X coordinates |
Y | vector of Y coordinates (same order as X) |
a | slope of the regression |
b | origin of the regression |
r | regression coefficient |
mesg | information message if something fishy is detected |
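A common way to fit this form, given here as an assumption rather than a statement about how this method is implemented, is to take logarithms so that the problem becomes a linear regression (valid only for X > 0 and Y > 0):

```latex
\begin{aligned}
Y &= b\,X^{a} \\
\ln Y &= a \ln X + \ln b
\end{aligned}
```

A straight-line fit of ln Y against ln X then yields the slope a directly, and b is recovered as the exponential of the intercept.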
|
static double | getBoxMuller () |
Box–Muller method for normally distributed random numbers.
This generates a normally distributed signal of mean=0 and std_dev=1. For numerical reasons, the extremes will always be less than 7 * std_dev from the mean. See https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
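The basic form of the Box–Muller transform turns two independent uniform samples into one standard normal sample. The sketch below takes the uniform samples as arguments so that it stays deterministic; `box_muller_sketch` is a hypothetical name, and how the documented method draws its uniform numbers is not shown here. It assumes u1 in (0, 1] and u2 in [0, 1).

```cpp
#include <cmath>

// Hypothetical helper: one standard normal sample from two uniform samples.
double box_muller_sketch(double u1, double u2)
{
    const double two_pi = 2. * 3.14159265358979323846;
    return std::sqrt(-2. * std::log(u1)) * std::cos(two_pi * u2);
}
```

Because the result grows as u1 approaches zero, the smallest non-zero uniform value that the generator can produce bounds the largest reachable deviation.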
|
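static void | LinRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg, const bool &fixed_rate=false) |
Computes the linear regression coefficients fitting the points given as X and Y in two vectors. The linear regression has the form Y = aX + b with a regression coefficient r (it is nodata safe).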
X | vector of X coordinates |
Y | vector of Y coordinates (same order as X) |
a | slope of the linear regression |
b | origin of the linear regression |
r | absolute value of linear regression coefficient |
mesg | information message if something fishy is detected |
fixed_rate | force the lapse rate? (default=false) |
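For reference, an ordinary least-squares fit of Y = aX + b can be sketched as below. The `LinFit` struct and `lin_regression_sketch` function are hypothetical; the sketch ignores nodata handling, the `mesg` output and the `fixed_rate` option, and assumes at least two points with non-constant X and Y.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct LinFit { double a, b, r; }; // slope, origin, |correlation|

// Hypothetical helper: least-squares line through (x[i], y[i]).
LinFit lin_regression_sketch(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double mean_x = 0., mean_y = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= static_cast<double>(n);
    mean_y /= static_cast<double>(n);

    double sxx = 0., sxy = 0., syy = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }

    LinFit fit;
    fit.a = sxy / sxx;                              // slope
    fit.b = mean_y - fit.a * mean_x;                // origin
    fit.r = std::fabs(sxy / std::sqrt(sxx * syy));  // absolute value, as documented for r
    return fit;
}
```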
|
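static void | LogRegression (const std::vector< double > &X, const std::vector< double > &Y, double &a, double &b, double &r, std::string &mesg) |
Computes the log regression coefficients fitting the points given as X and Y in two vectors. The log regression has the form Y = a*ln(X) + b with a regression coefficient r (it is nodata safe).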
X | vector of X coordinates |
Y | vector of Y coordinates (same order as X) |
a | slope of the regression |
b | origin of the regression |
r | regression coefficient |
mesg | information message if something fishy is detected |
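This form is linear in ln(X), so one plausible approach (an assumption, not a description of the actual implementation) is a change of variable followed by a linear regression, valid only for X > 0:

```latex
U = \ln X \quad\Longrightarrow\quad Y = a\,U + b
```

The slope and origin of the line fitted to the points (U, Y) are then the coefficients a and b.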
|
static double | NashSuttcliffe (const std::vector< double > &obs, const std::vector< double > &sim) |
Computes the Nash-Sutcliffe model efficiency coefficient for two vectors. It is assumed that the same indices contain the same timesteps. A value of 1 means a perfect match, a value of zero means that no variance is reproduced (see https://en.wikipedia.org/wiki/Nash%E2%80%93Sutcliffe_model_efficiency_coefficient).
obs | vector of observed data |
sim | vector of simulated data |
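Following the definition on the linked Wikipedia page, the coefficient is one minus the ratio of the squared model error to the variance of the observations around their mean. A minimal sketch, where `nash_sutcliffe_sketch` is a hypothetical name and the inputs are assumed to be equally sized, free of nodata, with non-constant observations:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: 1 - sum((obs-sim)^2) / sum((obs-mean_obs)^2).
double nash_sutcliffe_sketch(const std::vector<double>& obs, const std::vector<double>& sim)
{
    const std::size_t n = obs.size();
    double mean_obs = 0.;
    for (std::size_t i = 0; i < n; ++i) mean_obs += obs[i];
    mean_obs /= static_cast<double>(n);

    double err2 = 0., var2 = 0.;
    for (std::size_t i = 0; i < n; ++i) {
        err2 += (obs[i] - sim[i]) * (obs[i] - sim[i]);
        var2 += (obs[i] - mean_obs) * (obs[i] - mean_obs);
    }
    return 1. - err2 / var2;
}
```

A simulation equal to the observations returns 1, while a simulation that always returns the mean of the observations returns 0.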
|
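static void | NoisyLinRegression (const std::vector< double > &in_X, const std::vector< double > &in_Y, double &A, double &B, double &R, std::string &mesg, const bool &fixed_rate=false) |
Computes the linear regression coefficients fitting the points given as X and Y in two vectors. The linear regression has the form Y = aX + b with a regression coefficient r. If the regression coefficient is not good enough, it tries to remove bad points (up to 15% of the initial data set can be removed, keeping at least 4 points).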
in_X | vector of X coordinates |
in_Y | vector of Y coordinates (same order as X) |
A | slope of the linear regression |
B | origin of the linear regression |
R | linear regression coefficient |
mesg | information message if something fishy is detected |
fixed_rate | force the lapse rate? (default=false) |
|
static double | Pearson (const std::vector< double > &X, const std::vector< double > &Y) |
Computes the Pearson product-moment correlation coefficient in a more numerically efficient manner than "corr".
X | first vector of data |
Y | second vector of data |
|
static std::vector< double > | quantiles (const std::vector< double > &X, const std::vector< double > &quartiles) |
This function returns a vector of quantiles. The vector does not have to be sorted. See https://secure.wikimedia.org/wikipedia/en/wiki/Quartile for more. This code is heavily inspired by Ken Wilder, https://sites.google.com/site/jivsoft/Home/compute-ranks-of-elements-in-a-c—array-or-vector (quantile method, replacing the nth-element call by direct access to a sorted vector).
X | vector to classify |
quartiles | vector of quartiles, between 0 and 1 |
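The "direct access to a sorted vector" idea can be illustrated with a single quantile. In this sketch, `quantile_sketch` is a hypothetical name, the linear interpolation between neighbouring elements is an assumption, and nodata filtering is left out; the input must be non-empty and q must lie between 0 and 1.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: quantile q of x, by direct access into a sorted copy.
double quantile_sketch(std::vector<double> x, double q)
{
    std::sort(x.begin(), x.end()); // x is passed by value, so this sorts a copy
    const double pos = q * static_cast<double>(x.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = static_cast<std::size_t>(std::ceil(pos));
    const double frac = pos - static_cast<double>(lo);
    return (1. - frac) * x[lo] + frac * x[hi]; // interpolate between neighbours
}
```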
|
static std::vector< double > | quantiles_core (std::vector< double > X, const std::vector< double > &quartiles) |
This function returns a vector of quantiles, but does not filter out nodata values! The vector does not have to be sorted. See https://secure.wikimedia.org/wikipedia/en/wiki/Quartile for more. This code is heavily inspired by Ken Wilder, https://sites.google.com/site/jivsoft/Home/compute-ranks-of-elements-in-a-c—array-or-vector (quantile method, replacing the nth-element call by direct access to a sorted vector).
X | vector to classify (nodata values processed as normal values) |
quartiles | vector of quartiles, between 0 and 1 |
|
static double | R2 (const std::vector< double > &obs, const std::vector< double > &sim) |
Computes the R2 coefficient of determination. See https://en.wikipedia.org/wiki/Coefficient_of_determination and https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained.
obs | vector of observed data |
sim | vector of simulated data |
|
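static void | sort (std::vector< double > &X, std::vector< double > &Y, const bool &keep_nodata=true) |
This function sorts the X and Y vectors by increasing X. Depending on keep_nodata, the nodata values (both in X and Y) may be removed, in which case the vector length is not preserved.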
X | vector of abscissae |
Y | vector of ordinates |
keep_nodata | should nodata values be kept? (default=true) |
|
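static void | twoLinRegression (const std::vector< double > &in_X, const std::vector< double > &in_Y, const double &bilin_inflection, std::vector< double > &coeffs) |
Computes the bi-linear regression coefficients fitting the points given as X and Y in two vectors. The regression is assumed to consist of 2 linear segments with a fixed inflection point. It relies on Interpol1D::NoisyLinRegression.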
in_X | vector of X coordinates |
in_Y | vector of Y coordinates (same order as X) |
bilin_inflection | inflection point abscissa |
coeffs | a,b,r coefficients in a vector |
|
static double | variance (const std::vector< double > &X) |
Computes the variance of a vector of data. It is computed using a compensated variance algorithm (see https://secure.wikimedia.org/wikipedia/en/wiki/Algorithms_for_calculating_variance) in order to be more robust to small variations around the mean.
X | vector of data |
|
static double | weightedMean (const double &d1, const double &d2, const double &weight=1.) |
This function returns the weighted arithmetic mean of two numbers. A weight of 0 returns d1, a weight of 1 returns d2, a weight of 0.5 returns a centered mean. See https://secure.wikimedia.org/wikipedia/en/wiki/Weighted_mean for more...
d1 | first value |
d2 | second value |
weight | weight to apply to the mean |
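The behaviour described above (0 gives d1, 1 gives d2, 0.5 gives the centered mean) corresponds to a linear blend of the two values. A minimal sketch, with `weighted_mean2_sketch` as a hypothetical name and no nodata handling:

```cpp
// Hypothetical helper: linear blend of d1 and d2, weight expected in [0, 1].
double weighted_mean2_sketch(double d1, double d2, double weight)
{
    return (1. - weight) * d1 + weight * d2;
}
```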
|
static double | weightedMean (const std::vector< double > &vecData, const std::vector< double > &weight) |
This function returns the weighted arithmetic mean of a vector. See https://secure.wikimedia.org/wikipedia/en/wiki/Weighted_mean for more...
vecData | vector of values |
weight | weights to apply to the mean |
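For a vector, the weighted arithmetic mean from the linked Wikipedia page is the sum of the weighted values divided by the sum of the weights. In this sketch, `weighted_mean_sketch` is a hypothetical name, and dividing by the sum of the weights (rather than expecting pre-normalized weights) is an assumption; the vectors must have the same size and a non-zero weight sum.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sum(w[i] * v[i]) / sum(w[i]).
double weighted_mean_sketch(const std::vector<double>& values,
                            const std::vector<double>& weights)
{
    double sum = 0., sum_w = 0.;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += weights[i] * values[i];
        sum_w += weights[i];
    }
    return sum / sum_w;
}
```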