

java.lang.Object
  +--cern.jet.stat.quantile.QuantileFinderFactory
Factory constructing exact and approximate quantile finders for both known and unknown N.
Also see hep.aida.bin.QuantileBin1D, demonstrating how this package can be used.
The approx. algorithms compute approximate quantiles of large data sequences in a single pass.
The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the data sequence.
The main memory requirements are smaller than for any other known technique by an order of magnitude.
The approx. algorithms are primarily intended to help applications scale. When faced with large data sequences, traditional methods need either very large memories or time-consuming disk-based sorting. In contrast, the approx. algorithms can deal with > 10^10 values without disk-based sorting.
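To put the 10^10 figure in perspective, the following back-of-the-envelope sketch computes how much main memory an exact, keep-everything approach would need just to hold that many values as Java doubles. The class and method names are illustrative only and are not part of this package; array and object overhead is ignored.

```java
final class ExactMemoryEstimate {
    /** Bytes needed to hold n values as Java doubles (8 bytes each), ignoring array overhead. */
    static long bytesForDoubles(long n) {
        return Math.multiplyExact(n, (long) Double.BYTES);
    }

    public static void main(String[] args) {
        long n = 10_000_000_000L; // 10^10 values, the figure quoted in the text
        long bytes = bytesForDoubles(n);
        // Prints: 80000000000 bytes = 80 GB (decimal gigabytes)
        System.out.println(bytes + " bytes = " + bytes / 1_000_000_000L + " GB");
    }
}
```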
All classes can be seen from various angles, for example as
Use methods newXXX(...) to get new instances of one of the following quantile finders.
1. Exact quantile finding algorithm for known and unknown N requiring large main memory.
The folklore algorithm: Keeps all elements in main memory, sorts the list, then picks the quantiles.
2. Approximate quantile finding algorithm for known N requiring only one pass and little main memory.
Needs as input the following parameters:
It is also possible to couple the approximation algorithm with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter "delta".
After Gurmeet Singh Manku, Sridhar Rajagopalan and Bruce G. Lindsay, Approximate Medians and other Quantiles in One Pass and with Limited Memory, Proc. of the 1998 ACM SIGMOD Int. Conf. on Management of Data.
3. Approximate quantile finding algorithm for unknown N requiring only one pass and little main memory.
This algorithm requires at most two times the memory of a corresponding approx. quantile finder knowing N.
Needs as input the following parameters:
It is also possible to couple the approximation algorithm with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter "delta".
After Gurmeet Singh Manku, Sridhar Rajagopalan and Bruce G. Lindsay, Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets, Proc. of the 1999 ACM SIGMOD Int. Conf. on Management of Data.
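The first strategy in the list above (keep everything, sort, pick) is simple enough to sketch in a few lines. The following is a minimal, self-contained illustration and not this package's implementation: the class name is hypothetical, and the convention that the phi-quantile is the element at index ceil(phi * n) - 1 of the sorted data is an assumption made here for concreteness.

```java
import java.util.Arrays;

final class ExactQuantileSketch {
    /**
     * Returns the phi-quantiles of values by sorting a copy and picking elements.
     * Convention (an assumption of this sketch): the phi-quantile is the element
     * at index ceil(phi * n) - 1 of the sorted data, clamped to [0, n - 1].
     */
    static double[] quantiles(double[] values, double[] phis) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone(); // all elements are kept in main memory
        Arrays.sort(sorted);
        double[] result = new double[phis.length];
        for (int i = 0; i < phis.length; i++) {
            int index = (int) Math.ceil(phis[i] * sorted.length) - 1;
            index = Math.max(0, Math.min(sorted.length - 1, index));
            result[i] = sorted[index];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] data = {9, 1, 8, 2, 7, 3, 6, 4, 5, 10};
        double[] q = quantiles(data, new double[] {0.25, 0.5, 0.75});
        System.out.println(Arrays.toString(q)); // prints [3.0, 5.0, 8.0]
    }
}
```

The memory cost grows linearly with the number of values, which is the limitation the two approximate strategies are meant to avoid.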
Example usage:
_TODO_
See Also: KnownDoubleQuantileEstimator, UnknownDoubleQuantileEstimator
Constructor Summary  
protected QuantileFinderFactory()
    Makes this class non-instantiable.
Method Summary  
protected static long[] known_N_compute_B_and_K_quick(long N, double epsilon)
    Computes the number of buffers and number of values per buffer such that quantiles can be determined with a guaranteed approximation error no more than epsilon.
protected static long[] known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
    Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static long[] known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
    Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static DoubleQuantileFinder newDoubleQuantileFinder(boolean known_N, long N, double epsilon, double delta, int quantiles, RandomElement generator)
    Returns a quantile finder that minimizes the amount of memory needed under the user provided constraints.
static cern.colt.list.DoubleArrayList newEquiDepthPhis(int quantiles)
    Convenience method that computes phi's for equidepth histograms.
protected static long[] unknown_N_compute_B_and_K_raw(double epsilon, double delta, int quantiles)
    Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
static long[] unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles)
    Computes the number of buffers and number of values per buffer such that quantiles can be determined with an approximation error no more than epsilon with a certain probability.
Methods inherited from class java.lang.Object 
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Constructor Detail 
protected QuantileFinderFactory()
Method Detail 
public static long[] known_N_compute_B_and_K(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
N - the number of values over which quantiles shall be computed (e.g. 10^6).
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
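The parameter descriptions do not spell out how the approximation error epsilon is measured. A common reading, which matches the definition used in the Manku, Rajagopalan and Lindsay papers cited in the class description, is in terms of rank: a value is an epsilon-approximate phi-quantile of N values if it occupies some rank r in the sorted sequence with |r - phi*N| <= epsilon*N. The sketch below checks that condition by brute force. It is an illustration under that assumption, with hypothetical names, and is not part of this package.

```java
import java.util.Arrays;

final class EpsilonApproxCheck {
    /**
     * Returns true if candidate occupies some 1-based rank r in the sorted data
     * with |r - phi * n| <= epsilon * n (the rank-based reading assumed here).
     */
    static boolean isEpsilonApproximate(double[] values, double candidate,
                                        double phi, double epsilon) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double lowestRank = (phi - epsilon) * n;
        double highestRank = (phi + epsilon) * n;
        for (int i = 0; i < n; i++) {
            int rank = i + 1;
            if (sorted[i] == candidate && rank >= lowestRank && rank <= highestRank) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1; // the values 1..1000, so each value equals its rank
        }
        // With phi = 0.5 and epsilon = 0.01, ranks within about 10 of 500 are acceptable.
        System.out.println(isEpsilonApproximate(data, 505, 0.5, 0.01)); // true
        System.out.println(isEpsilonApproximate(data, 520, 0.5, 0.01)); // false
    }
}
```

Under this reading, epsilon = 0.001 on N = 10^6 values means the returned element may be off by at most 1000 positions in the sorted order.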
protected static long[] known_N_compute_B_and_K_quick(long N, double epsilon)
N - the anticipated number of values over which quantiles shall be determined.
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
protected static long[] known_N_compute_B_and_K_slow(long N, double epsilon, double delta, int quantiles, double[] returnSamplingRate)
N - the anticipated number of values over which quantiles shall be computed (e.g. 10^6).
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
public static DoubleQuantileFinder newDoubleQuantileFinder(boolean known_N, long N, double epsilon, double delta, int quantiles, RandomElement generator)
known_N - specifies whether the number of elements over which quantiles are to be computed is known or not.
N - if known_N==true, the number of elements over which quantiles are to be computed. If known_N==false, the upper limit on the number of elements over which quantiles are to be computed. If such an upper limit is a priori unknown, then set N = Long.MAX_VALUE.
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To avoid probabilistic answers, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
generator - a uniform random number generator. Set this parameter to null to use a default generator.
public static cern.colt.list.DoubleArrayList newEquiDepthPhis(int quantiles)
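For an equi-depth histogram with a given number of bins, the natural choice of phis is the evenly spaced fractions i/quantiles for i = 1, ..., quantiles-1, so that each bin holds roughly the same number of values. The sketch below computes that list under this assumption. It returns a plain double[] rather than a cern.colt.list.DoubleArrayList to stay self-contained, and the class name is hypothetical.

```java
import java.util.Arrays;

final class EquiDepthPhisSketch {
    /**
     * Returns the fractions i / quantiles for i = 1 .. quantiles - 1,
     * i.e. the bin boundaries of an equi-depth histogram with that many bins.
     */
    static double[] equiDepthPhis(int quantiles) {
        if (quantiles < 1) {
            throw new IllegalArgumentException("quantiles must be >= 1");
        }
        double[] phis = new double[quantiles - 1];
        for (int i = 1; i <= quantiles - 1; i++) {
            phis[i - 1] = i / (double) quantiles;
        }
        return phis;
    }

    public static void main(String[] args) {
        // Quartile boundaries: prints [0.25, 0.5, 0.75]
        System.out.println(Arrays.toString(equiDepthPhis(4)));
    }
}
```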
public static long[] unknown_N_compute_B_and_K(double epsilon, double delta, int quantiles)
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To get exact results, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.
protected static long[] unknown_N_compute_B_and_K_raw(double epsilon, double delta, int quantiles)
epsilon - the approximation error which is guaranteed not to be exceeded (e.g. 0.001) (0 <= epsilon <= 1). To get exact results, set epsilon=0.0.
delta - the probability that the approximation error is more than epsilon (e.g. 0.0001) (0 <= delta <= 1). To get exact results, set delta=0.0.
quantiles - the number of quantiles to be computed (e.g. 100) (quantiles >= 1). If unknown in advance, set this number large, e.g. quantiles >= 10000.

