public class MultivariateOnlineSummarizer extends Object implements MultivariateStatisticalSummary, scala.Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector
format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged to produce a statistical summary of the corresponding joint dataset.
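To illustrate how two partial summaries can be combined into one for the joint dataset, here is a minimal single-column sketch in plain Java. It is not Spark's implementation: the class `ColumnSummary`, its fields, and the pairwise combination formula used here are assumptions chosen for illustration, and the variance is assumed to be the unbiased (n - 1) estimate.

```java
/** Hypothetical single-column summary for illustration; not part of the Spark API. */
final class ColumnSummary {
    long n;        // number of samples seen so far
    double mean;   // running mean
    double m2;     // running sum of squared deviations from the mean

    /** Incremental (Welford-style) update with one value. */
    ColumnSummary add(double x) {
        n += 1;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        return this;
    }

    /** In-place merge: folds the other summary into this object and returns it. */
    ColumnSummary merge(ColumnSummary other) {
        if (other.n == 0) {
            return this;
        }
        if (n == 0) {
            n = other.n;
            mean = other.mean;
            m2 = other.m2;
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        // Within-group spread plus the spread between the two group means.
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        mean += delta * other.n / total;
        n = total;
        return this;
    }

    /** Unbiased sample variance (assumed convention); 0 with fewer than two samples. */
    double variance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

Merging a summary of {1, 2} with a summary of {3, 4, 5} this way yields the same mean and variance as summarizing all five values in one pass, which is what makes per-partition summaries combinable.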
A numerically stable algorithm is implemented to compute the sample mean and variance (reference: variance-wiki).
Zero elements (including explicit zero values) are skipped when calling add(),
to have time complexity O(nnz) instead of O(n) for each column.
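The idea of skipping zeros during add() can be sketched as follows. This is a hedged illustration in plain Java rather than the library's code: the class `SparseOnlineStats`, the parallel `indices`/`values` representation of a sparse sample, and the way zeros are folded back in at query time are all assumptions. The update touches only stored nonzero entries, and the implicit zeros are accounted for when the mean or variance is requested.

```java
/** Hypothetical sketch; names and layout are assumptions, not Spark's internals. */
final class SparseOnlineStats {
    private final long[] nnz;     // nonzero count per column
    private final double[] mean;  // running mean of the nonzero values per column
    private final double[] m2;    // squared deviations of the nonzero values per column
    private long total;           // total number of samples, zeros included

    SparseOnlineStats(int numCols) {
        nnz = new long[numCols];
        mean = new double[numCols];
        m2 = new double[numCols];
    }

    /** Adds one sample given as parallel (index, value) arrays; work is proportional to the stored entries. */
    void add(int[] indices, double[] values) {
        for (int k = 0; k < indices.length; k++) {
            double x = values[k];
            if (x == 0.0) {
                continue;  // explicitly stored zeros are skipped as well
            }
            int j = indices[k];
            nnz[j]++;
            double delta = x - mean[j];
            mean[j] += delta / nnz[j];
            m2[j] += delta * (x - mean[j]);
        }
        total++;
    }

    /** Mean over all samples: rescale the nonzero mean by nnz / total. */
    double mean(int j) {
        return total == 0 ? 0.0 : mean[j] * nnz[j] / total;
    }

    /** Unbiased variance (assumed convention): combine the nonzero group with the implicit zeros. */
    double variance(int j) {
        if (total < 2) {
            return 0.0;
        }
        double zeros = total - nnz[j];
        double between = mean[j] * mean[j] * nnz[j] * zeros / total;
        return (m2[j] + between) / (total - 1);
    }
}
```

For example, adding the samples (1, 0, 2), (3, 0, 0), and (0, 0, 4) visits only five stored entries (one of them an explicit zero that is skipped), yet column 0 still reports the mean and variance of {1, 3, 0}.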
| Constructor and Description |
|---|
| MultivariateOnlineSummarizer() |
| Modifier and Type | Method and Description |
|---|---|
| MultivariateOnlineSummarizer | add(Vector sample): Add a new sample to this summarizer, and update the statistical summary. |
| long | count(): Sample size. |
| Vector | max(): Maximum value of each column. |
| Vector | mean(): Sample mean vector. |
| MultivariateOnlineSummarizer | merge(MultivariateOnlineSummarizer other): Merge another MultivariateOnlineSummarizer, and update the statistical summary. |
| Vector | min(): Minimum value of each column. |
| Vector | normL1(): L1 norm of each column. |
| Vector | normL2(): Euclidean magnitude of each column. |
| Vector | numNonzeros(): Number of nonzero elements (including explicitly presented zero values) in each column. |
| Vector | variance(): Sample variance vector. |
public MultivariateOnlineSummarizer add(Vector sample)
Add a new sample to this summarizer, and update the statistical summary.
Parameters: sample - The sample in dense/sparse vector format to be added into this summarizer.

public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
Merge another MultivariateOnlineSummarizer, and update the statistical summary. (The merge is done in place, so this object will be modified.)
Parameters: other - The other MultivariateOnlineSummarizer to be merged.

public Vector mean()
Sample mean vector.
Specified by: mean in interface MultivariateStatisticalSummary

public Vector variance()
Sample variance vector.
Specified by: variance in interface MultivariateStatisticalSummary

public long count()
Sample size.
Specified by: count in interface MultivariateStatisticalSummary

public Vector numNonzeros()
Number of nonzero elements (including explicitly presented zero values) in each column.
Specified by: numNonzeros in interface MultivariateStatisticalSummary

public Vector max()
Maximum value of each column.
Specified by: max in interface MultivariateStatisticalSummary

public Vector min()
Minimum value of each column.
Specified by: min in interface MultivariateStatisticalSummary

public Vector normL2()
Euclidean magnitude of each column.
Specified by: normL2 in interface MultivariateStatisticalSummary

public Vector normL1()
L1 norm of each column.
Specified by: normL1 in interface MultivariateStatisticalSummary
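
Note that normL1() and normL2() are taken down each column across all samples, not across the entries of a single sample vector. The following plain-Java sketch shows that reading on a small dense matrix; the class `ColumnNorms` and its `of` method are hypothetical helpers for illustration and do not exist in Spark.

```java
/** Hypothetical helper for illustration; not part of the Spark API. */
final class ColumnNorms {
    /**
     * Returns a two-row array: row 0 holds the L1 norm of each column,
     * row 1 holds the L2 (Euclidean) norm of each column.
     * Each inner array of samples is one sample; all samples must have equal length.
     */
    static double[][] of(double[][] samples) {
        int cols = samples.length == 0 ? 0 : samples[0].length;
        double[] l1 = new double[cols];
        double[] sumSq = new double[cols];
        for (double[] row : samples) {
            for (int j = 0; j < cols; j++) {
                l1[j] += Math.abs(row[j]);      // sum of absolute values per column
                sumSq[j] += row[j] * row[j];    // sum of squares per column
            }
        }
        double[] l2 = new double[cols];
        for (int j = 0; j < cols; j++) {
            l2[j] = Math.sqrt(sumSq[j]);
        }
        return new double[][] {l1, l2};
    }
}
```

With the samples (3, -1) and (4, 2), the first column has L1 norm 7 and L2 norm 5, since |3| + |4| = 7 and sqrt(9 + 16) = 5.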