A Crystal shard to perform descriptive statistics and sampling on popular distributions

statistics

A statistical library to perform descriptive statistics and generate random values based on popular probability distributions.

Installation

Add the dependency to your shard.yml:

dependencies:
  statistics:
    github: lbarasti/statistics

Run shards install

Usage

require "statistics"

Descriptive statistics

You can compute mean, variance and standard deviation of a collection as follows.

include Statistics

x = [1, 10, 7]
mean(x) # 6
var(x) # 14
std(x) # 3.7416...
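The figures in the comments above are consistent with var being the population variance (dividing by n rather than n - 1). The sketch below recomputes them using only Crystal's standard library. The helper names plain_mean and population_var are hypothetical and are not part of the shard, whose implementation may differ.

```crystal
# Hypothetical helpers, standard library only; not the shard's implementation.
def plain_mean(values : Array(Int32)) : Float64
  values.sum / values.size # Int32 / Int32 is float division in Crystal
end

# Assumption: population variance, i.e. the squared deviations are divided by n.
def population_var(values : Array(Int32)) : Float64
  m = plain_mean(values)
  values.sum { |v| (v - m) ** 2 } / values.size
end

x = [1, 10, 7]
puts plain_mean(x)                # 6.0
puts population_var(x)            # (25 + 16 + 1) / 3 = 14.0
puts Math.sqrt(population_var(x)) # roughly 3.7417
```

Dividing by n - 1 instead would give a variance of 21 for the same data, so the difference matters for small samples.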

If you'd like to know more about your dataset, you can describe it:

x = (1..1000).map { rand }.to_a # a uniformly distributed dataset
describe(x)
# {
#   mean: 0.48, var: 0.08, std: 0.28, 
#   skewness: 0.04, kurtosis: 1.81, 
#   min: 0.01, middle: 0.49, max: 0.99, 
#   q1: 0.24, median: 0.49, q3: 0.73
# }

Statistics.describe returns a NamedTuple, so you can extract any value via indexing:

stats = describe(x)
stats[:q1] # returns the first quartile of your sample
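Because the result is a NamedTuple, derived figures such as the interquartile range can be computed directly from its entries. The sketch below uses a hand-written NamedTuple as a stand-in for the value returned by describe, filled with the rounded figures from the sample output above, so that it runs without the shard.

```crystal
# Stand-in for the value returned by describe (assumption: same keys as the
# sample output above; the numbers are the rounded figures from that output).
stats = {q1: 0.24, median: 0.49, q3: 0.73}

# Interquartile range: the spread of the middle half of the sample.
iqr = stats[:q3] - stats[:q1]
puts iqr

# NamedTuple#each yields each key together with its value.
stats.each do |name, value|
  puts "#{name}: #{value}"
end
```

Indexing a NamedTuple with a symbol literal that is not one of its keys is rejected at compile time, so a typo such as stats[:q5] is caught before the program runs.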

For a complete list of the statistical functions provided, including quantile, moment and skew, check out the docs.

Sampling

To work with distributions, import the Distributions namespace as follows.

include Statistics::Distributions

Now, here is how we sample values from a normal distribution with mean = 1.5 and std = 0.2.

Normal.new(1.5, 0.2).rand

We can generate a lazy iterator of normally distributed random values as follows. Call to_a on it to collect the values into an array.

gen = Normal.new(1.5, 0.2)
1000.times.map { gen.rand }
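In Crystal's standard library, Int#times without a block returns an Iterator, and mapping over it stays lazy: no value is drawn until the iterator is consumed. The sketch below illustrates this with the top-level rand as a stand-in for gen.rand, so that it runs without the shard.

```crystal
# Stand-in for `gen.rand`: the top-level `rand` returns a Float64 in [0, 1).
lazy_samples = 1000.times.map { rand } # an Iterator; nothing is drawn yet

# Consume only the first three values.
first_three = lazy_samples.first(3).to_a
puts first_three.size # 3

# Build a fresh iterator and materialise all of its values into an Array.
all_samples = 1000.times.map { rand }.to_a
puts all_samples.size # 1000
```

An iterator is consumed as it is read, so build a new one whenever a full set of values is needed.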

Supported distributions

The following distributions are supported:

  • Constant
  • Exponential
  • Normal
  • Poisson
  • Uniform

Don't see your favourite one on the list? Just fork the repo, add your distribution to the distributions.cr file, and open a PR.
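As a rough idea of what a contribution might look like, here is a standalone sketch of a Bernoulli distribution exposing the same rand method used on Normal above. The class name, constructor and return type are hypothetical, and the base type and conventions used in distributions.cr are not shown here, so check that file before adapting it.

```crystal
# Hypothetical example, standard library only. It does not extend any of the
# shard's types; align it with distributions.cr before opening a PR.
class Bernoulli
  def initialize(@p : Float64)
    raise ArgumentError.new("p must be in [0, 1]") unless (0.0..1.0).includes?(@p)
  end

  # Returns 1 with probability p and 0 otherwise.
  # Random.rand returns a Float64 in [0, 1).
  def rand : Int32
    Random.rand < @p ? 1 : 0
  end
end

coin = Bernoulli.new(0.5)
flips = Array.new(10) { coin.rand }
puts flips.sum # number of successes in 10 trials
```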

Development

This shard is a work in progress. Everyone's contribution is welcome.

The guiding principle at this stage is:

make it work before you make it right

In this context, that means focusing on usability and correctness rather than on benchmarks and performance.

Contributing

  1. Fork it (https://github.com/lbarasti/statistics/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors