scipy/doc/source/tutorial/stats/sampling_dau.rst

133 lines
4.6 KiB
ReStructuredText

.. _sampling-dau:
Discrete Alias Urn (DAU)
========================
.. currentmodule:: scipy.stats.sampling
* Required: probability vector (PV) or the PMF along with a finite domain
* Speed:
* Set-up: slow (linear with the vector-length)
* Sampling: very fast
DAU samples from distributions with arbitrary but finite probability vectors
(PV) of length N. The algorithm is based on an ingenious method by A.J.
Walker and requires a table of size (at least) N. It needs one random number
and only one comparison for each generated random variate. The setup time for
constructing the tables is O(N).
>>> import numpy as np
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>>
>>> pv = [0.18, 0.02, 0.8]
>>> urng = np.random.default_rng()
>>> rng = DiscreteAliasUrn(pv, random_state=urng)
>>> rng.rvs()
0 # may vary
By default, the probability vector is indexed starting at 0. However, this
can be changed by passing a ``domain`` parameter. When ``domain`` is given
in combination with the PV, it has the effect of relocating the
distribution from ``(0, len(pv))`` to ``(domain[0]``, ``domain[0] + len(pv))``.
``domain[1]`` is ignored in this case.
>>> rng = DiscreteAliasUrn(pv, domain=(10, 13), random_state=urng)
>>> rng.rvs()
12 # may vary
The method also works when no probability vector but a PMF is given.
In that case, a bounded (finite) domain must also be given either by
passing the ``domain`` parameter explicitly or by providing a ``support``
method in the distribution object:
>>> class Distribution:
... def __init__(self, c):
... self.c = c
... def pmf(self, x):
... return x**self.c
... def support(self):
... return (0, 10)
...
>>> dist = Distribution(2)
>>> rng = DiscreteAliasUrn(dist, random_state=urng)
>>> rng.rvs()
10 # may vary
.. plot::
:alt: " "
>>> import matplotlib.pyplot as plt
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>> class Distribution:
... def __init__(self, c):
... self.c = c
... def pmf(self, x):
... return x**self.c
... def support(self):
... return (0, 10)
...
>>> dist = Distribution(2)
>>> urng = np.random.default_rng()
>>> rng = DiscreteAliasUrn(dist, random_state=urng)
>>> rvs = rng.rvs(1000)
>>> fig = plt.figure()
>>> ax = fig.add_subplot(111)
>>> x = np.arange(1, 11)
>>> fx = dist.pmf(x)
>>> fx = fx / fx.sum()
>>> ax.plot(x, fx, 'bo', label='true distribution')
>>> ax.vlines(x, 0, fx, lw=2)
>>> ax.hist(rvs, bins=np.r_[x, 11]-0.5, density=True, alpha=0.5, color='r',
... label='samples')
>>> ax.set_xlabel('x')
>>> ax.set_ylabel('PMF(x)')
>>> ax.set_title('Discrete Alias Urn Samples')
>>> plt.legend()
>>> plt.show()
.. note:: As :class:`~DiscreteAliasUrn` expects PMF with signature
``def pmf(self, x: float) -> float``, it first vectorizes the
PMF using ``np.vectorize`` and then evaluates it over all the
points in the domain. But if the PMF is already vectorized,
it is much faster to just evaluate it at each point in the domain
and pass the obtained PV instead along with the domain.
For example, ``pmf`` methods of SciPy's discrete distributions
are vectorized and a PV can be obtained by doing:
>>> from scipy.stats import binom
>>> from scipy.stats.sampling import DiscreteAliasUrn
>>> dist = binom(10, 0.2) # distribution object
>>> domain = dist.support() # the domain of your distribution
>>> x = np.arange(domain[0], domain[1] + 1)
>>> pv = dist.pmf(x)
>>> rng = DiscreteAliasUrn(pv, domain=domain)
Domain is required here to relocate the distribution.
The performance can be slightly influenced by setting the size of the used
table which can be changed by passing a ``urn_factor`` parameter.
>>> # use a table twice the length of PV.
>>> urn_factor = 2
>>> rng = DiscreteAliasUrn(pv, urn_factor=urn_factor, random_state=urng)
>>> rng.rvs()
2 # may vary
.. note:: It is recommended to keep this parameter under 2.
Please see [1]_ and [2]_ for more details on this method.
References
----------
.. [1] UNU.RAN reference manual, Section 5.8.2,
"DAU - (Discrete) Alias-Urn method",
http://statmath.wu.ac.at/software/unuran/doc/unuran.html#DAU
.. [2] A.J. Walker (1977). An efficient method for generating discrete
random variables with general distributions, ACM Trans. Math.
Software 3, pp. 253-256.