 # Python's random or NumPy's random?

I’m working on a stochastic model which heavily relies on random sampling from a list of items. Normally I turn to NumPy for this with its `numpy.random.choice()` function. To my surprise, it turned out that `random.sample()` (and `random.choice()`) in Python’s standard library is quite a bit faster:

```
import numpy as np

x = np.arange(1000)
%timeit np.random.choice(x, replace=False)
```

gives me 16.2 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each), while:

```
import random

x = list(range(1000))
%timeit random.sample(x, 1)
```

gives me 2.16 µs ± 32.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each).
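For reference, roughly the same comparison can be reproduced without IPython using the standard `timeit` module. This is just a sketch of the setup above; the exact numbers will of course vary from machine to machine:

```
import timeit

setup_np = "import numpy as np; x = np.arange(1000)"
setup_py = "import random; x = list(range(1000))"

# Time a single draw without replacement, 100,000 times each.
t_np = timeit.timeit("np.random.choice(x, replace=False)", setup=setup_np, number=100_000)
t_py = timeit.timeit("random.sample(x, 1)", setup=setup_py, number=100_000)

print(f"np.random.choice: {t_np / 100_000 * 1e6:.2f} µs per call")
print(f"random.sample:    {t_py / 100_000 * 1e6:.2f} µs per call")
```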

Does anyone have an idea of what might be going on?


Ah, deleting my own message leaves a stub behind. I thought I couldn’t reproduce the problem, but I had the wrong boolean for `replace`, so never mind.


Interesting observation, thanks!

`random.sample()` uses two different approaches depending on the sample size. From `random.py`:

```
# When the number of selections is small compared to the
# population, then tracking selections is efficient, requiring
# only a small set and an occasional reselection.  For
# a larger number of selections, the pool tracking method is
# preferred since the list takes less space than the
# set and it doesn't suffer from frequent reselections.
```

This seems to work really well for small sample sizes. With larger samples the difference starts to shrink and eventually flips around; for k = 200 I get:
numpy:

37.1 µs ± 942 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

random:

115 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
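To make that comment a bit more concrete, here is a rough sketch of the two strategies it describes: rejection sampling with a set of already-picked indices when k is small, and a partial shuffle of a copied pool when k is larger. This is only an illustration of the idea, not CPython’s actual implementation:

```
import random

def sample_small_k(population, k):
    # Small k: track chosen indices in a set and redraw on collision.
    # Collisions are rare when k is much smaller than the population.
    selected = set()
    result = []
    n = len(population)
    while len(result) < k:
        i = random.randrange(n)
        if i not in selected:
            selected.add(i)
            result.append(population[i])
    return result

def sample_large_k(population, k):
    # Larger k: copy the population into a pool and do a partial
    # Fisher-Yates shuffle, so no redraws are ever needed.
    pool = list(population)
    n = len(pool)
    result = []
    for i in range(k):
        j = random.randrange(n - i)
        result.append(pool[j])
        pool[j] = pool[n - i - 1]  # move an unchosen element into the used slot
    return result
```

The set-based path avoids copying the whole population for a handful of picks, while the pool-based path avoids the growing reselection cost once k gets large.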

If `random.sample()` is too fast for you, you can always use `random.choice()`, which is, for some unknown reason, 20 times slower.


I can confirm that with higher k values, NumPy catches up! Interesting. I should spend more time reading the docs.