Przejdź do głównej zawartości

C++11 random - introduction

C++11 introduced robust random numbers library, <random>. Its author does great introduction in the video from Cppcon:
I will attempt to create short tutorial/HOWTO for this library.

Bad, old times

Before C++11, when you wanted to get random numbers in range {1, 2, ... 10}, you wrote something like this:
#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
int main ()
{
    srand (time(NULL));
    int num1 = rand()%10 + 1;
    int num2 = rand()%10 + 1;
    return 0;
}
Several things are wrong here:
  • It always gives uniform distribution (1 is as probable as 5). This is usually fine, but when not, you need a lot of additional work.
  • Actually, it is not very uniform.
  • If you want for some reason to reseed the generator (call srand(time(NULL)) again) shortly after first seeding, you may get repeated answers
  • You only get discreet numbers (integers)
  • Random algorithm is not specified, so results are not portable/reproducible
  • %10 + 1 is not very pretty
  • a lot of thread safety problems

(Not) uniform distribution

Function rand() gives random number in range [0; RAND_MAX], where RAND_MAX is at least 32767 (my machine: 2147483647 = 0x7fffffff = max signed int). So this causes two problems
  1. If you want wide range of random numbers (more than 32767), it is not guaranteed that you will get them
  2. If your range is not power of 2 (10 isn't), your distribution will be skewed toward lower numbers.
What causes this skew? If RAND_MAX was 32700, then operation rand()%10 wold work fine. But for 32767, numbers {0, 1, ... 7} are more probable than {8, 9}.

API limitations

Generating random integer with uniform distribution is quite often needed, but sometimes it is not sufficient.
Perhaps you want random double, not integer?
Or maybe you don't really want every number to have same probability? This is actually often needed.
For example you may want to generate random value for human height. Wikipedia claims that tallest man on earth was Robert Waldow, with height of 272cm (8 ft 11.1 in). Peter Dinklage (Tyrion Lannister) is 135 cm (4 ft 5 in) tall. So random height should be
auto height = 135 + rand() % (272-135);
Only now, it is equally possible to generate number in range 170-190 (normal height) as in range 250-270 (extreme height). For our random variable to match real world, it should favor values around 176 cm (average male height in U.S.).
This is what Normal Distribution is about. Statistics show, that U.S. males have height avarage 175.9 cm with standard deviation 7.4 cm (https://tall.life/height-percentile-calculator-age-country/). This means that majority (68.2%) of randomly returned values should be within range 168,5-183,3.
It is possible to achieve such distribution with rand(), but is quite complicated mathematical procedure.

Modern world of C++11

Since C++11 we are encouraged to use <random> library. This is how to do it in good and simple way:
#include <random>   // only 1 include
int random10() {
    // good, expensive randomness for seed
    static std::random_device rd;
    // cheaper source of randomnes, still good; seeded
    static std::mt19937 e(rd());
    // what numbers we want
    static std::uniform_int_distribution<int> d {1, 10};
    // combine randomness and distribution
    return d(e);                  
}
We could replace both rd and e with single static  default_random_engine, but we would gain little simplicy and lose a lot of quality.
Alternatively you could seed engine with time data, either by time() or even
unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
but again - this is not actually better. First, bad guys atacking your program can guess your seed and calculate all random numbers and second, if you reseed multiple times (reseed = srand()), many seeds may end up with the same value if your clock has wide time unit (time(NULL) has precision of 1 second, so it is actually easy to trigger this with multiple threads).

Engine

First we must create source of randomness, called engine. This can be achieved either by
  • hardware device that provides true randomness, or by
  • pseudo-random generator, that keeps in memory its state and is fully deterministic.
So our options are (among others):
  • std::random_device - truly random device (if available on computer; pseudo-random if not)
    // uses RDRND (Intel hardware RNG) or /dev/urandom
    std::random_device rd1a; 
    std::random_device rd1b("/dev/urandom");
    // much slower on Linux; rd2 may block 
    // if system runs out of entropy (randomness)
    std::random_device rd2("/dev/random");
    // how much entropy we _currently_ have.
    // Not always implemented
    cout << "entropy == " << rd2.entropy() << endl;
    
  • std::linear_congruential_engine (minstd_rand) or std::subtract_with_carry_engine (ranlux24_base) - light and fast pseudo random generator of reasonable quality
    // good seed, lightweight generator
    std::minstd_rand e1 { rd2() }; 
    // not bad, nanoseconds precis
    auto seed_ok = std::chrono::system_clock::now()
                   .time_since_epoch().count();
    std::minstd_rand e2 { seed_ok }; ion
    // not good; seconds precision
    auto seed_bad = time(nullptr); 
    std::minstd_rand e3 { seed_bad }; 
    //for deterministic simulations
    const auto seed_simulation = 4; 
    std::minstd_rand e4 { seed_simulation };
    
  • std::mersenne_twister_engine (mt19937) - good quality pseudo-random generator that is not very light
    //mersenne_twister is heavy; 
    //if you care enough to use it, use it properly
    std::mt19937 e5 { rd2() };
    // also fine if you cannot risk delay 
    // due to lack of entropy
    std::mt19937 e6 { rd1() };
    
Where name was given in brackets, it is template specialization that we should actually use (as opposed to filling numerous template arguments). Also most of the engines have version returning 32 or 64 bit values, but using smaller one does NOT mean you won't get random number greater than 0xFFFF'FFFF - that depends on distribution, not engine. Engines generate single random bits. The fact, that bits come in packs of 32 or 64 is only optimization detail that is dependent on compiler and platform.

Distribution

After selecting engine, we need to decide what distribution fits our needs. I expect that most frequently used ones will be:
  • uniform_int_distribution - e.g. uniform_int_distribution<int> dis(1, 10) will generate random int of value {1, 2, ... 10}
  • // int by default
    std::uniform_int_distribution<> die {1, 6};
    std::uniform_int_distribution<int> coin {0, 1};
    std::uniform_int_distribution<int> studentId 
        {MIN_ID, MAX_ID};
    
  • uniform_real_distribution - e.g. uniform_real_distribution<double> dis(0.0, 100.0) will generate double number in range [0; 100) (0 inclusive, 100 exclusive)
  • std::uniform_real_distribution<float> degree {0, 360};
    
  • normal_distribution - for human height it could be normal_distribution<double> dis(175.9, 7.4), where arguments are of course mean value and standard deviation
  • // double by default
    std::normal_distribution<> d(avg, std); 
    std::normal_distribution<float> height(175.9, 7.4);
    std::normal_distribution<> studentScore(50, 15);
    
  • bernoulli_distribution - bernoulli_distribution dis(0.6) generates boolean true 60% of time, so can be used to simulate unfair coin toss
  • // bad coin gives bool true 90% of time
    std::bernoulli_distribution badCoin(0.9);
    
  • discrete_distribution - generates random integer in range [0; n) (n is number of possible answers) where probability of every integer is given as double. E.g. to pick random element of vector<Obj> vec, where element 1 is twice as likely as any of other 3 arguments, use discrete_distribution dis({2, 1, 1, 1}). Weights (probabilities of elements) don't have to sum to 100%, unless you want them to actually represent probabilities.
  • // int by default
    // random month (0-11), but probability
    // according to # of days
    std::discrete_distribution<> month 
        {31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    // sum of 2 dice minus 1; 
    // return 4 or 6 is 5 times more likely 
    // than return 0 or 12
    std::discrete_distribution<> doubleDice 
        {1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1};
    // we can use percentage values
    std::discreet_distribution<> percentageDistr
        { 60.0, 30.0, 10.0}; 
  • others are also useful, but I don't feel confident enough to give sensible examples for them. List of available distributions is here: http://en.cppreference.com/w/cpp/concept/RandomNumberDistribution
Every distribution gives either real number (float/double) or integers (int, long, long long, short). Also every distribution has default value type, either int or double that can be optionally changed with explicit template parameter.

Other skills

Shuffle

One feature of C++/STL that should be mentioned is std::random_shuffle and std::shuffle functions. They are actually located in <algorithm> and not in <random>. Both functions take a random access container (as iterators) and randomly rearrange values (obviously). They differ in how they handle (optional) source of randomness:
  • std::random_shuffle takes unary function
    std::random_shuffle ( foo.begin(), foo.end(), [](int i) {return rand()%i;});
  • std::shuffle takes generator from <random>
    shuffle (foo.begin(), foo.end(), std::default_random_engine(seed));
Shuffle can be used for example to pick several numbers randomly out of a given set without repetitions (like in lottery):
#include <algorithm>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <vector>
using namespace std;
 
void lottery() {
    // generate numbers 1..56 using std::iota
    vector<int> numbers(56);
    iota(numbers.begin(), numbers.end(), 1);
    // randomize their order
    shuffle(numbers.begin(), numbers.end(), 
            mt19937{random_device{}()});
    // pick 5 of them
    numbers.erase(numbers.begin()+5, numbers.end());
    
    for (auto i : numbers) 
        cout << i << " ";
    cout <<endl;
}

Deterministic results

Often when doing simulations, on one hand we need randomness for simulation to be realistic, on the other we need to be able to reproduce experiment with exactly same results. Simple solution is to hard-code seed value. This approach works and any engine (except of std::random_device) can take integer seed in constructor.
What changed in C++11 is that implementations of generators are standardized and not implementation dependent as was with rand(). It means, that our simulation is guaranteed to behave the same on any platform we deploy to. This may seem as obvious feature, but it was not simple to achieve before - 2 most popular compilers, GCC and VisualStudio had different pseudo random functions and therefore produced different simulation results.
cout << "old rand(): ";
// for me prints 83 86 77 15 93, 
// but may vary depending on compiler
srand(0);
for (int i=0; i<5; ++i) 
    cout << rand()%100 <<" ";
cout << "new std::random: ";
// always prints 54 59 71 84 60 
static mt19937 e {0);
static uniform_int_distribution<> d {0, 99};
for (int i=0; i<5; ++i) 
    cout << d(e) << " ";

Thread safety

Classes in <random> are NOT thread safe. However, writing thread-safe code is very easy. All we need to do, is make sure that no two threads are using the same random engine at the same time. Easiest way to achieve it, is to make the engine thread local.
#include <random>   // same include
/** this function may be used by different threads safely */
int random10() {
    // engines are expensive and not thread safe 
    // so  keep 1 engine for every thread that wants it
    thread_local std::mt19937 e{std::random_device{}()};
    // distributions are cheap, don't need to keep them
    // compiler will inline it anyway
    std::uniform_int_distribution<int> d {1, 10};
    // combine randomness and distribution
    return d(e);                  
}
If you don't know thread_local keyword, think of it as of static that is created separately for every thread that uses it, and is destroyed when thread finishes. Because every thread has its own engine, they will never interfere with one another.

Conclusion

I hope I persuaded you that new <random> library is great, and its srand()/rand() forbears are better left alone :-)

Komentarze

Popularne posty z tego bloga

GDB - beyond basics

GDB is console debugger that every Linux-using programmer heard about. It is not however easy to learn. Greg Law in his CppCon talk presented some of obscure, but useful features.

Text user interface Normally we use GDB with command line interface (CI). Beyond this, GDB has TUI based on Curses library. To activate it, use keyboard shortcut ctrl-x-a (hold ctrl, press x, unpress x, press a). Now you can see the code as you go through it.
ctrl-x-a - activate/deactivate TUIctrl-l - when screen gets messed up, use it to redraw. Happens when program prints to stdout/stderrctrl-p / ctrl-n - since you can't use arrows to reuse previously written command, use ctr-p/n instead of arrow up/down
ctrl-f / ctrl-n are arrows left / right
ctrl-a / ctrl-e are home / end (all those are copied from Emacs)ctrl-x-2 - second window (assembly). Shell You can run shell commands inside GDB command line. Just use keyword "shell" at the beginning. Examples:
shell psshell cat temporary_file.txtshell k…

<chrono> introduction

This article will summarize presentation by Howard Hinnant which he gave during CppCon 2016. Howard gives comprehensive walk through c++11 chrono library.



TL;DR (Summary) C++11 introduced <chrono> library for handling time. Use it instead of <time.h> / <ctime> because it gives you better type safety and precision.
Library provides those concepts:
time points - moments in time (as in "noon, December 1st 2000", "2016-01-31 14:15:32")time durations - "length" (amount) of time that passes between two time points ("two hours", "2563 milliseconds")clocks - providers of time points (like wall clock, stopper) Query clock to get time points; calculate difference between two time points to get time duration. You can use time points/duration with <thread> library for sleep, wait_for/wait_until functions.
Use auto, because otherwise type names are quite verbose.
Time Durationstd::time_duration is a measure of how much time…