An introduction to PRNGs and why they're bad for security.
You may have heard that Pseudo Random Number Generators (PRNGs) are an unsafe source of randomness for security related purposes, and if you're like me you may have struggled to understand how this could lead to a practical attack against applications that use PRNGs. This tutorial will explain why PRNGs are bad and demonstrate a practical attack against them.
What is a PRNG?
The first thing to understand about Pseudo Random Number Generators is that they're completely deterministic, this means that given the same starting conditions they will always produce the same string of "random" numbers. However they do appear random to the human eye, they're superficially unpredictable and they give an even distribution of numbers across a significantly large output.
PRNGs are seeded which means the random function is initialised with some initial value (the seed) and this decides what the random sequence will be, if you seed a PRNG with the same value you get the same series of random numbers, this series eventually repeats itself, the number of outputs before repetition occurs is called the period. The seed can sometimes be specified and in some languages the default seed will take information from the system such as the time or the system process ID.
PRNGs are typically used in applications where behaviour only needs to superficially appear random and it's not important that attackers are able to predict future values, typically they're built for speed and so can be used in real time simulation such as video games that need some degree of apparent randomness. This is very important because any function that is built around performing quickly can also be brute forced quickly.
It's worth noting at this stage that the seed values and the internal state of the PRNG can be different sizes, typically measured in bits. Because the initial state of the PRNG is decided by the seed the number of states that can be generated is limited by the total number of unique seeds. A PRNG with a large internal state of thousands of bits but with only a 32bit seed will only realise a small fraction of the total number of possible internal states.
It gets even worse when you consider that an initial seed may not come from a source that contains 32bits worth of unique information, something like a time stamp or a process ID is typically even smaller, thus reducing the number of possible states of the PRNG can produce to be much smaller than internally it's capable of generating.
How are random numbers used.
Typically random numbers are manipulated through some process to make the range of possible values fit the desired output, for example if you want a percentage output you'd normalize the random output into a range between 0 and 100, this is commonly done by restricting the range of the output of the PRNG to a variable of type float, between 0 and 1, then multiplying the float by whatever value you need to achieve the desired range, in the case of a percentage this would be x100, in the case of needing a random letter you'd multiply this by x25, this gives you 26 values between 0 and 25, you can then map each number to a letter to get plain text output derived from the numbers. Characters is often how the numbers are actually presented to the user in the form of some kind of security token, you'll find these being used for password reset tokens and CSRF tokens.
The basic theory behind attacks fall into two categories.
Firstly is that if you know the PRNG being used and you can determine the seed value then you're able to generate the entire string of random numbers, you can then compare some sample of random output from the application and determine where in the current series the application is and use that to generate the future random values. For practical attacks against PHP applications using these methods I'd highly recommend this talk from Blackhat 2012 by George Argyros and Aggelos Kiayias from the University of Athens.
In the cases where the seed is unknown you can use brute force attacks, you sample some output of the application you wish to attack, you run them backwards through the same maths that processed the random value into something useful, and you determine what the seed is. From there you follow the same process of generating all values for that seed and working out the current applications place in the series, again allowing you to predict future outputs. For practical attacks against many different types of PRNGs using this method I'd also recommend this talk from Blackhat 2013 Derek Soeder, Christopher Abad and Gabriel Acevedo of Cylance.com. You can also find their white paper here.
In the talk at Blackhat 2013 that I previously mentioned, a proof of concept tool was released called Prangster, you can download it here, it's open source and written in C#. They include instructions for compiling it on the website, it requires windows and Microsoft .NET 2.0, or can be built with Mono for Linux/Mac.
The attack methodology is quite straight forward.
1) Find an application that creates some kind random output, this could be something like a security token used to reset passwords or account credentials.
2) Collect samples, you need to distinguish between static parts of the output and the parts which are pseudo random, for example if you're analysing randomly generated passwords and each password contains some static component then ignore that. Try and determine a unique list of all characters used in the output, is it lower case alpha only, is it alpha-numeric, etc. This is needed later to build an alphabet.
3) Determine the type of PRNG being used, often this can be done by determining the platform which the application runs on, if it's ASP.NET for example then there's a good chance the Microsoft random() class is being used. You also need to guess how the PRNG maps the output in numbers to the characters you see in your output, if the application is open source or the source code is available you can determine this directly, otherwise it's best guess.
4) Use Prangster to analyse the samples you've recovered, determine the seed that created this string of random numbers, then use this seed to generate all the random numbers the application is using, and use that to predict future randomness.
Once compiled you can run Prangster from the command line, it has 3 basic modes you can use.
r - Recovers the seed that generate the input. It requires the PRNG type as a parameter, a string of output that you're sure was generated in that order, and an alphabet. The alphabet is the mapping of numbers to characters, for something simple like an output that only contains lower case alpha your alphabet would probably look something like this "abcdefghijklmnopqrstuvwxyz". If you're lucky this will return the seed value used to generate the random series.
g - Reproduces a series of outputs given a specific seed and a length of outputs, it takes parameters of the PRNG type, the alphabet, the seed value (learnt from -r) and the length of output you wish to generate.
s - Seeks a series given some initial seed and an offset, then returns the seed which represents the new state, it takes the parameters for the PRNG type, the seed value and the offset amount.
If you run Prangster without any parameters it will echo the usage to the screen.
Let's consider an example, let's say we're attacking an ASP.NET application which is generating unique password reset tokens and these tokens appear to use upper and lower case characters only, you might try the following commands in Prangster
<some collected output> | Prangster.exe r PrngDotNet abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
If this didn't get you a result, it's possible the alphabet is wrong, this is just best guess on how the developers mapped the random() output to readable characters, it could be reversed, for example
<some collected output> | Prangster.exe r PrngDotNet ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Some trial an error is likely to be required, especially if you do not have access to the source code, luckily Prangster runs quite quickly in most circumstances allowing you to try multiple alphabets quite quickly. If my explanation of Prangsters usage is hard to follow then I suggest watching them demo a real attack scenario in their Blackhat presentation starting at 32:40.
A final note of respect to the team at Cylance who put this together, this is an impressive tool which can predict randomness in many applications by simply analysing the output of those systems, in this respect it's completely black box analysis. Their white paper goes in to more depth about the optimisations they used to speed up brute force attacks which can be found here.
This section is an update to this post to elaborate on the mitigation for these attacks. The solution is simple and it's mainly through ignorance it's not used more often, there is already a class of RNGs that produce Cryptographically secure random numbers (CSRNGs), Linux comes with dev/random and dev/urandom which get their randomness straight from the Linux kernel. The kernel has access to various devices attached to the system through the device drivers, this allows collection of envionmental noise which becomes the source of random number generation, rather than a static list. In windows the equivalent is CryptGenRandom.