white spy  PPP Logo

GRC's Open, Ultra-High Security,
One Time Password System
  black spy


Character Extraction Statistical Analysis

During the development of the variable character set version 3 of PPP, an astute GRC newsgroup participant, Paul Wagenaars, noted that even given a perfectly random source, it was fundamentally impossible to obtain an exactly even distribution of resulting characters when the size of the character set was not an even multiple of the number of possible random values.

Perhaps the most intuitive way of visualizing this is as a "partitioning" of all possible random values into the number of required character "bins". Imagine that our "source of randomness" returns one of sixteen values randomly, which we wish to reduce into four evenly distributed values. That "partitioning" can be depicted with this diagram:

partition1

The critical property to note above is that all four "partitions" contain the same number of random values. So if our random source chooses one of sixteen values randomly, they will fall into one of four partitions, each with equal likelihood.

But we get into trouble if, for example, we wanted a sixteen-value random source of yield one of five values. We can use the partitioning concept and diagram to clarify the problem:

partition2

Since sixteen is not evenly divisible by five, one of the partitions is forced to take on an additional value as shown above. This forces an uneven distribution of outcomes and results in the larger partition having a 4/16th probability of being chosen compared with the others which each have a 3/16ths chance of occurring.

One solution to this problem would be to discard the value of 16 whenever it is randomly chosen, and retrieve another random value until one falling between 1 and 15 is obtained:

partition3

This solves the problem of uneven partition sizes. But this solution is impractical for the Perfect Paper Passwords system where we would like to be able to directly determine any passcode from its location in a continuous unbroken series. Skipping unsuitable randomly occurring values would dramatically complicate that process. So we face the question of determining how significant this uneven partitioning error is, and how it affects the PPP system's strength.

Not a Problem

It turns out that our 128-bit pseudo-random number source is so large, compared to the size of typical PPP Passcode alphabets and character set sizes, that the "plus or minus one" difference in partition sizes is much too small to be detected.

As we've seen on previous PPP pages, our 128-bit pseudo-random Rijndael source provides 2128, or 340, 282, 366, 920, 938, 463, 463, 374, 607, 431, 768, 211, 456 possible values. So, for example, assuming a PPP character set size of 35 characters, which will not divide evenly into 2128, we wind up with:

24 partitions of 9,722,353,340,598,241,813,239,274,498,050,520,327 elements . . . and . . .

11 partitions of 9,722,353,340,598,241,813,239,274,498,050,520,328 elements.

The ratio of the size of those huge partitions, whose size differs by one, is:

0.999999999999999999999999999999999999897

... which is so close to 1.0 as to be absolutely insignificant.

However . . . This is only the case for the extraction of the first passcode character. As we know, version 3 of the PPP system extracts the values of successive passcode characters from this single 128-bit pseudo-random value by successively dividing it by the size of the specified character set and capturing the remainder as the next passcode character's value. This means that the absolute magnitude of the dividend will be being reduced in size — by 1/35th in our 35-character example — for each passcode character extracted.

But, here again, we are starting with such huge partitions that any reasonable number of successive passcode characters can be extracted without the partitions shrinking to a point that the ratio of partition size differs significantly from 1.0.

A Real-World Demonstration

To further demonstrate the statistical strength of this system, 35 billion (35,000,000,000) 128-bit pseudo-random numbers were generated and each one was successively divided by 35 (25 times) to simulate the PPPv3 extraction process for a 35-character alphabet. After each division, the remainder from the division — ranging from 0 to 34 — determined the resulting character chosen, and the count for that character was incremented by one. This division process was repeated for a total of 25 divisions, until the entire 128-bit number had been exhausted by being "divided out" into 25 sets of 35 partitions. (This simulated the generation of 25-character passcodes which is much longer than would ever be employed in practice since passcodes will typically only be a few characters long.)

Given 35 billion pseudo-random numbers, each divided among 35 partitions, each partition would have been incremented approximately one billion times. If you are curious to examine the raw data you may download it with this link.

The charts below demonstrate the high degree of uniformity obtained through this system of successive division-extraction. The charts on the left show the number of counts of each character with a vertical axis origin of zero and scaled so that the value of 1.0 equals exactly one billion hits. The expanded charts on the right were scaled to show, with a magnification factor of ten-thousand, the region immediately surrounding the 1.0 result. This enables the extremely small variations in partition counts to be seen.


1st Character Extraction
stats01-23 scale zoom01


2nd Character Extraction
stats01-23 scale zoom01


3rd Character Extraction
stats01-23 scale zoom01


4th Character Extraction
stats01-23 scale zoom01


5th Character Extraction
stats01-23 scale zoom01


6th Character Extraction
stats01-23 scale zoom01


7th Character Extraction
stats01-23 scale zoom01


8th Character Extraction
stats01-23 scale zoom01


9th Character Extraction
stats01-23 scale zoom01


10th Character Extraction
stats01-23 scale zoom01


11th Character Extraction
stats01-23 scale zoom01


12th Character Extraction
stats01-23 scale zoom01


13th Character Extraction
stats01-23 scale zoom01


14th Character Extraction
stats01-23 scale zoom01


15th Character Extraction
stats01-23 scale zoom01


16th Character Extraction
stats01-23 scale zoom01


17th Character Extraction
stats01-23 scale zoom01


18th Character Extraction
stats01-23 scale zoom01


19th Character Extraction
stats01-23 scale zoom01


20th Character Extraction
stats01-23 scale zoom01


21st Character Extraction
stats01-23 scale zoom01


22nd Character Extraction
stats01-23 scale zoom01


23rd Character Extraction
stats01-23 scale zoom01


24th Character Extraction
stats01-23 scale zoom01


25th Character Extraction
zoom01


As you can see, the distribution of the first 22 extracted characters is highly uniform. But the extraction of the 23rd character begins to show a slight bias with the 35th partition being chosen somewhat less often than the other 34. And the 24th and 25th extractions clearly show the consequences of the overall dividend becoming small relative to the divisor and thus the probabilities becoming increasingly non-uniform.

This 35-billion iteration "Monte Carlo" simulation demonstrates the feasibility of using this approach to generate highly uniformly distributed short one-time use passcodes for any reasonable number of characters. And, in fact, it holds up well for the generation of longer high-strength permanent passwords.


The DLL code implementation page describes the operation of the PPP DLL cryptosystem code library which is available to anyone who wishes to play with, experiment with, or use the PPP technology for any purpose whatsoever.


Perfect Paper Password Pages:

Jump to top of page
Gibson Research Corporation is owned and operated by Steve Gibson.  The contents
of this page are Copyright (c) 2008 Gibson Research Corporation. SpinRite, ShieldsUP,
NanoProbe, and any other indicated trademarks are registered trademarks of Gibson
Research Corporation, Laguna Hills, CA, USA. GRC's web and customer privacy policy.
Jump to top of page

Last Edit: Feb 08, 2008 at 13:17 (287.40 days ago)Viewed 4 times per day