
SANS ISC: Theoretical and Practical Password Entropy - SANS Internet Storm Center SANS ISC InfoSec Forums

Theoretical and Practical Password Entropy

We got a number of submissions pointing to today's XKCD cartoon [1]. I think the cartoon is great, and illustrates a nice dilemma in password security. Yes, I know passwords don't work, but we still all use them and we still have to come up with reasonable passwords.

Even if you are using a password safe tool that comes up with new random passwords for each application and website, you still need to remember the password for the password safe, and there are a few applications (e.g. logging in to your system) that can't be covered by a password safe.

The basic dilemma is that you need to come up with a password that is hard for others to guess but easy enough for you to remember. Most password policies try to enforce a hard-to-guess password by forcing you to extend the range of characters from which you pick (different case letters, numbers, special characters). However, in real life, this may actually reduce the space of "memorable" passwords, or the total number of possible passwords.

Pass phrases, as suggested by the cartoon, are one solution. But once an attacker knows that you use a pass phrase, the key space is all of a sudden limited again. There has been some research showing that a library of 3-word phrases pulled from Wikipedia makes a decent dictionary to crack these passwords.

The quality of a password is usually expressed in "bits of entropy". The "bits of entropy" are the number of bits it would take to represent all possible passwords. Let's look at some common schemes:

a 4-digit PIN: 10,000 possible passwords, or 13.3 bits (log2(10,000) = 13.3)
12 characters using the full 95-character printable ASCII set: 5.4 × 10^23 possible passwords, or 78.8 bits (this is the current NIST recommendation)
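
The arithmetic behind these two figures can be checked with a few lines of Python. This is only a sketch: the function name entropy_bits is made up for illustration, and it assumes every character is picked uniformly at random, which is the best case.

```python
import math

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to index every string of `length` symbols
    drawn from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)

pin_bits = entropy_bits(10, 4)      # 4-digit PIN
ascii_bits = entropy_bits(95, 12)   # 12 printable ASCII characters

print(f"4-digit PIN: {10**4:,} passwords, {pin_bits:.1f} bits")
print(f"12 of 95 chars: {95**12:.1e} passwords, {ascii_bits:.1f} bits")
```

Human-chosen passwords will fall below these numbers, since people do not pick characters uniformly.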

Pass phrases are harder to evaluate. Their strength depends on the size of the user's vocabulary and, of course, the constraints of grammar. People will likely not choose random words, but a phrase that makes some sense to them. One model that can be used to obtain a pass phrase is called "Diceware", but it assumes random phrases from a list of 6^5 (7,776) words. With Diceware's 7,776 words, each word contributes about 12.9 bits, so you would need 6 words to arrive at 77.5 bits, close to the strength that NIST asks for.
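
As a rough check of the Diceware numbers, the sketch below computes the bits per word and the number of words needed for a given target. The helper names bits_per_word and words_needed are hypothetical, and the calculation only holds if each word really is selected at random (e.g. with dice).

```python
import math

DICEWARE_WORDS = 6 ** 5  # 7,776 words, one per roll of five dice

def bits_per_word(vocabulary_size: int) -> float:
    """Entropy contributed by one uniformly chosen word."""
    return math.log2(vocabulary_size)

def words_needed(target_bits: float, vocabulary_size: int) -> int:
    """Smallest number of random words reaching `target_bits`."""
    return math.ceil(target_bits / bits_per_word(vocabulary_size))

six_word_bits = 6 * bits_per_word(DICEWARE_WORDS)
print(f"bits per word: {bits_per_word(DICEWARE_WORDS):.1f}")
print(f"6 words: {six_word_bits:.1f} bits")
# Strictly exceeding 78.8 bits takes one more word than "close to" it.
print(f"words to reach 78.8 bits: {words_needed(78.8, DICEWARE_WORDS)}")
```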

What it all comes down to: How are people actually selecting passwords? People make pretty bad random number generators, in particular if you ask them to remember the result. A good password cracking algorithm takes this into account and tailors the password list based on password requirements and the target's background. For example, for web application pen testing, the simple Ruby script "cewl" will create a custom password list from words it finds on the target's website. In past tests, I was easily able to double my password cracking success using this technique compared to normal dictionaries.
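
To illustrate the idea of a site-specific word list, here is a minimal Python sketch. It is not how cewl itself is implemented; the function wordlist_from_html, the minimum word length of 5, and the sample page are all assumptions made up for this example. It pulls the visible text out of an HTML page and lists the distinct words, most frequent first.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def wordlist_from_html(html: str, min_length: int = 5) -> list[str]:
    """Return distinct lowercase words from the page, most frequent first."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[A-Za-z]+", " ".join(parser.chunks))
    counts = Counter(w.lower() for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common()]

# Hypothetical page content, standing in for a downloaded target page.
sample_page = (
    "<html><body><h1>Acme Widgets</h1>"
    "<p>Acme builds widgets in Springfield.</p>"
    "<script>var secret = 1;</script></body></html>"
)
print(wordlist_from_html(sample_page))
```

A real test would feed such a list to a cracking tool together with mangling rules (case changes, appended digits); that part is left out here.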

In order to solve this, we need to figure out what passwords people really use. How about asking them for their password and offering them a candy bar in return :). And then there is always another XKCD cartoon for you [2].



Johannes B. Ullrich, Ph.D.
SANS Technology Institute



4514 Posts
ISC Handler
Aug 10th 2011
But does the Diceware list really provide that much entropy? If you have a cracking utility that uses the same vocabulary as Diceware, combines the same 7,776 words into pass phrases, and hashes them, you should be able to match hashes pretty quickly. A Hydra-type attack on a login might be a somewhat different story.

15 Posts
How about using phrases but intentionally misspelling a word? I'm not talking about l33t speaking it, I'm talking about making "invisible astronaut hamburger" into "invasible astronaut hamburger" - that would break dictionary attacks.

93 Posts
I doubt that passphrase scheme (or any other) could work in practice -- because we need to create and recall a unique passphrase for every account on every website, of course ;)
Steven C.

171 Posts
Oh, and of course passphrases don't work with a ridiculous max. password length like 8 or 10 characters (I've even seen restrictions on character set, case-insensitivity, or even numeric-only). Or when a password can be reset just using one or two 'personal questions' that everyone knows the answers to, or a phone call to customer services saying 'I can't log in'
Steven C.

171 Posts
Steven, you're probably running up against a system with a mainframe backend. Despite what the mainframe proponents say about "mainframe security", the reality is that their password controls in general suck. The 2-year-old mainframe my company uses can only accept alphanumeric passwords with a maximum length of eight characters. Thanks, Unisys. We get beaten up regularly by our customers because your security controls suck.
"But once an attacker knows that you use a pass phrase, the key space is all of a sudden limited again."
There are perhaps ~1,000 common words.

If you pick 4 random ones, there are
1000 * 1000 * 1000 * 1000 = 1000^4 = 10^12 possible choices.
That's approximately 2^40, or about 40 bits worth of 'choice'.

The problem is that this amount is only the entropy if your selection actually _IS_ truly random, which means you use a computer to make a truly random selection of 4 words; don't pick words off the top of your head.

As a human, you are likely to have some bias in the words you would choose, which reduces the entropy; if you pick them without a randomizer, there will be less than 40 bits of entropy.
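
Letting a computer do the picking could look like the following Python sketch. It uses the standard library's secrets module, which draws from the operating system's cryptographically secure random source. DEMO_WORDS is a tiny made-up list just to keep the example short; a real list would hold the ~1,000 common words mentioned above, and the function names are hypothetical.

```python
import math
import secrets

# Assumption: placeholder list for illustration only. Eight words give
# just 3 bits each; substitute a real list of ~1,000 common words.
DEMO_WORDS = ["apple", "river", "candle", "orbit",
              "hammer", "violet", "pepper", "tunnel"]

def random_passphrase(wordlist: list[str], num_words: int = 4) -> str:
    """Pick each word independently and uniformly at random."""
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))

def passphrase_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy of the passphrase, valid only for uniform random picks."""
    return num_words * math.log2(wordlist_size)

print(random_passphrase(DEMO_WORDS))
print(f"4 words from 1,000: {passphrase_bits(1000, 4):.1f} bits")
print(f"4 words from this demo list: {passphrase_bits(len(DEMO_WORDS), 4):.1f} bits")
```

The entropy figure belongs to the selection process, not to any particular phrase, which is why re-rolling until you "like" the result quietly lowers it.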

A punctuation mark added, and a random letter capitalized somewhere in the passphrase also helps.

A combination of _both_ approaches is possible, without going overboard and making the password hard to remember.

146 Posts
I'm not sure any of this is of any real world relevance.

1. If you implement account lockouts, even if for ten or twenty attempts, entropy doesn't matter.

2. If they've swiped a password file, they've got all the time in the world without any lockout concerns and a graphics card password cracker can try billions of permutations per second. If it's taking too long, add in a few more graphics cards.
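
For the offline case, a back-of-the-envelope estimate might look like this Python sketch. The guess rate of one billion per second per card is an assumption picked only to match "billions per second" in order of magnitude, and the function name is made up. Real rates depend heavily on the hash algorithm. The entropy figures are the ones from the diary above, and the result is the average time to hit a password chosen uniformly at random, i.e. after searching half the key space.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def average_crack_seconds(entropy_bits: float,
                          guesses_per_second: float,
                          num_devices: int = 1) -> float:
    """Expected time to find a uniformly random password:
    half the key space divided by the combined guess rate."""
    keyspace = 2.0 ** entropy_bits
    return keyspace / 2 / (guesses_per_second * num_devices)

ASSUMED_RATE = 1e9  # assumption: guesses per second for one graphics card

for label, bits in [("4-digit PIN", 13.3),
                    ("6 Diceware words", 77.5),
                    ("12 random ASCII chars", 78.8)]:
    seconds = average_crack_seconds(bits, ASSUMED_RATE)
    print(f"{label}: {seconds:.3g} s ({seconds / SECONDS_PER_YEAR:.3g} years)")
```

Adding cards scales the time down linearly (num_devices), while each extra bit of entropy doubles it.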

Passwords are passé. The only reason companies use them is because they're free, and you get what you pay for.
I've got a spreadsheet that calculates permutations out to 25 characters. I've just added in entropy calculations, and NIST's recommendations actually do not often pan out to 78.8 bits in a practical environment because there will usually be a requirement of using more than one class of characters.

For using all of the character types, the max entropy is 70.3 bits at 12 characters. There are 1.48E21 possible permutations here. Assuming a random distribution, the random time to brute force at 3.3 billion passwords per second (achievable with current GPUs) is an average of about 24.8 GPU-years. To get that down to a month, you would need about 27,500 cards--not impossible, but not trivial, either. This increases to an average of 17,900 GPU-years at 13 characters.

I don't yet have complete subsets (e.g., 2 of 4 character classes) as I've not worked out the spreadsheet structure, but it still won't reach the 78.8 bits recommended by NIST. A quick calculation suggests 72.2 bits, but I'm not swearing on that number.

I've thought about doing some research into passphrase strengths where spelling is correct, but the calculations get daunting. The Oxford English Dictionary discusses word count just for English[1] and comes up with 250,000 to 750,000 words, depending on context. Then you get into sentence structure, vocabulary statistics, and regional variations, and it becomes clear why, when no other methods are available, passphrases are by far the better solution.

5 Posts
I agree with Jarrod and Jason. You could use passwords like "123456890isQUITEeasyTOremember!" (note the missing 7).

Have you counted only one form per word? I mean, there are permutations of words, like lowercase, uppercase, and first letter uppercase, which would at least triple the amount. Have you added whitespace? What if someone replaces whitespace with, let's say, dots or 'x' or whatever? You see, the complexity is even higher!
27 Posts
The number of words in the Oxford English Dictionary is not relevant here. What is relevant here is the number of words in the average person's vocabulary. That is only a few thousand at best.

Next you can decimate the permutations, because most people will choose a grammatically correct phrase, e.g. subject transitive-verb object, or subject intransitive-verb adjective. Also you might need to ban "I", "The" and "and".
How about using another dictionary?

For example,
"There are over 80,000 Chinese characters, but most of them are seldom used today. So how many Chinese characters do you need to know? For basic reading and writing of modern Chinese, you only need a few thousands. Here are the coverage rates of the most frequently used Chinese characters:

Most frequently used 1,000 characters: ~90% (Coverage rate)
Most frequently used 2,500 characters: 98.0% (Coverage rate)
Most frequently used 3,500 characters: 99.5% (Coverage rate)

For an English word, the Chinese translation (or the Chinese 'word') often consists of two or more Chinese characters. " [1]

Now, if you understand Chinese, you can use pinyin - an anglicized form of Chinese - to construct your passwords. Pinyin comes in two basic flavors, traditional (Taiwan) and simplified (China). The simplified pinyin is often used with a number at the end of each character to denote the phonetic "tone" of pronunciation. Therefore, the word for 'sunrise' could be combined with the word for 'bright' (ming2xu4) to easily express a 'bright sunrise' and easily defeat the Oxford adherents. Furthermore, if one chooses from the more arcane characters (remember, there are about 80k choices here) it would present a rather daunting challenge. By the way, there are variations on this theme if you go with the Cantonese rather than Mandarin pronunciation.


3 Posts
There is a revolutionary device that acts as a fairly good randomiser for choosing high-entropy passphrases: a book. Simply use my patented Flick-and-Point (TM) technique with a book of a reasonable length and a reasonably wide vocabulary, and easily create passphrases like:

"civilian flyer know fait"
"roughly channel Limited fools"
"be obviously Sensia then"
1 Posts
It is obvious that the second password is for a system that only allows lower case letters plus a space. Given that, the first password is much more difficult to crack using brute force.

63 Posts
@Mysid and Greg

What is your source for the average person's vocabulary? A quick Google search shows that, while there is no good answer, whatever it is, it is much more than a few thousand words. Here is a nice little article.

That said, I often have a problem when assigning passwords to people. I seem to use a lot of words my users don't know. I had a user complain about random passwords when I assigned her a phrase containing zymurgy.

6 Posts
I always put a high ASCII character in my admin passwords where possible. An ALT-255 instead of a space even lets me write them down and not have to hide the sticky note. :-)

The OED is commonly accepted as the ultimate source of knowledge of the English language, or at least as close as anything can get to it right now. As such, it's useful for helping to set upper bounds of complexity.

You're right that people have much smaller vocabularies; an educated person averages in the 24,000-30,000 word range[1]. Those vocabularies differ significantly from person to person, meaning that someone in a technical field and someone in a literary field may have significant differences in those 24,000+ words.

But cutting down the 250,000 base words to, say, the 20,000 most common words is the same as cutting a word list for JtR down to the most common 1000 passwords. You select an arbitrary number for your particular needs at that point so you can at least try to tackle part of the problem.


5 Posts
