The Security Samurai

Necessity is the plea for every infringement of human freedom. It is the argument of tyrants; it is the creed of slaves - William Pitt


Block Ciphers and Initialization Vectors (IVs)

I am frequently asked about Initialization Vectors (IVs) and often have to address the same misconceptions people have about them, which is why my examples of symmetric encryption always include an IV. But while commenting on encryption in the abstract in a post yesterday, I realized I have been doing the same thing as everyone else: I never stopped to ask whether an IV is needed in the first place. What are you encrypting? Who are you keeping it secret from? How valuable is the data? For how long? These questions must be answered before you can begin applying cryptography to a solution, including the choice of which block cipher mode to use.

The first thing you must understand is how data is encrypted by an algorithm.  In the following explanation, whenever defaults or exact values are stated, they pertain to the .NET implementation of the Rijndael algorithm (RijndaelManaged).

When you encrypt data, you are most likely doing so with a block cipher. Block ciphers encrypt data in fixed-size chunks called blocks (128 bits is the default, but RijndaelManaged also lets you use 192 or 256 bits). On its own, a block cipher is deterministic: whenever a block of plaintext is encrypted with the same key, it always produces the same block of ciphertext. This causes problems in applications where an attacker can learn a large set of plaintext blocks and their corresponding ciphertext blocks, which together form what's called a code book. Using this code book, a cryptanalyst can search for matching cipher blocks and, if he finds one, immediately knows the plaintext. If the code book is large enough and the data being encrypted follows a particular pattern (emails, Word documents, custom message structures, etc.), he could also launch a statistical attack on the bits that vary to recover the plaintext.
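To make the code book problem concrete, here is a minimal sketch that encrypts two identical plaintext blocks under one key in ECB mode. It assumes .NET 6 or later, with `Aes.Create()` standing in for RijndaelManaged (AES is Rijndael with the block size fixed at 128 bits, and RijndaelManaged is marked obsolete in current .NET); the repeated field value is made up for the demo.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Assumes .NET 6+: EncryptEcb is the one-shot ECB API added in .NET 6.
using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32); // throwaway 256-bit key for the demo

// Two identical 16-byte blocks (a hypothetical fixed-width field repeated twice).
byte[] plaintext = Encoding.ASCII.GetBytes("ACCT:1234567890X" + "ACCT:1234567890X");

// No padding needed: the input is exactly two blocks.
byte[] ciphertext = aes.EncryptEcb(plaintext, PaddingMode.None);

string block1 = Convert.ToHexString(ciphertext, 0, 16);
string block2 = Convert.ToHexString(ciphertext, 16, 16);

Console.WriteLine(block1);
Console.WriteLine(block2);
Console.WriteLine($"Identical ciphertext blocks: {block1 == block2}"); // prints True
```

Anyone watching the ciphertext can see that the two blocks are equal without knowing the key, which is exactly what a code book exploits.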

Another problem with a block cipher is that a block of ciphertext could be swapped for another and the recipient wouldn't be able to detect it (without something else like a MAC). Imagine a custom message format in a financial application. An attacker could analyze cipher blocks, determine which one contains the account number by making his own transactions, and record the block holding his encrypted account number. He could later replace the corresponding block in someone else's transaction with his own.
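The substitution attack takes only a few lines to reproduce. This sketch assumes .NET 6 or later and a hypothetical fixed-field message format in which each field fills exactly one 16-byte block; the field layout and account numbers are invented.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32); // the sender's key; the attacker never sees it

// Hypothetical message format: two 16-byte fields, one block each.
byte[] Encrypt(string message) =>
    aes.EncryptEcb(Encoding.ASCII.GetBytes(message), PaddingMode.None);

// A victim's transaction and a transaction the attacker made himself.
byte[] victim = Encrypt("PAY-AMOUNT:00500" + "TO:ACCT-11112222");
byte[] attacker = Encrypt("PAY-AMOUNT:00001" + "TO:ACCT-99998888");

// Copy the attacker's second ciphertext block over the victim's second block.
Buffer.BlockCopy(attacker, 16, victim, 16, 16);

// The recipient decrypts the spliced message without any error.
string received = Encoding.ASCII.GetString(aes.DecryptEcb(victim, PaddingMode.None));
Console.WriteLine(received); // PAY-AMOUNT:00500TO:ACCT-99998888
```

The attacker never needed the key: because ECB encrypts every block independently, a block lifted from one message decrypts just as cleanly inside another.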

What I have described so far is Electronic Code Book (ECB) mode, which encrypts every block independently with the same key, along with its weaknesses.

Cipher Block Chaining (CBC) mode is the default in .NET and what most people use. At first glance, the major difference between ECB and CBC is an additional value called the Initialization Vector (IV). CBC chains each block to the previously encrypted block to eliminate the inherent weaknesses of ECB I described above. The process looks like this:

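$$K(P_1 \oplus IV) = C_1 \;\rightarrow\; K(P_2 \oplus C_1) = C_2 \;\rightarrow\; K(P_3 \oplus C_2) = C_3$$

(Where K = encryption with a single key, IV = Initialization Vector, P_n = the nth plaintext block, C_n = the nth ciphertext block, and ⊕ = XOR, the exclusive-or operation, which outputs 1 for each bit position where its two inputs differ and 0 where they match.)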
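To connect the chaining formula to real code, this sketch builds CBC by hand: it uses single-block ECB encryption with no padding as the raw block function K, applies the chaining rule to three blocks, and checks the result against .NET's built-in CBC. It assumes .NET 6 or later for the one-shot `EncryptEcb`/`EncryptCbc` methods; `ManualCbc` is a hypothetical helper for illustration only, not something to use in place of the library.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32);
byte[] iv = RandomNumberGenerator.GetBytes(16);

// Three whole 16-byte blocks, so padding never comes into play.
byte[] plaintext = Encoding.ASCII.GetBytes("Block one here!!" + "Block two here!!" + "Block three here");

// Hypothetical helper: CBC built from the raw block function K.
byte[] ManualCbc(byte[] input, byte[] initVector)
{
    byte[] output = new byte[input.Length];
    byte[] previous = initVector;                     // C0 = IV
    for (int offset = 0; offset < input.Length; offset += 16)
    {
        byte[] block = new byte[16];
        for (int j = 0; j < 16; j++)
            block[j] = (byte)(input[offset + j] ^ previous[j]);       // Pn XOR C(n-1)
        byte[] cipherBlock = aes.EncryptEcb(block, PaddingMode.None); // Cn = K(Pn XOR C(n-1))
        Buffer.BlockCopy(cipherBlock, 0, output, offset, 16);
        previous = cipherBlock;                       // this block chains into the next
    }
    return output;
}

byte[] manual = ManualCbc(plaintext, iv);
byte[] builtIn = aes.EncryptCbc(plaintext, iv, PaddingMode.None);
Console.WriteLine($"Manual CBC matches EncryptCbc: {Convert.ToHexString(manual) == Convert.ToHexString(builtIn)}"); // True
```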

The IV is a random series of bytes that is the same length as the block. XORing it with the first plaintext block before encryption adds entropy, so the same plaintext no longer always produces the same ciphertext. This is also why an IV must never be reused with the same key. The second thing you notice is that after the first block is encrypted, the resulting ciphertext block is XORed with the next plaintext block before that block is encrypted; the previous ciphertext block plays the role the IV played for the first block. This chaining makes it much harder to swap in a block from another message, because a substituted block garbles its own plaintext block and alters the next one. Chaining is not an integrity check, however. If one bit in a ciphertext block is modified, that block decrypts to garbage and the same bit is flipped in the next plaintext block, while the blocks after that decrypt normally. You still need a MAC to detect tampering.
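Both effects of the IV and of chaining can be checked directly. This sketch (assuming .NET 6 or later) encrypts the same three-block message under two fresh random IVs, then flips one bit in the first ciphertext block and decrypts, so you can see exactly which plaintext blocks are affected.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32);

// Three whole 16-byte blocks, so no padding is involved.
byte[] plaintext = Encoding.ASCII.GetBytes("Block one here!!" + "Block two here!!" + "Block three here");

// Fresh random IVs: the same plaintext yields different ciphertext.
byte[] ivA = RandomNumberGenerator.GetBytes(16);
byte[] ivB = RandomNumberGenerator.GetBytes(16);
byte[] ctA = aes.EncryptCbc(plaintext, ivA, PaddingMode.None);
byte[] ctB = aes.EncryptCbc(plaintext, ivB, PaddingMode.None);
Console.WriteLine($"Same ciphertext under two IVs: {Convert.ToHexString(ctA) == Convert.ToHexString(ctB)}"); // False

// Flip the lowest bit of the first byte of C1, then decrypt.
byte[] tampered = (byte[])ctA.Clone();
tampered[0] ^= 0x01;
byte[] decrypted = aes.DecryptCbc(tampered, ivA, PaddingMode.None);

bool p1Intact = Convert.ToHexString(decrypted, 0, 16) == Convert.ToHexString(plaintext, 0, 16);
string p2 = Encoding.ASCII.GetString(decrypted, 16, 16);
string p3 = Encoding.ASCII.GetString(decrypted, 32, 16);

Console.WriteLine($"P1 intact: {p1Intact}"); // False: block 1 decrypts to garbage
Console.WriteLine($"P2: {p2}");              // Clock two here!! ('B' XOR 1 = 'C')
Console.WriteLine($"P3: {p3}");              // Block three here (unaffected)
```

Note that decryption raises no error: without a MAC, nothing tells the recipient the message was modified.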

In addition, the IV is not a secret. Many people find this difficult to understand, but for the same reason that C1, which is used to create C2, is not secret, neither is the IV. The IV adds entropy before the encryption takes place. You can store it or send it in the clear alongside the ciphertext; what matters is that it is never used twice with the same key, and for CBC it should also be unpredictable. A good way to guarantee both is to review every method that performs encryption and make sure it generates a fresh random IV each time.
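Since the IV is not secret, a common approach is to store it right next to the ciphertext. This sketch assumes .NET 6 or later and a hypothetical storage layout with the 16-byte IV in front of the ciphertext; `Protect` and `Unprotect` are illustrative names, and the connection string is made up. Any layout works as long as the IV stays with the data it belongs to.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32);

// Hypothetical layout: [16-byte IV][ciphertext]. The IV travels in the clear.
byte[] Protect(string text)
{
    byte[] iv = RandomNumberGenerator.GetBytes(16);               // fresh random IV on every call
    byte[] ct = aes.EncryptCbc(Encoding.UTF8.GetBytes(text), iv); // PKCS7 padding by default
    byte[] stored = new byte[iv.Length + ct.Length];
    Buffer.BlockCopy(iv, 0, stored, 0, iv.Length);
    Buffer.BlockCopy(ct, 0, stored, iv.Length, ct.Length);
    return stored;
}

string Unprotect(byte[] stored)
{
    byte[] iv = stored[..16]; // read the IV back from the front
    byte[] ct = stored[16..];
    return Encoding.UTF8.GetString(aes.DecryptCbc(ct, iv));
}

byte[] first = Protect("Server=db;Database=app");
byte[] second = Protect("Server=db;Database=app");

Console.WriteLine(Unprotect(first)); // Server=db;Database=app
Console.WriteLine(first.Length);     // 48: 16-byte IV + 22 bytes padded to 32
Console.WriteLine(Convert.ToHexString(first) == Convert.ToHexString(second)); // False
```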

Now that you hopefully understand the difference between these two modes, it's important to understand that each has its own uses. For example, if you were to encrypt an entire database file with CBC, changing a block in the middle would mean re-encrypting every block after it, because each ciphertext block depends on all of the blocks before it. (Reading is less restrictive: decrypting block n needs only ciphertext blocks n-1 and n.) With ECB, you could encrypt or decrypt any block in the middle without touching the blocks around it.
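The read and write sides of CBC behave differently, and this sketch shows both. It assumes .NET 6 or later and a hypothetical file of four fixed-size 16-byte records encrypted as one CBC stream; `ReadBlock` is an illustrative helper.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32);
byte[] iv = RandomNumberGenerator.GetBytes(16);

// Four hypothetical fixed-size 16-byte records, encrypted as one CBC stream.
byte[] file = Encoding.ASCII.GetBytes(
    "record-0000-AAAA" + "record-0001-BBBB" + "record-0002-CCCC" + "record-0003-DDDD");
byte[] ct = aes.EncryptCbc(file, iv, PaddingMode.None);

// Reading: Pn = D(Cn) XOR C(n-1), so block n needs only Cn and the block before it.
string ReadBlock(byte[] ciphertext, byte[] initVector, int i)
{
    byte[] previous = i == 0 ? initVector : ciphertext[((i - 1) * 16)..(i * 16)];
    byte[] current = ciphertext[(i * 16)..((i + 1) * 16)];
    // One-block CBC decryption with C(n-1) as the IV computes exactly D(Cn) XOR C(n-1).
    return Encoding.ASCII.GetString(aes.DecryptCbc(current, previous, PaddingMode.None));
}

Console.WriteLine(ReadBlock(ct, iv, 2)); // record-0002-CCCC

// Writing: change one byte of record 1 and re-encrypt. The IV is reused here ONLY so
// the two ciphertexts can be compared; real code would use a fresh IV, changing every block.
byte[] edited = (byte[])file.Clone();
edited[31] = (byte)'Z'; // record 1 becomes "record-0001-BBBZ"
byte[] ct2 = aes.EncryptCbc(edited, iv, PaddingMode.None);
for (int b = 0; b < 4; b++)
{
    bool changed = Convert.ToHexString(ct, b * 16, 16) != Convert.ToHexString(ct2, b * 16, 16);
    Console.WriteLine($"ciphertext block {b} changed: {changed}");
}
```

Block 0 is unchanged, while blocks 1 through 3 all change, which is why in-place updates are awkward with CBC and trivial with ECB.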

I often see examples of people using CBC at inappropriate times. Although doing so doesn't add any security holes, it is a waste to store an extra field (the IV) that is not needed. For example, if you are just encrypting a database connection string and a handful of other parameters, ECB would work fine.

Even credit card numbers, social security numbers, etc. that are stored in a database don't need to use CBC. Again, a cryptanalyst would need a large amount of plaintext and corresponding ciphertext to build a code book to recover data encrypted with ECB. The reason you store data in the database in an encrypted format is to limit the value of the data if it is ever stolen. If an attacker has access to both the code that performs the encryption and the database, he could force your code to generate enough plaintext and ciphertext to mount an attack like this. However, if that is possible, then it is also very likely he could force your application to decrypt the data, or even obtain the encryption key itself. One thing ECB does still reveal is equality: two rows holding the same card number will hold the same ciphertext.

Data sent between two applications via some sort of queuing system or service-based architecture is a completely different story and more than likely requires CBC. The data travels over an insecure channel and is structured in some way (XML, fixed-field format, etc.), which causes two problems: 1) a cryptanalyst has an opportunity to force a known message to be sent and then capture its ciphertext, and 2) the data is not a simple field but a longer message with additional data, so more blocks can be gathered per known message. Depending on your block size, a credit card number can be encrypted in one or two blocks, whereas the same credit card number sent in an XML packet with other data would create many more blocks, greatly increasing the effectiveness of a code book.
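The block counts are easy to check. This sketch assumes .NET 6 or later and PKCS7 padding (the .NET default); the card number is the widely used Visa test number and the XML message is invented for illustration. ECB is used only to count blocks; CBC produces ciphertext of the same length.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

using var aes = Aes.Create();
aes.Key = RandomNumberGenerator.GetBytes(32);

// A standard test card number and an invented XML message carrying it.
string card = "4111111111111111";
string xml = "<payment><card>4111111111111111</card><amount>25.00</amount>" +
             "<merchant>Example Store</merchant></payment>";

// Number of 16-byte ciphertext blocks after PKCS7 padding.
int Blocks(string s) => aes.EncryptEcb(Encoding.ASCII.GetBytes(s), PaddingMode.PKCS7).Length / 16;

Console.WriteLine($"Card alone: {Blocks(card)} blocks"); // 2: one data block plus a full padding block
Console.WriteLine($"Inside XML: {Blocks(xml)} blocks");  // 7: 104 bytes padded to 112
```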

Changing the cipher mode is much easier than deciding which one to use. Instead of doing something like this to use CBC:

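    // requires: using System.Security.Cryptography; (IV and key are variables in your own code)
    RijndaelManaged alg = new RijndaelManaged(); // Mode defaults to CipherMode.CBC

    alg.GenerateIV(); //always, always, always do this with CBC
    IV = alg.IV;      // store it with the ciphertext; it is not secret but is needed to decrypt

    alg.Key = key;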

you would instead do this:

    RijndaelManaged alg = new RijndaelManaged();

    alg.Mode = CipherMode.ECB;  //change the mode and get rid of any code that stores IVs

    alg.Key = key;
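Putting the two snippets together, here is a sketch of a complete round trip in either mode. It assumes .NET 6 or later and uses `Aes.Create()` in place of RijndaelManaged, which is marked obsolete in current .NET; `Encrypt` and `Decrypt` are hypothetical helpers, and generating the key inline is only for the demo.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Demo only: a real application would load the key from secure storage.
byte[] key = RandomNumberGenerator.GetBytes(32);

// Encrypts in the given mode and returns the IV to store (null for ECB, which uses none).
(byte[] Ciphertext, byte[]? IV) Encrypt(byte[] plaintext, CipherMode mode)
{
    using Aes alg = Aes.Create();
    alg.Mode = mode;
    alg.Key = key;
    byte[]? iv = null;
    if (mode == CipherMode.CBC)
    {
        alg.GenerateIV(); // always, always, always do this with CBC
        iv = alg.IV;
    }
    using ICryptoTransform encryptor = alg.CreateEncryptor();
    return (encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length), iv);
}

byte[] Decrypt(byte[] ciphertext, byte[]? iv, CipherMode mode)
{
    using Aes alg = Aes.Create();
    alg.Mode = mode;
    alg.Key = key;
    if (iv != null)
        alg.IV = iv; // ECB never reads the IV
    using ICryptoTransform decryptor = alg.CreateDecryptor();
    return decryptor.TransformFinalBlock(ciphertext, 0, ciphertext.Length);
}

byte[] secret = Encoding.UTF8.GetBytes("Server=db;Database=app");
foreach (CipherMode m in new[] { CipherMode.ECB, CipherMode.CBC })
{
    var (cipher, storedIv) = Encrypt(secret, m);
    string roundTrip = Encoding.UTF8.GetString(Decrypt(cipher, storedIv, m));
    Console.WriteLine($"{m}: IV to store: {storedIv != null}, decrypted: {roundTrip}");
}
```

Only the CBC path produces an IV that has to be kept; the ECB path has nothing extra to store.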

I hope you now have a clearer picture of what goes on when you perform symmetric encryption. If you find it difficult to decide which mode to use, or are worried about your own analysis of a situation, you can always fall back to CBC. Doing so adds no security risks, only a loss of functionality and performance.

posted on Friday, June 10, 2005 4:33 PM