codesecurely.org

My ramblings on the world, my life, my work and oh yeah security!

Data Protection in Storage and Transit

Abstract

Data is perhaps the most critical asset within an organization. Applications that operate on, manipulate and store data must therefore be concerned with the security of this key asset. The security of information such as personally identifiable information and corporate secrets needs to be guaranteed not only in transit but also in storage. Moreover, recent laws such as California's SB 1386 and the credit card industry’s PCI standards make protecting customer information key to the survival of an organization, and a lack of data protection can have devastating consequences, as recent examples have shown.

 

Introduction

Continuing this series of articles, we now move to the next category of security issues within our security frame, which is used for threat modeling, security code review and penetration testing of applications. Data is perhaps the most critical asset within an organization, so maximum effort should be made to protect it. Two major attack avenues exist for such data: during transit and when stored in a data store. Handling the risks associated with these avenues is not necessarily tremendously complex, and tried and tested solutions exist. However, the devil is in the details of the implementation, and data protection is easy to get wrong. Moreover, compliance requirements often govern these details: the choice of algorithms, key lengths and other parameters. For instance, many government agencies demand Federal Information Processing Standards (FIPS) compliance, which automatically rules out a host of algorithms and functions as being weak. Developers dealing with such requirements must therefore not only be cognizant of them but also have the knowledge and tools necessary to achieve compliance.

 

Cryptography is often treated as a shiny red button that will solve all known and unknown security problems. Countless times, as a security review begins, we have heard the phrase “We use SSL so we are secure”. This is indicative both of the complexity of the topic and of a lack of understanding of the basics on the part of practitioners. On the positive side, with the advent of tried and tested software development frameworks such as J2EE and Microsoft .NET, access to data protection mechanisms has become far easier, and developers have implementations of best-of-breed algorithms and protocols at their disposal as simple class and object methods.

 

Security Properties

Security 101 teaches you that some of the most critical security properties are confidentiality, integrity and availability, collectively called the CIA properties. Of these, confidentiality and integrity are most often achieved through the use of cryptography. Besides these two, secure systems often also have to deal with authentication and non-repudiation. Before we go into any depth in this article, we would therefore do well to define[1] what each of these properties means and how they differ from each other.

  • Confidentiality – Achieving this property relies on restricting access to information to only those who are privileged to see it. Network sniffing is an example of a violation of confidentiality. Other examples include disclosure through log files and exception and error messages.
  • Integrity - This property defines the trust that can be placed in the information received. Data integrity is having trust that the information has not been altered between its transmission and its reception. Source integrity is having trust that the sender of that information is who it is supposed to be. This latter property is often termed as authenticity. Data integrity can be compromised when information has been corrupted, willfully or accidentally, before it is read by its intended recipient. Source integrity is compromised when an agent "spoofs" its identity and supplies incorrect information to a recipient.
  • Authentication - Authentication is the process by which an entity attempts to confirm that another entity, from whom it has received some communication, is who it claims to be. Identification is a weaker property that only involves the second entity making claims about its identity. Authentication is therefore the process of proving those claims with strong evidence. Authentication is not to be confused with authorization or access control, which defines what an entity can do after it has been authenticated.
  • Non-repudiation - This property is also often known as accountability and is tremendously popular in the government and financial services sectors. Non-repudiation of receipt means that an agent can't deny receiving information. This can prevent an online vendor from being obliged to ship replacement goods to a malicious customer who denies receiving the original items. Non-repudiation of origin means that an agent can't deny sending the information. This prevents an agent from anonymously sending spoofed emails with malicious intent, for example. Non-repudiation is essentially intended to prevent an entity from denying, after the fact, that a transaction took place. For instance, consider what would happen if a malicious user could withdraw money from an ATM and then deny having done so.

 

In the next few sections we discuss a number of data protection primitives as well as issues such as key management, providing examples of both the right choices and common mistakes.

 

Cryptography Primitives

Data protection is achieved by combining a number of primitive building blocks into a final cryptographic solution such as SSL or digital signatures. From an application development perspective, making an incorrect choice with respect to any one of these primitives can undermine the security of the entire solution.

 

  • Random Number Generation – Random number generators are used for a variety of purposes in application development. These include the generation of session identifiers, entropy for key generation and other cryptographic functions, as well as application-specific purposes such as account identifiers and password resets. It is important, however, to distinguish between random numbers that are cryptographically secure and those that are not. For instance, “random” numbers generated using the C rand function or Java’s Math.random are not cryptographically secure. These functions lack a key property of classes such as SecureRandom in Java and RNGCryptoServiceProvider in .NET: their output cannot feasibly be predicted, even by someone who has observed earlier output. Both kinds are pseudo-random number generators (PRNGs), which produce a sequence of numbers that are approximately independent of each other; the latter classes are cryptographically secure PRNGs. They are called pseudo-random in contrast to pure random numbers, which are usually generated by specialized hardware measuring physical phenomena such as radioactive decay. Pseudo-random numbers have the property of periodicity: after a given interval the algorithm begins repeating the same sequence of “random” numbers. The best PRNGs are therefore those that have an extremely large period. Associated with this process is a secret value called a seed. The seed represents the starting point in the sequence, and hence disclosure of this value can compromise the randomness properties. Bugs surrounding improper use of random numbers have affected the best of applications and protocols, including early implementations of the Secure Sockets Layer protocol.
  • Hash Functions – Hash functions (also called one-way functions or, loosely, checksums) operate on arbitrarily long data, reducing it to a fixed-length fingerprint. For instance, the popular (but now weak) Message Digest 5 (MD5) algorithm produces a 128-bit hash while the Secure Hash Algorithm 1 (SHA1) generates a 160-bit output. The best hash functions have three key properties:

a) One-way: Hash functions must be one-way to be effective, i.e. it should not be possible to go from the output hash value back to the original input. This property is extremely useful and is most popularly used for password storage, so that even if the password database is compromised, all is not lost. This property is also often called pre-image resistance.

b) Second pre-image resistance: Given a specific input, it should be computationally infeasible to find another input that generates the same hash as the first.

c) Collision resistance: It should be computationally hard to find any two different inputs that generate the same hash. In recent months this specific property has been in the news quite a bit[2] as people have questioned the effectiveness and use of algorithms such as MD5. In reality, it is fairly obvious from the definition above that collisions must exist, since the input set is infinitely bigger than the output set. The key aspect of a good hash algorithm, however, is to make these collisions difficult to find. As a rule of thumb, the best hash functions operate so that changing even a single bit in the input alters roughly half the bits in the output.

With these key properties in mind, application developers would do well to stay ahead of the curve and thus avoid MD5 entirely and begin to migrate away from SHA1 as well towards stronger algorithms such as SHA256.
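The rule of thumb about a single changed input bit can be observed directly. The following is a minimal sketch using Python's standard hashlib module; the sample message and the helper names are assumptions chosen for illustration, not part of any framework discussed here.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its first byte flipped."""
    return bytes([data[0] ^ 0x01]) + data[1:]


# Hypothetical sample input; any non-empty byte string would do.
message = b"Transfer 100 to account 12345"
altered = flip_lowest_bit(message)

# Output lengths in bits of two of the algorithms named in the text.
digest_sizes = {name: hashlib.new(name).digest_size * 8 for name in ("sha1", "sha256")}

original_digest = hashlib.sha256(message).digest()
altered_digest = hashlib.sha256(altered).digest()
changed_bits = bit_difference(original_digest, altered_digest)

print(digest_sizes)
print(f"{changed_bits} of 256 output bits changed after flipping one input bit")
```

The exact count varies with the input, but for a well-behaved hash function it should land near half of the output length.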

 

As mentioned above, hash functions are very often used for password storage. However, even though hash functions are one-way, an attacker who steals the password database can still launch an offline attack by computing the hash of every possible password. This is a tremendously computationally intensive task, so attackers have attempted to gain the upper hand by pre-computing the hashes and distributing them as rainbow tables[3]. Launching an offline attack is then as simple as performing a database lookup. To deal with this, most effective authentication mechanisms introduce a cryptographically random source of entropy, or “salt” value. This salt is combined with the actual password before hashing it, so the attacker now needs to compute not just the hash of every possible password but the hash of every possible password with every possible salt value. This adds significant complexity to the process. Obviously, however, the system must store the salt value used for each account so that it can authenticate the user.
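A salted scheme along these lines can be sketched with Python's standard library. Note the swap: instead of a single hash of salt plus password, this sketch uses PBKDF2 (hashlib.pbkdf2_hmac), which also repeats the hash many times to slow down offline guessing. The iteration count, salt size and function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware
SALT_BYTES = 16       # assumed salt size


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); both must be stored with the account record."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

Because every account gets its own salt, two users with the same password end up with different stored digests, which is what defeats a single pre-computed table.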

 

Finally, hash functions should not be confused with encryption. Hash functions are one-way and keyless, whereas encryption functions typically can be reversed by decryption and usually rely on a key. Thus hashing should not be used to protect confidentiality; it is rather an effective mechanism for integrity checking. Hash functions are often combined with keys to produce what are called HMACs (Keyed-Hashing for Message Authentication). However, in these cases they only perform authentication and do not by themselves guarantee confidentiality.
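As a small illustration of the keyed case, the sketch below computes and checks an HMAC with Python's standard hmac module. The key, the message and the helper names are assumptions for the example.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = secrets.token_bytes(32)       # shared secret known to sender and receiver
message = b"amount=100;to=alice"    # hypothetical message, sent in the clear
tag = make_tag(key, message)
```

The message itself travels unencrypted alongside the tag, which is exactly why an HMAC provides authentication and integrity but not confidentiality.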

 

  • Encryption & Signatures: Encryption and signatures are often spoken of in the same breath since, as we will soon see, they are not that different after all, despite the fact that they aim to achieve different security properties. Essentially, both are functions that take a piece of plain text input data and a cryptographic key and generate an output. This output, often called the cipher text, cannot be reversed back to the plain text without knowing both the algorithm used and the key. This also implies that the choice of the algorithm and the key are critical. We will discuss the issues dealing with the key later in this paper, but first we need to address one of the most common mistakes in applications leveraging cryptography: building your own algorithm. Application developers must never attempt to build their own cryptographic algorithms but should rather rely on tried and tested algorithms such as RSA and AES. Such algorithms have been through tremendous public and expert scrutiny and have stood the proverbial test of time. Moreover, creating a strong and effective cryptographic algorithm is no easy task and requires in-depth knowledge of fields such as probability and number theory. Similarly, application development teams should refrain from using the so-called “secret” cryptography sold by a number of vendors. Such implementations, often referred to as “snake oil” by the security community[4], can cause more harm than good since they are not known to be secure.

 

Encryption algorithms are typically classified into symmetric and asymmetric algorithms. As the name suggests, symmetric algorithms use the same algorithm and the same key to both encrypt and decrypt the data. Thus, for two entities to share information using symmetric cryptography, they both need to agree on a specific algorithm and then share a key. Symmetric encryption algorithms are typically much faster than the asymmetric algorithms discussed below and are therefore preferred for bulk encryption. They do, however, have a significant disadvantage: the problem of key distribution. If multiple entities want to communicate among themselves, with each pair maintaining its own confidentiality (and/or integrity), each pair needs to share a different key. Distributing these keys requires a secure channel in the first place, and thus symmetric key implementations can often become a chicken and egg problem.

 

Asymmetric algorithms, on the other hand, use one key to encrypt (usually called the public key) and a different one to decrypt (called the private key). Thus if Alice wants to send Bob a confidential message, she first encrypts it with Bob’s public key, which Bob can publicly broadcast via his website or through some kind of key directory (which is the central component in any Public Key Infrastructure (PKI)). Once this data has been encrypted, it can only be decrypted with the other key of the pair, i.e. Bob’s private key. This type of algorithm thus solves the key distribution problem described above. However, due to their very nature asymmetric implementations are much slower than symmetric algorithms. They are therefore not suitable for bulk encryption such as during an SSL session.

 

Digital signatures are created using a very similar process. In this case, if Alice wants to digitally sign a piece of data before sending it to Bob, she “encrypts” the data with her private key to generate the cipher text. This cipher text represents the digital signature and can be verified using Alice’s public key, which is widely available. Assuming Alice keeps her private key a well-guarded secret, as she should, no one else can generate the same signature; hence it is an excellent mechanism for authentication and integrity protection. In practice, in order to prevent the signature from being as large as the data itself, the data is first hashed before being “encrypted” using the private key.
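The hash-then-“encrypt” flow described above can be sketched with textbook RSA arithmetic in plain Python. This is a toy under loud assumptions: the primes are small, publicly known Mersenne primes chosen only so the example is self-contained, and there is no padding scheme, so it must not be used to protect anything. A real application would call a vetted library implementation instead.

```python
import hashlib

# TOY PARAMETERS (assumed for illustration only): two well-known Mersenne
# primes. Real RSA keys use secret, randomly generated primes and padding.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it below the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the message hash with the private exponent."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Undo the signature with the public exponent and compare hashes."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"pay Bob 10 dollars")
```

Anyone holding the public values N and E can run verify, but only the holder of D can produce a signature that passes it.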

 

As you would expect, encryption and signatures can be combined, and developers are typically advised to first encrypt and then sign data whose confidentiality and integrity they want to protect. Developers must also bear in mind issues such as surreptitious forwarding and repudiation, which are beyond the scope of this article[5]. Similarly, an appropriate choice of block cipher mode is critical. For instance, the Electronic Code Book (ECB) mode is known to be insecure and can result in information disclosure, as is shown in the figures below[6]. Stronger modes such as Cipher Block Chaining (CBC) should be preferred instead.

 

[Figures: Plain Text; Cipher Text using ECB; Cipher Text using CBC]
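The difference between the two modes can also be shown in a few lines of code. The sketch below is built on a toy block cipher (a four-round Feistel network with SHA-256 as the round function) that is assumed purely so the example runs with the standard library; it stands in for AES and is not a secure cipher. Only encryption is shown, and the input length is assumed to be a multiple of the block size.

```python
import hashlib
import secrets

BLOCK = 16          # block size in bytes
HALF = BLOCK // 2
ROUNDS = 4


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def round_function(key: bytes, round_index: int, half: bytes) -> bytes:
    """Keyed round function of the toy Feistel cipher."""
    return hashlib.sha256(key + bytes([round_index]) + half).digest()[:HALF]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Encrypt one 16-byte block with the TOY cipher (stand-in for AES)."""
    left, right = block[:HALF], block[HALF:]
    for round_index in range(ROUNDS):
        left, right = right, xor_bytes(left, round_function(key, round_index, right))
    return left + right


def split_blocks(data: bytes) -> list:
    """Split data into consecutive 16-byte blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB: every block is encrypted independently."""
    return b"".join(toy_encrypt_block(key, block) for block in split_blocks(data))


def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """CBC: each block is XORed with the previous cipher block first."""
    output, previous = [], iv
    for block in split_blocks(data):
        previous = toy_encrypt_block(key, xor_bytes(block, previous))
        output.append(previous)
    return b"".join(output)


key = secrets.token_bytes(32)
iv = secrets.token_bytes(BLOCK)
plaintext = b"A" * BLOCK * 3          # three identical plain text blocks

ecb_ciphertext = ecb_encrypt(key, plaintext)
cbc_ciphertext = cbc_encrypt(key, iv, plaintext)
print(len(set(split_blocks(ecb_ciphertext))), "distinct ECB blocks")
print(len(set(split_blocks(cbc_ciphertext))), "distinct CBC blocks")
```

Under ECB the three identical plain text blocks encrypt to three identical cipher blocks, so the repetition in the input remains visible to an observer; under CBC the chaining hides it.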

 

Development teams are advised to use best of breed algorithms when creating their applications. For symmetric algorithms, developers and architects are strongly advised to avoid the use of the Data Encryption Standard (DES) but instead use a minimum of 3DES and as far as possible the Advanced Encryption Standard (AES / Rijndael). For asymmetric algorithms, RSA is currently the most commonly used. However, as mentioned above, public key algorithms should typically only be used for one-time activities such as key exchange or session setup. Key length recommendations for all of these algorithms are discussed later in the article.

 

Key Management

As was suggested above, the effectiveness and utility of a cryptographic implementation relies primarily on the secrecy of the key. It is therefore fairly obvious that the security of keys must be paramount at all stages of their lifetime. Three of these critical stages are discussed below:

 

  • Key Generation: Keys must be generated with properties that strengthen the security of the overall cryptographic implementation. Two of the main properties are length and entropy. Key length can be the bane of many applications leveraging cryptography. Short keys are far easier to guess and brute-force than longer ones. The seminal work on appropriate key lengths was done by a pair of researchers, Arjen Lenstra and Eric Verheul, in 1999[7]. To summarize their recommendations, the minimum length of a key varies over time based on improvements in computing power, cryptanalytic research and the increasing budgets of attackers. Based on that analysis the table below can be constructed. In short, if you want your data to be secure through the year 2010, for instance, you must choose symmetric keys at least 78 bits long (128 bits in practice) and asymmetric keys at least 1369 bits long (2048 bits in practice).

Year    Minimum Symmetric Key Length (bits)    Minimum Asymmetric Key Length (bits)
1982    56                                     417
1987    60                                     539
1992    64                                     682
1997    68                                     844
2000    70                                     952
2001    71                                     990
2002    72                                     1028
2003    73                                     1068
2004    73                                     1108
2005    74                                     1149
2006    75                                     1191
2007    76                                     1235
2008    76                                     1279
2009    77                                     1323
2010    78                                     1369
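To get a feel for why a few extra bits matter for symmetric keys, the arithmetic can be sketched as follows. The guessing rate is a purely hypothetical assumption, not a figure from the cited analysis, and the calculation does not apply to asymmetric keys, which are attacked by mathematical means rather than exhaustive search.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
ASSUMED_GUESSES_PER_SECOND = 10 ** 12  # hypothetical attacker speed


def keyspace(bits: int) -> int:
    """Number of possible keys of the given length."""
    return 2 ** bits


def years_to_exhaust(bits: int, guesses_per_second: int) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return keyspace(bits) / guesses_per_second / SECONDS_PER_YEAR


for bits in (56, 78, 128):
    years = years_to_exhaust(bits, ASSUMED_GUESSES_PER_SECOND)
    print(f"{bits}-bit key: about {years:.3g} years to try every key")
```

Each additional bit doubles the work, so at this assumed rate a 56-bit key space is exhausted in under a day while a 128-bit key space remains far out of reach.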

 

As is implied above, development teams must distinguish between long-term keys and short-term or ephemeral keys. The latter are very commonly used as session keys in protocols such as SSL and IPSec. Given their short lifetime, they do not need to be as long as long-term keys. Long-term keys are often used for encryption and authentication of sensitive data in storage rather than just on the wire. With long-term keys, developers must also be concerned about how to perform key rotation, and revocation if keys do get compromised. In such cases it is best to rely on a PKI system or a trusted certification authority based system such as the one used for digital certificates.

 

Key entropy is a key ingredient in effective cryptography. Unlike passwords, keys should not be easy to remember; instead they should be cryptographically random, seeded from a passphrase if required[8]. This implies that developers should not create a key from just any sequence of letters, numbers and special characters or, worse still, use the password itself as the key. Instead they should rely on functions available as part of the underlying framework. For instance, the symmetric algorithm classes in the .NET base class library, such as RijndaelManaged and TripleDESCryptoServiceProvider, have a GenerateKey method as well as a GenerateIV method for the initialization vector. Similarly, Java has the javax.crypto.KeyGenerator class. The Data Protection API (DPAPI) available on Microsoft Windows 2000 and above is an excellent choice for symmetric encryption where key management is handled by the operating system. Using this API does not require the developer to deal with any keys; instead the key is derived from the logged-on user’s password hash. The drawback of this mechanism is that it is not easily portable across machines.
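The same principle can be sketched outside .NET and Java. The example below uses Python's standard secrets module as the framework-provided generator and contrasts it with a seeded general-purpose generator, which anyone who learns the seed can replay. The key size and function names are assumptions for illustration.

```python
import random
import secrets

KEY_BYTES = 32  # assumed key size (256 bits)


def generate_key(num_bytes: int = KEY_BYTES) -> bytes:
    """Generate key material from the operating system's secure generator."""
    return secrets.token_bytes(num_bytes)


def predictable_key(seed: int, num_bytes: int = KEY_BYTES) -> bytes:
    """ANTI-EXAMPLE: a seeded non-cryptographic generator repeats its output."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(num_bytes))


# Anyone who knows or guesses the seed reproduces the "key" exactly.
leaked_seed = 20070225  # hypothetical seed, e.g. derived from a date
print(predictable_key(leaked_seed) == predictable_key(leaked_seed))
print(generate_key() == generate_key())
```

The first comparison is true because the seeded sequence is deterministic; the second compares two independent draws from the secure generator.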

 

  • Key Distribution: Once the keys have been generated they often need to be distributed to the responsible parties involved in the communication. As discussed above this is not always an easy task. Two of the biggest threats to a cryptographic system are key disclosure and tampering to cause a denial of service. While asymmetric key algorithms make the distribution task trivial, they are not best suited for bulk encryption like their symmetric counterparts. In practice, we try to get the best of both worlds by using asymmetric key algorithms to first exchange a shared session key which is then used for the actual bulk encryption with a symmetric algorithm. This in fact is the approach utilized by SSL.

 

As a rule of thumb, key distribution, if needed, should as far as possible take place offline using some out-of-band mechanism (such as over the phone). The application must ensure that key material is only transmitted over a secure channel such as one protected using SSL or IPSec. Only well documented and established key exchange algorithms such as Diffie-Hellman (DH) or RSA should be used when exchanging keys online. Even in such cases, one must be careful to avoid known flawed variants such as anonymous Diffie-Hellman (ADH), which is susceptible to a man-in-the-middle (MITM) attack.
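The core of a Diffie-Hellman exchange fits in a few lines of modular arithmetic. The sketch below uses toy group parameters that are assumed only for illustration (real deployments use standardized, much larger groups through a vetted library), and it is deliberately the unauthenticated form: nothing in it proves who the peer is, which is precisely the weakness of the anonymous variant.

```python
import hashlib
import secrets

# TOY PARAMETERS (assumed for illustration only); far too small for real use.
PRIME = 2**127 - 1
GENERATOR = 3


def make_keypair():
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(PRIME - 3) + 2      # value in [2, PRIME - 2]
    public = pow(GENERATOR, private, PRIME)
    return private, public


def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Combine own private value with the peer's public value, then hash."""
    shared = pow(peer_public, own_private, PRIME)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; both sides derive the same key.
alice_key = derive_session_key(alice_private, bob_public)
bob_key = derive_session_key(bob_private, alice_public)
```

An attacker sitting between the two parties could run this same exchange separately with each of them, so the public values need to be authenticated, for example with certificates, before the derived key is trusted.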

 

  • Key Storage: Key storage is probably the number one source of key compromise in applications. The most common mistake is to store the key within the source code that uses it. This is especially risky in managed languages such as Java and C#, whose binaries can easily be decompiled into higher level language code that quickly reveals the key being used. Keys stored in the source code are also often kept constant for the entire lifetime of the application and are not under controls that enable key rotation and revocation. This is often true across development and deployment, where the same key is used in live production as was used during testing. Similarly, configuration files are not the best location to save secrets such as keys. These files are typically accessible not only to the application itself but also to developers and administrators. The keys are thus available to a very large audience. Moreover, the impact of any web vulnerability that gives an external user access to the configuration files is made worse.

 

With regards to secure key storage consider first whether the keys need to be stored in the first place. For instance, ephemeral keys by definition shouldn’t be stored. If they must be stored consider the use of a hardware solution such as a smart card or a USB stick. These are now fairly commonly available from a number of vendors. Many of these devices are built to be tamper-resistant and support advanced options such as splitting the key into fragments that are stored separately. When storing in software, all of the commonly used cryptographic frameworks such as Microsoft’s CAPI support the notion of key stores. In the Java SDK for instance, the java.security.KeyStore class and the keytool command line utility both provide access to a protected repository for storing key material. Similarly in the .NET world, one can rely on either the user or machine key store.

 

Transit Security

While protecting data in storage is a large part of the puzzle, it is after all only a part of it. Even the best storage security can be undone by secrets inadvertently exposed over the wire. This could include, for instance, authentication cookies and tickets being transmitted between the browser and the server in the clear. Fortunately, there are fairly well understood and easy solutions to deal with this issue. However, like a lot of the other topics discussed in this article, the solution is also easy to get wrong and the default options may not always be the correct options. With transit security this issue is even more pronounced given that these options are often handled by an entirely different team, typically the IT administrators, rather than by the development team itself. As has been discussed in previous articles, it is critical that both of these groups communicate effectively and often to ensure security all around.

Most transport security mechanisms rely on establishing a secure communication tunnel between the two entities communicating. All of the security properties discussed above are then enforced on this tunnel or envelope rather than on the data directly. While this may not appear as granular as one would like it does allow for the flexibility to layer this approach over a varied set of data protocols such as HTTP, SMTP and FTP. Two of the most common protocols are discussed below:

 

  • SSL – Most developers, especially those that build web applications, are intimately familiar with the Secure Sockets Layer or SSL. It is often treated as a silver bullet meant to be the solution to all types of security problems. Fortunately or unfortunately, however, SSL is only intended to provide transport security. At a protocol level, SSL lets developers tunnel sensitive data through a protected channel that guarantees confidentiality and/or integrity and/or authentication. In its most common form, SSL authenticates only the server, using the server's digital certificate, or X.509 certificate as it is also known[9]. However, SSL also supports mutual authentication, with both entities presenting their digital certificates. As mentioned above, the digital certificates and the public keys they contain are only used to establish a secure channel based on a symmetric encryption algorithm.

 

While SSL is a tremendously flexible yet transparent solution, it does require some configuration in order to be set up correctly. Primarily this centers around selection of a cipher suite. As part of the initial handshake, both entities must agree on a specific cipher suite they will use for the duration of the session. If the parties to the communication cannot come to agreement on this, an SSL session cannot occur. The components of a cipher suite include the choice of algorithms and key lengths for the initial key exchange, authentication, encryption and message integrity protection. For instance, the cipher suite DHE-RSA-AES256-SHA uses ephemeral Diffie-Hellman for the key exchange, RSA for authentication, AES with a 256-bit key for encryption and SHA1 for message integrity protection. On the other hand, AES256-SHA uses RSA for both key exchange and authentication, AES with a 256-bit key for encryption and SHA1 for integrity protection. As one might expect, all of the guidance provided earlier in this article with respect to key exchange, asymmetric and symmetric algorithms, key lengths and hash functions must be applied to the choice of cipher suite. This is especially true since by default a lot of the supported cipher suites use weak encryption such as DES with a 56-bit key, flawed key exchange such as anonymous Diffie-Hellman (ADH), or even no encryption at all in the so-called NULL cipher suites.
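As one possible way to apply this guidance in code, the sketch below restricts the cipher suites of a client-side context using Python's standard ssl module. The policy string is an assumption written in OpenSSL's cipher list syntax; the right policy for a given deployment depends on its compliance requirements and on what the peer supports.

```python
import ssl

# Assumed policy: strong suites only, with anonymous key exchange (aNULL),
# unencrypted suites (eNULL) and several weak algorithms explicitly excluded.
CIPHER_POLICY = "HIGH:!aNULL:!eNULL:!MD5:!RC4:!DES:!3DES"


def build_client_context() -> ssl.SSLContext:
    """Create a client context that verifies the server and limits ciphers."""
    context = ssl.create_default_context()            # verifies certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocols
    context.set_ciphers(CIPHER_POLICY)
    return context


context = build_client_context()
enabled_suites = [suite["name"] for suite in context.get_ciphers()]
for name in enabled_suites:
    print(name)
```

Listing the enabled suites after configuration is a cheap way to confirm that the defaults were actually overridden.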

 

  • IPSec – While SSL is designed to provide security at the transport layer, IPSec takes that one step further, moving down the stack to the network layer itself. IPSec is essentially a set of security extensions to the Internet Protocol (IP) that provides encryption, authentication and integrity protection for network streams by encrypting and/or authenticating all IP packets.

 

IPSec is a set of cryptographic protocols for both securing packet flows as well as key exchange. Packet flows can be secured using either the Encapsulating Security Payload (ESP) which provides authentication, data confidentiality and message integrity or the Authentication Header (AH) that provides just authentication and message integrity without confidentiality. The IPSec standard also defines a key exchange protocol viz. the IKE (Internet Key Exchange) protocol.

 

IPSec is not as flexible or widely supported as SSL. It primarily relies on both parties to the communication being under single operational control due to the need for explicit key distribution. While this might at first seem like a disadvantage that renders the protocol all but useless, it in fact makes IPSec tremendously useful, especially in securing server to server communication. For instance, IPSec is ideal for locking down communication between the application server and the database server, or between client and server when using .NET remoting. This can ensure, for instance, that a random machine on the same network cannot connect to the database simply because it is accessible. Instead, the caller is challenged to authenticate itself, and policies can be set up on the database server to only allow connections from the application server and to reject all other attempts. In situations such as .NET remoting this can be an effective way to achieve transport confidentiality and authentication as well as to prevent unauthorized callers from invoking the remote API.

 

  • Message Level Security – With the ever increasing popularity of web services, a new form of transit security is becoming mainstream. Web services bring in a unique requirement to transit security that was not present or relevant in the past – the need to provide cryptographic properties in multi-hop fashion. For instance consider a message which is sent from the user to the e-commerce store. This message contains both information such as the product IDs and quantity information as well as the user’s credit card details. The former only needs to be accessible to the online store but the latter on the other hand must only be decipherable to the credit card processing firm.

 

With existing transport security mechanisms such as SSL this is not currently possible, given that SSL is a point-to-point protocol that protects a single connection between two endpoints and hence is not meant for such multi-hop transactions. Some of the new WS-Security related standards such as XML encryption, however, were designed to provide developers with just this ability. XML encryption (and XML digital signatures) allow different parts of an XML request to be encrypted under different keys (while also allowing sections to be left unencrypted). Thus, when an entity receives a message, it can decrypt only the information that is relevant and intended for itself and cannot decipher any information intended for downstream or upstream recipients, such as the credit card information in the example above.
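The idea of protecting each part of a message under its own key can be sketched independently of XML. In the example below a dictionary stands in for the XML request, and a toy stream cipher built from SHAKE-256 stands in for the real encryption algorithm; both are assumptions made so the sketch runs with the standard library, and neither reflects the actual XML encryption format.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY stream cipher: XOR data with a SHAKE-256 keystream (not for real use)."""
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def seal(key: bytes, plaintext: bytes) -> dict:
    """Encrypt one message part under its own key with a fresh nonce."""
    nonce = secrets.token_bytes(16)
    return {"nonce": nonce, "body": toy_cipher(key, nonce, plaintext)}


def open_part(key: bytes, part: dict) -> bytes:
    """Decrypt one message part; a wrong key yields unreadable bytes."""
    return toy_cipher(key, part["nonce"], part["body"])


store_key = secrets.token_bytes(32)       # shared with the online store
processor_key = secrets.token_bytes(32)   # shared with the card processor

order_details = b"product=42;qty=2"          # hypothetical order data
card_details = b"card=4111111111111111"      # hypothetical test card number

message = {
    "order": seal(store_key, order_details),
    "card": seal(processor_key, card_details),
}

print(open_part(store_key, message["order"]))       # readable by the store
print(open_part(store_key, message["card"]) == card_details)
```

The store can read the order section and simply forwards the card section, which only the processor's key opens.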

 

This form of transit security is often referred to as message level security since it operates more at the message level rather than at the raw network or transport protocol level.

 

Managing Secrets in Memory

While data protection is traditionally associated with persistent storage and transport mechanisms, attackers are becoming increasingly sophisticated, with attacks that exploit information disclosure through crash dump files and virtual memory page files. To deal with this threat, the .NET framework, for instance, supports the notion of protected memory. This is a block of memory that is encrypted using the Data Protection API (DPAPI) and is only decrypted just before it is accessed. This minimizes the window within which the sensitive data is left unencrypted in memory and therefore exposed. In the .NET 2.0 framework, this functionality is accessible through two classes, ProtectedMemory[10] and SecureString[11]. These classes let you hold byte arrays (for custom objects) and strings in encrypted form, and give the developer far more granular control over when these objects are cleared from memory than the garbage collector otherwise allows. One key caveat to bear in mind when using these types is that if they are ever converted to their vanilla forms, for instance a regular string, then all of the benefits are lost, since an insecure copy is now also held in memory without the protection afforded to the SecureString type.

 

Conclusion

While cryptography in particular, and data protection in general, is a complex beast that is easy to get wrong, developers can keep a few rules of thumb in mind and come out as winners. These include, for instance, not creating your own cryptographic algorithm but rather using a tried and tested one, and using a cryptographic pseudo-random number generator when generating session or account identifiers. We hope this article has succeeded in identifying the rules of thumb that will enable readers to build strong data protection into their applications.

 

Summary

Data protection helps protect perhaps the most significant asset an application has. This is especially true given that the data might include customer private information, financial records or even corporate secrets. It is therefore critical to protect this data adequately not only on the wire but also while in storage and in memory. To aid in this, most of the popular programming frameworks support a rich set of APIs. However, it is still fairly easy to cause a major vulnerability by making what seems like a relatively trivial and innocuous choice.



About This Page

Title: Data Protection In Storage Transit
Moderated By:
Created: 02-25-2007, 2:49
Modified: 02-25-2007, 3:21
Last Modified By: rudolph
Revision Number: 6
