codesecurely.org

My ramblings on the world, my life, my work and oh yeah security!

Error Handling & Exception Management

Abstract

As part of a security assessment or from a defensive programming perspective, Error Handling and Exception Management are responsible for ensuring that all failure conditions, such as errors and exceptions, are dealt with in a secure manner. The issues covered in this category range from detailed error messages that lead to information disclosure to how the system deals with failure conditions. In short, this category can be subdivided into how the system deals with errors and exceptions, and how it engages the user in reacting to these situations.

 

Introduction

As we wrap up the last of the security frame categories in this series of articles, we delve into a couple of categories that are easy to ignore. In many cases, development teams might even question the security impact of anything dealing with Error Handling and Exception Management or Auditing and Logging. In this article we discuss the former and attempt to show why this category is critical to the security of your application. Further, we provide guidance on how to design and implement secure systems that prevent the classic vulnerabilities in this category while also adopting a defense in depth strategy that mitigates the risk of future vulnerabilities.

 

Why care about Error Handling & Exception Management?

Hopefully from a general usability perspective, this argument need not be made. It is probably fair to assume that most software developers and the teams within which they operate do understand that end users (the people that pay for software to be developed) care significantly about the quality of the system. While they are willing to deal with the occasional failure (perhaps even one that needs the system to be restarted), they expect that the system will deal with such situations appropriately. Appropriately here is perhaps a heavily overloaded word.  Users at a minimum expect that they will be provided with information about what went wrong and that the system will be able to recover as much as possible.

 

When it comes to security, however, do we really care about these issues? The answer may not be obvious yet, but in many ways it is an even more emphatic Yes!

 

Firstly, any time your application or system is faced with an error or exception, it is likely executing a code path that was not as heavily exercised during the test cycle. This implies that some such code paths might leave the application in an untested and unexpected condition, which in turn might be vulnerable to security threats otherwise considered mitigated. Further, the integrity of the application itself may be in question if business logic was not executed to its expected end. For instance, consider what might happen in a simple online banking transfer, as indicated in Figure 1.


Figure 1: Leaving an application in an inconsistent state
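
The kind of inconsistency that Figure 1 refers to can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the Account class, the frozen flag and both transfer methods are hypothetical names that do not come from the figure, and a real banking system would rely on a database transaction rather than in-memory fields.

```java
final class TransferSketch {
    /** Hypothetical in-memory account, used only for illustration. */
    static final class Account {
        long balanceCents;
        final boolean frozen;

        Account(long balanceCents, boolean frozen) {
            this.balanceCents = balanceCents;
            this.frozen = frozen;
        }

        void withdraw(long amount) {
            if (amount > balanceCents) {
                throw new IllegalStateException("insufficient funds");
            }
            balanceCents -= amount;
        }

        void deposit(long amount) {
            if (frozen) {
                throw new IllegalStateException("account frozen");
            }
            balanceCents += amount;
        }
    }

    /** Debits, then credits. If the credit fails, the debit is never undone. */
    static void unsafeTransfer(Account from, Account to, long amount) {
        from.withdraw(amount);
        to.deposit(amount); // an exception here leaves the money "in flight"
    }

    /** Restores both balances to their pre-transfer values if any step fails. */
    static void safeTransfer(Account from, Account to, long amount) {
        long fromBefore = from.balanceCents;
        long toBefore = to.balanceCents;
        try {
            from.withdraw(amount);
            to.deposit(amount);
        } catch (RuntimeException e) {
            from.balanceCents = fromBefore;
            to.balanceCents = toBefore;
            throw e; // the caller still learns that the transfer failed
        }
    }
}
```

With unsafeTransfer, a failed deposit leaves the source account debited while the destination was never credited, which is exactly the untested, inconsistent state discussed above.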

 

This leads to the idea of failing securely. As strange as that might sound, a large part of this column is focused on this very concept. Failing securely essentially refers to the ability of a system to anticipate failure, however unlikely it may seem, and then to deal with that failure in the most secure manner possible while still attempting to maintain core functionality without compromise, if at all possible.

 

There are a number of other reasons why error handling and exception management is critical to the security of an application. For instance, an unhandled exception that causes a server application to crash could result in an effective denial of service for all clients. If the attacker’s motive was denial of service in the first place, then he or she has essentially succeeded. Further, such unhandled crashes often make it much harder to recover (due to unreclaimed resources or leftover lock files, for instance) or to debug and prevent the problem in the future.

 

Definitions

Before delving too deep into the security aspects of errors and exceptions, it would help to make sure that the reader is on the same page with regard to what these terms specifically mean. Fortunately or unfortunately, both the words “errors” and “exceptions” are heavily overloaded. To avoid getting into a philosophical debate, since individual readers might have slightly different definitions, we will define these terms solely for the purpose of this article.

The term error is reserved for a condition which was envisaged at design and development time and can be recovered from relatively easily, most often with no effort whatsoever. This could be the result of an incorrect operation performed by the user or the system. It may also refer to the message displayed when the action fails. An excellent example of this is an incorrect password entered by the user. As you would expect, such a situation should not result in the application having to take significant actions other than logging the failure and presenting an appropriate message to the user. There typically should be no long, drawn-out recovery from a distributed transaction or the creation of a crash dump file in this case.

 

An exception, on the other hand, refers to a situation that can be significantly graver. Programmers in languages such as Java and C# would typically be drawn to the concept of an exception in those languages. However, it is important not to blindly associate all Java exceptions with exceptions as defined here. In the current context, we define exceptions in a more literal sense: as unexpected situations, or ones that are not expected to occur too often within the normal flow of the application. Hence, such exceptions can often be catastrophic and fatal to the application. In comparison to errors as defined above, these are usually more critical and will often involve significant recovery or handling techniques. One example of such an exception would be running out of disk space on a file or database server, resulting in an update failing.

 

Designing for Failure

As odd as the title of this section might sound, it is critical from a security perspective to plan for failures early in the software development lifecycle of a system. Designing for failure does not show a lack of confidence in your or your team’s technical abilities but rather shows the maturity to recognize that failures are often inevitable, especially as newer and newer software failure modes are discovered and exploited. Preparing for failure can therefore ensure that the system will react in the most robust manner when problems (even unexpected ones) occur. This trait, at the end of the day, is what users expect and respect.

 

At the very least, as software designers, architects and developers, we would like to deal with the “expected” errors and exceptions. An excellent way of enumerating failure conditions while designing your application is to augment your existing data flow diagrams or other design artifacts by tagging them with possible failure conditions at each process, module or data store. Another approach is to walk hierarchically through the functionality of your application as a user would progress through multiple areas of the system. In the example shown in Figure 2 we distinguish between Application Errors and System Errors. An application error is a failure that is the direct result of an invalid operation performed by the user through the methods and members of the application. System errors, on the other hand, are typically the result of invalid operations that fail outside the scope of the application and are perhaps a consequence of an operating system error. Consider, for instance, a situation wherein a user attempts to access a file that is part of another user’s profile. If access control is enforced by the operating system, it is likely that the file access will result in an error such as “access denied”.


Figure 2: Error Mapping

Similarly, we can extend the same concept to exceptions. System exceptions would include network failures, memory access violations or a non-responsive database server. Application exceptions could be the result of divide by zero conditions or an invalid cast.


Figure 3: Exception Mapping

 

Once we have taken the time to map out the possible errors and exceptions that could occur, we can move ahead to considering how to fail securely. As a first step, we recommend that you identify the minimal set of functionality that your application must provide to be useful. This is closely related to the notion of survivability, a concept pioneered by CERT[1]. The purpose of identifying the critical pieces of functionality is to understand how your system can react to failures of parts of the system. Consider, for instance, if a part of the system is compromised, do you continue or terminate the system as a whole? What is the impact in terms of how critical the system or parts of it are to you and your business? Does continuing have a detrimental effect on security? All of these questions lead to a crucial security design decision: does the system fail open or fail closed when confronted with an error or exception? A system that fails open responds to a failure by reverting to an “open” state. This could be a state wherein authentication or authorization rules are ignored by the system and everyone is allowed full access. Alternatively, it might be a situation wherein the logging service has crashed and audit trails are no longer being maintained. A fail closed system, on the other hand, is one in which the application reverts to a “closed” state in response to a failure. This could potentially lead to a denial of service to legitimate users, and hence any such decision must be carefully considered. An interesting example of this was a previous version of the Checkpoint Firewall-1 product[2]. When faced with a packet flood on a specific port, the firewall would respond by dropping all of its filtering capabilities and allowing all traffic into the internal network. The other approach would have been to stop all traffic from entering the network.
Obviously there are advantages and disadvantages to both. In the former case, one can argue that by failing open the security of the system and the network as a whole has been significantly jeopardized. Conversely, the fail closed model in this case would have resulted in the protected systems becoming completely unavailable, leading to inconvenience and perhaps even more significant consequences depending on the criticality of the system. A middle of the road solution could have been to shut down just the offending port, for instance, while continuing to allow other traffic in as dictated by the rules of the firewall.
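
The difference between the two models can be illustrated with a small Java sketch of an authorization check. Everything here is an assumption made for illustration: PolicyStore is a hypothetical interface standing in for whatever component answers access questions, and it is not related to the firewall product mentioned above.

```java
final class FailClosedSketch {
    /** Hypothetical policy lookup that may fail, e.g. because a policy server is unreachable. */
    interface PolicyStore {
        boolean isAllowed(String user, String resource) throws Exception;
    }

    /** Fail open: a failed lookup is treated as "allowed". */
    static boolean checkFailOpen(PolicyStore store, String user, String resource) {
        try {
            return store.isAllowed(user, resource);
        } catch (Exception e) {
            return true; // dangerous: anyone who can break the lookup gains access
        }
    }

    /** Fail closed: access is granted only when the lookup positively succeeds. */
    static boolean checkFailClosed(PolicyStore store, String user, String resource) {
        boolean allowed = false; // deny by default
        try {
            allowed = store.isAllowed(user, resource);
        } catch (Exception e) {
            // A real system would log the failure and notify operators here.
            allowed = false;
        }
        return allowed;
    }
}
```

The fail closed variant denies legitimate users while the policy store is down, which is the availability cost described above. Which variant is appropriate remains a design decision for each part of the system.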

 

The concept of failing securely or failing closed essentially requires that the application do everything in its capacity to ensure that all errors and exceptions are dealt with appropriately. This leads to the idea of structured exception handling[3]. Structured exception handling essentially utilizes four major constructs available in a number of modern programming languages: try, catch, throw and finally. A detailed discussion of these primitives is outside the scope of this article; however, it is worth spending a little time examining the finally block.

 

The finally block is interesting since it contains code that is executed irrespective of whether an exception is raised or not. This implies that this block is an excellent location for cleanup and other activities that might need to be performed in either control flow. From a security perspective, this is especially important in ensuring that the system is not left in an inconsistent or insecure / fail open state. For instance, consider the first of the two pseudo code snippets below:

// First snippet: privileges stay elevated if ReadSecretFile() throws
try
{
    ElevatePrivilege();
    ReadSecretFile();
    LowerPrivilege();       // skipped when an exception is raised above
}
catch (FileException fe)
{
    ReportException();
}

// Second snippet: the finally block always lowers privileges
try
{
    ElevatePrivilege();
    ReadSecretFile();
}
catch (FileException fe)
{
    ReportFileException();
}
catch (Exception e)
{
    ReportException();
}
finally
{
    LowerPrivilege();       // runs whether or not an exception was raised
    AllDone();
}
return;

 

Most readers will immediately point out that if an exception were to be raised in the function ReadSecretFile in the first snippet, control would eventually leave the enclosing function without ever dropping the elevated privileges. This situation can be easily fixed through the use of the finally block (to drop the privileges), as seen in the second snippet. It is important for architects and designers to mandate what actions must be performed within the finally blocks of the system. Like many other aspects discussed in our series of articles, this is another key decision that cannot be left solely to the developers implementing the code. On a related note, it should also be defined at a fairly granular level which exceptions need to be caught in various parts of the application. This will ensure that developers do not merely catch the base exception class but are given guidance on what to catch as well as how to interpret the detailed exception objects.
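
For readers who prefer something they can compile, the same pattern might look as follows in Java. This is a sketch only: the elevated flag and the elevatePrivilege, lowerPrivilege and readSecretFile helpers are hypothetical stand-ins for real privilege management and file access.

```java
import java.io.IOException;

final class PrivilegeSketch {
    /** Stand-in for real privilege state; a real system would call into the OS or runtime. */
    static boolean elevated = false;

    static void elevatePrivilege() { elevated = true; }

    static void lowerPrivilege() { elevated = false; }

    /** Simulates reading a protected file; fails on demand to exercise the error path. */
    static String readSecretFile(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated read failure");
        }
        return "secret";
    }

    /** Returns the file contents, or null if reading failed. Privileges are always lowered. */
    static String readWithPrivilege(boolean fail) {
        elevatePrivilege();
        try {
            return readSecretFile(fail);
        } catch (IOException e) {
            System.err.println("Could not read the protected file."); // no internal detail
            return null;
        } finally {
            lowerPrivilege(); // runs on success, on IOException and on any other exception
        }
    }
}
```

Note that the privilege is elevated before the try block, so that the finally block only ever undoes an elevation that actually took place.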

 

Another mechanism that can provide structured exception handling is the use of software frameworks, which embody some of the more basic error handling routines in addition to overall software design and architecture. Struts[4], for instance, is an elegant, extensible framework for creating enterprise-ready Java web applications. Perhaps most important in our context is the Struts data validation module[5], which provides a rich out of the box solution that deals with the most common data patterns but is still extensible, allowing programmers to add their own validation checks. Similarly, the .NET world offers a built-in equivalent: the ASP.NET validator controls[6].

 

Handling Failure

While the title of this section might sound like a chapter out of a self help book, we essentially will discuss some strategies to deal with errors and exceptions in the most secure way possible.

 

It is important to first distinguish between planned and unplanned failures. Planned failures are those that are anticipated and hence code to handle them has been included within the system. With regards to planned failures, development teams can employ a number of different strategies:

·         Catch and rethrow: This involves catching the exception and then rethrowing a new more detailed exception. Typically, most programming languages permit you to include the original exception as some kind of contained field within the object. This strategy is useful when the original catcher has no effective and efficient way to recover but can provide valuable information up the tree for upstream code to recover.

·         Handle and retry: As per this strategy, the original catcher does in fact have the capability to recover as part of the exception handling. That code then retries the operation through the new (or recovered) code path and upstream code never sees the exception.

·         Ignore: This option allows you to completely ignore an exception, i.e. to omit the catch and finally blocks altogether. In general, a very good rule of thumb is that if a specific function in the code is not capable of adding any real value to the exception information, then it is best served by ignoring the exception and letting upstream code deal with it appropriately. It must be noted, however, that this rule holds true for all code except the top level functions or main entry points. This is to avoid the scenario wherein a nasty stack trace or detailed error information, including source code fragments, is displayed to the users. Avoiding unnecessary catch and finally blocks may also help performance slightly, although the effect depends on the language and runtime.

·         Pass through: This strategy in many ways is a variation of the first one above. Unlike the catch and rethrow mechanism however, in this case we always rethrow the same exception. The rationale behind doing something like this would be that the pass through exception handler could perform some other critical security functions that may potentially not be accessible to upstream code. An interesting example of this is when an audit trail needs to be maintained. In those cases code often will simply write to the log file before rethrowing the very same and unchanged exception. Like ignoring exceptions above, this strategy is only permissible if the source code in question is not a top level entry point.
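
Three of the strategies above can be sketched in Java as follows (the fourth, ignoring, simply means declaring the exception in the method signature and writing no handler at all). The Loader interface, the ReportException class and the in-memory auditLog are hypothetical names introduced only for this illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class StrategySketch {
    /** Hypothetical operation that can fail with a specific, checked exception. */
    interface Loader {
        String load() throws IOException;
    }

    /** Hypothetical higher-level exception that keeps the original as its cause. */
    static final class ReportException extends Exception {
        ReportException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for a real audit trail. */
    static final List<String> auditLog = new ArrayList<>();

    /** Catch and rethrow: add context, preserve the original exception as the cause. */
    static String catchAndRethrow(Loader loader, String reportId) throws ReportException {
        try {
            return loader.load();
        } catch (IOException e) {
            throw new ReportException("could not load report " + reportId, e);
        }
    }

    /** Handle and retry: recover locally through a second code path. */
    static String handleAndRetry(Loader primary, Loader fallback) throws IOException {
        try {
            return primary.load();
        } catch (IOException e) {
            return fallback.load(); // upstream code never sees the first failure
        }
    }

    /** Pass through: record the event, then rethrow the very same exception. */
    static String passThrough(Loader loader) throws IOException {
        try {
            return loader.load();
        } catch (IOException e) {
            auditLog.add("load failed: " + e.getClass().getSimpleName());
            throw e;
        }
    }
}
```

Each handler catches IOException rather than the base Exception class, so it knows what kind of failure it is dealing with.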

 

In order to implement either of the first two strategies above, it is important that specific exceptions be caught rather than the base exception object. Failing to do this renders both strategies largely useless, since a handler that catches everything cannot know what went wrong and therefore cannot add meaningful detail or choose a sensible recovery path.

 

For each potential exception and error in the maps above, it is important to adopt one of the strategies above, taking into account the tier within which the code executes, so that maximum benefit is gained from as detailed an exception object as possible while still avoiding any unauthorized information disclosure via error messages.

 

Unplanned failures are essentially those that were not anticipated and hence specific code does not exist which could cleanly deal with the condition at hand. Fortunately, again a number of the programming frameworks do provide some features to deal with unplanned failures. Catch-Alls are essentially catch statements that catch the base exception class. Heeding the advice presented above, one should only use such Catch-Alls in top-level entry point functions.
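
A catch-all at a top-level entry point might look like the following Java sketch. The handleRequest method and its return strings are assumptions made for illustration, and java.util.logging is used only because it ships with the JDK.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class TopLevelSketch {
    private static final Logger LOG = Logger.getLogger(TopLevelSketch.class.getName());

    /**
     * Hypothetical top-level entry point. This is the one place where the base
     * exception class is caught, so that nothing unexpected reaches the user.
     */
    static String handleRequest(Runnable work) {
        try {
            work.run();
            return "OK";
        } catch (Exception e) {
            // Full detail goes to the log, where only operators can read it.
            LOG.log(Level.SEVERE, "unplanned failure in request handler", e);
            // The user sees a generic message with no stack trace or internals.
            return "An unexpected problem occurred. Please try again later.";
        }
    }
}
```

Code further down the call stack would still catch only the specific exceptions it can add value to, and let everything else propagate to this handler.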

 

Frameworks such as ASP.NET[7] also provide the notion of a page level as well as an application level error handler. These are fired automatically whenever the necessary pre-conditions have been met. In ASP.NET the application and page error handlers are quite similar, and both allow the programmer to retrieve information about the last error on the server. When using JSP, developers are encouraged to use the errorPage attribute of the page directive to create a page level error handler[8]. Similarly, it is also possible to redirect to custom error pages when faced with an error in both ASP.NET and Java. These, however, tend to be associated more with server errors than with application errors. In ASP.NET this is achieved using the customErrors[9] element in the web.config file, whereas in J2EE the equivalent is the error-page element in the web.xml configuration file[10]. Each of these also allows for disabling detailed error messages (i.e. those containing stack traces and code snippets), especially when the application is being accessed remotely.

 

As a design paradigm for client server applications it is important to consider how a client might continue to operate in “disconnected” mode while the server is unavailable. This will allow clients to continue operation almost uninterrupted with the guarantee that all their changes will be synchronized successfully to the server when it is back up. Designers and developers must carefully consider the security implications of such behavior as well especially with regards to ordering of operations and the impact independently performed functions might have on the integrity of the system as a whole.

 

Two sets of actions must be performed across the board, whether failures were planned or unplanned: logging and notification, and then recovery or cleanup. Logging (which will be covered in our next column) is necessary both from an audit trail perspective and as debugging information, to figure out what went wrong, who was responsible for the failure and how it can be prevented in the future. For web server applications these logs can be easily collected. However, for applications that are installed entirely within the customer’s environment, some functionality that pushes the logs in question to the server must be considered. This can often be the only reliable way of reproducing client side bugs and determining the true sequence of events, rather than relying solely on the user’s version of “what really happened”.

 

Notifications are necessary to ensure that administrators and operators can stay on top of system health. Perhaps the best way to support this is to integrate the application’s notification system into industry standard third party solutions such as IBM Tivoli, HP OpenView or Microsoft Operations Manager (MOM). Notification rules can vary based on the criticality of the failure. For instance, failure of the authentication module entirely due to a denial of service attack is perhaps more in need of attention than a specific user whose account is being brute forced. It is important to again leverage the maps above to define notification behavior at each level.

 

Finally, there is the matter of performing cleanup. Cleanup often involves reclaiming resources, rolling back transactions or some combination of these two, among others. The best way to think about the security of such a cleanup is to maintain a state diagram which details the before and after states, assuming the operation were successful. These state diagrams must contain the value of each and every variable, flag, setting and environmental state, and especially any security sensitive operation(s) that will be performed. Then, when tasked with cleaning up, it is possible to re-enter the pre-state. Often, a lot of this can be automated via database and programming transaction mechanisms. It is therefore important, as far as possible, to perform security sensitive operations as part of a larger distributed transaction that can be easily rolled back on failure. As part of the cleanup operation, developers might also have to decide whether to continue operating, degrade functionality based on the notion of survivability or abandon all functionality entirely. The discussion of failing open vs. failing closed above is especially relevant here and, as one would expect, such decisions should not be left to individual implementers but must be considered as part of the design and included in the functional specification.

 

An Error Message is Worth a Thousand Attacks

It is often said that a picture is worth a thousand words. While not every error message will lead to a thousand attacks (perhaps not even a single one in many cases), error messages can significantly increase the attack surface of your system. As mentioned above, detailed error messages provide an attacker with information that can then be leveraged in a number of ways. For instance, information such as server banners and favicons, as well as messages that include the name or type of the database server, allows an attacker to formulate the next steps in his / her attack methodology. Consider an application that informs the user that it is using a Microsoft SQL Server database. The skilled attacker would then be able to research potential vulnerabilities or idiosyncrasies of that specific platform, such as commenting styles for queries or meta-tables that can help in mapping the schema of a database[11]. If the server is exposed directly to the Internet, he / she might also be able to perform a scan to check for missing patches or other vulnerabilities and then download the appropriate exploits. At the other end of the spectrum, a message can also help constrain the search space for the attacker. For instance, consider an attacker who is launching a brute force password attack. Error messages such as “Incorrect password for user ‘bob’” and “User ‘joe’ does not exist within the system” can significantly cut down the workload in what is essentially a time consuming task: repeatedly trying username and password combinations. The first message tells the attacker that he / she has hit on a valid username and can now focus only on guessing the password, whereas the latter informs him / her to try a different username entirely.
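
The brute force example can be made concrete with a short Java sketch that contrasts the two leaky messages quoted above with a single uniform one. The method names and the plain-text password map are assumptions made purely to keep the example short. A real system would store salted password hashes and compare them in constant time.

```java
import java.util.Map;

final class LoginMessageSketch {
    static final String GENERIC_FAILURE = "The user name or password is incorrect.";

    /** Leaky: different messages reveal whether the user name exists. */
    static String leakyLogin(Map<String, String> users, String user, String password) {
        if (!users.containsKey(user)) {
            return "User '" + user + "' does not exist within the system";
        }
        if (!users.get(user).equals(password)) {
            return "Incorrect password for user '" + user + "'";
        }
        return "Welcome";
    }

    /** Uniform: every failure produces the same message, whatever the cause. */
    static String uniformLogin(Map<String, String> users, String user, String password) {
        String stored = users.get(user);
        boolean ok = stored != null && stored.equals(password);
        return ok ? "Welcome" : GENERIC_FAILURE;
    }
}
```

With uniformLogin an attacker learns nothing from the message about which half of the guess was wrong, while the distinction can still be recorded in the server side log for administrators.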

 

Essentially, when designing your error messages there are a number of critical questions to keep in mind. Firstly, what information does the application need in order to deal with or recover from the failure? This in turn leads to the question of whether any of that information must come from the user. If the user must be asked, then it is important to consider what information will be shared with the user and how they will be asked for help. Each of these interactions with the user must be closely screened to detect any effect detrimental to security. Often, the most prudent action might be to not ask the question at all: can an assumption be made based on user behavior or context?

 

With all error messages, consider the end impact not just in the context of that specific instance of the message but in general with regard to creating user habits. For instance, most of us have seen the SSL warning shown in Figure 4. Usability studies have shown that, when faced with this warning, most people click Yes, since they read the message more as a question of “Do you want to get whatever it is you were doing done?”. While they might be safe in this case, in the long run we have just trained the user to click Yes every time they see this or a similar warning. This implies that the next time they are on a public wireless network, for instance, and are being subjected to a Man-in-the-Middle (MitM) attack[12][13] or even a phishing attack, they are likely to become victims, since they are now trained to ignore the message in Figure 4.


Figure 4: SSL Certificate Warning

So how do you balance the need to be meaningful and useful to the legitimate user while still not providing any information that could be used against the application? All too often, applications err too far on the side of caution with messages such as “An illegal operation was performed.” Messages such as this only serve to antagonize the user, and while it is definitely true that such a message has no direct security effect, it could result in higher call volumes to technical support lines, which in itself can represent a threat to the business. The first step in designing an error message is to determine the level of detail needed to convey the actions a user would have to take to deal with the system failure. Very rarely, if ever, would artifacts such as SQL query fragments or source code snippets provide a regular user with any value in making their decision. To deal with more technical users who do want further details, it is advisable to consider a progressive disclosure of information, for instance a button to click for more information. Even in that case, however, all messages displayed to the user must be scrubbed of any information that could aid an attacker.
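
One way to approach progressive disclosure, sketched here in Java under stated assumptions, is to build both levels of the message from a fixed catalogue keyed on the type of failure and never to copy the exception text into what the user sees. The UserFacingError class and its wording are hypothetical.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.TimeoutException;

final class UserFacingError {
    /** Shown immediately. */
    final String summary;
    /** Shown only when the user asks for more information. */
    final String moreInfo;

    private UserFacingError(String summary, String moreInfo) {
        this.summary = summary;
        this.moreInfo = moreInfo;
    }

    /** Chooses wording by exception type; e.getMessage() is deliberately never used. */
    static UserFacingError from(Exception e) {
        if (e instanceof FileNotFoundException) {
            return new UserFacingError(
                "The document could not be opened.",
                "It may have been moved or deleted. Check the location and try again.");
        }
        if (e instanceof TimeoutException) {
            return new UserFacingError(
                "The server took too long to respond.",
                "Your changes were not saved. Wait a moment and try again.");
        }
        return new UserFacingError(
            "The operation could not be completed.",
            "If the problem continues, please contact support.");
    }
}
```

Because neither level is derived from the exception itself, details such as file paths or query fragments cannot leak through it. The full exception would still be written to the server side log.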

 

It is also important to consider the difference between what the user interpreted the message to mean and what the message really meant. Along similar lines, be careful about the default choices offered to the user. For instance, as application developers we can do a lot to hint at the “right” choice for the user. This can be done through verbiage within the error message, a secondary warning when the user is about to perform an unsafe operation (the “are you sure” dialog), as well as simple user interface elements such as which button is the default on the form. This is especially significant when the error message gives the user the option of remembering his / her response. As far as possible, this option should not be provided in cases where saving the user’s response will result in a decrease in the security of the user or the system as a whole. If such messages must provide a mechanism to disable them in the future, then consider making that mechanism harder to reach than merely checking a checkbox. Ensure that only privileged users can perform it, for instance.

 

One major problem the authors have seen, having reviewed many different applications, is that developers most often write messages for themselves, i.e. they almost implicitly assume that the person reading the message at the other end is like them. This leads to what may be referred to as the geek factor: messages that contain so much deep technical detail that they not only serve no real value to the user but also end up confusing them to the point where they are not sure what choice to make, and hence could wind up choosing the less optimal option. This problem extends to code reviews as well, since most code reviews are also performed by technophiles and the messages make complete sense to them.

 

The recommended approach to deal with this problem is to make sure that all error messages are documented in specifications and have input from technical writers and communication experts. One strategy might be to deal with this problem much like internationalization, i.e. place all error messages into resource libraries that are dynamically loaded into the application. Then, a technical writer can review the messages within the resource library and modify them appropriately to provide the level of detail that will serve the average user best.
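
In Java, the resource library idea maps naturally onto java.util.ResourceBundle. The sketch below uses a ListResourceBundle subclass so that it is self-contained. In practice the same keys would more likely live in a .properties file that a technical writer can edit without touching code. The keys, the wording and the class names are all hypothetical.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

final class MessageCatalogSketch {
    /** Hypothetical message catalogue, kept separate from the code that raises errors. */
    public static final class ErrorMessages extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"login.failed", "The user name or password is incorrect."},
                {"save.failed", "Your changes could not be saved. Please try again."},
            };
        }
    }

    private static final ResourceBundle MESSAGES = new ErrorMessages();

    private static final String FALLBACK = "The operation could not be completed.";

    /** Looks up user-facing text by key so that wording is never hard-coded at the call site. */
    static String message(String key) {
        return MESSAGES.containsKey(key) ? MESSAGES.getString(key) : FALLBACK;
    }
}
```

A call such as MessageCatalogSketch.message("save.failed") then returns whatever wording the reviewers have settled on, and an unknown key falls back to a neutral message rather than exposing a missing-resource exception.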

 

The final step in creating the error handling strategy of your application must be user acceptance testing (UAT) of the failure conditions and especially the messages. During such UATs it is important to consider four critical reactions from the user:

 

·         Initial Reaction

o   Did the user understand the context of the error message?

o   Was the user alarmed excessively?

o   Was the user inclined to ignore the message and just accept the default choice?

·         Comprehension

o   Did the user bother to read and understand the message text?

o   Did they have any follow up questions that were not clear from the message itself?

o   Did they have to contact technical support or refer to the help files?

o   Did they bother to check any supplementary information that might have been provided?

o   Did they understand the supplementary information?

o   Was any technical jargon used that was not immediately clear to the user?

·         Confidence

o   Did the user understand what was expected of them and how they should react?

o   Did the user understand with a high degree of confidence – what the “right” way forward was?

o   Did the user opt to deviate from the default selection determined by the system?

·         Lasting Impressions

o   Did the user actually end up making the “right” choice under the circumstances?

o   If not did they realize quickly that they had made an incorrect choice?

o   Did the application provide a way for the user to back track and change their decision?

o   Was the user confident enough in his / her choice that they would make the same choice when faced with the same situation in the future?

o   If they made a different choice than they normally would, what prompted that change?

o   Did the user choose to have the system remember their decision if provided that option?

o   If not did they do this because of a true understanding of the risk that posed?

o   If they did choose to save their choice, was it out of a feeling of contempt at the message and to avoid being interrupted in the future?

 

Performing such user acceptance testing can help fine tune error messages and failure handling strategies and ensure that the average user would be comfortable dealing with such conditions and would be likely to make the correct choice, both from a security perspective and otherwise. In many ways such testing must be the culmination of a focused effort to ensure that errors and exceptions are handled appropriately and securely and that the user is confident in the system even in light of failures.

 

Conclusion

It is easy to think about error handling & exception management and feel it has little or nothing to do with security. But when one considers the impact of failing insecurely, failing open, or detailed error messages that help an attacker craft his / her attack, one quickly comes to the realization that this category of vulnerabilities is not only critical to the security of the system but can often be a precursor to significant security failures that build on the verbosity of earlier error messages.

 

Summary

Error handling & exception management issues are best prevented by considering them as early in the lifecycle as possible. This leads to the strategy of designing your application for failure, i.e. making sure that in the event of a failure, the application and indeed the system at large are protected and react in the most secure way possible while still respecting business criticality. A number of programming frameworks already provide some of this functionality and should be leveraged and then augmented with activities that are performed after the system has detected the exception. It is also important to carefully consider the tone and content of error messages displayed to the user. This is to ensure that those messages cannot be used to launch a more serious attack such as SQL injection. Error messages must be considered a critical part of the user interface and hence must be usability tested. Finally, a concerted effort must be made to ensure that all errors and exceptions are logged and audited periodically to detect and potentially prevent any malicious activity that appears in the audit trail.



 

About This Page

Title: Error Handling & Exception Management
Moderated By:
Created: 02-26-2007, 1:05
Modified: 02-26-2007, 1:38
Last Modified By: rudolph
Revision Number: 4