Logging & Auditing

Abstract

Logging and auditing are often ignored up front and then sorely missed in hindsight, which makes this category of issues one of the clearest cases where being proactive can make an enormous difference to software security. Logging critical user activity helps track user actions, provides non-repudiation and early warning of attacks, supplies information that helps recover from those attacks and prevent future ones and, last but not least, helps debug software problems in general. However, all too often logging is treated as optional. In this column we will cover the essentials of logging from a security perspective. We will present the tools, techniques, strategies and processes involved in efficiently and effectively logging data that will prove useful at some later point. Finally, logs are only as useful as the attention paid to them. We will therefore cover the need for auditing and the processes around it.

 

Introduction

This is the last in the series of articles that covers the Security Frame, a framework to evaluate the security of your applications as well as to build security into those very same applications. In previous editions of this column we have covered everything from Configuration Management to Data Protection in Storage & Transit, from Authentication and Authorization to User & Session Management and then Data Validation and finally, in January of this year, Error Handling & Exception Management. As we wrap up this series we discuss the last but by no means the least category: Logging & Auditing. This category is in some ways an oddball since most of its value is felt when things go wrong. When things are going well, no one really cares about or sees the value of Logging & Auditing. So much so that it is often a challenge to convince development teams of the value of investing in a robust logging and auditing strategy right from initial design. However, when things do go wrong (and they inevitably do), the lack of logs can result in significant difficulties in dealing with the failure, whether from a security perspective or otherwise.

 

Logging

Why Log?

In our experience with countless developers and development teams, this is rarely a question that gives people pause. It is safe to say that in most cases development teams see the value in logging. The problem, however, arises in the implementation. A few things can happen. Logging is not enforced as a requirement, so it is seen as a nice-to-have as opposed to a must-have. This attitude also comes into play when users complain about performance: usually the first thing to be dropped is the logging capability. Part of the problem is that, as developers and development teams, we are prone to thinking that “nothing could possibly go wrong, so we would rarely if ever need the logs; if it is going to save us a few milliseconds, why not just get rid of the logging entirely? It’s a capability that’s hardly ever used anyway.”

 

This next paragraph is therefore intended to show the reader why logging and auditing are a critical component of a well designed and secure system. So why is logging so important then – in no particular order:

-          Survivability and securability: You must consider the possibility of failure seriously. This means considering what happens if your application fails and how you maintain at least the critical services as you work to recover from the failure. Logs can be the key to recovery and to determining what went wrong, and thus to building strategies that prevent such a failure in the future.

-          Bug fixing: This one goes beyond the realm of security. Often the nastiest of bugs – timing issues or race conditions for instance – will show up only in production environments as the application is exercised in a real world, possibly multi user environment. Without logs, tracking down and debugging these one off problems can be hard if not impossible since they are hard to reproduce. Hence, your best bet as a developer is to have detailed logs that can then be used to reconstruct the problem. This can also be useful for bugs in general that are discovered only once the software is deployed or in production on customer sites where installing a debugger and attaching to the offending process might not be possible. All in all logs can be an effective debugging tool especially once an application is live.

-          Health and performance monitoring: A common strategy, used especially by large and long running enterprise applications, is to use the logs as a mechanism to show activity and progress. This is especially true for applications that run in a batch processing mode rather than interactively with the user. Log monitoring software can then be used to detect “heart beats” in the logs in an automated manner and report progress and activity, for instance in an enterprise application dashboard.

-          Compliance[1]: While this is an often misused reason, the fact is that a number of recent regulations such as the Gramm-Leach-Bliley Act (GLBA), the Payment Card Industry (PCI) Standard and the California SB-1386 bill require some level of audit trails. Companies not providing such audit trails throughout their IT infrastructure can be found in violation and be subjected to fines and other punishment. Each of these has specific audit trail requirements, and while we will not delve into the specifics in this article, the best practices contained herein should go a long way towards meeting them.

-          Accountability and Non-repudiation: This in turn leads to the more general need for audit trails – to provide for accountability and non-repudiation. This is intended to help associate specific actions with the users that performed or triggered those actions. Without this the security value of the logs is questionable and in some cases the financial impact of the lack of accountability and non-repudiation can be enormous. Perhaps the best example of this is an online banking or stock trading application.

-          Forensic Value: Well maintained and properly handled audit trails can prove to be extremely valuable when prosecuting the perpetrators of intrusions within a company’s IT infrastructure. Here again it is not only necessary to maintain the logs; in order to retain their evidentiary value they must also be handled appropriately, especially after an intrusion has taken place and with regard to aspects such as chain of custody. Even if there is no desire to prosecute, logs can be invaluable in investigating an incident to determine exactly what took place, how and when it occurred and what was taken. Such information can also be useful in preventing future attacks by helping isolate the security holes that were exploited this time around. Especially when logs across different servers (the application itself, web server, database server) and hardware (routers and firewalls) are correlated, they can provide a detailed anatomy of the attack, which in turn provides invaluable lessons for defending the same system in the future.

-          Psychological Value: Finally, often because of all the reasons mentioned above (especially the last two), an effective and efficient logging subsystem that cannot be easily compromised can act to discourage attackers who are concerned that their attack will be detected or that they will leave a trail which can be tracked back to them. This is especially true for insider attacks where such audit trails are likely to be far more valuable for the investigators and far more damning for the perpetrator.

Hopefully at this point you, the reader, are convinced of the value of logs and audit trails and of why they should not be the first feature to be cut in the quest to come in under budget or schedule, or turned off after deployment to improve performance. While these might be legitimate business decisions, it is also important to understand the impact that turning off or disabling logging has on the threat model of the application.

 

What Do I Log?

To answer this question it is best to think about two types of information: meta information, which provides context, and event-specific information, which provides the details of the event that caused the entry to be logged. The meta information should tell you when the event took place, who performed it and where it was triggered from. With this in mind, the meta information must include at a minimum:

-          Date and time of the event: Without time information a log can often be meaningless since it makes it harder to back trace and determine when an attack or compromise might have commenced. In order to be most effective with regards to this parameter it is best to make sure that some level of time synchronization exists between the different servers, applications and hardware that will perform logging. This will allow for end-to-end log analysis.

-          User / Originator Information: It is critical to store information about who triggered the event. As with the date and time information above, without this information the value of the logs as an audit trail is severely diminished: they can provide little to no accountability, and non-repudiation cannot be achieved unless the action and event are tied back to a specific individual. Special care should be taken when the user running a server is not necessarily the true user performing the action. This can happen, for instance, if the server impersonates a higher privileged user (e.g. LocalSystem). In such cases it is critical to log not only the current user ID but also the true user ID.

Additionally, there might also be the need if appropriate to store the IP address of the user. This is especially useful for intranet based applications. If applications are Internet accessible it is important to bear in mind that the IP address obtained may not be the true IP address especially when network mechanisms such as NAT (Network Address Translation) are in use.

-          Miscellaneous Information: Additionally to the above, based on the needs of specific applications, it might also be useful to log programmatic information such as the caller of the function performing the logging and the values of parameters passed to the function. It is also tremendously advantageous to have source code references to aid in debugging for instance the name of the source file and a line number. Most programming languages have macros or functions that can provide that information and thus it is accessible fairly easily. Finally, depending on whether the log file is shared across multiple applications and processes it might also be necessary to log the application name, process ID and potentially even the thread ID.
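The meta information listed above can be sketched with Python's standard logging module. This is only an illustration under stated assumptions: the logger name bank.audit and the fields true_user, effective_user and client_ip are hypothetical names chosen for this example, while the timestamp, process ID, thread ID, file name and line number come from attributes the library fills in by itself.

```python
import io
import logging
import time

# Metadata discussed above: time, application name, process/thread IDs,
# source reference, both user identities and the client IP address.
# true_user, effective_user and client_ip are hypothetical field names.
LOG_FORMAT = (
    "%(asctime)s %(name)s pid=%(process)d thread=%(thread)d "
    "%(filename)s:%(lineno)d %(levelname)s "
    "true_user=%(true_user)s effective_user=%(effective_user)s "
    "ip=%(client_ip)s %(message)s"
)


def build_logger(stream):
    """Return a logger that writes metadata-rich records to `stream`."""
    logger = logging.getLogger("bank.audit")  # hypothetical application name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    formatter = logging.Formatter(LOG_FORMAT, datefmt="%Y-%m-%dT%H:%M:%SZ")
    formatter.converter = time.gmtime  # UTC eases correlation across servers
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
audit = build_logger(buffer)
audit.info(
    "funds transfer requested",
    extra={
        "true_user": "alice",             # the person performing the action
        "effective_user": "LocalSystem",  # the identity the server runs as
        "client_ip": "10.0.0.12",         # may be a NAT address, see above
    },
)
line = buffer.getvalue()
```

Writing timestamps in UTC is a design choice made for this sketch; it does not replace time synchronization between the machines that log.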

 

With regard to the actual events, what are some of the critical security events that must be logged? It is best to view these events along the categories of the Security Frame, with which by this point you are probably very familiar.

 

What to Log? 

 

What Should I NOT Log?

A number of the events mentioned above can deal with sensitive information. Now that we have said that the occurrence of these events should be logged, the question is how much information to log. The general rule of thumb is that any information intended to be kept confidential should never be logged, certainly not in its clear text form but not even in its encrypted form. This includes all sensitive data such as passwords and private information (e.g. social security numbers or credit card information). Further, to control the size and overhead of logging, avoid logging entire database tables or record sets. If absolutely necessary, development teams may consider logging queries, the size of the record set and whether the access was successful or denied. Similarly, it is best to log a reference to the code rather than actual chunks of source code. A filename and line number should be all the information developers need to find where in the source code the event occurred.
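One way to enforce this rule as a safety net is a filter that scrubs records before they reach any log sink. The sketch below uses Python's standard logging module. The two patterns (a 13 to 16 digit number standing in for a card number, and a password=value pair) are assumptions made for illustration and are nowhere near a complete inventory of sensitive data. The filter complements, and does not replace, the discipline of not passing confidential values to the logger in the first place.

```python
import io
import logging
import re

# Hypothetical patterns, for illustration only.
CARD_RE = re.compile(r"\b\d{13,16}\b")
PASSWORD_RE = re.compile(r"(password\s*=\s*)\S+", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask card-like numbers and password values in every record."""

    def filter(self, record):
        message = record.getMessage()  # message with arguments merged in
        message = CARD_RE.sub("[REDACTED-CARD]", message)
        message = PASSWORD_RE.sub(r"\1[REDACTED]", message)
        record.msg = message
        record.args = ()               # arguments are already merged
        return True                    # keep the (scrubbed) record


buffer = io.StringIO()
logger = logging.getLogger("payments")  # hypothetical logger name
logger.handlers.clear()
logger.propagate = False
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(buffer)
handler.addFilter(RedactingFilter())
logger.addHandler(handler)

logger.info("login failed password=%s card %s", "hunter2", "4111111111111111")
scrubbed = buffer.getvalue()
```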

 

Where Should I Log To?

There are quite a few options in terms of where the logs should be written. The basic requirement for a log location is that it be securable. This implies that it have adequate access control to prevent unauthorized tampering of the log file from outside the application for instance by directly editing the text file in a regular text editor. With this in mind the general recommendation is that the log files be placed on a different and dedicated log server possibly on a separate VLAN. From a security perspective the advantage of doing this is that even if the attacker successfully compromises the application server he / she would still need to get past another barrier to compromise the logs and / or delete traces of the malicious activity. All updates to the remote log server must then be performed over a secure channel to prevent tampering. The authors have even run into cases where the security demands are so high that logs are directly written to write-once media such as DVDs[2].

 

For many applications however, an elaborate setup such as the one described above may not be practical. If the threat model does not demand such an approach there are single machine alternatives as well that can be effectively secured. The operating system itself provides logging options. For instance, the NT Event Log on Microsoft Windows and syslog on Unix flavors can be used through well defined APIs. There are however a few caveats to bear in mind when using operating system based logging. Firstly, such logs are a shared resource across all the applications and operating system components. This implies that these are not meant to be used for extensive logging. Secondly and in many ways related, the size of such log files is often controlled by the operating system with little granular control. Hence, it is quite possible that when you do go into the logs to check on activity, the log entries for your application have been replaced by those from some other application. It does however remain an efficient logging mechanism especially when logging limited information of a highly critical nature or for instance logging information about an application’s logging subsystem itself.

 

Besides the operating system, most other parts of the infrastructure have some level of logging capability. For instance, most hardware (routers and firewalls), web servers, application servers and database servers have logging capabilities that can be configured to determine how much and what type of information will be logged. Database transaction logs can in fact be so detailed that, in the case of a database failure, they can be replayed to repopulate the missing data into a fresh database. The problem with this level of logging in general, however, is that it lacks the application context. For instance, a web server log might be able to tell you the specific HTTP return code sent in response to a request. However, it will struggle to tell you specifically what business object was passed in, how it was processed and what the business response was. To obtain that level of detail, developers are most often required to create their own log entries, whether by writing to one of the logs already described or to a custom log file reserved for just this application. Custom logs can go a long way in eliminating false positives and saving time by providing that level of detail.

 

How to Log?

In its most basic form logs are typically stored on the file system or in a database. Hence, developers have the option of using raw platform APIs for creating and writing to such logs. However, as one might expect this can be inefficient from both a performance and productivity standpoint. There are therefore a number of more elegant ways to log data from within an application.

 

Firstly, most programming frameworks and operating systems provide some level of access to at least the operating system logs. For instance, the syslog API on Unix systems and the System.Diagnostics.EventLog class in .NET provide access to the system log (typically /var/log/messages) and the NT Event Log respectively. As mentioned above, such logging capabilities come with a number of caveats, but they nevertheless represent an easily accessible option. With .NET 2.0, an important new feature was health monitoring[3], which provides the rich capabilities of a built-in logging subsystem but eliminates some of the traditional bottlenecks associated with such systems. Health monitoring is tremendously configurable, even allowing parameters such as the thresholds at which logging and alerting should start and stop to be defined.

 

Third party libraries such as the log4* family[4] and the .NET Enterprise Library[5] provide another option. These libraries provide fully functional logging capabilities and are tremendously configurable. Further, they can easily integrate with different application types including thick clients, web applications, services and even controls. Especially in the case of the Apache Logging Project, logging APIs are available in a wide variety of languages, from Java and .NET to C++ and PHP. These logging APIs also allow for a variety of log sinks, ranging from the more traditional file system, database or syslog to message queues and system management software solutions. One important feature that most third party logging solutions support is the notion of log levels. Log levels, typically Debug, Informational, Warning, Error and Fatal, help control the level and volume of information that is logged. Production systems should by default only log Warnings and higher, unless a problem is being debugged in production.
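The effect of log levels can be sketched as follows. Python's standard logging module is used here purely as a stand-in for the libraries named above; its level names (DEBUG, INFO, WARNING, ERROR, CRITICAL) differ slightly from the ones listed, and the configure helper and the orders logger name are hypothetical. The level is passed in as a parameter so that it can come from configuration rather than from code.

```python
import io
import logging


def configure(level_name="WARNING", stream=None):
    """Configure the hypothetical 'orders' logger; WARNING is the default."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level_name)  # setLevel accepts a level name or a number
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Production default: only warnings and above are written.
prod_out = io.StringIO()
prod = configure("WARNING", prod_out)
prod.debug("cache miss for key=42")    # suppressed
prod.info("order accepted")            # suppressed
prod.warning("payment gateway slow")   # written

# Temporarily lowered threshold while a problem is being investigated.
debug_out = io.StringIO()
dbg = configure("DEBUG", debug_out)
dbg.debug("cache miss for key=42")     # written
```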

 

Creating custom loggers used to be fairly common, especially before rich third party libraries were available. Such loggers would essentially implement many of the features now available in the third party libraries. In most cases applications are strongly discouraged from building their own custom logging implementations. At most, the team might choose to extend one of the existing loggers, for instance to support a new log sink such as a custom mainframe based logging protocol.

 

Log Concerns

While logging is critical to the security of the application, there are a few considerations that must be kept in mind in order to securely create a logging implementation. It is therefore critical when creating the threat model for the system to consider the threats to the logging subsystem. The common threats against the logging component include:

  • Denial of Service – The logging subsystem should be implemented to account for disk space utilization. Since the logger will typically be saving all the logs to some persistent data store, it is critical to ensure that the logger does not exhaust the disk space on that data store, which in turn could crash the logger or potentially the entire application and so cause a denial of service. The logging subsystem must implement disk space throttling to ensure disk space utilization never exceeds a fixed quota. This can be implemented in a number of different ways. For instance, consider the use of log rotation, wherein logs are archived to a different location when they reach the maximum size or after a specific time period. A simpler scheme would be to just zero out the log and start again each time the quota is reached.
  • Log Wiping – All log files should have strong access controls defined on them so that they cannot be deleted by an attacker. As one would expect this is a typical step in the attacker’s methodology so as to wipe out any traces of his / her malicious activity. Storing the log files on a separate and hardened log server as mentioned above is therefore regarded as a good practice. However, even something like privilege separation where the application identity does not by default have write access to the log file can act as risk mitigation.
  • Log Bypass – Attackers will often try and bypass the logger again as part of being stealthy. Typical log bypass attacks attempt to flood the log so that the maximum log quota is reached. A number of systems will at that point stop logging entirely thus allowing all future actions to go through without ever being logged. This would allow an attacker to perform malicious activity without resulting in that activity showing up as an audit trail. This can also happen as described above if the log service itself crashes.  It is therefore useful to have a watchdog timer that will automatically restart the logging service as soon as it detects the logger is not available.
  • Log Tampering – Attackers may also attempt to tamper with the contents of the log files, either to wipe out their activities or to create false or malicious log entries. Having discussed the former above, the latter is typically performed by injecting malicious meta characters such as the carriage returns and line feeds or cross site scripting characters – especially when the log is viewable in a browser as part of a web application. This last threat can result in critical vulnerabilities especially because the log files are typically viewed by administrative or power users.
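Two of the mitigations above lend themselves to a short sketch: a disk quota enforced through rotation, and neutralizing carriage returns and line feeds before user-supplied values reach the log. The example uses Python's standard logging.handlers.RotatingFileHandler; the helper names, the 1 KB limit and the sample attacker string are assumptions made for illustration. Note that this handler deletes the oldest file once the backup count is exceeded, so archiving to another location would have to happen separately.

```python
import html
import logging
import logging.handlers
import os
import tempfile


def neutralize(value):
    """Escape backslash, CR and LF so input cannot forge extra log lines."""
    return (value.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n"))


def for_browser(log_line):
    """Escape HTML metacharacters before showing a log line in a web page."""
    return html.escape(log_line)


def build_capped_logger(path, max_bytes=1024, backups=3):
    """Logger whose files never exceed about max_bytes * (backups + 1)."""
    logger = logging.getLogger("capped")  # hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")
logger = build_capped_logger(log_path)

# An attacker-supplied user name that tries to inject a fake success entry.
forged = "bob\n2024-01-01 00:00:00 login succeeded for admin"
for _ in range(200):  # simulate a flood of log entries
    logger.info("login failed for user=%s", neutralize(forged))
for handler in logger.handlers:
    handler.close()

files = sorted(os.listdir(log_dir))
total_bytes = sum(os.path.getsize(os.path.join(log_dir, f)) for f in files)
```

With these settings at most four files exist at any time (app.log plus three backups), so the flood cannot grow the log beyond roughly 4 KB, and every line on disk is a genuine "login failed" entry.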

 

Besides the threats described above two other issues are critical – log overhead and log overload. Log overhead is unavoidable and the effort must be to minimize it. It represents the performance penalty paid as a result of using the logging subsystem. Many different strategies exist to optimize the logger. For instance it is important to bear in mind that overheads tend to be more significant for smaller operations. Hence it is important to avoid opening and closing file system handles or database connections for every log operation. Along similar lines it is vital to batch log operations together. A caching strategy can help implement such batching. As with any delayed disk write the risk always exists that the system might crash or be powered off before the cache has been flushed to disk. Hence the cache size and the buffering time must be carefully tuned for performance as well as security. Another common strategy is to also make use of threads for performing the heavy disk operations asynchronously while letting the application continue to make progress. Finally, log levels can also be used to control the volume of information that will be written to the persistent data stores.
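The asynchronous strategy described above can be sketched with the QueueHandler and QueueListener classes from Python's standard library: the application thread only places records on an in-memory queue, while a background thread performs the slow writes. The in-memory buffer below merely stands in for a slow disk or database sink. As the text warns, records still sitting in the queue are lost if the process dies before they are written, which is why the sketch stops the listener explicitly.

```python
import io
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)                  # unbounded in-memory queue
buffer = io.StringIO()
slow_handler = logging.StreamHandler(buffer)  # stand-in for a slow sink

# The listener owns the slow handler and runs it on a background thread.
listener = logging.handlers.QueueListener(log_queue, slow_handler)

logger = logging.getLogger("async_demo")     # hypothetical logger name
logger.handlers.clear()
logger.propagate = False
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

listener.start()
for i in range(5):
    # Returns as soon as the record is queued; no disk I/O on this thread.
    logger.info("processed batch %d", i)
listener.stop()  # processes everything still queued, then joins the thread

written = buffer.getvalue().splitlines()
```

Calling stop() at shutdown matters: it is the point at which the remaining queued records are flushed to the sink.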

 

Logging Best Practices

Before we move the discussion to auditing, we list a few miscellaneous best practices for logging. Some of these best practices are critical for the security of the logging subsystem or the application in general while others are defense in depth strategies that will enhance the security of the system as a whole.

  • Configuration Management – It is vital that the logger be tremendously configurable. This includes parameters such as log levels but also the location where the logs will be stored i.e. the log sinks. Such paths should never be assumed or hardcoded but should instead be entirely administrator configurable. This will allow for the scenario where a specific administrator might want to store the logs on a separate file server for instance.
  • Archiving – As mentioned above, it is important that sufficient safeguards  are in place to prevent a log from being filled by junk. It is therefore important to backup and archive the log files to some offsite storage such as tape backups. These backups must be retained as per the organization’s data retention policies but at least for a minimum of six months. Further, these log files should be adequately protected at their archive location as well. Consider the use of cryptographic techniques such as digital signatures and encryption. Archived log files can be especially useful in detecting how far back a compromise had commenced.
  • Digital Signatures – In applications that have a high security bar and where the audit trail is going to be heavily relied on, it is advisable to tamper proof the log files. This can be done by incremental digital signatures that are maintained in a different location. Essentially each time an entry is made to the log file, an out-of-band process running under a highly restricted account (not accessible to the application identity or its users or even the administrator account) computes a digital signature of the log in its current form and stores the same in its own database which again is not accessible (especially for Write operations) to other users.
  • Multiple Log Files – It is often most productive to have multiple log files for larger applications. One common technique is for each subsystem to have its own log file. However, this can often be handled instead by saving a parameter with each log entry that identifies the subsystem responsible for logging the data. A more effective strategy in practice is to use at least three log files: one for audit trail related logging, one for exceptional conditions and another for general logging. This allows a developer or incident response investigator attempting to debug a problem to view the error or exception in isolation before attempting to gather more context by using the general log.
  • Logging at Different Layers – It is best to log at every layer of the application that has the capability. For instance, consider logging at the network boundary, the DMZ, the web server, the application server, the database server, file servers and so on. One important caveat to bear in mind when performing such distributed logging is to make sure that clock skews between each of these devices / servers are known. This time synchronization goes a long way in helping teams to follow the control and data flows and examine the so called anatomy of the attack as it traversed the different parts of the application infrastructure and environment.
  • Easy to Use API – Finally, if you do build a custom logger, one thing you must ensure is that it is easy to use. If this is not the case, it will not be used. Ideally, the API should simply take the core log message from the user and be able to infer and insert all of the other data elements such as metadata including the date and time or source code references.
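The incremental tamper-evidence idea from the Digital Signatures item above can be sketched as a chain of authentication tags, one per log entry, where each tag also covers the previous tag. To stay within the standard library, this sketch swaps in a keyed HMAC for a true public-key digital signature. The function names and the sample entries are hypothetical, and in a real deployment the key and the tags would live with the restricted out-of-band process, not with the application.

```python
import hashlib
import hmac


def chain_tags(entries, key):
    """Return one hex tag per entry; each tag covers all earlier entries."""
    tags = []
    previous = b""
    for entry in entries:
        tag = hmac.new(key, previous + entry.encode("utf-8"),
                       hashlib.sha256).digest()
        tags.append(tag.hex())
        previous = tag
    return tags


def verify(entries, tags, key):
    """True only if the log matches the separately stored tags exactly."""
    expected = chain_tags(entries, key)
    return len(expected) == len(tags) and all(
        hmac.compare_digest(a, b) for a, b in zip(expected, tags))


key = b"demo-key-held-by-the-restricted-account"  # hypothetical key
log = [
    "09:00 alice logged in",
    "09:01 alice transferred $100",
    "09:02 alice logged out",
]
stored_tags = chain_tags(log, key)  # kept out of reach of the application

edited = list(log)
edited[1] = "09:01 alice transferred $1"  # an entry is modified
truncated = log[:2]                       # the last entry is wiped
```

Because each tag depends on the one before it, modifying, removing or reordering any entry invalidates the tags that follow it.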

 

Auditing

One of the most important uses of log files from a security perspective is in forming the audit trail. An audit trail represents a record of a user’s activity as he / she uses the system. Consider the scenario where a user logs into his / her online banking account and then transfers $100 from one account to another. An audit trail must be designed in this case to make it hard for that user to deny after the fact that he / she performed the transaction. This is just a simple example, and in reality there are many other events that should and would be logged in this case. In fact in most systems we would like to be more expansive and maintain an audit trail of when the system is restarted or users are added and deleted: essentially all of the events mentioned in the section What to Log? above. An audit trail is intended to provide accountability and non-repudiation, both of which, as mentioned above, are valuable among other things for their evidentiary value. Besides this, however, audit trails are also useful in identifying which parts of your system are most frequently used, for instance, or where the bottlenecks lie. Metrics can be gathered from the production system, then analyzed and used to optimize performance by tuning system parameters such as cache sizes and timeouts. For instance, one common argument against short and secure session timeouts, as described in our article on User & Session Management, is that most users will complain about having to log in again. An audit trail can be a good source of empirical data showing whether this is indeed the case. For instance, with the current setting for session timeouts, do most users time out or do they explicitly log out? If the latter is true, perhaps the session timeouts can be tightened.

 

Audit trails are also often required as part of a compliance requirement. Two examples that come to mind are the Gramm-Leach-Bliley Act and California’s State Bill 1386[6]. Having a strong audit trail is considered part of due diligence in maintaining the security of the assets the application is protecting in the first place. This can often save the organization from large fines and audits in the event that it does get compromised. Security mechanisms such as an audit trail can be used to show that the organization did everything reasonable to protect itself and its customers.

 

As mentioned above, however, an audit trail is only as valuable as the regularity with which it is reviewed. As an organization or a team it is therefore critical to define roles and responsibilities and a workflow for the various types of events, especially the security significant ones. Additionally, thresholds should be defined and tuned over time as the team learns more about the system in production. For instance, consider repeated failed logins within a short time span. This is most likely a brute force attack at work, and the operations staff might want to take remedial actions such as investigating where the attack is originating from or perhaps warning users whose accounts might have been compromised. Obviously many users will type in the wrong password from time to time, and the thresholds must account for such behavior to avoid expensive false positives. The health monitoring feature introduced in .NET 2.0[7] can be extremely effective in helping define such thresholds as well as in extending the basic events available by default to support application specific events.
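The failed login threshold described above can be sketched as a sliding window counter. The class name, the default of five failures within sixty seconds and the sample timestamps are assumptions chosen for illustration; real values would be tuned against production data as the text suggests.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Flag a user once too many failures fall inside a sliding window."""

    def __init__(self, max_failures=5, window_seconds=60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # user -> recent failure times

    def record_failure(self, user, timestamp):
        """Record one failure; return True if the threshold is reached."""
        window = self._failures[user]
        window.append(timestamp)
        # Drop failures that are older than the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_failures


monitor = FailedLoginMonitor(max_failures=5, window_seconds=60)

# Five failures within 20 seconds: likely a brute force attempt.
burst = [monitor.record_failure("alice", t) for t in (0, 5, 10, 15, 20)]

# Occasional typos spread over several minutes: no alert.
typos = [monitor.record_failure("bob", t) for t in (0, 100, 200, 300, 400)]
```

Here only the fifth failure in the burst raises the alert, while the widely spaced failures never do, which is the behavior needed to keep false positives down.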

 

Log monitoring, as one would expect, can be done in two ways: manual or automated. Manual approaches have the advantage of most often being highly accurate and focused. They tend to be expensive, however, due to the sheer cost of man-hours and the fact that most real world applications generate thousands of log entries every hour. The volume of data to be sifted through can become overwhelming, and the staff can end up with the proverbial “needle in a haystack” task. Techniques such as using multiple log files and log levels, both of which were discussed above, can be used to make this process more efficient. However, like most human driven tasks, there is an inherent cost associated with it. Automated analysis is becoming increasingly common and the tools in this space are maturing. The major area of research for the tool makers centers around the elimination of false positives and false negatives. False positives occur when an event or alarm is triggered but in reality turns out to be just an error in the log monitoring software’s heuristics. False positives are like the boy who cried wolf. They train the recipients of the alarms to ignore them, until finally a true alarm does occur and the mis-trained operations person simply ignores it. False negatives, on the other hand, can in some ways be even more dangerous since they prevent the organization from knowing when true events have taken place and let attacks go unnoticed, thus defeating one of the very purposes for which the audit trail and log monitoring were put in place. Automated log analysis does have its advantages, however. For instance, most of the commercial systems available today can integrate with systems management software and thus provide features such as call out trees, wherein an operations person is called or paged automatically when some threshold is reached. Further, such software can also define rich workflow scenarios that take into account average response times vs. expected response times, staff being on vacation or otherwise inaccessible, and escalation paths.

 

In most cases the automated systems described above operate in real time and use one of two approaches: they attempt to detect attacks and other significant events either through signatures or by looking for patterns and anti-patterns (anomaly detection). Signature-based detection is only as powerful as the signatures the vendor makes available and keeps up to date. It also relies implicitly on the vendor’s ability to turn around signatures quickly and effectively enough to minimize both false positives and false negatives. For instance, until a few years back it was trivial to bypass a popular intrusion detection system by using ' or 2 > 1;-- instead of the canonical ' or 1=1;--. This illustrates the inherent weakness of signature-based systems. Anomaly-based systems, on the other hand, typically require a little more hand-holding, especially during initial deployment. The aim is to train the software to recognize regular usage patterns and therefore to identify deviations from them. When such a deviation occurs, the monitoring system triggers an alarm that warns the operational staff of a potential problem. Initially the false positive rates on such systems tend to be fairly high; once in full production mode, however, anomaly detection systems can be quite effective.
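
The difference between the two approaches can be sketched in a few lines of Python. Both halves are deliberately simplistic and are our own illustration, not the logic of any real intrusion detection product: the regular expression stands in for a single hypothetical vendor signature, and the anomaly check is a plain standard-deviation test against a baseline of, say, failed logins per hour.

```python
import re
from statistics import mean, stdev

# Hypothetical signature: matches only the textbook tautology "' or 1=1".
SIGNATURE = re.compile(r"'\s*or\s+1\s*=\s*1", re.IGNORECASE)


def signature_match(value: str) -> bool:
    """Return True if the input matches the known attack signature."""
    return SIGNATURE.search(value) is not None


def is_anomalous(baseline: list[float], observed: float,
                 z_limit: float = 3.0) -> bool:
    """Flag `observed` if it lies more than `z_limit` standard
    deviations away from the mean of the training baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_limit
```

Here signature_match catches ' or 1=1;-- but lets ' or 2 > 1;-- straight through, which is exactly the false negative described above. The anomaly check needs a training period to collect the baseline, and a badly chosen z_limit produces the high early false positive rate that such systems are known for.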

 

In practice, a commonly used third option is a semi-automated approach. Typically this involves tasking a specific individual with log analysis but providing him or her with tools such as log parsers and analyzers that convert the raw log into a more human-readable form and allow post-processing techniques such as trend analysis. Another approach in this category, one that requires some effort from the development team, is to build a custom, lightweight “intrusion detection system” for the application. The authors created such a proof of concept, called Validator.NET[8]. Such subsystems tend to be more effective because they are far more intertwined with the context and business logic of the underlying application than an external system or piece of software can be, and so are much less likely to produce false positives or negatives. In fact, development teams and organizations can extend the Validator.NET project with minimal effort, primarily in a declarative, configuration-driven fashion. Implementing such a subsystem for individual applications can thus be done quickly and efficiently.
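
As an illustration of the kind of tooling an analyst might be given, the following Python sketch parses raw log lines and produces a simple hourly trend for one event type. The line format and the event names (LOGIN_FAILED, LOGIN_OK) are hypothetical; a real parser would be written against whatever format the application's logging framework actually emits. This is not how Validator.NET works, only a generic example of log post-processing.

```python
import re
from collections import Counter

# Hypothetical line format: "<date> <time> <LEVEL> <EVENT> <free text>"
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<level>\w+) (?P<event>\w+)"
)


def hourly_trend(lines, event):
    """Count occurrences of `event` per (date, hour) bucket.

    Lines that do not match the expected format are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("event") == event:
            counts[(m.group("date"), m.group("hour"))] += 1
    # Sort chronologically so the trend reads top to bottom.
    return dict(sorted(counts.items()))
```

Feeding a day's log through hourly_trend(lines, "LOGIN_FAILED") yields one count per hour, which a person can scan far more quickly than the raw entries and which makes a sudden spike easy to spot.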

 

Conclusion

Logging and auditing are another category of issues that you don’t miss until you need them. Unfortunately, they are also hard to introduce after the fact and thus must be considered from day one, while thinking about system requirements and design. In our experience, while most applications do perform some basic logging, they seldom consider its audit trail aspects and rarely if ever engage in any kind of log monitoring. This reduces the logs to a pure debugging tool rather than an aid in thwarting attacks and preventing future ones.

 

Summary

In the final article in our series on the Security Frame we have covered the important category of logging and auditing. It is important to consider the key benefits that logging, followed by auditing, brings to the table. Under this wide umbrella it is also important to bear in mind which events must be logged from a security perspective and, certainly, which data elements should never be logged. Further, as you design and build out your applications, consider the various options available for performing these critical security functions, both with regard to logging frameworks and log sinks and with regard to auditing strategies and tools. Finally, a great deal of knowledge has been gained over the years and distilled into best practices for tuning and optimizing logging subsystems, both to improve performance and decrease overhead and to reduce false positives and false negatives.



[1] http://msdn2.microsoft.com/en-us/architecture/aa480484.aspx

[2] An old story in the security community goes that at one of the 3 letter agencies in the Washington DC area there is even a syslog daemon (housed in a huge dark room apparently) that directly outputs its log to print! Urban legend or super security no one will perhaps ever know but it does reinforce the concept of thinking about the threat model.

[3] http://msdn2.microsoft.com/en-us/library/ms998306.aspx

[4] http://logging.apache.org

[5] http://www.codeplex.com/entlib

[6] http://msdn2.microsoft.com/en-us/architecture/aa480484.aspx

[7] http://msdn2.microsoft.com/en-us/library/ms998306.aspx

[8] http://www.foundstone.com/resources/proddesc/validator.htm
