At RSA Conference 2020, KernelCare CEO Igor Seletskiy shared best practices for staying compliant through faster patch management. This article presents the key takeaways from his talk.
For enterprises, being compliant is not a trend; it's a necessity. Increasingly, clients won't do business with you unless you are compliant, and non-compliance can cause severe losses in revenue and across the whole business.
Compliance best practices affect every part of your company's business, from people and HR departments all the way to the core of your IT infrastructure. There are dozens of standards to choose from.
If you look at SOC 2 or the Sarbanes-Oxley Act, for example, they effectively require that software vulnerabilities be mitigated within 30 days. If you've ever managed a server fleet, you'll know that's easier said than done.
So today I want to explain how you can stay compliant by streamlining your patch management, and why patch management is crucial for security and compliance.
Why is patch management important?
We all know that outdated or vulnerable software is one of the main causes of security breaches. Here are a few examples to remind us of this fact:
- The Equifax data breach in 2017, in which an unpatched Apache Struts vulnerability let hackers steal the data of nearly 150 million customers. The flaw allowed full remote command execution with the privileges of the web server, and it was actively exploited in the wild from the moment of initial disclosure. Equifax lost close to $700 million, and they're still counting the cost of their ... let's be kind and say "oversight".
- The Marriott data breach in 2018, a hack that leaked between 300 and 500 million customer records from a hotel reservation system. Marriott got an alert from an internal security tool on September 8 of that year, warning that someone was trying to access the Starwood guest reservation database in the United States. When they investigated, they found that an unauthorized party had copied data from the database, encrypted it, and exfiltrated it. The root cause was unpatched software on the Starwood network, a system Marriott had previously acquired, which allowed hackers to reach the customer database.
- A Fortune 500 power company data breach. An unnamed power company violated mandatory security rules by ignoring physical access control system software patches for more than a year, putting the power supply for millions of homes at risk. The root cause was a misconfiguration that stopped the system from updating itself automatically. The company had no backups because the system was being decommissioned, and they didn't want to patch manually because that would have meant a reboot. Even though the company self-reported the incident to the Western Electricity Coordinating Council and paid a penalty, it remained non-compliant until July 2017, when the physical access control systems at issue were replaced and a new security patch management program was implemented.
These are just a few of the many cases that illustrate the importance of patch management. The problem is that only the big names and big breaches make the headlines. To see what I mean, take a look at privacyrights.org: they maintain a spreadsheet with over 9,000 cases, and that's only for the US. Imagine all the unreported cases in the rest of the world.
So we should all know that vulnerable software needs to be patched within 30 days of a vulnerability becoming public. An essential IT security practice is to scan for vulnerabilities and then patch them, typically via a patch management tool.
Must-use: Vulnerability scanners and Patch Management tools
Perhaps you've heard of vulnerability scanners such as Nessus, OpenVAS, or Qualys.
Such tools make vulnerability management much easier. They can even find and patch vulnerabilities for you, which really helps security and operations staff. Scanners detect and classify system weaknesses, prioritize fixes, and can sometimes help predict the effectiveness of countermeasures.
Typically, a scanner checks the target attack surface against a database of known security holes, covering details such as anomalies in packet construction and paths to exploitable programs or scripts.
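At its core, that lookup is a set intersection: take the installed-software inventory and intersect it with a feed of known-vulnerable versions. Here is a minimal sketch in shell; the package names and versions are made up for illustration, and a real scanner would pull the inventory from `rpm -qa` or `dpkg -l` and the vulnerable list from a CVE feed such as NVD.

```shell
# Installed-package inventory (in practice: rpm -qa or dpkg -l output).
cat > installed.txt <<'EOF'
openssl-1.0.1e
struts-2.3.31
nginx-1.14.0
EOF

# Known-vulnerable versions (in practice: parsed from a CVE feed).
cat > vulnerable.txt <<'EOF'
bash-4.2.45
struts-2.3.31
openssl-1.0.1e
EOF

# comm(1) needs sorted input; lines present in both files need patching first.
sort installed.txt -o installed.txt
sort vulnerable.txt -o vulnerable.txt
comm -12 installed.txt vulnerable.txt > needs_patching.txt
cat needs_patching.txt
```

Real scanners layer prioritization, credentialed checks, and reporting on top, but the matching step works on the same principle.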
Automated patch management tools like these take the pressure off system administrators. Speak to any of them and they'll tell you how great it is to have something that automatically downloads and installs patches across different devices. Linux server patches get deployed more quickly and more reliably.
Patch Management - Some best practices
Enterprises typically think about using automated patch management tools when their Linux server fleet grows above 40 or 50. At that number, IT departments get stretched so thin with manual patching that they end up deploying only those patches with urgent or high priorities.
Linux patching involves more than simply updating a kernel's source code. Patches include updates that keep systems secure, minimize errors, and introduce the latest features. Most importantly, proper patch management can greatly improve an enterprise's security by addressing the vulnerabilities in its software and operating systems.
But patching Linux servers can be complex. Open source software development is less regimented, so updates have more unpredictable release cadences. Open source software also runs at every point in the software stack, so changes must be carefully analyzed.
While most updates are as easy as running a Linux command, it's not just about the technical aspects. There are also organizational issues. You have to keep track of what needs to be updated, and understand the impact of the updates. Updates must be tested and staged, and the change has to be communicated across departments.
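To show what "as easy as running a Linux command" means in practice, here is a sketch that detects the local package manager and reports the usual security-update command for it. The commands shown are the standard ones for each family of distributions, but exact flags vary by release (for example, older RHEL needs yum-plugin-security), so treat them as a starting point, not gospel.

```shell
# Pick the customary security-update command for this host's package manager.
# Flags vary by distro release; check your distribution's documentation.
if command -v apt-get >/dev/null 2>&1; then
    CMD="apt-get update && apt-get upgrade"      # Debian/Ubuntu
elif command -v yum >/dev/null 2>&1; then
    CMD="yum update --security"                  # RHEL/CentOS
elif command -v zypper >/dev/null 2>&1; then
    CMD="zypper patch --category security"       # SUSE
else
    CMD="unknown package manager"
fi
echo "Security update command: $CMD"
```

The organizational work around that one command, which is inventory, impact analysis, testing, and communication, is where most of the 18,000 hours go.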
According to Ponemon Institute research, organizations spend an average of 18,000 hours a year, at a cost of $1.1 million, on patching activities. Despite this effort, the time it takes for an exploit to appear in the wild for a given patched vulnerability keeps shrinking. Without strong patch management practices, enterprises waste time and risk leaving the door open to attack.
The same Ponemon study found that almost two-thirds of cyberattack victims said applying a patch would have prevented the attack, and a third said they knew about the vulnerability before the attack but did nothing. Scary.
So clearly, a solid patch management process is an essential part of a mature security framework. The faster you can apply the right patch to the right application, the more secure your environment will be.
While patch management is a challenge, it’s not impossible. There are a number of patch management best practices that enterprises use to their advantage:
- First, they take a comprehensive inventory of all software and hardware in the environment. Once a company has a clear picture of what it has, it can compare known vulnerabilities against that inventory to quickly discover which patches matter.
- I already mentioned that organizations spend an average of 18,000 hours per year on patching, much of it on the wrong systems. There's a way to avoid that. While all systems should be patched, it makes sense to assign a risk level to each item in the inventory. For example, patching some components may be a routine planned activity, while applying security updates for a major CVE to the Linux kernel is critical to the enterprise. A golden rule: the more exposed to attack an item is, the faster it should be patched.
- Enterprises also consolidate software versions (and the software itself). The more versions of a piece of software an organization runs, the higher the risk of exposure, and the greater the administrative overhead. Periodically review all software in use and its purpose; where several packages perform the same function, choose one and retire the rest.
- Another common practice is keeping up with vendor patch announcements and creating a process that ensures each patch makes it onto the patch schedule.
- Sometimes applying a patch takes longer, for instance when other changes are needed to make the patch work. In such cases, companies mitigate the risk as far as possible, reducing the impact and probability of an exploit until the patch can be applied safely.
- No matter how thrilled you are to get your hands on a new patch, don't apply it to all the production servers at once: test it first. Every environment is unique, and even a simple patch can cause issues or bring down your servers. Companies usually apply the patch to a small subset of their systems to make sure there are no major problems.
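That "small subset" approach can be as simple as splitting the host list into a canary group and the rest of the fleet. A sketch with made-up hostnames (real rollouts would feed these lists to a tool like Ansible rather than patch hosts by hand):

```shell
# Full fleet (in practice: exported from your inventory system).
cat > hosts.txt <<'EOF'
web01
web02
web03
web04
web05
EOF

head -n 2 hosts.txt > canary.txt    # patch these first, then watch for problems
tail -n +3 hosts.txt > fleet.txt    # the rest wait for canary results
echo "canary group:"
cat canary.txt
```

If the canary group runs clean for an agreed soak period, the same patch goes out to fleet.txt; if not, you've limited the blast radius to two hosts.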
Patching Linux Kernels
Most patching doesn't require a system reboot. But when it comes to Linux kernel patching, 99% of organizations apply patches the same way: by rebooting their servers. Because rebooting a server fleet is a headache, people put it off for as long as they can, which means patches aren't applied as early as possible. That gap between patch release and patch application means risk, and it can also mean non-compliance.
Additionally, most enterprise companies can't go offline, which makes keeping system software up to date a challenge. Enterprises deal with it by defining maintenance windows or reboot cycles and following them rigidly. These practices take a lot of time to plan and a lot of resources to implement, and they still end with a system reboot and the start of another reboot cycle. Here's a typical patch process.
- First, you need to know what to patch. That means having an accurate system inventory covering all services and platforms across the enterprise.
- Then you need to figure out the impact, and have some sort of change management process going on.
- You have to test, and it has to be repeatable and auditable.
- Next, the joy of negotiating maintenance windows with dozens of stakeholders, all of them eager to avoid any hit on their SLAs and uptime records.
- Then you do the update. Often, that's the easiest part, except for the engineers who have to stay up late and work weekends.
- Finally, you plan another reboot cycle. And so it goes on.
But why is this process so involved? What's the real reason behind it? I'll tell you. It's because:
- Reboots are hard to manage.
- Reboots require downtime scheduling.
- Reboots make you nervous - What if the system doesn't come back?
- The more servers you have, the more problematic reboots are.
Many companies end up running unpatched software for more than 30 days, even though case after case shows that a single day with an unprotected system can end in a data breach and billions of dollars of losses.
KernelCare customers have fully patched servers that have been running non-stop for more than six years. KernelCare supports all major Linux distributions and works in the cloud or on-premises, behind firewalls, and in air-gapped environments. There is out-of-the-box support for vulnerability scanners and patch management solutions, and for custom kernel patching.
Here is how it works: KernelCare applies kernel security patches to the running kernel in memory, so servers stay patched without ever being rebooted.
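On a server with the KernelCare agent installed, the day-to-day interface is the `kcarectl` command-line tool. The flags shown below (`--uname`, `--update`) are the commonly documented ones, but they may differ between agent versions, so consult the KernelCare documentation for your installation; this sketch also falls back to a message on hosts without the agent.

```shell
# Query live-patch status via the KernelCare client, if it is present.
if command -v kcarectl >/dev/null 2>&1; then
    MSG=$(kcarectl --uname)    # effective kernel version, including live patches
    kcarectl --update          # fetch and apply the latest patches, no reboot
else
    MSG="kcarectl not installed on this host"
fi
echo "$MSG"
```

The effective kernel version reported here is what integrated vulnerability scanners check against, which is how patched-in-memory servers pass clean scans without a reboot.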
KernelCare has helped companies with more than 300,000 servers achieve SOC 2 compliance. Here are a few of them.
- Efinity Insurance, an online insurer with customers in 14 countries and tons of personal data. They make their money selling a real-time insurance quoting service, implemented in Java and running on CentOS in AWS. For various reasons they can't cluster this app, so they turned to us to keep their servers compliant, that is, patched, without losing service availability. Read more in this case study.
- A famous online digital payments firm, a Fortune 500 company with hundreds of millions of customers, custom Linux kernels, and strictly firewalled systems. I'm sure you'd know them if I were allowed to reveal the name, but we've signed an NDA, so, sorry. All I can say is that they're using KernelCare to keep hundreds of different kernels security-patched without rebooting, something essential for continuous electronic payment services. Read more in this case study.
- A well-known enterprise video conferencing provider. Their compliance analyst reached out to us first: they had a SOC 2 audit coming up, and reboot cycles were preventing them from satisfying the compliance criteria. They spent two months trying out KernelCare, then ran a formal PoC that lasted only seven days before they moved KernelCare into production. We helped them integrate with Ansible to push KernelCare out to 7,000+ servers, and we got great feedback while working with them on their integration with the Qualys vulnerability scanner, which needed to see completely clean scans.
KernelCare enhances compliance on over 300,000 servers at companies where service availability and data protection are crucial to the business: financial and insurance services, video conferencing providers, organizations protecting domestic abuse victims, hosting companies, and public service providers.
We offer flexible pricing for different fleet sizes, a free trial for everyone, and bespoke proofs of concept for clients with custom infrastructures.