Linux Kernel CVE Data Analysis - Part 1 - Importing into CouchDB

Published: Jan 30, 2020 5:47:00 PM / Last update: Jun 5, 2020 / by Paul Jacobs


Which is the best Linux kernel?

Linux kernel developers tell us that the ‘best’ Linux kernel to use is the one that comes with whatever distribution we’re using. Or the latest stable version. Or the most recent long-term support (LTS) version. Or whatever one we want, so long as it’s maintained.

Choice is great, but I’d rather have a single answer; I just want the best. The trouble is, for some people, best means fastest. For others, the best is the one with the latest features, or a specific feature. For me, the best Linux kernel is the safest one.

Which is the ‘safest’ Linux kernel?

A crude way to consider the safety of a piece of software is to see how many security issues appear in a specific version of it. (I know it’s a shaky premise, but let’s run with it, if only as a way to learn more about CVEs and how to interpret them.) Applying this tactic to the Linux kernel means examining vulnerability reports.

That’s what I’ll be doing in this series of three articles. I’ll be analyzing Linux kernel CVEs (reported vulnerabilities) to see if there are any trends that can help answer the question: which is the best (i.e. safest) Linux kernel to use?

But wait—hasn’t something like this been done already?

Kind of.

  • CVE Details: A CVE search engine. You can find CVEs by severity, product, vendor, etc.
  • Linux Kernel CVEs: Also a CVE search engine, listing CVE IDs for specific Linux kernel versions.
  • Alexander Leonov: Nice article on the anatomy of a CVE data file, but with a focus on CPE data.
  • Tenable.sc: Commercial product with a CVE analysis dashboard.

So, why do this?

Because either those resources and projects don’t answer my question, or they cost money.

And besides, I want to learn what a CVE is, what CVEs can tell me about the differences (if any) between Linux kernel versions, and what the limitations of this approach are.

The process and tools

It’s pretty simple: get the data, analyze it, chart the results.

The tools are free and run on many platforms, but, for the sake of illustration, I’ll be working on Ubuntu 18.04 LTS, and will assume you’re not fazed by the sight of a naked terminal.

Just so you know what you’re in for, here are the practical steps.

  • A. Install Apache CouchDB database locally. (This will be a single-node set-up—I want to explore the data, not get bogged down with administration.)
  • B. Download NVD data (as JSON files) and import all CVE records, not just those for the Linux kernel. (This way it’s easier to select relevant data at query time than it is to parse and extract records from the JSON files.)
  • C. Use Mango to query the data.

First, here’s a bit of background on what a CVE is, and what a CVE record contains.

What is a CVE?

CVE stands for Common Vulnerabilities and Exposures.

For our purposes, a CVE is a code that identifies a software vulnerability. It’s in the form CVE-YYYY-N, where YYYY is the year the ID was assigned or made public, and N is a sequence number of four or more digits. An organization called MITRE coordinates the CVE list.
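To make the ID format concrete, here is a minimal Python sketch that splits an ID into its year and sequence number. It assumes the current CVE ID syntax, in which the sequence number has four or more digits; the function name `parse_cve_id` is hypothetical, and the IDs below are used only to exercise the format.

```python
import re

# CVE-YYYY-N: a four-digit year, then a sequence number of four or more digits
# (an assumption about the current ID syntax, not something the NVD files enforce).
CVE_ID_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve_id(cve_id):
    """Return (year, sequence) as integers, or None if the ID is malformed."""
    match = CVE_ID_PATTERN.fullmatch(cve_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cve_id("CVE-2019-3701"))     # (2019, 3701)
print(parse_cve_id("CVE-2019-1010298"))  # (2019, 1010298)
print(parse_cve_id("CVE-19-1"))          # None
```

Returning `None` rather than raising keeps the helper easy to use as a filter when scanning free text for IDs.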

Clearly, a CVE isn’t defined by its ID, but by its details. A number of organizations look after these details, but the one I’ll use is the best known, the National Vulnerability Database (NVD), managed by the National Institute of Standards and Technology (NIST). Anyone can download their CVE data for free.

NIST have some simple bar charts that show how the severity of vulnerabilities varies by year. Here’s an example.

[Chart: how the severity of vulnerabilities varies by year]

This is a chart of all software CVEs (not just Linux ones), from 2001 to 2019. I want something similar, but only for Linux vulnerabilities, and broken down by kernel version, too.

What’s in a CVE?

Before I load and query the JSON-format vulnerability files, let me show you what’s in them. (There is a file specification schema, but it has a lot of detail we don’t need to know about.) Each NVD CVE file contains a year’s worth of CVEs (January 1 to December 31). Here’s the basic structure of a CVE entry.

  • The first JSON section is a header, comprising the data format, version, number of records, and a timestamp.
  • The rest is one large array, CVE_Items, containing one entry per vulnerability.
    • Each entry has a cve block, which contains the affects block, and an impact block alongside it.
      • The affects block says what software this vulnerability affects. There are values for:
        • the software vendor’s name (vendor_name);
        • the software product’s name (product_name);
        • the software product’s version, defined with an operator (version_affected), and a value (version_value). This pair specifies a list of specific versions, or a range of versions.
      • The impact block defines the severity of a vulnerability as a number and a name using the Common Vulnerability Scoring System (CVSS). (There are two versions, 2.0 and 3.0. They differ slightly as shown below.)
        • The number is the base score (baseScore), a decimal value from 0.0 to 10.0.
        • The name (severity) is one of LOW, MEDIUM or HIGH (for CVSS V2.0; NONE and CRITICAL are added in v3.0).
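The structure above can be sketched as a Python dictionary and walked with two small helpers. This is a hand-written, heavily trimmed sample with made-up values, not a real record. The field names CVE_data_meta, ID, version, version_data, baseMetricV2, baseMetricV3, cvssV2, cvssV3 and baseSeverity, and the placement of impact next to cve, are assumptions about the NVD JSON 1.0 layout; the helper names are hypothetical.

```python
# A hand-written, heavily trimmed entry in the shape of one CVE_Items element.
# All values are made up; "CVE-0000-0000" is a placeholder, not a real ID.
SAMPLE_ITEM = {
    "cve": {
        "CVE_data_meta": {"ID": "CVE-0000-0000"},
        "affects": {"vendor": {"vendor_data": [{
            "vendor_name": "linux",
            "product": {"product_data": [{
                "product_name": "linux_kernel",
                "version": {"version_data": [
                    {"version_value": "4.19", "version_affected": "="},
                    {"version_value": "4.20", "version_affected": "="},
                ]},
            }]},
        }]}},
    },
    "impact": {
        "baseMetricV2": {"cvssV2": {"baseScore": 4.9}, "severity": "MEDIUM"},
        "baseMetricV3": {"cvssV3": {"baseScore": 5.5, "baseSeverity": "MEDIUM"}},
    },
    "publishedDate": "2019-01-03T16:29Z",
}


def affected_versions(item):
    """Yield (vendor, product, operator, version) for every affected version."""
    affects = item.get("cve", {}).get("affects", {})
    for vendor in affects.get("vendor", {}).get("vendor_data", []):
        for product in vendor.get("product", {}).get("product_data", []):
            for version in product.get("version", {}).get("version_data", []):
                yield (
                    vendor.get("vendor_name"),
                    product.get("product_name"),
                    version.get("version_affected"),
                    version.get("version_value"),
                )


def base_scores(item):
    """Return the CVSS v2 and v3 base scores, or None where one is missing."""
    impact = item.get("impact", {})
    return {
        "v2": impact.get("baseMetricV2", {}).get("cvssV2", {}).get("baseScore"),
        "v3": impact.get("baseMetricV3", {}).get("cvssV3", {}).get("baseScore"),
    }


for row in affected_versions(SAMPLE_ITEM):
    print(row)
print(base_scores(SAMPLE_ITEM))
```

The chained `.get(..., {})` calls are a deliberate choice: older records may lack a v3 score or an affects block, and the helpers then return nothing instead of raising.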


CVSS v2.0 vs v3.0

The two versions differ in how they map the base score to severity. (The table below comes from the NVD CVSS page.)

CVSS v2.0                       CVSS v3.0
Severity   Base Score Range     Severity   Base Score Range
                                None       0.0
Low        0.0–3.9              Low        0.1–3.9
Medium     4.0–6.9              Medium     4.0–6.9
High       7.0–10.0             High       7.0–8.9
                                Critical   9.0–10.0
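The mapping in the table can be written as a small Python function. This is only a sketch of the table's ranges; the function name `cvss_severity` and its `version` parameter are hypothetical, and it assumes scores carry one decimal place, as they do in the table.

```python
def cvss_severity(base_score, version="3.0"):
    """Map a CVSS base score to a severity name using the table's ranges."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError(f"base score out of range: {base_score}")
    if version == "2.0":
        bands = [(4.0, "LOW"), (7.0, "MEDIUM")]
        top = "HIGH"
    elif version == "3.0":
        if base_score == 0.0:
            return "NONE"
        bands = [(4.0, "LOW"), (7.0, "MEDIUM"), (9.0, "HIGH")]
        top = "CRITICAL"
    else:
        raise ValueError(f"unsupported CVSS version: {version}")
    # Each band is (exclusive upper bound, name); the first match wins.
    for upper_bound, name in bands:
        if base_score < upper_bound:
            return name
    return top


print(cvss_severity(9.3, "2.0"))  # HIGH
print(cvss_severity(9.3, "3.0"))  # CRITICAL
print(cvss_severity(0.0, "2.0"))  # LOW
print(cvss_severity(0.0, "3.0"))  # NONE
```

The same score can therefore land in different severity bands depending on the CVSS version, which matters when comparing older CVEs (v2.0 only) with newer ones.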

With this overview of the data, here’s a sketch of my first query, in SQL-like pseudo-code (the irony being that CouchDB is one of the class of so-called NoSQL databases).

select 'cve' records from 'CVE_Items'
where 'vendor_name' is 'linux'
and 'product_name' is 'linux_kernel'
and 'version_value' is <kernel_version>
and 'baseScore' is between <severity_min> and <severity_max>
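Before moving to a database, the same selection can be sketched in plain Python over records that have already been flattened to one row per affected version. The `AffectedRow` class, the `select_rows` function and the sample rows are all hypothetical; the IDs are placeholders and the scores are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffectedRow:
    """One (CVE, affected version) pair, flattened for easy filtering."""
    cve_id: str
    vendor_name: str
    product_name: str
    version_value: str
    base_score: float


def select_rows(rows, kernel_version, severity_min, severity_max):
    """Keep Linux kernel rows for one version within a base score range."""
    return [
        row for row in rows
        if row.vendor_name == "linux"
        and row.product_name == "linux_kernel"
        and row.version_value == kernel_version
        and severity_min <= row.base_score <= severity_max
    ]


# Placeholder rows, not real CVE data.
ROWS = [
    AffectedRow("CVE-0000-0001", "linux", "linux_kernel", "4.19", 7.8),
    AffectedRow("CVE-0000-0002", "linux", "linux_kernel", "4.19", 4.4),
    AffectedRow("CVE-0000-0003", "linux", "linux_kernel", "4.20", 7.5),
    AffectedRow("CVE-0000-0004", "example_vendor", "example_product", "4.19", 9.8),
]

for row in select_rows(ROWS, "4.19", 7.0, 10.0):
    print(row.cve_id)  # CVE-0000-0001
```

This works for a handful of rows, but it means parsing every file up front, which is the work the database is meant to take over.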

That’s the theory, now for the practice. I’ll set up an environment, load the data, and express this query in Mango.

A. Install CouchDB on Ubuntu 18.04

  1. Copy and paste these commands into a terminal and run them. (From now on, I’ll assume you can recognize a command when you see one and know what to do with it.)

    sudo apt-get install -y apt-transport-https gnupg ca-certificates
    echo "deb https://apache.bintray.com/couchdb-deb bionic main" | sudo tee -a /etc/apt/sources.list.d/couchdb.list
    sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 8756C4F765C9AC3CB6B85D62379CE192D401AB61
    sudo apt update
    sudo apt install -y couchdb

    This will install certificates, add a source to the Ubuntu repositories, and install CouchDB. (Instructions for other platforms are at docs.couchdb.org.)

  2. When the Configuring CouchDB screen appears, choose standalone. (If it doesn’t appear, you may have previously installed and removed CouchDB.)

  3. Keep the default value (127.0.0.1) for CouchDB interface bind address.

  4. Enter a password for the admin user and confirm it.

  5. Open a web browser and go to http://127.0.0.1:5984/_utils

  6. Log in as your admin user.

  7. In the Databases pane, select Create Database.

  8. In Name of database, enter nvd (or your own choice of database name).

  9. Click Create.

  10. A temporary message will report Database created successfully.
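Steps 5 to 10 can also be done without the browser, because CouchDB creates a database in response to an HTTP PUT on its name. The following Python sketch assumes the admin user is called admin with the password "password" (substitute your own); the function names are hypothetical.

```python
import base64
import urllib.request


def create_db_request(base_url, db_name, user, password):
    """Build (but do not send) the PUT request that creates a database."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{db_name}",
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )


def create_database(base_url, db_name, user, password):
    """Send the request and return the HTTP status and response body."""
    request = create_db_request(base_url, db_name, user, password)
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `create_database("http://127.0.0.1:5984", "nvd", "admin", "password")` against a running server should report success; if the database already exists, `urlopen` raises an `HTTPError` instead.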

B. Import CVE Data into CouchDB

  1. Install Node.js, and the couchimport tool.

    sudo apt-get install npm
    sudo npm install -g couchimport
  2. Download CVE JSON data files.

    R="https://nvd.nist.gov/feeds/json/cve/1.0/"
    for y in {2009..2019}; do
      wget "${R}nvdcve-1.0-${y}.json.gz"
    done

    Note: Adjust the year range (2009..2019) to suit your interests. (NVD have data from 2002.)

  3. Decompress the files.

    gunzip *.gz
  4. Import CVE data into the CouchDB database.

    cat nvdcve*.json | couchimport --url http://127.0.0.1:5984 --database nvd --type json --jsonpath "CVE_Items.*"

    Note: If you are using CouchDB version 3 or later, you need to add authorization data (user and password) to the URL, so that step 4 looks like this:

    cat nvdcve*.json | couchimport --url http://admin:password@127.0.0.1:5984 --database nvd --type json --jsonpath "CVE_Items.*"

  5. Check that the number of records in the database is the same as in the files. (The counts from these two commands should be the same. On CouchDB version 3 or later, the curl URL may need the same credentials.)

    grep CVE_data_numberOfCVEs nvdcve*.json | cut -d':' -f3 | tr -cd '[:digit:][:cntrl:]' | awk '{s+=$1} END {print s}'
    curl -sX GET http://127.0.0.1:5984/nvd | json_pp | grep doc_count | cut -d':' -f2
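The record-count check can also be sketched in Python. This version counts the entries in each file's CVE_Items array rather than reading the header field, so it is a slightly different check from the grep command above. The function names are hypothetical, and the optional user and password parameters assume HTTP basic authentication.

```python
import base64
import json
import urllib.request


def count_cves_in_files(paths):
    """Sum the lengths of the CVE_Items arrays across the given feed files."""
    total = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            total += len(json.load(handle).get("CVE_Items", []))
    return total


def couchdb_doc_count(db_url, user=None, password=None):
    """Read doc_count from the database information returned by GET on db_url."""
    request = urllib.request.Request(db_url)
    if user is not None:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["doc_count"]
```

With the files in the current directory, comparing `count_cves_in_files(glob.glob("nvdcve*.json"))` with `couchdb_doc_count("http://127.0.0.1:5984/nvd", "admin", "password")` should give two equal numbers. Note that doc_count also includes design documents, so the totals can drift apart once indexes or views are added to the database.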

C. Run a Mango Query in Fauxton

To make this first query easy, I’ll use the CouchDB GUI (Fauxton). But don’t get too comfortable with it, because in Parts 2 and 3 I’ll work solely on the command line.

  1. In a browser, go to: http://127.0.0.1:5984/_utils/#database/nvd/_find

  2. Delete the contents of the Mango Query pane, then copy and paste this query text into it:

    {
        "selector": {
            "cve.affects.vendor.vendor_data": {
                "$elemMatch": {
                    "vendor_name": "linux",
                    "product.product_data": {
                        "$elemMatch": {
                            "product_name": "linux_kernel"
                        }
                    }
                }
            },
            "publishedDate": {
                "$gte": "2019-01-01",
                "$lt": "2019-02-01"
            }
        }
    }
  3. Set Documents per page to its highest value.

  4. Click Run Query.

  5. The result is a list of Linux kernel vulnerabilities, for all severities and kernel versions, assigned or published in January 2019.

  6. To see the details of a CVE, copy and paste its ID into the NVD search page.
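The same kind of query can be built and sent from Python by posting it to the database's _find endpoint. This is a sketch, and the function names are hypothetical. It uses an exclusive upper bound at the first day of the following month, on the assumption that publishedDate values carry a time part (for example 2019-01-31T18:29Z) and would otherwise sort after a bare 2019-01-31. It also sets a limit, because _find returns only 25 documents by default.

```python
import json
import urllib.request
from datetime import date


def month_query(year, month, limit=200):
    """Build a Mango query for Linux kernel CVEs published in one month."""
    start = date(year, month, 1)
    # First day of the following month, used as an exclusive upper bound.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return {
        "selector": {
            "cve.affects.vendor.vendor_data": {
                "$elemMatch": {
                    "vendor_name": "linux",
                    "product.product_data": {
                        "$elemMatch": {"product_name": "linux_kernel"}
                    },
                }
            },
            "publishedDate": {"$gte": start.isoformat(), "$lt": end.isoformat()},
        },
        "limit": limit,
    }


def find_request(db_url, query):
    """Build (but do not send) the POST request for the _find endpoint."""
    return urllib.request.Request(
        url=f"{db_url.rstrip('/')}/_find",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


print(json.dumps(month_query(2019, 1)["selector"]["publishedDate"]))
```

Passing `find_request("http://127.0.0.1:5984/nvd", month_query(2019, 1))` to `urllib.request.urlopen` returns a JSON body whose docs array holds the matching records; on CouchDB version 3 or later an Authorization header would be needed as well.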

Homework

Try changing the date range for different months, or the whole year.

Conclusion

Congratulations. You are now the proud owner of a CouchDB database bulging with a decade’s worth (or whatever you chose) of CVEs. Remember, the database contains data for all software from all vendors.

In Part 2, I’ll add selectors for severity and kernel version, run the queries and chart the results.

 

Topics: Developer Blog


Written by Paul Jacobs

With more than a quarter of a century in IT, Paul brings with him a kaleidoscope of experiences and insight which he uses to drill into and pick apart the complexities of Linux server security and hosting issues, as Technical Evangelist and Content Writer for CloudLinux.
