Linux Kernel CVE Data Analysis - Part 2 - Vulnerabilities by Year

Jan 30, 2020 5:48:00 PM / by Paul Jacobs

Introduction

In Part 1, I installed CouchDB, loaded CVE data into it, and ran a simple Mango query that listed the Linux kernel vulnerabilities for a chosen date range for all severities and all kernel versions.

Here, in Part 2, I will extend and refine that query to see results by severity and kernel version. But rather than run queries repeatedly, I will use the power of the command line to semi-automate the process, and Gnuplot will chart the results.

Preliminaries

  1. Create some directories to keep track of everything.

    mkdir bin qry img out

    I’ll use them like this:

    • bin is for scripts;
    • qry for Mango query files;
    • img for images of charts;
    • out for the output from queries.
  2. Install gnuplot. (I’ll use it to visualize the data.)

    sudo apt install -y gnuplot

I want to start where I left off, using the query from Part 1. Here it is again, as a reminder.

{
    "selector": {
        "cve.affects.vendor.vendor_data": {
            "$elemMatch": {
                "vendor_name": "linux",
                "product.product_data": {
                    "$elemMatch": {
                        "product_name": "linux_kernel"
                    }
                }
            }
        },
        "publishedDate": {
            "$gte": "2019-01-01",
            "$lte": "2019-01-31"
        }
    }
}

First, I’ll look at how the number of Linux kernel vulnerabilities varies year on year. To do that, I’ll add parameters using printf tokens in the Mango queries, with values substituted at run time. (Later, I’ll add parameters for kernel version, too.) To keep things quick and simple, I’ll use the CouchDB POST API to submit queries on the command line.
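As a minimal sketch of that substitution, the snippet below fills two %s tokens in a cut-down template. The render_query helper and the shortened template are illustrative assumptions, not part of the queries used later. Because the template is handed to printf as the format string, a literal percent sign in a query would have to be written as %%.

```shell
#!/usr/bin/env bash
# render_query is a hypothetical helper: it uses the template as the
# printf format string, so each %s is replaced by the next argument.
render_query() {
  local template=$1
  shift
  printf "$template" "$@"
}

# Cut-down template for illustration only (not a complete Mango query).
TEMPLATE='{"publishedDate": {"$gte": "%s-01-01", "$lte": "%s-12-31"}}'

render_query "$TEMPLATE" 2019 2019
echo
```

The single quotes around the template stop the shell from expanding $gte and $lte, which is the same job the quoted \EOF marker does in the here-documents that follow.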

Query 1a - Linux kernel CVEs by year

  1. Run this to create a Mango query file in qry/cve-1a.json:

    cat<<\EOF>qry/cve-1a.json 
    {
      "selector": {
        "cve.affects.vendor.vendor_data": {
           "$elemMatch": {
              "vendor_name": "linux",
              "product.product_data": {
                 "$elemMatch": {
                    "product_name": "linux_kernel"
                 }
              }
           }
        },
        "publishedDate": {
          "$gte": "%s-01-01",
          "$lte": "%s-12-31"
        }
      },
      "fields": ["cve.CVE_data_meta.ID"],
      "limit": 999999
    }
    EOF

    The limit field is set to return more than the default number of records (25) when using the CouchDB POST API.

  2. Run this.

    for YEAR in {2009..2019}; do
      echo -en "$YEAR\t"
      printf "$(cat qry/cve-1a.json)" $YEAR $YEAR |
        curl -sX POST -d @- http://127.0.0.1:5984/nvd/_find \
          --header "Content-Type:application/json" |
        json_pp | grep '"ID"' | wc -l
    done | tee out/cve-1a.tsv

    This runs the query in a loop for every year in the range (line 1) and prints the results to the console and to out/cve-1a.tsv, which I’ll use for graphing. Change the year range to suit your interests or to match whatever you imported in Part 1.

  3. Here’s the output I get.

    2009	102
    2010	121
    2011	83
    2012	115
    2013	189
    2014	133
    2015	87
    2016	215
    2017	451
    2018	178
    2019	287

    This is the number of Linux kernel vulnerabilities for each year.

  4. Run this to create a gnuplot script in bin/cve-1a.gnuplot:

    cat<<\EOF>bin/cve-1a.gnuplot
    reset
    set terminal png size 800,600
    set output 'img/cve-1a.png'
    set style fill solid 0.5
    set boxwidth 0.8 relative
    set xtics 1; set key top left
    set title 'Linux kernel CVEs'
    set xlabel "Year"; set ylabel "Number of CVEs"
    plot [2008.5:2019.5] [] 'out/cve-1a.tsv' w boxes t 'All severities', \
    'out/cve-1a.tsv' u 1:($2+15):2 w labels t ''
    EOF
  5. Run this to create an image file in img/cve-1a.png:

    gnuplot -c bin/cve-1a.gnuplot
  6. If you have a local X display, run this:

    eog img/cve-1a.png

    Otherwise, copy the image from your server and view it in your favorite image viewer. Here’s what mine looks like.
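The counting stage of the loop in step 2 can be tried without a database. This is a sketch under assumptions: count_ids is a hypothetical helper, the sample response is made up with placeholder IDs, and it swaps json_pp plus a line-based grep for grep -o, which counts each "ID" key no matter how the JSON is laid out.

```shell
#!/usr/bin/env bash
# count_ids is a hypothetical helper: it reads a _find response on stdin
# and counts occurrences of the "ID" key, whatever the line layout.
count_ids() {
  grep -o '"ID"' | wc -l
}

# Made-up response with placeholder IDs, shaped like a _find result.
SAMPLE='{"docs":[{"cve":{"CVE_data_meta":{"ID":"CVE-0000-0001"}}},{"cve":{"CVE_data_meta":{"ID":"CVE-0000-0002"}}}],"bookmark":"nil"}'

# Prints the number of documents in the sample (2 here).
printf '%s\n' "$SAMPLE" | count_ids
```

This only works because the query restricts fields to the CVE ID. If other fields containing an "ID" key were returned, the count would be inflated.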

Commentary

That’s a lot of Linux kernel vulnerabilities, and 2017, with 451 CVEs, was a spectacularly good/bad year for them.

I was concerned when I saw this, so I cross-checked against other sources, CVE Details among them. The numbers are almost identical. Any differences can be explained by this observation that CVE Details makes on its site:

“…CVE data have inconsistencies which affect [the] accuracy of data displayed … For example vulnerabilities related to Oracle Database 10g might have been defined for products "Oracle Database", "Oracle Database10g", "Database10g", "Oracle 10g" and similar.”

In other words, the NVD CVE data is not completely accurate. (More on this later.)

This chart shows the total CVE counts aggregated across all CVE severities. To break the results down by severity, I will add a selector to the Mango query.

Query 1b - Linux kernel CVEs by year and severity

  1. Run this to create qry/cve-1b.json (a copy of Query 1a with an added severity parameter):

    cat<<\EOF>qry/cve-1b.json
    {
        "selector": {
            "cve.affects.vendor.vendor_data": {
                "$elemMatch": {
                    "vendor_name": "linux",
                    "product.product_data": {
                        "$elemMatch": {
                            "product_name": "linux_kernel"
                        }
                    }
                }
            },
            "publishedDate": {
                "$gte": "%s-01-01",
                "$lte": "%s-12-31"
            },
            "impact.baseMetricV2.severity": "%s"
        },
        "fields": ["cve.CVE_data_meta.ID"],
        "limit": 999999
    }
    EOF
  2. Run this.

    SEVS=(LOW MEDIUM HIGH)
    (IFS=$'\t'; echo -e "YEAR\t${SEVS[*]}" | tee out/cve-1b.tsv)
    for YEAR in {2009..2019}
    do
        RES=()
        echo -en "$YEAR\t"
        for SEV in ${SEVS[*]}
        do
            RES+=($(printf "$(cat qry/cve-1b.json)" $YEAR $YEAR $SEV |
            curl -sX POST -d @- http://127.0.0.1:5984/nvd/_find \
              --header "Content-Type:application/json" |
            json_pp | grep '"ID"' | wc -l))
        done
        (IFS=$'\t'; echo "${RES[*]}")
    done | tee -a out/cve-1b.tsv
  3. Here’s the output.

    YEAR	LOW	MEDIUM	HIGH
    2009	7	56	39
    2010	34	53	34
    2011	18	51	14
    2012	22	69	24
    2013	44	127	18
    2014	20	90	23
    2015	17	46	24
    2016	25	102	88
    2017	73	90	288
    2018	35	88	55
    2019	45	118	124
  4. Run this to create a gnuplot script in bin/cve-1b.gnuplot:

    cat<<\EOF>bin/cve-1b.gnuplot
    reset
    set terminal png size 800,600
    set output 'img/cve-1b.png'
    set style data histograms
    set style histogram rowstacked
    set style fill solid 0.5
    set boxwidth 0.8 relative
    set xtics 1; set key top left title "Severity"
    set title 'Linux kernel CVEs'
    set xlabel "Year"; set ylabel "Number of CVEs"
    plot 'out/cve-1b.tsv' u ($2):xtic(1) t col, \
    '' u ($3):xtic(1) t col, \
    '' u ($4):xtic(1) t col, \
    '' u ($0-1):($2+10):(sprintf("%d",$2)) with labels t '', \
    '' u ($0-1):($2+$3+10):(sprintf("%d",$3)) with labels t '', \
    '' u ($0-1):($2+$3+$4+10):(sprintf("%d",$4)) with labels t ''
    EOF
  5. Run this to create an image file in img/cve-1b.png and view it:

    gnuplot -c bin/cve-1b.gnuplot && eog img/cve-1b.png
  6. Here’s mine.
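A quick sanity check on the two result files is to confirm that the LOW, MEDIUM and HIGH counts for each year add up to the Query 1a total. The sketch below assumes the file layouts shown in the outputs above. The check_totals helper is hypothetical and is not part of the original workflow.

```shell
#!/usr/bin/env bash
# check_totals is a hypothetical helper. Arguments: the Query 1a TSV
# (year, total) and the Query 1b TSV (header row, then year, LOW, MEDIUM, HIGH).
check_totals() {
  awk -F'\t' '
    NR == FNR { total[$1] = $2; next }   # first file: remember totals by year
    FNR == 1  { next }                   # second file: skip the header row
    {
      sum = $2 + $3 + $4
      status = (sum == total[$1]) ? "ok" : "MISMATCH"
      printf "%s\t%d\t%d\t%s\n", $1, total[$1], sum, status
    }
  ' "$1" "$2"
}

# Only run against the article's files if both are present.
if [ -f out/cve-1a.tsv ] && [ -f out/cve-1b.tsv ]; then
  check_totals out/cve-1a.tsv out/cve-1b.tsv
fi
```

For the figures shown above, every year agrees. For example, 2017 gives 73 + 90 + 288 = 451. A mismatch would suggest CVEs with no CVSS version 2 severity recorded.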

Notes on severity

  • Query 1b uses CVSS version 2 rather than the more recent version 3. There’s a good reason for this. CVSS version 3 scores were only introduced into the NVD data in December 2015 (the specification itself was released earlier that year), so CVEs before that date can only be selected using the version 2 CVSS scheme.
  • CVSS versions 2 and 3 both have numerical and textual representations for severity. The numeric is more precise, while the textual is coarser, but easier to relate to. For simplicity, I’m using the textual scheme. I’ll change to the numeric scheme later, for accuracy.
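To make the relationship between the two representations concrete, here is a small sketch that maps a CVSS version 2 base score to its textual rating, using the NVD ranges for version 2 (LOW 0.0 to 3.9, MEDIUM 4.0 to 6.9, HIGH 7.0 to 10.0). The cvss2_severity helper is hypothetical and the sample scores are arbitrary.

```shell
#!/usr/bin/env bash
# cvss2_severity is a hypothetical helper: it maps a CVSS v2 base score
# (0.0-10.0) to the NVD textual rating. awk does the floating-point compare.
cvss2_severity() {
  awk -v score="$1" 'BEGIN {
    if (score < 0 || score > 10) { print "INVALID"; exit 1 }
    if (score >= 7.0)      print "HIGH"
    else if (score >= 4.0) print "MEDIUM"
    else                   print "LOW"
  }'
}

# Arbitrary sample scores, including the boundaries between ratings.
for S in 2.1 4.0 6.9 7.2 10.0; do
  printf '%s\t%s\n' "$S" "$(cvss2_severity "$S")"
done
```

The coarseness is visible at the boundaries: 6.9 and 7.0 differ by a tenth of a point but land in different buckets.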

Conclusion

Query 1b gives a sharper picture than Query 1a, but one that is no less shocking. For instance, compare the proportion of HIGH severity Linux kernel vulnerabilities before 2016 with the proportion from 2016 onwards (24 of 87 in 2015, against 288 of 451 in 2017).
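One way to put numbers on that comparison is to compute the HIGH share per year from the Query 1b output. This is only a sketch: high_share is a hypothetical helper, and the three sample rows are copied from the output above.

```shell
#!/usr/bin/env bash
# high_share is a hypothetical helper: it reads a TSV shaped like
# out/cve-1b.tsv (header row, then year, LOW, MEDIUM, HIGH) on stdin
# and prints each year with the percentage of its CVEs rated HIGH.
high_share() {
  awk -F'\t' 'NR > 1 && ($2 + $3 + $4) > 0 {
    printf "%s\t%.0f%%\n", $1, 100 * $4 / ($2 + $3 + $4)
  }'
}

# Three rows copied from the Query 1b output above.
printf 'YEAR\tLOW\tMEDIUM\tHIGH\n2015\t17\t46\t24\n2016\t25\t102\t88\n2017\t73\t90\t288\n' |
  high_share
```

With these rows the HIGH share rises from 28% in 2015 to 41% in 2016 and 64% in 2017. Running high_share < out/cve-1b.tsv would cover the full range.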

It could be that many of these vulnerabilities are concentrated in a few kernel versions. I won’t know until I modify the query to select on version as well as year and severity. That’s what I’ll do next, in Part 3.

 


Written by Paul Jacobs

With more than a quarter of a century in IT, Paul brings with him a kaleidoscope of experiences and insight which he uses to drill into and pick apart the complexities of Linux server security and hosting issues, as Technical Evangelist and Content Writer for CloudLinux.