Linux Kernel CVE Data Analysis - Part 3 - Vulnerabilities by Version

Published: Jan 30, 2020 5:49:00 PM / Last update: Mar 30, 2020 / by Paul Jacobs

Introduction

In Part 2, I ran Mango queries on a CouchDB database full of CVEs and got a good picture of how the number and severity of Linux kernel vulnerabilities vary from year to year. (Part 1 showed how to set up CouchDB and import CVE data into it on Ubuntu 18.04.)

In this part, I extend that core Mango query to look at how the number of Linux kernel vulnerabilities varies by kernel version.

Because there are so many versions, the queries and commands get more complicated and take longer to complete. So, for now, I will only vary the year and kernel version. Severity remains a parameter, but I will query results for all severities.

Here is what I’ve added to the Mango query:

  1. A selector for version (version.version_data) in the product.product_data block. It matches records when the version_value equals ($eq) a parameter (which I’ll pass in). This first parameter will be the kernel version for the query.

  2. A version_affected selector. In each version entry, version_affected is the operator that says how version_value applies: either as the upper end of a range (<=) or as a specific value (=). The query accepts both.

  3. The time values in the publishedDate block now end in Z (UTC), so the date range no longer depends on your local time zone. This matters for edge cases, such as CVEs published near midnight when your local time zone differs from the one used by the NVD.
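
To make the first two selectors concrete, here is a minimal shell sketch of the test that the $elemMatch block applies to each entry in version_data. The function name entry_matches and the sample entries are my own, for illustration only; in the real query CouchDB does this matching itself.

```shell
# Hypothetical helper (not part of CouchDB): mimics the per-entry test in the
# $elemMatch block. An entry matches when version_value is exactly the wanted
# kernel version AND version_affected is "=" or "<=".
entry_matches() {
  # $1 = wanted kernel version, $2 = version_value, $3 = version_affected
  [ "$2" = "$1" ] || return 1
  case $3 in
    '='|'<=') return 0 ;;
    *)        return 1 ;;
  esac
}

# Made-up sample entries, one "version_value version_affected" pair per line.
printf '%s\n' '2.6.39 =' '2.6.39 <=' '2.6.38 =' '- =' |
while read -r value affected; do
  if entry_matches 2.6.39 "$value" "$affected"; then
    echo "match:    $value ($affected)"
  else
    echo "no match: $value ($affected)"
  fi
done
```

Only the first two sample entries match. Note that, as in the query, <= is compared as a literal string: no numeric range comparison takes place, so an entry for 2.6.38 does not match a query for 2.6.39.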

I’ll start by looking at the last five releases (2.6.35 to 2.6.39) of the famously long-running 2.6.x Linux kernel branch.


Dash (-) as a value for version_value

The official NVD CVE JSON feed schema defines both version_value and version_affected as strings but doesn’t say what the possible values are. I looked at the source files and found that version_value is either a release number or a dash (-). But I couldn’t find any information on what a dash means. So, for now, the queries ignore any records with a dash in the version_value field.
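
As a rough illustration of that observation, the sketch below sorts a handful of version_value strings into release numbers, dashes, and anything else. The function name classify_version_value and the sample values are my own, and the pattern for a release number (digits separated by single dots) is an assumption, not something taken from the NVD schema.

```shell
# Hypothetical helper: classify one version_value string.
# Assumption: a release number is made up of digits separated by single dots.
classify_version_value() {
  case $1 in
    '-')                     echo dash ;;
    ''|*[!0-9.]*|.*|*.|*..*) echo other ;;
    *)                       echo release ;;
  esac
}

# Made-up sample values.
for value in 2.6.39 - 5.4 4.20-rc1; do
  printf '%s\t%s\n' "$value" "$(classify_version_value "$value")"
done
```

Because the query tests version_value for exact equality with a release number, an entry whose value is a dash can never match, which is how such records end up being ignored.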


Query 2 - Linux kernel CVEs by year and kernel version

  1. Run this to create qry/cve-2.json:

    cat<<\EOF>qry/cve-2.json
    {
        "selector": {
            "cve.affects.vendor.vendor_data": {
                "$allMatch": {
                    "vendor_name": "linux",
                    "product.product_data": {
                        "$allMatch": {
                            "product_name": "linux_kernel",
                            "version.version_data": {
                                "$elemMatch": {
                                    "version_value": {
                                        "$eq": "%s"
                                    },
                                    "version_affected": {
                                        "$in": ["=","<="]
                                    }
                                }
                            }
                        }
                    }
                }
            },
            "publishedDate": {
                "$gte": "%s-01-01T00:00Z",
                "$lte": "%s-12-31T23:59Z"
            },
            "impact.baseMetricV2.cvssV2.baseScore": {
                "$gte": %d,
                "$lte": %d
            }
        },
        "fields": ["cve.CVE_data_meta.ID"],
        "limit": 999999
    }
    EOF

  2. Run this to query every year and version and write the counts to out/cve-2_26x.tsv:

    VERSIONS=(2.6.35 2.6.36 2.6.37 2.6.38 2.6.39)
    (IFS=$'\t'; echo -e "Vers\t${VERSIONS[*]}" | tee out/cve-2_26x.tsv); for YEAR in {2015..2019}
    do
      RES=()
      echo -en "$YEAR\t"
      for VERS in "${VERSIONS[@]}"
      do
        RES+=($(printf "$(cat qry/cve-2.json)" "$VERS" "$YEAR" "$YEAR" 0 10 |
            curl -sX POST -d @- http://127.0.0.1:5984/nvd/_find \
              --header "Content-Type:application/json" |
            json_pp | grep -c '"ID"'))
      done
      (IFS=$'\t'; echo "${RES[*]}")
    done | tee -a out/cve-2_26x.tsv

    This is two nested for-loops. The outer loop (line 2) iterates over the years 2015 to 2019 inclusive. Each year is used twice (line 8), as the second and third printf substitutions in the Mango query's publishedDate block, where it sets the date range for the query (January 1 to December 31 of that year). The inner loop (line 6) iterates over the last five kernel versions of the 2.6.x branch (taken from kernelnewbies.org and defined on line 1). printf substitutes each one into the query as the value to match against version_value. The last two substitutions, 0 and 10, are the lower and upper bounds for the CVSS base score, so all severities are included.

  3. The output:

    Vers	2.6.35	2.6.36	2.6.37	2.6.38	2.6.39
    2015	6	6	6	6	6
    2016	1	1	1	1	1
    2017	9	9	9	10	10
    2018	21	21	21	21	21
    2019	121	121	120	121	120

  4. Run this to create the gnuplot script:

    cat<<\EOF>bin/cve-2_26x.gnuplot
    reset
    set terminal png size 800,600
    set output 'img/cve-2_26x.png'
    set xtics 1
    set key top left autotitle columnheader title "Kernel Version"
    set title "Linux kernel CVEs by version (2.6.x)"
    set xlabel "Year"; set ylabel "Number of CVEs"
    plot [] [0:160] for [c=2:6] 'out/cve-2_26x.tsv' using 1:c with lines lw 3
    EOF

  5. Run gnuplot to create the image file, then view it.

    gnuplot -c bin/cve-2_26x.gnuplot && eog img/cve-2_26x.png

  6. Here’s the chart.

    [Chart: Linux kernel CVEs by version (2.6.x)]

  7. Repeat for the last five releases of the other major branches by changing the value of VERSIONS:

    • 3.15 3.16 3.17 3.18 3.19 (Releases between 8 June 2014 and 8 February 2015)
    • 4.16 4.17 4.18 4.19 4.20 (1 April 2018 – 23 December 2018)
    • 5.0 5.1 5.2 5.3 5.4 (3 March 2019 – 24 November 2019)

    For each run, change the output file name (e.g. out/cve-2_<VER>.tsv) on the second and last lines of the command. Then copy the gnuplot script and change the set output, set title and input .tsv file names to match.

    Here are the charts for the remaining branches.

    [Chart: Linux kernel CVEs by version (3.x)]

    Yes, the big spike in 2017 was mainly due to vulnerabilities found in 3.18.

    [Chart: Linux kernel CVEs by version (4.x)]

    [Chart: Linux kernel CVEs by version (5.x)]
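
Before pointing the loop at another branch, it can help to do a dry run that prints a rendered query and the file names without contacting CouchDB. The sketch below does that under a few assumptions: render_query and branch_files are hypothetical helper names, TEMPLATE is a cut-down stand-in for qry/cve-2.json (same five placeholders, in the same order, but not a complete Mango query), and the labels 3x, 4x and 5x are placeholders for whatever you substitute for <VER>.

```shell
# Cut-down stand-in for qry/cve-2.json: same placeholders, in the same order
# (version, start year, end year, minimum score, maximum score).
TEMPLATE='{"version_value":"%s","publishedDate":{"$gte":"%s-01-01T00:00Z","$lte":"%s-12-31T23:59Z"},"baseScore":{"$gte":%d,"$lte":%d}}'

# Hypothetical helper: fill the placeholders for one version and one year,
# covering all severities (base scores 0 to 10).
render_query() {
  # $1 = kernel version, $2 = year
  printf "$TEMPLATE\n" "$1" "$2" "$2" 0 10
}

# Hypothetical helper: list the three file names used for one branch label.
branch_files() {
  # $1 = branch label, e.g. 26x
  printf '%s\n' "out/cve-2_$1.tsv" "bin/cve-2_$1.gnuplot" "img/cve-2_$1.png"
}

# Dry run: show one rendered query, then the files each branch would use.
render_query 5.4 2019
for label in 3x 4x 5x; do
  branch_files "$label"
done
```

Checking the rendered text first makes it easier to spot a wrong substitution order before running the full set of queries, which takes a while.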

Conclusions

  1. Taken together, the complete set of graphs for all major Linux kernel branches seems to answer my original question: the safest Linux kernel (the one with fewest vulnerabilities) is the latest, version 5.4.

  2. The effect of many vulnerabilities extends beyond branches and ‘End of Life’ dates. For example, for 2.6, the last release, 2.6.39, was in May 2011. It was designated ‘end of life’ at version 2.6.39.4 in August the same year. Nevertheless, reports of vulnerabilities affecting it haven’t stopped, and have increased since 2016.

  3. Comparing vulnerabilities branch by branch is not the best approach. Branches are continuous, with the final release of one branch forming the basis of the next. For example, 5.0 is a continuation of the 4.x branch, 4.0 a continuation of 3.19, and 3.0 a continuation of 2.6.39, the final release of the long-running 2.6 branch. Because code bases are carried forward like this, each subsequent branch will have progressively fewer bugs.
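
One way to put a single number on comparisons like these is to total each version's column in a results file. The sketch below does that with awk, using the 2.6.x figures from the output shown earlier as inline sample data. The function name column_totals is my own; in practice you would feed it a file such as out/cve-2_26x.tsv.

```shell
# Hypothetical helper: read a tab-separated results table on stdin (a header
# row of versions, then one row per year) and print the total per version.
column_totals() {
  awk -F '\t' '
    NR == 1 { for (c = 2; c <= NF; c++) name[c] = $c; cols = NF; next }
            { for (c = 2; c <= NF; c++) total[c] += $c }
    END     { for (c = 2; c <= cols; c++) printf "%s\t%d\n", name[c], total[c] }
  '
}

# Sample data: the 2.6.x results from step 3 above, six fields per row.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  Vers 2.6.35 2.6.36 2.6.37 2.6.38 2.6.39 \
  2015 6 6 6 6 6 \
  2016 1 1 1 1 1 \
  2017 9 9 9 10 10 \
  2018 21 21 21 21 21 \
  2019 121 121 120 121 120 |
column_totals
```

For the sample data, the totals for 2015 to 2019 come to 158, 158, 157, 159 and 158, which shows how little the five 2.6.x releases differ from one another.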

Topics: Developer Blog

Written by Paul Jacobs

With more than a quarter of a century in IT, Paul brings with him a kaleidoscope of experiences and insight which he uses to drill into and pick apart the complexities of Linux server security and hosting issues, as Technical Evangelist and Content Writer for CloudLinux.