Currently, we grab the per-year CVE feeds, in two passes: first, we grab
the meta files, and check whether something has changed since last we
downloaded it; second, we download the feed proper, unless the meta file
has not changed, in which case we use the locally cached feed.
However, it has appeared that the FKIE releases no longer provide the
meta files, which means that (once again), our daily reports are broken.
The obvious fix would be to drop the use of the meta file, and always
and unconditionally download the feeds. That's relatively trivial to do,
but the feeds are relatively big (even as xz-xompressed).
However, the CVE database from FKIE is available as a git tree. Git is
pretty good at only sending delta when updating a local copy. In
addition, the git tree, contains each CVE as an individual file, so it
is relatively easier to scan and parse.
Switch to using a local git clone.
Slightly surprisingly (but not so much either), parsing the CVE files is
much faster when using the git working copy, than it is when parsing the
per-year feeds: indeed, the per-year feeds are xz-compressed, and even
if python is slow-ish to scan a directory and opening files therein, it
is still much faster than to decompress xz files. The timing delta [0]
is ~100s before and ~10s now, about a ten time improvement, over the
whole package set.
The drawback, however, is that the git tree is much bigger on-disk, from
~55MiB for the per-year compressed feeds, to 2.1GiB for the git tree
(~366MiB) and a working copy (~1.8GiB)... Given very few people are
going to use that, that's considered acceptable...
Eventually, with a bit of hacking [1], the two pkg-stats, before and
after this change, yield the same data (except for the date and commit
hash).
[0] hacking support/scripts/pkg-stats to display the time before/after
the CVE scan, and hacking support/scripts/cve.py to do no download so
that only the CVE scan happens (and also because the meta files are no
longer available).
[1] sorting the CVE lists in json, sorting the json keys, and using the
commit from the FKIE git tree that was used for the current per-year
feeds.
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
Commit 22b6945552 (support/scripts/cve.py: switch from NVD to FKIE for
the JSON files) had to change the decompressor from gz to xz, as the new
location is using xz compression.
That commit mentioned that it was spawning an external xz process to do
the decompression, on the pretence that "there is no xz decompressor in
Python stdlib."
Before version 3.1, ijson.items() only accepted a file-like object as
input (that file-like object could yield bytes() or str(), both were
supported). Starting with version 3.1, ijson.items() also accepts that
it be directly passed bytes() or str() directly. subprocess.check_output()
means we are now passing bytes() to ijson.items(), so it fails on ijson
versions before 3.1, with failures such as:
[...]
File "/usr/lib/python3/dist-packages/ijson/backends/python.py", line 25, in Lexer
if type(f.read(0)) == bytetype:
AttributeError: 'bytes' object has no attribute 'read'
Ubuntu 20.04, on which the pkg-stats run to generate the daily report,
only has ijson 2.3. More recent distros have more recent versions of
ijson, like Fedora 39 that has 3.2.3, recent enough to support being fed
bytes(). Commit 22b6945552 was tested on Fedora 39, so did not catch
the issue.
However, the reasoning in 22b6945552 is wrong: there *is* the lzma
module, at least since python 3.3 (that is, aeons ago), which is able to
read xz-compressed files; it also has an API similar to the gzip module,
and can provide a file-like object that exposes the decompressed data.
So, do just that: provide an lzma-wrapped file-like object to ijson, so
that we can eventually recover our daily reports that everything is
broken! :-]
Note that this construct still works on recent versions!
Reported-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
While the old NVD JSON feed provided data files where the CVEs were
sorted by ID, the new feed from FKIE does not have sorted CVEs.
Add a method to sort a list of CVE IDs (i.e. CVE ID strings, not CVE
objects!), and use that when emiting the HTML output.
The JSON output need not be sorted, because it is supposed to be used
for post-processing, and we do not care about the ordering there; a
consumer interested in sorting should sort on their side.
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
Commit 22b6945552 (support/scripts/cve.py: switch from NVD to FKIE for
the JSON files) missed the fact that the layout of the FKIE data files
are different from the original NVD ones. They are formatted according
to the NVD v2 API.
Most differences are relatively trivial fields renaming, and those are
easily spotted in this patch.
There is however one key difference in the layout of the configurations.
Where the NVD had "configurations" as an object with a "nodes" key, the
FKIE has a "configurations" as a list of objects with a single "nodes"
key; i.e. it is one-level deeper.
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
When the CVE lookup was added in commit
4a157be9ef, the starting year of the JSON
files was set to 2002. However, there are also CVEs from 1999, 2000 and
2001. It is not clear why these were skipped back then.
Set the start year to 1999 to capture these old CVEs too.
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
NVD will deprecate the v1.1 API which allows us to download the full
database as individual JSON files. Instead, there's a horribly crappy
API that is extremely slow and subject to race conditions.
Fortunately, there is a project, Fraunhofer FKIE - Cyber Analysis and
Defense [1], that goes through the effort of adapting to this new API
and regenerating the convenient JSON files. The JSON files and meta
files are re-generated daily.
Instead of implementing the NVD v2 API, we decided to just use the JSON
files generatd by fkie-cad. That saves us the effort of solving the race
conditions, devising a cache mechanism that works, handling the frequent
gateway timeouts on the NVD servers, dealing with the rate limiting, and
keeping up with changes in the API.
Switch to this repository on github as NVD_BASE_URL. The file name is
also slightly different (CVE-20XX.json instead of nvdcve-1.1-20XX.json).
The fkie-cad repository compresses with xz instead of gz. Therefore:
- rename the filename variables to _xz instead of _gz;
- use xz as a subprocess because there is no xz decompressor in Python
stdlib.
[1] https://www.fkie.fraunhofer.de/en/departments/cad.html
Cc: Daniel Lang <dalang@gmx.at>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Python 2 is EOL sice 2020 [1], it's still available on distros, but may not
be installed by default (as being replaced by python3).
Thus remove compatibility imports:
from __future__ import print_function
from __future__ import absolute_import
Tested with python3 -m py_compile.
[1] https://www.python.org/doc/sunset-python-2/
Signed-off-by: Petr Vorel <petr.vorel@gmail.com>
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
ijson < 2.5 (as available in Debian 10) use the slow python backend by
default instead of the most efficient one available like modern ijson
versions, significantly slowing down cve checking. E.G.:
time ./support/scripts/pkg-stats --nvd-path ~/.nvd -p avahi --html foobar.html
Goes from
174,44s user 2,11s system 99% cpu 2:58,04 total
To
93,53s user 2,00s system 98% cpu 1:36,65 total
E.G. almost 2x as fast.
As a workaround, detect when the python backend is used and try to use a
more efficient one instead. Use the yajl2_cffi backend as recommended by
upstream, as it is most likely to work, and print a warning (and continue)
if we fail to load it.
The detection is slightly complicated by the fact that ijson.backends used
to be a reference to a backend module, but is nowadays a string (without the
ijson.backends prefix).
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
This commit modifies cve.py, as well as its users cve-checker and
pkg-stats to support CPE ID based matching, for packages that have CPE
ID information.
One of the non-trivial thing is that we can't simply iterate over all
CVEs, and then iterate over all our packages to see which packages
have CPE ID information that match the CPEs affected by the
CVE. Indeed, this is an O(n^2) operation.
So instead, we do a pre-filtering of packages potentially affected. In
check_package_cves(), we build a cpe_product_pkgs dict that associates
a CPE product name to the packages that have this CPE product
name. The CPE product name is either derived from the CPE information
provided by the package if available, and otherwise we use the package
name, which is what was used prior to this patch.
And then, when we look at CVEs, we only consider the packages that
have a CPE product name matching the CPE products affected by the
CVEs. This is done in check_package_cve_affects().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Currently, when the version encoded in a CPE is '-', we assume all
versions are affected, but when it's '*' with no further range
information, we assume no version is affected.
This doesn't make sense, so instead, we handle '*' and '-' in the same
way. If there's no version information available in the CVE CPE ID, we
assume all versions are affected.
This increases quite a bit the number of CVEs and package affected:
- "total-cves": 302,
- "pkg-cves": 100,
+ "total-cves": 597,
+ "pkg-cves": 135,
For example, CVE-2007-4476 has a CPE ID of:
cpe:2.3🅰️gnu:tar:*:*:*:*:*:*:*:*
So it should be taken into account. In this specific case, it is
combined with an AND with CPE ID
cpe:2.3⭕suse:suse_linux:10:*:enterprise_server:*:*:*:*:* but since
we don't support this kind of matching, we'd better be on the safe
side, and report this CVE as affecting tar, do an analysis of the CVE
impact, and document it in TAR_IGNORE_CVES.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Matt Weber <matthew.weber@rockwellcollins.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The affects method of the CVE uses the Package class defined in
pkg-stats. The purpose of migrating the CVE class outside of pkg-stats
was to be able to reuse it from other scripts. So let's remove the
Package dependency and only use the needed information.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
In 2019, the JSON vulnerability feeds switched their schema from
version 1.0 to 1.1.
The main difference is the removal of the "affects" element that we
were using to check if a package was affected by a CVE.
This information is now available in the "configuration" element which
contains the cpeid as well as properties about the versions
affected. Instead of having a list of the versions affected, with
these properties, it is possible to have a range of versions.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
In order to be able to use the CVE checking logic outside of
pkg-stats, move the CVE class in a module that can be used by other
scripts.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>