In order to be able to use the CVE checking logic outside of
pkg-stats, move the CVE class in a module that can be used by other
scripts.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Some CVE entries in the NVD database have version_value set to "-",
which seems to indicate that it applies to all versions of the
software project, or that they don't really know which versions are
affected, and which are not.
So, for the benefit of doubt, it seems more appropriate to consider
such CVEs as affecting our packages.
This makes the total number of CVEs affecting our next branch jump
from 141 CVEs to 658 CVEs, but that number will go back down once we
switch to the JSON 1.1 schema. Indeed, in the JSON 1.0 schema, there
are often cases where a version_value is set to "=" *and* specific
versions are set to.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This commit slightly improves the output of pkg-stats by showing the
progress of the upstream URL checks and latest version retrieval, on a
package basis:
Checking URL status
[0001/0062] curlpp
[0002/0062] cmocka
[0003/0062] snappy
[0004/0062] nload
[...]
[0060/0062] librtas
[0061/0062] libsilk
[0062/0062] jhead
Getting latest versions ...
[0001/0064] libglob
[0002/0064] perl-http-daemon
[0003/0064] shadowsocks-libev
[...]
[0061/0064] lua-flu
[0062/0064] python-aiohttp-security
[0063/0064] ljlinenoise
[0064/0064] matchbox-lib
Note that the above sample was run on 64 packages. Only 62 packages
appear for the URL status check, because packages that do not have any
URL in their Config.in file, or don't have any Config.in file at all,
are not checked and therefore not accounted.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This commit reworks the code that checks if the upstream URL of each
package (specified by its Config.in file) using the aiohttp
module. This makes the implementation much more elegant, and avoids
the problematic multiprocessing Pool which is causing issues in some
situations.
Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This commit reworks the code that retrieves the latest upstream
version of each package from release-monitoring.org using the aiohttp
module. This makes the implementation much more elegant, and avoids
the problematic multiprocessing Pool which is causing issues in some
situations.
Since we're now using some async functionality, the script is Python
3.x only, so the shebang is changed to make this clear.
Suggested-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This fixes the following flake8 warning:
support/scripts/pkg-stats:1005:9: E117 over-indented
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
With python 3, when a package has a version number x-y-z instead of
x.y.z, then the version returned by LooseVersion can't be compared
which raises a TypeError exception:
Traceback (most recent call last):
File "./support/scripts/pkg-stats", line 1062, in <module>
__main__()
File "./support/scripts/pkg-stats", line 1051, in __main__
check_package_cves(args.nvd_path, {p.name: p for p in packages})
File "./support/scripts/pkg-stats", line 613, in check_package_cves
if pkg_name in packages and cve.affects(packages[pkg_name]):
File "./support/scripts/pkg-stats", line 386, in affects
return pkg_version <= cve_affected_version
File "/usr/lib64/python3.8/distutils/version.py", line 58, in __le__
c = self._cmp(other)
File "/usr/lib64/python3.8/distutils/version.py", line 337, in _cmp
if self.version < other.version:
TypeError: '<' not supported between instances of 'str' and 'int'
This patch handles this exception by adding a new return value when
the comparison can't be done. The code is adjusted to take of this
change. For now, a return value of CVE_UNKNOWN is handled the same way
as a CVE_DOESNT_AFFECT return value, but this can be improved later
on.
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
When the 'nvd-path', 'json' and 'html' are used like this:
--html ~/foo
then the tilde expansion is properly done by the shell. However, when
they are used like this:
--html=~/foo
The shell doesn't do the tilde expansion, and pkg-stats doesn't do
it. This commit modifies pkg-stats to ensure that tilde expansion is
done when parsing the 'nvd-path', 'json' and 'html' arguments.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
[Thomas: improve commit log]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
flake8 complains with:
support/scripts/pkg-stats:339:13: E722 do not use bare 'except'
Due to the construct:
try:
something
except:
print("some message")
raise
Which is in fact OK because the exception is re-raised. This issue is
discussed at https://github.com/PyCQA/pycodestyle/issues/703, and the
general agreement is that these "bare except" are OK, and should be
ignored from flake8 using a noqa statement.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
flake8 complains with:
pkg-stats:38:1: E402 module level import not at top of file
This is due to sys.path.append() being before the import from
getdeveloperlib, but we really need this sys.path.append() to be
before, so let's ignore this flake8 warning.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
If there is no infra set or infra is virtual the status is set to 'na'.
This is done for the follwing checks:
- license
- license-files
- hash
- hash-license
- patches
- version
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This value can be used for later processing.
In the buildroot-stats application this is used to create links pointing
to the git repo of buildroot.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Unify the status check information. The status is stored in a tuple. The
first entry is the status that can be 'ok', 'warning' or 'error'. The
second entry is a verbose message.
The following checks are performed:
- url: status of the URL check
- license: status of the license presence check
- license-files: status of the license file check
- hash: status of the hash file presence check
- patches: status of the patches count check
- pkg-check: status of the check-package script result
- developers: status if a package has developers in the DEVELOPERS file
- version: status of the version check
With that status information the following variables are replaced:
has_license, has_license_files, has_hash, url_status
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Use the function 'parse_developers' function from getdeveloperlib that
collect the information about the developers and the files they
maintain. Then set the maintainer(s) to each package.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Remove the patch_count attribute and use a class property instead.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
This patch changes the type of the latest_version variable to a dict.
This is for better readability/usability of the data. With this the json
output is more descriptive in later processing of the json output.
Signed-off-by: Heiko Thiery <heiko.thiery@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
During the CVE checking phase, we can still see a huge amount of
Python processes (actually 128) running on the host, even though
the CVE step is entirely ran in the main thread.
These are actually the worker processes spawned to check for the
packages URL statuses and the latest versions from release-monitoring.
This is because of an issue in Python's multiprocessing implementation:
https://bugs.python.org/issue34172
The problem was already there before the CVE matching step was
introduced, but because pkg-stat was terminating right after the
release-monitoring step, it went unnoticed.
Also, do not hold a reference to the multiprocessing pool from
the Package class, as this is not needed.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
In Python 3, the functions from the subprocess module return bytes
(and no longer strings as in Python 2), which must be decoded for
further text operations.
Now, pkg-stats can be run in Python 3.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
It seems like throughout the series that the CVE pkg-stats support
went through, the support for ignoring CVEs in the per-package
<pkg>_IGNORE_CVES variable was forgotten.
Let's re-introduce this, which is now very simple thanks to the CVE
class, its .identifier() propertly and the .is_cve_ignored() method of
the Package class
Cc: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
During the CVE checking phase, we can still see a huge amount of
Python processes (actually 128) running on the host, even though
the CVE step is entirely ran in the main thread.
These are actually the worker processes spawned to check for the
packages URL statuses and the latest versions from release-monitoring.
This is because of an issue in Python's multiprocessing implementation:
https://bugs.python.org/issue34172
The problem was already there before the CVE matching step was
introduced, but because pkg-stat was terminating right after the
release-monitoring step, it went unnoticed.
Also, do not hold a reference to the multiprocessing pool from
the Package class, as this is not needed.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
In Python 3, the functions from the subprocess module return bytes
(and no longer strings as in Python 2), which must be decoded for
further text operations.
Now, pkg-stats can be run in Python 3.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
The NVD files that are used to build the list of CVEs affecting
Buildroot packages are quite large (a few hundreds MB of json),
and cause the pkg-stats scripts to have a huge memory footprint
(a few GB with Python 2.7).
However, because we only need to iterate on CVE items one by one,
we can process them in streaming (ie decoding one CVE at a time
from the JSON representation). Because the json module from the
python standard library does not support such a mode of operation,
we switch to the third-party package ijson, which is compatible
with both Python 2 and Python3.
To run the script with these modifications, one should install
the ijson python package. This can be done with pip:
`pip install ijson`. On Debian based distributions, this can
also be done with the apt package manager:
`apt install python-ijson`.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Reviewed-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Tested-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
The NVD files that are used to build the list of CVEs affecting
Buildroot packages are quite large (a few hundreds MB of json),
and cause the pkg-stats scripts to have a huge memory footprint
(a few GB with Python 2.7).
However, because we only need to iterate on CVE items one by one,
we can process them in streaming (ie decoding one CVE at a time
from the JSON representation). Because the json module from the
python standard library does not support such a mode of operation,
we switch to the third-party package ijson, which is compatible
with both Python 2 and Python3.
To run the script with these modifications, one should install
the ijson python package. This can be done with pip:
`pip install ijson`. On Debian based distributions, this can
also be done with the apt package manager:
`apt install python-ijson`.
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Reviewed-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Tested-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
It seems like throughout the series that the CVE pkg-stats support
went through, the support for ignoring CVEs in the per-package
<pkg>_IGNORE_CVES variable was forgotten.
Let's re-introduce this, which is now very simple thanks to the CVE
class, its .identifier() propertly and the .is_cve_ignored() method of
the Package class
Cc: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
This commit extends the pkg-stats script to grab information about the
CVEs affecting the Buildroot packages.
To do so, it downloads the NVD database from
https://nvd.nist.gov/vuln/data-feeds in JSON format, and processes the
JSON file to determine which of our packages is affected by which
CVE. The information is then displayed in both the HTML output and the
JSON output of pkg-stats.
To use this feature, you have to pass the new --nvd-path option,
pointing to a writable directory where pkg-stats will store the NVD
database. If the local database is less than 24 hours old, it will not
re-download it. If it is more than 24 hours old, it will re-download
only the files that have really been updated by upstream NVD.
Packages can use the newly introduced <pkg>_IGNORE_CVES variable to
tell pkg-stats that some CVEs should be ignored: it can be because a
patch we have is fixing the CVE, or because the CVE doesn't apply in
our case.
>From an implementation point of view:
- A new class CVE implement most of the required functionalities:
- Downloading the yearly NVD files
- Reading and extracting relevant data from these files
- Matching Packages against a CVE
- The statistics are extended with the total number of CVEs, and the
total number of packages that have at least one CVE pending.
- The HTML output is extended with these new details. There are no
changes to the code generating the JSON output because the existing
code is smart enough to automatically expose the new information.
This development is a collective effort with Titouan Christophe
<titouan.christophe@railnova.eu> and Thomas De Schampheleire
<thomas.de_schampheleire@nokia.com>.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Titouan Christophe <titouan.christophe@railnova.eu>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
As suggested by Baruch Siach, using "git rev-parse HEAD" is a lot
simpler than playing around with "git log" to just retrieve the commit
id corresponding to the current HEAD.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
pkg-stats extracts the Buildroot commit id from which the package
information was collected. However, when doing so, it always assumes
we're using the master branch, by running "git log master".
But in fact, pkg-stats can be run from any branch/tag, so it makes a
lot more sense to use "git log HEAD".
Cc: victor.huesca@bootlin.com
Cc: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The major bottleneck in pkg-stats is the time spent waiting for
answers from remote servers. Two functions involve such communication
with remote servers:
- 'check_package_urls' which checks that each package upstream website
is up, it is efficient due to the use of process-pools thanks to
Matt Weber.
- 'check_package_latest_version' which fetches the latest package
version from release-monitoring, it uses a http-pool but runs
sequentially.
This patch extends the use of process-pools to 'check_latest_version'.
Due to some limitations of multiprocess callbacks, this patch loses
the overall progress of packages in favour of just the current package
name.
Runtimes for this function are ~3m vs ~25m for the linear version.
Tested on an i7 7500U (2/4 cores/threads @3.5GHz) with 15ms ping.
Note: There have already been work trying to parallelize this function
using threads but there were a failure on some configurations [1].
This implementation rely on a dedicated module already in use on this
script, so it's unlikely to see failure with this version.
[1] http://lists.busybox.net/pipermail/buildroot/2018-March/215368.html
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Fixes:
- blank space before ':'
- unused 'o' variable left from a previous patch
- bad continuous alignment
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The pkg-stats calls 3 times `make` to get a bunch of variables. These
variables can be obtained in only one make invocation. This patch
replaces the three calls by just one and adjusts the parsing logic
accordingly.
Note: another option suggested by Arnout would be to run `make
show-info` that produces a json with the necessary variables. This
would avoid the duplicated effort done in pkg-stats and pkg-utils and
allow to add other infos to pkg-stats like dependencies, reversed
dependencies or if the package is virtual.
In order to use this method, the following changes are required in
pkg-generic's show-info:
- include license_files;
- have an option to run it on *all* packages, not just the selected
ones.
This patch take the simplest approach of only factorizing the make
calls as it requires less changes.
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Since it's used only for the HTML output, and all other functions used
for HTML output are prefixed by dump_html, let's do so for
dump_gen_info() as well by renaming it to dump_html_gen_info().
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The 'dump_html' and 'dump_json' both include commit infos as well as the
current date. It make more sense to retrieve these information once.
This patch simply does this factorization.
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Pkg-stats is a great script that get a lot of interesting info from
buildroot packages. Unfortunately it is currently designed to output a
static HTML page only. While this is great to include on the
buildroot's website, the HTML is not designed to be easily parsable and
thus it is difficult to reuse it in other scripts.
This patch provide a new option to output a JSON file in addition to the
HTML one.
The old 'output' option has been renamed to 'html' to distinguish from
the new 'json' option.
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Move the mutual exculsion of the '-n' and '-p' options to be part of the
parser instead of being checked in main.
Signed-off-by: Victor Huesca <victor.huesca@bootlin.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Fixes the following flake8 warnings:
support/scripts/pkg-stats:34:2: W605 invalid escape sequence '\$'
support/scripts/pkg-stats:34:4: W605 invalid escape sequence '\('
support/scripts/pkg-stats:34:11: W605 invalid escape sequence '\$'
support/scripts/pkg-stats:34:13: W605 invalid escape sequence '\('
support/scripts/pkg-stats:34:32: W605 invalid escape sequence '\)'
support/scripts/pkg-stats:34:34: W605 invalid escape sequence '\)'
support/scripts/pkg-stats:35:2: W605 invalid escape sequence '\s'
support/scripts/pkg-stats:35:14: W605 invalid escape sequence '\S'
support/scripts/pkg-stats:35:17: W605 invalid escape sequence '\s'
support/scripts/pkg-stats:42:1: E302 expected 2 blank lines, found 1
support/scripts/pkg-stats:587:133: E501 line too long (157 > 132 characters)
Note that the "invalid escape sequence" errors work because Python
leaves the \ in place if it doesn't recognise the escape sequence. But
it's better practice to use a raw string for regular expressions.
Signed-off-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
This commit adds fetching the latest upstream version of each package
from release-monitoring.org.
The fetching process first tries to use the package mappings of the
"Buildroot" distribution [1]. This mapping mechanism allows to tell
release-monitoring.org what is the name of a package in a given
distribution/build-system. For example, the package xutil_util-macros
in Buildroot is named xorg-util-macros on release-monitoring.org. This
mapping can be seen in the section "Mappings" of
https://release-monitoring.org/project/15037/.
If there is no mapping, then it does a regular search, and within the
search results, looks for a package whose name matches the Buildroot
name.
Even though fetching from release-monitoring.org is a bit slow, using
multiprocessing.Pool has proven to not be reliable, with some requests
ending up with an exception. So we keep a serialized approach, but
with a single HTTPSConnectionPool() for all queries. Long term, we
hope to be able to use a database dump of release-monitoring.org
instead.
From an output point of view, the latest version column:
- Is green when the version in Buildroot matches the latest upstream
version
- Is orange when the latest upstream version is unknown because the
package was not found on release-monitoring.org
- Is red when the version in Buildroot doesn't match the latest
upstream version. Note that we are not doing anything smart here:
we are just testing if the strings are equal or not.
- The cell contains the link to the project on release-monitoring.org
if found.
- The cell indicates if the match was done using a distro mapping, or
through a regular search.
[1] https://release-monitoring.org/distro/Buildroot/
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Tested-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Adds a pool of worker threads to accelerate connection testing.
~7.5MB and 2% CPU per thread on a Intel i5-3230M CPU @ 2.60GHz.
Runtime is ~3min in parallel vs ~15min.
CC: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Reviewed-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
- Adds support to check if a package has a URL and if that URL
is valid by doing a header request.
- Reports this information as part of the generated html output
The URL data is currently gathered from the URL string provided
in the Kconfig help sections for each package.
This check helps ensure the URLs are valid and can be used
for other scripting purposes as the product's home site/URL.
CPE XML generation is an example of a case that could use this
product URL as part of an automated update generation script.
CC: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Signed-off-by: Matt Weber <matthew.weber@rockwellcollins.com>
Reviewed-by: Ricardo Martincoski <ricardo.martincoski@gmail.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Use Python 3 style print calls, in order to make pkg-stats Python 3
compliant.
Signed-off-by: Matthew Weber <matthew.weber@rockwellcollins.com>
Reviewed-by: Ricardo Martincoski <ricardo.martincoski@datacom.ind.br>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>