bd1798ad95
pkg-stats currently uses the services from support/scripts/cpedb.py to match the CPE identifiers of packages with the official CPE database.

Unfortunately, the cpedb.py code uses regular ElementTree parsing, which involves loading the full XML tree into memory. This causes the pkg-stats process to consume a huge amount of memory:

    thomas 1310458 85.2 21.4 3708952 3450164 pts/5 R+ 16:04 0:33 | | \_ python3 ./support/scripts/pkg-stats

So, 3.7 GB of VSZ and 3.4 GB of RSS are used by the pkg-stats process. This causes the OOM killer to kick in on machines with relatively low memory.

This commit reimplements the XML parsing needed to do the CPE matching directly in pkg-stats, using the XMLParser functionality of ElementTree, also called "streaming parsing". Thanks to this, we never load the entire XML tree in RAM, but only stream it through the parser, and construct a very simple list of all CPE identifiers.

The max memory consumption of pkg-stats is now:

    thomas 1317511 74.2 0.9 381104 152224 pts/5 R+ 16:08 0:17 | | \_ python3 ./support/scripts/pkg-stats

So, 381 MB of VSZ and 152 MB of RSS, which is obviously much better.

The JSON output of pkg-stats for the full package set, before and after this commit, is exactly identical.

Now, one will probably wonder why this isn't directly changed in cpedb.py. The reason is simple: cpedb.py is also used by support/scripts/missing-cpe, which (for now) heavily relies on having the ElementTree objects in memory, to re-generate a snippet of XML that allows us to submit new CPE entries to NIST.

So, future work could include one of these two options:

(1) Re-integrate cpedb.py into missing-cpe directly, and live with two different ways of processing the CPE database.

(2) Rewrite the missing-cpe logic to also be compatible with streaming parsing, which would allow this logic to be shared again between pkg-stats and missing-cpe.
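As an illustration of the streaming approach described above, here is a minimal sketch, not the actual pkg-stats code. It assumes that CPE 2.3 entries appear as `cpe23-item` elements in the `http://scap.nist.gov/schema/cpe-extension/2.3` namespace with a `name` attribute. The `CpeCollector` and `stream_cpe_ids` names and the sample XML are hypothetical, made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: tag used for CPE 2.3 entries in the NIST dictionary.
CPE23_ITEM = "{http://scap.nist.gov/schema/cpe-extension/2.3}cpe23-item"


class CpeCollector:
    """Parser target (hypothetical name) that keeps only CPE identifiers."""

    def __init__(self):
        self.cpes = []

    def start(self, tag, attrib):
        # Called for every opening tag; no tree is ever built.
        if tag == CPE23_ITEM:
            self.cpes.append(attrib["name"])

    def close(self):
        # The return value is handed back by XMLParser.close().
        return self.cpes


def stream_cpe_ids(fileobj, chunk_size=1024 * 1024):
    """Feed the file to the parser chunk by chunk and return the CPE list."""
    parser = ET.XMLParser(target=CpeCollector())
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()


# Made-up sample data, shaped like the assumed dictionary format.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<cpe-list xmlns="http://cpe.mitre.org/dictionary/2.0"
          xmlns:cpe-23="http://scap.nist.gov/schema/cpe-extension/2.3">
  <cpe-item name="cpe:/a:busybox:busybox:1.33.0">
    <cpe-23:cpe23-item name="cpe:2.3:a:busybox:busybox:1.33.0:*:*:*:*:*:*:*"/>
  </cpe-item>
  <cpe-item name="cpe:/a:openssl:openssl:1.1.1k">
    <cpe-23:cpe23-item name="cpe:2.3:a:openssl:openssl:1.1.1k:*:*:*:*:*:*:*"/>
  </cpe-item>
</cpe-list>
"""

if __name__ == "__main__":
    for cpe in stream_cpe_ids(io.BytesIO(SAMPLE), chunk_size=64):
        print(cpe)
```

With a gzip-compressed dictionary, the same function could presumably be given a `gzip.GzipFile` object instead of the `io.BytesIO` used here, so that only one chunk of decompressed XML is held in memory at a time.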
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
[yann.morin.1998@free.fr:
  - add missing import of requests
  - import CPEDB_URL from cpedb, instead of duplicating it
  - fix flake8 errors
]
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
- apply-patches.sh
- boot-qemu-image.py
- br2-external
- brpkgutil.py
- check-bin-arch
- check-dotconfig.py
- check-host-rpath
- check-kernel-headers.sh
- check-merged-usr.sh
- cpedb.py
- cve.py
- eclipse-register-toolchain
- expunge-gconv-modules
- fix-configure-powerpc64.sh
- fix-rpath
- gen-bootlin-toolchains
- gen-missing-cpe
- generate-gitlab-ci-yml
- genimage.sh
- graph-build-time
- graph-depends
- hardlink-or-copy
- mkmakefile
- mkusers
- pkg-stats
- pycompile.py
- pyinstaller.py
- setlocalversion
- size-stats