Go to file
Thomas Petazzoni bd1798ad95 support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats
pkg-stats currently uses the services from support/scripts/cpedb.py to
match the CPE identifiers of packages with the official CPE database.

Unfortunately, the cpedb.py code uses regular ElementTree parsing,
which involves loading the full XML tree into memory. This causes the
pkg-stats process to consume a huge amount of memory:

thomas   1310458 85.2 21.4 3708952 3450164 pts/5 R+   16:04   0:33  |   |   \_ python3 ./support/scripts/pkg-stats

So, 3.7 GB of VSZ and 3.4 GB of RSS are used by the pkg-stats
process. This is causing the OOM killer to kick-in on machines with
relatively low memory.

This commit reimplements the XML parsing needed to do the CPE matching
directly in pkg-stats, using the XmlParser functionality of
ElementTree, also called "streaming parsing". Thanks to this, we never
load the entire XML tree in RAM, but only stream it through the
parser, and construct a very simple list of all CPE identifiers. The
max memory consumption of pkg-stats is now:

thomas   1317511 74.2  0.9 381104 152224 pts/5   R+   16:08   0:17  |   |   \_ python3 ./support/scripts/pkg-stats

So, 381 MB of VSZ and 152 MB of RSS, which is obviously much better.

The JSON output of pkg-stats for the full package set, before and after
this commit, is exactly identical.

Now, one will probably wonder why this isn't directly changed in
cpedb.py. The reason is simple: cpedb.py is also used by
support/scripts/missing-cpe, which (for now) heavily relies on having
in memory the ElementTree objects, to re-generate a snippet of XML
that allows us to submit to NIST new CPE entries.

So, future work could include one of those two options:

 (1) Re-integrate cpedb.py into missing-cpe directly, and live with
     two different ways of processing the CPE database.

 (2) Rewrite the missing-cpe logic to also be compatible with a
     streaming parsing, which would allow this logic to be again
     shared between pkg-stats and missing-cpe.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
[yann.morin.1998@free.fr:
  - add missing import of requests
  - import CPEDB_URL from cpedb, instead of duplicating it
  - fix flake8 errors
]
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
2022-04-02 19:14:17 +02:00
arch core: introduce NORMALIZED_ARCH as non-kernel replacement for KERNEL_ARCH 2022-02-08 21:20:23 +01:00
board configs/octavo_osd32mp1_red: new defconfig 2022-03-20 18:08:55 +01:00
boot boot/optee-os: fix version choice 2022-03-27 17:33:24 +02:00
configs configs/zynqmp_zcu10x: change git to https 2022-03-24 18:11:18 +01:00
docs docs/website: update for 2021.02.11 2022-03-25 14:17:04 +01:00
fs fs/common.mk: use find instead of shell glob patterns 2022-03-12 17:45:21 +01:00
linux {linux, linux-headers}: bump 4.{9, 14, 19}.x / 5.{4, 10, 15, 16}.x series 2022-03-25 17:43:59 +01:00
package package/kodi-pvr-mythtv: bump version to 19.0.8-Matrix 2022-03-31 17:53:54 +02:00
support support/scripts/pkg-stats: reimplement CPE parsing in pkg-stats 2022-04-02 19:14:17 +02:00
system system/skeleton: provide run/lock directory 2022-01-12 20:38:09 +01:00
toolchain toolchain/toolchain-external/toolchain-external-bootlin: update with new s390x toolchain 2022-03-10 22:09:26 +01:00
utils utils/scanpypi: support alternative Homepage format 2022-03-13 19:24:23 +01:00
.clang-format .clang-format: initial import from Linux 5.15.6 2022-01-01 15:01:13 +01:00
.defconfig
.flake8
.gitignore
.gitlab-ci.yml utils/checkpackagelib/lib_sysv: run shellcheck 2022-02-06 18:27:03 +01:00
CHANGES Update for 2021.02.11 2022-03-25 14:14:31 +01:00
Config.in support/download: Add SFTP support 2022-01-06 09:34:05 +01:00
Config.in.legacy package/libcurl: bump version to 7.82.0 2022-03-23 21:44:00 +01:00
COPYING
DEVELOPERS DEVELOPERS: Add some more entries for Thomas Huth 2022-03-24 21:10:36 +01:00
Makefile support/scripts/graph-build-time: add support for timeline graphing 2022-03-20 23:52:24 +01:00
Makefile.legacy
README docs: move the IRC channel away from Freenode 2021-05-29 22:16:23 +02:00

Buildroot is a simple, efficient and easy-to-use tool to generate embedded
Linux systems through cross-compilation.

The documentation can be found in docs/manual. You can generate a text
document with 'make manual-text' and read output/docs/manual/manual.text.
Online documentation can be found at http://buildroot.org/docs.html

To build and use the buildroot stuff, do the following:

1) run 'make menuconfig'
2) select the target architecture and the packages you wish to compile
3) run 'make'
4) wait while it compiles
5) find the kernel, bootloader, root filesystem, etc. in output/images

You do not need to be root to build or run buildroot.  Have fun!

Buildroot comes with a basic configuration for a number of boards. Run
'make list-defconfigs' to view the list of provided configurations.

Please feed suggestions, bug reports, insults, and bribes back to the
buildroot mailing list: buildroot@buildroot.org
You can also find us on #buildroot on OFTC IRC.

If you would like to contribute patches, please read
https://buildroot.org/manual.html#submitting-patches