Commit 768f9f80f6 (support/download: generate even more reproducible
tarballs) causes non-reproducibility in tarballs we previousy
generated, especially the archives for two cargo-vendored packages,
ripgrep and sentry-cli.
The cause is that those two pakcages eventually vendor a file that has
the u+x bit set, but is otehrwise go-x. With 768f9f80f6, the files are
now go+x, so the hash for those generated archives has changed.
Besides, that commit was wrong: it did not account for the 'r' bit for
go part, leaving some non-reproducibility still unaccounted for.
So, to generate really reproducible archives, we would need to fix that
read bit as well, and that has the potential to affect all the archives
we generated so far. If we wanted to do so, we'd need a way to version
all generated archives, like we do for git and svn, but now for all the
different CVSes, as well as for all the vendoring post-processes.
For 768f9f80f6, all that was of conern was the working copies of CVSes
(i.e. git, svn, cvs...) that we cache in the Buildroot download dir, not
the temporary files during post-processing. Indeed, in that latter case,
the user has virtually no way to mangle with the mode of the
intermediate extract before repack.
And we do have a big fat warning that users should not attempt to meddle
with the git tree that Buildroot caches.
As 768f9f80f6 however demonstrates, is that it took quite a long time
between the introduction of the git caching, and the time someone
eventually discovered they could meddle in there. This shows that the
issue it not actually critical in most setups.
Also, the tar manual [0] hints at a better solution to handle
reproducibility, which even avoids touching the files on disk which is
even nicer:
‘--mode='go+u,go-w'’
Omit irrelevant information about file permissions.
If we were to actually handle the mode bit for reproducibility, we'd
need to:
- introduce archive versioning for all download backends and
prost-processing
- use the tar officially suggested method
So, revert that change, as it was incomplete, was not really fixing much
issues, and causes actual issues.
This reverts commit 768f9f80f6.
[0] https://www.gnu.org/software/tar/manual/tar.html#Reproducibility
Thanks to Vincent and Arnout for pointing at the tar manual.
Reported-by: Antoine Coutant <antoine.coutant@smile.fr>
Reported-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Vincent Fazio <vfazio@xes-inc.com>
Cc: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Tested-by: Romain Naour <romain.naour@smile.fr>
Signed-off-by: Antoine Coutant <antoine.coutant@smile.fr>
When we generate the taballs off a local working copy of a VCS tree,
the umask is the one that we enforce in out top-level Makefile.
However, it is possible that a user manually tinkers in said working
copy (e.g. to check an upstream bug fix, or regression). If the user
umask is different from the one Buildroot enfirces, such tinkering
can impact the mode bits of the files, even if their content is not
modified.
When we eventually need to create a tarball from said working copy,
the VCS (e.g. git) will only be interested in checking whether the
content of the files have changed before chcking them out, and will
not look at, and restore/fix the mode bits.
As a consequence, we may create non-reproducible archives.
We fix that by enforcing the mode bits on the files before we create
the tarball: we disable the write and execute bits, and only set the
execute bit if the user execute bit is set.
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Vincent Fazio <vfazio@xes-inc.com>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
The -z option for head was only added in coreutils 8.25, but some older
enterprise-grade distributions (e.g. the oldest still maintained RHEL 7)
only have nothing more recent than coreutils 8.22.
We fix that by using sed to remove everything that starts with the first
NULL byte, \x00.
Signed-off-by: Clayton Shotwell <clayton.shotwell@collins.com>
[yann.morin.1998@free.fr: hex is \xHH, not \xH, reword commit log]
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
For now, the download post-process logic uses mk_tar_gz, which repacks
a tarball compressed with gzip. So we can only accept as input a
tarball also compressed with gzip. To enforce that, this commit
changes post_process_unpack() to use tar xzf. This makes sure that if
a tarball compressed with something else than gzip gets used, it will
bail out and we will notice.
Support for other compression schemes can be added later on.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
download post process scripts will often need to unpack the source
code tarball, do some operation, and then repack it. In order to help
with this, post-process-helpers provide an unpack() function and a
repack() function.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
When a --transform expression is provided, it is by default also applied
to the target of a symlink.
When we create tarballs (from git or svn checkouts), we use a --transform
expression to replace the leading ./ with the package name and version.
This causes issues when a package contains symlinks that points to
./something, as the leading './' is also replaced.
Fix that by using the 'S' transformation scope flag, as described in the
tar manual:
https://www.gnu.org/software/tar/manual/html_node/transform.html#transform
In addition, several transformation scope flags are supported, that
control to what files transformations apply. These are:
‘r’ Apply transformation to regular archive members.
‘R’ Do not apply transformation to regular archive members.
‘s’ Apply transformation to symbolic link targets.
‘S’ Do not apply transformation to symbolic link targets.
‘h’ Apply transformation to hard link targets.
‘H’ Do not apply transformation to hard link targets.
Default is ‘rsh’ [...].
Fixes: #13616
This has been checked to not change any of the existing hash for any of
our git-downloaded package (some are host-only, hence the few fixups):
---8<---
$ m="$( git grep -l -E -- -br[[:digit:]]+.tar.gz boot package/ \
|awk -F/ '{print $(NF-1)}' \
|sed -r -e 's/(imx-mkimage|netsurf-buildsystem|prelink-cross|qoriq-rcw|vboot-utils)/host-\1/g' \
-e 's/$/-source/'
)"
$ make defconfig; make clean; BR2_DL_DIR=$(pwd)/trash-me make ${m}
---8<---
Note: it is unclear what the 'H' flag does nor how it works, because the
concept of "target of a hardlink" is not obvious; probably it has to do
with how tar internally detects and stores hardlinks. Since we do not
yet have any issue with hardlinks, just ignore the problem for now, and
postpone until we have an actual issue with a real test-case.
Signed-off-by: Jean-pierre Cartal <jpcartal@free.fr>
Cc: Vincent Fazio <vfazio@xes-inc.com>
[yann.morin.1998@free.fr:
- re-indent commit log
- add scriptlet to test existing hashes
]
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Some download backends, like svn, will provide timestamps with a
sub-second precision, e.g.
$ svn info --show-item last-changed-date [...]
2021-02-19T20:22:34.889717Z
However, the PAX headers do not accept sub-second precision, leading to
failure to download from subversion:
tar: Time stamp is out of allowed range
tar: Exiting with failure status due to previous errors
make[1]: *** [package/pkg-generic.mk:148: [...]/build/subversion-1886712/.stamp_downloaded] Error 1
Fix that by massaging the timestamp to drop the sub-second part. We
do that in the generic helper, rather than the svn backend, so that
all callers to the generic helper benefit from this, as this is more
an internal details of the tarball limitations, than of the backends
themselves.
Reported-by: Roosen Henri <Henri.Roosen@ginzinger.com>
Signed-off-by: Vincent Fazio <vfazio@xes-inc.com>
[yann.morin.1998@free.fr:
- add Henri as reporter
- move it out of the svn backend, and to the generic helper
- reword the commit log accordingly
- use an explicit time format rather than -Iseconds
]
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
We currently need to generate reproducible archives in at least two
locations: the git and svn download backends. We also know of some
future potential use (e.g. the other download backends, like cvs, or
in the upcoming download post-processors for vendoring, like cargo
and go).
However, we are currently limited to a narrow range of tar versions
that we support, to create reproducible archives, because the gnu
format we use has changed with tar 1.30.
As a consequence, and as time advances, more and more distros are,
or will eventually start, shipping with tar 1.30 or later, and thus
we need to always build our on host-tar.
Now, thanks to some grunt work by Vincent, we have a set of options
that we can pass tar, to generate reproducible archives back from
tar-1.27 and up through tar-1.32, the latest released version.
However, those options are non-trivial, so we do not want to have
to repeat those (and maintain them) in multiple locations.
Introduce a helper that can generate a reproducible archive from
an input directory.
The --pax-option, to set specific PAX headers, does not accept
RFC2822 timestamps which value are too away from some fixed point
(set atcompile-time?):
tar: Time stamp is out of allowed range
However, the same timestamps passed as strict compliant ISO 8601 are
accepted, so that's what we expect as a date format.
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Vincent Fazio <vfazio@xes-inc.com>
Acked-by: Arnout Vandecappelle (Essensium/Mind) <arnout@mind.be>
Reviewed-by: Vincent Fazio <vfazio@xes-inc.com>
---8<------8<------8<------8<---
# Here is a Makefile used to test all the versions of tar, with
# different output formats and different sets of options:
# Versions prior to 1.27 do not build on recent machines, because
# 'gets()' got removed (rightfully so), so don't count them as
# candidates.
VERSIONS = 1.27 1.27.1 1.28 1.29 1.30 1.31 1.32
DATE = Thu 21 May 2020 06:44:11 PM CEST
TARS = \
$(patsubst %,test_gnu_%.tar,$(VERSIONS)) \
$(patsubst %,test_posix_%.tar,$(VERSIONS)) \
$(patsubst %,test_posix_paxoption_%.tar,$(VERSIONS))
all: $(TARS)
sha1sum $(^)
.INTERMEDIATE: test_%.tar
test_gnu_%.tar: tar.% list
./$(<) cf - -C test \
--transform="s#^\./#test-version/#" \
--numeric-owner --owner=0 --group=0 \
--mtime="$(DATE)" \
--format=gnu \
-T list \
>$(@)
test_posix_%.tar: tar.% list
./$(<) cf - -C test \
--transform="s#^\./#test-version/#" \
--numeric-owner --owner=0 --group=0 \
--mtime="$(DATE)" \
--format=posix \
-T list \
>$(@)
test_posix_paxoption_%.tar: tar.% list
./$(<) cf - -C test \
--transform="s#^\./#test-version/#" \
--numeric-owner --owner=0 --group=0 \
--mtime="$(DATE)" \
--format=posix \
--pax-option='delete=atime,delete=ctime,delete=mtime' \
--pax-option='exthdr.name=%d/PaxHeaders/%f,exthdr.mtime={$(DATE)}' \
-T list \
>$(@)
list: .FORCE
list: test
(cd test && find . -not -type d ) |LC_ALL=C sort >$(@)
LONG = L$$(for i in $$(seq 1 200); do printf 'o'; done)ng
test: .FORCE
test:
rm -rf test
mkdir -p test/bar
echo foo >test/Foo
echo bar >test/bar/Bar
ln -s bar/Bar test/buz
echo long >test/Very-$(LONG)-filename
ln test/Very-$(LONG)-filename \
test/short
.PRECIOUS: tar.%
tar.%: tar-%
cd $(<) && ./configure
$(MAKE) -C $(<)
install -m 0755 $(<)/src/tar $(@)
.PRECIOUS: tar-%
tar-%: tar-%.tar.gz
tar xzf $(<)
.PRECIOUS: tar-%.tar.gz
tar-%.tar.gz:
wget "https://ftp.gnu.org/gnu/tar/$(@)"
.FORCE:
clean:
rm -rf tar-* tar.* test_* test list
---8<------8<------8<------8<---