3035fc23de
Currently, when we generate archives, we rely on a few assumptions and
mechanisms to ensure reproducibility.

So far, we mostly accounted for the content of the files we archive
(i.e. content, filenames, and paths), and this is OK: git and svn should
provide reproducible content by design, and cargo and go vendoring are
also supposed to generate reproducible content.

However, tarballs do not only contain the content of the files; they
also carry some metadata about those files. Beyond filenames and paths,
which are already reproducible, there are the timestamp, and the user
and group names and IDs. Those are also accounted for and made
reproducible.

The final touch (so far!) is that files have access rights (aka mode),
and those too are stored in tarballs. So far, we accounted for those by
ensuring that Buildroot always runs under a known umask, thus generating
files with reproducible modes.

That falls short in one case that we did not envision, though: a shared
download directory, where extended attributes are set to provide a
permissive default ACL, to allow two or more users (with different uid
and gid) to all read and write to such a directory. This is trivially
achieved with something like:

    $ mkdir -p "${BR2_DL_DIR}"
    $ setfacl -m 'default:user::rwx' "${BR2_DL_DIR}"
    $ setfacl -m 'default:group::rwx' "${BR2_DL_DIR}"
    $ setfacl -m 'default:other::rwx' "${BR2_DL_DIR}"

This has the effect that:

  - files below BR2_DL_DIR are all set with user, group, and world read
    and write access,
  - files executable by the owner will also be group and world
    executable,
  - directories are user, group, and world readable, writable, and
    searchable.

This means that all the archives we generate from files in BR2_DL_DIR
will have modes that differ from those generated on other systems, where
only the traditional umask is used.
There are various ways to solve that issue:

  - detect the situation and abort: that's not nice, because users have
    a legitimate reason to want to share that directory,

  - find a solution for each affected download mechanism (git, svn, hg,
    cvs, bzr...) and for each affected vendoring mechanism (go and
    cargo [0]): this is not nice, because it means a lot of repetition,
    with the risk that they diverge over time (e.g. one is fixed for a
    newer issue, while the others are left out due to an oversight...),

  - find a single, common solution that works in all cases, whatever the
    download mechanism and/or vendoring: this is the best, because we
    can extend and fix it once and everything else benefits from it.

We obviously go for the third option.

The common solution is rather simple. When creating the tarball in
support/download/helpers, pass an option to tar (--mode='go=u,go-w') to
set the group and other permissions to those of the user, but without
write permission.

This implies that we must bump the version-suffix for the download
backends [1] and for the vendoring post-processes. It also implies that
the hash may change, under the following circumstances:

  - Symlinks normally have permissions 0777 (because symlink permissions
    are in fact meaningless). They will now have permissions 0755 in the
    tarball.

  - If the original tarball (for vendored go and cargo packages)
    contained files that are readable or executable by owner but not by
    group or other, they will now be readable, resp. executable, by
    group and other too. Note that this is not the case for the write
    permission, because that was already handled by our 0022 umask
    (which makes files not writable by group and other).

Because the hash may change, we need to update the BR_FMT_VERSION for
everything that creates tarballs. Go and cargo didn't have one up to
now, but the previous commit added the possibility to give one. The ones
for git and svn have to be updated.
Since it is now possible to have a suffix for both the VCS and the
post-processing, change the suffix to something more descriptive than
"-brX", i.e. -git3 for git, -go1 for golang, etc.

The hash updates and filename changes will be handled in a follow-up
commit.

[0] Note however that the vendoring is currently not done in a
sub-directory of BR2_DL_DIR, but the cargo and go caches are located
there. Files that get copied from there to the vendoring area would be
tainted as well, and thus we want to address that situation too.

[1] We currently do not have a CVS version suffix, because we do not
guarantee the reproducibility of CVS archives (we can't); for hg, we are
currently using hg's own archive tool, and presumably that does not have
the mode issue because it is not using the checked-out files. Still,
doing the mode fix in a single location will help extend those two
backends in the future (if that ever happens...).

Reported-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: Yann E. MORIN <yann.morin.1998@free.fr>
Signed-off-by: Arnout Vandecappelle <arnout@mind.be>
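The effect of the --mode expression described in the commit message can be checked in isolation. The following is a minimal sketch, assuming GNU tar; the scratch directory and the file names "private" and "script" are made up for the illustration and are not part of Buildroot.

```shell
# Scratch area with two files whose modes mimic the problem cases:
# one readable by the owner only, one opened up to everybody (as a
# permissive default ACL would produce).
demo="$(mktemp -d)"
mkdir "${demo}/src"
printf 'data\n' >"${demo}/src/private"
chmod 0600 "${demo}/src/private"
printf '#!/bin/sh\n' >"${demo}/src/script"
chmod 0777 "${demo}/src/script"

# Same --mode expression as the one used by mk_tar_gz: copy the owner
# bits to group and other, then drop write for group and other.
tar cf "${demo}/out.tar" -C "${demo}/src" --mode='go=u,go-w' private script

# Keep only the mode string and the member name from the listing.
modes="$(tar tvf "${demo}/out.tar" | awk '{ print $1, $NF }')"
printf '%s\n' "${modes}"

rm -rf "${demo}"
```

With GNU tar, the listing should show "private" as -rw-r--r-- and "script" as -rwxr-xr-x, i.e. the modes stored in the archive no longer depend on how permissive the on-disk modes were.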
99 lines
2.9 KiB
Bash
Executable File
# Generate a reproducible archive from the content of a directory
#
# $1    : input directory
# $2    : leading component in archive
# $3    : ISO8601 date: YYYY-MM-DDThh:mm:ssZZ
# $4    : output file
# $5... : globs of filenames to exclude from the archive, suitable for
#         find's -path option, and relative to the input directory $1
#
# Notes :
#   - the timestamp is internally rounded to the highest entire second
#     less than or equal to the timestamp (i.e. any sub-second fractional
#     part is ignored)
#   - must not be called with CWD as, or below, the input directory
#   - some temporary files are created in CWD, and removed at the end
#
# Example:
#   $ find /path/to/temp/dir
#   /path/to/temp/dir/
#   /path/to/temp/dir/some-file
#   /path/to/temp/dir/some-dir/
#   /path/to/temp/dir/some-dir/some-other-file
#
#   $ mk_tar_gz /path/to/temp/dir \
#               foo_bar-1.2.3 \
#               1970-01-01T00:00:00Z \
#               /path/to/foo.tar.gz \
#               '.git/*' '.svn/*'
#
#   $ tar tzf /path/to/foo.tar.gz
#   foo_bar-1.2.3/some-file
#   foo_bar-1.2.3/some-dir/some-other-file
#
mk_tar_gz() {
    local in_dir="${1}"
    local base_dir="${2}"
    local date="${3}"
    local out="${4}"
    shift 4
    local glob tmp pax_options
    local -a find_opts

    for glob; do
        find_opts+=( -or -path "./${glob#./}" )
    done

    # Drop sub-second precision to play nice with GNU tar's valid_timespec check
    date="$(date -d "${date}" -u +%Y-%m-%dT%H:%M:%S+00:00)"

    pax_options="delete=atime,delete=ctime,delete=mtime"
    pax_options+=",exthdr.name=%d/PaxHeaders/%f,exthdr.mtime={${date}}"

    tmp="$(mktemp --tmpdir="$(pwd)")"
    pushd "${in_dir}" >/dev/null

    # Establish list
    find . -not -type d -and -not \( -false "${find_opts[@]}" \) >"${tmp}.list"
    # Sort list for reproducibility
    LC_ALL=C sort <"${tmp}.list" >"${tmp}.sorted"

    # Create POSIX tarballs, since that's the format the most reproducible
    tar cf - --transform="s#^\./#${base_dir}/#S" \
        --numeric-owner --owner=0 --group=0 --mtime="${date}" \
        --format=posix --pax-option="${pax_options}" --mode='go=u,go-w' \
        -T "${tmp}.sorted" >"${tmp}.tar"

    # Compress the archive
    gzip -6 -n <"${tmp}.tar" >"${out}"

    rm -f "${tmp}"{.list,.sorted,.tar}

    popd >/dev/null
}

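# Unpack a tarball for post-processing (e.g. vendoring)
#
# $1 : destination directory (must not exist yet)
# $2 : tarball to extract; its leading path component is stripped
post_process_unpack() {
    local dest="${1}"
    local tarball="${2}"
    local one_file

    mkdir "${dest}"
    tar -C "${dest}" --strip-components=1 -xzf "${tarball}"
    # Pick the first regular file in C-locale order, and record its
    # mtime on ${dest}.timestamp, for post_process_repack to reuse
    one_file="$(find "${dest}" -type f -print0 |LC_ALL=C sort -z |sed 's/\x0.*//')"
    touch -r "${one_file}" "${dest}.timestamp"
}
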
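# Repack a directory previously unpacked with post_process_unpack
#
# $1 : directory that contains the unpacked tree and its .timestamp file
# $2 : name of the unpacked tree, also the leading component in archive
# $3 : output file
post_process_repack() {
    local in_dir="${1}"
    local base_dir="${2}"
    local out="${3}"
    local date

    # '@<seconds>' is an epoch timestamp, which date -d accepts
    date="@$(stat -c '%Y' "${in_dir}/${base_dir}.timestamp")"

    mk_tar_gz "${in_dir}/${base_dir}" "${base_dir}" "${date}" "${out}"
}
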
# Keep this line and the following as last lines in this file.
# vim: ft=bash
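The way mk_tar_gz turns its exclusion globs into a sorted file list can be tried on its own. The following is a minimal sketch, assuming Bash and GNU find; the scratch tree only mimics the example from the header comment, and the hard-coded globs stand in for the $5... arguments.

```shell
# Scratch tree: two files to archive, and one VCS file to exclude.
demo="$(mktemp -d)"
mkdir -p "${demo}/.git" "${demo}/some-dir"
touch "${demo}/some-file" "${demo}/some-dir/some-other-file" "${demo}/.git/config"

# Each glob becomes one '-or -path' clause, anchored at './' because
# find is run from inside the input directory.
find_opts=()
for glob in '.git/*' '.svn/*'; do
    find_opts+=( -or -path "./${glob#./}" )
done

# The leading '-false' keeps the parenthesised expression valid even
# when there is no glob at all; sorting in the C locale makes the
# order independent of the user's locale and of the filesystem.
list="$(cd "${demo}" && \
    find . -not -type d -and -not \( -false "${find_opts[@]}" \) \
    | LC_ALL=C sort)"
printf '%s\n' "${list}"

rm -rf "${demo}"
```

This should print ./some-dir/some-other-file followed by ./some-file, with nothing from .git: directories are never listed themselves, only the non-directory entries that tar will then store in that order.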