Bug 1094323 - packages do not build reproducibly from pip install
Status: RESOLVED FIXED
Classification: openSUSE
Product: openSUSE Tumbleweed
Component: Development
Version: Current
Hardware: Other openSUSE Factory
Priority: P5 - None
Severity: Normal
Assigned To: Matej Cepl
Depends on:
Blocks: 1062303
Reported: 2018-05-23 09:12 UTC by Bernhard Wiedemann
Modified: 2020-03-27 11:40 UTC (History)

Found By: Development
Description Bernhard Wiedemann 2018-05-23 09:12:58 UTC
When working on reproducible builds for openSUSE, I found that
some packages use pip install, which creates .pyc files
that contain a random tmp path that is different for every build.

affects at least:
python-cluster
python-jupyter_bqplot
python-jupyter_imatlab_kernel
python-PsyLab
python-sphinxcontrib-github-alt

here is an example diff:

/usr/lib/python3.6/site-packages/cluster/method/__pycache__/kmeans.cpython-36.pyc
@@ -110,7 +110,7 @@
 000006d0  00 5a 0e 63 6f 6e 74 72  6f 6c 5f 6c 65 6e 67 74  |.Z.control_lengt|
 000006e0  68 da 04 69 74 65 6d a9  00 72 13 00 00 00 fa 3a  |h..item..r.....:|
 000006f0  2f 74 6d 70 2f 70 69 70  2d 69 6e 73 74 61 6c 6c  |/tmp/pip-install|
-00000700  2d 32 75 38 6a 6f 79 74  76 2f 63 6c 75 73 74 65  |-2u8joytv/cluste|
+00000700  2d 6e 69 6c 34 6b 39 77  73 2f 63 6c 75 73 74 65  |-nil4k9ws/cluste|
 00000710  72 2f 63 6c 75 73 74 65  72 2f 6d 65 74 68 6f 64  |r/cluster/method|
 00000720  2f 6b 6d 65 61 6e 73 2e  70 79 da 08 5f 5f 69 6e  |/kmeans.py..__in|
 00000730  69 74 5f 5f 2e 00 00 00  73 20 00 00 00 00 01 06  |it__....s ......|

IMHO, those .pyc files should not contain any tmp path
because it does not exist in the target system anyway.
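
Editorial illustration: a minimal sketch of why the path ends up in the bytecode, assuming nothing about pip's internals. A Python code object records the file name it was compiled from (co_filename), and that string is serialized with the bytecode. The one-line SOURCE module and the helper name marshalled_code are hypothetical; the two paths reuse the directory names from the diff above.

```python
import marshal

# Hypothetical one-line module; stands in for cluster/method/kmeans.py.
SOURCE = "def answer():\n    return 42\n"


def marshalled_code(filename):
    """Compile SOURCE as if it lived at `filename` and serialize the code object."""
    code = compile(SOURCE, filename, "exec")
    return code.co_filename, marshal.dumps(code)


# Two builds that differ only in the random temporary directory name.
name_a, blob_a = marshalled_code("/tmp/pip-install-2u8joytv/cluster/kmeans.py")
name_b, blob_b = marshalled_code("/tmp/pip-install-nil4k9ws/cluster/kmeans.py")

print(name_a)                                  # path recorded in the code object
print(blob_a != blob_b)                        # serialized bytecode differs
print(b"/tmp/pip-install-2u8joytv" in blob_a)  # the tmp path is embedded verbatim
```

The source text is identical in both calls, so the only difference between the two blobs comes from the recorded file name.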
Comment 1 Tomáš Chvátal 2018-06-20 11:13:20 UTC
Feel free to convert them to setuptools; all packages that are using pip are wrong anyway.

I've converted python-cluster.
Comment 2 Bernhard Wiedemann 2018-06-20 11:27:38 UTC
I know too little about python-setuptools to do it.
I was actually hoping that Todd could help there,
since he originally added these packages.

Current list of affected packages in Factory is:

python-cluster
python-jupyter_bqplot
python-jupyter_imatlab_kernel
python-jupyter_jupyterlab_discovery
python-jupyter_jupyterlab_github
python-jupyter_jupyterlab_latex
python-jupyter_jupyterlab
python-jupyter_kernel_test
python-jupyter_matlab_kernel
python-jupyter_nbdime
python-jupyter_rise
python-jupyter_Video_Widget
python-PsyLab

and some with pip-build instead of pip-install:
python-jupyter_octave_kernel
python-jupyter_widgetsnbextension
python-sphinxcontrib-github-alt
Comment 3 Swamp Workflow Management 2018-06-20 11:50:06 UTC
This is an autogenerated message for OBS integration:
This bug (1094323) was mentioned in
https://build.opensuse.org/request/show/618015 Factory / python-cluster
Comment 4 Todd R 2018-06-21 13:59:26 UTC
The jupyter packages at the very least pretty much have to use pip. They aren't simple Python packages; they also include configuration files, JavaScript code, and assets that are really hard to install reliably and can change randomly between releases in undocumented ways. Using pip guarantees that everything that is needed to run the code is installed and installed in the right place.

In more and more cases this includes files pulled in specifically for the wheels that aren't part of the GitHub releases (for example npm packages), which makes it effectively impossible to use the GitHub releases directly.

If there is a problem with how pip installs stuff, it really needs to be fixed. More and more packages are going for wheel-only releases, and some are dropping setup.py entirely. Whether we like it or not, we are going to need to be able to deal with wheels.
Comment 5 Bernhard Wiedemann 2018-06-21 15:19:08 UTC
so there are 2 problems:

1.
the random tmp path in .pyc files shown in comment 0
that can be solved with a
%python_expand find %{buildroot}%{$python_sitelib} -name '*.pyc' -delete

Since (according to my measurements)
.pyc files don't give a significant performance boost,
we could just opt to omit them, but if someone insists on having them,
we can probably call py_compile to create them again.


2.
/usr/lib/python3.6/site-packages/PsyLab-0.4.7.12.dist-info/RECORD
generated by pip install contains entries of .pyc files in nondeterministic filesystem order.
I guess there is an os.listdir, os.walk or glob.glob call somewhere that needs a sort() or sorted() added, but finding the right line can take a while.

I guess we don't even need the RECORD file, because RPM metadata already keeps track of installed files. =>
%python_expand rm %{buildroot}%{$python_sitelib}/*%{version}*/RECORD

but actually, it would be nicer if upstream pip would just do both right.
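
Editorial illustration of the sort() idea from point 2: a minimal sketch of the general technique, not the actual pip code or patch. os.walk returns entries in whatever order the filesystem yields them, so a file list built from it is only stable if it is sorted. The helper name stable_file_list and the demo file names are hypothetical.

```python
import os
import tempfile


def stable_file_list(root):
    """List files under `root` as sorted, '/'-separated relative paths."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort fixes the order os.walk descends in
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)  # final sort makes the listing order-independent


# Demo tree with hypothetical names, created in deliberately unsorted order.
with tempfile.TemporaryDirectory() as root:
    for rel in ("pkg/__pycache__/b.pyc", "pkg/b.py",
                "pkg/a.py", "pkg/__pycache__/a.pyc"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    listing = stable_file_list(root)

print(listing)
```

Sorting the final list is what guarantees the result; sorting dirnames in place only makes the traversal itself predictable.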
Comment 6 Todd R 2018-06-21 16:03:01 UTC
As of pip 10, RECORD should be deterministic [1], and for me with Tumbleweed it looks like it is in sorted order by file type.

I am communicating with upstream about the pyc file issue [2].

[1] https://github.com/pypa/pip/pull/4667
[2] https://github.com/pypa/pip/issues/4371
Comment 7 Bernhard Wiedemann 2018-06-21 20:15:17 UTC
Made a working pip patch for the RECORD ordering:
https://github.com/pypa/pip/pull/5525
Comment 8 Matej Cepl 2018-12-04 11:25:04 UTC
That pull request has been merged and it should be available in Tumbleweed's version.
Comment 9 Bernhard Wiedemann 2018-12-04 12:02:38 UTC
While RECORD files are indeed reproducible now,
.pyc files still are not; they show the same type of diff as in comment 0.

IMHO, there is no good reason for them to contain a random path under /tmp/
because that will not exist at runtime, so if it was used for anything,
it would be broken.
And if it is not used for anything, it does not need to be in there.

Is there already an upstream pip bug tracking that?
Comment 10 Bernhard Wiedemann 2018-12-06 12:31:25 UTC
On python-jupyter_imatlab_kernel I found a nice solution to this bug:
We need to change the pip install call to use --no-compile instead of --compile
and let the normal openSUSE python scripts handle the compilation instead.
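
Editorial illustration: a minimal sketch of compiling after installation so that the final path, not the staging path, is recorded. It uses the standard library's compileall.compile_dir with its ddir argument; the openSUSE scripts may well do this differently. TARGET_DIR, the helper name compile_for_target and the demo module are hypothetical.

```python
import compileall
import importlib.util
import os
import tempfile

# Hypothetical final location of the package on the installed system.
TARGET_DIR = "/usr/lib/python3/site-packages/demo"


def compile_for_target(staging_dir, target_dir):
    """Byte-compile staging_dir, recording target_dir as the source location."""
    return compileall.compile_dir(staging_dir, ddir=target_dir,
                                  force=True, quiet=1)


# The randomly named staging directory plays the role of the build root.
with tempfile.TemporaryDirectory(prefix="buildroot-") as staging:
    src = os.path.join(staging, "mod.py")
    with open(src, "w") as f:
        f.write("def answer():\n    return 42\n")
    ok = compile_for_target(staging, TARGET_DIR)
    # compileall writes to the standard __pycache__ location for this source.
    with open(importlib.util.cache_from_source(src), "rb") as f:
        pyc = f.read()
    staging_name = os.path.basename(staging).encode()

print(bool(ok))
print(b"/usr/lib/python3/site-packages/demo/mod.py" in pyc)
print(staging_name in pyc)
```

With ddir set, the random staging directory name does not appear in the resulting .pyc file. The file header can still carry the source mtime, so fully reproducible output may need further care.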
Comment 11 Matej Cepl 2019-01-30 20:40:12 UTC
(In reply to Bernhard Wiedemann from comment #10)
> On python-jupyter_imatlab_kernel I found a nice solution to this bug:
> We need to change the pip install call to use --no-compile instead of
> --compile
> and let the normal openSUSE python scripts handle the compilation instead.

You mean that the macro

%python3_install \
%{_python_use_flavor python3} \
%__python3 %{py_setup} %{?py_setup_args} install \\\
    -O1 --skip-build --force --root %{buildroot} --prefix %{_prefix}

in /etc/rpm/macros.python_all should be changed by adding --no-compile to the setup.py install command?

We can do that, I suppose.
Comment 12 Tomáš Chvátal 2019-01-31 08:21:05 UTC
(In reply to Matej Cepl from comment #11)
> (In reply to Bernhard Wiedemann from comment #10)
> > On python-jupyter_imatlab_kernel I found a nice solution to this bug:
> > We need to change the pip install call to use --no-compile instead of
> > --compile
> > and let the normal openSUSE python scripts handle the compilation instead.
> 
> You mean that the macro
> 
> %python3_install \
> %{_python_use_flavor python3} \
> %__python3 %{py_setup} %{?py_setup_args} install \\\
>     -O1 --skip-build --force --root %{buildroot} --prefix %{_prefix}
> 
> in /etc/rpm/macros.python_all should be changed by adding --no-compile to
> the setup.py install command?
> 
> We can do that, I suppose.

Nope, what Bernhard means is really the pip call (side note: we should probably provide these as macros).

From spec:

%python_expand pip%{$python_bin_suffix} install --root %{buildroot} --prefix %{_prefix} --no-deps %{SOURCE0}
Comment 13 Swamp Workflow Management 2019-04-26 20:20:06 UTC
This is an autogenerated message for OBS integration:
This bug (1094323) was mentioned in
https://build.opensuse.org/request/show/698361 Factory / jupyter-imatlab
Comment 14 Matej Cepl 2019-05-17 13:52:38 UTC
Is https://github.com/openSUSE/python-rpm-macros/commit/2ed22b611eba what you wanted?
Comment 15 Matej Cepl 2019-10-17 10:23:42 UTC
Bernhard, what’s missing from this bug, once the changes in python-rpm-macros have been made?
Comment 16 Bernhard Wiedemann 2019-10-31 12:45:15 UTC
I'm still not satisfied.

1: the new macros are only used in 1 package atm: python-tox
   However, what is worse:

2: I patched python-jupyter-require.spec with

 %install
-%python_expand pip%{$python_bin_suffix} install --root=%{buildroot} %{SOURCE0}
+cp -a %{SOURCE0} .
+%pyproject_install

And that still resulted in .pyc files that have a /tmp/pip-install-XXXXX random path embedded, because the pyproject_install macro misses the --no-compile flag.

python-tox is similarly affected.
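
Editorial illustration: a minimal sketch of how a build tree could be checked for this symptom. The byte pattern is an assumption based on the pip-install-XXXXX and pip-build names mentioned in this report; the helper name pycs_with_tmp_paths is hypothetical, and the demo files contain fake bytes rather than real bytecode.

```python
import os
import re
import tempfile

# Assumed naming of pip's temporary directories (see lead-in).
TMP_PATH = re.compile(rb"/tmp/pip-(?:install|build)-[A-Za-z0-9_]+")


def pycs_with_tmp_paths(root):
    """Return sorted relative paths of .pyc files under root that embed a pip tmp path."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if TMP_PATH.search(f.read()):
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


# Demo with fake file contents (not real bytecode).
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "bad.pyc"), "wb") as f:
        f.write(b"\x00/tmp/pip-install-2u8joytv/cluster/kmeans.py\x00")
    with open(os.path.join(root, "good.pyc"), "wb") as f:
        f.write(b"\x00/usr/lib/python3.6/site-packages/cluster/kmeans.py\x00")
    tainted = pycs_with_tmp_paths(root)

print(tainted)
```

Pointed at a package's build root, an empty result would suggest that the .pyc files carry no pip temporary path.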
Comment 17 Matej Cepl 2019-10-31 17:19:15 UTC
(In reply to Bernhard Wiedemann from comment #16)
> And that still resulted in .pyc files that have a /tmp/pip-install-XXXXX
> random path embedded, because the pyproject_install macro misses the
> --no-compile flag.

So, https://github.com/openSUSE/python-rpm-macros/pull/37 would be fixing this bug?
Comment 18 Bernhard Wiedemann 2019-11-29 14:04:54 UTC
Works for me. IMHO those .pyc files did not belong in binary RPMs anyway.
Comment 19 Matej Cepl 2019-11-30 12:56:18 UTC
At least related to this: do you have any idea what's wrong with https://github.com/sdispater/poetry/issues/1645#issuecomment-559891651 ?
Comment 20 Bernhard Wiedemann 2020-01-16 10:07:46 UTC
I submitted updates to all affected packages.
The only one left is python-ipyscales, which is already reproducible
because it compiles .pyc files itself.
Comment 21 Swamp Workflow Management 2020-03-26 09:22:11 UTC
This is an autogenerated message for OBS integration:
This bug (1094323) was mentioned in
https://build.opensuse.org/request/show/788450 15.2 / tensorflow2
https://build.opensuse.org/request/show/788451 Backports:SLE-15-SP2 / tensorflow2
Comment 22 Swamp Workflow Management 2020-03-26 11:20:16 UTC
This is an autogenerated message for OBS integration:
This bug (1094323) was mentioned in
https://build.opensuse.org/request/show/787674 Factory / tensorflow2
Comment 23 Swamp Workflow Management 2020-03-27 11:40:07 UTC
This is an autogenerated message for OBS integration:
This bug (1094323) was mentioned in
https://build.opensuse.org/request/show/788978 15.2 / tensorflow2