Issue
Say I'm using the PythonVirtualenvOperator and have PyTorch as a requirement. Running "pip freeze" gives me:
#requirements.txt
.
.
torch==1.8.1+cpu
and defining my task as
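#tasks.py
from airflow.operators.python import PythonVirtualenvOperator


def test_func():
    # placeholder callable; imports must live inside it because it runs in the venv
    import torch
    print(torch.__version__)


t1 = PythonVirtualenvOperator(
    task_id="test",
    python_version="3.7",
    python_callable=test_func,
    requirements=["torch==1.8.1+cpu"],
)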
fails with
ERROR: Could not find a version that satisfies the requirement torch==1.8.1+cpu
The PyTorch documentation installs it with
pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
i.e. pip fetches the wheel from PyTorch's own download page rather than from PyPI (if I understand it correctly), which might be why pip fails in the venv. So I would like the venv that Airflow creates for the PythonVirtualenvOperator to download torch from the link above instead of from PyPI.
Is that doable? And is there a difference between torch==1.8.1+cpu and just torch==1.8.1 when running on the CPU, i.e. does it make a difference if I just remove the +cpu?
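On the second question: +cpu is a PEP 440 local version label. For an exact == pin that has no local label, the candidate's local label is ignored, so torch==1.8.1 can be satisfied by 1.8.1, 1.8.1+cpu or a CUDA build such as 1.8.1+cu102, and which one pip installs depends on what the configured index and find-links pages offer. The sketch below is a simplified, stdlib-only version of that matching rule (no version normalization, no wildcards); the candidate list is an illustrative assumption, not a copy of the real download page.

```python
def matches_exact_pin(candidate: str, pin: str) -> bool:
    """Simplified PEP 440 '==' check, focusing on local version labels.

    Only splits on '+'; real pip also normalizes versions (e.g. 1.8.1 vs 1.8.1.0).
    """
    cand_public, _, cand_local = candidate.partition("+")
    pin_public, _, pin_local = pin.partition("+")
    if cand_public != pin_public:
        return False
    # A pin without a local label ignores the candidate's local label.
    if not pin_local:
        return True
    # A pin with a local label needs the same label (normalized to lowercase).
    return cand_local.lower() == pin_local.lower()


# Illustrative candidate builds (assumed for the example).
candidates = ["1.8.1", "1.8.1+cpu", "1.8.1+cu102", "1.8.1+cu111"]

for pin in ("1.8.1", "1.8.1+cpu"):
    print(pin, "->", [c for c in candidates if matches_exact_pin(c, pin)])
```

In other words, dropping +cpu widens the pin, so pip may pick a different and possibly much larger CUDA-enabled build; keeping the label restricts it to the CPU-only wheel.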
Solution
This seems to work (tested on Python 3.7): pass pip's -f option and the URL as extra entries in requirements:
requirements=["torch==1.8.1+cpu", "-f", "https://download.pytorch.org/whl/torch_stable.html"]
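The extra -f (--find-links) entry makes pip also look at archives linked from that page. PyPI does not accept uploads with local version labels such as +cpu, so without it pip only sees PyPI and cannot find torch==1.8.1+cpu. For a wheel, pip reads the project name and version from the filename itself. As an illustration, the sketch below splits the filename that appears in the log (where %2B is the URL-encoded +) into its wheel fields; it is a simplified parser that assumes no optional build tag, not pip's own implementation.

```python
from urllib.parse import unquote


def parse_wheel_filename(filename: str) -> dict:
    """Split '{name}-{version}-{python}-{abi}-{platform}.whl' into fields.

    Simplified: assumes the optional build tag is absent.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    stem = unquote(filename)[: -len(".whl")]  # decode '%2B' -> '+'
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("torch-1.8.1%2Bcpu-cp37-cp37m-linux_x86_64.whl")
print(info["version"], info["python"], info["platform"])
# 1.8.1+cpu cp37 linux_x86_64
```

The cp37/cp37m tags are also why this particular wheel only fits a CPython 3.7 virtualenv, such as the one created with python_version="3.7".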
Logs from the task:
[2021-05-24 18:37:20,762] {process_utils.py:135} INFO - Executing cmd: virtualenv /tmp/venv9kpx2ahm --system-site-packages --python=python3.7
[2021-05-24 18:37:20,781] {process_utils.py:139} INFO - Output:
[2021-05-24 18:37:21,365] {process_utils.py:143} INFO - created virtual environment CPython3.7.10.final.0-64 in 436ms
[2021-05-24 18:37:21,367] {process_utils.py:143} INFO - creator CPython3Posix(dest=/tmp/venv9kpx2ahm, clear=False, no_vcs_ignore=False, global=True)
[2021-05-24 18:37:21,369] {process_utils.py:143} INFO - seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
[2021-05-24 18:37:21,370] {process_utils.py:143} INFO - added seed packages: pip==21.1.1, setuptools==56.0.0, wheel==0.36.2
[2021-05-24 18:37:21,371] {process_utils.py:143} INFO - activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
[2021-05-24 18:37:21,386] {process_utils.py:135} INFO - Executing cmd: /tmp/venv9kpx2ahm/bin/pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
[2021-05-24 18:37:21,401] {process_utils.py:139} INFO - Output:
[2021-05-24 18:37:22,455] {process_utils.py:143} INFO - Looking in links: https://download.pytorch.org/whl/torch_stable.html
[2021-05-24 18:37:34,259] {process_utils.py:143} INFO - Collecting torch==1.8.1+cpu
[2021-05-24 18:37:34,820] {process_utils.py:143} INFO - Downloading https://download.pytorch.org/whl/cpu/torch-1.8.1%2Bcpu-cp37-cp37m-linux_x86_64.whl (169.1 MB)
[2021-05-24 18:41:46,125] {process_utils.py:143} INFO - Requirement already satisfied: numpy in /usr/local/lib/python3.7/site-packages (from torch==1.8.1+cpu) (1.20.3)
[2021-05-24 18:41:46,128] {process_utils.py:143} INFO - Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/site-packages (from torch==1.8.1+cpu) (3.7.4.3)
[2021-05-24 18:41:49,211] {process_utils.py:143} INFO - Installing collected packages: torch
[2021-05-24 18:41:57,106] {process_utils.py:143} INFO - Successfully installed torch-1.8.1+cpu
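The "Executing cmd" line above shows each entry of requirements being appended as its own argument to the virtualenv's pip install, which is why "-f" and the URL can be given as two separate list items. A minimal sketch of that behaviour, using a hypothetical helper build_pip_install_cmd that only mimics the logged command (it is not Airflow's code, and other Airflow versions may hand requirements to pip differently):

```python
from typing import List


def build_pip_install_cmd(venv_dir: str, requirements: List[str]) -> List[str]:
    """Hypothetical helper: the venv's pip followed by the requirements, unchanged."""
    # Each list item becomes one argv token, so pip options like "-f" pass straight through.
    return [f"{venv_dir}/bin/pip", "install", *requirements]


requirements = [
    "torch==1.8.1+cpu",
    "-f",
    "https://download.pytorch.org/whl/torch_stable.html",
]
cmd = build_pip_install_cmd("/tmp/venv9kpx2ahm", requirements)
print(" ".join(cmd))
# /tmp/venv9kpx2ahm/bin/pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
```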
However, I'm not sure that installing PyTorch on every task/DAG run is optimal. Installing the required dependencies on the workers themselves avoids this overhead (in my case, installing PyTorch took about 5 minutes).
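The log hints at why preinstalling helps: the venv is created with --system-site-packages, and pip reported "Requirement already satisfied" for numpy and typing-extensions from /usr/local/lib/python3.7/site-packages. If torch==1.8.1+cpu were installed on the worker the same way, pip should likewise skip the 169 MB download. A quick pre-flight check of a worker could look like the sketch below; it assumes Python 3.8+ (for importlib.metadata), and the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_pinned_version_installed(dist_name: str, pinned_version: str) -> bool:
    """Hypothetical check: exact string match, including any '+cpu' local label."""
    return installed_version(dist_name) == pinned_version


print(is_pinned_version_installed("torch", "1.8.1+cpu"))
```

This only reports what the current interpreter sees; pip's own "Requirement already satisfied" decision inside the venv is what actually determines whether the download is skipped.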
Answered By - Tomasz Urbaszek