Notebook-to-Docker Conversion
Jupyter notebooks and Docker are two convenient ways to run Pathway code. Jupyter notebooks are useful for exploration and interactive development, while Docker is used for deployment.
In notebooks, bash commands, code, and explanations are intertwined.
To successfully run a notebook with Docker, you need to extract the shell commands, such as pip install, and make Docker run them separately.
⚠️ In a Jupyter notebook, the exclamation mark (!) runs a shell command from inside a code cell. To run the notebook as a regular Python file, remove those commands from the code and add them to the Dockerfile.
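As a rough illustration of this separation, the sketch below walks the code cells of a notebook and sorts each line into shell commands (lines starting with !) or Python code. It assumes the standard .ipynb JSON layout (a "cells" list whose entries carry "cell_type" and "source"). The names split_notebook, load_notebook, and SAMPLE are hypothetical, and the check is purely line-based, so a line starting with ! inside a multi-line string would be misclassified.

```python
import json


def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is plain JSON, into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def split_notebook(notebook: dict) -> tuple[list[str], list[str]]:
    """Return (shell_commands, python_lines) found in the code cells."""
    shell_commands: list[str] = []
    python_lines: list[str] = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells hold no executable code
        source = cell.get("source", [])
        # "source" may be stored as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            stripped = line.lstrip()
            if stripped.startswith("!"):
                # Drop the leading "!" so the command can go in a Dockerfile.
                shell_commands.append(stripped[1:].strip())
            else:
                python_lines.append(line)
    return shell_commands, python_lines


# Hypothetical two-cell notebook, inlined so the sketch runs without a file.
SAMPLE = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Demo"]},
        {
            "cell_type": "code",
            "source": ["!pip install langchain\n", "import pathway as pw\n"],
        },
    ]
}

commands, code = split_notebook(SAMPLE)
```

For a real notebook, split_notebook(load_notebook("your-notebook.ipynb")) would give the two lists; the file name here is only a placeholder.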
This tutorial shows how to convert your Jupyter notebook to make it work with Docker by following these steps:
- Create a Dockerfile: This file defines the instructions for building your Docker image.
- Add the dependencies to the Dockerfile: Specify any libraries your code requires.
- Customize the Dockerfile (Optional): Add additional shell commands for specific needs.
- Refactor Pathway Code: Remove unnecessary code specific to the notebook environment such as the shell commands.
- Run with Docker: Execute your code using the created Docker image.
1. Dockerfile
The Docker deployment will be done using a Dockerfile.
This file contains the instructions used by Docker to build a container image.
Pathway comes with its own Docker image.
To use it, create a file called Dockerfile and use the Pathway image with FROM:
FROM pathwaycom/pathway:latest
COPY . .
CMD [ "python", "./your-script.py" ]
You can also use a regular Python image: learn more about it in the dedicated article.
2. Dependencies
In Jupyter notebooks, dependencies are installed from a code cell, using the exclamation mark (!) to run pip install commands.
For example, suppose you want to install langchain, langchain_community, and langchain_openai.
In a Jupyter notebook, you would create a code cell like this:
!pip install langchain
!pip install langchain_community
!pip install langchain_openai
This cell would not work in a regular Python file. You need to remove those lines and install the dependencies via the Dockerfile.
There are two main approaches to manage dependencies in a Dockerfile:
- Installing the dependencies manually with pip install commands.
- Using a requirements.txt file.
Solution 1: Using pip install commands
If you have a small number of dependencies, you can directly list the installation commands within the Dockerfile:
FROM pathwaycom/pathway:latest
RUN pip install langchain
RUN pip install langchain_community
RUN pip install langchain_openai
COPY . .
CMD [ "python", "./your-script.py" ]
Replace langchain, langchain_community, and langchain_openai with the actual libraries your code uses.
Solution 2: Using a requirements.txt file
For a larger number of dependencies, consider creating a requirements.txt file that lists them:
langchain
langchain_community
langchain_openai
Then, update your Dockerfile to install dependencies from this file:
FROM pathwaycom/pathway:latest
# Copy requirements file and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt
COPY . .
CMD [ "python", "./your-script.py" ]
Choose the method that best suits your project's complexity.
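If the notebook contains many pip install cells, the package list for requirements.txt can be collected with a short script. The sketch below is one possible way to do it, assuming the commands have already been copied out of the notebook without their leading exclamation mark. The function name requirements_from_commands is hypothetical. Options starting with a dash are skipped, so options that take a value (such as -r or --index-url) are not handled correctly.

```python
import shlex


def requirements_from_commands(commands: list[str]) -> list[str]:
    """Collect package names from 'pip install ...' commands, in order."""
    packages: list[str] = []
    for command in commands:
        tokens = shlex.split(command)
        if tokens[:2] != ["pip", "install"]:
            continue  # not a pip install command, leave it for the Dockerfile
        for token in tokens[2:]:
            if token.startswith("-"):
                continue  # skip flags such as -U or --quiet
            if token not in packages:
                packages.append(token)  # keep first occurrence only
    return packages


# Example input mirroring the cell shown earlier in this section.
example = [
    "pip install langchain",
    "pip install langchain_community",
    "pip install langchain_openai",
]
requirements_text = "\n".join(requirements_from_commands(example)) + "\n"
```

Writing requirements_text to a file named requirements.txt next to the Dockerfile gives the file used in Solution 2.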
3. Customize the Dockerfile
As with dependencies, your notebook may contain other shell commands that need to be executed.
For example, suppose that your Jupyter notebook downloads data using the following cell:
!wget -nc https://your-data-add.com/data
This will not work if your Jupyter notebook is executed as a regular Python file. You need to remove this line and add the command to the Dockerfile:
FROM pathwaycom/pathway:latest
# Copy requirements file and install dependencies
COPY requirements.txt ./
RUN pip install -r ./requirements.txt
RUN wget -nc https://your-data-add.com/data
COPY . .
CMD [ "python", "./your-script.py" ]
⚠️ Note that the command does not have the exclamation mark (!) anymore.
You should do this step for each shell command in your Jupyter notebook.
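When there are several shell commands, writing the RUN lines by hand gets repetitive. The following sketch shows one way to render a Dockerfile from a list of commands. It reuses the image name, COPY, and CMD lines from the examples above. The function name render_dockerfile and its script parameter are hypothetical, and each command is assumed to already have its exclamation mark removed.

```python
import json


def render_dockerfile(shell_commands: list[str], script: str = "./your-script.py") -> str:
    """Build Dockerfile text: base image, one RUN per command, then the app."""
    lines = ["FROM pathwaycom/pathway:latest"]
    # Each former "!command" notebook line becomes a RUN instruction.
    lines += [f"RUN {command}" for command in shell_commands]
    lines.append("COPY . .")
    # json.dumps yields the double-quoted array that the exec form of CMD needs.
    lines.append("CMD " + json.dumps(["python", script]))
    return "\n".join(lines) + "\n"


dockerfile_text = render_dockerfile(
    [
        "pip install -r ./requirements.txt",
        "wget -nc https://your-data-add.com/data",
    ]
)
```

Note that this sketch does not add the COPY requirements.txt ./ line, so a command that reads requirements.txt still needs that line inserted before it, as in the Dockerfile above.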
4. Refactor Pathway Code
You need to convert your .ipynb file to a regular .py Python file.
You can do it directly from JupyterLab by File -> Save and Export Notebook as... -> Executable Script.
Remove all unnecessary commands
You need to remove any code specifically used for the interactive notebook environment (e.g., displaying visualizations). Don't forget to remove all the shell commands.
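The shell-command part of this cleanup can be partly automated. The sketch below drops lines that start with ! or %, as well as lines containing get_ipython(), which is how an exported script typically represents shell commands and magics (check your own export to confirm). The function name strip_notebook_lines is hypothetical. The filter is line-based, so review the result by hand: removing the only statement of an indented block leaves invalid Python, and visualization code still has to be removed manually.

```python
def strip_notebook_lines(script: str) -> str:
    """Remove lines that only make sense inside a notebook kernel."""
    kept: list[str] = []
    for line in script.splitlines():
        stripped = line.lstrip()
        # "!" runs a shell command and "%" starts a magic in a notebook;
        # get_ipython() calls are their usual form in an exported script.
        if stripped.startswith(("!", "%")) or "get_ipython()" in stripped:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical exported script used to exercise the filter.
exported = (
    "get_ipython().system('pip install langchain')\n"
    "import pathway as pw\n"
    "%matplotlib inline\n"
    "pw.run()\n"
)
cleaned = strip_notebook_lines(exported)
```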
(Optional) Switch to streaming
To switch your example from static to streaming, you need to:
- Remove all pw.debug references.
- Check that all your connectors are in the streaming mode (you can set them to streaming with mode="streaming").
- Add a pw.run().
To learn more about how to switch from batch to streaming, read our dedicated tutorial.
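Two of the items in the list above can be checked mechanically before building the image. The sketch below is a simple text search, not a Pathway feature: it assumes Pathway is imported as pw, and the function name streaming_issues is hypothetical. It does not inspect connector modes, and it also matches occurrences inside comments or strings.

```python
import re


def streaming_issues(script: str) -> list[str]:
    """List leftover batch-mode patterns found in the script text."""
    issues: list[str] = []
    if re.search(r"\bpw\.debug\b", script):
        issues.append("pw.debug is still referenced")
    if not re.search(r"\bpw\.run\(", script):
        issues.append("pw.run() is missing")
    return issues


# Hypothetical script fragments used to exercise the check.
batch_script = "pw.debug.compute_and_print(table)\n"
streaming_script = "table = read_input()\npw.run()\n"

batch_report = streaming_issues(batch_script)
streaming_report = streaming_issues(streaming_script)
```

An empty list means neither pattern was found. The connector modes still need to be checked by hand.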
5. Run with Docker
Now that your Dockerfile and your Python files are ready, you can build and run the Docker image:
docker build -t my-pathway-app .
docker run -it --rm --name my-pathway-app my-pathway-app