Beam allows you to create persistent volumes that can be used across tasks. You might use persistent volumes for things like storing model artifacts created during training, or caching model weights to avoid having to download them each time your function is invoked.

Mounting a Persistent Volume

To set up a persistent volume, first mount it in your app definition file. The app_path is relative to the files in your workspace.

app.py
app.Mount.PersistentVolume(name="model_weights", app_path="./weights")

Reading and Writing Files

You can read from and write to the persistent volume just as you would with an ordinary file in Python:

run.py
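def get_model_weights():
    # Read the file from the mounted volume and return its contents
    with open("./weights/my_file.pkl", "r") as f:
        return f.read()

def update_model_weights():
    # The with-block closes the file automatically, so no explicit close() is needed
    with open("./weights/my_file.pkl", "w") as f:
        f.write("update I want to persist")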
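
Because the volume behaves like a normal directory, standard-library file handling should work unchanged. The sketch below assumes the volume is mounted at ./weights as in the app.py snippet above, and guards against the first run, when nothing has been written yet. The helper names, the notes.txt file name, and the volume_dir parameter are illustrative only and are not part of Beam.

```python
from pathlib import Path

# Assumed mount point, matching app_path="./weights" in app.py.
VOLUME_DIR = Path("./weights")


def write_note(text, filename="notes.txt", volume_dir=VOLUME_DIR):
    """Write text to a file on the volume, creating the directory if needed."""
    volume_dir = Path(volume_dir)
    volume_dir.mkdir(parents=True, exist_ok=True)
    path = volume_dir / filename
    path.write_text(text)
    return path


def read_note(filename="notes.txt", volume_dir=VOLUME_DIR, default=None):
    """Return the file's contents, or `default` if nothing has been written yet."""
    path = Path(volume_dir) / filename
    if not path.exists():
        return default
    return path.read_text()
```

Returning a default on a missing file keeps the first invocation from failing before any task has written to the volume.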

Example: Caching Model Weights

A good use case for Persistent Volumes is caching your ML model in order to avoid downloading it each time you make a prediction.

If you’re using a Hugging Face model, you can set the Hugging Face cache location in app.py so that downloaded weights are stored on the volume.

app.py
import os

app.Mount.PersistentVolume(app_path="./cached_models", name="cached_model")
os.environ["TRANSFORMERS_CACHE"] = "/workspace/cached_models"
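
For models that don't come from Hugging Face, the same caching idea can be sketched by hand: check the volume first and only download on a miss. This is a minimal sketch, assuming the volume is mounted at ./cached_models as above. The get_cached_file helper and the download callable are hypothetical stand-ins for whatever fetches your weights.

```python
from pathlib import Path

# Assumed mount point, matching app_path="./cached_models" in app.py.
CACHE_DIR = Path("./cached_models")


def get_cached_file(filename, download, cache_dir=CACHE_DIR):
    """Return the path to `filename` in the cache, downloading it on a miss.

    `download` is a hypothetical callable that returns the file's bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / filename
    if not path.exists():
        # Cache miss: fetch once and store on the volume for later runs.
        path.write_bytes(download())
    return path
```

On the first call the download runs and the bytes land on the volume. Later calls find the file and skip the download entirely.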

Example: Saving Pickle Files

You can manually save files to the Persistent Volume. First, mount the volume in app.py.

app.py
app.Mount.PersistentVolume(app_path="./my_pickles", name="pickle_files")

After mounting the volume, you can read from and write to it from anywhere in your app, and its contents will persist between runs.

run.py
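import os
import pickle

APP_PATH = "./my_pickles"

def load_pkl(pickle_id):
    # Load and return the object stored under the given file name
    with open(os.path.join(APP_PATH, pickle_id), "rb") as f:
        return pickle.load(f)

def save_pkl(my_pickle, pickle_id):
    # Serialize my_pickle to the given file name on the volume
    with open(os.path.join(APP_PATH, pickle_id), "wb") as f:
        pickle.dump(my_pickle, f)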
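
To illustrate what "persisted between runs" means in practice, here is a small sketch that keeps a pickled run history on the volume. It assumes the volume is mounted at ./my_pickles as above. The record_run function and the run_history.pkl file name are made up for this example.

```python
import os
import pickle
from datetime import datetime, timezone

# Assumed mount point, matching app_path="./my_pickles" in app.py.
APP_PATH = "./my_pickles"


def record_run(app_path=APP_PATH, filename="run_history.pkl"):
    """Append a timestamp to a pickled history and return the number of runs so far."""
    os.makedirs(app_path, exist_ok=True)
    path = os.path.join(app_path, filename)

    # Start from the history left by earlier runs, if there is one.
    history = []
    if os.path.exists(path):
        with open(path, "rb") as f:
            history = pickle.load(f)

    history.append(datetime.now(timezone.utc))
    with open(path, "wb") as f:
        pickle.dump(history, f)
    return len(history)
```

The first call returns 1. Each later call returns a larger count, provided the file written by the previous run is still on the volume.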