Superpipe Studio¶
Superpipe Studio is a free and open-source observability and experimentation app for the Superpipe SDK. It can help you:
- Log and monitor results of your Superpipe pipelines in dev or production
- Manage datasets and build golden sets for ground truth labeling and evaluation
- Track experiments/grid searches and compare them on accuracy, latency and cost
Superpipe Studio is a Next.js app that can be run locally or self-hosted with Vercel.
Running Superpipe Studio¶
To run Superpipe Studio locally, follow the instructions in Running Studio locally, or to self-host with Vercel, follow the instructions in Deploying Studio with Vercel.
Usage with Superpipe¶
- Install the `superpipe-studio` Python library with `pip install superpipe-studio`. Also make sure you're on the latest version of superpipe (`pip install superpipe-py -U`).
- Set the following environment variables:
    - `SUPERPIPE_STUDIO_URL`: the URL where your Studio instance is hosted
    - `SUPERPIPE_API_KEY`: your Superpipe API key, if running Studio with authentication enabled (see the Authenticating Superpipe Studio requests section of the Studio readme)
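For example, these can be exported in your shell, or set from Python before running your pipeline (a minimal sketch; the values shown are placeholders for your own deployment):

```python
import os

# Placeholder values: point these at your own Studio deployment
os.environ["SUPERPIPE_STUDIO_URL"] = "http://localhost:3000"
# Only needed if your Studio instance has authentication enabled
os.environ["SUPERPIPE_API_KEY"] = "your-api-key"
```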
Logging¶
To log a pipeline to Studio, simply pass in `enable_logging=True` when calling `pipeline.run`.

It's helpful to set the pipeline's `name` field when initializing it to identify the pipeline's logs in Studio.
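For example (a minimal sketch, assuming `pipeline` is a Superpipe pipeline you have already defined with a `name`; the dataframe below is placeholder data):

```python
import pandas as pd

# Placeholder input data
df = pd.DataFrame({"description": ["mid-century walnut coffee table"]})

# `pipeline` is assumed to be a Superpipe pipeline initialized with a `name`,
# which makes its logs easy to identify in Studio
pipeline.run(df, enable_logging=True)
```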
Datasets¶
Creating a Studio dataset uploads the data to Studio where you can visualize it in a convenient interface. It also allows you to use the same dataset across experiments.
Datasets are created by calling the constructor of the `Dataset` class and passing in a pandas dataframe.
```python
from studio import Dataset
import pandas as pd

df = pd.DataFrame(...)
dataset = Dataset(data=df, name="furniture")
```
Ground truth columns
You can optionally specify a list of ground truth columns which will be shown separately in the Studio UI and can be edited. This lets you create datasets with accurate ground truth labels that can be used to evaluate your pipelines.
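For instance, a hedged sketch of creating a dataset with a ground truth column (the parameter name `ground_truth_columns` is an assumption; check the `Dataset` signature in your installed version):

```python
import pandas as pd
from studio import Dataset

df = pd.DataFrame({
    "description": ["mid-century walnut coffee table"],
    "category": ["tables"],  # human-verified label
})

# `ground_truth_columns` is assumed here for illustration
dataset = Dataset(data=df, name="furniture", ground_truth_columns=["category"])
```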
You can also download a dataset that already exists in Studio by passing in its `id`.
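For example (a sketch assuming the constructor accepts the id directly; the id string comes from the dataset's page in Studio):

```python
from studio import Dataset

existing = Dataset(id="your-dataset-id")
```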
To add data to an existing dataset, call the `add_data` function on a dataset and pass in a pandas dataframe.
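A quick sketch of appending rows (assuming the new dataframe shares the dataset's columns):

```python
import pandas as pd

more_rows = pd.DataFrame({"description": ["oak bookshelf"]})
dataset.add_data(more_rows)
```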
Experiments¶
Experiments in Studio help you log the results of running a pipeline or a grid search on a dataset, so you can evaluate their accuracy, cost and speed and compare pipelines objectively.
To run a pipeline experiment, define your pipeline as usual, then call `pipeline.run_experiment` and pass in a pandas dataframe, a Studio dataset object, or a dataset id string. If you pass in a dataframe, a Studio dataset object will be created and can be reused for future experiments. If you pass in a dataset id, the corresponding Studio dataset will be downloaded.
```python
import pandas as pd
from studio import Dataset

# running an experiment on a dataframe implicitly creates a new dataset
df = pd.DataFrame(...)
pipeline.run_experiment(data=df)

# running an experiment on an existing dataset
dataset = Dataset(data=df, name="furniture")
pipeline.run_experiment(data=dataset)
```
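You can also run an experiment against a dataset that already lives in Studio by passing its id string (a sketch; the id comes from the dataset's page in Studio):

```python
# running an experiment on a dataset that already exists in Studio
pipeline.run_experiment(data="your-dataset-id")
```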
To run a grid search experiment, define your grid search as usual and call `grid_search.run_experiment`. Everything else is the same as a pipeline experiment, but you will see one experiment created for each set of parameters in the grid search.
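For example (a minimal sketch, assuming `grid_search` is a Superpipe grid search you have already defined and `dataset` is a Studio dataset as above):

```python
# each parameter combination in the grid search becomes its own experiment
grid_search.run_experiment(data=dataset)
```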
Experiment groups
Studio intelligently groups pipeline experiments so you can compare them more easily. Two experiments will be added to the same group if:
- they have the same name, and
- they have the same steps (but the steps can have different params)
Grid searches are automatically grouped into the same experiment group.
Usage without Superpipe¶
Studio can be used even if you're not using Superpipe to build your LLM pipelines.
You can interact with Studio directly via its REST API; see the API docs here.