Pipelines¶
Pipelines are the engines that make Superpipe run. A pipeline is a series of steps chained together that acts on a dataframe. A pipeline takes an optional evaluation function that can run arbitrary Python code. Evaluation functions need to return booleans.
Pipeline statistics¶
A pipeline object has associated pipeline statistics.
Stat | Description |
---|---|
score | Accuracy score of the pipeline as defined by the evaluation function. |
input_tokens | Total number of input tokens used by the pipeline split out by model. |
output_tokens | Total number of output tokens used by the pipeline split out by model. |
input_cost | Total input cost of the pipeline split out by model. |
output_cost | Total output cost of the pipeline split out by model. |
num_success | Number of successful rows. |
num_failure | Number of unsuccessful rows. |
total_latency | Total latency of the pipeline. |
Pipeline methods¶
update_param()¶
pipeline.update_params()
takes a parameters dictionary of steps and parameters.
For example, to update the categorize
pipeline to use GPT-4, we can call update_param
and pass in the step name as the key, with a sub dictionary with model as the key.
Example¶
You can find the full code for this example in the comparing pipelines example. This is just the pipeline definition.