Comparing different approaches¶
There are many ways to build a labeling pipeline that will all accomplish the same result. The goal of Superpipe
is to empower rapid and robust experimentation so that you can understand the performance, accuracy, and cost tradeoffs between approaches.
In this example, we'll experiment with a few different approaches to a categorization pipeline. Superpipe
makes this experimentation quick, and at the end we'll have a solid understanding of how the different approaches perform.
Task¶
The task at hand is to categorize furniture items into a multi-level taxonomy based on their name and description.
For example:
Name: Blair Table by homestyles
Description: This Blair Table by homestyles is perfect for Sunday brunches or game night. The round pedestal table is available as shown, or as part of a five-piece set. Features solid hardwood construction in a black finish that can easily match a traditional or contemporary aesthetic. Measures: 30"H x 42" Diameter
Correct classification: Tables & Desks > Dining Tables
Approaches¶
There are two different approaches we want to try.
- LLMs + embeddings
- Hierarchical prompting
from dotenv import load_dotenv
load_dotenv()
import os
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
COHERE_API_KEY = os.getenv('COHERE_API_KEY')
# %pip install cohere
import pandas as pd
from superpipe import *
from pydantic import BaseModel, Field
import cohere
import os
import numpy as np
from typing import List
Data processing¶
We'll start by reading in our data and building our taxonomy. Building a taxonomy is a project in and of itself, and there are many taxonomies available online that you can use. In our case, we build our taxonomy from our ground truth dataset. Since the dataset is large, we can be reasonably confident that all values are represented. As you'll see, our approach does not use the ground truth data as training data, so it will be easy to expand the taxonomy later without needing additional data.
df = pd.read_csv('./furniture_clean.csv')
# Remove the 'Furniture > ' from each string in the 'category' column since they all start with Furniture.
df['category_new'] = df['category'].str.replace('Furniture > ', '')
For our embeddings approach we want each taxonomy entry to be a single string containing the full category path. We'll create the taxonomy from the ground truth data.
taxonomy = list(set(df['category_new']))
taxonomy[0:5]
['Outdoor Tables > Outdoor Coffee Tables', 'Chairs > Dining Chairs', 'Tables & Desks > Bar Carts', 'Chairs > Accent Chairs', 'Chairs > Desk Chairs']
However, for our hierarchical approach we need to understand the taxonomy's structure a little more, so we'll create a lookup table from first-level to second-level categories.
# Create a lookup table with first level taxonomy as keys and second level as values
lookup_table = df['category_new'].str.split(' > ', expand=True).groupby(0)[1].apply(list).apply(set)
lookup_table['Chairs']
{'Accent Chairs', 'Desk Chairs', 'Dining Chairs', 'Recliners'}
Building our pipeline using Superpipe¶
Approach 1: Embeddings¶
The first approach is similar to the one we took in the Product Categorization
example in the project repo. We omit the Google Search step because we already have item descriptions.
- Write a simple description of the product given name and description
- Vector embedding search for top N categories
- LLM: pick the best category
short_description_prompt = lambda row: f"""
You are given a product name and description for a piece of furniture.
Return a single sentence describing the product.
Product name: {row['name']}
Product description: {row['description']}
"""
class ShortDescription(BaseModel):
short_description: str = Field(description="A single sentence describing the product")
short_description_step = steps.LLMStructuredStep(
prompt=short_description_prompt,
model=models.gpt35,
out_schema=ShortDescription,
name="short_description"
)
We are using Cohere to embed both our descriptions and the taxonomy, but you can substitute any embeddings provider into the EmbeddingSearchStep
(a sketch of an OpenAI-based alternative appears after the Cohere setup below). Unlike LLMs, which are good at ignoring irrelevant information, embedding search works better in our experience with short, simple descriptions than with everything stuffed in. This is something you can and should experiment with.
# set your cohere api key as an env var or set it directly here
COHERE_API_KEY = os.environ.get('COHERE_API_KEY')
co = cohere.Client(COHERE_API_KEY)
def embed_fn(texts: List[str]):
embeddings = co.embed(
model="embed-english-v3.0",
texts=texts,
input_type='classification'
).embeddings
return np.array(embeddings).astype('float32')
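If you want to swap providers, here is a hedged sketch (not used in this example) of an equivalent embed_fn backed by OpenAI embeddings. It assumes the openai>=1.0 client and the text-embedding-3-small model, and returns the same float32 matrix shape as the Cohere version above.
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def openai_embed_fn(texts: List[str]):
    # Embed a batch of texts and return a float32 matrix, mirroring embed_fn above.
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    return np.array([d.embedding for d in response.data]).astype('float32')
Either function can be passed as embed_fn to the EmbeddingSearchStep below.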
embedding_search_prompt = lambda row: row["short_description"]
embedding_search_step = steps.EmbeddingSearchStep(
search_prompt= embedding_search_prompt,
embed_fn=embed_fn,
k=5,
candidates=taxonomy,
name="embedding_search"
)
We now take the embedding search candidates and ask the LLM to pick the best one. It's important that our embedding search is optimized for recall: if the correct category isn't among the candidates, the categorize step has no chance of succeeding. A quick way to sanity-check recall is sketched below.
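This is a hedged sketch, not part of the pipeline: it assumes the embedding search step has already been run on a labeled sample and that each row exposes its candidates as candidate1 through candidateN under the step's output, the same layout the predicted_category step below reads from.
def embedding_recall(sample_df, k=5):
    # Fraction of rows whose ground-truth category appears among the top-k candidates.
    hits = 0
    for _, row in sample_df.iterrows():
        candidates = [row["embedding_search"].get(f"candidate{i}") for i in range(1, k + 1)]
        hits += int(row["category_new"] in candidates)
    return hits / len(sample_df)
With recall in a good place, the categorize step below asks the LLM to pick the best candidate.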
def categorize_prompt(row):
categories = ""
i = 1
while f"candidate{i}" in row:
categories += f'{i}. {row["embedding_search"][f"candidate{i}"]}\n'
i += 1
return f"""
You are given a product description and {i-1} options for the product's category.
Pick the index of the most accurate category.
The index must be between 1 and {i-1}.
Product description: {row['short_description']}
Categories:
{categories}
"""
class CategoryIndex(BaseModel):
category_index: int = Field(description="The index of the most accurate category")
categorize_step = steps.LLMStructuredStep(
prompt=categorize_prompt,
model=models.gpt35,
out_schema=CategoryIndex,
name="categorize"
)
By returning just the index we can ensure that the actual string we use is in the taxonomy since LLMs sometimes hallucinate characters. Additionally, we don't need to waste response tokens on printing the entire string.
predicted_category_step = steps.CustomStep(
transform=lambda row: row["embedding_search"][f'candidate{row["category_index"]}'],
name="predicted_category"
)
We'd like to test our end-to-end pipeline before we go any further. We'll make a copy of the first five rows of the dataframe and run the pipeline on it to make sure everything works.
test_df = df.head(5).copy()
evaluate = lambda row: row['predicted_category'].lower() == row['category_new'].lower()
categorizer = pipeline.Pipeline([
short_description_step,
embedding_search_step,
categorize_step,
predicted_category_step
], evaluation_fn=evaluate)
categorizer.run(test_df)
Applying step short_description: 100%|██████████| 5/5 [00:08<00:00, 1.60s/it] Applying step embedding_search: 100%|██████████| 5/5 [00:00<00:00, 7.53it/s] Applying step categorize: 100%|██████████| 5/5 [00:02<00:00, 1.72it/s] Applying step predicted_category: 100%|██████████| 5/5 [00:00<00:00, 9271.23it/s]
name | description | category | brand.name | category_new | __short_description__ | short_description | category1 | category2 | category3 | category4 | category5 | __categorize__ | category_index | predicted_category | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EnGauge Deluxe Bedframe | Introducing the Engauge Deluxe Bedframe - the ... | Furniture > Beds & Headboards > Bedframes | NaN | Beds & Headboards > Bedframes | {'input_tokens': 313, 'output_tokens': 62, 'in... | Introducing the EnGauge Deluxe Bedframe - a st... | Beds & Headboards > Bedframes | Beds & Headboards > Beds | Beds & Headboards > Headboards | Mattresses & Box Springs > Mattresses | Mattresses & Box Springs > Box Springs & Found... | {'input_tokens': 208, 'output_tokens': 10, 'in... | 1 | Beds & Headboards > Bedframes |
1 | Sparrow & Wren Sullivan King Channel-Stitched ... | 85"L x 83"W x 56"H | Total weight: 150 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 169, 'output_tokens': 68, 'in... | Handcrafted Sparrow & Wren Sullivan King Chann... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Kids Beds & Headboards > Kid's Beds | Mattresses & Box Springs > Mattresses | {'input_tokens': 213, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
2 | Queen Bed With Frame | Dimensions:Head Board -49H x 63.75W x 1.5DFoot... | Furniture > Beds & Headboards > Beds | Hillsdale | Beds & Headboards > Beds | {'input_tokens': 124, 'output_tokens': 60, 'in... | The Queen Bed With Frame features a stylish de... | Beds & Headboards > Bedframes | Beds & Headboards > Headboards | Beds & Headboards > Beds | Kids Beds & Headboards > Kid's Beds | Sets > Bedroom Furniture Sets | {'input_tokens': 202, 'output_tokens': 10, 'in... | 1 | Beds & Headboards > Bedframes |
3 | Dylan Queen Bed | Add a touch of a modern farmhouse to your bedr... | Furniture > Beds & Headboards > Beds | NaN | Beds & Headboards > Beds | {'input_tokens': 140, 'output_tokens': 49, 'in... | Add a touch of modern farmhouse to your bedroo... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Sets > Bedroom Furniture Sets | Kids Beds & Headboards > Kid's Beds | {'input_tokens': 191, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
4 | Sparrow & Wren Mara Full Diamond-Tufted Bed | 78"L x 56"W x 51"H | Total weight: 130 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 168, 'output_tokens': 97, 'in... | The Sparrow & Wren Mara Full Diamond-Tufted Be... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Kids Beds & Headboards > Kid's Beds | Sets > Bedroom Furniture Sets | {'input_tokens': 236, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
Let's print our pipeline statistics and see how it's doing.
print(categorizer.statistics)
+---------------+------------------------------+
| score         | 0.8                          |
+---------------+------------------------------+
| input_tokens  | {'gpt-3.5-turbo-0125': 1964} |
+---------------+------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 386}  |
+---------------+------------------------------+
| input_cost    | $0.0009819999999999998       |
+---------------+------------------------------+
| output_cost   | $0.000579                    |
+---------------+------------------------------+
| num_success   | 5                            |
+---------------+------------------------------+
| num_failure   | 0                            |
+---------------+------------------------------+
| total_latency | 10.87322429305641            |
+---------------+------------------------------+
Our pipeline is doing well but that's only on 5 data points. Let's try it on a few more.
test_df100 = df.head(100).copy()
categorizer.run(test_df100)
Applying step short_description: 100%|██████████| 100/100 [02:10<00:00, 1.31s/it] Applying step embedding_search: 100%|██████████| 100/100 [00:14<00:00, 7.09it/s] Applying step categorize: 100%|██████████| 100/100 [00:50<00:00, 2.00it/s] Applying step predicted_category: 100%|██████████| 100/100 [00:00<00:00, 25426.19it/s]
name | description | category | brand.name | category_new | __short_description__ | short_description | category1 | category2 | category3 | category4 | category5 | __categorize__ | category_index | predicted_category | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EnGauge Deluxe Bedframe | Introducing the Engauge Deluxe Bedframe - the ... | Furniture > Beds & Headboards > Bedframes | NaN | Beds & Headboards > Bedframes | {'input_tokens': 313, 'output_tokens': 52, 'in... | Introducing the EnGauge Deluxe Bedframe, a stu... | Beds & Headboards > Bedframes | Beds & Headboards > Beds | Beds & Headboards > Headboards | Mattresses & Box Springs > Mattresses | Sets > Bedroom Furniture Sets | {'input_tokens': 193, 'output_tokens': 10, 'in... | 1 | Beds & Headboards > Bedframes |
1 | Sparrow & Wren Sullivan King Channel-Stitched ... | 85"L x 83"W x 56"H | Total weight: 150 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 169, 'output_tokens': 63, 'in... | The Sparrow & Wren Sullivan King Channel-Stitc... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Kids Beds & Headboards > Kid's Beds | Mattresses & Box Springs > Mattresses | {'input_tokens': 207, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
2 | Queen Bed With Frame | Dimensions:Head Board -49H x 63.75W x 1.5DFoot... | Furniture > Beds & Headboards > Beds | Hillsdale | Beds & Headboards > Beds | {'input_tokens': 124, 'output_tokens': 55, 'in... | Queen Bed With Frame featuring a Head Board me... | Beds & Headboards > Bedframes | Beds & Headboards > Headboards | Beds & Headboards > Beds | Kids Beds & Headboards > Kid's Beds | Sets > Bedroom Furniture Sets | {'input_tokens': 197, 'output_tokens': 10, 'in... | 3 | Beds & Headboards > Beds |
3 | Dylan Queen Bed | Add a touch of a modern farmhouse to your bedr... | Furniture > Beds & Headboards > Beds | NaN | Beds & Headboards > Beds | {'input_tokens': 140, 'output_tokens': 40, 'in... | Add a touch of modern farmhouse charm to your ... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Sets > Bedroom Furniture Sets | Kids Beds & Headboards > Kid's Beds | {'input_tokens': 182, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
4 | Sparrow & Wren Mara Full Diamond-Tufted Bed | 78"L x 56"W x 51"H | Total weight: 130 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 168, 'output_tokens': 54, 'in... | The Sparrow & Wren Mara Full Diamond-Tufted Be... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Mattresses & Box Springs > Mattresses | Sets > Bedroom Furniture Sets | {'input_tokens': 194, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | Modway Melanie Tufted Button Upholstered Fabri... | Twin | Clean lines, a straightforward profile,... | Furniture > Beds & Headboards > Beds | Modway | Beds & Headboards > Beds | {'input_tokens': 225, 'output_tokens': 77, 'in... | The Modway Melanie Tufted Button Upholstered F... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Mattresses & Box Springs > Mattresses | Beds & Headboards > Bedframes | Sets > Bedroom Furniture Sets | {'input_tokens': 218, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
96 | Concord Queen Panel Bed | Looking for a new bed that has it all? Check o... | Furniture > Beds & Headboards > Beds | Daniel's Amish | Beds & Headboards > Beds | {'input_tokens': 205, 'output_tokens': 55, 'in... | The Concord Queen Panel Bed is a contemporary ... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Kids Beds & Headboards > Kid's Beds | Sets > Bedroom Furniture Sets | {'input_tokens': 197, 'output_tokens': 11, 'in... | 2 | Beds & Headboards > Beds |
97 | Sparrow & Wren Myers King Bed | Dimensions: 85"L x 82"W x 56"H | Headboard hei... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 271, 'output_tokens': 64, 'in... | The Sparrow & Wren Myers King Bed is a luxurio... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Kids Beds & Headboards > Kid's Beds | Mattresses & Box Springs > Mattresses | {'input_tokens': 209, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
98 | Loden Beige 3 Pc Queen Upholstered Bed with 2 ... | A classic design and sophisticated silhouette ... | Furniture > Beds & Headboards > Beds | Rooms To Go | Beds & Headboards > Beds | {'input_tokens': 181, 'output_tokens': 56, 'in... | The Loden Beige 3 Pc Queen Upholstered Bed wit... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Storage > Dressers | Storage > Nightstands | Beds & Headboards > Bedframes | {'input_tokens': 192, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
99 | Hempstead Captain Bed in Graystone by A-America | Hempstead Captain Bed | Furniture > Beds & Headboards > Beds | A-America | Beds & Headboards > Beds | {'input_tokens': 97, 'output_tokens': 33, 'inp... | The Hempstead Captain Bed in Graystone by A-Am... | Beds & Headboards > Headboards | Beds & Headboards > Beds | Beds & Headboards > Bedframes | Sets > Bedroom Furniture Sets | Kids Beds & Headboards > Kid's Beds | {'input_tokens': 175, 'output_tokens': 10, 'in... | 2 | Beds & Headboards > Beds |
100 rows × 15 columns
print(categorizer.statistics)
+---------------+-------------------------------+
| score         | 0.91                          |
+---------------+-------------------------------+
| input_tokens  | {'gpt-3.5-turbo-0125': 39747} |
+---------------+-------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 6728}  |
+---------------+-------------------------------+
| input_cost    | $0.019873500000000002         |
+---------------+-------------------------------+
| output_cost   | $0.010092000000000002         |
+---------------+-------------------------------+
| num_success   | 100                           |
+---------------+-------------------------------+
| num_failure   | 0                             |
+---------------+-------------------------------+
| total_latency | 191.24415755318478            |
+---------------+-------------------------------+
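As a rough back-of-the-envelope check, here's a hedged sketch that extrapolates per-item cost from this run, using the same statistics attributes (input_cost, output_cost) accessed in the comparison further down.
stats = categorizer.statistics
cost_per_item = (stats.input_cost + stats.output_cost) / len(test_df100)
print(f"Cost per item: ${cost_per_item:.6f}")
print(f"Estimated cost per 1,000 items: ${cost_per_item * 1000:.2f}")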
At current gpt-3.5-turbo pricing, this batch of 100 requests cost about $0.03 and took a little over three minutes to run, for 91% accuracy. Let's see how hierarchical prompting does.
Approach 2: hierarchical prompting¶
Next we want to try forgoing embeddings altogether and simply stuffing the categories into the prompt. There are too many categories to do this in one go, but we can use the fact that our categories are hierarchical and take a step-by-step approach.
- LLM: given product name, description, and first level categories, pick the best one.
- LLM: given product name, description, and second level categories, pick the best one.
We may want to iterate a bit on this process. For example, we may want to use one model in step 1 and a different model in step 2. Superpipe
makes this type of hyperparameter tuning easy and robust.
In our first step we're just asking the model to pick the right top-level category. This is a relatively easy task if the categories are non-overlapping, but it can be very difficult if multiple categories could plausibly be correct. We'll only know by trying it and inspecting the misses.
first_level_categories = list(lookup_table.keys())
def first_level_category_prompt(row):
i = len(first_level_categories)
return f"""
You are given a product name, description and {i} options for the product's top level category.
Pick the index of the most accurate category.
The index must be between 1 and {i}.
Product description: {row['description']}
Product name: {row['name']}
Categories:
{first_level_categories}
"""
class FirstLevelCategoryIndex(BaseModel):
first_category_index: int = Field(description="The index of the most accurate first level category")
first_level_category_step = steps.LLMStructuredStep(
prompt=first_level_category_prompt,
model=models.gpt35,
out_schema=FirstLevelCategoryIndex,
name="first_categorize"
)
select_first_category_step = steps.CustomStep(
transform=lambda row: first_level_categories[row["first_category_index"] - 1],
name="predicted_first_category"
)
Next we'll give the second level of the taxonomy to the model to classify. Just as before, we're predicting the index to make sure our final output is a valid category.
def second_level_category_prompt(row):
second_level_categories = list(lookup_table[row['predicted_first_category']])
i = len(second_level_categories)
return f"""
You are given a product name, description, first level category
and {i} options for the product's second level category.
Pick the index of the most accurate category.
The index must be between 1 and {i}.
Product description: {row['description']}
Product name: {row['name']}
First level category: {row['predicted_first_category']}
Categories:
{second_level_categories}
"""
class SecondLevelCategoryIndex(BaseModel):
second_category_index: int = Field(description="The index of the most accurate second level category")
second_level_category_step = steps.LLMStructuredStep(
prompt=second_level_category_prompt,
model=models.gpt35,
out_schema=SecondLevelCategoryIndex,
name="second_categorize"
)
select_second_category_step = steps.CustomStep(
transform=lambda row: list(lookup_table[row['predicted_first_category']])[row["second_category_index"] - 1],
name="predicted_second_category"
)
Let's combine our results so we can properly compare to our ground truth column.
combine_taxonomy_step = steps.CustomStep(
transform=lambda row: f"{row['predicted_first_category']} > {row['predicted_second_category']}",
name='combine_taxonomy'
)
test_df2 = df.head(5).copy()
evaluate2 = lambda row: row['combine_taxonomy'].lower() == row['category_new'].lower()
categorizer_llm = pipeline.Pipeline([
first_level_category_step,
select_first_category_step,
second_level_category_step,
select_second_category_step,
combine_taxonomy_step
], evaluation_fn=evaluate2)
categorizer_llm.run(test_df2)
Applying step first_categorize: 100%|██████████| 5/5 [00:02<00:00, 2.11it/s] Applying step predicted_first_category: 100%|██████████| 5/5 [00:00<00:00, 6659.74it/s] Applying step second_categorize: 100%|██████████| 5/5 [00:02<00:00, 2.03it/s] Applying step predicted_second_category: 100%|██████████| 5/5 [00:00<00:00, 6288.31it/s] Applying step combine_taxonomy: 100%|██████████| 5/5 [00:00<00:00, 5734.62it/s]
name | description | category | brand.name | category_new | __first_categorize__ | first_category_index | predicted_first_category | __second_categorize__ | second_category_index | predicted_second_category | combine_taxonomy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EnGauge Deluxe Bedframe | Introducing the Engauge Deluxe Bedframe - the ... | Furniture > Beds & Headboards > Bedframes | NaN | Beds & Headboards > Bedframes | {'input_tokens': 419, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 372, 'output_tokens': 11, 'in... | 2 | Bedframes | Beds & Headboards > Bedframes |
1 | Sparrow & Wren Sullivan King Channel-Stitched ... | 85"L x 83"W x 56"H | Total weight: 150 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 275, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 228, 'output_tokens': 11, 'in... | 3 | Headboards | Beds & Headboards > Headboards |
2 | Queen Bed With Frame | Dimensions:Head Board -49H x 63.75W x 1.5DFoot... | Furniture > Beds & Headboards > Beds | Hillsdale | Beds & Headboards > Beds | {'input_tokens': 230, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 183, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
3 | Dylan Queen Bed | Add a touch of a modern farmhouse to your bedr... | Furniture > Beds & Headboards > Beds | NaN | Beds & Headboards > Beds | {'input_tokens': 246, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 199, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
4 | Sparrow & Wren Mara Full Diamond-Tufted Bed | 78"L x 56"W x 51"H | Total weight: 130 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 274, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 227, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
print(categorizer_llm.statistics)
+---------------+------------------------------+
| score         | 0.8                          |
+---------------+------------------------------+
| input_tokens  | {'gpt-3.5-turbo-0125': 5306} |
+---------------+------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 221}  |
+---------------+------------------------------+
| input_cost    | $0.002653                    |
+---------------+------------------------------+
| output_cost   | $0.0003315                   |
+---------------+------------------------------+
| num_success   | 5                            |
+---------------+------------------------------+
| num_failure   | 0                            |
+---------------+------------------------------+
| total_latency | 10.390514832979534           |
+---------------+------------------------------+
It works. Let's run it on some more data, like we did before.
test_df2_100 = df.head(100).copy()
categorizer_llm.run(test_df2_100)
Applying step first_categorize: 100%|██████████| 100/100 [00:58<00:00, 1.71it/s] Applying step predicted_first_category: 100%|██████████| 100/100 [00:00<00:00, 22609.58it/s] Applying step second_categorize: 100%|██████████| 100/100 [03:06<00:00, 1.86s/it] Applying step predicted_second_category: 100%|██████████| 100/100 [00:00<00:00, 27186.31it/s] Applying step combine_taxonomy: 100%|██████████| 100/100 [00:00<00:00, 32531.64it/s]
name | description | category | brand.name | category_new | __first_categorize__ | first_category_index | predicted_first_category | __second_categorize__ | second_category_index | predicted_second_category | combine_taxonomy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EnGauge Deluxe Bedframe | Introducing the Engauge Deluxe Bedframe - the ... | Furniture > Beds & Headboards > Bedframes | NaN | Beds & Headboards > Bedframes | {'input_tokens': 419, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 372, 'output_tokens': 11, 'in... | 2 | Bedframes | Beds & Headboards > Bedframes |
1 | Sparrow & Wren Sullivan King Channel-Stitched ... | 85"L x 83"W x 56"H | Total weight: 150 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 275, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 228, 'output_tokens': 11, 'in... | 3 | Headboards | Beds & Headboards > Headboards |
2 | Queen Bed With Frame | Dimensions:Head Board -49H x 63.75W x 1.5DFoot... | Furniture > Beds & Headboards > Beds | Hillsdale | Beds & Headboards > Beds | {'input_tokens': 230, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 183, 'output_tokens': 11, 'in... | 3 | Headboards | Beds & Headboards > Headboards |
3 | Dylan Queen Bed | Add a touch of a modern farmhouse to your bedr... | Furniture > Beds & Headboards > Beds | NaN | Beds & Headboards > Beds | {'input_tokens': 246, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 199, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
4 | Sparrow & Wren Mara Full Diamond-Tufted Bed | 78"L x 56"W x 51"H | Total weight: 130 lbs. | ... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 274, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 227, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | Modway Melanie Tufted Button Upholstered Fabri... | Twin | Clean lines, a straightforward profile,... | Furniture > Beds & Headboards > Beds | Modway | Beds & Headboards > Beds | {'input_tokens': 331, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 284, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
96 | Concord Queen Panel Bed | Looking for a new bed that has it all? Check o... | Furniture > Beds & Headboards > Beds | Daniel's Amish | Beds & Headboards > Beds | {'input_tokens': 311, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 264, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
97 | Sparrow & Wren Myers King Bed | Dimensions: 85"L x 82"W x 56"H | Headboard hei... | Furniture > Beds & Headboards > Beds | Sparrow & Wren | Beds & Headboards > Beds | {'input_tokens': 377, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 330, 'output_tokens': 11, 'in... | 3 | Headboards | Beds & Headboards > Headboards |
98 | Loden Beige 3 Pc Queen Upholstered Bed with 2 ... | A classic design and sophisticated silhouette ... | Furniture > Beds & Headboards > Beds | Rooms To Go | Beds & Headboards > Beds | {'input_tokens': 287, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 240, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
99 | Hempstead Captain Bed in Graystone by A-America | Hempstead Captain Bed | Furniture > Beds & Headboards > Beds | A-America | Beds & Headboards > Beds | {'input_tokens': 203, 'output_tokens': 11, 'in... | 1 | Beds & Headboards | {'input_tokens': 156, 'output_tokens': 11, 'in... | 1 | Beds | Beds & Headboards > Beds |
100 rows × 12 columns
Let's compare approach 1 to approach 2.
print(categorizer.statistics)
print(f"Total cost: ${categorizer.statistics.input_cost + categorizer.statistics.output_cost}")
print(categorizer_llm.statistics)
print(f"Total cost: ${categorizer_llm.statistics.input_cost + categorizer_llm.statistics.output_cost}")
+---------------+-------------------------------+
| score         | 0.91                          |
+---------------+-------------------------------+
| input_tokens  | {'gpt-3.5-turbo-0125': 39747} |
+---------------+-------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 6728}  |
+---------------+-------------------------------+
| input_cost    | $0.019873500000000002         |
+---------------+-------------------------------+
| output_cost   | $0.010092000000000002         |
+---------------+-------------------------------+
| num_success   | 100                           |
+---------------+-------------------------------+
| num_failure   | 0                             |
+---------------+-------------------------------+
| total_latency | 191.24415755318478            |
+---------------+-------------------------------+
Total cost: $0.029965500000000006
+---------------+-------------------------------+
| score         | 0.76                          |
+---------------+-------------------------------+
| input_tokens  | {'gpt-3.5-turbo-0125': 58359} |
+---------------+-------------------------------+
| output_tokens | {'gpt-3.5-turbo-0125': 2651}  |
+---------------+-------------------------------+
| input_cost    | $0.029179500000000004         |
+---------------+-------------------------------+
| output_cost   | $0.003976500000000004         |
+---------------+-------------------------------+
| num_success   | 100                           |
+---------------+-------------------------------+
| num_failure   | 0                             |
+---------------+-------------------------------+
| total_latency | 254.63680869879317            |
+---------------+-------------------------------+
Total cost: $0.033156000000000005
Our hierarchical approach cost slightly more at about $0.033 per 100 rows, and on this run it was also slower and less accurate (76% vs. 91%). However, we're not done just yet. The power of Superpipe
is that we can easily try many different permutations of our pipeline using a grid search. There might be a better pipeline out there.
Grid search¶
Our first pipeline has three steps we want to search over.
- Short description: vary the model
- Embedding search: vary the number of results
- Categorize: vary the model
It's not clear which permutation will work the best so we'll try all of them.
from superpipe import grid_search
params_grid = {
short_description_step.name: {
'model': [models.gpt35, models.gpt4],
},
embedding_search_step.name: {
'k': [3, 5, 7],
},
categorize_step.name: {
'model': [models.gpt35, models.gpt4],
},
}
small_df = df.head(30).copy()
search_embeddings = grid_search.GridSearch(categorizer, params_grid)
search_embeddings.run(small_df)
Iteration 1 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [00:46<00:00, 1.55s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:05<00:00, 5.70it/s] Applying step categorize: 100%|██████████| 30/30 [00:16<00:00, 1.87it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 27100.82it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 3, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.8666666666666667, 'input_cost': 0.005675499999999998, 'output_cost': 0.0032055, 'total_latency': 62.25483133213129, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 11351}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 2137}), 'num_success': 30, 'num_failure': 0, 'index': -6675265432874878197} Iteration 2 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [00:45<00:00, 1.53s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 6.68it/s] Applying step categorize: 100%|██████████| 30/30 [02:08<00:00, 4.29s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 28384.64it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 3, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_cost': 0.057895999999999996, 'output_cost': 0.0117495, 'total_latency': 174.37851832807064, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-preview': 5497}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 1833, 'gpt-4-turbo-preview': 300}), 'num_success': 30, 'num_failure': 0, 'index': 5054694111921705162} Iteration 3 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 5}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [00:46<00:00, 1.53s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 7.15it/s] Applying step categorize: 100%|██████████| 30/30 [00:15<00:00, 1.89it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 20226.51it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 5, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.9, 'input_cost': 0.005970999999999999, 'output_cost': 0.003195, 'total_latency': 61.76637146304711, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 11942}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 2130}), 'num_success': 30, 'num_failure': 0, 'index': -4607444568377834415} Iteration 4 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 5}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [00:48<00:00, 1.63s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 7.20it/s] Applying step categorize: 100%|██████████| 30/30 [00:47<00:00, 1.58s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 13654.81it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 5, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9666666666666667, 'input_cost': 0.062886, 'output_cost': 0.011592, 'total_latency': 96.02623478500755, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-preview': 5996}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 1728, 'gpt-4-turbo-preview': 300}), 'num_success': 30, 'num_failure': 0, 'index': -8503795277776717559} Iteration 5 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 7}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [00:43<00:00, 1.45s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 6.99it/s] Applying step categorize: 100%|██████████| 30/30 [00:18<00:00, 1.65it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 13476.40it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 7, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.9, 'input_cost': 0.006242499999999998, 'output_cost': 0.003072, 'total_latency': 61.58326062496053, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 12485}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 2048}), 'num_success': 30, 'num_failure': 0, 'index': 4015312520520374081} Iteration 6 of 12 Params: {'short_description': {'model': 'gpt-3.5-turbo-0125'}, 'embedding_search': {'k': 7}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [00:41<00:00, 1.39s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 7.16it/s] Applying step categorize: 100%|██████████| 30/30 [00:51<00:00, 1.70s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 19828.10it/s]
Result: {'short_description__model': 'gpt-3.5-turbo-0125', 'embedding_search__k': 7, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_cost': 0.06853599999999999, 'output_cost': 0.0115485, 'total_latency': 92.52336829315755, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 5852, 'gpt-4-turbo-preview': 6561}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 1699, 'gpt-4-turbo-preview': 300}), 'num_success': 30, 'num_failure': 0, 'index': -6042391003316854449} Iteration 7 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [02:13<00:00, 4.45s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 6.62it/s] Applying step categorize: 100%|██████████| 30/30 [00:21<00:00, 1.38it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 14540.00it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 3, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.6666666666666666, 'input_cost': 0.061239999999999996, 'output_cost': 0.05397149999999999, 'total_latency': 155.0610504578508, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 5852, 'gpt-3.5-turbo-0125': 5440}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 1784, 'gpt-3.5-turbo-0125': 301}), 'num_success': 30, 'num_failure': 0, 'index': -3802806156793363307} Iteration 8 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 3}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [02:10<00:00, 4.35s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 7.09it/s] Applying step categorize: 100%|██████████| 30/30 [00:39<00:00, 1.31s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 13742.80it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 3, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9, 'input_cost': 0.11293, 'output_cost': 0.062310000000000004, 'total_latency': 169.59190933196805, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 11293}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 2077}), 'num_success': 30, 'num_failure': 0, 'index': -3569261079577541644} Iteration 9 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 5}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [01:57<00:00, 3.91s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 7.20it/s] Applying step categorize: 100%|██████████| 30/30 [00:14<00:00, 2.04it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 18842.34it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 5, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.8666666666666667, 'input_cost': 0.0615815, 'output_cost': 0.055741500000000006, 'total_latency': 131.90737874808838, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 5852, 'gpt-3.5-turbo-0125': 6123}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 1843, 'gpt-3.5-turbo-0125': 301}), 'num_success': 30, 'num_failure': 0, 'index': 9106143806313371546} Iteration 10 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 5}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [02:04<00:00, 4.14s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:04<00:00, 6.91it/s] Applying step categorize: 100%|██████████| 30/30 [00:41<00:00, 1.38s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 17329.45it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 5, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_cost': 0.11912999999999999, 'output_cost': 0.06251999999999999, 'total_latency': 165.1544967039954, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 11913}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 2084}), 'num_success': 30, 'num_failure': 0, 'index': 2557837101084672621} Iteration 11 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 7}, 'categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step short_description: 100%|██████████| 30/30 [01:51<00:00, 3.71s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:08<00:00, 3.53it/s] Applying step categorize: 100%|██████████| 30/30 [00:15<00:00, 1.99it/s] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 19242.87it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 7, 'categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.8333333333333334, 'input_cost': 0.0618185, 'output_cost': 0.05157, 'total_latency': 126.11481383198407, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 5852, 'gpt-3.5-turbo-0125': 6597}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 1704, 'gpt-3.5-turbo-0125': 300}), 'num_success': 30, 'num_failure': 0, 'index': -3503115138122502664} Iteration 12 of 12 Params: {'short_description': {'model': 'gpt-4-turbo-preview'}, 'embedding_search': {'k': 7}, 'categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step short_description: 100%|██████████| 30/30 [01:53<00:00, 3.78s/it] Applying step embedding_search: 100%|██████████| 30/30 [00:03<00:00, 8.31it/s] Applying step categorize: 100%|██████████| 30/30 [00:35<00:00, 1.18s/it] Applying step predicted_category: 100%|██████████| 30/30 [00:00<00:00, 15283.51it/s]
Result: {'short_description__model': 'gpt-4-turbo-preview', 'embedding_search__k': 7, 'categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_cost': 0.12448999999999999, 'output_cost': 0.06029999999999999, 'total_latency': 148.47731783005293, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 12449}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 2010}), 'num_success': 30, 'num_failure': 0, 'index': -3155834270503923271}
 | short_description__model | embedding_search__k | categorize__model | score | input_cost | output_cost | total_latency | num_success | num_failure | index |
---|---|---|---|---|---|---|---|---|---|---|
0 | gpt-3.5-turbo-0125 | 3 | gpt-3.5-turbo-0125 | 0.866667 | 0.005675 | 0.003206 | 62.254831 | 30 | 0 | -6675265432874878197 |
1 | gpt-3.5-turbo-0125 | 3 | gpt-4-turbo-preview | 0.933333 | 0.057896 | 0.011749 | 174.378518 | 30 | 0 | 5054694111921705162 |
2 | gpt-3.5-turbo-0125 | 5 | gpt-3.5-turbo-0125 | 0.900000 | 0.005971 | 0.003195 | 61.766371 | 30 | 0 | -4607444568377834415 |
3 | gpt-3.5-turbo-0125 | 5 | gpt-4-turbo-preview | 0.966667 | 0.062886 | 0.011592 | 96.026235 | 30 | 0 | -8503795277776717559 |
4 | gpt-3.5-turbo-0125 | 7 | gpt-3.5-turbo-0125 | 0.900000 | 0.006242 | 0.003072 | 61.583261 | 30 | 0 | 4015312520520374081 |
5 | gpt-3.5-turbo-0125 | 7 | gpt-4-turbo-preview | 0.933333 | 0.068536 | 0.011548 | 92.523368 | 30 | 0 | -6042391003316854449 |
6 | gpt-4-turbo-preview | 3 | gpt-3.5-turbo-0125 | 0.666667 | 0.061240 | 0.053971 | 155.061050 | 30 | 0 | -3802806156793363307 |
7 | gpt-4-turbo-preview | 3 | gpt-4-turbo-preview | 0.900000 | 0.112930 | 0.062310 | 169.591909 | 30 | 0 | -3569261079577541644 |
8 | gpt-4-turbo-preview | 5 | gpt-3.5-turbo-0125 | 0.866667 | 0.061581 | 0.055742 | 131.907379 | 30 | 0 | 9106143806313371546 |
9 | gpt-4-turbo-preview | 5 | gpt-4-turbo-preview | 0.933333 | 0.119130 | 0.062520 | 165.154497 | 30 | 0 | 2557837101084672621 |
10 | gpt-4-turbo-preview | 7 | gpt-3.5-turbo-0125 | 0.833333 | 0.061818 | 0.051570 | 126.114814 | 30 | 0 | -3503115138122502664 |
11 | gpt-4-turbo-preview | 7 | gpt-4-turbo-preview | 0.933333 | 0.124490 | 0.060300 | 148.477318 | 30 | 0 | -3155834270503923271 |
The results of our grid search are conveniently put into a dataframe for us to review.
It seems that GPT-3.5 is more than sufficient for the description step, and that 5 embedding search results are enough. For the last step, we face a cost/latency vs. accuracy tradeoff between the two models.
This search was only run on 30 rows, so we'd want to run it more extensively before making production decisions, but we can now narrow down our search space with reasonable confidence.
Let's do the same for our hierarchical prompting approach. This time we'll just vary the model selection for each step.
params_grid = {
first_level_category_step.name: {
'model': [models.gpt35, models.gpt4],
},
second_level_category_step.name: {
'model': [models.gpt35, models.gpt4],
},
}
small_df2 = df.head(30).copy()
search_llm = grid_search.GridSearch(categorizer_llm, params_grid)
search_llm.run(small_df2)
Iteration 1 of 4 Params: {'first_categorize': {'model': 'gpt-3.5-turbo-0125'}, 'second_categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step first_categorize: 100%|██████████| 30/30 [00:20<00:00, 1.45it/s] Applying step predicted_first_category: 100%|██████████| 30/30 [00:00<00:00, 6613.54it/s] Applying step second_categorize: 100%|██████████| 30/30 [00:16<00:00, 1.80it/s] Applying step predicted_second_category: 100%|██████████| 30/30 [00:00<00:00, 6980.04it/s] Applying step combine_taxonomy: 100%|██████████| 30/30 [00:00<00:00, 20631.11it/s]
Result: {'first_categorize__model': 'gpt-3.5-turbo-0125', 'second_categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.7666666666666667, 'input_cost': 0.008323999999999998, 'output_cost': 0.0009915000000000006, 'total_latency': 37.14652425216627, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 16648}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 661}), 'num_success': 30, 'num_failure': 0, 'index': 8291905896722117770} Iteration 2 of 4 Params: {'first_categorize': {'model': 'gpt-3.5-turbo-0125'}, 'second_categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step first_categorize: 100%|██████████| 30/30 [00:15<00:00, 1.96it/s] Applying step predicted_first_category: 100%|██████████| 30/30 [00:00<00:00, 10063.11it/s] Applying step second_categorize: 100%|██████████| 30/30 [00:45<00:00, 1.50s/it] Applying step predicted_second_category: 100%|██████████| 30/30 [00:00<00:00, 6388.56it/s] Applying step combine_taxonomy: 100%|██████████| 30/30 [00:00<00:00, 25450.87it/s]
Result: {'first_categorize__model': 'gpt-3.5-turbo-0125', 'second_categorize__model': 'gpt-4-turbo-preview', 'score': 0.9333333333333333, 'input_cost': 0.080676, 'output_cost': 0.010305000000000009, 'total_latency': 60.223217750986805, 'input_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 9032, 'gpt-4-turbo-preview': 7616}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-3.5-turbo-0125': 330, 'gpt-4-turbo-preview': 327}), 'num_success': 30, 'num_failure': 0, 'index': 3072009371341041402} Iteration 3 of 4 Params: {'first_categorize': {'model': 'gpt-4-turbo-preview'}, 'second_categorize': {'model': 'gpt-3.5-turbo-0125'}}
Applying step first_categorize: 100%|██████████| 30/30 [00:59<00:00, 1.97s/it] Applying step predicted_first_category: 100%|██████████| 30/30 [00:00<00:00, 22137.42it/s] Applying step second_categorize: 100%|██████████| 30/30 [00:16<00:00, 1.85it/s] Applying step predicted_second_category: 100%|██████████| 30/30 [00:00<00:00, 8736.31it/s] Applying step combine_taxonomy: 100%|██████████| 30/30 [00:00<00:00, 22832.36it/s]
Result: {'first_categorize__model': 'gpt-4-turbo-preview', 'second_categorize__model': 'gpt-3.5-turbo-0125', 'score': 0.7333333333333333, 'input_cost': 0.094135, 'output_cost': 0.010396500000000008, 'total_latency': 75.15457291598432, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 9032, 'gpt-3.5-turbo-0125': 7630}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 330, 'gpt-3.5-turbo-0125': 331}), 'num_success': 30, 'num_failure': 0, 'index': -172566906004615760} Iteration 4 of 4 Params: {'first_categorize': {'model': 'gpt-4-turbo-preview'}, 'second_categorize': {'model': 'gpt-4-turbo-preview'}}
Applying step first_categorize: 100%|██████████| 30/30 [00:45<00:00, 1.52s/it] Applying step predicted_first_category: 100%|██████████| 30/30 [00:00<00:00, 12587.95it/s] Applying step second_categorize: 100%|██████████| 30/30 [00:40<00:00, 1.34s/it] Applying step predicted_second_category: 100%|██████████| 30/30 [00:00<00:00, 8019.19it/s] Applying step combine_taxonomy: 100%|██████████| 30/30 [00:00<00:00, 22762.14it/s]
Result: {'first_categorize__model': 'gpt-4-turbo-preview', 'second_categorize__model': 'gpt-4-turbo-preview', 'score': 0.9, 'input_cost': 0.16662, 'output_cost': 0.019800000000000016, 'total_latency': 85.77000208501704, 'input_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 16662}), 'output_tokens': defaultdict(<class 'int'>, {'gpt-4-turbo-preview': 660}), 'num_success': 30, 'num_failure': 0, 'index': 7445854736442369664}
 | first_categorize__model | second_categorize__model | score | input_cost | output_cost | total_latency | num_success | num_failure | index |
---|---|---|---|---|---|---|---|---|---|
0 | gpt-3.5-turbo-0125 | gpt-3.5-turbo-0125 | 0.766667 | 0.008324 | 0.000992 | 37.146524 | 30 | 0 | 8291905896722117770 |
1 | gpt-3.5-turbo-0125 | gpt-4-turbo-preview | 0.933333 | 0.080676 | 0.010305 | 60.223218 | 30 | 0 | 3072009371341041402 |
2 | gpt-4-turbo-preview | gpt-3.5-turbo-0125 | 0.733333 | 0.094135 | 0.010397 | 75.154573 | 30 | 0 | -172566906004615760 |
3 | gpt-4-turbo-preview | gpt-4-turbo-preview | 0.900000 | 0.166620 | 0.019800 | 85.770002 | 30 | 0 | 7445854736442369664 |
These results highlight the importance of experimentation and optimization. The GPT-3.5 + GPT-4 hierarchical pipeline scores 93% with relatively low latency (about 60 seconds for 30 rows), while the all-GPT-3.5 hierarchical pipeline is the fastest but least accurate (77%), well behind the comparable all-GPT-3.5 pipeline with 5 embedding candidates (90%).
If we only care about accuracy, it looks like an embeddings-based approach is our best bet. However, we may have other considerations. We're faced with a cost, accuracy, and latency tradeoff with no clear "best" option. Depending on which metric we care about most, we'll choose a different approach, as sketched below. This is a decision we're now empowered to make with our Superpipe pipeline results.
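For example, if we need at least 90% accuracy and otherwise want the cheapest, fastest option, a small helper like this hedged sketch makes the tradeoff explicit. It assumes the grid search rows above have been collected into a pandas DataFrame (results_df is hypothetical) with the score, input_cost, output_cost, and total_latency columns shown in the tables.
def pick_config(results_df, min_score=0.9):
    # Keep configurations that meet the accuracy bar, then choose the cheapest,
    # breaking ties on latency.
    eligible = results_df[results_df["score"] >= min_score].copy()
    eligible["total_cost"] = eligible["input_cost"] + eligible["output_cost"]
    return eligible.sort_values(["total_cost", "total_latency"]).head(1)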