class MyEmbeddingFunction

Maturity: 49

Custom embedding function class that integrates OpenAI's embedding API with Chroma DB for generating vector embeddings from text documents.

File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py
Lines: 819-856
Complexity: moderate

Purpose

This class is an adapter between Chroma DB's EmbeddingFunction interface and OpenAI's embedding API. It lets Chroma DB use OpenAI embedding models (such as text-embedding-3-small) to convert text documents into vector representations. The class sets up API authentication, requests embeddings, and, if the request raises an exception, falls back to returning zero-filled vectors. It is meant to be passed as the custom embedding function when creating a Chroma collection.

Source Code

class MyEmbeddingFunction(EmbeddingFunction):
    """
    Custom embedding function for Chroma DB using OpenAI embeddings.
    """
    
    def __init__(self, model_name: str, embedding_model: str, api_key: str):
        self.model_name = model_name
        self.embedding_model = embedding_model
        self.api_key = api_key
        
        # Set up OpenAI client
        os.environ["OPENAI_API_KEY"] = api_key
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
    
    def __call__(self, input: Documents) -> Embeddings:
        """
        Generate embeddings for input documents.
        
        Args:
            input: List of document texts
            
        Returns:
            List of embedding vectors
        """
        try:
            response = self.client.embeddings.create(
                input=input,
                model=self.embedding_model
            )
            
            embeddings = [data.embedding for data in response.data]
            return embeddings
            
        except Exception as e:
            print(f"Error generating embeddings: {e}")
            # Return zero embeddings as fallback
            return [[0.0] * 1536 for _ in input]  # 1536 is dimension for text-embedding-3-small

Parameters

Name | Type | Default | Kind
bases | EmbeddingFunction | - |

Parameter Details

model_name: Name identifier for the model being used. This parameter is stored on the instance but never read by the class; it appears to be intended for tracking or logging.

embedding_model: The specific OpenAI embedding model to use (e.g., 'text-embedding-3-small', 'text-embedding-ada-002'). The model determines the embedding dimension (1536 by default for both of these models) and the embedding quality.

api_key: OpenAI API key for authentication. The key is passed to the OpenAI client and is also written to the OPENAI_API_KEY environment variable.

Return Value

Instantiation returns a MyEmbeddingFunction object that can be called like a function. When called (via __call__), it returns a list of embedding vectors (Embeddings type), one per input document, where each vector is a list of floats representing the semantic embedding of that document. If the API call raises any exception, the error is printed and the method returns one zero-filled vector of dimension 1536 per input document instead.
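Because the fallback has the same type as a real result, callers may want to check for it before storing vectors. The following is a minimal sketch, assuming the fallback is always all zeros as in the source above; the helper names is_zero_vector and fallback_indices are hypothetical and not part of the class.

```python
from typing import List, Sequence


def is_zero_vector(vector: Sequence[float]) -> bool:
    """Return True for a non-empty vector whose components are all 0.0."""
    return len(vector) > 0 and all(component == 0.0 for component in vector)


def fallback_indices(embeddings: Sequence[Sequence[float]]) -> List[int]:
    """Return the positions of embeddings that look like the zero fallback."""
    return [i for i, vector in enumerate(embeddings) if is_zero_vector(vector)]


# Example: the first vector mimics the fallback, the second is a made-up
# non-zero vector standing in for a real embedding.
sample = [[0.0] * 1536, [0.01, -0.02, 0.03]]
suspect = fallback_indices(sample)
```

Since the except branch replaces the whole batch, a single call yields either all real embeddings or all zero vectors, so checking the first vector of each batch is usually enough.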

Class Interface

Methods

__init__(self, model_name: str, embedding_model: str, api_key: str)

Purpose: Initializes the embedding function with OpenAI credentials and model configuration

Parameters:

  • model_name: Name identifier for the model (stored but not actively used)
  • embedding_model: OpenAI embedding model name (e.g., 'text-embedding-3-small')
  • api_key: OpenAI API key for authentication

Returns: None (constructor)

__call__(self, input: Documents) -> Embeddings

Purpose: Generates embeddings for input documents using the OpenAI API, making the class instance callable

Parameters:

  • input: List of document texts (strings) to generate embeddings for

Returns: List of embedding vectors, where each vector is a list of floats. On success, returns actual embeddings from OpenAI. On error, returns zero-filled vectors of dimension 1536.
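The only part of the API response that __call__ reads is response.data[i].embedding. As a rough illustration, the extraction step can be exercised offline with a stub; StubEmbeddings and stub_client below are hypothetical stand-ins, and the two-element vectors are made up rather than real embeddings.

```python
from types import SimpleNamespace
from typing import List


class StubEmbeddings:
    """Hypothetical stand-in for client.embeddings; needs no network access."""

    def create(self, input: List[str], model: str) -> SimpleNamespace:
        # One item per input text, each exposing an .embedding attribute,
        # which is the only field the extraction step below relies on.
        items = [
            SimpleNamespace(embedding=[float(len(text)), 0.0]) for text in input
        ]
        return SimpleNamespace(data=items)


stub_client = SimpleNamespace(embeddings=StubEmbeddings())

response = stub_client.embeddings.create(
    input=["Hello world", "Hi"],
    model="text-embedding-3-small",
)

# Same list comprehension as in __call__
vectors = [data.embedding for data in response.data]
```

Assigning a stub like this to the instance's client attribute is one way to test calling code without an API key.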

Attributes

Name | Type | Description | Scope
model_name | str | Stores the model name identifier passed during initialization | instance
embedding_model | str | Stores the OpenAI embedding model name to use for generating embeddings | instance
api_key | str | Stores the OpenAI API key for authentication | instance
client | OpenAI | OpenAI client instance used to make API calls for embedding generation | instance

Dependencies

  • os
  • openai
  • chromadb

Required Imports

import os
from chromadb import Documents, EmbeddingFunction, Embeddings

Conditional/Optional Imports

These imports are only needed under specific conditions:

from openai import OpenAI

Condition: imported inside the __init__ method, so the openai package is only loaded when the class is instantiated. It is still required for any actual use of the class.

Usage Example

# Assumes MyEmbeddingFunction is already defined or imported in this scope
import chromadb

# Initialize the embedding function
api_key = 'sk-your-openai-api-key'  # placeholder; replace with a real key
embedding_fn = MyEmbeddingFunction(
    model_name='my-model',
    embedding_model='text-embedding-3-small',
    api_key=api_key
)

# Use with Chroma DB
client = chromadb.Client()
collection = client.create_collection(
    name='my_collection',
    embedding_function=embedding_fn
)

# Or call directly to generate embeddings
documents = ['Hello world', 'This is a test document']
embeddings = embedding_fn(documents)
print(f'Generated {len(embeddings)} embeddings')
print(f'Embedding dimension: {len(embeddings[0])}')

Best Practices

  • Always provide a valid OpenAI API key with sufficient credits and permissions for embedding generation
  • The class modifies the global environment variable OPENAI_API_KEY, which may affect other parts of your application
  • On any exception the class returns zero-filled embeddings (1536 dimensions) instead of raising. Zero vectors carry no semantic information, so detect them and skip or re-embed the affected documents rather than storing them
  • The hardcoded fallback dimension of 1536 matches text-embedding-3-small and text-embedding-ada-002. Other models, such as text-embedding-3-large (3072 dimensions by default), need a different fallback size
  • The model_name parameter is stored but unused. Consider removing it or using it for logging or tracking
  • Consider implementing retry logic for transient API failures instead of immediately falling back to zero embeddings
  • The class is designed to be instantiated once and reused for multiple embedding calls to avoid repeated client initialization
  • Ensure input documents are properly formatted strings; the OpenAI API has token limits per request
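Two of the points above, retrying transient failures and sizing the fallback per model, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the source file: MODEL_DIMENSIONS, zero_fallback, and embed_with_retry are hypothetical names, the dimensions are the models' defaults, and flaky_embed is a stand-in for a real API call.

```python
import time
from typing import Callable, Dict, List, Sequence

# Default output dimensions of some OpenAI embedding models.
MODEL_DIMENSIONS: Dict[str, int] = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def zero_fallback(model: str, count: int) -> List[List[float]]:
    """Build zero vectors sized for the given model instead of a fixed 1536."""
    dimension = MODEL_DIMENSIONS[model]  # raises KeyError for unlisted models
    return [[0.0] * dimension for _ in range(count)]


def embed_with_retry(
    embed: Callable[[Sequence[str]], List[List[float]]],
    texts: Sequence[str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[float]]:
    """Call embed(texts), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return embed(texts)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise ValueError("max_attempts must be at least 1")


# Example with a stand-in embed function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_embed(texts: Sequence[str]) -> List[List[float]]:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated transient failure")
    return [[1.0] for _ in texts]


delays: List[float] = []  # record the waits instead of actually sleeping
result = embed_with_retry(flaky_embed, ["a", "b"], sleep=delays.append)
```

In the class itself, embed would wrap the self.client.embeddings.create call, and zero_fallback (or re-raising the error) would apply only after the retries are exhausted.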

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class MyEmbeddingFunction_v1 (60.1% similar)
    A class named MyEmbeddingFunction
    From: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py
  • class MyEmbeddingFunction_v3 (59.9% similar)
    A class named MyEmbeddingFunction
    From: /tf/active/vicechatdev/offline_docstore_multi.py
  • class MyEmbeddingFunction_v2 (59.8% similar)
    A class named MyEmbeddingFunction
    From: /tf/active/vicechatdev/offline_docstore_multi_vice.py
  • function main_v16 (49.8% similar)
    Entry point function that executes a comprehensive test suite for Chroma DB collections, including collection listing and creation tests, followed by troubleshooting suggestions.
    From: /tf/active/vicechatdev/test_chroma_collections.py
  • function test_collection_creation (46.9% similar)
    A diagnostic test function that verifies Chroma DB functionality by creating a test collection, adding a document, querying it, and cleaning up.
    From: /tf/active/vicechatdev/test_chroma_collections.py