class AnnotationInfo

Maturity: 45

A dataclass that stores comprehensive information about a detected annotation in a PDF document, including its type, visual properties, location, and associated text content.

File: /tf/active/vicechatdev/e-ink-llm/annotation_detector.py
Lines: 18 - 26
Complexity: simple

Purpose

This dataclass serves as a structured container for metadata about annotations detected in PDF documents. It captures visual characteristics (color, area, bounds), classification information (annotation type, confidence score), location data (page number, bounding box), and optional text content. It is typically used as a return type or data transfer object in PDF annotation detection and analysis workflows.
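
As a rough illustration of the data transfer role described above, the sketch below round-trips an instance through JSON using `dataclasses.asdict`. The helper names `annotation_to_json` and `annotation_from_json` are hypothetical and do not appear in annotation_detector.py; the class definition is repeated only so the snippet runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    """Copy of the documented dataclass so this sketch is self-contained."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def annotation_to_json(info: AnnotationInfo) -> str:
    """Hypothetical helper: serialize one annotation to a JSON string."""
    # asdict() yields a plain dict; json.dumps writes the tuples as arrays.
    return json.dumps(asdict(info))


def annotation_from_json(payload: str) -> AnnotationInfo:
    """Hypothetical helper: rebuild an annotation from its JSON form."""
    data = json.loads(payload)
    # JSON has no tuple type, so convert the decoded lists back to tuples.
    data["color"] = tuple(data["color"])
    data["bounds"] = tuple(data["bounds"])
    return AnnotationInfo(**data)


original = AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
restored = annotation_from_json(annotation_to_json(original))
print(restored == original)  # True
```

The explicit tuple conversion matters because the generated `__eq__` would otherwise compare a list against a tuple and report the two objects as different.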

Source Code

@dataclass
class AnnotationInfo:
    """Information about a detected annotation"""
    annotation_type: str  # 'highlight', 'strikethrough', 'markup', 'underline', 'insertion'
    confidence: float     # Confidence score 0-1
    area: int            # Area in pixels
    color: Tuple[int, int, int]  # RGB color
    bounds: Tuple[int, int, int, int]  # x, y, width, height
    page_number: int     # Page where annotation was found
    text_content: Optional[str] = None  # Associated text if available

Parameters

Name Type Default Kind
bases - -

Parameter Details

annotation_type: String identifier for the type of annotation detected. Expected values are: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion'. This categorizes the visual annotation style.

confidence: Float value between 0 and 1 representing the confidence score of the annotation detection. Higher values indicate greater certainty that the detected region is indeed an annotation of the specified type.

area: Integer representing the area of the annotation in pixels. The source does not state how it is computed (for example, from the bounding box dimensions or from a count of marked pixels). Useful for filtering or prioritizing annotations by size.

color: Tuple of three integers (R, G, B) representing the RGB color values of the annotation. Each value ranges from 0 to 255. Used to identify and categorize annotations by their visual appearance.

bounds: Tuple of four integers (x, y, width, height) defining the bounding box of the annotation. x and y are the top-left corner coordinates, width and height define the rectangle dimensions in pixels.

page_number: Integer indicating which page of the PDF document contains this annotation. The source does not specify whether numbering is 0-based or 1-based; that depends on the code that populates the field.

text_content: Optional string containing any text associated with or extracted from the annotation region. May be None if no text is available or if text extraction was not performed.
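
The (x, y, width, height) layout of bounds described above is often converted to corner coordinates when cropping or drawing. The following is a minimal sketch of that conversion; `bounds_to_corners` and `bounding_box_area` are hypothetical helper names, not functions from the source module.

```python
from typing import Tuple

# (x, y, width, height), the layout documented for AnnotationInfo.bounds
Bounds = Tuple[int, int, int, int]


def bounds_to_corners(bounds: Bounds) -> Tuple[int, int, int, int]:
    """Hypothetical helper: return (left, top, right, bottom) for a bounds tuple."""
    x, y, width, height = bounds
    return (x, y, x + width, y + height)


def bounding_box_area(bounds: Bounds) -> int:
    """Hypothetical helper: area of the bounding rectangle in pixels."""
    _, _, width, height = bounds
    return width * height


corners = bounds_to_corners((100, 200, 300, 50))
print(corners)                                  # (100, 200, 400, 250)
print(bounding_box_area((100, 200, 300, 50)))   # 15000
```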

Return Value

Instantiation returns an AnnotationInfo object containing all the specified annotation metadata. As a dataclass, it automatically generates __init__, __repr__, __eq__, and other methods. Instances are mutable by default; passing frozen=True to the dataclass decorator would make them immutable.
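
To make the mutability behaviour concrete: a standard `@dataclass` allows attribute assignment, while `@dataclass(frozen=True)` raises `FrozenInstanceError` on assignment. The sketch below contrasts the two. `FrozenAnnotationInfo` is a hypothetical variant written for illustration and is not part of annotation_detector.py.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    """Copy of the documented dataclass (default settings, so mutable)."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


@dataclass(frozen=True)
class FrozenAnnotationInfo:
    """Hypothetical immutable variant with the same fields."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


args = ("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)

mutable = AnnotationInfo(*args)
mutable.confidence = 0.5  # allowed: default dataclasses are mutable

frozen = FrozenAnnotationInfo(*args)
try:
    frozen.confidence = 0.5
    assignment_blocked = False
except FrozenInstanceError:
    assignment_blocked = True  # frozen=True rejects attribute assignment

# replace() builds a modified copy and leaves the original untouched.
updated = replace(frozen, confidence=0.5)
print(assignment_blocked, frozen.confidence, updated.confidence)  # True 0.95 0.5
```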

Class Interface

Methods

__init__(annotation_type: str, confidence: float, area: int, color: Tuple[int, int, int], bounds: Tuple[int, int, int, int], page_number: int, text_content: Optional[str] = None) -> None

Purpose: Initialize an AnnotationInfo instance with all required annotation metadata. Auto-generated by @dataclass decorator.

Parameters:

  • annotation_type: Type of annotation ('highlight', 'strikethrough', 'markup', 'underline', 'insertion')
  • confidence: Detection confidence score (0-1)
  • area: Annotation area in pixels
  • color: RGB color tuple (R, G, B)
  • bounds: Bounding box as (x, y, width, height)
  • page_number: Page number where annotation appears
  • text_content: Optional associated text content

Returns: None - initializes the instance

__repr__() -> str

Purpose: Return a string representation of the AnnotationInfo instance. Auto-generated by @dataclass decorator.

Returns: String representation showing all field values

__eq__(other: object) -> bool

Purpose: Compare two AnnotationInfo instances for equality based on all fields. Auto-generated by @dataclass decorator.

Parameters:

  • other: Another object to compare with

Returns: True if all fields are equal, False otherwise
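
The generated `__repr__` and `__eq__` methods can be seen in a short sketch. The class definition is repeated so the snippet runs standalone, and the field values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    """Copy of the documented dataclass so this sketch is self-contained."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


a = AnnotationInfo("underline", 0.8, 1200, (0, 0, 255), (40, 60, 120, 10), 3)
b = AnnotationInfo("underline", 0.8, 1200, (0, 0, 255), (40, 60, 120, 10), 3)
c = AnnotationInfo("underline", 0.8, 1200, (0, 0, 255), (40, 60, 120, 10), 4)

# __repr__ lists every field in declaration order.
print(repr(a))

# __eq__ compares field by field, so distinct objects with equal fields match.
print(a == b)  # True
print(a is b)  # False
print(a == c)  # False (page_number differs)
```

Because equality is enabled without frozen=True, the decorator sets `__hash__` to None, so instances of this class cannot be used as dictionary keys or set members.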

Attributes

Name | Type | Description | Scope
annotation_type | str | Type of annotation detected: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion' | instance
confidence | float | Confidence score of the detection, ranging from 0 to 1 | instance
area | int | Area of the annotation in pixels | instance
color | Tuple[int, int, int] | RGB color values of the annotation as a tuple (R, G, B) | instance
bounds | Tuple[int, int, int, int] | Bounding box coordinates and dimensions as (x, y, width, height) | instance
page_number | int | Page number where the annotation was found | instance
text_content | Optional[str] | Associated text content extracted from the annotation region, or None if not available | instance

Dependencies

  • dataclasses
  • typing

Required Imports

from dataclasses import dataclass
from typing import Tuple, Optional

Usage Example

from dataclasses import dataclass
from typing import Tuple, Optional

@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None

# Create an annotation info object for a yellow highlight
annotation = AnnotationInfo(
    annotation_type='highlight',
    confidence=0.95,
    area=15000,
    color=(255, 255, 0),
    bounds=(100, 200, 300, 50),
    page_number=1,
    text_content='Important passage to remember'
)

# Access attributes
print(f"Type: {annotation.annotation_type}")
print(f"Confidence: {annotation.confidence}")
print(f"Location: Page {annotation.page_number}, bounds {annotation.bounds}")
print(f"Text: {annotation.text_content}")

# Create annotation without text content
strikethrough = AnnotationInfo(
    annotation_type='strikethrough',
    confidence=0.87,
    area=4000,  # 200 * 20, consistent with the bounds below
    color=(255, 0, 0),
    bounds=(150, 300, 200, 20),
    page_number=2
)
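
Detection results usually arrive as a list of these objects. The sketch below shows one way to filter by confidence and group by page. The function names `filter_by_confidence` and `group_by_page`, the 0.8 threshold, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotationInfo:
    """Copy of the documented dataclass so this sketch is self-contained."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def filter_by_confidence(
    annotations: List[AnnotationInfo], min_confidence: float
) -> List[AnnotationInfo]:
    """Hypothetical helper: keep annotations at or above a confidence threshold."""
    return [a for a in annotations if a.confidence >= min_confidence]


def group_by_page(annotations: List[AnnotationInfo]) -> Dict[int, List[AnnotationInfo]]:
    """Hypothetical helper: bucket annotations by their page_number."""
    pages: Dict[int, List[AnnotationInfo]] = defaultdict(list)
    for annotation in annotations:
        pages[annotation.page_number].append(annotation)
    return dict(pages)


annotations = [
    AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1),
    AnnotationInfo("strikethrough", 0.87, 4000, (255, 0, 0), (150, 300, 200, 20), 2),
    AnnotationInfo("underline", 0.60, 1200, (0, 0, 255), (40, 60, 120, 10), 2),
]

confident = filter_by_confidence(annotations, min_confidence=0.8)
by_page = group_by_page(confident)
print({page: [a.annotation_type for a in items] for page, items in by_page.items()})
# {1: ['highlight'], 2: ['strikethrough']}
```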

Best Practices

  • This is a data container class with no custom methods - use it to store and pass annotation information between functions
  • Ensure confidence values are always between 0 and 1 when creating instances
  • RGB color values should be integers between 0 and 255
  • Page numbers should be consistent with your PDF processing library's indexing (0-based or 1-based)
  • The bounds tuple follows (x, y, width, height) format - ensure consistency when creating instances
  • text_content is optional and defaults to None - only populate it when text extraction is performed
  • Consider validating input values in a factory function or wrapper if strict constraints are needed
  • A standard dataclass is mutable by default - if instances must not change, declare the class with @dataclass(frozen=True), or use dataclasses.replace() to create modified copies
  • Use type hints when working with collections of AnnotationInfo objects (e.g., List[AnnotationInfo])
  • If area is derived from bounds, keep it equal to width * height; an area measured another way (for example, a count of marked pixels) should not exceed width * height
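
The dataclass itself performs no validation, so the constraints listed above have to be enforced elsewhere. One possible shape for the factory function suggested in this list is sketched below. `make_annotation` and `ALLOWED_TYPES` are hypothetical names, and the set of allowed types is taken from the comment in the source code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Values listed in the source comment for annotation_type (assumed exhaustive).
ALLOWED_TYPES = {"highlight", "strikethrough", "markup", "underline", "insertion"}


@dataclass
class AnnotationInfo:
    """Copy of the documented dataclass so this sketch is self-contained."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def make_annotation(
    annotation_type: str,
    confidence: float,
    area: int,
    color: Tuple[int, int, int],
    bounds: Tuple[int, int, int, int],
    page_number: int,
    text_content: Optional[str] = None,
) -> AnnotationInfo:
    """Hypothetical factory: validate values, then build an AnnotationInfo."""
    if annotation_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown annotation_type: {annotation_type!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if len(color) != 3 or any(not 0 <= channel <= 255 for channel in color):
        raise ValueError(f"color must be three values in 0..255, got {color}")
    _, _, width, height = bounds
    if width < 0 or height < 0 or area < 0:
        raise ValueError("width, height, and area must be non-negative")
    return AnnotationInfo(
        annotation_type, confidence, area, color, bounds, page_number, text_content
    )


valid = make_annotation("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)

try:
    make_annotation("highlight", 1.5, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
    rejected = False
except ValueError:
    rejected = True  # confidence outside [0, 1] is refused

print(valid.confidence, rejected)  # 0.95 True
```

A factory keeps the dataclass definition unchanged; placing the same checks in a `__post_init__` method would be an alternative if modifying the class is acceptable.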

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class AnnotationResult 76.7% similar

    A dataclass that encapsulates the results of an annotation detection process on PDF documents, containing detected annotations, processing statistics, and a summary.

    From: /tf/active/vicechatdev/e-ink-llm/annotation_detector.py
  • class PageAnalysis 65.8% similar

    A dataclass that encapsulates the analysis results for a single PDF page, including its image representation, text content, dimensions, and optional analysis metadata.

    From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py
  • class SessionInfo 63.4% similar

    A dataclass that stores session information extracted from PDF documents, including conversation ID, exchange number, confidence level, and source of extraction.

    From: /tf/active/vicechatdev/e-ink-llm/session_detector.py
  • class AnnotationDetector 62.9% similar

    A class that detects various types of annotations in PDF documents including red pen markups, highlights, strikethrough lines, underlines, and insertion marks using computer vision and native PDF annotation extraction.

    From: /tf/active/vicechatdev/e-ink-llm/annotation_detector.py
  • class TableInfo 60.3% similar

    A dataclass that encapsulates comprehensive metadata about a database table, including schema information, columns, keys, and data quality metrics.

    From: /tf/active/vicechatdev/full_smartstat/dynamic_schema_discovery.py