AnnotationInfo - Code Extractor

class AnnotationInfo

Maturity: 45

A dataclass that stores comprehensive information about a detected annotation in a PDF document, including its type, visual properties, location, and associated text content.

File:
/tf/active/vicechatdev/e-ink-llm/annotation_detector.py

Lines:
18 - 26

Complexity:
simple

Purpose

This dataclass serves as a structured container for metadata about annotations detected in PDF documents. It captures visual characteristics (color, area, bounds), classification information (annotation type, confidence score), location data (page number, bounding box), and optional text content. It is typically used as a return type or data transfer object in PDF annotation detection and analysis workflows.

Source Code

@dataclass
class AnnotationInfo:
    """Information about a detected annotation"""
    annotation_type: str  # 'highlight', 'strikethrough', 'markup', 'underline', 'insertion'
    confidence: float     # Confidence score 0-1
    area: int            # Area in pixels
    color: Tuple[int, int, int]  # RGB color
    bounds: Tuple[int, int, int, int]  # x, y, width, height
    page_number: int     # Page where annotation was found
    text_content: Optional[str] = None  # Associated text if available

Parameters

Name	Type	Default	Kind
`bases`	-	-	-

Parameter Details

annotation_type: String identifier for the type of annotation detected. Expected values are: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion'. This categorizes the visual annotation style.

confidence: Float value between 0 and 1 representing the confidence score of the annotation detection. Higher values indicate greater certainty that the detected region is indeed an annotation of the specified type.

area: Integer representing the area of the annotation in pixels. The source does not state how it is computed (for example, from the bounding box or from the marked pixels themselves). Useful for filtering or prioritizing annotations by size.

color: Tuple of three integers (R, G, B) representing the RGB color values of the annotation. Each value ranges from 0 to 255. Used to identify and categorize annotations by their visual appearance.

bounds: Tuple of four integers (x, y, width, height) defining the bounding box of the annotation. x and y are the top-left corner coordinates, width and height define the rectangle dimensions in pixels.

page_number: Integer indicating which page of the PDF document contains this annotation. The source does not state whether numbering is 0-based or 1-based; follow the convention of the code that creates the instances.

text_content: Optional string containing any text associated with or extracted from the annotation region. May be None if no text is available or if text extraction was not performed.
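
The (x, y, width, height) layout of bounds is easy to confuse with a corner-based (x0, y0, x1, y1) layout. The following is a minimal sketch of converting between the two and of computing the bounding-box area. The helper names bounds_to_corners and bounding_box_area are hypothetical and are not part of annotation_detector.py.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]  # x, y, width, height
    page_number: int
    text_content: Optional[str] = None


def bounds_to_corners(bounds: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Hypothetical helper: convert (x, y, width, height) to (x0, y0, x1, y1)."""
    x, y, width, height = bounds
    return (x, y, x + width, y + height)


def bounding_box_area(bounds: Tuple[int, int, int, int]) -> int:
    """Hypothetical helper: area of the bounding rectangle in pixels."""
    _, _, width, height = bounds
    return width * height


ann = AnnotationInfo(
    annotation_type="highlight",
    confidence=0.95,
    area=15000,
    color=(255, 255, 0),
    bounds=(100, 200, 300, 50),
    page_number=1,
)

corners = bounds_to_corners(ann.bounds)   # top-left and bottom-right corners
box_area = bounding_box_area(ann.bounds)  # 300 * 50
```

Here the stored area happens to equal the bounding-box area; whether that holds for detector output depends on how the detector measures area.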

Return Value

Instantiation returns an AnnotationInfo object containing all the specified annotation metadata. As a dataclass, it automatically generates __init__, __repr__, and __eq__. Instances are mutable by default; they become immutable only if frozen=True is passed to the dataclass decorator.
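
To make the mutability behavior concrete, here is a short sketch assuming the class is declared with a plain @dataclass decorator, as in the usage example. FrozenAnnotationInfo is a hypothetical variant shown only for contrast; it does not exist in the source file.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


@dataclass(frozen=True)
class FrozenAnnotationInfo:
    """Hypothetical immutable variant, for comparison only."""
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


ann = AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)

# A plain dataclass allows in-place attribute assignment.
ann.confidence = 0.5

# dataclasses.replace builds a modified copy and leaves the original untouched.
updated = replace(ann, text_content="reviewed")

frozen = FrozenAnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
try:
    frozen.confidence = 0.1
    assignment_blocked = False
except FrozenInstanceError:
    # frozen=True turns attribute assignment into an error.
    assignment_blocked = True
```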

Class Interface

Methods

`__init__(annotation_type: str, confidence: float, area: int, color: Tuple[int, int, int], bounds: Tuple[int, int, int, int], page_number: int, text_content: Optional[str] = None) -> None`

Purpose: Initialize an AnnotationInfo instance with all required annotation metadata. Auto-generated by @dataclass decorator.

Parameters:

annotation_type: Type of annotation ('highlight', 'strikethrough', 'markup', 'underline', 'insertion')
confidence: Detection confidence score (0-1)
area: Annotation area in pixels
color: RGB color tuple (R, G, B)
bounds: Bounding box as (x, y, width, height)
page_number: Page number where annotation appears
text_content: Optional associated text content

Returns: None - initializes the instance

`__repr__() -> str`

Purpose: Return a string representation of the AnnotationInfo instance. Auto-generated by @dataclass decorator.

Returns: String representation showing all field values

`__eq__(other: object) -> bool`

Purpose: Compare two AnnotationInfo instances for equality based on all fields. Auto-generated by @dataclass decorator.

Parameters:

other: Another object to compare with

Returns: True if all fields are equal, False otherwise
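
The generated comparison and representation methods can be exercised as follows. This is an illustrative sketch with made-up field values, again assuming a plain @dataclass decorator.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


a = AnnotationInfo("underline", 0.8, 1200, (0, 0, 255), (10, 20, 120, 10), 0)
b = AnnotationInfo("underline", 0.8, 1200, (0, 0, 255), (10, 20, 120, 10), 0)

# __eq__ compares field by field, so two distinct objects with equal fields are equal.
same_values = a == b

# Changing any single field breaks equality.
other_page = replace(b, page_number=1)
differs = a != other_page

# __repr__ lists every field as name=value, which is convenient for logging.
text = repr(a)
```

Because the decorator generates __eq__ without frozen=True, instances of such a class are not hashable and cannot be used as set members or dictionary keys.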

Attributes

Name	Type	Description	Scope
`annotation_type`	str	Type of annotation detected: 'highlight', 'strikethrough', 'markup', 'underline', or 'insertion'	instance
`confidence`	float	Confidence score of the detection, ranging from 0 to 1	instance
`area`	int	Area of the annotation in pixels	instance
`color`	Tuple[int, int, int]	RGB color values of the annotation as a tuple (R, G, B)	instance
`bounds`	Tuple[int, int, int, int]	Bounding box coordinates and dimensions as (x, y, width, height)	instance
`page_number`	int	Page number where the annotation was found	instance
`text_content`	Optional[str]	Associated text content extracted from the annotation region, or None if not available	instance

Dependencies

dataclasses
typing

Required Imports

from dataclasses import dataclass
from typing import Tuple, Optional

Usage Example

from dataclasses import dataclass
from typing import Tuple, Optional

@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None

# Create an annotation info object for a yellow highlight
annotation = AnnotationInfo(
    annotation_type='highlight',
    confidence=0.95,
    area=15000,
    color=(255, 255, 0),
    bounds=(100, 200, 300, 50),
    page_number=1,
    text_content='Important passage to remember'
)

# Access attributes
print(f"Type: {annotation.annotation_type}")
print(f"Confidence: {annotation.confidence}")
print(f"Location: Page {annotation.page_number}, bounds {annotation.bounds}")
print(f"Text: {annotation.text_content}")

# Create annotation without text content
strikethrough = AnnotationInfo(
    annotation_type='strikethrough',
    confidence=0.87,
    area=8000,
    color=(255, 0, 0),
    bounds=(150, 300, 200, 20),
    page_number=2
)
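
Detection code usually handles many annotations at once. The sketch below shows one way to filter a list by confidence and group the result by page. The helpers filter_by_confidence and group_by_page and the 0.8 threshold are assumptions for illustration, not part of the source module.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def filter_by_confidence(items: List[AnnotationInfo], threshold: float) -> List[AnnotationInfo]:
    """Hypothetical helper: keep annotations at or above the threshold."""
    return [item for item in items if item.confidence >= threshold]


def group_by_page(items: List[AnnotationInfo]) -> Dict[int, List[AnnotationInfo]]:
    """Hypothetical helper: map each page number to its annotations."""
    pages: Dict[int, List[AnnotationInfo]] = defaultdict(list)
    for item in items:
        pages[item.page_number].append(item)
    return dict(pages)


annotations = [
    AnnotationInfo("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1),
    AnnotationInfo("strikethrough", 0.87, 8000, (255, 0, 0), (150, 300, 200, 20), 2),
    AnnotationInfo("underline", 0.40, 1200, (0, 0, 255), (10, 20, 120, 10), 2),
]

confident = filter_by_confidence(annotations, threshold=0.8)
by_page = group_by_page(confident)
```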

Best Practices

This is a data container class with no custom methods - use it to store and pass annotation information between functions
Ensure confidence values are always between 0 and 1 when creating instances
RGB color values should be integers between 0 and 255
Page numbers should be consistent with your PDF processing library's indexing (0-based or 1-based)
The bounds tuple follows (x, y, width, height) format - ensure consistency when creating instances
text_content is optional and defaults to None - only populate it when text extraction is performed
Consider validating input values in a factory function or wrapper if strict constraints are needed
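Instances are mutable unless the class is declared with @dataclass(frozen=True) - if you want to treat them as immutable, create modified copies (for example with dataclasses.replace) rather than assigning to fields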
Use type hints when working with collections of AnnotationInfo objects (e.g., List[AnnotationInfo])
If area is meant to describe the bounding box, keep it equal to width * height from bounds; a detector that counts the marked pixels themselves may report a smaller value
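
The validation advice above could be implemented with a small factory function. This is a minimal sketch under stated assumptions: make_annotation is a hypothetical name, and the specific checks (confidence in [0, 1], color channels in [0, 255], non-negative width and height) are taken from the guidelines in this section rather than from the source module.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AnnotationInfo:
    annotation_type: str
    confidence: float
    area: int
    color: Tuple[int, int, int]
    bounds: Tuple[int, int, int, int]
    page_number: int
    text_content: Optional[str] = None


def make_annotation(
    annotation_type: str,
    confidence: float,
    area: int,
    color: Tuple[int, int, int],
    bounds: Tuple[int, int, int, int],
    page_number: int,
    text_content: Optional[str] = None,
) -> AnnotationInfo:
    """Hypothetical factory: validate field ranges before building the instance."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if len(color) != 3 or any(not 0 <= channel <= 255 for channel in color):
        raise ValueError(f"color must be three values in [0, 255], got {color}")
    _, _, width, height = bounds
    if width < 0 or height < 0:
        raise ValueError(f"width and height must be non-negative, got {bounds}")
    return AnnotationInfo(
        annotation_type, confidence, area, color, bounds, page_number, text_content
    )


valid = make_annotation("highlight", 0.95, 15000, (255, 255, 0), (100, 200, 300, 50), 1)

try:
    make_annotation("highlight", 1.5, 15000, (255, 255, 0), (100, 200, 300, 50), 1)
    rejected = False
except ValueError:
    # A confidence above 1 fails the range check.
    rejected = True
```

Keeping validation in a separate factory leaves the dataclass itself a plain container; a __post_init__ method on the class would be an alternative if the class definition can be changed.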

Similar Components

AI-powered semantic similarity - components with related functionality:

class AnnotationResult 76.7% similar

A dataclass that encapsulates the results of an annotation detection process on PDF documents, containing detected annotations, processing statistics, and a summary.
From: /tf/active/vicechatdev/e-ink-llm/annotation_detector.py

class PageAnalysis 65.8% similar

A dataclass that encapsulates the analysis results for a single PDF page, including its image representation, text content, dimensions, and optional analysis metadata.
From: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py

class SessionInfo 63.4% similar

A dataclass that stores session information extracted from PDF documents, including conversation ID, exchange number, confidence level, and source of extraction.
From: /tf/active/vicechatdev/e-ink-llm/session_detector.py

class AnnotationDetector 62.9% similar

A class that detects various types of annotations in PDF documents including red pen markups, highlights, strikethrough lines, underlines, and insertion marks using computer vision and native PDF annotation extraction.
From: /tf/active/vicechatdev/e-ink-llm/annotation_detector.py

class TableInfo 60.3% similar

A dataclass that encapsulates comprehensive metadata about a database table, including schema information, columns, keys, and data quality metrics.
From: /tf/active/vicechatdev/full_smartstat/dynamic_schema_discovery.py
