ndmapping_groupby - Code Extractor

class ndmapping_groupby

Maturity: 47

A parameterized function class that performs groupby operations on NdMapping objects, using pandas for improved performance when it is available and falling back to a pure Python implementation otherwise.

File:
/tf/active/vicechatdev/patches/util.py

Lines:
1864 - 1918

Complexity:
moderate

Purpose

This class provides an optimized groupby operation for NdMapping data structures. It intelligently selects between a pandas-based implementation (for performance) and a pure Python fallback. The groupby operation reorganizes the NdMapping by specified dimensions, creating a new container with grouped data. This is useful for data analysis workflows where you need to aggregate or reorganize multi-dimensional mappings by specific dimension keys.
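To make the reorganization concrete, the sketch below groups a plain dictionary with tuple keys by one key position. It is an illustration only: `group_by_position` and the sample data are hypothetical stand-ins, and real NdMapping objects carry dimension metadata that plain dictionaries do not.

```python
from collections import OrderedDict


def group_by_position(mapping, positions):
    """Group a dict with tuple keys by the key positions in `positions`.

    Hypothetical helper for illustration; not part of util.py.
    """
    grouped = OrderedDict()
    for key, value in mapping.items():
        # The outer key holds the grouped dimensions...
        outer = tuple(key[i] for i in positions)
        # ...and the inner key holds whatever dimensions remain.
        inner = tuple(k for i, k in enumerate(key) if i not in positions)
        grouped.setdefault(outer, OrderedDict())[inner] = value
    return grouped


data = OrderedDict([((1, "a"), "x"), ((1, "b"), "y"), ((2, "a"), "z")])
by_first = group_by_position(data, positions=[0])
```

Here `by_first` has two outer keys, `(1,)` and `(2,)`, and each inner mapping is keyed only by the remaining position.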

Source Code

class ndmapping_groupby(param.ParameterizedFunction):
    """
    Apply a groupby operation to an NdMapping, using pandas to improve
    performance (if available).
    """

    sort = param.Boolean(default=False, doc='Whether to apply a sorted groupby')

    def __call__(self, ndmapping, dimensions, container_type,
                 group_type, sort=False, **kwargs):
        try:
            import pandas # noqa (optional import)
            groupby = self.groupby_pandas
        except:
            groupby = self.groupby_python
        return groupby(ndmapping, dimensions, container_type,
                       group_type, sort=sort, **kwargs)

    @param.parameterized.bothmethod
    def groupby_pandas(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        if 'kdims' in kwargs:
            idims = [ndmapping.get_dimension(d) for d in kwargs['kdims']]
        else:
            idims = [dim for dim in ndmapping.kdims if dim not in dimensions]

        all_dims = [d.name for d in ndmapping.kdims]
        inds = [ndmapping.get_dimension_index(dim) for dim in idims]
        getter = operator.itemgetter(*inds) if inds else lambda x: ()

        multi_index = pd.MultiIndex.from_tuples(ndmapping.keys(), names=all_dims)
        df = pd.DataFrame(list(map(wrap_tuple, ndmapping.values())), index=multi_index)

        # TODO: Look at sort here
        kwargs = dict(dict(get_param_values(ndmapping), kdims=idims), sort=sort, **kwargs)
        groups = ((wrap_tuple(k), group_type(OrderedDict(unpack_group(group, getter)), **kwargs))
                   for k, group in df.groupby(level=[d.name for d in dimensions], sort=sort))

        if sort:
            selects = list(get_unique_keys(ndmapping, dimensions))
            groups = sorted(groups, key=lambda x: selects.index(x[0]))

        return container_type(groups, kdims=dimensions, sort=sort)

    @param.parameterized.bothmethod
    def groupby_python(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        idims = [dim for dim in ndmapping.kdims if dim not in dimensions]
        dim_names = [dim.name for dim in dimensions]
        selects = get_unique_keys(ndmapping, dimensions)
        selects = group_select(list(selects))
        groups = [(k, group_type((v.reindex(idims) if hasattr(v, 'kdims')
                                  else [((), v)]), **kwargs))
                  for k, v in iterative_select(ndmapping, dim_names, selects)]
        return container_type(groups, kdims=dimensions)

Parameters

Name	Type	Default	Kind
`bases`	param.ParameterizedFunction	-	-

Parameter Details

sort: Boolean parameter (default: False) declared on the class to indicate whether a sorted groupby should be applied. In groupby_pandas, sort=True orders the groups by the sequence of unique keys that get_unique_keys returns for the original NdMapping.

__call__.ndmapping: The NdMapping object to perform the groupby operation on. This is the source data structure containing multi-dimensional key-value pairs.

__call__.dimensions: List of dimensions to group by. These dimensions will become the key dimensions of the resulting container.

__call__.container_type: The type of container to return after grouping. This determines the structure of the output.

__call__.group_type: The type to use for individual groups within the container. Each group will be instantiated as this type.

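__call__.sort: Boolean flag (default: False) that controls sorting for this specific call. In the code shown, this argument, not the class-level sort parameter, is the value forwarded to the selected implementation.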

__call__.**kwargs: Additional keyword arguments passed to the group_type constructor, such as 'kdims' to specify key dimensions for the resulting groups.
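The ordering rule behind sort=True in groupby_pandas can be sketched in plain Python. This is a minimal illustration that assumes the unique keys are produced in order of first appearance; `unique_in_order` is a hypothetical stand-in for get_unique_keys, whose exact behavior is not shown in this section.

```python
def unique_in_order(keys):
    """Return the distinct keys, keeping the order of first appearance."""
    seen = set()
    ordered = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            ordered.append(key)
    return ordered


# Group keys as they occur in the (hypothetical) original mapping.
original_keys = [("b",), ("a",), ("b",), ("c",)]
selects = unique_in_order(original_keys)

# Groups arrive in some other order, for example from a groupby call.
groups = [(("a",), "group-a"), (("c",), "group-c"), (("b",), "group-b")]

# Same pattern as the source: rank each group by its position in `selects`.
ordered_groups = sorted(groups, key=lambda item: selects.index(item[0]))
```

Note that `list.index` is a linear scan, so this ranking step costs time proportional to the number of unique keys for every group.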

Return Value

Returns a container_type instance containing the grouped data. The container has dimensions specified by the 'dimensions' parameter, and each group is an instance of group_type. The structure is a mapping where keys are tuples of dimension values and values are the grouped NdMapping objects reindexed by the remaining dimensions.

Class Interface

Methods

`__call__(self, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)`

Purpose: Main entry point that automatically selects between pandas and Python implementations based on pandas availability, then performs the groupby operation.

Parameters:

ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Whether to sort the groups
**kwargs: Additional arguments passed to group_type constructor

Returns: A container_type instance with grouped data organized by the specified dimensions
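The dispatch described above follows a common optional-dependency pattern. The sketch below shows that pattern in isolation; `pick_implementation`, `fast_path`, and `fallback_path` are hypothetical names, and unlike the source it catches `ImportError` specifically rather than using a bare `except`.

```python
import importlib


def pick_implementation(module_name, fast, fallback):
    """Return `fast` if `module_name` can be imported, else `fallback`."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # The optional dependency is missing: use the fallback.
        return fallback
    return fast


def fast_path(values):
    return ("fast", sorted(values))


def fallback_path(values):
    return ("fallback", sorted(values))


# `json` ships with Python, so the fast path is selected.
chosen = pick_implementation("json", fast_path, fallback_path)

# A module name that does not exist selects the fallback.
missing = pick_implementation("no_such_module_for_demo", fast_path, fallback_path)
```

Catching only `ImportError` keeps unrelated failures visible instead of silently switching implementations.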

`groupby_pandas(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)`

Purpose: Performs groupby operation using pandas DataFrame and MultiIndex for improved performance. Uses pandas groupby functionality on a DataFrame representation of the NdMapping.

Parameters:

self_or_cls: Instance or class (bothmethod decorator allows both)
ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Whether to sort the groups
**kwargs: Additional arguments including optional 'kdims' to specify key dimensions

Returns: A container_type instance with grouped data, using pandas for efficient grouping operations
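The core pandas steps (build a MultiIndex from the keys, group by the chosen levels, re-key each group by the remaining dimensions) can be sketched on a plain dictionary. This is a simplified illustration that requires pandas; `group_by_levels` is hypothetical, and the `wrap_tuple` shown here is a guess at the helper's behavior based on how the source uses it.

```python
import pandas as pd


def wrap_tuple(value):
    """Wrap a scalar in a 1-tuple; leave tuples unchanged (assumed behavior)."""
    return value if isinstance(value, tuple) else (value,)


def group_by_levels(mapping, all_dims, group_dims, sort=False):
    """Group a dict with tuple keys by the named index levels."""
    index = pd.MultiIndex.from_tuples(list(mapping.keys()), names=all_dims)
    df = pd.DataFrame({"value": list(mapping.values())}, index=index)

    # Positions of the dimensions that are NOT being grouped on.
    keep = [i for i, dim in enumerate(all_dims) if dim not in group_dims]

    grouped = {}
    for key, group in df.groupby(level=group_dims, sort=sort):
        inner = {}
        for full_key, value in zip(group.index, group["value"]):
            inner[tuple(full_key[i] for i in keep)] = value
        # Normalize the group key so it is always a tuple.
        grouped[wrap_tuple(key)] = inner
    return grouped


data = {(1, "a"): "x", (1, "b"): "y", (2, "a"): "z"}
pandas_groups = group_by_levels(data, ["num", "letter"], ["num"])
```

Normalizing the group key matters because whether pandas yields a scalar or a 1-tuple for a single grouping level has varied between versions.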

`groupby_python(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)`

Purpose: Pure Python fallback implementation of groupby that doesn't require pandas. Uses iterative selection and reindexing to group the data.

Parameters:

self_or_cls: Instance or class (bothmethod decorator allows both)
ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Accepted for signature compatibility with groupby_pandas; the code shown does not use it
**kwargs: Additional arguments passed to group_type constructor

Returns: A container_type instance with grouped data, using pure Python operations without pandas dependency

Attributes

Name	Type	Description	Scope
`sort`	param.Boolean	Class-level parameter documented as 'Whether to apply a sorted groupby'. The methods shown take their own sort argument (default False) and do not read this parameter.	class

Dependencies

param
pandas
operator
collections.OrderedDict

Required Imports

import param
import operator
from collections import OrderedDict

Conditional/Optional Imports

These imports are only needed under specific conditions:

import pandas as pd

Condition: Optional. Required only for the pandas-based groupby implementation (groupby_pandas method). If pandas is not available, the class falls back to the groupby_python method.

Usage Example

# Assuming you have an NdMapping object and appropriate container/group types.
# ndmapping_groupby is a param.ParameterizedFunction, so calling the class itself
# runs __call__ immediately. Use .instance() to obtain a reusable callable.
groupby_func = ndmapping_groupby.instance()

# Example usage (pseudo-code as actual NdMapping requires HoloViews context)
# ndmap = NdMapping({(1, 'a'): data1, (1, 'b'): data2, (2, 'a'): data3})
# dimensions_to_group = [dim1]  # Group by first dimension
# result = groupby_func(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# Equivalent: call the class directly, which creates an instance and invokes __call__
# result = ndmapping_groupby(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# Alternative: call one implementation directly as a class method
# result = ndmapping_groupby.groupby_pandas(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# The result will be a ContainerType with groups organized by the specified dimensions

Best Practices

The class automatically selects the optimal implementation (pandas vs pure Python) based on availability, so no manual selection is needed.
Pass sort=True in the call when you need a deterministic ordering of groups, which matters for reproducible results. Note that groupby_python, as shown, accepts sort but does not use it.
The groupby_pandas and groupby_python methods are decorated with @param.parameterized.bothmethod, allowing them to be called as both instance and class methods.
When pandas is available, the pandas implementation is used to improve performance, which matters most for large mappings.
Pass kdims in kwargs to explicitly control which dimensions remain as key dimensions in the grouped results.
The class is designed to work with NdMapping objects from HoloViews or similar frameworks that support multi-dimensional key-value mappings.
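Helper functions such as wrap_tuple, unpack_group, get_param_values, get_unique_keys, group_select, and iterative_select must be available in the module scope for the class to function properly. The name pd must also be bound at module level, because __call__ imports pandas only to test availability and groupby_pandas refers to pd.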
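The "instance or class" behavior mentioned for groupby_pandas and groupby_python can be illustrated with a small descriptor. This is a hypothetical re-implementation written for this example, not param's actual `bothmethod` code; `bothmethod_sketch` and `Grouper` are made-up names.

```python
import functools


class bothmethod_sketch:
    """Descriptor that passes the instance if there is one, else the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # Accessed on an instance: bind the instance.
        # Accessed on the class: bind the class itself.
        target = obj if obj is not None else objtype
        return functools.partial(self.func, target)


class Grouper:
    label = "class default"

    def __init__(self, label):
        self.label = label

    @bothmethod_sketch
    def describe(self_or_cls, suffix):
        return f"{self_or_cls.label} {suffix}"


from_class = Grouper.describe("call")
from_instance = Grouper("instance value").describe("call")
```

Called on the class, `describe` sees the class attribute; called on an instance, it sees the instance attribute, which is why the first argument is conventionally named `self_or_cls`.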

Similar Components

AI-powered semantic similarity - components with related functionality:

function unpack_group 59.4% similar

Unpacks a pandas DataFrame group by iterating over rows and yielding tuples of keys and objects, with special handling for objects with 'kdims' attribute.
From: /tf/active/vicechatdev/patches/util.py

function get_ndmapping_label 54.6% similar

Retrieves the first non-auxiliary object's label attribute from an NdMapping data structure by iterating through its values.
From: /tf/active/vicechatdev/patches/util.py

function get_unique_keys 49.7% similar

Extracts unique key values from an ndmapping object for specified dimensions, returning an iterator of unique tuples.
From: /tf/active/vicechatdev/patches/util.py

function group_select 48.9% similar

Recursively groups a list of key tuples into a nested dictionary structure to optimize indexing operations by avoiding duplicate key lookups.
From: /tf/active/vicechatdev/patches/util.py

function iterative_select 44.5% similar

Recursively selects subgroups from a hierarchical object structure by iterating through dimensions and applying select operations, avoiding duplication of selections.
From: /tf/active/vicechatdev/patches/util.py
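The idea described for group_select, nesting key tuples so that a shared prefix is looked up only once, can be sketched as follows. This is a simplified illustration of the concept and not the module's implementation; `nest_keys` and its leaf format are assumptions made for the example.

```python
def nest_keys(keys):
    """Nest a non-empty list of equal-length key tuples by leading element."""
    if len(keys[0]) == 1:
        # Leaf level: keep distinct values in order of first appearance.
        return list(dict.fromkeys(key[0] for key in keys))
    by_head = {}
    for key in keys:
        # Collect the tails that share the same leading element.
        by_head.setdefault(key[0], []).append(key[1:])
    return {head: nest_keys(tails) for head, tails in by_head.items()}


keys = [(1, "a", "p"), (1, "a", "q"), (1, "b", "p"), (2, "a", "p")]
nested = nest_keys(keys)
```

With this structure, a selection on the first dimension value `1` is made once, and the selections for `"a"` and `"b"` are then applied to that already-reduced subset.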
