class ndmapping_groupby

Maturity: 47

A parameterized function class that performs groupby operations on NdMapping objects, using pandas for better performance when it is available and falling back to a pure Python implementation otherwise.

File:
/tf/active/vicechatdev/patches/util.py
Lines:
1864 - 1918
Complexity:
moderate

Purpose

This class provides a groupby operation for NdMapping data structures. At call time it tries to import pandas: if the import succeeds it uses the pandas-based implementation (for performance), otherwise it uses the pure Python fallback. The groupby operation reorganizes the NdMapping by the specified dimensions, creating a new container whose keys are the values of those dimensions and whose items are groups indexed by the remaining dimensions. This is useful for data analysis workflows where you need to reorganize multi-dimensional mappings by specific dimension keys.
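
As a rough illustration of the reorganization described above, the sketch below groups a plain tuple-keyed dictionary by a subset of its key positions. It is a simplified stand-in, not the NdMapping implementation: the function `group_by_dims`, the dimension names `x` and `label`, and the sample values are all hypothetical.

```python
from collections import OrderedDict


def group_by_dims(mapping, kdims, group_dims):
    """Group a tuple-keyed mapping by a subset of its key dimensions.

    Hypothetical helper: `kdims` names each position of the key tuples,
    and `group_dims` lists the names to group by.
    """
    group_idx = [kdims.index(d) for d in group_dims]
    rest_idx = [i for i in range(len(kdims)) if i not in group_idx]

    grouped = OrderedDict()
    for key, value in mapping.items():
        outer = tuple(key[i] for i in group_idx)  # key in the container
        inner = tuple(key[i] for i in rest_idx)   # key inside the group
        grouped.setdefault(outer, OrderedDict())[inner] = value
    return grouped


data = OrderedDict([
    ((1, 'a'), 'data1'),
    ((1, 'b'), 'data2'),
    ((2, 'a'), 'data3'),
])
result = group_by_dims(data, kdims=['x', 'label'], group_dims=['x'])
```

Here the outer keys play the role of the container's key dimensions and each inner mapping plays the role of a group keyed by the remaining dimensions.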

Source Code

class ndmapping_groupby(param.ParameterizedFunction):
    """
    Apply a groupby operation to an NdMapping, using pandas to improve
    performance (if available).
    """

    sort = param.Boolean(default=False, doc='Whether to apply a sorted groupby')

    def __call__(self, ndmapping, dimensions, container_type,
                 group_type, sort=False, **kwargs):
        try:
            import pandas # noqa (optional import)
            groupby = self.groupby_pandas
        except:
            groupby = self.groupby_python
        return groupby(ndmapping, dimensions, container_type,
                       group_type, sort=sort, **kwargs)

    @param.parameterized.bothmethod
    def groupby_pandas(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        if 'kdims' in kwargs:
            idims = [ndmapping.get_dimension(d) for d in kwargs['kdims']]
        else:
            idims = [dim for dim in ndmapping.kdims if dim not in dimensions]

        all_dims = [d.name for d in ndmapping.kdims]
        inds = [ndmapping.get_dimension_index(dim) for dim in idims]
        getter = operator.itemgetter(*inds) if inds else lambda x: ()

        multi_index = pd.MultiIndex.from_tuples(ndmapping.keys(), names=all_dims)
        df = pd.DataFrame(list(map(wrap_tuple, ndmapping.values())), index=multi_index)

        # TODO: Look at sort here
        kwargs = dict(dict(get_param_values(ndmapping), kdims=idims), sort=sort, **kwargs)
        groups = ((wrap_tuple(k), group_type(OrderedDict(unpack_group(group, getter)), **kwargs))
                   for k, group in df.groupby(level=[d.name for d in dimensions], sort=sort))

        if sort:
            selects = list(get_unique_keys(ndmapping, dimensions))
            groups = sorted(groups, key=lambda x: selects.index(x[0]))

        return container_type(groups, kdims=dimensions, sort=sort)

    @param.parameterized.bothmethod
    def groupby_python(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        idims = [dim for dim in ndmapping.kdims if dim not in dimensions]
        dim_names = [dim.name for dim in dimensions]
        selects = get_unique_keys(ndmapping, dimensions)
        selects = group_select(list(selects))
        groups = [(k, group_type((v.reindex(idims) if hasattr(v, 'kdims')
                                  else [((), v)]), **kwargs))
                  for k, v in iterative_select(ndmapping, dim_names, selects)]
        return container_type(groups, kdims=dimensions)

Parameters

Name Type Default Kind
bases param.ParameterizedFunction -

Parameter Details

sort: Boolean parameter (default: False) that controls whether the groupby operation should sort the resulting groups. When True, groups are ordered to match the unique keys returned by get_unique_keys for the original NdMapping. In the source shown above, only groupby_pandas applies this reordering.

__call__.ndmapping: The NdMapping object to perform the groupby operation on. This is the source data structure containing multi-dimensional key-value pairs.

__call__.dimensions: List of dimensions to group by. These dimensions will become the key dimensions of the resulting container.

__call__.container_type: The type of container to return after grouping. This determines the structure of the output.

__call__.group_type: The type to use for individual groups within the container. Each group will be instantiated as this type.

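__call__.sort: Boolean flag forwarded to the selected implementation. It defaults to False in the call signature, and the source shown above does not read the class-level sort parameter, so pass sort=True explicitly when sorted output is needed.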

__call__.**kwargs: Additional keyword arguments passed to the group_type constructor, such as 'kdims' to specify key dimensions for the resulting groups.
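
The reordering applied when sort is True can be sketched in isolation. The snippet below mirrors the `sorted(groups, key=lambda x: selects.index(x[0]))` step from the source, using a hypothetical helper `unique_subkeys` in place of get_unique_keys; that the real helper yields keys in first-seen order is an assumption here, and the sample keys and group labels are made up.

```python
def unique_subkeys(keys, positions):
    """Return the distinct sub-keys at `positions`, in first-seen order.

    Hypothetical stand-in for get_unique_keys.
    """
    seen = []
    for key in keys:
        sub = tuple(key[i] for i in positions)
        if sub not in seen:
            seen.append(sub)
    return seen


# Keys of the original mapping; the first position is the grouping dimension.
keys = [(2, 'a'), (1, 'a'), (2, 'b')]
selects = unique_subkeys(keys, positions=[0])

# Groups as a backend might yield them, here in ascending key order.
groups = [((1,), 'group for 1'), ((2,), 'group for 2')]

# Reorder the groups so they follow the order of `selects`.
ordered = sorted(groups, key=lambda item: selects.index(item[0]))
```

With these inputs the group keyed by `(2,)` moves ahead of the one keyed by `(1,)`, because `2` appears first among the original keys.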

Return Value

Returns a container_type instance containing the grouped data. The container has dimensions specified by the 'dimensions' parameter, and each group is an instance of group_type. The structure is a mapping where keys are tuples of dimension values and values are the grouped NdMapping objects reindexed by the remaining dimensions.

Class Interface

Methods

__call__(self, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)

Purpose: Main entry point that automatically selects between pandas and Python implementations based on pandas availability, then performs the groupby operation.

Parameters:

  • ndmapping: The NdMapping object to group
  • dimensions: List of dimensions to group by
  • container_type: Type of container for the result
  • group_type: Type for individual groups
  • sort: Whether to sort the groups
  • **kwargs: Additional arguments passed to group_type constructor

Returns: A container_type instance with grouped data organized by the specified dimensions

groupby_pandas(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)

Purpose: Performs groupby operation using pandas DataFrame and MultiIndex for improved performance. Uses pandas groupby functionality on a DataFrame representation of the NdMapping.

Parameters:

  • self_or_cls: Instance or class (bothmethod decorator allows both)
  • ndmapping: The NdMapping object to group
  • dimensions: List of dimensions to group by
  • container_type: Type of container for the result
  • group_type: Type for individual groups
  • sort: Whether to sort the groups
  • **kwargs: Additional arguments including optional 'kdims' to specify key dimensions

Returns: A container_type instance with grouped data, using pandas for efficient grouping operations
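
The core of the pandas path (a MultiIndex built from the keys, then a groupby on selected index levels) can be sketched on plain data. This is a minimal illustration rather than the method itself: `group_by_level` and `wrap_key` are hypothetical names (the latter standing in for wrap_tuple), the column name `value` is arbitrary, and the guarded import only keeps the sketch importable without pandas.

```python
try:
    import pandas as pd
except ImportError:  # pandas is optional in this sketch
    pd = None


def wrap_key(key):
    """Normalize a group key to a tuple (a stand-in for wrap_tuple)."""
    return key if isinstance(key, tuple) else (key,)


def group_by_level(keys, values, names, levels, sort=False):
    """Group values by one or more levels of a MultiIndex built from keys."""
    index = pd.MultiIndex.from_tuples(keys, names=names)
    frame = pd.DataFrame({"value": values}, index=index)
    return {
        wrap_key(key): list(group["value"])
        for key, group in frame.groupby(level=levels, sort=sort)
    }


if pd is not None:
    grouped = group_by_level(
        keys=[(1, "a"), (1, "b"), (2, "a")],
        values=["data1", "data2", "data3"],
        names=["x", "label"],
        levels=["x"],
    )
else:
    grouped = None
```

Normalizing each group key to a tuple keeps the result independent of whether the installed pandas version yields scalar or one-element tuple keys for a single grouping level.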

groupby_python(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)

Purpose: Pure Python fallback implementation of groupby that doesn't require pandas. Uses iterative selection and reindexing to group the data.

Parameters:

  • self_or_cls: Instance or class (bothmethod decorator allows both)
  • ndmapping: The NdMapping object to group
  • dimensions: List of dimensions to group by
  • container_type: Type of container for the result
  • group_type: Type for individual groups
  • sort: Accepted for signature compatibility with groupby_pandas; the source shown above does not use it in this implementation
  • **kwargs: Additional arguments passed to group_type constructor

Returns: A container_type instance with grouped data, using pure Python operations without pandas dependency

Attributes

Name | Type | Description | Scope
sort | param.Boolean | Class-level parameter declaring the sort option (default: False). The methods take their own sort argument, which also defaults to False and is the value that actually controls ordering. | class

Dependencies

  • param
  • pandas (optional)
  • operator
  • collections.OrderedDict

Required Imports

import param
import operator
from collections import OrderedDict

Conditional/Optional Imports

These imports are only needed under specific conditions:

import pandas as pd

Condition: Required for the pandas-based groupby implementation (groupby_pandas method). If pandas is not available, the class falls back to groupby_python method.

Optional
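
One common way to make the `pd` name available without requiring pandas is a guarded module-level import combined with a dispatch on whether it succeeded. The sketch below shows that pattern only; binding `pd` this way is an assumption about the surrounding module, and the two `groupby_*` functions are hypothetical placeholders that do no real grouping.

```python
try:
    import pandas as pd  # optional dependency
except ImportError:
    pd = None


def groupby_pandas(items):
    """Placeholder for an implementation that would rely on `pd`."""
    return ("pandas", sorted(items))


def groupby_python(items):
    """Placeholder for a dependency-free fallback."""
    return ("python", sorted(items))


def groupby(items):
    """Dispatch to the pandas path only when the import succeeded."""
    impl = groupby_pandas if pd is not None else groupby_python
    return impl(items)


backend, result = groupby([3, 1, 2])
```

Catching ImportError specifically, rather than using a bare except, avoids hiding unrelated errors raised while the optional module is being imported.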

Usage Example

# Assuming you have an NdMapping object and appropriate container/group types.
# ndmapping_groupby is a param.ParameterizedFunction, so calling the class runs
# the operation directly; ndmapping_groupby(sort=True) on its own would invoke
# __call__ without its required positional arguments.

# Example usage (pseudo-code as actual NdMapping requires HoloViews context)
# ndmap = NdMapping({(1, 'a'): data1, (1, 'b'): data2, (2, 'a'): data3})
# dimensions_to_group = [dim1]  # Group by first dimension
# result = ndmapping_groupby(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# Alternative: keep a reusable instance and call it later
# groupby_func = ndmapping_groupby.instance()
# result = groupby_func(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# Alternative: call one implementation directly as a class method
# result = ndmapping_groupby.groupby_pandas(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)

# The result will be a ContainerType with groups organized by the specified dimensions

Best Practices

  • The class automatically selects the optimal implementation (pandas vs pure Python) based on availability, so no manual selection is needed.
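  • Pass sort=True when the order of groups matters, for example for reproducible results. In the source shown above only groupby_pandas honors it, reordering groups to follow the keys returned by get_unique_keys; groupby_python accepts the argument but ignores it.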
  • The groupby_pandas and groupby_python methods are decorated with @param.parameterized.bothmethod, allowing them to be called as both instance and class methods.
  • When pandas is available, grouping is delegated to DataFrame.groupby on a MultiIndex, which the class docstring describes as a performance improvement; the benefit is likely to matter most for mappings with many keys.
  • Pass kdims in kwargs to explicitly control which dimensions remain as key dimensions in the grouped results.
  • The class is designed to work with NdMapping objects from HoloViews or similar frameworks that support multi-dimensional key-value mappings.
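  • Helper functions must be available in the module scope for the class to function properly: wrap_tuple, unpack_group, get_param_values, and get_unique_keys for the pandas path, and get_unique_keys, group_select, and iterative_select for the pure Python path. The pandas path also expects the names pd, operator, and OrderedDict to be bound at module level.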
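
The class-or-instance calling convention mentioned above can be illustrated with a small descriptor. This is a simplified stand-in written for this example, not param's own code: the `bothmethod` class below, the `Grouper` class, and its `label` attribute are all hypothetical.

```python
import functools


class bothmethod:
    """Simplified descriptor: bind to the instance if there is one, else the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        target = objtype if obj is None else obj
        return functools.partial(self.func, target)


class Grouper:
    label = "class default"

    @bothmethod
    def describe(self_or_cls):
        # Receives the class when called on the class, the instance otherwise.
        return self_or_cls.label


instance = Grouper()
instance.label = "instance override"

from_class = Grouper.describe()
from_instance = instance.describe()
```

Called on the class, `describe` sees the class attribute; called on the instance, it sees the overridden value, which is why the first argument is conventionally named `self_or_cls`.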

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function unpack_group 59.4% similar

    Unpacks a pandas DataFrame group by iterating over rows and yielding tuples of keys and objects, with special handling for objects with 'kdims' attribute.

    From: /tf/active/vicechatdev/patches/util.py
  • function get_ndmapping_label 54.6% similar

    Retrieves the first non-auxiliary object's label attribute from an NdMapping data structure by iterating through its values.

    From: /tf/active/vicechatdev/patches/util.py
  • function get_unique_keys 49.7% similar

    Extracts unique key values from an ndmapping object for specified dimensions, returning an iterator of unique tuples.

    From: /tf/active/vicechatdev/patches/util.py
  • function group_select 48.9% similar

    Recursively groups a list of key tuples into a nested dictionary structure to optimize indexing operations by avoiding duplicate key lookups.

    From: /tf/active/vicechatdev/patches/util.py
  • function iterative_select 44.5% similar

    Recursively selects subgroups from a hierarchical object structure by iterating through dimensions and applying select operations, avoiding duplication of selections.

    From: /tf/active/vicechatdev/patches/util.py