class ndmapping_groupby
A parameterized function class that performs groupby operations on NdMapping objects. It uses pandas for improved performance when pandas is available and falls back to a pure Python implementation otherwise.
/tf/active/vicechatdev/patches/util.py
1864 - 1918
moderate
Purpose
This class provides a groupby operation for NdMapping data structures. On each call it tries to import pandas: if the import succeeds it uses the pandas-based implementation (for performance), otherwise it uses the pure Python fallback. The groupby operation reorganizes the NdMapping by the specified dimensions, creating a new container whose keys are the grouped dimension values and whose values hold the remaining data. This is useful for data analysis workflows where you need to aggregate or reorganize multi-dimensional mappings by specific dimension keys.
Source Code
class ndmapping_groupby(param.ParameterizedFunction):
    """
    Apply a groupby operation to an NdMapping, using pandas to improve
    performance (if available).
    """

    sort = param.Boolean(default=False, doc='Whether to apply a sorted groupby')

    def __call__(self, ndmapping, dimensions, container_type,
                 group_type, sort=False, **kwargs):
        try:
            import pandas # noqa (optional import)
            groupby = self.groupby_pandas
        except:
            groupby = self.groupby_python
        return groupby(ndmapping, dimensions, container_type,
                       group_type, sort=sort, **kwargs)

    @param.parameterized.bothmethod
    def groupby_pandas(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        if 'kdims' in kwargs:
            idims = [ndmapping.get_dimension(d) for d in kwargs['kdims']]
        else:
            idims = [dim for dim in ndmapping.kdims if dim not in dimensions]

        all_dims = [d.name for d in ndmapping.kdims]
        inds = [ndmapping.get_dimension_index(dim) for dim in idims]
        getter = operator.itemgetter(*inds) if inds else lambda x: ()

        multi_index = pd.MultiIndex.from_tuples(ndmapping.keys(), names=all_dims)
        df = pd.DataFrame(list(map(wrap_tuple, ndmapping.values())), index=multi_index)

        # TODO: Look at sort here
        kwargs = dict(dict(get_param_values(ndmapping), kdims=idims), sort=sort, **kwargs)
        groups = ((wrap_tuple(k), group_type(OrderedDict(unpack_group(group, getter)), **kwargs))
                  for k, group in df.groupby(level=[d.name for d in dimensions], sort=sort))

        if sort:
            selects = list(get_unique_keys(ndmapping, dimensions))
            groups = sorted(groups, key=lambda x: selects.index(x[0]))

        return container_type(groups, kdims=dimensions, sort=sort)

    @param.parameterized.bothmethod
    def groupby_python(self_or_cls, ndmapping, dimensions, container_type,
                       group_type, sort=False, **kwargs):
        idims = [dim for dim in ndmapping.kdims if dim not in dimensions]
        dim_names = [dim.name for dim in dimensions]
        selects = get_unique_keys(ndmapping, dimensions)
        selects = group_select(list(selects))
        groups = [(k, group_type((v.reindex(idims) if hasattr(v, 'kdims')
                                  else [((), v)]), **kwargs))
                  for k, v in iterative_select(ndmapping, dim_names, selects)]
        return container_type(groups, kdims=dimensions)
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | param.ParameterizedFunction | - | |
Parameter Details
sort: Boolean parameter (default: False) that controls whether the groupby operation should sort the resulting groups. When True, groups are sorted according to the order of unique keys in the original NdMapping.
__call__.ndmapping: The NdMapping object to perform the groupby operation on. This is the source data structure containing multi-dimensional key-value pairs.
__call__.dimensions: List of dimensions to group by. These dimensions will become the key dimensions of the resulting container.
__call__.container_type: The type of container to return after grouping. This determines the structure of the output.
__call__.group_type: The type to use for individual groups within the container. Each group will be instantiated as this type.
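__call__.sort: Boolean flag (default: False) controlling sorting for this specific call. In the source shown, this value is forwarded directly to the selected implementation; the class-level sort parameter is not read inside __call__.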
__call__.**kwargs: Additional keyword arguments passed to the group_type constructor, such as 'kdims' to specify key dimensions for the resulting groups.
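The ordering that sort=True applies in groupby_pandas (groups arranged by where their key first appears in the source mapping) can be sketched with plain Python. This is an illustration of that ordering rule only: `unique_keys` and `order_groups` are hypothetical names introduced here, not the module's `get_unique_keys`, and plain tuples stand in for NdMapping keys.

```python
def unique_keys(keys, positions):
    """Return the distinct sub-keys at `positions`, in first-seen order."""
    seen = []
    for key in keys:
        sub = tuple(key[i] for i in positions)
        if sub not in seen:
            seen.append(sub)
    return seen


def order_groups(groups, selects):
    """Sort (key, value) pairs by the position of each key in `selects`."""
    return sorted(groups, key=lambda item: selects.index(item[0]))


# Keys of a hypothetical two-dimensional mapping; "b" appears before "a".
keys = [("b", 1), ("a", 1), ("b", 2)]
selects = unique_keys(keys, positions=[0])

# Groups may arrive in any order (here alphabetical); reorder them to
# follow the first-seen order of their keys.
groups = [(("a",), "group-a"), (("b",), "group-b")]
ordered = order_groups(groups, selects)
```

Here `ordered` places the `("b",)` group first, because `"b"` is the first value seen along the grouped position.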
Return Value
Returns a container_type instance containing the grouped data. The container has dimensions specified by the 'dimensions' parameter, and each group is an instance of group_type. The structure is a mapping where keys are tuples of dimension values and values are the grouped NdMapping objects reindexed by the remaining dimensions.
Class Interface
Methods
__call__(self, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)
Purpose: Main entry point that automatically selects between pandas and Python implementations based on pandas availability, then performs the groupby operation.
Parameters:
ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Whether to sort the groups
**kwargs: Additional arguments passed to group_type constructor
Returns: A container_type instance with grouped data organized by the specified dimensions
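The selection step performed by __call__ follows a common optional-dependency pattern, sketched below. The two backend functions are hypothetical stand-ins that only report which path was taken. Unlike the source, which uses a bare `except:`, this sketch catches `ImportError` specifically; that is a choice made for the example, not a description of the module.

```python
def groupby_pandas(items):
    """Stand-in for a pandas-backed implementation (hypothetical)."""
    return "pandas", list(items)


def groupby_python(items):
    """Stand-in for a pure Python implementation (hypothetical)."""
    return "python", list(items)


def select_groupby():
    """Return the pandas-backed function if pandas imports, else the fallback."""
    try:
        import pandas  # noqa: F401  (availability check only)
    except ImportError:
        return groupby_python
    return groupby_pandas


# The caller never needs to know which backend ran.
backend, result = select_groupby()([1, 2, 3])
```

Performing the import check inside the function means the decision is made at call time, so the same code works whether or not pandas is installed in the running environment.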
groupby_pandas(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)
Purpose: Performs groupby operation using pandas DataFrame and MultiIndex for improved performance. Uses pandas groupby functionality on a DataFrame representation of the NdMapping.
Parameters:
self_or_cls: Instance or class (bothmethod decorator allows both)
ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Whether to sort the groups
**kwargs: Additional arguments including optional 'kdims' to specify key dimensions
Returns: A container_type instance with grouped data, using pandas for efficient grouping operations
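The core idea of groupby_pandas, building a MultiIndex from the tuple keys and grouping on the named levels, can be sketched on a plain dict. This is a simplified illustration under stated assumptions: `group_by_level` is a hypothetical helper, a dict stands in for the NdMapping, and nested dicts stand in for container_type and group_type. The import is guarded so the snippet still loads when pandas is missing.

```python
try:
    import pandas as pd
except ImportError:  # pandas is optional, mirroring the class's fallback design
    pd = None


def group_by_level(data, dim_names, group_dims):
    """Group a tuple-keyed dict by the named key positions using a MultiIndex."""
    index = pd.MultiIndex.from_tuples(list(data.keys()), names=dim_names)
    df = pd.DataFrame({"value": list(data.values())}, index=index)
    # Positions of the dimensions that remain as keys inside each group.
    keep = [i for i, name in enumerate(dim_names) if name not in group_dims]
    result = {}
    for key, group in df.groupby(level=group_dims, sort=False):
        # Normalize scalar group keys to tuples, as wrap_tuple does in the source.
        key = key if isinstance(key, tuple) else (key,)
        result[key] = {
            tuple(idx[i] for i in keep): val
            for idx, val in zip(group.index, group["value"])
        }
    return result


data = {("a", "x"): 1, ("a", "y"): 2, ("b", "x"): 3}
grouped = (
    group_by_level(data, ["letter", "tag"], ["letter"]) if pd is not None else None
)
```

When pandas is installed, `grouped` maps each `letter` value to a dict keyed by the remaining `tag` dimension.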
groupby_python(self_or_cls, ndmapping, dimensions, container_type, group_type, sort=False, **kwargs)
Purpose: Pure Python fallback implementation of groupby that doesn't require pandas. Uses iterative selection and reindexing to group the data.
Parameters:
self_or_cls: Instance or class (bothmethod decorator allows both)
ndmapping: The NdMapping object to group
dimensions: List of dimensions to group by
container_type: Type of container for the result
group_type: Type for individual groups
sort: Accepted for signature compatibility; in the source shown it is not used by this method
**kwargs: Additional arguments passed to group_type constructor
Returns: A container_type instance with grouped data, using pure Python operations without pandas dependency
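The select-then-reindex approach described for groupby_python can be sketched without any dependencies. This is a rough analogue, not the module's code: `group_by_selection` is a hypothetical helper, a dict with tuple keys stands in for the NdMapping, and the module's `get_unique_keys`, `group_select`, and `iterative_select` helpers are replaced by inline comprehensions.

```python
def group_by_selection(data, dim_names, group_dims):
    """Group a tuple-keyed dict by selecting per unique key, then re-keying."""
    group_pos = [dim_names.index(d) for d in group_dims]
    rest_pos = [i for i in range(len(dim_names)) if i not in group_pos]

    # Unique group keys in first-seen order (dicts preserve insertion order).
    selects = list(dict.fromkeys(tuple(k[i] for i in group_pos) for k in data))

    groups = []
    for sel in selects:
        # "Select": keep the items whose grouped positions match this key.
        matched = {
            k: v for k, v in data.items()
            if tuple(k[i] for i in group_pos) == sel
        }
        # "Reindex": drop the grouped positions from each remaining key.
        reindexed = {tuple(k[i] for i in rest_pos): v for k, v in matched.items()}
        groups.append((sel, reindexed))
    return groups


data = {(1, "a"): "d1", (1, "b"): "d2", (2, "a"): "d3"}
groups = group_by_selection(data, ["num", "letter"], ["num"])
```

Each entry in `groups` pairs a grouped key with a mapping keyed only by the remaining dimension, which mirrors the container-of-groups shape the class returns.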
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| sort | param.Boolean | Class-level parameter (default: False) documented as "Whether to apply a sorted groupby". In the source shown, the methods take their own sort argument rather than reading this parameter. | class |
Dependencies
- param
- pandas
- operator
- collections.OrderedDict
Required Imports
import param
import operator
from collections import OrderedDict
Conditional/Optional Imports
These imports are only needed under specific conditions:
import pandas as pd
Condition: Required for the pandas-based groupby implementation (groupby_pandas method). If pandas is not available, the class falls back to groupby_python method.
Usage Example
# Assuming you have an NdMapping object and appropriate container/group types.
# A param.ParameterizedFunction runs __call__ when the class itself is called,
# so pass the arguments directly instead of instantiating it first.
# Example usage (pseudo-code, as an actual NdMapping requires a HoloViews context)
# ndmap = NdMapping({(1, 'a'): data1, (1, 'b'): data2, (2, 'a'): data3})
# dimensions_to_group = [dim1]  # Group by first dimension
# result = ndmapping_groupby(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)
# Alternative: call a specific implementation as a class method
# result = ndmapping_groupby.groupby_pandas(ndmap, dimensions_to_group, ContainerType, GroupType, sort=True)
# The result will be a ContainerType with groups organized by the specified dimensions
Best Practices
- The class automatically selects the optimal implementation (pandas vs pure Python) based on availability, so no manual selection is needed.
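- Pass sort=True when you need groups ordered by the first appearance of their keys in the source mapping, which helps make results reproducible. Note that, in the source shown, only groupby_pandas acts on sort; groupby_python accepts the argument but does not use it.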
- The groupby_pandas and groupby_python methods are decorated with @param.parameterized.bothmethod, allowing them to be called as both instance and class methods.
- When pandas is available, the pandas implementation is intended to improve performance (per the class docstring), most noticeably for mappings with many keys. Benchmark on your own data if the difference matters.
- Pass kdims in kwargs to explicitly control which dimensions remain as key dimensions in the grouped results.
- The class is designed to work with NdMapping objects from HoloViews or similar frameworks that support multi-dimensional key-value mappings.
- The helper functions wrap_tuple, unpack_group, get_param_values, get_unique_keys, group_select, and iterative_select, along with the names pd (pandas), operator, and OrderedDict, must be available in the module scope for the class to function properly.
Similar Components
AI-powered semantic similarity - components with related functionality:
- function unpack_group (59.4% similar)
- function get_ndmapping_label (54.6% similar)
- function get_unique_keys (49.7% similar)
- function group_select (48.9% similar)
- function iterative_select (44.5% similar)