function arglexsort

Maturity: 43

Returns the indices that would lexicographically sort multiple arrays, treating them as columns of a structured array.

File: /tf/active/vicechatdev/patches/util.py
Lines: 1955 - 1964
Complexity: moderate

Purpose

This function performs lexicographical (dictionary-style) sorting on multiple arrays simultaneously, returning the indices that would sort the arrays. It's useful when you need to sort data by multiple keys in priority order, similar to SQL's ORDER BY with multiple columns. The first array has the highest sorting priority, followed by subsequent arrays as tiebreakers.
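As a rough illustration of the "dictionary-style" ordering described above, the sketch below compares the function's result with a pure-Python sort of row numbers by (first key, second key) tuples. The city and year arrays are made-up sample data, and the rows are chosen so that no two are identical in every key.

```python
import numpy as np

def arglexsort(arrays):
    # Same body as the documented function.
    dtypes = ','.join(array.dtype.str for array in arrays)
    recarray = np.empty(len(arrays[0]), dtype=dtypes)
    for i, array in enumerate(arrays):
        recarray['f%s' % i] = array
    return recarray.argsort()

# Hypothetical sample data: city is the primary key, year the tiebreaker.
city = np.array(['Oslo', 'Bern', 'Oslo', 'Bern'])
year = np.array([2019, 2020, 2021, 2018])

order = arglexsort([city, year])

# Pure-Python reference: sort row numbers by (city, year) tuples.
reference = sorted(range(len(city)), key=lambda i: (city[i], year[i]))

print(order.tolist())  # [3, 1, 0, 2]
print(reference)       # [3, 1, 0, 2]
```

Both approaches put the two 'Bern' rows first (ordered by year), followed by the two 'Oslo' rows.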

Source Code

def arglexsort(arrays):
    """
    Returns the indices of the lexicographical sorting
    order of the supplied arrays.
    """
    dtypes = ','.join(array.dtype.str for array in arrays)
    recarray = np.empty(len(arrays[0]), dtype=dtypes)
    for i, array in enumerate(arrays):
        recarray['f%s' % i] = array
    return recarray.argsort()

Parameters

Name Type Default Kind
arrays - - positional_or_keyword

Parameter Details

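arrays: A sequence (list or tuple) of one-dimensional NumPy arrays, all of the same length. Plain Python lists will not work as elements, because the function reads each element's dtype attribute. The arrays are treated as sorting keys in order of priority: the first array is the primary sort key, the second breaks ties in the first, and so on. At least two arrays are needed: with a single array the joined dtype string describes a plain (non-structured) dtype, so the field assignment recarray['f0'] fails. Each array needs a dtype that NumPy can use as a structured-array field and compare, such as a numeric or fixed-width string dtype.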
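The input requirements above could be enforced up front with a small wrapper. This is only a sketch: arglexsort_checked is a hypothetical helper that is not part of util.py, and the specific checks and error messages are assumptions about what a caller might want.

```python
import numpy as np

def arglexsort(arrays):
    # Same body as the documented function.
    dtypes = ','.join(array.dtype.str for array in arrays)
    recarray = np.empty(len(arrays[0]), dtype=dtypes)
    for i, array in enumerate(arrays):
        recarray['f%s' % i] = array
    return recarray.argsort()

def arglexsort_checked(keys):
    """Hypothetical wrapper: coerce keys to arrays and validate them first."""
    arrays = [np.asarray(key) for key in keys]
    if len(arrays) < 2:
        raise ValueError("at least two key arrays are required")
    if arrays[0].ndim != 1:
        raise ValueError("key 0 must be one-dimensional")
    length = len(arrays[0])
    for position, array in enumerate(arrays):
        # ndim is tested first so len() is never called on a 0-d array.
        if array.ndim != 1 or len(array) != length:
            raise ValueError(
                "key %d must be one-dimensional with length %d" % (position, length)
            )
    return arglexsort(arrays)

# Plain lists are accepted here because the wrapper converts them.
order = arglexsort_checked([[2, 1, 2, 1], [0.5, 0.7, 0.1, 0.2]])
print(order)  # [3 1 2 0]
```

Validating before the structured array is built gives a clearer error than the broadcasting failure raised inside the loop, at the cost of one extra pass over the inputs.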

Return Value

Returns a NumPy array of integer indices that would sort the input arrays in lexicographical order. The returned array has the same length as the input arrays. These indices can be used to reorder the original arrays (e.g., arrays[0][result] gives the sorted version of the first array). Because argsort() is called with its default sort kind, which is not stable, rows that are equal in every key have no guaranteed relative order.

Dependencies

  • numpy

Required Imports

import numpy as np

Usage Example

import numpy as np

def arglexsort(arrays):
    dtypes = ','.join(array.dtype.str for array in arrays)
    recarray = np.empty(len(arrays[0]), dtype=dtypes)
    for i, array in enumerate(arrays):
        recarray['f%s' % i] = array
    return recarray.argsort()

# Example: Sort by first array, then by second array for ties
first_names = np.array(['John', 'Jane', 'John', 'Alice'])
ages = np.array([30, 25, 25, 30])

# Get sorting indices
indices = arglexsort([first_names, ages])
print(indices)  # Output: [3 1 2 0]

# Apply indices to sort the arrays
sorted_names = first_names[indices]
sorted_ages = ages[indices]
print(sorted_names)  # ['Alice' 'Jane' 'John' 'John']
print(sorted_ages)    # [30 25 25 30]

Best Practices

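  • All input arrays must have the same length; a mismatch normally raises a ValueError when the array is assigned to its field, but a length-1 array is broadcast silently instead
  • The function allocates an internal structured array that holds a copy of every input, so memory use grows with both the number of rows and the number of keys
  • Arrays are sorted in the order they appear in the input list - first array has highest priority
  • Each input must be a NumPy array whose dtype can serve as a structured-array field and supports comparison, such as a numeric or fixed-width string dtype
  • For very large datasets, consider using numpy.lexsort() directly as it may be more memory efficient; note that numpy.lexsort() treats the last key as the primary one, so the key order must be reversed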
  • The returned indices can be used with fancy indexing to reorder any array of the same length
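The numpy.lexsort() alternative mentioned above can be sketched as follows. The primary and secondary arrays are made-up sample data with no rows that are identical in every key, so both approaches should agree on the indices; for fully tied rows the two may differ, since numpy.lexsort() is stable and the default argsort() is not.

```python
import numpy as np

def arglexsort(arrays):
    # Same body as the documented function.
    dtypes = ','.join(array.dtype.str for array in arrays)
    recarray = np.empty(len(arrays[0]), dtype=dtypes)
    for i, array in enumerate(arrays):
        recarray['f%s' % i] = array
    return recarray.argsort()

# Hypothetical sample data.
primary = np.array([2, 1, 2, 1, 0])
secondary = np.array([10, 40, 30, 20, 50])

via_structured = arglexsort([primary, secondary])

# np.lexsort uses the LAST key as the primary one, so the list is reversed.
via_lexsort = np.lexsort([secondary, primary])

print(via_structured)  # [4 3 1 0 2]
print(via_lexsort)     # [4 3 1 0 2]
```

numpy.lexsort() sorts the keys in place of building a combined structured array, which is why it may need less memory for large inputs.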

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function search_indices 60.5% similar

    Finds the indices of specified values within a source array by using sorted search for efficient lookup.

    From: /tf/active/vicechatdev/patches/util.py
  • function dimension_sort 50.2% similar

    Sorts an ordered dictionary by specified dimension keys, supporting both standard Python tuple sorting and categorical ordering for dimensions with predefined values.

    From: /tf/active/vicechatdev/patches/util.py
  • function python2sort 47.4% similar

    A sorting function that mimics Python 2's behavior of grouping incomparable types separately and sorting within each group, rather than raising a TypeError when comparing incompatible types.

    From: /tf/active/vicechatdev/patches/util.py
  • function sort_topologically 41.8% similar

    Performs stackless topological sorting on a directed acyclic graph (DAG), organizing nodes into levels based on their dependencies.

    From: /tf/active/vicechatdev/patches/util.py
  • function cross_index 41.8% similar

    Efficiently indexes into a Cartesian product of iterables without materializing the full product, using a linear index to retrieve the corresponding tuple of values.

    From: /tf/active/vicechatdev/patches/util.py