function unique_array

Maturity: 45

Returns an array of unique values from the input array while preserving the original order of first occurrence.

File: /tf/active/vicechatdev/patches/util.py
Lines: 1147 - 1174
Complexity: moderate

Purpose

This function extracts unique values from an array or list while maintaining the order in which they first appear. It handles special cases like datetime objects and empty arrays, and optimizes performance by using pandas.unique() when available. It's particularly useful for data preprocessing, deduplication, and maintaining temporal ordering in time-series data.
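The difference between order-preserving and sorted deduplication is easiest to see side by side. The following is a minimal sketch with made-up sample values; it calls np.unique and pd.unique directly rather than unique_array itself.

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the original module).
values = np.array([3, 1, 3, 2, 1])

# np.unique returns the unique values in sorted order,
# so the order of first occurrence is lost.
sorted_unique = np.unique(values)

# pd.unique keeps values in the order they first appear.
ordered_unique = pd.unique(values)

print(sorted_unique.tolist())   # [1, 2, 3]
print(ordered_unique.tolist())  # [3, 1, 2]
```

unique_array aims for the second behaviour on every code path, including the one that does not use pandas.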

Source Code

def unique_array(arr):
    """
    Returns an array of unique values in the input order.

    Args:
       arr (np.ndarray or list): The array to compute unique values on

    Returns:
       A new array of unique values
    """
    if not len(arr):
        return np.asarray(arr)
    elif pd:
        if isinstance(arr, np.ndarray) and arr.dtype.kind not in 'MO':
            # Avoid expensive unpacking if not potentially datetime
            return pd.unique(arr)

        values = []
        for v in arr:
            if (isinstance(v, datetime_types) and
                not isinstance(v, cftime_types)):
                v = pd.Timestamp(v).to_datetime64()
            values.append(v)
        return pd.unique(values)
    else:
        arr = np.asarray(arr)
        _, uniq_inds = np.unique(arr, return_index=True)
        return arr[np.sort(uniq_inds)]
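
The final branch, used when pandas is not available, can be traced on a small input. The sketch below repeats the same two NumPy calls outside the function; the sample values are illustrative only.

```python
import numpy as np

# Illustrative input with duplicates, deliberately not in sorted order.
arr = np.asarray(["b", "a", "b", "c", "a"])

# np.unique sorts the values, but return_index=True also reports the
# index of the first occurrence of each unique value in the input.
sorted_vals, first_inds = np.unique(arr, return_index=True)
print(sorted_vals.tolist())   # ['a', 'b', 'c']
print(first_inds.tolist())    # [1, 0, 3]

# Sorting those indices and indexing the input restores first-occurrence order.
ordered = arr[np.sort(first_inds)]
print(ordered.tolist())       # ['b', 'a', 'c']
```

The extra sort is what makes this path order-preserving despite np.unique returning sorted values.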

Parameters

Name Type Default Kind
arr - - positional_or_keyword

Parameter Details

arr: Input array or list from which to extract unique values. Can be a numpy.ndarray or a Python list. May contain datetime objects, numeric values, strings, or other data types. Empty arrays are handled gracefully.

Return Value

Returns a new array containing only the unique values from the input, in order of first occurrence. On both the pandas and NumPy paths the result is normally a numpy.ndarray. For empty input the function returns np.asarray(arr): an input that is already an ndarray keeps its dtype, while an empty list becomes an empty float64 array.
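
The empty-input branch is just np.asarray(arr), so the resulting dtype depends on what was passed in. A short check, using only NumPy and illustrative inputs:

```python
import numpy as np

# An empty list carries no dtype information; NumPy defaults to float64.
empty_from_list = np.asarray([])

# An empty ndarray keeps whatever dtype it already had.
empty_int_array = np.asarray(np.array([], dtype="int64"))

print(empty_from_list.dtype)   # float64
print(empty_int_array.dtype)   # int64
```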

Dependencies

  • numpy
  • pandas
  • cftime

Required Imports

import numpy as np
import pandas as pd

Conditional/Optional Imports

These imports are only needed under specific conditions:

import cftime

Condition: Required for handling cftime datetime types (climate and forecast time conventions). The code references 'cftime_types' which must be defined elsewhere in the codebase.

Required (conditional)
from datetime import datetime, date, time

Condition: Required for datetime type checking. The code references 'datetime_types' which must be defined elsewhere in the codebase.

Required (conditional)
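
Since 'datetime_types' and 'cftime_types' are not shown in this excerpt, the following is only a guess at how such tuples might be set up, with cftime treated as an optional dependency. The tuple contents and the helper name is_plain_datetime are assumptions for illustration, not the module's actual definitions.

```python
import datetime as dt

import numpy as np

# Hypothetical stand-ins; the real tuples are defined elsewhere in util.py.
datetime_types = (np.datetime64, dt.datetime, dt.date)

try:
    import cftime
    cftime_types = (cftime.datetime,)
except ImportError:
    # Without cftime installed, an empty tuple makes isinstance() return False.
    cftime_types = ()


def is_plain_datetime(value):
    """Hypothetical helper mirroring the isinstance test in unique_array."""
    return isinstance(value, datetime_types) and not isinstance(value, cftime_types)


print(is_plain_datetime(dt.datetime(2023, 1, 1)))  # True
print(is_plain_datetime(42))                       # False
```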

Usage Example

import numpy as np
import pandas as pd
from datetime import datetime

# Assumes unique_array (see Source Code) is defined in this same script.
# The type tuples below stand in for the module-level definitions it relies on.
datetime_types = (datetime,)
cftime_types = ()

# Example 1: Simple numeric array
arr1 = [1, 2, 2, 3, 1, 4]
result1 = unique_array(arr1)
print(result1)  # Output: [1 2 3 4]

# Example 2: String array
arr2 = np.array(['a', 'b', 'a', 'c', 'b'])
result2 = unique_array(arr2)
print(result2)  # Output: ['a' 'b' 'c']

# Example 3: Empty array
arr3 = []
result3 = unique_array(arr3)
print(result3)  # Output: []

# Example 4: Array with datetime objects
arr4 = [datetime(2023, 1, 1), datetime(2023, 1, 2), datetime(2023, 1, 1)]
result4 = unique_array(arr4)
print(result4)  # Output: two unique values (2023-01-01 and 2023-01-02) as numpy datetime64

Best Practices

  • Ensure 'datetime_types' and 'cftime_types' are properly defined in the module scope before using this function
  • The function preserves input order, which is important for time-series data - use this instead of np.unique() when order matters
  • For large arrays with non-datetime types, the pandas path is more efficient than the numpy fallback
  • Empty arrays are handled safely and return empty numpy arrays
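  • On the pandas path, datetime values that are not cftime types are converted to numpy datetime64 (via pd.Timestamp(v).to_datetime64()) before deduplication, so different datetime representations compare consistently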
  • If pandas is not available, the function falls back to a numpy-based implementation that may be slower but still preserves order
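
The datetime handling mentioned in the list above can be sketched in isolation. This example applies the same pd.Timestamp(v).to_datetime64() conversion to a few made-up dates; wrapping the converted values in np.array before calling pd.unique is a choice of this sketch, whereas the source passes the list directly.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Illustrative input: standard-library datetimes with one duplicate.
raw = [datetime(2023, 1, 2), datetime(2023, 1, 1), datetime(2023, 1, 2)]

# Same per-element conversion as in unique_array.
converted = [pd.Timestamp(v).to_datetime64() for v in raw]

# Each element is now a numpy.datetime64 scalar rather than a pd.Timestamp.
all_dt64 = all(isinstance(v, np.datetime64) for v in converted)
print(all_dt64)  # True

# Deduplicate while keeping first-occurrence order.
unique_days = pd.unique(np.array(converted))
print(len(unique_days))  # 2
```

Note that the first unique value is 2023-01-02, matching the input order rather than chronological order.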

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function unique_iterator 46.2% similar

    A generator function that yields unique elements from an input sequence in order of first appearance, filtering out duplicates.

    From: /tf/active/vicechatdev/patches/util.py
  • function unique_zip 44.8% similar

    Returns a unique list of tuples created by zipping multiple iterables together, removing any duplicate tuples while preserving order.

    From: /tf/active/vicechatdev/patches/util.py
  • function asarray 42.4% similar

    Converts array-like objects (lists, pandas Series, objects with __array__ method) to NumPy ndarray format with optional strict validation.

    From: /tf/active/vicechatdev/patches/util.py
  • function arglexsort 41.0% similar

    Returns the indices that would lexicographically sort multiple arrays, treating them as columns of a structured array.

    From: /tf/active/vicechatdev/patches/util.py
  • function get_unique_keys 40.5% similar

    Extracts unique key values from an ndmapping object for specified dimensions, returning an iterator of unique tuples.

    From: /tf/active/vicechatdev/patches/util.py