function unique_array
Returns an array of unique values from the input array while preserving the original order of first occurrence.
/tf/active/vicechatdev/patches/util.py
1147 - 1174
moderate
Purpose
This function extracts unique values from an array or list while maintaining the order in which they first appear. It handles special cases like datetime objects and empty arrays, and optimizes performance by using pandas.unique() when available. It's particularly useful for data preprocessing, deduplication, and maintaining temporal ordering in time-series data.
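To illustrate why order preservation matters, the following minimal sketch contrasts `np.unique`, which returns sorted values, with `pd.unique`, which returns values in order of first appearance. The sample data is made up for illustration and is not taken from the module.

```python
import numpy as np
import pandas as pd

# Illustrative input: 3 appears first, then 1, then 2
values = np.array([3, 1, 3, 2, 1])

# np.unique sorts its result, so the original order is lost
sorted_unique = np.unique(values)

# pd.unique is hash-based and keeps the order of first occurrence
ordered_unique = pd.unique(values)

print(sorted_unique)   # [1 2 3]
print(ordered_unique)  # [3 1 2]
```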
Source Code
def unique_array(arr):
    """
    Returns an array of unique values in the input order.

    Args:
        arr (np.ndarray or list): The array to compute unique values on

    Returns:
        A new array of unique values
    """
    if not len(arr):
        return np.asarray(arr)
    elif pd:
        if isinstance(arr, np.ndarray) and arr.dtype.kind not in 'MO':
            # Avoid expensive unpacking if not potentially datetime
            return pd.unique(arr)

        values = []
        for v in arr:
            if (isinstance(v, datetime_types) and
                    not isinstance(v, cftime_types)):
                v = pd.Timestamp(v).to_datetime64()
            values.append(v)
        return pd.unique(values)
    else:
        arr = np.asarray(arr)
        _, uniq_inds = np.unique(arr, return_index=True)
        return arr[np.sort(uniq_inds)]
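The final `else` branch can be tried in isolation. The sketch below copies that branch into a standalone helper; the name `ordered_unique_numpy` is hypothetical and does not exist in the module. It relies on `np.unique(..., return_index=True)` returning the index of the first occurrence of each unique value, so sorting those indices restores the input order.

```python
import numpy as np

def ordered_unique_numpy(arr):
    """Hypothetical standalone copy of the NumPy-only fallback branch."""
    arr = np.asarray(arr)
    # first_idx[i] is the position where the i-th sorted unique value first appears
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting the positions puts the unique values back in input order
    return arr[np.sort(first_idx)]

result = ordered_unique_numpy([3, 1, 3, 2, 1])
print(result)  # [3 1 2]
```

Because `np.unique` sorts internally, this path needs values that can be ordered against each other, which is one reason it may behave differently from the pandas path on mixed-type input.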
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| arr | - | - | positional_or_keyword |
Parameter Details
arr: Input array or list from which to extract unique values. Can be a numpy.ndarray or a Python list. May contain datetime objects, numeric values, strings, or other data types. Empty arrays are handled gracefully.
Return Value
Returns an array containing only the unique values from the input, in order of first occurrence. When pandas is available the result comes from pd.unique(), which returns a numpy.ndarray for NumPy input; without pandas the result is a numpy.ndarray built by the NumPy fallback. Empty input is returned as np.asarray(arr): an empty ndarray keeps its dtype, while an empty list becomes an empty float64 array.
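The empty-input case reduces to a single `np.asarray` call, so its behavior can be checked directly with NumPy. This is a small sketch of that one call, not of the whole function; the variable names are illustrative.

```python
import numpy as np

# An empty Python list carries no dtype, so NumPy defaults to float64
from_list = np.asarray([])

# An existing ndarray is passed through unchanged, dtype included
empty_ints = np.array([], dtype=np.int64)
from_array = np.asarray(empty_ints)

print(from_list.dtype)           # float64
print(from_array.dtype)          # int64
print(from_array is empty_ints)  # True
```

Since `np.asarray` does not copy an existing ndarray, the empty-array result is the same object as the input rather than a new array.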
Dependencies
- numpy
- pandas
- cftime
Required Imports
import numpy as np
import pandas as pd
Conditional/Optional Imports
These imports are only needed under specific conditions:
import cftime
Condition: Required for handling cftime datetime types (climate and forecast time conventions). The code references 'cftime_types' which must be defined elsewhere in the codebase.
Required (conditional)
from datetime import datetime, date, time
Condition: Required for datetime type checking. The code references 'datetime_types' which must be defined elsewhere in the codebase.
Required (conditional)
Usage Example
import numpy as np
import pandas as pd
from datetime import datetime
# unique_array looks up these tuples in its own module's globals, so define
# them here only when unique_array itself is defined in this module or session
datetime_types = (datetime,)
cftime_types = ()
# Example 1: Simple numeric array
arr1 = [1, 2, 2, 3, 1, 4]
result1 = unique_array(arr1)
print(result1) # Output: [1 2 3 4]
# Example 2: String array
arr2 = np.array(['a', 'b', 'a', 'c', 'b'])
result2 = unique_array(arr2)
print(result2) # Output: ['a' 'b' 'c']
# Example 3: Empty array
arr3 = []
result3 = unique_array(arr3)
print(result3) # Output: []
# Example 4: Array with datetime objects
arr4 = [datetime(2023, 1, 1), datetime(2023, 1, 2), datetime(2023, 1, 1)]
result4 = unique_array(arr4)
print(result4) # Output: unique datetime values
Best Practices
- Ensure 'datetime_types' and 'cftime_types' are properly defined in the module scope before using this function
- The function preserves input order, which matters for time-series data. Use it instead of np.unique(), which returns sorted values, when order matters
- For large arrays with non-datetime types, the pandas path is more efficient than the numpy fallback
- Empty arrays are handled safely and return empty numpy arrays
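- On the pandas path, datetime objects that are not cftime types are converted to numpy datetime64 values (via pd.Timestamp(v).to_datetime64()) before deduplication, so they are compared in a single representation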
- If pandas is not available, the function falls back to a numpy-based implementation that may be slower but still preserves order
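The datetime handling noted above can be sketched outside the function. The example below applies the same `pd.Timestamp(v).to_datetime64()` conversion to a made-up list of `datetime` objects. Unlike the source, which passes a plain list to `pd.unique`, this sketch wraps the values in a NumPy array first; that is an assumption made here to stay on the ndarray input that `pd.unique` accepts across pandas versions.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Illustrative input with one repeated date
raw = [datetime(2023, 1, 1), datetime(2023, 1, 2), datetime(2023, 1, 1)]

# Same per-value conversion as in unique_array's pandas branch
normalized = [pd.Timestamp(v).to_datetime64() for v in raw]

# Assumption: convert to an ndarray before calling pd.unique
unique_values = pd.unique(np.array(normalized))

print(len(unique_values))  # 2
```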
Similar Components
AI-powered semantic similarity - components with related functionality:
- function unique_iterator 46.2% similar
- function unique_zip 44.8% similar
- function asarray 42.4% similar
- function arglexsort 41.0% similar
- function get_unique_keys 40.5% similar