Find column whose name contains a specific string

ghz 3months ago ⋅ 32 views

I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always continuous).

I want the column name to be returned as a string or a variable, so I access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?

Answers

To find a column name that contains a certain substring (e.g., 'spike') but does not exactly match it, you can use the .str.contains() method in combination with df.columns, which contains the column names of your DataFrame. Here’s how you can do it:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'spike-2': [1, 2, 3],
    'hey spike': [4, 5, 6],
    'spiked-in': [7, 8, 9],
    'other': [10, 11, 12]
})

# Search for column names that contain 'spike'
matching_columns = df.columns[df.columns.str.contains('spike')]

# If you want to access the first matched column
if not matching_columns.empty:
    name = matching_columns[0]
    print(name)
    print(df[name])

Explanation:

  1. df.columns provides an Index object with the column names.
  2. df.columns.str.contains('spike') uses a regular expression to find column names that contain 'spike'. This returns a boolean array.
  3. Using df.columns[boolean_array] filters out the column names that contain 'spike'.
  4. matching_columns[0] accesses the first matched column name.

Output:

spike-2
0    1
1    2
2    3
Name: spike-2, dtype: int64

This way, you can store the matched column name in a variable (name) and use it to access the column later.