I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I'm searching for 'spike'
in column names like 'spike-2'
, 'hey spike'
, 'spiked-in'
(the 'spike'
part is always continuous).
I want the column name to be returned as a string or a variable, so I access the column later with df['name']
or df[name]
as normal. I've tried to find ways to do this, to no avail. Any tips?
Answers
To find a column name that contains a certain substring (e.g., 'spike'
) but does not exactly match it, you can use the .str.contains()
method in combination with df.columns
, which contains the column names of your DataFrame. Here’s how you can do it:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
'spike-2': [1, 2, 3],
'hey spike': [4, 5, 6],
'spiked-in': [7, 8, 9],
'other': [10, 11, 12]
})
# Search for column names that contain 'spike'
matching_columns = df.columns[df.columns.str.contains('spike')]
# If you want to access the first matched column
if not matching_columns.empty:
name = matching_columns[0]
print(name)
print(df[name])
Explanation:
df.columns
provides an Index object with the column names.df.columns.str.contains('spike')
uses a regular expression to find column names that contain'spike'
. This returns a boolean array.- Using
df.columns[boolean_array]
filters out the column names that contain'spike'
. matching_columns[0]
accesses the first matched column name.
Output:
spike-2
0 1
1 2
2 3
Name: spike-2, dtype: int64
This way, you can store the matched column name in a variable (name
) and use it to access the column later.