How to replace value in specific index in each row with corresponding value in numpy array
My dataframe looks like this:
    datetime1  datetime2  datetime3  datetime4
id
1           5          6          5          5
2           7          2          3          5
3           4          2          3          2
4           6          4          4          7
5           7          3          8          9
and I have a numpy array like this:
index_arr = [3, 2, 0, 1, 2]
This numpy array refers to the index in each row, respectively, that I want to replace. The values I want to use in the replacement are in another numpy array:
replace_arr = [14, 12, 23, 17, 15]
so that the updated dataframe looks like this:
    datetime1  datetime2  datetime3  datetime4
id
1           5          6          5         14
2           7          2         12          5
3          23          2          3          2
4           6         17          4          7
5           7          3         15          9
What is the best way to go about doing this replacement quickly? I've tried using enumerate and iterrows but couldn't get the syntax to work. Would appreciate any help - thank you
Answer
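To efficiently perform the replacement based on the index_arr and replace_arr arrays, you can use NumPy integer-array indexing on the DataFrame's underlying values, which pairs each row position with its own column position. Note that pandas' .iloc[] is not suitable for this on its own: when given two lists, it selects every listed row crossed with every listed column rather than one cell per row.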
Here's a step-by-step solution to perform the replacement based on the provided arrays:
Solution:
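import pandas as pd
import numpy as np
# Your initial DataFrame (the index is named 'id' to match the question)
df = pd.DataFrame({
'datetime1': [5, 7, 4, 6, 7],
'datetime2': [6, 2, 2, 4, 3],
'datetime3': [5, 3, 3, 4, 8],
'datetime4': [5, 5, 2, 7, 9]
}, index=pd.Index([1, 2, 3, 4, 5], name='id'))
# Your numpy arrays
index_arr = np.array([3, 2, 0, 1, 2]) # Column position to replace in each row
replace_arr = np.array([14, 12, 23, 17, 15]) # New values to put at those positions
# Work on a NumPy copy: arr[rows, cols] pairs each row with its own column,
# whereas df.iloc[rows, cols] would select the full rows-by-columns block
values = df.to_numpy(copy=True)
values[np.arange(len(df)), index_arr] = replace_arr
# Write the modified values back into the DataFrame
df.iloc[:, :] = values
# Output the updated DataFrame
print(df)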
Explanation:
- np.arange(len(df)): This creates an array of row positions, one per row of the DataFrame. In your case, it is [0, 1, 2, 3, 4] (for the rows with index labels 1, 2, 3, 4, 5).
- index_arr: This array holds, for each row, the column position where you want to replace the value. For example, in the first row (position 0 in df, label 1), you want to replace the value in the 4th column (position 3, corresponding to datetime4).
- replace_arr: This array holds the new values to be placed at the specified positions.
- [np.arange(len(df)), index_arr]: Indexing the NumPy array of the DataFrame's values with both arrays pairs them element by element, selecting one cell per row: (0, 3), (1, 2), (2, 0), (3, 1), (4, 2). The new values from replace_arr are then assigned to those cells. Passing the same two arrays to df.iloc[] does not pair them; it selects every listed row crossed with every listed column, so the assignment has to be done on the array returned by df.to_numpy() and then written back to the DataFrame.
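To make the difference between paired and block selection concrete, here is a minimal sketch on a small made-up array (the names arr, rows, cols, paired and block are only for illustration and are not part of the question):

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns: values 0..11
rows = np.arange(3)                 # row positions 0, 1, 2
cols = np.array([3, 0, 2])          # one column position per row

# NumPy integer-array indexing pairs the two arrays element by element,
# so this picks the cells (0, 3), (1, 0) and (2, 2).
paired = arr[rows, cols]

# .iloc with two position arrays selects every listed row crossed with
# every listed column, which gives a 3x3 block instead of 3 cells.
block = pd.DataFrame(arr).iloc[rows, cols]

print(paired.shape)  # one value per row
print(block.shape)   # a full rows-by-columns block
```

This is why assigning a length-5 array through .iloc[] with two arrays does not place one value per row.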
Result:
The updated DataFrame will look like this:
    datetime1  datetime2  datetime3  datetime4
id
1           5          6          5         14
2           7          2         12          5
3          23          2          3          2
4           6         17          4          7
5           7          3         15          9
Key Points:
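- Pairing row positions with column positions is NumPy integer-array indexing; .iloc[] with two lists selects the whole block of those rows and columns instead, so the assignment is done on the NumPy values and written back to the DataFrame.
- The approach avoids looping or iterating through rows manually, which makes it much faster on large DataFrames.
- This solution works well for replacing one value per row. It assumes all columns share a single dtype, so that converting to a NumPy array and back does not change the data.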
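Since you mentioned trying enumerate, here is a rough sketch that wraps the vectorized replacement in a helper and cross-checks it against an explicit loop. The function name replace_per_row is hypothetical (not a pandas API), and the sketch assumes all columns share one dtype:

```python
import numpy as np
import pandas as pd

def replace_per_row(df, col_positions, new_values):
    """Return a copy of df with one cell per row replaced (hypothetical helper)."""
    col_positions = np.asarray(col_positions)
    new_values = np.asarray(new_values)
    if not (len(df) == len(col_positions) == len(new_values)):
        raise ValueError("need exactly one column position and one value per row")
    values = df.to_numpy(copy=True)  # copy so the input DataFrame is untouched
    # Pair row i with column col_positions[i] and assign new_values[i]
    values[np.arange(len(df)), col_positions] = new_values
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = pd.DataFrame({
    'datetime1': [5, 7, 4, 6, 7],
    'datetime2': [6, 2, 2, 4, 3],
    'datetime3': [5, 3, 3, 4, 8],
    'datetime4': [5, 5, 2, 7, 9],
}, index=pd.Index([1, 2, 3, 4, 5], name='id'))
index_arr = [3, 2, 0, 1, 2]
replace_arr = [14, 12, 23, 17, 15]

updated = replace_per_row(df, index_arr, replace_arr)

# Slow but explicit reference: set one cell at a time by position with .iat
expected = df.copy()
for row, (col, val) in enumerate(zip(index_arr, replace_arr)):
    expected.iat[row, col] = val

print(updated.equals(expected))
```

The loop is fine for a handful of rows and is handy for checking the result; the vectorized helper is the one to use when the DataFrame is large.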