ghz 3 days ago ⋅ 2 views

How to replace value in specific index in each row with corresponding value in numpy array

My dataframe looks like this:

     datetime1 datetime2 datetime3 datetime4
id                                      
1    5          6         5         5   
2    7          2         3         5  
3    4          2         3         2 
4    6          4         4         7 
5    7          3         8         9 

and I have a numpy array like this:

index_arr = [3, 2, 0, 1, 2]

This numpy array refers to the index in each row, respectively, that I want to replace. The values I want to use in the replacement are in another numpy array:

replace_arr = [14, 12, 23, 17, 15]

so that the updated dataframe looks like this:

     datetime1 datetime2 datetime3 datetime4
id                                      
1    5          6         5         14   
2    7          2         12        5  
3    23         2         3         2 
4    6          17        4         7 
5    7          3         15        9 

What is the best way to go about doing this replacement quickly? I've tried using enumerate and iterrows but couldn't get the syntax to work. Would appreciate any help - thank you

Answer

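To perform the replacement efficiently, use NumPy integer-array indexing on the DataFrame's underlying values: indexing with a row array and a column array of equal length pairs them element by element, so each row gets exactly one cell replaced. Note that pandas' .iloc[] behaves differently here. Given two arrays, it selects every combination of the listed rows and columns rather than one cell per row, so it cannot do this pointwise replacement directly.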

Here's a step-by-step solution to perform the replacement based on the provided arrays:

Solution:

import pandas as pd
import numpy as np

# Your initial DataFrame
df = pd.DataFrame({
    'datetime1': [5, 7, 4, 6, 7],
    'datetime2': [6, 2, 2, 4, 3],
    'datetime3': [5, 3, 3, 4, 8],
    'datetime4': [5, 5, 2, 7, 9]
}, index=pd.Index([1, 2, 3, 4, 5], name='id'))

# Your numpy arrays
index_arr = np.array([3, 2, 0, 1, 2])  # Indices to replace
replace_arr = np.array([14, 12, 23, 17, 15])  # New values to put at those indices

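# Replace the values. Passing two arrays to .iloc selects every row/column
# combination, so do the pointwise assignment on the underlying NumPy array
values = df.to_numpy(copy=True)
values[np.arange(len(df)), index_arr] = replace_arr
df.iloc[:, :] = values  # write the modified values back into df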

# Output the updated DataFrame
print(df)

Explanation:

  1. np.arange(len(df)): This creates an array of row indices that corresponds to the rows of the DataFrame. In your case, it will be [0, 1, 2, 3, 4] (for the rows with indices 1, 2, 3, 4, 5).

  2. index_arr: This array holds, for each row, the column position of the value to replace. For example, in the first row (label 1, position 0), you want to replace the value in the 4th column (position 3, corresponding to datetime4).

  3. replace_arr: This array holds the new values to be placed at the specified indices.

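  4. Pointwise indexing: Indexing a NumPy array (for example, the one returned by df.to_numpy()) with np.arange(len(df)) and index_arr pairs the two arrays element by element, selecting one cell per row; assigning replace_arr to that selection writes each new value into its cell. Passing the same two arrays to df.iloc[] would instead select every combination of those rows and columns, which is why the assignment goes through the array.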
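
To make the difference between pairing indices one-to-one and selecting every row/column combination concrete, here is a minimal sketch on a plain NumPy array holding the same numbers as the DataFrame. The names rows, picked and block are illustrative only and are not part of the question.

```python
import numpy as np

# Same numbers as the DataFrame in the question, as a plain 2D array
values = np.array([
    [5, 6, 5, 5],
    [7, 2, 3, 5],
    [4, 2, 3, 2],
    [6, 4, 4, 7],
    [7, 3, 8, 9],
])
index_arr = np.array([3, 2, 0, 1, 2])
replace_arr = np.array([14, 12, 23, 17, 15])

rows = np.arange(values.shape[0])

# Pointwise: rows[i] is paired with index_arr[i], giving one cell per row
picked = values[rows, index_arr]         # shape (5,)

# Outer: np.ix_ selects every combination of the given rows and columns
block = values[np.ix_(rows, index_arr)]  # shape (5, 5)

# Assigning through the pointwise selection replaces exactly one cell per row
values[rows, index_arr] = replace_arr
```

The pointwise form is the one the replacement needs; the outer form is shown only for contrast.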

Result:

The updated DataFrame will look like this:

   datetime1  datetime2  datetime3  datetime4
id                                           
1          5          6          5         14
2          7          2         12          5
3         23          2          3          2
4          6         17          4          7
5          7          3         15          9

Key Points:

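  • iloc[] given two arrays selects every combination of the listed rows and columns, so the one-cell-per-row assignment should be done with NumPy integer-array indexing on the underlying values.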
  • The approach avoids the need for looping or iterating through rows manually, making it much faster and more efficient.
  • This solution works well for replacing multiple values across rows using a set of indices.
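
Since the question mentions trying enumerate, here is a sketch of the equivalent explicit loop, which may be easier to read or debug on small frames. It assumes the same df, index_arr and replace_arr as in the question and uses .iat, pandas' position-based accessor for a single cell; the loop variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'datetime1': [5, 7, 4, 6, 7],
    'datetime2': [6, 2, 2, 4, 3],
    'datetime3': [5, 3, 3, 4, 8],
    'datetime4': [5, 5, 2, 7, 9]
}, index=[1, 2, 3, 4, 5])

index_arr = np.array([3, 2, 0, 1, 2])
replace_arr = np.array([14, 12, 23, 17, 15])

# Walk the rows by position; each row gets one (column position, new value) pair
for row_pos, (col_pos, new_value) in enumerate(zip(index_arr, replace_arr)):
    df.iat[row_pos, col_pos] = new_value  # set a single cell by integer position
```

This does one Python-level assignment per row, so it is likely to be noticeably slower than the array-based version on large DataFrames.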