Replace list of id in cell by matching value id another datafram

ghz 14hours ago ⋅ 7 views

Replace list of id in cell by matching value id another dataframe (pandas)

I have two dataframes, with one containing the corresponding value of the id and another one with the lists of id in each rows.

How can I replace the list of id by the matching values of the other dataframes ?

df_1 :

|      Id             |        Names     |     
| ------------------- | ---------------- |      
|          1          |         Name1    |     
|          2          |         Name2    |     
|          3          |         Name3    |     

df_2 :

|      Id_lists       |  
| ------------------- |
|          [1]        |   
|          [2,3,1]    |        
|          [1,3 ]     |  

To create the dataframe in my exemple:

data = [[1, 'Name1'], [2, 'Name2'], [3,'Name3']]
data_2 = [[[1]],[[2,3,1]],[[1,3]]]
df_1 = pd.DataFrame(data, columns = ['Id', 'Names'])
df_2 = pd.DataFrame(data_2, columns = ['Id_lists'])

Answer

To replace the list of IDs in df_2 with the corresponding values from df_1, we can approach the problem by using the apply function in pandas. Here's how you can do it:

Steps:

  1. Iterate over each row in df_2 (which contains lists of Ids).
  2. For each list of IDs, look up the corresponding Names from df_1.
  3. Replace the list of Ids in df_2 with the corresponding Names.

You can achieve this by using a dictionary for fast lookups from df_1 and then applying a function to transform the Id_lists in df_2.

Code Example:

import pandas as pd

# Sample DataFrames as per your example
data = [[1, 'Name1'], [2, 'Name2'], [3, 'Name3']]
data_2 = [[[1]], [[2, 3, 1]], [[1, 3]]]

df_1 = pd.DataFrame(data, columns=['Id', 'Names'])
df_2 = pd.DataFrame(data_2, columns=['Id_lists'])

# Create a dictionary for quick lookup
id_to_name = dict(zip(df_1['Id'], df_1['Names']))

# Define a function to replace Ids with Names
def replace_ids_with_names(id_list):
    return [id_to_name[i] for i in id_list]

# Apply the function to the 'Id_lists' column in df_2
df_2['Names_lists'] = df_2['Id_lists'].apply(replace_ids_with_names)

# Display the result
print(df_2)

Output:

     Id_lists        Names_lists
0         [1]            [Name1]
1    [2, 3, 1]    [Name2, Name3, Name1]
2       [1, 3]       [Name1, Name3]

Explanation:

  1. Step 1: We create a dictionary id_to_name from df_1 using zip(df_1['Id'], df_1['Names']). This allows us to quickly map each Id to its corresponding Name.

  2. Step 2: We define the replace_ids_with_names function that takes a list of Ids and returns the corresponding list of Names.

  3. Step 3: We use apply(replace_ids_with_names) on the Id_lists column of df_2 to perform the replacement for each list of IDs.

  4. Step 4: The result is stored in a new column Names_lists in df_2.

Now, df_2 contains the list of names instead of the list of IDs.