Replace list of id in cell by matching value id another dataframe (pandas)
I have two dataframes, with one containing the corresponding value of the id and another one with the lists of id in each rows.
How can I replace the list of id by the matching values of the other dataframes ?
df_1 :
| Id | Names |
| ------------------- | ---------------- |
| 1 | Name1 |
| 2 | Name2 |
| 3 | Name3 |
df_2 :
| Id_lists |
| ------------------- |
| [1] |
| [2,3,1] |
| [1,3 ] |
To create the dataframe in my exemple:
data = [[1, 'Name1'], [2, 'Name2'], [3,'Name3']]
data_2 = [[[1]],[[2,3,1]],[[1,3]]]
df_1 = pd.DataFrame(data, columns = ['Id', 'Names'])
df_2 = pd.DataFrame(data_2, columns = ['Id_lists'])
Answer
To replace the list of IDs in df_2
with the corresponding values from df_1
, we can approach the problem by using the apply
function in pandas. Here's how you can do it:
Steps:
- Iterate over each row in
df_2
(which contains lists ofId
s). - For each list of IDs, look up the corresponding
Names
fromdf_1
. - Replace the list of
Ids
indf_2
with the correspondingNames
.
You can achieve this by using a dictionary for fast lookups from df_1
and then applying a function to transform the Id_lists
in df_2
.
Code Example:
import pandas as pd
# Sample DataFrames as per your example
data = [[1, 'Name1'], [2, 'Name2'], [3, 'Name3']]
data_2 = [[[1]], [[2, 3, 1]], [[1, 3]]]
df_1 = pd.DataFrame(data, columns=['Id', 'Names'])
df_2 = pd.DataFrame(data_2, columns=['Id_lists'])
# Create a dictionary for quick lookup
id_to_name = dict(zip(df_1['Id'], df_1['Names']))
# Define a function to replace Ids with Names
def replace_ids_with_names(id_list):
return [id_to_name[i] for i in id_list]
# Apply the function to the 'Id_lists' column in df_2
df_2['Names_lists'] = df_2['Id_lists'].apply(replace_ids_with_names)
# Display the result
print(df_2)
Output:
Id_lists Names_lists
0 [1] [Name1]
1 [2, 3, 1] [Name2, Name3, Name1]
2 [1, 3] [Name1, Name3]
Explanation:
-
Step 1: We create a dictionary
id_to_name
fromdf_1
usingzip(df_1['Id'], df_1['Names'])
. This allows us to quickly map eachId
to its correspondingName
. -
Step 2: We define the
replace_ids_with_names
function that takes a list ofId
s and returns the corresponding list ofNames
. -
Step 3: We use
apply(replace_ids_with_names)
on theId_lists
column ofdf_2
to perform the replacement for each list of IDs. -
Step 4: The result is stored in a new column
Names_lists
indf_2
.
Now, df_2
contains the list of names instead of the list of IDs.