ghz yesterday ⋅ 4 views

How to create (or change) the dimensions of an array/list using numpy or pandas?

I'm trying to create a pandas DataFrame to collect everything I have but I'm having difficulty combining numpy arrays in a list to create a single array.

Suppose I have the following data:

df0 = pd.DataFrame([[1,2],[2,2],[3,1],[4,4],[5,4]], columns = ['A','B'])

switch = [[1,3,4],[2,5]]

collect = []
for lists in switch:
    mask = df0.A.isin(lists)
    avg = df0[mask].mean().round(2)
    collect.append(avg)
    collect.append((avg[0]**2+avg[1]+2).round(2))

This produces the following output:

[A    2.67
 B    2.33
 dtype: float64,
 11.46,
 A    3.5
 B    3.0
 dtype: float64,
 17.25]

However, I want the following output:

 A     B      C
2.67  2.33  11.46
3.5   3.0   17.25

but I can't create a 2x3 matrix because len(collect) is 4. I think I'm not using .append the right way in the for-loop. How do I create an array (or a list) such that len(collect) is either 2 or 6? My thinking is that if it's of length 2, we can simply transpose collect, or if it's of length 6, we can reshape it.

Answer

To create the desired DataFrame from your collect list, the key is to organize the data into a consistent structure that can be directly converted into a 2x3 DataFrame.

Your loop appends two different things on each pass: a pandas Series (avg, which carries the index labels A and B) and a scalar. After two passes, collect holds four items of mixed type, so it can't be reshaped into a 2x3 matrix. Instead, each item in collect should be a list of three values that corresponds to one row of the result.
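To see the mismatch concretely, the sketch below reruns the loop from the question and checks which elements of collect are Series. It uses label-based access (avg['A']) in place of avg[0], which is an assumption made here to sidestep positional-indexing deprecation warnings in recent pandas versions; the name is_series is illustrative only.

```python
import pandas as pd

df0 = pd.DataFrame([[1, 2], [2, 2], [3, 1], [4, 4], [5, 4]], columns=['A', 'B'])
switch = [[1, 3, 4], [2, 5]]

collect = []
for lists in switch:
    avg = df0[df0.A.isin(lists)].mean().round(2)
    collect.append(avg)                                        # a 2-element Series
    collect.append((avg['A']**2 + avg['B'] + 2).round(2))      # a single scalar

# Series and scalars alternate, so there is no rectangular shape to recover.
is_series = [isinstance(item, pd.Series) for item in collect]
print(len(collect), is_series)   # 4 [True, False, True, False]
```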

Here's a step-by-step solution:

  1. Create a list of lists: Instead of appending individual values (i.e., pandas Series and scalars) to collect, create lists for each iteration in the loop and append them to collect.

  2. Ensure consistent dimensions: Give every inner list the same three values in the same order (A, B, C), so the list of lists converts straight into a pandas DataFrame with no reshaping or transposing.

Here’s how you can modify your code:

import pandas as pd

# Sample DataFrame
df0 = pd.DataFrame([[1,2],[2,2],[3,1],[4,4],[5,4]], columns = ['A','B'])

# Switch list containing lists of values
switch = [[1,3,4], [2,5]]

collect = []

# Iterate over the switch list
for lists in switch:
    # Mask the rows where column 'A' has values in the current list from switch
    mask = df0.A.isin(lists)
    
    # Calculate the mean for the filtered rows
    avg = df0[mask].mean().round(2)
    
    # Build one row: mean of A, mean of B, and the derived value C.
    # Label-based access (avg['A']) avoids the deprecated positional avg[0].
    collect.append([avg['A'], avg['B'], (avg['A']**2 + avg['B'] + 2).round(2)])

# Create DataFrame directly from the collect list
result = pd.DataFrame(collect, columns=['A', 'B', 'C'])

print(result)

Explanation:

  1. Masking the rows: df0.A.isin(lists) builds a boolean mask that keeps the rows of df0 whose A value appears in the current sub-list of switch. We then take the mean of columns A and B over those rows and round to two decimals.

  2. Appending values to collect: Instead of appending the avg Series and the scalar result separately, we append a single list per iteration holding the mean of A, the mean of B, and the derived value C. This makes collect a list of lists, one inner list per row, which converts directly into a DataFrame.

  3. Creating the DataFrame: Finally, we create a pandas DataFrame from the collect list, specifying column names 'A', 'B', and 'C'.

Output:

      A     B      C
0  2.67  2.33  11.46
1  3.50  3.00  17.25

Key points:

  • We ensure that each item in collect is a list of values, where each list corresponds to a row.
  • The DataFrame can now be created directly from the collect list, without needing reshaping or transposing.
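The question also floats two other routes: ending up with a collect of length 2 or of length 6. Both can be made to work. The sketch below shows one possible way to do each, as an alternative rather than a replacement for the list-of-lists approach; the names rows, flat, by_rows, and by_reshape are illustrative and do not come from the original code.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame([[1, 2], [2, 2], [3, 1], [4, 4], [5, 4]], columns=['A', 'B'])
switch = [[1, 3, 4], [2, 5]]

rows = []   # length-2 route: one Series per group
flat = []   # length-6 route: three scalars per group

for lists in switch:
    avg = df0[df0.A.isin(lists)].mean().round(2)
    c = (avg['A']**2 + avg['B'] + 2).round(2)

    # Length-2 route: add C to a copy of the Series, then append the Series.
    row = avg.copy()
    row['C'] = c
    rows.append(row)

    # Length-6 route: extend (not append) so the values stay flat.
    flat.extend([avg['A'], avg['B'], c])

# A list of Series becomes one row per Series, so no transpose is needed.
by_rows = pd.DataFrame(rows).reset_index(drop=True)

# A flat list of 6 values is reshaped into 2 rows of 3 columns.
by_reshape = pd.DataFrame(np.array(flat).reshape(-1, 3), columns=['A', 'B', 'C'])

print(by_rows)
print(by_reshape)
```

Both frames should hold the same values as the result above. The list-of-lists version stays the shortest of the three, since it needs neither a Series copy nor a reshape.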