ghz 9 hours ago ⋅ 2 views

How to convert numeric strings such as 200.13K and 1.2M to integer using pandas?

There is a column "Vol" in my df whose values end with K and M for thousands and millions respectively. These values have dtype "object", and I need to convert them to a numeric type ("double").

Example for the column       what I need it to look like

Vol                          Result
920.81K                      920810
1.28M                        1280000
2.19M                        2190000
443.66K                      443660
682.81K                      682810

Answer

To convert the "Vol" column from strings ending in "K" (thousands) or "M" (millions) to numbers, pass a small conversion function to the column's apply method. The function needs to:

  1. Check whether the value ends with "K" or "M".
  2. Strip that suffix and parse the remaining text as a float.
  3. Multiply the parsed number by the matching factor (1,000 for "K", 1,000,000 for "M").

Here's how you can do this:

import pandas as pd

# Sample data
data = {
    'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']
}
df = pd.DataFrame(data)

# Function to convert Vol column
def convert_vol(vol):
    if vol.endswith('K'):
        return float(vol[:-1]) * 1000  # Remove 'K' and multiply by 1000
    elif vol.endswith('M'):
        return float(vol[:-1]) * 1000000  # Remove 'M' and multiply by 1,000,000
    return float(vol)  # In case there are any values without K or M

# Apply the function to the Vol column
df['Vol'] = df['Vol'].apply(convert_vol)

# Check the result
print(df)

Output:

        Vol
0   920810.0
1  1280000.0
2  2190000.0
3   443660.0
4   682810.0

Explanation:

  • The convert_vol function:

    • Checks whether the value ends with 'K' or 'M'.
    • Drops that last character and parses the rest as a float.
    • Multiplies the parsed number by the matching factor (1000 for 'K', 1000000 for 'M').
    • Returns a Python float, which is a 64-bit double; pandas stores the column as float64.
  • The apply method calls this function on each value in the "Vol" column.
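
For larger frames, the same steps can be written with pandas string methods instead of a Python-level function. This is only a minimal sketch of one alternative: the names parts, multipliers, and vol_numeric are illustrative, and the regular expression assumes plain digits with an optional decimal point followed by an optional uppercase K or M.

```python
import pandas as pd

df = pd.DataFrame({'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']})

# Split each string into its numeric part (column 0) and an optional
# K/M suffix (column 1). Rows that do not match the pattern become NaN.
parts = df['Vol'].str.extract(r'^\s*([\d.]+)\s*([KM]?)\s*$')

# Map each suffix to its multiplier; an empty suffix means "no scaling".
multipliers = parts[1].map({'K': 1_000, 'M': 1_000_000, '': 1})

# Parse the numeric part (unparseable text becomes NaN) and scale it.
vol_numeric = pd.to_numeric(parts[0], errors='coerce') * multipliers

print(vol_numeric)
```

Unlike convert_vol, this version yields NaN for strings it does not recognise rather than raising an exception, which may or may not be what you want.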

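Now the "Vol" column holds float64 values at their full magnitude (for example, 920810.0 rather than "920.81K"). If you want integers instead of floats, round and cast the converted column. Do not call apply(convert_vol) a second time on a column that has already been converted: its values are now floats, which have no endswith method, so the call raises an AttributeError.

df['Vol'] = df['Vol'].round().astype('int64')

Rounding first matters because astype truncates toward zero, and floating-point multiplication can leave a result a hair below the intended whole number.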
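
If exact integers matter, one option is to skip binary floating point and parse with the standard library's decimal module. The sketch below is an assumption-laden variant, not part of the answer above: convert_vol_exact and SUFFIX_FACTORS are hypothetical names, lowercase suffixes and surrounding whitespace are accepted as a convenience, and every input is assumed to be a non-empty string.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical lookup table: suffix -> multiplier.
SUFFIX_FACTORS = {'K': 1_000, 'M': 1_000_000}

def convert_vol_exact(vol):
    """Parse a string such as '920.81K' into an int without using floats."""
    vol = vol.strip()
    factor = SUFFIX_FACTORS.get(vol[-1:].upper())
    if factor is None:
        # No recognised suffix: treat the whole string as the number.
        return int(Decimal(vol))
    # Drop the suffix, scale exactly, then truncate any fractional part.
    return int(Decimal(vol[:-1]) * factor)

df = pd.DataFrame({'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']})
df['Vol'] = df['Vol'].apply(convert_vol_exact)

print(df)
```

Because Decimal arithmetic is exact for these inputs, no rounding step is needed. The int() call truncates, so a value such as '1.2345K' would become 1234.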