How to convert numeric strings such as 200.13K and 1.2M to integer using pandas?
There is a column "Vol" in my df whose values end with K and M, for thousands and millions respectively. These values have dtype "object", and I need to convert them to "double".
Example of the column and what I need the result to look like:
Vol Result
920.81K 920810
1.28M 1280000
2.19M 2190000
443.66K 443660
682.81K 682810
Answer
To convert the "Vol" column values from strings (ending with "K" and "M" for thousands and millions, respectively) to numerical values, you can use the apply function along with some custom logic to handle the conversion. You need to:
- Check if the value ends with "K" (for thousands) or "M" (for millions).
- Convert the number to the appropriate scale (multiply by 1000 for "K" and by 1,000,000 for "M").
- Convert the resulting value to a numeric type (float or int).
Here's how you can do this:
import pandas as pd

# Sample data
data = {
    'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']
}
df = pd.DataFrame(data)

# Function to convert Vol column
def convert_vol(vol):
    if vol.endswith('K'):
        return float(vol[:-1]) * 1000  # Remove 'K' and multiply by 1000
    elif vol.endswith('M'):
        return float(vol[:-1]) * 1000000  # Remove 'M' and multiply by 1,000,000
    return float(vol)  # In case there are any values without K or M

# Apply the function to the Vol column
df['Vol'] = df['Vol'].apply(convert_vol)

# Check the result
print(df)
Output:
         Vol
0   920810.0
1  1280000.0
2  2190000.0
3   443660.0
4   682810.0
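Since the question asks for a "double", it may help to confirm the dtype after converting. The following is a minimal check, assuming two of the sample values; the names `factors` and `converted` are illustrative only, and the suffix logic is a compact restatement of convert_vol rather than a different method.

```python
import pandas as pd

df = pd.DataFrame({'Vol': ['920.81K', '1.28M']})

# Same suffix logic as convert_vol, written inline so the check is self-contained
factors = {'K': 1_000, 'M': 1_000_000}
converted = df['Vol'].apply(lambda v: float(v[:-1]) * factors[v[-1]])

# float64 is pandas' name for a double-precision (C double) column
print(converted.dtype)
```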
Explanation:
- The convert_vol function:
  - Checks if the value ends with 'K' or 'M'.
  - Removes the last character (K or M).
  - Multiplies the remaining number by the appropriate factor (1000 for 'K', 1000000 for 'M').
  - Converts the result to a float (which is equivalent to a "double" in Python).
- The apply method is used to apply this conversion function to each value in the "Vol" column.
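For larger frames, the same idea can be sketched with pandas string methods instead of a Python-level function per row. This is one possible alternative, not the only way to do it: it assumes every value is a plain number followed by an optional uppercase K or M, and the column name `Vol_num` and the variables `parts`, `number`, and `multiplier` are hypothetical names chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']})

# Split each value into its numeric part (column 0) and optional K/M suffix (column 1)
parts = df['Vol'].str.extract(r'^([\d.]+)([KM]?)$')
number = parts[0].astype(float)

# Map the suffix to a multiplier; an empty suffix means no scaling
multiplier = parts[1].map({'K': 1_000, 'M': 1_000_000, '': 1})

df['Vol_num'] = number * multiplier
print(df)
```

Values that do not match the pattern come out as NaN rather than raising an error, so it is worth checking for missing values afterwards if the input may be messy.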
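Now the "Vol" column contains numerical values at full scale (for example, 920.81K becomes 920810.0). If you want the result as integers instead of floats, change the apply line as shown here. Rounding first is safer, because astype(int) truncates and floating-point multiplication can land slightly below the intended whole number:
df['Vol'] = df['Vol'].apply(convert_vol).round().astype(int)
This will give you integer values in the "Vol" column. Run it on the original string column, not on a column that has already been converted to floats.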
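Because the inputs are decimal strings, another option is to avoid binary floating point altogether and compute the integers exactly with Python's decimal module. The following is a rough sketch under the assumption that every value is a non-empty string with an optional trailing K or M; `vol_to_int` and `FACTORS` are hypothetical names, not part of pandas.

```python
from decimal import Decimal

import pandas as pd

# Exact decimal multipliers for each supported suffix
FACTORS = {'K': Decimal(1_000), 'M': Decimal(1_000_000)}

def vol_to_int(vol):
    """Convert a string such as '920.81K' to an int using exact decimal arithmetic."""
    vol = vol.strip()
    factor = FACTORS.get(vol[-1])
    if factor is None:
        return int(Decimal(vol))  # no suffix: parse the number as-is
    # int() drops any fractional part that remains after scaling
    return int(Decimal(vol[:-1]) * factor)

df = pd.DataFrame({'Vol': ['920.81K', '1.28M', '2.19M', '443.66K', '682.81K']})
df['Vol'] = df['Vol'].apply(vol_to_int)
print(df)
```

This trades some speed for exactness, which may matter only when the values carry more decimal places than a float multiplication reproduces cleanly.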