Question
I want to filter my dataframe with an `or` condition to keep rows with a particular column's values that are outside the range `[-0.25, 0.25]`. I tried:
df = df[(df['col'] < -0.25) or (df['col'] > 0.25)]
But I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Answer
The `or` and `and` Python operators require truth values. For pandas, these are considered ambiguous, so you should use the "bitwise" `|` (or) or `&` (and) operations:
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
These are overloaded for these kinds of data structures to yield the element-wise `or` or `and`.
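As a minimal sketch of the fix, the snippet below applies the `|` filter to a small made-up DataFrame. The values are illustrative and not from the question; only the column name `col` is. Because this particular range is symmetric around zero, comparing the absolute value gives the same mask.

```python
import pandas as pd

# Illustrative data; only the column name 'col' comes from the question.
df = pd.DataFrame({'col': [-0.5, -0.25, 0.0, 0.25, 0.3]})

# Each comparison yields a boolean Series; | combines them row by row.
mask = (df['col'] < -0.25) | (df['col'] > 0.25)
outside = df[mask]

# Equivalent here, because the range is symmetric around zero.
outside_abs = df[df['col'].abs() > 0.25]

print(outside['col'].tolist())  # [-0.5, 0.3]
```

The boundary values -0.25 and 0.25 are dropped because both comparisons are strict.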
Just to add some more explanation to this statement:
The exception is thrown when you want to get the `bool` of a `pandas.Series`:
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You hit a place where the operator implicitly converted the operands to `bool` (you used `or`, but it also happens for `and`, `if`, and `while`):
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Besides these four constructs, there are several Python functions that hide some `bool` calls (like `any`, `all`, `filter`, ...). These are normally not problematic with `pandas.Series`, but for completeness I wanted to mention these.
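A short sketch of why those built-ins usually work: they iterate over the Series and call `bool` on each scalar element, not on the Series as a whole. The sample values are made up for illustration.

```python
import pandas as pd

x = pd.Series([0, 1, 2])

# The built-ins iterate over the Series and test each scalar element,
# so bool() is never called on the Series itself.
has_truthy = any(x)
all_truthy = all(x)
truthy_items = [int(v) for v in filter(None, x)]

print(has_truthy, all_truthy, truthy_items)  # True False [1, 2]
```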
In your case, the exception isn't really helpful, because it doesn't mention the right alternatives. For `and` and `or`, if you want element-wise comparisons, you can use:
- `numpy.logical_or`:
  >>> import numpy as np
  >>> np.logical_or(x, y)
  or simply the `|` operator:
  >>> x | y
- `numpy.logical_and`:
  >>> np.logical_and(x, y)
  or simply the `&` operator:
  >>> x & y
If you're using the operators, then be sure to set your parentheses correctly because of [operator precedence](https://docs.python.org/reference/expressions.html#operator-precedence).
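To illustrate the precedence issue, here is a small sketch with made-up values. Since `|` binds more tightly than `<` and `>`, dropping the parentheses makes Python evaluate `s < (-0.25 | s) > 0.25`, which fails. The exact exception type may depend on the dtype and the pandas version, so the sketch catches both likely candidates.

```python
import pandas as pd

s = pd.Series([-0.5, 0.0, 0.3])

# With parentheses, each comparison is evaluated first,
# then the two boolean Series are combined element-wise.
ok = (s < -0.25) | (s > 0.25)

# Without parentheses, | is applied before the comparisons,
# so this is not the intended expression and raises an exception.
try:
    bad = s < -0.25 | s > 0.25
except (TypeError, ValueError) as exc:
    error_name = type(exc).__name__
else:
    error_name = None

print(ok.tolist())  # [True, False, True]
print(error_name)
```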
There are several logical NumPy functions which should work on `pandas.Series`.
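For instance, a sketch of four of those functions applied to two made-up boolean Series, alongside the operators they correspond to:

```python
import numpy as np
import pandas as pd

x = pd.Series([True, True, False, False])
y = pd.Series([True, False, True, False])

either = np.logical_or(x, y)        # same as x | y
both = np.logical_and(x, y)         # same as x & y
exactly_one = np.logical_xor(x, y)  # same as x ^ y
negated = np.logical_not(x)         # same as ~x

print(list(exactly_one))  # [False, True, True, False]
```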
The alternatives mentioned in the exception are more suited if you encountered it when doing `if` or `while`. I'll briefly explain each of these:
- If you want to check if your Series is empty:
  >>> x = pd.Series([])
  >>> x.empty
  True
  >>> x = pd.Series([1])
  >>> x.empty
  False
  Python normally interprets the `len`gth of containers (like `list`, `tuple`, ...) as truth-value if it has no explicit Boolean interpretation. So if you want the Python-like check, you could do: `if x.size` or `if not x.empty` instead of `if x`.
- If your `Series` contains one and only one Boolean value:
  >>> x = pd.Series([100])
  >>> (x > 50).bool()
  True
  >>> (x < 50).bool()
  False
- If you want to check the first and only item of your Series (like `.bool()`, but it works even for non-Boolean contents):
  >>> x = pd.Series([100])
  >>> x.item()
  100
- If you want to check if all or any item is not-zero, not-empty or not-False:
  >>> x = pd.Series([0, 1, 2])
  >>> x.all()   # because one element is zero
  False
  >>> x.any()   # because one (or more) elements are non-zero
  True
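Put together, these checks could replace a bare `if x` roughly as follows. The helper name `describe` and its labels are hypothetical, chosen only for illustration. `.bool()` is left out of the sketch because pandas 2.1 deprecated it; `.item()` covers the single-element case.

```python
import pandas as pd

def describe(s):
    """Return a short label for a Series using unambiguous truth checks."""
    if s.empty:                 # replaces `if not s`
        return 'empty'
    if len(s) == 1:             # .item() requires exactly one element
        return f'single value: {s.item()}'
    if s.all():                 # every element is truthy
        return 'all truthy'
    if s.any():                 # at least one element is truthy
        return 'some truthy'
    return 'none truthy'

print(describe(pd.Series([0, 1, 2])))  # some truthy
print(describe(pd.Series([100])))      # single value: 100
```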