This article is part of a series of practical guides for using the Python data processing library pandas. To view all the available parts, click here.
For many users starting out with pandas, a common and frustrating warning that pops up sooner or later is the following:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
To the uninitiated, it can be hard to know what it means or if it even matters. In this guide, we’ll walk through what the warning means, why you are seeing it, and what you can do to avoid it.
Reading in a dataset
If you don’t have a dataset you want to play around with, the University of California, Irvine has an excellent online repository of datasets. For this explainer we are going to use the Wine Quality dataset. If you want to follow along, you can import the dataset as follows:
import pandas as pd
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=';')
When it occurs
A typical scenario that leads to the SettingWithCopyWarning appearing is as follows:
df_slice = df[df["quality"] > 6]
# Some other stuff here...
df_slice['quality'] = 0
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Let’s understand what this code is trying to do. First we create a new DataFrame called df_slice, which is all the rows from df that have quality greater than 6. We do some other stuff to df_slice. Then we try to overwrite the quality column in df_slice with a new value: 0.
Why it occurs
It’s important to understand this message is a warning and not an error. If you run the above code, you should note that it does actually overwrite the quality column in df_slice with 0. This will always be the case, but if you see this warning, it may mean your code is also doing some other unexpected things.
At the highest level, the warning occurs due to the way Python handles variable assignments. When we assign a value to a variable, Python creates an object with the value, then makes the variable a reference (or pointer) to that object. When we then assign that variable to another variable, Python doesn’t create a new object; it just creates another pointer to the same object. Let’s look at an example to understand what this means practically:
x = [1, 2, 3, 4]
y = x
y[1] = 9
print(x)
[1, 9, 3, 4]
print(y)
[1, 9, 3, 4]
When we create a list and assign it to the variable x, then make a new variable y which is a “copy” of x, we are creating two pointers to the same list. When we update a value in y, as can be seen, we also update x.
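To make the aliasing explicit, here is a minimal sketch that uses Python’s is operator to confirm both names refer to one object, and contrasts that with an explicit copy. The name z is introduced here purely for illustration.

```python
x = [1, 2, 3, 4]
y = x          # y is a second name for the same list object
z = x.copy()   # z (illustrative name) is a new, independent list

y[1] = 9       # modifies the single list shared by x and y

print(y is x)  # True: both names point at one object
print(z is x)  # False: the copy is a separate object
print(x)       # [1, 9, 3, 4]
print(z)       # [1, 2, 3, 4]
```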
The same thing happens with DataFrames:
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x
y.at[0, "a"] = 9
print(x)
a b
0 9 5
1 2 6
2 3 7
3 4 8
print(y)
a b
0 9 5
1 2 6
2 3 7
3 4 8
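For contrast, the sketch below shows what changes when the second variable is created with the DataFrame copy method instead of plain assignment. The name z is an illustrative addition; everything else mirrors the example above.

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x          # alias: another name for the same DataFrame
z = x.copy()   # independent DataFrame (copy() is deep by default)

z.at[0, "a"] = 9     # only the copy is modified

print(y is x)        # True
print(x.at[0, "a"])  # 1
print(z.at[0, "a"])  # 9
```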
So what happens when we create a new variable that is a slice of (or selection from) a DataFrame? This is where it gets a little complicated: it depends on how we do the selection. Sometimes we get a new separate DataFrame object (a “copy”), sometimes we get a reference to a section of the original DataFrame (a “view”). Sometimes even pandas can’t be sure which one you are getting (it depends on the memory layout of the array).
This is the crux of the problem. When we try to set values on a DataFrame when pandas is not sure if it is a view or a copy, the behavior is unpredictable. This is why the SettingWithCopyWarning message appears. It’s also why pandas, in the warning, tries to recommend using loc with both row and column selectors: the behavior in that case is predictable.
However, sometimes you don’t want or need to select columns, and that is where it can get messy, as we will see in some of the examples below.
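One rough way to probe the view versus copy distinction is to check whether a selection’s data overlaps in memory with the original. The sketch below does this with NumPy’s shares_memory function. The helper shares_data is a hypothetical name, not part of pandas, and the result for the slice case is an assumption that may differ between pandas versions and settings. Shared memory is only a hint: it does not by itself guarantee that a write to the selection will show up in the original.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})


def shares_data(original, selection, column):
    """Hypothetical helper: True if the column's values overlap in memory."""
    return bool(
        np.shares_memory(original[column].to_numpy(), selection[column].to_numpy())
    )


by_slice = x.iloc[0:2]     # selection by a range of positions
by_list = x.iloc[[0, 1]]   # selection by a list of positions

slice_shares = shares_data(x, by_slice, "a")  # may depend on the pandas version
list_shares = shares_data(x, by_list, "a")    # a list selection builds new data

print("slice shares memory with x:", slice_shares)
print("list shares memory with x:", list_shares)
```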
Scenarios
Let’s test a few different ways we can select a slice of a DataFrame and then update some values:
loc with range of row indices
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x.loc[0:2]
y["a"] = 9
print(x)
a b
0 9 5
1 9 6
2 9 7
3 4 8
print(y)
a b
0 9 5
1 9 6
2 9 7
# VIEW
# Warning triggered
loc with list of row indices
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x.loc[[0, 1]]
y["a"] = 9
print(x)
a b
0 1 5
1 2 6
2 3 7
3 4 8
print(y)
a b
0 9 5
1 9 6
# COPY
# No warning
loc with a row filter
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x.loc[x["a"] > 2]
y["a"] = 9
print(x)
a b
0 1 5
1 2 6
2 3 7
3 4 8
print(y)
a b
2 9 7
3 9 8
# COPY
# Warning triggered
Row filter without loc
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x[x["a"] > 2]
y["a"] = 9
print(x)
a b
0 1 5
1 2 6
2 3 7
3 4 8
print(y)
a b
2 9 7
3 9 8
# COPY
# Warning triggered
loc with a row filter and column selection
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
y = x.loc[x["a"] > 2, ["a"]]
y["a"] = 9
print(x)
a b
0 1 5
1 2 6
2 3 7
3 4 8
print(y)
a
2 9
3 9
# COPY
# No warning
iloc with range of row indices
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["5", "6", "7", "8"]})
y = x.iloc[0:2]
y["a"] = 9
print(x)
a b
0 9 5
1 9 6
2 3 7
3 4 8
print(y)
a b
0 9 5
1 9 6
# VIEW
# Warning triggered
iloc with a list of rows
x = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["5", "6", "7", "8"]})
y = x.iloc[[0, 1]]
y["a"] = 9
print(x)
a b
0 1 5
1 2 6
2 3 7
3 4 8
print(y)
a b
0 9 5
1 9 6
# COPY
# Warning triggered
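Because these outcomes can differ between pandas versions, it may be worth re-running the scenarios on your own installation. The following is a minimal sketch of one way to do that: run_scenario and the scenarios dictionary are hypothetical names introduced for illustration, and the warning is matched by class name on the assumption that it contains “SettingWithCopy”.

```python
import warnings

import pandas as pd


def run_scenario(select):
    """Hypothetical helper: select from a fresh frame, write to it, report results."""
    x = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning raised inside
        y = select(x)
        y["a"] = 9
    warned = any("SettingWithCopy" in w.category.__name__ for w in caught)
    original_changed = x["a"].tolist() != [1, 2, 3, 4]
    return {"warned": warned, "original_changed": original_changed}


scenarios = {
    "loc slice": lambda d: d.loc[0:2],
    "loc list": lambda d: d.loc[[0, 1]],
    "loc filter": lambda d: d.loc[d["a"] > 2],
    "plain filter": lambda d: d[d["a"] > 2],
    "iloc slice": lambda d: d.iloc[0:2],
    "iloc list": lambda d: d.iloc[[0, 1]],
}

results = {name: run_scenario(select) for name, select in scenarios.items()}
for name, outcome in results.items():
    print(name, outcome)
```

The printed values are deliberately not listed here, since whether a warning appears and whether the original changes are exactly the things that vary by version.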
Further Reading
For a more detailed explanation of why the SettingWithCopyWarning message appears and which selection methods return a copy versus a view, I highly recommend reading the official pandas documentation on this topic.
How to avoid it
There is a hint in the warning itself that is worth understanding: “Try using .loc[row_indexer,col_indexer] = value instead”. Let’s look at our first example again:
df_slice = df[df["quality"] > 6]
df_slice['quality'] = 0
This seems like it is perfectly reasonable code, but let’s remove the intermediate df_slice to better see why pandas doesn’t like it:
df[df["quality"] > 6]["quality"] = 0
This code still runs, but if you have read the filtering and segmenting guide, you will notice this looks kind of… wrong. That’s because we are chaining a row filter with a column selection, so the assignment lands on the temporary copy produced by the row filter rather than on df itself. The alternative is to combine both into one loc statement:
df.loc[df["quality"] > 6, "quality"] = 0
This is what the hint is trying to tell us. It’s not immediately obvious because it was split over two lines, and in practice those two lines might have a lot of other lines of code in between them, but, in effect, we are chaining a row selector and a column selector.
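The difference between the two forms can be checked on a small frame. The sketch below uses a made-up toy DataFrame as a stand-in for the wine data (the values are illustrative, not taken from the dataset) and silences warnings only for the chained line so the demo output stays clean.

```python
import warnings

import pandas as pd

# Toy stand-in for the wine data; these values are made up for illustration.
df = pd.DataFrame({"quality": [5, 7, 8, 6], "pH": [3.5, 3.2, 3.3, 3.4]})

# Chained version: the row filter produces a temporary object, and the
# assignment is applied to that temporary, not to df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning just for this demo
    df[df["quality"] > 6]["quality"] = 0
after_chained = df["quality"].tolist()

# Single loc call: rows and column are selected in one step on df itself.
df.loc[df["quality"] > 6, "quality"] = 0
after_loc = df["quality"].tolist()

print(after_chained)  # [5, 7, 8, 6]
print(after_loc)      # [5, 0, 0, 6]
```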
Solution for 95%+ of cases
In the vast majority of cases where people are seeing the SettingWithCopyWarning, it is the scenario that we looked at initially:
df_slice = df[df["quality"] > 6]
# Some other stuff here...
df_slice['quality'] = 0
You make a new DataFrame, df_slice, as a selection of rows from a bigger DataFrame df. You then update some of the values in df_slice.
Annoyingly, the solution pandas hints at (using loc) doesn’t really work here because you don’t want to update the original DataFrame df. df_slice is your working DataFrame now. So what to do?
In this case, we can help pandas by removing the ambiguity and explicitly making df_slice a copy:
df_slice = df[df["quality"] > 6].copy()
df_slice['quality'] = 0
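As a quick sanity check of this approach, the sketch below repeats it on a made-up toy DataFrame (again an illustrative stand-in for the wine data) and records any warnings raised along the way. Matching the warning by class name is an assumption made so the check does not depend on a particular pandas version.

```python
import warnings

import pandas as pd

# Toy stand-in for the wine data; these values are made up for illustration.
df = pd.DataFrame({"quality": [5, 7, 8, 6], "pH": [3.5, 3.2, 3.3, 3.4]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised inside
    df_slice = df[df["quality"] > 6].copy()  # explicit, independent copy
    df_slice["quality"] = 0

copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]

print(len(copy_warnings))            # 0
print(df["quality"].tolist())        # [5, 7, 8, 6]
print(df_slice["quality"].tolist())  # [0, 0]
```

The original df is untouched and df_slice carries the new values, which is the outcome this scenario is after.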
Wrapping up
For people starting out with pandas, the SettingWithCopyWarning can be a confusing and frustrating warning to encounter. Making it more frustrating is that there are plenty of overly complex and confusing explanations online that probably don’t help you identify if you have done something wrong, or how to fix it. Hopefully this guide has given you a more practical explanation of why you are getting the warning and how to address it.