What's the difference between select_related and prefetch_related in Django ORM?
In Django doc:
select_related()
"follows" foreign-key relationships, selecting additional related-object data when it executes its query.
prefetch_related()
does a separate lookup for each relationship, and does the "joining" in Python.
What does it mean by "doing the joining in python"? Can someone illustrate with an example?
My understanding is that for foreign key relationship, use select_related
; and for M2M relationship, use prefetch_related
. Is this correct?
Answers
In Django, the concepts of select_related()
and prefetch_related()
both deal with optimizing database queries when retrieving related data, but they function differently in terms of how the related objects are fetched.
What does "doing the joining in Python" mean?
-
select_related()
: Performs a single SQL query with a SQL join (typicallyINNER JOIN
orLEFT OUTER JOIN
) to fetch both the main object and its related objects at once. This is done directly in the database. The joining happens in the database, and the result is a single set of rows that Django parses into related objects. -
prefetch_related()
: Executes multiple SQL queries: one for the main object and separate queries for each related object (e.g., M2M or reverse FK relations). Django then "joins" the results in Python code after fetching all data. This means that the database doesn't perform the join operation itself, but the fetched related objects are attached to the main object by Django.
Example: Illustration with ForeignKey Relationship
Suppose we have the following models:
class Author(models.Model):
name = models.CharField(max_length=100)
class Book(models.Model):
title = models.CharField(max_length=100)
author = models.ForeignKey(Author, on_delete=models.CASCADE)
Without select_related()
or prefetch_related()
:
books = Book.objects.all()
for book in books:
print(book.title, book.author.name)
- This will perform N+1 queries:
- 1 query to get all the books.
- N additional queries to get the author for each book, because each time
book.author
is accessed, a separate query is made to fetch the relatedAuthor
.
Using select_related()
:
books = Book.objects.select_related('author').all()
for book in books:
print(book.title, book.author.name)
- This performs 1 query:
- A single query that uses SQL joins to fetch the books and their authors in one go. The "joining" is done by the database, and Django doesn’t need to make additional queries when accessing
book.author
.
- A single query that uses SQL joins to fetch the books and their authors in one go. The "joining" is done by the database, and Django doesn’t need to make additional queries when accessing
Using prefetch_related()
:
If we reverse the relationship and need to fetch all books for each author, prefetch_related()
would be more appropriate.
class Author(models.Model):
name = models.CharField(max_length=100)
class Book(models.Model):
title = models.CharField(max_length=100)
author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name='books')
authors = Author.objects.prefetch_related('books').all()
for author in authors:
for book in author.books.all():
print(author.name, book.title)
- This performs 2 queries:
- One query to fetch all authors.
- Another query to fetch all books related to the authors. Django will then "join" these results in Python by associating each
Book
object with its respectiveAuthor
.
When to use select_related()
vs. prefetch_related()
?
-
Use
select_related()
when you are dealing with single-valued relationships, typically ForeignKey and OneToOne relationships. It fetches related data in one query, using SQL joins. -
Use
prefetch_related()
when you are dealing with multi-valued relationships, such as ManyToMany or reverse ForeignKey (e.g.,related_name
or accessing a related set of objects). This will perform multiple queries but reduce redundant database hits by fetching the related data in bulk and combining them in Python.
Summary:
select_related()
performs SQL joins in the database, fetching related data in a single query.prefetch_related()
fetches data in multiple queries and "joins" the data in Python code.