1. Relying on implicit SQL queries
Oh really, you’ve been normalizing your database tables all the way to 3rd normal form? You’re a true perfectionist.
Normalization is not only good for data integrity but also for crashing your database server.
A sample project for the code below can be found on GitHub.
Let’s look at some example models:
from django.db import models
from django.conf import settings


class CustomerAddress(models.Model):
    line_1 = models.TextField()
    line_2 = models.TextField()
    line_3 = models.TextField()


class Customer(models.Model):
    auth_user = models.OneToOneField(
        settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name="as_customer"
    )
    address = models.ForeignKey(
        CustomerAddress, on_delete=models.PROTECT, related_name="customers"
    )


class Topping(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name


class Pizza(models.Model):
    SMALL = 0
    MEDIUM = 1
    LARGE = 2
    SIZE_CHOICES = (
        (SMALL, "Small"),
        (MEDIUM, "Medium"),
        (LARGE, "Large"),
    )
    name = models.CharField(max_length=155)
    size = models.PositiveSmallIntegerField(choices=SIZE_CHOICES)
    toppings = models.ManyToManyField(Topping, through="PizzaTopping")

    def __str__(self):
        return f"{self.get_size_display()} {self.name}"


class PizzaTopping(models.Model):
    topping = models.ForeignKey(
        Topping,
        on_delete=models.PROTECT,
    )
    pizza = models.ForeignKey(
        Pizza,
        on_delete=models.PROTECT,
    )
    extra = models.BooleanField(default=False)


class Order(models.Model):
    customer = models.ForeignKey(
        Customer, on_delete=models.PROTECT, related_name="orders"
    )
    pizza = models.ForeignKey(Pizza, on_delete=models.PROTECT, related_name="orders")
    date = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"Order by {self.customer.auth_user.username} on {self.date}"
Let’s build a view that shows all orders:
from django.core.paginator import Paginator
from django.shortcuts import render

from shop.models import Order


def slow_orders(request):
    all_orders = Order.objects.all()
    paginator = Paginator(all_orders, 10)
    page = paginator.get_page(request.GET.get("page"))
    return render(
        request, "orders.html", {"orders": page.object_list, "page_obj": page}
    )
And a template to render tabular data:
{% extends "base.html" %}
{% block content %}
<h1>Orders</h1>
<br>
<br>
<table>
  <tr>
    <th>Customer email</th>
    <th>Address line 1</th>
    <th>Address line 2</th>
    <th>Address line 3</th>
    <th>Pizza name</th>
    <th>Toppings</th>
    <th>Date</th>
    <th>Updated at</th>
  </tr>
  {% for order in orders %}
  <tr>
    <td>{{ order.customer.auth_user.email }}</td>
    <td>{{ order.customer.address.line_1 }}</td>
    <td>{{ order.customer.address.line_2 }}</td>
    <td>{{ order.customer.address.line_3 }}</td>
    <td>{{ order.pizza.name }}</td>
    <td>
      {% for pizza_topping in order.pizza.pizzatopping_set.all %}
        {{ pizza_topping.topping.name }}, <b>Extra</b>: {{ pizza_topping.extra }}<br>
      {% endfor %}
    </td>
    <td>{{ order.date }}</td>
    <td>{{ order.updated_at }}</td>
  </tr>
  {% endfor %}
</table>
<br>
<br>
{% include "_pagination.html" %}
{% endblock %}
Experienced Django developers would never do this.
For newcomers my question to you is:
How many SQL queries are required to render this template? One? Two?
Remember, we used Order.objects.all() to fetch the orders. What does this line of code do anyway?
Some naive developers would think that this code:
- Connects to the database.
- Fetches all records from the table that stores data for our Order objects.
But actually, this line of code doesn’t connect to the database at all.
All it does is prepare a query to execute when we actually need the data. This is called a lazy QuerySet in Djangoese.
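To make the laziness concrete, here is a toy stand-in written in plain Python. It is not Django's implementation: LazyQuery and fake_fetch are hypothetical names invented for this sketch. It only mimics two documented QuerySet behaviours: nothing runs until the object is iterated, and the results are cached after the first evaluation.

```python
class LazyQuery:
    """Toy stand-in for a lazy QuerySet (not Django's implementation)."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable that would hit the database
        self._result_cache = None  # filled on first evaluation

    def __iter__(self):
        # The "query" only runs the first time someone iterates.
        if self._result_cache is None:
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)


executed = []  # records every simulated database hit


def fake_fetch():
    """Hypothetical stand-in for a real database round trip."""
    executed.append("SELECT * FROM orders")
    return ["order 1", "order 2"]


orders = LazyQuery(fake_fetch)  # like Order.objects.all(): nothing runs yet
hits_before_iteration = len(executed)

rows = list(orders)        # iterating, as the template does, runs the query
rows_again = list(orders)  # served from the cache, no second query
hits_after_iteration = len(executed)
```

Building the object costs nothing; the first loop over it is where the database work happens.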
The real work is taking place in our template:
- We iterate over the orders context object.
- We interpolate the attributes in the HTML.
Again, how many SQL queries? No fewer than 52 in this case.
Fifty-two queries!?
Whenever you access attributes that contain data from related tables, a SQL query is performed to fetch that data.
That sounds obvious, but it’s easy to miss because nothing about accessing a field suggests that a SQL query is being performed.
And that’s why lazy QuerySets are so dangerous.
It’s a mistake that almost all newcomers make, especially if their SQL knowledge is limited.
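The pattern is easier to see outside the ORM. The sketch below uses Python's built-in sqlite3 module with a made-up two-table schema. The table names and the run helper are assumptions for illustration, and the SQL is only an approximation of what an ORM would send. It imitates what happens when each row in a loop follows a relation lazily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
    INSERT INTO customer VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3);
    """
)

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, like a very small query logger."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# One query for the list of orders (what iterating the QuerySet triggers) ...
orders = run("SELECT id, customer_id FROM orders ORDER BY id")

# ... plus one extra query per row, which is roughly what happens each time
# the template follows a relation such as order.customer.
emails = []
for _order_id, customer_id in orders:
    (email,) = run("SELECT email FROM customer WHERE id = ?", (customer_id,))[0]
    emails.append(email)

print(query_count)  # 4: one query for the orders plus one per order
```

With three orders and a single relation that is already four queries. Every additional relation touched per row multiplies the count again.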
What needs to be done
Instead of preparing to fetch only the Orders, we need to carry out multi-table joins and fetch all our data in as few queries as possible:
def fast_orders(request):
    all_orders = Order.objects.select_related(
        "customer",
        "customer__address",
        "customer__auth_user",
        "pizza",
    ).prefetch_related("pizza__pizzatopping_set", "pizza__pizzatopping_set__topping")
    paginator = Paginator(all_orders, 10)
    page = paginator.get_page(request.GET.get("page"))
    return render(
        request, "orders.html", {"orders": page.object_list, "page_obj": page}
    )
This reduces the number of SQL queries down to 4:
- One for counting objects for use with Paginator.
- Another for fetching the orders, joined with everything reachable via ForeignKey and OneToOneField (select_related).
- Yet another for prefetching the PizzaTopping rows of the pizzas on the page (the through table of the ManyToManyField).
- And a last one for prefetching the Topping rows that those PizzaTopping rows point to.
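For readers who think in SQL, the two strategies can be approximated with Python's built-in sqlite3 module. This is a sketch under simplifying assumptions: the schema is invented, the through table is flattened so that the topping name lives directly on pizza_topping, and the statements are not the exact SQL Django emits. The first query shows the idea behind select_related (a JOIN), the second the idea behind prefetch_related (one batched lookup, stitched together in Python).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        pizza_id INTEGER REFERENCES pizza(id)
    );
    CREATE TABLE pizza_topping (
        id INTEGER PRIMARY KEY,
        pizza_id INTEGER REFERENCES pizza(id),
        topping TEXT
    );
    INSERT INTO pizza VALUES (1, 'Margherita'), (2, 'Hawaiian');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO pizza_topping VALUES
        (1, 1, 'basil'), (2, 2, 'ham'), (3, 2, 'pineapple');
    """
)

queries = []  # every SQL statement we run, in order


def run(sql, params=()):
    """Execute a query and record it so the total can be inspected."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# select_related idea: follow the ForeignKey with a JOIN in the same query.
rows = run(
    "SELECT orders.id, pizza.id, pizza.name "
    "FROM orders JOIN pizza ON pizza.id = orders.pizza_id ORDER BY orders.id"
)

# prefetch_related idea: one extra query for all related rows at once,
# then attach them to their parents in Python.
pizza_ids = sorted({pizza_id for _order_id, pizza_id, _name in rows})
placeholders = ", ".join("?" for _ in pizza_ids)
topping_rows = run(
    f"SELECT pizza_id, topping FROM pizza_topping "
    f"WHERE pizza_id IN ({placeholders}) ORDER BY id",
    pizza_ids,
)
toppings_by_pizza = defaultdict(list)
for pizza_id, topping in topping_rows:
    toppings_by_pizza[pizza_id].append(topping)

report = [
    (order_id, name, toppings_by_pizza[pizza_id])
    for order_id, pizza_id, name in rows
]
```

The number of statements stays at two no matter how many orders are on the page, which is the whole point of fetching related data up front.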
In this case, I made sure to fetch all related data at once. But what happens
if you forget to fetch a single related model?
all_orders = Order.objects.select_related(
    "customer",
    "customer__address",
    "pizza",
)
Notice how I forgot to fetch the customer__auth_user table. In this case, whenever I access the email field, a SQL query will be performed.
As you can see, allowing queries to be performed in templates can be
expensive.
Explicit QuerySets are better than implicit ones
Instead of remembering to join tables using select_related and prefetch_related, I like to use a package called django-zen-queries to force QuerySets to be evaluated as soon as they’re encountered.
The package also allows you to disable evaluation of QuerySets in templates.
By forcing QuerySet evaluation early, you also avoid the COUNT query for Paginator, because it can take the length of an already-fetched list. Now your number of SQL queries gets reduced to only 3!
Without the ability to execute queries in templates, every developer will be forced to sculpt optimized queries before even rendering the template. This is essential for avoiding the n+1 query problem.
I highly recommend you install this package and use it in your project, especially if your team consists of many junior developers.
2. Using the default User model
Django comes with a default User model that you can use for authentication.
It sounds great in theory because you can quickly build user login and registration functionality.
But guess what.
User data will be stored in a database table that you’ll never be able to modify.
This means that you don’t own your database — the Django foundation does!
The User model is also tightly coupled with the whole framework.
Once your app reaches a certain level of maturity, you’re pretty much done for
if you didn’t use a custom user model early on. Pray that you don’t have to
implement custom authentication that requires you to add fields to the model.
Sometimes, you can solve this problem by using a UserProfile model that’s related to the default User model via a OneToOneField.
It serves its purpose in many trivial web apps where you have a small number of users. Fetching metadata from the UserProfile model requires an extra SQL query every time such data is needed. Keeping all commonly used metadata in a single model is better for large apps.
However, authentication is something you need to keep flexible for the lifetime of your project. At the very least, extend AbstractUser:
from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    pass
Then set AUTH_USER_MODEL to myapp.User. It’s easy and will save you a ton of headaches down the road when you need to attach arbitrary metadata or use model methods to fetch information about your users.
3. Using automatic data migration names
When you run makemigrations --empty, Django will name the migration according to a timestamp. You’ll end up with gibberish like 0004_auto_20211124.py. By looking at this file name, nobody can know what this migration is doing to which model.
You should always name your data migrations semantically. Team members looking at your directory need to know:
- Which model a particular migration is acting on.
- What it’s doing exactly.
For example, I could have a migration like 0005_post_reduce_title_length.
By looking at this, you can deduce that I’m reducing the length of the title field on the Post model.
An obvious advantage is that if you decide to move models to different apps down the
road, you’ll know exactly which files belong to which models.
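Django's makemigrations command accepts a --name option, so a descriptive name can be given when the migration is created. A team can also enforce the convention with a small check in CI. The helper below is a hypothetical sketch, not part of Django: unnamed_migrations and the AUTO_NAME pattern are made up for this example, and the pattern assumes auto-generated names look like the one shown above, optionally followed by a time stamp.

```python
import re

# Matches names such as 0004_auto_20211124, optionally followed by a
# time stamp such as _1530. The exact pattern is an assumption.
AUTO_NAME = re.compile(r"^\d{4}_auto_\d{8}(_\d{4})?$")


def unnamed_migrations(filenames):
    """Return the migration files that still carry an auto-generated name."""
    flagged = []
    for filename in filenames:
        # Compare against the name without the .py extension.
        stem = filename[:-3] if filename.endswith(".py") else filename
        if AUTO_NAME.match(stem):
            flagged.append(filename)
    return flagged


files = [
    "0001_initial.py",
    "0004_auto_20211124.py",
    "0005_post_reduce_title_length.py",
]
flagged = unnamed_migrations(files)
```

In a real project the list of file names would come from each app's migrations directory, and the build would fail whenever the returned list is not empty.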
4. Relying on automatic database table names
When you migrate a model for the first time, Django will create a database table like yourapp_modelname.
For example, if you have an app shop with a model Order, you will get a database table named shop_order.
Now think about this:
What would happen if you decide to move the Order model to a different app or if you want to stop using Django in the future?
Your database and application code are so tightly coupled that the process will become a nightmare.
Instead of letting Django name your tables, name them yourself:
class Order(models.Model):
    ...

    class Meta:
        db_table = "orders"
When naming your tables, you should use names that would make sense to someone who’s not familiar with Django.
Maybe you’ll hire a data analyst down the road and that person won’t know about Django apps and models.
Here, orders is better than shop_order because someone looking at the tables would immediately know that this is for orders. shop_order, on the other hand, is in singular form, and you need to know that there’s an app named shop that contains a model named Order that manages this table.
Another advantage shows up when you want to move the Order model to another app. Say you want to move it to an app named ecommerce. Since your database table doesn’t contain a hardcoded reference to the app’s name, you don’t have to edit your tables in any way when moving models around.
Manually naming your tables is a great way to decouple your database from application code. Your database is expected to outlive your code, and it makes sense to design it in such a way that anyone can make sense of your data without having to look at your code.
Of course, if you’re writing models as part of a reusable app, then it makes sense to namespace your table names using the name of your app in order to avoid collisions. But still, using a pluralized table name (shop_orders vs shop_order) is better than the default that Django sets.
5. Using multiple tiny apps
The official Django documentation encourages developers to break out functionality into apps.
Separation of concerns is paramount, and multiple apps are a great way to achieve that, right?
Yes and no.
First, Python already has a solid package system.
Many developers will break a project down into apps simply because they want to reduce the size of models.py or one of the other boilerplate files.
Then you end up with projects that have a ton of tiny apps with tiny files.
Whenever you need to make a change, you need to follow dependencies across a bunch of apps.
It’s also very difficult to jump around with a fuzzy finder when you have 20 different apps with a file named views.py each.
Instead, just break down your modules into packages. Instead of models.py, have a models package with model classes imported in __init__.py.
It’s much easier to handle and you also get your migrations, urls, and views all in one place.
Some people read this and took it to the extreme by never using the apps feature.
This advice applies only to tiny apps, not to all apps. If you have significant and unique business logic, you need to use an app.
6. Dumping apps in the root directory
Unlike Ruby on Rails, Django doesn’t have any opinions about how you should structure your project.
Ok fine, I have my own opinions anyway. But what about people brand new to the framework? Is there a sane default for them? Bah!
By default, the startapp command dumps apps in the root directory. That’s how many projects are laid out. It’s not really an issue if you followed my advice of using a single app for everything, but still, what happens if you name an app robots and then install a package called django-robots? Turns out, both of the packages are called robots. Will you be able to install both of them in your INSTALLED_APPS? Or will Django choose only one? But which one?
I’ll let you find out on your own because I never run into this situation, since I always place my apps inside a package named after my project. Take a look at this cookiecutter for an example.
By placing your apps inside a package, you solve the naming conflict problem and can name your apps whatever you want.
Django is unchained
As you can see, Django doesn’t have your best interest at heart. Instead of providing sane defaults, it punishes you for not being a master at the framework.
Now that you know what to look for, you’ll be able to ship higher quality apps using this powerful framework.
The more you work with Django, the better you’ll get and you’ll be able to keep your team happy and productive.
So go out there and start shipping!