6 Harmful Defaults in Django

Many developers flock to Django thinking that it’s the tool that will allow them to ship high quality web apps quickly. After all, it’s for perfectionists with deadlines.

Unfortunately, that’s only true for developers who’ve mastered the framework.

If you’re new to Django, you’ll only be shipping huge piles of unmaintainable mess.

Why is that?

Because instead of coming with sane defaults, Django comes with harmful ones.

In this post, I will explore defaults that you need to avoid if you want to write high quality applications for the web.

1. Relying on implicit SQL queries

Oh really, you’ve been normalizing your database tables all the way to 3rd normal form? You’re a true perfectionist.

Normalization is not only good for data integrity but also for crashing your database server.

A sample project for the code below can be found on GitHub.

Let’s look at some example models:

from django.db import models
from django.conf import settings

class CustomerAddress(models.Model):
    line_1 = models.TextField()
    line_2 = models.TextField()
    line_3 = models.TextField()

class Customer(models.Model):
    auth_user = models.OneToOneField(
        settings.AUTH_USER_MODEL, on_delete=models.PROTECT, related_name="as_customer"
    )
    address = models.ForeignKey(
        CustomerAddress, on_delete=models.PROTECT, related_name="customers"
    )

class Topping(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name

class Pizza(models.Model):
    SMALL = 0
    MEDIUM = 1
    LARGE = 2
    SIZE_CHOICES = (
        (SMALL, "Small"),
        (MEDIUM, "Medium"),
        (LARGE, "Large"),
    )
    name = models.CharField(max_length=155)
    size = models.PositiveSmallIntegerField(choices=SIZE_CHOICES)
    toppings = models.ManyToManyField(Topping, through="PizzaTopping")

    def __str__(self):
        return f"{self.get_size_display()} {self.name}"

class PizzaTopping(models.Model):
    topping = models.ForeignKey(
        Topping,
        on_delete=models.PROTECT,
    )
    pizza = models.ForeignKey(
        Pizza,
        on_delete=models.PROTECT,
    )
    extra = models.BooleanField(default=False)

class Order(models.Model):
    customer = models.ForeignKey(
        Customer, on_delete=models.PROTECT, related_name="orders"
    )
    pizza = models.ForeignKey(Pizza, on_delete=models.PROTECT, related_name="orders")
    date = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"Order by {self.customer.auth_user.username} on {self.date}"

Let’s build a view that shows all orders:

from django.core.paginator import Paginator
from django.shortcuts import render

from shop.models import Order

def slow_orders(request):

    all_orders = Order.objects.all()
    paginator = Paginator(all_orders, 10)
    page = paginator.get_page(request.GET.get("page"))

    return render(
        request, "orders.html", {"orders": page.object_list, "page_obj": page}
    )

And a template to render tabular data:

{% extends "base.html" %}

{% block content %}
    <h1>Orders</h1>
    <br>
    <br>
    <table>
        <tr>
            <th>Customer email</th>
            <th>Address line 1</th>
            <th>Address line 2</th>
            <th>Address line 3</th>
            <th>Pizza name</th>
            <th>Toppings</th>
            <th>Date</th>
            <th>Updated at</th>
        </tr>
        {% for order in orders %}
            <tr>
                <td>{{ order.customer.auth_user.email }}</td>
                <td>{{ order.customer.address.line_1 }}</td>
                <td>{{ order.customer.address.line_2 }}</td>
                <td>{{ order.customer.address.line_3 }}</td>
                <td>{{ order.pizza.name }}</td>
                <td>
                    {% for pizza_topping in order.pizza.pizzatopping_set.all %}
                        {{ pizza_topping.topping.name }}, <b>Extra</b>: {{ pizza_topping.extra }}<br>
                    {% endfor %}
                </td>
                <td>{{ order.date }}</td>
                <td>{{ order.updated_at }}</td>
            </tr>
        {% endfor %}
    </table>
    <br>
    <br>
    {% include "_pagination.html" %}
{% endblock %}

Experienced Django developers would never do this. For newcomers, my question is:

How many SQL queries are required to render this template? One? Two?

Remember, we used Order.objects.all() to fetch the orders. What does this line of code do anyway?

Some naive developers would think that this code:

  1. Connects to the database.
  2. Fetches all records from the table that stores data for our Order objects.

But actually, this line of code doesn’t connect to the database at all. All it does is prepare a query that only runs when we actually need the data. This is called a lazy QuerySet in Djangoese.
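To make laziness concrete, here is a minimal sketch built on Python’s standard sqlite3 module. It is not how Django’s QuerySet is implemented. The LazyQuery class and the in-memory orders table are hypothetical and exist only to show two things: defining the query runs no SQL, and iterating over it runs the SQL once and then reuses the cached rows.

```python
import sqlite3


class LazyQuery:
    """Hypothetical stand-in for a lazy QuerySet: stores SQL and runs it on demand."""

    def __init__(self, conn, sql, params=()):
        self.conn = conn
        self.sql = sql
        self.params = params
        self._cache = None  # filled on first evaluation, then reused

    def __iter__(self):
        if self._cache is None:
            # The database is only hit here, the first time someone iterates.
            self._cache = self.conn.execute(self.sql, self.params).fetchall()
        return iter(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, pizza TEXT)")
conn.executemany("INSERT INTO orders (pizza) VALUES (?)", [("Margherita",), ("Diavola",)])
conn.commit()

executed = []  # every SQL statement SQLite actually runs from here on
conn.set_trace_callback(executed.append)

orders = LazyQuery(conn, "SELECT id, pizza FROM orders ORDER BY id")
queries_before = len(executed)  # nothing has run yet

rows = list(orders)        # first iteration executes the SELECT
rows_again = list(orders)  # served from the cache, no new query
queries_after = len(executed)
```

A Django template’s `{% for order in orders %}` plays the role of `list(orders)` here: the query runs at that point, not when the view builds the QuerySet.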

The real work is taking place in our template:

  1. We iterate over the orders context object.
  2. We interpolate the attributes in the HTML.

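Again, how many SQL queries? No fewer than 52 in this case. Two run for the page itself: a COUNT for the paginator and a SELECT for the ten orders. Then five run for every order: its customer, auth_user, address, pizza and pizzatopping_set. On top of that, one more runs for every topping on every pizza.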

Fifty-two queries!?

Whenever you access an attribute that holds data from a related table that hasn’t been loaded yet, Django performs a SQL query to fetch it. That sounds obvious, but it’s easy to miss in practice, because nothing about plain attribute access tells you that a query is being performed.

And that’s why lazy QuerySets are so dangerous. It’s a mistake that almost all newcomers make, especially if their SQL knowledge is limited.
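The sketch below reproduces the n+1 pattern outside Django, using sqlite3 and a trace callback to count the statements that actually run. The customer/orders schema and all function names are hypothetical. emails_lazy mimics a template touching order.customer on every row. emails_joined does roughly what select_related does and fetches everything with a single JOIN.

```python
import sqlite3


def make_db(n_orders):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customer(id));
        """
    )
    for i in range(1, n_orders + 1):
        conn.execute("INSERT INTO customer (id, email) VALUES (?, ?)", (i, f"c{i}@example.com"))
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)", (i, i))
    conn.commit()
    return conn


def emails_lazy(conn):
    # One query for the orders, then one more per order: the n+1 pattern.
    emails = []
    for (customer_id,) in conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall():
        (email,) = conn.execute(
            "SELECT email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        emails.append(email)
    return emails


def emails_joined(conn):
    # A single JOIN, roughly what select_related("customer") asks the database for.
    rows = conn.execute(
        "SELECT customer.email FROM orders "
        "JOIN customer ON customer.id = orders.customer_id ORDER BY orders.id"
    ).fetchall()
    return [email for (email,) in rows]


def count_queries(conn, func):
    # Record every statement SQLite executes while func runs.
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        result = func(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)
```

With ten orders, `count_queries(conn, emails_lazy)` records 11 statements and `count_queries(conn, emails_joined)` records 1. Both return the same emails.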

What needs to be done

Instead of preparing to fetch only the Orders, we need to join the related tables and fetch all our data in as few queries as possible:

def fast_orders(request):
    all_orders = Order.objects.select_related(
        "customer",
        "customer__address",
        "customer__auth_user",
        "pizza",
    ).prefetch_related("pizza__pizzatopping_set", "pizza__pizzatopping_set__topping")
    paginator = Paginator(all_orders, 10)
    page = paginator.get_page(request.GET.get("page"))

    return render(
        request, "orders.html", {"orders": page.object_list, "page_obj": page}
    )

This reduces the number of SQL queries down to 4:

  1. One for counting objects for use with Paginator.
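  2. Another that joins every table reached through a ForeignKey or OneToOneField (the select_related part).
  3. A third that fetches the PizzaTopping rows for the pizzas on the page (prefetch_related runs a separate query and matches the rows up in Python).
  4. And a last one that fetches the Topping rows those PizzaTopping rows point to (Pizza to PizzaTopping to Topping).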
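The two prefetch queries don’t join anything in SQL. Broadly, the strategy is to collect the primary keys from the first result set, run one extra query with an IN (...) clause, and stitch the rows together in Python. The sketch below imitates that strategy with sqlite3. The pizza/pizza_topping tables and the prefetch_toppings helper are hypothetical, and Django’s real implementation handles many more cases.

```python
import sqlite3
from collections import defaultdict


def setup():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pizza (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pizza_topping (pizza_id INTEGER, topping TEXT, extra INTEGER);
        INSERT INTO pizza VALUES (1, 'Margherita'), (2, 'Diavola'), (3, 'Marinara');
        INSERT INTO pizza_topping VALUES
            (1, 'Basil', 0), (1, 'Mozzarella', 1), (2, 'Salami', 0);
        """
    )
    return conn


def prefetch_toppings(conn):
    """Return {pizza name: [(topping, extra), ...]} using exactly two queries."""
    pizzas = conn.execute("SELECT id, name FROM pizza ORDER BY id").fetchall()
    ids = [pizza_id for pizza_id, _ in pizzas]
    placeholders = ", ".join("?" for _ in ids)
    # One query for all related rows, filtered with IN (...), like prefetch_related.
    rows = conn.execute(
        f"SELECT pizza_id, topping, extra FROM pizza_topping "
        f"WHERE pizza_id IN ({placeholders}) ORDER BY pizza_id, topping",
        ids,
    ).fetchall()
    by_pizza = defaultdict(list)
    for pizza_id, topping, extra in rows:
        by_pizza[pizza_id].append((topping, bool(extra)))
    # Stitch the results together in Python; pizzas without toppings get [].
    return {name: by_pizza[pizza_id] for pizza_id, name in pizzas}


executed = []
conn = setup()
conn.set_trace_callback(executed.append)
toppings_by_pizza = prefetch_toppings(conn)
# len(executed) is now 2: one query for the pizzas, one for all of their toppings
```

The number of queries stays at two however many pizzas are on the page. That constant cost is the whole point of prefetching.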

In this case, I made sure to fetch all related data at once. But what happens if you forget to fetch a single related model?

all_orders = Order.objects.select_related(
    "customer",
    "customer__address",
    "pizza",
)

Notice how I forgot to select customer__auth_user. In this case, every access to order.customer.auth_user.email performs an extra SQL query, one for each order on the page.

As you can see, allowing queries to be performed in templates can be expensive.

Explicit QuerySets are better than implicit ones

Instead of relying on memory to join tables with select_related and prefetch_related, I like to use a package called django-zen-queries to force QuerySets to be evaluated explicitly, up front, rather than whenever they happen to be touched.

The package also allows you to disable evaluation of QuerySets in templates.

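If you evaluate the QuerySet early, for example by wrapping it in list(), and hand the resulting list to Paginator, you also avoid the COUNT query, because Paginator can simply take the list’s length. Now your number of SQL queries gets reduced to only 3! The trade-off is that every matching order is loaded into memory, not just the ten on the current page, so this only pays off for modestly sized result sets.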

Without the ability to execute queries in templates, every developer will be forced to sculpt optimized queries before even rendering the template. This is essential for avoiding the n+1 query problem.
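The idea of blocking queries while a template renders can be sketched without Django. Wrap the connection so that any query attempted while a flag is set raises an error instead of quietly hitting the database. GuardedConnection, no_queries and QueryNotAllowed are hypothetical names used for illustration. They are not django-zen-queries’ API or implementation.

```python
import sqlite3
from contextlib import contextmanager


class QueryNotAllowed(RuntimeError):
    """Raised when code tries to query the database while queries are blocked."""


class GuardedConnection:
    """Hypothetical wrapper: forwards execute() unless queries are blocked."""

    def __init__(self, conn):
        self._conn = conn
        self._blocked = False

    def execute(self, sql, params=()):
        if self._blocked:
            raise QueryNotAllowed(f"query attempted while blocked: {sql}")
        return self._conn.execute(sql, params)

    @contextmanager
    def no_queries(self):
        # Block queries for the duration of the with-block, e.g. template rendering.
        previous = self._blocked
        self._blocked = True
        try:
            yield
        finally:
            self._blocked = previous


def render_orders(orders):
    # Stand-in for a template: it may only use data that is already loaded.
    return "\n".join(f"{order_id}: {pizza}" for order_id, pizza in orders)


db = GuardedConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, pizza TEXT)")
db.execute("INSERT INTO orders (pizza) VALUES (?)", ("Margherita",))

# Evaluate everything up front, in the "view"...
orders = db.execute("SELECT id, pizza FROM orders").fetchall()

# ...then render with queries blocked, so a forgotten lookup fails loudly.
with db.no_queries():
    html = render_orders(orders)
```

The design choice is to fail loudly. A missing select_related then surfaces as an exception during development instead of as a slow page in production.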

I highly recommend you install this package and use it in your project, especially if your team consists of many junior developers.

2. Using the default User model

Django comes with a default User model that you can use for authentication. It sounds great in theory because you can quickly build user login and registration functionality.

But guess what? User data will be stored in a database table that you’ll never be able to modify. This means that you don’t own your database: the Django Software Foundation does!

The User model is also tightly coupled with the whole framework. Once your app reaches a certain level of maturity, you’re pretty much done for if you didn’t use a custom user model early on. Pray that you don’t have to implement custom authentication that requires you to add fields to the model.

Sometimes, you can solve this problem by using a UserProfile model that’s related to the default User model via a OneToOneField. It serves its purpose in many trivial web apps.

However, authentication is something you need to keep flexible for the lifetime of your project. At the very least, extend AbstractUser:

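from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    pass

Then set AUTH_USER_MODEL = "myapp.User" in your settings, ideally before you create or run your first migration. It’s easy and will save you a ton of headaches down the road.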

3. Using automatic migration names

When you run makemigrations, Django automatically names the migration file according to the changes you made. Sometimes it’s intelligent enough to create meaningful names, but names often end up being gibberish like 0004_auto_20211124.py. By looking at this file name, nobody can know what this migration is doing to which model.

You should always name your migrations semantically. Team members looking at your directory need to know:

  1. Which model a particular migration is acting on.
  2. What it’s doing exactly.

For example, I could have a migration like 0005_post_reduce_title_length. By looking at this, you can deduce that I’m reducing the length of the title field on the Post model.

This will be very useful if you decide to move models to different apps down the road because you’ll know exactly which files belong to which models.
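Django’s makemigrations command accepts a --name option for exactly this purpose, for example python manage.py makemigrations shop --name post_reduce_title_length. To catch auto-named files that slip through anyway, a team could add a small check to CI. The sketch below is a hypothetical helper, not part of Django. It assumes automatic names look like 0004_auto_20211124.py, optionally followed by a time such as _1530, which is the form Django normally appends.

```python
import re
from pathlib import Path

# Matches names such as 0004_auto_20211124.py or 0004_auto_20211124_1530.py.
AUTO_NAME = re.compile(r"\d{4}_auto_\d{8}(?:_\d{4})?\.py")


def auto_named_migrations(filenames):
    """Return the migration file names that still carry an automatic name."""
    return sorted(name for name in filenames if AUTO_NAME.fullmatch(name))


def check_migrations_dir(directory):
    """Hypothetical CI helper: list auto-named files in one migrations/ directory."""
    return auto_named_migrations(path.name for path in Path(directory).glob("*.py"))
```

A CI job could fail the build whenever check_migrations_dir returns a non-empty list for any app.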

As an extra tip, I suggest you use one migration per model to make your migrations less coupled to the app itself.

4. Relying on automatic database table names

When you migrate a model for the first time, Django will create a database table like yourapp_modelname.

For example, if you have an app shop with a model Order, you will get a database table named shop_order.
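The rule behind that name is simple enough to sketch. Django derives the default from the app label and the model’s class name in lowercase, joined by an underscore. The function below is a hypothetical illustration of that rule, not Django’s code. It leaves out details such as truncating names that exceed a backend’s identifier length limit.

```python
def default_db_table(app_label: str, model_class_name: str) -> str:
    """Approximate Django's default table name: '<app_label>_<modelname>'.

    Hypothetical helper for illustration only; Django additionally truncates
    names that exceed the database backend's maximum identifier length.
    """
    return f"{app_label}_{model_class_name.lower()}"


# The same Order model gets a different default table depending on its app.
before = default_db_table("shop", "Order")
after = default_db_table("billing", "Order")  # "billing" is a hypothetical app
```

Because the app label is baked into the default name, the table name is tied to wherever the model currently lives. Pinning db_table in Meta takes the app label out of that equation.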

Now think about this:

What would happen if you decide to move the Order model to a different app? I’m not even going to answer this question because I don’t want anything to do with this situation should it arise.

Instead of letting Django name your tables, name them yourself:

from django.db import models

class Order(models.Model):
    ...

    class Meta:
        db_table = "orders"

When naming your tables, you should use names that would make sense to someone who’s not familiar with Django. Maybe you’ll hire a data analyst down the road and that person won’t know about Django apps and models.

Here, orders is better than shop_order because someone looking at the tables would immediately know that it stores orders. shop_order, on the other hand, is singular, and making sense of it requires knowing that there’s an app named shop containing a model named Order that manages this table.

Manually naming your tables is a great way to decouple your database from application code. Your database is expected to outlive your code, and it makes sense to design it in such a way that anyone can make sense of your data without having to look at your code.

Of course, if you’re writing models as part of a reusable app, then it makes sense to namespace your table names using the name of your app in order to avoid collisions. But still, using a pluralized table name is better than the default that Django sets.

5. Using multiple tiny apps

The official Django documentation encourages developers to break out functionality into apps. Separation of concerns is essential, and multiple apps are a great way to achieve that, right?

Yes and no.

First, Python already has a solid package system. Many developers break their projects into apps simply because they want to reduce the size of models.py or one of the other boilerplate files. Then you end up with projects that have a ton of tiny apps with tiny files, and whenever you need to make a change, you have to follow dependencies across a bunch of apps.

It’s also very difficult to jump around with a fuzzy finder when you have 20 different apps, each with a file named views.py.

Rather than creating new apps, just break your modules down into packages: instead of models.py, have a models package with the model classes imported in __init__.py. It’s much easier to handle, and you also get your migrations, urls, and views all in one place.

6. Dumping apps in the root directory

Unlike Ruby on Rails, Django doesn’t have any opinions about how you should structure your project. OK, fine, I have my own opinions anyway. But what about people brand new to the framework? Is there a sane default for them? Bah!

By default, the startapp command dumps apps in the root directory, and that’s how many projects are laid out. It’s not really an issue if you followed my advice of using a single app for everything, but still: what happens if you name an app robots and then install a package called django-robots? It turns out that both packages are imported as robots. Will you be able to add both of them to INSTALLED_APPS? Or will Django choose only one? And if so, which one?

I’ll let you find out on your own, because I never run into this situation: I always place my apps inside a package named after my project. Take a look at this cookiecutter for an example.

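By placing your apps inside a package, you solve the import conflict and can name your app modules whatever you want. One caveat: unless an AppConfig sets an explicit label, Django uses the last component of the app’s dotted path as its label. That means myproject.robots and django-robots could both end up labelled robots, and Django raises an ImproperlyConfigured error for duplicate labels. Giving your own app a distinct label in its AppConfig avoids the clash.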

Django is unchained

As you can see, Django doesn’t have your best interest at heart. Instead of providing sane defaults, it punishes you for not being a master at the framework.

Now that you know what to look for, you’ll be able to ship higher quality apps using this powerful framework.

The more you work with Django, the better you’ll get and you’ll be able to keep your team happy and productive.

So go out there and start shipping!