Django ORM by Example

Django’s ORM allows you to easily access and model your data. It has a ton of features, so you probably haven’t tapped into its full power yet.

In this post, you’ll learn about the more advanced ORM features by solving real-world problems.

Modeling simple data

You’re an officer at the Gestapo and you need a database to keep tabs on people you’re observing.

Here’s how you represent this in Django:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name

See? Django provides you with a bunch of ways to automatically convert and validate data between your database and Python objects. I encourage you to check out all the field types it provides.

Here, CharField will turn into a str, PositiveSmallIntegerField will turn into an int, and finally, DateTimeField will turn into a Python datetime object.

Models on their own are pretty useless, because your data ultimately lives in the database. Models simply let you describe what your data looks like in the database without having to look at it directly.

Change management with migrations

Instead of managing your database manually, Django provides a migration system that lets you use code to keep your models and your database schema in sync.

To understand what the migration system buys you, skip migrations for now and create the database table manually. Connect to your database using sqlite3 db.sqlite3 and enter the following (the table is called persons here, which assumes the model sets db_table = "persons" in its Meta class; without that, Django would expect a table named myapp_person):

CREATE TABLE "persons"
(
    "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
    "name" varchar(255) NOT NULL,
    "age" smallint unsigned NOT NULL CHECK ("age" >= 0),
    "first_seen" datetime NOT NULL
);

Now you can play with real data using the Django shell:

p = Person.objects.create(name="Henric Heinz", age=22, first_seen=datetime.fromisoformat("1938, 12, 01"))

If you look inside your database table, you’ll see a new row with this data:

SELECT * FROM "persons";

Django used your model definition to map your Python object onto the database table you just created, without you having to write the INSERT statement yourself.

Update your model to add an important metric, favorite_color, that any Gestapo officer needs to track:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=50)

    def __str__(self):
        return self.name

Since you added a new field, this change needs to be reflected in your database as well. Otherwise, Django will raise an OperationalError whenever you perform CRUD operations, because it expects the table to contain a favorite_color column:

p = Person.objects.create(name="Nadine Braun", age=22)
>>> django.db.utils.OperationalError: table persons has no column named favorite_color

Connect to your database with sqlite db.sqlite3 and enter this:

ALTER TABLE "persons"
ADD COLUMN "favorite_color" varchar(50);

You can now update the previous record to add the user’s favorite color:

p = Person.objects.first()
p.favorite_color = "red"
p.save()

Notice how, in our model definition, the favorite_color field is mandatory. By default, a CharField is non-nullable (null=False) and non-blank (blank=False): a value must be supplied when creating objects or validation will fail.

However, in our SQL statement, we didn’t add the NOT NULL constraint. There is a mismatch between our model and our table definition. A developer looking at the model will assume that the database column carries a NOT NULL constraint, because that’s what they’ll see in the model code. In this case it’s not an issue, because you’re only accessing the data through Django. However, if your data is accessed by external tools, those tools might write null values to the favorite_color column. When you fetch those records, you’ll end up with None in Django, which might cause hard-to-find bugs in your logic.

Not to mention that it’s easy to make a mistake when writing SQL statements. Consider the following:

ALTER TABLE "persons"
ADD COLUMN "favorite_color" varchar(30);

Notice how the max length of the favorite_color column is 30 here, while in Django it’s 50. SQLite itself doesn’t enforce varchar lengths, but on a database that does (PostgreSQL, for example), everything will work fine for values of 30 characters or fewer, and anything longer will be rejected with a database error that breaks your app. Either way, your model and your schema now silently disagree.

Let’s say that you’re already working on a production database with millions of records. If you want to add a non-nullable column, you need to set a sensible default value on existing records before adding the NOT NULL constraint. Doing this in SQL is tedious because:

  1. We’re Python developers and we don’t want to write SQL.
  2. It’s easier to generate new data in Python than SQL.

The solution? Use Django migrations because they allow you to use code to manage your database schema. Migrations solve 2 problems:

  1. Instead of making assumptions about the database schema, new developers will be able to trust that there’s no mismatch between the model code and schema.
  2. Instead of managing the database schema separately, you do it in Python code and commit it to Git just like any other code. This allows you to grow your app and your data model together.

Ok, delete your database and create a new migration:

rm db.sqlite3
python manage.py makemigrations myapp --empty

You just instructed Django to create a new, empty migration file for your myapp app. You’ll find a file called 0001_initial.py in the migrations directory inside the myapp directory.

Inside the migration file, you’ll see:

from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
    ]

    operations = [
    ]

The operations list is what you’re interested in. It allows you to perform operations against your database.

Add this to the operations list:

migrations.RunSQL(
    """
    CREATE TABLE "persons"
    (
        "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
        "name" varchar(255) NOT NULL,
        "age" smallint unsigned NOT NULL CHECK ("age" >= 0),
        "first_seen" datetime NOT NULL,
    );
    """,
    """
    DROP TABLE "persons";
    """
)

Here, we specify one operation with a forward and a reverse SQL statement. If you run the migrate command now, the table will be created. If you undo the migration with python manage.py migrate myapp zero, the second DROP TABLE statement will run and your table will be deleted.

Run the migration:

python manage.py migrate myapp

Since you forgot to add the favorite_color column, you get OperationalError when creating objects:

p = Person.objects.create(name="Amon Goeth", age=27)
>>> django.db.utils.OperationalError: table persons has no column named favorite_color

Let’s fix this now. Create a new empty migration:

python manage.py makemigrations myapp --empty --name person_add_favorite_color

It’s important to name your empty migrations, because by default Django generates a generic auto-named file (something like 0002_auto_20220101_1200) that tells you nothing about what the migration does.

Add the following to the operations list:

migrations.RunSQL(
    """
    ALTER TABLE "persons"
    ADD COLUMN "favorite_color" varchar(30);
    """,
    migrations.RunSQL.noop
)

Now you can create new objects:

p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red")

Notice how you made a mistake in your SQL statement by writing varchar(30) instead of varchar(50). Unfortunately, SQLite doesn’t support modifying existing columns, so you’re forced to delete the database and start from scratch.

Delete the database:

rm db.sqlite3

Then edit the previous migrations file:

migrations.RunSQL(
    """
    ALTER TABLE "persons"
    ADD COLUMN "favorite_color" varchar(50);
    """,
    migrations.RunSQL.noop
)

Now migrate:

python manage.py migrate

Notice the migrations.RunSQL.noop line. If you wanted to roll back this migration, you’d have to issue some kind of DROP COLUMN statement, which SQLite (before version 3.35) doesn’t support. That’s why you instruct Django to do nothing when rolling back. If you don’t specify a reverse operation at all, Django will refuse to roll back the migration.

As you can see, dealing with the database manually is error-prone. Thankfully, your app isn’t running in production yet, so you can afford to delete the database and start from scratch. What if you were running in production though? Then you’d have to:

  1. Create a new dummy table.
  2. Copy all data to this new table.
  3. Delete the old table.
  4. Create a table with the old table’s name and the new desired schema.
  5. Copy all data to this table.
  6. Delete the dummy table.

Too tedious for me to cover, but doable by writing migration operations.
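
Here’s a minimal sketch of what such an operation could look like in a migration’s operations list, assuming the persons table layout used earlier in this post (a rename stands in for the second copy, which achieves the same end result):

migrations.RunSQL(
    [
        # 1. Create a new table with the desired schema.
        """
        CREATE TABLE "persons_new"
        (
            "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
            "name" varchar(255) NOT NULL,
            "age" smallint unsigned NOT NULL CHECK ("age" >= 0),
            "first_seen" datetime NOT NULL,
            "favorite_color" varchar(50) NOT NULL
        );
        """,
        # 2. Copy the data across, filling in a default for rows missing a value.
        """
        INSERT INTO "persons_new" ("id", "name", "age", "first_seen", "favorite_color")
        SELECT "id", "name", "age", "first_seen", COALESCE("favorite_color", '-')
        FROM "persons";
        """,
        # 3. Drop the old table and move the new one into its place.
        'DROP TABLE "persons";',
        'ALTER TABLE "persons_new" RENAME TO "persons";',
    ],
    migrations.RunSQL.noop,
)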

Since you’re unable to alter your table to add the NOT NULL constraint (SQLite doesn’t support that operation either), you’re forced to start from scratch once again.

However, I’m getting tired of this and I’m sure you are too. Let’s put all this tedious hard work behind us and tap into the real power of Django migrations.

Delete all migration files as well as your database:

rm myapp/migrations/000*.py
rm db.sqlite3

Let Django automatically create the migration and migrate the database:

python manage.py makemigrations myapp
python manage.py migrate

Now you can create objects:

p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red")

Magic?

The Gestapo chief sends a telegram saying that every Gestapo officer needs to start recording the occupation of all suspects. Let’s modify our model to do that.

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=50)
    occupation = models.CharField(max_length=100, null=True, blank=True)

    def __str__(self):
        return self.name

Generate migrations and migrate:

python manage.py makemigrations
python manage.py migrate

You can now add Amon’s occupation:

p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red", occupation="Farmer")

However, it’s possible to create new objects without specifying an occupation:

p = Person.objects.create(name="Tanja Decker", age=21, favorite_color="Purple")

If you keep adding dirty data to your database, you’ll get fired and die a lonely death.

You can solve this by setting blank to False:

occupation = models.CharField(max_length=100, null=True, blank=False)

If you try to create:

p = Person.objects.create(name="Uta Foerster", age=23, favorite_color="Blue")

It will still create! Why? Because using the objects.create() method doesn’t perform model validation. It saves the data as is. The blank flag is used by model validation only and not used by the database in any way.

If you want to validate before saving:

p = Person(name="Uta Foerster", age=23, favorite_color="Blue")
p.full_clean()
>>> django.core.exceptions.ValidationError: {'occupation': ['This field cannot be blank.']}

Now you get an error. However, you need to remember to call full_clean() before saving any object.
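
If you’d rather not rely on memory alone, one option (not something Django does for you by default) is to override save() so that every save runs validation first. A minimal sketch:

class Person(models.Model):
    # ... fields as defined above ...

    def save(self, *args, **kwargs):
        # Validate on every save so invalid data never reaches the database.
        self.full_clean()
        super().save(*args, **kwargs)

Since objects.create() calls save() under the hood, this also catches the create() calls shown earlier. The trade-off is that every save now pays the cost of full validation.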

You can’t make the occupation field non-nullable just yet, because the database has no way of knowing what to put in the existing rows. Let’s use a migration to fix that:

Create an empty migration:

python manage.py makemigrations myapp --empty --name person_occupation_not_null

Change the contents to look like this (replacing myapp with your own):

from django.db import migrations


def forwards(apps, schema_editor):
    person_model = apps.get_model("myapp", "Person")

    for person in person_model.objects.all():
        if person.occupation is None:
            person.occupation = "-"
            person.full_clean()
            person.save()


class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_alter_person_occupation'),
    ]

    operations = [
        migrations.RunPython(forwards, migrations.RunPython.noop)
    ]

Migrate the database:

python manage.py migrate

Now that every existing row has a non-null value, you can remove the null flag from the field:

occupation = models.CharField(max_length=100)

Now, generate migrations and you’ll be greeted with some options:

  1. Provide a one-off default now (will be set on all existing rows with a null value for this column)
  2. Ignore for now. Existing rows that contain NULL values will have to be handled manually, for example with a RunPython or RunSQL operation.
  3. Quit and manually define a default value in models.py.

As you can see, you could have let Django set a default value by choosing option 1 instead of writing your own migration operation using RunPython. Since you already did that, pick option 2.

Now that the field is non-nullable, you won’t be able to create dirty objects:

Person.objects.create(name="Monika Duerr", age=22, favorite_color="Green")

Wait a minute! It was still created without an error. That’s because, by default, when you don’t provide a value for a text-based field, Django saves the empty string "" instead of null.

What can you do about it? Stop using objects.create() to create objects. Instead, use full_clean() to validate your data before saving. Only use objects.create() when you’re testing stuff.

By using the RunPython operation, you can write Python code to work directly with your database. For example, you can load data from CSV files or external APIs.
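
For example, here’s a rough sketch of a data migration that seeds Person records from a hypothetical persons.csv file (the file name, column names, and migration dependency are made up for illustration):

import csv

from django.db import migrations


def load_persons(apps, schema_editor):
    # Always use the historical model inside migrations, not a direct import.
    person_model = apps.get_model("myapp", "Person")

    with open("persons.csv", newline="") as f:
        for row in csv.DictReader(f):
            person_model.objects.create(
                name=row["name"],
                age=int(row["age"]),
                favorite_color=row["favorite_color"],
                occupation=row["occupation"],
            )


class Migration(migrations.Migration):
    dependencies = [
        ("myapp", "0004_alter_person_occupation"),  # hypothetical previous migration
    ]

    operations = [
        migrations.RunPython(load_persons, migrations.RunPython.noop),
    ]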

While models allow you to model your data in Python, migrations allow you to prepare your database to be accessed by your models.

Next, you’ll learn how to enforce data consistency so that you can build high quality data sets.

Field validation

Take a look at this:

person = Person.objects.create(name="Nadine Braun", age=0)

You now have a person with age 0 — clearly a mistake.

Let’s say that Gestapo is only allowed to spy on people aged 18 and beyond. You need to update your model to reflect this requirement:

from django.core.validators import MinValueValidator

class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name

Update and apply your migrations:

python manage.py makemigrations
python manage.py migrate

Check this out now:

person = Person.objects.create(name="Nadine Braun", age=0)

Wait a minute, a person aged 0 is still created even though we attached MinValueValidator to the field. Why is that?

Remember I told you to stop using objects.create because it bypasses validation? The proper way to create objects is by calling full_clean and save:

person = Person(name="Annet Shultz", age=0)
person.full_clean()
>>> ValidationError...

Now you’ll avoid polluting your database with invalid data.

MinValueValidator is one of the validators built into Django, but you can easily create your own.

Custom field validation

Let’s say that you want to enforce that all favorite_color values start with a capital letter. Here’s how that would work:

import string

from django.core.exceptions import ValidationError
from django.core.validators import MinValueValidator
from django.utils.translation import gettext_lazy as _

def validate_capfirst(value):
    if value[0] not in string.ascii_uppercase:
        raise ValidationError(
            _('%(value)s must be capitalized.'),
            params={'value': value},
        )

class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=20, validators=[validate_capfirst])

    def __str__(self):
        return self.name

If you try to create one with a non-capitalized string:

p = Person(name="Sophia Herzog", age=24, favorite_color="red")
p.full_clean()
>>> django.core.exceptions.ValidationError: {'favorite_color': ['red must be capitalized.']}

By attaching as many validators as you want in the validators list, you have an easy way to enforce data consistency in your database.

Model level validation

You’ve seen how individual fields can be validated. But what if you want to validate multiple fields at once?

Let’s say that the Gestapo chief introduces a new rule stating that every officer needs to start monitoring males as young as 16 years old. Other genders can only be monitored if they’re at least 18 years old.

You need to update your model to reflect this change:

class Person(models.Model):
    MALE = 0
    FEMALE = 1

    GENDER_CHOICES = (
        (MALE, "Male"),
        (FEMALE, "Female")
    )
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    gender = models.PositiveSmallIntegerField(choices=GENDER_CHOICES)
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name

If we create this object:

person = Person(name="Henrich Heinz", age=16, gender=Person.MALE)
person.full_clean()

A ValidationError will be raised because Mr Henrich is only 16, but our model only accepts persons aged at least 18.

Let’s fix this:

from django.core.exceptions import ValidationError
from django.core.validators import MinValueValidator

class Person(models.Model):
    MALE = 0
    FEMALE = 1

    GENDER_CHOICES = (
        (MALE, "Male"),
        (FEMALE, "Female")
    )
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(16)])
    gender = models.PositiveSmallIntegerField(choices=GENDER_CHOICES)
    first_seen = models.DateTimeField(auto_now_add=True)

    def clean(self):
        super().clean()

        if self.gender != Person.MALE and self.age < 18:
            raise ValidationError(f"{self.get_gender_display()}s must be at least 18 years of age.")

    def __str__(self):
        return self.name

Now we can create 16 year old males:

person = Person(name="Henrich Heinz", age=16, gender=Person.MALE)
person.full_clean()

But not 16 year old females:

person = Person(name="Anna Weiss", age=16, gender=Person.FEMALE)
person.full_clean()
>>> django.core.exceptions.ValidationError: {'__all__': ['Females must be at least 18 years of age.']}

By using model level validation, you can easily enforce complicated business rules so that your data remains clean over your application’s lifetime.

Next, let’s explore how to use the ORM to model more complicated real world data.

Modeling relationships

Fast forward to 2022 and Gestapo is no more. You retired to Latin America where no one knows you. The Internet is the new real life and so you decide to make some money by selling the bananas growing in your backyard online. Using Django of course.

Start with a model to represent Banana items.

class Banana(models.Model):
    CAVENDISH_VARIANT = 0
    RED_VARIANT = 1
    PLANTAIN_VARIANT = 2

    VARIANT_CHOICES = (
        (CAVENDISH_VARIANT, "Cavendish"),
        (RED_VARIANT, "Red"),
        (PLANTAIN_VARIANT, "Plantain"),
    )
    variant = models.PositiveSmallIntegerField(choices=VARIANT_CHOICES)
    qty = models.PositiveIntegerField("Quantity")

    def __str__(self):
        return f"{self.get_variant_display()} banana."

When a user is shopping for bananas on your website, he will add his desired bananas to his shopping cart. We’ll use a model to represent this shopping cart.

The shopping cart will need to store a reference to items that the user added to his cart. You could store references to Banana objects directly like so:

from django.conf import settings

class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.SET_NULL, null=True, blank=True)
    qty = models.PositiveIntegerField("Quantity")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"

ForeignKey is used to relate a model to another one. In this case, every ShoppingCart instance will have a reference to a user, and a Banana instance.

This is known as a many to one relationship because many shopping carts can reference one user. Let’s take an example:

from django.contrib.auth import get_user_model

user_model = get_user_model()

banana = Banana.objects.first()
user = user_model.objects.first()

cart = ShoppingCart(user=user, banana=banana, qty=1)
cart.full_clean()
cart.save()

cart2 = ShoppingCart(user=user, banana=banana, qty=1)
cart2.full_clean()
cart2.save()

Simple enough? However, this model is wrong. Why should a user be able to create multiple shopping carts? And why can a user add only a single type of banana in his cart at a time?

That’s where OneToOneField comes in. Instead of using a many to one relationship between shopping carts and users, you want a one to one relationship. A user will be able to create only one shopping cart.

class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.SET_NULL, null=True, blank=True)
    qty = models.PositiveIntegerField("Quantity")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"

banana = Banana.objects.first()
user = user_model.objects.first()

cart = ShoppingCart(user=user, banana=banana, qty=1)
cart.full_clean()
cart.save()

cart2 = ShoppingCart(user=user, banana=banana, qty=1)
cart2.full_clean()
>>> django.core.exceptions.ValidationError: {'user': ['Shopping cart with this User already exists.']}

Perfect, but what about adding multiple bananas to the cart? Clearly our data model is wrong. Instead of referencing bananas directly, use another model to represent this relationship. Many e-commerce systems use this kind of model and they usually call it a LineItem.

class LineItem(models.Model):
    cart = models.ForeignKey(ShoppingCart, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.CASCADE)
    qty = models.PositiveIntegerField("Quantity")

Now you need a way to connect this LineItem to our ShoppingCart. Remember, a shopping cart needs to be able to contain many line items. In other words, many shopping carts must reference many line items. And so we’ll use a third kind of relationship called a many to many relationship.

class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    line_items = models.ManyToManyField(LineItem)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"

Here’s how you’ll create carts now:

user = user_model.objects.first()

cart = ShoppingCart(user=user)
cart.full_clean()
cart.save()

banana1 = Banana.objects.get(variant=0)
banana2 = Banana.objects.get(variant=1)

line_item = LineItem(cart=cart, banana=banana1, qty=1)
line_item.full_clean()
line_item.save()

cart.line_items.add(line_item)

line_item = LineItem(cart=cart, banana=banana2, qty=1)
line_item.full_clean()
line_item.save()
cart.line_items.add(line_item)

You can query your shopping cart to see the line items for yourself:

cart.line_items.all()

You should see a QuerySet containing 2 items — the line items you added earlier.
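
Since LineItem doesn’t define a __str__ method, the output will look roughly like this:

<QuerySet [<LineItem: LineItem object (1)>, <LineItem: LineItem object (2)>]>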

And that’s how easy it is to relate models to one another. Now let’s see how you can actually work with these relationships.

Querying relationships

When adding a relationship, you can specify an additional attribute called related_name. This is used for reverse relationships. What are reverse relationships?

When querying a user’s shopping cart, you can do this:

u = user_model.objects.first()

u.shoppingcart

u.shoppingcart contains an instance of that user’s shopping cart. For a OneToOneField, Django by default converts the model name to lowercase and uses it as the reverse relationship name. However, you can specify your own:

user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="cart")

u = user_model.objects.first()

u.cart

Now u.cart will contain the instance because you explicitly told Django to use that name.

What about reverse ForeignKeys though?

banana = Banana.objects.first()

banana.lineitem_set.all()

This will return a QuerySet of all LineItem instances where this Banana instance was used. Django turns the model name to lowercase and then appends _set to it so that you know that it’s a QuerySet. You can use a more readable name:

class LineItem(models.Model):
    banana = models.ForeignKey(Banana, on_delete=models.CASCADE, related_name="line_items")
    qty = models.PositiveIntegerField("Quantity")

Now banana.line_items.all() will contain the QuerySet because you named it explicitly. I prefer line_items to lineitem_set because it’s more readable. That’s why I suggest that you always name your relationships.

The same thing is done for ManyToManyField:

l = LineItem.objects.first()

l.shoppingcart_set.all()

And you can name it manually:

class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    line_items = models.ManyToManyField(LineItem, related_name="carts")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"

l = LineItem.objects.first()

l.carts.all()

Now you’ll have your reverse relationship in the carts attribute. Using carts instead of cart may seem a little weird, because a LineItem sits in a single ShoppingCart at a time. Reverse relationships don’t actually exist in SQL; they’re a quality-of-life feature of Django’s ORM. With a ManyToManyField, Django has no way of knowing that a LineItem will only ever belong to one cart, so the reverse accessor has to return a QuerySet rather than a single ShoppingCart. If this doesn’t make sense, don’t worry about it. SQL is weird, but as long as your data is consistent and easily accessible, you can relax.

More about ManyToManyField

You were able to reference multiple LineItems from a ShoppingCart instance but SQL is rows and columns, so how does this actually work?

If you take a look at your SQLite schema, you’ll see three tables related to these models:

  • shop_shoppingcart
  • shop_lineitem
  • shop_shoppingcart_line_items

shop_shoppingcart was created by the ShoppingCart model and shop_lineitem by the LineItem model. What about shop_shoppingcart_line_items? There’s no model named ShoppingCartLineItems because we never created one, so where does this table come from?

Since SQL has no way for a single column to reference multiple rows, we need a trick to make this work: an intermediary table that holds the relationships between rows of the two tables.

Inspect the shop_shoppingcart_line_items table and you’ll see 3 columns:

  • id
  • shoppingcart_id
  • lineitem_id

As you can see, this table will hold ids for ShoppingCart and LineItem objects. With the following:

cart.line_items.add(line_item)

Django will create a new row in the shop_shoppingcart_line_items table containing both the id of the cart object and the line_item. This trick allows you to hold references to multiple objects in your model.
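
You can even peek at those rows from the ORM: Django exposes the auto-created intermediary model as ShoppingCart.line_items.through. With the two line items added above, you’d see something like this (the exact ids depend on your data):

ShoppingCart.line_items.through.objects.values()
>>> <QuerySet [{'id': 1, 'shoppingcart_id': 1, 'lineitem_id': 1}, {'id': 2, 'shoppingcart_id': 1, 'lineitem_id': 2}]>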

Django automatically creates this intermediary table for you, but you can also create it yourself. Why and how would you do that? Let’s find out next.

Relating a model to itself

You want users on your shop to be friends with each other so that they come to your site more often to share stuff.

You’ll use your own custom user model to hold this relationship.

from django.db import models
from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    friends = models.ManyToManyField('self')

Using 'self' allows you to reference a model to itself. Now users can add friends:

u1 = user_model.objects.create(username='abcd', email="[email protected]")
u2 = user_model.objects.create(username='abcdef', email="[email protected]")

u1.friends.add(u2)

The great thing about 'self' is that it creates a symmetrical relationship: if you add u2 as u1’s friend, then u1 is automatically added to u2’s friends as well. See for yourself:

u2.refresh_from_db()
print(u1 in u2.friends.all())
>>> True

Likewise, if you remove u1 from u2’s friends, then u2 is also removed from u1’s friends. Check it out:

u2.friends.remove(u1)
u1.refresh_from_db()
print(u2 in u1.friends.all())
>>> False

What if you want to attach a timestamp to this relationship to allows users to find out when they became friends? Create your own intermediary model. Find out how next.

Adding metadata to ManyToManyField

class User(AbstractUser):
    friends = models.ManyToManyField('self', through="UserFriend",
                                     through_fields=("to_user", "from_user"))

class UserFriend(models.Model):
    to_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="+")
    from_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="+")
    created_at = models.DateTimeField(auto_now_add=True)

Notice 3 things:

  1. Using the through attribute allows you to use your own intermediary model instead of allowing Django to create one automatically.
  2. The through_fields attribute allows you to specify which fields on the intermediary model are used for the relationship.
  3. Using + as related_name disables reverse relationships. In this case, you don’t need the reverse relationship because you won’t be fetching the UserFriend instance from User objects directly in most cases.

If you don’t want to disable the reverse relationships, you can name them explicitly instead:

to_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="friendships_from")
from_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="friendships_to")

This is confusing and I would avoid it.

Now you can provide users with functionality like finding out who added them as a friend during the past 7 days, or who has been their friend the longest. Let’s play with these next.

Fat models

You want to provide users with the ability to filter the users they befriended during a particular time period. Where would you add this logic? That’s where the concept of the fat model comes in.

Instead of having business logic all over the place, add it to models themselves as instance methods:

from datetime import datetime
from django.db.models import QuerySet

class User(AbstractUser):
    friends = models.ManyToManyField('self', through="UserFriend",
                                     through_fields=("to_user", "from_user"))

    def friendships_between(self, friend: "User", from_time: datetime, to_time: datetime) -> QuerySet:
        return UserFriend.objects.filter(to_user=self, from_user=friend,
                                         created_at__gte=from_time, created_at__lte=to_time)

You now have access to this method wherever you have a user instance. In templates, in view code, context processors, anywhere.
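
For example, in the shell (reusing the u1 and u2 users created earlier, and naive datetimes for brevity), you could ask for friendships formed during 2022:

from datetime import datetime

# Returns a QuerySet of UserFriend rows created in that window.
u1.friendships_between(u2, datetime(2022, 1, 1), datetime(2022, 12, 31, 23, 59))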

Some people don’t like fat models because they feel it’s dirty and encourages tight coupling. I don’t see anything dirty about this, nor do I see any undesired coupling. A lot of frameworks make use of fat models. For example, Wagtail uses models to represent pages in a CMS. When using Wagtail, even the code to process requests and render templates lives in model instance methods, which gives developers an easy way to manage business logic related to CMS pages.

Whenever you need to perform actions on a single model or models related to that model, use the fat model pattern.

However, when performing actions on entire QuerySets, using model methods might not be the best solution. Let’s explore possible methods next.

Table level operations using Managers

You get hired by a company called WebFlix — a competitor to the popular online streaming company Netflix.

To make more money, the company wants a way to figure out what kind of movies people like to watch. To do that, they need data.

That’s where you come in. Your job is to store data about various movies that will help data analysts determine popularity of movies, predict trends, and provide users with movies that they would more likely enjoy watching.

To start off, you’ll store the following about a movie:

  • The average percentage of the movie watched by users. If a movie is 2 hours long and most users watch 1.2 hours, then you’ll store that percentage.
  • The number of users who rewatch the movie.

Here’s your model:

class MoviePopularity(models.Model):
    movie = models.OneToOneField(Movie,
                                on_delete=models.CASCADE,
                                related_name="popularity")
    watch_percent = models.FloatField()
    rewatched = models.PositiveIntegerField()

    class Meta:
        db_table = "movie_popularity"
        verbose_name_plural = "Movie popularity stats"

Using this model is simple:

popularity = MoviePopularity(movie=movie, watch_percent=40.0, rewatched=0)
popularity.full_clean()
popularity.save()

According to the statistics department, a movie that has been watched for:

  • More than 80% and rewatched by at least 1000 users is considered very popular.
  • More than 80% but has never been rewatched is considered popular.
  • Less than 60% is considered a flop.

Querying the model for this information is quite simple:

very_popular = MoviePopularity.objects.filter(watch_percent__gt=80.0,
                                            rewatched__gte=1000)
popular = MoviePopularity.objects.filter(watch_percent__gt=80.0,
                                            rewatched=0)
flopped = MoviePopularity.objects.filter(watch_percent__lt=60.0)

However, these queries feel wrong. Whenever you need this information, you have to remember the business rules set by the statistician, like the exact watch percentage and rewatch count thresholds.

Not to mention that these things can and will change. Down the road, the statistician might add another requirement, like the number of people who share the movie after watching it. If you use these kinds of queries in many places, you’ll have a hard time refactoring once the business rules change.

That’s where custom model managers come in. Remember how we used fat models to work on individual instances? Well, managers operate on whole tables instead.

Take a look at this manager:

class MoviePopularityManager(models.Manager):
    def very_popular(self):
        return self.filter(
            watch_percent__gt=80.0,
            rewatched__gte=1000)

    def popular(self):
        return self.filter(
            watch_percent__gt=80.0,
            rewatched=0)

    def flopped(self):
        return self.filter(watch_percent__lt=60.0)

To use this manager, simply add it to your model:

class MoviePopularity(models.Model):
    movie = models.OneToOneField(Movie,
                                on_delete=models.CASCADE,
                                related_name="popularity")
    watch_percent = models.FloatField()
    rewatched = models.PositiveIntegerField()

    objects = MoviePopularityManager()

    class Meta:
        db_table = "movie_popularity"
        verbose_name_plural = "Movie popularity stats"

As you can see, we replaced the default manager with our custom one. It functions the same as the default one but with our custom methods added.

Check this out:

very_popular = MoviePopularity.objects.very_popular()

popular = MoviePopularity.objects.popular()

flopped = MoviePopularity.objects.flopped()

How do you like that? Personally, I love this solution because all the logic is encapsulated in a single place and you can easily extract the data in a less verbose manner.

These methods are nothing special either. You can do your regular querying with them:

very_popular_a = MoviePopularity.objects.very_popular().filter(
                    movie__title__startswith="a")

Whenever you need to reduce duplication when querying for objects, try to use Managers to get the job done.

Optimizing database queries for performance

You’re approached by the SJW Union, an organization in charge of recording various infractions on the Internet and subsequently canceling the individuals who caused them.

They want you to build an app to help them with such recordings.

You’ll record people who cause infractions with this model:

class Person(models.Model):
    STATUS_GREEN = 0
    STATUS_YELLOW = 1
    STATUS_RED = 2
    STATUS_CHOICES = (
        (STATUS_GREEN, "Green"),
        (STATUS_YELLOW, "Yellow"),
        (STATUS_RED, "Red")
    )

    name = models.CharField(max_length=150)
    email = models.EmailField()
    phone_number = models.CharField(max_length=100)
    twitter_username = models.CharField(max_length=40)

    status = models.PositiveSmallIntegerField(choices=STATUS_CHOICES)

    def __str__(self):
        return f"{self.name}: {self.get_status_display()}"

Later, when a Person’s status is Red, that person will be cancelled. Yellows will be mildly harassed.

You’ll record infractions with this model:

class Infraction(models.Model):
    SEVERITY_LOW = 0
    SEVERITY_MEDIUM = 1
    SEVERITY_HIGH = 2
    SEVERITY_CHOICES = (
        (SEVERITY_LOW, "Low"),
        (SEVERITY_MEDIUM, "Medium"),
        (SEVERITY_HIGH, "High")
    )

    person = models.ForeignKey(Person,
                                on_delete=models.CASCADE,
                                related_name="infractions")
    severity = models.PositiveIntegerField(choices=SEVERITY_CHOICES)
    content = models.TextField(help_text=("Describe the infraction"))

    def __str__(self):
        return (
            f"{self.get_severity_display()} severity infraction "
            f"by {self.person.name}")

Persons will also be categorized. For example, a Person who causes an infraction against the transsexual community will be categorized as Transphobic.

This is your model:

class Category(models.Model):
    name = models.CharField(max_length=70)

    def __str__(self):
        return self.name

Update your Person model to use this model:

class Person(models.Model):
    ...
    categories = models.ManyToManyField(Category)
    ...

You’re asked to build a view that will list all Person objects, the list of categories they’re in, as well as the highest level of severity across all of their infractions.

As a naive developer, you do the following:

def persons(request):
    ctx = {
        "persons": Person.objects.all()
    }
    return TemplateResponse(request, "persons.html", ctx)

Of course, you need a way to retrieve each person’s worst infraction. What better way than using a model method?

from django.utils.functional import cached_property

class Person(models.Model):
    ...
    @cached_property
    def worst_infraction(self):
        return max(self.infractions.all(), key=lambda i: i.severity)

Then in your template, you present the information as follows:

<h1>Persons</h1>

<table>
    <thead>
        <tr>
            <th scope="col">Name</th>
            <th scope="col">Categories</th>
            <th scope="col">Worst infraction level</th>
        </tr>
    </thead>
    <tbody>
        {% for person in persons %}
            <tr>
                <td>{{ person.name }}</td>
                <td>
                <ul>
                    {% for category in person.categories.all %}
                        <li>{{ category.name }}</li>
                    {% endfor %}
                </ul>
                </td>
                <td>{{ person.worst_infraction }}</td>
            </tr>
        {% endfor %}
    </tbody>
</table>

Simple enough eh?

Here’s what happens when the view code runs:

  1. Django prepares a QuerySet to be executed against your database.
  2. That queryset is sent to the template as persons.

Once you start looping through the QuerySet, Django performs a query against your database to fetch all the Person records in it.

Then for each Person:

  1. Django connects to the database to fetch all that Person’s categories.
  2. It loops through these categories.
  3. Then it calls the worst_infraction method for each Person.
  4. This method loads all of the Person’s infractions into memory and uses the Python max() function to get the maximum value.

These operations look fair and nothing bad seems to be happening, especially if you don’t have many users yet.

However, once you start getting traffic, you’ll find that you need more and more resources to perform these basic tasks. Eventually, it will start to cost a ton of money just to prevent the app from crashing.

Why is that?

Because not only are we performing these operations in Python, but the database is also being hit again and again for every single person.

Wouldn’t it be better if we could reduce the number of operations in Python or against the database? What if we could reduce both?

Querying across relationships efficiently

For each Person, we’re connecting to the database to fetch the categories. If we have 1,000 Person objects, that’s 1,000 extra queries. This isn’t right.

Django allows you to fetch all objects related to a model in one fell swoop using a method called prefetch_related.

Check this out:

persons = Person.objects.all().prefetch_related('categories')

for person in persons:
    for category in person.categories.all():
        print(category.name)

Here’s what’s happening:

  1. Django prepares a QuerySet to be executed.
  2. Once you start looping through persons, Django performs 2 SQL queries:
    1. One to fetch all Person objects.
    2. A second one to fetch all categories for all of those Person objects.
  3. Django then stitches the results together, so person.categories.all() no longer hits the database.

If you have 1,000 Persons, instead of hitting the database 1,000+ times like before, you issue only 2 queries.

Pretty good huh?

Using aggregation

In our worst_infraction method, we’re connecting to the database to bring all of a Person’s infractions into Python memory so that we can determine the maximum value for that Person.

If your app is serving a large number of users, both your memory and CPU usage will be high. If you don’t scale your server up, your app will crash.

Wouldn’t it be great if we could perform this operation in the database itself? As a matter of fact, SQL systems contain a lot of built-in ways to work with data. After all, they were designed for that.

Let’s use Django’s aggregation feature:

from django.db.models import Max

...
def worst_infraction(self):
    return self.infractions.aggregate(max_severity=Max('severity'))['max_severity']
...

This aggregation will instruct the database to find the maximum value in the severity column and return it. Django will store this value in a dict as the key max_severity like we specified. We return the value of this key.
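
If you run the aggregation by hand in the shell, you can see the dict Django builds before we pull the value out (the number will depend on your data):

person = Person.objects.first()
person.infractions.aggregate(max_severity=Max('severity'))
>>> {'max_severity': 2}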

Instead of loading entire infractions in memory and then finding the maximum value, we use the database to find that value and return just that — greatly reducing our memory and CPU usage.

Using annotations

The SJW Union needs to prepare a hit list for their next mass cancellation campaign. They need the names of persons who committed more than 3 infractions ASAP.

You can use this query:

hitlist = []

for person in Person.objects.all().prefetch_related('infractions'):
    if person.infractions.count() > 3:
        hitlist.append(person)

Django will perform a query to fetch all Person records, then a second one to fetch all infractions for every Person object, bundling them up in a single QuerySet to avoid further database queries.

Since we already prefetched the infractions, the counting happens over objects that are already sitting in memory rather than in the database (depending on your Django version, you may need len(person.infractions.all()) instead of .count() to take advantage of the prefetch cache). Either way, we’re using memory to hold full infraction records and CPU to count them.

We don’t actually care about entire infraction records in this case. We only care about the number of objects. So, what’s the point in loading them into memory?

Instead of loading them up in memory, just count the objects in the database and tell Django to add this information to our QuerySet:

from django.db.models import Count


hitlist = []

for person in Person.objects.all().annotate(infraction_count=Count('infractions')):
    if person.infraction_count > 3:
        hitlist.append(person)

This will use a single database connection to fetch all Person records while also returning a count of all infraction records for each Person object.

Django will add the count in the infraction_count attribute like we instructed.

We’re now once more delegating the dirty work to the database, which allows us to use less memory and CPU.

However, we’re using a plain list to hold our hitlist. What if we want to filter that list later on? Say we want to find everyone whose name starts with A:

new_list = [person for person in hitlist if person.name.startswith('A')]

As you probably guessed, this is quite inefficient. We should use the database to do this kind of dirty work.

Instead of storing our hitlist in a list, why not just use a QuerySet?

hitlist = (Person.objects.all()
        .annotate(infraction_count=Count('infractions'))
        .filter(infraction_count__gt=3))

Ah, much better! Later on, we can simply filter on this QuerySet as needed:

hitlist.filter(name__startswith='A')

This is the power of annotations — they act just like a regular model field.
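
For example, you can order by the annotated value or pull it into a values() call, just as you would with a real column:

hitlist.order_by('-infraction_count')
hitlist.values('name', 'infraction_count')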

When to optimize database access

Using the above methods, you’ll be able to optimize database access in 99% of use cases.

You should avoid wasting time with premature optimization because it will prevent you from shipping features.

But how do you know when to optimize and when to move on?

You’ll need 2 tools for that:

  1. django-debug-toolbar
  2. django-extensions

The first one, django-debug-toolbar, allows you to see the time spent rendering templates as well as the database queries that get performed when you load a view.

The second one, django-extensions, has a command called shell_plus. If you run this command with the --print-sql flag, it will print the SQL statements that get run as your QuerySets are executed.
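
For example, you could rerun the hitlist query from the annotations section inside an SQL-echoing shell and watch the generated SQL scroll by:

python manage.py shell_plus --print-sql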

By using the information provided by these tools, you’ll be able to determine when it’s time to optimize.

Next steps

As you saw, Django provides you with an ORM that allows you to take full advantage of SQL without having to write any SQL yourself.

In this post, we merely scratched the surface but you should now be fully armed with the knowledge to model any kind of business object thrown at you as well as optimize database access to keep your apps running smoothly even under load.

If you look at the documentation for models, you’ll quickly get overwhelmed by all the ways that Django provides for you to work with data stored in the database.

A good strategy is to avoid reading the documentation until you come across a problem that requires you to use some of these obscure but amazing features.

For example, Django provides an assortment of database functions. Trying to learn about them without a proper use case would be a waste of time. Instead of trying to figure out what Coalesce does, learn about it once you get a task that requires you to build a view that contains no null values.
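
To make that concrete, here’s a rough sketch of what that could look like, assuming a model with a nullable nickname field (a hypothetical field, just for illustration):

from django.db.models import Value
from django.db.models.functions import Coalesce

# Replace NULL nicknames with "-" in the database query itself.
Person.objects.annotate(display_nickname=Coalesce('nickname', Value('-')))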

That’s all. Now it’s your turn. Go and build some great apps.