Django’s ORM allows you to easily access and model your data. Since it has a ton of features, you probably haven’t tapped into its full power yet.
In this post, you’ll learn about the more advanced ORM features by solving real world problems.
There’s no point in learning how to use any ORM without first learning at least the basics of SQL because ORMs are all about SQL.
Use these resources to quickly learn SQL:
You’re an officer at the Gestapo and you need a database to keep tabs on people you’re observing.
Here’s how you represent this in Django:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name
See? Django provides a bunch of ways to automatically convert and validate data from your database into Python objects. I encourage you to check out all the field types it provides.
Here, CharField will turn into a str, PositiveSmallIntegerField will turn into an int, and finally, DateTimeField will turn into a Python datetime object.
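To see what that conversion actually means, here’s a minimal sketch using only Python’s standard library (no Django). SQLite hands rows back as plain str and int values, and datetime columns come back as text that something has to parse, which is exactly the busywork the ORM’s fields do for you:

```python
import sqlite3
from datetime import datetime

# Throwaway in-memory database mirroring the persons table
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "persons" ("name" varchar(255), "age" smallint, "first_seen" datetime)'
)
conn.execute(
    'INSERT INTO "persons" VALUES (?, ?, ?)',
    ("Henric Heinz", 22, "1938-12-01 00:00:00"),
)

# Raw row from the database: str, int, and a datetime still stored as text
name, age, first_seen_raw = conn.execute('SELECT * FROM "persons"').fetchone()

# Django's DateTimeField performs this parsing step for you
first_seen = datetime.fromisoformat(first_seen_raw)

print(type(name).__name__, type(age).__name__, type(first_seen).__name__)
# str int datetime
```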
Models on their own are pretty useless because your data ultimately lives in the database. Models simply allow you to describe what your data looks like in the database without having to look at it.
Instead of managing your database manually, Django provides a migration system where you use code to keep your models and your database schema in sync.
To understand the migration system, skip migrations for now and create the database table manually.
Connect to your database using sqlite3 db.sqlite3 and enter the following:
CREATE TABLE "persons"
(
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"name" varchar(255) NOT NULL,
"age" smallint unsigned NOT NULL CHECK ("age" >= 0),
"first_seen" datetime NOT NULL
);
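Before handing this table to Django, you can sanity-check the schema yourself with Python’s stdlib sqlite3 module against a throwaway in-memory database. This is just a sketch to show that the CHECK constraint does its job at the database level:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "persons"
    (
        "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
        "name" varchar(255) NOT NULL,
        "age" smallint unsigned NOT NULL CHECK ("age" >= 0),
        "first_seen" datetime NOT NULL
    );
""")

# A valid row is accepted...
conn.execute(
    'INSERT INTO "persons" ("name", "age", "first_seen") VALUES (?, ?, ?)',
    ("Henric Heinz", 22, "1938-12-01 00:00:00"),
)

# ...but a negative age violates the CHECK constraint
rejected = False
try:
    conn.execute(
        'INSERT INTO "persons" ("name", "age", "first_seen") VALUES (?, ?, ?)',
        ("Bad Row", -1, "1938-12-01 00:00:00"),
    )
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```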
Now you can play with real data using the Django shell:
p = Person.objects.create(name="Henric Heinz", age=22, first_seen=datetime.fromisoformat("1938-12-01"))
If you look inside your database table, you’ll see a new row with this data:
SELECT * FROM "persons";
Django is using your model definition to create database tables corresponding to your input data without you having to write any SQL.
Update your model to add an important metric, favorite_color, that any Gestapo officer needs to track:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=50)

    def __str__(self):
        return self.name
Since you added a new field, you need this change to be reflected in your database as well; otherwise, Django will raise OperationalError when you perform CRUD operations against it, because it expects the database table to contain a favorite_color column:
p = Person.objects.create(name="Nadine Braun", age=22)
>>> django.db.utils.OperationalError: table persons has no column named favorite_color
Connect to your database with sqlite3 db.sqlite3 and enter this:
ALTER TABLE "persons"
ADD COLUMN "favorite_color" varchar(50);
You can now update the previous record to add the user’s favorite color:
p = Person.objects.first()
p.favorite_color = "red"
p.save()
Notice how in our model definition, the favorite_color field is mandatory. By default, CharFields are neither nullable nor blankable: a value must be supplied when creating objects or validation will fail.
However, in our SQL statement, we didn’t add the NOT NULL constraint. There is a mismatch between our model and table definitions. A developer looking at the model will assume that the database column has a NOT NULL constraint, because that’s what they’ll see in the model code.
In this case, it’s not an issue because you’re only accessing the data from Django. However, if your data is accessed by external tools, those tools might write null values to the favorite_color column. When you fetch such records, you’ll end up with None in Django, which might cause hard-to-find bugs in your logic.
Not to mention that it’s easy to make a mistake when writing SQL statements. Consider the following:
ALTER TABLE "persons"
ADD COLUMN "favorite_color" varchar(30);
Notice how the max length of the favorite_color column is 30, but in Django it’s 50. Everything will work fine for values of 30 characters or fewer, but longer values create a dangerous mismatch: SQLite won’t complain, because it doesn’t enforce declared varchar lengths, while a stricter backend such as PostgreSQL will raise a database error on 31+ characters and break your app, even though the value passes Django’s own max_length=50 validation.
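You can see SQLite’s permissiveness for yourself with the stdlib sqlite3 module. This sketch stores a 40-character value in a varchar(30) column without any error, which is why the mismatch can go unnoticed during development:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (favorite_color varchar(30))")

# SQLite ignores the declared length: a 40-character value is stored
# in a varchar(30) column without complaint.
conn.execute("INSERT INTO persons VALUES (?)", ("x" * 40,))
stored = conn.execute("SELECT favorite_color FROM persons").fetchone()[0]
print(len(stored))  # 40
```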
Let’s say that you’re already working with a production database with millions of records. If you want to add a non-nullable column, you need to set a sensible default value on existing records before adding the NOT NULL constraint. Doing this in raw SQL is tedious and error-prone.
The solution? Use Django migrations, which let you manage your database schema with code. Migrations keep your models and your database schema in sync, and they make schema changes repeatable and reversible.
Ok, delete your database and create a new migration:
rm db.sqlite3
python manage.py makemigrations myapp --empty
You just instructed Django to create a new empty migration file for your myapp app. You’ll find a file called 0001_initial.py in the migrations directory inside the myapp directory.
Inside the migration file, you’ll see:
from django.db import migrations
class Migration(migrations.Migration):
    dependencies = [
    ]

    operations = [
    ]
The operations list is what you’re interested in. It allows you to perform operations against your database.
Add this to the operations list:
migrations.RunSQL(
    """
    CREATE TABLE "persons"
    (
        "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
        "name" varchar(255) NOT NULL,
        "age" smallint unsigned NOT NULL CHECK ("age" >= 0),
        "first_seen" datetime NOT NULL
    );
    """,
    """
    DROP TABLE "persons";
    """
)
Here, we specify one operation with a forward and a reverse method. If you run the migrate command now, the table will be created. If you undo the migration with python manage.py migrate myapp zero, then the second DROP TABLE statement will run and your table will be deleted.
Run the migration:
python manage.py migrate myapp
Since you forgot to add the favorite_color column, you get OperationalError when creating objects:
p = Person.objects.create(name="Amon Goeth", age=27)
>>> django.db.utils.OperationalError: table persons has no column named favorite_color
Let’s fix this now. Create a new empty migration:
python manage.py makemigrations myapp --empty --name person_add_favorite_color
It’s important to name your empty migrations because, by default, Django will generate a name that tells you nothing about the change.
Add the following to the operations list:
migrations.RunSQL(
    """
    ALTER TABLE "persons"
    ADD COLUMN "favorite_color" varchar(30);
    """,
    migrations.RunSQL.noop
)
Now you can create new objects:
p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red")
Notice how you made a mistake in your SQL statement by writing varchar(30) instead of varchar(50). Unfortunately, SQLite doesn’t support modifying columns, so we’re forced to delete our database and start from scratch.
Delete the database:
rm db.sqlite3
Then edit the previous migrations file:
migrations.RunSQL(
    """
    ALTER TABLE "persons"
    ADD COLUMN "favorite_color" varchar(50);
    """,
    migrations.RunSQL.noop
)
Now migrate:
python manage.py migrate
Notice the migrations.RunSQL.noop line. To roll back this migration, you’d have to issue a kind of DROP COLUMN statement, but SQLite doesn’t support that operation. That’s why you instruct Django to do nothing when rolling back. If you don’t specify a reverse method at all, Django will prevent you from rolling back the migration.
As you can see, dealing with the database manually is error prone. Thankfully, your app isn’t running in production yet, so you can afford to delete the database and start from scratch. What if you’re running in production, though? Then you’ll have to migrate the existing data carefully, which is too tedious for me to cover here, but doable by writing migration operations.
Since SQLite won’t let you modify the table to add the NOT NULL constraint, you’re forced to start from scratch once again.
However, I’m getting tired of this and I’m sure you are too. Let’s put all this tedious hard work behind us and tap into the real power of Django migrations.
Delete all migration files as well as your database:
rm myapp/migrations/000*.py
rm db.sqlite3
Let Django automatically create the migration and migrate the database:
python manage.py makemigrations myapp
python manage.py migrate
Now you can create objects:
p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red")
Magic?
The Gestapo chief sends a telegram saying that every Gestapo officer needs to start recording the occupation of all suspects. Let’s modify our model to do that.
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField()
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=50)
    occupation = models.CharField(max_length=100, null=True, blank=True)

    def __str__(self):
        return self.name
Generate migrations and migrate:
python manage.py makemigrations
python manage.py migrate
You can now add Amon’s occupation:
p = Person.objects.create(name="Amon Goeth", age=27, favorite_color="Red", occupation="Farmer")
However, it’s possible to create new objects without specifying an occupation:
p = Person.objects.create(name="Tanja Decker", age=21, favorite_color="Purple")
If you keep adding dirty data to your database, you’ll get fired and die a lonely death.
You can solve this by setting blank to False:
occupation = models.CharField(max_length=100, null=True, blank=False)
If you try to create:
p = Person.objects.create(name="Uta Foerster", age=23, favorite_color="Blue")
It will still be created! Why? Because the objects.create() method doesn’t perform model validation; it saves the data as is. The blank flag is only used during model validation and is not enforced by the database in any way.
If you want to validate before saving:
p = Person(name="Uta Foerster", age=23, favorite_color="Blue")
p.full_clean()
>>> django.core.exceptions.ValidationError: {'occupation': ['This field cannot be blank.']}
Now you get an error. However, you need to remember to call full_clean() before saving any object.
It’s not possible to make the occupation field non-null yet, because the database has no way of knowing how to populate existing rows with data. Let’s use a migration to fix that:
Create an empty migration:
python manage.py makemigrations myapp --empty --name person_occupation_not_null
Change the contents to look like this (replacing myapp with your own):
from django.db import migrations
def forwards(apps, schema_editor):
    person_model = apps.get_model("myapp", "Person")
    for person in person_model.objects.all():
        if person.occupation is None:
            person.occupation = "-"
            person.full_clean()
            person.save()


class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_alter_person_occupation'),
    ]

    operations = [
        migrations.RunPython(forwards, migrations.RunPython.noop)
    ]
Migrate the database:
python manage.py migrate
Now that all rows have a non-null value, you can remove the null flag from the field:
occupation = models.CharField(max_length=100)
Now, generate migrations and you’ll be greeted with some options:
As you can see, you could have let Django set a default value by choosing option 1 instead of writing your own migration operation using RunPython. Since you already did that, pick option 2.
Now that the field is non-nullable, you won’t be able to create dirty objects:
Person.objects.create(name="Monika Duerr", age=22, favorite_color="Green")
Wait a minute! It was still created without an error. That’s because, by default, when you don’t provide a value for a text-based field, Django saves the empty string "" instead of null.
What can you do about it? Stop using objects.create() to create objects. Instead, call full_clean() to validate your data before saving. Only use objects.create() when you’re testing things out.
Avoid using null=True on CharFields, because then you’ll have two different possible values for blank fields. If you want to search your database for empty fields, you’ll have to search for rows whose field contains either "" or null. Instead, set only blank=False and call it a day. Avoid setting null=True on any text-based field.
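The “two kinds of empty” problem is concrete: with a nullable text column, a query for blank values has to check both representations. A quick sketch with the stdlib sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (occupation varchar(100))")  # nullable column
conn.executemany(
    "INSERT INTO persons VALUES (?)",
    [("Farmer",), ("",), (None,)],
)

# With null allowed, "no occupation" has two representations,
# so the query must check for both '' and NULL.
blank = conn.execute(
    "SELECT COUNT(*) FROM persons WHERE occupation = '' OR occupation IS NULL"
).fetchone()[0]
print(blank)  # 2
```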
By using the RunPython operation, you can write Python code to work directly with your database. For example, you can load data from CSV files or external APIs.
While models allow you to model your data in Python, migrations allow you to prepare your database to be accessed by your models.
Next, you’ll learn how to enforce data consistency so that you can build high quality data sets.
Take a look at this:
person = Person.objects.create(name="Nadine Braun", age=0)
You now have a person with age 0 — clearly a mistake.
Let’s say that Gestapo is only allowed to spy on people aged 18 and beyond. You need to update your model to reflect this requirement:
from django.core.validators import MinValueValidator


class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name
Update and apply your migrations:
python manage.py makemigrations
python manage.py migrate
Check this out now:
person = Person.objects.create(name="Nadine Braun", age=0)
Wait a minute: a person 0 years old is still created, even though we added this minimum-value validation to our field. Why is that?
Remember I told you to stop using objects.create because it bypasses validation? The proper way to create objects is by calling full_clean and then save:
person = Person(name="Annet Shultz", age=0)
person.full_clean()
>>> ValidationError...
Now you’ll avoid polluting your database with invalid data.
The minimum-value check uses validation built into Django, but you can easily create your own validators.
Let’s say that you want to enforce that all favorite_color values start with a capital letter. Here’s how that would work:
import string

from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _


def validate_capfirst(value):
    if value[0] not in string.ascii_uppercase:
        raise ValidationError(
            _('%(value)s must be capitalized.'),
            params={'value': value},
        )
class Person(models.Model):
    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    first_seen = models.DateTimeField(auto_now_add=True)
    favorite_color = models.CharField(max_length=50, validators=[validate_capfirst])

    def __str__(self):
        return self.name
If you try to create with non-capitalized string:
p = Person(name="Sophia Herzog", age=24, favorite_color="red")
p.full_clean()
>>> django.core.exceptions.ValidationError: {'favorite_color': ['red must be capitalized.']}
By attaching as many validators as you want via the validators list, you have an easy way to enforce data consistency in your database.
You’ve seen how individual fields can be validated. But what if you want to validate multiple fields at once?
Let’s say that the Gestapo chief introduces a new rule stating that every officer needs to start monitoring males aged as low as 16 years. Other genders can only be monitored if they’re at least 18 years old.
You need to update your model to reflect this change:
class Person(models.Model):
    MALE = 0
    FEMALE = 1
    GENDER_CHOICES = (
        (MALE, "Male"),
        (FEMALE, "Female"),
    )

    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(18)])
    gender = models.PositiveSmallIntegerField(choices=GENDER_CHOICES)
    first_seen = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name
If we create this object:
person = Person(name="Henrich Heinz", age=16, gender=Person.MALE)
person.full_clean()
A ValidationError will be raised because Mr Heinz is only 16, but our model only accepts persons aged at least 18.
Let’s fix this:
class Person(models.Model):
    MALE = 0
    FEMALE = 1
    GENDER_CHOICES = (
        (MALE, "Male"),
        (FEMALE, "Female"),
    )

    name = models.CharField(max_length=255)
    age = models.PositiveSmallIntegerField(validators=[MinValueValidator(16)])
    gender = models.PositiveSmallIntegerField(choices=GENDER_CHOICES)
    first_seen = models.DateTimeField(auto_now_add=True)

    def clean(self):
        super().clean()
        if self.gender != Person.MALE and self.age < 18:
            raise ValidationError(f"{self.get_gender_display()}s must be at least 18 years of age.")

    def __str__(self):
        return self.name
Notice how your model validates gender against a set of known values. The value stored in the database is the first element of each tuple, which in this case is a small integer. The second element is the human-readable name, which you can fetch using self.get_fieldname_display(), where fieldname is the name of your field.
Instead of storing Male and Female in the database, it’s more space efficient to store small integers like 0 and 1. Django allows you to easily get this done using this simple technique.
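Here’s roughly how that lookup works, sketched without Django (get_gender_display here is a stand-in for the method Django generates for you, not Django’s actual implementation):

```python
# The database stores the small integer; the choices tuple maps it
# back to the human-readable label.
MALE = 0
FEMALE = 1
GENDER_CHOICES = (
    (MALE, "Male"),
    (FEMALE, "Female"),
)

def get_gender_display(stored_value):
    # Building a dict over the choices tuple gives the reverse mapping
    return dict(GENDER_CHOICES)[stored_value]

print(get_gender_display(0))  # Male
```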
Now we can create 16 year old males:
person = Person(name="Henrich Heinz", age=16, gender=Person.MALE)
person.full_clean()
But not 16 year old females:
person = Person(name="Anna Weiss", age=16, gender=Person.FEMALE)
person.full_clean()
>>> django.core.exceptions.ValidationError: {'__all__': ['Females must be at least 18 years of age.']}
By using model level validation, you can easily enforce complicated business rules so that your data remains clean over your application’s lifetime.
Next, let’s explore how to use the ORM to model more complicated real world data.
Fast forward to 2022 and Gestapo is no more. You retired to Latin America where no one knows you. The Internet is the new real life and so you decide to make some money by selling the bananas growing in your backyard online. Using Django of course.
Start with a model to represent Banana items.
class Banana(models.Model):
    CAVENDISH_VARIANT = 0
    RED_VARIANT = 1
    PLANTAIN_VARIANT = 2
    VARIANT_CHOICES = (
        (CAVENDISH_VARIANT, "Cavendish"),
        (RED_VARIANT, "Red"),
        (PLANTAIN_VARIANT, "Plantain"),
    )

    variant = models.PositiveSmallIntegerField(choices=VARIANT_CHOICES)
    qty = models.PositiveIntegerField("Quantity")

    def __str__(self):
        return f"{self.get_variant_display()} banana."
When a user is shopping for bananas on your website, he will add his desired banana in his shopping cart. We’ll use a model to represent this shopping cart.
The shopping cart will need to store a reference to items that the user added to his cart. You could store references to Banana objects directly like so:
from django.conf import settings
class ShoppingCart(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.SET_NULL, null=True, blank=True)
    qty = models.PositiveIntegerField("Quantity")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"
ForeignKey is used to relate one model to another. In this case, every ShoppingCart instance will hold a reference to a user and to a Banana instance.
This is known as a many to one relationship because many shopping carts can reference one user. Let’s take an example:
banana = Banana.objects.first()
user = user_model.objects.first()
cart = ShoppingCart(user=user, banana=banana, qty=1)
cart.full_clean()
cart.save()
cart2 = ShoppingCart(user=user, banana=banana, qty=1)
cart2.full_clean()
cart2.save()
Simple enough? However, this model is wrong. Why should a user be able to create multiple shopping carts? And why can a user add only a single type of banana in his cart at a time?
That’s where OneToOneField comes in. Instead of a many to one relationship between shopping carts and users, you want a one to one relationship: a user can create only one shopping cart.
class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.SET_NULL, null=True, blank=True)
    qty = models.PositiveIntegerField("Quantity")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"
banana = Banana.objects.first()
user = user_model.objects.first()
cart = ShoppingCart(user=user, banana=banana, qty=1)
cart.full_clean()
cart.save()
cart2 = ShoppingCart(user=user, banana=banana, qty=1)
cart2.full_clean()
>>> django.core.exceptions.ValidationError: {'user': ['Shopping cart with this User already exists.']}
Perfect, but what about adding multiple bananas to the cart? Clearly our data model is wrong. Instead of referencing bananas directly, use another model to represent this relationship. Many e-commerce systems use this kind of model and they usually call it a LineItem.
class LineItem(models.Model):
    cart = models.ForeignKey(ShoppingCart, on_delete=models.CASCADE)
    banana = models.ForeignKey(Banana, on_delete=models.CASCADE)
    qty = models.PositiveIntegerField("Quantity")
Now you need a way to connect this LineItem to our ShoppingCart. Remember, a shopping cart needs to be able to contain many line items. In other words, many shopping carts must reference many line items. And so we’ll use a third kind of relationship called a many to many relationship.
class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    line_items = models.ManyToManyField(LineItem)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"
Here’s how you’ll create carts now:
user = user_model.objects.first()
cart = ShoppingCart(user=user)
cart.full_clean()
cart.save()
banana1 = Banana.objects.get(variant=0)
banana2 = Banana.objects.get(variant=1)
line_item = LineItem(cart=cart, banana=banana1, qty=1)
line_item.full_clean()
line_item.save()
cart.line_items.add(line_item)
line_item = LineItem(cart=cart, banana=banana2, qty=1)
line_item.full_clean()
line_item.save()
cart.line_items.add(line_item)
You can query your shopping cart to see the line items for yourself:
cart.line_items.all()
You should see a QuerySet containing the two line items you added earlier.
And that’s how easy it is to relate models to one another. Now let’s see how you can actually work with these relationships.
When adding a relationship, you can specify an additional attribute called related_name, which is used for reverse relationships. What are reverse relationships?
When querying a user’s shopping cart, you can do this:
u = user_model.objects.first()
u.shoppingcart
u.shoppingcart contains an instance of that user’s shopping cart. For OneToOneField, Django by default converts the model name to lowercase and uses it as the reverse relationship name.
However, you can specify your own:
user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="cart")
u = user_model.objects.first()
u.cart
Now u.cart will contain the instance because you explicitly told Django to use that name.
What about reverse ForeignKeys though?
banana = Banana.objects.first()
banana.lineitem_set.all()
This will return a QuerySet of all LineItem instances that reference this Banana instance. Django converts the model name to lowercase and appends _set to it so that you know it’s a QuerySet. You can use a more readable name:
class LineItem(models.Model):
    banana = models.ForeignKey(Banana, on_delete=models.CASCADE, related_name="line_items")
    qty = models.PositiveIntegerField("Quantity")
Now banana.line_items.all() will return the QuerySet because you named it explicitly. I prefer line_items to lineitem_set because it’s more readable. That’s why I suggest that you always name your relationships.
The same thing is done for ManyToManyField:
l = LineItem.objects.first()
l.shoppingcart_set.all()
And you can name it manually:
class ShoppingCart(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    line_items = models.ManyToManyField(LineItem, related_name="carts")
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    def __str__(self):
        return f"{self.user}'s cart as at {self.updated_at}"
l = LineItem.objects.first()
l.carts.all()
Now you’ll have your reverse relationship in the carts attribute. Using carts instead of cart seems a little weird, because a LineItem can only be in a single ShoppingCart at a time. But reverse relationships don’t actually exist in SQL; they’re a quality-of-life feature of Django’s ORM, and from the many-to-many field’s point of view, nothing stops a LineItem from appearing in several carts, so the reverse side is always plural. If this doesn’t make sense, don’t worry about it. SQL is weird, but as long as your data is consistent and easily accessible, you can relax.
You were able to reference multiple LineItems from a ShoppingCart instance but SQL is rows and columns, so how does this actually work?
If you take a look at your SQLite schema, you’ll see 3 tables:
shop_shoppingcart was created by the ShoppingCart model and shop_lineitem by the LineItem model. What about shop_shoppingcart_line_items? There’s no model named ShoppingCartLineItems because we never created one, so where does this table come from?
Since SQL doesn’t support referencing multiple rows from a single row, we need some sort of hack to make this work: an intermediary table that holds the relationships between rows.
Inspect the shop_shoppingcart_line_items table and you’ll see 3 columns:
As you can see, this table will hold ids for ShoppingCart and LineItem objects. With the following:
cart.line_items.add(line_item)
Django will create a new row in the shop_shoppingcart_line_items table containing both the id of the cart object and the id of the line_item. This trick allows your model to hold references to multiple objects.
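You can reproduce the mechanism with the stdlib sqlite3 module. The table and column names below are simplified stand-ins for what Django generates; the point is that a many-to-many lookup is just a join through the intermediary table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shop_shoppingcart (id integer PRIMARY KEY);
    CREATE TABLE shop_lineitem (id integer PRIMARY KEY);
    CREATE TABLE shop_shoppingcart_line_items (
        id integer PRIMARY KEY,
        shoppingcart_id integer REFERENCES shop_shoppingcart (id),
        lineitem_id integer REFERENCES shop_lineitem (id)
    );
    INSERT INTO shop_shoppingcart (id) VALUES (1);
    INSERT INTO shop_lineitem (id) VALUES (10), (11);
    -- cart.line_items.add(...) boils down to rows like these:
    INSERT INTO shop_shoppingcart_line_items
        (shoppingcart_id, lineitem_id) VALUES (1, 10), (1, 11);
""")

# cart.line_items.all() becomes a join through the intermediary table
items = conn.execute("""
    SELECT li.id FROM shop_lineitem li
    JOIN shop_shoppingcart_line_items ci ON ci.lineitem_id = li.id
    WHERE ci.shoppingcart_id = 1
    ORDER BY li.id
""").fetchall()
print(items)  # [(10,), (11,)]
```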
Django automatically creates this intermediary table for you, but you can also create it yourself. Why and how would you do that? Let’s find out next.
You want users on your shop to be friends with each other so that they come to your site more often to share stuff.
You’ll use your own custom user model to hold this relationship.
from django.db import models
from django.contrib.auth.models import AbstractUser
class User(AbstractUser):
    friends = models.ManyToManyField('self')
Using 'self' allows a model to reference itself. Now users can add friends:
u1 = user_model.objects.create(username='abcd', email="[email protected]")
u2 = user_model.objects.create(username='abcdef', email="[email protected]")
u1.friends.add(u2)
The great thing about 'self' is that it creates a symmetrical relationship: if you add u2 as u1’s friend, then u1 is automatically added to u2’s friends as well. See for yourself:
u2.refresh_from_db()
print(u1 in u2.friends.all())
>>> True
Likewise, if you remove u1 from u2 friends, then u2 is also removed from u1’s friends. Check it out:
u2.friends.remove(u1)
u1.refresh_from_db()
print(u2 in u1.friends.all())
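Under the hood, a symmetrical ManyToManyField simply stores one row per direction in the intermediary table, and removing the friendship deletes both rows. Here’s a sketch with the stdlib sqlite3 module; the table and column names are illustrative, not Django’s exact ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_friends (
        from_user_id integer,
        to_user_id integer
    )
""")

def add_friend(u1, u2):
    # A symmetrical relationship writes one row per direction
    conn.executemany(
        "INSERT INTO user_friends VALUES (?, ?)",
        [(u1, u2), (u2, u1)],
    )

def friends_of(u):
    return [row[0] for row in conn.execute(
        "SELECT to_user_id FROM user_friends WHERE from_user_id = ?", (u,)
    )]

add_friend(1, 2)
print(friends_of(1), friends_of(2))  # [2] [1]
```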
What if you want to attach a timestamp to this relationship to allows users to find out when they became friends? Create your own intermediary model. Find out how next.
class User(AbstractUser):
    friends = models.ManyToManyField('self', through="UserFriend",
                                     through_fields=("to_user", "from_user"))


class UserFriend(models.Model):
    to_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="+")
    from_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="+")
    created_at = models.DateTimeField(auto_now_add=True)
Notice 3 things:
- The through attribute tells Django which model to use as the intermediary table for the relationship.
- The through_fields attribute lets you specify which fields on the intermediary model are used for the relationship.
- Using + as the related_name disables the reverse relationship. In this case, you don’t need the reverse relationship because you won’t be fetching UserFriend instances from User objects directly in most cases.

If you don’t want to disable the reverse relationship, then you can do:
to_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="friendships_from")
from_user = models.ForeignKey(User, on_delete=models.CASCADE, related_name="friendships_to")
This is confusing and I would avoid it.
Now you can provide users with functionality like finding out who added them as friend during the past 7 days. Or, who have been their friends for the longest. Let’s play with these next.
You want to provide users with the ability to filter users they befriended between a particular time period. Where would you add this logic? That’s where the concept of the fat model comes in.
Instead of having business logic all over the place, add it to models themselves as instance methods:
from datetime import datetime

from django.db.models import QuerySet


class User(AbstractUser):
    friends = models.ManyToManyField('self', through="UserFriend",
                                     through_fields=("to_user", "from_user"))

    def friendships_between(self, friend: "User", from_time: datetime, to_time: datetime) -> QuerySet:
        return UserFriend.objects.filter(to_user=self, from_user=friend,
                                         created_at__gte=from_time, created_at__lte=to_time)
You now have access to this method wherever you have a user instance. In templates, in view code, context processors, anywhere.
Some people don’t like fat models because they feel it’s dirty and encourages tight coupling. I don’t see anything dirty about this, nor do I see any undesired coupling. A lot of frameworks make use of fat models. For example, Wagtail uses models to represent pages in a CMS. When using Wagtail, even the code to process requests and render templates lives in model instance methods, which gives developers an easy way to manage business logic related to CMS pages.
Whenever you need to perform actions on a single model or models related to that model, use the fat model pattern.
However, when performing actions on entire QuerySets, using model methods might not be the best solution. Let’s explore possible methods next.
You get hired by a company called WebFlix — a competitor to the popular online streaming company Netflix.
To make more money, the company wants a way to figure out what kind of movies people like to watch. To do that, they need data.
That’s where you come in. Your job is to store data about various movies that will help data analysts determine popularity of movies, predict trends, and provide users with movies that they would more likely enjoy watching.
To start off, you’ll store the following about a movie:
Here’s your model:
class MoviePopularity(models.Model):
    movie = models.OneToOneField(Movie,
                                 on_delete=models.CASCADE,
                                 related_name="popularity")
    watch_percent = models.FloatField()
    rewatched = models.PositiveIntegerField()

    class Meta:
        db_table = "movie_popularity"
        verbose_name_plural = "Movie popularity stats"
Using this model is simple:
popularity = MoviePopularity(movie=movie, watch_percent=40.0, rewatched=0)
popularity.full_clean()
popularity.save()
According to the statistics department, a movie watched for more than 80% of its length and rewatched at least 1,000 times is very popular; one watched for more than 80% but never rewatched is popular; and one watched for less than 60% has flopped.
Querying the model for this information is quite simple:
very_popular = MoviePopularity.objects.filter(watch_percent__gt=80.0,
                                              rewatched__gte=1000)
popular = MoviePopularity.objects.filter(watch_percent__gt=80.0,
                                         rewatched=0)
flopped = MoviePopularity.objects.filter(watch_percent__lt=60.0)
However, these queries feel wrong. Whenever you need some information, you have to remember the business rules set by the statistician. Things like the percentage and watch count.
Not to mention that these things can and will change. Down the road, the statistician might provide an additional requirement like the number of people who share the movie after watching it. If you use these kind of queries in many places, you’ll have a hard time refactoring once the business rules change.
That’s where custom model managers come in. Remember how we used fat models to work on individual instances? Well, managers operate on whole tables instead.
Take a look at this manager:
class MoviePopularityManager(models.Manager):
def very_popular(self):
return self.filter(
watch_percent__gt=80.0,
rewatched__gte=1000)
def popular(self):
return self.filter(
watch_percent__gt=80.0,
rewatched=0)
def flopped(self):
return self.filter(watch_percent__lt=60.0)
To use this manager, simply add it to your model:
class MoviePopularity(models.Model):
movie = models.OneToOneField(Movie,
on_delete=models.CASCADE,
related_name="popularity")
watch_percent = models.FloatField()
rewatched = models.PositiveIntegerField()
objects = MoviePopularityManager()
class Meta:
db_table = "movie_popularity"
verbose_name_plural = "Movie popularity stats"
As you can see, we replaced the default manager with our custom one. It functions the same as the default one but with our custom methods added.
Check this out:
very_popular = MoviePopularity.objects.very_popular()
popular = MoviePopularity.objects.popular()
flopped = MoviePopularity.objects.flopped()
How do you like that? Personally, I love this solution because all the logic is encapsulated in a single place and you can easily extract the data in a less verbose manner.
These methods are nothing special either. You can do your regular querying with them:
very_popular_a = MoviePopularity.objects.very_popular().filter(
movie__title__startswith="a")
Whenever you need to reduce duplication when querying for objects, try to use Managers to get the job done.
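Stripped of Django entirely, the pattern is just a set of named query methods that own the business rules. A plain-Python sketch with made-up rows:

```python
class PopularityStats:
    """Named query methods over plain dicts, mimicking a custom manager."""

    def __init__(self, rows):
        self.rows = rows  # each row: {"watch_percent": float, "rewatched": int}

    def very_popular(self):
        return [r for r in self.rows
                if r["watch_percent"] > 80.0 and r["rewatched"] >= 1000]

    def flopped(self):
        return [r for r in self.rows if r["watch_percent"] < 60.0]


stats = PopularityStats([
    {"watch_percent": 95.0, "rewatched": 5000},
    {"watch_percent": 40.0, "rewatched": 0},
])
# Callers never see the thresholds; when the statistician changes
# the rules, you edit one class.
print(len(stats.very_popular()), len(stats.flopped()))  # 1 1
```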
You’re approached by the SJW Union, an organization in charge of recording various infractions on the Internet and subsequently cancelling the individuals who caused them.
They want you to build an app to help them with such recordings.
You’ll record people who cause infractions with this model:
class Person(models.Model):
STATUS_GREEN = 0
STATUS_YELLOW = 1
STATUS_RED = 2
STATUS_CHOICES = (
(STATUS_GREEN, "Green"),
(STATUS_YELLOW, "Yellow"),
(STATUS_RED, "Red")
)
name = models.CharField(max_length=150)
email = models.EmailField()
phone_number = models.CharField(max_length=100)
twitter_username = models.CharField(max_length=40)
status = models.PositiveSmallIntegerField(choices=STATUS_CHOICES)
def __str__(self):
return f"{self.name}: {self.get_status_display()}"
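In Django, each choice pair is (stored value, human-readable label), and get_status_display() is generated from those pairs. Conceptually it's just a dict lookup, which you can sketch without Django at all:

```python
# The same constants and pairs as in the model above.
STATUS_GREEN, STATUS_YELLOW, STATUS_RED = 0, 1, 2
STATUS_CHOICES = (
    (STATUS_GREEN, "Green"),
    (STATUS_YELLOW, "Yellow"),
    (STATUS_RED, "Red"),
)

# Roughly what get_status_display() does under the hood:
status_labels = dict(STATUS_CHOICES)
print(status_labels[STATUS_RED])  # Red
```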
Later, when a Person’s status is Red, that person will be cancelled. Yellows will be mildly harassed.
You’ll record infractions with this model:
class Infraction(models.Model):
SEVERITY_LOW = 0
SEVERITY_MEDIUM = 1
SEVERITY_HIGH = 2
SEVERITY_CHOICES = (
(SEVERITY_LOW, "Low"),
(SEVERITY_MEDIUM, "Medium"),
(SEVERITY_HIGH, "High")
)
person = models.ForeignKey(Person,
on_delete=models.CASCADE,
related_name="infractions")
severity = models.PositiveIntegerField(choices=SEVERITY_CHOICES)
content = models.TextField(help_text=("Describe the infraction"))
def __str__(self):
return (
f"{self.get_severity_display()} severity infraction "
f"by {self.person.name}")
Persons will also be categorized. For example, a Person who causes an infraction against the transsexual community will be categorized as Transphobic.
This is your model:
class Category(models.Model):
name = models.CharField(max_length=70)
def __str__(self):
return self.name
Update your Person model to use this model:
class Person(models.Model):
...
categories = models.ManyToManyField(Category)
...
You’re asked to build a view that will list all Person objects, the list of categories they’re in, as well as the highest level of severity across all of their infractions.
As a naive developer, you do the following:
def persons(request):
ctx = {
"persons": Person.objects.all()
}
return TemplateResponse(request, "persons.html", ctx)
Of course, you need a way to retrieve each person’s worst infraction. What better way than using a model method?
class Person(models.Model):
...
@cached_property
def worst_infraction(self):
return max(self.infractions.all(), key=lambda i: i.severity)
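Keep in mind what max(..., key=...) does here: it pulls every related row into Python and compares them there. Its behavior on plain stand-in objects (hypothetical data, no Django involved):

```python
from dataclasses import dataclass


@dataclass
class InfractionStub:  # hypothetical stand-in for the Infraction model
    severity: int
    content: str


infractions = [
    InfractionStub(0, "questionable meme"),
    InfractionStub(2, "career-ending tweet"),
    InfractionStub(1, "bad joke"),
]

# Every object is loaded and compared in Python memory.
worst = max(infractions, key=lambda i: i.severity)
print(worst.content)  # career-ending tweet
```

That in-memory comparison is exactly what we'll optimize away later.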
Then in your template, you present the information as follows:
<h1>Persons</h1>
<table>
<thead>
<tr>
<th scope="col">Name</th>
<th scope="col">Categories</th>
<th scope="col">Worst infraction level</th>
</tr>
</thead>
<tbody>
{% for person in persons %}
<tr>
<td>{{ person.name }}</td>
<td>
<ul>
{% for category in person.categories.all %}
<li>{{ category.name }}</li>
{% endfor %}
</ul>
</td>
<td>{{ person.worst_infraction }}</td>
</tr>
{% endfor %}
</tbody>
</table>
Simple enough eh?
Here’s what happens when the view code runs:
Once you start looping through the QuerySet, Django performs a query against your database to fetch a record of all the Person objects in it.
Then for each Person:
These operations look fair and nothing bad seems to be happening, especially if you don’t have many users yet.
However, once you start getting traffic, you’ll find that you need more and more resources to perform these basic tasks. Eventually, it will start to cost a ton of money just to prevent the app from crashing.
Why is that?
Because not only are we performing operations in Python, but the database is also being hit over and over again.
Wouldn’t it be better if we could reduce the number of operations in Python or against the database? What if we could reduce both?
For each Person, we’re connecting to the database to fetch the categories. If we have 1,000 Person objects, that's 1,000 extra queries. This isn't right.
Django allows you to fetch all objects related to a model in one fell swoop using a method called prefetch_related.
Check this out:
persons = Person.objects.all().prefetch_related('categories')
for person in persons:
for category in person.categories.all():
print(category.name)
Here’s what’s happening:
If you have 1,000 Persons, instead of hitting the database 1,000 times like before, you will hit it only twice.
Pretty good huh?
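Under the hood, prefetch_related runs roughly the two statements below and stitches the results together in Python. A sketch in raw sqlite3 with hypothetical table names and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_categories (person_id INTEGER, category TEXT);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO person_categories
        VALUES (1, 'Transphobic'), (1, 'Rude'), (2, 'Rude');
""")

# Query 1: fetch all persons.
persons = conn.execute("SELECT id, name FROM person").fetchall()

# Query 2: fetch every related row for those persons in one IN (...) clause.
ids = [pid for pid, _name in persons]
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    "SELECT person_id, category FROM person_categories "
    f"WHERE person_id IN ({placeholders})", ids).fetchall()

# Stitch the two result sets together in Python; this is the cache
# that prefetch_related builds for you.
by_person = {}
for pid, category in rows:
    by_person.setdefault(pid, []).append(category)
print(by_person)
```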
In our worst_infraction method, we’re connecting to the database to bring all of a Person’s infractions into Python memory so that we can determine the maximum severity for that Person.
If your app is serving a large number of users, both your memory and CPU usage will be high. If you don’t scale your server up, your app will crash.
Wouldn’t it be great if we could perform this operation in the database itself? As a matter of fact, SQL systems contain a lot of built-in ways to work with data. After all, they were designed for that.
Let’s use Django’s aggregation feature:
from django.db.models import Max
...
def worst_infraction(self):
return self.infractions.aggregate(max_severity=Max('severity'))['max_severity']
...
This aggregation will instruct the database to find the maximum value in the severity column and return it. Django will store this value in a dict as the key max_severity like we specified. We return the value of this key.
Instead of loading entire infractions in memory and then finding the maximum value, we use the database to find that value and return just that — greatly reducing our memory and CPU usage.
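The statement the aggregation produces is a single MAX() over the severity column; only one number crosses the wire. A sketch in raw sqlite3 with a hypothetical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE infraction (person_id INTEGER, severity INTEGER);
    INSERT INTO infraction VALUES (1, 0), (1, 2), (1, 1);
""")

# The database scans the rows; Python receives only the maximum.
row = conn.execute(
    "SELECT MAX(severity) FROM infraction WHERE person_id = ?", (1,)).fetchone()
max_severity = row[0]
print(max_severity)  # 2
```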
The SJW Union needs to prepare a hit list for their next mass cancellation campaign. They need the names of persons who committed more than 3 infractions ASAP.
You can use this query:
hitlist = []
for person in Person.objects.all().prefetch_related('infractions'):
if person.infractions.count() > 3:
hitlist.append(person)
Django will perform a query to fetch all Person records, then a second one to fetch all infractions for every Person object, bundling them up in a single QuerySet to avoid further database queries.
Since we already prefetched the infractions, Django will use len to count the number of infractions instead of using the database to count records. This means that we’re using memory to hold infraction records and then also using CPU to count objects.
We don’t actually care about entire infraction records in this case. We only care about the number of objects. So, what’s the point in loading them into memory?
Instead of loading them up in memory, just count the objects in the database and tell Django to add this information to our QuerySet:
from django.db.models import Count
hitlist = []
for person in Person.objects.all().annotate(infraction_count=Count('infractions')):
if person.infraction_count > 3:
hitlist.append(person)
This will use a single database connection to fetch all Person records while also returning a count of all infraction records for each Person object.
Django will add the count in the infraction_count attribute like we instructed.
We’re now once more delegating the dirty work to the database, which allows us to use less memory and CPU.
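In SQL terms, the annotation is a COUNT with a GROUP BY, and the condition on it becomes a HAVING clause. A raw sqlite3 sketch with hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE infraction (person_id INTEGER);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO infraction VALUES (1), (1), (1), (1), (2);
""")

# COUNT + GROUP BY is what .annotate(Count(...)) compiles to;
# filtering on the annotation becomes HAVING.
hitlist = conn.execute("""
    SELECT p.name, COUNT(i.person_id) AS infraction_count
    FROM person p LEFT JOIN infraction i ON i.person_id = p.id
    GROUP BY p.id
    HAVING infraction_count > 3
""").fetchall()
print(hitlist)  # [('Alice', 4)]
```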
However, we’re using a plain list to hold our hitlist. What if we want to filter that list later on? Say, we want to find everyone whose name starts with A:
new_list = [person for person in hitlist if person.name.startswith('A')]
As you probably guessed, this is quite inefficient. We should use the database to do this kind of dirty work.
Instead of storing our hitlist in a list, why not just use a QuerySet?
hitlist = (Person.objects
    .annotate(infraction_count=Count('infractions'))
    .filter(infraction_count__gt=3))
Ah, much better! Later on, we can simply filter on this QuerySet as needed:
hitlist.filter(name__startswith='A')
This is the power of annotations — they act just like a regular model field.
Using the above methods, you’ll be able to optimize database access in 99% of use cases.
You should avoid wasting time with premature optimization because it will prevent you from shipping features.
But how do you know when to optimize and when to move on?
You’ll need 2 tools for that:
The first one, django-debug-toolbar, allows you to see CPU usage when rendering templates, as well as the database queries performed when you load a view.
The second one, django-extensions, has a command called shell_plus. If you run this command with the --print-sql flag, it will print the SQL statements that get executed as you work with QuerySets.
By using the information provided by these tools, you’ll be able to determine when it’s time to optimize.
As you saw, Django provides you with an ORM that allows you to take full advantage of SQL without having to write any SQL yourself.
In this post, we merely scratched the surface but you should now be fully armed with the knowledge to model any kind of business object thrown at you as well as optimize database access to keep your apps running smoothly even under load.
If you look at the documentation for models, you’ll quickly get overwhelmed by all the ways that Django provides for you to work with data stored in the database.
A good strategy is to avoid reading the documentation until you come across a problem that requires you to use some of these obscure but amazing features.
For example, Django provides an assortment of database functions. Trying to learn about them without a proper use case would be a waste of time. Instead of trying to figure out what Coalesce does, learn about it once you get a task that requires you to build a view that contains no null values.
That’s all. Now it’s your turn. Go and build some great apps.
© Michael Karamuth 2022