Search in Django

We walk through the various ways you can add search functionality to your Django project.

Posted by Matthew on Nov 03, 2019 in Django

django python

Today we are going to talk about Full Text Search in Django. Even though Django is a batteries-included web framework, it does not ship with one ready-made search feature, and there isn't one perfect way to implement search in your web application. There are many different ways to do it, and today we will talk about a few of them. Let's start by listing the methods we can use, with a brief overview of each:

  • Basic: When you have a small-scale app and just want a search view that filters items based on the user's query without any added intelligence, you can use the filter() method provided by the Django ORM itself, or the ORM's Q objects, which serve the same purpose. We'll discuss the two in the subsequent sections.

  • Full Text Search: Django provides Full Text Search in the django.contrib.postgres module. As the name suggests, it only works when you use PostgreSQL as your database backend. Don't be disappointed yet: there are solutions for other databases as well, but they are not built into Django itself. Various well-maintained third-party packages are available for that.

    If you're new to the field of search, you might be wondering what Full Text Search means. Don't worry, we'll get to that point soon enough.

  • Hosted Solutions: Many external services, such as Algolia, Swiftype, and many others, provide search as a service. Apart from the obvious benefit, which is speed, they offer extras such as search analytics and more flexible ways to search, for example finding titles that contain multiple words but not in the same order as the query, words only a certain distance apart, slightly misspelled words, phonetic searches, etc.

  • External Tools: These are full-blown search services that you have to configure and run on your own. Examples include Elasticsearch and Solr, both of which are Lucene-based.

Today, we will only be talking about the first two methods, since I want to keep this beginner-friendly. So let's jump right into it.

Full Text Search using Postgres

Let us start by defining Full Text Search itself.

Textual search operators have existed in databases for years. PostgreSQL has the ~, ~*, LIKE, and ILIKE operators for textual data types, but they lack many essential properties that your search needs in order to feel as magical as the ones you've seen before:

  • There is no linguistic support, even for English. Regular expressions are not sufficient because they cannot easily handle derived words, e.g., categories and category. You might miss documents that contain categories, although you probably would like to find them when searching for category. You might say that you can just use OR to search for multiple derived forms, but that is tedious and error-prone (some words have several thousand derivatives).
  • They provide no ordering (ranking) of search results, which makes them ineffective when thousands of matching documents are found.
  • They tend to be slow because there is no index support, so they must process all documents for every search.
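The first limitation is easy to reproduce outside the database. The sketch below uses Python's re module as a rough stand-in for PostgreSQL's case-insensitive regex operator (~*); the sample documents are made up for illustration.

```python
import re

# Made-up documents for illustration.
documents = [
    "Browse every category on the site",
    "All categories are listed here",
    "Nothing relevant in this one",
]

# Rough stand-in for a case-insensitive regex match on the word "category".
pattern = re.compile("category", re.IGNORECASE)
matches = [doc for doc in documents if pattern.search(doc)]
print(matches)  # the document containing "categories" is missed

# Listing derived forms by hand with OR finds it, but does not scale.
pattern_with_or = re.compile("category|categories", re.IGNORECASE)
matches_with_or = [doc for doc in documents if pattern_with_or.search(doc)]
print(matches_with_or)
```

The second pattern works only because both forms were spelled out by hand, which is exactly the tedious part.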

Full text searching allows documents to be preprocessed and an index to be saved for later rapid searching. Preprocessing includes:

  1. Parsing documents into tokens. It is useful to identify various classes of tokens, e.g., numbers, words, complex words, email addresses, so that they can be processed differently.

    PostgreSQL uses a parser to perform this step. A standard parser is provided, and custom parsers can be created for specific needs.

  2. Converting tokens into lexemes. A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case and often involves the removal of suffixes (such as 's' or 'es' in English). This allows searches to find variant forms of the same word, without tediously entering all possible variants. Also, this step typically eliminates stop words, which are words that are so common that they are useless for searching (such as the word 'the' in English).

    PostgreSQL uses dictionaries to perform this step. Various standard dictionaries are provided, and custom ones can be created for specific needs.

  3. Storing preprocessed documents optimized for searching. For example, each document can be represented as a sorted array of normalized lexemes.

    A data type tsvector is provided for storing preprocessed documents, along with a type tsquery for representing processed queries.
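To make the three steps concrete, here is a toy pure-Python model of the pipeline. It is only a sketch of the idea, not how PostgreSQL implements it: the stop word list, the suffix rules, and the function names are all assumptions invented for this example.

```python
import re

# Assumption: a tiny hand-picked stop word list, not PostgreSQL's real one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the"}

# Assumption: crude suffix rules standing in for a real stemming dictionary.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", "")]


def parse_tokens(text):
    """Step 1: split a document into word and number tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)


def to_lexemes(tokens):
    """Step 2: lower-case, drop stop words, and strip common suffixes."""
    lexemes = []
    for token in tokens:
        word = token.lower()
        if word in STOP_WORDS:
            continue
        for suffix, replacement in SUFFIX_RULES:
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)] + replacement
                break
        lexemes.append(word)
    return lexemes


def to_vector(text):
    """Step 3: store the document as a sorted array of unique lexemes."""
    return sorted(set(to_lexemes(parse_tokens(text))))


def matches_query(vector, query):
    """Normalize the query the same way, then require every lexeme to be present."""
    return all(lexeme in vector for lexeme in to_lexemes(parse_tokens(query)))


vector = to_vector("The Categories of the red tomatoes")
print(vector)  # ['category', 'red', 'tomato']
print(matches_query(vector, "category"))  # True
```

Because the document and the query go through the same normalization, a search for category finds a document that only contains Categories.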

I know you have been eagerly waiting for this moment, so let's jump right into the code.

Let's start by defining the model we will be working with from here on. It's a Post model, as in a blog post, defined as:

from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=100)
    overview = models.CharField(max_length=200)
    content = models.TextField()

To use the full text search feature of PostgreSQL, we need to add the following to INSTALLED_APPS in our settings.py:

# blog/settings.py
INSTALLED_APPS = [
    ...
    'django.contrib.postgres', # new
]

 

Querying Single Field

The simplest way to search is to match a single term against a single column. For example:

Post.objects.filter(title__search='coding')

This performs a full text search behind the scenes and returns the results whose title matches. But you might have noticed that it only searches a single field, which is rather limiting.

 

Querying Multiple Fields

The Post model we have been querying also contains a field named overview. To query against both fields, we need to use SearchVector.

>>> from django.contrib.postgres.search import SearchVector
>>> Post.objects.annotate(search=SearchVector('title', 'overview')).filter(search='vortex')

 

Preprocessing the user query

By default, the order of the words in the query is not relevant, i.e. a keyword-based search is performed. If you want to find items containing the text in the exact order given in the query, you need to perform a phrase search, which can be done as follows:

>>> from django.contrib.postgres.search import SearchQuery
>>> SearchQuery('red tomato', search_type='phrase')

If you do not pass the search_type argument, it defaults to 'plain', which results in keyword-based search.

>>> SearchQuery('red tomato') # two keywords
>>> SearchQuery('tomato red') # same results as above
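The difference between the two search types can be illustrated with a toy pure-Python model. This is only a sketch of the matching idea, with helper names invented for the example; the real matching is done by PostgreSQL and also involves the normalization described earlier.

```python
def split_words(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def plain_match(document, query):
    """Keyword-based: every query word must appear somewhere, in any order."""
    document_words = set(split_words(document))
    return all(word in document_words for word in split_words(query))


def phrase_match(document, query):
    """Phrase-based: the query words must appear next to each other, in order."""
    document_words = split_words(document)
    query_words = split_words(query)
    size = len(query_words)
    return any(
        document_words[start:start + size] == query_words
        for start in range(len(document_words) - size + 1)
    )


document = "a ripe red tomato on the vine"  # made-up sample text

print(plain_match(document, "red tomato"))   # True
print(plain_match(document, "tomato red"))   # True, order is ignored
print(phrase_match(document, "red tomato"))  # True
print(phrase_match(document, "tomato red"))  # False, order matters
```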

SearchQuery objects can also be combined to perform more advanced searches, such as:

>>> SearchQuery('foo') | SearchQuery('bar') # matches items containing either foo or bar
>>> SearchQuery('foo') & SearchQuery('bar') # matches items containing both foo and bar

