Lightning-Fast Search with MeiliSearch and Python


What is MeiliSearch?

Meilisearch is a RESTful search API. It aims to be a ready-to-go solution for everyone who wants a fast and relevant search experience for their end-users ⚡️🔎

Step 1: Install the dependencies

Let’s install the necessary dependencies. Run the command below to install the MeiliSearch Python client.

pip3 install meilisearch

Step 2: Run Meilisearch

Download and run a Meilisearch instance.

# Install Meilisearch
curl -L https://install.meilisearch.com | sh
# Launch Meilisearch
./meilisearch --master-key=masterKey
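
Since the Python client is already installed from step 1, a quick health check is an easy way to confirm the instance is reachable. A minimal sketch, assuming the default address and the master key used above:

import meilisearch

client = meilisearch.Client('http://127.0.0.1:7700', 'masterKey')
print(client.health())  # expected: {'status': 'available'}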

Step 3: Code with Python and MeiliSearch

Let’s create a basic Python script that demonstrates how to use MeiliSearch for indexing and searching.

import meilisearch

client = meilisearch.Client('http://127.0.0.1:7700', 'masterKey')

# An index is where the documents are stored.
index = client.index('movies')

documents = [
      { 'id': 1, 'title': 'Carol', 'genres': ['Romance', 'Drama'] },
      { 'id': 2, 'title': 'Wonder Woman', 'genres': ['Action', 'Adventure'] },
      { 'id': 3, 'title': 'Life of Pi', 'genres': ['Adventure', 'Drama'] },
      { 'id': 4, 'title': 'Mad Max: Fury Road', 'genres': ['Adventure', 'Science Fiction'] },
      { 'id': 5, 'title': 'Moana', 'genres': ['Fantasy', 'Action']},
      { 'id': 6, 'title': 'Philadelphia', 'genres': ['Drama'] },
]

# If the index 'movies' does not exist, Meilisearch creates it when you first add the documents.
index.add_documents(documents)  # returns a task; the documents are indexed asynchronously
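
With the documents in place, a first search is a one-liner. A quick sketch against the same index (note that add_documents is asynchronous, so freshly added documents can take a moment to become searchable):

print(index.search('wonder'))
# e.g. => {'hits': [{'id': 2, 'title': 'Wonder Woman', ...}], 'query': 'wonder', ...}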

Documents: A document is a container for data, made up of fields. Each field has an attribute (its name) and a value. Documents are the building blocks of a Meilisearch database; to make a document searchable, you add it to a storage structure called an index.

Indexes: An index is like a folder for documents with some rules. You can think of it as a table in SQL or a collection in MongoDB.

An index is identified by its name (uid) and has three important characteristics (you can inspect all of them with the Python client, as shown in the sketch after this list):

  1. A primary key that uniquely identifies each document.

  2. Customizable settings that control how the index behaves (ranking rules, searchable and filterable attributes, and so on).

  3. An arbitrary number of documents.
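
A minimal sketch of inspecting these with the Python client, assuming the 'movies' index created above and the same local instance:

import meilisearch

client = meilisearch.Client('http://127.0.0.1:7700', 'masterKey')
index = client.get_index('movies')

print(index.uid)                # 'movies'
print(index.get_primary_key())  # 'id', inferred from the documents above
print(index.get_settings())     # ranking rules, searchable/filterable attributes, ...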

Adding Search Capabilities to Your Data Using MeiliSearch

Now that we have a better understanding of how MeiliSearch operates, let’s tackle a straightforward problem: working with a CSV file containing information about various places.

This CSV file includes several fields, such as name, is_premium, hashtags, and languages.

Step 1: Index creation

First, we need to create an index that will store our places data.

import meilisearch
from meilisearch.errors import MeilisearchApiError
from meilisearch.index import Index


# Get an existing index, or create it if it does not exist yet
def get_or_create_index(client: meilisearch.Client, index_name: str) -> Index:
    try:
        # get_index raises MeilisearchApiError when the index does not exist
        return client.get_index(index_name)
    except MeilisearchApiError:
        client.create_index(index_name)
    return client.index(index_name)

if __name__ == "__main__":
    index_name = "places"
    client = meilisearch.Client('http://127.0.0.1:7700', 'masterKey')
    index = get_or_create_index(client, index_name)

Step 2: Create Documents

Now, let’s create documents for the places index.

To effectively manage our places data, we need to define a schema. This step will ensure that our data is structured and organized for efficient indexing and searching.

{
    name: str
    is_premium: bool
    prebooking_required: bool
    hashtags: List[str]
    languages: List[str]
}
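
As an illustration, a single place document following this schema might look like the snippet below (illustrative values, based on the data shown later in this article):

{
    "name": "Climb Central Delhi",
    "is_premium": True,
    "prebooking_required": True,
    "hashtags": ["party", "lively"],
    "languages": ["English"]
}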

Now, with this schema in place, let’s create the documents.

import ast
import csv

import meilisearch
import time

# Define the CSV file and database connection
csv_file = "data.csv"


def place_schema(
    name: str,
    is_premium: bool,
    prebooking_required: bool,
    hashtags: list,
    languages: list,
):
    place_document = {
        "name": name,
        "is_premium": is_premium,
        "prebooking_required": prebooking_required,
        "hashtags": [hashtag for hashtag in hashtags],
        "languages": [language for language in languages],
    }
    return place_document


def index_places(index):
    with open(csv_file, "r") as csvfile:
        csvreader = csv.DictReader(csvfile)

        documents = []

        for row in csvreader:
            name = row["name"]
            # The CSV stores booleans as text, so bool() alone would treat any
            # non-empty string (including "False") as True. Assuming "True"/"False" values:
            is_premium = row["is_premium"].strip().lower() == "true"
            prebooking_required = row["prebooking_required"].strip().lower() == "true"
            hashtags = ast.literal_eval(row["hashtags"])
            languages = ast.literal_eval(row["languages"])

            documents.append(
                place_schema(
                    name=name,
                    is_premium=is_premium,
                    prebooking_required=prebooking_required,
                    hashtags=hashtags,
                    languages=languages,
                )
            )

        task = index.add_documents(documents)
        task_id = task.task_uid

        task_status = index.get_task(task_id)
        while task_status.status not in ["succeeded", "failed"]:
            time.sleep(1)
            task_status = index.get_task(task_id)

        print(f"Task Status: {task_status.status}")
        if task_status.status == "failed":
            print(f"Error message: {task_status.error['message']}")

def get_or_create_index(client: meilisearch.Client, index_name: str):
    client.create_index(index_name)
    return client.index(index_name)


if __name__ == "__main__":
    index_name = "places"
    client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
    indexes = client.get_indexes()
    index = get_or_create_index(client, index_name)
    index_places(index=index)

In the index_places function, we are responsible for the following tasks:

  1. Creating a List of Documents: First, you collect all the documents that need to be indexed into a list called documents. These documents are structured according to the schema we have defined earlier. Each document is a Python dictionary with fields like name, is_premium, prebooking_required, hashtags, and languages.

  2. Adding Documents to the ‘places’ Index: Once you have your list of documents, you use MeiliSearch’s add_documents method to add these documents to your 'places' index. This method returns a task object, and you extract the task_uid from it.

  3. Monitoring Task Status: The unique identifier (task_uid) is essential for tracking the progress of the task, which in this case is adding documents to the index. We use a while loop to repeatedly check the status of the task until it reaches a terminal state, either "succeeded" or "failed" (recent client versions also offer a built-in shortcut for this, shown after this list).

  4. Error Handling: If the task status is “failed,” you retrieve the error message to understand why the operation didn’t succeed. This is a critical step in debugging and ensuring that your indexing process works smoothly.
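
As an aside, recent versions of the meilisearch-python client also expose a wait_for_task helper that wraps this polling for you. A minimal sketch, assuming the same client and index as above:

# Equivalent to the manual polling loop: block until the task finishes.
task = index.add_documents(documents)
finished = client.wait_for_task(task.task_uid, timeout_in_ms=30_000)
print(finished.status)  # 'succeeded' or 'failed'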

Let’s proceed by executing the script to generate an index and populate it with documents.

Oops! It seems we’ve encountered an issue.

Task Status: failed
Error message: The primary key inference failed as the engine did not 
find any field ending with `id` in its name. Please specify the primary key 
manually using the `primaryKey` query parameter.

That’s right, we haven’t defined a primary key, and there is no ‘id’ field. Let’s fix this by specifying a primary key in our script.

Here’s the updated code:

def get_or_create_index(client: meilisearch.Client, index_name: str):
    client.create_index(index_name, {"primaryKey": "name"})
    return client.index(index_name)

Now, let’s run the script again.

Task Status: failed
Error message: Document identifier `"Climb Central Delhi"` is invalid.
A document identifier can be of type integer or string,
only composed of alphanumeric characters (a-z A-Z 0-9), hyphens (-) and
underscores (_).

Okay, it seems our document identifier format is not suitable. Let’s create a function to convert the name into the required format.

import re


def make_primary_document_identifier(string):
    # e.g. "Climb Central Delhi" -> "climb_central_delhi"
    string = re.sub(r"[^a-zA-Z0-9]+", "_", string)
    string = string.lower()
    return string


def place_schema(
    name: str,
    is_premium: bool,
    prebooking_required: bool,
    hashtags: list,
    languages: list,
):
    place_document = {
        "name": make_primary_document_identifier(string=name),
        "is_premium": is_premium,
        "prebooking_required": prebooking_required,
        "hashtags": [hashtag for hashtag in hashtags],
        "languages": [language for language in languages],
    }
    return place_document

Now, let’s run the script again.

Task Status: succeeded

Nice!

Now, you can visit http://localhost:7700/ to see your index. You've successfully created an index and added documents to it.
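
You can also confirm this from Python instead of the browser. A small sketch using the same client (the exact return types depend on your meilisearch-python version):

import meilisearch

client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
print(client.get_indexes())                # lists all indexes, including 'places'
print(client.index("places").get_stats())  # document count and field distribution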

Step 3: Search

Let’s start by creating a Python script named search.py. This script will include a search function that accepts a search query and returns the search results. Below is the code for search.py:

#search.py
import argparse

import meilisearch


def search(search_string):
    return index.search(search_string)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", help="Search query", default=None)
    args = parser.parse_args()

    index_name = "places"
    client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
    index = client.get_index(index_name)
    if args.query is not None:
        print(search(args.query))
    else:
        print(search(""))

Let’s run the search script:

python search.py --query climb

Result:

{
  "hits": [
    {
      "name": "climb_central_delhi",
      "is_premium": true,
      "prebooking_required": true,
      "hashtags": [
        "chilled",
        "ecofriendly",
        "wellness",
        "yoga",
        "familyfriendly",
        "familyrun",
        "lively",
        "cycling",
        "organic",
        "party"
      ],
      "languages": [
        "English"
      ]
    }
  ],
  "query": "climb",
  "processingTimeMs": 18,
  "limit": 20,
  "offset": 0,
  "estimatedTotalHits": 1
}
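
Since the search method returns a plain Python dictionary, pulling individual fields out of the hits is straightforward. A small sketch built on the search function above:

results = search("climb")
for hit in results["hits"]:
    print(hit["name"], hit["hashtags"])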

Step 4: Filtering

The first step is to enable Meilisearch to filter places by hashtags. To do this, we need to add hashtags to the list of filterable attributes. This allows us to perform precise searches based on hashtags.

Here’s the code snippet for adding or updating filterable attributes in your Python script:

def add_or_update_filterable_attributes(index):
    index.update_filterable_attributes(["hashtags"])


if __name__ == "__main__":
    index_name = "places"
    client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
    indexes = client.get_indexes()
    index = get_or_create_index(client, index_name)
    add_or_update_filterable_attributes(index)

Make sure to run the script to update the filter attributes.
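
Settings updates are asynchronous tasks as well, so give them a moment to be processed. A quick sketch to verify the change has been applied, using the same index object:

print(index.get_filterable_attributes())  # expected: ['hashtags']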

In this example, we’ll search for places with the party hashtag.

import argparse

import meilisearch


def search(search_string):
    return index.search(
        search_string,
        {
           "filter": ["hashtags = party"]
        },
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--query", help="Search query", default=None)
    args = parser.parse_args()

    index_name = "places"
    client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
    index = client.get_index(index_name)
    if args.query is not None:
        print(search(args.query))
    else:
        print(search(""))

Let’s run the search script:

python search.py

Result:

Upon running the search script with the party filter, you’ll receive a response containing places that have party as one of their hashtags. Here’s an example of the response:

{
   "hits":[
      {
         "name":"climb_central_delhi",
         "is_premium":true,
         "prebooking_required":true,
         "hashtags":[
            "chilled",
            "ecofriendly",
            "wellness",
            "party"
         ],
         "languages":[
            "English"
         ]
      },
      {
         "name":"akbran_tour",
         "is_premium":true,
         "prebooking_required":true,
         "hashtags":[
            "daytours",
            "walking",
            "party"
         ],
         "languages":[
            "English",
            "Spanish",
            "Russian",
            "French",
            "German"
         ]
      },
      {
         "name":"1_karbala_rd",
         "is_premium":true,
         "prebooking_required":true,
         "hashtags":[
            "crafts",
            "boats",
            "party"
         ],
         "languages":[
            "English"
         ]
      }
   ],
   "query":"",
   "processingTimeMs":5,
   "limit":3,
   "offset":0,
   "estimatedTotalHits":3
}
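
Filter expressions can also be combined with AND/OR. For example, if you additionally mark is_premium as filterable (not done in the script above, so treat this as a sketch), you can restrict the party search to premium places:

index.update_filterable_attributes(["hashtags", "is_premium"])

results = index.search(
    "",
    {"filter": "hashtags = party AND is_premium = true"},
)
print(results["estimatedTotalHits"])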

Conclusion

In this article, we saw how to make your search functions lightning-fast using MeiliSearch and Python. This powerful tool enhances search capabilities in your projects, making it a valuable addition for developers looking to provide efficient search functionality for their users.

But here’s the exciting part: MeiliSearch has more advanced features we haven’t discussed yet. It can do even more amazing things to enhance your search capabilities. In the next article, we’ll explore these advanced features and show you how to use them to improve your search experience.

If you found this article helpful, do share it with your peers. Feedback and Suggestions to the article are most welcome! ❤️