How AI and Machine Learning are Reshaping CRM Data Handling

Customer data plays an important role in business success, and keeping CRM data accurate and clean has always been a challenge. Machine learning is changing how businesses manage customer data by bringing automation and intelligence to data quality management. Traditional CRM systems often struggle with issues like outdated […]

AI Careers in 2025: Insights on Salaries, Popular Roles, Trends

Artificial intelligence (AI) is shaping the future of industries while also creating some of the most exciting job opportunities. The latest report from TRG Datacentres makes one thing clear: AI-related jobs are where the action is. Machine learning (ML) engineering tops the list among all the roles examined as […]

How AI, Machine Learning Are Driving Big Data Security Software Growth

The Big Data Network Security Software market is growing, and the trend points towards a more secure digital future. The market is expected to grow at a compound annual growth rate (CAGR) of 13.77% from 2024 to 2031. This means it will move from $8.84 billion in 2024 to nearly $19.17 billion by 2031. Why […]

Top 5 Trending ML Courses on Coursera

Machine learning has become an important skill for many professionals and enthusiasts in recent years. To meet this demand, Coursera, a popular online learning platform, offers a range of courses to help individuals deepen their understanding of machine learning concepts and their applications. Let us explore […]

9 Real-World Problems that can be Solved by Machine Learning

Data is often called the new oil, and machine learning has emerged as a powerful tool for extracting valuable insights and driving impactful solutions across various sectors. Understanding its applications and implications becomes important as organizations increasingly rely on machine learning. Machine learning enables systems to learn from […]

How to Use Machine Learning for Weather Predictions

Machine learning (ML) is the latest buzz. In fact, it has been for a year or two now. Everyone, everywhere, is talking about its capabilities. It is now at the forefront, with ChatGPT and other generative AI tools helping the masses draft emails and solve homework. It is also […]

Machine Learning Model Created to Predict Partner Violence Using Big Data

Researchers at the Centre for Social and Behaviour Change at Ashoka University have developed a unique machine learning (ML) model that can predict intimate partner violence (IPV). It uses large datasets and algorithms such as random forest, and aims to identify individuals who are at higher risk of domestic violence. It is capable […]

Advanced ML Model Identifies Overheating in Lithium-Ion Batteries, Revolutionizes EV Safety

A team of researchers at the University of Arizona has come up with a promising approach that it claims could revolutionize electric vehicle (EV) safety by predicting and preventing dangerous temperature spikes in lithium-ion batteries. They have developed an innovative machine learning (ML) model under the guidance of doctoral student Basab Goswami. The […]

Building a Local Face Search Engine — A Step by Step Guide

Part 1: on face embeddings and how to run face search on the fly

Sample demonstration of face recognition and search for the cast of “The Office”

In this entry (Part 1) we’ll introduce the basic concepts for face recognition and search, and implement a basic working solution purely in Python. At the end of the article you will be able to run arbitrary face search on the fly, locally on your own images.

In Part 2 we’ll scale up what we learned in Part 1 by using a vector database to optimize interfacing and querying.

Face matching, embeddings and similarity metrics.

The goal: find all instances of a given query face within a pool of images.
Instead of limiting the search to exact matches only, we can relax the criteria by sorting results based on similarity. The higher the similarity score, the more likely the result is a match. We can then pick only the top N results or keep only those with a similarity score above a certain threshold.
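
As a minimal sketch of these two selection strategies, assume each candidate already comes with a similarity score. The function name `select_matches` and the file names and scores below are made up for illustration; they are not taken from the script used later in this article.

```python
def select_matches(scored, top_n=None, threshold=None):
    """Sort (label, similarity) pairs by similarity, highest first, then trim.

    Both `top_n` and `threshold` are optional; pass either one or both.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        # Keep only candidates scoring strictly above the threshold.
        ranked = [pair for pair in ranked if pair[1] > threshold]
    if top_n is not None:
        # Keep only the N best candidates.
        ranked = ranked[:top_n]
    return ranked


# Hypothetical similarity scores, for illustration only.
scores = [
    ("img_a.png", 0.31),
    ("img_b.png", 0.82),
    ("img_c.png", 0.57),
    ("img_d.png", 0.12),
]
top_two = select_matches(scores, top_n=2)
above_half = select_matches(scores, threshold=0.5)
```

With these made-up scores, both calls keep the same two entries, but in general the two criteria behave differently: top N always returns a fixed number of results, while a threshold can return anything from zero results to the whole pool.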

Example of matches sorted by similarity (descending). First entry is the query face.

To sort results, we need a similarity score for each pair of faces <Q, T> (where Q is the query face and T is the target face). While a basic approach might involve a pixel-by-pixel comparison of cropped face images, a more powerful and effective method uses embeddings.

An embedding is a learned representation of some input in the form of a list of real-valued numbers (an N-dimensional vector). This vector should capture the most essential features of the input, while ignoring superfluous aspects; an embedding is a distilled and compacted representation.
Machine-learning models are trained to learn such representations and can then generate embeddings for newly seen inputs. The quality and usefulness of embeddings for a use case hinge on the quality of the embedding model and the criteria used to train it.

In our case, we want a model that has been trained to maximize face identity matching: photos of the same person should match and have very close representations, while the more face identities differ, the more different (or distant) the related embeddings should be. We want irrelevant details such as lighting, face orientation and facial expression to be ignored.

Once we have embeddings, we can compare them using well-known metrics like cosine similarity or Euclidean distance. These metrics measure how “close” two vectors are in the vector space: a higher cosine similarity, or a lower Euclidean distance, means the vectors are closer. If the vector space is well structured (i.e., the embedding model is effective), this will be equivalent to knowing how similar two faces are. With this we can then sort all results and select the most likely matches.
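
To make the two metrics concrete, here is a small sketch in plain Python. The three-dimensional vectors are toy values chosen for illustration; real face embeddings have hundreds of dimensions, and in practice you would likely compute this with NumPy.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros (otherwise this divides by zero).
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two vectors: 0.0 means identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy "embeddings", for illustration only.
query = [1.0, 0.0, 1.0]
same_person = [0.9, 0.1, 1.1]
other_person = [-1.0, 1.0, 0.0]

sim_same = cosine_similarity(query, same_person)
sim_other = cosine_similarity(query, other_person)
dist_same = euclidean_distance(query, same_person)
dist_other = euclidean_distance(query, other_person)
```

Note the opposite directions of the two metrics: the closer pair has the higher cosine similarity but the lower Euclidean distance, so the sort order must be flipped when switching from one to the other.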

https://medium.com/media/8929d6d8077c7300dfa5acc29dba739b/href

Implement and Run Face Search

Let’s jump into the implementation of our local face search. As a requirement you will need a Python environment (version ≥3.10) and a basic understanding of the Python language.

For our use case we will also rely on the popular Insightface library, which, on top of many face-related utilities, also offers face embedding (aka recognition) models. We chose this library just to simplify the process, as it takes care of downloading, initializing and running the necessary models. You can also use the provided ONNX models directly, in which case you’ll have to write some boilerplate/wrapper code.

The first step is to install the required libraries (we advise using a virtual environment).

pip install numpy==1.26.4 pillow==10.4.0 insightface==0.7.3

The following is the script you can use to run a face search. We commented all relevant bits. It can be run from the command line by passing the required arguments. For example:

 python run_face_search.py -q "./query.png" -t "./face_search"

https://medium.com/media/bcb2ba4a20ed239be8ffbfa61be89259/href

The query arg should point to the image containing the query face, while the target arg should point to the directory containing the images to search. Additionally, you can control the similarity threshold that counts as a match, and the minimum resolution required for a face to be considered.
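
For readers who want to build a similar command-line interface, the following is one possible `argparse` setup. Only the `-q` and `-t` flags appear in the command above; the long option names, the `--threshold` and `--min-face-size` flags and their default values are assumptions made for this sketch and may differ from the actual script.

```python
import argparse


def build_parser():
    """Build a CLI parser resembling the one described above (a sketch)."""
    parser = argparse.ArgumentParser(description="Run a local face search.")
    parser.add_argument(
        "-q", "--query", required=True,
        help="Path to the image containing the query face.",
    )
    parser.add_argument(
        "-t", "--target", required=True,
        help="Directory containing the images to search.",
    )
    # Hypothetical flag name and default value.
    parser.add_argument(
        "--threshold", type=float, default=0.5,
        help="Minimum similarity score for a face to count as a match.",
    )
    # Hypothetical flag name and default value (in pixels).
    parser.add_argument(
        "--min-face-size", type=int, default=32,
        help="Smallest face side, in pixels, that is still considered.",
    )
    return parser


# Parse the same arguments as in the example command above.
args = build_parser().parse_args(["-q", "./query.png", "-t", "./face_search"])
```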

The script loads the query face, computes its embedding and then proceeds to load all images in the target directory and compute embeddings for all found faces. Cosine similarity is then used to compare each found face with the query face. A match is recorded if the similarity score is greater than the provided threshold. At the end the list of matches is printed, each with the original image path, the similarity score and the location of the face in the image (that is, the face bounding box coordinates). You can edit this script to process such output as needed.
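
The matching step can be sketched independently of any face library. The version below assumes faces have already been detected and embedded, and that each one is a plain dictionary with a `path`, a `bbox` in `(x1, y1, x2, y2)` pixel coordinates and an `embedding`. This record layout, the function name `find_matches` and the toy values are all assumptions for illustration, not the script’s actual data structures.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_matches(query_embedding, faces, threshold=0.5, min_size=32):
    """Return faces similar to the query, best match first."""
    matches = []
    for face in faces:
        x1, y1, x2, y2 = face["bbox"]
        # Skip faces whose bounding box is too small to be reliable.
        if min(x2 - x1, y2 - y1) < min_size:
            continue
        score = cosine_similarity(query_embedding, face["embedding"])
        if score > threshold:
            matches.append(
                {"path": face["path"], "score": score, "bbox": face["bbox"]}
            )
    return sorted(matches, key=lambda m: m["score"], reverse=True)


# Toy records standing in for detected faces, for illustration only.
faces = [
    {"path": "a.jpg", "bbox": (10, 10, 110, 120), "embedding": [0.9, 0.1, 0.4]},
    {"path": "b.jpg", "bbox": (5, 5, 20, 22), "embedding": [1.0, 0.0, 0.5]},
    {"path": "c.jpg", "bbox": (0, 0, 200, 210), "embedding": [-0.7, 0.6, 0.1]},
]
matches = find_matches([1.0, 0.0, 0.5], faces)
for match in matches:
    print(match["path"], round(match["score"], 3), match["bbox"])
```

Here the face in `b.jpg` has an embedding identical to the query but is dropped because its bounding box is below the minimum size, while `c.jpg` is large enough but too dissimilar.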

Similarity values (and so the threshold) will be very dependent on the embeddings used and the nature of the data. In our case, for example, many correct matches can be found around the 0.5 similarity value. One will always need to compromise between precision (the returned matches are correct; increases with a higher threshold) and recall (all expected matches are returned; increases with a lower threshold).
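
The trade-off is easy to see on a small labelled sample. In the sketch below, each entry pairs a similarity score with whether the face truly belongs to the query person. The scores and labels are invented for illustration, and returning 1.0 for precision when nothing is returned is just one common convention.

```python
def precision_recall(scored, threshold):
    """Compute precision and recall for (similarity, is_true_match) pairs."""
    returned = [is_match for score, is_match in scored if score > threshold]
    true_positives = sum(returned)
    total_true = sum(is_match for _, is_match in scored)
    # Convention: with nothing returned (or nothing to find), report 1.0.
    precision = true_positives / len(returned) if returned else 1.0
    recall = true_positives / total_true if total_true else 1.0
    return precision, recall


# Invented scores and ground-truth labels, for illustration only.
labelled = [
    (0.82, True),
    (0.61, True),
    (0.55, False),
    (0.48, True),
    (0.30, False),
    (0.12, False),
]

low_precision, low_recall = precision_recall(labelled, threshold=0.4)
high_precision, high_recall = precision_recall(labelled, threshold=0.6)
```

On this sample, the lower threshold finds every true match but lets one wrong face through, while the higher threshold returns only correct faces but misses one of the three true matches.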

What’s Next?

And that’s it! That’s all you need to run a basic face search locally. It is quite accurate, and can be run on the fly, but it doesn’t provide optimal performance. Searching a large set of images will be slow and, more importantly, all embeddings will be recomputed for every query. In the next post we will improve on this setup and scale the approach by using a vector database.


Building a Local Face Search Engine — A Step by Step Guide was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

LLNL Scientists Use Machine Learning to Probe Carbon Capture at Atomic Level

Climate change poses a critical threat to the planet. Innovative approaches are required to reduce the emissions of carbon dioxide (CO2) that lead to climate change. The Lawrence Livermore National Laboratory (LLNL) has recently made a significant advance in the field. It has developed a machine learning model to unravel the complexities […]