# awesome-metric-learning 😎

Awesome list about practical Metric Learning and its applications.

## Motivation 🤓

At Qdrant, we have one goal: make metric learning more practical. This listing is in line with that purpose, and we aim to provide a concise yet useful list of awesomeness around metric learning. It is intended to be inspirational for productivity rather than to serve as a full bibliography. If you find it useful or like it in some other way, you may want to join our Discord server, where we are running a paper reading club on metric learning.

## Contributing 🤩

If you want to contribute to this project but don't know how, you may want to check out the [contributing guide](/CONTRIBUTING.md). It's easy! 😌

## Surveys 📖

- It includes guides for [supervised](http://contrib.scikit-learn.org/metric-learn/supervised.html), [weakly supervised](http://contrib.scikit-learn.org/metric-learn/weakly_supervised.html), and [unsupervised](http://contrib.scikit-learn.org/metric-learn/unsupervised.html) metric learning algorithms in the [metric_learn](http://contrib.scikit-learn.org/metric-learn/metric_learn.html) package.
- A comprehensive study for newcomers. Factors such as sampling strategies, distance metrics, and network structures are systematically analyzed by comparing the quantitative results of the methods.
- It discusses the need for metric learning, old and state-of-the-art approaches, and some real-world use cases.

## Applications 🎮

- CLIP offers state-of-the-art zero-shot image classification and image retrieval with a natural language query. See the [demo](https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb).
- This work achieves zero-shot classification and cross-modal audio retrieval from natural language queries.
- An open-class object detector that can detect any label encoded by CLIP without finetuning. See the [demo](https://huggingface.co/spaces/akhaliq/Detic).
- TensorFlow Hub offers a collection of pretrained models from the paper [Large Dual Encoders Are Generalizable Retrievers](https://arxiv.org/abs/2112.07899). GTR models are first initialized from a pre-trained T5 checkpoint, then further pre-trained with a set of community question-answer pairs, and finally fine-tuned on the MS MARCO dataset. The two encoders are shared, so the GTR model functions as a single text encoder that maps variable-length English text to a 768-dimensional vector.
- The method and pretrained models found in Flair go beyond zero-shot sequence classification and offer zero-shot span tagging abilities for tasks such as named entity recognition and part-of-speech tagging.
- It leverages HuggingFace Transformers and c-TF-IDF to create dense clusters, allowing for easily interpretable topics while keeping important words in the topic descriptions. It supports guided, (semi-)supervised, and dynamic topic modeling, along with beautiful visualizations.
- Identification of substances based on spectral analysis plays a vital role in forensic science. Similarly, material identification is of paramount importance for malfunction reasoning in manufacturing and for materials research. This model identifies materials with deep metric learning applied to X-ray diffraction (XRD) spectra. Read [this post](https://towardsdatascience.com/automatic-spectral-identification-using-deep-metric-learning-with-1d-regnet-and-adacos-8b7fb36f2d5f) for more background.
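Whatever the modality, identification in applications like the one above usually reduces to a nearest-neighbor lookup in embedding space: encode the query with a trained encoder and rank a reference library by similarity. Below is a minimal, library-agnostic sketch of that retrieval step; the encoder and all names are placeholders rather than parts of any project listed here.

```python
import numpy as np

def nearest_references(query_embedding, reference_embeddings, reference_ids, k=5):
    """Return the ids of the k references most similar to the query (cosine similarity)."""
    q = query_embedding / np.linalg.norm(query_embedding)
    refs = reference_embeddings / np.linalg.norm(reference_embeddings, axis=1, keepdims=True)
    scores = refs @ q                # cosine similarity against every reference item
    top = np.argsort(-scores)[:k]    # indices of the k highest-scoring references
    return [(reference_ids[i], float(scores[i])) for i in top]

# Usage sketch: `encode` stands for any trained encoder (image, audio, spectrum, code, ...).
# query_vec = encode(unknown_sample)
# print(nearest_references(query_vec, library_vectors, library_labels))
```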
- Different from typical information retrieval tasks, code search requires bridging the semantic gap between programming language and natural language to better describe intrinsic concepts and semantics. The repository provides the pretrained models and source code for [Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus](https://arxiv.org/abs/2201.11313), where the authors apply several tricks to achieve this.
- State-of-the-art methods are incapable of leveraging attributes from different types of items and thus suffer from data sparsity problems, because it is quite challenging to represent items with different feature spaces jointly. To tackle this problem, the authors propose a kernel-based neural network, namely deep unified representation (DURation) for heterogeneous recommendation, to jointly model unified representations of heterogeneous items while preserving their original feature space topology structures. See the [paper](https://arxiv.org/abs/2201.05861).
- It provides an implementation of [Item2Vec: Neural Item Embedding for Collaborative Filtering](https://arxiv.org/abs/1603.04259), wrapped as a scikit-learn estimator compatible with GridSearchCV and BayesSearchCV for hyperparameter tuning.
- You can search for the overall closest fit, or choose to focus the match on genre, mood, or instrumentation.
- It retrieves phrase-level answers to your questions in real time, or passages for downstream tasks. Check out the [demo](http://densephrases.korea.ac.kr/) or see the [paper](https://arxiv.org/abs/2109.08133).
- Instead of leveraging NLI/XNLI, they make use of the text encoder of the CLIP model, concluding from casual experiments that this sometimes gives better accuracy than NLI-based models.
- An application of the SimCLR method to musical data, with out-of-domain generalization in million-scale music classification. See the [demo](https://spijkervet.github.io/CLMR/examples/clmr-onnxruntime-web/) or the [paper](https://arxiv.org/abs/2103.09410).

## Case Studies ✍️

## Libraries 🧰

- Quaterion is a framework for fine-tuning similarity learning models. The framework closes the "last mile" problem in training models for semantic search, recommendations, anomaly detection, extreme classification, matching engines, etc. It is designed to combine the performance of pre-trained models with specialization for the custom task while avoiding slow and costly training.
- A library for sentence-level embeddings. Developed on top of the well-known [Transformers](https://github.com/huggingface/transformers) library, it provides an easy way to finetune Transformer-based models to obtain sequence-level embeddings (see the short usage sketch below).
- The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification.
- A metric learning library in TensorFlow with a Keras-like API. It provides support for self-supervised contrastive learning and state-of-the-art methods such as SimCLR, SimSiam, and Barlow Twins.
- A PyTorch library to train and run inference with contextually-keyed word vectors augmented with part-of-speech tags to achieve multi-word queries.
- A PyTorch library to efficiently train self-supervised computer vision models with state-of-the-art techniques such as SimCLR, SimSiam, Barlow Twins, and BYOL, among others.
- A library that helps you benchmark pretrained and custom embedding models on tens of datasets and tasks with ease.
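As a taste of how little code the sentence-embedding workflow mentioned above requires, here is a hedged sketch using the `sentence-transformers` package; the checkpoint name is only a common example, not a recommendation from this list.

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence encoder (any Sentence Transformers checkpoint works here).
model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Metric learning maps similar items close together in a vector space.",
    "Approximate nearest neighbor search makes retrieval fast at scale.",
]
query = "How do I speed up vector search?"

# Encode everything into fixed-size embeddings and rank the corpus by cosine similarity.
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]
best = scores.argmax().item()
print(corpus[best], float(scores[best]))
```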
- A Python implementation of a number of popular recommender algorithms. It supports incorporating user and item features into traditional matrix factorization: it represents users and items as a sum of the latent representations of their features, thus achieving better generalization.
- It provides efficient multicore and memory-independent implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random Projections (RP), Hierarchical Dirichlet Process (HDP), and word2vec.
- It provides implementations of algorithms such as KNN, LFM, SLIM, NeuMF, FM, DeepFM, VAE, and so on, in order to ensure fair comparison of recommender system benchmarks.

## Tools ⚒️

- It supports UMAP, t-SNE, PCA, or custom techniques to analyze the embeddings of encoders.
- It allows you to visualize the embedding space by explicitly selecting axes through algebraic formulas on the embeddings (like king - man + woman) and highlighting specific items in the embedding space. It also supports implicit axes via PCA and t-SNE. See the [paper](https://arxiv.org/abs/1905.12099).

## Approximate Nearest Neighbors ⚡

- It provides benchmarking of 20+ ANN algorithms on nine standard datasets, with support for bringing your own dataset. ([Medium post](https://medium.com/towards-artificial-intelligence/how-to-choose-the-best-nearest-neighbors-algorithm-8d75d42b16ab?sk=889bc0006f5ff773e3a30fa283d91ee7))
- It is not the fastest ANN algorithm, but it achieves memory efficiency thanks to various quantization and indexing methods such as IVF, PQ, and IVF-PQ. ([Tutorial](https://www.pinecone.io/learn/faiss-tutorial/))
- It is still one of the fastest ANN algorithms out there, at the cost of relatively higher memory usage. (Paper: [Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320))
- Paper: [Accelerating Large-Scale Inference with Anisotropic Vector Quantization](https://arxiv.org/abs/1908.10396)

## Papers 🔬

- Dimensionality Reduction by Learning an Invariant Mapping: Published by Yann LeCun et al. (2005), its main focus was dimensionality reduction. However, the proposed method has excellent properties for metric learning, such as preserving neighbourhood relationships and generalizing to unseen data, and it has seen extensive applications and a great number of variations ever since. It is advised that you read [this great post](https://medium.com/@maksym.bekuzarov/losses-explained-contrastive-loss-f8f57fe32246) to better understand its importance for metric learning.
- The paper introduces Triplet Loss, which can be seen as the "ImageNet moment" for deep metric learning. It is still one of the state-of-the-art methods and has a great number of applications in almost any data modality.
- A novel loss function with better properties. It provides scale invariance, robustness against feature variance, and better convergence than Contrastive and Triplet Loss.
- Supervised metric learning without pairs or triplets.
- Although originally designed for the face recognition task, this loss function achieves state-of-the-art results in many other metric learning problems with simpler and faster data feeding. It is also robust against unclean and unbalanced data when modified with sub-centers and a dynamic margin.
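The additive angular margin idea behind such margin-based softmax losses is compact enough to sketch. The snippet below is a minimal, hedged PyTorch illustration rather than the reference implementation; the class name and the scale/margin defaults are only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularMarginHead(nn.Module):
    """Illustrative additive angular margin classification head (ArcFace-style)."""

    def __init__(self, embed_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m  # logit scale and additive angular margin

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class centers.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the margin m only to the angle of each sample's target class.
        target = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)

# Usage sketch: head = AngularMarginHead(512, 1000); loss = head(batch_embeddings, batch_labels)
```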
- VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning: The paper introduces a method that explicitly avoids the collapse problem in high dimensions with a simple regularization term on the variance of the embeddings along each dimension individually. This new term can be incorporated into other methods to stabilize training and improve performance.
- The paper proposes using the mean centroid representation during training and retrieval for robustness against outliers and more stable features. It further reduces retrieval time and storage requirements, making it suitable for production deployments.
- It demonstrates, among other things, that:
  - the composition of data augmentations plays a critical role: random crop + random color distortion provides the best downstream classifier accuracy,
  - introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations,
  - contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
- They also incorporate annotated pairs from natural language inference datasets into their contrastive learning framework in a supervised setting, showing that the contrastive learning objective regularizes the pre-trained embeddings' anisotropic space to be more uniform and better aligns positive pairs when supervised signals are available.
- Mining informative negative instances is of central importance to deep metric learning (DML); however, this task is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. The paper identifies a "slow drift" phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters are updating throughout the training process. This suggests that the features of instances computed at preceding iterations can closely approximate the features extracted by the current model.

## Datasets ℹ️

Practitioners can use any labeled or unlabeled data for metric learning with an appropriate method chosen. However, some datasets are particularly important in the literature for benchmarking or other purposes, and we list them in this section.

- The Stanford Natural Language Inference Corpus, serving as a useful benchmark. The dataset contains pairs of sentences labeled as contradiction, entailment, and neutral with respect to their semantic relationship. Useful for training semantic search models in metric learning (a small sketch of turning these labels into training pairs follows at the end of this section).
- Modeled on the SNLI corpus, the dataset contains sentence pairs from various genres of spoken and written text, and it also offers a distinctive cross-genre generalization evaluation.
- Shared as a part of a Kaggle competition by Google, this dataset is more diverse and thus more interesting than the first version.
- The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
- The dataset is published along with the paper ["Deep Metric Learning via Lifted Structured Feature Embedding"](https://github.com/rksltnl/Deep-Metric-Learning-CVPR16).
- The dataset is published along with the paper ["The 2021 Image Similarity Dataset and Challenge"](http://arxiv.org/abs/2106.09672).
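To make the SNLI entry above concrete, here is a hedged sketch of one common way to turn NLI labels into contrastive training pairs. It assumes the Hugging Face `datasets` copy of SNLI with `premise`/`hypothesis`/`label` fields, where 0 = entailment, 1 = neutral, 2 = contradiction, and -1 marks unlabeled examples.

```python
from datasets import load_dataset

# Load the SNLI training split (assumes the Hugging Face `datasets` distribution).
snli = load_dataset("snli", split="train")

positives, negatives = [], []
for ex in snli:
    pair = (ex["premise"], ex["hypothesis"])
    if ex["label"] == 0:      # entailment: semantically close, usable as a positive pair
        positives.append(pair)
    elif ex["label"] == 2:    # contradiction: usable as a (hard) negative pair
        negatives.append(pair)

print(f"{len(positives)} positive pairs, {len(negatives)} negative pairs")
```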