awesome-metric-learning
😎 Awesome list about practical Metric Learning and its applications
 
Motivation 🤓
At Qdrant, we have one goal: make metric learning more practical. This list serves that purpose: we aim to provide a concise yet useful list of awesomeness around metric
learning. It is intended to be inspirational for productivity rather than serve as a full bibliography.
 
If you find it useful or like it in some other way, you may want to join our Discord server, where we are running a paper reading club on metric learning.
 
Contributing 🤩
If you want to contribute to this project, but don't know how, you may want to check out the contributing guide (/CONTRIBUTING.md). It's easy! 😌
 
 
Surveys 📖
 
 
 
 
It provides guides for supervised (http://contrib.scikit-learn.org/metric-learn/supervised.html), weakly supervised
(http://contrib.scikit-learn.org/metric-learn/weakly_supervised.html) and unsupervised (http://contrib.scikit-learn.org/metric-learn/unsupervised.html) metric learning algorithms in the
metric_learn (http://contrib.scikit-learn.org/metric-learn/metric_learn.html) package.
 
 
 
- A comprehensive study for newcomers.
 
Factors such as sampling strategies, distance metrics, and network structures are systematically analyzed by comparing the quantitative results of the methods.
 
 
 
 
 
It discusses the need for metric learning, old and state-of-the-art approaches, and some real-world use cases.
 
Applications 🎮
 
 
 
 
CLIP offers state-of-the-art zero-shot image classification and image retrieval with a natural language query. See demo
(https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb).
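Zero-shot classification with a joint image-text embedding space can be sketched as follows. This is a minimal illustration, not CLIP's actual API: the embeddings are hypothetical toy vectors standing in for encoder outputs, and the temperature of 0.01 is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, label_embeddings, temperature=0.01):
    """Softmax over temperature-scaled cosine similarities to each label prompt."""
    logits = {
        label: cosine(image_embedding, emb) / temperature
        for label, emb in label_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical toy embeddings; a real setup would use the text and image encoders.
label_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.2, 0.1]

probabilities = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probabilities, key=probabilities.get)
print(best_label)
```

No classifier is trained here: adding a new class only requires embedding one more text prompt.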
 
 
 
 
 
This work achieves zero-shot classification and cross-modal audio retrieval from natural language queries.
 
 
 
 
 
It is an open-class object detector that can detect any label encoded by CLIP without finetuning. See demo (https://huggingface.co/spaces/akhaliq/Detic).
 
 
 
 
 
TensorFlow Hub offers a collection of pretrained models from the paper Large Dual Encoders Are Generalizable Retrievers (https://arxiv.org/abs/2112.07899).
GTR models are first initialized from a pre-trained T5 checkpoint. They are then further pre-trained with a set of community question-answer pairs. Finally, they are fine-tuned on the MS
Marco dataset.
The two encoders are shared so the GTR model functions as a single text encoder. The input is variable-length English text and the output is a 768-dimensional vector.
 
 
 
 
 
The method and pretrained models found in Flair go beyond zero-shot sequence classification and offer zero-shot span tagging abilities for tasks such as named entity recognition and
part-of-speech tagging.
 
 
 
 
 
It leverages HuggingFace Transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics while keeping important words in the topic descriptions. It supports
guided, (semi-)supervised, and dynamic topic modeling, with beautiful visualizations.
 
 
 
 
 
Identification of substances based on spectral analysis plays a vital role in forensic science. Similarly, the material identification process is of paramount importance for malfunction
reasoning in manufacturing sectors and materials research.
This model identifies materials by applying deep metric learning to X-ray diffraction (XRD) spectra. Read this post
(https://towardsdatascience.com/automatic-spectral-identification-using-deep-metric-learning-with-1d-regnet-and-adacos-8b7fb36f2d5f) for more background.
 
 
 
 
 
Unlike typical information retrieval tasks, code search requires bridging the semantic gap between programming languages and natural language in order to better describe intrinsic
concepts and semantics. The repository provides the pretrained models and source code for Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus
(https://arxiv.org/abs/2201.11313), where they apply several tricks to achieve this.
 
 
 
 
 
 
 
 
 
 
State-of-the-art methods are incapable of leveraging attributes from different types of items and thus suffer from data sparsity problems because it is quite challenging to represent items
with different feature spaces jointly. To tackle this problem, they propose a kernel-based neural network, namely deep unified representation (DURation) for heterogeneous recommendation, to
jointly model unified representations of heterogeneous items while preserving their original feature space topology structures. See paper (https://arxiv.org/abs/2201.05861).
 
 
 
 
 
It provides the implementation of Item2Vec: Neural Item Embedding for Collaborative Filtering (https://arxiv.org/abs/1603.04259), wrapped as a sklearn estimator compatible with GridSearchCV
and BayesSearchCV for hyperparameter tuning.
 
 
 
 
 
You can search for the overall closest fit, or choose to focus on matching genre, mood, or instrumentation.
 
 
 
 
 
It searches for phrase-level answers to your questions in real time or retrieves passages for downstream tasks. Check out the demo (http://densephrases.korea.ac.kr/), or see the paper
(https://arxiv.org/abs/2109.08133).
 
 
 
 
 
Instead of leveraging NLI/XNLI, they make use of the text encoder of the CLIP model, concluding from casual experiments that this sometimes gives better accuracy than NLI-based models.
 
 
 
 
 
Application of the SimCLR method to musical data with out-of-domain generalization in million-scale music classification. See demo
(https://spijkervet.github.io/CLMR/examples/clmr-onnxruntime-web/) or paper (https://arxiv.org/abs/2103.09410).
 
 
Case Studies ✍️
 
Libraries 🧰
 
 
 
 
Quaterion is a framework for fine-tuning similarity learning models. The framework closes the "last mile" problem in training models for semantic search, recommendations, anomaly detection,
extreme classification, matching engines, etc. It is designed to combine the performance of pre-trained models with specialization for the custom task while avoiding slow and costly
training.
 
 
 
- A library for sentence-level embeddings.
 
Developed on top of the well-known Transformers (https://github.com/huggingface/transformers) library, it provides an easy way to finetune Transformer-based models to obtain sequence-level
embeddings.
 
 
 
 
 
 
 
 
 
 
The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase
identification.
 
 
 
 
 
 
 
 
- A metric learning library in TensorFlow with a Keras-like API.
 
It provides support for self-supervised contrastive learning and state-of-the-art methods such as SimCLR, SimSiam, and Barlow Twins.
 
 
 
 
 
A PyTorch library for training and inference with contextually-keyed word vectors augmented with part-of-speech tags to support multi-word queries.
 
 
 
 
 
A PyTorch library to efficiently train self-supervised computer vision models with state-of-the-art techniques such as SimCLR, SimSiam, Barlow Twins, and BYOL, among others.
 
 
 
 
 
A library that helps you benchmark pretrained and custom embedding models on tens of datasets and tasks with ease.
 
 
 
- A Python implementation of a number of popular recommender algorithms.
 
It supports incorporating user and item features into traditional matrix factorization. It represents users and items as a sum of the latent representations of their features, thus
achieving a better generalization.
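The idea of representing a user or an item as the sum of its feature embeddings can be sketched as follows. This is an illustration under stated assumptions, not the library's API: the feature names, embedding values, and helper functions are hypothetical, and bias terms are left out.

```python
def represent(feature_names, feature_embeddings):
    """Sum the latent vectors of all features describing a user or an item."""
    dim = len(next(iter(feature_embeddings.values())))
    total = [0.0] * dim
    for name in feature_names:
        total = [t + e for t, e in zip(total, feature_embeddings[name])]
    return total

def score(user_features, item_features, user_embeddings, item_embeddings):
    """Dot product of the summed user and item representations."""
    user_vector = represent(user_features, user_embeddings)
    item_vector = represent(item_features, item_embeddings)
    return sum(u * i for u, i in zip(user_vector, item_vector))

# Hypothetical learned feature embeddings (normally fitted from interactions).
user_embeddings = {"age:20s": [1.0, 0.0], "country:DE": [0.0, 1.0]}
item_embeddings = {"genre:jazz": [2.0, 3.0], "genre:rock": [-1.0, 0.5]}

new_user = represent(["age:20s", "country:DE"], user_embeddings)
affinity = score(["age:20s", "country:DE"], ["genre:jazz"], user_embeddings, item_embeddings)
print(new_user, affinity)
```

Because a brand-new user is described only through known features, a representation is available even before any interaction is recorded, which is where the better generalization comes from.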
 
 
 
 
 
It provides efficient multicore and memory-independent implementations of popular algorithms, such as online Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA),
Random Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec.
 
 
 
 
 
It provides implementations of algorithms such as KNN, LFM, SLIM, NeuMF, FM, DeepFM, VAE and so on, in order to ensure fair comparison of recommender system benchmarks.
 
 
 
Tools ⚒️
 
 
 
 
It supports UMAP, t-SNE, PCA, or custom techniques to analyze embeddings of encoders.
 
 
 
 
 
It allows you to visualize the embedding space by explicitly selecting the axes through algebraic formulas on the embeddings (like king-man+woman) and to highlight specific items in the embedding
space. It also supports implicit axes via PCA and t-SNE. See paper (https://arxiv.org/abs/1905.12099).
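An algebraic formula over embeddings such as king-man+woman can be turned into an explicit axis roughly like this. It is a minimal sketch with hypothetical three-dimensional toy vectors; the helper names are illustrative and not part of the tool.

```python
import math

# Hypothetical toy vectors with dimensions (royalty, male, female).
toy_vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "throne": [1.0, 0.0, 0.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def evaluate_formula(terms, vectors):
    """terms is a list of (sign, word) pairs, e.g. [(1, 'king'), (-1, 'man')]."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for sign, word in terms:
        result = [r + sign * v for r, v in zip(result, vectors[word])]
    return result

def rank_along_axis(axis, vectors, exclude=()):
    """Order words by cosine similarity to the axis, most aligned first."""
    scored = [(cosine(axis, vec), word) for word, vec in vectors.items() if word not in exclude]
    return [word for _, word in sorted(scored, reverse=True)]

axis = evaluate_formula([(1, "king"), (-1, "man"), (1, "woman")], toy_vectors)
ranking = rank_along_axis(axis, toy_vectors, exclude={"king", "man", "woman"})
print(axis, ranking)
```

The similarity of each item to such an axis can then serve as its coordinate in a plot.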
 
 
 
 
 
 
 
 
 
Approximate Nearest Neighbors ⚡
 
 
 
It provides benchmarking of 20+ ANN algorithms on nine standard datasets, with support for bringing your own dataset. (Medium Post
(https://medium.com/towards-artificial-intelligence/how-to-choose-the-best-nearest-neighbors-algorithm-8d75d42b16ab?sk=889bc0006f5ff773e3a30fa283d91ee7))
 
 
 
 
 
It is not the fastest ANN algorithm but achieves memory efficiency thanks to various quantization and indexing methods such as IVF, PQ, and IVF-PQ. (Tutorial
(https://www.pinecone.io/learn/faiss-tutorial/))
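To make the IVF idea concrete, here is a minimal pure-Python sketch of an inverted file index. It assumes the coarse centroids are already given (in practice they are usually learned with k-means), and the function names are hypothetical rather than taken from any library.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists

def search_ivf(query, vectors, centroids, lists, nprobe=1):
    """Scan only the nprobe inverted lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in lists[c]]
    if not candidates:
        return None
    return min(candidates, key=lambda idx: sq_dist(query, vectors[idx]))

vectors = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]]
centroids = [[0.5, 0.0], [10.5, 10.0]]  # assumed, e.g. from k-means

inverted_lists = build_ivf(vectors, centroids)
match = search_ivf([0.9, 0.1], vectors, centroids, inverted_lists, nprobe=1)
print(inverted_lists, match)
```

With `nprobe=1` only half of the vectors are compared against the query here. Raising `nprobe` trades speed for recall, and the memory savings mentioned above come from additionally compressing the stored vectors, for example with PQ.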
 
 
 
 
 
It is still one of the fastest ANN algorithms out there, though it requires relatively high memory usage. (Paper: Efficient and robust approximate nearest neighbor search using Hierarchical
Navigable Small World graphs (https://arxiv.org/abs/1603.09320))
 
 
 
 
 
Paper: Accelerating Large-Scale Inference with Anisotropic Vector Quantization (https://arxiv.org/abs/1908.10396)
 
 
 
Papers 🔬
 
Dimensionality Reduction by Learning an Invariant Mapping
 
Published by Hadsell, Chopra, and LeCun (2006), its main focus was on dimensionality reduction. However, the proposed method has excellent properties for metric learning, such as preserving
neighbourhood relationships and generalization to unseen data, and it has found extensive applications with a great number of variations ever since. It is worth reading this great post
(https://medium.com/@maksym.bekuzarov/losses-explained-contrastive-loss-f8f57fe32246) to better understand its importance for metric learning.
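The contrastive loss that grew out of this line of work can be sketched as follows. This is a minimal single-pair version using the common margin-based formulation; the margin value is an assumption.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, similar, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    distance = euclidean(a, b)
    if similar:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2

similar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True)
dissimilar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False)
dissimilar_near = contrastive_loss([0.0, 0.0], [0.0, 0.5], similar=False)
print(similar_far, dissimilar_far, dissimilar_near)
```

Dissimilar pairs that are already farther apart than the margin contribute nothing, so training effort goes to the pairs that still violate the desired geometry.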
 
 
 
 
 
The paper introduces Triplet Loss, which can be seen as the "ImageNet moment" for deep metric learning. It is still one of the state-of-the-art methods and has a great number of
applications in almost any data modality.
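A minimal sketch of the triplet loss for a single (anchor, positive, negative) triplet is shown below. The use of squared Euclidean distance and the margin value are assumptions of this sketch.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Require the positive to be closer to the anchor than the negative by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)

easy = triplet_loss([0.0, 0.0], [1.0, 0.0], [2.0, 0.0])  # negative is already far away
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])  # negative is as close as the positive
print(easy, hard)
```

Easy triplets yield zero loss and therefore no gradient, which is why triplet mining strategies matter so much in practice.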
 
 
 
 
 
 
 
 
- A novel loss function with better properties.
 
It provides scale invariance, robustness against feature variance, and better convergence than Contrastive and Triplet Loss.
 
 
 

Supervised metric learning without pairs or triplets.
 
Although it was originally designed for the face recognition task, this loss function achieves state-of-the-art results in many other metric learning problems with simpler and faster data
feeding. It is also robust against unclean and unbalanced data when modified with sub-centers and a dynamic margin.
 
 
 
 
 
 
 
 
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
 
The paper introduces a method that explicitly avoids the collapse problem in high dimensions with a simple regularization term on the variance of the embeddings along each dimension
individually. This new term can be incorporated into other methods to stabilize training and improve performance.
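The variance term can be sketched as a hinge on the per-dimension standard deviation of a batch of embeddings. This is a minimal pure-Python version; the target value `gamma=1.0` and the small `eps` are assumptions of this sketch, and the invariance and covariance terms are omitted.

```python
import math

def variance_term(embeddings, gamma=1.0, eps=1e-4):
    """Penalize every embedding dimension whose batch std falls below gamma."""
    n = len(embeddings)
    dim = len(embeddings[0])
    total = 0.0
    for j in range(dim):
        column = [row[j] for row in embeddings]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / (n - 1)
        std = math.sqrt(variance + eps)
        total += max(0.0, gamma - std)
    return total / dim

collapsed = variance_term([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])   # every sample identical
spread = variance_term([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])    # std of 2 per dimension
print(collapsed, spread)
```

A collapsed batch is penalized heavily while a well-spread batch is not, which is how the term discourages all inputs from mapping to the same vector.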
 
 
 
 
 
The paper proposes using the mean centroid representation during training and retrieval for robustness against outliers and more stable features. It further reduces retrieval time and
storage requirements, making it suitable for production deployments.
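Retrieval against class centroids instead of individual embeddings can be sketched as follows. The helper names and toy data are hypothetical; squared Euclidean distance is an assumption of this sketch.

```python
def class_centroids(embeddings, labels):
    """Average all embeddings that share a label into one centroid per class."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def nearest_centroid(query, centroids):
    """Return the label whose centroid is closest to the query."""
    return min(
        centroids,
        key=lambda label: sum((q - c) ** 2 for q, c in zip(query, centroids[label])),
    )

gallery = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
labels = ["a", "a", "b", "b"]

centroids = class_centroids(gallery, labels)
prediction = nearest_centroid([1.0, 1.0], centroids)
print(centroids, prediction)
```

The index holds one vector per class rather than one per sample, so both storage and the number of distance computations at query time shrink accordingly.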
 
 
 
 
 
 
 
 
 
 
It demonstrates, among other things, that
- composition of data augmentations plays a critical role: random crop combined with random color distortion provides the best downstream classifier accuracy,
- introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations,
- and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
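The contrastive loss referred to in the points above is commonly written as a temperature-scaled cross-entropy over cosine similarities (often called NT-Xent). The following is a minimal pure-Python sketch operating on already projected embeddings; the temperature value and the toy vectors are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nt_xent(views_a, views_b, temperature=0.5):
    """views_a[i] and views_b[i] are two augmented views of the same sample."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive_index = (i + n) % (2 * n)  # the other view of the same sample
        positive = cosine(z[i], z[positive_index]) / temperature
        # Every other embedding in the batch, including the positive, is in the denominator.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        total += math.log(sum(math.exp(value) for value in logits)) - positive
    return total / (2 * n)

aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

Every other sample in the batch acts as a negative, which gives some intuition for why larger batches help: each anchor is contrasted against more negatives per step.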
 
 
 
 
 
They also incorporate annotated pairs from natural language inference datasets into their contrastive learning framework in a supervised setting, showing that the contrastive learning
objective regularizes pre-trained embeddings’ anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
 
 
 
 
 
 
 
 
 
 
 
 
 
Mining informative negative instances is of central importance to deep metric learning (DML). However, this task is intrinsically limited by mini-batch training, where only a mini-batch of
instances is accessible at each iteration. The authors identify a "slow drift" phenomenon by observing that the embedding features drift exceptionally slowly even as the model parameters
are updating throughout the training process. This suggests that the features of instances computed at preceding iterations can be used to considerably approximate their features extracted
by the current model.
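One way to exploit this observation is to keep a fixed-size memory of embeddings from recent iterations and mine negatives from it. The sketch below is a minimal illustration with a hypothetical `EmbeddingMemory` class, not the paper's implementation.

```python
from collections import deque

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class EmbeddingMemory:
    """FIFO memory of (embedding, label) pairs computed at earlier iterations."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest entries are dropped automatically

    def enqueue(self, embeddings, labels):
        self.items.extend(zip(embeddings, labels))

    def hardest_negative(self, anchor, anchor_label):
        """Closest stored embedding with a different label, or None if there is none."""
        negatives = [(emb, label) for emb, label in self.items if label != anchor_label]
        if not negatives:
            return None
        return min(negatives, key=lambda item: sq_dist(anchor, item[0]))

memory = EmbeddingMemory(capacity=3)
memory.enqueue([[0.0, 0.0], [5.0, 5.0]], [0, 1])   # embeddings from iteration t-1
memory.enqueue([[1.0, 1.0], [9.0, 9.0]], [1, 2])   # embeddings from iteration t
negative = memory.hardest_negative([0.0, 0.0], anchor_label=0)
print(len(memory.items), negative)
```

The pool of candidate negatives is then bounded by the memory capacity instead of the mini-batch size, at the cost of using slightly stale embeddings.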
 
 
 
Datasets ℹ️
Practitioners can use any labeled or unlabeled data for metric learning, provided an appropriate method is chosen. However, some datasets are particularly important in the literature for
benchmarking or other purposes, and we list them in this section.
 
 
- The Stanford Natural Language Inference Corpus, serving as a useful benchmark.
 
The dataset contains pairs of sentences labeled as contradiction, entailment, or neutral according to their semantic relationship. It is useful for training semantic search models with metric learning.
 
 
 
 
 
Modeled on the SNLI corpus, the dataset contains sentence pairs from various genres of spoken and written text, and it also offers a distinctive cross-genre generalization evaluation.
 
 
 
 
 
Shared as a part of a Kaggle competition by Google, this dataset is more diverse and thus more interesting than the first version.
 
 
 
 
 
The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
 
 
 
 
 
The dataset is published along with the "Deep Metric Learning via Lifted Structured Feature Embedding" (https://github.com/rksltnl/Deep-Metric-Learning-CVPR16) paper.
 
 
 
 
 
The dataset is published along with the "The 2021 Image Similarity Dataset and Challenge" (http://arxiv.org/abs/2106.09672) paper.