<h1 id="awesome-metric-learning">awesome-metric-learning</h1>
<p>😎 Awesome list about practical Metric Learning and its
applications</p>
<h2 id="motivation">Motivation 🤓</h2>
<p>At Qdrant, we have one goal: make metric learning more practical.
This listing is in line with this purpose, and we aim to provide a
concise yet useful list of awesomeness around metric learning. It is
intended to inspire productive work rather than serve as a full
bibliography.</p>
<p>If you find it useful or like it in some other way, you may want to
join our Discord server, where we are running a paper reading club on
metric learning.</p>
<p align="center">
<a href="https://discord.gg/tdtYvXjC4h"><img src="https://img.shields.io/badge/Discord-Qdrant-5865F2.svg?logo=discord" alt="Discord"></a>
</p>
<h2 id="contributing">Contributing 🤩</h2>
<p>If you want to contribute to this project, but don't know how, you
may want to check out the <a href="/CONTRIBUTING.md">contributing
guide</a>. It's easy! 😌</p>
<h2 id="surveys">Surveys 📖</h2>
<details>
<summary>
<a href='http://contrib.scikit-learn.org/metric-learn/introduction.html'>What
is Metric Learning? </a> - A beginner-friendly starting point for
traditional metric learning methods from scikit-learn website.
</summary>
<blockquote>
<p>It provides guides for <a
href="http://contrib.scikit-learn.org/metric-learn/supervised.html">supervised</a>,
<a
href="http://contrib.scikit-learn.org/metric-learn/weakly_supervised.html">weakly
supervised</a>, and <a
href="http://contrib.scikit-learn.org/metric-learn/unsupervised.html">unsupervised</a>
metric learning algorithms in the <a
href="http://contrib.scikit-learn.org/metric-learn/metric_learn.html"><code>metric_learn</code></a>
package.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://www.mdpi.com/2073-8994/11/9/1066/htm">Deep Metric
Learning: A Survey</a> - A comprehensive study for newcomers.
</summary>
<blockquote>
<p>Factors such as sampling strategies, distance metrics, and network
structures are systematically analyzed by comparing the quantitative
results of the methods.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://hav4ik.github.io/articles/deep-metric-learning-survey">Deep
Metric Learning: A (Long) Survey</a> - An intuitive survey of the
state-of-the-art.
</summary>
<blockquote>
<p>It discusses the need for metric learning, old and state-of-the-art
approaches, and some real-world use cases.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/1812.05944">A Tutorial on Distance Metric
Learning: Mathematical Foundations, Algorithms, Experimental Analysis,
Prospects and Challenges (with Appendices on Mathematical Background and
Detailed Algorithms Explanation)</a> - Intended for those interested in
mathematical foundations of metric learning.
</summary>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/2201.05176">Neural Approaches to
Conversational Information Retrieval</a> - A working draft of a 150-page
survey book by Microsoft researchers
</summary>
</details>
<h2 id="applications">Applications 🎮</h2>
<details>
<summary>
<a href="https://github.com/openai/CLIP">CLIP</a> - Training a unified
vector embedding for image and text. <code>NLP</code> <code>CV</code>
</summary>
<blockquote>
<p>CLIP offers state-of-the-art zero-shot image classification and image
retrieval with a natural language query. See <a
href="https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb">demo</a>.</p>
</blockquote>
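<p>A minimal sketch of zero-shot image–text scoring with the
<code>clip</code> package; the model name <code>ViT-B/32</code> is one of
the published checkpoints, and the image path is a placeholder:</p>
<pre><code class="language-python">import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)   # image embedding
    text_features = model.encode_text(text)      # text embeddings
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)     # zero-shot class probabilities
</code></pre>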
</details>
<details>
<summary>
<a href="https://github.com/descriptinc/lyrebird-wav2clip">Wav2CLIP</a>
- Encoding audio into the same vector space as CLIP. <code>Audio</code>
</summary>
<blockquote>
<p>This work achieves zero-shot classification and cross-modal audio
retrieval from natural language queries.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/facebookresearch/Detic">Detic</a> - Code
released for <a href="https://arxiv.org/abs/2201.02605">“Detecting
Twenty-thousand Classes using Image-level Supervision”</a>.
<code>CV</code>
</summary>
<blockquote>
<p>It is an open-class object detector to detect any label encoded by
CLIP without finetuning. See <a
href="https://huggingface.co/spaces/akhaliq/Detic">demo</a>.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://tfhub.dev/google/collections/gtr/1">GTR</a> -
Collection of Generalizable T5-based dense Retrievers (GTR) models.
<code>NLP</code>
</summary>
<blockquote>
<p>TensorFlow Hub offers a collection of pretrained models from the
paper <a href="https://arxiv.org/abs/2112.07899">Large Dual Encoders Are
Generalizable Retrievers</a>. GTR models are first initialized from a
pre-trained T5 checkpoint. They are then further pre-trained with a set
of community question-answer pairs. Finally, they are fine-tuned on the
MS Marco dataset. The two encoders are shared so the GTR model functions
as a single text encoder. The input is variable-length English text and
the output is a 768-dimensional vector.</p>
</blockquote>
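<p>A hedged sketch of using a GTR checkpoint for retrieval; it assumes the
Hugging Face mirror <code>sentence-transformers/gtr-t5-base</code> rather
than the TF Hub module linked above:</p>
<pre><code class="language-python">from sentence_transformers import SentenceTransformer, util

# Hugging Face mirror of a GTR checkpoint; the TF Hub collection hosts the originals.
model = SentenceTransformer("sentence-transformers/gtr-t5-base")

queries = ["how to train a dual encoder"]
passages = ["Dual encoders map queries and passages into a shared vector space."]

q_emb = model.encode(queries)          # 768-dimensional vectors
p_emb = model.encode(passages)
scores = util.cos_sim(q_emb, p_emb)    # cosine similarity for retrieval
</code></pre>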
</details>
<details>
<summary>
<a href="https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL.md">TARS</a>
- Task-aware representation of sentences, a novel method for several
zero-shot tasks including NER. <code>NLP</code>
</summary>
<blockquote>
<p>The method and pretrained models found in Flair go beyond zero-shot
sequence classification and offer zero-shot span tagging abilities for
tasks such as named entity recognition and part-of-speech tagging.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/MaartenGr/BERTopic">BERTopic</a> - A novel
topic modeling toolkit with BERT embeddings. <code>NLP</code>
</summary>
<blockquote>
<p>It leverages HuggingFace Transformers and c-TF-IDF to create dense
clusters, allowing for easily interpretable topics while keeping
important words in the topic descriptions. It supports guided,
(semi-)supervised, and dynamic topic modeling with beautiful
visualizations.</p>
</blockquote>
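<p>A minimal usage sketch following the library's documented quickstart;
the 20 Newsgroups corpus is used here only as an example dataset:</p>
<pre><code class="language-python">from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))["data"]

topic_model = BERTopic()                      # uses sentence-transformers embeddings by default
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())    # topic sizes and c-TF-IDF keywords
</code></pre>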
</details>
<details>
<summary>
<a href="https://github.com/ma921/XRDidentifier">XRD Identifier</a> -
Fingerprinting substances with metric learning
</summary>
<blockquote>
<p>Identification of substances based on spectral analysis plays a vital
role in forensic science. Similarly, the material identification process
is of paramount importance for malfunction reasoning in manufacturing
sectors and materials research. This model identifies
materials by applying deep metric learning to X-ray Diffraction (XRD)
spectra. Read <a
href="https://towardsdatascience.com/automatic-spectral-identification-using-deep-metric-learning-with-1d-regnet-and-adacos-8b7fb36f2d5f">this
post</a> for more background.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/overwindows/SemanticCodeSearch">Semantic
Code Search</a> - Retrieving relevant code snippets given a natural
language query. <code>NLP</code>
</summary>
<blockquote>
<p>Unlike typical information retrieval tasks, code search
requires bridging the semantic gap between programming language and
natural language to better capture intrinsic concepts and
semantics. The repository provides the pretrained models and source code
for <a href="https://arxiv.org/abs/2201.11313">Learning Deep Semantic
Model for Code Search using CodeSearchNet Corpus</a>, where they apply
several tricks to achieve this.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://git.tu-berlin.de/rsim/duch">DUCH: Deep Unsupervised
Contrastive Hashing</a> - Large-scale cross-modal text-image retrieval
in remote sensing with computer vision. <code>CV</code> <code>NLP</code>
</summary>
</details>
<details>
<summary>
<a href="https://github.com/geekinglcq/HRec">DUration: Deep Unsupervised
Representation for Heterogeneous Recommendation</a> - Recommending
different types of items efficiently. <code>RecSys</code>
</summary>
<blockquote>
<p>State-of-the-art methods are incapable of leveraging attributes from
different types of items and thus suffer from data sparsity problems
because it is quite challenging to represent items with different
feature spaces jointly. To tackle this problem, they propose a
kernel-based neural network, namely deep unified representation
(DURation) for heterogeneous recommendation, to jointly model unified
representations of heterogeneous items while preserving their original
feature space topology structures. See <a
href="https://arxiv.org/abs/2201.05861">paper</a>.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/MathieuCayssol/Item2Vec">Item2Vec</a> -
Word2Vec-inspired model for item recommendation. <code>RecSys</code>
</summary>
<blockquote>
<p>It provides the implementation of <a
href="https://arxiv.org/abs/1603.04259">Item2Vec: Neural Item Embedding
for Collaborative Filtering</a>, wrapped as a <code>sklearn</code>
estimator compatible with <code>GridSearchCV</code> and
<code>BayesSearchCV</code> for hyperparameter tuning.</p>
</blockquote>
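<p>The repository wraps the model as a <code>sklearn</code> estimator; the
sketch below instead illustrates the underlying idea with gensim's
<code>Word2Vec</code>, treating each user's interaction sequence as a
sentence (item IDs and hyperparameters are made up for illustration):</p>
<pre><code class="language-python">from gensim.models import Word2Vec

# Each "sentence" is one user's interaction history; items play the role of words.
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_1", "item_7"],
]

# sg=1 (skip-gram) with a large window approximates the set-based Item2Vec setting.
model = Word2Vec(sessions, vector_size=64, window=10, sg=1, min_count=1, epochs=50)

print(model.wv.most_similar("item_7", topn=3))  # items co-consumed with item_7
</code></pre>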
</details>
<details>
<summary>
<a href="https://github.com/reppertj/earworm">Earworm</a> - Search for
royalty-free commercial-use music by sonic similarity
</summary>
<blockquote>
<p>You can search for the overall closest fit, or choose to focus on
matching genre, mood, or instrumentation.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/princeton-nlp/DensePhrases">DensePhrases</a>
- a text retrieval model that can return phrases, sentences, passages,
or documents for your natural language queries. <code>NLP</code>
</summary>
<blockquote>
<p>It searches phrase-level answers to your questions in real-time or
retrieves passages for downstream tasks. Check out <a
href="http://densephrases.korea.ac.kr/">demo</a>, or see <a
href="https://arxiv.org/abs/2109.08133">paper</a>.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/PrithivirajDamodaran/Alt-ZSC">Alt-ZSC</a> -
An alternate implementation for zero-shot text classification.
<code>NLP</code>
</summary>
<blockquote>
<p>Instead of leveraging NLI/XNLI, they make use of the text encoder of
the CLIP model, concluding from casual experiments that this sometimes
gives better accuracy than NLI-based models.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/Spijkervet/CLMR">CLMR</a> - Contrastive
learning of musical representations
</summary>
<blockquote>
<p>Application of the SimCLR method to musical data with out-of-domain
generalization in million-scale music classification. See <a
href="https://spijkervet.github.io/CLMR/examples/clmr-onnxruntime-web/">demo</a>
or <a href="https://arxiv.org/abs/2103.09410">paper</a>.</p>
</blockquote>
</details>
<h2 id="case-studies">Case Studies ✍️</h2>
<details>
<summary>
<a href="https://arxiv.org/pdf/1810.09591.pdf">Applying Deep Learning to
Airbnb Search</a>
</summary>
</details>
<details>
<summary>
<a href="https://arxiv.org/pdf/2106.09297.pdf">Embedding-based Product
Retrieval in Taobao Search</a>
</summary>
</details>
<details>
<summary>
<a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf">Deep
Neural Networks for YouTube Recommendations</a>
</summary>
</details>
<details>
<summary>
<a href="https://isir-ecom2022.github.io/papers/isir-ecom-2022_paper_3.pdf">Embracing
Structure in Data for Billion-scale Semantic Product Search</a> by
Amazon
</summary>
</details>
<h2 id="libraries">Libraries 🧰</h2>
<details>
<summary>
<a href="https://github.com/qdrant/quaterion">Quaterion</a> - Blazing
fast framework for fine-tuning similarity learning models
</summary>
<blockquote>
<p>Quaterion is a framework for fine-tuning similarity learning models.
The framework closes the “last mile” problem in training models for
semantic search, recommendations, anomaly detection, extreme
classification, matching engines, etc. It is designed to combine the
performance of pre-trained models with specialization for the custom
task while avoiding slow and costly training.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/UKPLab/sentence-transformers">sentence-transformers</a>
- A library for sentence-level embeddings. <code>NLP</code>
</summary>
<blockquote>
<p>Developed on top of the well-known <a
href="https://github.com/huggingface/transformers">Transformers</a>
library, it provides an easy way to finetune Transformer-based models to
obtain sequence-level embeddings.</p>
</blockquote>
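<p>A minimal sketch of encoding sentences and scoring them by cosine
similarity; the model name <code>all-MiniLM-L6-v2</code> is one of many
pretrained options:</p>
<pre><code class="language-python">from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["A man is eating food.", "Someone is having a meal.", "The sky is blue."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities between the sentence embeddings
print(util.cos_sim(embeddings, embeddings))
</code></pre>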
</details>
<details>
<summary>
<a href="https://github.com/OML-Team/open-metric-learning">OpenMetricLearning</a>
- A PyTorch-based framework to train and validate models that produce
high-quality embeddings. <code>CV</code>
</summary>
</details>
<details>
<summary>
<a href="https://github.com/NTMC-Community/MatchZoo">MatchZoo</a> - a
collection of deep learning models for matching documents.
<code>NLP</code>
</summary>
<blockquote>
<p>The goal of MatchZoo is to provide a high-quality codebase for deep
text matching research, such as document retrieval, question answering,
conversational response ranking, and paraphrase identification.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/KevinMusgrave/pytorch-metric-learning">pytorch-metric-learning</a>
- A modular library implementing losses, miners, samplers and trainers
in PyTorch.
</summary>
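<p>A minimal sketch of combining a loss with a miner, following the
library's documented pattern; the encoder is replaced by random embeddings
for brevity:</p>
<pre><code class="language-python">import torch
from pytorch_metric_learning import losses, miners

loss_func = losses.TripletMarginLoss(margin=0.2)
miner = miners.MultiSimilarityMiner()

embeddings = torch.randn(32, 128, requires_grad=True)  # batch of embeddings from your encoder
labels = torch.randint(0, 10, (32,))                    # class labels for the batch

hard_pairs = miner(embeddings, labels)                  # mine informative pairs/triplets
loss = loss_func(embeddings, labels, hard_pairs)
loss.backward()
</code></pre>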
</details>
<details>
<summary>
<a href="https://github.com/tensorflow/similarity">tensorflow-similarity</a>
- A metric learning library in TensorFlow with a Keras-like API.
</summary>
<blockquote>
<p>It provides support for self-supervised contrastive learning and
state-of-the-art methods such as SimCLR, SimSiam, and Barlow Twins.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/explosion/sense2vec">sense2vec</a> -
Contextually keyed word vectors. <code>NLP</code>
</summary>
<blockquote>
<p>A library to train and query contextually keyed word
vectors augmented with part-of-speech tags, enabling multi-word
queries.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/lightly-ai/lightly">lightly</a> - A Python
library for self-supervised learning on images. <code>CV</code>
</summary>
<blockquote>
<p>A PyTorch library to efficiently train self-supervised computer
vision models with state-of-the-art techniques such as SimCLR, SimSiam,
Barlow Twins, BYOL, among others.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/embeddings-benchmark/mteb">MTEB</a> -
Massive Text Embedding Benchmark. <code>NLP</code>
</summary>
<blockquote>
<p>A library that helps you benchmark pretrained and custom embedding
models on tens of datasets and tasks with ease.</p>
</blockquote>
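<p>A minimal sketch of benchmarking a sentence-transformers model on a
single MTEB task; the task and model names below are just examples:</p>
<pre><code class="language-python">from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Evaluate on a single task; available task names are listed in the MTEB repository.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
</code></pre>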
</details>
<details>
<summary>
<a href="https://github.com/lyst/lightfm">LightFM</a> - A Python
implementation of a number of popular recommender algorithms.
<code>RecSys</code>
</summary>
<blockquote>
<p>It supports incorporating user and item features into traditional
matrix factorization. It represents users and items as the sum of the
latent representations of their features, thus achieving better
generalization.</p>
</blockquote>
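<p>A minimal sketch of training and evaluating a WARP-loss model on the
bundled MovieLens data, following the library's documented usage:</p>
<pre><code class="language-python">from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

data = fetch_movielens(min_rating=5.0)

# WARP loss optimizes precision@k directly for implicit-feedback data.
model = LightFM(loss="warp")
model.fit(data["train"], epochs=30, num_threads=2)

print("Test precision@5:", precision_at_k(model, data["test"], k=5).mean())
</code></pre>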
</details>
<details>
<summary>
<a href="https://github.com/RaRe-Technologies/gensim">gensim</a> -
Library for topic modelling, document indexing and similarity retrieval
with large corpora
</summary>
<blockquote>
<p>It provides efficient multicore and memory-independent
implementations of popular algorithms, such as online Latent Semantic
Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA), Random
Projections (RP), Hierarchical Dirichlet Process (HDP) or word2vec.</p>
</blockquote>
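<p>A minimal sketch of gensim's classic similarity-retrieval workflow
(tiny toy corpus, LSI with two topics, cosine similarity index):</p>
<pre><code class="language-python">from gensim import corpora, models, similarities

texts = [
    ["human", "machine", "interface"],
    ["graph", "minors", "survey"],
    ["user", "interface", "response", "time"],
]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)   # online LSA
index = similarities.MatrixSimilarity(lsi[corpus])

query = lsi[dictionary.doc2bow(["human", "interface"])]
print(list(index[query]))   # cosine similarity of the query against each document
</code></pre>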
</details>
<details>
<summary>
<a href="https://github.com/AmazingDD/daisyRec">DasyRec</a> - A library
for recommender system development in pytorch. <code>RecSys</code>
</summary>
<blockquote>
<p>It provides implementations of algorithms such as KNN, LFM, SLIM,
NeuMF, FM, DeepFM, VAE and so on, in order to ensure fair comparison of
recommender system benchmarks.</p>
</blockquote>
</details>
<h2 id="tools">Tools ⚒️</h2>
<details>
<summary>
<a href="https://projector.tensorflow.org/">Embedding Projector</a> - A
web-based tool to visualize high-dimensional data.
</summary>
<blockquote>
<p>It supports UMAP, t-SNE, PCA, or custom techniques to analyze
embeddings of encoders.</p>
</blockquote>
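<p>The projector can load embeddings from tab-separated files; a minimal
sketch of exporting vectors and labels for upload, with random data as a
stand-in for real embeddings:</p>
<pre><code class="language-python">import numpy as np

embeddings = np.random.randn(100, 64)            # your encoder's output goes here
labels = [f"item_{i}" for i in range(100)]

# The projector accepts tab-separated vectors plus an optional metadata file.
np.savetxt("vectors.tsv", embeddings, delimiter="\t")
with open("metadata.tsv", "w") as f:
    f.write("\n".join(labels))                   # single-column metadata needs no header
</code></pre>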
</details>
<details>
<summary>
<a href="https://github.com/uber-research/parallax">Parallax</a> - a
tool for visualizing embeddings
</summary>
<blockquote>
<p>It allows you to visualize the embedding space by explicitly
selecting the axes through algebraic formulas on the embeddings (like
king - man + woman) and by highlighting specific items in the embedding
space. It also supports implicit axes via PCA and t-SNE. See <a
href="https://arxiv.org/abs/1905.12099">paper</a>.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/carted/processing-text-data">Processing Text
Data</a> - An optimized Apache Beam pipeline for generating sentence
embeddings (runnable on Cloud Dataflow). <code>NLP</code>
</summary>
</details>
<h3 id="approximate-nearest-neighbors">Approximate Nearest Neighbors
</h3>
<details>
<summary>
<a href="https://github.com/erikbern/ann-benchmarks">ANN Benchmarks</a>
- Benchmarking various ANN implementations for different metrics.
</summary>
<blockquote>
<p>It provides benchmarking of 20+ ANN algorithms on nine standard
datasets, with support for bringing your own dataset. (<a
href="https://medium.com/towards-artificial-intelligence/how-to-choose-the-best-nearest-neighbors-algorithm-8d75d42b16ab?sk=889bc0006f5ff773e3a30fa283d91ee7">Medium
Post</a>)</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/facebookresearch/faiss">FAISS</a> -
Efficient similarity search and clustering of dense vectors that
possibly do not fit in RAM
</summary>
<blockquote>
<p>It is not the fastest ANN algorithm but achieves memory efficiency
thanks to various quantization and indexing methods such as IVF, PQ, and
IVF-PQ. (<a
href="https://www.pinecone.io/learn/faiss-tutorial/">Tutorial</a>)</p>
</blockquote>
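<p>A minimal sketch of an IVF-PQ index on random vectors; the list count,
code size, and <code>nprobe</code> values are illustrative, not tuned:</p>
<pre><code class="language-python">import numpy as np
import faiss

d = 128                                                 # embedding dimensionality
xb = np.random.random((10_000, d)).astype("float32")    # database vectors
xq = np.random.random((5, d)).astype("float32")         # query vectors

# IVF-PQ: coarse quantizer with 100 inverted lists, product quantization to 16 bytes
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, 100, 16, 8)

index.train(xb)        # learn the coarse centroids and PQ codebooks
index.add(xb)
index.nprobe = 10      # number of inverted lists visited per query

distances, ids = index.search(xq, 5)
</code></pre>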
</details>
<details>
<summary>
<a href="https://github.com/nmslib/hnswlib">HNSW</a> - Hierarchical
Navigable Small World graphs
</summary>
<blockquote>
<p>It is still one of the fastest ANN algorithms out there, though it
requires relatively high memory usage. (Paper: <a
href="https://arxiv.org/abs/1603.09320">Efficient and robust approximate
nearest neighbor search using Hierarchical Navigable Small World
graphs</a>)</p>
</blockquote>
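<p>A minimal usage sketch following the library's documented API; the
index parameters are illustrative defaults:</p>
<pre><code class="language-python">import numpy as np
import hnswlib

dim = 128
data = np.random.random((10_000, dim)).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=10_000, ef_construction=200, M=16)
index.add_items(data, np.arange(10_000))

index.set_ef(50)                        # query-time accuracy/speed trade-off
labels, distances = index.knn_query(data[:5], k=3)
</code></pre>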
</details>
<details>
<summary>
<a href="https://github.com/google-research/google-research/tree/master/scann">Googles
SCANN</a> - The technology behind vector search at Google
</summary>
<blockquote>
<p>Paper: <a href="https://arxiv.org/abs/1908.10396">Accelerating
Large-Scale Inference with Anisotropic Vector Quantization</a></p>
</blockquote>
</details>
<h2 id="papers">Papers 🔬</h2>
<details>
<summary>
<a href="http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf">Dimensionality
Reduction by Learning an Invariant Mapping</a> - First appearance of
Contrastive Loss.
</summary>
<blockquote>
<p>Published by Hadsell, Chopra, and LeCun (2006), its main focus was on
dimensionality reduction. However, the proposed method has excellent
properties for metric learning, such as preserving neighbourhood
relationships and generalizing to unseen data, and it has seen extensive
applications and a great number of variations ever since. It is advised
that you read <a
href="https://medium.com/@maksym.bekuzarov/losses-explained-contrastive-loss-f8f57fe32246">this
great post</a> to better understand its importance for metric
learning.</p>
</blockquote>
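<p>A minimal PyTorch sketch of the pairwise contrastive loss described in
the paper, assuming label 0 for similar pairs and 1 for dissimilar
pairs:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, y, margin=1.0):
    """y = 0 for similar pairs, 1 for dissimilar pairs (as in Hadsell et al.)."""
    d = F.pairwise_distance(z1, z2)                        # Euclidean distance D
    similar_term = (1 - y) * 0.5 * d.pow(2)                # pull similar pairs together
    dissimilar_term = y * 0.5 * F.relu(margin - d).pow(2)  # push dissimilar pairs apart
    return (similar_term + dissimilar_term).mean()

z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
y = torch.randint(0, 2, (8,)).float()
print(contrastive_loss(z1, z2, y))
</code></pre>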
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/1503.03832">FaceNet: A Unified Embedding
for Face Recognition and Clustering</a> - First appearance of Triplet
Loss.
</summary>
<blockquote>
<p>The paper introduces Triplet Loss, which can be seen as the “ImageNet
moment” for deep metric learning. It is still one of the
state-of-the-art methods and has a great number of applications in
almost any data modality.</p>
</blockquote>
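<p>A minimal PyTorch sketch of the triplet objective;
<code>torch.nn.TripletMarginLoss</code> provides the same thing out of the
box:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(0, d(a, p) - d(a, n) + margin), averaged over the batch."""
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()

a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_loss(a, p, n))
</code></pre>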
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/1703.07737">In Defense of the Triplet
Loss for Person Re-Identification</a> - It shows that triplet sampling
matters and proposes to use batch-hard samples.
</summary>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/1708.01682">Deep Metric Learning with
Angular Loss</a> - A novel loss function with better properties.
</summary>
<blockquote>
<p>It provides scale invariance, robustness against feature variance,
and better convergence than Contrastive and Triplet Loss.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/1801.07698">ArcFace: Additive Angular
Margin Loss for Deep Face Recognition</a> - Supervised metric
learning without pairs or triplets.
</summary>
<blockquote>
<p>Although it is originally designed for the face recognition task,
this loss function achieves state-of-the-art results in many other
metric learning problems with simpler and faster data feeding. It is
also robust against unclean and unbalanced data when modified with
sub-centers and a dynamic margin.</p>
</blockquote>
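<p>A minimal PyTorch sketch of the additive angular margin (not the
authors' full training code): embeddings and class weights are
L2-normalized, the margin is added to the target-class angle, and the
scaled logits go into ordinary cross-entropy:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin: cos(theta + m) for the target class, cos(theta) otherwise."""
    emb = F.normalize(embeddings, dim=1)
    w = F.normalize(weights, dim=1)
    cos = emb @ w.t()                                    # cosine similarity to class centers
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=w.size(0)).bool()
    cos_margin = torch.where(target, torch.cos(theta + m), cos)
    return s * cos_margin                                # feed into F.cross_entropy

embeddings = torch.randn(8, 128)
weights = torch.randn(10, 128)                           # one learnable center per class
labels = torch.randint(0, 10, (8,))
loss = F.cross_entropy(arcface_logits(embeddings, weights, labels), labels)
</code></pre>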
</details>
<details>
<summary>
<a href="https://cse.buffalo.edu/~lusu/papers/TKDD2020.pdf">Learning
Distance Metrics from Probabilistic Information</a> - Working with
datasets that contain probabilistic labels instead of deterministic
values.
</summary>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/2105.04906">VICReg:
Variance-Invariance-Covariance Regularization for Self-Supervised
Learning</a> - Better regularization for high-dimensional embeddings.
</summary>
<blockquote>
<p>The paper introduces a method that explicitly avoids the collapse
problem in high dimensions with a simple regularization term on the
variance of the embeddings along each dimension individually. This new
term can be incorporated into other methods to stabilize training and
improve performance.</p>
</blockquote>
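<p>A minimal PyTorch sketch of the three VICReg terms; the loss weights
and <code>gamma</code> follow the paper's defaults but are otherwise
illustrative:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, gamma=1.0, eps=1e-4):
    """Invariance + variance + covariance terms from VICReg."""
    n, d = z1.shape
    invariance = F.mse_loss(z1, z2)

    # Variance: hinge keeping the std of every dimension above gamma (avoids collapse)
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    variance = F.relu(gamma - std1).mean() + F.relu(gamma - std2).mean()

    # Covariance: push off-diagonal covariance entries toward zero (decorrelates dims)
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.t() @ z1c) / (n - 1)
    cov2 = (z2c.t() @ z2c) / (n - 1)
    off_diag = (cov1.pow(2).sum() - cov1.pow(2).diagonal().sum()
                + cov2.pow(2).sum() - cov2.pow(2).diagonal().sum()) / d

    return sim_w * invariance + var_w * variance + cov_w * off_diag

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
print(vicreg_loss(z1, z2))
</code></pre>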
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/2104.13643">On the Unreasonable
Effectiveness of Centroids in Image Retrieval</a> - Higher robustness
against outliers with better efficiency.
</summary>
<blockquote>
<p>The paper proposes using the mean centroid representation during
training and retrieval for robustness against outliers and more stable
features. It further reduces retrieval time and storage requirements,
making it suitable for production deployments.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://arxiv.org/abs/2104.06979">TSDAE: Using
Transformer-based Sequential Denoising Auto-Encoder for Unsupervised
Sentence Embedding Learning</a> - A SOTA method to learn domain-specific
sentence-level embeddings from unlabelled data.
</summary>
</details>
<details>
<summary>
<a href="http://arxiv.org/abs/2002.05709">SimCLR: A Simple Framework for
Contrastive Learning of Visual Representations</a> - Self-Supervised
method comparing two differently augmented versions of the same image
with Contrastive Loss. <code>CV</code>
</summary>
<blockquote>
<p>It demonstrates, among other things, that:</p>
<ul>
<li>composition of data augmentations plays a critical role - random crop
plus random color distortion provides the best downstream classifier
accuracy,</li>
<li>introducing a learnable nonlinear transformation between the
representation and the contrastive loss substantially improves the
quality of the learned representations,</li>
<li>contrastive learning benefits from larger batch sizes and more
training steps compared to supervised learning.</li>
</ul>
</blockquote>
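<p>A minimal PyTorch sketch of the NT-Xent objective used by SimCLR,
computed over the concatenated projections of two augmented views:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) over two views of a batch."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)           # (2n, d) L2-normalized projections
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim = sim.masked_fill(mask, float("-inf"))            # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # index of each positive
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)       # projections of two augmented views
print(nt_xent(z1, z2))
</code></pre>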
</details>
<details>
<summary>
<a href="https://aclanthology.org/2021.emnlp-main.552">SimCSE: Simple
Contrastive Learning of Sentence Embeddings</a> - An unsupervised
approach, which takes an input sentence and predicts itself in a
contrastive objective, with only standard dropout used as noise.
<code>NLP</code>
</summary>
<blockquote>
<p>They also incorporate annotated pairs from natural language
inference datasets into their contrastive learning framework in a
supervised setting, showing that the contrastive learning objective
regularizes the anisotropic space of pre-trained embeddings to be more
uniform, and that it better aligns positive pairs when supervised
signals are available.</p>
</blockquote>
</details>
<details>
<summary>
<a href="http://arxiv.org/abs/2103.00020">Learning Transferable Visual
Models From Natural Language Supervision</a> - The paper that introduced
CLIP: Training a unified vector embedding for image and text.
<code>NLP</code> <code>CV</code>
</summary>
</details>
<details>
<summary>
<a href="http://arxiv.org/abs/2102.05918">Scaling Up Visual and
Vision-Language Representation Learning With Noisy Text Supervision</a>
- Google's answer to CLIP: Training a unified vector embedding for image
and text but using noisy text instead of a carefully curated dataset.
<code>NLP</code> <code>CV</code>
</summary>
</details>
<details>
<summary>
<a href="https://github.com/msight-tech/research-xbm">Cross-Batch Memory
for Embedding Learning (XBM)</a> - A technique aimed at extending the
effective batch size for similarity losses without actually evaluating
all embeddings in a single batch.
</summary>
<blockquote>
<p>Mining informative negative instances is of central importance to
deep metric learning (DML); however, this task is intrinsically limited
by mini-batch training, where only a mini-batch of instances is
accessible at each iteration. The paper identifies a “slow drift”
phenomenon by observing that embedding features drift exceptionally
slowly even as the model parameters are updated throughout training.
This suggests that the features of instances computed at preceding
iterations are a close approximation of the features extracted by the
current model.</p>
</blockquote>
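<p>A minimal sketch of the cross-batch memory idea: keep a FIFO queue of
detached embeddings from previous iterations and mine pairs against it
(the queue size and dimensions are illustrative):</p>
<pre><code class="language-python">import torch

class CrossBatchMemory:
    """FIFO queue of past embeddings and labels, a sketch of the XBM idea."""

    def __init__(self, size=4096, dim=128):
        self.size = size
        self.feats = torch.zeros(0, dim)
        self.labels = torch.zeros(0, dtype=torch.long)

    def enqueue(self, feats, labels):
        # Past features are detached: "slow drift" makes them a good approximation
        # of what the current model would produce for the same inputs.
        self.feats = torch.cat([self.feats, feats.detach()])[-self.size:]
        self.labels = torch.cat([self.labels, labels])[-self.size:]

memory = CrossBatchMemory(size=1024, dim=128)
for _ in range(3):                               # stand-in for the training loop
    batch_feats = torch.randn(32, 128)
    batch_labels = torch.randint(0, 10, (32,))
    memory.enqueue(batch_feats, batch_labels)
    # A pair-based loss would now be computed between the current batch and
    # the whole memory, giving far more negatives than the mini-batch alone.
print(memory.feats.shape)
</code></pre>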
</details>
<h2 id="datasets-">Datasets </h2>
<blockquote>
<p>Practitioners can use any labeled or unlabelled data for metric
learning with an appropriately chosen method. However, some datasets are
particularly important in the literature for benchmarking or other
purposes, and we list them in this section.</p>
</blockquote>
<details>
<summary>
<a href="https://nlp.stanford.edu/projects/snli/">SNLI</a> - The
Stanford Natural Language Inference Corpus, serving as a useful
benchmark. <code>NLP</code>
</summary>
<blockquote>
<p>The dataset contains pairs of sentences labeled as
<code>contradiction</code>, <code>entailment</code>, or
<code>neutral</code> with respect to their semantic relationship. It is
useful for training semantic search models with metric learning.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://cims.nyu.edu/~sbowman/multinli/">MultiNLI</a> - NLI
corpus with samples from multiple genres. <code>NLP</code>
</summary>
<blockquote>
<p>Modeled on the SNLI corpus, the dataset contains sentence pairs from
various genres of spoken and written text, and it also offers a
distinctive cross-genre generalization evaluation.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://www.kaggle.com/c/landmark-recognition-2019">Google
Landmark Recognition 2019</a> - Label famous (and not so famous)
landmarks from images. <code>CV</code>
</summary>
<blockquote>
<p>Shared as a part of a Kaggle competition by Google, this dataset is
more diverse and thus more interesting than the first version.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST</a>
- a dataset of Zalando's article images. <code>CV</code>
</summary>
<blockquote>
<p>The dataset consists of a training set of 60,000 examples and a test
set of 10,000 examples. Each example is a 28x28 grayscale image,
associated with a label from 10 classes.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://cvgl.stanford.edu/projects/lifted_struct/">The Stanford
Online Products dataset</a> - a dataset with 22,634 classes and 120,053
product images. <code>CV</code>
</summary>
<blockquote>
<p>The dataset is published along with <a
href="https://github.com/rksltnl/Deep-Metric-Learning-CVPR16">“Deep
Metric Learning via Lifted Structured Feature Embedding”</a> paper.</p>
</blockquote>
</details>
<details>
<summary>
<a href="https://www.drivendata.org/competitions/79/">MetaAIs 2021
Image Similarity Dataset and Challenge</a> - dataset has 1M Reference
image set, 1M Training image set, 50K Dev query image set and 50K Test
query image set. <code>CV</code>
</summary>
<blockquote>
<p>The dataset is published along with <a
href="http://arxiv.org/abs/2106.09672">“The 2021 Image Similarity
Dataset and Challenge”</a> paper.</p>
</blockquote>
</details>
<p><a
href="https://github.com/qdrant/awesome-metric-learning">metriclearning.md
GitHub</a></p>