爱可可 AI Paper Recommendations (October 15) (Part 3)


5、[IR]Pretrained Transformers for Text Ranking: BERT and Beyond
J Lin, R Nogueira, A Yates
[University of Waterloo & Max Planck Institute for Informatics]
A survey of pretrained transformer models for text ranking, providing an overview of text ranking with transformer neural network architectures, of which BERT is the best-known example. The survey covers a wide range of modern techniques, grouped into two broad categories: transformer models that perform reranking in multi-stage ranking architectures, and learned dense representations that attempt to perform ranking directly. Particular attention is paid to techniques for handling long documents (beyond the typical sentence-by-sentence processing used in NLP) and to techniques for trading off effectiveness (result quality) against efficiency (query latency).
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
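To make the two high-level categories concrete, below is a minimal sketch (not taken from the survey) of how each might look with the Hugging Face transformers library: (1) cross-encoder reranking, which jointly scores each query-document pair from a candidate list, and (2) dense retrieval, which embeds queries and documents independently and ranks by vector similarity. The model checkpoints named here are illustrative assumptions; any BERT-style model fine-tuned for these tasks could be substituted.

```python
# Sketch of the two families of techniques covered by the survey.
# Assumptions: torch and transformers are installed; the named checkpoints
# are example models, not ones prescribed by the survey.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoModel

query = "what causes rain"
candidates = [
    "Rain forms when water vapor condenses into droplets heavy enough to fall.",
    "The capital of France is Paris.",
]

# (1) Reranking: a cross-encoder scores each (query, document) pair jointly.
rerank_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed example checkpoint
tok = AutoTokenizer.from_pretrained(rerank_name)
reranker = AutoModelForSequenceClassification.from_pretrained(rerank_name)
with torch.no_grad():
    batch = tok([query] * len(candidates), candidates,
                padding=True, truncation=True, return_tensors="pt")
    scores = reranker(**batch).logits.squeeze(-1)
print(sorted(zip(scores.tolist(), candidates), reverse=True))

# (2) Dense retrieval: embed query and documents separately, rank by similarity.
enc_name = "sentence-transformers/all-MiniLM-L6-v2"  # assumed example checkpoint
tok2 = AutoTokenizer.from_pretrained(enc_name)
encoder = AutoModel.from_pretrained(enc_name)

def embed(texts):
    # Mean-pool token embeddings over the attention mask.
    with torch.no_grad():
        b = tok2(texts, padding=True, truncation=True, return_tensors="pt")
        out = encoder(**b).last_hidden_state
        mask = b["attention_mask"].unsqueeze(-1)
        return (out * mask).sum(1) / mask.sum(1)

q_vec, d_vecs = embed([query]), embed(candidates)
sims = torch.nn.functional.cosine_similarity(q_vec, d_vecs)
print(sorted(zip(sims.tolist(), candidates), reverse=True))
```

The design difference drives the effectiveness/efficiency tradeoff the survey emphasizes: the cross-encoder sees query and document together and is typically more accurate but must run once per candidate at query time, while the dense encoder lets document vectors be precomputed and searched with approximate nearest-neighbor indexes at much lower query latency.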