Labs


CLEF promotes the systematic evaluation of information access systems, primarily through experimentation on shared tasks.

CLEF 2022 consists of a set of 14 Labs designed to test different aspects of multilingual and multimedia IR systems:

  1. ARQMath
  2. BioASQ: Large-scale biomedical semantic indexing and question answering
  3. CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection
  4. ChEMU
  5. eRisk: Early risk prediction on the Internet
  6. HIPE: Named Entity Recognition and Linking in Historical Documents
  7. iDPP
  8. ImageCLEF
  9. JokeR: Automatic Pun and Humour Translation Workshop
  10. LeQua: Learning to Quantify
  11. LifeCLEF: Biodiversity Identification and Prediction Challenges
  12. PAN Lab on Digital Text Forensics and Stylometry
  13. SimpleText: Automatic Simplification of Scientific Texts
  14. Touché: Argument Retrieval

Labs Publications:

  • Lab Overviews will be published in LNCS Proceedings
  • Labs Working Notes will be published in CEUR-WS Proceedings
  • The best papers from the 2021 labs will be nominated for submission to the CLEF 2022 LNCS proceedings

Labs Participation:

  • Lab participants register via the CLEF website (registration opens in November 2021)

BioASQ: Large-scale biomedical semantic indexing and question answering

The aim of the BioASQ Lab is to push the research frontier towards systems that use the diverse and voluminous information available online to respond directly to the information needs of biomedical scientists.
  • Task A: Large-Scale Online Biomedical Semantic Indexing.
    The participants are asked to classify new PubMed documents into classes from the MeSH hierarchy, before PubMed curators annotate (in effect, classify) them manually; a minimal baseline is sketched after this list.
  • Task B: Biomedical Semantic Question Answering.
    This task uses benchmark datasets of biomedical questions, in English, along with gold standard (reference) answers constructed by a team of biomedical experts. The participants have to respond with relevant concepts, articles, snippets and RDF triples, from designated resources, as well as exact and 'ideal' answers.
  • Task MESINESP: Medical Semantic Indexing In Spanish.
    The participants are asked to classify new medical documents written in Spanish, before curators annotate them manually. The classes come from the MeSH hierarchy through the DeCS vocabulary.
  • Task Synergy: Question Answering for developing problems.
    Biomedical experts pose unanswered questions for the developing problems, such as COVID-19. Participating systems are required to provide answers, which will in turn be assessed by the experts and fed back to the systems, together with updated questions and new knowledge resources. Through this process, this task aims to facilitate the incremental understanding of developing problems, such as COVID-19, and contribute to the discovery of new solutions.
  • http://www.bioasq.org/workshop2022
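
To make Task A concrete, here is a minimal multi-label classification sketch in the spirit of the task. It is not the official BioASQ setup: the abstracts, headings, and model choice are hypothetical toy data, and real systems must scale to tens of thousands of MeSH headings and millions of documents.

```python
# Minimal multi-label baseline for MeSH-style semantic indexing.
# Everything below is hypothetical toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

abstracts = [
    "Aspirin lowers the risk of myocardial infarction in adults.",
    "Influenza vaccination coverage among elderly patients.",
    "Aspirin interactions with influenza antiviral agents.",
]
mesh_labels = [
    ["Aspirin", "Myocardial Infarction"],
    ["Influenza Vaccines", "Aged"],
    ["Aspirin", "Antiviral Agents"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(mesh_labels)   # one binary column per MeSH heading
vec = TfidfVectorizer()
X = vec.fit_transform(abstracts)

# one binary classifier per heading: each document may receive several headings
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)

new_doc = ["Vaccination of elderly patients against influenza."]
print(mlb.inverse_transform(clf.predict(vec.transform(new_doc))))
```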

CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection

The mission of the CheckThat! lab is to foster the development of technology that would enable the automatic verification of claims. This is the 5th edition of the lab; this year two new languages are added: Dutch and German (in addition to Arabic, Bulgarian, English, Spanish, and Turkish). Tasks 1 and 2 focus on Twitter, and task 3 on news articles.
  • Task 1: Fighting the COVID-19 Infodemic.
    It covers check-worthiness estimation, detection of verifiable factual claims, detection of harmful tweets, and detection of attention-worthy tweets.
  • Task 2: Previously fact-checked claims detection.
    Given a check-worthy claim and a collection of previously fact-checked claims, determine whether the claim has already been fact-checked; a similarity-based sketch follows this list.
  • Task 3: Fake news detection.
    It targets news articles. Given the text and the title of a news article, determine whether the main claim made in the article is true, partially true, false, or other.
  • https://sites.google.com/view/clef2022-checkthat
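
As a rough illustration of Task 2, the sketch below ranks a collection of previously fact-checked claims by lexical similarity to an input claim. This is only one possible baseline under assumed data; participating systems typically rely on sentence embeddings and learned re-rankers rather than TF-IDF.

```python
# Rank previously fact-checked claims by lexical similarity to an input claim.
# The collection, the input claim, and the threshold below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

verified_claims = [
    "5G towers do not spread COVID-19.",
    "Drinking bleach does not cure COVID-19.",
    "Masks do not cause oxygen deficiency.",
]
input_claim = "COVID-19 is transmitted through 5G masts."

vec = TfidfVectorizer().fit(verified_claims + [input_claim])
scores = cosine_similarity(vec.transform([input_claim]),
                           vec.transform(verified_claims))[0]

# the claim counts as "previously fact-checked" if the best match clears
# a threshold tuned on training data (hypothetical value here)
best_score, best_claim = max(zip(scores, verified_claims))
print(best_claim, best_score, best_score > 0.2)
```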

eRisk: Early risk prediction on the Internet

eRisk explores the evaluation methodology, effectiveness metrics and practical applications of early risk detection on the Internet. Early detection technologies can be employed in many areas, particularly those related to health and safety: for instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum or social network. Our main goal is to pioneer a new interdisciplinary research area potentially applicable to a wide variety of situations and personal profiles.
  • Task 1: Early detection of pathological gambling.
    The challenge consists of sequentially processing pieces of evidence and detecting early traces of pathological gambling as soon as possible; the sequential protocol is sketched after this list. The task is mainly concerned with evaluating text mining solutions and thus concentrates on texts written on social media. Texts should be processed in the order they were created.
  • Task 2: Early detection of depression.
    The challenge consists of sequentially processing pieces of evidence and detecting early traces of depression as soon as possible. The task follows the same scheme as Task 1.
  • Task 3: Measuring the severity of the signs of eating disorders.
    The task consists of estimating the severity of an eating disorder for an individual, given their history of written submissions. For each user, participants will be given a history of postings and will have to fill in a standard eating disorder questionnaire based on the evidence found in that history.
  • https://erisk.irlab.org
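
The sketch below illustrates the sequential protocol shared by Tasks 1 and 2: writings are read in chronological order and a decision is emitted as soon as the accumulated evidence clears a threshold, since earliness is part of the evaluation. The scoring function and threshold are hypothetical stand-ins for a trained model.

```python
# Schematic of the eRisk-style sequential protocol: posts arrive in
# chronological order and the system must decide as early as possible.
# risk_score() and THRESHOLD are hypothetical placeholders.
from typing import Iterable, Tuple

THRESHOLD = 0.66

def risk_score(text: str) -> float:
    # placeholder: fraction of hypothetical gambling cues present in the post
    cues = ("casino", "bets", "slots")
    return sum(cue in text.lower() for cue in cues) / len(cues)

def early_detect(posts: Iterable[str]) -> Tuple[str, int]:
    """Return a decision and the number of posts read before deciding."""
    seen = 0
    risk = 0.0
    for post in posts:
        seen += 1
        risk = max(risk, risk_score(post))   # simple evidence aggregation
        if risk >= THRESHOLD:
            return "alert", seen             # earliness is part of the metric
    return "no alert", seen

print(early_detect(["lovely day", "lost big at the casino slots again"]))
```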

HIPE: Named Entity Recognition and Linking in Historical Documents

HIPE ('Identifying Historical People, Places and other Entities') focuses on named entity recognition and linking in historical documents, with the objective of assessing and advancing the development of robust, adaptable and transferable named entity processing systems. Compared to the first HIPE edition in 2020, HIPE-2022 will confront systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation schemas. HIPE-2022 aims at gaining new insights into how best to ensure the transferability of NE processing approaches across languages, time periods, and document types in a cultural heritage context.
  • Task 1: Named Entity Recognition and Classification (NERC).
    The task comprises two subtasks: NERC-coarse (high-level entity types, for all languages) and NERC-fine (finer-grained entity types, for English, French and German only).
  • Task 2: Named Entity Linking (EL).
    That is, the linking of named entity mentions to a unique referent in a knowledge base (Wikidata), or to a NIL node if the mention has no referent in the KB; a toy illustration follows this list.
  • https://hipe-eval.github.io/HIPE-2022/
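
As a toy illustration of the linking decision in Task 2, the snippet below maps mentions to Wikidata QIDs via a lookup table, falling back to NIL. The table is an assumption made for illustration; real systems generate candidates from Wikidata and disambiguate them using document context.

```python
# Toy illustration of entity linking: map a mention to a Wikidata QID,
# or to NIL when the mention has no referent in the knowledge base.
# The candidate table is a hypothetical stand-in.
KB = {
    "paris": "Q90",       # Paris, France
    "napoleon": "Q517",   # Napoleon Bonaparte
}

def link(mention: str) -> str:
    return KB.get(mention.lower(), "NIL")

print(link("Paris"))       # -> Q90
print(link("M. Dupont"))   # -> NIL (no referent in the KB)
```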

JokeR: Automatic Pun and Humour Translation Workshop

The main objective of the JokeR project is to study strategies for the localization of humour and puns and to create an open, freely available multilingual parallel corpus annotated according to these strategies, together with evaluation metrics. The multilingual data and metrics resulting from the JokeR workshop will be a step forward in the automation of humour localization, serving to train and evaluate machine translation models.
  • Pilot task 1:
    Classify and explain a given punning construction in a proper noun or a neologism. The classification will be evaluated in terms of accuracy, while the explanation will be compared against the gold standard (exact match).
  • Pilot task 2:
    Translate a given pun from a proper noun or a neologism from English into French.
  • Pilot task 3:
    Translate a given punning phrase from English into French.
  • https://motsmachines.github.io/joker/EN/

LeQua: Learning to Quantify

The aim of LeQua 2022 is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets, i.e., methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents; a minimal baseline is sketched after the task list below. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from those of the training set.
  • Task 1:
    This task is concerned with evaluating binary quantifiers, i.e., quantifiers that must only predict the relative frequencies of a class and its complement. Participants in this task will be provided with training and test documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating suitable representations of the textual documents, but want instead to concentrate on optimizing their methods for learning to quantify.
  • Task 2:
    This task is concerned with evaluating single-label multi-class quantifiers, i.e., quantifiers that operate on documents each of which belongs to exactly one of a set of n > 2 classes. As in Task 1, participants will be provided with training and test documents already converted into vector form.
  • Task 3:
    Like Task 1, this task is concerned with evaluating binary quantifiers. Unlike Task 1, participants will be provided with the raw text of both training and test documents; the task is thus suitable for participants who also wish to engage in generating suitable representations of the textual documents, or to train end-to-end systems.
  • Task 4:
    Like Task 2, this task is concerned with evaluating single-label multi-class quantifiers; as in Task 3, participants will be provided with the raw text of both training and test documents.
  • https://lequa2022.github.io
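
The following is a minimal sketch of the standard “classify and count” baseline for quantification: train a classifier, label every document in the unlabelled set, and report the observed label proportions as the prevalence estimate. All data below is hypothetical.

```python
# Minimal "classify and count" quantification baseline.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_docs = ["great product", "awful support", "loved it", "terrible quality"]
train_labels = ["pos", "neg", "pos", "neg"]
unlabelled = ["awful and terrible", "really loved it", "terrible, awful thing"]

vec = TfidfVectorizer().fit(train_docs)
clf = LogisticRegression().fit(vec.transform(train_docs), train_labels)

# classify every document, then report label proportions as the estimate
predicted = clf.predict(vec.transform(unlabelled))
prevalence = {c: n / len(unlabelled) for c, n in Counter(predicted).items()}
print(prevalence)   # estimated class frequencies for the unlabelled set
```

Plain classify-and-count is known to be biased precisely when the test prevalence drifts away from the training prevalence, which is the condition LeQua stresses; adjusted variants of the method attempt to correct for this bias.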

LifeCLEF: Biodiversity Identification and Prediction Challenges

The LifeCLEF lab aims to boost research on the identification and prediction of living organisms in order to overcome the taxonomic impediment and improve our knowledge of biodiversity.
  • Task 1: PlantCLEF
    Image-based plant identification at global scale (300K classes).
  • Task 2: BirdCLEF
    Bird species identification from bird calls and songs in audio soundscapes.
  • Task 3: GeoLifeCLEF
    Species presence prediction from remote sensing data.
  • Task 4: SnakeCLEF
    Snake species identification in medically important scenarios.
  • Task 5: FungiCLEF
    Fungi recognition from images and metadata.
  • https://www.imageclef.org/LifeCLEF2022

SimpleText: Automatic Simplification of Scientific Texts

Information retrieval has moved from traditional document retrieval, in which search is an isolated activity, to modern information access, where search and the use of information are fully integrated. Yet non-experts tend to avoid authoritative primary sources such as the scientific literature because of their complex language and internal vernacular, or for lack of prior background knowledge. Text simplification approaches can remove some of these barriers, keeping users from falling back on shallow information from sources that prioritize commercial or political incentives over correctness and informational value. SimpleText tackles both the technical and the evaluation challenges by providing appropriate data and benchmarks for text simplification.
  • Task 1: Selecting passages to include in a simplified summary - Content Simplification.
    Given an article from a major international newspaper aimed at a general audience, this task aims at retrieving, from a large scientific bibliographic database with abstracts, all passages that would be relevant to illustrate this article. Extracted passages should be suitable for insertion as plain citations in the original article. Sentence pooling and automatic metrics will be used to evaluate these results; the relevance of the source document will be evaluated as well, along with potential unresolved anaphora issues.
  • Task 2: Identifying difficult-to-understand concepts for non-experts - Content Simplification.
    The goal of this task is to decide which terms (up to 10) require explanation and contextualization to help a reader understand a complex scientific text - for example, terms that, with regard to a query, need to be contextualized with a definition, an example and/or a use case. Term pooling and automatic metrics (NDCG, ...) will be used to evaluate these results.
  • Task 3: Scientific text simplification - Language Simplification.
    The goal of this task is to provide a simplified version of text passages; a toy illustration follows this list. Participants will be provided with queries and abstracts of scientific papers. The abstracts can be split into sentences, as in the example. The simplified passages will be evaluated manually, possibly with the use of aggregated metrics.
  • https://simpletext-madics.github.io
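
As a toy illustration of Task 3, the snippet below performs dictionary-based lexical simplification, replacing difficult terms with simpler paraphrases. The substitution table is hypothetical; actual SimpleText systems are expected to use trained text-generation models, and the manual evaluation described above would still apply.

```python
# Toy dictionary-based lexical simplification: replace difficult terms
# with simpler paraphrases. The substitution table is hypothetical.
import re

SUBSTITUTIONS = {
    "myocardial infarction": "heart attack",
    "pharmacotherapy": "drug treatment",
    "utilize": "use",
}

def simplify(sentence: str) -> str:
    for term, easy in SUBSTITUTIONS.items():
        sentence = re.sub(term, easy, sentence, flags=re.IGNORECASE)
    return sentence

print(simplify("Pharmacotherapy may be utilized after a myocardial infarction."))
# -> "drug treatment may be used after a heart attack."
```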

Touché: Argument Retrieval

The main goal of Touché is to establish a collaborative platform for researchers in the area of argument retrieval and to provide tools for developing and evaluating argument retrieval approaches.
  • Task 1: Argument Retrieval for Controversial Questions.
    Given a controversial topic and a collection of argumentative documents, the task is to retrieve and rank sentences (the main claim and its most important premise in the document) that convey key points pertinent to the controversial topic.
  • Task 2: Argument Retrieval for Comparative Questions.
    Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for either of the compared objects or for both, and to detect their stances with respect to the objects they discuss.
  • Task 3: Image Retrieval for Arguments.
    Given a stance on some controversial topic and a collection of argumentative documents with images, the task is to retrieve and rank images that can be used to support or attack that stance.
  • https://touche.webis.de