Labs


CLEF promotes the systematic evaluation of information access systems, primarily through experimentation on shared tasks.

CLEF 2022 consists of a set of 14 Labs designed to test different aspects of multilingual and multimedia IR systems:


  1. ARQMath: Answer Retrieval for Questions on Math
  2. BioASQ: Large-scale biomedical semantic indexing and question answering
  3. CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection
  4. ChEMU: Cheminformatics Elsevier Melbourne University lab
  5. eRisk: Early risk prediction on the Internet
  6. HIPE: Named Entity Recognition and Linking in Multilingual Historical Documents
  7. iDPP: Intelligent Disease Progression Prediction
  8. ImageCLEF: Multimedia Retrieval Challenge
  9. JokeR: Automatic Wordplay and Humour Translation
  10. LeQua: Learning to Quantify
  11. LifeCLEF: Biodiversity identification and prediction Challenges
  12. PAN Lab on Digital Text Forensics and Stylometry
  13. SimpleText: Automatic Simplification of Scientific Texts
  14. Touché: Argument Retrieval

Labs Publications:

  • Lab overviews will be published in LNCS Proceedings
  • Labs working notes and extended overviews will be published in CEUR-WS Proceedings
  • Best of CLEF 2021 lab papers will be nominated for submission to the CLEF 2022 LNCS proceedings

ARQMath: Answer Retrieval for Questions on Math

ARQMath aims to advance math-aware search and the semantic analysis of mathematical notation and texts.
  • Task 1: Answer Retrieval.
    Given a math question post, return relevant answer posts.
  • Task 2: Formula Retrieval.
    Given a formula in a math question post, return relevant formulas from both question and answer posts (a toy similarity sketch follows this list).
  • Pilot Task 1: Open Domain Question Answering.
    Given a math question post, return an automatically generated answer composed of excerpts from arbitrary sources and/or machine-generated text. (approval pending)
  • https://www.cs.rit.edu/~dprl/ARQMath
  • @ARQMath1
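
As a purely illustrative sketch of the Task 2 setting, the snippet below ranks candidate LaTeX formulas by Jaccard overlap of their tokens with a query formula. The tokenizer and scoring are toy assumptions, not an ARQMath baseline; real systems use far richer representations such as operator trees.

    # Toy formula retrieval: rank candidates by LaTeX-token overlap.
    import re

    def latex_tokens(formula):
        # split into LaTeX commands, identifiers, digits, and symbols
        return set(re.findall(r"\\[a-zA-Z]+|[a-zA-Z]|\d+|\S", formula))

    def rank_formulas(query, candidates):
        q = latex_tokens(query)
        scored = []
        for cand in candidates:
            t = latex_tokens(cand)
            scored.append((cand, len(q & t) / max(len(q | t), 1)))
        return sorted(scored, key=lambda x: x[1], reverse=True)

    print(rank_formulas(r"\int_0^1 x^2 dx",
                        [r"\int_a^b f(x) dx", r"\sum_n n^2"]))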

BioASQ: Large-scale biomedical semantic indexing and question answering

The aim of the BioASQ Lab is to push the research frontier towards systems that use the diverse and voluminous information available online to respond directly to the information needs of biomedical scientists.
  • Task 1: Large-Scale Online Biomedical Semantic Indexing.
    Classify new PubMed documents, before PubMed curators annotate (in effect, classify) them manually into classes from the MeSH hierarchy.
  • Task 2: Biomedical Semantic Question Answering.
    It uses benchmark datasets of biomedical questions, in English, along with gold standard (reference) answers constructed by a team of biomedical experts. The participants have to respond with relevant articles and snippets from designated resources, as well as exact and "ideal" answers.
  • Task 3 - DisTEMIST: Disease Text Mining and Indexing Shared Task.
    It focuses on the recognition and indexing of diseases in medical documents in Spanish, by posing subtasks on (1) indexing medical documents with controlled terminologies; (2) automatic detection of textual evidence (i.e. disease entity mentions in text); and (3) normalization of these disease mentions to terminologies.
  • Task 4 - Task Synergy: Question Answering for developing problems.
    Biomedical experts pose unanswered questions for the developing problem of COVID-19, receive the responses provided by the participating systems, and provide feedback, together with updated questions in an iterative procedure that aims to facilitate the incremental understanding of COVID-19.
  • http://www.bioasq.org/workshop2022
  • @BioASQ

CheckThat! Lab on Fighting the COVID-19 Infodemic and Fake News Detection

The CheckThat! lab aims at fighting misinformation and disinformation in social media, in political debates, and in the news, with a focus on three tasks (in seven languages: Arabic, Bulgarian, Dutch, English, German, Spanish, and Turkish).
  • Task 1: Fighting the COVID-19 Infodemic.
    It focuses on disinformation related to the ongoing COVID-19 infodemic and asks participants to identify which posts in a Twitter stream are worth fact-checking, contain a verifiable factual claim, are harmful to society, and why. This task is offered in Arabic, Bulgarian, Dutch, English, Spanish, and Turkish.
  • Task 2: Detecting Previously Fact-Checked Claims.
    Given a check-worthy claim and a collection of previously fact-checked claims, determine whether the claim has already been fact-checked (a minimal matching sketch follows this list). The input text can be a tweet or a sentence from a political debate. The task is offered in Arabic and English.
  • Task 3: Fake news detection.
    Given the text and the title of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., articles in dispute and unproven articles). This task is offered in English and German.
  • https://sites.google.com/view/clef2022-checkthat
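
As a minimal sketch of the Task 2 setting, the snippet below matches an input claim against previously fact-checked claims using bag-of-words cosine similarity; this is a toy stand-in for the multilingual sentence encoders participating systems typically use.

    # Toy claim matching: cosine similarity over word-count vectors.
    import math
    from collections import Counter

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def match_claim(claim, verified_claims):
        q = Counter(claim.lower().split())
        ranked = [(v, cosine(q, Counter(v.lower().split())))
                  for v in verified_claims]
        return sorted(ranked, key=lambda x: x[1], reverse=True)

    verified = ["5g towers spread the coronavirus",
                "masks reduce transmission of respiratory viruses"]
    print(match_claim("do 5g towers spread covid-19", verified))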

ChEMU: Cheminformatics Elsevier Melbourne University lab

The ChEMU lab series provides a unique opportunity for the development of information extraction tools over chemical patents. ChEMU 2022 offers five tasks, ranging from the document level to the expression level.
  • Task 1a: Named entity recognition.
    This task aims to identify chemical compounds, their specific types, temperatures, reaction times, yields, and the label of the reaction.
  • Task 1b: Event extraction.
    A chemical reaction leading to an end product often consists of a sequence of individual event steps. The task is to identify those steps which involve chemical entities recognized from Task 1a.
  • Task 1c: Anaphora resolution.
    This task requires the resolution of anaphoric dependencies between expressions in chemical patents. Participants must find five types of anaphoric relationships: coreference, reaction-associated, work-up, contained, and transform.
  • Task 2a: Chemical reaction reference resolution.
    Given a reaction description, this task requires identifying references to other reactions that the reaction relates to, and to the general conditions that it depends on.
  • Task 2b: Table semantic classification.
    This task is about classifying tables in chemical patents into 8 categories based on their contents.
  • http://chemu2022.eng.unimelb.edu.au/
  • @karinv

eRisk: Early risk prediction on the Internet

eRisk explores the evaluation methodology, effectiveness metrics, and practical applications of early risk detection on the Internet. Early detection technologies can be employed in many different areas, particularly those related to health and safety: for instance, early alerts could be sent when a predator starts interacting with a child for sexual purposes, or when a potential offender starts publishing antisocial threats on a blog, forum, or social network. Our main goal is to pioneer a new interdisciplinary research area that is potentially applicable to a wide variety of situations and personal profiles, including potential paedophiles, stalkers, individuals who could fall into the hands of criminal organisations, people with suicidal inclinations, and people susceptible to depression.
  • Task 1: Early Detection of Signs of Pathological Gambling.
    The challenge consists of sequentially processing pieces of evidence and detecting early traces of pathological gambling (also known as compulsive gambling or disordered gambling) as soon as possible. The task mainly concerns the evaluation of text mining solutions and thus concentrates on texts written in social media (a minimal early-decision loop is sketched after this list).
  • Task 2: Early Detection of Depression.
    The challenge consists of sequentially processing pieces of evidence and detecting early traces of depression as soon as possible. The task mainly concerns the evaluation of text mining solutions and thus concentrates on texts written in social media.
  • Task 3: Measuring the Severity of the Signs of Eating Disorders.
    The task consists of estimating the level of features associated with a diagnosis of eating disorders from a thread of user submissions. Participants will be given each user's history of postings and will have to fill in a standard eating disorder questionnaire based on the evidence found in that history.
  • https://erisk.irlab.org
  • @earlyrisk
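
The sketch below illustrates the early-decision loop shared by Tasks 1 and 2: a per-post risk scorer (here a toy keyword counter, standing in for a real model) processes a user's postings in order, and the system issues an alert as soon as accumulated evidence crosses a threshold, recording how many posts it needed to read.

    # Toy early risk detection: decide as soon as evidence accumulates.
    import re

    RISK_WORDS = {"hopeless", "bet", "losses", "debt"}  # illustrative lexicon

    def post_score(post):
        words = re.findall(r"[a-z']+", post.lower())
        return sum(w in RISK_WORDS for w in words) / max(len(words), 1)

    def early_decision(posts, threshold=0.15, decay=0.8):
        evidence = 0.0
        for i, post in enumerate(posts, start=1):
            evidence = decay * evidence + post_score(post)
            if evidence >= threshold:
                return "alert", i  # decided after reading only i posts
        return "no alert", len(posts)

    posts = ["nice weather today",
             "placed another bet, big losses again"]
    print(early_decision(posts))  # -> ('alert', 2)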

HIPE: Named Entity Recognition and Linking in Multilingual Historical Documents

HIPE ('Identifying Historical People, Places and other Entities') focuses on named entity recognition and linking in historical documents, with the objective of assessing and advancing the development of robust, adaptable, and transferable named entity processing systems. Compared to the first HIPE edition in 2020, HIPE 2022 will confront systems with the challenges of dealing with more languages, learning domain-specific entities, and adapting to diverse annotation schemas.
  • Task 1: Named Entity Recognition and Classification (NERC).
    This task has two subtasks: NERC-coarse, on high-level entity types, for all languages; and NERC-fine, on finer-grained entity types, for English, French, and German only.
  • Task 2: Named Entity Linking (EL).
    The linking of named entity mentions to a unique referent in a knowledge base (Wikidata), or to a NIL node if the mention does not have a referent in the KB (a toy linking sketch follows this list).
  • https://hipe-eval.github.io/HIPE-2022/
  • @impressoproject
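
The toy snippet below illustrates the Task 2 input/output contract: map a mention string to a Wikidata QID via a hypothetical alias table, falling back to NIL when no candidate is found. Real systems replace the lookup with learned candidate generation and ranking.

    # Toy entity linking with a NIL fallback (alias table is hypothetical).
    KB = {
        "paris": "Q90",
        "napoleon bonaparte": "Q517",
        "napoleon": "Q517",
    }

    def link(mention):
        key = mention.lower().strip()
        if key in KB:
            return KB[key]
        # crude fallback: partial alias match
        for alias, qid in KB.items():
            if key in alias or alias in key:
                return qid
        return "NIL"  # mention has no referent in the KB

    print(link("Bonaparte"), link("Zurich"))  # -> Q517 NIL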

iDPP: Intelligent Disease Progression Prediction

Amyotrophic Lateral Sclerosis (ALS) is a severe chronic disease characterized by progressive or alternating impairment of neurological functions, with high heterogeneity in both symptoms and disease progression. The goal of iDPP is to design and develop an evaluation infrastructure for AI algorithms able to: (1) better describe disease mechanisms; (2) stratify patients according to their phenotype, assessed throughout the disease evolution; and (3) predict disease progression in a probabilistic, time-dependent fashion.
  • Task 1: Ranking Risk of Impairment.
    This task will focus on ranking patients based on the risk of impairment in specific domains. We will use the ALSFRS-R scale to monitor speech, swallowing, handwriting, dressing/hygiene, walking, and respiratory ability over time, and will ask participants to rank patients by time-to-event risk of experiencing impairment in each specific domain (a sketch of a standard ranking metric follows this list).
  • Task 2: Predicting Time of Impairment.
    This task refines Task 1 by asking participants to predict when specific impairments will occur (i.e. within the correct time window). We will assess model calibration in terms of the ability of the proposed algorithms to estimate an event probability close to the true probability within a specified time window.
  • Task 3: Explainability of AI algorithms [Position Papers].
    This task calls for position papers to start a discussion on AI explainability, including proposals on how single-patient data can be visualized in a multivariate fashion that contextualizes its dynamic nature, and how model predictions can be presented together with information on the predictive variables that most influence them. We will evaluate proposals for different visualization frameworks able to show the multivariate nature of the data and the model predictions in an explainable, possibly interactive, way.
  • https://brainteaser.health/open-evaluation-challenges/idpp-2022/
  • @brainteaser2020
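
Rankings like those requested in Task 1 are commonly scored with Harrell's concordance index: the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed event times. The sketch below is a generic illustration of that metric, not necessarily the lab's official evaluation.

    # Concordance index for time-to-event risk rankings (generic sketch).
    def concordance_index(times, events, risks):
        # times: observed times; events: 1 = event occurred, 0 = censored;
        # risks: predicted scores, higher = earlier event expected
        concordant, comparable = 0.0, 0
        for i in range(len(times)):
            for j in range(len(times)):
                # pair (i, j) is comparable if i had the event before time j
                if events[i] == 1 and times[i] < times[j]:
                    comparable += 1
                    if risks[i] > risks[j]:
                        concordant += 1
                    elif risks[i] == risks[j]:
                        concordant += 0.5  # ties get half credit
        return concordant / comparable if comparable else float("nan")

    print(concordance_index(times=[5, 10, 12],
                            events=[1, 1, 0],
                            risks=[0.9, 0.4, 0.2]))  # -> 1.0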

ImageCLEF: Multimedia Retrieval Challenge

ImageCLEF promotes the evaluation of technologies for annotation, indexing, classification, and retrieval of multi-modal data, with the objective of providing information access to large collections of images in various usage scenarios and domains. ImageCLEF 2022 focuses on medical, nature, Internet, and system fusion applications.
  • Task 1: ImageCLEFmedical
    The caption task focuses on interpreting and summarizing the insights gained from radiology images, i.e. developing systems able to predict UMLS concepts from visual image content and models that predict captions for given radiology images. The tuberculosis task fosters systems that detect and localize cavern regions rather than simply providing a label for the CT images.
  • Task 2: ImageCLEFcoral
    It fosters tools for creating 3-dimensional models of underwater coral environments. It requires participants to label underwater coral images with types of benthic substrate together with their bounding boxes, and to segment and parse each coral image into image regions associated with benthic substrate types.
  • Task 3: ImageCLEFaware
    The online disclosure of personal data often has effects which go beyond the initial context in which data were shared. Participants are required to provide automatic rankings of photographic user profiles in a series of real-life situations such as searching for a bank loan, an accommodation, a waiter job or a job in IT. The ranking will be based on an automatic analysis of profile images and the aggregation of individual results.
  • Task 4: ImageCLEFfusion
    System fusion exploits the complementary nature of individual systems to boost performance. Participants will be tasked with creating novel ensembling methods able to significantly increase the performance of precomputed inducers in various use-case scenarios, such as visual interestingness and video memorability prediction (a minimal fusion sketch follows this list).
  • https://www.imageclef.org/2022
  • @imageclef
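
The snippet below shows the simplest possible late-fusion scheme for this setting: a weighted average of the scores produced by several precomputed inducers. The inducer scores are hypothetical, and the task expects participants to develop considerably stronger ensembling methods.

    # Weighted-average late fusion of precomputed inducer scores.
    def late_fusion(inducer_scores, weights=None):
        # inducer_scores[k][i] = score of inducer k for sample i
        k = len(inducer_scores)
        weights = weights or [1.0 / k] * k
        n = len(inducer_scores[0])
        return [sum(w * s[i] for w, s in zip(weights, inducer_scores))
                for i in range(n)]

    # three hypothetical inducers scoring four videos for memorability
    scores = [[0.7, 0.2, 0.9, 0.4],
              [0.6, 0.3, 0.8, 0.5],
              [0.9, 0.1, 0.7, 0.4]]
    print(late_fusion(scores))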

JokeR: Automatic Wordplay and Humour Translation Workshop

The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for creative language, including data and metric development, and to foster work on automatic methods for wordplay translation.
  • Pilot task 1: Classify and interpret wordplay.
    Classify single words containing wordplay according to a given typology, and provide lexical-semantic interpretations.
  • Pilot task 2: Translate single term wordplay.
    Translate single words containing wordplay.
  • Pilot task 3: Translate phrase wordplay.
    Translate entire phrases that subsume or contain wordplay.
  • Task 4: Unshared Task.
    We welcome submissions that use our data in other ways!
  • http://joker-project.com/
  • @joker_research

LeQua: Learning to Quantify

The aim of LeQua 2022 (the 1st edition of the lab) is to allow the comparative evaluation of methods for “learning to quantify” in textual datasets; i.e. methods for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. These predictors (called “quantifiers”) will be required to issue predictions for several such sets, some of them characterized by class frequencies radically different from those of the training set. A minimal quantification baseline is sketched after the task list below.
  • Task 1:
    Participants will be provided with documents already converted into vector form; the task is thus suitable for participants who do not wish to engage in generating representations for the textual documents, but want instead to concentrate on optimizing the methods for learning to quantify.
  • Task 2:
    Participants will be provided with the raw text of the documents; the task is thus suitable for participants who also wish to engage in generating suitable representations for the textual documents, or to train end-to-end systems.
  • https://lequa2022.github.io
  • @LeQua2022
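
The sketch below shows the simplest quantification baseline, "classify and count": run a classifier over the unlabelled set and report the relative frequency of each predicted class. The lab exists largely because this baseline is biased when test prevalences drift away from the training distribution; the adjusted variant (binary case) corrects it using the classifier's true/false positive rates, here assumed to be estimated on held-out data.

    # Classify-and-count quantification, plus the adjusted (ACC) variant.
    from collections import Counter

    def classify_and_count(predicted_labels):
        counts = Counter(predicted_labels)
        total = len(predicted_labels)
        return {label: c / total for label, c in counts.items()}

    def adjusted_cc(prevalence_cc, tpr, fpr):
        # binary correction using the classifier's estimated tpr/fpr
        if tpr == fpr:
            return prevalence_cc
        return min(1.0, max(0.0, (prevalence_cc - fpr) / (tpr - fpr)))

    preds = ["pos", "neg", "neg", "pos", "neg"]
    print(classify_and_count(preds))           # {'pos': 0.4, 'neg': 0.6}
    print(adjusted_cc(0.4, tpr=0.8, fpr=0.1))  # ~0.429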

LifeCLEF: Biodiversity identification and prediction

The LifeCLEF lab aims to stimulate research in data science and machine learning for biodiversity monitoring.
  • Task 1: BirdCLEF
    Bird species recognition in audio soundscapes.
  • Task 2: PlantCLEF
    Image-based plant identification at global scale (300K classes).
  • Task 3: GeoLifeCLEF
    Location-based prediction of species based on environmental and occurrence data.
  • Task 4: SnakeCLEF
    Snake species identification in medically important scenarios.
  • Task 5: FungiCLEF
    Fungi recognition from images and metadata.
  • https://www.imageclef.org/LifeCLEF2022
  • @LifeCLEF

PAN Lab on Digital Text Forensics and Stylometry

PAN is a series of scientific events and shared tasks on digital text forensics and stylometry, studying how to quantify writing style and improve authorship technology.
  • Task 1: Authorship Verification.
    Given two texts, determine whether they are written by the same author (a toy verification baseline follows this list).
  • Task 2 - IROSTEREO: Profiling Irony and Stereotype Spreaders on Twitter.
    Given a Twitter feed, determine whether its author spreads irony and stereotypes.
  • Task 3: Style Change Detection.
    Given a document, determine the number of authors and at which positions the author changes.
  • Task 4: Trigger Warning Prediction.
    Given a document, determine whether its content warrants a warning of potential negative emotional responses in readers.
  • https://pan.webis.de
  • @webis_de
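
The toy baseline below illustrates the shape of Task 1: compare the character 4-gram profiles of the two texts with cosine similarity and threshold the result. The features and threshold are arbitrary illustrative choices, not a PAN baseline.

    # Toy authorship verification via character n-gram cosine similarity.
    import math
    from collections import Counter

    def char_ngrams(text, n=4):
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    def same_author(text_a, text_b, threshold=0.3):
        a, b = char_ngrams(text_a), char_ngrams(text_b)
        dot = sum(a[g] * b[g] for g in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return (dot / norm if norm else 0.0) >= threshold

    print(same_author("It was the best of times, it was the worst of times.",
                      "It was the age of wisdom, it was the age of foolishness."))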

SimpleText: Automatic Simplification of Scientific Texts

The 2022 SimpleText track addresses the challenges of text simplification approaches in the context of promoting scientific information access, by providing appropriate data and benchmarks, and creating a community of NLP and IR researchers working together to resolve one of the greatest challenges of today.
  • Task 1: What is in (or out)?
    Select passages to include in a simplified summary, given a query.
  • Task 2: What is unclear?
    Given a passage and a query, rank the terms/concepts that must be explained for the passage to be understood (definitions, context, applications, etc.); a toy difficulty ranking is sketched after this list.
  • Task 3: Rewrite this!
    Given a query, simplify passages from scientific abstracts.
  • Task 4: Unshared task.
    We welcome any submission that uses our data!
  • http://simpletext-project.com
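
A toy illustration of the Task 2 idea: terms that are rare in a background corpus are more likely to need explanation, so candidate terms can be ranked by their background frequency. The miniature "corpus" below is a stand-in for real background statistics.

    # Rank terms for explanation: rarer in the background = higher priority.
    from collections import Counter

    background = ("the model uses a neural network the network learns "
                  "representations the model is trained on data").split()
    background_freq = Counter(background)

    def rank_terms_for_explanation(terms):
        return sorted(terms, key=lambda t: background_freq[t.lower()])

    print(rank_terms_for_explanation(["model", "backpropagation", "data"]))
    # -> ['backpropagation', 'data', 'model']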

Touché: Argument Retrieval

Decision-making processes, be it at the societal or at the personal level, often come to a point where one side challenges the other with a why-question, which is a prompt to justify some stance based on arguments. Since technologies for argument mining are maturing at a rapid pace, ad hoc argument retrieval is also becoming a feasible task.
  • Task 1: Argument Retrieval for Controversial Questions.
    Given a controversial topic and a collection of argumentative documents, the task is to retrieve and rank sentences (the main claim and its most important premise in the document) that convey key points pertinent to the controversial topic (a minimal scoring sketch follows this list).
  • Task 2: Argument Retrieval for Comparative Questions.
    Given a comparative topic and a collection of documents, the task is to retrieve relevant argumentative passages for one or both of the compared objects, and to detect their respective stances with respect to the objects they discuss.
  • Task 3: Image Retrieval for Arguments.
    Given a controversial topic, the task is to retrieve images (from web pages) for each stance (pro/con) that show support for that stance.
  • https://touche.webis.de
  • @webis_de
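
As a minimal sketch of Task 1-style sentence scoring, the snippet below ranks candidate sentences against a controversial topic by IDF-weighted term overlap. The tiny IDF table is a hypothetical stand-in for corpus statistics; real Touché systems combine retrieval models with argument-quality features.

    # Toy argument sentence ranking by weighted term overlap with the topic.
    def score(topic, sentence, idf):
        topic_terms = set(topic.lower().split())
        sent_terms = set(sentence.lower().split())
        return sum(idf.get(t, 1.0) for t in topic_terms & sent_terms)

    idf = {"uniforms": 3.0, "school": 1.2}  # hypothetical IDF weights
    topic = "should students wear school uniforms"
    sentences = [
        "school uniforms reduce bullying because visible differences shrink",
        "the weather was pleasant on the first day of school",
    ]
    for s in sorted(sentences, key=lambda s: -score(topic, s, idf)):
        print(round(score(topic, s, idf), 2), s)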