AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web
Michael Schlichtkrull, Zhijiang Guo, Andreas Vlachos
Contains guidelines for the AVeriTeC Shared Task
@inproceedings{schlichtkrull2023averitec,
  title={AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web},
  author={Michael Sejr Schlichtkrull and Zhijiang Guo and Andreas Vlachos},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023},
  url={https://openreview.net/forum?id=fKzSz0oyaI}
}
The AVeriTeC challenge aims to evaluate the ability of systems to verify real-world claims with evidence from the Web.
To learn more about the task and our baseline implementation, read our paper AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web.
Participants are given access to the training and development splits of the AVeriTeC dataset, available here. In addition, we provide the document collection for each claim, compiled by searching the Web using the Google API, here. This collection is guaranteed to contain the gold evidence, so participants do not need to query a search engine to develop their approaches (though doing so is allowed). The test dataset, including both claims and document collections (but not the correct responses), will be released at the end of June. The datasets are distributed in JSONL format with one example per line (see http://jsonlines.org for more details). The data can be downloaded from the AVeriTeC Dataset page.
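Since each split is a JSONL file with one JSON object per line, loading it in Python takes only a few lines. A minimal sketch (the filename below is a placeholder; use the actual path of the downloaded split):

```python
import json

def load_jsonl(path):
    """Load an AVeriTeC split stored as JSONL: one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Placeholder path for illustration; substitute the file you downloaded.
# claims = load_jsonl("dev.json")
```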
System predictions should be submitted to our EvalAI challenge page. Before the release of the test data, you can submit predictions on the development split to familiarize yourself with the submission system. When submitting system predictions, you need to specify the system name and, if available, a link to the code. We will use the team name you specified on EvalAI when we compile the final results of the challenge. You can find more details on the submission page itself.
NB: Participants are allowed a limited number of submissions per system (one per day). Multiple submissions are permitted, but only the final one will be scored.
You may submit a system description paper describing the system's method, how it was trained, the evaluation, and possibly an error analysis of the strengths and weaknesses of the proposed system. The system description paper must be submitted as a PDF, consisting of at most eight pages of content (four to six pages will suffice for most description papers) plus unlimited pages for the bibliography. Submissions must follow the EMNLP 2024 two-column format, using the LaTeX style files, Word templates, or the Overleaf template from the official EMNLP website. Please submit your system description papers here.
NB: System description papers are reviewed in a single-blind process. Thus, your manuscript may contain the authors' names and information that would reveal your identity (e.g., team name, score, and rank in the shared task). Also note that at least one author of the system description paper will have to register as a reviewer for the FEVER Workshop.
The implementation of the Baseline system can be found on our Huggingface repository.
For the technical details of the implementation as well as the Baseline performance, please refer to the AVeriTeC paper.
The AVeriTeC scoring is built on the FEVER scorer. The scoring script can be found on the AVeriTeC Dataset page.
For the AVeriTeC score, the following changes are made to the FEVER scorer:
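The concrete changes are implemented in the scoring script. At a high level, the AVeriTeC score only credits a correct verdict when the retrieved question-answer evidence matches the gold evidence closely enough. The sketch below illustrates that idea only: it uses difflib's SequenceMatcher as a stand-in for the METEOR-based matching in the official scorer, and the 0.25 threshold is one of the evaluation settings reported in the paper; treat both as assumptions, not as the official implementation.

```python
from difflib import SequenceMatcher

def evidence_score(pred_qas, gold_qas):
    """Average best-match similarity between gold and predicted QA pairs.

    Stand-in metric for illustration: the official scorer uses
    METEOR-based matching, not SequenceMatcher.
    """
    if not gold_qas:
        return 0.0
    total = 0.0
    for gold in gold_qas:
        g = gold["question"] + " " + gold["answer"]
        best = 0.0
        for pred in pred_qas:
            p = pred["question"] + " " + pred["answer"]
            best = max(best, SequenceMatcher(None, g, p).ratio())
        total += best
    return total / len(gold_qas)

def averitec_correct(pred, gold, threshold=0.25):
    """Count a verdict as correct only if the label matches AND the
    evidence score clears the threshold (assumed value for illustration)."""
    return (pred["label"] == gold["label"]
            and evidence_score(pred["evidence"], gold["evidence"]) >= threshold)
```

This is the key difference from plain label accuracy: a system that guesses the right verdict without retrieving supporting evidence receives no credit.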
The data are distributed in JSONL format with one example per line (see http://jsonlines.org for more details).
Each example is an object of the following form:
id: The ID of the sample.
claim: The claim text itself.
label: The annotated verdict for the claim.
evidence: A list of QA pairs. Each pair is a dictionary with two fields (question, answer):
    question: The text of the generated question.
    answer: The text of the answer to the generated question.
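Putting the fields together, a single line of the file decodes to an object like the following. The claim text and values here are made up purely for illustration; only the field structure reflects the format described above.

```python
# Illustrative (fabricated) example of the per-line object structure.
example = {
    "id": 0,
    "claim": "The Eiffel Tower is located in Berlin.",
    "label": "Refuted",
    "evidence": [
        {
            "question": "In which city is the Eiffel Tower located?",
            "answer": "Paris",
        },
    ],
}
```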