Adversarial attacks against Fact Extraction and VERification
James Thorne, Andreas Vlachos
Contains guidelines for the FEVER 2.0 Shared Task
```bibtex
@misc{Thorne2019adversarial,
  title         = {Adversarial attacks against {Fact Extraction and VERification}},
  author        = {James Thorne and Andreas Vlachos},
  year          = {2019},
  eprint        = {1903.05543},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```
The FEVER2.0 Shared Task
James Thorne, Andreas Vlachos, Oana Cocarascu, Christos Christodoulopoulos and Arpit Mittal
```bibtex
@inproceedings{Thorne19FEVER2,
  author    = {Thorne, James and Vlachos, Andreas and Cocarascu, Oana and Christodoulopoulos, Christos and Mittal, Arpit},
  title     = {The {FEVER2.0} Shared Task},
  booktitle = {Proceedings of the Second Workshop on {Fact Extraction and VERification (FEVER)}},
  year      = {2019}
}
```
| Rank | System | Resilience (%) | FEVER Score (%) |
|---|---|---|---|
| 1 | Papelo* | 37.31 | 57.36 |
| 2 | UCL MR* | 35.83 | 62.52 |
| 3 | Dominiks | 35.82 | 68.46 |
| 4 | CUNLP | 32.92 | 67.08 |
| 5 | UNC* | 30.47 | 64.21 |
| 6 | Athene* | 25.35 | 61.58 |
| 7 | GPLSI | 19.63 | 58.07 |
| 8 | Baseline | 11.06 | 27.45 |
| 9 | CalcWorks | DNQ | 33.56 |
| Rank | Team | # Test Instances | # Breaks from Valid Instances | Raw Potency (%) | Correct Rate (%) | Potency (%) |
|---|---|---|---|---|---|---|
| 1 | TMLab | 79 | 402 | 78.80 | 84.81 | 66.83 |
| 2 | CUNLP | 501 | 2219 | 68.51 | 81.44 | 55.79 |
| 3 | NbAuzDrLqg | 102 | 401 | 79.66 | 64.71 | 51.54 |
| 4 | Baseline | 498 | 1976 | 60.34 | 82.33 | 49.68 |
| DNQ | Papelo | - | - | 71.20 | 91.00 | 64.79 |
| Rank | Team | FEVER Score Before (%) | Resilience Before (%) | FEVER Score (%) | Resilience (%) |
|---|---|---|---|---|---|
| 1 | CUNLP | 67.08 | 32.92 | 68.80 | 36.61 |
The FEVER 2.0 Shared Task will build upon work from the first shared task in a Build it, Break it, Fix it setting. The shared task will comprise three phases. In the first phase, Builders build systems for solving the first FEVER shared task dataset. The highest-scoring systems from the first shared task will be used as baselines, and we will also invite new participants to develop new systems.
In the second phase, Breakers are tasked with generating adversarial examples to fool the existing systems. We consider only novel claims (i.e. not contained in the original FEVER dataset) labelled Supports, Refutes or NotEnoughInfo. Supported or refuted claims must be accompanied by evidence from the Wikipedia dump used in the original task (claims labelled NotEnoughInfo do not require evidence). The Breakers will have access to the systems so that they can generate claims that are challenging for the Builders. Alongside the labels and evidence for each claim, Breakers will be asked to provide meta-information on the type of attack they are introducing. Breakers will be invited to submit up to a fixed number of claims as their entry to the shared task. We welcome both manual (through the use of our annotation interface) and automated methods for this phase. Half of the claims generated by the Breakers will be retained as a hold-out blind test set, and the remaining half will be released to the participants to fix their systems. The blind set will be manually evaluated by the organisers for quality assurance.
In the final phase of the shared task, the original Builders or teams of dedicated Fixers must incorporate the new data generated by the Breakers to improve the systems' classification performance.
Builders will be creating systems that can solve the original FEVER task. Participants in this category are also encouraged to participate as Fixers for their own systems.
The FEVER dataset can be found on our Dataset page. The page contains examples of the data structures for each of the three labels. Existing implementations for the FEVER 1.0 task can be found on the FEVER 1.0 Task page.
Participants must submit their predictions to the new FEVERlab leaderboard for scoring. We also invite participants to make their systems available to the Breakers by creating a docker image (sample docker image) and submitting it to the FEVERlab page. The Shared Task organisers will host the docker images and keep them private by mediating access through the Shared Task server. Throughout the shared task, Builders should be able to provide support to Breakers or Fixers that use their system through the FEVER Slack channel.
A baseline performance of Builder systems will be measured using predictions against the FEVER test set; these results will be displayed on the new FEVERlab leaderboard alongside the Codalab entries for the original FEVER 1.0 task. After the Builders have submitted docker images and the Breakers have submitted adversarial instances, we will measure the Builders' resilience to adversarial examples. The results will be presented in a new leaderboard.
The data format submitted by the Breakers (see below) will be the same as the FEVER 1.0 task.
Breakers will be generating adversarial claims in an attempt to break as many Builders' systems as possible. The adversarial claims can be generated manually or automatically, and participants are free to choose specific systems to target. All three types of claims are allowed (Supported, Refuted or NotEnoughInfo), but Supported and Refuted claims have to be accompanied by at least one evidence sentence (from the FEVER 1.0 pre-processed Wikipedia dump).
At the launch of the challenge, we will release additional annotation artefacts to support adversarial attacks. These will include the mutations that were used to generate the FEVER claims.
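For illustration, here is a minimal Python sketch of one such mutation, entity replacement (the helper function and the source claim are hypothetical, not part of the released artefacts):

```python
# Hypothetical sketch of the "entity replacement" attack type: swap a
# named entity in a verified claim for another of the same type, so the
# mutated claim now contradicts the original evidence.

def entity_replacement(claim: str, entity: str, replacement: str) -> dict:
    """Replace `entity` in `claim` and record the attack metadata."""
    if entity not in claim:
        raise ValueError(f"entity {entity!r} not found in claim")
    return {
        "claim": claim.replace(entity, replacement),
        "label": "REFUTES",  # the mutated claim contradicts the evidence
        "attack": "Entity replacement",
    }

mutated = entity_replacement(
    "Lorelai Gilmore's father is named Richard.", "Richard", "Robert"
)
print(mutated["claim"])  # Lorelai Gilmore's father is named Robert.
```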
Each adversarial claim submitted has to match the format of the FEVER 1.0 claims, with the addition of an attack field containing meta-information on how the attack was generated. We will provide a list of expected values at the challenge launch, e.g.:
```json
{
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "attack": "Entity replacement",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, "Lorelai_Gilmore", 3]
        ]
    ]
}
```
See the definition of the FEVER 1.0 task for more details.
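As a hedged sketch, a submission line could be format-checked before upload along these lines (the `validate_instance` helper is illustrative, not an official validator, and the evidence ids in the usage example are placeholder integers):

```python
# Illustrative format check for one Breaker submission instance. Field
# names follow the example above; the evidence layout mirrors FEVER 1.0:
# [annotation id, evidence id, Wikipedia page, sentence number].
import json

REQUIRED_FIELDS = {"id", "label", "claim", "attack", "evidence"}
VALID_LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}

def validate_instance(line: str) -> dict:
    instance = json.loads(line)
    missing = REQUIRED_FIELDS - instance.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if instance["label"] not in VALID_LABELS:
        raise ValueError(f"unknown label: {instance['label']}")
    # Supported/refuted claims need at least one evidence sentence
    if instance["label"] != "NOT ENOUGH INFO" and not any(instance["evidence"]):
        raise ValueError("SUPPORTS/REFUTES claims must carry evidence")
    return instance

ok = validate_instance(json.dumps({
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "attack": "Entity replacement",
    "evidence": [[[101, 202, "Lorelai_Gilmore", 3]]],  # placeholder ids
}))
```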
In order to register as a Breaker for the FEVER 2.0 task, each participant will have to submit a sample of 50 examples, which will be manually evaluated by the organisers of the task, by 30th April 2019. For the final submission, participants will have to submit a balanced dataset of up to 1000 examples; 50% will be given to Builders as development data, and the other 50% will be manually evaluated for accuracy of claim labels and evidence and used as the final test set.
Breakers will be scored on the potency of the adversarial instances that they submit. This is an inverted FEVER Score based on the number of systems that incorrectly classify claims that meet the data guidelines. For a formal definition, read Section 3 of this paper: https://arxiv.org/abs/1903.05543.
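The adjusted potency reported on the Breakers' leaderboard is the raw potency discounted by the correct rate. A small sketch of that discounting (illustrative only; see Section 3 of the paper for the formal definition):

```python
# Sketch of the Breaker potency computation: raw potency (an inverted
# FEVER score, i.e. the rate at which submitted claims break systems)
# is discounted by the fraction of claims judged correctly formed.

def potency(raw_potency: float, correct_rate: float) -> float:
    """Adjusted potency as a percentage, both inputs in percent."""
    return raw_potency * correct_rate / 100.0

# TMLab's leaderboard row: 78.80% raw potency, 84.81% correct rate
print(round(potency(78.80, 84.81), 2))  # 66.83
```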
Fixers will be working on correcting errors specific to types of (or individual) adversarial attacks. This round is open to everyone, regardless of participation in previous rounds. Builders are invited to submit improved systems based on the Breaker data; alternatively, Fixers can collaborate with one or more Builders, using one of the published systems, and submit improved solutions as a new team.
The following FEVER 2.0 systems are open to Fixers; each Builder has agreed to collaborate or to release code.
The development dataset based on the Breakers' submissions is now available from the FEVER 2.0 Dataset page. All submissions (except the rule-based baseline) have been manually annotated for correctness.
Systems will be provided a set of unlabeled claims and will be scored on their ability to correctly identify evidence and label the claim. The data format provided to the fixers will be the same as the FEVER 1.0 task: i.e. the attack metadata will not be provided to the systems at test time.
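One plausible reading of this per-instance criterion, following the FEVER score from the 1.0 task (the function name and the pair-based evidence representation here are illustrative):

```python
# Sketch of the per-instance scoring criterion behind the FEVER score:
# the predicted label must match the gold label, and for SUPPORTS/REFUTES
# claims the predicted evidence must contain at least one complete gold
# evidence set. Evidence sentences are (page, sentence number) pairs.

def instance_is_correct(pred_label, pred_evidence, gold_label, gold_evidence_sets):
    if pred_label != gold_label:
        return False
    if gold_label == "NOT ENOUGH INFO":
        return True  # no evidence required for NEI claims
    predicted = set(map(tuple, pred_evidence))
    # any one gold set fully covered by the predictions suffices
    return any(set(map(tuple, gold)) <= predicted for gold in gold_evidence_sets)

correct = instance_is_correct(
    "REFUTES",
    [("Lorelai_Gilmore", 3)],
    "REFUTES",
    [[("Lorelai_Gilmore", 3)]],
)
print(correct)  # True
```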
Fixers will be invited to submit a docker image following the same guidelines as the Builders. Both the predict.sh batch mode and the web API for single-instance prediction must be implemented. For more information about the web API, see the existing systems, a sample submission, or the GitHub page.
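A purely illustrative stdlib sketch of what a single-instance prediction endpoint might look like; the route, payload shape, and placeholder model here are assumptions, and the sample submission on GitHub defines the actual interface expected by the Shared Task server:

```python
# Illustrative single-instance prediction endpoint using only the
# standard library. The /predict route and the request/response fields
# are assumptions, not the official API specification.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(claim: str) -> dict:
    # Placeholder model: a real system would retrieve evidence sentences
    # from the Wikipedia dump and classify the claim against them.
    return {"predicted_label": "NOT ENOUGH INFO", "predicted_evidence": []}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        instance = json.loads(self.rfile.read(length))
        body = json.dumps(predict(instance["claim"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 5000) -> None:
    """Run the toy endpoint (blocks until interrupted)."""
    HTTPServer(("", port), PredictHandler).serve_forever()
```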
The submission section in the FEVERlab page will be opened on the 20th of June.
Participants will be scored on their improvement on the final test set of Breakers' adversarial examples, as well as their score on the FEVER 1.0 test set. The leaderboard will display all scores.