View the dataset here

Download Links

To replicate the results in the NAACL paper, use the following files. When concatenated, they form the Shared Task Development Dataset above (19,998 claims).

Citation

FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

@inproceedings{Thorne18Fever,
    author = {Thorne, James and Vlachos, Andreas and Christodoulopoulos, Christos and Mittal, Arpit},
    title = {{FEVER}: a Large-scale Dataset for Fact Extraction and {VERification}},
    booktitle = {NAACL-HLT},
    year = {2018}
}

Data Format

The data is distributed in JSONL format with one example per line (see http://jsonlines.org for more details).
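Because each line is an independent JSON document, a file can be read one record at a time without loading it whole. The following is a minimal sketch using only the Python standard library; the helper name `read_jsonl` and the in-memory sample are illustrative assumptions, not part of the dataset tooling.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical in-memory sample standing in for a downloaded file;
# with a real file, pass open(path, encoding="utf-8") instead.
sample = io.StringIO(
    '{"id": 78526, "claim": "Lorelai Gilmore\'s father is named Robert."}\n'
)
records = list(read_jsonl(sample))
```

Here `records` is a list of dictionaries, one per line of the input.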

Training/Development Data format

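The training and development data contain four fields:

id: the ID of the claim
label: the annotated label for the claim, one of SUPPORTS, REFUTES, or NOT ENOUGH INFO
claim: the text of the claim
evidence: a list of evidence sets, where each set is a list of [annotation ID, evidence ID, Wikipedia page title, sentence index] entries; for NOT ENOUGH INFO claims the page title and sentence index are null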

Below are examples of the data structures for each of the three labels.

Supports Example
{
    "id": 62037,
    "label": "SUPPORTS",
    "claim": "Oliver Reed was a film actor.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 3],
            [<annotation_id>, <evidence_id>, "Gladiator_-LRB-2000_film-RRB-", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 2],
            [<annotation_id>, <evidence_id>, "Castaway_-LRB-film-RRB-", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 1]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 6]
        ]
    ]
}
              
Refutes Example
{
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, "Lorelai_Gilmore", 3]
        ]
    ]
}
              
NotEnoughInfo Example
{
    "id": 137637,
    "label": "NOT ENOUGH INFO",
    "claim": "Henri Christophe is recognized for building a palace in Milot.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, null, null]
        ]
    ]
}
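The nested evidence field can be awkward to work with directly, so the sketch below flattens each evidence set into (page, sentence index) pairs. It is an illustration under stated assumptions: the function names are hypothetical, the integer ids in the sample records are placeholders for the `<annotation_id>` and `<evidence_id>` values, and `-LRB-` / `-RRB-` are assumed to be escaped forms of `(` and `)` in page titles.

```python
def decode_title(title):
    """Undo bracket escaping in a page title.

    Assumption: -LRB- stands for '(' and -RRB- stands for ')'.
    """
    return title.replace("-LRB-", "(").replace("-RRB-", ")")


def evidence_sets(example):
    """Return each evidence set as a list of (page, sentence_index) pairs.

    Entries with a null page are skipped, so a NOT ENOUGH INFO example
    yields an empty list.
    """
    sets = []
    for group in example["evidence"]:
        pairs = [
            (decode_title(page), sentence)
            for _annotation_id, _evidence_id, page, sentence in group
            if page is not None
        ]
        if pairs:
            sets.append(pairs)
    return sets


# Sample records; the leading 0, 0 values are placeholder ids.
supported = {
    "id": 62037,
    "label": "SUPPORTS",
    "claim": "Oliver Reed was a film actor.",
    "evidence": [
        [[0, 0, "Oliver_Reed", 0]],
        [[0, 0, "Oliver_Reed", 3], [0, 0, "Gladiator_-LRB-2000_film-RRB-", 0]],
    ],
}
not_enough_info = {
    "id": 137637,
    "label": "NOT ENOUGH INFO",
    "claim": "Henri Christophe is recognized for building a palace in Milot.",
    "evidence": [[[0, 0, None, None]]],
}

supported_sets = evidence_sets(supported)
empty_sets = evidence_sets(not_enough_info)
```

Each inner list of `supported_sets` corresponds to one evidence set from the record, so a set with two pairs draws on two sentences together.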
              

Test Data format

The test data follows the same format as the training/development examples, with the label and evidence fields removed.

{
    "id": 78526,
    "claim": "Lorelai Gilmore's father is named Robert."
}
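To produce records in this shape from labelled data, for example to build a blind evaluation file from the development set, one possible approach is to keep only the two remaining fields. The helper name `to_test_format` is hypothetical, and the ids in the sample evidence are placeholders.

```python
import json


def to_test_format(example):
    """Keep only the id and claim fields of a labelled example."""
    return {"id": example["id"], "claim": example["claim"]}


labelled = {
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "evidence": [[[0, 0, "Lorelai_Gilmore", 3]]],  # 0, 0 are placeholder ids
}

# One JSON document per line, matching the JSONL convention.
test_line = json.dumps(to_test_format(labelled))
```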