Download Links

To replicate the results in the NAACL paper, use the following files. When concatenated, they form the Shared Task Development Dataset above (19,998 claims).
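
The concatenation step can be sketched as below. This is a minimal illustration, not an official script: the function name concatenate_jsonl and its parameters are hypothetical, and the caller supplies the paths of the downloaded files. It assumes the files are in the JSONL format described under Data Format.

```python
from typing import Iterable


def concatenate_jsonl(part_paths: Iterable[str], out_path: str) -> int:
    """Concatenate JSONL files in the given order.

    Returns the number of non-empty lines (claims) written, so the
    result can be compared against the expected dataset size.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in part_paths:
            with open(path, encoding="utf-8") as part:
                for line in part:
                    if line.strip():  # skip blank lines between parts
                        # Normalise line endings so parts that lack a
                        # trailing newline do not merge two records.
                        out.write(line.rstrip("\n") + "\n")
                        written += 1
    return written
```

If the parts are the files listed here, the returned count should equal 19,998.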

Citation

FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal

@inproceedings{Thorne18Fever,
    author = {Thorne, James and Vlachos, Andreas and Christodoulopoulos, Christos and Mittal, Arpit},
    title = {{FEVER}: a Large-scale Dataset for Fact Extraction and {VERification}},
    booktitle = {NAACL-HLT},
    year = {2018}
}

Data Format

The data is distributed in JSONL format with one example per line (see http://jsonlines.org for more details).
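
Because each line is an independent JSON document, a file can be read one example at a time. A minimal sketch using only the Python standard library (the helper name read_jsonl is an assumption for illustration, not part of any FEVER tooling):

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. at the end of the file
                yield json.loads(line)
```

Reading lazily keeps memory use flat for large files; wrap the call in list(...) when the whole dataset is needed at once.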

Training/Development Data format

The training and development data contain four fields:

id: The ID of the claim
label: The annotated label for the claim. Can be one of SUPPORTS|REFUTES|NOT ENOUGH INFO.
claim: The text of the claim.
evidence: A list of evidence sets. Each evidence set is a list of [Annotation ID, Evidence ID, Wikipedia URL, sentence ID] tuples, or an [Annotation ID, Evidence ID, null, null] tuple if the label is NOT ENOUGH INFO.
The Annotation ID and Evidence ID fields are for internal use only and are not used for scoring. They may help debug or correct annotation issues at a later point in time.

Below are examples of the data structures for each of the three labels.

Supports Example

{
    "id": 62037,
    "label": "SUPPORTS",
    "claim": "Oliver Reed was a film actor.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 3],
            [<annotation_id>, <evidence_id>, "Gladiator_-LRB-2000_film-RRB-", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 2],
            [<annotation_id>, <evidence_id>, "Castaway_-LRB-film-RRB-", 0]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 1]
        ],
        [
            [<annotation_id>, <evidence_id>, "Oliver_Reed", 6]
        ]
    ]
}

Refutes Example

{
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, "Lorelai_Gilmore", 3]
        ]
    ]
}

Not Enough Info Example

{
    "id": 137637,
    "label": "NOT ENOUGH INFO",
    "claim": "Henri Christophe is recognized for building a palace in Milot.",
    "evidence": [
        [
            [<annotation_id>, <evidence_id>, null, null]
        ]
    ]
}
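
The nested evidence structure shown in the three examples can be flattened as follows. This is a sketch under stated assumptions: the function name evidence_sets is hypothetical, and the two ID values in the sample record are made-up placeholders standing in for <annotation_id> and <evidence_id>.

```python
import json


def evidence_sets(example: dict) -> list:
    """Return each evidence set as a list of (page, sentence_id) pairs.

    The Annotation ID and Evidence ID are dropped because they are not
    used for scoring. Entries whose page is null (the NOT ENOUGH INFO
    case) are skipped, so such examples yield an empty list.
    """
    sets = []
    for group in example.get("evidence", []):
        pairs = [
            (page, sentence_id)
            for _annotation_id, _evidence_id, page, sentence_id in group
            if page is not None
        ]
        if pairs:
            sets.append(pairs)
    return sets


# Record shaped like the Refutes example; the IDs (0, 0) are placeholders.
line = (
    '{"id": 78526, "label": "REFUTES", '
    '"claim": "Lorelai Gilmore\'s father is named Robert.", '
    '"evidence": [[[0, 0, "Lorelai_Gilmore", 3]]]}'
)
example = json.loads(line)
print(evidence_sets(example))  # [[('Lorelai_Gilmore', 3)]]
```

Each inner list is kept separate because the sentences within one evidence set are meant to be read together, as in the second and third sets of the Supports example.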

Test Data format

The test data follows the same format as the training/development examples, with the label and evidence fields removed.

{
    "id": 78526,
    "claim": "Lorelai Gilmore's father is named Robert."
}
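
To try a system on labelled data under test conditions, a training/development example can be reduced to this format. A minimal sketch; the function name to_test_format is hypothetical, and the ID values in the sample record are placeholders:

```python
def to_test_format(example: dict) -> dict:
    """Return a copy of a labelled example without 'label' and 'evidence'."""
    return {
        key: value
        for key, value in example.items()
        if key not in ("label", "evidence")
    }


# Labelled record shaped like the Refutes example; the IDs (0, 0) are placeholders.
labelled = {
    "id": 78526,
    "label": "REFUTES",
    "claim": "Lorelai Gilmore's father is named Robert.",
    "evidence": [[[0, 0, "Lorelai_Gilmore", 3]]],
}
print(to_test_format(labelled))
```

The input dictionary is left unchanged, so the original labels remain available for scoring afterwards.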