Citations

FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information
Rami Aly, Zhijiang Guo, Michael Sejr Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, Arpit Mittal

@inproceedings{Aly21Feverous,
    author = {Aly, Rami and Guo, Zhijiang and Schlichtkrull, Michael Sejr and Thorne, James and Vlachos, Andreas and Christodoulopoulos, Christos and Cocarascu, Oana and Mittal, Arpit},
    title = {{FEVEROUS}: Fact Extraction and {VERification} Over Unstructured and Structured information},
    eprint={2106.05707},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    year = {2021}
}

You can also cite the dataset directly using its DOI: https://doi.org/10.5281/zenodo.4911508

Data Format

The data (annotations and Wikipedia pages) are distributed in JSONL format, with one example per line (see https://jsonlines.org for more details).
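
Because each line is an independent JSON object, a file can be streamed without loading it whole. The following is a minimal sketch using only the Python standard library; the function names and the file path passed to `load_examples` are illustrative assumptions, not part of the dataset release.

```python
import json
from typing import Iterable, Iterator, List


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def load_examples(path: str) -> List[dict]:
    """Read a whole JSONL file into memory (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_jsonl(handle))
```

For large files, iterating over `iter_jsonl(handle)` directly avoids holding every record in memory at once.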

Training/Development Data format

The training and development data contains 5 fields:

Below are two examples of the data structure.

SUPPORTS Example

        {
          "id": 33670,
          "label": "SUPPORTS",
          "claim": "Wolfgang Niedecken is a german rock musician who founded the Kölsch speaking rock group BAP at the end of the 1970s",
          "evidence":
            {
              "content": ["Wolfgang Niedecken_sentence_0", "Wolfgang Niedecken_cell_0_4_1", "Wolfgang Niedecken_sentence_1"],
              "context":
                {
                  "Wolfgang Niedecken_sentence_0": ["Wolfgang Niedecken_title"],
                  "Wolfgang Niedecken_cell_0_4_1":
                    [
                      "Wolfgang Niedecken_title", "Wolfgang Niedecken_header_cell_0_4_0", "Wolfgang Niedecken_header_cell_0_1_0", "Wolfgang Niedecken_header_cell_0_0_0"
                    ],
                  "Wolfgang Niedecken_sentence_1": ["Wolfgang Niedecken_title"]
                }
            },
          "annotator_operations":
            [
              {
                "operation": "start",
                "value": "start",
                "time": 0
              },
              {
                "operation": "search",
                "value": "Wolfgang Niedecken",
                "time": 12.654
              },
              {
                "operation": "Now on",
                "value": "Wolfgang Niedecken",
                "time": 13.547
              },
              {
                "operation": "Highlighting",
                "value": "Wolfgang Niedecken_sentence_0",
                "time": 20.926
              }
              ...
            ],
          "expected_challenge": "Combining Tables and Text",
          "challenge": "Combining Tables and Text"
        }
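
The `annotator_operations` list in the example above is a log of what the annotator did, with each entry holding an `operation`, a `value`, and a `time`. As a rough sketch (the helper names are hypothetical, and only the keys visible in the example are assumed), the search queries and a per-operation tally could be pulled out like this:

```python
from collections import Counter
from typing import Dict, List


def search_queries(example: dict) -> List[str]:
    """Return the values of all "search" operations, in logged order."""
    operations = example.get("annotator_operations", [])
    return [op["value"] for op in operations if op["operation"] == "search"]


def operation_counts(example: dict) -> Dict[str, int]:
    """Count how often each operation type occurs in the log."""
    operations = example.get("annotator_operations", [])
    return dict(Counter(op["operation"] for op in operations))
```

Both helpers return empty results when an example has no `annotator_operations` field.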
            
NOT ENOUGH INFO Example

        {
          "id": 35206,
          "label": "NOT ENOUGH INFO",
          "claim": "As of December 2020, the most expensive aircraft of the Korean Air fleet is the Boeing 777-300ER.",
          "evidence":
            {
              "content": ["Korean Air_cell_1_19_0", "Boeing 777_cell_0_11_1"],
              "context":
                {
                  "Korean Air_cell_1_19_0":
                      [
                      "Korean Air_title", "Korean Air_section_10", "Korean Air_section_11", "Korean Air_header_cell_1_0_0"
                      ],
                  "Boeing 777_cell_0_11_1":
                      [
                      "Boeing 777_title", "Boeing 777_header_cell_0_11_0", "Boeing 777_header_cell_0_0_0"
                      ]
                }
            },
          "annotator_operations":
            [
              {
                "operation": "start",
                "value": "start",
                "time": 0
              },
              {
                "operation": "search",
                "value": "Boeing 777-300ER",
                "time": 19.391
              },
              {
                "operation": "Now on",
                "value": "Boeing777",
                "time": 21.531
              },
              {
                "operation": "search",
                "value": "Korean Air fleet",
                "time": 62.33
              }
              ...
            ],
          "expected_challenge": "Numerical Reasoning",
          "challenge": "Multi-hop Reasoning"
        }
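
In both examples, each evidence identifier joins a page title and an element identifier with an underscore, for instance `Korean Air_cell_1_19_0`. The sketch below splits such identifiers and groups an example's evidence by page. It assumes the element suffixes are limited to the forms visible in this document (`title`, `sentence_N`, `section_N`, `table_N`, `list_N`, `item_L_I`, `cell_T_R_C`, `header_cell_T_R_C`); the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Page title (matched lazily so "header_" stays with the element), then a known suffix.
ELEMENT_ID = re.compile(
    r"(.+?)_((?:header_)?cell_\d+_\d+_\d+|sentence_\d+|section_\d+"
    r"|table_\d+|list_\d+|item_\d+_\d+|title)"
)


def split_evidence_id(evidence_id: str) -> Tuple[str, str]:
    """Split 'Page Title_element_id' into (page title, element id)."""
    match = ELEMENT_ID.fullmatch(evidence_id)
    if match is None:
        raise ValueError(f"Unrecognised evidence id: {evidence_id!r}")
    return match.group(1), match.group(2)


def evidence_by_page(evidence: dict) -> Dict[str, List[str]]:
    """Group the element ids in evidence['content'] by page title."""
    pages: Dict[str, List[str]] = {}
    for evidence_id in evidence["content"]:
        page, element = split_evidence_id(evidence_id)
        pages.setdefault(page, []).append(element)
    return pages
```

Matching the suffix against a fixed set of patterns, rather than splitting on the first underscore, keeps page titles that themselves contain underscores intact.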
            

Wikipedia Data format

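Each Wikipedia article contains 2 base fields: title (the article title) and order (the list of element identifiers, as in the example below).

Each element listed in order is itself a field of the article, keyed by its identifier. A sentence field contains the text of the sentence.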

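A section element is a dictionary with the following fields. As the example below shows, these include value (the section heading) and level (the level of the section).

A table element is a dictionary with the following fields. As the example below shows, these include type (for example, infobox) and table (the contents of the table, with each row encoded as a separate list of cells).

A list element consists of the following fields. As the example below shows, these include type (either unordered_list or ordered_list) and list (the contents of the list, given as items).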

Hyperlinks in text are indicated with double square brackets. If an anchor text is provided, it is the text on the right-hand side of a vertical bar inside the square brackets.
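
The bracket convention can be parsed with a regular expression. The sketch below is one possible approach, not an official parser: it assumes the text left of the vertical bar is the link target, that a link without a bar uses its target as the displayed text, and that links are not nested. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# [[target]] or [[target|anchor text]]
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_links(text: str) -> List[Tuple[str, str]]:
    """Return (target, anchor text) pairs for every hyperlink in the text."""
    links = []
    for match in LINK.finditer(text):
        target = match.group(1)
        anchor = match.group(2) or target  # no bar: the target doubles as anchor
        links.append((target, anchor))
    return links


def strip_links(text: str) -> str:
    """Replace each hyperlink with its displayed text."""
    return LINK.sub(lambda m: m.group(2) or m.group(1), text)
```

For example, `strip_links("a river in [[Switzerland]]")` gives `"a river in Switzerland"`.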

Wikipedia Article Example

        {
          "title": "Aare", # Article title
          "order": [ "sentence_0", "table_0", "section_0", "sentence_1", "list_0" ],
          "sentence_0": "This article is about a river in [[Switzerland]]",
          "table_0":
            {
              "type": "infobox",
              "table":
                [ # Contents of the table
                  [ # Each row is encoded in a separate list
                    {
                      "id": "header_cell_0_0_0",
                      "value": "Location",
                      "is_header": true,
                      "row_span": 1,
                      "column_span": 1
                    },
                    {
                    ...
                    }
                  ],
                  [
                    {
                      "id": "cell_0_1_0",
                      "value": "Koblenz",
                      "is_header": false,
                      "row_span": 1,
                      "column_span": 1
                    },
                    ...
                  ]
                ]
            },
          "list_0":
            {
              "type": "unordered_list", # either unordered_list or ordered_list
              "list":
                [ # Contents of the list
                  {
                    "id": "item_0_0", # numbers indicate list, and item, respectively
                    "value": ...,
                    "level": 0,
                    "type": "ordered_list"
                  },
                  {
                  ...
                  }
                ]
            },
          "section_0":
            {
              "value": "Course",
              "level": 1 # Level of section
            }
        }
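
Given an article in this shape, the elements can be visited in document order through the `order` field, and a table cell can be looked up by its `id`. The following is a minimal sketch that relies only on the keys shown in the example above; the helper names are hypothetical, and it assumes table elements are keyed with a `table_` prefix as in `table_0`.

```python
from typing import Iterator, Optional, Tuple


def iter_elements(article: dict) -> Iterator[Tuple[str, object]]:
    """Yield (element id, element) pairs in the order given by article['order']."""
    for key in article.get("order", []):
        if key in article:  # skip ids that are listed but not present
            yield key, article[key]


def find_cell(article: dict, cell_id: str) -> Optional[dict]:
    """Return the table cell with the given id, or None if no table has it."""
    for key, element in iter_elements(article):
        if not key.startswith("table_"):
            continue
        for row in element["table"]:
            for cell in row:
                if cell["id"] == cell_id:
                    return cell
    return None
```

A linear scan is enough for a single lookup. For repeated lookups over the same article, building a dictionary from cell `id` to cell once would be the more efficient choice.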