German dataset
To recreate the German dataset, follow these steps:
- Download TIGER treebank files (requires accepting a licence):
  - treebank XML (tigercorpus-2.2.xml.tar.gz),
  - metadata (TIGER2.2.doc.zip),
  - dependency version (tigercorpus-2.2.conll09.tar.gz).
- Convert XML to JSON with this script.
- Produce headed constituencies using this notebook.
- Generate a dataset using these scripts.
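
Before running the conversion steps, the downloaded archives have to be unpacked. A minimal sketch of that step is shown here, using only the Python standard library. The archive names come from the list above; the function name `unpack_tiger` and the one-subdirectory-per-archive layout are assumptions made for illustration, not something the scripts in this repository are known to require.

```python
import tarfile
import zipfile
from pathlib import Path

# Archive names as listed in the download step above.
ARCHIVES = (
    "tigercorpus-2.2.xml.tar.gz",
    "TIGER2.2.doc.zip",
    "tigercorpus-2.2.conll09.tar.gz",
)


def unpack_tiger(download_dir, target_dir):
    """Extract each TIGER archive into its own subdirectory of target_dir.

    Hypothetical helper: returns the names of the archives that were
    not found in download_dir, so the caller can see what is missing.
    """
    download_dir = Path(download_dir)
    target_dir = Path(target_dir)
    missing = []
    for name in ARCHIVES:
        archive = download_dir / name
        if not archive.is_file():
            missing.append(name)
            continue
        # Subdirectory is named after the archive without its extension.
        out = target_dir / name.replace(".tar.gz", "").replace(".zip", "")
        out.mkdir(parents=True, exist_ok=True)
        if name.endswith(".zip"):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(out)
        else:
            with tarfile.open(archive, "r:gz") as tf:
                tf.extractall(out)
    return missing
```

Calling `unpack_tiger("downloads", "data/tiger")` (both paths are placeholders) would leave the treebank XML, the metadata, and the dependency version in separate subdirectories, and return a list of any archives that still need to be downloaded.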