OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized
eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen

folder data-owt (395 files)
fileowt262.npz 40.76MB
fileowt1.npz 40.44MB
fileowt2.npz 40.58MB
fileowt3.npz 40.61MB
fileowt4.npz 40.61MB
fileowt5.npz 40.61MB
fileowt6.npz 40.62MB
fileowt7.npz 40.40MB
fileowt8.npz 40.56MB
fileowt9.npz 40.58MB
fileowt10.npz 40.57MB
fileowt11.npz 40.62MB
fileowt12.npz 40.58MB
fileowt13.npz 40.49MB
fileowt14.npz 40.56MB
fileowt15.npz 40.56MB
fileowt16.npz 40.53MB
fileowt17.npz 40.58MB
fileowt18.npz 40.57MB
fileowt19.npz 40.56MB
fileowt20.npz 40.55MB
fileowt21.npz 40.50MB
fileowt22.npz 40.64MB
fileowt23.npz 40.53MB
fileowt24.npz 40.59MB
fileowt25.npz 40.55MB
fileowt26.npz 40.66MB
fileowt27.npz 40.54MB
fileowt28.npz 40.54MB
fileowt29.npz 40.51MB
fileowt30.npz 40.57MB
fileowt31.npz 40.60MB
fileowt32.npz 40.54MB
fileowt33.npz 40.42MB
fileowt34.npz 40.70MB
fileowt35.npz 40.65MB
fileowt36.npz 40.67MB
fileowt37.npz 40.41MB
fileowt38.npz 40.55MB
fileowt39.npz 40.56MB
fileowt40.npz 40.56MB
fileowt41.npz 40.58MB
fileowt42.npz 40.60MB
fileowt43.npz 40.51MB
fileowt44.npz 40.51MB
fileowt45.npz 40.28MB
fileowt46.npz 40.60MB
fileowt47.npz 40.52MB
fileowt48.npz 40.50MB
Type: Dataset

Bibtex:
@article{,
title= {OpenWebText (Gokaslan's distribution, 2019), GPT-2 Tokenized},
journal= {},
author= {eukaryote31 and Joshua Peterson and Aaron Gokaslan and Vanya Cohen},
year= {},
url= {},
abstract= {Code by eukaryote31 and Joshua Peterson: https://github.com/jcpeterson/openwebtext and https://github.com/eukaryote31/openwebtext

Scraped by Aaron Gokaslan and Vanya Cohen: https://skylion007.github.io/OpenWebTextCorpus/

Tokenized by eukaryote31},
keywords= {},
terms= {},
license= {},
superseded= {}
}
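
The listing does not document what is inside each .npz file, so a first step after downloading is to inspect one. The following is a minimal sketch, assuming the files are standard NumPy .npz archives that load without pickling. The helper name `summarize_npz` is hypothetical and not part of the dataset's tooling. It makes no assumption about array names and reports whatever the archive contains.

```python
from pathlib import Path

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz archive.

    Assumption: the file is a standard NumPy .npz archive. The array names
    stored inside the OpenWebText shards are not documented in the listing,
    so they are discovered at run time rather than hard-coded.
    """
    summary = {}
    # np.load on an .npz returns a lazy NpzFile; each array is read only when indexed.
    with np.load(Path(path)) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

Calling `summarize_npz` on the path of any downloaded shard returns a dictionary keyed by array name. If the arrays hold GPT-2 token IDs, as the title suggests, their values should be integers below 50257, the size of the GPT-2 vocabulary.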
