Type: Dataset
Tags: United States, gov, loc, banneker
Bibtex:
@article{,
title= {loc.gov-benjamin-banneker},
journal= {},
author= {Library of Congress},
year= {},
url= {},
abstract= {A mirror of https://guides.loc.gov/benjamin-banneker/digital-resources and all collections linked from that page:

* African American Perspectives: Materials Selected from the Rare Book Collection https://www.loc.gov/collections/african-american-perspectives-rare-books/
* George Washington Papers https://www.loc.gov/collections/george-washington-papers/
* Printed Ephemera: Three Centuries of Broadsides and Other Printed Ephemera https://www.loc.gov/collections/broadsides-and-other-printed-ephemera/
* Thomas Jefferson Papers, 1606 to 1827 https://www.loc.gov/collections/thomas-jefferson-papers/

Data captured between 2025-03-23 and 2025-03-25. Uncompressed, the data totals about 170 GiB.

Table of contents:

* banneker.html: Bare-bones mirror of https://guides.loc.gov/benjamin-banneker/digital-resources
* index/: Search results pages for each collection. Each page links to many pages in items/.
* items/: Detail pages like www.loc.gov/item/123abc. Each page links to many downloaded files.
* pdfs/, jpgs/, ...: All downloaded files, separated by file type. others/ contains the few remaining file types, such as XML, JP2, HTML, ...

Note that many of the files are duplicates in different formats. Downloading every version was judged better than working out which version is highest-resolution and covers all pages of, e.g., a scanned book. Removing the duplicates would also have saved only a negligible amount of storage.},
keywords= {gov, united states, loc, banneker},
terms= {},
license= {},
superseded= {}
}
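The directory layout in the table of contents can be checked against a local copy with a short script. The following is a minimal sketch, not part of the dataset: the function name `summarize_mirror` and the idea that the mirror is unpacked under a single root directory are assumptions made for illustration. It reports, for each top-level directory (index/, items/, pdfs/, ...), the number of files and their combined size.

```python
from pathlib import Path


def summarize_mirror(root):
    """Return {directory_name: (file_count, total_bytes)} for each
    top-level directory under `root` (hypothetical local mirror path)."""
    summary = {}
    for entry in sorted(Path(root).iterdir()):
        # Skip top-level files such as banneker.html.
        if not entry.is_dir():
            continue
        files = [p for p in entry.rglob("*") if p.is_file()]
        total_bytes = sum(p.stat().st_size for p in files)
        summary[entry.name] = (len(files), total_bytes)
    return summary
```

Summing the byte counts over all directories should land near the stated uncompressed total of about 170 GiB, which gives a quick sanity check that a download is complete.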