Type: Dataset
Tags: PMC, PubMed, ratarmount
Bibtex:
@article{,
title= {ratarmount indexes for PMC OpenAccess subset},
journal= {},
author= {rngadam@coderbunker.com},
year= {},
url= {},
abstract= {## the problem

The PMC Open Access bulk article set (commercial and non-commercial) is a hefty set of files that weighs in at 79G compressed and 388G uncompressed. Decompressing the archives alone can take hours.

A BitTorrent mirror exists at:

https://academictorrents.com/details/06d6badd7d1b0cfee00081c28fddd5e15e106165

## the solution

ratarmount (https://github.com/mxmlnkn/ratarmount), a Python application, uses FUSE (through fusepy) to mount a compressed archive as a disk, allowing random access to files in the archive without decompressing it first.

To achieve good performance, it creates an index (an SQLite database per archive). This set of indexes still weighs in at 1.4G uncompressed (345M compressed).

## usage

* decompress all indexes into the same directory where you downloaded oa_bulk
* install ratarmount
* use ratarmount to mount the oa_bulk archives on the disk

a sample script ```mount.sh``` is provided as an example

## distribution

we also use BitTorrent to distribute the set of indexes.
},
keywords= {PMC, PubMed, ratarmount},
terms= {},
license= {CC BY 4.0},
superseded= {}
}
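
The usage steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the provided mount.sh, and it rests on assumptions: the archives are assumed to end in `.tar.gz`, each index is assumed to sit next to its archive as `<archive>.index.sqlite` (the location ratarmount checks by default), and the one-mount-point-per-archive layout and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_mount_commands(archive_dir, mount_root):
    """Return one ratarmount argv list per archive that has an index beside it."""
    archive_dir = Path(archive_dir)
    mount_root = Path(mount_root)
    commands = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        # Assumption: the index is named <archive>.index.sqlite in the same directory.
        index = archive.with_name(archive.name + ".index.sqlite")
        if not index.exists():
            # Skip archives without an index; ratarmount would need to build one first.
            continue
        # Hypothetical layout: one mount point per archive, named after the archive.
        mount_point = mount_root / archive.name[: -len(".tar.gz")]
        commands.append(["ratarmount", str(archive), str(mount_point)])
    return commands


def mount_all(archive_dir, mount_root):
    """Create each mount point, then invoke ratarmount (requires ratarmount and FUSE)."""
    for command in build_mount_commands(archive_dir, mount_root):
        Path(command[-1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

Calling `mount_all("oa_bulk", "mnt")` would then mount every indexed archive under `mnt/`; both directory names are placeholders. Building the command list separately from running it makes it possible to inspect what would be mounted before touching FUSE.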