This session explores some of the technical features and challenges facing the OSP over the next year of development (and during the Hackathon). Topics range from web crawling and scraping, through choices about database architecture and the API, to the opportunities for (and limits of) machine-learning-based extraction of structured data from the documents.
● Dennis Tenen, Columbia University
● Miao Chen, HathiTrust Research Center
● Apoorv Agarwal, Columbia University
● Jian Wu, CiteSeer, Penn State