Discussion Forum (Archived)
Guest
Moderately technical article about books and Big Data.
https://cloudplatform.googleblog.com/2016/02/what-it-looks-like-to-process-3.5-million-books-in-Googles-cloud.html
This past September I published into Google BigQuery a massive new public dataset of metadata from 3.5 million digitized English-language books dating back more than two centuries (1800-2015), along with the full text of 1 million of these books. The archive, which draws from the English-language public domain book collections of the Internet Archive and HathiTrust, includes full publication details for every book, along with a wide array of computed content-based data. The entire archive is available as two public BigQuery datasets, and there’s a growing collection of sample queries to help users get started with the collection. You can even map two centuries of books with a single line of SQL.