Fame, fortune and words
Digitising books is a problematic endeavour, not because of the technology but because of the ethical issues that arise. Once a book is digitised, problems with copyright and sales enter the frame. Nevertheless, Google Books has digitised more than fifteen million books, which is about twelve per cent of all the books ever published. Digitisation has made possible a detailed analysis of the books’ content that individuals could not carry out by hand, and some remarkable findings have resulted.
The analysis was carried out on five million of the fifteen million digitised books, giving a final data set of approximately 500 billion words. Digging into the words used in those books across time has revealed some fascinating facts about our evolving culture.