Literature is not data.1
Author Stephen Marche defends this position in a 2012 essay, Literature Is not Data: Against Digital Humanities. He disparages the approach of the digital humanities:
But there is a deeper problem with the digital humanities in general, a fundamental assumption that runs through all aspects of the methodology and which has not been adequately assessed in its nascent theory. Literature cannot meaningfully be treated as data. The problem is essential rather than superficial: literature is not data. Literature is the opposite of data.
Therefore, Literature must (the author is being imperative) remain foreign to the dominant paradigms, such as software performance and economic productivity. These paradigms aim at universal quantification (machines handle numbers far more fluently), disambiguation (through strict syntax: programs fail on the first misplaced comma), modeling (specific but reductive schemas), and indexation (what can and cannot be searched).
The opacity of language, with all due respect to corporate managers who claim “to order the world’s information,” is one of the irreducible aspects of Literature:
The experience of the mystery of language is the original literary sensation.
We will surely never exhaust the meanings of literary texts. Formalizing some parts of them, perhaps: that is the never-ending quest of the humanities. But data-ifying Literature as a whole is not conceivable with current technical means (nor would it be appropriate).2