Simulated Humanist Mind


LDA DOA? LDA Topic Modelling Versus Topic Mapping

I apologize for the sensationalist (and somewhat senselessly constructed) title, but what is a blog for if not to craft such things? One of the pillars of the emerging DH scene has been the use of Latent Dirichlet Allocation (LDA) based topic modelling. For those not in the know, this is a machine learning technique that uses properties of words and documents to discover “topics” that occur in the texts. For example, if your corpus contains a significant number of documents about pet care, you might expect the algorithm to return a topic full of cat-related words. This is the barest of explanations; if you’d like to know more, see the Blei paper for a more technical introduction. If I remember correctly, Matt Jockers also has an introduction to the concept aimed at humanists in Macroanalysis.

Whatever your level of familiarity, the important point is that LDA topic modelling has found use in numerous projects within (and without) the humanities. And now at least one generally accepted part of it is under a sort of attack. The salvo was launched in the paper “A high-reproducibility and high-accuracy method for automated topic classification” by a group of scientists working out of Northwestern University. 1

So what’s the issue? According to Lancichinetti et al., the tests they ran on standard LDA-based topic modelling algorithms indicate that:

“PLSA and the standard optimization algorithm implemented with LDA (variational inference) are systematically unable to find the global maximum of the likelihood landscape…these algorithms have surprisingly low accuracy and reproducibility, especially when topic sizes are unequal. Taken together, the results in this section clearly demonstrate that the approach taken by standard topic-model algorithms for exploring the likelihood landscape is extremely inefficient, whether one starts from random […]

Posted by Craig on February 24th, 2015


Zipf’s Law, Style and the Literary: Stein and Experimentalism

(This is part 5 of my series on Zipf’s Law. For part 4 see here.)

Stein and Experimentalism

In my last post, I promised to offer evidence that the technique employed thus far is non-vacuous: that it can account for variety across different texts. To test this theory, one could do much worse than to leap 200 years ahead, from Puritan shores to the cosmopolitan work of Gertrude Stein. The combination of Stein’s American origins and European residence produced one of modernism’s more distinctive aesthetics, especially in the case of Stein’s “middle” experimental works like The Making of Americans, A Novel of Thank You, and Lucy Church Amiably. Characterized by iteration, permutation, and unresolved referentiality, these works form a sort of allusive stew that challenges reading practices that ascribe overly deep meaning to the texts, while still offering some sort of meaning-carved handhold onto which the critical reader can cling.

Stein’s use of this style has been (unsurprisingly) contextualized in myriad ways. Two particular efforts stand out as useful to our agenda. The first comes from critic Carolynn Van Dyke, who in her article “Bits of Information and Tender Feeling” argues that Stein’s work presages the techniques of computer-generated literature. Comparing Lucy Church to the work of the man-machine collaborative text generator Racter, Van Dyke pronounces both samples examples of “schizophrenic” texts that consist of “fragmentary and deformed syntax,” “erratic allusions,” and the “violation of discursive norms of relevance and informativeness.” 1 This is a striking statement, given that Cancho’s framework provides expected values for the written work of schizophrenics. Processing a corpus primarily composed of Stein’s middle-period experimental works reveals strikingly low values: 2

Almost all of the works dip well below the common value of ~2. As noted above, this same phenomenon does not occur in the writings of similarly experimental […]
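For readers who want to try this kind of measurement themselves, here is a rough sketch of estimating a Zipf-style exponent from a text via an ordinary least-squares fit on the log-log rank-frequency curve. This is a generic, simplified approach and an assumption on my part, not necessarily the method used in this series; note also that the ~2 value discussed above comes from Cancho’s frequency-distribution framework, whereas the rank-frequency exponent fitted below is classically expected to sit near 1 (the two exponents are related but not identical):

```python
# A hypothetical sketch: fit log(frequency) = c - alpha * log(rank)
# over a text's word frequencies and report the exponent alpha.
import math
import re
from collections import Counter

def zipf_exponent(text: str) -> float:
    """Estimate the rank-frequency Zipf exponent by least squares."""
    words = re.findall(r"[a-z']+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope  # Zipf's rank-frequency law predicts alpha near 1

# Toy usage on a repetitive sample (invented for illustration).
sample = "the cat sat on the mat and the dog sat on the log " * 10
print(zipf_exponent(sample))
```

A real analysis would of course use full novels rather than a toy string, and a maximum-likelihood fit is generally preferred to least squares for power-law exponents, but the sketch shows the basic pipeline: tokenize, count, rank, and fit.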

Posted by Craig on February 17th, 2015