Research

My research combines machine learning with traditional literary scholarship, and has focused on metric learning, the adaptation of large language models to historical domains, and the analysis of orthographic variation in literary works. I am broadly interested in the topics that emerge at the nexus of literary scholarship, linguistics, and computational language modeling, particularly literary theory, semantics, pragmatics, and artificial neural networks.

Some recently published papers are listed below:

Pretraining Language Models for Diachronic Linguistic Change Discovery

arXiv | Currently under review

Transferring Extreme Subword Style Using Ngram Model-Based Logit Scaling

arXiv | ACL Anthology | Presented at NLP4DH 2025 @ NAACL

Previously popular language modeling techniques (Ngram, Bengio-style neural modeling) have once again entered the spotlight of current research. Recent projects have both updated them using now-ubiquitous modeling strategies and used them to supplement their now-ubiquitous cousin, the pretrained LLM. I employ the latter approach to transfer extreme versions of linguistic style at generation time. This approach results in an efficient and accurate means of transfer that does not rely on (potentially meager) in-weight knowledge or (potentially brittle) prompting.


Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus

arXiv | ACL Anthology | Presented at NLP4DH 2024 @ EMNLP

Using the BERT and CANINE series of embedding models for literary investigation led us to a series of experiments that tested their ability to embed orthographic information important to the study of works that employ “non-standard” versions of written English.


Pairing Orthographically Variant Literary Words to Standard Equivalents Using Neural Edit Distance Models

arXiv | ACL Anthology | Presented at LaTeCH-CLFL 2024 @ EACL

Textual style is composed of a number of features. While modifications of these features are common in many discourses, literary style employs a number of variations uncommon in other registers. These include modifying the orthographic elements of “standard” words in order to indicate some form of difference at the character or authorial level.
