Accepted to Findings of EACL; preprint available on arXiv.
We use compute- and data-efficient methods to pretrain a battery of historically-specific models on a relatively limited token budget. We show that this approach leaks far less temporally-inappropriate information than finetuning an existing LLM, while retaining adequate performance for lexical change detection.
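As a rough illustration of how period-specific models can be used for lexical change detection, the sketch below scores a word by comparing its contextual embeddings across two checkpoints. This is a minimal example, not the paper's exact pipeline: the model names, organization prefix, and sentences are placeholders, and it assumes the checkpoints are standard Transformer encoders loadable with the HuggingFace `transformers` library.

```python
# Minimal sketch: compare a target word's contextual embedding across two
# period-specific pretrained models. Checkpoint names below are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer


def word_embedding(model_name: str, sentence: str, target: str) -> torch.Tensor:
    """Mean-pool the hidden states of the sub-tokens covering `target`."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    enc = tok(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()  # needs a fast tokenizer
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    # Select the sub-tokens whose character spans overlap the target word.
    start = sentence.index(target)
    end = start + len(target)
    idx = [i for i, (s, e) in enumerate(offsets) if s < end and e > start and e > s]
    return hidden[idx].mean(dim=0)


# Hypothetical checkpoints for two historical slices of the corpus.
early = word_embedding("org/model-1850-1875", "The awful majesty of the storm.", "awful")
late = word_embedding("org/model-1950-1975", "The awful traffic made us late.", "awful")

# A higher cosine distance suggests greater semantic change between periods.
change = 1 - torch.nn.functional.cosine_similarity(early, late, dim=0).item()
print(f"change score: {change:.3f}")
```

In practice one would aggregate such distances over many usage examples per period rather than a single sentence pair.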
Our approach can be used for any corpus with clear delineations between collections of works. Code is available on GitHub, and models and data are on HuggingFace.