How can I use an LLM in my research?

Craig Messner, JHU (various)

Penn Libraries, April 9th 2026

The Real Power of LLMs

LLMs are successful due to their multitask capability.

"Language Models are Unsupervised Multitask Learners"

Radford et al., 2019

LLMs often outperform traditional NLP pipelines while remaining tunable in natural language
via in-context learning.

The Classic NLP Pipeline

What if I want to extract Italian city names from text?

Input:

"Maria traveled from Rome to Florence last summer."
1. Tokenization:

["Maria", "traveled", "from", "Rome", "to", "Florence", "last", "summer"]
2. Part-of-Speech Tagging:

[NNP, VBD, IN, NNP, TO, NNP, JJ, NN]
3. Named Entity Recognition:

Maria: PERSON, Rome: GPE, Florence: GPE
4. Filter against Italian cities list:

Result: ["Rome", "Florence"]

This requires expertise, time, and sometimes custom models.

The LLM Paradigm

Prompt:

Extract all Italian city names from the following text. Text: "Maria traveled from Rome to Florence last summer." Italian cities:
Response:

["Rome", "Florence"]

Multitasking

Previously, NLP tasks required supervised training with sets of distinctly labeled data.

Summarization

Sentiment Analysis

Machine Translation

In-Context Learning

LLMs can even adapt to new tasks not seen during training.

By prompting in a zero-shot or many-shot fashion, we recruit the model's in-context learning ability.

No expensive gradient training required.

Interacting with LLMs as Products

The most common way users interact with LLMs is through a web-based chat interface.

What is the capital of France?
The capital of France is Paris.
What is its population?
Paris has a population of about 2.1 million.

The dialogic model offers few options for automatable, testable results.

Zero-Shot vs Many-Shot

Task: Extract US cities founded before 1830 and analyze sentiment.

Zero-Shot:
Extract US cities founded before 1830 and the sentiment of their context.

Text: "Boston was magnificent, but Denver felt dreary."

Output:
Many-Shot:
Extract US cities founded before 1830 and the sentiment of their context.

Text: "New York was thrilling but Seattle seemed dull."
Output: [{"city": "New York", "sentiment": "positive"}]

Text: "Baltimore seemed grim."
Output: [{"city": "Baltimore", "sentiment": "negative"}]

Text: "Boston was magnificent, but Denver felt dreary."
Output:
Expected Output:
[{"city": "Boston", "sentiment": "positive"}]

Denver (founded 1858) is correctly excluded.
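The many-shot prompt above can be assembled programmatically from a list of labeled examples. A minimal sketch, using the instruction and example texts from this slide:

```python
import json

# Task instruction and worked examples, as on the slide.
INSTRUCTION = ("Extract US cities founded before 1830 "
               "and the sentiment of their context.")

EXAMPLES = [
    ("New York was thrilling but Seattle seemed dull.",
     [{"city": "New York", "sentiment": "positive"}]),
    ("Baltimore seemed grim.",
     [{"city": "Baltimore", "sentiment": "negative"}]),
]

def build_prompt(query_text):
    """Concatenate instruction, worked examples, and the query text."""
    parts = [INSTRUCTION, ""]
    for text, output in EXAMPLES:
        parts.append(f'Text: "{text}"')
        parts.append(f"Output: {json.dumps(output)}")
        parts.append("")
    parts.append(f'Text: "{query_text}"')
    parts.append("Output:")  # the model completes from here
    return "\n".join(parts)

prompt = build_prompt("Boston was magnificent, but Denver felt dreary.")
```

Keeping the examples in a list makes it easy to experiment with how many shots the task needs.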

Caveats and Roadmap

This is powerful, but also poses a unique challenge.

Output can be convincing but misleading.

Let's discuss:

  1. Determining whether your project should employ an LLM
  2. Preparing your data into a structured form
  3. Designing a prompt
  4. Extracting and evaluating your results
  5. Important caveats

Preparing Your Experiment and Data

A major advantage of LLMs is performing operations at scale.

Ask yourself:

  • Do I have a question decomposable into a repeatable, evaluatable action?
  • Do I have a large dataset that would support this question, or can I create one?
  • How should I structure and segment this data for LLM inference?

An Example

Use LLM inference to zero-shot identify the first publication date of historical printed works.

Dataset: Work-author combinations from WikiData.

Author          | Title                                        | Pub
Amelia Opie     | Adeline Mowbray; or, The Mother and Daughter |
Barbara Hofland | The Barbadoes Girl: A Tale for Young People  |

Each datapoint: a work-title pair with a blank for the publication date.

Exercise

Identify a dataset tied to your field.

  • What questions could this dataset support?
  • Does a structured version exist?
  • How would you decompose it into individual datapoints?
Note: This might involve splitting longer texts into smaller units (paragraph, sentence) or PDFs of images into single images.
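Splitting a longer text into sentence-level datapoints can be sketched with the standard library alone (a real project would likely use a proper sentence tokenizer, e.g. from NLTK or spaCy; the naive regex below mishandles abbreviations):

```python
import re

def split_sentences(text):
    """Naively split on sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

doc = "Maria traveled from Rome to Florence. The trip took all summer."
datapoints = split_sentences(doc)  # one datapoint per sentence
```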

Dataset Final Form

Data is often represented in a semi-structured form.

CSV (Comma Separated Values):
Author,Title,Pub
Amelia Opie,"Adeline Mowbray; or, The Mother and Daughter",
Barbara Hofland,"The Barbadoes Girl: A Tale for Young People",
JSON (JavaScript Object Notation):
[
  {
    "author": "Amelia Opie",
    "title": "Adeline Mowbray; or, The Mother and Daughter",
    "pub": ""
  },
  {
    "author": "Barbara Hofland",
    "title": "The Barbadoes Girl: A Tale for Young People",
    "pub": ""
  }
]
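The two formats are interchangeable; a short standard-library sketch converting the CSV above into the JSON form:

```python
import csv
import io
import json

csv_text = '''Author,Title,Pub
Amelia Opie,"Adeline Mowbray; or, The Mother and Daughter",
Barbara Hofland,"The Barbadoes Girl: A Tale for Young People",
'''

# DictReader handles the quoted title containing a comma.
rows = list(csv.DictReader(io.StringIO(csv_text)))
records = [{"author": r["Author"], "title": r["Title"], "pub": r["Pub"]}
           for r in rows]
json_text = json.dumps(records, indent=2)
```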

These formats allow us to programmatically feed data alongside prompts into the LLM.

Designing a Prompt

Design a prompt with:

  • A consistent set of instructions
  • Slots for inserting information from a datapoint
  • A request for structured output
Determine the first date of publication of the following work by the given author.

Return the date as a valid JSON object with the single field "date".

Author: {author}
Work: {title}

The placeholders {author} and {title} are filled programmatically for each datapoint.
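Filling the slots can be done with Python's built-in string formatting; a minimal sketch, assuming each datapoint is a dict with "author" and "title" keys:

```python
TEMPLATE = (
    "Determine the first date of publication of the following work "
    "by the given author.\n\n"
    'Return the date as a valid JSON object with the single field "date".\n\n'
    "Author: {author}\n"
    "Work: {title}"
)

datapoint = {"author": "Amelia Opie",
             "title": "Adeline Mowbray; or, The Mother and Daughter"}
prompt = TEMPLATE.format(**datapoint)  # one prompt per datapoint
```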

Exercise

Design a prompt over the datapoints from your dataset.

Tips:

  • There are numerous prompt engineering techniques; prompting can be brittle
  • Plan how to associate your output with an input datapoint
  • How will you validate that your output is in the requested form?
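Validation can be as simple as attempting to parse the response and checking its shape. A sketch for the publication-date prompt (the example responses are hypothetical):

```python
import json

def parse_date_output(raw):
    """Return the 'date' value if raw is a JSON object with exactly
    that one field, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and set(obj) == {"date"}:
        return obj["date"]
    return None

ok = parse_date_output('{"date": "1804"}')    # well-formed response
bad = parse_date_output("The date is 1804.")  # prose, not JSON -> None
```

Responses that fail validation can be logged and retried rather than silently dropped.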

Passing Prompts to a Model

Three options for getting inference from an LLM:

API-based remote LLMs
Campus-hosted LLMs
Local inference

Consider generation hyperparameters for replicability.
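Whichever option you choose, record the generation hyperparameters with each run. A sketch of a request payload in the style of an OpenAI-compatible chat API (the model name is a placeholder, and parameters such as "seed" are only honored by some backends):

```python
payload = {
    "model": "example-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Author: ...\nWork: ..."}
    ],
    "temperature": 0.0,  # as deterministic as the backend allows
    "max_tokens": 64,    # the structured answer should be short
    "seed": 42,          # supported by some APIs; aids replication
}
```

Saving this payload alongside each model output documents exactly how the result was produced.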

Evaluating Your Results

How do we make sure the model has correctly performed our task?

We need to quantify the error rate.

Author          | Title              | Ground Truth | Model Output
Amelia Opie     | Adeline Mowbray    | 1804         | 1804
Barbara Hofland | The Barbadoes Girl | 1816         | 1832
Consider: How would you quantify error for your experiment? What about complex tasks like sentiment analysis?
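For a date-extraction task, two natural error measures are exact-match accuracy and mean absolute error in years. A sketch using the two rows from the table above:

```python
rows = [
    {"title": "Adeline Mowbray",    "truth": 1804, "model": 1804},
    {"title": "The Barbadoes Girl", "truth": 1816, "model": 1832},
]

# Fraction of datapoints where the model matched ground truth exactly.
exact = sum(r["truth"] == r["model"] for r in rows) / len(rows)

# Average distance in years, which credits near-misses.
mae = sum(abs(r["truth"] - r["model"]) for r in rows) / len(rows)
```

Here exact-match accuracy is 0.5 and the mean absolute error is 8 years; which measure matters depends on your research question.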

Ethics and Responsibility

Issues for the researcher to keep in mind:

  1. The potential for hallucinations
  2. The existence of bias
  3. Who sees our datapoints? (university policy may apply)
  4. Disclosure

Thank you, and questions