Craig Messner
JHU CDH, CS, DSAI
April 2026
LLMs have been successful as chat assistants and agentic backbones, but the original excitement centered on their general-purpose language capabilities:
One approach: use LLMs to replace complicated NLP pipelines.
Example: Extract cities that existed before 1500 from text
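As a sketch of what replacing the pipeline looks like, the whole task can be posed as a single prompt instead of chained NER, entity linking, and date lookup. The message wording below is a hypothetical illustration, not a tested prompt.

```python
# Hypothetical sketch: the single-prompt alternative to a multi-stage NLP
# pipeline. The exact instruction wording is an assumption for illustration.
def build_extraction_prompt(text: str) -> list[dict]:
    """Build an OpenAI-format message list asking the model to extract
    cities that existed before 1500 from the given text."""
    return [
        {"role": "system", "content": "You are a careful information extractor."},
        {"role": "user", "content": (
            "List every city mentioned in the text below that existed "
            "before the year 1500. Reply with a comma-separated list only.\n\n"
            + text
        )},
    ]

messages = build_extraction_prompt("Marco Polo traveled from Venice toward Beijing.")
```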
If we are interested in LLMs from a philosophical or media perspective, we want to understand their functioning.
These questions belong to the ML field of interpretability
Focuses on discovering structure in the variance/invariance of output behavior under certain conditions.
In effect: probing using prompting
Behavioral methods offer a limited but accessible window into this internal space.
Seeks to "crack open" the black box of model representations.
You will need a way to perform inference with your chosen model.
Today: We will use LM Studio to explore key considerations, then move to llama.cpp for a fuller behavioral interpretability demo.
Outside of directly performing inference in Python, the dominant mode of access is as a web-served API, often structured in OpenAI format, even when running locally.
You will have a chat interface, but digging in will require making requests to these locally-running API endpoints.
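A request to one of these endpoints can be made with nothing beyond the standard library. This is a minimal sketch assuming the OpenAI-format chat-completions path; the port and base URL are assumptions, so match them to whatever your server reports on startup.

```python
import json
import urllib.request

def build_payload(messages, temperature=0.7):
    # OpenAI chat-completions request body; temperature default is arbitrary.
    return {"messages": messages, "temperature": temperature}

def chat_completion(messages, base_url="http://localhost:1234/v1"):
    """POST a chat completion to a locally served OpenAI-format endpoint.

    Requires a running server (LM Studio or llama.cpp both expose this shape).
    """
    req = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(build_payload(messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```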
Assuming a straightforward laptop situation, you need a model that fits in RAM, with some potential offloading to GPU.
Quantization is common to reduce memory requirements.
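The memory arithmetic is simple to sketch: weight memory is roughly parameters times bits per weight, ignoring KV cache and activation overhead. The figures below are back-of-envelope estimates, not measured numbers.

```python
# Rough weight-memory estimate for a quantized model:
# bytes ~= parameters * bits_per_weight / 8 (KV cache and activations extra).
def model_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# An 8B-parameter model at different precisions:
fp16 = model_gb(8, 16)   # 16.0 GB
q4 = model_gb(8, 4.5)    # 4.5 GB (4-bit quants carry some per-block overhead)
```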
You can grab models from LM Studio, but it is good to know:
Our focus: the recently released Gemma 4
Remember: the output distribution is a low-dimensional look into the model's feature space. But typical interactions reveal even less.
Effect of temperature on output distribution:
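Mechanically, temperature divides the logits before the softmax, so values below 1 sharpen the distribution toward the top token and values above 1 flatten it. A minimal sketch:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # more peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```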
Hosted services may also apply output shaping or silently re-run prompts, leaving the relationship between backend changes and generation behavior proprietary and opaque.
What if we want a deeper view?
To start, we might want the log probabilities of the potentially generated tokens.
This would be a prerequisite for methods like Fightin' Words across contrasting corpora.
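For orientation, the core of Fightin' Words (Monroe et al.) is a weighted log-odds ratio with a Dirichlet prior, z-scored by its variance. This is a minimal sketch with a symmetric prior over the shared vocabulary, not a full reimplementation.

```python
import math

def fightin_words(counts_a, counts_b, alpha=0.01):
    """Z-scored weighted log-odds for each word across two count dicts.

    Positive scores mark words favored by corpus A, negative by corpus B.
    alpha is a symmetric Dirichlet prior pseudocount (an assumed default).
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = alpha * len(vocab)
    scores = {}
    for w in vocab:
        ya = counts_a.get(w, 0) + alpha
        yb = counts_b.get(w, 0) + alpha
        delta = (math.log(ya / (n_a + a0 - ya))
                 - math.log(yb / (n_b + a0 - yb)))
        var = 1.0 / ya + 1.0 / yb  # approximate variance of the difference
        scores[w] = delta / math.sqrt(var)
    return scores
```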
When we request a completion from our llama.cpp server, we can also request this information.
Start the llama.cpp server:
./llama-server -hf unsloth/gemma-4-E4B-it-GGUF --port 1234
Request a completion with log probabilities:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Replace the [MASK] in the following sentence with the name of a real-life city. I am a science-fiction writer from [MASK]"
      }
    ],
    "logprobs": true,
    "top_logprobs": 5
  }'
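Reading the extra information back out is a matter of walking the response JSON. The OpenAI chat-completions shape nests per-token alternatives under choices[0].logprobs.content, and llama.cpp mirrors this, but field names can drift between server versions, so treat the paths below as assumptions to verify against your own responses.

```python
import math

def top_alternatives(response, position=0):
    """Return (token, probability) pairs for one generated position,
    converting each logprob back to a probability with exp()."""
    entry = response["choices"][0]["logprobs"]["content"][position]
    return [(alt["token"], math.exp(alt["logprob"]))
            for alt in entry["top_logprobs"]]
```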
Behavioral interpretability requires a systematic approach and multiple points of comparison.
Python is the most useful tool for this, though purpose-built tools like the one I will show you can, or could, also exist.
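As a closing sketch of what "systematic" means here: hold the masked template fixed, vary one contrasting condition, and generate one request payload per variant so the same completion position can be compared across conditions. The template extends the earlier [MASK] prompt with a hypothetical genre slot.

```python
# Hypothetical probe grid: one payload per contrasting condition, with
# logprobs requested so the output distributions can be compared directly.
TEMPLATE = ("Replace the [MASK] in the following sentence with the name of "
            "a real-life city. I am a {genre} writer from [MASK]")

def probe_payloads(genres, top_logprobs=5):
    """Build one OpenAI-format request body per genre condition."""
    return [
        {
            "messages": [{"role": "user",
                          "content": TEMPLATE.format(genre=genre)}],
            "logprobs": True,
            "top_logprobs": top_logprobs,
        }
        for genre in genres
    ]

payloads = probe_payloads(["science-fiction", "romance", "crime"])
```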