Every conversation is Groundhog Day for ChatGPT
Phil Connors -- Bill Murray's character -- was the original prompt engineer. Be like Phil (but less creepy).
Last week, a friend of mine was trying to figure out how to use ChatGPT-4, which she had heard from me was better than ChatGPT-3.5. Naturally, she thought ChatGPT could help her out:
Meanwhile, if you google her question, here’s what you get:
How confusing! Here’s the issue, which a lot of people do not understand:
ChatGPT knows basically nothing that has happened since September 2021. It does not have access to the internet. It does not remember its previous interactions with you or with anyone else.
To explain to my friend what was going on, I used ChatGPT to help me come up with a metaphor:
I’ve seen Groundhog Day and I haven’t seen 50 First Dates, so I decided to run with Groundhog Day for my analogy.
ChatGPT is like Rita in Groundhog Day
In the movie Groundhog Day, Bill Murray’s character Phil Connors discovers that he’s stuck in a time loop where he repeats Feb 2. He quickly figures out that he’s the only one who remembers anything from previous iterations of Feb 2; everyone he interacts with starts each loop fresh.
Creepily (an adverb that applies to the leading male character in basically every movie from the 90s), he decides that in this situation, his main goal in life is to get into the pants of his colleague Rita.
How does he go about achieving his goal?
Phil Connors’ prompt engineering strategy
By trial and error over the course of weeks, Phil learns how to be an effective Rita prompt engineer.
One day, he asks to buy Rita a drink, and she orders a vermouth with a twist. Now he knows what she likes. The next day, he orders a vermouth with a twist for himself, and Rita is like, what a coincidence!
Rita then asks him what they should drink to, and he says “the groundhog.” She says she usually drinks to world peace — and the conversation fizzles.
The next day, of course, Phil replays everything up to that question, but then proposes drinking to world peace. And now she starts responding much more the way he’s hoping!
RitaLLM and ChatGPT both have knowledge cutoff dates
Now, let’s conceive of Rita as a Large Language Model (LLM) trained on a large dataset: Her life experience up through February 1, in the unknown year that the movie takes place. Feb 1 is RitaLLM’s “knowledge cutoff date.”
ChatGPT’s knowledge cutoff date is September 2021. That is when the vast dataset of text that GPT-3.5 and GPT-4 were both trained on was compiled. This “knowledge cutoff date” is a big difference between LLMs like ChatGPT and traditional search engines, and it can be a source of confusion.
When you are prompting ChatGPT, you are like Phil prompting RitaLLM. In Groundhog Day, Phil can get the same behavior out of RitaLLM day after day, as long as he prompts her identically.
Similarly, and unlike a Google search, the results of a prompt to a particular ChatGPT model will be the same today as they were three months ago or will be three months from now (apart from a small component of randomness in how the model picks each word).
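That “frozen model, random sampling” idea can be sketched in a few lines of Python. Everything here is made up for illustration: a real model computes a probability distribution over tens of thousands of tokens, while this toy hard-codes one tiny distribution for one imaginary prompt. The point is just that the distribution itself is fixed at training time (unlike a search index, it doesn’t change next month), so only the sampling step varies between runs:

```python
import random

# A fake, hard-coded next-word distribution for one prompt.
# In a real LLM this distribution is computed by the trained model;
# crucially, it is frozen at training time and does not update
# the way a search engine's index does.
NEXT_WORD_PROBS = {"peace": 0.7, "the groundhog": 0.2, "us": 0.1}

def sample_reply(seed=None):
    """Sample one 'reply' from the frozen distribution.

    The randomness lives entirely in this sampling step: with a
    fixed seed the output is identical every time; with no seed
    you get run-to-run variation from the same distribution.
    """
    rng = random.Random(seed)
    words = list(NEXT_WORD_PROBS)
    weights = list(NEXT_WORD_PROBS.values())
    return rng.choices(words, weights=weights)[0]
```

With the same seed, `sample_reply` returns the same word every time, which is the loose analogue of Phil getting identical behavior out of RitaLLM whenever he prompts her identically.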
One small note: OpenAI did teach each model some facts about itself on top of its main training dataset… but not very thoroughly! Which leads to some potentially confusing contradictions if you don’t understand what’s going on:
These questions about release dates of GPT-3.5 and how to use GPT-4 are questions that you should ask a search engine (like Google), not an LLM (like ChatGPT), because they pertain to events after Sept 2021.
(Now, I know some of you are thinking, “But what about Bing?” That is exactly the right question to ask, and you’re just going to have to wait til I get to that in another post :-) )
If you ask ChatGPT anything else about current events, you’re gonna be similarly out of luck. That said, it’s usually pretty good about adding a disclaimer about its knowledge cutoff. E.g.:
RitaLLM and ChatGPT’s “context windows”
Along with a knowledge cutoff date, RitaLLM and ChatGPT have another similarity: For each, there is a significant constraint on how much prompting you can do to get your desired outcome. In Groundhog Day, that constraint is a single calendar day. RitaLLM remembers everything that happened during the current iteration of Feb 2, but nothing that happened on previous iterations of Feb 2.
In other words, Phil has only one day to fit in all of his prompts to get the desired output (sex, ugh) out of RitaLLM.
In technical parlance, we would say that one day’s worth of interactions and conversation is RitaLLM’s “context window.” That’s all she can remember on top of the base RitaLLM model that was trained on Rita’s life experience up through February 1.
ChatGPT’s context window is measured in tokens, which for the time being you can think of roughly as words (though some words consist of more than one token). On top of what it learned during training, GPT-4 can only remember the most recent 8,000 “tokens”1 (more or less 6,000 words) of conversation with you, and only from within your current chat tab. (GPT-3 and GPT-3.5 can only remember the most recent 4,000 tokens.)
Now, 8,000 tokens is actually a lot of text. I have yet to try to use ChatGPT to do something that requires a context window that large. But if you want to use ChatGPT to read and summarize something book-length, or for that matter help you write something book-length, you’ll have to use workarounds.
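To make the context window concrete, here’s a toy Python sketch. Everything in it is an illustration, not how ChatGPT actually works under the hood: a real tokenizer splits text into subword tokens rather than whitespace-separated words, and real chat interfaces can be smarter about what they keep. The core idea is just that once the conversation exceeds the token budget, the oldest messages fall out of the model’s “memory”:

```python
# Toy illustration of a context window: the model only "sees" the
# most recent messages that fit within a fixed token budget.
# NOTE: real tokenizers produce subword tokens; counting
# whitespace-separated words here is a rough stand-in.

def count_tokens(text: str) -> int:
    """Very rough token estimate: one 'token' per word."""
    return len(text.split())

def fit_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose total 'token' count fits
    within max_tokens; everything older is simply forgotten."""
    kept = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # this message and everything older falls out of memory
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

conversation = [
    "Hi, I ordered a vermouth with a twist.",  # oldest
    "What should we drink to?",
    "To world peace!",                         # newest
]
# With a tiny 10-"token" window, the oldest message is dropped:
print(fit_to_context(conversation, max_tokens=10))
# → ['What should we drink to?', 'To world peace!']
```

In the analogy, `max_tokens` is RitaLLM’s single calendar day: anything that didn’t happen within the current window might as well never have happened.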
Be like Phil Connors (but less creepy)
Our Groundhog Day analogy doesn’t hold up in every regard, of course. Unlike Phil with RitaLLM, your goals are different each time you interact with ChatGPT; you’re not spending every single new chat as a do-over trying to get to a single perfect output. So it’s much less important for you to remember the details of any particular successful prompt, and more important to see patterns in what kinds of prompting work better than others to get your desired outcomes.
Also, ChatGPT is vastly more flexible and credulous than Rita: Phil has to work with a specific, relatively unmalleable human personality, one who doesn’t believe him when, for instance, he tries to tell her that he’s repeating the same day over and over. ChatGPT, on the other hand, will believe most things you tell it, and it will adopt a very wide range of personalities on demand.
But I do think we can all learn from Phil’s enthusiasm and creativity as a prompt engineer. Figuring out how you work best with ChatGPT takes some trial and error — and it’s worth it!
GPT-4 actually comes in two different token-limit variants: 8K and 32K. Since OpenAI calls 8K the “standard,” I assume that ChatGPT’s interface to GPT-4 is limited to 8K tokens, though I haven’t found a specific reference for that. (Note that GPT-3 comes with a 4K token limit.)