Exploring the GPT-2 Language Model

Mar 1, 2019 00:00 · 658 words · 4 minutes read

OpenAI recently released GPT-2, a language model with impressive performance. It generates fluent sentences by repeatedly predicting the next word given all the previous words.

GPT what?

GPT-2 (Generative Pre-Training) is a language model based on the transformer architecture. Unlike the original encoder-decoder transformer, it uses only a stack of decoder blocks, in which self-attention lets each position look at the words that came before it. OpenAI chose not to release their largest model, with over a billion parameters, but they did release a smaller version on GitHub. You can download the code and the pretrained weights to explore the magic of GPT-2.
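
To make the idea of "predict the next word, append it, repeat" concrete, here is a minimal sketch using a toy bigram counter. This is not GPT-2's code: the corpus and the names train_bigram and generate_greedy are made up for illustration, and the toy only looks at the last word, whereas GPT-2 conditions on all previous words.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data (hypothetical, not from GPT-2).
CORPUS = "the cat sat on the mat . the cat ate the fish ."


def train_bigram(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate_greedy(counts, start, max_words=5):
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never followed by anything
        out.append(followers.most_common(1)[0][0])
    return out


model = train_bigram(CORPUS)
print(" ".join(generate_greedy(model, "the")))
```

Always picking the single most likely word tends to loop quickly on a corpus this small, which is one reason real text generators usually sample from the predicted distribution instead.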

Unconditioned Samples

Run without any input, GPT-2 simply spits out random sentences. Here is one example.

RBP and D SheepER is the world’s most popular bedding and shower curtains and cooler kit. For days! ABOUT RBP - Even some of the most sought-after enthusiast components of RAPIDS features the real world RCP coming soon! VR Machines will always overlook the classic days of all time, as RCP and D sheepers enjoy bliss and have a refreshed baby-taking consciousness!

The model does capture a somewhat correct grammatical structure and punctuation, but the content makes no sense at all.

Interactive Prompt

Merely generating random sentences is kinda boring. Given a pre-written prompt, GPT-2 can generate more meaningful content and even answer questions.
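
As far as I understand it, the only difference between an unconditioned sample and a prompted one is the starting context: the model either begins from a start marker or continues from the words you give it. Here is a rough sketch of that idea with a toy bigram sampler. The sentences, the "<s>" marker, and the train and sample functions are all hypothetical stand-ins, not GPT-2's actual interface.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; "<s>" marks the start of a sentence.
SENTENCES = [
    "<s> the model writes text",
    "<s> the model answers questions",
    "<s> a prompt guides the model",
]


def train(sentences):
    """Count word-to-next-word transitions across all sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, prompt=None, max_words=6, rng=None):
    """Sample a continuation; with no prompt, start from the "<s>" marker."""
    rng = rng or random.Random()
    out = prompt.split() if prompt else ["<s>"]
    for _ in range(max_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing ever followed this word in the corpus
        words, weights = zip(*followers.items())
        # Draw the next word in proportion to how often it followed the last one.
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(w for w in out if w != "<s>")


counts = train(SENTENCES)
rng = random.Random(0)
print(sample(counts, rng=rng))                     # unconditioned
print(sample(counts, prompt="a prompt", rng=rng))  # conditioned on a prompt
```

Because the next word is drawn at random rather than always being the top choice, running the same prompt several times can give different continuations.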

I gave GPT-2 the prompt "Zihao Zhang is the most" to see how it would describe me and got surprisingly good results.

Zihao Zhang is the most interesting person on YouTube and is in many ways the most interesting character on the online streaming service. If you are curious to know more about his character, check out his official YouTube channel: http://www.youtube.com/user/XuL_Zihao

Zihao Zhang is the most respected person in the province. He’s not a person who’s been to Beijing before, but still has an impressive resume.

Zihao Zhang is the most successful Chinese restaurant owner in the city of Beijing, but he also recently won the prestigious New York “Gina” award over his culinary abilities.

It almost sounds like GPT-2 knows who Zihao is and is giving a legitimate description of him. Interestingly, the YouTube channel given by GPT-2 does not exist. That means GPT-2 is not just copy-pasting content; it has picked up the connection between a person’s name and the structure of common YouTube channel URLs. In the next two examples, GPT-2 made the connection that a person named Zihao Zhang is probably related to Beijing in some way (Spooky! I was born in Beijing). Sometimes the model started using Zhao or Zhu as my name in the sentences, so I guess it is just saying correlated things and fails to understand that a subject’s name shouldn’t change.

I also tried a more mathematical prompt, "one plus one equals":

one plus one equals one plus one (one minus one)

The same question in an alternative representation, "1 + 1 =":

1 + 1 = [0.134814]

I guess GPT-2, at least the small version, can’t really add numbers.

Most of the time, the first few sentences following the prompt kind of make sense. However, the longer the paragraph gets, the more unrelated to the prompt it becomes. So it seems like GPT-2 still can’t capture long-term dependencies very well.

Bonus Prompt: Donald Trump is

Donald Trump is to be found among the alt-right, and I am confident that his influence will not be diminished by his use of the term.

GPT-2 made the connection between Donald Trump and the alt-right, Lulz.


GPT-2 is definitely a big step forward in language modeling and machine learning. The text generated by the unreleased large model seems to be a lot better than what I got from the small model I tested. We still have a long way to go to achieve some form of AGI, but it feels like we are getting closer than ever. If you want a more in-depth review of the philosophical implications of GPT-2, I wholeheartedly recommend Scott Alexander’s article.