Natural Language Generation - Generating Puns

Natural Language Generation (NLG) is characterised as “the sub-field of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information” (Reiter & Dale, 1997). NLG is one of the earliest topics to attract research interest in computational linguistics.

Following are the main tasks an NLG system typically performs:

  1. Content determination: Determining what kind of information to include while preparing a document
  2. Text structuring: Determining the sequence in which information is supposed to appear
  3. Sentence aggregation: Deciding information which should be displayed together
  4. Lexicalisation: Finding the right words and phrases to express information
  5. Referring expression generation: Selecting the words and phrases to identify domain objects
  6. Linguistic realisation: Combining all words and phrases into well-formed sentences

All of these tasks are fairly complex, and most applications usually involve a combination of, or all of, the above-mentioned areas.

So, in this blog post I’ll be talking about generating puns using Natural Language Generation, which touches on parts of all the tasks stated above.

Generating puns falls under the field known as computational humor, which is a branch of computational linguistics focused on humor research (there have also been dedicated conferences for this field!).

For those of you who don’t know what puns are, here are a few punny ones:

I was going to make myself a belt made out of watches, but then I realized it would be a waist of time.

What do you get when you cross a murderer with a breakfast food? A cereal killer.

As illustrated in the above examples, making puns involves using the fact that there are different meanings for a given word or that there are words that sound similar but hold different meanings.

While there are currently a few more innovative and truly “natural” methods for generating puns, the most basic and popular method for generating sentences (and puns) is the template approach, in which the main structure of the sentence is fixed, while the fillers are supplied by the generation model being used. For example, a template might take the following form: “The population of [country] is [number]“, where [country] and [number] are filled in by the model at run time. It seems tedious and a little less glamorous, but a large part of the amazing results we see or hear are based on such rule-based methods.
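As a toy illustration (my own sketch, not code from any of the systems mentioned here), the template approach boils down to a fixed sentence frame with slots filled at run time. The template, slot names, and facts table below are all invented for the example:

```python
# A minimal sketch of the template approach: the sentence frame is fixed
# and the bracketed slots are filled in at run time. The facts table and
# its values are made up for illustration.
TEMPLATE = "The population of {country} is {number}"

FACTS = {
    "Iceland": "370,000",
    "India": "1.4 billion",
}

def realise(country, facts=FACTS):
    """Fill the template's slots for the given country."""
    return TEMPLATE.format(country=country, number=facts[country])

print(realise("Iceland"))  # The population of Iceland is 370,000
```

Real template systems add rules on top of this (agreement, slot constraints, and so on), but the core mechanism is exactly this slot-filling step.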

One of the most popular models developed in this field is JAPE (Joke Analysis and Production Engine). JAPE is designed to output question-and-answer type puns from a general lexicon. While seemingly simple at first, the model is fairly complex, with a number of rules used to come up with a valid question and a sound answer. An example output of this program is:

Q. What do you call a cry that has pixels? A. A computer scream

The original paper on this model by Kim Binsted and Graeme Ritchie can be found here.

For the sake of this post, I will only be considering the case of generating words that sound similar to those in an input sentence, while still making sure that the sentence makes sense. This is the most basic method, and it can be built upon to make more complex mechanisms.

The actual idea is pretty simple, and the code is similar to the one used by Max Schwartz for his talk at PyGotham 2017, which I’ve linked below. I take a sentence as input and, barring the stop words, for every other word I look through a list for words with a similar pronunciation and the same POS tag, and then replace it with the best fit. Only those words are considered whose edit distance from the word being replaced is below a threshold. Varying this threshold changes how similar the new sentence sounds to the old one.
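The steps above can be sketched end to end. This is my own toy version, not the code from the original talk or notebook: a four-word hand-rolled lexicon (ARPAbet-style phonemes plus a POS tag) stands in for the full CMU dictionary, and the edit distance is computed over phoneme sequences rather than letters:

```python
# Toy stand-in for the CMU Pronouncing Dictionary: word -> (phonemes, POS tag).
# The entries and tags here are hand-picked purely for this example.
LEXICON = {
    "serial": (["S", "IH", "R", "IY", "AH", "L"], "JJ"),
    "cereal": (["S", "IH", "R", "IY", "AH", "L"], "JJ"),
    "killer": (["K", "IH", "L", "ER"], "NN"),
    "tiller": (["T", "IH", "L", "ER"], "NN"),
}
STOP_WORDS = {"i", "am", "a", "the"}

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
            prev = cur
    return dp[-1]

def best_fit(word, threshold=2):
    """Closest-sounding different word with the same POS tag, if any."""
    if word not in LEXICON:
        return word
    phones, pos = LEXICON[word]
    candidates = [(edit_distance(phones, p), w)
                  for w, (p, t) in LEXICON.items()
                  if w != word and t == pos]
    dist, repl = min(candidates, default=(threshold + 1, word))
    return repl if dist <= threshold else word

def punnify(sentence):
    """Replace every non-stop word with its best-sounding substitute."""
    return " ".join(w if w in STOP_WORDS else best_fit(w)
                    for w in sentence.lower().split())

print(punnify("I am a serial killer"))  # i am a cereal tiller
```

Raising the threshold in `best_fit` lets in replacements that sound less like the original word, which is exactly the knob described above.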

The next step was to generate a list containing the pronunciation and the most likely POS tag for each word in our dictionary. This was done using the CMU Pronouncing Dictionary, which transcribes words using the ARPABET phoneme set. Storing the POS tag helps in picking similar-sounding words that still fit the same grammatical slot as the word we try to replace.

And that is pretty much it. With a few lines of code, I was happy with the results I was getting. An example:

Input: I am a serial killer.

Output: I am a cereal tiller.

The implementation was pretty simple, and hence the output is, most of the time, not that great. Of course, increasing the complexity can improve the kind of outputs we get.

The iPython notebook accompanying this post can be found here.

References:

  1. Building a “Pun Generator” in Python - https://www.youtube.com/watch?v=6gJKxe5zPXM
  2. PyGotham Talks - https://github.com/M0nica/PyGotham-2017-Talks
  3. An implementation - https://github.com/maxwell-schwartz/PUNchlineGenerator
  4. Computational Humor Seminar - https://www.cse.iitb.ac.in/~vipulsingh10/me/ComputationalHumourSeminar.pdf
Ritik Dutta
Computer Science & Engineering Undergraduate