Generation of Humorous Caption for Cartoon Images Using Deep Learning

Document Type


Degree Name

Master of Science (MS)


Computer Science

Date of Award

Spring 2018


Humor is complex, and equipping artificial intelligence programs with the ability to recognize or generate it remains an open area of research. Researchers have identified linguistic features of humor. In this project, we applied those features to the construction and generation of humorous sentences with deep neural networks. Given the recent success of deep learning methods in computer vision, we combined deep learning with linguistics to generate humorous natural language. To this end, we trained a neural network model on a dataset of humorous captions submitted to the New Yorker caption contest, with the corresponding cartoon image as an additional input. The output is a new variant of the humorous caption created by the program. The captions were predicted by a long short-term memory (LSTM) recurrent neural network, while the cartoon images were fed into a parallel pre-trained convolutional network to extract image features. We used semi-supervised learning to help the program learn features from the example funny captions, and then used the trained model to generate a humorous variant for a given cartoon image. The technique was sequence-to-sequence learning, in which the preceding sequence of words is the input used to predict the most likely next word of the caption.
In addition, we created a separate classifier that learned to classify captions as funny or unfunny. We used a convolutional neural network (CNN) for this task. CNN architectures are used predominantly in computer vision, but recent results have shown that they can also achieve accurate classification performance in the natural language domain. The evaluation model is a binary classifier trained on the humorous captions submitted to the New Yorker caption contest and on general descriptive English sentences. We showed that our classifier achieves acceptable accuracy and that the captions generated by our generator network were classified as humorous (M = 0.847).
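
The next-word prediction setup described in the abstract can be illustrated with a minimal sketch of how a single caption might be unrolled into training pairs. This is not the project's actual preprocessing: the function name, the whitespace tokenization, and the `<start>`/`<end>` boundary tokens are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical boundary tokens; the original project may have used others.
START, END = "<start>", "<end>"


def next_word_pairs(caption: str) -> List[Tuple[List[str], str]]:
    """Unroll one caption into (previous words, next word) training pairs."""
    # Naive lowercase whitespace tokenization, assumed for this sketch only.
    tokens = [START] + caption.lower().split() + [END]
    # Each prefix of the caption is paired with the word that follows it.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


pairs = next_word_pairs("I told you so")
for prefix, target in pairs:
    print(prefix, "->", target)
```

In a full captioning model, each prefix would typically be encoded as integer indices and combined with the image features from the convolutional network before being passed to the LSTM, which is trained to output the target word.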


Derek Harter

Subject Categories

Computer Sciences | Physical Sciences and Mathematics