Using Python and the IMDb API to find the best Star Trek episode

The new Star Trek series, Discovery, is just around the corner. It’s time to look back at the best episodes of the franchise so far. This tutorial shows you how to use the IMDb API to get the answers yourself.

What’s the best Star Trek episode?

TL;DR: Star Trek: The Next Generation, The Best of Both Worlds: Part 1. The full top 10 by IMDb rating is presented in the table below.

Why did I use IMDb ratings? Well, this is the only database with ratings for every single episode. And there are quite a few raters: each episode in the table below was rated by over 1000 people.

Episode	Series	average IMDb rating
The Best of Both Worlds: Part 1	TNG	9.4
The Inner Light	TNG	9.4
The Best of Both Worlds: Part 2	TNG	9.3
Trials and Tribble-ations	DS9	9.3
In the Pale Moonlight	DS9	9.3
The City on the Edge of Forever	TOS	9.3
Yesterday’s Enterprise	TNG	9.2
Mirror, Mirror	TOS	9.2
The Measure of a Man	TNG	9.1
The Visitor	DS9	9.1

Many trekkies won’t be surprised to hear that Star Trek: The Next Generation dominates the top 10. This series is generally regarded as the high point of Star Trek television. However, I was shocked to find that the worst Star Trek episode is from The Next Generation, too: Shades of Gray.

This made me wonder what the best Star Trek series is. From the table below, you can see that the five Star Trek series so far are extremely similar in terms of the average episode rating, whether measured by the mean or the median. However, this does not match how viewers rate each series as a whole. By that measure, The Next Generation is king.

Series	mean IMDb episode rating	median IMDb episode rating	IMDb series rating
The Original Series	7.5	7.5	8.4
The Next Generation	7.4	7.4	8.6
Deep Space Nine	7.5	7.5	7.9
Voyager	7.4	7.3	7.7
Enterprise	7.7	7.6	7.5
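The per-series mean and median in the table above are the kind of summary a pandas groupby produces. The following is a minimal sketch, assuming a data frame with one row per episode and the columns 'title' (the series) and 'rating', as in the plotting code further down. The series names and ratings here are invented toy values, not IMDb data.

```python
import pandas as pd

# Invented toy data: only the column names 'title' and 'rating'
# follow the post's data set, the values are made up.
toy = pd.DataFrame({
    'title': ['Series A', 'Series A', 'Series A',
              'Series B', 'Series B', 'Series B'],
    'rating': [3.0, 8.0, 9.5, 6.8, 6.9, 7.0],
})

# One row per series, one column per summary statistic.
summary = toy.groupby('title')['rating'].agg(['mean', 'median'])
print(summary.round(2))
```

In this toy example the two series have nearly the same mean although one of them contains both a very bad and a very good episode, which is why a single average says little about how a series is remembered.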

I suspect that a series, in order to be remembered as ‘generally good’, has to include a few stellar episodes. Fandom will forgive a few atrocious episodes but it won’t forgive unending mediocrity. The figure below shows the distribution of episode ratings. In such a density plot, a high vertical value corresponds to many ratings around this point, similar to a histogram. It turns out that all the worst episodes are from The Next Generation. Fans forgave these mishaps, apparently.

The mediocrity of Voyager is well illustrated by the high peak just above 7 and no very bad or indeed very good episodes. This certainly mirrors my own experience: Voyager was not bad enough to tune out but gave you very little reason to really get engaged.
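To make the idea of a density plot concrete, here is a small sketch of a kernel density estimate in plain Python. It assumes a Gaussian kernel with a fixed, hand-picked bandwidth, which is not necessarily what ggplot's geom_density does internally. The ratings are invented toy values and the function name gaussian_kde is just a label chosen for this illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a point x.

    Every sample contributes a small Gaussian bump; the estimate at x
    is the average height of all bumps at that point.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Invented toy ratings: a cluster just above 7 and one outlier.
toy_ratings = [7.0, 7.1, 7.2, 7.2, 7.3, 7.4, 8.9]
density = gaussian_kde(toy_ratings, bandwidth=0.3)

# The curve is high where many ratings sit and low where there are few.
for x in (5.0, 7.2, 8.9):
    print(x, round(density(x), 3))
```

A narrow, tall peak like the toy cluster above is the shape described for Voyager: many episodes with nearly the same rating and hardly anything in the tails.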

Data acquisition

The data acquisition for this post was very similar to that of my previous Star Trek post. So, I won’t go into details here and simply link to the data acquisition script and the resulting data set on GitHub. The script is well commented. Should you nonetheless have a question, just leave a comment.

Data visualisation: using ggplot and matplotlib

As in my previous Star Trek post, I use Python 2.7 and start by loading modules and data:

    import matplotlib.pyplot as plt  # for plotting
    from matplotlib.cbook import get_sample_data  # for adding image to plot
    from matplotlib.offsetbox import (OffsetImage, AnnotationBbox)
    from ggplot import *  # for plotting
    import pandas as pd  # for plotting with ggplot
    import datetime as dt  # for date handling

    # load data (raw string so that the backslashes are kept; adjust to your own path)
    df = pd.read_csv(r'C:\Users\Desktop\python\IMDb_analyses\Star Trek\Star_Trek_data.csv')
    df['date'] = pd.to_datetime(df['date'])

For the scatter plot (first figure above), I start with ggplot for the basic figure without images.

    p = ggplot(aes(x='date', y='rating', colour='title'), data=df) + geom_point() + theme_bw()  # basic plot
    p = p + ylim(1, 10) + scale_x_date(labels='%Y') + xlab('Date') + ylab('Mean IMDb rating')  # make axes pretty
    p = p + ggtitle('Star Trek episode ratings')  # add title

Then, I export the figure to use it with matplotlib which allows me to add graphics. First, I add my author tag.

    p.make()  # exporting the figure to use it in matplotlib
    plt.text(dt.datetime(2003, 1, 1), 1.2, '@ri')  # keep figure open for this to work

Then, I add starships to show which colour corresponds to which series. I found the starship images online and put them on my GitHub for your convenience. Just download them, put them in a folder and add the path in line 2 of the function.

    def add_starship(ax_, ship, xy, imzoom):
        fn = get_sample_data("PATH TO IMAGE FOLDER/" + ship + ".jpg", asfileobj=False)
        arr_img = plt.imread(fn, format='jpg')
        imagebox = OffsetImage(arr_img, zoom=imzoom)
        imagebox.image.axes = ax_
        ab = AnnotationBbox(imagebox, xy,
                            xybox=(0., 0.),
                            boxcoords="offset points",
                            pad=-0.5)  # hide enclosing box behind image
        ax_.add_artist(ab)
        return ax_

    ax = plt.gca()  # get current axes (axes is like the drawing area apparently)
    ax.legend_.remove()  # remove legend

    # add images of star ships
    add_starship(ax, 'TOS', [dt.datetime(1972, 1, 1), 2], 0.1)
    add_starship(ax, 'TNG', [dt.datetime(1989, 1, 1), 2.5], 0.1)
    add_starship(ax, 'DS9', [dt.datetime(1994, 1, 1), 2], 0.15)
    add_starship(ax, 'VOY', [dt.datetime(1999, 6, 1), 2.6], 0.1)
    add_starship(ax, 'ENT', [dt.datetime(2002, 1, 1), 2], 0.1)

Now, we can just annotate the three episodes which stand out and we are nearly done.

    # most delayed episode
    max_TOS_date = max(df[df['title'] == 'Star Trek']['date'])
    max_TOS_rating = df[df['date'] == max_TOS_date]['rating'].iloc[0]  # scalar instead of a Series
    ax.annotate('That TOS episode \nwhich was not aired', xy=(max_TOS_date, max_TOS_rating),
                xytext=(dt.datetime(1975, 1, 1), 8),
                arrowprops=dict(facecolor='black', shrink=0.05))

    # worst episode
    min_rating = min(df['rating'])
    episode_date = min(df[df['rating'] == min_rating]['date'])  # extract time stamp with min function
    episode_name = df[df['rating'] == min_rating]['episode']
    ax.annotate('The worst episode:\n' + episode_name.iloc[0], xy=(episode_date, min_rating),
                xytext=(dt.datetime(1980, 1, 1), 4),
                arrowprops=dict(facecolor='black', shrink=0.05))

    # best episode
    max_rating = max(df['rating'])
    episode_date = min(df[df['rating'] == max_rating]['date'])  # extract time stamp with min function
    episode_name = df[df['rating'] == max_rating]['episode']
    ax.annotate('The best episode:\n' + episode_name.iloc[0], xy=(episode_date, max_rating),
                xytext=(dt.datetime(1975, 1, 1), 9.7),
                arrowprops=dict(facecolor='black', shrink=0.05))
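The best and worst episodes looked up above can also be fetched as whole rows with idxmax and idxmin, which may be a little easier to read. This is only a sketch on invented toy data; the column names 'episode', 'date' and 'rating' follow the post's data set, everything else is made up.

```python
import pandas as pd

# Invented toy data with the same column names as the post's data set.
toy = pd.DataFrame({
    'episode': ['Ep A', 'Ep B', 'Ep C'],
    'date': pd.to_datetime(['1990-01-01', '1991-06-01', '1992-03-15']),
    'rating': [7.1, 9.4, 3.4],
})

# idxmax / idxmin return the index label of the first extreme value,
# .loc then fetches the complete row for that label.
best = toy.loc[toy['rating'].idxmax()]
worst = toy.loc[toy['rating'].idxmin()]

print(best['episode'], best['date'].date(), best['rating'])
print(worst['episode'], worst['date'].date(), worst['rating'])
```

With ties, idxmax and idxmin pick the first matching row, so the result depends on the row order of the data frame.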

Finally, we optimize the figure dimensions to social media standards and save it all.

    fig = plt.gcf()  # get current figure
    fig.set_size_inches(1024 / 70, 512 / 70)  # reset the figure size to twitter standard
    fig.savefig('PATH TO FOLDER/IMAGE NAME.png', dpi=96,
                bbox_inches='tight')
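As a side note on the figure size: without cropping, matplotlib saves a figure with a pixel size of inches times dpi. The sketch below shows that arithmetic in plain Python. The helper inches_for_pixels is a hypothetical name introduced here, and the 1024 x 512 pixel target is only an assumption about what "twitter standard" means.

```python
def inches_for_pixels(width_px, height_px, dpi):
    """Figure size in inches that gives the requested pixel size at this dpi."""
    return width_px / float(dpi), height_px / float(dpi)


# Assumed target: a 1024 x 512 pixel image saved with dpi=96.
width_in, height_in = inches_for_pixels(1024, 512, 96)
print(width_in, height_in)

# For comparison: under Python 2, 1024 / 70 and 512 / 70 are integer
# divisions. They are written with // here so that the result is the
# same under Python 3.
print((1024 // 70) * 96, (512 // 70) * 96)
```

Note that bbox_inches='tight' crops the saved image to its content, so the final pixel size can differ from this calculation in either case.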

Regarding the first table, the pandas module makes it very easy to get the best ten episodes. Just call:

    df.sort_values(by='rating', ascending=False)[:10]  # sort_index(by=...) no longer works in newer pandas
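For illustration, here is the same kind of top-N selection on a tiny invented data frame, once by sorting and once with nlargest. The column names 'episode' and 'rating' follow the post's data set, while the episode names and values are made up.

```python
import pandas as pd

# Invented toy data, not IMDb ratings.
toy = pd.DataFrame({
    'episode': ['Ep A', 'Ep B', 'Ep C', 'Ep D'],
    'rating': [7.1, 9.4, 3.4, 8.2],
})

# Sort by rating, highest first, and keep the first two rows.
top2 = toy.sort_values(by='rating', ascending=False).head(2)

# nlargest gives the same rows without sorting the whole frame by hand.
same = toy.nlargest(2, 'rating')

print(top2)
print(same)
```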

We follow the exact same strategy for the density plot. We first make a density plot with ggplot, then turn to matplotlib to add images of starships. This should now be relatively straightforward.

    # start plotting using ggplot
    p = ggplot(aes(x='rating', colour='title'), data=df) + geom_density() + theme_bw()  # basic plot
    p = p + xlim(1, 10) + xlab('IMDb rating') + ylab('Density')  # make axes pretty
    p = p + ggtitle('Star Trek episode ratings')  # add title
    print(p)  # in case you want to see what has been made in ggplot

    # continue with matplotlib for annotations and adding images
    p.make()  # exporting the figure to use it in matplotlib
    plt.text(9.5, 0, '@ri')  # keep figure open for this to work

    ax = plt.gca()  # get current axes (axes is like the drawing area apparently)
    ax.legend_.remove()  # remove legend

    # add images of star ships
    add_starship(ax, 'TOS', [6.85, 0.52], 0.05)
    add_starship(ax, 'TNG', [5.38, 0.2], 0.08)
    add_starship(ax, 'DS9', [8.305, 0.52], 0.15)
    add_starship(ax, 'VOY', [7, 0.58], 0.1)
    add_starship(ax, 'ENT', [9.05, 0.4], 0.05)

    fig = plt.gcf()  # get current figure
    fig.set_size_inches(1024 / 70, 512 / 70)  # reset the figure size to twitter standard
    fig.savefig('PATH TO FOLDER/IMAGE NAME.png', dpi=96,
                bbox_inches='tight')
