Game of Thrones

The prominence (and end) of the top 15 female and male characters in Game of Thrones over 8 seasons

Game of Thrones is an extremely successful series that has been the subject of several detailed studies of its content and portrayals. I am usually interested in the depiction of autistic and disabled characters, but Game of Thrones offers a great opportunity to compare my own techniques and visualizations with those created by others.

The speech rate in words per minute, as determined from the number of words and episode duration in fan-made subtitle (.SRT) files

There are a lot of words in Game of Thrones – almost half a million in total (including speaker identification, cuts and scene descriptions). One interesting analysis by mrquart identified a significant fall in the word rate, even though the episode lengths rose over the 8 seasons. I downloaded subtitles (from www.opensubtitles.com) to verify the word structures in the fan-made scripts used for the later analyses here, and replicated mrquart’s findings. The following commands approximate the number of words and the duration of each episode (assuming that the last subtitle occurs near the end of the episode, and that every sequence of characters surrounded by whitespace is a “word”):

# Return the time-code of the last subtitle in each .SRT file:
for file in *.srt ; do echo "$file: " ; grep -E "^[0-9]+:" "$file" | sort | tail -1 ; done

# Return the line, word and character count in each .SRT file:
for file in *.srt ; do wc=$(sed -e 's/\r//' -e 's/<[^>]*>//g' "$file" | grep -v -E -e '^[0-9]+$' -e '^[0-9]+:' -e '^$' | wc) ; echo "$wc $file"; done
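Dividing the word count by the duration in minutes gives the speech rate. The same arithmetic can be sketched in Perl for a single file, under the same assumptions as the commands above (the end time of the last subtitle stands in for the episode length, and every whitespace-separated token counts as a word). The subroutine name `srt_words_per_minute` is hypothetical, not part of any tool used here.

```perl
use strict;
use warnings;

# Hypothetical helper: approximate words per minute for the text of one .SRT file.
# Duration = latest subtitle end time; words = whitespace-separated tokens in
# the subtitle text, after index lines, timecodes and <tags> are removed.
sub srt_words_per_minute {
    my ($srt) = @_;
    my ($words, $seconds) = (0, 0);
    foreach my $line (split /\r?\n/, $srt) {
        # Timecode line, e.g. "00:52:13,120 --> 00:52:15,400": keep the end time
        if ($line =~ /-->\s*(\d+):(\d+):(\d+)[,.](\d+)/) {
            my $t = $1 * 3600 + $2 * 60 + $3 + $4 / 1000;
            $seconds = $t if $t > $seconds;
            next;
        }
        next if $line =~ /^\s*\d+\s*$/;    # subtitle index number
        $line =~ s/<[^>]*>//g;             # formatting tags such as <i>
        my @tokens = split ' ', $line;
        $words += @tokens;
    }
    return $seconds > 0 ? $words / ($seconds / 60) : 0;
}

# Print the rate for each file named on the command line
foreach my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $srt = do { local $/; <$fh> };
    close $fh;
    $srt =~ s/^\xEF\xBB\xBF//;             # drop a UTF-8 byte-order mark, if any
    printf "%s\t%.1f\n", $file, srt_words_per_minute($srt);
}
```

Saved under a name of your choosing (say srt_wpm.pl), `perl srt_wpm.pl *.srt` would print one tab-separated rate per file.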

Looker produced a detailed statistical analysis of the (lack of) prominence of women on screen throughout Game of Thrones, including a juicy study of Death, Sex and Dialogue. As with most television, this show is produced by (older, white) men, for consumption by (older, white) men because those are the dominant characteristics of the industry and presumed characteristics of viewers with spending power. Other ethnic groups, disabled people and many others are, as with women, unequal voices in writing, creating, delivering and consuming media.

A further fascinating study by Ceretai used automatic recognition of speech to classify female and male screen time, with great promise for monitoring (im)balance in broadcasting. Their results on how much women speak in the show were discussed along with Looker’s by the BBC and NotifyNaija.

The proportion of speech uttered by female and male speakers throughout Game of Thrones

Using fan-made scripts adds a whole new dimension to the text in subtitles, because every utterance is preceded by the character’s name. The name is often an abbreviation, nickname or variant, so some work is necessary to combine aliases into single characters. Style also varies somewhat between the authors of the fan-made scripts, and the scripts are not quite complete to the end of Season 8. A short Perl script is reproduced below to generate character word counts by episode from a script file.

I identified the gender of the top 300 or so speakers from their names, actor names and fan wiki pronouns. These speakers accounted for more than 95% of all the spoken words in all 8 seasons, allowing the word count and vocabulary to be categorised fairly accurately by gender. Speech by women varied between 22.3% of the words in Season 1 and 34.8% in Season 7, and fell as low as 13.7% in Episode 9 of Season 4. The largest single-episode female share of speaking time was 47.9%, in Episode 5 of Season 4, which means that throughout a run of 69 episodes women never had an equal share of speaking time, and never exceeded men.
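The shares above come from summing each season's words by the gender of the speaker. A minimal Perl sketch of that arithmetic, assuming the CSV written by the wordcount script at the end of this post plus a gender lookup table. The `%gender` entries and the name `female_share_by_season` are hypothetical, and speakers missing from the table are simply left out of the share.

```perl
use strict;
use warnings;

# Hypothetical gender lookup (speaker => 'F' or 'M'); a real table would cover
# the ~300 most vocal speakers. Speakers not listed are left out of the share.
my %gender = (
    'Cersei Lannister' => 'F',
    'Tyrion Lannister' => 'M',
);

# Female percentage of gendered words per season, from CSV rows written by
# the wordcount script: "s1","e01","Speaker",Lines,Words
sub female_share_by_season {
    my ($gender, @rows) = @_;
    my (%female, %total);
    foreach my $row (@rows) {
        my ($season, $episode, $speaker, $lines, $words) =
            $row =~ /^"([^"]*)","([^"]*)","([^"]*)",(\d+),(\d+)/ or next;
        my $g = $gender->{$speaker} or next;    # skip speakers of unknown gender
        $total{$season}  += $words;
        $female{$season} += $words if $g eq 'F';
    }
    my %share;
    foreach my $s (keys %total) {
        $share{$s} = 100 * ($female{$s} // 0) / $total{$s} if $total{$s} > 0;
    }
    return \%share;
}

# Made-up rows, for illustration only
my $share = female_share_by_season(\%gender,
    '"s1","e01","Tyrion Lannister",10,300',
    '"s1","e01","Cersei Lannister",5,100',
    '"s1","e01","Guard",1,50');
printf "%s: %.1f%%\n", $_, $share->{$_} for sort keys %$share;
```

The same loop keyed on season and episode together would give the per-episode shares.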

(You might compare the low share of on-screen time with the excessive share of discussion about female characters, and the excessive share of vitriol and scrutiny that female characters were subjected to online; this was especially apparent following the end of the final season.)

The wordcount (across all 8 seasons) of the most vocal characters

Despite women's low (29.5%) share of screen dialogue, some of the most prominent characters are female. Cersei Lannister is the second-most prominent speaker, after the leading character, Tyrion. Daenerys, Sansa and Arya are also in the top 7 speakers, followed by a run of 12 men, so women make up just 4 of the top 19 speakers.

The prominence of each of the top 15 women and men throughout Game of Thrones

The alluvial plot (the same type of plot I used in my analysis of autism in the DSM) is a beautiful way to illustrate the flow of characters throughout the entire run of episodes. We see individuals rise to the top as the most prolific speaker for a few episodes as the story concentrates on them, then drop down as a new storyline takes centre stage. Their closest allies and enemies rise with them, so they move up and down in groups. We also see, at the moment of their last line in the script, their probable death.

Colour-coding gender (for the top 300 characters) produces a fascinating tapestry that reveals, at the coarsest scale, the dominance of men as the most prolific speaker, of men as the prominent speakers at all levels, and that men are more likely to be identified by an occupation or status (merchant, soldier, knight, watchman, Ser, prisoner, King and many others). Women have only three occupations: Lady, septa or prostitute / whore (#2, #3, etc. in some episodes), and are more often identified merely as an unnamed woman (#2 to #9 in one episode) or as women.

In line with Looker’s analysis of deaths, we can also see gendered clusters of death: men in battles and internecine feuding, women in mass burnings and other disasters.
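The alluvial plot needs two things from the word counts: each character's rank by words within every episode, and the last episode in which each character speaks (the "probable death" point). A minimal Perl sketch of that data preparation from the wordcount CSV; the function name is hypothetical, and the plotting itself is left to whatever charting tool you prefer.

```perl
use strict;
use warnings;

# Data behind an alluvial plot: each speaker's rank by word count within
# every episode, and the last episode in which each speaker has a line.
# Rows follow the wordcount script's CSV: "s1","e01","Speaker",Lines,Words
sub ranks_and_last_lines {
    my @rows = @_;
    my (%words, %last);
    foreach my $row (@rows) {
        my ($season, $episode, $speaker, $lines, $count) =
            $row =~ /^"s(\d+)","e(\d+)","([^"]*)",(\d+),(\d+)/ or next;
        my $key = sprintf "S%02dE%02d", $season, $episode;   # sorts chronologically
        $words{$key}{$speaker} += $count;
        $last{$speaker} = $key
            if !defined $last{$speaker} || $key gt $last{$speaker};
    }
    my %rank;
    foreach my $key (keys %words) {
        my $w = $words{$key};
        my @order = sort { $w->{$b} <=> $w->{$a} || $a cmp $b } keys %$w;
        $rank{$key}{ $order[$_] } = $_ + 1 for 0 .. $#order;   # 1 = most words
    }
    return (\%rank, \%last);
}

# Made-up rows, for illustration only
my ($rank, $last) = ranks_and_last_lines(
    '"s1","e01","Speaker A",40,900', '"s1","e01","Speaker B",20,300',
    '"s1","e09","Speaker A",10,200', '"s2","e01","Speaker B",15,400');
printf "Speaker A: rank %d in S01E01, last line in %s\n",
    $rank->{S01E01}{'Speaker A'}, $last->{'Speaker A'};
```

A last line is only a proxy for a death: characters also leave the story alive, and the fan-made scripts stop short of the end of Season 8.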

A whore, a war

The most common words used by male speakers

Men’s words, as portrayed in Game of Thrones, are dominated by action, place and personality. The top 50 male words, by Z-score, are: the, I, you, is, a, to, of, not, will, are, have, it, we, do, in, am, be, for, your, my, on, and, if, all, was, that, night’s, as, grace, they, can, did, no, have, with, had, he, man, this, Stark, but, what, Watch, would, his, she, about, men, could, Wall. Men are depicted using an abundance of pronouns (orange in the image above) combined with active verbs (pink), with a broad spread of nouns (blue) that are individually far less prominent than the pronouns.
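The word lists here are ranked by Z-score, but the exact formula is not given. One common choice for comparing how often a word is used in two corpora is the two-proportion z-test, z = (p_A − p_B) / sqrt(p(1 − p)(1/n_A + 1/n_B)), where p_A and p_B are the word's rates in each corpus, n_A and n_B are the corpus sizes in words, and p is the pooled rate. A minimal Perl sketch, assuming per-gender word counts are already held in two hashes; this may not be the exact statistic behind the lists above, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Two-proportion z-score for one word: positive if the word is relatively
# more frequent in corpus A (e.g. male speech), negative if in corpus B.
sub word_z_score {
    my ($a_count, $a_total, $b_count, $b_total) = @_;
    my $pa = $a_count / $a_total;                              # rate in A
    my $pb = $b_count / $b_total;                              # rate in B
    my $p  = ($a_count + $b_count) / ($a_total + $b_total);    # pooled rate
    my $se = sqrt($p * (1 - $p) * (1 / $a_total + 1 / $b_total));
    return $se > 0 ? ($pa - $pb) / $se : 0;
}

# Order every word in two frequency tables (word => count) from most
# characteristic of A to most characteristic of B.
sub rank_by_z {
    my ($a_freq, $b_freq) = @_;
    my ($a_total, $b_total) = (0, 0);
    $a_total += $_ for values %$a_freq;
    $b_total += $_ for values %$b_freq;
    my %vocab = (%$a_freq, %$b_freq);
    my %z;
    foreach my $w (keys %vocab) {
        $z{$w} = word_z_score($a_freq->{$w} // 0, $a_total,
                              $b_freq->{$w} // 0, $b_total);
    }
    return sort { $z{$b} <=> $z{$a} || $a cmp $b } keys %z;
}

# Made-up counts, for illustration only
my @order = rank_by_z({ the => 50, grace => 10 }, { the => 50, love => 10 });
print "@order\n";
```

Because this z grows with a word's frequency, very common words such as "the" and "I" can score highly even when their rates differ only slightly between the two groups.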

The top 50 word pairs used by male characters emphasise their action and centrality: of the, I am, it is, in the, you are, do not, I have, your grace, is a, we are, night’s Watch, the Wall, the night’s, I will, on the, will be, to the, to be, I do, and the, for the, you have, Lord Commander, that is, there is, is the, if you, is not, one of, with a, I was, we will, will not, did not, come on, you will, all the, a man, this is, of course, the realm, I would, with the, a good, Castle Black, the North, we have, if I, you do, for a. Men also express a proprietorial interest in place and its defence – the North, Castle Black, the Wall, the Watch.
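Word pairs are adjacent words within a single utterance. A minimal Perl sketch of counting them; lower-casing, keeping only straight apostrophes and never pairing across utterances are assumptions of this sketch (the post does not describe its tokenisation), and `count_word_pairs` is a hypothetical name. Running it separately over male and female utterances would give the per-gender pair counts that feed the same Z-score ranking as single words.

```perl
use strict;
use warnings;

# Count adjacent word pairs (bigrams) within each utterance, adding to %$counts.
# Assumptions: words are lower-cased, punctuation other than straight
# apostrophes is stripped, and pairs never span two utterances.
sub count_word_pairs {
    my ($counts, @utterances) = @_;
    foreach my $u (@utterances) {
        my @w = grep { length }
                map  { my $t = $_; $t =~ s/[^\w']+//g; $t }
                split ' ', lc $u;
        $counts->{ join ' ', @w[$_, $_ + 1] }++ for 0 .. $#w - 1;
    }
    return $counts;
}

# Two made-up utterances, for illustration only: print the three commonest pairs
my $pairs = count_word_pairs({}, "I am the Lord Commander of the Night's Watch.", "I am not.");
foreach my $pair ((sort { $pairs->{$b} <=> $pairs->{$a} || $a cmp $b } keys %$pairs)[0 .. 2]) {
    print "$pair: $pairs->{$pair}\n";
}
```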

Clams and oysters

The most common words used by female speakers

The top words used by women more than men depict women as preoccupied with family, petty jealousy (“shut up”) and their Lord’s business. The words identified by Z-score as more female in use are: very, love, Baelish, gift, Will, knew, home, up, oh, narrow, daughter, Master, mean, dragons, clams, husband, Great, Sparrow, Promise, murdered, may, across, liar, glad, Barristan, perhaps, ground, both, Arya, word, traitor, Targaryen, slave, says, really, made, imagine, Hound, Eyrie, dragon, Bank, wake, understand, thing, talk, swore, Stormborn, shut, shall, Sea. There is an excess of nouns (blue in the image above) and an absence of active verbs (pink). There is an unusual stilted quality in the prominence of so many nouns and so few pronouns, as if the women are external observers of the action.

The top 50 word-pairs reinforce this portrayal of passivity, with a focus on observed events, higher authorities and the actions of others: Lord Baelish, Will you, you love, would never, My mother, you mean, me in, you told, time you, they did, our father, my husband, I love, afraid of, you never, the Great, said you, My brother, is he, a gift, what will, what if, we must, think he, his life, High Sparrow, good to, get out, come with, because of, you may, you came, who was, who murdered, swear it, speak to, she does, Promise me, let me, know the, is very, he wanted, happened to, do you, do that, did they, brother and, across the, you what, take him.

A word network brings out many of the features of the perspectives portrayed by female and male characters. Contrast this with the use of “(with) autism” and “autistic (person)” in my analysis of Irish newspaper depictions of autistic people. Female characters are more likely to say “I should / would / could” while male characters are more likely to say “I am / do / have”, or “to take / be / make / fight / die”. Women are portrayed as talking about “my Lord / son / brother” and men as talking about “my father / Grace”. Women invoke “the gods / Lord / Queen / world” while men discuss “the North / Seven Kingdoms / Night’s Watch” or “Casterley Rock / Castle Black”.

Who’s together?

Two interesting measures of distance / similarity can be used to see if a pair of names or a pair of words are somehow alike. If two characters use the same vocabulary then they are probably discussing the same topics, possibly together. By measuring the similarity of the most frequent words used by every pair of characters (using Kendall’s tau, a measure of the similarity of the words and word order in two lists), we can place all the characters in a force diagram, placing related characters close to each other, prominent characters to the centre, unalike characters apart and lesser characters on the periphery. The image above demonstrates the relatedness and centrality of the top 15 male and female characters.

(You might like to compare the relatedness in this diagram with the common flow of character prominence in the alluvial diagram above).
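The similarity between two characters is described only as Kendall’s tau over their most frequent words. A minimal Perl sketch of one way to do that, which is my assumption rather than a documented method: pool the two characters' top-N words, then compare each character's count for every pooled word (0 if unused) with Kendall's tau-b, which corrects for ties. The function names are hypothetical.

```perl
use strict;
use warnings;

# Kendall's tau-b between two equal-length lists of numbers: +1 when both
# order the items identically, -1 when the orders are reversed; tau-b
# corrects the denominator for ties (common when a word count is 0).
sub kendall_tau_b {
    my ($x, $y) = @_;
    my ($conc, $disc, $tie_x, $tie_y, $pairs) = (0, 0, 0, 0, 0);
    for my $i (0 .. $#$x - 1) {
        for my $j ($i + 1 .. $#$x) {
            my $dx = $x->[$i] <=> $x->[$j];
            my $dy = $y->[$i] <=> $y->[$j];
            $pairs++;
            $tie_x++ if $dx == 0;
            $tie_y++ if $dy == 0;
            next if $dx == 0 || $dy == 0;
            if ($dx == $dy) { $conc++ } else { $disc++ }
        }
    }
    my $denom = sqrt(($pairs - $tie_x) * ($pairs - $tie_y));
    return $denom > 0 ? ($conc - $disc) / $denom : 0;
}

# Similarity of two characters' vocabularies (hashes of word => count):
# pool each character's $top_n most frequent words, then compare the two
# characters' counts for the pooled words (0 if a word is unused).
sub vocabulary_similarity {
    my ($freq1, $freq2, $top_n) = @_;
    my %union;
    foreach my $f ($freq1, $freq2) {
        my @top = sort { $f->{$b} <=> $f->{$a} || $a cmp $b } keys %$f;
        my $n = $top_n < @top ? $top_n : scalar(@top);
        $union{$_} = 1 for @top[0 .. $n - 1];
    }
    my @words = sort keys %union;
    return kendall_tau_b([ map { $freq1->{$_} // 0 } @words ],
                         [ map { $freq2->{$_} // 0 } @words ]);
}
```

For a force-directed layout, 1 minus this similarity could serve as the distance between each pair of characters; the size of the top-N list is a free parameter.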

Perl

Purely for completeness, the following script takes input from a script file and outputs, for each season and episode, a list of speakers sorted in ascending order of word count. It assumes that each season and episode begins with a line starting SEASON n or EPISODE n, and that a speaker is identified by a colon (e.g. Baelish:); it strips out any commentary in round (…) or square […] brackets and ignores lines that are not attributed to a speaker. The command cat "Game of Thrones script.txt" | wordcount.pl > wordcount.csv will write a standard CSV file, although I add an alias-combining script to my pipeline to replace variant speaker names with a unified full name.

use strict;
use warnings;

my ($season, $episode) = ("", "");   # season and episode currently being counted
my (%lines, %words);                 # per-speaker line and word counts for it

# Print one CSV row per speaker (ascending word count), then clear the counts
sub flush_episode {
  if ($episode ne "") {
    foreach my $speaker (sort { $words{$a} <=> $words{$b} || $a cmp $b } keys %words) {
      print "\"s", $season, "\",\"e", $episode, "\",\"", $speaker, "\",", $lines{$speaker}, ",", $words{$speaker}, "\n";
    }
  }
  %lines = (); %words = ();
}

# CSV column headings
print "Season,Episode,Speaker,Lines,Words\n";
while (<>) {
  s/\r?\n\z//;                                  # strip DOS or Unix line ending
  if (/^SEASON ([0-9]+)/)  { flush_episode(); $season = $1; next; }
  if (/^EPISODE ([0-9]+)/) { flush_episode(); $episode = sprintf("%02d", $1); next; }
  next unless /^([^:]+):\s*(.*)$/;              # only "Speaker: text" lines are counted
  my ($speaker, $text) = ($1, $2);
  $speaker =~ s/\s*\([^)]*\)//g;                # drop "(to Sam)" etc. from the name
  $speaker =~ s/([\w']+)/\u\L$1/g;              # normalise case: "JON SNOW" -> "Jon Snow"
  $speaker =~ s/^\s+|\s+$//g;
  $text =~ s/\([^)]*\)//g;                      # drop (commentary)
  $text =~ s/\[[^\]]*\]//g;                     # drop [scene descriptions]
  my @wordlist = split ' ', $text;              # words in the spoken text only
  next if $speaker eq "" || !@wordlist;
  $lines{$speaker}++;
  $words{$speaker} += @wordlist;
}
# Stats for final episode
flush_episode();
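The alias-combining step is not shown above. A minimal sketch of one way to do it, assuming a hypothetical alias table (variant name => full name) applied to the CSV rows written by wordcount.pl; rows that end up with the same season, episode and speaker are merged by summing their Lines and Words. The `%alias` entries and the name `merge_aliases` are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical alias table: variant speaker name => unified full name.
my %alias = (
    'Jon'          => 'Jon Snow',
    'Littlefinger' => 'Petyr Baelish',
    'Baelish'      => 'Petyr Baelish',
);

# Rewrite the Speaker column of wordcount.csv rows and merge rows that now
# share a season, episode and speaker by summing their Lines and Words.
sub merge_aliases {
    my ($alias, @rows) = @_;
    my (%lines, %words, @order);
    foreach my $row (@rows) {
        my ($s, $e, $speaker, $l, $w) =
            $row =~ /^"([^"]*)","([^"]*)","([^"]*)",(\d+),(\d+)/ or next;
        my $key = join "\t", $s, $e, $alias->{$speaker} // $speaker;
        push @order, $key unless exists $words{$key};   # keep first-seen order
        $lines{$key} += $l;
        $words{$key} += $w;
    }
    return map {
        my ($s, $e, $speaker) = split /\t/;
        qq("$s","$e","$speaker",$lines{$_},$words{$_})
    } @order;
}
```

As a filter after wordcount.pl, printing the header line and then `print "$_\n" for merge_aliases(\%alias, <STDIN>);` would emit the merged rows; note that they keep first-appearance order rather than ascending word count.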