Blog - BS Data, hackery, stories

A super quick weighted scatterplot about self-employment in R and ggplot2

[Figure: "The two self-employed worlds: the 'privileged' and the 'precarious'". Weighted scatterplot of mean income (£) on the x axis against percentage of degrees on the y axis. Point size shows Number, fill colour shows Category, and each industry is labelled on the plot.]


Import your dataset

This example uses data from a Resolution Foundation report about self-employment in the UK. It surfaces so-called privileged and precarious professions (based on unclear factors, judging from the report). We'll plot the mean income against the percentage of degrees per SIC (Standard Industrial Classification).

library(readr)  # read_csv() comes from the readr package
df <- read_csv("https://raw.githubusercontent.com/basilesimon/interactive-journalism-module/master/week5/exercise/data_annotated.csv")
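Before plotting, it can help to confirm that the file has the columns the plot will map. The sketch below assumes the four column names used in the aes() call later in this post (Mean, Degrees, Number, Category). The helper missing_columns() and the toy data frame are hypothetical and only there so the example runs offline. The toy values are placeholders, not figures from the report.

```r
# Columns that the aes() mapping in this post relies on
required_columns <- c("Mean", "Degrees", "Number", "Category")

# Hypothetical helper: returns the names of any required columns that are absent
missing_columns <- function(data, required = required_columns) {
  setdiff(required, names(data))
}

# Placeholder rows for illustration only, NOT taken from the real dataset
toy <- data.frame(
  Mean = c(10000, 20000),
  Degrees = c(0.1, 0.5),
  Number = c(100, 200)
)

# Category is deliberately left out of the toy data, so it is reported as missing
missing_columns(toy)
```

With the real data you would call missing_columns(df) right after read_csv(). An empty character vector means every column the plot needs is present.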

Basic plot

We'll use ggplot2's geom_point to create our scatterplot. A few notes:

  • To fill the circles with a colour, you need a shape that has both a fill and a stroke, such as shape 21 (a circle with an outline). This is a handy guide to shapes in ggplot2.
  • seq(0, 70000, 10000) is essentially shorthand for a numeric vector like so: [0, 10000, 20000, ..., 70000]
  • If you're surprised by the magic scales::percent, read the docs about continuous scales.

colors <- ggplot(df, aes(x = Mean, y = Degrees, size = Number, fill = Category)) +
  geom_point(shape = 21) +
  scale_size_area(max_size = 15) +
  scale_x_continuous(breaks = seq(0, 70000, 10000), name = "Mean income (£)") +
  scale_y_continuous(labels = scales::percent, name = "Percentage of degrees")
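The two helpers mentioned in the notes above can be tried on their own at the console. This is a small sketch: the variable names income_breaks and degree_labels are only illustrative, and the proportions passed to scales::percent are simply the tick values visible on the plot's y axis.

```r
# Tick positions for the x axis: every 10,000 from 0 to 70,000
income_breaks <- seq(0, 70000, 10000)
print(income_breaks)

# scales::percent turns proportions (0.2) into percentage labels ("20%")
degree_labels <- scales::percent(c(0.2, 0.4, 0.6, 0.8))
print(degree_labels)
```

This is also why the y positions in the annotations are written as proportions such as .05 or .56: the data stays on a 0 to 1 scale and only the axis labels are formatted as percentages.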

Annotations are the core of the job

Annotations in R are super simple, if not especially fancy.

In this instance, we'll annotate a few of the industry codes to give a bit more life to the plot.

Note that unless we later add an interaction layer with tooltips and mouseovers in D3, this is everything the reader will see and read. Even once tooltips and interactive features are in place, it is good practice to assume the reader won't click or move their mouse about, so we'll take the simplest and most effective route: pointing out what's interesting straight away.

labels <- colors +
  annotate("text", x = 29000, y = .05, label = "Construction and building")
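When there are many labels, one alternative to chaining an annotate() call per label is to keep the positions in a small data frame and draw them with a single geom_text layer. This is not the approach used in this post, just a sketch of a different technique. The names label_positions, base_plot and labelled are illustrative, and base_plot is an empty stand-in so the snippet runs without the dataset. The coordinates are copied from three of the annotate() calls in this post.

```r
library(ggplot2)

# Label positions copied from three of the annotate() calls in this post
label_positions <- data.frame(
  x = c(29000, 12000, 46000),
  y = c(0.05, 0.56, 0.75),
  label = c("Construction and building", "Education", "Health sector")
)

# Empty stand-in for the `colors` plot so the sketch runs on its own
base_plot <- ggplot() +
  scale_x_continuous(name = "Mean income (£)") +
  scale_y_continuous(labels = scales::percent, name = "Percentage of degrees")

# inherit.aes = FALSE stops the text layer from picking up plot-level
# mappings such as size = Number or fill = Category
labelled <- base_plot +
  geom_text(
    data = label_positions,
    aes(x = x, y = y, label = label),
    inherit.aes = FALSE
  )
```

In the real plot you would add the same geom_text layer to colors instead of base_plot. The trade-off is that annotate() keeps each label next to its coordinates in the code, while a data frame is easier to edit or load from a file.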

Export your plot as SVG...

... and savour the pleasures of CSS and Illustrator — or even of d3 manipulations.

# Writing SVG with ggsave() needs the svglite package to be installed
ggsave(filename = "file.svg", plot = plot, width = 10, height = 10)
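ggsave() picks the output format from the file extension, and for .svg it relies on the svglite package, which is installed separately from ggplot2. The sketch below guards against svglite being absent. demo_plot and out_file are illustrative names, and the three-point data frame is a placeholder rather than the self-employment data.

```r
library(ggplot2)

# Placeholder plot so the sketch runs without the dataset
demo_plot <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
  geom_point()

# Write to a temporary directory rather than the working directory
out_file <- file.path(tempdir(), "file.svg")

if (requireNamespace("svglite", quietly = TRUE)) {
  # width and height are in inches by default
  ggsave(filename = out_file, plot = demo_plot, width = 10, height = 10)
} else {
  message("Install the svglite package to write SVG files with ggsave()")
}
```

Passing the plot explicitly, as this post does, is safer than relying on the default, which saves whichever plot was displayed last.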

Full code

library(ggplot2)
library(readr)
library(ggthemes)

# Load new batch of annotated data
df <- read_csv("https://raw.githubusercontent.com/basilesimon/interactive-journalism-module/master/week5/exercise/data_annotated.csv")

# Color now depends on assigned category
colors <- ggplot(df, aes(x = Mean, y = Degrees, size = Number, fill = Category)) +
  geom_point(shape = 21) +
  scale_size_area(max_size = 15) +
  scale_x_continuous(breaks = seq(0, 70000, 10000), name = "Mean income (£)") +
  scale_y_continuous(labels = scales::percent, name = "Percentage of degrees")

# More annotations
labels <- colors +
  annotate("text", x = 29000, y = .05, label = "Construction and building") +
  annotate("text", x = 12000, y = .56, label = "Education") +
  annotate("text", x = 46000, y = .75, label = "Health sector") +
  annotate("text", x = 10000, y = .16, label = "Hairdressers") +
  annotate("text", x = 12000, y = .39, label = "Sports and recreation") +
  annotate("text", x = 20000, y = 0.7, label = "Arts") +
  annotate("text", x = 40000, y = .35, label = "Real estate") +
  annotate("text", x = 14000, y = .07, label = "Taxis") +
  annotate("text", x = 35000, y = .66, label = "IT and programming") +
  annotate("text", x = 50000, y = .62, label = "Consultancies") +
  annotate("text", x = 18000, y = .24, label = "Retail") +
  annotate("text", x = 58000, y = .77, label = "Legal and accounting")

# Playing with some themes
# And display plot
plot <- labels + theme_minimal()
plot

Data courtesy of Resolution Foundation, Feb. 2017 - A tough gig? The nature of self-employment in 21st Century Britain and policy implications, by Dan Tomlinson and Adam Corlett