The two self-employed worlds

The 'privileged' and the 'precarious'

This example uses data from a Resolution Foundation report about self-employment in the UK. The report sorts professions into so-called privileged and precarious groups (on criteria it doesn't make clear). We'll plot mean income against the share of workers with a degree for each SIC (Standard Industrial Classification) code.

``````
# read_csv() comes from the readr package
library(readr)

df <- read_csv("https://raw.githubusercontent.com/basilesimon/interactive-journalism-module/master/week5/exercise/data_annotated.csv")
``````

Basic plot

We'll use ggplot2's `geom_point` to create our scatterplot. A few notes:

• To fill the circles with a colour, you need a shape that has both a fill and an outline, such as shape 21 (a circle with a stroke). This is a handy guide to shapes in ggplot2.
• `seq(0, 70000, 10000)` is shorthand for the vector `c(0, 10000, 20000, ..., 70000)`.
• If you're surprised by the magic `scales::percent`, which formats proportions as percentages, read the docs about continuous scales.

``````
colors <- ggplot(df, aes(x = Mean, y = Degrees, size = Number, fill = Category)) +
  geom_point(shape = 21) +
  scale_size_area(max_size = 15) +
  scale_x_continuous(breaks = seq(0, 70000, 10000), name = "Mean income (£)") +
  scale_y_continuous(labels = scales::percent, name = "Percentage of degrees")
``````
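
To see what the three notes above amount to, here is a minimal sketch. It assumes the `scales` package is installed (it ships as a ggplot2 dependency); the `toy` data frame and its values are made up for illustration and are not part of the report's data.

```r
library(ggplot2)

# seq(from, to, by) builds the vector of axis breaks
income_breaks <- seq(0, 70000, 10000)
print(income_breaks)

# scales::percent() turns proportions into percentage labels;
# accuracy = 1 rounds them to whole percentages
print(scales::percent(c(0.05, 0.5, 0.75), accuracy = 1))

# Toy data (hypothetical values): shape 21 accepts both a fill
# and a separate outline colour
toy <- data.frame(
  Mean = c(15000, 40000),
  Degrees = c(0.2, 0.7),
  Category = c("a", "b")
)
toy_plot <- ggplot(toy, aes(x = Mean, y = Degrees, fill = Category)) +
  geom_point(shape = 21, colour = "black", size = 5)
```

Printing `toy_plot` should show two filled circles with black outlines.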

Annotations are the core of the job

Annotations in ggplot2 are super simple, if not especially fancy.

In this instance, we'll annotate a few of the industry codes to give a bit more life to the plot.

Note that unless we later add an interaction layer with D3 tooltips and mouseovers, this is everything the reader will see and read. Even with tooltips and interactive features in place, it is good practice to assume the reader won't click or move their mouse about, so we'll take the simplest and most effective route: pointing out what's interesting straight away.

``````
# Each annotate() call adds one text label at fixed x/y coordinates
labels <- colors +
  annotate("text", x = 29000, y = .05, label = "Construction and building")
``````
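
When the number of labels grows, one alternative is to keep them in a small data frame and draw them all with a single `geom_text` layer. This is only a sketch of that option, not the approach used in this post: `label_df` and `toy` are hypothetical names, the `toy` values are made up, and the label coordinates are borrowed from the annotations used here.

```r
library(ggplot2)

# Hypothetical label table: one row per annotation
label_df <- data.frame(
  x = c(29000, 12000, 46000),
  y = c(0.05, 0.56, 0.75),
  label = c("Construction and building", "Education", "Health sector")
)

# Toy stand-in for the real data (values are made up)
toy <- data.frame(
  Mean = c(28000, 13000, 45000),
  Degrees = c(0.08, 0.55, 0.72)
)

# inherit.aes = FALSE stops the text layer from picking up
# the aesthetics mapped in the main ggplot() call
labelled <- ggplot(toy, aes(x = Mean, y = Degrees)) +
  geom_point() +
  geom_text(
    data = label_df,
    aes(x = x, y = y, label = label),
    inherit.aes = FALSE
  )
```

The trade-off is that the label positions live in data rather than in code, which makes them easier to tweak in bulk but a little less explicit than a chain of `annotate` calls.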

Finally, export the plot as an SVG and savour the pleasures of CSS and Illustrator, or even of d3 manipulations.

``````
# Width and height are in inches by default
ggsave(filename = "file.svg", plot = plot, width = 10, height = 10)
``````
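
One thing to watch for: `ggsave` picks the SVG device from the file extension, and that device relies on the `svglite` package being installed. The sketch below guards against it being missing. `toy_plot` and `out_file` are hypothetical names, and the plot is a made-up stand-in for the annotated chart.

```r
library(ggplot2)

# Toy plot standing in for the annotated chart (values are made up)
toy_plot <- ggplot(data.frame(x = 1:3, y = c(0.1, 0.5, 0.7)), aes(x, y)) +
  geom_point(shape = 21, fill = "steelblue", size = 5)

# Hypothetical output path inside R's temporary directory
out_file <- file.path(tempdir(), "self-employed.svg")

# Only attempt the export when the SVG backend is available
if (requireNamespace("svglite", quietly = TRUE)) {
  ggsave(filename = out_file, plot = toy_plot, width = 10, height = 10)
} else {
  message("Install svglite to write SVG files with ggsave()")
}
```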

Full code

``````
library(readr)
library(ggplot2)
library(ggthemes)

# Load new batch of annotated data
df <- read_csv("https://raw.githubusercontent.com/basilesimon/interactive-journalism-module/master/week5/exercise/data_annotated.csv")

# Color now depends on assigned category
colors <- ggplot(df, aes(x = Mean, y = Degrees, size = Number, fill = Category)) +
  geom_point(shape = 21) +
  scale_size_area(max_size = 15) +
  scale_x_continuous(breaks = seq(0, 70000, 10000), name = "Mean income (£)") +
  scale_y_continuous(labels = scales::percent, name = "Percentage of degrees")

# More annotations
labels <- colors +
  annotate("text", x = 29000, y = .05, label = "Construction and building") +
  annotate("text", x = 12000, y = .56, label = "Education") +
  annotate("text", x = 46000, y = .75, label = "Health sector") +
  annotate("text", x = 10000, y = .16, label = "Hairdressers") +
  annotate("text", x = 12000, y = .39, label = "Sports and recreation") +
  annotate("text", x = 20000, y = 0.7, label = "Arts") +
  annotate("text", x = 40000, y = .35, label = "Real estate") +
  annotate("text", x = 14000, y = .07, label = "Taxis") +
  annotate("text", x = 35000, y = .66, label = "IT and programming") +
  annotate("text", x = 50000, y = .62, label = "Consultancies") +
  annotate("text", x = 18000, y = .24, label = "Retail") +
  annotate("text", x = 58000, y = .77, label = "Legal and accounting")

# Playing with some themes
# And display plot
plot <- labels + theme_minimal()
plot
``````

Data courtesy of the Resolution Foundation, Feb. 2017: "A tough gig? The nature of self-employment in 21st Century Britain and policy implications", by Dan Tomlinson and Adam Corlett.