You can download the raw source code for these lecture notes here.

Course Meeting Plan

Wednesday · 22 February · Lecture

  • Video: Antescofo (10 min)
  • Lecture: What are chroma features? (20 min)
  • Demo: Sonic Visualiser (10 min)
  • Lecture: Dynamic time warping (20 min)
  • Breakout 1: Chroma and tuning (10 min)
  • Plenary Discussion: Breakout findings (5 min)
  • Lecture: MIR features and music cognition (15 min)

Wednesday · 22 February · Lab

  • Tutorial: Github, RStudio, and flexdashboard (20 min)
  • Demo: Chromagrams (10 min)
  • Breakout: Chromagrams (15 min)
  • Plenary Discussion: Breakout findings (10 min)
  • Demo: compmus_long_distance (10 mins)
  • Breakout: Dynamic time warping (15 mins)
  • Plenary Discussion: Breakout findings (10 mins)

The Zoom breakout groups will be assigned at random. If you need help during any of the breakout sessions, please click the Help button, and the instructor or a TA will come to you.

Breakout 1: Chroma and Tuning

As a group, discuss Exercise 3.6 from the textbook:

Assume that an orchestra is tuned 20 cents upwards compared with the standard tuning. What is the center frequency of A4 in this tuning? How can a chroma representation be adjusted to compensate for this tuning difference?

Be prepared to return to the full group with answers to these questions.

Breakout 2: Chromagrams in R

In this breakout group, you will start working with chromagrams in R, and how to interpret them.

Installing compmus

Just once, you need to install the compmus package, a special package I wrote for this course to sand off some rough edges in spotifyr. You can do so with the following line of code:

remotes::install_github('jaburgoyne/compmus')

Setup

Once you have compmus installed, we will start this analysis like every other analysis, by loading tidyverse and spotifyr. From now on, we also add compmus to the list of libraries to load every time.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(spotifyr)
library(compmus)

One of the compmus helper functions, get_tidy_audio_analysis(), pulls in Spotify’s detailed audio analysis for one track at a time. These analyses are very detailed, and if you want to know more, basic documentation is available for everything in it.

Let’s pull down the analysis of a recording of Steve Reich’s ‘Music for Pieces of Wood’ (well worth a listen if you don’t know the piece). For reasons that will become clearer next week, one needs to select the desired field out of the analysis – in this case, segments – and then unnest it.

Spotify segments have a lot of information inside them, but this week, we’ll focus just on start, duration, and pitches: the three tools we need to make a chromagram.

wood <-
  get_tidy_audio_analysis("6IQILcYkN2S2eSu5IHoPEH") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)

The key to making a chromagram is geom_tile(). It is powerful but has it’s wrinkles. Much of the code in the next block is code that you will simply need to copy every time you use it; next week we’ll develop a better understanding of what is going on.

First, we want to choose a norm for the chroma vectors – manhattan, euclidean, or chebyshev – using the new helper function compmus_normalise() (and a little help from the map() function, which we’ll see more of next week). The name of the normalisation is the only thing in that line of code that you need to change in your breakout group work.

Next, we need to convert the data to so-called long format: a new row for each pitch class. There is a helper function compmus_gather_chroma() to do that.

Finally we can plot, but beware that ggplot centres each tile on the x or y coordinate instead of using the left corner. Use the duration to make a correction for it.

I’ve added labels, a simpler theme, and a scientific colour scheme to make the plot look a little nicer.

wood |>
  mutate(pitches = map(pitches, compmus_normalise, "euclidean")) |>
  compmus_gather_chroma() |> 
  ggplot(
    aes(
      x = start + duration / 2,
      width = duration,
      y = pitch_class,
      fill = value
    )
  ) +
  geom_tile() +
  labs(x = "Time (s)", y = NULL, fill = "Magnitude") +
  theme_minimal() +
  scale_fill_viridis_c()

Instructions

As a group, work through the following steps.

  • Listen to a few seconds of the piece. After you have listened, try to explain why this chromagram looks the way it does.
  • The sample chromagram uses a Euclidean norm for the chroma vectors. Try regenerating the chromagram with Manhattan and Chebyshev norms. What changes?
  • Choose a different piece from Spotify and make a chromagram. Try the three different norms. Can you explain the patterns you see?

Choose one member of your group who will be prepared to share their screen when we discuss the results at the end of the breakout sessions.

Breakout 3: Dynamic Time Warping

In order to take the step from chromagrams to dynamic time warping, we need to choose an appropriate distance metric to measure how far apart we think two different chroma vectors are. Distance metrics usually form conceptual pairs with norms (see the table below), although there are no standard distance metrics to use after Chebyshev normalisation.

Theoretically, the Manhattan norm–Aitchison distance pair and the Euclidean norm–angular distance pair are most appropriate for chroma vectors. But the Manhattan norm–Manhattan distance pair and the Euclidean norm–cosine distance pair are faster to compute and often good enough. The cosine distance, in particular, is extremely popular in practice.

Domain Normalisation Distance
Non-negative (e.g., chroma) Manhattan Manhattan
Aitchison
Euclidean cosine
angular
Chebyshev [none]

Let’s look at seven recordings of Josquin des Prez’s ‘Ave Maria’. The blank lines separate performances at three different pitch levels (transpositions).

## The Tallis Scholars
tallis <-
  get_tidy_audio_analysis("2J3Mmybwue0jyQ0UVMYurH") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)
## La Chapelle Royale
chapelle <-
  get_tidy_audio_analysis("4ccw2IcnFt1Jv9LqQCOYDi") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)
## The Cambridge Singers
cambridge <-
  get_tidy_audio_analysis("54cAT1TCFaZbLOB2i1y61h") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)


## Oxford Camerata
oxford <-
  get_tidy_audio_analysis("5QyUsMY40MQ1VebZXSaonU") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)
## Chanticleer
chanticleer <-
  get_tidy_audio_analysis("1bocG1N8LM7MSgj9T1n3XH") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)


## The Hilliard Ensemble
hilliard <-
  get_tidy_audio_analysis("2rXEyq50luqaFNC9DkcU6k") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)
## The Gabrieli Consort
gabrieli <-
  get_tidy_audio_analysis("4NnJ4Jes8a8mQUfXhwuITx") |>
  select(segments) |>
  unnest(segments) |>
  select(start, duration, pitches)

The compmus_long_distance() helper function gets everything ready to plot the distances between the chroma vectors in two different pieces. It takes two data frames (don’t forget to normalise the chroma vectors), the feature we want to compute distance over (pitches in our case), and any of the distance measures in the table above. It returns a long table ready for plotting, with xstart, xduration, ystart, and yduration.

compmus_long_distance(
  tallis |> mutate(pitches = map(pitches, compmus_normalise, "chebyshev")),
  chapelle |> mutate(pitches = map(pitches, compmus_normalise, "chebyshev")),
  feature = pitches,
  method = "euclidean"
) |>
  ggplot(
    aes(
      x = xstart + xduration / 2,
      width = xduration,
      y = ystart + yduration / 2,
      height = yduration,
      fill = d
    )
  ) +
  geom_tile() +
  coord_equal() +
  labs(x = "The Tallis Scholars", y = "La Chapelle Royale") +
  theme_minimal() +
  scale_fill_viridis_c(guide = NULL)

Instructions

As a group, work through the following steps.

  • Try different combinations of norms and distance metrics until the pattern seems visually clearest to you.
  • There seems to be a dark, mostly straight, diagonal link from the lower left to the upper right corner. What does this pattern mean? Why isn’t the line perfectly straight?
  • Try plotting different pairs of performances. Do you always see a diagonal line? Is it always straight? How can you explain what you are seeing?

Be prepared to discuss your finding with the larger group at the end of the breakout session.

Advanced work

If you are comfortable with R and want to compute actual DTW alignments instead of just making a visual analysis, look into the R dtw package.

Warning

The output of geom_tile() can be very large if sent through ggplotly() and will probably not work for your portfolios. It will either be very slow to load or simply leave a blank space on the page.