Mapping the Rockefeller Center tree with ggplot2 in R

Matt Russell
Towards Data Science
7 min read · Nov 24, 2020

The Rockefeller Center Christmas tree in New York City.
Photo by Wesley Tingey on Unsplash

It’s known as “the tree”. Every year the Christmas tree at Rockefeller Center in New York City brings a wonderful burst of green amid the dull winter colors of a big city.

The week after Thanksgiving, NBC broadcasts its annual Christmas tree lighting. Growing up in my family, we would tune in and mark this as the unofficial start to the holiday season.

The broadcast would feature performances by popular singers and musical groups. The night would conclude with the lighting of the tree, an awesome spectacle: a countdown, then the bright bulbs switching on as people cheered in the chilly New York City air.

“Ya seen the tree yet?” A few times my mom, sister and I would take the train to New York City for the day to see the tree. This was usually around my sister’s birthday, after Christmas and before New Year’s Eve. It was beautiful to see: usually a 70-foot-tall Norway spruce with crowds of people surrounding it.

The first Christmas tree at Rockefeller Center went up in 1931 and was a meager 20-foot-high balsam fir. It was erected by construction workers, who would meet at the tree to receive their paychecks. This was during the Great Depression, and the tree had a special meaning at the time.

The Rockefeller Center Christmas tree data

In the weeks before the tree arrives, news stories cover where it came from, the family whose property it grew on, and how far it traveled to get to Rockefeller Center.

With a lack of tall Norway spruces within the New York City limits, the tree is always chosen from one of the surrounding states. The 2020 Rockefeller Center Christmas tree is from Oneonta, New York, and traveled 170 miles to get to Manhattan.

Wikipedia maintains a data set of past Rockefeller Center Christmas trees, including each tree’s species, height, and the town where it originated. We will use these data and four R packages (tidyverse, maps, geosphere, and ggrepel) to map where the Rockefeller trees come from:

library(tidyverse)
library(maps)
library(geosphere)
library(ggrepel)
tree <- read_csv('rocktree.csv')

Since at least 1984, the species chosen has always been a Norway spruce, a large and fast-growing conifer known for its drooping branches. Prior to that, white spruce and balsam fir were chosen more than once for the tree.

tree %>%
  group_by(Species) %>%
  summarize(num = n()) %>%
  arrange(desc(num))
A table showing the number of trees for each species.
Number of times different species were used for the Rockefeller Center Christmas tree.
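As a side note, the same tally can be written more compactly with dplyr's count(). The sketch below runs on a small made-up data frame (tree_demo is a hypothetical stand-in for the article's tree data, and apart from the 1931 balsam fir its rows are invented) and shows both forms side by side:

```r
library(dplyr)

# Hypothetical stand-in for the article's `tree` data; rows are made up
tree_demo <- tibble(
  Year    = c(1931, 1950, 1984, 1999, 2020),
  Species = c("Balsam fir", "Norway spruce", "Norway spruce",
              "Norway spruce", "Norway spruce")
)

# Long form, as used in the article
by_species_long <- tree_demo %>%
  group_by(Species) %>%
  summarize(num = n()) %>%
  arrange(desc(num))

# Shorthand: count() groups, tallies, and sorts in one call
by_species_short <- tree_demo %>%
  count(Species, sort = TRUE, name = "num")

by_species_short
```

Both versions should give one row per species, ordered from most to least common.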

For the 71 trees for which there are data, the average height was 75.2 feet. The Rockefeller tree has generally gotten taller through the years. Prior to 2000 there were several trees less than 70 feet tall. Since 2000 the average height of the tree has been 78.1 feet.

ggplot(tree, aes(Year, Height_ft)) +
  geom_point() +
  stat_smooth(method = "lm") +
  labs(y = "Height of Rockefeller tree (feet)") +
  theme_bw()
A scatter plot showing year and the height of the Rockefeller Christmas tree.
Trends in the height of the Rockefeller Center Christmas tree.
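The article quotes an overall mean height and a mean since 2000 without showing the code. One way those summaries might be computed is sketched here on made-up numbers. The tree_demo data frame is hypothetical, and treating "since 2000" as Year >= 2000 is an assumption:

```r
library(dplyr)

# Hypothetical heights; the real values live in rocktree.csv
tree_demo <- tibble(
  Year      = c(1975, 1985, 1995, 2005, 2015),
  Height_ft = c(65, 70, NA, 74, 80)
)

# Overall mean, skipping years with no recorded height
overall_mean <- mean(tree_demo$Height_ft, na.rm = TRUE)

# Mean height before and after the (assumed) cutoff year of 2000
height_by_era <- tree_demo %>%
  mutate(era = if_else(Year >= 2000, "2000 and later", "Before 2000")) %>%
  group_by(era) %>%
  summarize(n_trees = sum(!is.na(Height_ft)),
            mean_ht = mean(Height_ft, na.rm = TRUE))

overall_mean
height_by_era
```

Using na.rm = TRUE matters here because some years may lack a recorded height.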

The tallest trees have come from Connecticut, with a mean height of 84 feet. Although the tree has come from Vermont only twice in its history, that state has produced the shortest trees, with a mean height of 66 feet:

tree %>%
  group_by(OriginState) %>%
  summarize(num = n(),
            mean_ht = mean(Height_ft, na.rm = TRUE)) %>%
  arrange(desc(mean_ht)) %>%
  filter(num > 2 & !is.na(OriginState)) %>%
  ggplot(aes(x = reorder(OriginState, mean_ht), y = mean_ht)) +
  geom_bar(stat = "identity", fill = "lightgreen", col = "black") +
  coord_flip() +
  labs(y = "Mean height of Rockefeller tree (feet)", x = "State") +
  theme_bw()
A graph showing average height of trees.
Mean heights for the Rockefeller Center Christmas trees originating from each state.

Mapping where the Rockefeller tree comes from

If you’re familiar with using ggplot to make visualizations in R, the maps package is a great addition to your data visualization toolkit. Functions from the maps package can be used directly inside a block of ggplot() code.

The first step to map where the Rockefeller tree originates from is to use the map_data() function to pull out the US states we’re interested in. This function uses the data from maps and wrangles it into a format that is recognized in ggplot():

states <- map_data("state")
tree_states <- subset(states,
  region %in% c("connecticut", "pennsylvania",
                "new york", "ohio", "new jersey",
                "vermont", "massachusetts",
                "new hampshire"))
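Before plotting, it can help to look at what map_data() returns. The short sketch below inspects the columns and checks that the state names match the lowercase spelling used in the region column. The wanted vector is just a helper name introduced for illustration:

```r
library(ggplot2)  # provides map_data()
library(maps)     # provides the "state" map database

states <- map_data("state")

# One row per polygon vertex: long, lat, group, order, region, subregion
str(states)

# Region names are stored in lowercase, which is why the subset uses
# "new york" rather than "New York"
wanted <- c("connecticut", "pennsylvania", "new york", "ohio",
            "new jersey", "vermont", "massachusetts", "new hampshire")

# Any names listed here are misspelled or missing from the database
setdiff(wanted, states$region)
```

An empty result from setdiff() means every requested state was found.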

Next, we can plot the states using the ggplot() function and its mapping argument. State borders and fill colors are specified using geom_polygon(). The coord_map() call applies a conic projection (a central projection onto a cone tangent at 40 degrees latitude); note that coord_map() relies on the mapproj package being installed. Adding theme_bw() provides a consistent theme that will be used in all of the maps:

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  coord_map("conic", lat0 = 40) +
  theme_bw()
A map of states in the northeastern United States.
A map of states produced using ggplot and maps in R.

A map of counties can also be produced using similar code:

counties <- map_data("county")
tree_counties <- subset(counties,
  region %in% c("connecticut", "pennsylvania",
                "new york", "ohio", "new jersey",
                "vermont", "massachusetts",
                "new hampshire"))

# Draw county outlines in gray first, then state borders on top in black
ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(data = tree_counties, fill = NA, color = "gray") +
  geom_polygon(color = "black", fill = NA) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
A map of counties in the northeastern United States.
A map of counties produced using ggplot and maps in R.

We can add the location of Rockefeller Center by specifying its latitude and longitude coordinates (40.759358, -73.978502) using geom_point(). We’ll make it dark green in color with a large size:

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
Map showing location of Rockefeller Center.
The location of Rockefeller Center in New York City.

A data set can be passed to geom_point() through its data argument to plot multiple points on a map. In the data, OriginLong and OriginLat hold the coordinates of where each Rockefeller tree originated. We’ll make these points red with a slightly smaller size:

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(data = tree, aes(x = OriginLong, y = OriginLat),
             color = "red", size = 2, inherit.aes = FALSE) +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
A map with points showing where Rockefeller Christmas trees come from.
The locations where Rockefeller Christmas trees come from.

The geosphere package in R contains several functions that compute geographic distances between points. The distHaversine() function computes the shortest distance (in meters) between two points assuming a spherical earth. We can use it to quantify the distance between where the tree originated from and Rockefeller Center.

The dist_to_rock_miles variable converts that distance from meters to miles (1 meter is about 0.000621371 miles):

tree <- tree %>%
  mutate(LatRock = 40.759358,
         LongRock = -73.978502,
         dist_to_rock_m = distHaversine(
           cbind(OriginLong, OriginLat),
           cbind(LongRock, LatRock)),
         dist_to_rock_miles = dist_to_rock_m * 0.000621371)
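For readers curious about what distHaversine() is doing, here is a base-R sketch of the standard haversine formula. It is not the geosphere source code. The function name haversine_m is made up for this example, the radius of 6378137 meters matches the default documented for distHaversine(), and the Wayne, NJ coordinates are approximate values assumed for illustration:

```r
# Great-circle distance in meters on a sphere of radius r (haversine formula).
# Inputs are longitudes and latitudes in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Approximate (assumed) coordinates for Wayne, NJ to Rockefeller Center
wayne_m <- haversine_m(-74.277, 40.925, -73.978502, 40.759358)
wayne_miles <- wayne_m * 0.000621371
wayne_miles
```

Because the function is vectorized, it could also be dropped into a mutate() call in place of distHaversine(), although the packaged function is the safer choice for real work.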

The mean distance between a tree’s home and Rockefeller Center is 75 miles. The closest trees originated from Wayne, New Jersey, just 19 miles away from Rockefeller Center. There were two trees from Wayne: one used in 2001 and another from 2005.

The furthest tree originated from Richfield, Ohio, a Norway spruce 401 miles away from Rockefeller Center. Most trees are less than 150 miles away from their “new home” in Rock Center:

A graph showing the distance between a tree and Rockefeller Center.
A violin plot showing the distance between a tree and Rockefeller Center.
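The article does not include the code behind this figure. A violin plot along these lines could be drawn with geom_violin(). The sketch below uses a made-up tree_demo data frame, and the styling choices are assumptions borrowed from the earlier bar chart rather than the author's actual settings:

```r
library(ggplot2)

# Hypothetical distances in miles; only 19 and 401 come from the article
tree_demo <- data.frame(
  dist_to_rock_miles = c(19, 19, 35, 48, 60, 72, 85, 110, 140, 401)
)

# A single violin: x is a constant, y is the distance
p_violin <- ggplot(tree_demo, aes(x = "", y = dist_to_rock_miles)) +
  geom_violin(fill = "lightgreen", color = "black") +
  labs(x = NULL, y = "Distance to Rockefeller Center (miles)") +
  theme_bw()

p_violin
```

With the real data, swapping tree_demo for tree should be the only change needed, provided dist_to_rock_miles has already been computed.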

We might be interested in mapping the closest and furthest trees from Rockefeller Center. The data set tree_loc contains the coordinates from three locations: Wayne, NJ, Richfield, OH, and Rockefeller Center.

The geom_text_repel() function in the ggrepel package adds text labels directly to the map. This is useful for plotting interesting data points or place names on a map. In our map, we label the closest and furthest trees based on the OriginCity variable and then “nudge” the labels so they appear offset from their locations:

tree_loc <- tree %>%
  filter(OriginCity %in% c("Wayne, NJ", "Richfield, OH",
                           "Rockefeller Center") &
           Year %in% c(1998, 2005, NA))

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(data = tree_loc, aes(x = OriginLong, y = OriginLat),
             color = "red", size = 2, inherit.aes = FALSE) +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  geom_text_repel(data = tree_loc,
                  aes(x = OriginLong, y = OriginLat, label = OriginCity),
                  fontface = "bold",
                  nudge_y = -1, nudge_x = -2, inherit.aes = FALSE) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
Map of Richfield, OH and Wayne, NJ.
Map showing the towns with the closest and furthest Rockefeller Christmas trees.
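The map above picks out the closest and furthest trees by hard-coding their city names and years. As an alternative sketch, the same rows could be found from the computed distances with dplyr's slice_min() and slice_max(). The tree_demo data frame is hypothetical (one town is invented), and with_ties = FALSE is an assumption that keeps a single Wayne tree rather than both:

```r
library(dplyr)

# Hypothetical subset of the tree data with distances already computed
tree_demo <- tibble(
  Year               = c(1998, 2001, 2005, 2010),
  OriginCity         = c("Richfield, OH", "Wayne, NJ", "Wayne, NJ",
                         "Hypothetical Town, NY"),
  dist_to_rock_miles = c(401, 19, 19, 75)
)

# Closest tree first, then the furthest one
tree_extremes <- bind_rows(
  slice_min(tree_demo, dist_to_rock_miles, n = 1, with_ties = FALSE),
  slice_max(tree_demo, dist_to_rock_miles, n = 1, with_ties = FALSE)
)

tree_extremes
```

The resulting data frame could then be passed as the data argument to geom_point() and geom_text_repel() in place of a hand-filtered one.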

Conclusion

With 71 observations of the Rockefeller Center Christmas tree, we found that the tallest trees come from Connecticut, one tree traveled at least 401 miles to get to New York City from Ohio, and the chosen tree has become taller through the years.

The maps package is a great way to extend your ggplot visualizations. Functions from additional packages like geosphere and ggrepel allow you to take your mapping skills further in R.

Data and analysis files available on GitHub.
