Mapping the Rockefeller Center tree with ggplot2 in R
It’s known as “the tree”. Every year the Christmas tree at Rockefeller Center in New York City brings a wonderful burst of green amid the dull winter colors of a big city.
The week after Thanksgiving, NBC broadcasts its annual Christmas tree lighting. Growing up in my family, we would tune in and mark this as the unofficial start to the holiday season.
The broadcast would have popular singers and musical groups perform. The night would conclude with the lighting of the tree, an awesome spectacle: a countdown, then the bright bulbs switched on to the cheers of the crowd in chilly New York City.
“Ya seen the tree yet?” A few times my mom, sister and I would take the train to New York City for the day to see the tree. This was usually around my sister’s birthday after Christmas and before New Year’s Eve. It was beautiful to see: usually a 70-foot-tall Norway spruce with crowds of people surrounding it.
The first Christmas tree at Rockefeller Center went up in 1931 and was a meager 20-foot-high balsam fir. The tree was erected by construction workers who would meet at the tree to receive their paychecks. This was during the Great Depression and the tree had a special meaning at the time.
The Rockefeller Center Christmas tree data
In the weeks before it arrives, news stories cover where the tree came from, the family whose property it grew on, and how far it traveled to get to Rockefeller Center.
With a lack of tall Norway spruces within the New York City limits, the tree is always chosen from one of the surrounding states. The 2020 Rockefeller Center Christmas tree is from Oneonta, New York, and traveled 170 miles to get to Manhattan.
Wikipedia maintains a data set of past Rockefeller Center Christmas trees, including their species, height, and the town where each originated. We will use these data and four R packages (tidyverse, maps, geosphere, and ggrepel) to map where the Rockefeller trees come from:
library(tidyverse)
library(maps)
library(geosphere)
library(ggrepel)

tree <- read_csv('rocktree.csv')
Since at least 1984, the species chosen has always been a Norway spruce, a large and fast-growing conifer known for its drooping branches. Prior to that, white spruce and balsam fir were chosen more than once for the tree.
tree %>%
  group_by(Species) %>%
  summarize(num = n()) %>%
  arrange(desc(num))
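The same tally can be written more compactly with dplyr's count(). The sketch below uses a small made-up tibble (toy_tree is a hypothetical stand-in, not the real data) so that it runs on its own:

```r
library(dplyr)

# Hypothetical stand-in for the tree data; only the Species column matters here
toy_tree <- tibble(
  Species = c("Norway spruce", "Norway spruce", "White spruce",
              "Balsam fir", "Norway spruce", "White spruce")
)

# count() groups, tallies, and sorts in one step
species_counts <- toy_tree %>%
  count(Species, name = "num", sort = TRUE)

species_counts
```

With the real data, replacing toy_tree with tree should give the same table as the group_by() and summarize() version.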
For the 71 trees for which there are data, the average height was 75.2 feet. The Rockefeller tree has generally gotten taller through the years. Prior to 2000 there were several trees less than 70 feet tall. Since 2000 the average height of the tree has been 78.1 feet.
ggplot(tree, aes(Year, Height_ft)) +
  geom_point() +
  stat_smooth(method = "lm") +
  labs(y = "Height of Rockefeller tree (feet)") +
  theme_bw()
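The trend line drawn by stat_smooth(method = "lm") is an ordinary linear regression of height on year, so the slope can also be read off directly with lm(). This is a minimal sketch: the heights in toy_tree are invented for illustration and are not the Wikipedia values.

```r
# Invented heights for illustration only; swap in the real tree data frame
toy_tree <- data.frame(
  Year      = seq(1950, 2020, by = 10),
  Height_ft = c(65, 70, 68, 72, 75, 78, 76, 80)
)

# Same model that stat_smooth(method = "lm") fits: Height_ft ~ Year
fit <- lm(Height_ft ~ Year, data = toy_tree)

# The Year coefficient is feet per year; scale it to feet per decade
feet_per_decade <- unname(coef(fit)["Year"]) * 10
feet_per_decade
```

A positive value means the tree has tended to get taller over time; summary(fit) would show how much uncertainty surrounds that slope.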
The tallest trees have come from Connecticut, with a mean of 84 feet. Although the tree has come from Vermont only twice in its history, that state has produced the shortest trees, averaging 66 feet:
tree %>%
  group_by(OriginState) %>%
  summarize(num = n(),
            mean_ht = mean(Height_ft, na.rm = TRUE)) %>%
  arrange(desc(mean_ht)) %>%
  filter(num > 2 & !is.na(OriginState)) %>%
  ggplot(aes(x = reorder(OriginState, mean_ht), y = mean_ht)) +
  geom_bar(stat = "identity", fill = "lightgreen", col = "black") +
  coord_flip() +
  labs(y = "Mean height of Rockefeller tree (feet)", x = "State") +
  theme_bw()
Mapping where the Rockefeller tree comes from
If you’re familiar with using ggplot to make visualizations in R, the maps package is a great addition to your data visualization toolkit. Functions from the maps package can be used directly inside a block of ggplot() code.
The first step in mapping where the Rockefeller tree originates is to use the map_data() function to pull out the US states we’re interested in. This function takes the data from maps and wrangles it into a format that ggplot() recognizes:
states <- map_data("state")

tree_states <- subset(states,
                      region %in% c("connecticut", "pennsylvania",
                                    "new york", "ohio", "new jersey",
                                    "vermont", "massachusetts",
                                    "new hampshire"))
Next, we can plot the states using the ggplot() function and the mapping statement. State borders and colors are specified using geom_polygon(). The coord_map() statement uses a central projection on a cone tangent at 40 degrees latitude (it relies on the mapproj package being installed). Adding theme_bw() provides a consistent theme that will be used in all of the maps:
ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  coord_map("conic", lat0 = 40) +
  theme_bw()
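One thing that can trip up the subset() step used to build tree_states: map_data("state") stores region names in lowercase, so a capitalized or misspelled name silently drops that state from the map. A quick check along these lines can catch that. The wanted and missing_regions names are just illustrative:

```r
library(ggplot2)  # map_data() comes from ggplot2 and needs the maps package installed

# The eight states used in this post, spelled the way map_data() stores them
wanted <- c("connecticut", "pennsylvania", "new york", "ohio",
            "new jersey", "vermont", "massachusetts", "new hampshire")

states <- map_data("state")

# Any name listed here would be missing from the subsetted map
missing_regions <- setdiff(wanted, unique(states$region))
missing_regions

# A capitalized spelling does not match and would be flagged
setdiff("New York", unique(states$region))
```

If missing_regions is empty, every requested state will be drawn.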
A map of counties can also be produced using similar code:
counties <- map_data("county")

tree_counties <- subset(counties,
                        region %in% c("connecticut", "pennsylvania",
                                      "new york", "ohio", "new jersey",
                                      "vermont", "massachusetts",
                                      "new hampshire"))

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(data = tree_counties, fill = NA, color = "gray") +
  geom_polygon(color = "black", fill = NA) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
We can add the location of Rockefeller Center by specifying its latitude and longitude coordinates (40.759358, -73.978502) using geom_point(). We’ll make it dark green in color with a large size:
ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
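A side note on the geom_point() call: because the coordinates are constants inside aes() and the layer inherits the plot's data, ggplot2 recycles them across every row of that data, so the same dot is drawn many times on top of itself. The result looks identical, but annotate() is one way to draw a single marker instead. The sketch below uses a tiny made-up polygon (square is hypothetical) in place of tree_states so it runs without the maps package:

```r
library(ggplot2)

# Hypothetical stand-in for the tree_states polygons: one small square
square <- data.frame(
  long  = c(-75, -73, -73, -75),
  lat   = c(40, 40, 42, 42),
  group = 1
)

base_map <- ggplot(square, aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white")

# Constants inside aes() are recycled over every row of the inherited data
p_aes <- base_map +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5)

# annotate() draws the marker once, independent of the plot data
p_annotate <- base_map +
  annotate("point", x = -73.978502, y = 40.759358,
           color = "darkgreen", size = 5)

# Number of points each version actually draws in its second layer
n_aes      <- nrow(layer_data(p_aes, 2))
n_annotate <- nrow(layer_data(p_annotate, 2))
```

Either approach works for the maps in this post; annotate() simply keeps the plot object lighter when the underlying data has thousands of rows.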
A data set can be called within a geom_point() statement to plot multiple points on a map. In the data, OriginLong and OriginLat represent the coordinates of the location where each Rockefeller tree originated. We’ll make them red in color with a slightly smaller size:
ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(data = tree, aes(x = OriginLong, y = OriginLat),
             color = "red", size = 2, inherit.aes = FALSE) +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
The geosphere package in R contains several functions that compute geographic distances between points. The distHaversine() function computes the shortest distance (in meters) between two points assuming a spherical earth. We can use it to quantify the distance between where the tree originated and Rockefeller Center.
The dist_to_rock_miles variable converts the measurement in meters to the number of miles between the tree’s home and Rockefeller Center:
tree <- tree %>%
  mutate(LatRock = 40.759358,
         LongRock = -73.978502,
         dist_to_rock_m = distHaversine(
           cbind(OriginLong, OriginLat),
           cbind(LongRock, LatRock)),
         dist_to_rock_miles = dist_to_rock_m * 0.000621371)
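To see what the haversine calculation is doing, here is a base R sketch of the formula. The haversine_m() helper is written for this illustration and is not part of geosphere. It uses the same default earth radius as distHaversine() (6378137 meters), and the Wayne, NJ coordinates are approximate values assumed for the example rather than taken from the data set:

```r
# Haversine distance in meters between two longitude/latitude points (degrees)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates for Wayne, NJ (assumed for illustration)
wayne_long <- -74.2765
wayne_lat  <- 40.9254

# Rockefeller Center coordinates used throughout this post
rock_long <- -73.978502
rock_lat  <- 40.759358

dist_m     <- haversine_m(wayne_long, wayne_lat, rock_long, rock_lat)
dist_miles <- dist_m * 0.000621371  # meters to miles
dist_miles
```

The result should land close to the 19 miles reported for Wayne, with small differences depending on the exact coordinates used.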
The mean distance between a tree’s home and Rockefeller Center is 75 miles. The closest trees originated from Wayne, New Jersey, just 19 miles away from Rockefeller Center. There were two trees from Wayne: one used in 2001 and another from 2005.
The furthest tree, a Norway spruce, came from Richfield, Ohio, 401 miles away from Rockefeller Center. Most trees traveled less than 150 miles to their “new home” in Rock Center.
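Summaries like these can be pulled from the dist_to_rock_miles column with a single summarize() call. The sketch below runs on a made-up vector of distances (toy_dist is hypothetical; only the 19 and 401 values echo the figures above), so the numbers it prints are not the real results:

```r
library(dplyr)

# Placeholder distances in miles; replace toy_dist with the real tree data
toy_dist <- tibble(
  dist_to_rock_miles = c(19, 19, 45, 62, 80, 110, 140, 401)
)

dist_summary <- toy_dist %>%
  summarize(
    mean_miles      = mean(dist_to_rock_miles, na.rm = TRUE),
    min_miles       = min(dist_to_rock_miles, na.rm = TRUE),
    max_miles       = max(dist_to_rock_miles, na.rm = TRUE),
    # Proportion of trees that traveled less than 150 miles
    share_under_150 = mean(dist_to_rock_miles < 150, na.rm = TRUE)
  )

dist_summary
```

A histogram of the same column, for example with geom_histogram(), is another way to see how most trees cluster close to New York City.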
We might be interested in mapping the closest and furthest trees from Rockefeller Center. The data set tree_loc contains the coordinates of three locations: Wayne, NJ, Richfield, OH, and Rockefeller Center.
The geom_text_repel() function in the ggrepel package adds text directly to the map. This is useful for plotting interesting data points or place names on a map. In our map, we label the closest and furthest trees based on the OriginCity variable and then “nudge” the labels so they appear offset from their locations:
tree_loc <- tree %>%
  filter(OriginCity %in% c("Wayne, NJ", "Richfield, OH",
                           "Rockefeller Center") &
           Year %in% c(1998, 2005, NA))

ggplot(data = tree_states,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(color = "black", fill = "white") +
  geom_point(data = tree_loc, aes(x = OriginLong, y = OriginLat),
             color = "red", size = 2, inherit.aes = FALSE) +
  geom_point(aes(y = 40.759358, x = -73.978502),
             color = "darkgreen", size = 5) +
  geom_text_repel(data = tree_loc,
                  aes(x = OriginLong, y = OriginLat, label = OriginCity),
                  fontface = "bold", nudge_y = -1, nudge_x = -2,
                  inherit.aes = FALSE) +
  coord_map("conic", lat0 = 40) +
  theme_bw()
Conclusion
With 71 observations of the Rockefeller Center Christmas tree, we found that the tallest trees come from Connecticut, one tree traveled at least 401 miles to get to New York City from Ohio, and the chosen tree has become taller through the years.
The maps package is a great way to extend your ggplot visualizations. Functions from additional packages like geosphere and ggrepel allow you to take your mapping skills further in R.
Data and analysis files available on GitHub.