[This is a guest post by Jill Hubley*, about her data visualization project NYC Street Trees by Species]
Having grown up in the woods of Pennsylvania, I’ve always been interested in knowing the plants around me. Though I’ve now lived in Brooklyn for 14 years, this naturalist tendency has only deepened. Every day, I walk through the 585 acres of Prospect Park with my dog, and do my best to identify the surrounding trees and plants. It was on one of these many walks a number of years ago that I first envisioned a tree species map. How helpful it would be while wandering down a path by the lake, unsure of a tree’s taxon, to be able to look up a map and know with conviction that it was a Common Buckthorn, for instance. When I got home that day I googled ‘New York tree map’ or something to that effect. I came across a brilliant project by Edward Sibley Barnard and Ken Chaya, similar to what I imagined, where they mapped all of the trees of Central Park. However, their map is an illustrated paper map, and my thought was that such a map should exist on the web and have some level of interactivity.
I filed this idea in the back of my mind, but didn’t act on it for a few years. Like any nagging thought that keeps returning, however, it needed to be addressed, and so last winter I set out to find some data. When Barnard and Chaya created their map of Central Park, they mapped each path, tree and physical feature themselves over the course of two years. While that sounded fun, I didn’t feel I could put in that much legwork. I reached out to the Prospect Park Alliance, but was told they haven’t done a tree survey. I asked the Central Park Conservancy about their data, but was told, “Unfortunately, for operational and legal reasons, we can’t currently share this database.” Looking at New York City’s open data portal, I saw that a street tree census had been done in 2005 and that the data was available, so I thought that would be a viable alternative. The data was split into separate downloads for each borough, and I chose to focus on Brooklyn. Hosted on Socrata, the data is available in a few different formats. The CSV exports don’t include the geographic data, so I grabbed the shapefile. I opened it in QGIS and reprojected the layer to Web Mercator (it previously used a custom projection). I then added geometry columns, copied all of the data from the attribute table, and pasted everything into a CSV.
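For readers curious what the reprojection step does, here is a minimal Python sketch of the spherical Web Mercator (EPSG:3857) formula, followed by writing projected points to a CSV. It assumes the input coordinates are already WGS84 longitude/latitude; the 2005 shapefile's custom projection would first need converting, which QGIS handled in this project. The `trees_to_csv` helper, its column names, and the sample species code are hypothetical.

```python
import csv
import io
import math

# Web Mercator (EPSG:3857) treats the Earth as a sphere with the WGS84
# equatorial radius, in metres.
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project a WGS84 longitude/latitude pair (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def trees_to_csv(trees) -> str:
    """Write (species, lon, lat) tuples as CSV text with projected x/y columns.

    Hypothetical helper: the column names are illustrative, not the census's own fields.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["species", "x", "y"])
    for species, lon, lat in trees:
        x, y = to_web_mercator(lon, lat)
        writer.writerow([species, round(x, 2), round(y, 2)])
    return buf.getvalue()


if __name__ == "__main__":
    # A single made-up tree near Prospect Park.
    print(trees_to_csv([("demo", -73.97, 40.66)]))
```

Points west of Greenwich come out with negative x and points north of the equator with positive y, which is a quick sanity check on any reprojected layer.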
Initially, I thought I’d use D3.js to build the map. I hadn’t used it much before, but knew of its mapping capabilities and the high level of control it grants (nothing is preconfigured, which I find an asset). I created the first prototype of the map using a CSV with a small selection of trees. I used D3.js’s enter() method to create an SVG circle for each tree, gave each circle a class based on the tree’s species code from the data, and paired a color with each species.
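The enter() step binds each CSV row to a new SVG circle; in D3 it is typically written as `svg.selectAll("circle").data(trees).enter().append("circle")`. To keep all the examples here in one language, the following Python sketch builds the same kind of markup by hand: one `<circle>` per tree, classed by species code, plus a CSS rule pairing each class with a color. The species codes, colors, field names and the `render_svg` helper are hypothetical placeholders, not the project's code.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical species codes and colours: one fill colour per species.
SPECIES_COLORS = {"demoA": "#1b9e77", "demoB": "#d95f02"}


def render_svg(trees, width=200, height=200):
    """Return SVG text with one <circle> per tree, classed by its species code."""
    # One CSS rule per species: the class name on each circle selects its colour.
    rules = "\n".join(
        f".{code} {{ fill: {color}; }}" for code, color in SPECIES_COLORS.items()
    )
    # Mirrors a D3 enter() selection: every data row gets a new circle element.
    circles = "\n".join(
        f'<circle class={quoteattr(t["species"])} cx="{t["x"]}" cy="{t["y"]}" r="2"/>'
        for t in trees
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"<style>\n{rules}\n</style>\n{circles}\n</svg>"
    )


if __name__ == "__main__":
    print(render_svg([{"species": "demoA", "x": 10, "y": 20}]))
```

Keying color off a class rather than setting a fill on each circle means a whole species can be restyled, or hidden, with a single CSS rule.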
The data also includes the diameter of each tree’s trunk. I decided to visualize this too, with smaller circle radii for trees with smaller trunk diameters and larger radii for larger trunks. It’s not a 1:1 scale; rather, I grouped the diameters into four ranges. At first, I didn’t multiply the radii by the zoom scale, so the circles stayed large at every zoom level, which made it increasingly hard to tell trees apart as you zoomed out.
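The trunk-diameter encoding can be sketched as a small lookup: bin each diameter into one of four ranges, pick a base radius for that range, then multiply by the current zoom scale so circles shrink as the map zooms out. The post doesn't give the actual range boundaries, units or radii, so every number below is hypothetical.

```python
import bisect

# Hypothetical bin edges (diameters assumed to be in inches); three edges give four ranges.
DIAMETER_EDGES = [6, 12, 24]
# Hypothetical base radius, in pixels at zoom scale 1, for each of the four ranges.
BASE_RADII = [1.0, 1.5, 2.0, 3.0]


def base_radius(diameter):
    """Map a trunk diameter to one of four radius classes (not a 1:1 scale)."""
    return BASE_RADII[bisect.bisect_right(DIAMETER_EDGES, diameter)]


def screen_radius(diameter, zoom_scale):
    """Scale the radius with the zoom so circles shrink as the user zooms out."""
    return base_radius(diameter) * zoom_scale


if __name__ == "__main__":
    for d in (3, 10, 18, 40):
        print(d, screen_radius(d, 1.0), screen_radius(d, 0.5))
```

Because the radius is proportional to the zoom scale, zooming out to half scale halves every circle, so neighboring trees keep some space between them instead of merging into one blob.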
While still working out the issues with the trunk diameters, I started setting things up so that each color/species could be shown or hidden. I wanted the map to make it easy to see where each species was located, and I thought being able to isolate the tree types would further that goal. I also thought it would be fun to work backwards from the species data, adding each tree’s genus and plant family to my CSV and making those filterable too. I added an overlay on the left with a jQuery accordion for species, genus, and family, and populated it with some placeholder options.
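Working backwards from species to genus and family amounts to attaching two extra fields to each record and filtering on whichever level the user picks. A minimal Python sketch, with a hypothetical `Tree` record and `visible_trees` helper (the original prototype did this in the browser with D3.js and a jQuery accordion):

```python
from dataclasses import dataclass

# The taxonomic levels a user can filter on.
LEVELS = ("species", "genus", "family")


@dataclass(frozen=True)
class Tree:
    """Hypothetical record: one street tree with its full taxonomy."""
    species: str
    genus: str
    family: str


def visible_trees(trees, level, shown):
    """Keep only trees whose value at `level` (species, genus or family) is in `shown`."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    return [t for t in trees if getattr(t, level) in shown]


if __name__ == "__main__":
    sample = [
        Tree("Acer platanoides", "Acer", "Sapindaceae"),
        Tree("Acer rubrum", "Acer", "Sapindaceae"),
        Tree("Platanus x acerifolia", "Platanus", "Platanaceae"),
    ]
    # Showing the genus Acer keeps both maples and hides the planetree.
    print(visible_trees(sample, "genus", {"Acer"}))
```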
I then scaled the circle radii by the zoom level, and the map instantly looked better.
Once all 150,000+ tree coordinates were loaded in (I was still only working with the Brooklyn data), I realized I had an enormous performance issue. The points all loaded, but it took many seconds, and zooming and panning were nearly impossible. This shouldn’t have been a surprise, but it was the first time I’d worked with such a large dataset. With so many objects in the DOM, the browser was choking. At first, I thought switching from SVG to Canvas for the circles might take care of things. It did speed things up, but there were simply too many points, and I knew I had to change my approach altogether. One option was clustering. However, one of my favorite aspects of the map was that you could get an overall sense of the patterns when zoomed out completely, with one dot per tree. Another approach would be a heatmap, but I felt this too would obscure the delicate patterns I had achieved. A third option was to create vector tiles, and this is the route I took.
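The core idea behind tiling is that points are pre-grouped into fixed square tiles for each zoom level, so the browser only requests and draws the tiles currently in view instead of holding every point in the DOM. The sketch below shows that grouping using the standard XYZ ("slippy map") tile indexing; it illustrates the general scheme only and is not how CartoDB builds its tiles internally. The helper names are hypothetical.

```python
import math
from collections import defaultdict


def tile_for(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point at `zoom`."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points exactly on the right or bottom edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def bucket_by_tile(points, zoom):
    """Group (lon, lat) points by tile so a viewer can fetch only the tiles on screen."""
    tiles = defaultdict(list)
    for lon, lat in points:
        tiles[tile_for(lon, lat, zoom)].append((lon, lat))
    return dict(tiles)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (-73.95, 40.65), (10.0, -10.0)]
    for key, members in bucket_by_tile(pts, 1).items():
        print(key, len(members))
```

At higher zoom levels each tile covers a smaller area and so holds fewer points, which keeps the per-tile work roughly bounded no matter how large the full dataset is.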
I considered two providers: Mapbox and CartoDB. Ultimately, CartoDB won out because cartodb.js made it fairly easy to filter the displayed data dynamically. I kept nearly all of the features I had built in D3.js, but I dropped the genus and plant family options and limited the filters to the common name of each species; I felt trimming this feature would give the project more clarity. Since I was now using CartoDB vector tiles, which use the UTF Grid specification, things were zipping along. With that performance headroom, I decided to incorporate the data from all five boroughs, which would make the map appealing to a wider audience. It also increased the total number of species on the list, since the other boroughs had trees that aren’t found in Brooklyn; citywide, there are 168 species. I felt I couldn’t possibly find a distinct enough color for every one of those 168 species, so I color-coded only the top 52. The remaining species, many of which have only a handful of representatives planted in the city, are on the map but colored a mid-gray and can’t be filtered, though users can hover over them to see their species. Limiting the number of colors was necessary to create a vivid map that showed patterns yet remained distinguishable.
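Choosing which species get their own color comes down to ranking species by tree count and falling back to gray for the rest. A Python sketch, assuming "top 52" means the 52 most numerous species; the palette, the gray value and the `assign_colors` helper are hypothetical, and this is not the project's actual code.

```python
from collections import Counter

# Hypothetical mid-gray used for every species outside the highlighted set.
GRAY = "#999999"


def assign_colors(species_per_tree, palette, top_n=52):
    """Give the top_n most common species a palette colour and every other species gray.

    Returns (colour_by_species, ranked), where `ranked` lists the highlighted
    species from most to least common, i.e. the ones offered as filters.
    """
    if top_n > len(palette):
        raise ValueError("need at least one palette colour per highlighted species")
    counts = Counter(species_per_tree)
    ranked = [sp for sp, _ in counts.most_common(top_n)]
    colors = {sp: GRAY for sp in counts}
    for sp, color in zip(ranked, palette):
        colors[sp] = color
    return colors, ranked


if __name__ == "__main__":
    demo = ["a", "a", "a", "b", "b", "c"]
    print(assign_colors(demo, ["#e41a1c", "#377eb8"], top_n=2))
```

Every tree still appears on the map; only the rarely planted species share the gray, so the palette stays small enough for each color to remain distinguishable.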
While fiddling with the map colors, I often commented out the base tiles in my code to see the points more clearly. I found that viewing the data stripped of its context let me see the information in a more aesthetic light. The patterns of trees viewed this way reminded me of constellations of stars. Wanting to encourage playful exploration of the data, I included a base map toggle in the final map.
Looking at the map as a whole, I was surprised by the trends it revealed: each borough has a predominant color/species. I had expected the trees to be fairly evenly mixed throughout the city, but instead saw different types of trees planted more heavily depending on location.
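A pattern like "each borough has a predominant species" can also be checked directly from the census rows by counting species per borough. The sketch below assumes each record has been reduced to a hypothetical `(borough, species)` pair; it makes no claim about which species actually dominate where.

```python
from collections import Counter, defaultdict


def predominant_species(records):
    """Return {borough: (species, share)} for the most common species in each borough."""
    by_borough = defaultdict(Counter)
    for borough, species in records:
        by_borough[borough][species] += 1
    result = {}
    for borough, counts in by_borough.items():
        species, n = counts.most_common(1)[0]
        # Share of that borough's trees belonging to its most common species.
        result[borough] = (species, n / sum(counts.values()))
    return result


if __name__ == "__main__":
    toy = [("Brooklyn", "x"), ("Brooklyn", "x"), ("Brooklyn", "y"), ("Queens", "y")]
    print(predominant_species(toy))
```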
With so many points of data, I continue to find new, unexpected pockets of color throughout the city. I also have plans to update the map to show the change in trees over time. I submitted a FOIL (Freedom of Information Law) request in late January to the Department of Parks and Recreation for the data from the first street tree census, conducted in 1995. I received that data recently, in late April. Further, the next NYC tree census is happening this year. I signed up to help count trees, and look forward to working with the data that I’ll help collect.
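Showing change over time could start with something as simple as comparing per-species counts between the 1995 and 2005 censuses. This sketch assumes both datasets use the same species naming, which would need to be verified and reconciled before any real comparison; the `species_change` helper is hypothetical.

```python
from collections import Counter


def species_change(census_a, census_b):
    """Return {species: count_in_b - count_in_a} for every species seen in either census."""
    a, b = Counter(census_a), Counter(census_b)
    # Counter returns 0 for missing keys, so species present in only one census still work.
    return {sp: b[sp] - a[sp] for sp in sorted(a.keys() | b.keys())}


if __name__ == "__main__":
    print(species_change(["x", "x", "y"], ["x", "z"]))
```

A positive value means more trees of that species in the later census, a negative value fewer; the same per-tree records could later drive a map layer that highlights additions and losses.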
*Jill Hubley is a Brooklyn based web developer interested in the intersection of technology, art and science. You can find more of her work at jillhubley.com, and connect with her on Twitter (@jill_hubley).