Great Circles with R

Three methods with sp and sf

rstats
rspatial
Creating great circles in R with the sp and sf packages
Published

28 March 2018

In 1569 the Flemish cartographer and mathematician Gerardus Mercator published a new world map under the title “New and more complete representation of the terrestrial globe properly adapted for use in navigation.” The title of the map points to Mercator’s main claim for its usefulness, which he expounded upon in the map’s legends. Mercator presented his map as not only an accurate representation of the known world, but also as a particularly useful map for the purposes of navigation. As described in the third legend, Mercator aimed to maintain conformity to the shape of land masses even towards the poles and to have straight lines on the map accurately represent directionality. To achieve his goals Mercator used a projection in which lines of longitude and latitude were made perpendicular at all values by increasing the distance between degrees of latitude as they reach the pole.1 Mercator’s projection had the benefit that straight lines drawn on the map are rhumb lines, lines of constant bearing that pass every degree of longitude at the same angle. Theoretically this simplified oceanic navigation; a ship captain could draw a straight line from one port to another, calculate the bearing, and maintain that bearing along the voyage. However, 16th-century navigators used magnetic courses and not longitude and latitude values as Mercator’s map assumed.2 An accurate means to measure longitude at sea was only discovered in the second half of the 18th century with the development of the sextant and later the marine chronometer.3

Figure 1: World Map by Gerardus Mercator, 1569

The Mercator projection was designed with certain uses in mind. Mercator’s emphasis on perpendicular lines of longitude and latitude and the equivalence of straight lines and rhumb lines were meant to simplify navigation and have recently proved useful for online mapping services. However, the stretching of latitudes towards the poles distorts the size of land masses, making those closer to the poles appear larger than those near the equator. The stress on rhumb lines in Mercator’s map also highlights the difference between lines of constant bearing (rhumb or loxodrome lines) and the shortest distance between two points (great circles). Due to Earth’s ellipsoidal nature, the shortest distance between two points is not necessarily a straight line. For instance, to fly from Los Angeles to Amsterdam, one would not want to fly in a straight line of constant bearing at 78 degrees. Instead, you would want to make an arc to the north to take advantage of the ellipsoidal shape of the Earth. By flying along the great circle from Los Angeles to Amsterdam one would travel 1120 kilometers less than flying along the rhumb line.

Great circles with R

In Geocoding with R and Introduction to GIS with R I demonstrated different ways to make maps that showed the sources and destinations of letters sent to the sixteenth-century merchant Daniel van der Meulen in 1585. This post will show how to create great circle lines to connect the sources and destinations of the letters. There are a number of resources on making great circles with R, and each resource uses slightly different methods.4 The diversity of methods to get the same or similar result is both one of the more interesting and one of the more frustrating parts of coding. Here, I will show how to create great circles as a SpatialLinesDataFrame from the sp package, as well as two different methods using the sf package. The results of the three methods will be very similar, and the ability to convert between objects used by the sp and sf packages means that they can be used interchangeably. However, showing multiple methods will help to explicate the process of creating lines from longitude and latitude data and turning them into great circles. You can find the data and the R script that goes along with this post on GitHub.

sp and sf both depend upon external packages to calculate the shortest distance between two points of longitude and latitude data. Creating a SpatialLinesDataFrame of great circles is done through the geosphere package, while the calculations for sf objects depend upon the lwgeom package.5 Despite the different sources for the calculations, the results produced by geosphere and lwgeom are all but equivalent. Internally, both the gcIntermediate() function from geosphere and st_segmentize() function from sf use a spherical model of the Earth to calculate great circles. Therefore, the path drawn by the two functions are not as exact as the distance calculations from the two packages, which use more accurate ellipsoidal models of the Earth. The primary difference between gcIntermediate() and st_segmentize() is not the internal calculations, but how they are implemented in R. Both functions work by breaking a straight line with two points into a segmented line, with the number of breaks defined by the user. gcIntermediate() calculates segments based on a specified number of breaks. On the other hand, st_segmentize() uses a maximum distance argument to calculate the number of segments. This means that whereas gcIntermediate() results in the same number of points along the great circle for each line irrespective of distance, st_segmentize() creates segments of relatively equal distance but with the amount of segments in a line varying by distance.

Let’s start by loading the packages and data that we will use to make the great circles. To wrangle the data into the correct formats I will use the tidyverse. We need to load the sp and geosphere packages to make a SpatialLinesDataFrame of great circles and the sf package for the latter two methods. Internally, st_segmentize() uses the lwgeom package, but we will not use the package directly, and so while it is necessary to have lwgeom installed, we do not need to load it. Lastly, I will use the rnaturalearth package to provide some background maps. The data that I will be using is again the letters sent to Daniel van der Meulen in 1585 along with the latitude and longitude values of the cities within his correspondence network at this time. We can also load the countries map from rnaturalearth as both a Spatial and sf object.

# Load packages
library(tidyverse)
library(sp)
library(geosphere)
library(sf)
library(rnaturalearth)

# Load the data
letters <- read_csv("data/correspondence-data-1585.csv")
locations <- read_csv("data/locations.csv")

# Load the background maps
countries_sp <- ne_countries(scale = "medium")
countries_sf <- ne_countries(scale = "medium", returnclass = "sf")

Preparing the data

The basis for all three methods of creating great circles is a data frame of routes — a data frame with a single row for each unique connection between a source and a destination — which we can then join to the longitude and latitude values in the locations data frame. The letters data contains 114 rows with each row containing a letter. We can calculate the number of letters sent along each unique route with a group_by() and count() pipeline in which we group the letters that have the same source and destination. We then want to ungroup() the data frame to prevent the group attribute from interfering with any code further down the line. The final step is to arrange the rows by the count column, which has been named “n” by count(). This ensures that the routes with more letters sent across them will be drawn later in the plot, making them more visible in the maps.

# Create data frame of routes and count for letters per route
routes <- letters %>%
  group_by(source, destination) %>% 
  count() %>% 
  ungroup() %>% 
  arrange(n)

# Print routes
routes
#> # A tibble: 15 × 3
#>    source    destination     n
#>    <chr>     <chr>       <int>
#>  1 Amsterdam Bremen          1
#>  2 Antwerp   Middelburg      1
#>  3 Dordrecht Haarlem         1
#>  4 Emden     Bremen          1
#>  5 Haarlem   Middelburg      1
#>  6 Haarlem   The Hague       1
#>  7 Hamburg   Bremen          1
#>  8 Het Vlie  Bremen          1
#>  9 Lisse     Delft           1
#> 10 Antwerp   The Hague       2
#> 11 Haarlem   Bremen          2
#> 12 Venice    Haarlem         2
#> 13 Antwerp   Haarlem         5
#> 14 Haarlem   Delft          26
#> 15 Antwerp   Delft          68

The routes data shows that there are 15 unique source to destination combinations within the letters data. A glimpse at routes shows that the majority of letters that Daniel received in 1585 — 68 letters — were sent from Antwerp to Delft, while a sizable number of letters were sent from Haarlem to Delft.6 Daniel received letters along other routes much less frequently. With the routes identified, and already possessing the longitude and latitude of the end points for the routes in the locations data frame, we can now move on to creating the great circles between these sets of points.

Great circles with geosphere and sp

Let’s start by creating great circles as a SpatialLinesDataFrame from the sp package. Before the recent growth of the sf package, the gcIntermediate() function from the geosphere package represented the main method for creating great circles. I have tried to simplify the process, making it possible to create a SpatialLinesDataFrame in four steps.

The gcIntermediate() function provides one of the easiest methods for transforming coordinates data into spatial lines. A single step creates a SpatialLines object and calculates the great circle. As we will see below, the sf methods split these into two steps. The gcIntermediate() function takes two sets of longitude and latitude coordinates, one for the beginning point of the line and one for the end point. The coordinates can be supplied in the form of either a data frame or a matrix. Here, I will supply the coordinates in a data frame, since our longitude and latitude values are already in this format. We can identify the longitude and latitude values for the sources of the routes by joining locations to routes by the “source” column. Since we only want the longitude and latitude values for the gcIntermediate() function, we select() the “lon” and “lat” columns — the names of the columns that contain longitude and latitude values — creating a sources_tbl with two columns and the same amount of rows as the routes data. Joining locations to routes by “destination” creates an equivalent destinations_tbl.

# tibble of longitude and latitude values of sources
sources_tbl <- routes %>% 
  left_join(locations, by = c("source" = "place")) %>% 
  select(lon, lat)

# tibble of longitude and latitude values of destinations
destinations_tbl <- routes %>% 
  left_join(locations, by = c("destination" = "place")) %>% 
  select(lon, lat)

The next step is to decide on the number of segments for the line and the type of output we want. In this example, I choose 50 breaks, which is more than enough for the routes that we are drawing. The end result will actually have 52 segments, because I want to include the start and end points within the line by setting the addStartEnd to TRUE.7 Lastly, in order to have the output be a Spatial object, we need to set the sp argument to TRUE.8

# Great circles as a SpatialLines object
routes_sl <- gcIntermediate(sources_tbl, destinations_tbl, 
                            n = 50, addStartEnd = TRUE, 
                            sp = TRUE)

Let’s investigate the object that we have created with the gcIntermediate() function. You will not want to print routes_sl to the console, as this will print out fifteen matrices with 52 rows each, but we can use other methods to look at the object. routes_sl is a Spatial object of class SpatialLines. The different forms of data within Spatial objects are contained in slots, which can be identified with slotNames(). The lines slot contains the coordinates of the routes produced by gcIntermediate(). The gcIntermediate() function also added a coordinate reference system and set it to longitude and latitude coordinates on the WGS84 ellipsoid, equivalent to EPSG 4326.

# Class of routes_sl
class(routes_sl)
#> [1] "SpatialLines"
#> attr(,"package")
#> [1] "sp"

# Slots in routes_sl
slotNames(routes_sl)
#> [1] "lines"       "bbox"        "proj4string"

# CRS of routes_sl
routes_sl@proj4string
#> Coordinate Reference System:
#> Deprecated Proj.4 representation: +proj=longlat +ellps=WGS84 +no_defs 
#> WKT2 2019 representation:
#> GEOGCRS["unknown",
#>     DATUM["Unknown based on WGS84 ellipsoid",
#>         ELLIPSOID["WGS 84",6378137,298.257223563,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",7030]]],
#>     PRIMEM["Greenwich",0,
#>         ANGLEUNIT["degree",0.0174532925199433],
#>         ID["EPSG",8901]],
#>     CS[ellipsoidal,2],
#>         AXIS["longitude",east,
#>             ORDER[1],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]],
#>         AXIS["latitude",north,
#>             ORDER[2],
#>             ANGLEUNIT["degree",0.0174532925199433,
#>                 ID["EPSG",9122]]]]

The inclusion of a coordinate reference system makes routes_sl a truly geospatial object that can be plotted on our background map, though it does not yet possess any attribute data about the names of the locations or the number of letters sent along each route.

# Make bbox of countries_sp the same as routes_sl
countries_sp@bbox <- bbox(routes_sl)

# Plot map
par(mar = c(1, 1, 3, 1))
plot(countries_sp, col = gray(0.8), border = gray(0.7),
     main = "SpatialLines great circles")
plot(routes_sl, col = "dodgerblue", add = TRUE)

At this point, the output is not hugely informative. However, we can confirm that the process of making great circles worked by observing the curvatures of the lines. Notice that the amount of curvature differs according to distance and the bearing of the line.

We can add the attribute data for the lines and create a SpatialLinesDataFrame object by combining routes_sl and the routes data frame through the SpatialLinesDataFrame() function. There is no need to worry about linking the lines data to the correct row in routes, because routes_sl contains a lines ID attribute that corresponds to the row numbers of sources_tbl and destinations_tbl, which were derived from routes.9

# Great circles as a SpatialLinesDataFrame object
routes_sldf <- SpatialLinesDataFrame(routes_sl, data = routes)

The result of SpatialLinesDataFrame() is a complete Spatial object that contains a CRS, attribute data in the data slot, and great circle lines in the lines slot. It is now possible to plot routes_sldf and distinguish the lines by the number of letters sent along each route. The easiest way to show this in a base plot is by adjusting the line width of the great circles by a chosen formula. In this case, I take the square route of the number of letters and add 0.25 to get the maximum and minimum line width to a reasonable size.

countries_sp@bbox <- bbox(routes_sldf)
# Map with SpatialLinesDataFrame object
par(mar = c(1, 1, 3, 1))
plot(countries_sp, col = gray(0.8), border = gray(0.7),
     main = "SpatialLinesDataFrame great circles")
plot(routes_sldf,
     lwd = sqrt(routes_sldf$n/3) + 0.25,
     col = "dodgerblue",
     add = TRUE)

Great circles with sf

There is no straightforward way, no single function, to create great circles from longitude and latitude data with the sf package that is reminiscent of gcIntermediate(). With sf the creation of great circles is a two-step process. First, we need to create lines or objects with a geometry of LINESTRING, and then it will be possible to convert the straight line into a great circle with st_segmentize(). The first transformation is the trickier of the two. As I have shown previously, you can create an sf object with POINT geometry through the st_as_sf() function, but there is no equivalent function for creating lines. Here, I will show two different methods for creating lines and then great circles with sf. The first method will convert the routes data to an sf object of POINT geometry and take advantage of the sf package’s ability to use dplyr functions to create lines through a group_by() and summarise() pipeline. The second method vectorizes the creation of a single LINESTRING feature through a for loop. The resulting sf object of great circles will be essentially identical.

Great circles with sf: tidy method

When I first attempted to create great circles with the sf package, I was not sure how to proceed. In fact, for many months I used the above method from geosphere and then converted the SpatialLinesDataFrame to an sf object with st_as_sf(). I discovered this first method for creating great circles with sf when I came to the realization that the form of the data in the routes data frame was problematic. It obviously makes sense from a perspective of data entry to have a row for each letter with a source and destination. However, this results in two columns that represent a similar type of data of place names. When coordinate data is added, you end up with two sets of longitude and latitude variables. In an sf object this translates to two separate geometry columns, which is possible but problematic.

Once I identified the problem, I could find a solution in the form of the functions in the tidyr package and the transformation of routes from a “wide” form with a “source” and “destination” column in the same row to a “long” form in which the source and destination of each letter are in separate rows. The “long” format of the routes data frame with a single “place” column makes it possible to add a single set of longitude and latitude data and then convert to an sf object. From this format we can transform an sf object with POINT geometry to LINESTRING geometry through sf’s integration with dplyr.10

Let’s see how this works in practice. Before we begin though, we need to make a slight change to routes to add an “id” column. This will enable us to keep track of the relationship between the sources and destinations and will be used later to group the routes together. We can do this with the very helpful rowid_to_column() function from the tibble package, which even places the “id” column at the start of the data frame.

# Create id column to routes
routes_id <- rowid_to_column(routes, var = "id")

With this done, the first step is to transform routes_id into a “long” format with the use of gather(). The related gather() and spread() functions are almost magical when you can get them to do what you want, but this also makes them somewhat tricky to use.11 gather() combines variables and transforms them into values. The gather() function has three main arguments: key, value, and ... to choose the columns to keep or modify. The values passed to the key and value arguments provide the names for two new columns to be created by the gather() function. They can be anything you choose, but assigning descriptive names makes it easier to understand what the function is doing. You can think of the key argument as the name for the column whose values are currently column names. The key essentially identifies the “type”. The value argument is the name of the variable whose values are contained within the columns that are spread apart. The ... argument is used to demarcate the columns to either be used by gather() or to specify which columns you want to keep as is and thus make longer. This uses the same grammar as select() with -variable_name for columns that you want to be left alone.

In our example, we want to turn the “source” and “destination” variables or columns into values that identify whether a place is a source or destination. Here, I choose to name the key “type” and the value “place”, which aligns with the nomenclature used in the locations data. To make the function work properly, we need to choose the right columns for gather() to work upon. In this case, there are two possibilities. We can either tell gather() to use the “source” and “destination” columns or to leave “id” and “n” alone with a - before these column names. Here, I do the former.12

# Transform routes to long format
routes_long <- routes_id %>% 
  gather(key = "type", value = "place", source, destination)

# Print routes_long
routes_long
#> # A tibble: 30 × 4
#>       id     n type   place    
#>    <int> <int> <chr>  <chr>    
#>  1     1     1 source Amsterdam
#>  2     2     1 source Antwerp  
#>  3     3     1 source Dordrecht
#>  4     4     1 source Emden    
#>  5     5     1 source Haarlem  
#>  6     6     1 source Haarlem  
#>  7     7     1 source Hamburg  
#>  8     8     1 source Het Vlie 
#>  9     9     1 source Lisse    
#> 10    10     2 source Antwerp  
#> # … with 20 more rows

Because we performed the gather() function on two columns, routes_long possesses twice as many rows as routes. routes_long has two new columns corresponding to the names we gave them in the key and values arguments, while the “id” and “n” values have essentially been doubled such that the same values are associated with both a source and destination “type”. Now that we have a single column with place names, we can add the longitude and latitude values of the places through a left join.

# Add coordinate values
routes_long_geo <- left_join(routes_long, locations, by = "place")

With routes_long_geo we now have the data in the correct format to create an sf object with POINT geometry and a CRS through the st_as_sf() function. Let’s create the object and then see what it looks like.

# Convert coordinate data to sf object
routes_long_sf <- st_as_sf(routes_long_geo,
                           coords = c("lon", "lat"),
                           crs = 4326)

# Print routes_long_sf
routes_long_sf
#> Simple feature collection with 30 features and 4 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 3.610998 ymin: 45.44085 xmax: 12.31552 ymax: 53.55108
#> Geodetic CRS:  WGS 84
#> # A tibble: 30 × 5
#>       id     n type   place                geometry
#>  * <int> <int> <chr>  <chr>             <POINT [°]>
#>  1     1     1 source Amsterdam (4.895168 52.37022)
#>  2     2     1 source Antwerp   (4.402464 51.21945)
#>  3     3     1 source Dordrecht  (4.690093 51.8133)
#>  4     4     1 source Emden       (7.20601 53.3594)
#>  5     5     1 source Haarlem   (4.646219 52.38739)
#>  6     6     1 source Haarlem   (4.646219 52.38739)
#>  7     7     1 source Hamburg   (9.993682 53.55108)
#>  8     8     1 source Het Vlie      (5.183333 53.3)
#>  9     9     1 source Lisse     (4.557483 52.25793)
#> 10    10     2 source Antwerp   (4.402464 51.21945)
#> # … with 20 more rows

From this point on, we are able to use sf’s dplyr integration to convert from a POINT geometry to MULTIPOINT and then LINESTRING. We can perform the former by grouping routes_long_sf by the “id” column and summarizing the points into a MULTIPOINT geometry.13 The sf implementation of summarise() includes a do.union argument. With do.union set to TRUE, the default, summarise() will try to resolve internal boundaries, but it can also change the order of points. Therefore, we will set do.union to FALSE, which uses st_combine() to more simply combine geometries in a manner similar to c (). The latter transformation from MULTIPOINT to LINESTRING involves st_cast() and takes advantage of the fact that internally MULTIPOINT and LINESTRING are derived from the same type of data, making it possible to convert back and forth between the two geometries without any loss of data. Here I will do all of this in a single pipeline, though you could do it step-by-step to see the multiple conversions that are taking place.

# Convert POINT geometry to MULTIPOINT, then LINESTRING
routes_lines <- routes_long_sf %>% 
  group_by(id) %>% 
  summarise(do_union = FALSE) %>% 
  st_cast("LINESTRING")

# Print routes_lines
routes_lines
#> Simple feature collection with 15 features and 1 field
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 3.610998 ymin: 45.44085 xmax: 12.31552 ymax: 53.55108
#> Geodetic CRS:  WGS 84
#> # A tibble: 15 × 2
#>       id                               geometry
#>    <int>                       <LINESTRING [°]>
#>  1     1  (4.895168 52.37022, 8.801694 53.0793)
#>  2     2  (4.402464 51.21945, 3.610998 51.4988)
#>  3     3  (4.690093 51.8133, 4.646219 52.38739)
#>  4     4    (7.20601 53.3594, 8.801694 53.0793)
#>  5     5  (4.646219 52.38739, 3.610998 51.4988)
#>  6     6    (4.646219 52.38739, 4.3007 52.0705)
#>  7     7  (9.993682 53.55108, 8.801694 53.0793)
#>  8     8      (5.183333 53.3, 8.801694 53.0793)
#>  9     9 (4.557483 52.25793, 4.357068 52.01158)
#> 10    10    (4.402464 51.21945, 4.3007 52.0705)
#> 11    11  (4.646219 52.38739, 8.801694 53.0793)
#> 12    12 (12.31552 45.44085, 4.646219 52.38739)
#> 13    13 (4.402464 51.21945, 4.646219 52.38739)
#> 14    14 (4.646219 52.38739, 4.357068 52.01158)
#> 15    15 (4.402464 51.21945, 4.357068 52.01158)

We have successfully created an sf object with lines. In addition to the change in geometry type, we can see that the number of rows has been halved and returned to the original length of routes_id. However, in doing this, we lost the data on the name of the source and destination of the lines, as well as the number of letters sent along the route. This occurred when we grouped routes_long_sf by the “id” column, leaving other non-geometry columns to be silently dropped. We can re-add the attribute data by joining routes_lines to the original routes_id data frame, using the “id” column to perform the join. This work flow of grouping and then joining by “id” is why we needed to create the column in the first place.

# Join sf object with attributes data
routes_lines <- left_join(routes_lines, routes_id, by = "id")

Now that the geometry of the sf object is LINESTRING, creating a great circle with st_segmentize() is straight forward. The only decision you have to make is the maximum distance for the length of a segment. The maximum distance argument can either be the number of meters for the maximum segment or an object from the units package, which makes it possible to use a more sensible unit for this argument such as kilometers.14 In this case, I will use the set_units function from the units package and set the maximum length to 20 kilometers.15

# Convert rhumb lines to great circles
routes_sf_tidy <- routes_lines %>% 
  st_segmentize(units::set_units(20, km))

With routes_sf_tidy we have reached our goal, an sf object with great circle lines. We can confirm the changes made by st_segmentize() both visually and by looking at the number of coordinates or points in routes_sf_tidy compared to routes_lines. Where routes_lines has 30 set of coordinates — two for each line — routes_sf_tidy has nrow(st_coordinates(routes_sf_tidy)) as a result of st_segmentize().

# Compare number of points in routes_lines and routes_sf_tidy
nrow(st_coordinates(routes_lines))
#> [1] 30
nrow(st_coordinates(routes_sf_tidy))
#> [1] 193

We can visualize the difference between the rhumb lines of routes_lines and the great circles of routes_sf_tidy by plotting them on the same map. Because the lines in this example are relatively short, the differences will be fairly minimal and even imperceptible in some cases. Here, I plot the great circles in black on top of the rhumb lines in magenta to highlight where the differences are visible. The most perceptible difference is the route from Venice to Haarlem. The magenta rhumb lines are also partially visible in the routes to Bremen. Routes over shorter distances such as those between Antwerp and Zeeland or Holland are more similar and the appearance of the magenta rhumb lines are less perceptible.

# Rhumb lines vs great circles
ggplot() +
  geom_sf(data = countries_sf, fill = gray(0.8), color = gray(0.7)) +
  geom_sf(data = routes_lines, color = "magenta") + 
  geom_sf(data = routes_sf_tidy) + 
  coord_sf(xlim = c(2, 14), ylim = c(45, 54), datum = NA) + 
  ggtitle("Rhumb lines vs Great circles") + 
  theme_minimal()

We can also compare the output of gcIntermediate and st_segmentize(). The use of a maximum length of a segment by st_segmentize() instead of the number of segments is the main differentiation between st_segmentize() and gcIntermediate(). The exact coordinates along which the great circles are drawn will necessarily differ between the two functions. In practice, the differences are likely to be unimportant, and in a map like the one we are creating here, the differences between the two sets of great circles will be imperceptible. The limited extent of the difference between the two functions can be seen by plotting routes_sldf and routes_sf_tidy on the same interactive map using the mapview package. This makes it possible to zoom in on the lines and see where they do differ, such as the line between Venice and Haarlem. The differences that are visible are on the scale of meters, and this is without optimizing the alignment for the choices of the number of segments and maximum length of a segment.

# Interactive comparison of gcIntermediate and st_segmentize
library(mapview)

mapview(routes_sldf, color = "magenta") + 
  mapview(routes_sf_tidy, color = "black")

Great circles with sf: for loop method

The above method for creating great circles using sf highlights the integration of sf into the tidyverse, but there are always more than one path to answer a question with code. As I learned more about sf objects, I got interested in more directly creating an sf object with a geometry type of LINESTRING from longitude and latitude data. As with the tidy method, the biggest challenge came from creating LINESTRING features and not in turning rhumb lines into great circles. The method I came up with essentially vectorizes the creation of a single LINESTRING feature through the use of a for loop. This uses a similar step-by-step creation of an sf object — from an sfg, to an sfc object, and then to an sf object — that I have shown in Exploration of Simple Features for R.16

Let’s start by building a single feature with a geometry of LINESTRING using coordinates from Venice and Haarlem to outline the process. The st_linestring() function takes a matrix of two or more sets of coordinates to create an sfg object with a geometry of class LINESTRING. We can construct the matrix of coordinates by row binding two vectors — created using c() — of the longitude and latitude values for Venice and Haarlem.

# Create a line between Venice and Haarlem
st_linestring(rbind(c(12.315515, 45.44085), c(4.646219, 52.38739)))
#> LINESTRING (12.31551 45.44085, 4.646219 52.38739)

It is this process that we want to repeat for each of the 15 routes found in routes. A first instinct might be to create a matrix of the coordinates of all of the routes, but this results in a single line connecting the coordinates. Instead, what we need to do is create 15 matrices that each contain the coordinates for the source and destination of a single route. In other words, we want to take the longitude and latitude of both the source and destination from the first row of routes, bind them together into a 2x2 matrix, and then repeat the process for the second row and so forth. This is the realm of the for loop.

Before getting into the mechanics of the for loop, we have to get the data into a format that is similar to the vectors used to create the line between the coordinates of Venice and Haarlem. A similar workflow to that we used above to prepare the data for the gcIntermediate() function gets us what we are looking for. In this case though, we need matrices of source coordinates and destination coordinates. We could either transform sources_tbl and destinations_tbl into matrices with as.matrix() or rerun the code used to create them and use as.matrix() at the end of the pipeline. Here, I do the latter for greater transparency.

# Matrix of longitude and latitude values of sources
sources_m <- routes %>% 
  left_join(locations, by = c("source" = "place")) %>% 
  select(lon:lat) %>% 
  as.matrix()

# Matrix of longitude and latitude values of destinations
destinations_m <- routes %>% 
  left_join(locations, by = c("destination" = "place")) %>% 
  select(lon:lat) %>% 
  as.matrix()

Now we can deal with creating a for loop.17 There are three components of a for loop: the creation of an object that will contain the output of the for loop, the sequence along which to run the for loop, and the actual code that will be looped. Let’s break this down one part at a time. The goal is to build an object that contains a 2x2 matrix for each route. We should start out by creating an empty vector of the appropriate form and length to hold the output of the for loop. We can do this by creating a list that has a length equal to the number of rows in the routes data frame with the vector() function: linestrings_sfg <- vector(mode = "list", length = nrow(routes)).

The second step is to define a sequence along which the function will loop by creating a variable that holds the place in the sequence and defining the sequence itself. The conventional variable used in a for loop is i. The sequence itself will be similar, if not identical, to the formula used to create the output object. In this case, we want a sequence from 1 — the first row — to 15 — the number of unique routes in our data, which we can determine programmatically by 1:nrow(routes). Thus, the sequence of the for loop is for (i in 1:nrow(routes)) with for acting to choose the type of loop and the formula within the parentheses defining the sequence.

Finally, we are able to construct the code to create the desired output. We already know the inputs, output, and functions to use, which we can summarize as follows: linestrings_sfg <- st_linestring(rbind(sources_m, destinations_m)).

However, we do not want to run this code, as it would create a single sfg object connecting all of the points. Instead, we can get our desired output by using the sequence variable to subset both the output list and input matrices in the code. For each iteration of the loop, we want the output to fill one component of the linestrings_sfg list, which we can access with [[i]]. Concerning the inputs, we want to select one row of coordinates from sources_m and one row of coordinates from destinations_m for each iteration. This is possible with the matrix subsetting method of [row, column]. Thus, we can access the sequence of rows with sources_m[i, ] and destinations_m[i, ].

Putting the above together, we get a complete for loop that creates linestrings_sfg, a list of 15 sfg objects.

# Create empty list object of length equal to number of routes
linestrings_sfg <- vector(mode = "list", length = nrow(routes))

# Define sequence and body of loop
for (i in 1:nrow(routes)) {
  linestrings_sfg[[i]] <- st_linestring(rbind(sources_m[i, ],
                                              destinations_m[i, ]))
}

We can check that the for loop worked by looking at the contents of one value from the list with [[]] subsetting.

# geometry of a single route
linestrings_sfg[[2]]
#> LINESTRING (4.402464 51.21945, 3.610998 51.4988)

The next step is to create an sfc object by transforming the list of sfg objects into a geometry column with st_sfc(). An sfc object is referred to as a geometry set and is itself a list. It is at this step that the lines become geospatial through the addition of a CRS. At this point, we can also convert the rhumb lines to great circles by using st_segmentize() in the same manner as above in the pipeline.

# sfc object of great circles
linestrings_sfc <- st_sfc(linestrings_sfg, crs = 4326) %>% 
  st_segmentize(units::set_units(20, km))

# Print linestrings_sfc
linestrings_sfc
#> Geometry set for 15 features 
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 3.610998 ymin: 45.44085 xmax: 12.31552 ymax: 53.55108
#> Geodetic CRS:  WGS 84
#> First 5 geometries:
#> LINESTRING (4.895168 52.37022, 5.135538 52.4182...
#> LINESTRING (4.402464 51.21945, 4.205503 51.2897...
#> LINESTRING (4.690093 51.8133, 4.67923 51.95682,...
#> LINESTRING (7.206009 53.3594, 7.406604 53.32556...
#> LINESTRING (4.646219 52.38739, 4.514556 52.2768...

linestrings_sfc is a fully geospatial object. The truncated print out of the coordinates for the first five features indicates that the lines have been segmentized and turned into great circles. The final step is to add the attribute data and convert the sfc object to a sf object with st_sf() and the routes data. The st_sf() function effectively adds linestrings_sfc to routes as a geometry column and converts the whole object to class sf and data.frame. Because we have not done anything to rearrange the rows of routes the data will line up correctly.

# Create sf object from data frame and sfc geometry set
routes_sf <- st_sf(routes, geometry = linestrings_sfc)

# Print routes_sf
routes_sf
#> Simple feature collection with 15 features and 3 fields
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 3.610998 ymin: 45.44085 xmax: 12.31552 ymax: 53.55108
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>       source destination n                       geometry
#> 1  Amsterdam      Bremen 1 LINESTRING (4.895168 52.370...
#> 2    Antwerp  Middelburg 1 LINESTRING (4.402464 51.219...
#> 3  Dordrecht     Haarlem 1 LINESTRING (4.690093 51.813...
#> 4      Emden      Bremen 1 LINESTRING (7.206009 53.359...
#> 5    Haarlem  Middelburg 1 LINESTRING (4.646219 52.387...
#> 6    Haarlem   The Hague 1 LINESTRING (4.646219 52.387...
#> 7    Hamburg      Bremen 1 LINESTRING (9.993682 53.551...
#> 8   Het Vlie      Bremen 1 LINESTRING (5.183333 53.3, ...
#> 9      Lisse       Delft 1 LINESTRING (4.557483 52.257...
#> 10   Antwerp   The Hague 2 LINESTRING (4.402464 51.219...

Though they were produced using very different methods, routes_sf_tidy and routes_sf are identical objects apart from the presence of an “id” column in routes_sf_tidy.

Now that we have attribute and spatial data for the great circles of the routes of the letters received by Daniel in 1585, we can create a map that uses color to distinguish the amount of letters sent along each route. Because they are essentially identical, you could use either routes_sf_tidy or routes_sf to create this map. The code to create a ggplot2 map using geom_sf() is very similar to that made above to compare routes_lines and routes_sf_tidy. In this case, though, we want the color of the lines to differ according to the “n” variable. In addition, I change the color palette used for the lines to the viridis palette on a continuous scale, hence the _c at the end of the function. I also label the color legend as “Letters”.

# ggplot2 of great circle routes
ggplot() +
  geom_sf(data = countries_sf, fill = gray(0.8), color = gray(0.7)) +
  geom_sf(data = routes_sf, aes( color = n)) + 
  scale_color_viridis_c() +
  labs(color = "Letters", title = "Great circles with sf") + 
  coord_sf(xlim = c(2, 14), ylim = c(45, 54), datum = NA) + 
  theme_minimal()

Resources

Footnotes

  1. As Mercator explained in legend 6 the distortion of distance is infinite at the poles such that they cannot actually be represented on the map.↩︎

  2. Joaquim Alves Gaspar, “Revisiting the Mercator World Map of 1569: an Assessment of Navigational Accuracy,” The Journal of Navigation 69 (2016): 1183–1196.↩︎

  3. Roy Iliffe, “Science and the Voyages of Discovery,” in The Cambridge History of Science: Eighteenth-Century Science, ed. Roy Porter (Cambridge: Cambridge University Press, 2003).↩︎

  4. See the resources section at the end of the post for a list of posts on creating great circles in R.↩︎

  5. Previous to version 0.6, sf also used the geosphere package to calculate distance and create great circles with geographic coordinates. The change to use the lwgeom package, which is an implementation of the liblwgeom library, aligns sf with the methods used by PostGIS.↩︎

  6. This is hardly surprising. Until the fall of 1585, Daniel’s family mainly resided in Antwerp, and he was serving as a representative to the States General, which was meeting at Delft.↩︎

  7. You can also set whether you want the line to be broken at the International Date Line, but the argument is superfluous in this case.↩︎

  8. Setting sp = FALSE, which is the default, returns a list of matrices containing the latitude and longitude values for the path along the great circle of each route.↩︎

  9. Even if you set the Match.ID argument to FALSE, the lines and attribute data would be correctly aligned as long as you do not change the order of the routes data frame after creating sources_tbl and destinations_tbl.↩︎

  10. This method is similar to the original workflow shown by Martin Hadley in his post on Great circles with sf and leaflet.↩︎

  11. For more information on gather() and spread() see Wickham, R for Data Science, Chapter 12 and Joyce Robbins’s excellent tutorial on gather().↩︎

  12. It is not necessary to place “type” and “place” in quotations, but doing so more clearly identifies these as strings and not as names of already extant objects.↩︎

  13. You do not need to include the geometry column in the group_by() function. The geometry column is sticky and will remain attached to the sf object unless explicitly removed.↩︎

  14. The units package is imported along with sf, so if you have sf, you should also have the units package.↩︎

  15. The equivalent code without the units package would be st_segmentize(routes_lines, 20000).↩︎

  16. See the same post for a more in depth treatment of the vocabulary of simple feature objects.↩︎

  17. If you are familiar with for loops in R, you can skip this discussion of the steps to build the for loop.↩︎