Skip to content

Vector data

In R we can work with vector data using many packages, but the most popular and widely used is sf package, and we will use it in following lessons. Other packages used for spatial vector data processing are stars and terra, and discontinued sp and raster packages. Each from this packages has its own data classes, but they can be easily converted between each other, which can be useful, since some other packages may be dependent on data classes from specific package.

Exercise

In this exercise we will work with National Parks and Protected Landscape Areas and aditionall informations (similar to previous lesson). We will read the data, explore it and create some plots.

Data

  • shapefile with National Parks and Protected Landscape Areas - download the data form here and extract it to data/vzchu directory.

  • table with additional informations - similarly to previous lesson, get the csv table from https://drusop.nature.cz/, but this time with Velkoplošná zvláště chráněná území - National Parks and Protected Landscape Areas. Save the table to data/ directory.

Reading data and general exploration

Load sf package (if not installed, install it with install.packages("sf") in console). So this line should be in the beginning of your script.

library(sf)

To read the data we will use st_read() function, in same way as we read the csv table with read.csv().

vzchu <- st_read("data/vzchu/vzchu.shp")

More layers in a file

If you work with files that can store multiple layers (like geopackage), you have to specify which layer to read by adding layer argument. To see available layers in file use st_layers function.

Check the structure of the data

vzchu
str(vzchu)
summary(vzchu)

head(vzchu)
names(vzchu)

You can see that the data is stored as sf object, which is data.frame with additional attributes. The geometry column contains the geometries of the features.

So we can hadnle the data as data.frame and use many functionality we know from working with data.frame.

for xample:

get vector of unique values in column KAT

unique(vzchu$KAT)

subsetting the columns to get only KAT and NAZEV columns

vzchu[,c("KAT", "NAZEV")]

subsetting the rows to get only National Parks

vzchu[vzchu$KAT == "NP",]

Technical note

You can drop the geometries and get plain data.frame without geometries and vice versa.

drop geometries

vzchu_df <- as.data.frame(vzchu)
# or
vzchu_df <- st_drop_geometry(vzchu)

drop data.frame

vzchu_sfc <- vzchu$geometry

Plotting

In this point we learn how to show simple plots of the data (spatial and non-spatial). We will use built-in plot function, which is able to plot sf objects, and try other types of plots for data barplot and hist (histogram).

Basic built-in plot functions are good for quick visualization or for simple plots, but later in the course we dedicate a whole lesson to more advanced plotting with ggplot2 package and other packages for visualizations spatial data.

plot two numeric columns

plot(vzchu$SHAPE_AREA, vzchu$SHAPE_LEN)

Advanced

regression line (linear model fit lm) can be easily added to the plot using abline function

area_len_lm <- lm(vzchu$SHAPE_LEN ~ vzchu$SHAPE_AREA)

plot(vzchu$SHAPE_AREA, vzchu$SHAPE_LEN)
abline(area_len_lm)

histogram of SHAPE_AREA column

hist(vzchu$SHAPE_AREA)

simple barplot of categories in KAT column. There is need to count the values first with table function.

barplot(table(vzchu$KAT))

Plotting sf objects

You can use just plot with sf object, but this will plot all columns in the data (see ?plot.sf).

plot(vzchu)

If you want to plot only specific column, you can use plot with subset using [] and name of single column. If you want to plot only geometries, you can use plot with geometry column.

plot(vzchu["KAT"])
plot(vzchu["geometry"])
#or
plot(vzchu$geometry)
#or
plot(st_geometry(vzchu))

Some examples of use of subsetting for plotting data (same as with data.frame)

vzchu[vzchu$NAZEV == "Šumava","KAT"]
plot(vzchu[vzchu$NAZEV == "Šumava","KAT"])
plot(vzchu[vzchu$NAZEV == "Beskydy","geometry"])

plot(vzchu[vzchu$NAZEV == "Labské pískovce","geometry"])

Do you remeber the difference of [] and $ operators?

[] - keeping classes, with data.frame returns the data.frame, and now with sf object - returns sf object. `$ - subset which always returns the vector of the column.

vzchu["SHAPE_AREA"]
thus plot use sf object

plot(vzchu["SHAPE_AREA"])

$ - subset which always returns the data as numeric vector

vzchu$SHAPE_AREA
plot use numeric vector

plot(vzchu$SHAPE_AREA)

but! you can use $ with sf object to get spatial data (sfc class) with the geometry column, since it is special sfc class column (see str(vzchu) above)

vzchu$geometry
thus plotting:

plot(vzchu["geometry"])
plot(vzchu$geometry) # same as `plot(st_geometry(vzchu))`
looks the same, but the first one is sf object and the second one is sfc object

Spatial proeprties

Now we will explore some spatial properties of the data.

Bounding box

Bounding box is the rectangle that encloses all the geometries in the data.

st_bbox(vzchu)

CRS - coordinate reference system

See the definition of the coordinate reference system (CRS)

st_crs(vzchu)

You can transform sf data to other CRS with st_transform function. For example we can change the CRS to WGS84 (EPSG:4326)

st_transform(vzchu, 4326)
notice the transformed geometry column

Basic spatial calculations

Here we will calculate the area and the perimeter of the features.

st_area - calculate the area

st_area(vzchu)

st_periemeter - calculate the perimeter

st_perimeter(vzchu)

add the area and perimeter to the data

vzchu$area <- st_area(vzchu)
vzchu$perimeter <- st_perimeter(vzchu)

also you can calculate some shape index (perimeter to area ratio) directly, for example

vzchu$shape <- st_perimeter(vzchu) / st_area(vzchu)

Joining attribute data (data.frame)

Now we will join the additional informations from the table to the sf object. We will use merge function, which is used for merging data frames by matching column values. So you need some matching unique identifier in both datasets. In this example we will join the information about the date of the first declaration of the National Park or Protected Landscape Area, and minimum elevation.

df <- read.csv("data/export.csv")

Since this is similar data as last lesson, we can see similar problems. We resolve this using the part of the code from last lesson.

df <- read.csv("data/export.csv", sep = ";", dec = ",")
df <- df[!is.na(df$Rozloha..ha), ]

Before join, we have to check the column names, and also the values in the columns we want to join by, to see if they match.

str(df)
str(vzchu)

Now we can see that the column with unique identifier in df is Kód, and in vzchu is KOD, so we can use these columns for joining.

Also we can filter out desired columns in df data frame, to keep only the columns we want to join with sf object

df <- df[,c("Kód",  "Nadmořská.výška.min.")]

While using the merge() function, we have to specify: - the data frames to merge, - x - the first object (the sf object in this case) - the result will have the same class as this object - y - the second object (the data.frame object in this case) - by.x - the column name in the first object (x) to join by - by.y - the column name in the second object (y) to join by

vzchu_merged <- merge(vzchu, df, by.x = "KOD", by.y = "Kód")

in case the column names are the same in both data frames, you can use by argument

names(df)[1] <- "KOD" # rename the column in df to match the column in vzchu
vzchu_merged <- merge(vzchu, df, by = "KOD")
vzchu_merged

plot the results

plot(vzchu_merged["Nadmořská.výška.min."])

Writing data

Finally we will write the data to the file. We will use st_write function, which is used for writing sf objects to the file.

st_write(vzchu_merged, "output/vzchu_out.shp")

If you want to owerwrite the existing fil you can control the behavior with append or delete_dsn arguments, see ?st_write for more details.

st_write(vzchu_merged, "output/vzchu_out.shp", append = FALSE)
# or 
st_write(vzchu_merged, "output/vzchu_out.shp", delete_dsn = TRUE)

Summary

Main outcomes

  • read and write spatial vector data with st_read and st_write
  • treat sf objects as data.frame and use the same functionality for data manipulation
  • plot spatial and non-spatial data with basic plot functions
  • perform basic exploration of the spatial data - calculate bounding box, check and transform the CRS, calculate area and perimeter
  • joining tables with merge function

Function overview

General

  • names() - return the names of the columns in a data frame
  • plot(), hist(), barplot() - basic plots
  • merge() - merge two data frames by matching column values

Spatial - sf package

  • st_read() - read a spatial file and create an sf object
  • st_bbox() - calculate the bounding box of the geometries
  • st_crs() - return the coordinate reference system of the data
  • st_transform() - transform the data to another coordinate reference system
  • st_area() - calculate the area of the geometries
  • st_perimeter() - calculate the perimeter of the geometries
  • st_write() - write an sf object to a spatial file

Practice exploration

  • Plot only National Parks