Vector data
In R we can work with vector data using many packages, but the most popular and widely used is sf package, and we will use it in following lessons. Other packages used for spatial vector data processing are stars and terra, and discontinued sp and raster packages. Each from this packages has its own data classes, but they can be easily converted between each other, which can be useful, since some other packages may be dependent on data classes from specific package.
Exercise
In this exercise we will work with National Parks and Protected Landscape Areas and aditionall informations (similar to previous lesson). We will read the data, explore it and create some plots.
Data
-
shapefile with National Parks and Protected Landscape Areas - download the data form here and extract it to
data/vzchudirectory. -
table with additional informations - similarly to previous lesson, get the
csvtable from https://drusop.nature.cz/, but this time with Velkoplošná zvláště chráněná území - National Parks and Protected Landscape Areas. Save the table todata/directory.
Reading data and general exploration
Load sf package (if not installed, install it with install.packages("sf") in console). So this line should be in the beginning of your script.
To read the data we will use st_read() function, in same way as we read the csv table with read.csv().
More layers in a file
If you work with files that can store multiple layers (like geopackage), you have to specify which layer to read by adding layer argument. To see available layers in file use st_layers function.
Check the structure of the data
You can see that the data is stored as sf object, which is data.frame with additional attributes. The geometry column contains the geometries of the features.
So we can hadnle the data as data.frame and use many functionality we know from working with data.frame.
for xample:
get vector of unique values in column KAT
subsetting the columns to get only KAT and NAZEV columns
subsetting the rows to get only National Parks
Technical note
You can drop the geometries and get plain data.frame without geometries and vice versa.
drop geometries
drop data.frame
Plotting
In this point we learn how to show simple plots of the data (spatial and non-spatial). We will use built-in plot function, which is able to plot sf objects, and try other types of plots for data barplot and hist (histogram).
Basic built-in plot functions are good for quick visualization or for simple plots, but later in the course we dedicate a whole lesson to more advanced plotting with ggplot2 package and other packages for visualizations spatial data.
plot two numeric columns
Advanced
regression line (linear model fit lm) can be easily added to the plot using abline function
histogram of SHAPE_AREA column
simple barplot of categories in KAT column. There is need to count the values first with table function.
Plotting sf objects
You can use just plot with sf object, but this will plot all columns in the data (see ?plot.sf).
If you want to plot only specific column, you can use plot with subset using [] and name of single column. If you want to plot only geometries, you can use plot with geometry column.
Some examples of use of subsetting for plotting data (same as with data.frame)
Do you remeber the difference of [] and $ operators?
[] - keeping classes, with data.frame returns the data.frame, and now with sf object - returns sf object. `$ - subset which always returns the vector of the column.
sf object
$ - subset which always returns the data as numeric vector
numeric vector
but! you can use $ with sf object to get spatial data (sfc class) with the geometry column, since it is special sfc class column (see str(vzchu) above)
sf object and the second one is sfc object
Spatial proeprties
Now we will explore some spatial properties of the data.
Bounding box
Bounding box is the rectangle that encloses all the geometries in the data.
CRS - coordinate reference system
See the definition of the coordinate reference system (CRS)
You can transform sf data to other CRS with st_transform function. For example we can change the CRS to WGS84 (EPSG:4326)
Basic spatial calculations
Here we will calculate the area and the perimeter of the features.
st_area - calculate the area
st_periemeter - calculate the perimeter
add the area and perimeter to the data
also you can calculate some shape index (perimeter to area ratio) directly, for example
Joining attribute data (data.frame)
Now we will join the additional informations from the table to the sf object. We will use merge function, which is used for merging data frames by matching column values. So you need some matching unique identifier in both datasets. In this example we will join the information about the date of the first declaration of the National Park or Protected Landscape Area, and minimum elevation.
Since this is similar data as last lesson, we can see similar problems. We resolve this using the part of the code from last lesson.
Before join, we have to check the column names, and also the values in the columns we want to join by, to see if they match.
Now we can see that the column with unique identifier in df is Kód, and in vzchu is KOD, so we can use these columns for joining.
Also we can filter out desired columns in df data frame, to keep only the columns we want to join with sf object
While using the merge() function, we have to specify:
- the data frames to merge,
- x - the first object (the sf object in this case) - the result will have the same class as this object
- y - the second object (the data.frame object in this case)
- by.x - the column name in the first object (x) to join by
- by.y - the column name in the second object (y) to join by
in case the column names are the same in both data frames, you can use by argument
names(df)[1] <- "KOD" # rename the column in df to match the column in vzchu
vzchu_merged <- merge(vzchu, df, by = "KOD")
plot the results
Writing data
Finally we will write the data to the file. We will use st_write function, which is used for writing sf objects to the file.
If you want to owerwrite the existing fil you can control the behavior with append or delete_dsn arguments, see ?st_write for more details.
st_write(vzchu_merged, "output/vzchu_out.shp", append = FALSE)
# or
st_write(vzchu_merged, "output/vzchu_out.shp", delete_dsn = TRUE)
Summary
Main outcomes
- read and write spatial vector data with
st_readandst_write - treat
sfobjects asdata.frameand use the same functionality for data manipulation - plot spatial and non-spatial data with basic plot functions
- perform basic exploration of the spatial data - calculate bounding box, check and transform the CRS, calculate area and perimeter
- joining tables with
mergefunction
Function overview
General
names()- return the names of the columns in a data frameplot(),hist(),barplot()- basic plotsmerge()- merge two data frames by matching column values
Spatial - sf package
st_read()- read a spatial file and create ansfobjectst_bbox()- calculate the bounding box of the geometriesst_crs()- return the coordinate reference system of the datast_transform()- transform the data to another coordinate reference systemst_area()- calculate the area of the geometriesst_perimeter()- calculate the perimeter of the geometriesst_write()- write ansfobject to a spatial file
Practice exploration
- Plot only National Parks