Data combination
Warning
This page reflects the state of the course in the spring semester of 2025 and will be updated for the current semester.
General R functions
Basic string manipulation
The most common string manipulation functions in R are paste, paste0, gsub, and substr. These are useful for data cleaning and preparation, and working with paths, filenames, URLs etc.
paste and paste0
paste and paste0 concatenate strings. They differ only in the default separator: paste uses sep = " ", while paste0 is a shortcut for paste with sep = "".
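A minimal sketch of the difference, using made-up strings purely for illustration:

```r
# paste() inserts a space between the pieces by default
label <- paste("LPIS", "2025")

# paste0() joins the pieces with no separator, handy for paths and URLs
path <- paste0("data/", "lpis", ".zip")

# sep sets the separator between arguments
iso_date <- paste("2025", "04", "09", sep = "-")

# collapse joins the elements of one vector into a single string
codes <- paste(c("a", "b", "c"), collapse = ", ")

print(c(label, path, iso_date, codes))
```

Note that sep works between arguments, while collapse works within a vector.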
gsub
gsub is used to replace all occurrences of a pattern in a string. It uses regular expressions.
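A short sketch of typical uses, with example strings invented for illustration. Because the pattern is a regular expression, characters such as "." have a special meaning unless fixed = TRUE is set:

```r
# Replace every space with an underscore
clean_name <- gsub(" ", "_", "my file name.txt")

# Remove all runs of digits using a regular expression
no_digits <- gsub("[0-9]+", "", "plot123_v2")

# fixed = TRUE treats the pattern as plain text, so "." matches only a dot
dashed <- gsub(".", "-", "2025.04.09", fixed = TRUE)

print(c(clean_name, no_digits, dashed))
```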
substr
substr extracts a substring from a string, given the start and stop positions.
my_date <- "20250409"
year <- substr(my_date, 1, 4)
month <- substr(my_date, 5, 6)
day <- substr(my_date, 7, 8)
substr can also be used to replace part of a string.
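A minimal sketch of the replacement form, reusing the date string from above. The assignment overwrites the characters at the given positions in place:

```r
my_date <- "20250409"

# Overwrite characters 1 to 4 (the year) with a new value
substr(my_date, 1, 4) <- "2024"

print(my_date)
```

The replacement never changes the length of the string; only the characters in the selected range are overwritten.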
Note
Note that the good practice for working with dates is to use dedicated functions. In base R you can use as.Date() or as.POSIXct(). We will cover date manipulation in more detail later.
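As a small preview, here is a sketch of the same date parsed with as.Date(). The format string "%Y%m%d" is chosen to match the example string above:

```r
# Parse the compact date string into a Date object
d <- as.Date("20250409", format = "%Y%m%d")

# Extract parts of the date with format()
year <- format(d, "%Y")
month <- as.integer(format(d, "%m"))

# Date arithmetic works in days
later <- d + 30

print(d)
print(later)
```

Unlike substr, this approach validates the date and supports arithmetic and comparisons.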
Basic automation - for loops
Note
If you know the number of elements of the resulting vector or list in advance, it is more efficient to preallocate it with the vector() function. This avoids resizing the object in each iteration, which can be slow in large loops.
Warning
Try to avoid appending to or combining the resulting vector or list in each iteration of the loop. It works fine for small loops, but can be slow and inefficient for large ones, because a new vector is created in memory in each iteration.
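The two approaches can be sketched side by side. The computation (squaring the loop index) is just a placeholder for real work:

```r
n <- 5

# Growing the result: a new vector is created in every iteration
squares_grown <- c()
for (i in seq_len(n)) {
  squares_grown <- c(squares_grown, i^2)
}

# Preallocating: the vector has its final length from the start
squares <- vector("numeric", n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# The same idea for a list; note the double brackets when assigning
results <- vector("list", n)
for (i in seq_len(n)) {
  results[[i]] <- i^2
}

print(squares)
```

Both loops give the same result; the difference only becomes noticeable when n is large.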
Tip
Learn the good practices, but do not be afraid to use code that works for you. Keep possible performance improvements in mind for when your code becomes slow.
Downloading data from a list of urls
An example of downloading files from a list of URLs. We will download data from LPIS (https://mze.gov.cz/public/app/eagriapp/lpisdata/) and save them to the local data directory.
The URLs of the zip files are structured as follows: https://mze.gov.cz/public/app/eagriapp/lpisdata/20250405-713830-DPB-SHP.zip
Here 20250405 is the date of the data, 713830 is the municipality code, and DPB-SHP is the data type.
So we can use paste0 to create the URL for each municipality and date.
municip_codes <- c(682675, 683434, 683442, 683451)
root_url <- "https://mze.gov.cz/public/app/eagriapp/lpisdata/20250405-"
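# Make sure the target directory exists
dir.create("data", showWarnings = FALSE)
for (municip_code in municip_codes) {
  url <- paste0(root_url, municip_code, "-DPB-SHP.zip")
  destfile <- paste0("data/", municip_code, "-DPB-SHP.zip")
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(url, destfile, mode = "wb")
}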
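Because paste0 is vectorised, all URLs and destination paths can also be built in one step before looping. The sketch below assumes the URL pattern described above; data_date, data_type, base_url and file_names are illustrative variable names, not part of the original example:

```r
municip_codes <- c(682675, 683434, 683442, 683451)

# Hypothetical variables splitting the URL into its parts
base_url <- "https://mze.gov.cz/public/app/eagriapp/lpisdata/"
data_date <- "20250405"
data_type <- "DPB-SHP"

# paste0 recycles the scalar parts over the vector of municipality codes
file_names <- paste0(municip_codes, "-", data_type, ".zip")
urls <- paste0(base_url, data_date, "-", file_names)

# file.path builds the local destination paths
destfiles <- file.path("data", file_names)

print(urls)
```

Keeping the date and data type in separate variables makes it easy to reuse the same code for another date or file type.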
You can enhance the code by unzipping the files after downloading and then deleting the zip files.
for (municip_code in municip_codes) {
  url <- paste0(root_url, municip_code, "-DPB-SHP.zip")
  destfile <- paste0("data/", municip_code, "-DPB-SHP.zip")
  # mode = "wb" keeps the binary zip file intact on Windows
  download.file(url, destfile, mode = "wb")
  # Unzip the file
  unzip(destfile, exdir = "data/")
  # Delete the zip file
  file.remove(destfile)
}
Spatial data combination
Read the data from the local directory. Note that we will use the terra package for reading the shapefiles as well.
library(terra)
#raster
r <- rast("data/eudem.tif")
names(r) <- "elev"
#vector
shp_paths <- list.files("data", pattern = "\\.shp$", full.names = TRUE)
lpis <- vect(shp_paths[1])
Tip
Conversion between sf and terra vector objects is easy: vect() (terra package) converts an sf object to a SpatVector, and st_as_sf() (sf package) converts it back.
crop raster data with vector data
As in the previous example, we can crop the raster data with the vector data, because the function takes the extent from either a raster or a vector object.
The data can also be masked with the vector data, which means that the values outside the vector data are set to NA.
Then you can summarize the raster data only for the area of interest defined by the vector data, or do other calculations.
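The crop, mask and summarize steps can be sketched on synthetic data, so that the example runs without the course files. The small raster and the triangular polygon below are invented for illustration; in the lesson, r and lpis would take their place:

```r
library(terra)

# Synthetic 10 x 10 raster standing in for the elevation data
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:3035")
values(r) <- 1:100
names(r) <- "elev"

# Toy triangular polygon standing in for an LPIS block
p <- vect("POLYGON ((2 2, 2 6, 6 2, 2 2))")
crs(p) <- "EPSG:3035"

# crop() cuts the raster to the rectangular extent of the polygon
r_cropped <- crop(r, p)

# mask() sets the cells outside the polygon itself to NA
r_masked <- mask(r_cropped, p)

# Summarize only the cells inside the polygon
mean_inside <- global(r_masked, "mean", na.rm = TRUE)
print(mean_inside)
```

Cropping first and masking afterwards keeps the masked raster small, which matters for large rasters.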
zonal statistics
For zonal statistics, we can use the zonal() function, with the fun argument specifying the statistic to calculate. The default is mean; see ?zonal for more options.
If you need the original polygon object with the statistics appended, you can use the as.polygons = TRUE argument.
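A self-contained sketch of zonal() with polygon zones, again on synthetic data. The raster values, the two square polygons and the id attribute are invented for illustration and stand in for r and lpis:

```r
library(terra)

# Synthetic raster: every cell holds its row number (1 at the top, 10 at the bottom)
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:3035")
values(r) <- rep(1:10, each = 10)
names(r) <- "elev"

# Two toy polygons: the upper-left and the lower-right quarter of the raster
polys <- vect(c("POLYGON ((0 5, 0 10, 5 10, 5 5, 0 5))",
                "POLYGON ((5 0, 5 5, 10 5, 10 0, 5 0))"))
crs(polys) <- "EPSG:3035"
polys$id <- c("upper_left", "lower_right")

# Data frame with one row of statistics per polygon
z <- zonal(r, polys, fun = "mean")
print(z)

# The same statistics appended to the polygons as a new attribute
polys_stats <- zonal(r, polys, fun = "mean", as.polygons = TRUE)
print(as.data.frame(polys_stats))
```

With as.polygons = TRUE the result is a SpatVector, so it can be plotted or written to a file directly.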
Task: what is the difference between the original elevation in VYSKA and the recalculated elevation in elev?
raster value extraction
To extract raster values at specific features or raster pixels, we can use the extract() function. It works with any vector object, not only points. For polygons, it is similar to zonal statistics. See ?extract for more options.
Get raster values for the lpis centroids
lpis_pts <- centroids(lpis)
extract(r, lpis_pts)
#or
elev_pts <- extract(r, lpis_pts, bind = TRUE)
plot(r_cropped)
plot(lpis, add = TRUE)
plot(elev_pts, add = TRUE, col = "red")
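The behaviour of extract() can also be checked on synthetic data where the expected values are known. The raster and the two points below are invented for illustration:

```r
library(terra)

# Synthetic raster: every cell holds its row number (1 at the top, 10 at the bottom)
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:3035")
values(r) <- rep(1:10, each = 10)
names(r) <- "elev"

# Two toy points: one in the top row, one in the bottom row
pts <- vect(cbind(x = c(2.5, 7.5), y = c(9.5, 0.5)), type = "points",
            crs = "EPSG:3035")

# Data frame with an ID column and one column per raster layer
e <- extract(r, pts)
print(e)

# bind = TRUE returns the points with the extracted values as attributes
pts_elev <- extract(r, pts, bind = TRUE)
print(as.data.frame(pts_elev))
```

The ID column refers to the order of the features in the vector object, which allows joining the values back to the original attributes.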