Intro
This post documents how I split a character column into chunks of a fixed length and then combined the chunks into a directory path. I needed this step in my analysis to check whether each directory existed.
library(tidyverse)
data <- tibble(string = c("123456", "987654"))
print(data)
## # A tibble: 2 x 1
## string
## <chr>
## 1 123456
## 2 987654
Step 1
strsplit splits each string into a character vector and returns a list of these vectors, so in a tibble the result shows up as a column of list type.
split_data <-
data %>%
# (?<=.{2}) is a zero-width lookbehind (needs perl = TRUE); strsplit cuts after every 2 characters
mutate(split_str = strsplit(string, "(?<=.{2})", perl = TRUE))
print(split_data)
## # A tibble: 2 x 2
## string split_str
## <chr> <list>
## 1 123456 <chr [3]>
## 2 987654 <chr [3]>
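Why does the lookbehind produce chunks of two characters? As far as I understand strsplit, after each cut it searches again only in the text that remains, so (?<=.{2}) ("a position with at least two characters before it") matches after the first two characters of whatever is left each time. The base R sketch below checks this and compares it with a regex-free alternative built on substring(). split_every_n is a hypothetical helper of my own, and it assumes non-empty strings.

```r
# strsplit() returns a list with one character vector per input string
x <- c("123456", "987654")
chunks <- strsplit(x, "(?<=.{2})", perl = TRUE)
chunks[[1]]  # "12" "34" "56"

# Hypothetical helper (not part of the original workflow): cut a single,
# non-empty string into consecutive pieces of width n with substring();
# a final piece shorter than n is kept as-is
split_every_n <- function(s, n = 2) {
  starts <- seq(1, nchar(s), by = n)
  substring(s, starts, starts + n - 1)
}

# lapply() also returns a list, so the two results can be compared directly
chunks_alt <- lapply(x, split_every_n)
identical(chunks, chunks_alt)  # TRUE
```

Inside the pipeline, the helper could stand in for strsplit via mutate(split_str = map(string, split_every_n)).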
Step 2
First method: combine strings + unnest
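split_data %>%
# collapse each character vector in split_str into one "/"-separated string
mutate(split_str_dir = map(split_str, ~ str_c(., collapse = "/"))) %>%
# map returns a list, so unnest turns split_str_dir into a character column
unnest(split_str_dir)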
## # A tibble: 2 x 3
## string split_str split_str_dir
## <chr> <list> <chr>
## 1 123456 <chr [3]> 12/34/56
## 2 987654 <chr [3]> 98/76/54
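If the intermediate list is not needed, purrr's map_chr returns a character vector directly, which makes the unnest step unnecessary. A sketch of the first method with that one change:

```r
library(tidyverse)

data <- tibble(string = c("123456", "987654"))
split_data <- data %>%
  mutate(split_str = strsplit(string, "(?<=.{2})", perl = TRUE))

# map_chr() collects one string per row into a character column, so no unnest()
result <- split_data %>%
  mutate(split_str_dir = map_chr(split_str, ~ str_c(.x, collapse = "/")))

result$split_str_dir  # "12/34/56" "98/76/54"
```

map_chr also fails loudly if any element does not produce exactly one string, which is a handy sanity check here.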
Second method: unnest (wider) + unite
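split_data %>%
# spread split_str into the columns split_str_1, split_str_2, split_str_3
unnest_wider(split_str, names_sep = "_") %>%
# paste those columns back together with "/" (unite drops the input columns by default)
unite(split_str_dir, starts_with("split_str"), sep = "/")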
## # A tibble: 2 x 2
## string split_str_dir
## <chr> <chr>
## 1 123456 12/34/56
## 2 987654 98/76/54
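To close the loop with the motivation from the intro, the joined column can be passed to base R's dir.exists. The sketch below assumes the paths are relative to some root directory; root_dir is a hypothetical name, and here it points at a throwaway folder under tempdir() in which only 12/34/56 is created, so the result is predictable.

```r
library(tidyverse)

# Hypothetical root directory for the demo: a throwaway folder under tempdir()
root_dir <- file.path(tempdir(), "dir-check-demo")
dir.create(file.path(root_dir, "12", "34", "56"), recursive = TRUE, showWarnings = FALSE)

data <- tibble(string = c("123456", "987654"))

checked <- data %>%
  mutate(split_str = strsplit(string, "(?<=.{2})", perl = TRUE)) %>%
  unnest_wider(split_str, names_sep = "_") %>%
  unite(split_str_dir, starts_with("split_str"), sep = "/") %>%
  # build the full path under the assumed root and test whether it is a directory
  mutate(full_path = file.path(root_dir, split_str_dir),
         dir_exists = dir.exists(full_path))

checked$dir_exists  # TRUE FALSE
```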
Outro
In my opinion, the second method has the more straightforward syntax, but it requires knowing that unnest_wider exists (how many problems in programming are due to unknown unknowns?). The first method requires some understanding of functional programming syntax, i.e. map and ~. It also requires understanding the difference between str_c's two parameters: sep and collapse.
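Since the first method hinges on that difference, here is a small illustration with str_c from stringr: collapse joins the elements of one vector into a single string, while sep joins several vectors element by element.

```r
library(stringr)

parts <- c("12", "34", "56")

# collapse: join the elements *within* one vector into a single string
joined <- str_c(parts, collapse = "/")        # "12/34/56"

# sep: join *arguments* position by position; with one vector it changes nothing
unchanged <- str_c(parts, sep = "/")          # "12" "34" "56"

# sep with several vectors of equal length combines them element-wise
pairwise <- str_c(c("12", "98"), c("34", "76"), c("56", "54"), sep = "/")
# "12/34/56" "98/76/54"
```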