Working with NA in R

NA values are R's markers for missing data. However, working with them can be tricky because of their special properties. Care should also be taken when reading data in and when presenting it.

Properties of NA

Types

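There are different types of NA, written NA_* (NA_integer_, NA_real_, NA_character_ and NA_complex_). A plain NA is of type logical. Keep this in mind when working with NA in a data.frame. Operations like dplyr::case_when() require all outputs to be of the same type, at least before dplyr 1.1.0. So a branch that returns NA must use the matching constant, e.g. NA_character_.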
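A quick way to see these types is base R's typeof(). The sketch below uses only base R. The vector x is made up for illustration.

```r
# A bare NA is logical; the typed constants carry other storage types
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# Inside a vector, a bare NA is coerced to the vector's type
x <- c("a", NA)
typeof(x[2])           # "character"

# Whatever the type, is.na() detects all of them
sapply(list(NA, NA_integer_, NA_real_, NA_character_), is.na)  # TRUE TRUE TRUE TRUE
```

In a call like `case_when(x == "a" ~ "yes", TRUE ~ NA_character_)`, the typed constant is what keeps every branch a character vector.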

Infection

NAs are infectious. In most logical and arithmetic operations, an NA input makes the result NA. The exceptions are cases where the result would be the same whatever the missing value was, such as NA & FALSE. String processing is more complicated. Base R has relatively few string functions, so much of this work is done with packages, and the result depends on how each library chooses to handle NA.
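A few base-R calls illustrate the rule and its exceptions. The exceptions are the cases where the missing value cannot change the answer.

```r
NA + 1                          # NA: arithmetic propagates NA
NA > 1                          # NA: so do comparisons
NA == NA                        # NA, which is why is.na() is needed
sum(c(1, NA, 3))                # NA
sum(c(1, NA, 3), na.rm = TRUE)  # 4: many summaries can drop NAs

# Exceptions: the result does not depend on the missing value
NA & FALSE                      # FALSE
NA | TRUE                       # TRUE
NA^0                            # 1
```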

paste0() and glue::glue() convert NA to the string "NA", whereas stringr::str_c() keeps the infectious property:

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
paste0(NA, 1)
## [1] "NA1"
glue::glue("{NA}{1}")
## NA1
stringr::str_c(NA, 1)
## [1] NA
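If you prefer paste0() but want NA to propagate, one option is to post-process its result. Below is a minimal base-R sketch. paste0_na() is a hypothetical helper, not part of any package, and it assumes the inputs have compatible lengths.

```r
# Hypothetical helper: paste0(), except that any NA input gives NA
paste0_na <- function(...) {
  args <- list(...)
  out <- do.call(paste0, args)
  # TRUE wherever at least one input is NA (`|` recycles shorter inputs)
  has_na <- Reduce(`|`, lapply(args, is.na))
  out[has_na] <- NA_character_
  out
}

paste0_na(c("a", NA), 1)   # "a1" NA
```

stringr::str_c() already behaves this way, so a helper like this is only worth it when you want to stay in base R.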

Reading in

Human-readable files may represent missing values with many symbols, e.g. "." or " " (a single space). To convert these values to NA, one can use the naniar::replace_with_na() family of functions; replace_with_na_all() applies a condition to every column. I do not think this has been integrated well into the mutate(across()) syntax yet, so this is what I use:

tibble(x = ".")
## # A tibble: 1 x 1
##   x    
##   <chr>
## 1 .
tibble(x = ".") %>% 
  naniar::replace_with_na_all(~ .x == ".")
## # A tibble: 1 x 1
##   x    
##   <chr>
## 1 <NA>
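For comparison, the same clean-up can be sketched in base R. It can be done either after reading, by replacing "." in every column, or while reading, via the na.strings argument of read.csv(). readr::read_csv() has an equivalent na argument. The small data.frame and the inline CSV text are made up for illustration.

```r
# After reading: replace "." with NA in every column.
# %in% (unlike ==) never returns NA, so existing NAs are left alone.
df <- data.frame(x = c("a", "."), y = c(".", NA))
df[] <- lapply(df, function(col) replace(col, col %in% ".", NA))
df

# While reading: tell read.csv() which strings mean "missing"
csv_text <- "id,score\n1,.\n2,7\n3,NA"
scores <- read.csv(text = csv_text, na.strings = c(".", "NA"))
scores$score   # NA 7 NA
```

Converting at read time has the added benefit that a numeric column such as score is parsed as numbers rather than text. Within the tidyverse, something like `mutate(across(where(is.character), ~ na_if(.x, ".")))` with dplyr::na_if() should also work for a single placeholder value.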

Presenting

When presenting the data, the audience may not be trained in R and may not know what NA means. Replacing it with text such as “missing” may help bridge the gap.

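This can be done with tidyr::replace_na() inside mutate(across(everything(), ...)). Converting each column with as.character() first keeps it working in tidyr 1.2.0 and later, where replace_na() no longer changes a column's type:

tibble::tibble(a = NA)
## # A tibble: 1 x 1
##   a    
##   <lgl>
## 1 NA
tibble::tibble(a = NA) %>% 
  # as.character() so the "missing" label matches the column type
  mutate(across(everything(), ~ replace_na(as.character(.x), "missing")))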
## # A tibble: 1 x 1
##   a      
##   <chr>  
## 1 missing
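A base-R sketch of the same idea follows. It assumes the relabelled copy is only used for display, because every column becomes character and can no longer be used in calculations. The data.frame is made up for illustration.

```r
df <- data.frame(a = c(1, NA), b = c(NA, "x"))

# Work on a copy so the original keeps its types for analysis
df_display <- df
df_display[] <- lapply(df, function(col) {
  col <- as.character(col)       # numbers become text, NA stays NA
  col[is.na(col)] <- "missing"   # relabel for a non-R audience
  col
})
df_display
```

Keeping the original and a display copy separate avoids accidentally running calculations on the text version.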
Tim

Personalizing medicine
