Data Science

Setting up ActivityWatch Watcher-ask on Windows

Ask my computer to ask me questions

Conda commands to live by

Export environment: conda env export > environment.yml
Recreate environment: conda env create --name xxx --file environment.yml (or --prefix xxx to create the environment at a given path instead of by name)

Install R packages on an offline server

Use renv to automatically manage installation and dependencies

SNP analysis

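ID naming
TOPMed returns chr:pos:ref:alt.
rsid

REF:ALT to major:minor translation
If AF == MAF, then ALT = minor and REF = major; otherwise ALT = major and REF = minor.

SNP matching
Switching: the two alleles are swapped, e.g. chr:pos:alt:ref.
Flipping: the alleles are reported on the opposite strand, i.e. complemented (A <-> T, C <-> G).
How do I know if chr:pos:allele_1:allele_2 matches?
First: chr:pos must be equivalent.
Second: the alleles must match in one of four scenarios: identical, switched, flipped, or switched and flipped.
If it matches, use the target variant ID (often the variant in the PLINK files) as the new ID.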
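A minimal base R sketch of the rules above, assuming variant IDs are strings with exactly four fields, chr:pos:allele_1:allele_2, and single-base alleles. The helper names (major_minor, complement, match_variant, rename_variant) are hypothetical, and ignoring a leading "chr" prefix when comparing chromosomes is my assumption about what "equivalent" means. Palindromic SNPs (A/T, C/G) are ambiguous because a switch and a flip produce the same ID; this sketch just reports the first scenario it checks.

```r
# Hypothetical helper: translate REF/ALT into major/minor alleles.
# MAF = min(AF, 1 - AF), so AF == MAF means ALT is the minor allele.
major_minor <- function(ref, alt, af) {
  alt_is_minor <- af == pmin(af, 1 - af)
  data.frame(
    major = ifelse(alt_is_minor, ref, alt),
    minor = ifelse(alt_is_minor, alt, ref),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: complement alleles (A<->T, C<->G) so that
# variants reported on opposite strands can be compared.
complement <- function(allele) {
  chartr("ACGT", "TGCA", toupper(allele))
}

# Hypothetical helper: classify how `query` relates to `target`.
# Returns "identical", "switched", "flipped", "switched_flipped" or "no_match".
match_variant <- function(query, target) {
  qp <- strsplit(query, ":", fixed = TRUE)[[1]]
  tp <- strsplit(target, ":", fixed = TRUE)[[1]]
  if (length(qp) != 4 || length(tp) != 4) return("no_match")
  # First: chr:pos must be equivalent (assumption: ignore a "chr" prefix)
  same_chr <- sub("^chr", "", qp[1]) == sub("^chr", "", tp[1])
  if (!same_chr || qp[2] != tp[2]) return("no_match")
  # Second: check the four allele scenarios
  qa <- toupper(qp[3:4])
  ta <- toupper(tp[3:4])
  if (identical(qa, ta)) return("identical")
  if (identical(rev(qa), ta)) return("switched")
  if (identical(complement(qa), ta)) return("flipped")
  if (identical(rev(complement(qa)), ta)) return("switched_flipped")
  "no_match"
}

# If it matches, use the target variant ID (e.g. from the PLINK files)
# as the new ID; otherwise keep the original ID.
rename_variant <- function(query, target) {
  if (match_variant(query, target) == "no_match") query else target
}

match_variant("chr1:100:A:G", "1:100:C:T")  # "switched_flipped"
rename_variant("1:100:A:G", "1:100:G:A")    # "1:100:G:A"
```

In practice these would be applied row-wise (e.g. with mapply()) after joining the two variant tables on chr:pos.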

Big data in R

Reading files with vroom
Skip columns: col_types = list(hp = col_integer(), cyl = col_skip(), gear = col_factor())

Use reprex without browsers

The following code generates a randomly named *_reprex.md file that can be copied and shared with others.

reprex::reprex(
  {
    library(tibble)
    library(purrr)
    library(dplyr)
    mutate(tibble(a = 1), b = map(a, ~ tibble(c = 1)))
  },
  outfile = "reprex.html",
  html_preview = FALSE,
  session_info = TRUE
)

Debugging Python with VS Code

VS Code tutorial
Create a config file (launch.json).
Select the Python interpreter.
Set breakpoints ("stop points"); you can add different conditions by right-clicking.
Move to the Debug Console to interact with variables. A variable will not show up until you have stepped in once.
How to add arguments: list them in the "args" field of the configuration in launch.json.

Base R notes

Many functions in base R have faded from my daily use of R because of the tidyverse and its paradigm of doing as many operations as possible in a data.frame.

Get the variable name
deparse(substitute(variable))

Indexing and subsetting
which() returns the indices of the TRUE elements of a logical vector; these indices can be used in [] for subsetting.

Tidyverse alternative (notes for myself)
Imagine that I have a list of data.frames (group_split() splits a data.frame into a list of data.frames by the values of the specified column).
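A small base R sketch of these notes. The data.frame my_df and the function var_name() are made up for illustration, and split() is the base R function I would reach for in place of dplyr::group_split(); it is not something the notes above prescribe.

```r
# Hypothetical example data used throughout
my_df <- data.frame(x = c(3, 8, 1), g = c("a", "b", "a"))

# Get the variable name: substitute() captures the unevaluated argument,
# deparse() turns that expression into a string
var_name <- function(variable) {
  deparse(substitute(variable))
}
var_name(my_df)  # "my_df"

# which() returns the positions where a logical vector is TRUE;
# those positions can be used inside [] for subsetting
idx <- which(my_df$x > 2)  # 1 2
my_df[idx, ]

# Base R counterpart of dplyr::group_split(): a named list of
# data.frames, one per value of column g
by_g <- split(my_df, my_df$g)
names(by_g)  # "a" "b"
```

One reason to prefer which() over a bare logical vector: which() drops NA, so my_df[which(cond), ] skips rows where cond is NA, whereas my_df[cond, ] returns a row of NAs for each of them.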

Making tables in Rmarkdown: {DT} and {kableExtra}

{DT} is a package for rendering HTML tables; it is an R interface to the DataTables JavaScript library. It should not be confused with the {data.table} package, which is useful for data wrangling. A similar package with the same purpose is {kableExtra}. I found that {kableExtra} is better suited to static tables, whereas {DT} is better suited to interactive tables.

DT
Adding captions

DT::datatable(
  iris[1:10, ],
  caption = htmltools::tags$caption(
    style = 'caption-side: top; text-align: center; color: black; font-size: 200%;',
    'Table 1: Iris Dataset Table'
  )
)

Rmarkdown and markdown notes

This contains notes for Rmarkdown and markdown. All notes for markdown generally apply to Rmarkdown as well.

markdown
Footnote

Rmarkdown
Markdown extras
Adding toggle: <details><summary>toggle title</summary> toggle content </details>
Image quality: knitr::opts_chunk$set(dpi = 300)

Share html report
A report should be self-contained. In xaringan, set self_contained = TRUE in the YAML header and download the HTML with Chrome, not Firefox. For DT, HTML downloaded from the browser does not work; see here.