About

I am a PhD student in the Department of Statistics, University of Wisconsin-Madison. My research focuses on designing statistical methods to detect non-coding risk variants in neurodevelopmental disorders.

Email: khuang82@wisc.edu
CV: CV

Education

  • University of Wisconsin-Madison
    Doctor of Philosophy in Statistics
    Advisor: Dr. Qiongshi Lu (Lu lab)
    Dissertation: “Statistical methods for linking noncoding genetic variations to target genes with application in neurodevelopmental disorders”

Skills

  • Operating systems: Linux/Unix
  • Programming languages: R, Python, Java
  • Database tools: MySQL

Projects

Transcriptome-wide transmission disequilibrium analysis identifies novel risk genes for autism spectrum disorder

  • Conducted a large-scale transcriptome-wide association study on autism spectrum disorder (ASD) genetic data
  • Developed and optimized an analysis pipeline using the R packages tidyverse and data.table (project code)
  • Designed a parallel computing strategy for the high-throughput computing environment HTCondor
  • Implemented pseudo-sibling matching and conditional logistic regression models for analyzing ASD trios
  • Identified and confirmed the epigenetic regulatory role of the transcription factor gene POU3F2 in ASD risk
  • Performed DNase-I network analysis and demonstrated an excess of damaging de novo variants in known ASD genes regulated by POU3F2
  • Identified heritability enrichment in POU3F2 binding sites using LD score regression
  • Evaluated differential expression of POU3F2 between adult and fetal brain
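Conditional logistic regression for trios can be sketched as follows. In the classic pseudo-sibling design, each affected child is matched with pseudo-siblings formed from the parental allele combinations that were not transmitted, and the likelihood compares the child against its own matched set. The function name, the array layout (child in row 0 of each set) and the Newton-Raphson fitting loop are assumptions for illustration, not the project's actual code.

```python
import numpy as np


def conditional_logit(x, n_iter=50, tol=1e-10):
    """Fit a matched-set conditional logistic regression by Newton-Raphson.

    x: array of shape (n_sets, m, p). In every matched set, row 0 holds the
    covariates of the affected child and rows 1..m-1 hold its pseudo-siblings
    (hypothetical layout). Returns (coefficients, standard errors).
    """
    x = np.asarray(x, dtype=float)
    beta = np.zeros(x.shape[2])

    def score_and_information(beta):
        eta = x @ beta                                   # (n_sets, m) linear predictors
        eta = eta - eta.max(axis=1, keepdims=True)       # stabilise the exponentials
        w = np.exp(eta)
        w /= w.sum(axis=1, keepdims=True)                # within-set selection probabilities
        mean = np.einsum("sm,smp->sp", w, x)             # expected covariates per set
        score = (x[:, 0, :] - mean).sum(axis=0)          # observed child minus expectation
        centred = x - mean[:, None, :]
        info = np.einsum("sm,smp,smq->pq", w, centred, centred)
        return score, info

    for _ in range(n_iter):
        score, info = score_and_information(beta)
        step = np.linalg.solve(info, score)              # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = score_and_information(beta)                # information at the final estimate
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se
```

For a single binary covariate with one pseudo-sibling per child, this reduces to the familiar discordant-pair estimate log(n10 / n01); matched sets in which every member has the same covariate value contribute nothing to the score or the information.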

Integrating enhancer-promoter interaction with ASD GWAS signals

  • Constructed variance tests of GWAS scores across Hi-C enhancer-promoter interactions to pinpoint genes whose distal regulatory elements affect ASD risk
  • Adapted variance quantitative trait loci (vQTL) methods to denoise the Hi-C interaction data and improve model robustness
  • Processed raw Hi-C data into an interpretable format using Fit-Hi-C
  • Integrated and curated resources from NCBI and UCSC
  • Provided high-throughput computing and human genetics training to undergraduate students

Modelling prenatal and postnatal brain eQTLs together via fused LASSO

  • Jointly modelled expression quantitative trait loci (eQTLs) in adult and fetal brain with a fused LASSO model to boost power in fetal studies
  • Designed a fast coordinate descent algorithm that takes eQTL summary statistics from each developmental stage as input
  • Implemented the algorithm as a scikit-learn-compatible Python module

Genome-wide association study for congenital heart disease

  • Curated next-generation sequencing (NGS) data using PLINK and bcftools
  • Combined association results from the microarray and NGS studies using the meta-analysis tool METAL and R
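METAL offers both a sample-size-weighted scheme (its default) and an inverse-variance scheme based on standard errors. The source does not say which one was used, so the sketch below shows only the fixed-effect inverse-variance calculation for one variant, written in Python rather than METAL or R. It assumes both studies report effects for the same allele; the function name is hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas, ses: per-study effect estimates and standard errors, already
    aligned to the same effect allele (assumption)."""
    weights = [1.0 / se ** 2 for se in ses]            # each study weighted by 1 / SE^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                        # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))             # two-sided normal p-value
    return beta, se, z, p


# hypothetical numbers: one microarray and one NGS estimate for the same SNP
beta, se, z, p = ivw_meta([0.2, 0.4], [0.1, 0.1])
```

With equal standard errors the pooled effect is simply the average of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.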

Identifying ovarian cancer biomarkers associated with disease progression

  • Collected RNA-seq data and ovarian cancer progression information using Bioconductor in R
  • Performed gene co-expression cluster analysis using weighted correlation network analysis (WGCNA) and Gene Ontology (GO) enrichment
  • Validated the cancer biomarkers using survival models in different patient groups