Day - 4 Functional Analysis, Longitudinal & Public Data

4.1 Objectives

  • Interpret DE results biologically
  • Explore longitudinal trajectories
  • Work with public datasets

4.2 Module 1: Functional Enrichment & GSEA

Introduce enrichment (GO, KEGG, Reactome).
GSEA principles and visualization.

# example with clusterProfiler
library(clusterProfiler)
library(org.Hs.eg.db)
# Suppose `de_genes` is a vector of gene IDs with statistics
ego <- enrichGO(de_genes, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL", ont = "BP")
dotplot(ego)
library(lme4)
#> Loading required package: Matrix
df_long <- data.frame(
  sample = rep(1:10, each = 3),
  time = rep(c(0,1,2), times = 10),
  intensity = rnorm(30)
)
m <- lmer(intensity ~ time + (1 + time | sample), data = df_long)
#> boundary (singular) fit: see help('isSingular')
summary(m)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: intensity ~ time + (1 + time | sample)
#>    Data: df_long
#> 
#> REML criterion at convergence: 89.3
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.6730 -0.5858 -0.1967  0.6749  2.3189 
#> 
#> Random effects:
#>  Groups   Name        Variance  Std.Dev. Corr 
#>  sample   (Intercept) 3.558e-06 0.001886      
#>           time        6.985e-06 0.002643 -1.00
#>  Residual             1.129e+00 1.062731      
#> Number of obs: 30, groups:  sample, 10
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)  0.12502    0.30678   0.408
#> time         0.08323    0.23764   0.350
#> 
#> Correlation of Fixed Effects:
#>      (Intr)
#> time -0.775
#> optimizer (nloptwrap) convergence code: 0 (OK)
#> boundary (singular) fit: see help('isSingular')

4.2.1 Exercise

On a longitudinal proteomics dataset, plot trajectories per protein or per cluster, fit simple linear/mixed model.

4.3 Module 2: Public Data Integration

Demonstrate downloading, cleaning, integrating a public proteomics or expression dataset (e.g. from PRIDE, GEO). Run pipeline: QC → normalization → DE → enrichment.

4.3.1 Exercise

Pick a public dataset, import into R, preprocess, analyze, interpret results.