Analysis of output from MIrROR (see jupyter notebook for bioinformatics steps)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ───────────────────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.1 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.1.0
✔ tidyr 1.3.0 ✔ stringr 1.5.0
✔ readr 2.1.4 ✔ forcats 1.0.0
── Conflicts ──────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(phyloseq)
library(microbiome)
microbiome R package (microbiome.github.com)
Copyright (C) 2011-2022 Leo Lahti,
Sudarshan Shetty et al. <microbiome.github.io>
Attaching package: ‘microbiome’
The following object is masked from ‘package:ggplot2’:
alpha
The following object is masked from ‘package:base’:
transform
library(vegan)
Loading required package: permute
Loading required package: lattice
This is vegan 2.6-4
Attaching package: ‘vegan’
The following object is masked from ‘package:microbiome’:
diversity
import the OTU table files- these have sample name in column name with abundance
b01 <- read_delim(file = "results/b01/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 161 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b01.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b02 <- read_delim(file = "results/b02/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 1219 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b02.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b03 <- read_delim(file = "results/b03/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 112 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b03.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b04 <- read_delim(file = "results/b04/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 108 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b04.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b05 <- read_delim(file = "results/b05/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 107 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b05.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b06 <- read_delim(file = "results/b06/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 271 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b06.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b07 <- read_delim(file = "results/b07/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 267 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b07.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b08 <- read_delim(file = "results/b08/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 115 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b08.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b09 <- read_delim(file = "results/b09/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 309 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b09.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b10 <- read_delim(file = "results/b10/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 199 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b10.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b11 <- read_delim(file = "results/b11/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 198 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b11.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
b12 <- read_delim(file = "results/b12/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t", col_types = NULL)
Rows: 7 Columns: 2
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): #Name
dbl (1): b12.fa
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# b01 <- read_delim(file = "results/b01/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b02 <- read_delim(file = "results/b02/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b03 <- read_delim(file = "results/b03/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b04 <- read_delim(file = "results/b04/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b05 <- read_delim(file = "results/b05/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b06 <- read_delim(file = "results/b06/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b07 <- read_delim(file = "results/b07/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b08 <- read_delim(file = "results/b08/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b09 <- read_delim(file = "results/b09/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b10 <- read_delim(file = "results/b10/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b11 <- read_delim(file = "results/b11/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
# b12 <- read_delim(file = "results/b12/OTUtable/OUTPUT_std.txt", col_names = TRUE, delim = "\t ")
join all samples into one variable, otutable
otutable <- full_join(x = b01, y = b02, by = "#Name")
otutable <- full_join(x = otutable, y = b03, by = "#Name")
otutable <- full_join(x = otutable, y = b04, by = "#Name")
otutable <- full_join(x = otutable, y = b05, by = "#Name")
otutable <- full_join(x = otutable, y = b06, by = "#Name")
otutable <- full_join(x = otutable, y = b07, by = "#Name")
otutable <- full_join(x = otutable, y = b08, by = "#Name")
otutable <- full_join(x = otutable, y = b09, by = "#Name")
otutable <- full_join(x = otutable, y = b10, by = "#Name")
otutable <- full_join(x = otutable, y = b11, by = "#Name")
otutable <- full_join(x = otutable, y = b12, by = "#Name")
modify column names from “b01.fa” to just “b01” etc
colnames(otutable) <- c("Name","b01","b02","b03","b04","b05","b06","b07","b08","b09","b10","b11","b12")
export to txt file
# dir.create("results2")
write_delim(x = otutable, file = "results2/otutable.txt")
reformat so can be loaded into phyloseq
otutable[is.na(otutable)] <- 0
otumat <- as.matrix(otutable[,2:ncol(otutable)])
rownames(otumat) <- paste0("OTU",1:nrow(otumat))
head(otumat)
b01 b02 b03 b04 b05 b06 b07 b08 b09 b10 b11 b12
OTU1 1 2 0 0 0 1251 0 1 0 0 3 0
OTU2 1 301 0 4 3 6124 0 4 0 5 10 0
OTU3 5 0 1 2 0 2 488 0 0 0 2 0
OTU4 1 0 0 0 0 0 0 0 0 0 0 0
OTU5 45 0 0 0 1 0 0 0 0 0 0 0
OTU6 169 0 0 0 0 0 0 0 0 0 1 0
taxmat <- otutable[,1] %>% mutate(Name = str_remove_all(Name, pattern = ".__"))
taxmat <- taxmat %>%
separate(Name, sep = (";"), c("Phylum", "Class","Order","Family","Genus","Species"))
rownames(taxmat) <- paste0("OTU",1:nrow(otumat))
Warning: Setting row names on a tibble is deprecated.
taxmat <- as.matrix(taxmat)
head(taxmat)
Phylum Class Order Family Genus
OTU1 "Actinobacteriota" "Acidimicrobiia" "Microtrichales" "Ilumatobacteraceae" "Ilumatobacter"
OTU2 "Actinobacteriota" "Acidimicrobiia" "Microtrichales" "Ilumatobacteraceae" "Ilumatobacter"
OTU3 "Actinobacteriota" "Actinobacteria" "Actinomycetales" "Actinomycetaceae" "Actinomyces"
OTU4 "Actinobacteriota" "Actinobacteria" "Actinomycetales" "Actinomycetaceae" "Pauljensenia"
OTU5 "Actinobacteriota" "Actinobacteria" "Actinomycetales" "Actinomycetaceae" "Pauljensenia"
OTU6 "Actinobacteriota" "Actinobacteria" "Actinomycetales" "Actinomycetaceae" "Pauljensenia"
Species
OTU1 "Ilumatobacter_coccineus"
OTU2 "Ilumatobacter_nonamiensis"
OTU3 "Actinomyces_oris"
OTU4 "Pauljensenia_cellulosilytica"
OTU5 "Pauljensenia_odontolyticus"
OTU6 "Pauljensenia_pacaensis"
Combine into phyloseq object
OTU <- otu_table(otumat, taxa_are_rows = TRUE)
TAX <- tax_table(taxmat)
ps <- phyloseq(OTU, TAX)
ps
phyloseq-class experiment-level object
otu_table() OTU Table: [ 1731 taxa and 12 samples ]
tax_table() Taxonomy Table: [ 1731 taxa by 6 taxonomic ranks ]
Test some plots
ps_phyla <- tax_glom(ps, "Phylum")
phylacountsplot <- plot_bar(ps_phyla, fill = "Phylum")
phylacountsplot
ggsave(plot = phylacountsplot, filename = "results2/phylacountsplot.png")
Saving 7.29 x 4.5 in image
These are on the order of magnitude of reads I get with Illumina data so that’s great. Sample B11 is the negative control- low # of hits makes sense. B1 and B10 are same oyster- diversity looks the same. Great.
make rel abun table
# convert to rel abun ("compositional")
ps_compositional <-
microbiome::transform(ps, transform = "compositional")
ps_compositional_phyla <- tax_glom(ps_compositional, "Phylum")
phylumplot <- plot_bar(ps_compositional_phyla, fill = "Phylum")
phylumplot
ggsave(plot = phylumplot, filename = "results2/phylumplot.png")
Saving 7.29 x 4.5 in image
tab <- otu_table(ps)
class(tab) <- "matrix"
Warning in class(tab) <- "matrix" :
Setting class(x) to "matrix" sets attribute to NULL; result will no longer be an S4 object
rarecurve(t(tab), step=50, cex=0.5)
# export
png(file="results2/rarefaction.png")
rarecurve(t(tab), step=50, cex=0.5)
dev.off()
png
2
Most look OK. Negative control is worst one (makes sense). Rest, except maybe b01 and b10, have reached plateau
shannons <- tibble(Div = diversity(x = t(otumat), index = "shannon"), samples = colnames(otumat))
shannons
shannonplot <- ggplot(shannons) +
geom_col(aes(y = as.numeric(Div), x = samples), fill = "aquamarine3") +
ylab("Shannon's Diversity Index") + xlab("")
shannonplot
ggsave(plot = shannonplot, filename = "results2/shannonsplot.png")
Saving 7.29 x 4.5 in image
# total reads annotated
sum(otutable[,2:length(otutable)])
[1] 955323
# No. of unique species in whole dataset
dim(otutable)[1]
[1] 1731
# total reads per species, across all samples
otutable %>%
mutate(sum = rowSums(across(where(is.numeric))))
# total reads per sample
otutable %>%
summarise(across(where(is.numeric),sum))
NA
How many reads were not annotated?
Confidence in alignment and annotation?
- minimapper2 documentation - MIRrROR repo -