Using data from Green et al.: Green TJ, Siboni N, King WL, Labbate M, Seymour JR, Raftos D. 2019. Simulated Marine Heat Wave Alters Abundance and Structure of Vibrio Populations Associated with the Pacific Oyster Resulting in a Mass Mortality Event. Microb Ecol. 77(3):736–747. doi:10.1007/s00248-018-1242-9.
I followed the BVCN lesson 3a and lesson 3b pipelines. Lesson 3a uses QIIME2, calling DADA2 for ASV assignment, and using a Naive Bayes classifier trained on Silva 132 for taxonomy assignment (data are in qiime2_export
) Lesson 3b uses DADA2 directly in R (I used the Cyverse app I created), and IDTAXA (from Decipher) trained in Silva 138 for taxonomy assignment (data are in ‘dada2_export’)
Modifications:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(phyloseq)
library(dada2)
## Loading required package: Rcpp
library(Biostrings)
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which, which.max, which.min
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 3.6.3
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:tidyr':
##
## expand
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following object is masked from 'package:phyloseq':
##
## distance
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:purrr':
##
## reduce
## Loading required package: XVector
##
## Attaching package: 'XVector'
## The following object is masked from 'package:purrr':
##
## compact
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
library(DECIPHER)
## Loading required package: RSQLite
library(phangorn)
## Loading required package: ape
##
## Attaching package: 'ape'
## The following object is masked from 'package:Biostrings':
##
## complement
library(readr)
library(seqinr)
##
## Attaching package: 'seqinr'
## The following objects are masked from 'package:ape':
##
## as.alignment, consensus
## The following object is masked from 'package:Biostrings':
##
## translate
## The following object is masked from 'package:dplyr':
##
## count
library(decontam)
library(ape)
library(vegan)
## Loading required package: permute
##
## Attaching package: 'permute'
## The following object is masked from 'package:seqinr':
##
## getType
## Loading required package: lattice
## This is vegan 2.5-6
##
## Attaching package: 'vegan'
## The following objects are masked from 'package:phangorn':
##
## diversity, treedist
library(RColorBrewer)
library(microbiome)
##
## microbiome R package (microbiome.github.com)
##
##
##
## Copyright (C) 2011-2020 Leo Lahti,
## Sudarshan Shetty et al. <microbiome.github.io>
##
## Attaching package: 'microbiome'
## The following object is masked from 'package:vegan':
##
## diversity
## The following object is masked from 'package:phangorn':
##
## diversity
## The following object is masked from 'package:Biostrings':
##
## coverage
## The following objects are masked from 'package:IRanges':
##
## coverage, transform
## The following object is masked from 'package:S4Vectors':
##
## transform
## The following object is masked from 'package:ggplot2':
##
## alpha
## The following object is masked from 'package:base':
##
## transform
library(DESeq2)
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 3.6.3
## Loading required package: SummarizedExperiment
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:phyloseq':
##
## sampleNames
## Loading required package: DelayedArray
## Warning: package 'DelayedArray' was built under R version 3.6.3
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
## The following object is masked from 'package:seqinr':
##
## count
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: BiocParallel
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following object is masked from 'package:purrr':
##
## simplify
## The following objects are masked from 'package:base':
##
## aperm, apply, rowsum
These metadata can be used with both qiime2 and dada2 results.
For this Bioproject, all info from Biosample file is also in SraRunTable, so I will only import SraRunTable (it is also formatted better)
I manually modified the SraRunTable to split the sample names into numbers (since temperature was built into the sample name) and put some info about samples from the biosample_result.txt summary (eg. antibiotic treatment). I also added in a column for “mortality” because the paper indicates that oysters in the control samples at 25C faced higher mortality (~50% on day 4 when samples were collected; Fig. 1) and this was not seen in any other samples.
Therefore, I am importing SraRunTableMod.txt
SraRunTable <- read_delim("SraRunTableMod.csv", delim = ",")
## Parsed with column specification:
## cols(
## .default = col_character(),
## AvgSpotLen = col_double(),
## Bases = col_double(),
## Bytes = col_double(),
## ReleaseDate = col_datetime(format = ""),
## Replicate = col_double()
## )
## See spec(...) for full column specifications.
# Import Count table. Skip first row of tsv file, which is just some text
count_table <- read_tsv(file="dada2_export/ASVs_counts.tsv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_character(),
## SRR6374533 = col_double(),
## SRR6374534 = col_double(),
## SRR6374535 = col_double(),
## SRR6374536 = col_double(),
## SRR6374537 = col_double(),
## SRR6374538 = col_double(),
## SRR6374539 = col_double(),
## SRR6374540 = col_double(),
## SRR6374541 = col_double(),
## SRR6374542 = col_double(),
## SRR6374543 = col_double(),
## SRR6374544 = col_double(),
## SRR6374545 = col_double(),
## SRR6374546 = col_double(),
## SRR6374547 = col_double()
## )
# And specify that the first column of data are rownames
count_table <- column_to_rownames(count_table, var = colnames(count_table)[1])
# Import taxonomy of ASVs
taxonomy <- read_tsv(file="dada2_export/ASVs_taxonomy.tsv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_character(),
## domain = col_character(),
## phylum = col_character(),
## class = col_character(),
## order = col_character(),
## family = col_character(),
## genus = col_character(),
## species = col_logical()
## )
# And specify that the first column of data are rownames
taxonomy <- column_to_rownames(taxonomy, var = colnames(taxonomy)[1])
# Use rarecurve, from the Vegan package. Rarcurve expects the dataset as a dataframe so we need to use as.data.frame again:
count_table_df <- as.data.frame(count_table)
# Plot the rarefaction curves, color-coding by the colors listed in sample_info_tab, which indicate sample type, and transforming using t() again
rarecurve(t(count_table_df), step=100, cex=0.5, ylab="ASVs", label=T)
Generally these seem to be sequenced to completion
count_table_no_singletons <- filter(count_table,rowSums(count_table)>1)
# retains 5759 ASVs (out of 5776)
In the Dada2 tools, there are no options to build a tree (unlike in Qiime2) but we can build it here using DECIPHER and phangorn
(Based on https://f1000research.com/articles/5-1492/v2)
Make an alignment using tools from Decipher (Note- alignment step takes a while. Commented out for now. Only need to run once)
## import fasta
fas <- "dada2_export/ASVs.fasta"
seqs <- readDNAStringSet(fas)
seqs
## A DNAStringSet instance of length 5776
## width seq names
## [1] 404 TGGGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_1
## [2] 424 TGAGGAATATTGGACAATGGAG...AAAGCGTGGGGAGCGAACAGG ASV_2
## [3] 429 TGGGGAATATTGCACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_3
## [4] 406 TGAGGAATTTTCCGCAATGGGC...AAAGCTAGGGTAGCAAAAGGG ASV_4
## [5] 404 TGGGGAATCTTAGACAATGGGC...AAAGTGTGGGGAGCAAACAGG ASV_5
## ... ... ...
## [5772] 424 TGAGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCGAACAGG ASV_5772
## [5773] 424 TAAGGAATATTGGACAATGGGG...AAAGCGTGGGGAGCGAACAGG ASV_5773
## [5774] 405 TCGAGAATCTTCCGCAATGGAC...AAAGCTAGGGGAGCGAACGGG ASV_5774
## [5775] 429 TGGGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_5775
## [5776] 424 TGAGGAATATTGGACAATGGAG...AAAGCGTGGGGAGCGAACAGG ASV_5776
# perform the alignment
aligned <- AlignSeqs(seqs)
## Determining distance matrix based on shared 8-mers:
## ================================================================================
##
## Time difference of 665.68 secs
##
## Clustering into groups by similarity:
## ================================================================================
##
## Time difference of 11.6 secs
##
## Aligning Sequences:
## ================================================================================
##
## Time difference of 66.87 secs
##
## Iteration 1 of 2:
##
## Determining distance matrix based on alignment:
## ================================================================================
##
## Time difference of 45.21 secs
##
## Reclustering into groups by similarity:
## ================================================================================
##
## Time difference of 5.88 secs
##
## Realigning Sequences:
## ================================================================================
##
## Time difference of 33.37 secs
##
## Iteration 2 of 2:
##
## Determining distance matrix based on alignment:
## ================================================================================
##
## Time difference of 52.4 secs
##
## Reclustering into groups by similarity:
## ================================================================================
##
## Time difference of 8.44 secs
##
## Realigning Sequences:
## ================================================================================
##
## Time difference of 20.93 secs
##
## Refining the alignment:
## ================================================================================
##
## Time difference of 2.11 secs
# view the alignment in a browser (optional)
BrowseSeqs(aligned, highlight=0)
# write out aligned sequence file
writeXStringSet(aligned, file="ASVs.aligned.fasta")
Use phangorn package to build tree. Here we are building a maximum likelihood neighbor-joining tree. (Also takes a while to run. Comment out for now.)
phang.align <- phyDat(as(aligned, "matrix"), type="DNA") # convert to phyDat format
dm <- dist.ml(phang.align) # calculate pairwise distance matrix
treeNJ <- NJ(dm) # perform neighbor-joining tree method
fit = pml(treeNJ, data=phang.align) # compute intermal max likelihood
## negative edges length changed to 0!
Here we will do ordinations using the phyloseq package, which first requires making phyloseq objects out of each of our input data tables (in the last tutorial, I imported the tree using phyloseq so it is already a phyloseq object)
ASV = otu_table(data.frame(count_table_no_singletons), taxa_are_rows = TRUE)
TAX = tax_table(as.matrix(taxonomy))
META = sample_data(data.frame(SraRunTable, row.names = SraRunTable$Run))
TREE = phy_tree(fit$tree)
First check that the inputs are in compatible formats by checking for ASV names with the phyloseq function, taxa_names
head(taxa_names(TAX))
## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"
head(taxa_names(ASV))
## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"
head(taxa_names(TREE))
## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"
And check sample names were also detected
head(sample_names(ASV))
## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"
head(sample_names(META))
## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"
And make the phyloseq object
ps <- phyloseq(ASV, TAX, META, TREE)
rank_names(ps)
## [1] "domain" "phylum" "class" "order" "family" "genus" "species"
unique(tax_table(ps)[, "domain"])
## Taxonomy Table: [4 taxa by 1 taxonomic ranks]:
## domain
## ASV_1 "Bacteria"
## ASV_19 NA
## ASV_648 "Archaea"
## ASV_1168 "Eukaryota"
table(tax_table(ps)[, "domain"], exclude = NULL)
##
## Archaea Bacteria Eukaryota <NA>
## 8 4580 2 1169
Filter out those ambigious Domain annotations
ps <- subset_taxa(ps, !is.na(domain) & !domain %in% c("", "Eukaryota"))
table(tax_table(ps)[, "domain"], exclude = NULL)
##
## Archaea Bacteria
## 8 4580
Check out the phyla names
table(tax_table(ps)[, "phylum"], exclude = NULL)
##
## Acidobacteriota Actinobacteriota
## 148 122
## Bacteroidota Bdellovibrionota
## 885 89
## Calditrichota Campilobacterota
## 2 32
## Chloroflexi Crenarchaeota
## 20 6
## Cyanobacteria Dadabacteria
## 54 3
## Deferrisomatota Deinococcota
## 1 1
## Dependentiae Desulfobacterota
## 2 70
## Elusimicrobiota Entotheonellaeota
## 1 1
## Fibrobacterota Firmicutes
## 1 52
## Fusobacteriota Gemmatimonadota
## 4 54
## LCP-89 Margulisbacteria
## 1 1
## Marinimicrobia (SAR406 clade) Methylomirabilota
## 1 1
## Myxococcota NB1-j
## 59 29
## Nitrospinota Nitrospirota
## 3 2
## Patescibacteria Planctomycetota
## 33 248
## Proteobacteria RCP2-54
## 2254 2
## SAR324 clade(Marine group B) Schekmanbacteria
## 3 1
## Spirochaetota Sva0485
## 1 7
## Synergistota Thermoplasmatota
## 1 2
## Verrucomicrobiota WPS-2
## 174 1
## WS2 Zixibacteria
## 1 4
## <NA>
## 211
Filter out any with “NA” as phylum. For now, I am leaving in phyla with abundance of 1
ps <- subset_taxa(ps, !is.na(phylum) & !phylum %in% c(""))
table(tax_table(ps)[, "phylum"], exclude = NULL)
##
## Acidobacteriota Actinobacteriota
## 148 122
## Bacteroidota Bdellovibrionota
## 885 89
## Calditrichota Campilobacterota
## 2 32
## Chloroflexi Crenarchaeota
## 20 6
## Cyanobacteria Dadabacteria
## 54 3
## Deferrisomatota Deinococcota
## 1 1
## Dependentiae Desulfobacterota
## 2 70
## Elusimicrobiota Entotheonellaeota
## 1 1
## Fibrobacterota Firmicutes
## 1 52
## Fusobacteriota Gemmatimonadota
## 4 54
## LCP-89 Margulisbacteria
## 1 1
## Marinimicrobia (SAR406 clade) Methylomirabilota
## 1 1
## Myxococcota NB1-j
## 59 29
## Nitrospinota Nitrospirota
## 3 2
## Patescibacteria Planctomycetota
## 33 248
## Proteobacteria RCP2-54
## 2254 2
## SAR324 clade(Marine group B) Schekmanbacteria
## 3 1
## Spirochaetota Sva0485
## 1 7
## Synergistota Thermoplasmatota
## 1 2
## Verrucomicrobiota WPS-2
## 174 1
## WS2 Zixibacteria
## 1 4
After the above, 4377 ASVs remain. This is many fewer than the QIIME2 pipeline.
Re-root tree (from experience, I have had to do this because you may have removed the root of your tree when pruning). (I found this handy function from here which picks the longest branch to root from). There is also a compatibilty issue between the type of tree calculated by QIIME and the one expected by phyloseq, so we have to change that format as well (see here for discussion)
# first define function from link above to find furthest outgroup
pick_new_outgroup <- function(tree.unrooted){
require("magrittr")
require("data.table")
require("ape") # ape::Ntip
# tablify parts of tree that we need.
treeDT <-
cbind(
data.table(tree.unrooted$edge),
data.table(length = tree.unrooted$edge.length)
)[1:Ntip(tree.unrooted)] %>%
cbind(data.table(id = tree.unrooted$tip.label))
# Take the longest terminal branch as outgroup
new.outgroup <- treeDT[which.max(length)]$id
return(new.outgroup) }
# then run on my phyloseq tree
my.tree <- phy_tree(ps)
out.group <- pick_new_outgroup(my.tree)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: data.table
##
## Attaching package: 'data.table'
## The following object is masked from 'package:SummarizedExperiment':
##
## shift
## The following object is masked from 'package:GenomicRanges':
##
## shift
## The following object is masked from 'package:IRanges':
##
## shift
## The following objects are masked from 'package:S4Vectors':
##
## first, second
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
out.group
## [1] "ASV_5120"
# Then use this outgroup to root the tree
new.tree1 <- ape::root(my.tree, outgroup=out.group, resolve.root=TRUE)
# and convert to dichotomy tree
new.tree2 <- ape::multi2di(new.tree1)
phy_tree(ps) <- new.tree2
phy_tree(ps)
##
## Phylogenetic tree with 4377 tips and 4376 internal nodes.
##
## Tip labels:
## ASV_1, ASV_2, ASV_3, ASV_4, ASV_5, ASV_6, ...
##
## Rooted; includes branch lengths.
Check overall how the phyla are distributed among samples. Phyloseq makes this easy
# First aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed = tax_glom(ps, "phylum")
# There are many phyla here, so have to make a custom color palette by interpolating from an existing one in RColorBrewer
colourCount = length(table(tax_table(ps)[, "phylum"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
PhylaPalette = getPalette(colourCount)
# and plot
plot_bar(phylumGlommed, x = "Sample", fill = "phylum") +
scale_fill_manual(values = PhylaPalette)
Plot compositional (relative abundances) instead of absolute abundance using microbiome::transform
ps_ra <- microbiome::transform(ps, transform = "compositional")
head(otu_table(ps_ra))
## OTU Table: [6 taxa and 15 samples]
## taxa are rows
## SRR6374533 SRR6374534 SRR6374535 SRR6374536 SRR6374537
## ASV_1 0.2112630964 0.059121216 0.1731940442 0.291087469 0.006738057
## ASV_2 0.0678971338 0.187964695 0.0397196118 0.033380171 0.041749035
## ASV_3 0.0008724257 0.002470379 0.0004195951 0.002007327 0.362803570
## ASV_4 0.2124796901 0.006427783 0.2058237330 0.155618006 0.006309042
## ASV_5 0.2061005771 0.014030796 0.1037757388 0.084935013 0.002834863
## ASV_6 0.0800230512 0.008178635 0.0758603242 0.068951674 0.001976833
## SRR6374538 SRR6374539 SRR6374540 SRR6374541 SRR6374542 SRR6374543
## ASV_1 0.0013069892 0.036978071 0.029327345 0.008743909 0.515185753 0.5276016018
## ASV_2 0.0048245649 0.059195170 0.212227122 0.037735710 0.283370995 0.1250808337
## ASV_3 0.4556793520 0.025466098 0.002643898 0.171014087 0.000000000 0.0187098940
## ASV_4 0.0008309869 0.002334092 0.004880346 0.001626603 0.063594544 0.0021203303
## ASV_5 0.0013715318 0.005921678 0.015220521 0.006344486 0.001667877 0.0009264786
## ASV_6 0.0009923436 0.004754632 0.008185218 0.003098641 0.009294380 0.0053536786
## SRR6374544 SRR6374545 SRR6374546 SRR6374547
## ASV_1 0.606026859 0.635074778 6.000352e-01 0.6939757600
## ASV_2 0.097723551 0.214438447 1.764835e-01 0.1059373643
## ASV_3 0.000614150 0.002438471 3.297319e-05 0.0002725515
## ASV_4 0.002260072 0.008216056 1.748678e-02 0.0039860658
## ASV_5 0.000589584 0.001733888 1.384874e-03 0.0005876892
## ASV_6 0.008851949 0.006690480 5.528505e-03 0.0128525070
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed_RA = tax_glom(ps_ra, "phylum")
# and plot
plot_bar(phylumGlommed_RA, x = "Sample", fill = "phylum") +
scale_fill_manual(values = PhylaPalette)
Check abundance plots for some taxa of interest
The Proteobacteria
# Remove the antibiotic treatments
ps_filtered <- subset_samples(ps_ra, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples
# subset to proteobacteria
ps_proteo_ra <- subset_taxa(ps_filtered, phylum == "Proteobacteria")
# define colors
colourCount = length(table(tax_table(ps_proteo_ra)[, "order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_proteo_ra, "order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "order") +
scale_fill_manual(values = OrderPalette)
In the above you can see the increase in Vibrionales and Acetobacterales going from T0 to 20 to 25 and the decrease in Rhizobiales and Rhodobacterales
The Actinobacteriota
# subset to proteobacteria
ps_actino_ra <- subset_taxa(ps_filtered, phylum == "Actinobacteriota")
# define colors
colourCount = length(table(tax_table(ps_actino_ra)[, "order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_actino_ra, "order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "order") +
scale_fill_manual(values = OrderPalette)
In the above you can see a decrease in Micotrichales with increase in temperature
The Campilobacterota (note, this is not a phylum that is annotated differently in Silva v132)
# subset to proteobacteria
ps_campo_ra <- subset_taxa(ps_filtered, phylum == "Campilobacterota")
# define colors
colourCount = length(table(tax_table(ps_campo_ra)[, "family"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
FamilyPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
familyGlommed_RA = tax_glom(ps_campo_ra, "family")
# and plot
plot_bar(familyGlommed_RA, x = "replicate", fill = "family") +
scale_fill_manual(values = FamilyPalette)
There is also a slight increase in Arcobacteraceae from T0 to the warmer temperatures
Instead of relative abundances, need different transformation for stats. Microbiome package has both Hellinger and CLR built in. Try hellinger
ps_hellinger <- microbiome::transform(ps, transform = "hellinger")
head(otu_table(ps_hellinger))
## OTU Table: [6 taxa and 15 samples]
## taxa are rows
## SRR6374533 SRR6374534 SRR6374535 SRR6374536 SRR6374537 SRR6374538
## ASV_1 0.45963365 0.24314855 0.41616589 0.5395252 0.08208567 0.03615231
## ASV_2 0.26057078 0.43354895 0.19929780 0.1827024 0.20432581 0.06945909
## ASV_3 0.02953685 0.04970291 0.02048402 0.0448032 0.60233178 0.67504026
## ASV_4 0.46095519 0.08017346 0.45367801 0.3944845 0.07942948 0.02882684
## ASV_5 0.45398301 0.11845166 0.32214242 0.2914361 0.05324343 0.03703420
## ASV_6 0.28288346 0.09043580 0.27542753 0.2625865 0.04446159 0.03150149
## SRR6374539 SRR6374540 SRR6374541 SRR6374542 SRR6374543 SRR6374544
## ASV_1 0.19229683 0.17125229 0.09350887 0.71776441 0.72636189 0.77847727
## ASV_2 0.24330058 0.46068115 0.19425682 0.53232602 0.35366769 0.31260766
## ASV_3 0.15958101 0.05141885 0.41353850 0.00000000 0.13678411 0.02478205
## ASV_4 0.04831244 0.06985947 0.04033116 0.25217959 0.04604704 0.04754022
## ASV_5 0.07695244 0.12337148 0.07965228 0.04083965 0.03043811 0.02428135
## ASV_6 0.06895384 0.09047219 0.05566544 0.09640737 0.07316884 0.09408480
## SRR6374545 SRR6374546 SRR6374547
## ASV_1 0.79691579 0.774619372 0.83305208
## ASV_2 0.46307499 0.420099415 0.32548021
## ASV_3 0.04938088 0.005742229 0.01650913
## ASV_4 0.09064246 0.132237601 0.06313530
## ASV_5 0.04163998 0.037213897 0.02424230
## ASV_6 0.08179535 0.074353919 0.11336890
Principal coordinate analysis (PCoA) using Bray-Curtis
out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "bray")
pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3)
pcoa_plot
Scale axes by Eigenvalues
evals <- out.pcoa$values$Eigenvalues
pcoa_plot.scaled = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3) +
coord_fixed(sqrt(evals[2] / evals[1]))
pcoa_plot.scaled
Principal coordinate analysis (PCoA) using weighted UniFrac PCoA Also scaling axes by eigenvalues
out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "wunifrac")
wuf_pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3) +
coord_fixed(sqrt(evals[2] / evals[1]))
wuf_pcoa_plot
Non-Metric dimensional scaling (NMDS) ordination using Bray-Curtis
out.nmds <- ordinate(ps_hellinger, method = "NMDS", distance = "bray")
## Run 0 stress 0.07023555
## Run 1 stress 0.06836951
## ... New best solution
## ... Procrustes: rmse 0.04483448 max resid 0.1507302
## Run 2 stress 0.07846926
## Run 3 stress 0.07023579
## Run 4 stress 0.07023553
## Run 5 stress 0.07686721
## Run 6 stress 0.07623932
## Run 7 stress 0.07686652
## Run 8 stress 0.07852225
## Run 9 stress 0.06836871
## ... New best solution
## ... Procrustes: rmse 0.0002597934 max resid 0.0008821824
## ... Similar to previous best
## Run 10 stress 0.07623932
## Run 11 stress 0.07023566
## Run 12 stress 0.07846944
## Run 13 stress 0.07686646
## Run 14 stress 0.09561721
## Run 15 stress 0.06993789
## Run 16 stress 0.06836867
## ... New best solution
## ... Procrustes: rmse 4.616461e-05 max resid 0.0001381005
## ... Similar to previous best
## Run 17 stress 0.06993778
## Run 18 stress 0.06993784
## Run 19 stress 0.07852218
## Run 20 stress 0.07904735
## *** Solution reached
nmds_plot = plot_ordination(ps_hellinger, out.nmds, color ="Temperature", shape = "Treatment") +
geom_point(size = 3)
nmds_plot
The above shows an apparent impact of temperature. I want to find out which organisms account for this.
Filter out the antibiotic-treated samples. Those are not part of my hypothesis and would influence the interpretation of any results (as seen above).
ps_filtered <- subset_samples(ps, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples
This uses untransformed data, as it does its own log 2 transformation in the DeSeq calculation
ps_deseq <- phyloseq_to_deseq2(ps_filtered, ~Temperature)
## converting counts to integer mode
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
ps_deseq <- DESeq(ps_deseq)
## estimating size factors
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## final dispersion estimates
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## fitting model and testing
First Compare Control (Time zero) to Heat Stress (20C) Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_20 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "20"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_20)
##
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 2, 0.058%
## LFC < 0 (down) : 11, 0.32%
## outliers [1] : 177, 5.1%
## low counts [2] : 3114, 90%
## (mean count < 44)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
The above shows out of 3466 ASVs, there are 2 that increased when comparing t0 to 20 and 11 that decreased. Ie. the 2 that ‘increased’ are in higher abundance in “t0” than in “20” and the 11 that decreased are in lower abundance in t0 than in 20.
Make a table of the significant results
# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_20 <- deseq_res_temp_T0_20[which(deseq_res_temp_T0_20$padj < 0.01), ]
# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_20_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_20, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_20), ], "matrix"))
# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_20_with_tax_ordered <- sigtab_deseq_res_temp_T0_20_with_tax[order(sigtab_deseq_res_temp_T0_20_with_tax$baseMean, decreasing=T), ]
sigtab_deseq_res_temp_T0_20_with_tax_ordered
## baseMean log2FoldChange lfcSE stat pvalue padj
## ASV_4 9449.22615 6.603312 1.771892 3.726702 1.940014e-04 0.0042437802
## ASV_8 1831.54900 -5.661860 1.672566 -3.385134 7.114354e-04 0.0095770143
## ASV_29 406.37087 -7.086652 2.066304 -3.429627 6.044115e-04 0.0095770143
## ASV_49 333.25140 -9.687737 2.470502 -3.921364 8.804906e-05 0.0025680976
## ASV_63 294.05336 -9.743062 2.390683 -4.075430 4.592931e-05 0.0016075258
## ASV_76 286.44351 13.217484 3.008383 4.393551 1.115142e-05 0.0007964192
## ASV_100 216.45580 -8.471779 2.424877 -3.493695 4.763856e-04 0.0083367476
## ASV_83 188.36686 -9.830787 2.402220 -4.092375 4.269774e-05 0.0016075258
## ASV_66 141.59485 -11.010939 2.531617 -4.349370 1.365290e-05 0.0007964192
## ASV_62 141.49639 -11.160098 2.396589 -4.656660 3.213810e-06 0.0005624167
## ASV_78 118.38886 -7.980829 2.178931 -3.662727 2.495448e-04 0.0048522598
## ASV_141 57.76480 -10.021217 2.955639 -3.390541 6.975473e-04 0.0095770143
## ASV_183 48.22575 -9.336202 2.455326 -3.802429 1.432843e-04 0.0035821086
## domain phylum class order
## ASV_4 Bacteria Cyanobacteria Cyanobacteriia Chloroplast
## ASV_8 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_29 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_49 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales
## ASV_63 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_76 Bacteria Campilobacterota Campylobacteria Campylobacterales
## ASV_100 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_83 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_66 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_62 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_78 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_141 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_183 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## family genus species
## ASV_4 <NA> <NA> <NA>
## ASV_8 Alteromonadaceae <NA> <NA>
## ASV_29 Alteromonadaceae <NA> <NA>
## ASV_49 Rhodobacteraceae <NA> <NA>
## ASV_63 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_76 <NA> <NA> <NA>
## ASV_100 Nitrincolaceae <NA> <NA>
## ASV_83 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_66 Colwelliaceae <NA> <NA>
## ASV_62 Alteromonadaceae <NA> <NA>
## ASV_78 Alteromonadaceae <NA> <NA>
## ASV_141 Alteromonadaceae <NA> <NA>
## ASV_183 Colwelliaceae Thalassotalea <NA>
Translation: Several Alteromonadaceae and Pseudoalteromonadaceae and also one ASV from Nitrincolaceae increased from T0 to 20C, similar to the results from QIIME2. No Vibrio, unlike QIIME
Since there are several ASVs here, you can also visualize these results in a plot
# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
scale_fill_brewer(palette = palname, ...)
}
# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_20_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_20_with_tax$family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_20_with_tax$family = factor(as.character(sigtab_deseq_res_temp_T0_20_with_tax$family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_20_with_tax, aes(x=family, y=log2FoldChange, color=class)) + geom_point(size=6) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
Next Compare Heat Stress (20C) to more extreme heat stress (25C)
Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_20_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "20", "25"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_20_25)
##
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 0, 0%
## LFC < 0 (down) : 0, 0%
## outliers [1] : 177, 5.1%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Translation: no ASVs are significantly different between the 20 and 25C samples
Next Compare Control (Time zero) to extreme heat stress (25C)
Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "25"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_25)
##
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 2, 0.058%
## LFC < 0 (down) : 26, 0.75%
## outliers [1] : 177, 5.1%
## low counts [2] : 3053, 88%
## (mean count < 29)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Translation: out of 3466 ASVs, there are 2 that increased when comparing temperature “T0” to “25” and 26 that decreased. Ie. the ones that ‘increased’ are in higher abundance in “T0” than in “25” and those that decreased are in lower abundance in T0 than in 25.
Make a table of the significant results
# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_25 <- deseq_res_temp_T0_25[which(deseq_res_temp_T0_25$padj < 0.01), ]
# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_25_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_25), ], "matrix"))
# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_25_with_tax[order(sigtab_deseq_res_temp_T0_25_with_tax$baseMean, decreasing=T), ]
## baseMean log2FoldChange lfcSE stat pvalue padj
## ASV_3 58837.42678 -10.279563 1.976695 -5.200379 1.988823e-07 4.693622e-05
## ASV_17 5789.50086 -8.481983 2.010734 -4.218352 2.460939e-05 5.279832e-04
## ASV_18 5687.17856 -11.861923 2.428170 -4.885129 1.033609e-06 6.841703e-05
## ASV_11 2532.78479 -7.216508 1.869676 -3.859763 1.134970e-04 2.060408e-03
## ASV_15 1492.32322 -7.703328 2.071910 -3.717983 2.008198e-04 2.632971e-03
## ASV_37 942.87283 -14.049122 3.001824 -4.680195 2.866023e-06 9.752046e-05
## ASV_49 333.25140 -11.010288 2.470940 -4.455911 8.353786e-06 2.464367e-04
## ASV_63 294.05336 -12.114532 2.390860 -5.067018 4.040953e-07 4.768325e-05
## ASV_76 286.44351 11.075242 3.008415 3.681421 2.319374e-04 2.880907e-03
## ASV_73 269.13165 -12.148686 2.596813 -4.678305 2.892556e-06 9.752046e-05
## ASV_100 216.45580 -11.789029 2.424520 -4.862417 1.159611e-06 6.841703e-05
## ASV_152 211.60611 -8.894050 2.743024 -3.242425 1.185170e-03 9.989286e-03
## ASV_83 188.36686 -11.275867 2.402948 -4.692513 2.698694e-06 9.752046e-05
## ASV_66 141.59485 -8.912905 2.537174 -3.512927 4.432000e-04 4.980724e-03
## ASV_62 141.49639 -8.012976 2.407586 -3.328219 8.740303e-04 8.594631e-03
## ASV_119 133.69127 -11.052627 2.501425 -4.418533 9.937319e-06 2.605786e-04
## ASV_86 115.92536 8.244870 2.516158 3.276769 1.050022e-03 9.196545e-03
## ASV_204 93.10416 -10.710888 3.018087 -3.548900 3.868442e-04 4.564761e-03
## ASV_218 88.30632 -10.633921 2.794329 -3.805536 1.414972e-04 2.385238e-03
## ASV_253 84.00313 -10.440111 2.489212 -4.194143 2.739054e-05 5.386806e-04
## ASV_240 81.01731 -10.445471 2.431226 -4.296381 1.736093e-05 4.097181e-04
## ASV_315 46.37355 -9.346933 2.510364 -3.723338 1.966057e-04 2.632971e-03
## ASV_620 44.89776 -9.568260 2.801357 -3.415580 6.364639e-04 6.530673e-03
## ASV_285 43.04040 -9.130744 2.447280 -3.730976 1.907390e-04 2.632971e-03
## ASV_303 36.04322 -8.553828 2.449588 -3.491945 4.795168e-04 5.143907e-03
## ASV_391 33.63229 -9.124083 2.776745 -3.285892 1.016600e-03 9.196545e-03
## ASV_500 33.05216 -9.112504 2.415271 -3.772870 1.613805e-04 2.539053e-03
## ASV_709 30.94063 -9.066105 2.767264 -3.276198 1.052147e-03 9.196545e-03
## domain phylum class order
## ASV_3 Bacteria Proteobacteria Gammaproteobacteria Vibrionales
## ASV_17 Bacteria Proteobacteria Gammaproteobacteria Vibrionales
## ASV_18 Bacteria Proteobacteria Gammaproteobacteria Vibrionales
## ASV_11 Bacteria Campilobacterota Campylobacteria Campylobacterales
## ASV_15 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_37 Bacteria Proteobacteria Gammaproteobacteria <NA>
## ASV_49 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales
## ASV_63 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_76 Bacteria Campilobacterota Campylobacteria Campylobacterales
## ASV_73 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_100 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_152 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_83 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_66 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_62 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_119 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_86 Bacteria Campilobacterota Campylobacteria Campylobacterales
## ASV_204 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_218 Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales
## ASV_253 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_240 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales
## ASV_315 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales
## ASV_620 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_285 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales
## ASV_303 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_391 Bacteria Proteobacteria Gammaproteobacteria Vibrionales
## ASV_500 Bacteria Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_709 Bacteria Bacteroidota Bacteroidia <NA>
## family genus species
## ASV_3 Vibrionaceae Vibrio <NA>
## ASV_17 Vibrionaceae <NA> <NA>
## ASV_18 Vibrionaceae <NA> <NA>
## ASV_11 Arcobacteraceae Malaciobacter <NA>
## ASV_15 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_37 <NA> <NA> <NA>
## ASV_49 Rhodobacteraceae <NA> <NA>
## ASV_63 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_76 <NA> <NA> <NA>
## ASV_73 Nitrincolaceae <NA> <NA>
## ASV_100 Nitrincolaceae <NA> <NA>
## ASV_152 Colwelliaceae <NA> <NA>
## ASV_83 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_66 Colwelliaceae <NA> <NA>
## ASV_62 Alteromonadaceae <NA> <NA>
## ASV_119 Pseudoalteromonadaceae <NA> <NA>
## ASV_86 Arcobacteraceae Malaciobacter <NA>
## ASV_204 Nitrincolaceae Marinobacterium <NA>
## ASV_218 Terasakiellaceae Terasakiella <NA>
## ASV_253 Colwelliaceae Colwellia <NA>
## ASV_240 Rhodobacteraceae Cognatishimia <NA>
## ASV_315 Rhodobacteraceae <NA> <NA>
## ASV_620 Colwelliaceae <NA> <NA>
## ASV_285 Pseudoalteromonadaceae Pseudoalteromonas <NA>
## ASV_303 Marinomonadaceae Marinomonas <NA>
## ASV_391 Vibrionaceae <NA> <NA>
## ASV_500 Oleiphilaceae Oleiphilus <NA>
## ASV_709 <NA> <NA> <NA>
There are lots of Vibrionaceae here that are higher in the heat stressed (25C) samples than T0. Also Pseudoalteromonadaceae, Nitrincolaceae, Rhodobacteraceae. This is all similar to the QIIME results Something different than QIIME results: This also identified 2 Arcobacteraceae ASVs: 1 that was higher in the T0 and one that was higher in the 25C. Arcobacter have been identified as possible pathogens in addition to Vibrio
Since there are several ASVs here, you can also visualize these results in a plot
# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
scale_fill_brewer(palette = palname, ...)
}
# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_25_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_25_with_tax$family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_25_with_tax$family = factor(as.character(sigtab_deseq_res_temp_T0_25_with_tax$family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_25_with_tax, aes(x=family, y=log2FoldChange, color=class)) + geom_point(size=6) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
Export the data tables as count tables (with no transformation) and without any aggregation.
For this study, I am excluding the antiobiotic samples, as those don’t really fir the meta-analysis study. I only want the samples from different temperatures (time zero, 20, and 25C).
First, filter out any ASVs without a species or genus and agglomerate at the genus-level. This will allow for comparisons to other datasets
# Make a brief descriptive file
descript_file<-writeLines(c("The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
The ASV count table was produced using DADA2 (for denoising, merging, and ASV inference) in R, and a Silva v138 DECIPHER classifier for taxonomy calling.
Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
impacts on microbiome.
The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining
samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two
different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
dada2 analysis file (done in R markdown file) = PRJNA421986.Rmd
processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_Dada2_postanalysis.Rmd"))
## The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
## The ASV count table was produced using DADA2 (for denoising, merging, and ASV inference) in R, and a Silva v138 DECIPHER classifier for taxonomy calling.
## Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
## impacts on microbiome.
## The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining
## samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two
## different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
## On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
## The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
## The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
##
## dada2 analysis file (done in R markdown file) = PRJNA421986.Rmd
##
## processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_Dada2_postanalysis.Rmd
# Export as .RData
save(ps_filtered, descript_file, file = "../../Processed/PRJNA421986_dada2_processed_data.RData")