Comparison of Output from 2 Amplicon Pipelines- Here I am processing the data from the Dada2/Decipher pipeline

Using data from Green et al.: Green TJ, Siboni N, King WL, Labbate M, Seymour JR, Raftos D. 2019. Simulated Marine Heat Wave Alters Abundance and Structure of Vibrio Populations Associated with the Pacific Oyster Resulting in a Mass Mortality Event. Microb Ecol. 77(3):736–747. doi:10.1007/s00248-018-1242-9.

I followed the BVCN lesson 3a and lesson 3b pipelines. Lesson 3a uses QIIME2, calling DADA2 for ASV assignment, and using a Naive Bayes classifier trained on Silva 132 for taxonomy assignment (data are in qiime2_export) Lesson 3b uses DADA2 directly in R (I used the Cyverse app I created), and IDTAXA (from Decipher) trained in Silva 138 for taxonomy assignment (data are in ‘dada2_export’)

Modifications:

There were no blanks so did not remove contamination

Load packages

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.1     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(phyloseq)
library(dada2)

## Loading required package: Rcpp

library(Biostrings)

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which, which.max, which.min

## Loading required package: S4Vectors

## Warning: package 'S4Vectors' was built under R version 3.6.3

## Loading required package: stats4

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:dplyr':
## 
##     first, rename

## The following object is masked from 'package:tidyr':
## 
##     expand

## The following object is masked from 'package:base':
## 
##     expand.grid

## Loading required package: IRanges

## 
## Attaching package: 'IRanges'

## The following object is masked from 'package:phyloseq':
## 
##     distance

## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice

## The following object is masked from 'package:purrr':
## 
##     reduce

## Loading required package: XVector

## 
## Attaching package: 'XVector'

## The following object is masked from 'package:purrr':
## 
##     compact

## 
## Attaching package: 'Biostrings'

## The following object is masked from 'package:base':
## 
##     strsplit

library(DECIPHER)

## Loading required package: RSQLite

library(phangorn)

## Loading required package: ape

## 
## Attaching package: 'ape'

## The following object is masked from 'package:Biostrings':
## 
##     complement

library(readr)
library(seqinr)

## 
## Attaching package: 'seqinr'

## The following objects are masked from 'package:ape':
## 
##     as.alignment, consensus

## The following object is masked from 'package:Biostrings':
## 
##     translate

## The following object is masked from 'package:dplyr':
## 
##     count

library(decontam)
library(ape)
library(vegan)

## Loading required package: permute

## 
## Attaching package: 'permute'

## The following object is masked from 'package:seqinr':
## 
##     getType

## Loading required package: lattice

## This is vegan 2.5-6

## 
## Attaching package: 'vegan'

## The following objects are masked from 'package:phangorn':
## 
##     diversity, treedist

library(RColorBrewer)
library(microbiome)

## 
## microbiome R package (microbiome.github.com)
##     
## 
## 
##  Copyright (C) 2011-2020 Leo Lahti, 
##     Sudarshan Shetty et al. <microbiome.github.io>

## 
## Attaching package: 'microbiome'

## The following object is masked from 'package:vegan':
## 
##     diversity

## The following object is masked from 'package:phangorn':
## 
##     diversity

## The following object is masked from 'package:Biostrings':
## 
##     coverage

## The following objects are masked from 'package:IRanges':
## 
##     coverage, transform

## The following object is masked from 'package:S4Vectors':
## 
##     transform

## The following object is masked from 'package:ggplot2':
## 
##     alpha

## The following object is masked from 'package:base':
## 
##     transform

library(DESeq2)

## Loading required package: GenomicRanges

## Loading required package: GenomeInfoDb

## Warning: package 'GenomeInfoDb' was built under R version 3.6.3

## Loading required package: SummarizedExperiment

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:phyloseq':
## 
##     sampleNames

## Loading required package: DelayedArray

## Warning: package 'DelayedArray' was built under R version 3.6.3

## Loading required package: matrixStats

## 
## Attaching package: 'matrixStats'

## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians

## The following object is masked from 'package:seqinr':
## 
##     count

## The following object is masked from 'package:dplyr':
## 
##     count

## Loading required package: BiocParallel

## 
## Attaching package: 'DelayedArray'

## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

## The following object is masked from 'package:purrr':
## 
##     simplify

## The following objects are masked from 'package:base':
## 
##     aperm, apply, rowsum

Import the metadata associated with the samples

These metadata can be used with both qiime2 and dada2 results.

For this Bioproject, all info from Biosample file is also in SraRunTable, so I will only import SraRunTable (it is also formatted better)

I manually modified the SraRunTable to split the sample names into numbers (since temperature was built into the sample name) and put some info about samples from the biosample_result.txt summary (eg. antibiotic treatment). I also added in a column for “mortality” because the paper indicates that oysters in the control samples at 25C faced higher mortality (~50% on day 4 when samples were collected; Fig. 1) and this was not seen in any other samples.

Therefore, I am importing SraRunTableMod.txt

SraRunTable <- read_delim("SraRunTableMod.csv", delim = ",")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   AvgSpotLen = col_double(),
##   Bases = col_double(),
##   Bytes = col_double(),
##   ReleaseDate = col_datetime(format = ""),
##   Replicate = col_double()
## )

## See spec(...) for full column specifications.

Import the results from Dada2

# Import Count table. Skip first row of tsv file, which is just some text
count_table <- read_tsv(file="dada2_export/ASVs_counts.tsv")

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   X1 = col_character(),
##   SRR6374533 = col_double(),
##   SRR6374534 = col_double(),
##   SRR6374535 = col_double(),
##   SRR6374536 = col_double(),
##   SRR6374537 = col_double(),
##   SRR6374538 = col_double(),
##   SRR6374539 = col_double(),
##   SRR6374540 = col_double(),
##   SRR6374541 = col_double(),
##   SRR6374542 = col_double(),
##   SRR6374543 = col_double(),
##   SRR6374544 = col_double(),
##   SRR6374545 = col_double(),
##   SRR6374546 = col_double(),
##   SRR6374547 = col_double()
## )

# And specify that the first column of data are rownames
count_table <- column_to_rownames(count_table, var = colnames(count_table)[1])


# Import taxonomy of ASVs
taxonomy <- read_tsv(file="dada2_export/ASVs_taxonomy.tsv")

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   X1 = col_character(),
##   domain = col_character(),
##   phylum = col_character(),
##   class = col_character(),
##   order = col_character(),
##   family = col_character(),
##   genus = col_character(),
##   species = col_logical()
## )

# And specify that the first column of data are rownames
taxonomy <- column_to_rownames(taxonomy, var = colnames(taxonomy)[1])

Check sequencing depth with rarefaction curves

# Use rarecurve, from the Vegan package. Rarcurve expects the dataset as a dataframe so we need to use as.data.frame again:
count_table_df <- as.data.frame(count_table)

# Plot the rarefaction curves, color-coding by the colors listed in sample_info_tab, which indicate sample type, and transforming using t() again
rarecurve(t(count_table_df), step=100, cex=0.5, ylab="ASVs", label=T)

Generally these seem to be sequenced to completion

Remove singletons

count_table_no_singletons <- filter(count_table,rowSums(count_table)>1)
# retains 5759 ASVs (out of 5776)

Make tree

In the Dada2 tools, there are no options to build a tree (unlike in Qiime2) but we can build it here using DECIPHER and phangorn

(Based on https://f1000research.com/articles/5-1492/v2)

Make an alignment using tools from Decipher (Note- alignment step takes a while. Commented out for now. Only need to run once)

## import fasta
fas <- "dada2_export/ASVs.fasta"
seqs <- readDNAStringSet(fas)
seqs

##   A DNAStringSet instance of length 5776
##        width seq                                            names               
##    [1]   404 TGGGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_1
##    [2]   424 TGAGGAATATTGGACAATGGAG...AAAGCGTGGGGAGCGAACAGG ASV_2
##    [3]   429 TGGGGAATATTGCACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_3
##    [4]   406 TGAGGAATTTTCCGCAATGGGC...AAAGCTAGGGTAGCAAAAGGG ASV_4
##    [5]   404 TGGGGAATCTTAGACAATGGGC...AAAGTGTGGGGAGCAAACAGG ASV_5
##    ...   ... ...
## [5772]   424 TGAGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCGAACAGG ASV_5772
## [5773]   424 TAAGGAATATTGGACAATGGGG...AAAGCGTGGGGAGCGAACAGG ASV_5773
## [5774]   405 TCGAGAATCTTCCGCAATGGAC...AAAGCTAGGGGAGCGAACGGG ASV_5774
## [5775]   429 TGGGGAATATTGGACAATGGGC...AAAGCGTGGGGAGCAAACAGG ASV_5775
## [5776]   424 TGAGGAATATTGGACAATGGAG...AAAGCGTGGGGAGCGAACAGG ASV_5776

# perform the alignment
aligned <- AlignSeqs(seqs)

## Determining distance matrix based on shared 8-mers:
## ================================================================================
## 
## Time difference of 665.68 secs
## 
## Clustering into groups by similarity:
## ================================================================================
## 
## Time difference of 11.6 secs
## 
## Aligning Sequences:
## ================================================================================
## 
## Time difference of 66.87 secs
## 
## Iteration 1 of 2:
## 
## Determining distance matrix based on alignment:
## ================================================================================
## 
## Time difference of 45.21 secs
## 
## Reclustering into groups by similarity:
## ================================================================================
## 
## Time difference of 5.88 secs
## 
## Realigning Sequences:
## ================================================================================
## 
## Time difference of 33.37 secs
## 
## Iteration 2 of 2:
## 
## Determining distance matrix based on alignment:
## ================================================================================
## 
## Time difference of 52.4 secs
## 
## Reclustering into groups by similarity:
## ================================================================================
## 
## Time difference of 8.44 secs
## 
## Realigning Sequences:
## ================================================================================
## 
## Time difference of 20.93 secs
## 
## Refining the alignment:
## ================================================================================
## 
## Time difference of 2.11 secs

# view the alignment in a browser (optional)
BrowseSeqs(aligned, highlight=0)

# write out aligned sequence file
writeXStringSet(aligned, file="ASVs.aligned.fasta")

Use phangorn package to build tree. Here we are building a maximum likelihood neighbor-joining tree. (Also takes a while to run. Comment out for now.)

phang.align <- phyDat(as(aligned, "matrix"), type="DNA") # convert to phyDat format
dm <- dist.ml(phang.align) # calculate pairwise distance matrix 
treeNJ <- NJ(dm) # perform neighbor-joining tree method
fit = pml(treeNJ, data=phang.align) # compute intermal max likelihood

## negative edges length changed to 0!

Make one phyloseq object, which contains all 4 objects:

Here we will do ordinations using the phyloseq package, which first requires making phyloseq objects out of each of our input data tables (in the last tutorial, I imported the tree using phyloseq so it is already a phyloseq object)

ASV =   otu_table(data.frame(count_table_no_singletons), taxa_are_rows =  TRUE)
TAX =   tax_table(as.matrix(taxonomy))
META    =   sample_data(data.frame(SraRunTable, row.names = SraRunTable$Run))
TREE = phy_tree(fit$tree)

First check that the inputs are in compatible formats by checking for ASV names with the phyloseq function, taxa_names

head(taxa_names(TAX))

## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"

head(taxa_names(ASV))

## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"

head(taxa_names(TREE))

## [1] "ASV_1" "ASV_2" "ASV_3" "ASV_4" "ASV_5" "ASV_6"

And check sample names were also detected

head(sample_names(ASV))

## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"

head(sample_names(META))

## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"

And make the phyloseq object

ps <- phyloseq(ASV, TAX,    META, TREE)

Check some features of the phyloseq object

rank_names(ps)

## [1] "domain"  "phylum"  "class"   "order"   "family"  "genus"   "species"

unique(tax_table(ps)[, "domain"])

## Taxonomy Table:     [4 taxa by 1 taxonomic ranks]:
##          domain     
## ASV_1    "Bacteria" 
## ASV_19   NA         
## ASV_648  "Archaea"  
## ASV_1168 "Eukaryota"

table(tax_table(ps)[, "domain"], exclude = NULL)

## 
##   Archaea  Bacteria Eukaryota      <NA> 
##         8      4580         2      1169

Filter out those ambigious Domain annotations

ps <- subset_taxa(ps, !is.na(domain) & !domain %in% c("", "Eukaryota"))

table(tax_table(ps)[, "domain"], exclude = NULL)

## 
##  Archaea Bacteria 
##        8     4580

Check out the phyla names

table(tax_table(ps)[, "phylum"], exclude = NULL)

## 
##               Acidobacteriota              Actinobacteriota 
##                           148                           122 
##                  Bacteroidota              Bdellovibrionota 
##                           885                            89 
##                 Calditrichota              Campilobacterota 
##                             2                            32 
##                   Chloroflexi                 Crenarchaeota 
##                            20                             6 
##                 Cyanobacteria                  Dadabacteria 
##                            54                             3 
##               Deferrisomatota                  Deinococcota 
##                             1                             1 
##                  Dependentiae              Desulfobacterota 
##                             2                            70 
##               Elusimicrobiota             Entotheonellaeota 
##                             1                             1 
##                Fibrobacterota                    Firmicutes 
##                             1                            52 
##                Fusobacteriota               Gemmatimonadota 
##                             4                            54 
##                        LCP-89              Margulisbacteria 
##                             1                             1 
## Marinimicrobia (SAR406 clade)             Methylomirabilota 
##                             1                             1 
##                   Myxococcota                         NB1-j 
##                            59                            29 
##                  Nitrospinota                  Nitrospirota 
##                             3                             2 
##               Patescibacteria               Planctomycetota 
##                            33                           248 
##                Proteobacteria                       RCP2-54 
##                          2254                             2 
##  SAR324 clade(Marine group B)              Schekmanbacteria 
##                             3                             1 
##                 Spirochaetota                       Sva0485 
##                             1                             7 
##                  Synergistota              Thermoplasmatota 
##                             1                             2 
##             Verrucomicrobiota                         WPS-2 
##                           174                             1 
##                           WS2                  Zixibacteria 
##                             1                             4 
##                          <NA> 
##                           211

Filter out any with “NA” as phylum. For now, I am leaving in phyla with abundance of 1

ps <- subset_taxa(ps, !is.na(phylum) & !phylum %in% c(""))

table(tax_table(ps)[, "phylum"], exclude = NULL)

## 
##               Acidobacteriota              Actinobacteriota 
##                           148                           122 
##                  Bacteroidota              Bdellovibrionota 
##                           885                            89 
##                 Calditrichota              Campilobacterota 
##                             2                            32 
##                   Chloroflexi                 Crenarchaeota 
##                            20                             6 
##                 Cyanobacteria                  Dadabacteria 
##                            54                             3 
##               Deferrisomatota                  Deinococcota 
##                             1                             1 
##                  Dependentiae              Desulfobacterota 
##                             2                            70 
##               Elusimicrobiota             Entotheonellaeota 
##                             1                             1 
##                Fibrobacterota                    Firmicutes 
##                             1                            52 
##                Fusobacteriota               Gemmatimonadota 
##                             4                            54 
##                        LCP-89              Margulisbacteria 
##                             1                             1 
## Marinimicrobia (SAR406 clade)             Methylomirabilota 
##                             1                             1 
##                   Myxococcota                         NB1-j 
##                            59                            29 
##                  Nitrospinota                  Nitrospirota 
##                             3                             2 
##               Patescibacteria               Planctomycetota 
##                            33                           248 
##                Proteobacteria                       RCP2-54 
##                          2254                             2 
##  SAR324 clade(Marine group B)              Schekmanbacteria 
##                             3                             1 
##                 Spirochaetota                       Sva0485 
##                             1                             7 
##                  Synergistota              Thermoplasmatota 
##                             1                             2 
##             Verrucomicrobiota                         WPS-2 
##                           174                             1 
##                           WS2                  Zixibacteria 
##                             1                             4

After the above, 4377 ASVs remain. This is many fewer than the QIIME2 pipeline.

Re-root tree (from experience, I have had to do this because you may have removed the root of your tree when pruning). (I found this handy function from here which picks the longest branch to root from). There is also a compatibilty issue between the type of tree calculated by QIIME and the one expected by phyloseq, so we have to change that format as well (see here for discussion)

# first define function from link above to find furthest outgroup
pick_new_outgroup <- function(tree.unrooted){
require("magrittr")
require("data.table")
require("ape") # ape::Ntip
# tablify parts of tree that we need.
treeDT <- 
     cbind(
         data.table(tree.unrooted$edge),
         data.table(length = tree.unrooted$edge.length)
     )[1:Ntip(tree.unrooted)] %>% 
 cbind(data.table(id = tree.unrooted$tip.label))
 # Take the longest terminal branch as outgroup
 new.outgroup <- treeDT[which.max(length)]$id
 return(new.outgroup) }

# then run on my phyloseq tree
my.tree <- phy_tree(ps)
out.group <- pick_new_outgroup(my.tree)

## Loading required package: magrittr

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

## Loading required package: data.table

## 
## Attaching package: 'data.table'

## The following object is masked from 'package:SummarizedExperiment':
## 
##     shift

## The following object is masked from 'package:GenomicRanges':
## 
##     shift

## The following object is masked from 'package:IRanges':
## 
##     shift

## The following objects are masked from 'package:S4Vectors':
## 
##     first, second

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

out.group

## [1] "ASV_5120"

# Then use this outgroup to root the tree
new.tree1 <- ape::root(my.tree, outgroup=out.group, resolve.root=TRUE)


# and convert to dichotomy tree
new.tree2 <- ape::multi2di(new.tree1)
phy_tree(ps) <- new.tree2
phy_tree(ps)

## 
## Phylogenetic tree with 4377 tips and 4376 internal nodes.
## 
## Tip labels:
##  ASV_1, ASV_2, ASV_3, ASV_4, ASV_5, ASV_6, ...
## 
## Rooted; includes branch lengths.

Make some abundance plots

Check overall how the phyla are distributed among samples. Phyloseq makes this easy

# First aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed = tax_glom(ps, "phylum")

# There are many phyla here, so have to make a custom color palette by interpolating from an existing one in RColorBrewer
colourCount = length(table(tax_table(ps)[, "phylum"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
PhylaPalette = getPalette(colourCount)

# and plot
plot_bar(phylumGlommed, x = "Sample", fill = "phylum") + 
  scale_fill_manual(values = PhylaPalette)

Plot compositional (relative abundances) instead of absolute abundance using microbiome::transform

ps_ra <- microbiome::transform(ps, transform = "compositional")
head(otu_table(ps_ra))

## OTU Table:          [6 taxa and 15 samples]
##                      taxa are rows
##         SRR6374533  SRR6374534   SRR6374535  SRR6374536  SRR6374537
## ASV_1 0.2112630964 0.059121216 0.1731940442 0.291087469 0.006738057
## ASV_2 0.0678971338 0.187964695 0.0397196118 0.033380171 0.041749035
## ASV_3 0.0008724257 0.002470379 0.0004195951 0.002007327 0.362803570
## ASV_4 0.2124796901 0.006427783 0.2058237330 0.155618006 0.006309042
## ASV_5 0.2061005771 0.014030796 0.1037757388 0.084935013 0.002834863
## ASV_6 0.0800230512 0.008178635 0.0758603242 0.068951674 0.001976833
##         SRR6374538  SRR6374539  SRR6374540  SRR6374541  SRR6374542   SRR6374543
## ASV_1 0.0013069892 0.036978071 0.029327345 0.008743909 0.515185753 0.5276016018
## ASV_2 0.0048245649 0.059195170 0.212227122 0.037735710 0.283370995 0.1250808337
## ASV_3 0.4556793520 0.025466098 0.002643898 0.171014087 0.000000000 0.0187098940
## ASV_4 0.0008309869 0.002334092 0.004880346 0.001626603 0.063594544 0.0021203303
## ASV_5 0.0013715318 0.005921678 0.015220521 0.006344486 0.001667877 0.0009264786
## ASV_6 0.0009923436 0.004754632 0.008185218 0.003098641 0.009294380 0.0053536786
##        SRR6374544  SRR6374545   SRR6374546   SRR6374547
## ASV_1 0.606026859 0.635074778 6.000352e-01 0.6939757600
## ASV_2 0.097723551 0.214438447 1.764835e-01 0.1059373643
## ASV_3 0.000614150 0.002438471 3.297319e-05 0.0002725515
## ASV_4 0.002260072 0.008216056 1.748678e-02 0.0039860658
## ASV_5 0.000589584 0.001733888 1.384874e-03 0.0005876892
## ASV_6 0.008851949 0.006690480 5.528505e-03 0.0128525070

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed_RA = tax_glom(ps_ra, "phylum")
# and plot
plot_bar(phylumGlommed_RA, x = "Sample", fill = "phylum") + 
  scale_fill_manual(values = PhylaPalette)

Check abundance plots for some taxa of interest

The Proteobacteria

# Remove the antibiotic treatments
ps_filtered <- subset_samples(ps_ra, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples

# subset to proteobacteria
ps_proteo_ra <- subset_taxa(ps_filtered, phylum == "Proteobacteria")

# define colors
colourCount = length(table(tax_table(ps_proteo_ra)[, "order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_proteo_ra, "order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "order") + 
  scale_fill_manual(values = OrderPalette)

In the above you can see the increase in Vibrionales and Acetobacterales going from T0 to 20 to 25 and the decrease in Rhizobiales and Rhodobacterales

The Actinobacteriota

# subset to proteobacteria
ps_actino_ra <- subset_taxa(ps_filtered, phylum == "Actinobacteriota")

# define colors
colourCount = length(table(tax_table(ps_actino_ra)[, "order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_actino_ra, "order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "order") + 
  scale_fill_manual(values = OrderPalette)

In the above you can see a decrease in Micotrichales with increase in temperature

The Campilobacterota (note, this is not a phylum that is annotated differently in Silva v132)

# subset to proteobacteria
ps_campo_ra <- subset_taxa(ps_filtered, phylum == "Campilobacterota")

# define colors
colourCount = length(table(tax_table(ps_campo_ra)[, "family"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
FamilyPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
familyGlommed_RA = tax_glom(ps_campo_ra, "family")
# and plot
plot_bar(familyGlommed_RA, x = "replicate", fill = "family") + 
  scale_fill_manual(values = FamilyPalette)

There is also a slight increase in Arcobacteraceae from T0 to the warmer temperatures

Ordinations

First transform

Instead of relative abundances, need different transformation for stats. Microbiome package has both Hellinger and CLR built in. Try hellinger

ps_hellinger <- microbiome::transform(ps, transform = "hellinger")
head(otu_table(ps_hellinger))

## OTU Table:          [6 taxa and 15 samples]
##                      taxa are rows
##       SRR6374533 SRR6374534 SRR6374535 SRR6374536 SRR6374537 SRR6374538
## ASV_1 0.45963365 0.24314855 0.41616589  0.5395252 0.08208567 0.03615231
## ASV_2 0.26057078 0.43354895 0.19929780  0.1827024 0.20432581 0.06945909
## ASV_3 0.02953685 0.04970291 0.02048402  0.0448032 0.60233178 0.67504026
## ASV_4 0.46095519 0.08017346 0.45367801  0.3944845 0.07942948 0.02882684
## ASV_5 0.45398301 0.11845166 0.32214242  0.2914361 0.05324343 0.03703420
## ASV_6 0.28288346 0.09043580 0.27542753  0.2625865 0.04446159 0.03150149
##       SRR6374539 SRR6374540 SRR6374541 SRR6374542 SRR6374543 SRR6374544
## ASV_1 0.19229683 0.17125229 0.09350887 0.71776441 0.72636189 0.77847727
## ASV_2 0.24330058 0.46068115 0.19425682 0.53232602 0.35366769 0.31260766
## ASV_3 0.15958101 0.05141885 0.41353850 0.00000000 0.13678411 0.02478205
## ASV_4 0.04831244 0.06985947 0.04033116 0.25217959 0.04604704 0.04754022
## ASV_5 0.07695244 0.12337148 0.07965228 0.04083965 0.03043811 0.02428135
## ASV_6 0.06895384 0.09047219 0.05566544 0.09640737 0.07316884 0.09408480
##       SRR6374545  SRR6374546 SRR6374547
## ASV_1 0.79691579 0.774619372 0.83305208
## ASV_2 0.46307499 0.420099415 0.32548021
## ASV_3 0.04938088 0.005742229 0.01650913
## ASV_4 0.09064246 0.132237601 0.06313530
## ASV_5 0.04163998 0.037213897 0.02424230
## ASV_6 0.08179535 0.074353919 0.11336890

Principal coordinate analysis (PCoA) using Bray-Curtis

out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "bray")

pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) 
pcoa_plot

Scale axes by Eigenvalues

evals <- out.pcoa$values$Eigenvalues

pcoa_plot.scaled = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) +
  coord_fixed(sqrt(evals[2] / evals[1]))

pcoa_plot.scaled

Principal coordinate analysis (PCoA) using weighted UniFrac PCoA Also scaling axes by eigenvalues

out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "wunifrac")

wuf_pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) +
  coord_fixed(sqrt(evals[2] / evals[1]))


wuf_pcoa_plot

Non-Metric dimensional scaling (NMDS) ordination using Bray-Curtis

out.nmds <- ordinate(ps_hellinger, method = "NMDS", distance = "bray")

## Run 0 stress 0.07023555 
## Run 1 stress 0.06836951 
## ... New best solution
## ... Procrustes: rmse 0.04483448  max resid 0.1507302 
## Run 2 stress 0.07846926 
## Run 3 stress 0.07023579 
## Run 4 stress 0.07023553 
## Run 5 stress 0.07686721 
## Run 6 stress 0.07623932 
## Run 7 stress 0.07686652 
## Run 8 stress 0.07852225 
## Run 9 stress 0.06836871 
## ... New best solution
## ... Procrustes: rmse 0.0002597934  max resid 0.0008821824 
## ... Similar to previous best
## Run 10 stress 0.07623932 
## Run 11 stress 0.07023566 
## Run 12 stress 0.07846944 
## Run 13 stress 0.07686646 
## Run 14 stress 0.09561721 
## Run 15 stress 0.06993789 
## Run 16 stress 0.06836867 
## ... New best solution
## ... Procrustes: rmse 4.616461e-05  max resid 0.0001381005 
## ... Similar to previous best
## Run 17 stress 0.06993778 
## Run 18 stress 0.06993784 
## Run 19 stress 0.07852218 
## Run 20 stress 0.07904735 
## *** Solution reached

nmds_plot = plot_ordination(ps_hellinger, out.nmds, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) 

nmds_plot

Test for affects of temperature

The above shows an apparent impact of temperature. I want to find out which organisms account for this.

Filter out the antibiotic-treated samples. Those are not part of my hypothesis and would influence the interpretation of any results (as seen above).

ps_filtered <- subset_samples(ps, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples

Differential abundance with DeSeq2

This uses untransformed data, as it does its own log 2 transformation in the DeSeq calculation

ps_deseq <- phyloseq_to_deseq2(ps_filtered, ~Temperature)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

ps_deseq <- DESeq(ps_deseq)

## estimating size factors

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## final dispersion estimates

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## fitting model and testing

First Compare Control (Time zero) to Heat Stress (20C) Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_20 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "20"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_20)

## 
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 2, 0.058%
## LFC < 0 (down)     : 11, 0.32%
## outliers [1]       : 177, 5.1%
## low counts [2]     : 3114, 90%
## (mean count < 44)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

The above shows out of 3466 ASVs, there are 2 that increased when comparing t0 to 20 and 11 that decreased. Ie. the 2 that ‘increased’ are in higher abundance in “t0” than in “20” and the 11 that decreased are in lower abundance in t0 than in 20.

Make a table of the significant results

# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_20 <- deseq_res_temp_T0_20[which(deseq_res_temp_T0_20$padj < 0.01), ]

# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_20_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_20, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_20), ], "matrix"))

# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_20_with_tax_ordered <- sigtab_deseq_res_temp_T0_20_with_tax[order(sigtab_deseq_res_temp_T0_20_with_tax$baseMean, decreasing=T), ]
sigtab_deseq_res_temp_T0_20_with_tax_ordered

##           baseMean log2FoldChange    lfcSE      stat       pvalue         padj
## ASV_4   9449.22615       6.603312 1.771892  3.726702 1.940014e-04 0.0042437802
## ASV_8   1831.54900      -5.661860 1.672566 -3.385134 7.114354e-04 0.0095770143
## ASV_29   406.37087      -7.086652 2.066304 -3.429627 6.044115e-04 0.0095770143
## ASV_49   333.25140      -9.687737 2.470502 -3.921364 8.804906e-05 0.0025680976
## ASV_63   294.05336      -9.743062 2.390683 -4.075430 4.592931e-05 0.0016075258
## ASV_76   286.44351      13.217484 3.008383  4.393551 1.115142e-05 0.0007964192
## ASV_100  216.45580      -8.471779 2.424877 -3.493695 4.763856e-04 0.0083367476
## ASV_83   188.36686      -9.830787 2.402220 -4.092375 4.269774e-05 0.0016075258
## ASV_66   141.59485     -11.010939 2.531617 -4.349370 1.365290e-05 0.0007964192
## ASV_62   141.49639     -11.160098 2.396589 -4.656660 3.213810e-06 0.0005624167
## ASV_78   118.38886      -7.980829 2.178931 -3.662727 2.495448e-04 0.0048522598
## ASV_141   57.76480     -10.021217 2.955639 -3.390541 6.975473e-04 0.0095770143
## ASV_183   48.22575      -9.336202 2.455326 -3.802429 1.432843e-04 0.0035821086
##           domain           phylum               class             order
## ASV_4   Bacteria    Cyanobacteria      Cyanobacteriia       Chloroplast
## ASV_8   Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_29  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_49  Bacteria   Proteobacteria Alphaproteobacteria   Rhodobacterales
## ASV_63  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_76  Bacteria Campilobacterota     Campylobacteria Campylobacterales
## ASV_100 Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_83  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_66  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_62  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_78  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_141 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_183 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
##                         family             genus species
## ASV_4                     <NA>              <NA>    <NA>
## ASV_8         Alteromonadaceae              <NA>    <NA>
## ASV_29        Alteromonadaceae              <NA>    <NA>
## ASV_49        Rhodobacteraceae              <NA>    <NA>
## ASV_63  Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_76                    <NA>              <NA>    <NA>
## ASV_100         Nitrincolaceae              <NA>    <NA>
## ASV_83  Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_66           Colwelliaceae              <NA>    <NA>
## ASV_62        Alteromonadaceae              <NA>    <NA>
## ASV_78        Alteromonadaceae              <NA>    <NA>
## ASV_141       Alteromonadaceae              <NA>    <NA>
## ASV_183          Colwelliaceae     Thalassotalea    <NA>

Translation: Several Alteromonadaceae and Pseudoalteromonadaceae and also one ASV from Nitrincolaceae increased from T0 to 20C, similar to the results from QIIME2. No Vibrio, unlike QIIME

Since there are several ASVs here, you can also visualize these results in a plot

# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
  scale_fill_brewer(palette = palname, ...)
}

# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_20_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_20_with_tax$family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_20_with_tax$family = factor(as.character(sigtab_deseq_res_temp_T0_20_with_tax$family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_20_with_tax, aes(x=family, y=log2FoldChange, color=class)) + geom_point(size=6) + 
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

Next Compare Heat Stress (20C) to more extreme heat stress (25C)

Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_20_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "20", "25"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_20_25)

## 
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 0, 0%
## LFC < 0 (down)     : 0, 0%
## outliers [1]       : 177, 5.1%
## low counts [2]     : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Translation: no ASVs are significantly different between the 20 and 25C samples

Next Compare Control (Time zero) to extreme heat stress (25C)

Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "25"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_25)

## 
## out of 3466 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 2, 0.058%
## LFC < 0 (down)     : 26, 0.75%
## outliers [1]       : 177, 5.1%
## low counts [2]     : 3053, 88%
## (mean count < 29)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Translation: out of 3466 ASVs, there are 2 that increased when comparing temperature “T0” to “25” and 26 that decreased. Ie. the ones that ‘increased’ are in higher abundance in “T0” than in “25” and those that decreased are in lower abundance in T0 than in 25.

Make a table of the significant results

# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_25 <- deseq_res_temp_T0_25[which(deseq_res_temp_T0_25$padj < 0.01), ]

# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_25_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_25), ], "matrix"))

# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_25_with_tax[order(sigtab_deseq_res_temp_T0_25_with_tax$baseMean, decreasing=T), ]

##            baseMean log2FoldChange    lfcSE      stat       pvalue         padj
## ASV_3   58837.42678     -10.279563 1.976695 -5.200379 1.988823e-07 4.693622e-05
## ASV_17   5789.50086      -8.481983 2.010734 -4.218352 2.460939e-05 5.279832e-04
## ASV_18   5687.17856     -11.861923 2.428170 -4.885129 1.033609e-06 6.841703e-05
## ASV_11   2532.78479      -7.216508 1.869676 -3.859763 1.134970e-04 2.060408e-03
## ASV_15   1492.32322      -7.703328 2.071910 -3.717983 2.008198e-04 2.632971e-03
## ASV_37    942.87283     -14.049122 3.001824 -4.680195 2.866023e-06 9.752046e-05
## ASV_49    333.25140     -11.010288 2.470940 -4.455911 8.353786e-06 2.464367e-04
## ASV_63    294.05336     -12.114532 2.390860 -5.067018 4.040953e-07 4.768325e-05
## ASV_76    286.44351      11.075242 3.008415  3.681421 2.319374e-04 2.880907e-03
## ASV_73    269.13165     -12.148686 2.596813 -4.678305 2.892556e-06 9.752046e-05
## ASV_100   216.45580     -11.789029 2.424520 -4.862417 1.159611e-06 6.841703e-05
## ASV_152   211.60611      -8.894050 2.743024 -3.242425 1.185170e-03 9.989286e-03
## ASV_83    188.36686     -11.275867 2.402948 -4.692513 2.698694e-06 9.752046e-05
## ASV_66    141.59485      -8.912905 2.537174 -3.512927 4.432000e-04 4.980724e-03
## ASV_62    141.49639      -8.012976 2.407586 -3.328219 8.740303e-04 8.594631e-03
## ASV_119   133.69127     -11.052627 2.501425 -4.418533 9.937319e-06 2.605786e-04
## ASV_86    115.92536       8.244870 2.516158  3.276769 1.050022e-03 9.196545e-03
## ASV_204    93.10416     -10.710888 3.018087 -3.548900 3.868442e-04 4.564761e-03
## ASV_218    88.30632     -10.633921 2.794329 -3.805536 1.414972e-04 2.385238e-03
## ASV_253    84.00313     -10.440111 2.489212 -4.194143 2.739054e-05 5.386806e-04
## ASV_240    81.01731     -10.445471 2.431226 -4.296381 1.736093e-05 4.097181e-04
## ASV_315    46.37355      -9.346933 2.510364 -3.723338 1.966057e-04 2.632971e-03
## ASV_620    44.89776      -9.568260 2.801357 -3.415580 6.364639e-04 6.530673e-03
## ASV_285    43.04040      -9.130744 2.447280 -3.730976 1.907390e-04 2.632971e-03
## ASV_303    36.04322      -8.553828 2.449588 -3.491945 4.795168e-04 5.143907e-03
## ASV_391    33.63229      -9.124083 2.776745 -3.285892 1.016600e-03 9.196545e-03
## ASV_500    33.05216      -9.112504 2.415271 -3.772870 1.613805e-04 2.539053e-03
## ASV_709    30.94063      -9.066105 2.767264 -3.276198 1.052147e-03 9.196545e-03
##           domain           phylum               class             order
## ASV_3   Bacteria   Proteobacteria Gammaproteobacteria       Vibrionales
## ASV_17  Bacteria   Proteobacteria Gammaproteobacteria       Vibrionales
## ASV_18  Bacteria   Proteobacteria Gammaproteobacteria       Vibrionales
## ASV_11  Bacteria Campilobacterota     Campylobacteria Campylobacterales
## ASV_15  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_37  Bacteria   Proteobacteria Gammaproteobacteria              <NA>
## ASV_49  Bacteria   Proteobacteria Alphaproteobacteria   Rhodobacterales
## ASV_63  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_76  Bacteria Campilobacterota     Campylobacteria Campylobacterales
## ASV_73  Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_100 Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_152 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_83  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_66  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_62  Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_119 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_86  Bacteria Campilobacterota     Campylobacteria Campylobacterales
## ASV_204 Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_218 Bacteria   Proteobacteria Alphaproteobacteria  Rhodospirillales
## ASV_253 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_240 Bacteria   Proteobacteria Alphaproteobacteria   Rhodobacterales
## ASV_315 Bacteria   Proteobacteria Alphaproteobacteria   Rhodobacterales
## ASV_620 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_285 Bacteria   Proteobacteria Gammaproteobacteria   Alteromonadales
## ASV_303 Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_391 Bacteria   Proteobacteria Gammaproteobacteria       Vibrionales
## ASV_500 Bacteria   Proteobacteria Gammaproteobacteria Oceanospirillales
## ASV_709 Bacteria     Bacteroidota         Bacteroidia              <NA>
##                         family             genus species
## ASV_3             Vibrionaceae            Vibrio    <NA>
## ASV_17            Vibrionaceae              <NA>    <NA>
## ASV_18            Vibrionaceae              <NA>    <NA>
## ASV_11         Arcobacteraceae     Malaciobacter    <NA>
## ASV_15  Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_37                    <NA>              <NA>    <NA>
## ASV_49        Rhodobacteraceae              <NA>    <NA>
## ASV_63  Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_76                    <NA>              <NA>    <NA>
## ASV_73          Nitrincolaceae              <NA>    <NA>
## ASV_100         Nitrincolaceae              <NA>    <NA>
## ASV_152          Colwelliaceae              <NA>    <NA>
## ASV_83  Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_66           Colwelliaceae              <NA>    <NA>
## ASV_62        Alteromonadaceae              <NA>    <NA>
## ASV_119 Pseudoalteromonadaceae              <NA>    <NA>
## ASV_86         Arcobacteraceae     Malaciobacter    <NA>
## ASV_204         Nitrincolaceae   Marinobacterium    <NA>
## ASV_218       Terasakiellaceae      Terasakiella    <NA>
## ASV_253          Colwelliaceae         Colwellia    <NA>
## ASV_240       Rhodobacteraceae     Cognatishimia    <NA>
## ASV_315       Rhodobacteraceae              <NA>    <NA>
## ASV_620          Colwelliaceae              <NA>    <NA>
## ASV_285 Pseudoalteromonadaceae Pseudoalteromonas    <NA>
## ASV_303       Marinomonadaceae       Marinomonas    <NA>
## ASV_391           Vibrionaceae              <NA>    <NA>
## ASV_500          Oleiphilaceae        Oleiphilus    <NA>
## ASV_709                   <NA>              <NA>    <NA>

There are lots of Vibrionaceae here that are higher in the heat stressed (25C) samples than T0. Also Pseudoalteromonadaceae, Nitrincolaceae, Rhodobacteraceae. This is all similar to the QIIME results Something different than QIIME results: This also identified 2 Arcobacteraceae ASVs: 1 that was higher in the T0 and one that was higher in the 25C. Arcobacter have been identified as possible pathogens in addition to Vibrio

Since there are several ASVs here, you can also visualize these results in a plot

# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
  scale_fill_brewer(palette = palname, ...)
}

# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_25_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_25_with_tax$family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_25_with_tax$family = factor(as.character(sigtab_deseq_res_temp_T0_25_with_tax$family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_25_with_tax, aes(x=family, y=log2FoldChange, color=class)) + geom_point(size=6) + 
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

Export

Export the data tables as count tables (with no transformation) and without any aggregation.

For this study, I am excluding the antiobiotic samples, as those don’t really fir the meta-analysis study. I only want the samples from different temperatures (time zero, 20, and 25C).

First, filter out any ASVs without a species or genus and agglomerate at the genus-level. This will allow for comparisons to other datasets

# Make a brief descriptive file
descript_file<-writeLines(c("The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
The ASV count table was produced using DADA2 (for denoising, merging, and ASV inference) in R, and a Silva v138 DECIPHER classifier for taxonomy calling.
Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
impacts on microbiome.
The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining 
samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two 
different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.

dada2 analysis file (done in R markdown file) =  PRJNA421986.Rmd

processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_Dada2_postanalysis.Rmd"))

## The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
## The ASV count table was produced using DADA2 (for denoising, merging, and ASV inference) in R, and a Silva v138 DECIPHER classifier for taxonomy calling.
## Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
## impacts on microbiome.
## The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining 
## samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two 
## different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
## On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
## The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
## The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
## 
## dada2 analysis file (done in R markdown file) =  PRJNA421986.Rmd
## 
## processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_Dada2_postanalysis.Rmd

# Export as .RData
save(ps_filtered, descript_file, file = "../../Processed/PRJNA421986_dada2_processed_data.RData")

Comparison of Output from 2 Amplicon Pipelines- Here I am processing the data from the Dada2/Decipher pipeline

Liz Suter

6/28/2020

Load packages

Import the metadata associated with the samples

Import the results from Dada2

Check sequencing depth with rarefaction curves

Remove singletons

Make tree

Make one phyloseq object, which contains all 4 objects:

Check some features of the phyloseq object

Make some abundance plots

Ordinations

First transform

Test for affects of temperature

Differential abundance with DeSeq2

Export