Comparison of Output from 2 Amplicon Pipelines- Here I am processing the data from the QIIME2 pipeline

Using data from Green et al.: Green TJ, Siboni N, King WL, Labbate M, Seymour JR, Raftos D. 2019. Simulated Marine Heat Wave Alters Abundance and Structure of Vibrio Populations Associated with the Pacific Oyster Resulting in a Mass Mortality Event. Microb Ecol. 77(3):736–747. doi:10.1007/s00248-018-1242-9.

I followed the BVCN lesson 3a and lesson 3b pipelines. Lesson 3a uses QIIME2, calling DADA2 for ASV assignment, and using a Naive Bayes classifier trained on Silva 132 for taxonomy assignment (data are in qiime2_export) Lesson 3b uses DADA2 directly in R (I used the Cyverse app I created), and IDTAXA (from Decipher) trained in Silva 138 for taxonomy assignment (data are in ‘dada2_export’)

Modifications:

There were no blanks so did not remove contamination

Load packages

library(tidyverse)

## ── Attaching packages ───────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.1     ✓ purrr   0.3.4
## ✓ tibble  3.0.1     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ──────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(phyloseq)
library(readr)
library(seqinr)

## 
## Attaching package: 'seqinr'

## The following object is masked from 'package:dplyr':
## 
##     count

library(decontam)
library(ape)

## 
## Attaching package: 'ape'

## The following objects are masked from 'package:seqinr':
## 
##     as.alignment, consensus

library(vegan)

## Loading required package: permute

## 
## Attaching package: 'permute'

## The following object is masked from 'package:seqinr':
## 
##     getType

## Loading required package: lattice

## This is vegan 2.5-6

library(RColorBrewer)

#install.packages("remotes")
#remotes::install_github("microbiome/microbiome")

library(microbiome)

## 
## microbiome R package (microbiome.github.com)
##     
## 
## 
##  Copyright (C) 2011-2020 Leo Lahti, 
##     Sudarshan Shetty et al. <microbiome.github.io>

## 
## Attaching package: 'microbiome'

## The following object is masked from 'package:vegan':
## 
##     diversity

## The following object is masked from 'package:ggplot2':
## 
##     alpha

## The following object is masked from 'package:base':
## 
##     transform

# if (!requireNamespace("BiocManager", quietly = TRUE))
#   install.packages("BiocManager")
# 
# BiocManager::install("DESeq2")

library("DESeq2")

## Loading required package: S4Vectors

## Warning: package 'S4Vectors' was built under R version 3.6.3

## Loading required package: stats4

## Loading required package: BiocGenerics

## Loading required package: parallel

## 
## Attaching package: 'BiocGenerics'

## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB

## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union

## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs

## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which, which.max, which.min

## 
## Attaching package: 'S4Vectors'

## The following objects are masked from 'package:dplyr':
## 
##     first, rename

## The following object is masked from 'package:tidyr':
## 
##     expand

## The following object is masked from 'package:base':
## 
##     expand.grid

## Loading required package: IRanges

## 
## Attaching package: 'IRanges'

## The following object is masked from 'package:microbiome':
## 
##     coverage

## The following object is masked from 'package:phyloseq':
## 
##     distance

## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice

## The following object is masked from 'package:purrr':
## 
##     reduce

## Loading required package: GenomicRanges

## Loading required package: GenomeInfoDb

## Warning: package 'GenomeInfoDb' was built under R version 3.6.3

## Loading required package: SummarizedExperiment

## Loading required package: Biobase

## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.

## 
## Attaching package: 'Biobase'

## The following object is masked from 'package:phyloseq':
## 
##     sampleNames

## Loading required package: DelayedArray

## Warning: package 'DelayedArray' was built under R version 3.6.3

## Loading required package: matrixStats

## 
## Attaching package: 'matrixStats'

## The following objects are masked from 'package:Biobase':
## 
##     anyMissing, rowMedians

## The following object is masked from 'package:seqinr':
## 
##     count

## The following object is masked from 'package:dplyr':
## 
##     count

## Loading required package: BiocParallel

## 
## Attaching package: 'DelayedArray'

## The following objects are masked from 'package:matrixStats':
## 
##     colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

## The following object is masked from 'package:purrr':
## 
##     simplify

## The following objects are masked from 'package:base':
## 
##     aperm, apply, rowsum

Import the metadata associated with the samples

These metadata can be used with both qiime2 and dada2 results.

For this Bioproject, all info from Biosample file is also in SraRunTable, so I will only import SraRunTable (it is also formatted better)

I manually modified the SraRunTable to split the sample names into numbers (since temperature was built into the sample name) and put some info about samples from the biosample_result.txt summary (eg. antibiotic treatment). I also added in a column for “mortality” because the paper indicates that oysters in the control samples at 25C faced higher mortality (~50% on day 4 when samples were collected; Fig. 1) and this was not seen in any other samples.

Therefore, I am importing SraRunTableMod.txt

SraRunTable <- read_delim("SraRunTableMod.csv", delim = ",")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   AvgSpotLen = col_double(),
##   Bases = col_double(),
##   Bytes = col_double(),
##   ReleaseDate = col_datetime(format = ""),
##   Replicate = col_double()
## )

## See spec(...) for full column specifications.

Import the results from QIIME2

# Import Count table. Skip first row of tsv file, which is just some text
count_table <- read_tsv(file="qiime2_export/table/table.tsv", skip = 1)

## Parsed with column specification:
## cols(
##   `#OTU ID` = col_character(),
##   SRR6374533 = col_double(),
##   SRR6374534 = col_double(),
##   SRR6374535 = col_double(),
##   SRR6374536 = col_double(),
##   SRR6374537 = col_double(),
##   SRR6374538 = col_double(),
##   SRR6374539 = col_double(),
##   SRR6374540 = col_double(),
##   SRR6374541 = col_double(),
##   SRR6374542 = col_double(),
##   SRR6374543 = col_double(),
##   SRR6374544 = col_double(),
##   SRR6374545 = col_double(),
##   SRR6374546 = col_double(),
##   SRR6374547 = col_double()
## )

# And specify that the first column of data are rownames
count_table <- column_to_rownames(count_table, var = colnames(count_table)[1])


# Import taxonomy of ASVs
taxonomy <- read_tsv(file="qiime2_export/taxonomy/taxonomy.tsv")

## Parsed with column specification:
## cols(
##   `Feature ID` = col_character(),
##   Taxon = col_character(),
##   Confidence = col_double()
## )

# Import tree file 
tree = read_tree("qiime2_export/exported-tree/tree.nwk")

# Import fasta
fasta <- read.fasta(file = "qiime2_export/rep-seqs.fasta/dna-sequences.fasta")

Check sequencing depth with rarefaction curves

# Use rarecurve, from the Vegan package. Rarcurve expects the dataset as a dataframe so we need to use as.data.frame again:
count_table_df <- as.data.frame(count_table)

# Plot the rarefaction curves, color-coding by the colors listed in sample_info_tab, which indicate sample type, and transforming using t() again
rarecurve(t(count_table_df), step=100, cex=0.5, ylab="ASVs", label=T)

Generally these seem to be sequenced to completion

Remove singletons

count_table_no_singletons <- filter(count_table,rowSums(count_table)>1)
# retains 5638 ASVs (out of 5647)

Modify taxonomy table

The taxonomy in the taxonomy table is retained in one column and the different levels are separated by underscore, eg:

head(taxonomy)

## # A tibble: 6 x 3
##   `Feature ID`            Taxon                                       Confidence
##   <chr>                   <chr>                                            <dbl>
## 1 cd0075f2438e6a59c84c52… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp…      0.926
## 2 ca04f36a8f57c6693cb6ef… D_0__Bacteria;D_1__Bacteroidetes;D_2__Bact…      1.00 
## 3 bce6e2f2b0236981e67783… D_0__Bacteria;D_1__Proteobacteria;D_2__Gam…      1.00 
## 4 7207e61b39eefb42fce68f… D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyp…      1.00 
## 5 97416f19e8a5effec9ba55… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp…      1.00 
## 6 1572e9751932966549c25e… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp…      1.00

Move each taxonomic level to its own column by removing the ";D_#_" using regular expressions (see this cheatsheet about regexp)

taxonomy_mod <-  taxonomy %>%
  mutate(taxonomy=str_replace_all(string=Taxon, pattern="D_\\d*\\__", replacement="")) %>%
  mutate(taxonomy=str_replace_all(string=taxonomy, pattern=";$", replacement="")) %>%
  separate(taxonomy, into=c("Domain", "Phylum", "Class", "Order", "Family", "Genus","Species"), sep=";") %>%
  select (-Taxon, -Confidence) %>%
  column_to_rownames(var = 'Feature ID')

## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 3696 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, ...].

head(taxonomy_mod)

##                                    Domain         Phylum               Class
## cd0075f2438e6a59c84c522c920c1703 Bacteria Proteobacteria Alphaproteobacteria
## ca04f36a8f57c6693cb6efa65de4c04f Bacteria  Bacteroidetes         Bacteroidia
## bce6e2f2b0236981e67783c042ad3de1 Bacteria Proteobacteria Gammaproteobacteria
## 7207e61b39eefb42fce68f0ec7a78744 Bacteria  Cyanobacteria    Oxyphotobacteria
## 97416f19e8a5effec9ba55e05d5f15dc Bacteria Proteobacteria Alphaproteobacteria
## 1572e9751932966549c25ea32daaf545 Bacteria Proteobacteria Alphaproteobacteria
##                                             Order            Family     Genus
## cd0075f2438e6a59c84c522c920c1703      Rhizobiales       Stappiaceae Labrenzia
## ca04f36a8f57c6693cb6efa65de4c04f Flavobacteriales Flavobacteriaceae    Kordia
## bce6e2f2b0236981e67783c042ad3de1      Vibrionales      Vibrionaceae      <NA>
## 7207e61b39eefb42fce68f0ec7a78744      Chloroplast              <NA>      <NA>
## 97416f19e8a5effec9ba55e05d5f15dc  Rhodobacterales  Rhodobacteraceae      <NA>
## 1572e9751932966549c25ea32daaf545 Sphingomonadales Sphingomonadaceae      <NA>
##                                  Species
## cd0075f2438e6a59c84c522c920c1703    <NA>
## ca04f36a8f57c6693cb6efa65de4c04f    <NA>
## bce6e2f2b0236981e67783c042ad3de1    <NA>
## 7207e61b39eefb42fce68f0ec7a78744    <NA>
## 97416f19e8a5effec9ba55e05d5f15dc    <NA>
## 1572e9751932966549c25ea32daaf545    <NA>

Put all into phyloseq object

Here we will do ordinations using the phyloseq package, which first requires making phyloseq objects out of each of our input data tables (in the last tutorial, I imported the tree using phyloseq so it is already a phyloseq object)

ASV =   otu_table(data.frame(count_table_no_singletons), taxa_are_rows =  TRUE)
TAX =   tax_table(as.matrix(taxonomy_mod))
META    =   sample_data(data.frame(SraRunTable, row.names = SraRunTable$Run))

First check that the inputs are in compatible formats by checking for ASV names with the phyloseq function, taxa_names

head(taxa_names(TAX))

## [1] "cd0075f2438e6a59c84c522c920c1703" "ca04f36a8f57c6693cb6efa65de4c04f"
## [3] "bce6e2f2b0236981e67783c042ad3de1" "7207e61b39eefb42fce68f0ec7a78744"
## [5] "97416f19e8a5effec9ba55e05d5f15dc" "1572e9751932966549c25ea32daaf545"

head(taxa_names(ASV))

## [1] "cd0075f2438e6a59c84c522c920c1703" "ca04f36a8f57c6693cb6efa65de4c04f"
## [3] "bce6e2f2b0236981e67783c042ad3de1" "7207e61b39eefb42fce68f0ec7a78744"
## [5] "97416f19e8a5effec9ba55e05d5f15dc" "1572e9751932966549c25ea32daaf545"

head(taxa_names(tree))

## [1] "595c0515f401d2fd3ea585b0595e5251" "47df0ccb39d8162decb89c97df38791e"
## [3] "b2c95f651e00fdb9583f73f75e69b288" "118a43b14b2f5cb1f2764faa4487fa41"
## [5] "40f29cd308e3331e7977ba8828161685" "e2d545daf064bbdddf1c7adcaae5664b"

And check sample names were also detected

head(sample_names(ASV))

## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"

head(sample_names(META))

## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"

Make one phyloseq object, which contains all 4 objects:

ps <- phyloseq(ASV, TAX,    META, tree)

Check some features of the phyloseq object

rank_names(ps)

## [1] "Domain"  "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"

unique(tax_table(ps)[, "Domain"])

## Taxonomy Table:     [4 taxa by 1 taxonomic ranks]:
##                                  Domain      
## 595c0515f401d2fd3ea585b0595e5251 "Unassigned"
## b2c95f651e00fdb9583f73f75e69b288 "Bacteria"  
## 1e6aaaca6e37db2a1d65d17f5f595d2e "Eukaryota" 
## 21ae507cdc1b62f0b307d705df3dc87d "Archaea"

table(tax_table(ps)[, "Domain"], exclude = NULL)

## 
##    Archaea   Bacteria  Eukaryota Unassigned 
##         22       5603          4          9

Filter out those ambigious Domain annotations

ps <- subset_taxa(ps, !is.na(Domain) & !Domain %in% c("Unassigned", "Eukaryota"))

table(tax_table(ps)[, "Domain"], exclude = NULL)

## 
##  Archaea Bacteria 
##       22     5603

Check out the phyla names

table(tax_table(ps)[, "Phylum"], exclude = NULL)

## 
##                 Acidobacteria                Actinobacteria 
##                           198                           157 
##                 Bacteroidetes               Calditrichaeota 
##                          1133                            10 
##                    Chlamydiae                   Chloroflexi 
##                            21                            58 
##                 Crenarchaeota                 Cyanobacteria 
##                             4                            72 
##                  Dadabacteria           Deinococcus-Thermus 
##                             2                             2 
##                  Dependentiae                 Elusimicrobia 
##                            19                             1 
##             Entotheonellaeota            Epsilonbacteraeota 
##                             1                            33 
##                 Euryarchaeota                 Fibrobacteres 
##                             2                             1 
##                    Firmicutes                  Fusobacteria 
##                            44                            14 
##              Gemmatimonadetes               Hydrogenedentes 
##                            65                             5 
##            Kiritimatiellaeota               Latescibacteria 
##                             9                            23 
##                 Lentisphaerae              Margulisbacteria 
##                             1                             1 
## Marinimicrobia (SAR406 clade)               Nanoarchaeaeota 
##                             1                            12 
##                   Nitrospinae                   Nitrospirae 
##                             3                            10 
##              Omnitrophicaeota               Patescibacteria 
##                             1                            52 
##                       PAUC34f                Planctomycetes 
##                             1                           345 
##                Proteobacteria                  Rokubacteria 
##                          2954                             2 
##              Schekmanbacteria                  Spirochaetes 
##                             2                            12 
##                 Synergistetes                   Tenericutes 
##                             1                            18 
##                Thaumarchaeota               Verrucomicrobia 
##                             4                           163 
##                         WPS-2                  Zixibacteria 
##                             2                             4 
##                          <NA> 
##                           162

Filter out any with “NA” as phylum. For now, I am leaving in phyla with abundance of 1 (“Fibrobacteres”, “PAUC34f”, “Elusimicrobia”, “Synergistetes”, “Entotheonellaeota”, “Lentisphaerae”, “Margulisbacteria”, “Omnitrophicaeota”, “Marinimicrobia (SAR406 clade)”)

ps <- subset_taxa(ps, !is.na(Phylum) & !Phylum %in% c(""))

table(tax_table(ps)[, "Phylum"], exclude = NULL)

## 
##                 Acidobacteria                Actinobacteria 
##                           198                           157 
##                 Bacteroidetes               Calditrichaeota 
##                          1133                            10 
##                    Chlamydiae                   Chloroflexi 
##                            21                            58 
##                 Crenarchaeota                 Cyanobacteria 
##                             4                            72 
##                  Dadabacteria           Deinococcus-Thermus 
##                             2                             2 
##                  Dependentiae                 Elusimicrobia 
##                            19                             1 
##             Entotheonellaeota            Epsilonbacteraeota 
##                             1                            33 
##                 Euryarchaeota                 Fibrobacteres 
##                             2                             1 
##                    Firmicutes                  Fusobacteria 
##                            44                            14 
##              Gemmatimonadetes               Hydrogenedentes 
##                            65                             5 
##            Kiritimatiellaeota               Latescibacteria 
##                             9                            23 
##                 Lentisphaerae              Margulisbacteria 
##                             1                             1 
## Marinimicrobia (SAR406 clade)               Nanoarchaeaeota 
##                             1                            12 
##                   Nitrospinae                   Nitrospirae 
##                             3                            10 
##              Omnitrophicaeota               Patescibacteria 
##                             1                            52 
##                       PAUC34f                Planctomycetes 
##                             1                           345 
##                Proteobacteria                  Rokubacteria 
##                          2954                             2 
##              Schekmanbacteria                  Spirochaetes 
##                             2                            12 
##                 Synergistetes                   Tenericutes 
##                             1                            18 
##                Thaumarchaeota               Verrucomicrobia 
##                             4                           163 
##                         WPS-2                  Zixibacteria 
##                             2                             4

Re-root tree (from experience, I have had to do this because you may have removed the root of your tree when pruning). (I found this handy function from here which picks the longest branch to root from). There is also a compatibilty issue between the type of tree calculated by QIIME and the one expected by phyloseq, so we have to change that format as well (see here for discussion)

# first define function from link above to find furthest outgroup
pick_new_outgroup <- function(tree.unrooted){
require("magrittr")
require("data.table")
require("ape") # ape::Ntip
# tablify parts of tree that we need.
treeDT <- 
     cbind(
         data.table(tree.unrooted$edge),
         data.table(length = tree.unrooted$edge.length)
     )[1:Ntip(tree.unrooted)] %>% 
 cbind(data.table(id = tree.unrooted$tip.label))
 # Take the longest terminal branch as outgroup
 new.outgroup <- treeDT[which.max(length)]$id
 return(new.outgroup) }

# then run on my phyloseq tree
my.tree <- phy_tree(ps)
out.group <- pick_new_outgroup(my.tree)

## Loading required package: magrittr

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

## Loading required package: data.table

## 
## Attaching package: 'data.table'

## The following object is masked from 'package:SummarizedExperiment':
## 
##     shift

## The following object is masked from 'package:GenomicRanges':
## 
##     shift

## The following object is masked from 'package:IRanges':
## 
##     shift

## The following objects are masked from 'package:S4Vectors':
## 
##     first, second

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

out.group

## [1] "a06ebe09ac052f0f50128b596fdfe133"

# Then use this outgroup to root the tree
new.tree1 <- ape::root(my.tree, outgroup=out.group, resolve.root=TRUE)


# and convert to dichotomy tree
new.tree2 <- ape::multi2di(new.tree1)
phy_tree(ps) <- new.tree2
phy_tree(ps)

## 
## Phylogenetic tree with 5463 tips and 5462 internal nodes.
## 
## Tip labels:
##  9f2853dbd08fdc4ca56e5b3aa68d77fd, dfdb2156d075ab8ccd0b76d30c478c15, f410cddad452190c6acda2076d1aad25, 3ec9430b7bfb58c08f1516aa34128e49, 5fb175aa9a607423784f422735aafbb1, ca18e49de3d4dfd91c1f07d4f531298b, ...
## Node labels:
##  Root, 0.992, 0.876, 0.897, 0.986, 0.786, ...
## 
## Rooted; includes branch lengths.

Check overall how the phyla are distributed among samples. Phyloseq makes this easy

# First aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed = tax_glom(ps, "Phylum")

# There are many phyla here, so have to make a custom color palette by interpolating from an existing one in RColorBrewer
colourCount = length(table(tax_table(ps)[, "Phylum"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
PhylaPalette = getPalette(colourCount)

# and plot
plot_bar(phylumGlommed, x = "Sample", fill = "Phylum") + 
  scale_fill_manual(values = PhylaPalette)

Plot compositional (relative abundances) instead of absolute abundance using microbiome::transform

ps_ra <- microbiome::transform(ps, transform = "compositional")
head(otu_table(ps_ra))

## OTU Table:          [6 taxa and 15 samples]
##                      taxa are rows
##                                    SRR6374533   SRR6374534   SRR6374535
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0000000000 0.0000000000 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000 0.0000000000 0.000000e+00
## f410cddad452190c6acda2076d1aad25 0.0000000000 0.0000000000 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 0.0004414122 0.0004722355 1.234883e-03
## 5fb175aa9a607423784f422735aafbb1 0.0000000000 0.0000000000 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000 0.0000000000 5.936938e-05
##                                    SRR6374536   SRR6374537   SRR6374538
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000e+00 0.0000000000 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.000000e+00 0.0000000000 0.000000e+00
## f410cddad452190c6acda2076d1aad25 0.000000e+00 0.0000000000 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 6.885564e-04 0.0001511992 0.000000e+00
## 5fb175aa9a607423784f422735aafbb1 0.000000e+00 0.0000000000 3.218979e-05
## ca18e49de3d4dfd91c1f07d4f531298b 2.401941e-05 0.0001034521 0.000000e+00
##                                    SRR6374539   SRR6374540   SRR6374541
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0001755357 0.000000e+00 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000 1.699192e-05 4.347732e-05
## f410cddad452190c6acda2076d1aad25 0.0000000000 0.000000e+00 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 0.0001485302 1.180938e-03 7.970841e-05
## 5fb175aa9a607423784f422735aafbb1 0.0000000000 0.000000e+00 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000 0.000000e+00 0.000000e+00
##                                  SRR6374542 SRR6374543 SRR6374544   SRR6374545
## 9f2853dbd08fdc4ca56e5b3aa68d77fd          0          0          0 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15          0          0          0 5.961892e-05
## f410cddad452190c6acda2076d1aad25          0          0          0 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49          0          0          0 0.000000e+00
## 5fb175aa9a607423784f422735aafbb1          0          0          0 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b          0          0          0 0.000000e+00
##                                    SRR6374546 SRR6374547
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0000000000          0
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000          0
## f410cddad452190c6acda2076d1aad25 0.0004365383          0
## 3ec9430b7bfb58c08f1516aa34128e49 0.0000000000          0
## 5fb175aa9a607423784f422735aafbb1 0.0000000000          0
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000          0

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed_RA = tax_glom(ps_ra, "Phylum")
# and plot
plot_bar(phylumGlommed_RA, x = "Sample", fill = "Phylum") + 
  scale_fill_manual(values = PhylaPalette)

Check abundance plots for some taxa of interest

The Proteobacteria

# Remove the antibiotic treatments
ps_filtered <- subset_samples(ps_ra, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples

# subset to proteobacteria
ps_proteo_ra <- subset_taxa(ps_filtered, Phylum == "Proteobacteria")

# define colors
colourCount = length(table(tax_table(ps_proteo_ra)[, "Order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_proteo_ra, "Order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "Order") + 
  scale_fill_manual(values = OrderPalette)

In the above you can see the increase in Vibrionales and Acetobacterales going from T0 to 20 to 25 and the decrease in Rhizobiales and Rhodobacterales

The Actinobacteriota

# subset to proteobacteria
ps_actino_ra <- subset_taxa(ps_filtered, Phylum == "Actinobacteria")

# define colors
colourCount = length(table(tax_table(ps_actino_ra)[, "Order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_actino_ra, "Order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "Order") + 
  scale_fill_manual(values = OrderPalette)

In the above you can see a decrease in Micotrichales with increase in temperature

The Chlamydiae

# subset to proteobacteria
ps_chlamy_ra <- subset_taxa(ps_filtered, Phylum == "Chlamydiae")

# define colors
colourCount = length(table(tax_table(ps_chlamy_ra)[, "Family"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
FamilyPalette = getPalette(colourCount)

# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
familyGlommed_RA = tax_glom(ps_chlamy_ra, "Family")
# and plot
plot_bar(familyGlommed_RA, x = "replicate", fill = "Family") + 
  scale_fill_manual(values = FamilyPalette)

Ordinations

First transform

Instead of relative abundances, need different transformation for stats. Microbiome package has both Hellinger and CLR built in. Try hellinger

ps_hellinger <- microbiome::transform(ps, transform = "hellinger")
head(otu_table(ps_hellinger))

## OTU Table:          [6 taxa and 15 samples]
##                      taxa are rows
##                                  SRR6374533 SRR6374534  SRR6374535 SRR6374536
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.00000000 0.00000000 0.000000000 0.00000000
## dfdb2156d075ab8ccd0b76d30c478c15 0.00000000 0.00000000 0.000000000 0.00000000
## f410cddad452190c6acda2076d1aad25 0.00000000 0.00000000 0.000000000 0.00000000
## 3ec9430b7bfb58c08f1516aa34128e49 0.02100981 0.02173098 0.035140903 0.02624036
## 5fb175aa9a607423784f422735aafbb1 0.00000000 0.00000000 0.000000000 0.00000000
## ca18e49de3d4dfd91c1f07d4f531298b 0.00000000 0.00000000 0.007705153 0.00490096
##                                  SRR6374537  SRR6374538 SRR6374539  SRR6374540
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.00000000 0.000000000 0.01324899 0.000000000
## dfdb2156d075ab8ccd0b76d30c478c15 0.00000000 0.000000000 0.00000000 0.004122126
## f410cddad452190c6acda2076d1aad25 0.00000000 0.000000000 0.00000000 0.000000000
## 3ec9430b7bfb58c08f1516aa34128e49 0.01229631 0.000000000 0.01218730 0.034364785
## 5fb175aa9a607423784f422735aafbb1 0.00000000 0.005673605 0.00000000 0.000000000
## ca18e49de3d4dfd91c1f07d4f531298b 0.01017114 0.000000000 0.00000000 0.000000000
##                                   SRR6374541 SRR6374542 SRR6374543 SRR6374544
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000000          0          0          0
## dfdb2156d075ab8ccd0b76d30c478c15 0.006593733          0          0          0
## f410cddad452190c6acda2076d1aad25 0.000000000          0          0          0
## 3ec9430b7bfb58c08f1516aa34128e49 0.008927957          0          0          0
## 5fb175aa9a607423784f422735aafbb1 0.000000000          0          0          0
## ca18e49de3d4dfd91c1f07d4f531298b 0.000000000          0          0          0
##                                   SRR6374545 SRR6374546 SRR6374547
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000000  0.0000000          0
## dfdb2156d075ab8ccd0b76d30c478c15 0.007721329  0.0000000          0
## f410cddad452190c6acda2076d1aad25 0.000000000  0.0208935          0
## 3ec9430b7bfb58c08f1516aa34128e49 0.000000000  0.0000000          0
## 5fb175aa9a607423784f422735aafbb1 0.000000000  0.0000000          0
## ca18e49de3d4dfd91c1f07d4f531298b 0.000000000  0.0000000          0

Principal coordinate analysis (PCoA) using Bray-Curtis

out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "bray")

pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) 
pcoa_plot

Scale axes by Eigenvalues

evals <- out.pcoa$values$Eigenvalues

pcoa_plot.scaled = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) +
  coord_fixed(sqrt(evals[2] / evals[1]))

pcoa_plot.scaled

Principal coordinate analysis (PCoA) using weighted UniFrac PCoA Also scaling axes by eigenvalues

out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "wunifrac")

wuf_pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) +
  coord_fixed(sqrt(evals[2] / evals[1]))


wuf_pcoa_plot

Non-Metric dimensional scaling (NMDS) ordination using Bray-Curtis

out.nmds <- ordinate(ps_hellinger, method = "NMDS", distance = "bray")

## Run 0 stress 0.06266313 
## Run 1 stress 0.06266315 
## ... Procrustes: rmse 2.861775e-05  max resid 5.290733e-05 
## ... Similar to previous best
## Run 2 stress 0.06266309 
## ... New best solution
## ... Procrustes: rmse 5.43692e-05  max resid 0.0001433653 
## ... Similar to previous best
## Run 3 stress 0.0626631 
## ... Procrustes: rmse 1.759838e-05  max resid 4.138797e-05 
## ... Similar to previous best
## Run 4 stress 0.06266309 
## ... New best solution
## ... Procrustes: rmse 2.234642e-05  max resid 5.705337e-05 
## ... Similar to previous best
## Run 5 stress 0.06266309 
## ... Procrustes: rmse 2.010026e-05  max resid 5.081437e-05 
## ... Similar to previous best
## Run 6 stress 0.06266309 
## ... Procrustes: rmse 1.089042e-05  max resid 2.002234e-05 
## ... Similar to previous best
## Run 7 stress 0.06266312 
## ... Procrustes: rmse 6.634299e-05  max resid 0.0001762634 
## ... Similar to previous best
## Run 8 stress 0.06266309 
## ... Procrustes: rmse 9.456853e-06  max resid 1.832632e-05 
## ... Similar to previous best
## Run 9 stress 0.06266312 
## ... Procrustes: rmse 6.568815e-05  max resid 0.0001712149 
## ... Similar to previous best
## Run 10 stress 0.06266317 
## ... Procrustes: rmse 0.0001061625  max resid 0.0002718836 
## ... Similar to previous best
## Run 11 stress 0.06266312 
## ... Procrustes: rmse 5.945858e-05  max resid 0.0001536172 
## ... Similar to previous best
## Run 12 stress 0.06266322 
## ... Procrustes: rmse 0.0001416553  max resid 0.0003733884 
## ... Similar to previous best
## Run 13 stress 0.06266309 
## ... Procrustes: rmse 1.170926e-05  max resid 2.417438e-05 
## ... Similar to previous best
## Run 14 stress 0.0626631 
## ... Procrustes: rmse 4.997966e-05  max resid 0.0001270507 
## ... Similar to previous best
## Run 15 stress 0.06266309 
## ... Procrustes: rmse 3.413022e-05  max resid 8.058208e-05 
## ... Similar to previous best
## Run 16 stress 0.06266312 
## ... Procrustes: rmse 7.037463e-05  max resid 0.0001790781 
## ... Similar to previous best
## Run 17 stress 0.06266309 
## ... Procrustes: rmse 1.164173e-05  max resid 2.667362e-05 
## ... Similar to previous best
## Run 18 stress 0.3253169 
## Run 19 stress 0.06266311 
## ... Procrustes: rmse 1.835599e-05  max resid 3.37214e-05 
## ... Similar to previous best
## Run 20 stress 0.06266312 
## ... Procrustes: rmse 6.035561e-05  max resid 0.0001589721 
## ... Similar to previous best
## *** Solution reached

nmds_plot = plot_ordination(ps_hellinger, out.nmds, color ="Temperature", shape = "Treatment") +
  geom_point(size = 3) 

nmds_plot

Test for affects of temperature

The above shows an apparent impact of temperature. I want to find out which organisms account for this.

Filter out the antibiotic-treated samples. Those are not part of my hypothesis and would influence the interpretation of any results (as seen above).

ps_filtered <- subset_samples(ps, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples

Differential abundance with DeSeq2

This uses untransformed data, as it does its own log 2 transformation in the DeSeq calculation

ps_deseq <- phyloseq_to_deseq2(ps_filtered, ~Temperature)

## converting counts to integer mode

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

ps_deseq <- DESeq(ps_deseq)

## estimating size factors

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## estimating dispersions

## gene-wise dispersion estimates

## mean-dispersion relationship

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## final dispersion estimates

##   Note: levels of factors in the design contain characters other than
##   letters, numbers, '_' and '.'. It is recommended (but not required) to use
##   only letters, numbers, and delimiters '_' or '.', as these are safe characters
##   for column names in R. [This is a message, not an warning or error]

## fitting model and testing

First Compare Control (Time zero) to Heat Stress (20C) Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_20 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "20"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_20)

## 
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 1, 0.024%
## LFC < 0 (down)     : 3, 0.071%
## outliers [1]       : 262, 6.2%
## low counts [2]     : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

The above shows out of 4203 ASVs, there is 1 that increased when comparing t0 to 20 and 3 that decreased. Ie. the one that ‘increased’ is in higher abundance in “t0” than in “20” and the 3 that decreased are in lower abundance in t0 than in 20.

Make a table of the significant results

# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_20 <- deseq_res_temp_T0_20[which(deseq_res_temp_T0_20$padj < 0.01), ]

# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_20_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_20, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_20), ], "matrix"))

# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_20_with_tax_ordered <- sigtab_deseq_res_temp_T0_20_with_tax[order(sigtab_deseq_res_temp_T0_20_with_tax$baseMean, decreasing=T), ]
sigtab_deseq_res_temp_T0_20_with_tax_ordered

##                                    baseMean log2FoldChange    lfcSE      stat
## f6e60561b232cd176707f8fb39b7cf13 4487.80230      -21.93682 3.568238 -6.147804
## 69a578040fb5a9fff8bc026b7909ba83 1485.91195      -12.75950 2.677898 -4.764742
## 35963c2f13929c0037df65dbf7f3147b   68.57731      -23.73879 3.418301 -6.944616
## 30caf1b5b4dfcdbb1c655c0fee6ad939   10.62314       21.70837 3.909835  5.552246
##                                        pvalue         padj   Domain
## f6e60561b232cd176707f8fb39b7cf13 7.856320e-10 1.548088e-06 Bacteria
## 69a578040fb5a9fff8bc026b7909ba83 1.890955e-06 1.863063e-03 Bacteria
## 35963c2f13929c0037df65dbf7f3147b 3.794905e-12 1.495572e-08 Bacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 2.820225e-08 3.704836e-05 Bacteria
##                                          Phylum               Class
## f6e60561b232cd176707f8fb39b7cf13 Proteobacteria Gammaproteobacteria
## 69a578040fb5a9fff8bc026b7909ba83 Proteobacteria Gammaproteobacteria
## 35963c2f13929c0037df65dbf7f3147b Proteobacteria Alphaproteobacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Proteobacteria Gammaproteobacteria
##                                                  Order                 Family
## f6e60561b232cd176707f8fb39b7cf13           Vibrionales           Vibrionaceae
## 69a578040fb5a9fff8bc026b7909ba83       Alteromonadales Pseudoalteromonadaceae
## 35963c2f13929c0037df65dbf7f3147b       Rhodobacterales       Rhodobacteraceae
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Ectothiorhodospirales    Thioalkalispiraceae
##                                              Genus Species
## f6e60561b232cd176707f8fb39b7cf13              <NA>    <NA>
## 69a578040fb5a9fff8bc026b7909ba83 Pseudoalteromonas    <NA>
## 35963c2f13929c0037df65dbf7f3147b              <NA>    <NA>
## 30caf1b5b4dfcdbb1c655c0fee6ad939              <NA>    <NA>

Translation: The above table shows that the first 3 ASVs (Family: Vibrionaceae, Pseudoalteromonadaceae, and Rhodobacteraceae) which have negative log2FoldChange are in lower abundance in the T0 samples than in the 20C sample and the last ASV (Family: Thioalkalispiraceae) are in higher abundance in the T0 than 20C sample.

Check the relative abundance of of these significant ASVs to check that this translationmakes sense.

# pull out the ID of the ASVs that are significant
sig_taxa_T0_20_1 <- row.names(sigtab_deseq_res_temp_T0_20_with_tax_ordered)[1]

# find them in the relative abundance matrix
ind <- which(row.names(as(otu_table(ps_ra), "matrix")) == sig_taxa_T0_20_1)
as(otu_table(ps_ra), "matrix")[ind,]

##  SRR6374533  SRR6374534  SRR6374535  SRR6374536  SRR6374537  SRR6374538 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.038261368 0.042385907 
##  SRR6374539  SRR6374540  SRR6374541  SRR6374542  SRR6374543  SRR6374544 
## 0.001870131 0.000000000 0.018898140 0.000000000 0.000000000 0.000000000 
##  SRR6374545  SRR6374546  SRR6374547 
## 0.000000000 0.000000000 0.000000000

According to this table, ASV f6e60561b232cd176707f8fb39b7cf13 (from the Vibrionaceae family) is present in samples SRR6374537, SRR6374538, SRR6374539, and SRR6374541. Looking back at the SraRunTable, these are samples 25C1, 25C2, 25C3, and 20C2. So this ASV is never present in the T0 samples… translation makes sense!

Next Compare Heat Stress (20C) to more extreme heat stress (25C)

Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_20_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "20", "25"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_20_25)

## 
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 1, 0.024%
## LFC < 0 (down)     : 1, 0.024%
## outliers [1]       : 262, 6.2%
## low counts [2]     : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Translation: out of 1451 ASVs, there is 1 that increased when comparing temperature “20” to “25” and 1 that decreased. Ie. the one that ‘increased’ is in higher abundance in “20” than in “25” and the one that decreased is in lower abundance in “20” than “25.”

Make a table of the significant results

# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_20_25 <- deseq_res_temp_20_25[which(deseq_res_temp_20_25$padj < 0.01), ]

# pull out their taxonomic affiliation
sigtab_deseq_res_temp_20_25_with_tax <- cbind(as(sigtab_deseq_res_temp_20_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_20_25), ], "matrix"))

# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_20_25_with_tax[order(sigtab_deseq_res_temp_20_25_with_tax$baseMean, decreasing=T), ]

##                                  baseMean log2FoldChange    lfcSE      stat
## 0e746083d8d0dc2ed880fa804824a4df 34.78684      -23.33753 3.909086 -5.970074
## eff3b303fe138b00d481bdb5277b1954 25.59946       20.87645 3.908234  5.341658
##                                        pvalue         padj   Domain
## 0e746083d8d0dc2ed880fa804824a4df 2.371463e-09 9.345936e-06 Bacteria
## eff3b303fe138b00d481bdb5277b1954 9.210047e-08 1.814840e-04 Bacteria
##                                         Phylum       Class
## 0e746083d8d0dc2ed880fa804824a4df Acidobacteria  Holophagae
## eff3b303fe138b00d481bdb5277b1954 Bacteroidetes Bacteroidia
##                                                    Order
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacterales
## eff3b303fe138b00d481bdb5277b1954        Flavobacteriales
##                                                    Family               Genus
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacteraceae Acanthopleuribacter
## eff3b303fe138b00d481bdb5277b1954        Flavobacteriaceae          Aquimarina
##                                  Species
## 0e746083d8d0dc2ed880fa804824a4df    <NA>
## eff3b303fe138b00d481bdb5277b1954    <NA>

The ASV that is in lower abundance in 20 than 25 is (negative log2FoldChange) is Acidobacter>…>Acanthopleuribacter and the one that is in higher abundance in 20 than 25 is Bacteroidetes>…>Aquimarina.

Next Compare Control (Time zero) to extreme heat stress (25C)

Pull out results

 # pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "25"))

# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_25)

## 
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up)       : 2, 0.048%
## LFC < 0 (down)     : 11, 0.26%
## outliers [1]       : 262, 6.2%
## low counts [2]     : 3158, 75%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Translation: out of 4203 ASVs, there are2 that increased when comparing temperature “T0” to “25” and 11 that decreased. Ie. the one that ‘increased’ is in higher abundance in “T0” than in “25” and those that decreased are in lower abundance in T0 than in 25.

Make a table of the significant results

# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_25 <- deseq_res_temp_T0_25[which(deseq_res_temp_T0_25$padj < 0.01), ]

# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_25_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_25), ], "matrix"))

# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_25_with_tax[order(sigtab_deseq_res_temp_T0_25_with_tax$baseMean, decreasing=T), ]

##                                     baseMean log2FoldChange    lfcSE      stat
## bce6e2f2b0236981e67783c042ad3de1 46572.86158      -9.787316 2.139044 -4.575556
## f6e60561b232cd176707f8fb39b7cf13  4487.80230     -30.338630 3.567051 -8.505240
## 69a578040fb5a9fff8bc026b7909ba83  1485.91195     -14.057371 2.677975 -5.249254
## e342d3ec847717ebe7db822633dfd75c   842.34421     -13.729633 3.344705 -4.104885
## 52bddf12d54343bc0d17f79b88d9eeb9   315.98674     -11.787775 2.820865 -4.178780
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b   273.25865     -11.824303 2.665469 -4.436106
## c9a1dcd976179ec1b9858b627a57e5fd   254.21000     -11.897320 3.149633 -3.777367
## 7040bb987094c28000f0501eae1c243c   201.53088     -11.516590 2.690986 -4.279691
## e7e8f6f87af549b487d3e470afe74f8b    76.83352     -10.203777 2.709211 -3.766327
## 35963c2f13929c0037df65dbf7f3147b    68.57731     -21.811228 3.424490 -6.369190
## 0e746083d8d0dc2ed880fa804824a4df    34.78684     -29.859647 3.909151 -7.638396
## eff3b303fe138b00d481bdb5277b1954    25.59946      21.476985 3.908935  5.494332
## 30caf1b5b4dfcdbb1c655c0fee6ad939    10.62314      26.708329 3.909900  6.830949
##                                        pvalue         padj   Domain
## bce6e2f2b0236981e67783c042ad3de1 4.749580e-06 5.312745e-04 Bacteria
## f6e60561b232cd176707f8fb39b7cf13 1.812200e-17 1.418952e-14 Bacteria
## 69a578040fb5a9fff8bc026b7909ba83 1.527168e-07 1.992954e-05 Bacteria
## e342d3ec847717ebe7db822633dfd75c 4.045153e-05 2.879414e-03 Bacteria
## 52bddf12d54343bc0d17f79b88d9eeb9 2.930767e-05 2.294791e-03 Bacteria
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b 9.160089e-06 8.965437e-04 Bacteria
## c9a1dcd976179ec1b9858b627a57e5fd 1.584950e-04 9.978220e-03 Bacteria
## 7040bb987094c28000f0501eae1c243c 1.871526e-05 1.628228e-03 Bacteria
## e7e8f6f87af549b487d3e470afe74f8b 1.656665e-04 9.978220e-03 Bacteria
## 35963c2f13929c0037df65dbf7f3147b 1.900293e-10 3.719824e-08 Bacteria
## 0e746083d8d0dc2ed880fa804824a4df 2.199440e-14 8.610807e-12 Bacteria
## eff3b303fe138b00d481bdb5277b1954 3.921921e-08 6.141729e-06 Bacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 8.435465e-12 2.201656e-09 Bacteria
##                                          Phylum               Class
## bce6e2f2b0236981e67783c042ad3de1 Proteobacteria Gammaproteobacteria
## f6e60561b232cd176707f8fb39b7cf13 Proteobacteria Gammaproteobacteria
## 69a578040fb5a9fff8bc026b7909ba83 Proteobacteria Gammaproteobacteria
## e342d3ec847717ebe7db822633dfd75c Proteobacteria Gammaproteobacteria
## 52bddf12d54343bc0d17f79b88d9eeb9 Proteobacteria Alphaproteobacteria
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b Proteobacteria Gammaproteobacteria
## c9a1dcd976179ec1b9858b627a57e5fd Proteobacteria Gammaproteobacteria
## 7040bb987094c28000f0501eae1c243c Proteobacteria Gammaproteobacteria
## e7e8f6f87af549b487d3e470afe74f8b Proteobacteria Alphaproteobacteria
## 35963c2f13929c0037df65dbf7f3147b Proteobacteria Alphaproteobacteria
## 0e746083d8d0dc2ed880fa804824a4df  Acidobacteria          Holophagae
## eff3b303fe138b00d481bdb5277b1954  Bacteroidetes         Bacteroidia
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Proteobacteria Gammaproteobacteria
##                                                    Order
## bce6e2f2b0236981e67783c042ad3de1             Vibrionales
## f6e60561b232cd176707f8fb39b7cf13             Vibrionales
## 69a578040fb5a9fff8bc026b7909ba83         Alteromonadales
## e342d3ec847717ebe7db822633dfd75c         Alteromonadales
## 52bddf12d54343bc0d17f79b88d9eeb9         Rhodobacterales
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b         Alteromonadales
## c9a1dcd976179ec1b9858b627a57e5fd       Oceanospirillales
## 7040bb987094c28000f0501eae1c243c       Oceanospirillales
## e7e8f6f87af549b487d3e470afe74f8b         Rhodobacterales
## 35963c2f13929c0037df65dbf7f3147b         Rhodobacterales
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacterales
## eff3b303fe138b00d481bdb5277b1954        Flavobacteriales
## 30caf1b5b4dfcdbb1c655c0fee6ad939   Ectothiorhodospirales
##                                                    Family               Genus
## bce6e2f2b0236981e67783c042ad3de1             Vibrionaceae                <NA>
## f6e60561b232cd176707f8fb39b7cf13             Vibrionaceae                <NA>
## 69a578040fb5a9fff8bc026b7909ba83   Pseudoalteromonadaceae   Pseudoalteromonas
## e342d3ec847717ebe7db822633dfd75c            Colwelliaceae       Thalassotalea
## 52bddf12d54343bc0d17f79b88d9eeb9         Rhodobacteraceae                <NA>
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b   Pseudoalteromonadaceae   Pseudoalteromonas
## c9a1dcd976179ec1b9858b627a57e5fd           Nitrincolaceae          Nitrincola
## 7040bb987094c28000f0501eae1c243c           Nitrincolaceae      Neptuniibacter
## e7e8f6f87af549b487d3e470afe74f8b         Rhodobacteraceae        Thalassobius
## 35963c2f13929c0037df65dbf7f3147b         Rhodobacteraceae                <NA>
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacteraceae Acanthopleuribacter
## eff3b303fe138b00d481bdb5277b1954        Flavobacteriaceae          Aquimarina
## 30caf1b5b4dfcdbb1c655c0fee6ad939      Thioalkalispiraceae                <NA>
##                                                 Species
## bce6e2f2b0236981e67783c042ad3de1                   <NA>
## f6e60561b232cd176707f8fb39b7cf13                   <NA>
## 69a578040fb5a9fff8bc026b7909ba83                   <NA>
## e342d3ec847717ebe7db822633dfd75c                   <NA>
## 52bddf12d54343bc0d17f79b88d9eeb9                   <NA>
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b                   <NA>
## c9a1dcd976179ec1b9858b627a57e5fd                   <NA>
## 7040bb987094c28000f0501eae1c243c                   <NA>
## e7e8f6f87af549b487d3e470afe74f8b Thalassobius maritimus
## 35963c2f13929c0037df65dbf7f3147b                   <NA>
## 0e746083d8d0dc2ed880fa804824a4df                   <NA>
## eff3b303fe138b00d481bdb5277b1954                   <NA>
## 30caf1b5b4dfcdbb1c655c0fee6ad939                   <NA>

There are lots of Vibrionaceae here that are higher in the heat stressed (25C) samples than T0. Also Pseudoalteromonadaceae, Nitrincolaceae, Rhodobacteraceae,

Since there are several ASVs here, you can also visualize these results in a plot

# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
  scale_fill_brewer(palette = palname, ...)
}

# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_25_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_25_with_tax$Family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_25_with_tax$Family = factor(as.character(sigtab_deseq_res_temp_T0_25_with_tax$Family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_25_with_tax, aes(x=Family, y=log2FoldChange, color=Class)) + geom_point(size=6) + 
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))

Export

Export the data tables as count tables (with no transformation) and without any aggregation.

For this study, I am excluding the antiobiotic samples, as those don’t really fir the meta-analysis study. I only want the samples from different temperatures (time zero, 20, and 25C).

First, filter out any ASVs without a species or genus and agglomerate at the genus-level. This will allow for comparisons to other datasets

# Make a brief descriptive file
descript_file<-writeLines(c("The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
The ASV count table was produced using Qiime2 (v. 2020.02), calling DADA2 for denoising, merging, and ASV inference, and a Silva v132 Naive 
Bayes classifier for taxonomy calling.
Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
impacts on microbiome.
The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining 
samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two 
different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.

qiime2 analysis file (done in Jupyter notebook) =  Qiime2_PRJNA421986.ipynb

processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_QIIME2_postanalysis.Rmd"))

## The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
## The ASV count table was produced using Qiime2 (v. 2020.02), calling DADA2 for denoising, merging, and ASV inference, and a Silva v132 Naive 
## Bayes classifier for taxonomy calling.
## Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
## impacts on microbiome.
## The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining 
## samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two 
## different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
## On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
## The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
## The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
## 
## qiime2 analysis file (done in Jupyter notebook) =  Qiime2_PRJNA421986.ipynb
## 
## processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_QIIME2_postanalysis.Rmd

# Export as .RData
save(ps_filtered, descript_file, file = "../../Processed/PRJNA421986_qiime2_processed_data.RData")

Comparison of Output from 2 Amplicon Pipelines- Here I am processing the data from the QIIME2 pipeline

Liz Suter

6/26/2020

Load packages

Import the metadata associated with the samples

Import the results from QIIME2

Check sequencing depth with rarefaction curves

Remove singletons

Modify taxonomy table

Put all into phyloseq object

Ordinations

First transform

Test for affects of temperature

Differential abundance with DeSeq2

Export