Using data from Green et al.: Green TJ, Siboni N, King WL, Labbate M, Seymour JR, Raftos D. 2019. Simulated Marine Heat Wave Alters Abundance and Structure of Vibrio Populations Associated with the Pacific Oyster Resulting in a Mass Mortality Event. Microb Ecol. 77(3):736–747. doi:10.1007/s00248-018-1242-9.
I followed the BVCN lesson 3a and lesson 3b pipelines. Lesson 3a uses QIIME2, calling DADA2 for ASV assignment, and using a Naive Bayes classifier trained on Silva 132 for taxonomy assignment (data are in qiime2_export
) Lesson 3b uses DADA2 directly in R (I used the Cyverse app I created), and IDTAXA (from Decipher) trained in Silva 138 for taxonomy assignment (data are in ‘dada2_export’)
Modifications:
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(phyloseq)
library(readr)
library(seqinr)
##
## Attaching package: 'seqinr'
## The following object is masked from 'package:dplyr':
##
## count
library(decontam)
library(ape)
##
## Attaching package: 'ape'
## The following objects are masked from 'package:seqinr':
##
## as.alignment, consensus
library(vegan)
## Loading required package: permute
##
## Attaching package: 'permute'
## The following object is masked from 'package:seqinr':
##
## getType
## Loading required package: lattice
## This is vegan 2.5-6
library(RColorBrewer)
#install.packages("remotes")
#remotes::install_github("microbiome/microbiome")
library(microbiome)
##
## microbiome R package (microbiome.github.com)
##
##
##
## Copyright (C) 2011-2020 Leo Lahti,
## Sudarshan Shetty et al. <microbiome.github.io>
##
## Attaching package: 'microbiome'
## The following object is masked from 'package:vegan':
##
## diversity
## The following object is masked from 'package:ggplot2':
##
## alpha
## The following object is masked from 'package:base':
##
## transform
# if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#
# BiocManager::install("DESeq2")
library("DESeq2")
## Loading required package: S4Vectors
## Warning: package 'S4Vectors' was built under R version 3.6.3
## Loading required package: stats4
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which, which.max, which.min
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:tidyr':
##
## expand
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following object is masked from 'package:microbiome':
##
## coverage
## The following object is masked from 'package:phyloseq':
##
## distance
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:purrr':
##
## reduce
## Loading required package: GenomicRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 3.6.3
## Loading required package: SummarizedExperiment
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:phyloseq':
##
## sampleNames
## Loading required package: DelayedArray
## Warning: package 'DelayedArray' was built under R version 3.6.3
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following objects are masked from 'package:Biobase':
##
## anyMissing, rowMedians
## The following object is masked from 'package:seqinr':
##
## count
## The following object is masked from 'package:dplyr':
##
## count
## Loading required package: BiocParallel
##
## Attaching package: 'DelayedArray'
## The following objects are masked from 'package:matrixStats':
##
## colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
## The following object is masked from 'package:purrr':
##
## simplify
## The following objects are masked from 'package:base':
##
## aperm, apply, rowsum
These metadata can be used with both qiime2 and dada2 results.
For this Bioproject, all info from Biosample file is also in SraRunTable, so I will only import SraRunTable (it is also formatted better)
I manually modified the SraRunTable to split the sample names into numbers (since temperature was built into the sample name) and put some info about samples from the biosample_result.txt summary (eg. antibiotic treatment). I also added in a column for “mortality” because the paper indicates that oysters in the control samples at 25C faced higher mortality (~50% on day 4 when samples were collected; Fig. 1) and this was not seen in any other samples.
Therefore, I am importing SraRunTableMod.txt
SraRunTable <- read_delim("SraRunTableMod.csv", delim = ",")
## Parsed with column specification:
## cols(
## .default = col_character(),
## AvgSpotLen = col_double(),
## Bases = col_double(),
## Bytes = col_double(),
## ReleaseDate = col_datetime(format = ""),
## Replicate = col_double()
## )
## See spec(...) for full column specifications.
# Import Count table. Skip first row of tsv file, which is just some text
count_table <- read_tsv(file="qiime2_export/table/table.tsv", skip = 1)
## Parsed with column specification:
## cols(
## `#OTU ID` = col_character(),
## SRR6374533 = col_double(),
## SRR6374534 = col_double(),
## SRR6374535 = col_double(),
## SRR6374536 = col_double(),
## SRR6374537 = col_double(),
## SRR6374538 = col_double(),
## SRR6374539 = col_double(),
## SRR6374540 = col_double(),
## SRR6374541 = col_double(),
## SRR6374542 = col_double(),
## SRR6374543 = col_double(),
## SRR6374544 = col_double(),
## SRR6374545 = col_double(),
## SRR6374546 = col_double(),
## SRR6374547 = col_double()
## )
# And specify that the first column of data are rownames
count_table <- column_to_rownames(count_table, var = colnames(count_table)[1])
# Import taxonomy of ASVs
taxonomy <- read_tsv(file="qiime2_export/taxonomy/taxonomy.tsv")
## Parsed with column specification:
## cols(
## `Feature ID` = col_character(),
## Taxon = col_character(),
## Confidence = col_double()
## )
# Import tree file
tree = read_tree("qiime2_export/exported-tree/tree.nwk")
# Import fasta
fasta <- read.fasta(file = "qiime2_export/rep-seqs.fasta/dna-sequences.fasta")
# Use rarecurve, from the Vegan package. Rarcurve expects the dataset as a dataframe so we need to use as.data.frame again:
count_table_df <- as.data.frame(count_table)
# Plot the rarefaction curves, color-coding by the colors listed in sample_info_tab, which indicate sample type, and transforming using t() again
rarecurve(t(count_table_df), step=100, cex=0.5, ylab="ASVs", label=T)
Generally these seem to be sequenced to completion
count_table_no_singletons <- filter(count_table,rowSums(count_table)>1)
# retains 5638 ASVs (out of 5647)
The taxonomy in the taxonomy table is retained in one column and the different levels are separated by underscore, eg:
head(taxonomy)
## # A tibble: 6 x 3
## `Feature ID` Taxon Confidence
## <chr> <chr> <dbl>
## 1 cd0075f2438e6a59c84c52… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp… 0.926
## 2 ca04f36a8f57c6693cb6ef… D_0__Bacteria;D_1__Bacteroidetes;D_2__Bact… 1.00
## 3 bce6e2f2b0236981e67783… D_0__Bacteria;D_1__Proteobacteria;D_2__Gam… 1.00
## 4 7207e61b39eefb42fce68f… D_0__Bacteria;D_1__Cyanobacteria;D_2__Oxyp… 1.00
## 5 97416f19e8a5effec9ba55… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp… 1.00
## 6 1572e9751932966549c25e… D_0__Bacteria;D_1__Proteobacteria;D_2__Alp… 1.00
Move each taxonomic level to its own column by removing the ";D_#_" using regular expressions (see this cheatsheet about regexp)
taxonomy_mod <- taxonomy %>%
mutate(taxonomy=str_replace_all(string=Taxon, pattern="D_\\d*\\__", replacement="")) %>%
mutate(taxonomy=str_replace_all(string=taxonomy, pattern=";$", replacement="")) %>%
separate(taxonomy, into=c("Domain", "Phylum", "Class", "Order", "Family", "Genus","Species"), sep=";") %>%
select (-Taxon, -Confidence) %>%
column_to_rownames(var = 'Feature ID')
## Warning: Expected 7 pieces. Missing pieces filled with `NA` in 3696 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, ...].
head(taxonomy_mod)
## Domain Phylum Class
## cd0075f2438e6a59c84c522c920c1703 Bacteria Proteobacteria Alphaproteobacteria
## ca04f36a8f57c6693cb6efa65de4c04f Bacteria Bacteroidetes Bacteroidia
## bce6e2f2b0236981e67783c042ad3de1 Bacteria Proteobacteria Gammaproteobacteria
## 7207e61b39eefb42fce68f0ec7a78744 Bacteria Cyanobacteria Oxyphotobacteria
## 97416f19e8a5effec9ba55e05d5f15dc Bacteria Proteobacteria Alphaproteobacteria
## 1572e9751932966549c25ea32daaf545 Bacteria Proteobacteria Alphaproteobacteria
## Order Family Genus
## cd0075f2438e6a59c84c522c920c1703 Rhizobiales Stappiaceae Labrenzia
## ca04f36a8f57c6693cb6efa65de4c04f Flavobacteriales Flavobacteriaceae Kordia
## bce6e2f2b0236981e67783c042ad3de1 Vibrionales Vibrionaceae <NA>
## 7207e61b39eefb42fce68f0ec7a78744 Chloroplast <NA> <NA>
## 97416f19e8a5effec9ba55e05d5f15dc Rhodobacterales Rhodobacteraceae <NA>
## 1572e9751932966549c25ea32daaf545 Sphingomonadales Sphingomonadaceae <NA>
## Species
## cd0075f2438e6a59c84c522c920c1703 <NA>
## ca04f36a8f57c6693cb6efa65de4c04f <NA>
## bce6e2f2b0236981e67783c042ad3de1 <NA>
## 7207e61b39eefb42fce68f0ec7a78744 <NA>
## 97416f19e8a5effec9ba55e05d5f15dc <NA>
## 1572e9751932966549c25ea32daaf545 <NA>
Here we will do ordinations using the phyloseq package, which first requires making phyloseq objects out of each of our input data tables (in the last tutorial, I imported the tree using phyloseq so it is already a phyloseq object)
ASV = otu_table(data.frame(count_table_no_singletons), taxa_are_rows = TRUE)
TAX = tax_table(as.matrix(taxonomy_mod))
META = sample_data(data.frame(SraRunTable, row.names = SraRunTable$Run))
First check that the inputs are in compatible formats by checking for ASV names with the phyloseq function, taxa_names
head(taxa_names(TAX))
## [1] "cd0075f2438e6a59c84c522c920c1703" "ca04f36a8f57c6693cb6efa65de4c04f"
## [3] "bce6e2f2b0236981e67783c042ad3de1" "7207e61b39eefb42fce68f0ec7a78744"
## [5] "97416f19e8a5effec9ba55e05d5f15dc" "1572e9751932966549c25ea32daaf545"
head(taxa_names(ASV))
## [1] "cd0075f2438e6a59c84c522c920c1703" "ca04f36a8f57c6693cb6efa65de4c04f"
## [3] "bce6e2f2b0236981e67783c042ad3de1" "7207e61b39eefb42fce68f0ec7a78744"
## [5] "97416f19e8a5effec9ba55e05d5f15dc" "1572e9751932966549c25ea32daaf545"
head(taxa_names(tree))
## [1] "595c0515f401d2fd3ea585b0595e5251" "47df0ccb39d8162decb89c97df38791e"
## [3] "b2c95f651e00fdb9583f73f75e69b288" "118a43b14b2f5cb1f2764faa4487fa41"
## [5] "40f29cd308e3331e7977ba8828161685" "e2d545daf064bbdddf1c7adcaae5664b"
And check sample names were also detected
head(sample_names(ASV))
## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"
head(sample_names(META))
## [1] "SRR6374533" "SRR6374534" "SRR6374535" "SRR6374536" "SRR6374537"
## [6] "SRR6374538"
Make one phyloseq object, which contains all 4 objects:
ps <- phyloseq(ASV, TAX, META, tree)
Check some features of the phyloseq object
rank_names(ps)
## [1] "Domain" "Phylum" "Class" "Order" "Family" "Genus" "Species"
unique(tax_table(ps)[, "Domain"])
## Taxonomy Table: [4 taxa by 1 taxonomic ranks]:
## Domain
## 595c0515f401d2fd3ea585b0595e5251 "Unassigned"
## b2c95f651e00fdb9583f73f75e69b288 "Bacteria"
## 1e6aaaca6e37db2a1d65d17f5f595d2e "Eukaryota"
## 21ae507cdc1b62f0b307d705df3dc87d "Archaea"
table(tax_table(ps)[, "Domain"], exclude = NULL)
##
## Archaea Bacteria Eukaryota Unassigned
## 22 5603 4 9
Filter out those ambigious Domain annotations
ps <- subset_taxa(ps, !is.na(Domain) & !Domain %in% c("Unassigned", "Eukaryota"))
table(tax_table(ps)[, "Domain"], exclude = NULL)
##
## Archaea Bacteria
## 22 5603
Check out the phyla names
table(tax_table(ps)[, "Phylum"], exclude = NULL)
##
## Acidobacteria Actinobacteria
## 198 157
## Bacteroidetes Calditrichaeota
## 1133 10
## Chlamydiae Chloroflexi
## 21 58
## Crenarchaeota Cyanobacteria
## 4 72
## Dadabacteria Deinococcus-Thermus
## 2 2
## Dependentiae Elusimicrobia
## 19 1
## Entotheonellaeota Epsilonbacteraeota
## 1 33
## Euryarchaeota Fibrobacteres
## 2 1
## Firmicutes Fusobacteria
## 44 14
## Gemmatimonadetes Hydrogenedentes
## 65 5
## Kiritimatiellaeota Latescibacteria
## 9 23
## Lentisphaerae Margulisbacteria
## 1 1
## Marinimicrobia (SAR406 clade) Nanoarchaeaeota
## 1 12
## Nitrospinae Nitrospirae
## 3 10
## Omnitrophicaeota Patescibacteria
## 1 52
## PAUC34f Planctomycetes
## 1 345
## Proteobacteria Rokubacteria
## 2954 2
## Schekmanbacteria Spirochaetes
## 2 12
## Synergistetes Tenericutes
## 1 18
## Thaumarchaeota Verrucomicrobia
## 4 163
## WPS-2 Zixibacteria
## 2 4
## <NA>
## 162
Filter out any with “NA” as phylum. For now, I am leaving in phyla with abundance of 1 (“Fibrobacteres”, “PAUC34f”, “Elusimicrobia”, “Synergistetes”, “Entotheonellaeota”, “Lentisphaerae”, “Margulisbacteria”, “Omnitrophicaeota”, “Marinimicrobia (SAR406 clade)”)
ps <- subset_taxa(ps, !is.na(Phylum) & !Phylum %in% c(""))
table(tax_table(ps)[, "Phylum"], exclude = NULL)
##
## Acidobacteria Actinobacteria
## 198 157
## Bacteroidetes Calditrichaeota
## 1133 10
## Chlamydiae Chloroflexi
## 21 58
## Crenarchaeota Cyanobacteria
## 4 72
## Dadabacteria Deinococcus-Thermus
## 2 2
## Dependentiae Elusimicrobia
## 19 1
## Entotheonellaeota Epsilonbacteraeota
## 1 33
## Euryarchaeota Fibrobacteres
## 2 1
## Firmicutes Fusobacteria
## 44 14
## Gemmatimonadetes Hydrogenedentes
## 65 5
## Kiritimatiellaeota Latescibacteria
## 9 23
## Lentisphaerae Margulisbacteria
## 1 1
## Marinimicrobia (SAR406 clade) Nanoarchaeaeota
## 1 12
## Nitrospinae Nitrospirae
## 3 10
## Omnitrophicaeota Patescibacteria
## 1 52
## PAUC34f Planctomycetes
## 1 345
## Proteobacteria Rokubacteria
## 2954 2
## Schekmanbacteria Spirochaetes
## 2 12
## Synergistetes Tenericutes
## 1 18
## Thaumarchaeota Verrucomicrobia
## 4 163
## WPS-2 Zixibacteria
## 2 4
Re-root tree (from experience, I have had to do this because you may have removed the root of your tree when pruning). (I found this handy function from here which picks the longest branch to root from). There is also a compatibilty issue between the type of tree calculated by QIIME and the one expected by phyloseq, so we have to change that format as well (see here for discussion)
# first define function from link above to find furthest outgroup
pick_new_outgroup <- function(tree.unrooted){
require("magrittr")
require("data.table")
require("ape") # ape::Ntip
# tablify parts of tree that we need.
treeDT <-
cbind(
data.table(tree.unrooted$edge),
data.table(length = tree.unrooted$edge.length)
)[1:Ntip(tree.unrooted)] %>%
cbind(data.table(id = tree.unrooted$tip.label))
# Take the longest terminal branch as outgroup
new.outgroup <- treeDT[which.max(length)]$id
return(new.outgroup) }
# then run on my phyloseq tree
my.tree <- phy_tree(ps)
out.group <- pick_new_outgroup(my.tree)
## Loading required package: magrittr
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
## set_names
## The following object is masked from 'package:tidyr':
##
## extract
## Loading required package: data.table
##
## Attaching package: 'data.table'
## The following object is masked from 'package:SummarizedExperiment':
##
## shift
## The following object is masked from 'package:GenomicRanges':
##
## shift
## The following object is masked from 'package:IRanges':
##
## shift
## The following objects are masked from 'package:S4Vectors':
##
## first, second
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following object is masked from 'package:purrr':
##
## transpose
out.group
## [1] "a06ebe09ac052f0f50128b596fdfe133"
# Then use this outgroup to root the tree
new.tree1 <- ape::root(my.tree, outgroup=out.group, resolve.root=TRUE)
# and convert to dichotomy tree
new.tree2 <- ape::multi2di(new.tree1)
phy_tree(ps) <- new.tree2
phy_tree(ps)
##
## Phylogenetic tree with 5463 tips and 5462 internal nodes.
##
## Tip labels:
## 9f2853dbd08fdc4ca56e5b3aa68d77fd, dfdb2156d075ab8ccd0b76d30c478c15, f410cddad452190c6acda2076d1aad25, 3ec9430b7bfb58c08f1516aa34128e49, 5fb175aa9a607423784f422735aafbb1, ca18e49de3d4dfd91c1f07d4f531298b, ...
## Node labels:
## Root, 0.992, 0.876, 0.897, 0.986, 0.786, ...
##
## Rooted; includes branch lengths.
Check overall how the phyla are distributed among samples. Phyloseq makes this easy
# First aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed = tax_glom(ps, "Phylum")
# There are many phyla here, so have to make a custom color palette by interpolating from an existing one in RColorBrewer
colourCount = length(table(tax_table(ps)[, "Phylum"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
PhylaPalette = getPalette(colourCount)
# and plot
plot_bar(phylumGlommed, x = "Sample", fill = "Phylum") +
scale_fill_manual(values = PhylaPalette)
Plot compositional (relative abundances) instead of absolute abundance using microbiome::transform
ps_ra <- microbiome::transform(ps, transform = "compositional")
head(otu_table(ps_ra))
## OTU Table: [6 taxa and 15 samples]
## taxa are rows
## SRR6374533 SRR6374534 SRR6374535
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0000000000 0.0000000000 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000 0.0000000000 0.000000e+00
## f410cddad452190c6acda2076d1aad25 0.0000000000 0.0000000000 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 0.0004414122 0.0004722355 1.234883e-03
## 5fb175aa9a607423784f422735aafbb1 0.0000000000 0.0000000000 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000 0.0000000000 5.936938e-05
## SRR6374536 SRR6374537 SRR6374538
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000e+00 0.0000000000 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.000000e+00 0.0000000000 0.000000e+00
## f410cddad452190c6acda2076d1aad25 0.000000e+00 0.0000000000 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 6.885564e-04 0.0001511992 0.000000e+00
## 5fb175aa9a607423784f422735aafbb1 0.000000e+00 0.0000000000 3.218979e-05
## ca18e49de3d4dfd91c1f07d4f531298b 2.401941e-05 0.0001034521 0.000000e+00
## SRR6374539 SRR6374540 SRR6374541
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0001755357 0.000000e+00 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000 1.699192e-05 4.347732e-05
## f410cddad452190c6acda2076d1aad25 0.0000000000 0.000000e+00 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 0.0001485302 1.180938e-03 7.970841e-05
## 5fb175aa9a607423784f422735aafbb1 0.0000000000 0.000000e+00 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000 0.000000e+00 0.000000e+00
## SRR6374542 SRR6374543 SRR6374544 SRR6374545
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0 0 0 0.000000e+00
## dfdb2156d075ab8ccd0b76d30c478c15 0 0 0 5.961892e-05
## f410cddad452190c6acda2076d1aad25 0 0 0 0.000000e+00
## 3ec9430b7bfb58c08f1516aa34128e49 0 0 0 0.000000e+00
## 5fb175aa9a607423784f422735aafbb1 0 0 0 0.000000e+00
## ca18e49de3d4dfd91c1f07d4f531298b 0 0 0 0.000000e+00
## SRR6374546 SRR6374547
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.0000000000 0
## dfdb2156d075ab8ccd0b76d30c478c15 0.0000000000 0
## f410cddad452190c6acda2076d1aad25 0.0004365383 0
## 3ec9430b7bfb58c08f1516aa34128e49 0.0000000000 0
## 5fb175aa9a607423784f422735aafbb1 0.0000000000 0
## ca18e49de3d4dfd91c1f07d4f531298b 0.0000000000 0
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
phylumGlommed_RA = tax_glom(ps_ra, "Phylum")
# and plot
plot_bar(phylumGlommed_RA, x = "Sample", fill = "Phylum") +
scale_fill_manual(values = PhylaPalette)
Check abundance plots for some taxa of interest
The Proteobacteria
# Remove the antibiotic treatments
ps_filtered <- subset_samples(ps_ra, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples
# subset to proteobacteria
ps_proteo_ra <- subset_taxa(ps_filtered, Phylum == "Proteobacteria")
# define colors
colourCount = length(table(tax_table(ps_proteo_ra)[, "Order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_proteo_ra, "Order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "Order") +
scale_fill_manual(values = OrderPalette)
In the above you can see the increase in Vibrionales and Acetobacterales going from T0 to 20 to 25 and the decrease in Rhizobiales and Rhodobacterales
The Actinobacteriota
# subset to proteobacteria
ps_actino_ra <- subset_taxa(ps_filtered, Phylum == "Actinobacteria")
# define colors
colourCount = length(table(tax_table(ps_actino_ra)[, "Order"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
OrderPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
orderGlommed_RA = tax_glom(ps_actino_ra, "Order")
# and plot
plot_bar(orderGlommed_RA, x = "replicate", fill = "Order") +
scale_fill_manual(values = OrderPalette)
In the above you can see a decrease in Micotrichales with increase in temperature
The Chlamydiae
# subset to proteobacteria
ps_chlamy_ra <- subset_taxa(ps_filtered, Phylum == "Chlamydiae")
# define colors
colourCount = length(table(tax_table(ps_chlamy_ra)[, "Family"], exclude = NULL))
getPalette = colorRampPalette(brewer.pal(9, "Spectral"))
FamilyPalette = getPalette(colourCount)
# Then aglomerate the ASVs at the phylum level using the phyloseq function, tax_glom
familyGlommed_RA = tax_glom(ps_chlamy_ra, "Family")
# and plot
plot_bar(familyGlommed_RA, x = "replicate", fill = "Family") +
scale_fill_manual(values = FamilyPalette)
Instead of relative abundances, need different transformation for stats. Microbiome package has both Hellinger and CLR built in. Try hellinger
ps_hellinger <- microbiome::transform(ps, transform = "hellinger")
head(otu_table(ps_hellinger))
## OTU Table: [6 taxa and 15 samples]
## taxa are rows
## SRR6374533 SRR6374534 SRR6374535 SRR6374536
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.00000000 0.00000000 0.000000000 0.00000000
## dfdb2156d075ab8ccd0b76d30c478c15 0.00000000 0.00000000 0.000000000 0.00000000
## f410cddad452190c6acda2076d1aad25 0.00000000 0.00000000 0.000000000 0.00000000
## 3ec9430b7bfb58c08f1516aa34128e49 0.02100981 0.02173098 0.035140903 0.02624036
## 5fb175aa9a607423784f422735aafbb1 0.00000000 0.00000000 0.000000000 0.00000000
## ca18e49de3d4dfd91c1f07d4f531298b 0.00000000 0.00000000 0.007705153 0.00490096
## SRR6374537 SRR6374538 SRR6374539 SRR6374540
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.00000000 0.000000000 0.01324899 0.000000000
## dfdb2156d075ab8ccd0b76d30c478c15 0.00000000 0.000000000 0.00000000 0.004122126
## f410cddad452190c6acda2076d1aad25 0.00000000 0.000000000 0.00000000 0.000000000
## 3ec9430b7bfb58c08f1516aa34128e49 0.01229631 0.000000000 0.01218730 0.034364785
## 5fb175aa9a607423784f422735aafbb1 0.00000000 0.005673605 0.00000000 0.000000000
## ca18e49de3d4dfd91c1f07d4f531298b 0.01017114 0.000000000 0.00000000 0.000000000
## SRR6374541 SRR6374542 SRR6374543 SRR6374544
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000000 0 0 0
## dfdb2156d075ab8ccd0b76d30c478c15 0.006593733 0 0 0
## f410cddad452190c6acda2076d1aad25 0.000000000 0 0 0
## 3ec9430b7bfb58c08f1516aa34128e49 0.008927957 0 0 0
## 5fb175aa9a607423784f422735aafbb1 0.000000000 0 0 0
## ca18e49de3d4dfd91c1f07d4f531298b 0.000000000 0 0 0
## SRR6374545 SRR6374546 SRR6374547
## 9f2853dbd08fdc4ca56e5b3aa68d77fd 0.000000000 0.0000000 0
## dfdb2156d075ab8ccd0b76d30c478c15 0.007721329 0.0000000 0
## f410cddad452190c6acda2076d1aad25 0.000000000 0.0208935 0
## 3ec9430b7bfb58c08f1516aa34128e49 0.000000000 0.0000000 0
## 5fb175aa9a607423784f422735aafbb1 0.000000000 0.0000000 0
## ca18e49de3d4dfd91c1f07d4f531298b 0.000000000 0.0000000 0
Principal coordinate analysis (PCoA) using Bray-Curtis
out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "bray")
pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3)
pcoa_plot
Scale axes by Eigenvalues
evals <- out.pcoa$values$Eigenvalues
pcoa_plot.scaled = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3) +
coord_fixed(sqrt(evals[2] / evals[1]))
pcoa_plot.scaled
Principal coordinate analysis (PCoA) using weighted UniFrac PCoA Also scaling axes by eigenvalues
out.pcoa <- ordinate(ps_hellinger, method = "PCoA", distance = "wunifrac")
wuf_pcoa_plot = plot_ordination(ps_hellinger, out.pcoa, color ="Temperature", shape = "Treatment") +
geom_point(size = 3) +
coord_fixed(sqrt(evals[2] / evals[1]))
wuf_pcoa_plot
Non-Metric dimensional scaling (NMDS) ordination using Bray-Curtis
out.nmds <- ordinate(ps_hellinger, method = "NMDS", distance = "bray")
## Run 0 stress 0.06266313
## Run 1 stress 0.06266315
## ... Procrustes: rmse 2.861775e-05 max resid 5.290733e-05
## ... Similar to previous best
## Run 2 stress 0.06266309
## ... New best solution
## ... Procrustes: rmse 5.43692e-05 max resid 0.0001433653
## ... Similar to previous best
## Run 3 stress 0.0626631
## ... Procrustes: rmse 1.759838e-05 max resid 4.138797e-05
## ... Similar to previous best
## Run 4 stress 0.06266309
## ... New best solution
## ... Procrustes: rmse 2.234642e-05 max resid 5.705337e-05
## ... Similar to previous best
## Run 5 stress 0.06266309
## ... Procrustes: rmse 2.010026e-05 max resid 5.081437e-05
## ... Similar to previous best
## Run 6 stress 0.06266309
## ... Procrustes: rmse 1.089042e-05 max resid 2.002234e-05
## ... Similar to previous best
## Run 7 stress 0.06266312
## ... Procrustes: rmse 6.634299e-05 max resid 0.0001762634
## ... Similar to previous best
## Run 8 stress 0.06266309
## ... Procrustes: rmse 9.456853e-06 max resid 1.832632e-05
## ... Similar to previous best
## Run 9 stress 0.06266312
## ... Procrustes: rmse 6.568815e-05 max resid 0.0001712149
## ... Similar to previous best
## Run 10 stress 0.06266317
## ... Procrustes: rmse 0.0001061625 max resid 0.0002718836
## ... Similar to previous best
## Run 11 stress 0.06266312
## ... Procrustes: rmse 5.945858e-05 max resid 0.0001536172
## ... Similar to previous best
## Run 12 stress 0.06266322
## ... Procrustes: rmse 0.0001416553 max resid 0.0003733884
## ... Similar to previous best
## Run 13 stress 0.06266309
## ... Procrustes: rmse 1.170926e-05 max resid 2.417438e-05
## ... Similar to previous best
## Run 14 stress 0.0626631
## ... Procrustes: rmse 4.997966e-05 max resid 0.0001270507
## ... Similar to previous best
## Run 15 stress 0.06266309
## ... Procrustes: rmse 3.413022e-05 max resid 8.058208e-05
## ... Similar to previous best
## Run 16 stress 0.06266312
## ... Procrustes: rmse 7.037463e-05 max resid 0.0001790781
## ... Similar to previous best
## Run 17 stress 0.06266309
## ... Procrustes: rmse 1.164173e-05 max resid 2.667362e-05
## ... Similar to previous best
## Run 18 stress 0.3253169
## Run 19 stress 0.06266311
## ... Procrustes: rmse 1.835599e-05 max resid 3.37214e-05
## ... Similar to previous best
## Run 20 stress 0.06266312
## ... Procrustes: rmse 6.035561e-05 max resid 0.0001589721
## ... Similar to previous best
## *** Solution reached
nmds_plot = plot_ordination(ps_hellinger, out.nmds, color ="Temperature", shape = "Treatment") +
geom_point(size = 3)
nmds_plot
The above shows an apparent impact of temperature. I want to find out which organisms account for this.
Filter out the antibiotic-treated samples. Those are not part of my hypothesis and would influence the interpretation of any results (as seen above).
ps_filtered <- subset_samples(ps, !Treatment %in% c("Pen-Strep")) #reduced from 15 to 9 samples
This uses untransformed data, as it does its own log 2 transformation in the DeSeq calculation
ps_deseq <- phyloseq_to_deseq2(ps_filtered, ~Temperature)
## converting counts to integer mode
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
ps_deseq <- DESeq(ps_deseq)
## estimating size factors
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## final dispersion estimates
## Note: levels of factors in the design contain characters other than
## letters, numbers, '_' and '.'. It is recommended (but not required) to use
## only letters, numbers, and delimiters '_' or '.', as these are safe characters
## for column names in R. [This is a message, not an warning or error]
## fitting model and testing
First Compare Control (Time zero) to Heat Stress (20C) Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_20 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "20"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_20)
##
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 1, 0.024%
## LFC < 0 (down) : 3, 0.071%
## outliers [1] : 262, 6.2%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
The above shows out of 4203 ASVs, there is 1 that increased when comparing t0 to 20 and 3 that decreased. Ie. the one that ‘increased’ is in higher abundance in “t0” than in “20” and the 3 that decreased are in lower abundance in t0 than in 20.
Make a table of the significant results
# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_20 <- deseq_res_temp_T0_20[which(deseq_res_temp_T0_20$padj < 0.01), ]
# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_20_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_20, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_20), ], "matrix"))
# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_20_with_tax_ordered <- sigtab_deseq_res_temp_T0_20_with_tax[order(sigtab_deseq_res_temp_T0_20_with_tax$baseMean, decreasing=T), ]
sigtab_deseq_res_temp_T0_20_with_tax_ordered
## baseMean log2FoldChange lfcSE stat
## f6e60561b232cd176707f8fb39b7cf13 4487.80230 -21.93682 3.568238 -6.147804
## 69a578040fb5a9fff8bc026b7909ba83 1485.91195 -12.75950 2.677898 -4.764742
## 35963c2f13929c0037df65dbf7f3147b 68.57731 -23.73879 3.418301 -6.944616
## 30caf1b5b4dfcdbb1c655c0fee6ad939 10.62314 21.70837 3.909835 5.552246
## pvalue padj Domain
## f6e60561b232cd176707f8fb39b7cf13 7.856320e-10 1.548088e-06 Bacteria
## 69a578040fb5a9fff8bc026b7909ba83 1.890955e-06 1.863063e-03 Bacteria
## 35963c2f13929c0037df65dbf7f3147b 3.794905e-12 1.495572e-08 Bacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 2.820225e-08 3.704836e-05 Bacteria
## Phylum Class
## f6e60561b232cd176707f8fb39b7cf13 Proteobacteria Gammaproteobacteria
## 69a578040fb5a9fff8bc026b7909ba83 Proteobacteria Gammaproteobacteria
## 35963c2f13929c0037df65dbf7f3147b Proteobacteria Alphaproteobacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Proteobacteria Gammaproteobacteria
## Order Family
## f6e60561b232cd176707f8fb39b7cf13 Vibrionales Vibrionaceae
## 69a578040fb5a9fff8bc026b7909ba83 Alteromonadales Pseudoalteromonadaceae
## 35963c2f13929c0037df65dbf7f3147b Rhodobacterales Rhodobacteraceae
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Ectothiorhodospirales Thioalkalispiraceae
## Genus Species
## f6e60561b232cd176707f8fb39b7cf13 <NA> <NA>
## 69a578040fb5a9fff8bc026b7909ba83 Pseudoalteromonas <NA>
## 35963c2f13929c0037df65dbf7f3147b <NA> <NA>
## 30caf1b5b4dfcdbb1c655c0fee6ad939 <NA> <NA>
Translation: The above table shows that the first 3 ASVs (Family: Vibrionaceae, Pseudoalteromonadaceae, and Rhodobacteraceae) which have negative log2FoldChange are in lower abundance in the T0 samples than in the 20C sample and the last ASV (Family: Thioalkalispiraceae) are in higher abundance in the T0 than 20C sample.
Check the relative abundance of of these significant ASVs to check that this translationmakes sense.
# pull out the ID of the ASVs that are significant
sig_taxa_T0_20_1 <- row.names(sigtab_deseq_res_temp_T0_20_with_tax_ordered)[1]
# find them in the relative abundance matrix
ind <- which(row.names(as(otu_table(ps_ra), "matrix")) == sig_taxa_T0_20_1)
as(otu_table(ps_ra), "matrix")[ind,]
## SRR6374533 SRR6374534 SRR6374535 SRR6374536 SRR6374537 SRR6374538
## 0.000000000 0.000000000 0.000000000 0.000000000 0.038261368 0.042385907
## SRR6374539 SRR6374540 SRR6374541 SRR6374542 SRR6374543 SRR6374544
## 0.001870131 0.000000000 0.018898140 0.000000000 0.000000000 0.000000000
## SRR6374545 SRR6374546 SRR6374547
## 0.000000000 0.000000000 0.000000000
According to this table, ASV f6e60561b232cd176707f8fb39b7cf13 (from the Vibrionaceae family) is present in samples SRR6374537, SRR6374538, SRR6374539, and SRR6374541. Looking back at the SraRunTable, these are samples 25C1, 25C2, 25C3, and 20C2. So this ASV is never present in the T0 samples… translation makes sense!
Next Compare Heat Stress (20C) to more extreme heat stress (25C)
Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_20_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "20", "25"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_20_25)
##
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 1, 0.024%
## LFC < 0 (down) : 1, 0.024%
## outliers [1] : 262, 6.2%
## low counts [2] : 0, 0%
## (mean count < 0)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Translation: out of 1451 ASVs, there is 1 that increased when comparing temperature “20” to “25” and 1 that decreased. Ie. the one that ‘increased’ is in higher abundance in “20” than in “25” and the one that decreased is in lower abundance in “20” than “25.”
Make a table of the significant results
# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_20_25 <- deseq_res_temp_20_25[which(deseq_res_temp_20_25$padj < 0.01), ]
# pull out their taxonomic affiliation
sigtab_deseq_res_temp_20_25_with_tax <- cbind(as(sigtab_deseq_res_temp_20_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_20_25), ], "matrix"))
# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_20_25_with_tax[order(sigtab_deseq_res_temp_20_25_with_tax$baseMean, decreasing=T), ]
## baseMean log2FoldChange lfcSE stat
## 0e746083d8d0dc2ed880fa804824a4df 34.78684 -23.33753 3.909086 -5.970074
## eff3b303fe138b00d481bdb5277b1954 25.59946 20.87645 3.908234 5.341658
## pvalue padj Domain
## 0e746083d8d0dc2ed880fa804824a4df 2.371463e-09 9.345936e-06 Bacteria
## eff3b303fe138b00d481bdb5277b1954 9.210047e-08 1.814840e-04 Bacteria
## Phylum Class
## 0e746083d8d0dc2ed880fa804824a4df Acidobacteria Holophagae
## eff3b303fe138b00d481bdb5277b1954 Bacteroidetes Bacteroidia
## Order
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacterales
## eff3b303fe138b00d481bdb5277b1954 Flavobacteriales
## Family Genus
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacteraceae Acanthopleuribacter
## eff3b303fe138b00d481bdb5277b1954 Flavobacteriaceae Aquimarina
## Species
## 0e746083d8d0dc2ed880fa804824a4df <NA>
## eff3b303fe138b00d481bdb5277b1954 <NA>
The ASV that is in lower abundance in 20 than 25 is (negative log2FoldChange) is Acidobacter>…>Acanthopleuribacter and the one that is in higher abundance in 20 than 25 is Bacteroidetes>…>Aquimarina.
Next Compare Control (Time zero) to extreme heat stress (25C)
Pull out results
# pulling out our results table, we specify the object, the p-value, and what contrast we want to consider by first naming the column, then the two groups we care about
deseq_res_temp_T0_25 <- results(ps_deseq, alpha=0.01, contrast=c("Temperature", "Time-zero", "25"))
# Get a glimpse at what this table currently holds with the summary command
summary(deseq_res_temp_T0_25)
##
## out of 4203 with nonzero total read count
## adjusted p-value < 0.01
## LFC > 0 (up) : 2, 0.048%
## LFC < 0 (down) : 11, 0.26%
## outliers [1] : 262, 6.2%
## low counts [2] : 3158, 75%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results
Translation: out of 4203 ASVs, there are2 that increased when comparing temperature “T0” to “25” and 11 that decreased. Ie. the one that ‘increased’ is in higher abundance in “T0” than in “25” and those that decreased are in lower abundance in T0 than in 25.
Make a table of the significant results
# subset the table to only include those ASVs that pass the specified significance level
sigtab_deseq_res_temp_T0_25 <- deseq_res_temp_T0_25[which(deseq_res_temp_T0_25$padj < 0.01), ]
# pull out their taxonomic affiliation
sigtab_deseq_res_temp_T0_25_with_tax <- cbind(as(sigtab_deseq_res_temp_T0_25, "data.frame"), as(tax_table(ps_filtered)[row.names(sigtab_deseq_res_temp_T0_25), ], "matrix"))
# and sort that table by the baseMean column (incase there was more than one taxon)
sigtab_deseq_res_temp_T0_25_with_tax[order(sigtab_deseq_res_temp_T0_25_with_tax$baseMean, decreasing=T), ]
## baseMean log2FoldChange lfcSE stat
## bce6e2f2b0236981e67783c042ad3de1 46572.86158 -9.787316 2.139044 -4.575556
## f6e60561b232cd176707f8fb39b7cf13 4487.80230 -30.338630 3.567051 -8.505240
## 69a578040fb5a9fff8bc026b7909ba83 1485.91195 -14.057371 2.677975 -5.249254
## e342d3ec847717ebe7db822633dfd75c 842.34421 -13.729633 3.344705 -4.104885
## 52bddf12d54343bc0d17f79b88d9eeb9 315.98674 -11.787775 2.820865 -4.178780
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b 273.25865 -11.824303 2.665469 -4.436106
## c9a1dcd976179ec1b9858b627a57e5fd 254.21000 -11.897320 3.149633 -3.777367
## 7040bb987094c28000f0501eae1c243c 201.53088 -11.516590 2.690986 -4.279691
## e7e8f6f87af549b487d3e470afe74f8b 76.83352 -10.203777 2.709211 -3.766327
## 35963c2f13929c0037df65dbf7f3147b 68.57731 -21.811228 3.424490 -6.369190
## 0e746083d8d0dc2ed880fa804824a4df 34.78684 -29.859647 3.909151 -7.638396
## eff3b303fe138b00d481bdb5277b1954 25.59946 21.476985 3.908935 5.494332
## 30caf1b5b4dfcdbb1c655c0fee6ad939 10.62314 26.708329 3.909900 6.830949
## pvalue padj Domain
## bce6e2f2b0236981e67783c042ad3de1 4.749580e-06 5.312745e-04 Bacteria
## f6e60561b232cd176707f8fb39b7cf13 1.812200e-17 1.418952e-14 Bacteria
## 69a578040fb5a9fff8bc026b7909ba83 1.527168e-07 1.992954e-05 Bacteria
## e342d3ec847717ebe7db822633dfd75c 4.045153e-05 2.879414e-03 Bacteria
## 52bddf12d54343bc0d17f79b88d9eeb9 2.930767e-05 2.294791e-03 Bacteria
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b 9.160089e-06 8.965437e-04 Bacteria
## c9a1dcd976179ec1b9858b627a57e5fd 1.584950e-04 9.978220e-03 Bacteria
## 7040bb987094c28000f0501eae1c243c 1.871526e-05 1.628228e-03 Bacteria
## e7e8f6f87af549b487d3e470afe74f8b 1.656665e-04 9.978220e-03 Bacteria
## 35963c2f13929c0037df65dbf7f3147b 1.900293e-10 3.719824e-08 Bacteria
## 0e746083d8d0dc2ed880fa804824a4df 2.199440e-14 8.610807e-12 Bacteria
## eff3b303fe138b00d481bdb5277b1954 3.921921e-08 6.141729e-06 Bacteria
## 30caf1b5b4dfcdbb1c655c0fee6ad939 8.435465e-12 2.201656e-09 Bacteria
## Phylum Class
## bce6e2f2b0236981e67783c042ad3de1 Proteobacteria Gammaproteobacteria
## f6e60561b232cd176707f8fb39b7cf13 Proteobacteria Gammaproteobacteria
## 69a578040fb5a9fff8bc026b7909ba83 Proteobacteria Gammaproteobacteria
## e342d3ec847717ebe7db822633dfd75c Proteobacteria Gammaproteobacteria
## 52bddf12d54343bc0d17f79b88d9eeb9 Proteobacteria Alphaproteobacteria
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b Proteobacteria Gammaproteobacteria
## c9a1dcd976179ec1b9858b627a57e5fd Proteobacteria Gammaproteobacteria
## 7040bb987094c28000f0501eae1c243c Proteobacteria Gammaproteobacteria
## e7e8f6f87af549b487d3e470afe74f8b Proteobacteria Alphaproteobacteria
## 35963c2f13929c0037df65dbf7f3147b Proteobacteria Alphaproteobacteria
## 0e746083d8d0dc2ed880fa804824a4df Acidobacteria Holophagae
## eff3b303fe138b00d481bdb5277b1954 Bacteroidetes Bacteroidia
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Proteobacteria Gammaproteobacteria
## Order
## bce6e2f2b0236981e67783c042ad3de1 Vibrionales
## f6e60561b232cd176707f8fb39b7cf13 Vibrionales
## 69a578040fb5a9fff8bc026b7909ba83 Alteromonadales
## e342d3ec847717ebe7db822633dfd75c Alteromonadales
## 52bddf12d54343bc0d17f79b88d9eeb9 Rhodobacterales
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b Alteromonadales
## c9a1dcd976179ec1b9858b627a57e5fd Oceanospirillales
## 7040bb987094c28000f0501eae1c243c Oceanospirillales
## e7e8f6f87af549b487d3e470afe74f8b Rhodobacterales
## 35963c2f13929c0037df65dbf7f3147b Rhodobacterales
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacterales
## eff3b303fe138b00d481bdb5277b1954 Flavobacteriales
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Ectothiorhodospirales
## Family Genus
## bce6e2f2b0236981e67783c042ad3de1 Vibrionaceae <NA>
## f6e60561b232cd176707f8fb39b7cf13 Vibrionaceae <NA>
## 69a578040fb5a9fff8bc026b7909ba83 Pseudoalteromonadaceae Pseudoalteromonas
## e342d3ec847717ebe7db822633dfd75c Colwelliaceae Thalassotalea
## 52bddf12d54343bc0d17f79b88d9eeb9 Rhodobacteraceae <NA>
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b Pseudoalteromonadaceae Pseudoalteromonas
## c9a1dcd976179ec1b9858b627a57e5fd Nitrincolaceae Nitrincola
## 7040bb987094c28000f0501eae1c243c Nitrincolaceae Neptuniibacter
## e7e8f6f87af549b487d3e470afe74f8b Rhodobacteraceae Thalassobius
## 35963c2f13929c0037df65dbf7f3147b Rhodobacteraceae <NA>
## 0e746083d8d0dc2ed880fa804824a4df Acanthopleuribacteraceae Acanthopleuribacter
## eff3b303fe138b00d481bdb5277b1954 Flavobacteriaceae Aquimarina
## 30caf1b5b4dfcdbb1c655c0fee6ad939 Thioalkalispiraceae <NA>
## Species
## bce6e2f2b0236981e67783c042ad3de1 <NA>
## f6e60561b232cd176707f8fb39b7cf13 <NA>
## 69a578040fb5a9fff8bc026b7909ba83 <NA>
## e342d3ec847717ebe7db822633dfd75c <NA>
## 52bddf12d54343bc0d17f79b88d9eeb9 <NA>
## 8ebb2aa2bdc4fdcf65b09b8806bc1d8b <NA>
## c9a1dcd976179ec1b9858b627a57e5fd <NA>
## 7040bb987094c28000f0501eae1c243c <NA>
## e7e8f6f87af549b487d3e470afe74f8b Thalassobius maritimus
## 35963c2f13929c0037df65dbf7f3147b <NA>
## 0e746083d8d0dc2ed880fa804824a4df <NA>
## eff3b303fe138b00d481bdb5277b1954 <NA>
## 30caf1b5b4dfcdbb1c655c0fee6ad939 <NA>
There are lots of Vibrionaceae here that are higher in the heat stressed (25C) samples than T0. Also Pseudoalteromonadaceae, Nitrincolaceae, Rhodobacteraceae,
Since there are several ASVs here, you can also visualize these results in a plot
# Incase you have many ASVs, define enough colors for the plot
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
scale_fill_brewer(palette = palname, ...)
}
# And plot the families, color coding by Class
x = tapply(sigtab_deseq_res_temp_T0_25_with_tax$log2FoldChange, sigtab_deseq_res_temp_T0_25_with_tax$Family, function(x) max(x))
x = sort(x, TRUE)
sigtab_deseq_res_temp_T0_25_with_tax$Family = factor(as.character(sigtab_deseq_res_temp_T0_25_with_tax$Family), levels=names(x))
ggplot(sigtab_deseq_res_temp_T0_25_with_tax, aes(x=Family, y=log2FoldChange, color=Class)) + geom_point(size=6) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))
Export the data tables as count tables (with no transformation) and without any aggregation.
For this study, I am excluding the antiobiotic samples, as those don’t really fir the meta-analysis study. I only want the samples from different temperatures (time zero, 20, and 25C).
First, filter out any ASVs without a species or genus and agglomerate at the genus-level. This will allow for comparisons to other datasets
# Make a brief descriptive file
descript_file<-writeLines(c("The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
The ASV count table was produced using Qiime2 (v. 2020.02), calling DADA2 for denoising, merging, and ASV inference, and a Silva v132 Naive
Bayes classifier for taxonomy calling.
Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
impacts on microbiome.
The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining
samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two
different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
qiime2 analysis file (done in Jupyter notebook) = Qiime2_PRJNA421986.ipynb
processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_QIIME2_postanalysis.Rmd"))
## The phyloseq object in this file are data processed from Green et al. 2019 (doi: 10.1007/s00248-018-1242-9), NCBI BioProject # PRJNA421986.
## The ASV count table was produced using Qiime2 (v. 2020.02), calling DADA2 for denoising, merging, and ASV inference, and a Silva v132 Naive
## Bayes classifier for taxonomy calling.
## Samples were collected from Pacific oyster (Crassostrea gigas) samples in Australia. AUthors performed a simulated heat wave in mesocosms to test
## impacts on microbiome.
## The phyloseq object, ps_filtered, was filtered to remove a subset of samples that were also used to test the impact of anitibiotics. The remaining
## samples here were not treated with antibiotics. The remaining samples correspond to a time-zero (field) collection and a time-4day collection from two
## different heat stress conditions (20˚C and 25˚C). Each assay has biological triplicates and thus there are 9 total samples.
## On day 4 in the 25˚C treatment, the oysters were showing signs of mortality.
## The metadata in this file come from the SraRunTable and have been modified to clearly indicate temperature and mortality data.
## The otu_table object in the phyloseq object contains absolute abundaces of ASVs, without any tranformation or agglomeration at higher taxonomic levels.
##
## qiime2 analysis file (done in Jupyter notebook) = Qiime2_PRJNA421986.ipynb
##
## processing of qiime2 and preliminary stats file (done in R) = PRJNA421986_PacificOyster_QIIME2_postanalysis.Rmd
# Export as .RData
save(ps_filtered, descript_file, file = "../../Processed/PRJNA421986_qiime2_processed_data.RData")