seqOutBias

Vignette

by Guertin Lab

Contents

Scaling Tissue Accessible Chromatin (TACh) Benzonase and Cyanase Digested DNA

Many enzymes nick DNA with distinct sequence biases. Here we characterize the biases of Benzonase and Cyanase and show that seqOutBias can scale DNA digestion data resulting from Benzonase and Cyanase treatment (Grøntved et al., 2012).

Retrieving TACh-seq data from mouse liver

mkdir ~/TACh_Grontved
cd ~/TACh_Grontved 
url=ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX174/ 
wget ${url}SRX174756/SRR535737/SRR535737.sra
wget ${url}SRX174757/SRR535738/SRR535738.sra
wget ${url}SRX174757/SRR535739/SRR535739.sra
wget ${url}SRX174758/SRR535740/SRR535740.sra
wget ${url}SRX174761/SRR535744/SRR535744.sra
wget ${url}SRX174760/SRR535742/SRR535742.sra
wget ${url}SRX174760/SRR535743/SRR535743.sra
wget ${url}SRX174759/SRR535741/SRR535741.sra
wget ${url}SRX174755/SRR535735/SRR535735.sra
wget ${url}SRX174755/SRR535736/SRR535736.sra

for i in *sra
do
    fastq-dump $i
done

mv SRR535737.fastq mm10_liver_Benzonase0.25U.fastq
mv SRR535738.fastq mm10_liver_Benzonase1U_1.fastq
mv SRR535739.fastq mm10_liver_Benzonase1U_2.fastq
mv SRR535740.fastq mm10_liver_Benzonase4U.fastq
mv SRR535741.fastq mm10_liver_Cyanase0.25U.fastq
mv SRR535742.fastq mm10_liver_Cyanase1U_1.fastq
mv SRR535743.fastq mm10_liver_Cyanase1U_2.fastq
mv SRR535744.fastq mm10_liver_Cyanase4U.fastq
mv SRR535735.fastq DNaseI_a.fastq 
mv SRR535736.fastq DNaseI_b.fastq
cat *Benz* > mm10_liver_Benzonase.fastq 
cat *Cyan* > mm10_liver_Cyanase.fastq
cat DNaseI_*.fastq > mm10_liver_DNase.fastq 
gzip *ase.fastq
rm *fastq rm *sra

Index the mm10 genome file and align to mm10

Retrieve the compressed mm10 genome from UCSC (Karolchik et al., 2014). This is a 2bit compressed file and needs to be converted to a fasta using twoBitToFa from http://hgdownload.soe.ucsc.edu/admin/exe/.

wget http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/mm10.2bit
twoBitToFa mm10.2bit mm10.fa
bowtie2-build mm10.fa mm10

for fq in *.fastq.gz
do
  name=$(echo $fq | awk -F".fastq.gz" '{print $1}')
  echo $name
  bowtie2 -x mm10 -U $fq -S $name.sam
  samtools view -b $name.sam | samtools sort - $name
  rm $name.sam
done

Using seqOutBias to determine the sequence preference for Cyanase and Benzonase

for bam in mm10_liver*.bam
do
    name=$(echo $bam | awk -F"mm10_liver_" '{print $NF}' | awk -F".bam" '{print $1}')
    echo $name
    seqOutBias mm10.fa $bam --no-scale --bw=${name}_0-mer.bigWig --shift-counts --skip-bed --read-size=35 
    seqOutBias mm10.fa $bam --kmer-mask=NNNCNNN --bw=${name}_6-mer.bigWig --shift-counts --read-size=35 
    seqOutBias mm10.fa $bam --kmer-mask=NNNNCNNNN --bw=${name}_8-mer.bigWig --shift-counts --read-size=35
done

mv DNase_0-mer.bigWig DNase_mouse_0-mer.bigWig 
mv DNase_6-mer.bigWig DNase_mouse_6-mer.bigWig 
mv DNase_8-mer.bigWig DNase_mouse_8-mer.bigWig

mkdir dnase
mkdir benzonase
mkdir cyanase
mv Benzonase*bigWig benzonase 
mv Cyanase*bigWig cyanase
mv DNase*bigWig dnase

Retrieving ChIP-seq binding and sequence motif data for mouse liver

To look at composite footprints that are the result of transcription factors binding to DNA in the context of chromatin, we need to first find all the regions bound by the factor. We will retrieve TF binding data from several sources (Seo et al., 2009; Stamatoyannopoulos et al., 2012; Grøntved et al., 2013). Then we convert the peak files to the latest genome assembly.

#CTCF
wget https://www.encodeproject.org/files/ENCFF001YAM/@@download/ENCFF001YAM.bed.gz
mv ENCFF001YAM.bed.gz CTCF.mm9.bed.gz
#FOXA2
url=ftp://ftp.ncbi.nlm.nih.gov/geo/series/
wget ${url}GSE25nnn/GSE25836/suppl/GSE25836_Mouse_Liver_FOXA2_GLITR_1p5_FDR.bed.gz
mv GSE25836_Mouse_Liver_FOXA2_GLITR_1p5_FDR.bed.gz FOXA2.mm8.bed.gz
#CEBP-beta
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE46nnn/GSE46047/suppl/GSE46047%5FCEBPb%5Fpeaks%5Fveh%5Fmouse%5Fliver%5Fmm9%2Etxt%2Egz 
mv GSE46047_CEBPb_peaks_veh_mouse_liver_mm9.txt.gz CEBP-beta_temp.mm9.bed.gz

wget http://hgdownload.cse.ucsc.edu/goldenPath/mm9/liftOver/mm9ToMm10.over.chain.gz 
wget http://hgdownload.cse.ucsc.edu/goldenPath/mm8/liftOver/mm8ToMm10.over.chain.gz

gunzip *.over.chain.gz
gunzip *mm*bed.gz
tail +2 CEBP-beta_temp.mm9.bed > CEBP-beta.mm9.bed 
rm CEBP-beta_temp.mm9.bed

for peak in *mm8.bed
do
    name=$(echo $peak | awk -F".mm8" '{print $1}')
    echo $name
    liftOver $peak mm8ToMm10.over.chain $name.mm10.bed $name.mm10.unmapped.txt -bedPlus=6 
    fastaFromBed -fi mm10.fa -bed $name.mm10.bed -fo $name.mm10.fasta
done

for peak in *mm9.bed
do
    name=$(echo $peak | awk -F".mm9" '{print $1}')
    echo $name
    liftOver $peak mm9ToMm10.over.chain $name.mm10.bed $name.mm10.unmapped.txt -bedPlus=6 
    fastaFromBed -fi mm10.fa -bed $name.mm10.bed -fo $name.mm10.fasta
done

head -9 ~/DNase_ENCODE/motif_databases/JASPAR/JASPAR_CORE_2016_vertebrates.meme > header_meme_temp.txt

grep -i -A 16 'MOTIF MA0047.2 FOXA2' ~/DNase_ENCODE/motif_databases/JASPAR/JASPAR_CORE_2016.meme > foxa2_temp.txt 
grep -i -A 23 'MOTIF MA0139.1 CTCF' ~/DNase_ENCODE/motif_databases/JASPAR/JASPAR_CORE_2016.meme > ctcf_temp.txt 
grep -i -A 15 'MOTIF MA0466.2 CEBPB' ~/DNase_ENCODE/motif_databases/JASPAR/JASPAR_CORE_2016.meme > cebpb_temp.txt 
cat header_meme_temp.txt foxa2_temp.txt > FOXA2_minimal_meme.txt
cat header_meme_temp.txt ctcf_temp.txt > CTCF_minimal_meme.txt
cat header_meme_temp.txt cebpb_temp.txt > CEBP-beta_minimal_meme.txt rm *temp.txt

for meme in *.mm10.fasta
do
    name=$(echo $meme | awk -F".mm10.fasta" '{print $1}')
    echo $name
    mast ${name}_minimal_meme.txt $meme -hit_list -mt 0.0001 > ${name}_mast.txt 
    fimo --thresh 0.0001 --text ${name}_minimal_meme.txt mm10.fa > ${name}_fimo.txt ceqlogo -i1 ${name}_minimal_meme.txt -o ${name}_logo.eps -N -Y
done

Plotting Benzonase and Cyanase composites using R

source('https://raw.githubusercontent.com/guertinlab/seqOutBias/master/docs/R/seqOutBias_functions.R')

#note that the full path is needed to the directory containing the bigWigs
all.composites.cyanase = cycle.fimo.new.not.hotspots(path.dir.mast = '~/TACh_Grontved/', path.dir.bigWig = '/Users/guertinlab/TACh_Grontved/cyanase/', window = 30, exp = 'Cyanase')
all.composites.benzonase = cycle.fimo.new.not.hotspots(path.dir.mast = '~/TACh_Grontved/', path.dir.bigWig = '/Users/guertinlab/TACh_Grontved/benzonase/', window = 30, exp = 'Benzonase')
all.composites.dnase = cycle.fimo.new.not.hotspots(path.dir.mast = '~/TACh_Grontved/', path.dir.bigWig = '/Users/guertinlab/TACh_Grontved/dnase/', window = 30, exp = 'DNase_mm10')

save(all.composites.cyanase, all.composites.benzonase, all.composites.dnase, file = 'MOUSE_composites.Rdata')

composites.func.panels.naked.chromatin(all.composites.benzonase[all.composites.benzonase$cond == 'Benzonase_0-mer' | all.composites.benzonase$cond == 'Benzonase_8-mer',], fact= "Benzonase8", summit= "Motif",num = 24,
    col.lines = c(rgb(0,0,1,1/2), rgb(0,0,0,1/2)), fill.poly = c(rgb(0,0,1,1/4), rgb(0,0,0,1/4)))

composites.func.panels.naked.chromatin(all.composites.cyanase[all.composites.cyanase$cond == 'Cyanase_0-mer' | all.composites.cyanase$cond == 'Cyanase_8-mer',], fact= "Cyanase8", summit= "Motif",num = 24,
    col.lines = c(rgb(0,0,1,1/2), rgb(0,0,0,1/2)), fill.poly = c(rgb(0,0,1,1/4), rgb(0,0,0,1/4)))

composites.func.panels.naked.chromatin(all.composites.dnase[all.composites.dnase$cond == 'DNase_mouse_0-mer' | all.composites.dnase$cond == 'DNase_mouse_6-mer',], fact= "Dnase6", summit= "Motif",num = 24,
    col.lines = c(rgb(0,0,1,1/2), rgb(0,0,0,1/2)), fill.poly = c(rgb(0,0,1,1/4), rgb(0,0,0,1/4)))