Detection of large chromosomal inversions with chromsome conformation capture sequencing (Hi-C) data — a usage example
-
Load the package
data.table
. This code has been tested with version 1.9.6 and 1.10.4 ofdata.table
. More information ondata.table
can be found here.library(data.table)
-
Read the R object holding the Hi-C link information for Morex and Barke from a file. This file can be downloaded from the Plant Genomics and Phenomics Research Data Repository.
readRDS('hic_links_morex_barke.Rds') -> hic_links_morex_barke
-
Read the pseudomolecules lengths of the reference genome sequence assembly of barley cultivar Morex. This file is available in a public Bitbucket repository.
fread('barley_chromosome_lengths.txt') -> chromosome_lengths
-
Read the function
findInversions()
from a file. This file is available in a public Bitbucket repository.source('findInversions.R')
-
Run the inversion detection by calling
findInversions()
.morex_barke_inversions <- findInversions(links=hic_links_morex_barke, genotypes=c("Morex", "Barke"), chromosome_lengths=chromosome_lengths, maxdist=1e8, multiplier=1e8, cab=c(1,2), rsquare=0.4, coefficient=-2, dcab=1.8)
findInversions()
takes the following parameters:- links
-
A data.table in long format with the columns as in the example
hic_links_morex_barke
. - genotypes
-
A character vector with exactly two elements listing the genotypes to compare. The genotypes should occurs in
genotype
column oflinks
. - chromosome_lengths
-
A two-column data.table indicating the length (in bp) of each chromosome.
- maxdist
-
Hi-C links between genomic bins separated by more than
maxdist
bp will be ignored. - multiplier
-
Effectively a depth threshold: a multiplier of
1e8
(100 million) means that pairs of genomic bins with less than 1 link per 100 total million links will be discarded. - cab
-
Numeric vector with two elements indicating the
CAB
values of genotypes 1 and 2 at boundaries of candidate inversions as described in Himmelbach et al. 2017. Smaller values and of the first element or higher values of the second element increase stringency. - rsquare
-
Minimum adjusted
r2
value of the linear model described in Himmelbach et al., 2017. Higher values will result in more stringent filtering. - coefficient
-
Maximum coefficient of the linear model described in Himmelbach et al., 2017. Smaller values impose a more stringent filter.
- dcab
-
Minimum absolute difference between
CAB
of genotypes 1 and 2 at boundaries of inversions. Higher values increase stringency.
-
Plot the directionality biases along the genome and highlight putative inversions.
pdf("Inversions_between_Morex_and_Barke.pdf", height=10, width=7) par(mfrow=c(7,1)) par(oma=c(4,0,0,0)) par(mar=c(0,4,5,1)) ylim <- 300 with(morex_barke_inversions, lapply(1:7, function(i){ directionality_bias[chr == i, plot(bin/1e6, directionality_bias, pch=20, las=1, col=1, ylim=c(-ylim, ylim), bty='l', ylab="D", xlab="")] if(nrow(putative_inversions[chr == i]) > 0) putative_inversions[chr == i, rect(col="#FF000011", border='red', start/1e6, -2*ylim, end/1e6, 2*ylim)] title(paste0(i, "H"), line=1.4) })) title(outer=T, xlab="genomic position (Mb)", xpd=NA) dev.off()