R - Break region into smaller regions based on cutoff
This, I assume, is a simple programming issue, but I've been struggling with it. Mostly because I don't know the right words to use, perhaps?
Given a set of "ranges" (in the form of 1- a set of numbers as below, 2- IRanges, or 3- GenomicRanges), I'd like to split it into a set of smaller ranges.
Example beginning:
    chr start   end
      1     1 10000
      2     1  5000
Example size of breaks: 2000
New dataset:
    chr start   end
      1     1  2000
      1  2001  4000
      1  4001  6000
      1  6001  8000
      1  8001 10000
      2     1  2000
      2  2001  4000
      2  4001  5000
I'm doing this in R. I know how to generate these with seq, but I'd like to be able to do it based on a list/df of regions, instead of having to do it manually every time I have a new list of regions.
Here's an example I've made using seq:
Given 22 chromosomes, loop through them and break each into pieces:
    # Initialize the data frame that collects the regions
    regions <- data.frame(chromosome = c(), start = c(), end = c())
    # For each row, do the following (assumes one row per chromosome, numbered 1..n)
    for (i in seq_len(nrow(chromosomes))) {
      # Create a sequence from the minimum start to the maximum end value
      breks <- seq(min(chromosomes$start[chromosomes$chromosome == i]),
                   max(chromosomes$end[chromosomes$chromosome == i]),
                   by = 2000000)
      # Put it in a data frame; breks[-1] (rather than breks[2:length(breks)])
      # also works when a chromosome yields a single break
      database <- data.frame(chromosome = i,
                             start = breks,
                             end = c(breks[-1] - 1,
                                     max(chromosomes$end[chromosomes$chromosome == i])))
      # Bind it to what we already have
      regions <- rbind(regions, database)
      rm(database)
    }
This works fine, but I'm wondering if there is something built into a package that does this in a one-liner, or something more flexible, as this approach has its limitations.
Using the R / Bioconductor package GenomicRanges, here are your initial ranges
    library(GenomicRanges)
    rngs = GRanges(1:2, IRanges(1, c(10000, 5000)))
and create a sliding window across the genome, generated first as a list (one set of tiles per chromosome) and then unlisted to the format you have in the question
    > windows = slidingWindows(rngs, width=2000, step=2000)
    > unlist(windows)
    GRanges object with 8 ranges and 0 metadata columns:
          seqnames        ranges strand
             <Rle>     <IRanges>  <Rle>
      [1]        1 [   1,  2000]      *
      [2]        1 [2001,  4000]      *
      [3]        1 [4001,  6000]      *
      [4]        1 [6001,  8000]      *
      [5]        1 [8001, 10000]      *
      [6]        2 [   1,  2000]      *
      [7]        2 [2001,  4000]      *
      [8]        2 [4001,  5000]      *
      -------
      seqinfo: 2 sequences from an unspecified genome; no seqlengths
Coerce to / from a data.frame with as(df, "GRanges") or as(unlist(windows), "data.frame").
Find help at ?"slidingWindows,GenomicRanges-method" (tab completion is your friend, ?"slidingW<tab>).
Embarrassingly, this seems to be implemented only in the 'devel' version of GenomicRanges (v. 1.25.93?); tile is similar, but rounds the width of the ranges to be approximately equal while spanning the width of the GRanges. Here is a poor-man's version
    windows <- function(gr, width, withMcols=FALSE) {
        # window starts for each range; use the argument gr, not the global rngs
        starts <- Map(seq, start(gr), end(gr), by=width)
        # each window ends one base before the next start; the last ends at the range end
        ends <- Map(function(starts, len) c(tail(starts, -1) - 1L, len),
                    starts, end(gr))
        seq <- rep(seqnames(gr), lengths(starts))
        strand <- rep(strand(gr), lengths(starts))
        result <- GRanges(seq, IRanges(unlist(starts), unlist(ends)), strand)
        seqinfo(result) <- seqinfo(gr)
        if (withMcols) {
            # length(gr), since nrow() is NULL for a GRanges
            idx <- rep(seq_len(length(gr)), lengths(starts))
            mcols(result) = mcols(gr)[idx,,drop=FALSE]
        }
        result
    }
invoked as

    > windows(rngs, 2000)
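If the regions live in a plain data frame and installing Bioconductor is not an option, the same start/end arithmetic can be sketched in base R. The helper name split_regions and the column names chr, start and end are assumptions made for illustration; they mirror the example table in the question and are not part of any package.

```r
# Hypothetical helper (not from any package): split each row of a data frame
# with columns chr, start, end into windows of at most `width` positions.
split_regions <- function(df, width) {
  # One vector of window starts per region
  starts <- Map(seq, df$start, df$end, by = width)
  # Each window ends one position before the next start;
  # the last window ends at the end of its region
  ends <- Map(function(s, e) c(s[-1] - 1, e), starts, df$end)
  data.frame(
    chr   = rep(df$chr, lengths(starts)),
    start = unlist(starts),
    end   = unlist(ends)
  )
}

# The two regions from the question, broken into pieces of 2000
regions <- data.frame(chr = c(1, 2), start = c(1, 1), end = c(10000, 5000))
result <- split_regions(regions, 2000)
print(result)
```

Like the poor-man's version, this leaves the last window of each region shorter when the region length is not a multiple of the width.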
If the approach is useful, consider asking follow-up questions on the Bioconductor support site.