ggplot2 - R ggplot coincidence plot
I'm working on a database of patients with multiple conditions, and I'm trying to create a graphic showing the associations between these conditions. More specifically, I'd like to obtain something like the plot below:
My data are organized as:
```
mal1 mal2 mal3 etc.
0    0    1
1    1    0
0    1    0
etc.
```
I create the data I want shown using the following code:
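```r
x <- as.matrix(hdat2)   # 0/1 matrix: one row per patient, one column per condition
out <- crossprod(x)     # condition-by-condition co-occurrence counts
diag(out) <- 0          # drop self-pairs
```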
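To see why `crossprod()` gives the co-occurrence counts: for a 0/1 patient-by-condition matrix `x`, `crossprod(x)` computes `t(x) %*% x`, so entry `[i, j]` is the number of patients who have both condition `i` and condition `j`, and the diagonal holds each condition's total. A minimal sketch, using the three example rows from the data layout above as toy data (the matrix `x` here is illustrative, not `hdat2`):

```r
# Toy 0/1 data: the three example patients shown above (rows) x three conditions (columns)
x <- matrix(c(0, 0, 1,
              1, 1, 0,
              0, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("mal1", "mal2", "mal3")))

counts <- crossprod(x)   # same as t(x) %*% x
totals <- diag(counts)   # patients per condition: mal1 = 1, mal2 = 2, mal3 = 1
out <- counts
diag(out) <- 0           # drop self-pairs so only co-occurrences remain

# Only the second patient has both mal1 and mal2, so that pair's count is 1
out
```

The matrix is symmetric by construction (`out[i, j] == out[j, i]`), which is why a dot plot of every cell shows each pair twice.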
and create the plot with:
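```r
library(reshape2)  # melt()
library(ggplot2)

out <- melt(out)                          # long format: Var1, Var2, value
out$value[which(out$value == 0)] <- NA    # hide zero counts (including the diagonal)
g <- ggplot(data.frame(out), aes(Var1, Var2)) +
  geom_point(aes(size = value), colour = "black") +
  theme_bw() + xlab("") + ylab("")
g + scale_size_continuous(range = c(2, 10))
```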
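As a result, I obtain a plot in which every pair of conditions appears twice, once on each side of the diagonal.
I'd like to hide the symmetric half of the plot, since I think it is misleading (similar to how correlation matrix plots can hide the symmetric half). However, I'm not sure how to do it.
Could you help? Thanks.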
First, some reproducible data:
```r
# 100 hypothetical patients; 1 = has the malady
mat <- data.frame(
  mala = sample(0:1, 100, TRUE, c(0.2, 0.8)),
  malb = sample(0:1, 100, TRUE, c(0.3, 0.7)),
  malc = sample(0:1, 100, TRUE, c(0.4, 0.6)),
  mald = sample(0:1, 100, TRUE, c(0.5, 0.5))
)
out <- crossprod(as.matrix(mat))
diag(out) <- 0
```
Here is an example of limiting the data down to the half you are interested in, using dplyr:
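```r
library(dplyr)
library(reshape2)
library(ggplot2)

toplothalf <- melt(out) %>%
  # make both condition columns factors (across() replaces the older mutate_each(funs(...)))
  mutate(across(starts_with("Var"), factor)) %>%
  # keep one triangle: pairs whose first condition's level comes before the second's
  filter(as.numeric(Var1) < as.numeric(Var2))

ggplot(toplothalf, aes(Var1, Var2)) +
  geom_point(aes(size = value), colour = "black") +
  theme_bw() + xlab("") + ylab("") +
  scale_size_continuous(range = c(2, 10))
```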
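An alternative in base R is to blank out one triangle of the matrix itself before reshaping. In this sketch `lower.tri(..., diag = TRUE)` marks the diagonal and every entry below it, which matches the strict `Var1 < Var2` filter above. It uses `as.data.frame(as.table(...))` in place of `melt()` so it needs no extra packages; the result has the same `Var1`, `Var2` and `value` columns that the ggplot call above expects. The `set.seed()` value is arbitrary and only makes the toy data repeatable.

```r
set.seed(42)  # arbitrary seed, only so the toy data are repeatable
mat <- data.frame(
  mala = sample(0:1, 100, TRUE, c(0.2, 0.8)),
  malb = sample(0:1, 100, TRUE, c(0.3, 0.7)),
  malc = sample(0:1, 100, TRUE, c(0.4, 0.6)),
  mald = sample(0:1, 100, TRUE, c(0.5, 0.5))
)
out <- crossprod(as.matrix(mat))
diag(out) <- 0

half <- out
half[lower.tri(half, diag = TRUE)] <- NA   # blank the diagonal and lower triangle

# Long format: Var1 = row condition, Var2 = column condition, value = count
toplothalf <- as.data.frame(as.table(half), responseName = "value")
toplothalf <- toplothalf[!is.na(toplothalf$value), ]   # keep only the upper triangle
toplothalf
```

With four conditions this leaves the 4 * 3 / 2 = 6 distinct pairs.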
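Note, however, that this way the plot is going to be dominated by the particular maladies that are most common. Alternatively, you can scale each row by its total, so that each point shows the share of that malady's co-occurrences that involve the other malady (note that reciprocal points are not (necessarily) the same size):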
```r
toplot <- prop.table(out, 1) %>%   # divide each row by its row total
  melt() %>%
  filter(value > 0)                # drop the zeroed diagonal

ggplot(toplot, aes(Var1, Var2)) +
  geom_point(aes(size = value), colour = "black") +
  theme_bw() + xlab("") + ylab("") +
  scale_size_continuous(range = c(2, 10))
```
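Because the diagonal was set to zero, the row total that `prop.table(out, 1)` divides by is the number of co-occurrences involving that malady, not the number of patients who have it. If you instead want the proportion of patients with malady `Var1` who also have malady `Var2`, one sketch is to divide each row by the per-malady patient counts, `colSums(mat)`, using `sweep()`. This is a variation on the approach above, not a drop-in equivalent; as before, base R `as.table()` stands in for `melt()` and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the toy data are repeatable
mat <- data.frame(
  mala = sample(0:1, 100, TRUE, c(0.2, 0.8)),
  malb = sample(0:1, 100, TRUE, c(0.3, 0.7)),
  malc = sample(0:1, 100, TRUE, c(0.4, 0.6)),
  mald = sample(0:1, 100, TRUE, c(0.5, 0.5))
)
out <- crossprod(as.matrix(mat))
diag(out) <- 0

n <- colSums(mat)              # number of patients with each malady
cond <- sweep(out, 1, n, "/")  # row i divided by n[i]: P(has malady j | has malady i)

toplot <- as.data.frame(as.table(cond), responseName = "value")
toplot <- toplot[toplot$value > 0, ]   # drop the zeroed diagonal
toplot
```

Each pair count is divided by a different total depending on direction, so `cond` is generally not symmetric; the full matrix is worth plotting here rather than one triangle.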