Reading in Complicated CSV to R
I am trying to read the following .csv file into R. As you can see in the image below, row 2 has unique variable names, while row 3 has the values for the variables above. Rows 2/3 represent 1 observation. This process continues, so row 4 is a line of variable names, and row 5 corresponds to those variable values. The process continues so that each two-row pair (2/3, 4/5, 6/7, ..., 999/1000) represents 1 observation. There are 1,000 observations in total in the data set.
What I am having trouble with is reading this into R so that I have a more usable dataset. My goal is to have a standard set of variable names across the top row, with each subsequent line representing 1 observation.
Do any expert R coders have suggestions?
Thank you,
Here is a solution that worked on a simple test case I made. You'd need to import your data into a data.frame, x = read.csv(file="your-file.csv")
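As a rough illustration of that import step, here is a minimal sketch that reads a small inline CSV instead of a real file. The contents of csv_text are made up for the example, and it assumes the first line of the file is a header line that read.csv can use for column names, with the name/value pairs starting on the line after it. The original file's exact layout is not shown, so treat this as an assumption.

```r
# Hypothetical stand-in for "your-file.csv": a header line followed by
# alternating rows of variable names and variable values.
csv_text <- "v1,v2,v3
a,b,c
1,2,3
c,d,a
4,5,6"

# stringsAsFactors = FALSE keeps every cell as plain text, which makes the
# name rows easy to use as column names later on.
x_demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Odd rows hold names and even rows hold values, so the row count should be even.
stopifnot(nrow(x_demo) %% 2 == 0)
print(x_demo)
```

Because names and values share the same columns, every column comes in as character here; any numeric conversion would have to happen after the reshaping.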
To test it, though, I used a test data.frame, x:
x = structure(list(
  v1 = structure(c(2L, 1L, 4L, 3L), .Label = c("1", "a", "ab", "h"), class = "factor"),
  v2 = structure(c(2L, 1L, 4L, 3L), .Label = c("2", "b", "cd", "i"), class = "factor"),
  v3 = structure(c(3L, 1L, 2L, 4L), .Label = c("3", "a", "c", "ef"), class = "factor"),
  v4 = structure(c(3L, 1L, 2L, 4L), .Label = c("4", "b", "d", "gh"), class = "factor"),
  v5 = structure(c(3L, 1L, 2L, 4L), .Label = c("5", "c", "e", "ij"), class = "factor"),
  v6 = structure(c(3L, 1L, 2L, 4L), .Label = c("6", "d", "f", "kl"), class = "factor"),
  v7 = structure(c(3L, 1L, 2L, 4L), .Label = c("7", "e", "g", "mno"), class = "factor")),
  .Names = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
  class = "data.frame", row.names = c(NA, -4L))
Which turns into this table (rows 1 and 3 are the labels):
  v1 v2 v3 v4 v5 v6  v7
1  a  b  c  d  e  f   g
2  1  2  3  4  5  6   7
3  h  i  a  b  c  d   e
4 ab cd ef gh ij kl mno
Using this script to generate the final data.frame dat:
library(plyr)
variables = x[seq(1, nrow(x), 2), ]  # df of the variable-name rows
values = x[seq(2, nrow(x), 2), ]     # df of the value rows
dat = data.frame()                   # generate a blank data.frame
for (i in 1:nrow(variables)) {
  dat.temp = data.frame(values[i, ])              # make a temporary df from one row of values
  colnames(dat.temp) = as.matrix(variables[i, ])  # name the temporary df with the matching row of variables
  print(dat.temp)                                 # check that it is coming out right (comment out if not necessary)
  dat = rbind.fill(dat, dat.temp)                 # append to the final data.frame
  rm(dat.temp)                                    # remove the temporary df
}
Into the final table (the variables are now the column names):
   a  b  c  d   e    f    g    h    i
1  1  2  3  4   5    6    7 <NA> <NA>
2 ef gh ij kl mno <NA> <NA>   ab   cd
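If you would rather not depend on plyr, the same reshaping can probably be done in base R. The following is only a sketch under a few assumptions: the columns are plain character rather than factors, and variable names are unique within each name/value pair. The names x_chr, all_names and dat2 are made up for this example; the data is the same test data as above.

```r
# Same test data as above, stored as character columns instead of factors.
x_chr <- data.frame(
  v1 = c("a", "1", "h", "ab"),
  v2 = c("b", "2", "i", "cd"),
  v3 = c("c", "3", "a", "ef"),
  v4 = c("d", "4", "b", "gh"),
  v5 = c("e", "5", "c", "ij"),
  v6 = c("f", "6", "d", "kl"),
  v7 = c("g", "7", "e", "mno"),
  stringsAsFactors = FALSE
)

name_rows  <- x_chr[seq(1, nrow(x_chr), by = 2), ]  # odd rows: variable names
value_rows <- x_chr[seq(2, nrow(x_chr), by = 2), ]  # even rows: values

# Every variable name that occurs anywhere, in order of first appearance.
all_names <- unique(as.vector(t(as.matrix(name_rows))))

# For each observation, build a named vector of its values and look up every
# known variable; variables the observation does not have come back as NA.
rows <- lapply(seq_len(nrow(name_rows)), function(i) {
  obs <- setNames(unlist(value_rows[i, ], use.names = FALSE),
                  unlist(name_rows[i, ], use.names = FALSE))
  unname(obs[all_names])
})

# Stack the observations into one data.frame with a single set of column names.
dat2 <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(dat2) <- all_names
print(dat2)
```

This builds all rows first and binds them once, which should scale better than growing a data.frame inside a loop when there are many observations. Every column ends up as character, so numeric variables would still need converting afterwards, for example with type.convert(dat2, as.is = TRUE).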
Hope this works.