Reading in Complicated CSV to R


I am trying to read the following .csv file into R. As you can see in the image below, row 2 has unique variable names, while row 3 has the values for the variables above it. Rows 2/3 together represent one observation. The pattern continues: row 4 is a line of variable names, and row 5 holds the corresponding variable values. Each two-row pair (2/3, 4/5, 6/7, ..., 999/1000) represents one observation. There are 1,000 observations in total in the data set.

What I am having trouble with is reading this into R so that I have a more usable dataset. My goal is to have a standard set of variable names across the top row, with each subsequent line representing one observation.

Do any expert R coders have suggestions?

Thank you,

csv image

Here is a solution that worked on a simple test case I made. You'd need to import your data into a data.frame, x = read.csv(file="your-file.csv")
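As a rough sketch of that import step: if the file has no separate header line and starts directly with a row of variable names, reading it with header = FALSE keeps that first row as data. The temporary file and its contents below are made up purely for illustration, and colClasses = "character" is an assumption on my part so that names and values both stay plain strings.

```r
# Hypothetical two-row-pair file, written to a temporary path for illustration
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "h,i,a", "7,8,9"), path)

# header = FALSE keeps the first line as data (columns are named V1, V2, ...);
# colClasses = "character" reads every column as plain strings
x <- read.csv(path, header = FALSE, colClasses = "character")
print(x)
```

If the real file does have an extra line above the first name/value pair, the default header = TRUE would consume that line instead, and the odd/even row logic stays the same.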

To test it, though, I used a test data.frame, x:

x = structure(list(
    v1 = structure(c(2L, 1L, 4L, 3L), .Label = c("1", "a", "ab", "h"), class = "factor"),
    v2 = structure(c(2L, 1L, 4L, 3L), .Label = c("2", "b", "cd", "i"), class = "factor"),
    v3 = structure(c(3L, 1L, 2L, 4L), .Label = c("3", "a", "c", "ef"), class = "factor"),
    v4 = structure(c(3L, 1L, 2L, 4L), .Label = c("4", "b", "d", "gh"), class = "factor"),
    v5 = structure(c(3L, 1L, 2L, 4L), .Label = c("5", "c", "e", "ij"), class = "factor"),
    v6 = structure(c(3L, 1L, 2L, 4L), .Label = c("6", "d", "f", "kl"), class = "factor"),
    v7 = structure(c(3L, 1L, 2L, 4L), .Label = c("7", "e", "g", "mno"), class = "factor")),
    .Names = c("v1", "v2", "v3", "v4", "v5", "v6", "v7"),
    class = "data.frame", row.names = c(NA, -4L))

Which turns into this table (rows 1 and 3 are the labels):

  v1 v2 v3 v4 v5 v6  v7
1  a  b  c  d  e  f   g
2  1  2  3  4  5  6   7
3  h  i  a  b  c  d   e
4 ab cd ef gh ij kl mno

Using this script to generate the final data.frame dat:

library(plyr)

variables = x[seq(1, nrow(x), 2), ]  # df of the variable-name rows (1, 3, ...)
values = x[seq(2, nrow(x), 2), ]     # df of the value rows (2, 4, ...)
dat = data.frame()                   # generate a blank data.frame

for (i in 1:nrow(variables)) {
    dat.temp = data.frame(values[i, ])              # make a temporary df from the row of values
    colnames(dat.temp) = as.matrix(variables[i, ])  # name the temporary df with the row of variables
    print(dat.temp)                                 # check it is coming out right (comment out if not necessary)
    dat = rbind.fill(dat, dat.temp)                 # append to the final data.frame, filling missing columns with NA
    rm(dat.temp)                                    # remove the temporary df
}

Turns it into this final table (the variables are the column names now):

   a  b  c  d   e    f    g    h    i
1  1  2  3  4   5    6    7 <NA> <NA>
2 ef gh ij kl mno <NA> <NA>   ab   cd

Hope that works.
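With 1,000 observations, growing a data.frame inside a loop can get slow, so here is a loop-free variant in base R as a rough sketch, not the approach above. It assumes the columns were read as character strings, that every name cell is filled, and that no name repeats within a single name row; the three-column x is a cut-down copy of the test data.

```r
# Cut-down test input (assumed to be character columns, not factors)
x <- data.frame(
  V1 = c("a", "1", "h", "ab"),
  V2 = c("b", "2", "i", "cd"),
  V3 = c("c", "3", "a", "ef"),
  stringsAsFactors = FALSE
)

name_rows  <- as.matrix(x[seq(1, nrow(x), by = 2), ])  # odd rows: variable names
value_rows <- as.matrix(x[seq(2, nrow(x), by = 2), ])  # even rows: values

# Every distinct variable name, in order of first appearance
all_names <- unique(as.vector(t(name_rows)))

# All-NA character matrix: one row per observation, one column per name
out <- matrix(NA_character_, nrow = nrow(name_rows), ncol = length(all_names),
              dimnames = list(NULL, all_names))

# Two-column (row, column) index matrix places every value in one assignment
idx <- cbind(as.vector(row(name_rows)), match(as.vector(name_rows), all_names))
out[idx] <- as.vector(value_rows)

dat <- as.data.frame(out, stringsAsFactors = FALSE)
print(dat)
```

Every column comes back as character, so numeric variables would still need converting afterwards, for example with type.convert().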

