I am working with a large dataset, bigger than I am used to (286,212 rows, 19 columns), and I am not sure how to approach my problem. The data consists of a value for each day of the year for 782 grid references, over 15 years. It looks as follows:
Month  Day  Grid   x2004   x2005   x2006   x2007
1      1    A10    0.091   0.134   NA      0.066
1      2    A10    0.12    0.10    0.23    0.054
1      3    A10    0.55    NA      NA      0.08
1      1    B10    NA      0.134   NA      0.17
1      2    B10    0.14    0.151   NA      0.21
1      3    B10    0.43    0.162   0.24    NA

However, some of the days are missing, and I want to fill each missing value with the mean for that day and that specific grid, using the values from the other years. So if Grid A10 for day 1 in 2006 is missing, I want to insert the mean for day 1, Grid A10, from 2004, 2005, and 2007, in this case 0.097.

What I have tried:

I am trying the following code
x<-for(i in 1:ncol(data)){
  data[is.na(data[,i]) ,i] <- mean(data[,i], na.rm = TRUE)
}

but it seems to be finding the column mean, I think, and inserting that. I have also tried changing it to
x<-for(i in 1:nrow(data)){
  data[is.na(data[i,]) ,i] <- mean(data[i,], na.rm = TRUE)
}

and that didn't work either. I have already asked on Stack Overflow but have not got a solution yet. I am not a computer programmer, and this is the last bit of coding I need to do for the stats analysis for my PhD, so I am quite desperate to figure it out, but I am just not sure how to go about it. I know this forum is for other programming languages, but please help if you can.
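
For reference, the row-wise fill described above (the mean of the same day and grid across the other years) could be sketched in R roughly as follows. This is only a minimal sketch under two assumptions that are not confirmed by the question: that each row is a unique Month/Day/Grid combination, as in the sample, and that the year columns are exactly the ones named "x" followed by four digits. The name `year_cols` and the toy `data` frame are illustrative only.

```r
# Toy copy of the sample rows from the question (the real data has 15 year columns).
data <- data.frame(
  Month = c(1, 1, 1, 1, 1, 1),
  Day   = c(1, 2, 3, 1, 2, 3),
  Grid  = c("A10", "A10", "A10", "B10", "B10", "B10"),
  x2004 = c(0.091, 0.12, 0.55, NA, 0.14, 0.43),
  x2005 = c(0.134, 0.10, NA, 0.134, 0.151, 0.162),
  x2006 = c(NA, 0.23, NA, NA, NA, 0.24),
  x2007 = c(0.066, 0.054, 0.08, 0.17, 0.21, NA)
)

# Assumption: year columns are those named "x" followed by four digits.
year_cols <- grep("^x[0-9]{4}$", names(data), value = TRUE)

# Work on a numeric matrix so the row-wise operations are vectorised.
vals <- as.matrix(data[, year_cols])

# Mean of each row (one Month/Day/Grid combination) over the years that have a value.
row_means <- rowMeans(vals, na.rm = TRUE)

# Locate the missing cells as (row, col) pairs and fill each with its own row's mean.
missing <- which(is.na(vals), arr.ind = TRUE)
vals[missing] <- row_means[missing[, "row"]]

# Write the filled values back into the data frame.
data[, year_cols] <- vals
```

With the sample rows, the missing 2006 value for day 1, Grid A10 becomes the mean of 0.091, 0.134 and 0.066. A row with no values in any year would end up as NaN rather than a number, so such rows would need separate handling.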

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


