Thursday, September 30, 2010

Eval, Parse, and a little magic in R

Magic in the sense it really saved me loads of troubles.

So following my last post, where I could parse a model specification into a list of useful things. Now let's use the parse in a MLE function.

So parsepara() returns a list, for example:

$input
[1] "145, 8, 145, ?6/7/3, ?0.7|0.3|0.9, ?100|150|60, ?0.05|0.15|0.01"

$paralist
     145        8      145                                  
   "145"      "8"    "145" "par[1]" "par[2]" "par[3]" "par[4]"

$initpara
    6   0.7   100  0.05
6e+00 7e-01 1e+02 5e-02

$upper
[1]   7.00   0.90 150.00   0.15

$lower
[1]  3.00  0.30 60.00  0.01


$paralist is the most important part. Note it's a list of strings, even for numerals. You can use this to create a log-likelihood function where it can take any combinations of fix/free/tied parameters. I also used it for drawing the model -- I needed to use the estimated parameters to generate a theoretical line. To do so I also needed both free and fixed parameters. Fortunately I can just replace "par" with anything that gives me the MLE parameters. See the code below, look for "gsub(...) function.

$initpara is the "real" parameter list, one that will be used by optim(). It contains ONLY free parameters. Look at the code to see how tied parameters are handled. But the length of $initpara is the # of parameters of the model.

$upper/$lower are the bounds. Doesn't matter the order you specify them; the larger will be upper, and smaller be lower. And if you don't specify, you get Inf/-Inf.

I highlighted places where some tricks are used.


fitll2_2 <- function(data, strpara, maxtime=1000, ylim=0.010) {
if (missing(data)) {
cat("Data should be a vector of SRT/fixation duration\n");
return()
} else if (class(data)!=  "numeric" & class(data)!=  "list") {
cat("Data should be a vector or a list of vectors of SRT/fixation duration\n");
return()
} else if (missing(strpara)) {
cat("Please supply the correct parameters; see 'parsepara.r' for instructions\n");
return()
}
if (class(data) == "list") {datalist<-data;}
else {datalist<-list(data=data);}
# now parse the LL parameters
para<-parsepara(strpara); #parse the parameter, return a list of named lists, see parsepara.r
if (is.null(para)) {
cat("Please supply the correct parameters; see 'parsepara.r' for instructions\n");
return()
}
strparalist<-paste(para$paralist, collapse=", "); cat("para=",strparalist,"\n"); llfun<-eval(parse(text = paste("function(par) {-sum(log(dll2x(data, ", strparalist, ")));}")));
initpara<-para$initpara; upperpara<-para$upper; lowerpara<-para$lower;
# set up things
fit <-list(); stats <-list();colors <-c('red', 'blue','black', 'green', 'brown', 'grey');
# now do the estimates

 counter<-1;
for (i in names(datalist)) {
data<-datalist[[i]]; data<-data[data>0]; data<-data[is.finite(data)];
mleFit<- optim(fn = eval(parse(text = paste("function(par) {-sum(log(dll2x(data, ", strparalist, ")));}"))), par = initpara, method = "L-BFGS-B", upper=upperpara, lower=lowerpara, hessian = T)
mleFit$initpara<-initpara;
mleFit$npara <-length(initpara); # only non identical upper/lower limites
mleFit$n <- length(data);
mleFit$BIC <- log1p(mleFit$n)*mleFit$npara + 2*mleFit$value; #using log1p to prevent n=0; shouldn't be an issue
mleFit$AIC <- 2*mleFit$npara + 2*mleFit$value;
fit[[i]]  <-mleFit;
stat[[i]] <-getDistr(data, binwidth=20);

counter<-counter+1;
}

if (T) {
# set up the pdf
plot(0,0, type='n', xlim=c(0,min(maxtime, 1000)), ylim=c(0,ylim),frame.plot=FALSE, xlab="Fixation Duration", ylab="Probability Density");
c<-1; time2<-seq(10, min(maxtime, 1000), by=1);  counter<-1;
for (i in names(fit)) {
lines(stat[[i]]$time, stat[[i]]$pdf, col=colors[c], lty='dashed');
#lines(time2-10, dll2x(time2, a, b1, a, b2, mixp, d, px), col=colors[c]); # shared L parameter
lines(time2-10, eval(parse(text = paste("dll2x(time2, ", gsub("par", "fit[[i]]$par", strparalist), ")"))), col=colors[c]); # paralist replaced with fit[[i]]$par
text(min(maxtime, 1000)*0.3, (ylim-c*0.0015), paste(c("parameters=", unlist(para$paralist)), collapse=", "), col=colors[c], pos=4, cex=0.8);
text(min(maxtime, 1000)*0.3, (ylim-c*0.0015-0.0005), paste(c(i, unlist(round(fit[[i]]$par, 2))), collapse=", "), col=colors[c], pos=4, cex=0.8);
text(min(maxtime, 1000)*0.3, (ylim-c*0.0015-0.0010), paste(c('n=', 'LL=', 'AIC=', 'BIC='), c(unlist(round(c(fit[[i]]$n, fit[[i]]$value, fit[[i]]$AIC, fit[[i]]$BIC), 2))), collapse=", "), col=colors[c], pos=4, cex=0.8);
c=c+1; counter<-counter+1;
}
c<-1; counter<-1;
#set up the haz; missing the eXpress saccade component
plot(0,0, type='n', xlim=c(0,min(maxtime, 1000)), ylim=c(0,ylim*2), frame.plot=FALSE, xlab="Fixation Duration", ylab="Hazard Rate");
for (i in names(fit)) {
lines(stat[[i]]$time, stat[[i]]$haz, col=colors[c], lty='dashed');
a<-fit[[i]]$par[1]; b1<-fit[[i]]$par[2]; b2<-fit[[i]]$par[3];
mixp<-fit[[i]]$par[4]; d<-fit[[i]]$par[5]; px<-fit[[i]]$par[6];
#lines(time2-10, hll2x(time2, a, b1, a, b2, mixp, d, px), col=colors[c]); # No Px parameter
lines(time2-10, eval(parse(text = paste("hll2x(time2, ", gsub("par", "fit[[i]]$par", strparalist), ")"))), col=colors[c]); # paralist replaced with fit[[i]]$par
c=c+1; counter<-counter+1;
}

}

return (mleFit)
}

1 comment:

  1. New version handles fixed parameters that contain R expressions. For example, suppose the MLE fitting function has an internal variable called "counter" and it is updated for each subgroup of data it is estimating. You want to specify a slight different fixed parameter for each group. The value is tied to the counter. You can now do:

    "@100+counter*4, ..."

    The new parser inserts the expression, and it is replaced with the counter's value at the time of eval().

    This allows a lot of fun.

    ReplyDelete