I'm trying to build a function for my Q-learning research. It is supposed to receive the number of trials and a repeat probability, and simulate data of choosing between 2 actions (0, 1) at each step according to the repeat probability. (This is a baseline, so there is no learning from reward, only repeating or switching actions according to the given probability.)
sim.block = function(Ntrl, repeat_p){
for (i in 2:Ntrl){
action = sample(x = c(0,1), size = Ntrl, replace=T)
last.action <- action[i-1] # the number in the previous step
if(last.action==0){
action[i] <- sample(x = c(0,1), size = 1, prob = c(repeat_p,1-repeat_p))
} else {
action[i] <- sample(x = c(0,1), size = 1, prob = c(1-repeat_p,repeat_p))
}}
return (data.frame(action))
}
When I test the function with extreme repeat probabilities, I don't get what I expect.
For example, when I call sim.block(400, 0.000000001) I expect to get no repeats at all, but that is not the case. The same goes for 0.999999999: I expect to get only repeats, but I get a random list of 1s and 0s.
Where is the problem?
CodePudding user response:
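There is just a small error in where the initial sample is created. The sample() call that fills action sits inside the loop, so the whole vector is redrawn at random on every iteration and every earlier conditioned step is overwritten; only the last element ends up depending on its predecessor. The initial draw should be outside of the loop, e.g.: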
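set.seed(1)
Ntrl <- 400              # values taken from the question's example
repeat_p <- 0.000000001
# draw the initial vector once, before the loop, so earlier steps are not overwritten
action <- sample(x = c(0,1), size = Ntrl, replace = TRUE)
for (i in 2:Ntrl){
  last.action <- action[i-1] # the action taken in the previous step
  if(last.action==0){
    # previous action was 0: choose 0 again with probability repeat_p
    action[i] <- sample(x = c(0,1), size = 1, prob = c(repeat_p,1-repeat_p))
  } else {
    # previous action was 1: choose 1 again with probability repeat_p
    action[i] <- sample(x = c(0,1), size = 1, prob = c(1-repeat_p,repeat_p))
  }
}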
The result for your example with Ntrl = 400 and repeat_p = 0.000000001 is then:
> action
[1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[73] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[145] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[217] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[289] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[361] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
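If you want this back as a function, plus a quick way to check that the repeat probability is respected, something along these lines might work. This is only a sketch: the names sim_block and repeat_rate are made up for illustration, drawing the first action uniformly is an assumption (the question does not say how the first trial should be chosen), and runif() is used for the repeat/switch decision instead of the two sample() branches.

```r
# Hypothetical wrapper around the corrected loop; sim_block and repeat_rate
# are illustrative names, not from the original post.
sim_block <- function(Ntrl, repeat_p) {
  stopifnot(Ntrl >= 1, repeat_p >= 0, repeat_p <= 1)
  action <- integer(Ntrl)
  # The first action has no predecessor, so draw it uniformly (an assumption).
  action[1] <- sample(c(0L, 1L), size = 1)
  if (Ntrl >= 2) {
    for (i in 2:Ntrl) {
      # Repeat the previous action with probability repeat_p, otherwise switch.
      repeat_now <- runif(1) < repeat_p
      action[i] <- if (repeat_now) action[i - 1] else 1L - action[i - 1]
    }
  }
  data.frame(action = action)
}

# Fraction of steps on which the action equals the one before it.
repeat_rate <- function(action) {
  mean(action[-1] == action[-length(action)])
}

set.seed(1)
repeat_rate(sim_block(400, 0.000000001)$action)  # should be 0 or extremely close to it
repeat_rate(sim_block(400, 0.999999999)$action)  # should be 1 or extremely close to it
repeat_rate(sim_block(10000, 0.7)$action)        # should land near 0.7
```

Comparing the empirical repeat rate with repeat_p for a few values is a cheap sanity check before using the simulated data as a baseline.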
