Simulating baseline for Q learning research

I'm trying to build a function for my Q-learning research. It is supposed to receive the number of trials and a repeat probability, and simulate data of choosing between 2 actions (0, 1) at each step, according to the repeat probability. (This is a baseline, so there is no learning from reward, only switching between actions according to the given probability.)

sim.block = function(Ntrl, repeat_p){
  for (i in 2:Ntrl){
    action = sample(x = c(0,1), size = Ntrl, replace=T)
    last.action <- action[i-1] # the number in the previous step

    if(last.action==0){
      action[i] <- sample(x = c(0,1), size = 1, prob = c(repeat_p,1-repeat_p))
    } else {
      action[i] <- sample(x = c(0,1), size = 1, prob = c(1-repeat_p,repeat_p))
    }
  }
  return (data.frame(action))
}

When I test the function with extreme repeat probabilities I don't get what I expect. For example, when I call sim.block(400, 0.000000001) I expect to get no repeats at all, but that is not the case. The same goes for 0.999999999: I expect to get only repeats, but I get a random list of 1s and 0s.

Where is the problem?

CodePudding user response:

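There is just a small error regarding where the sample is created. The initial draw of action sits inside the loop, so the whole vector is redrawn at random on every iteration and all earlier assignments are overwritten; only the last element ends up depending on repeat_p. The draw should be outside of the loop, e.g.: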

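set.seed(1)
Ntrl <- 400               # values from the example in the question
repeat_p <- 0.000000001
# draw the initial vector once, before the loop
action = sample(x = c(0,1), size = Ntrl, replace=T)
for (i in 2:Ntrl){
  last.action <- action[i-1] # the action chosen in the previous step
  if(last.action==0){
    # after a 0: repeat 0 with probability repeat_p, otherwise switch to 1
    action[i] <- sample(x = c(0,1), size = 1, prob = c(repeat_p,1-repeat_p))
  } else {
    # after a 1: repeat 1 with probability repeat_p, otherwise switch to 0
    action[i] <- sample(x = c(0,1), size = 1, prob = c(1-repeat_p,repeat_p))
  }
}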

The result for your example with Ntrl = 400 and repeat_p = 0.000000001 is then:

> action
  [1] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
 [73] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[145] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[217] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[289] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
[361] 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
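
For completeness, the fix can be wrapped back into the function from the question. This is only a sketch, assuming Ntrl is at least 1: sim.block and its arguments are taken from the question, while repeat.rate is a hypothetical helper added here to check the share of repeated actions. Only the first action is drawn without a predecessor; every later action is sampled conditional on the previous one.

```r
# Sketch of the corrected function (assumes Ntrl >= 1).
sim.block <- function(Ntrl, repeat_p) {
  action <- numeric(Ntrl)
  # the first action has no predecessor, so draw it uniformly
  action[1] <- sample(x = c(0, 1), size = 1)
  for (i in seq_len(Ntrl)[-1]) {
    # probability of choosing 0 at step i, given the previous action
    p0 <- if (action[i - 1] == 0) repeat_p else 1 - repeat_p
    action[i] <- sample(x = c(0, 1), size = 1, prob = c(p0, 1 - p0))
  }
  data.frame(action)
}

# Hypothetical helper: share of steps that repeat the previous action.
repeat.rate <- function(action) mean(diff(action) == 0)

set.seed(1)
r_low  <- repeat.rate(sim.block(400, 0.000000001)$action)  # should be at or near 0
r_high <- repeat.rate(sim.block(400, 0.999999999)$action)  # should be at or near 1
r_mid  <- repeat.rate(sim.block(10000, 0.7)$action)        # should be roughly 0.7
```

Using seq_len(Ntrl)[-1] instead of 2:Ntrl is a small design choice: for Ntrl = 1 the loop is simply skipped, whereas 2:1 would count downwards and run the loop body.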