Invalid Argument Error when using TensorBoard callback

I have used a TensorBoard callback when fitting a model consisting of one Embedding layer and one SimpleRNN layer. The model performs binary sentiment classification on 9600 input text sequences, which have been tokenised and padded in advance.

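# Assumed imports (tf.keras; with standalone Keras, import from keras.* instead)
import os
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.callbacks import TensorBoard

# 1. Remove previous logs (note: the callback below writes to 'tbcallback_prac', not ./logs/)
!rm -rf ./logs/
# 2. Change to Py_file_dir
os.chdir(...)

# input_dim = 43489 (size of tokenizer word dictionary); output_dim = 100 (GloVe 100d embeddings);
# input_length = 1403 (length of longest text sequence).
input_dim, output_dim, input_length = 43489, 100, 1403
# xtr_pad is padded, tokenised text sequences: nrow = 9600, ncol = input_length = 1403.
# Embedding_matrix (GloVe weights), xtr_pad and ytr are assumed to be defined earlier in the notebook.

model = Sequential()
model.add(Embedding(input_dim, output_dim, input_length=input_length,
                    weights=[Embedding_matrix], trainable=False))
model.add(SimpleRNN(200))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
tb = TensorBoard(histogram_freq=1, log_dir='tbcallback_prac')
tr_results = model.fit(xtr_pad, ytr, epochs=2, batch_size=64, verbose=1,
                       validation_split=0.2, callbacks=[tb])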

# In command prompt enter: tensorboard --logdir tbcallback_prac
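The code above passes a pretrained Embedding_matrix without showing how it is built. A common way to assemble one from a tokenizer's word index and GloVe vectors is sketched here. word_index and glove_vectors are hypothetical inputs (a dict of word to integer index, such as Keras' Tokenizer.word_index, and a dict of word to 100-d vector parsed from the GloVe file), and reserving row 0 for the padding index is an assumption; the question does not say whether its input_dim of 43489 already includes that extra row.

```python
import numpy as np


def build_embedding_matrix(word_index, glove_vectors, output_dim):
    """Stack pretrained vectors into an (input_dim, output_dim) matrix.

    Row i holds the vector of the word whose tokenizer index is i.
    Row 0 (assumed reserved for padding) and words missing from
    glove_vectors are left as all-zero rows.
    """
    input_dim = len(word_index) + 1  # +1 for the padding index 0 (assumption)
    matrix = np.zeros((input_dim, output_dim), dtype="float32")
    for word, idx in word_index.items():
        vector = glove_vectors.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix
```

With trainable=False, as in the question, these rows stay fixed during training; only the SimpleRNN and Dense weights are updated.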

I ran this in JupyterLab, and the first time the model trained without issue; I was able to view the TensorBoard statistics on localhost.
However, when I run the same code a second time (i.e. removing the logs and fitting the model again), it completes the first epoch of training but raises this error before the second epoch begins:

Train on 7680 samples, validate on 1920 samples

Epoch 1/2
7680/7680 [==============================] - ETA: 0s - loss: 0.2919 - accuracy: 0.9004
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-12-a1cde9b5b1f4> in <module>()
      7 tb = TensorBoard(histogram_freq=1, log_dir= 'tbcallback_prac')
      8 tr_results= model.fit(xtr_pad, ytr, epochs= 2, batch_size= 64, verbose= 1, 
----> 9                       validation_split= 0.2, callbacks= [tb])

...
InvalidArgumentError: You must feed a value for placeholder tensor 'embedding_input' with dtype float and shape [?,1403]
     [[{{node embedding_input}}]]

Note that 1403 is the length of every padded, tokenised sequence in the training input 'xtr'.

Thanks in advance for any help!

User response:

I don't have this issue, but I think it is a dimension problem when working with logits and sigmoid.

 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 3072, 64)          64000

 simple_rnn (SimpleRNN)      (None, 200)               53000

 dense (Dense)               (None, 1)                 201

=================================================================
Total params: 117,201
Trainable params: 117,201
Non-trainable params: 0
_________________________________________________________________
val_dir: F:\models\checkpoint\ale_highscores_3\validation
Epoch 1/1500
2/2 [==============================] - ETA: 0s - loss: -0.5579 - accuracy: 0.1000[<KerasTensor: shape=(None, 3072) dtype=float32 (created by layer 'embedding_input')>]
<keras.engine.functional.Functional object at 0x00000233003A8550>
Press AnyKey!
2/2 [==============================] - 14s 7s/step - loss: -0.5579 - accuracy: 0.1000 - val_loss: -0.6446 - val_accuracy: 0.1000
Epoch 2/1500
2/2 [==============================] - ETA: 0s - loss: -0.6588 - accuracy: 0.1000[<KerasTensor: shape=(None, 3072) dtype=float32 (created by layer 'embedding_input')>]
<keras.engine.functional.Functional object at 0x00000233003A8C40>
Press AnyKey!
2/2 [==============================] - 13s 7s/step - loss: -0.6588 - accuracy: 0.1000 - val_loss: -0.7242 - val_accuracy: 0.1000
Epoch 3/1500
1/2 [==============>...............] - ETA: 6s - loss: -0.1867 - accuracy: 0.1429

User response:

I have managed to get around this in Jupyter Notebook. I don't think it is a dimension problem, because the error only arises the second time I run the model-fitting code. When I restarted the kernel, it again ran without issue, but only once.
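Restarting the kernel helps because it throws away everything the first run left behind. A commonly reported explanation for this error in graph-mode Keras (which the "Train on 7680 samples" line suggests is in use) is that the TensorBoard callback merges every summary in the default graph, so histogram summaries from the first model still reference its input placeholder 'embedding_input', which the second model never feeds. An in-process alternative to a restart is to call tf.keras.backend.clear_session() before rebuilding the model. The sketch here assumes TensorFlow 2.x tf.keras; it does not reproduce the original error, and the helper names, toy sizes, and random data are hypothetical stand-ins for the question's GloVe model, xtr_pad and ytr.

```python
import tempfile

import numpy as np
import tensorflow as tf


def build_model(vocab_size, embed_dim, seq_len):
    """Small Embedding -> SimpleRNN -> Dense binary classifier (toy sizes)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(seq_len,)),
        tf.keras.layers.Embedding(vocab_size, embed_dim),
        tf.keras.layers.SimpleRNN(8),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


def fit_fresh(x, y, log_dir, epochs=2):
    """Reset global Keras state, rebuild the model, and fit with TensorBoard."""
    # Discard the graph/layer state left by earlier runs in this kernel,
    # roughly what restarting the kernel achieves.
    tf.keras.backend.clear_session()
    model = build_model(vocab_size=50, embed_dim=4, seq_len=x.shape[1])
    tb = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
    return model.fit(x, y, epochs=epochs, batch_size=16, verbose=0,
                     validation_split=0.2, callbacks=[tb])


# Hypothetical toy data: 64 padded sequences of length 10.
rng = np.random.default_rng(0)
x_toy = rng.integers(1, 50, size=(64, 10))
y_toy = rng.integers(0, 2, size=(64, 1)).astype("float32")

log_dir = tempfile.mkdtemp()
first = fit_fresh(x_toy, y_toy, log_dir)
second = fit_fresh(x_toy, y_toy, log_dir)  # the "second run" from the question
```

In the question's code, this would mean adding the clear_session() call at the top of the cell, before model = Sequential().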

Previously I avoided Jupyter Notebook because, when it is launched from Anaconda, a command prompt pops up and stays occupied running the Notebook job (I'm curious whether others have this problem). As a result, I can type, but cannot execute, the command tensorboard --logdir tbcallback_prac in that prompt to get the localhost address for viewing the TensorBoard results.

But there is a way:

1. Press Ctrl+Z in the command prompt to suspend the current Jupyter Notebook job.
2. Run tensorboard --logdir tbcallback_prac to view the TensorBoard results.
3. Press Ctrl+C to exit TensorBoard in the command prompt.
4. Resume the Notebook job by running bg in the command prompt.
5. Rerun the model-fitting code: the Invalid Argument Error no longer appears on the second epoch of training.
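The Ctrl+Z / bg workaround is only needed because the prompt that launched Jupyter is busy. A minimal alternative, assuming the tensorboard executable is on the PATH of the environment the notebook runs in, is to start TensorBoard as a background process from Python itself; tensorboard_command and launch_tensorboard are hypothetical helper names, and 6006 is simply TensorBoard's usual default port.

```python
import shutil
import subprocess


def tensorboard_command(logdir, port=6006):
    """Return the command line that starts TensorBoard for `logdir`."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


def launch_tensorboard(logdir, port=6006):
    """Start TensorBoard in the background and return the process handle.

    The calling shell or notebook is not blocked; stop the server later
    with proc.terminate().
    """
    if shutil.which("tensorboard") is None:
        raise FileNotFoundError("tensorboard executable not found on PATH")
    proc = subprocess.Popen(tensorboard_command(logdir, port))
    print(f"TensorBoard should be available at http://localhost:{port}/")
    return proc
```

For example, run proc = launch_tensorboard("tbcallback_prac") in a notebook cell and proc.terminate() when finished. TensorBoard also ships a notebook extension (%load_ext tensorboard, then %tensorboard --logdir tbcallback_prac) that embeds the dashboard directly in the notebook.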

It seems that once the kernel has been suspended in the command prompt with Ctrl+Z, or simply restarted, the Invalid Argument Error no longer occurs.