Tensorflow: Is it normal that my GPU is using all its Memory but is not under full load?

Time:01-21

I am currently trying to run a text-based sequence-to-sequence model using TensorFlow 2.6 and cuDNN.

The code is running, but taking suspiciously long. When I check my Task Manager, I see the following:

Task Manager Screenshot

This looks weird to me, because all of the memory is taken, but the GPU is not under heavy load. Is this expected behaviour?

System:

  • Windows 10
  • Python 3.9.9
  • TensorFlow & Keras 2.6
  • CUDA 11.6
  • cuDNN 8.3
  • NVIDIA RTX 3080 Ti

In the code I found the following GPU settings:

import tensorflow as tf

def get_gpu_config():
  gconfig = tf.compat.v1.ConfigProto()
  gconfig.gpu_options.per_process_gpu_memory_fraction = 0.975 # Don't take 100% of the memory
  gconfig.allow_soft_placement = True # Fall back to another device (e.g. the CPU) if an op has no GPU kernel
  gconfig.gpu_options.allow_growth = True # Take more memory when necessary
  return gconfig
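A `tf.compat.v1.ConfigProto` is TF1-style configuration and only has an effect where it is passed in, for example to `tf.compat.v1.Session(config=...)`; code migrated to TF2 and running eagerly may never use it. Below is a minimal sketch of TF2-native counterparts of the three options, assuming TensorFlow 2.x. The helper names and the `total_mib` parameter (the card's total memory, e.g. read off nvidia-smi) are hypothetical:

```python
"""Hedged sketch: TF2-native counterparts of the options in get_gpu_config()."""
from typing import Optional

try:
    import tensorflow as tf
except ImportError:  # lets the pure-Python helper below run without TensorFlow
    tf = None


def memory_limit_mib(total_mib: int, fraction: float) -> int:
    """Turn a TF1-style per_process_gpu_memory_fraction into a TF2 memory_limit (MB)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mib * fraction)


def configure_gpus_tf2(total_mib: Optional[int] = None, fraction: float = 0.975) -> None:
    """Apply TF2 GPU settings; must run before any op touches the GPU."""
    if tf is None:
        return
    # allow_soft_placement=True -> fall back to another device if an op has no GPU kernel
    tf.config.set_soft_device_placement(True)
    for gpu in tf.config.list_physical_devices("GPU"):
        if total_mib is None:
            # allow_growth=True -> allocate GPU memory on demand
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # per_process_gpu_memory_fraction -> hard cap via a logical device.
            # TF2 does not let you combine this with memory growth on the same GPU.
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(
                    memory_limit=memory_limit_mib(total_mib, fraction))],
            )
```

Calling `configure_gpus_tf2()` with no arguments mirrors `allow_growth`; passing `total_mib=12288` (assumed here to be the 3080 Ti's 12 GB) mirrors the 0.975 cap instead. Either call has to happen before the first GPU op runs.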

My Python output shows that TensorFlow found my graphics card:

Python Console

It is also visible in my nvidia-smi output:

nvidia-smi output

Am I missing a configuration? Training takes about as long as it did on a CPU-only system, which seems off to me.

Sidenote:

The code I am trying to run had to be migrated from tensorflow-gpu 1.12, but that went "relatively" smoothly.
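One way to check whether the model's ops really run on the GPU (rather than silently falling back to the CPU) is to log device placement. A minimal sketch, assuming TensorFlow 2.x; `gpu_available` is a hypothetical helper:

```python
try:
    import tensorflow as tf
except ImportError:  # keep the sketch importable without TensorFlow
    tf = None


def gpu_available() -> bool:
    """True if TensorFlow is importable and sees at least one GPU."""
    return tf is not None and len(tf.config.list_physical_devices("GPU")) > 0


if tf is not None:
    # Log the device every op is placed on; enable this before running ops.
    tf.debugging.set_log_device_placement(True)
    a = tf.random.uniform((1024, 1024))
    b = tf.linalg.matmul(a, a)
    # When the GPU is used, this should end in ".../device:GPU:0".
    print("matmul ran on:", b.device)
```

If the logged placements for the model's layers show CPU, the slowdown is a placement problem rather than a memory one.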

Answer:

Yes, the memory usage you see is normal for TensorFlow!

From the TensorFlow docs:

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.
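The two ways of restricting GPUs mentioned in the quote can be sketched as follows, assuming TensorFlow 2.x. You would normally pick one; with a single card, as in the question, keeping GPU 0 changes nothing. `cuda_visible_devices` is a hypothetical helper:

```python
import os


def cuda_visible_devices(indices) -> str:
    """Build a CUDA_VISIBLE_DEVICES value from GPU indices, e.g. [0, 2] -> "0,2"."""
    return ",".join(str(i) for i in indices)


# Option 1: hide GPUs from CUDA itself; must be set before TensorFlow initializes the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices([0])

try:
    import tensorflow as tf
except ImportError:  # keep the sketch runnable without TensorFlow
    tf = None

if tf is not None:
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        try:
            # Option 2: keep only the first physical GPU visible to TensorFlow.
            tf.config.set_visible_devices(gpus[0], "GPU")
        except RuntimeError as e:
            # Visible devices must be set before GPUs have been initialized
            print(e)
```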


If you don't want TensorFlow to allocate all of your VRAM, you can either set a hard limit on how much memory it may use or tell TensorFlow to allocate only as much memory as it needs.

To set a hard limit

Configure a virtual GPU device as follows:

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
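Task Manager and nvidia-smi show the memory TensorFlow has reserved, not what its tensors actually occupy. To pick a sensible `memory_limit` (given in MB), one option is to look at TensorFlow's own allocator statistics after a few training steps. A minimal sketch, assuming TensorFlow 2.6 as in the question, where `tf.config.experimental.get_memory_info` should be available; `report_gpu_memory` and `bytes_to_mib` are hypothetical helpers:

```python
try:
    import tensorflow as tf
except ImportError:  # keep the sketch runnable without TensorFlow
    tf = None

MIB = 1024 * 1024


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


def report_gpu_memory(device: str = "GPU:0") -> None:
    """Print how much memory TensorFlow's allocator is actually using on `device`."""
    if tf is None or not tf.config.list_physical_devices("GPU"):
        print("no GPU visible to TensorFlow")
        return
    # Returns a dict with 'current' and 'peak' usage in bytes.
    info = tf.config.experimental.get_memory_info(device)
    print(f"{device}: current {bytes_to_mib(info['current']):.1f} MiB, "
          f"peak {bytes_to_mib(info['peak']):.1f} MiB")
```

Calling `report_gpu_memory()` after a few batches gives a peak figure that can serve as a starting point for `memory_limit`, with some headroom added.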

Only use as much as needed

  • You can set the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true

OR

  • Use tf.config.experimental.set_memory_growth as follows:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)
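The environment-variable option can also be set from inside the script, as long as this happens before TensorFlow initializes the GPU. A minimal sketch, assuming TensorFlow 2.x:

```python
import os

# Same effect as setting TF_FORCE_GPU_ALLOW_GROWTH=true in the shell. TensorFlow reads it
# when it initializes the GPU, so set it before importing tensorflow.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

try:
    import tensorflow as tf
except ImportError:  # keep the sketch runnable without TensorFlow
    tf = None

if tf is not None:
    print(tf.config.list_physical_devices("GPU"))
```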

All code and information here is taken from https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth
