I am using keras.layers.Normalization to preprocess a CSV dataset returned by make_csv_dataset. Execution freezes at the adapt(ds) call. No error is printed; adapt just runs forever. Normalizing the same data with pandas completed in seconds.
System info:
- tensorflow 2.7.0
- cuda 11.0
- 3080ti mobile
- i9-10980HK CPU @ 2.40GHz, 3096 MHz, 8 Core(s), 16 Logical Processor(s)
- Microsoft Windows 11 Home
import tensorflow as tf
from tensorflow import keras

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "pedal-length", "pedal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = keras.utils.get_file("iris.csv", origin=url)
    ds = tf.data.experimental.make_csv_dataset(fpath, header=False, label_name=label[0], column_names=columns, batch_size=10, shuffle=True, ignore_errors=True)
    return ds

ds = get_data()
# Drop the label and stack the feature columns into one tensor per batch.
ds_features = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))
norm = keras.layers.Normalization(axis=-1)
norm.adapt(ds_features)  # never returns
print("adapt completed")
Answer:
You have to set the num_epochs parameter to 1 in make_csv_dataset. Its default value is None, which makes the dataset repeat indefinitely, so adapt never reaches the end of the data. As stated in the docs:
An int specifying the number of times this dataset is repeated. If None, cycles through the dataset forever.
Working example:
import tensorflow as tf

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
features = ["sepal-length", "sepal-width", "pedal-length", "pedal-width"]
label = ["class"]
class_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def get_data():
    columns = features + label
    fpath = tf.keras.utils.get_file("iris.csv", origin=url)
    # num_epochs=1 reads the file once instead of repeating it forever.
    ds = tf.data.experimental.make_csv_dataset(fpath, header=False, label_name=label[0], column_names=columns, num_epochs=1, batch_size=10, shuffle=True, ignore_errors=True)
    return ds

ds = get_data()
# Drop the label and stack the feature columns into one tensor per batch.
ds_feature = ds.map(lambda x, y: tf.stack([x.pop(feature) for feature in features], axis=-1))
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(ds_feature)
print("adapt completed")
adapt completed
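If you want to check whether a dataset repeats forever before passing it to adapt, one option is to inspect its cardinality. The following is only a minimal sketch on a toy dataset: Dataset.range stands in for the CSV dataset, and is_infinite is a hypothetical helper name, not part of TensorFlow. I have not checked what make_csv_dataset itself reports, and some pipelines report an unknown cardinality instead.

```python
import tensorflow as tf

# Toy stand-in for the CSV dataset (assumption: used only for illustration).
finite = tf.data.Dataset.range(10).batch(5)
# repeat() with no count cycles forever, like num_epochs=None.
infinite = finite.repeat()

def is_infinite(ds):
    """Hypothetical helper: True if tf.data reports the dataset as infinite."""
    return int(ds.cardinality()) == int(tf.data.INFINITE_CARDINALITY)

finite_is_infinite = is_infinite(finite)
infinite_is_infinite = is_infinite(infinite)

# take(n) bounds an infinite dataset to n batches, another way to let adapt finish.
bounded_batches = int(infinite.take(2).cardinality())

print(finite_is_infinite, infinite_is_infinite, bounded_batches)
```

Setting num_epochs=1 as shown above is still the simpler fix; take(n) is mainly useful when you deliberately want to adapt on only part of a repeating dataset.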
