I am trying to read multiple .mat files in Python, but every time I get an error. This is my code:
folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
directs = sorted(listdir(folder))
labels = []
for file in directs:
    f = h5py.File(folder + file,'r')
    label = np.array(f.get("cjdata/label"))[0][0]
    labels.append(label)
labels = pd.Series(labels)
labels.shape
The error I am getting is:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-11-e7d73f54f73d> in <module>
      3 labels = []
      4 for file in directs:
----> 5     f = h5py.File(folder + file,'r')
      6     label = np.array(f.get("cjdata/label"))[0][0]
      7     labels.append(label)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
    404         with phil:
    405             fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
--> 406             fid = make_fid(name, mode, userblock_size,
    407                            fapl, fcpl=make_fcpl(track_order=track_order),
    408                            swmr=swmr)
~\miniconda3\envs\tensorflow\lib\site-packages\h5py\_hl\files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    171         if swmr and swmr_support:
    172             flags |= h5f.ACC_SWMR_READ
--> 173         fid = h5f.open(name, flags, fapl=fapl)
    174     elif mode == 'r+':
    175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\_objects.pyx in h5py._objects.with_phil.wrapper()
h5py\h5f.pyx in h5py.h5f.open()
OSError: Unable to open file (file signature not found)
I have 5849 .mat files. Can anyone tell me where I am going wrong?
I am using h5py to read the .mat files, and I want to read the labels and images in each file.
CodePudding user response:
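I believe the issue may be in how folder and file are concatenated.
Two things about that:
- The name file is not a reserved keyword, but it was a built-in name in Python 2, so it's best not to use it as a variable name.
- Assuming you used os.listdir here (you didn't include the import itself), your concatenation of folder and file only works when folder ends with a slash. It does in your snippet, but os.path.join handles the separator for you either way.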
A fix for that (after I renamed file to filename):
full_file_path = os.path.join(folder, filename)
f = h5py.File(full_file_path,'r')
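To make the path handling concrete, here is a minimal sketch that builds full paths with os.path.join and skips anything that is not a .mat file. The helper name list_mat_files is hypothetical, and the sketch only assumes that the .mat files sit directly inside the folder, as in the question.

```python
import os


def list_mat_files(folder):
    """Return sorted full paths of the .mat files directly inside folder.

    list_mat_files is a hypothetical helper name, not part of any library.
    """
    paths = []
    for filename in sorted(os.listdir(folder)):
        # os.path.join adds the separator only when it is needed,
        # so it works whether or not folder ends with a slash.
        full_file_path = os.path.join(folder, filename)
        # Skip subfolders and stray files that are not .mat files.
        if os.path.isfile(full_file_path) and filename.lower().endswith(".mat"):
            paths.append(full_file_path)
    return paths
```

Each returned path can then be passed to h5py.File(path, 'r') inside the loop.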
CodePudding user response:
Here are 4 areas where the code could be improved:
- I prefer the glob.iglob() function to get the files. It can use a wildcard to define the filenames, and it returns an iterator. That way you don't have to create a list with 5849 .mat filenames.
- You open the file with h5py.File(), but don't close it. That probably won't cause a problem, but it is bad practice. It's better to use Python's with/as: context manager. (If you don't do that, add f.close() inside the loop.)
- You are using the .get() method to retrieve the dataset object. Documented practice is to reference the dataset by name, like this: f["cjdata/label"]. Indexing by name raises a KeyError if the dataset is missing, whereas .get() quietly returns None.
- Also, you added [0][0] after the dataset object. Are you sure you want to do that? They are indices that will access the dataset value at index=[0][0]. If you want to create a NumPy array of the dataset values, use label = f["cjdata/label"][()]
Modified code that demonstrates all of these changes is below:
import glob
import h5py
import numpy as np
import pandas as pd

folder = "C:/Users/Sreeraj/Desktop/Me/PhD/Mahindra/brain_tumor_dataset/data/"
file_wc = folder + "*.mat"  # assumes filename extension is .mat
labels = []
for fname in glob.iglob(file_wc):
    # the context manager closes the file at the end of each iteration
    with h5py.File(fname, 'r') as f:
        # reference the dataset by name instead of calling .get():
        label = np.array(f["cjdata/label"][0][0])
        # or maybe just (reads the whole dataset as a NumPy array):
        # label = f["cjdata/label"][()]
        labels.append(label)
labels = pd.Series(labels)
labels.shape
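If the error persists after these changes, note that "file signature not found" is what HDF5 reports when the file it is given does not look like an HDF5 file. h5py can only open .mat files saved in MATLAB's v7.3 format, which is HDF5-based; older .mat versions are usually read with scipy.io.loadmat instead. The sketch below is one way to check a file using only the standard library. The helper name looks_like_hdf5 and its max_offset cap are hypothetical; it looks for the 8-byte HDF5 signature at the offsets the format allows (0, 512, 1024, and so on, doubling), since v7.3 .mat files normally carry a 512-byte MATLAB header in front.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte marker that starts HDF5 data


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at a permitted offset.

    looks_like_hdf5 is a hypothetical helper, not an h5py function.
    max_offset is an arbitrary cap on how far into the file to search.
    """
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset:
            fh.seek(offset)
            chunk = fh.read(len(HDF5_SIGNATURE))
            if chunk == HDF5_SIGNATURE:
                return True
            if len(chunk) < len(HDF5_SIGNATURE):
                return False  # reached the end of the file
            # Permitted offsets are 0, 512, 1024, 2048, ...
            offset = 512 if offset == 0 else offset * 2
    return False
```

Running this over the folder and printing the names for which it returns False would show which files h5py cannot open. h5py also provides h5py.is_hdf5(path), which performs a similar check.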
