How to avoid memory mapping when loading a numpy file-CodePudding

Csv file:

0,0,0,0,0,0,0,0,0,0.32,0.21,0,0.16,0,0,0,0,0,0,0.32
0,0,0,0,0,0,0.17,0,0.04,0,0,0.25,0.03,0.32,0,0.02,0.05,0.03,0.08,0
0.08,0.07,0.09,0.06,0,0,0.21,0.02,0,0,0,0,0,0,0,0.1,0.36,0,0,0
[goes on always 20 columns and x number of rows]

I'm saving the array this way:

with open(csv_profile) as csv_file:
    array = np.loadtxt(csv_file, delimiter=",",dtype='str')
npy_profile=open(outfile, "wb")
np.save(npy_profile, array)

Which is saved as u4 instead of f8 which is what I need.

I noticed this error in the datatype as the output file says

<93>NUMPY^A^@v^@{'descr': '<U4', 'fortran_order': False, 'shape': (680, 20), }

Also when I load it:

profile_matrix=np.load(npy_profile,"r")

the class type is numpy.memmap instead of numpy.ndarray. How can I avoid this issue?

Both saving it in the correct format and loading it in the correct format.

CodePudding user response：

Looking into the manual we can see that the second parameter of numpy.load is called mmap_mode and is set to "r" in your code. This enables memory mapping the file:

A memory-mapped array is kept on disk. However, it can be accessed and sliced like any ndarray. Memory mapping is especially useful for accessing small fragments of large files without reading the entire file into memory.

Memory mapping is normally not an "issue" as you called it, but a feature that enables faster file access and saves memory for large files. When doing memory mapped I/O, your operating system maps parts of the file into the RAM address space of your program. That way the data has not to be copied into RAM. Any changes that are made to the memory mapped numpy array are directly reflected in the file. Because you specified read only access, you probably cannot change values in the array.

If you want to disable memory mapping, you could remove the second argument "r" from the call to numpy.load, which leads to a fresh copy of the array in RAM, that you can modify without affecting the file.

CodePudding user response：

While the answer from Jakob Stark explains what the additional "r" argument to np.load() does, let me just suggest a simpler and safer usage. To save and load NumPy arrays in the straight-forward way (no memory mapping, etc.), use the most straight-forward syntax:

np.save('filename.npy', array)
array2 = np.load('filename.npy')

You don't have to specify the dtype or anything, it just does the simplest possible thing, as you are expecting. Also, not manually opening the file prior to calling np.save() means that you do not have to worry about closing it again (these acts should generally be written inside a try/except block, which further adds to the complexity).