Write a function that reads a file and returns all of the numbers in it as a list of floats

Time:01-18

I have a large text file containing many thousands of lines, but a short example that covers the same idea is:

vapor dust -2C pb 
x14 71 hello! 42.42
100,000 lover baby: -2

There is a mixture of integers, alphanumerics, and floats.

ATTEMPT AT SOLN. I've done this to create a single list composed of strings, but I am unable to isolate each cell based on whether it's numeric or alphanumeric.

with open('file.txt', 'r') as f:
    data = f.read().split()
#dirty = [x for x in data if x.isnumeric()]
print(data)

The line #dirty fails.

I have had luck constructing a list-of-lists containing almost all required values using the code as follows:

with open ('benzene_SDS.txt','r') as f:  
    for word in f:
        data= word.split()
        clean = [ x for x in data if x.isnumeric()]            
        res = list(set(data).difference(clean))
        print(clean)

But it doesn't return a single list; it's a list of lists, most of which are blank [].

There was a hint given that using the "try" control statement is useful in solving the problem, but I don't see how to utilize it.

Any help would be greatly appreciated! Thanks.

CodePudding user response:

If you're mainly asking how one would use try to check for validity, this is what you're after:

values = []
with open ('benzene_SDS.txt','r') as f:  
    for word in f.read().split():
        try:
            values.append(float(word))
        except ValueError:
            pass
print(values)

Output:

[71.0, 42.42, -2.0]
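
One caveat when using float() as the validity check: it also accepts strings such as 'nan', 'inf' and '1e3', so a word like "inf" in the text would end up in the list. A minimal sketch of filtering those out, assuming only finite values are wanted (the helper name to_finite_float is made up for illustration):

```python
import math

def to_finite_float(word):
    """Return word as a float, or None if it is not a finite number."""
    try:
        value = float(word)
    except ValueError:
        return None
    # float() accepts 'nan', 'inf' and 'infinity'; reject those here
    return value if math.isfinite(value) else None

words = "vapor inf 71 nan 42.42 -2".split()
values = [v for v in map(to_finite_float, words) if v is not None]
print(values)
```

Whether scientific notation like '1e3' should count as a number depends on the data; this sketch keeps it.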

However, note that float() does not parse '100,000' as either 100 or 100000.

This code would do that:

import locale

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(locale.atof(word))
        except ValueError:
            pass

print(values)

Result:

[71.0, 42.42, 100000.0, -2.0]

Note that running the same code with this:

locale.setlocale(locale.LC_ALL, 'nl_NL.UTF-8')

Yields a different result:

[71.0, 4242.0, 100.0, -2.0]

This is because the Netherlands uses , as the decimal separator and . as the thousands separator (which basically just gets ignored in 42.42).
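
Since the task asks for a function, the try/except loop can be wrapped into one. The following is only a sketch under some assumptions: the name read_floats and the thousands_sep parameter are invented here, and the separator is removed by plain string replacement rather than by changing the process-wide locale.

```python
def read_floats(path, thousands_sep=None):
    """Return every whitespace-separated token in the file that parses as a float."""
    values = []
    with open(path, 'r') as f:
        for word in f.read().split():
            if thousands_sep:
                # hypothetical convenience: '100,000' -> '100000'
                word = word.replace(thousands_sep, '')
            try:
                values.append(float(word))
            except ValueError:
                pass  # not a number, skip it
    return values
```

Calling read_floats('benzene_SDS.txt', thousands_sep=',') on the sample data should give the same list as the en_US locale version. The string replacement is cruder than locale.atof, though: a token like '1,2,3' would be read as 123.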

CodePudding user response:

numbers = []
with open('file.txt','r') as f:
    for line in f:  # iterate line by line; f.read() would yield single characters
        words = line.split()
        numbers.extend([word for word in words if word.isnumeric()])

# Print all numbers
print(numbers)

# Print all unique numbers
print(set(numbers))

# Print all unique numbers, converted to floats
print([float(n) for n in set(numbers)])

If you specifically need a list then you can wrap the set with list().
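
Keep in mind that str.isnumeric() only returns True when every character in the string is a numeric character, so a sign or a decimal point causes the token to be rejected. A quick check on a few of the sample tokens, for illustration:

```python
tokens = ['71', '42.42', '-2', '100,000', 'x14']
for token in tokens:
    # only '71' consists purely of numeric characters
    print(token, token.isnumeric())

numeric_only = [t for t in tokens if t.isnumeric()]
print(numeric_only)
```

Of these tokens only '71' passes, so an isnumeric() filter misses the negative and decimal values that float() inside a try block would pick up.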
