I have a large text file containing many thousands of lines, but a short example that covers the same idea is:
vapor dust -2C pb
x14 71 hello! 42.42
100,000 lover baby: -2
There is a mixture of integers, alphanumerics, and floats.
Attempt at a solution: I've done this to create a single list composed of strings, but I am unable to isolate each cell based on whether it's numeric or alphanumeric:
with open('file.txt', 'r') as f:
    data = f.read().split()
    #dirty = [x for x in data if x.isnumeric()]
    print(data)
The line #dirty fails.
I have had luck constructing a list-of-lists containing almost all required values using the code as follows:
with open('benzene_SDS.txt', 'r') as f:
    for word in f:
        data = word.split()
        clean = [x for x in data if x.isnumeric()]
        res = list(set(data).difference(clean))
        print(clean)
But it doesn't return a single list; it's a list of lists, most of which are blank [].
There was a hint given that using the "try" statement is useful in solving the problem, but I don't see how to utilize it.
Any help would be greatly appreciated! Thanks.
CodePudding user response:
If you're mainly asking how one would use try to check for validity, this is what you're after:
values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(float(word))
        except ValueError:
            pass
print(values)
Output:
[71.0, 42.42, -2.0]
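Since the question asks how to separate numeric cells from alphanumeric ones, the same try/except idea can fill two lists instead of one. The following is only a sketch: the helper name split_tokens and the SAMPLE string (copied from the question so that no file is needed) are illustrative assumptions, not part of the original code.

```python
# Sample text copied from the question; a real script would read it from a file.
SAMPLE = """vapor dust -2C pb
x14 71 hello! 42.42
100,000 lover baby: -2"""


def split_tokens(text):
    """Return (numbers, others): tokens that float() accepts, and the rest."""
    numbers, others = [], []
    for token in text.split():
        try:
            numbers.append(float(token))
        except ValueError:
            # float() rejected the token, so treat it as non-numeric text.
            others.append(token)
    return numbers, others


numbers, others = split_tokens(SAMPLE)
print(numbers)
print(others)
```

Keep in mind that float() also accepts strings such as 'nan', 'inf' and '1e5', so this may count more tokens as numeric than intended, depending on the data.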
However, note that this does not parse '100,000' as either 100 or 100000.
This code would do that:
import locale
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(locale.atof(word))
        except ValueError:
            pass
print(values)
Result:
[71.0, 42.42, 100000.0, -2.0]
Note that running the same code with this:
locale.setlocale(locale.LC_ALL, 'nl_NL.UTF-8')
yields a different result:
[71.0, 4242.0, 100.0, -2.0]
This is because the Netherlands uses , as the decimal separator and . as the thousands separator (which basically just gets ignored in 42.42).
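If relying on the system locale is not desirable, one locale-independent alternative is to strip commas before calling float(). This is a different technique from locale.atof, and it only works under the assumption that a comma in the data is always a thousands separator. The helper name parse_number is hypothetical.

```python
def parse_number(token):
    """Return token as a float, or None if it is not numeric.

    Assumption: ',' is only ever a thousands separator in the data.
    """
    try:
        return float(token.replace(",", ""))
    except ValueError:
        return None


# Tokens from the sample in the question, inlined so no file is needed.
tokens = "vapor dust -2C pb x14 71 hello! 42.42 100,000 lover baby: -2".split()

# Keep only the tokens that parsed successfully.
values = [v for v in (parse_number(t) for t in tokens) if v is not None]
print(values)
```

This approach does not validate where the commas sit, so a token like '1,2,3' would be read as 123.0.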
CodePudding user response:
numbers = []
with open('file.txt', 'r') as f:
    # Iterate over the file line by line; iterating over f.read() would yield single characters
    for line in f:
        words = line.split()
        # isnumeric() keeps only digit-only tokens such as '71'
        numbers.extend([word for word in words if word.isnumeric()])
# Print all numbers
print(numbers)
# Print all unique numbers
print(set(numbers))
# Print all unique numbers, converted to floats
print([float(n) for n in set(numbers)])
If you specifically need a list then you can wrap the set with list().
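One caveat with this approach: str.isnumeric() is true only when every character in the string is a numeric character, so a sign, a decimal point or a comma makes it return False. A quick check on a few tokens from the question's sample (the checks dictionary is just for illustration):

```python
# A few tokens taken from the sample in the question.
tokens = ["71", "42.42", "-2", "100,000", "x14"]

# Map each token to the result of str.isnumeric().
checks = {token: token.isnumeric() for token in tokens}

for token, is_num in checks.items():
    print(token, is_num)
```

Only '71' passes here, so floats and negative numbers need a different test, for example a try/except around float().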
