How can I adjust this code so that the function loops through the list `models_2`? If the function uses `models`, it works; if I change it to `models_2`, it gives me this error:
AttributeError: 'float' object has no attribute 'seek'
This is my dataframe, from an Excel file with all cells formatted as "text".
   MOD1       MOD2       MOD3       MOD4
0  File1.pdf  File3.pdf  File1.pdf  File3.pdf
1  File2.pdf  NaN        File2.pdf  File3.pdf
2  File3.pdf  NaN        NaN        NaN
models = ['MOD1']
models_2 = ['MOD1', 'MOD2']

def merge_pdf(models):
    merger = PdfFileMerger()
    for name in models:
        for index, row in df.iterrows():
            merger.append(row[name])
        merger.write(f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf")
    merger.close()

merge_pdf(models)
The full error message:
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [_reader.py:1065]
Traceback (most recent call last):
File "Z:\PyCharm\Excel_Reader\Excel_Reader.py", line 30, in <module>
merge_pdf(models)
File "Z:\PyCharm\Excel_Reader\Excel_Reader.py", line 27, in merge_pdf
merger.append(row[name])
File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\merger.py", line 227, in append
self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\merger.py", line 149, in merge
pdfr = PdfFileReader(
File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\_reader.py", line 239, in __init__
self.read(stream)
File "C:\Users\x\AppData\Local\Programs\Python\Python39\lib\site-packages\PyPDF2\_reader.py", line 911, in read
stream.seek(-1, 2)
AttributeError: 'float' object has no attribute 'seek'
CodePudding user response:
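Your code is failing because the column 'MOD2' contains NaN values, which are of type float. As the traceback shows, PyPDF2 treats the float as a file object and calls .seek() on it, which raises the AttributeError. How you handle this depends on what you want to do with those NaN values.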
You can verify that by running the following code:
import pandas as pd
import numpy as np

data = {
    'MOD1': ['File1.pdf', 'File2.pdf', 'File3.pdf'],
    'MOD2': ['File1.pdf', np.nan, np.nan],
    'MOD3': ['File1.pdf', 'File2.pdf', np.nan],
    'MOD4': ['File1.pdf', 'File2.pdf', np.nan]
}
df = pd.DataFrame(data)

models = ['MOD1']
models_2 = ['MOD1', 'MOD2']

# Print every cell together with its Python type
for name in models_2:
    for index, row in df.iterrows():
        print(name, index, row[name], type(row[name]))
This will print the following:
MOD1 0 File1.pdf <class 'str'>
MOD1 1 File2.pdf <class 'str'>
MOD1 2 File3.pdf <class 'str'>
MOD2 0 File1.pdf <class 'str'>
MOD2 1 nan <class 'float'>
MOD2 2 nan <class 'float'>
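The error itself can be reproduced without PyPDF2 or pandas. The sketch below is only an illustration: `reproduce_error` is a hypothetical helper that mimics the `stream.seek(-1, 2)` call from the last frame of the traceback, applied to a plain float NaN like the one stored in the empty cells.

```python
import math

# An empty cell ends up as NaN, which is an ordinary Python float
cell = float("nan")

print(type(cell))        # <class 'float'>
print(math.isnan(cell))  # True


def reproduce_error(value):
    """Call .seek() on the value, as the last traceback frame does (hypothetical helper)."""
    try:
        value.seek(-1, 2)
    except AttributeError as exc:
        return str(exc)
    return None


message = reproduce_error(cell)
print(message)  # 'float' object has no attribute 'seek'
```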
If you know you only want to include the cells with string values, you can add a type check prior to appending it to your merger object, like so:
from PyPDF2 import PdfFileMerger

models = ['MOD1']
models_2 = ['MOD1', 'MOD2']

def merge_pdf(models):
    merger = PdfFileMerger()
    for name in models:
        for index, row in df.iterrows():
            # Skip NaN cells: only strings are file names
            if isinstance(row[name], str):
                merger.append(row[name])
        merger.write(f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf")
    merger.close()

merge_pdf(models)
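Note that the function above shares a single merger across all models. If the goal is one output file per model (the `{name}` in the file name suggests this, but it is an assumption), a possible variant is sketched below. The names `pdf_names` and `merge_per_model` are hypothetical, and the merger class is passed in as `merger_factory` so the logic can be tried without real PDF files.

```python
def pdf_names(cells):
    """Keep only the cells that hold a file name; NaN (a float) is dropped."""
    return [cell for cell in cells if isinstance(cell, str)]


def merge_per_model(columns, models, merger_factory):
    """Write one merged PDF per model and return the output file names.

    columns:        mapping of column name -> list of cell values
    merger_factory: zero-argument callable returning an object with
                    append(), write() and close() (assumed: PdfFileMerger)
    """
    written = []
    for name in models:
        files = pdf_names(columns[name])
        if not files:
            continue  # nothing to merge for this model
        merger = merger_factory()  # fresh merger, so files do not carry over
        try:
            for path in files:
                merger.append(path)
            output = f"Order #XXXXXXX ({name}) Production Package - Rev.0.pdf"
            merger.write(output)
        finally:
            merger.close()
        written.append(output)
    return written
```

With the dataframe from the question, this could be called as `merge_per_model(df.to_dict("list"), models_2, PdfFileMerger)`.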
