When I try to read CSV from sftp server with read_csv() there is no problem at all. But when I try to read xlsx with read_excel() it never finishes reading file. I tried downloading from sftp and reading xlsx. It works well. But I didn't understand why I can't read from sftp server. Is there any solutions?
t = paramiko.Transport(('ip', port))
t.connect(username='user', password='pass')
sftp = paramiko.SFTPClient.from_transport(t)
fileattr = sftp.lstat(remote_file)
try:
if stat.S_ISREG(fileattr.st_mode):
if '.xlsx' in remote_file:
with sftp.open(remote_file) as f:
f.prefetch()
df = pd.read_excel(f,dtype=str,engine='openpyxl',sheet_name=sheet)
t.close()
print(df)
except Exception as e:
t.close()
print(e)
CodePudding user response:
Use reas_excel(f.read()) instead of just read_excel(f). read_excel() only reads objects with a read() method or file paths.
For the problem of more columns than expected there are 3 options:
- Modify your excel file so that you do not have absolutely nothing in the blanck cells, not even any kind format. And even though, this method is unpredictable.
- Read the excel into a dataframe and then filter the columns that you need. Or drop the empty columns like
df.dropna(axis='columns', inplace=True) - Use the
usecolsparameter fromread_excellikeread_excel(f.read(), usecols="A,C,E:F", ...). See documentation
I would recommend option 3.
