Error while converting CSV to Parquet file using pandas

I would like to convert a CSV file to Parquet and upload it to an S3 bucket. Below is the code snippet.

import pandas as pd
from io import BytesIO

df = pd.read_csv('right_csv.csv')
csv_buffer = BytesIO()
# the next line raises the TypeError
df.to_parquet(csv_buffer, compression='gzip', engine='fastparquet')
csv_buffer.seek(0)

The code above raises an error:

TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO

How can I make it work?

CodePudding user response：

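According to the pandas documentation, io.BytesIO cannot be used when fastparquet is the engine. Use engine='pyarrow' instead, or engine='auto', which tries pyarrow first and falls back to fastparquet only if pyarrow is unavailable. Quoting from the documentation: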

The engine fastparquet does not accept file-like objects.

The code below works without any issues:

import io

# df is the DataFrame loaded in the question
f = io.BytesIO()
df.to_parquet(f, compression='gzip', engine='pyarrow')
f.seek(0)  # rewind so the buffer can be read from the start
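
Since the end goal in the question is an S3 upload, the buffer approach above could be wrapped up roughly as follows. This is only a sketch: the helper names dataframe_to_parquet_buffer and upload_buffer are made up for illustration, the bucket and key are placeholders, and it assumes a boto3 S3 client (whose upload_fileobj method accepts a readable binary file object) and an installed pyarrow.

```python
import io


def dataframe_to_parquet_buffer(df, compression="gzip"):
    """Serialize a DataFrame to Parquet in memory (hypothetical helper)."""
    buf = io.BytesIO()
    # pyarrow accepts a file-like object as the target
    df.to_parquet(buf, compression=compression, engine="pyarrow")
    buf.seek(0)  # rewind so the uploader reads from the first byte
    return buf


def upload_buffer(s3_client, buf, bucket, key):
    """Upload an in-memory buffer with the client's upload_fileobj (hypothetical helper)."""
    s3_client.upload_fileobj(buf, bucket, key)


# Possible usage, assuming pandas and boto3 are installed and AWS credentials
# are configured; "my-bucket" and the key are placeholders:
#
#   import boto3
#   import pandas as pd
#
#   df = pd.read_csv("right_csv.csv")
#   buf = dataframe_to_parquet_buffer(df)
#   upload_buffer(boto3.client("s3"), buf, "my-bucket", "right_csv.parquet")
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before pointing them at a real bucket.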

CodePudding user response：

As mentioned in the other answer, fastparquet does not support writing to a file-like object. One workaround is to write the Parquet file to a NamedTemporaryFile and then copy its contents into a BytesIO buffer:


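import io
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    # fastparquet needs a path, so hand it the temporary file's name
    df.to_parquet(tmp.name, compression='gzip', engine='fastparquet')
    # read the written bytes back into an in-memory buffer
    with open(tmp.name, 'rb') as fh:
        buf = io.BytesIO(fh.read())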
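
One caveat with the snippet above: reopening a NamedTemporaryFile by name while it is still open works on Unix but not on Windows. A variant that sidesteps this, sketched here with a hypothetical helper name (write_via_temp_path), writes to a path inside a TemporaryDirectory instead. The writer is passed in as a callable, so the same helper would work with any path-only writer, not just fastparquet.

```python
import io
import os
import tempfile


def write_via_temp_path(write_to_path, suffix=".parquet"):
    """Run write_to_path(path) on a temporary path and return the bytes as BytesIO.

    write_to_path is any callable that takes a filesystem path and writes a file there.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data" + suffix)
        write_to_path(path)
        # Copy the file into memory before the directory is cleaned up
        with open(path, "rb") as fh:
            return io.BytesIO(fh.read())


# Possible usage with the DataFrame from the question (assumes fastparquet is installed):
#
#   buf = write_via_temp_path(
#       lambda p: df.to_parquet(p, compression="gzip", engine="fastparquet")
#   )
```

The returned buffer is positioned at the start, so it can be read or uploaded directly.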