I'm trying to get spreadsheet data from zipped .xlsx files. I'm using rubyzip to access the contents of the zip file:
Zip::File.open(file_path) do |zip_file|
  zip_file.each do |entry|
    # process entry
  end
end
My problem is that rubyzip gives me a Zip::Entry object, which I can't get to work with gems like roo or creek.
I've done something similar with .csv files, which was as simple as CSV.parse(entry.get_input_stream.read). However, that just gives me a string of encoded gibberish when I use it on an .xlsx file.
I've looked around and the closest answer I got was temporarily extracting the files, but I want to avoid doing this since the files can get pretty large.
Does anyone have any suggestions? Thanks in advance.
Answer:
So what you need to do is convert the stream into an IO object that Roo can understand.
To determine whether the object passed to Roo::Spreadsheet.open is a "stream", Roo uses the following method:
def is_stream?(filename_or_stream)
  filename_or_stream.respond_to?(:seek)
end
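To make the check concrete, here is a minimal sketch using only the standard library. The `is_stream?` method is copied from the snippet above so the example runs on its own; `ForwardOnlyReader` is a hypothetical stand-in for an object that can be read but has no `seek`, not a class from rubyzip or Roo.

```ruby
require 'stringio'

# Same check as the Roo method quoted above, reproduced so this sketch is self-contained.
def is_stream?(filename_or_stream)
  filename_or_stream.respond_to?(:seek)
end

# Hypothetical forward-only reader: it can be read, but it does not define #seek.
class ForwardOnlyReader
  def initialize(data)
    @io = StringIO.new(data)
  end

  def read(*args)
    @io.read(*args)
  end
end

reader = ForwardOnlyReader.new('binary xlsx bytes would go here')

puts is_stream?(reader)                     # false: no #seek
puts is_stream?(StringIO.new(reader.read))  # true: StringIO responds to #seek
puts is_stream?('report.xlsx')              # false: a String is treated as a path
```

Wrapping the bytes in a StringIO is what flips the check from false to true.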
Since a Zip::InputStream does not respond to seek, you cannot use this object directly. To get around this, we simply need an object that does respond to seek (like a StringIO).
We can just read the input stream into the StringIO directly:
stream = StringIO.new(entry.get_input_stream.read)
Alternatively, the Zip library provides a method to copy a Zip::InputStream to another IO object through the IOExtras module (Zip::IOExtras.copy_stream), which I think reads fairly nicely as well.
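For intuition, copying a stream into a StringIO in chunks looks roughly like the sketch below. This is a standard-library illustration, not rubyzip's actual implementation; `copy_in_chunks` and `CHUNK_SIZE` are hypothetical names, and a plain StringIO stands in for `entry.get_input_stream`.

```ruby
require 'stringio'

CHUNK_SIZE = 32 * 1024 # hypothetical chunk size, chosen only for illustration

# Copies everything from source (anything with #read(length)) into target
# (anything with #write), one chunk at a time, and returns target.
def copy_in_chunks(target, source, chunk_size = CHUNK_SIZE)
  # #read(length) returns nil once the source is exhausted, which ends the loop
  while (chunk = source.read(chunk_size))
    target.write(chunk)
  end
  target
end

source = StringIO.new('a' * 100_000) # stands in for entry.get_input_stream
stream = copy_in_chunks(StringIO.new, source)
stream.rewind                        # writing leaves the position at the end

puts stream.size # 100000
```

Either way, the whole decompressed entry ends up in memory; the chunked copy only avoids building one extra intermediate string, which is worth keeping in mind for very large files.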
Knowing all of the above, we can implement this as follows:
require 'zip'
require 'roo'
require 'stringio'

Zip::File.open(file_path) do |zip_file|
  zip_file.each do |entry|
    # make sure Roo can handle the file (at least based on the extension);
    # File.extname keeps the leading dot, so strip it to match Roo's keys (e.g. :xlsx)
    ext = File.extname(entry.name).delete('.').downcase.to_sym
    next unless Roo::CLASS_FOR_EXTENSION[ext]

    # stream = StringIO.new(entry.get_input_stream.read)
    ::Zip::IOExtras.copy_stream(stream = StringIO.new, entry.get_input_stream)
    spreadsheet = Roo::Spreadsheet.open(stream, extension: ext)
    # process file
  end
end
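One detail of the extension check is worth spelling out: File.extname returns the extension with its leading dot, while Roo's extension keys are, as far as I can tell, dot-less symbols such as :xlsx. Below is a small sketch of normalizing the extension before the lookup. The helper name `extension_symbol` and the `SUPPORTED` list are hypothetical stand-ins so the example runs without Roo installed; check Roo::CLASS_FOR_EXTENSION in your Roo version for the real keys.

```ruby
# Hypothetical helper: turns an entry name into a dot-less, lowercase symbol.
def extension_symbol(entry_name)
  File.extname(entry_name).delete('.').downcase.to_sym
end

# Stand-in for the keys of Roo::CLASS_FOR_EXTENSION (an assumption, not read from Roo).
SUPPORTED = %i[ods xlsx xlsm csv].freeze

names = ['reports/Q1.XLSX', 'notes.txt', 'data.csv', 'README']
names.each do |name|
  ext = extension_symbol(name)
  puts "#{name}: #{ext.inspect} supported=#{SUPPORTED.include?(ext)}"
end
```

Lowercasing also covers entries like Q1.XLSX, which would otherwise be skipped.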
