Home > Enterprise >  Inserting large dataset from CSV file with CI Model
Inserting large dataset from CSV file with CI Model

Time:01-09

I have a CSV file with 5mn rows. I am using CSV library from thephpleague.com. It uses generators so reads the file line by line. So reading does not cause any problem.

Yet, I want to save those to the DB using CI model because I need to make sure that data integrity will be okay. So I want to use model's validators, beforeInsert callbacks, etc. So question is not how to insert data to db, but how to insert data by processing with its Model.

Here is my code:

namespace App\Util;

use App\Models\TestModel;
use League\Csv\Reader;

class CSV
{
    public function readData(string $filePath, array $headers = [])
    {
      
        $filePath = ROOTPATH.'test.5mn.csv';

        $maxExecutionTime = ini_get("max_execution_time");
        ini_set('max_execution_time', 300);

        $reader = Reader::createFromPath($filePath, 'r');
        $reader->setHeaderOffset(0); //set the CSV header offset
        $reader->skipEmptyRecords();

        if (true === empty($headers)) {
            $headers = $reader->getHeader();
        }

        $records = $reader->getRecords($headers);

        $isGoodToGo = true;
    
        foreach ($records as $record) {
            if ($this->isEmptyWithNullValues($record)) {
                continue;
            }

            if ($this->isEmptyWithBlankString($record)) {
                continue;
            }

            $model = model(TestModel::class);
            $model->save([
                'item_type' => $record["Item Type"],
                'sales_channel' => $record["Sales Channel"],
                'total_profit' =>  $record["Total Profit"],
            ]);
            log_message('error', '#'.$model->getInsertID() . ' saved');
            $model = null;
        }

        ini_set('max_execution_time', $maxExecutionTime);

        return $isGoodToGo;
    }
    ....
} 

On controller I am calling this:

public function index()
{
    $test = new CSV();
    $test->readData('path', []);
    dump('EXIT');
} 

But I gives php memory error after a certain number of records. In dev environment, it can save only 123456 records, after throws memory error. In production, it saves 253321 records then throws memory error.

There is an overhead in somewhere I presume, yet I am not an expert. In model, or in connection, or in anywhere, is there a method which I can flush the memory.. I am thinking CI keeping some data even in production environment. Any idea would be very helpful. Thank you..

For reference; error message:

CRITICAL - 2022-01-06 11:48:09 --> Allowed memory size of 134217728 bytes exhausted (tried to allocate 67108872 bytes)
#0 [internal function]: CodeIgniter\Debug\Exceptions->shutdownHandler()
#1 {main}

CodePudding user response:

How big is your csv file in MB?

Reason is, if your memory limit is 128mb and your file is 100mb (which I presume from your filename you have some millions of records there) you already consumed most of your memory only by loading the csv file.

I believe there isn't much to do with the code except setting memory limit to -1 and cross your fingers. It may work or crash again.

My suggestion and what would I do?

That amount of records should not be inserted to DB in one go anyway. Doing job in batches is the best practice. When you have 30mn or 100mn records, batches will still work but not this one.

CodePudding user response:

I got an answer in CI forum. CI Project Lead mentioned that:

"I have had to do this in the past. The way I handled it there was to use the LOAD DATA command to load the CSV into a new, temporary table. Then would loop over the temp table doing validation and normalization of the data before inserting it into the correct table(s). And, yes, chunk() is a perfect use case for that once you have it in the temp table."

Posting it here for future reference. Thanks.

  •  Tags:  
  • Related