Home > Blockchain >  Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task
Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task

Time:02-01

I have a data flow task with 2 components:

  1. OLEDB data source tasks that are a SELECT query:
SELECT ACCOUNTID
FROM JOBS
WHERE STATUS=3
  1. OLE DB Command task that does:
DELETE FROM ACCOUNT
WHERE ACCOUNTID=?

The logic is that any job with status id 3 has to result in deletion of the accountid from the account table.

I know that when step 1 returns multiple records, step 2 performs slow because it is an operation that happens on each record. Where as if I had staged the data from step 1 in a separate table and then in an execute sql task fired delete based on the staged table, then it would have been faster.

Since the number of rows returns will always be small (under 20), I am using the OLE DB Command task approach. My question is -

  1. Does the OLEDB source task pass each row into the OLE DB Command task, there by resulting the the OLEDB Command task processing one record at a time?

Or Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task processes one record at a time?

  1. Once rows are extracted by the OLEDB source task, then where are they held before passing into the OLE DB Command task?

  2. Until the completion of the OLE DB Command task, are the rows produced by the OLEDB Source task locked?

CodePudding user response:

1st question

Does the OLEDB source task pass each row into the OLE DB Command task, thereby resulting in the OLEDB Command task processing one record at a time? Or Does the OLEDB source task pass all the rows to the OLE DB Command task, and the OLE DB Command task processes one record at a time?

The OLE DB source extracts data from the SQL database and stores them within a memory buffer. There are several factors that affect the way that data is loaded into SSIS:

  1. The extracted data rows count
  2. The amount of data stored in each row
  3. The DefaultBufferMaxRows and DefaultBufferSize Data Flow Task's enter image description here

    On the other hand, the OLE DB Command process the data row-by-row as enter image description here

    Besides, while processing data, the OLE DB command does not receive any new data from the OLE DB Source. It is not like the Sort Transformation. It only receives one buffer, processes it, and sends it via output instead of storing all data in memory and processing them.

    The following screenshot shows how an OLE DB component does not receive a new data buffer before that it ends processing the current one (I am using an update statement over a huge table without indexes in this example):

    enter image description here


    Helpful resources

    CodePudding user response:

    The OLE DB source task is a non-blocking object does not retain data either partially (semi-blocking component) or totally (blocking component). They are therefore more efficient in terms of execution time.

    Database snapshots was introduced at page level if you are using SQL Server. It might be that before the SQL query started insertion it used to take snapshot of the entire source query and then start insertion into destination database. Every update/insert transaction on the source server potentially bears the overhead of the page copy operation to maintain the snapshot particularly in SSMS 2017.

    Bad habits : adding NOLOCK to the source.

    You can check this blog : https://blog.sqlauthority.com/2015/07/03/sql-server-difference-between-read-committed-snapshot-and-snapshot-isolation-level/

    CodePudding user response:

    C# Console application (written in Core 3.1 but should apply to Framework 4.8 ) to run your request asynchronously. This cannot be put into a script task in SSIS. This is not too different than your original SSIS package (i.e. load accounts to delete then process the deletes). However, the main difference is that all deletes happen at the same time and not one at a time.

    You also might consider a "select distinct accountID" if appropriate. No need to process the same accountID twice.

        public static async Task Main()
        {
            List<int> accts = await getAccts(3);
            await Task.WhenAll(accts.Select(async a => await deleteAccts(a)));
        }
    
        public static async Task deleteAccts(int acct)
        {
            string sql = "DELETE FROM ACCOUNT WHERE ACCOUNTID=@acct";
    
            using (var conn = new SqlConnection("Insert your connection string"))
            {
                using (var cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.Add("@acct", SqlDbType.Int).Value = acct;
                    conn.Open();
                    await cmd.ExecuteNonQueryAsync();
                }
            }
        }
    
        public static async Task<List<int>> getAccts(int Status)
        {
            List<int> accts = new List<int>();
            
            string sql = "SELECT ACCOUNTID FROM JOBS WHERE STATUS=@Status";
    
            using (var conn = new SqlConnection("Insert your connection string"))
            {
                using (var cmd = new SqlCommand(sql,conn))
                {
                    cmd.Parameters.Add("@Status", SqlDbType.Int).Value = Status;
                    conn.Open();
                    using (var rdr = await cmd.ExecuteReaderAsync())
                    {
                        while (await rdr.ReadAsync())
                        {
                            accts.Add(rdr.GetInt32(0));
                        }
                    }
                }
            }
    
            return accts;
        }
    

    CodePudding user response:

    I will leave the C# answer for reference but it seems like there would be no need to stage or use any code. You can do this in a single SQL statement.

    Delete from ACCOUNT
      join jobs on jobs.AccountID = Account.AccountID and jobs.status=3
    

    The only restriction is these tables are on the same server.

  •  Tags:  
  • Related