LINQ Lazy Inner Join

Time:02-02

Is there a way to use the LINQ API to (inner) join collections lazily? I need to correlate multiple CSVs in a cascade and would like to avoid unnecessary parsing if "upstream" CSVs turn out to be empty:

ReadCsvRows(fileA).Join(
    ReadCsvRows(fileB), // why parse B if A is empty?!
    ...
).Join(
    ReadCsvRows(fileC),
    ...
).Join(...)

Note that the ReadCsvRows method lies behind an interface whose only requirement is that the method returns IEnumerable (as depicted here). To keep things "streamy", I could get around the problem by introducing a custom extension method

public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer, 
    Func<IEnumerable<TInner>> innerFunc, 
    Func<TOuter, TKey> outerKeySelector, 
    Func<TInner, TKey> innerKeySelector, 
    Func<TOuter, TInner, TResult> resultSelector) =>
        !outer.Any() ?
        Array.Empty<TResult>() :
        outer.Join(
            innerFunc(),
            outerKeySelector,
            innerKeySelector,
            resultSelector
        );

but I was wondering what options I have with the vanilla LINQ API.
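
One caveat with the extension method above: outer.Any() runs as soon as the method is called, and outer is then enumerated a second time by the nested Join call, so the first CSV may be opened twice. A minimal sketch of a fully deferred variant follows. It assumes C# 8 or later, and the names LazyJoin, LazyJoinExtensions and LazyJoinDemo are hypothetical (they are not part of System.Linq):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyJoinExtensions
{
    // Hypothetical helper; an iterator method, so nothing runs until enumeration.
    public static IEnumerable<TResult> LazyJoin<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        Func<IEnumerable<TInner>> innerFactory,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        using var e = outer.GetEnumerator();
        if (!e.MoveNext())
            yield break; // outer is empty, so innerFactory is never invoked

        // Build the lookup only after outer has produced its first element.
        var lookup = innerFactory().ToLookup(innerKeySelector);
        do
        {
            var item = e.Current;
            var key = outerKeySelector(item);
            if (key == null)
                continue; // like Enumerable.Join, null keys never match

            foreach (var match in lookup[key])
                yield return resultSelector(item, match);
        } while (e.MoveNext());
    }
}

public static class LazyJoinDemo
{
    public static void Main()
    {
        var innerCalls = 0;
        Func<IEnumerable<int>> inner = () => { innerCalls++; return new[] { 1, 2, 3 }; };

        var joined = Enumerable.Empty<int>()
            .LazyJoin(inner, o => o, i => i, (o, i) => o + i)
            .ToList();

        Console.WriteLine($"{joined.Count} rows, inner factory calls: {innerCalls}");
    }
}
```

Here the outer sequence is enumerated exactly once, and the inner factory is invoked only after the first outer element has been read.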

CodePudding user response:

If I were you, I would split ReadCsvRows(fileB) and ReadCsvRows(fileC) into two separate variables and add an "if" condition to decide whether ReadCsvRows(fileB) should be executed at all.

Something like this:

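// Materialize each file only when the previous one produced rows.
// Assumes ReadCsvRows yields the matching row type for each file.
List<FileTypeA> fileAData = ReadCsvRows(fileA).ToList();
List<FileTypeB> fileBData = new List<FileTypeB>();
List<FileTypeC> fileCData = new List<FileTypeC>();

if (fileAData.Count > 0)
    fileBData = ReadCsvRows(fileB).ToList();
if (fileBData.Count > 0)
    fileCData = ReadCsvRows(fileC).ToList();

List<FinalType> final = (from a in fileAData
                         join b in fileBData
                             on a.Key equals b.Key
                         join c in fileCData
                             on b.Key equals c.Key
                         // FinalType(a, b, c) is a placeholder projection; build the result as needed
                         select new FinalType(a, b, c)).ToList();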

See if this helps.

CodePudding user response:

You do not need your own Join implementation. The standard System.Linq implementation (at least in .NET Core and later) does not enumerate the inner sequence if the outer one has no records.

So, check your ReadCsvRows implementation. If it uses yield in its body, no unwanted reads will occur.

Schematically:

public static IEnumerable<Row> ReadCsvRows(string fileName)
{
   using var reader = new CsvReader(fileName);
   while (reader.ReadNext())
   {
      yield return reader.CurrentRow;
   }
}

In this case, even new CsvReader(fileName) will NOT be called if the first sequence has no rows.
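
A quick way to check this on your own runtime is a small experiment like the sketch below. The names JoinLazinessDemo, Rows and InnerStarted are hypothetical; Rows stands in for ReadCsvRows, and the flag plays the role of opening the second file. On .NET Core and later I would expect it to report that the inner iterator never started. As far as I know, the older .NET Framework implementation builds the inner lookup before touching the outer sequence, so the result may differ there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinLazinessDemo
{
    // Set to true the first time the "inner" iterator body actually starts running.
    public static bool InnerStarted;

    // Hypothetical stand-in for ReadCsvRows: an iterator method, so its body
    // (including the side effect below) runs only when enumeration begins.
    public static IEnumerable<int> Rows(string name, params int[] values)
    {
        if (name == "inner") InnerStarted = true; // plays the role of new CsvReader(...)
        foreach (var v in values)
            yield return v;
    }

    public static void Main()
    {
        var joined = Rows("outer") // no values: an empty outer sequence
            .Join(Rows("inner", 1, 2), o => o, i => i, (o, i) => o + i)
            .ToList();

        Console.WriteLine($"rows: {joined.Count}, inner started: {InnerStarted}");
    }
}
```

Merely calling Rows("inner", 1, 2) does not set the flag, because an iterator body is deferred until the first MoveNext. The flag only flips if Join decides to enumerate the inner sequence.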
