Is there a way to use the LINQ API to (inner) join collections lazily? I need to correlate multiple CSVs in cascade and would like to avoid unnecessary parsing if "upstream" CSVs turn out to be empty:
ReadCsvRows(fileA).Join(
    ReadCsvRows(fileB), // why parse B if A is empty?!
    ...
).Join(
    ReadCsvRows(fileC),
    ...
).Join(...)
Note that the ReadCsvRows method sits behind an interface whose only requirement is that the method returns IEnumerable (as depicted here). To keep things "streamy", I could work around the problem by introducing a custom extension method
public static IEnumerable<TResult> Join<TOuter, TInner, TKey, TResult>(
    this IEnumerable<TOuter> outer,
    Func<IEnumerable<TInner>> innerFunc,
    Func<TOuter, TKey> outerKeySelector,
    Func<TInner, TKey> innerKeySelector,
    Func<TOuter, TInner, TResult> resultSelector) =>
    !outer.Any() ?
        Array.Empty<TResult>() :
        outer.Join(
            innerFunc(),
            outerKeySelector,
            innerKeySelector,
            resultSelector
        );
but I was wondering what options I have with the vanilla LINQ API.
CodePudding user response:
If I were you, I would split ReadCsvRows(fileB) and ReadCsvRows(fileC) into two separate variables and add an "if" condition to decide whether ReadCsvRows(fileB) should be executed at all.
Something like this:
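// ReadCsvRows returns IEnumerable<T>, so materialize each file at most once.
List<FileATypeA> fileAData = ReadCsvRows(fileA).ToList();
List<FileATypeB> fileBData = new List<FileATypeB>();
List<FileATypeC> fileCData = new List<FileATypeC>();
// Only parse the next file if the previous one produced rows.
if (fileAData.Count > 0)
    fileBData = ReadCsvRows(fileB).ToList();
if (fileBData.Count > 0)
    fileCData = ReadCsvRows(fileC).ToList();
// The FinalType constructor is hypothetical; project whatever you need.
List<FinalType> final = (from a in fileAData
                         join b in fileBData
                             on a.Key equals b.Key
                         join c in fileCData
                             on b.Key equals c.Key
                         select new FinalType(a, b, c)).ToList();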
See if this helps
CodePudding user response:
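You do not need your own Join implementation. The standard System.Linq implementation does not enumerate the inner sequence if the outer one has no records (this holds for .NET Core and later; the older .NET Framework implementation builds the inner lookup before touching the outer sequence).
So check your ReadCsvRows implementation. If it uses yield in its body, no unwanted reads will occur.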
Schematically:
public static IEnumerable<Row> ReadCsvRows(string fileName)
{
    using var reader = new CsvReader(fileName);
    while (reader.ReadNext())
    {
        yield return reader.CurrentRow;
    }
}
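In this case even new CsvReader(fileName) will NOT be called if the first sequence has no rows, because the body of an iterator method does not start running until the sequence is enumerated.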
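If you want to check this on your own runtime, here is a minimal sketch. All names in it (LazyJoinDemo, ReadEmpty, ReadLazily, ReadEagerly, Defer, InnerReads) are made up for the demo and stand in for your real readers. It also shows one possible workaround for the case where the implementation behind your interface does not use yield: wrapping the call in a small iterator so it is only invoked once someone actually enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyJoinDemo
{
    // Counts how many times a reader body actually started running.
    public static int InnerReads;

    // Hypothetical stand-in for ReadCsvRows(fileA) on an empty file.
    public static IEnumerable<int> ReadEmpty()
    {
        yield break;
    }

    // Hypothetical iterator-based reader: the body runs on the first MoveNext().
    public static IEnumerable<int> ReadLazily()
    {
        InnerReads++;
        yield return 1;
        yield return 2;
    }

    // Hypothetical eager reader: does all its work as soon as it is called.
    public static IEnumerable<int> ReadEagerly()
    {
        InnerReads++;
        return new List<int> { 1, 2 };
    }

    // Wraps a factory in an iterator so the factory is not invoked
    // until the resulting sequence is enumerated.
    public static IEnumerable<T> Defer<T>(Func<IEnumerable<T>> factory)
    {
        foreach (var item in factory())
            yield return item;
    }

    public static void Main()
    {
        // Iterator-based inner: calling ReadLazily() runs none of its body.
        var lazy = ReadEmpty().Join(ReadLazily(), o => o, i => i, (o, i) => o + i);
        Console.WriteLine($"query built, inner reads: {InnerReads}"); // 0, nothing enumerated yet
        Console.WriteLine($"rows: {lazy.Count()}, inner reads: {InnerReads}");

        // Eager inner wrapped in Defer: the call is postponed in the same way.
        InnerReads = 0;
        var deferred = ReadEmpty().Join(Defer(() => ReadEagerly()), o => o, i => i, (o, i) => o + i);
        Console.WriteLine($"query built, inner reads: {InnerReads}"); // 0, factory not invoked yet
        Console.WriteLine($"rows: {deferred.Count()}, inner reads: {InnerReads}");
    }
}
```

Building either query never touches the inner reader, and both joins produce zero rows. Whether the inner read count is still 0 after enumeration depends on Join checking the outer sequence first, so on .NET Core and later I would expect 0, and 1 on the old .NET Framework.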
