I have an SQL table with a list of objects and I am trying to return a list using a number of different criteria.
Here is my object (same as the SQL table):
class Photo{
public int Id { get; set; }
public string FileName { get; set; }
public DateTime Uploaded { get; set; }
public int? ProjectId { get; set; }
public int GalleryOrder { get; set; }
}
In the SQL table, there are lots of photos uploaded over time that may or may not be tagged with ProjectIds. There are numerous photos tagged to each ProjectId. They are uploaded in batches, so that there may be numerous photos with the same DateTime Uploaded and then organized with GalleryOrder.
Given a list of ProjectIds I am trying to return one representative photo for each project based on the following parameters:
- One Photo per ProjectId
- Most recent uploaded date
- Lowest gallery order
I have some code that appears to be working with my test database of ~20 entries, but it pulls excess data and has numerous sorts. I'm not sure how to simplify and optimize it.
Here is my current query:
public async Task<List<Photo>> GetOneImageFilesPerProject(List<int> projectIds)
{
using var context = _contextFactory.CreateDbContext();
var results = await context.Photos.Where(x => x.ProjectId != null
&& projectIds.Contains((int)x.ProjectId))
.ToListAsync();
results = results.OrderBy(x => x.ProjectId)
.ThenByDescending(x => x.Uploaded)
.ThenBy(x => x.GalleryOrder)
.GroupBy(x => x.ProjectId)
.Select(x => x.First())
.ToList();
return results;
}
The program compiles with the OrderBy, GroupBy, and Select calls added after the .Where call, but it seemed to hang at that point and never return the final list. That's why it's split into two processing steps.
My other thought was to get the list from the Db and then go through with a foreach loop to build the final list. Not sure if that's faster than using .GroupBy and .Select. Either way seems a bit inelegant and brute force. If there's a straight SQL query that would be a better solution, I'm open to it!
Is there some method that is more straightforward for returning a list with one unique item based on other criteria within the object list?
For scale, this won't be a massive app, but it will be looking for ~10-20 projectIds at a time and each project might have ~40-50 photos tagged to it. Many cases will be a bit smaller (2-5 photos) but some larger, (100-200 photos). This function will be run by the users very frequently, as in several times a day, and the Photo list will be changing frequently.
Edit: Using .NET 5 and EF Core 5
CodePudding user response:
Try this query, it should work with EF Core 5
public async Task<List<Photo>> GetOneImageFilesPerProject(List<int> projectIds)
{
using var context = _contextFactory.CreateDbContext();
var photos = context.Photos.Where(x => x.ProjectId != null
&& projectIds.Contains((int)x.ProjectId));
var query =
from dp in photos.Select(x => new { x.ProjectId }).Distinct()
from p in photos.Where(p => p.ProjectId == dp.ProjectId)
.OrderByDescending(p => p.Uploaded)
.ThenBy(p => p.GalleryOrder)
.Take(1)
select p;
var results = await query.ToListAsync();
return results;
}
CodePudding user response:
Fairly sure that EFC5 (even) will translate this:
var some = new [] {1,2,3};
var v = await context.Photos
.Where(p => some.Contains(p.ProjectId))
.GroupBy(p => p.ProjectId)
.Select(g => g.OrderByDescending(p => p.Uploaded).ThenBy(p => p.Gallery).First())
.ToListAsync();
To something like:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ProjectId ORDER BY Uploaded DESC, Gallery) rn
FROM Photos
WHERE ProjectId IN (1,2,3)
)
WHERE rn = 1
Which should be fairly performant; you can always executeraw this if it won't translate it..
