c# – How do I partition a LINQ to Objects query?

This is a resource allocation problem. My goal is to run a query that fetches the highest-priority shift for each time slot.

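The data set is very large. For this example, assume 1,000 companies with 100 shifts each (the real data set is even larger). They are all loaded into memory, and I need to run a single LINQ to Objects query against them: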

    var topShifts =
        (from s in shifts
         where (from s2 in shifts
                where s2.CompanyId == s.CompanyId && s.TimeSlot == s2.TimeSlot
                orderby s2.Priority
                select s2).First().Equals(s)
         select s).ToList();

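The problem is that, without optimization, LINQ to Objects compares every object in both sets. It effectively cross-joins all 1,000 x 100 shifts against all 1,000 x 100 shifts, which amounts to 10 billion (10,000,000,000) comparisons. What I want is to compare only the objects within each company (as if Company were indexed in a SQL table). That would give 1,000 sets of 100 x 100 objects, for a total of 10 million (10,000,000) comparisons. As the number of companies grows, the latter scales linearly rather than quadratically.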
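The arithmetic behind the two comparison counts can be checked with a short sketch. The `ComparisonCounts` class and its method names are made up for illustration; they are not part of the program in the question.

```csharp
using System;

// Hypothetical helper, not part of the original program: it only
// reproduces the comparison counts discussed in the question.
public static class ComparisonCounts
{
    // Unpartitioned query: every shift is compared with every shift.
    public static long CrossJoin(long companies, long shiftsPerCompany)
    {
        long totalShifts = companies * shiftsPerCompany;
        return totalShifts * totalShifts;
    }

    // Partitioned query: each shift is compared only with the
    // shifts that belong to the same company.
    public static long PerCompany(long companies, long shiftsPerCompany)
    {
        return companies * shiftsPerCompany * shiftsPerCompany;
    }
}
```

For 1,000 companies with 100 shifts each, `CrossJoin` gives 10,000,000,000 and `PerCompany` gives 10,000,000. Doubling the number of companies quadruples the first figure but only doubles the second.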

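A technology like I4o would let me do something like this, but unfortunately I don't have the luxury of using custom collections in the environment where I'm running this query. Also, I only expect to run this query once on any given data set, so a persistent index is of limited value. I'm hoping to use an extension method that groups the data by company and then runs the expression on each group.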
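One possible shape for such an extension method is sketched below. The name `PartitionBy` and its signature are assumptions made for illustration; nothing by that name ships with LINQ. It is built only on the standard `GroupBy` and `SelectMany` operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionExtensions
{
    // Hypothetical helper (name and signature are illustrative only).
    // Groups the source by a key, runs the supplied query once per group,
    // and flattens the per-group results into a single sequence.
    public static IEnumerable<TResult> PartitionBy<TSource, TKey, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<IEnumerable<TSource>, IEnumerable<TResult>> partitionQuery)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (partitionQuery == null) throw new ArgumentNullException(nameof(partitionQuery));

        // GroupBy builds an in-memory lookup in one pass over the source,
        // so the per-group query never sees another partition's items.
        return source.GroupBy(keySelector)
                     .SelectMany(group => partitionQuery(group));
    }
}
```

With this in place, the query from the question could be passed as `partitionQuery` with `s => s.CompanyId` as `keySelector`, and the inner `from s2 in shifts` would range over one company's shifts instead of the whole list. The grouping is rebuilt on every enumeration, which fits the stated case of running the query once per data set.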

Full sample code

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    public struct Shift
    {
        // Counts every read of CompanyId; each comparison reads it twice.
        public static long Iterations;

        private int companyId;
        public int CompanyId
        {
            get { Iterations++; return companyId; }
            set { companyId = value; }
        }

        public int Id;
        public int TimeSlot;
        public int Priority;
    }

    class Program
    {
        static void Main(string[] args)
        {
            const int Companies = 1000;
            const int Shifts = 100;
            Console.WriteLine(string.Format("{0} Companies x {1} Shifts", Companies, Shifts));
            var timer = Stopwatch.StartNew();

            Console.WriteLine("Populating data");
            var shifts = new List<Shift>();
            for (int companyId = 0; companyId < Companies; companyId++)
            {
                for (int shiftId = 0; shiftId < Shifts; shiftId++)
                {
                    shifts.Add(new Shift() { CompanyId = companyId, Id = shiftId, TimeSlot = shiftId / 3, Priority = shiftId % 5 });
                }
            }
            Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
            timer.Restart();

            Console.WriteLine("Computing Top Shifts");
            // For each shift, the subquery rescans the whole list to find
            // the best shift for the same company and time slot.
            var topShifts =
                (from s in shifts
                 where (from s2 in shifts
                        where s2.CompanyId == s.CompanyId && s.TimeSlot == s2.TimeSlot
                        orderby s2.Priority
                        select s2).First().Equals(s)
                 select s).ToList();
            Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
            timer.Restart();

            Console.WriteLine("\nShifts:");
            foreach (var shift in shifts.Take(20))
            {
                Console.WriteLine(string.Format("C {0} Id {1} T {2} P{3}", shift.CompanyId, shift.Id, shift.TimeSlot, shift.Priority));
            }

            Console.WriteLine("\nTop Shifts:");
            foreach (var shift in topShifts.Take(10))
            {
                Console.WriteLine(string.Format("C {0} Id {1} T {2} P{3}", shift.CompanyId, shift.Id, shift.TimeSlot, shift.Priority));
            }

            // Two CompanyId reads per comparison, hence the division by 2.
            Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Shift.Iterations / 2));

            Console.WriteLine("Any key to continue");
            Console.ReadKey();
        }
    }

Sample output

1000 Companies x 100 Shifts
Populating data
Completed in 10.00ms
Computing Top Shifts
Completed in 520,721.00ms

Shifts:
C 0 Id 0 T 0 P0
C 0 Id 1 T 0 P1
C 0 Id 2 T 0 P2
C 0 Id 3 T 1 P3
C 0 Id 4 T 1 P4
C 0 Id 5 T 1 P0
C 0 Id 6 T 2 P1
C 0 Id 7 T 2 P2
C 0 Id 8 T 2 P3
C 0 Id 9 T 3 P4
C 0 Id 10 T 3 P0
C 0 Id 11 T 3 P1
C 0 Id 12 T 4 P2
C 0 Id 13 T 4 P3
C 0 Id 14 T 4 P4
C 0 Id 15 T 5 P0
C 0 Id 16 T 5 P1
C 0 Id 17 T 5 P2
C 0 Id 18 T 6 P3
C 0 Id 19 T 6 P4

Top Shifts:
C 0 Id 0 T 0 P0
C 0 Id 5 T 1 P0
C 0 Id 6 T 2 P1
C 0 Id 10 T 3 P0
C 0 Id 12 T 4 P2
C 0 Id 15 T 5 P0
C 0 Id 20 T 6 P0
C 0 Id 21 T 7 P1
C 0 Id 25 T 8 P0
C 0 Id 27 T 9 P2

Total Comparisons: 10,000,000,015.00
Any key to continue

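Questions:

1. How can I partition the query (while still executing it as a single LINQ query) so that the comparisons drop from 10 billion to 10 million?
2. Is there a more efficient way to solve the problem than a subquery?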

Solution

How about:

    var topShifts = from s in shifts.GroupBy(s => s.CompanyId)
                    from a in s.GroupBy(b => b.TimeSlot)
                    select a.OrderBy(p => p.Priority).First();

It seems to produce the same output, but with only 100,015 comparisons.

With @Geoff's edit, he just reduced my comparisons :-)
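As a variation on the nested `GroupBy` above (not the answerer's code), the two groupings can be collapsed into one by grouping on a composite key. This is a minimal sketch: `SimpleShift` is a stand-in for the question's `Shift` struct without the iteration counter, and `TopShiftQueries` is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Shift struct in the question
// (assumption: plain fields, no comparison counter).
public struct SimpleShift
{
    public int CompanyId;
    public int Id;
    public int TimeSlot;
    public int Priority;
}

public static class TopShiftQueries
{
    // Groups once on (CompanyId, TimeSlot) and keeps the shift with the
    // lowest Priority value in each group.
    public static List<SimpleShift> TopShifts(IEnumerable<SimpleShift> shifts)
    {
        return shifts
            // Anonymous types compare by value, so they work as group keys.
            .GroupBy(s => new { s.CompanyId, s.TimeSlot })
            // OrderBy is a stable sort: among equal priorities, the shift
            // that appears first in the source is kept.
            .Select(g => g.OrderBy(s => s.Priority).First())
            .ToList();
    }
}
```

Groups come back in the order their keys first appear in the source, so for data sorted by company and time slot, as in the sample, the results follow that same order.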
