delphi – Strange behaviour of TParallel.For with the default ThreadPool

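I am trying out the parallel programming features of Delphi XE7 Update 1.

I created a simple TParallel.For loop that basically does some bogus operations to pass the time.

I launched the program on an AWS instance (c4.8xlarge) with 36 vCPUs to see what the gain from parallel programming would be.

When I first launch the program and execute the TParallel.For loop, I see a significant gain (although a lot less than I expected with 36 vCPUs):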

Parallel matches: 23077072 in 242ms
Single Threaded matches: 23077072 in 2314ms

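If I do not close the program and run the pass again on the 36 vCPU machine shortly afterwards (e.g. immediately, or some 10-20 seconds later), the parallel pass gets much worse: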

Parallel matches: 23077169 in 2322ms
Single Threaded matches: 23077169 in 2316ms

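If I do not close the program and wait a few minutes (not a few seconds, but a few minutes) before running the pass again, I once more get the results I saw when first launching the program (a 10x improvement in response time).

The very first pass after launching the program is always fast on the 36 vCPU machine, so it seems that this effect only appears from the second TParallel.For call within the program onwards.

This is the sample code I am running: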

unit ParallelTests;

interface

uses
  Winapi.Windows, Winapi.Messages, System.SysUtils, System.Variants,
  System.Classes, Vcl.Graphics, System.Threading, System.SyncObjs,
  System.Diagnostics, Vcl.Controls, Vcl.Forms, Vcl.Dialogs, Vcl.StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    SingleThreadCheckBox: TCheckBox;
    ParallelCheckBox: TCheckBox;
    UnitsEdit: TEdit;
    Label1: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  maxItems: integer;
  referenceStr: string;

begin
  sw := TStopWatch.Create;

  maxItems := 5000;

  // Build a random 120000-character reference string of lowercase letters.
  Randomize;
  SetLength(referenceStr,120000);
  for i := 1 to 120000 do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  if ParallelCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    TParallel.For(1,MaxItems,procedure (Value: Integer)
        var
          index: integer;
          found: integer;
        begin
          found := 0;
          for index := 1 to length(referenceStr) do begin
            if (((Value mod 26) + ord('a')) = ord(referenceStr[index])) then begin
              inc(found);
            end;
          end;
          TInterlocked.Add(matches,found);
        end);
    sw.Stop;
    Memo1.Lines.Add('Parallel matches: ' + IntToStr(matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;

  if SingleThreadCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    for i := 1 to MaxItems do begin
      for j := 1 to length(referenceStr) do begin
        if (((i mod 26) + ord('a')) = ord(referenceStr[j])) then begin
          inc(matches);
        end;
      end;
    end;
    sw.Stop;
    Memo1.Lines.Add('Single Threaded matches: ' + IntToStr(Matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;
end;

end.

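Is this working as designed? I found this article (http://delphiaball.co.uk/tag/parallel-programming/) recommending that I let the library decide the thread pool, but I do not see the point of using parallel programming to serve a request faster if I have to wait minutes between one request and the next.

Am I missing something about how a TParallel.For loop is supposed to be used?

Please note that I cannot reproduce this on an AWS m3.large instance (2 vCPUs according to AWS). In that case I always get a slight improvement, and subsequent TParallel.For calls shortly afterwards do not give worse results.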

Parallel matches: 23077054 in 2057ms
Single Threaded matches: 23077054 in 2900ms

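So it seems that this effect occurs when there are many cores available (36), which is a pity, because the whole point of parallel programming is to benefit from many cores. I am wondering whether this is a library bug caused by the high core count, or by the fact that the core count is not a power of 2 in this case.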

UPDATE: After testing it with various instances of different vCPU counts in AWS, this seems to be the behaviour:

36 vCPUs (c4.8xlarge). You have to wait minutes between subsequent calls to a vanilla TParallel call (which makes it unusable for production).

32 vCPUs (c3.8xlarge). You have to wait minutes between subsequent calls to a vanilla TParallel call (which makes it unusable for production).

16 vCPUs (c3.4xlarge). You have to wait sub-second times. It could be usable if load is low but response time is still important.

8 vCPUs (c3.2xlarge). It seems to work normally.

4 vCPUs (c3.xlarge). It seems to work normally.

2 vCPUs (m3.large). It seems to work normally.
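
The alternative to letting the library decide the thread pool is to pass an explicitly configured TThreadPool to TParallel.For. The following is a minimal sketch only. It assumes the TParallel.For overload that takes a pool argument, and TThreadPool.SetMinWorkerThreads / SetMaxWorkerThreads as declared in System.Threading. The program name, the worker-thread limits and the dummy workload are made up for illustration, and nothing here is claimed to avoid the slowdown described above.

```delphi
program ExplicitPoolSketch; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.Threading;

var
  Pool: TThreadPool;
  Total: Integer;
begin
  Total := 0;
  Pool := TThreadPool.Create;
  try
    // Assumption: these limits are arbitrary example values. Both setters
    // return a Boolean and may reject a value, so the result is printed.
    Writeln('Min accepted: ', Pool.SetMinWorkerThreads(TThread.ProcessorCount));
    Writeln('Max accepted: ', Pool.SetMaxWorkerThreads(TThread.ProcessorCount * 2));

    // Same loop shape as in the test program, but run on the explicit pool
    // instead of the default one.
    TParallel.For(1, 5000,
      procedure(Value: Integer)
      begin
        // Dummy workload: accumulate a per-iteration value atomically.
        TInterlocked.Add(Total, Value mod 26);
      end,
      Pool);

    Writeln('Total: ', Total);
  finally
    Pool.Free;
  end;
end.
```

Running the timing test from the question against such a pool on the affected instance sizes would show whether the behaviour is tied to the default pool.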

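Answer

I created two test programs, based on yours, to compare System.Threading and OTL. I built with XE7 Update 1 and OTL r1397. The OTL source that I used corresponds to release 3.04. I built with the 32-bit Windows compiler, using release build options.

My test machine is a dual Intel Xeon E5530 running Windows 7 x64. The system has two quad-core processors. That is 8 processors in total, but the system reports 16 because of hyper-threading. Experience tells me that hyper-threading is just marketing guff, and I have never seen scaling beyond a factor of 8 on this machine.

Now the two programs, which are almost identical.

System.Threading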

program SystemThreadingTest;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics,System.Threading;

const
  maxItems = 5000;
  DataSize = 100000;

procedure DoTest;
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr,DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  TParallel.For(1,maxItems,procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches,found);
    end);
  Writeln('Parallel matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');
end;

begin
  while True do
    DoTest;
end.

OTL

program OTLTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,Winapi.Messages,System.Diagnostics,OtlParallel;

const
  maxItems = 5000;
  DataSize = 100000;

procedure ProcessThreadMessages;
var
  msg: TMsg;
begin
  while PeekMessage(Msg,0,0,0,PM_REMOVE) and (Msg.Message <> WM_QUIT) do begin
    TranslateMessage(Msg);
    DispatchMessage(Msg);
  end;
end;

procedure DoTest;
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr,DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  Parallel.For(1,maxItems).Execute(
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches,found);
    end);
  Writeln('Parallel matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');

  ProcessThreadMessages;

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');
end;

begin
  while True do
    DoTest;
end.

And now the output.

System.Threading output

Parallel matches: 19230817 in 374ms
Serial matches: 19230817 in 2423ms
Parallel matches: 19230698 in 374ms
Serial matches: 19230698 in 2409ms
Parallel matches: 19230556 in 368ms
Serial matches: 19230556 in 2433ms
Parallel matches: 19230635 in 2412ms
Serial matches: 19230635 in 2430ms
Parallel matches: 19230843 in 2441ms
Serial matches: 19230843 in 2413ms
Parallel matches: 19230905 in 2493ms
Serial matches: 19230905 in 2423ms
Parallel matches: 19231032 in 2430ms
Serial matches: 19231032 in 2443ms
Parallel matches: 19230669 in 2440ms
Serial matches: 19230669 in 2473ms
Parallel matches: 19230811 in 2404ms
Serial matches: 19230811 in 2432ms
....

OTL output

Parallel matches: 19230667 in 422ms
Serial matches: 19230667 in 2475ms
Parallel matches: 19230663 in 335ms
Serial matches: 19230663 in 2438ms
Parallel matches: 19230889 in 395ms
Serial matches: 19230889 in 2461ms
Parallel matches: 19230874 in 391ms
Serial matches: 19230874 in 2441ms
Parallel matches: 19230617 in 385ms
Serial matches: 19230617 in 2524ms
Parallel matches: 19231021 in 368ms
Serial matches: 19231021 in 2455ms
Parallel matches: 19230904 in 357ms
Serial matches: 19230904 in 2537ms
Parallel matches: 19230568 in 373ms
Serial matches: 19230568 in 2456ms
Parallel matches: 19230758 in 333ms
Serial matches: 19230758 in 2710ms
Parallel matches: 19230580 in 371ms
Serial matches: 19230580 in 2532ms
Parallel matches: 19230534 in 336ms
Serial matches: 19230534 in 2436ms
Parallel matches: 19230879 in 368ms
Serial matches: 19230879 in 2419ms
Parallel matches: 19230651 in 409ms
Serial matches: 19230651 in 2598ms
Parallel matches: 19230461 in 357ms
....

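I left the OTL version running for a long time and the pattern never changed. The parallel version was always around 7 times faster than the serial one.

Conclusion

The code is astonishingly simple. The only reasonable conclusion that can be drawn is that the implementation of System.Threading is defective.

There have been numerous bug reports relating to the new System.Threading library. All the evidence suggests that its quality is poor. Embarcadero has a long track record of releasing sub-standard library code. I am thinking of TMonitor, the XE3 string helper, earlier versions of System.IOUtils, FireMonkey. The list goes on.

It seems clear that quality is a big problem at Embarcadero. Code is released that quite clearly has not been tested adequately, if at all. This is especially troublesome for a threading library, where bugs can lie dormant and only be exposed in specific hardware/software configurations. The experience with TMonitor leads me to believe that Embarcadero does not have sufficient expertise to produce high-quality, correct threading code.

My advice is that you should not use System.Threading in its current form. Until it can be seen to have sufficient quality and correctness, it should be shunned. I suggest that you use OTL.

EDIT: The original OTL version of the program had a live memory leak, caused by an ugly implementation detail. Parallel.For creates tasks with the .Unobserved modifier. That causes those tasks to be destroyed only when an internal message window receives a "task has terminated" message. This window is created in the same thread as the Parallel.For caller, which in this case is the main thread. As the main thread was not processing messages, the tasks were never destroyed and memory consumption (plus other resources) just piled up. It is possible that this is why the program hung after some time.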
