delphi – Strange behaviour of TParallel with the default ThreadPool

I'm experimenting with the parallel programming features of Delphi XE7 Update 1.

I created a simple TParallel.For loop that basically does some bogus operations to pass the time.

I launched the program on a 36 vCPU AWS instance (c4.8xlarge) to see how much there is to gain from parallel programming.

When I first launch the program and execute the TParallel.For loop, I see a significant gain (although a lot less than I would expect with 36 vCPUs):

Parallel matches: 23077072 in 242ms
Single Threaded matches: 23077072 in 2314ms

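If I don't close the program and run the pass again on the 36 vCPU machine (for example, immediately or some 10-20 seconds later), the parallel pass gets a lot worse: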

Parallel matches: 23077169 in 2322ms
Single Threaded matches: 23077169 in 2316ms

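If I don't close the program and wait a few minutes (not a few seconds, but a few minutes) before running the pass again, I get the same results as when I first launched the program (a 10x improvement in response time).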

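The first pass after launching the program on the 36 vCPU machine is always the fast one, so it seems this effect only appears from the second time TParallel.For is called in the program.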
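The timing pattern described above (a fast first pass, slow passes shortly afterwards, fast again after minutes of idle time) could be probed more systematically with a small console harness. This is a minimal sketch, not part of the original post: it reuses the question's workload and runs one TParallel.For pass on the default pool after each idle period. The pause lengths in `Delays` and the helper names `CountMatches` and `ParallelPass` are hypothetical.

```pascal
program DelayProbe;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics, System.Threading;

const
  MaxItems = 5000;
  DataSize = 100000;
  // Hypothetical idle periods (ms) before each pass; adjust as needed.
  Delays: array[0..3] of Cardinal = (0, 1000, 20000, 180000);

var
  ReferenceStr: string;
  I: Integer;

// The question's busy-work for one loop index: count the characters
// equal to the letter selected by Value mod 26.
function CountMatches(Value: Integer; const S: string): Integer;
var
  Index: Integer;
begin
  Result := 0;
  for Index := Low(S) to High(S) do
    if ((Value mod 26) + Ord('a')) = Ord(S[Index]) then
      Inc(Result);
end;

// One TParallel.For pass on the default thread pool.
procedure ParallelPass;
var
  Matches: Integer;
  SW: TStopwatch;
begin
  Matches := 0;
  SW := TStopwatch.StartNew;
  TParallel.For(1, MaxItems,
    procedure(Value: Integer)
    begin
      AtomicIncrement(Matches, CountMatches(Value, ReferenceStr));
    end);
  Writeln('parallel matches: ', Matches, ' in ', SW.ElapsedMilliseconds, 'ms');
end;

begin
  Randomize;
  SetLength(ReferenceStr, DataSize);
  for I := Low(ReferenceStr) to High(ReferenceStr) do
    ReferenceStr[I] := Chr(Ord('a') + Random(26));

  for I := Low(Delays) to High(Delays) do
  begin
    // Idle first, then time a pass, so each line reflects the
    // preceding idle period.
    Sleep(Delays[I]);
    Write('after ', Delays[I], 'ms idle: ');
    ParallelPass;
  end;
end.
```

If the effect described in the question reproduces, the passes after the 1 s and 20 s pauses would be expected to be much slower than the first and last ones. The harness only measures; it says nothing about the cause.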

This is the sample code I'm running:

unit ParallelTests;

interface

uses
  Winapi.Windows,Winapi.Messages,System.SysUtils,System.Variants,System.Classes,Vcl.Graphics,System.Threading,System.SyncObjs,System.Diagnostics,Vcl.Controls,Vcl.Forms,Vcl.Dialogs,Vcl.StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Memo1: TMemo;
    SingleThreadCheckBox: TCheckBox;
    ParallelCheckBox: TCheckBox;
    UnitsEdit: TEdit;
    Label1: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  maxItems: integer;
  referenceStr: string;

begin
  sw := TStopWatch.Create;

  maxItems := 5000;

  Randomize;
  SetLength(referenceStr,120000);
  for i := 1 to 120000 do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  if ParallelCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    TParallel.For(1,MaxItems,procedure (Value: Integer)
        var
          index: integer;
          found: integer;
        begin
          found := 0;
          for index := 1 to length(referenceStr) do begin
            if (((Value mod 26) + ord('a')) = ord(referenceStr[index])) then begin
              inc(found);
            end;
          end;
          TInterlocked.Add(matches,found);
        end);
    sw.Stop;
    Memo1.Lines.Add('Parallel matches: ' + IntToStr(matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;

  if SingleThreadCheckBox.Checked then begin
    matches := 0;
    sw.Reset;
    sw.Start;
    for i := 1 to MaxItems do begin
      for j := 1 to length(referenceStr) do begin
        if (((i mod 26) + ord('a')) = ord(referenceStr[j])) then begin
          inc(matches);
        end;
      end;
    end;
    sw.Stop;
    Memo1.Lines.Add('Single Threaded matches: ' + IntToStr(Matches) + ' in ' + IntToStr(sw.ElapsedMilliseconds) + 'ms');
  end;
end;

end.

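Is this working as designed? I found this article (http://delphiaball.co.uk/tag/parallel-programming/) which recommends letting the library decide the thread pool, but I don't see the point of using parallel programming to serve requests faster if I have to wait minutes between requests.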
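One way to test whether the slowdown is tied to the library-managed default pool is to pass a caller-owned TThreadPool to the TParallel.For overload that accepts one as its last argument. The sketch below is an experiment under that assumption, not a known fix. `CountMatches` and `ParallelPass` are hypothetical helpers, and capping the pool at `TThread.ProcessorCount` workers is an arbitrary choice. The program reports it if the pool rejects that value.

```pascal
program DedicatedPoolProbe;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.Diagnostics, System.Threading;

const
  MaxItems = 5000;
  DataSize = 100000;
  Passes = 5;

var
  ReferenceStr: string;
  Pool: TThreadPool;
  I: Integer;

// Same per-index busy-work as the question's loop body.
function CountMatches(Value: Integer; const S: string): Integer;
var
  Index: Integer;
begin
  Result := 0;
  for Index := Low(S) to High(S) do
    if ((Value mod 26) + Ord('a')) = Ord(S[Index]) then
      Inc(Result);
end;

// Times one pass on the given pool instead of TThreadPool.Default.
procedure ParallelPass(APool: TThreadPool);
var
  Matches: Integer;
  SW: TStopwatch;
begin
  Matches := 0;
  SW := TStopwatch.StartNew;
  TParallel.For(1, MaxItems,
    procedure(Value: Integer)
    begin
      AtomicIncrement(Matches, CountMatches(Value, ReferenceStr));
    end,
    APool);
  Writeln('parallel matches: ', Matches, ' in ', SW.ElapsedMilliseconds, 'ms');
end;

begin
  Randomize;
  SetLength(ReferenceStr, DataSize);
  for I := Low(ReferenceStr) to High(ReferenceStr) do
    ReferenceStr[I] := Chr(Ord('a') + Random(26));

  Pool := TThreadPool.Create;
  try
    // Hypothetical tuning: cap workers at the logical processor count.
    // The setter returns False if the pool rejects the value.
    if not Pool.SetMaxWorkerThreads(TThread.ProcessorCount) then
      Writeln('SetMaxWorkerThreads rejected ', TThread.ProcessorCount);
    for I := 1 to Passes do
    begin
      Write('pass ', I, ': ');
      ParallelPass(Pool);
    end;
  finally
    Pool.Free;
  end;
end.
```

Running the passes back to back, as in the question, is what makes the comparison with the default pool meaningful.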

Am I missing something about how a TParallel.For loop is supposed to be used?

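Note that I can't reproduce this on an AWS m3.large instance (2 vCPUs according to AWS). There I always get a slight improvement, and subsequent TParallel calls don't give worse results.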

Parallel matches: 23077054 in 2057ms
Single Threaded matches: 23077054 in 2900ms

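So it seems this effect occurs when many cores are available (36), which is a pity, because the whole point of parallel programming is to benefit from many cores. I wonder whether this is a library bug, since it does not show up when the core count is as low as 2.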

UPDATE: After testing it with various AWS instances with different vCPU
counts, this seems to be the behaviour:

  • 36 vCPUs (c4.8xlarge). You have to wait minutes between subsequent vanilla TParallel calls (which makes it unusable for
    production)
  • 32 vCPUs (c3.8xlarge). You have to wait minutes between subsequent vanilla TParallel calls (which makes it unusable for
    production)
  • 16 vCPUs (c3.4xlarge). You have to wait sub-second intervals. It could be usable if load is low, but response time still
    matters
  • 8 vCPUs (c3.2xlarge). It seems to work normally
  • 4 vCPUs (c3.xlarge). It seems to work normally
  • 2 vCPUs (m3.large). It seems to work normally
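Since the behaviour appears to track the vCPU count, it may help to log what the RTL sees on each instance type. This minimal sketch is an illustrative addition, not from the original post. It only reports the logical processor count and the worker-thread limits of the default pool that a plain TParallel.For call uses; it does not explain the slowdown.

```pascal
program PoolInfo;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.Threading;

begin
  // Logical processors as seen by the RTL (includes hyper-threads).
  Writeln('ProcessorCount:   ', TThread.ProcessorCount);
  // Limits of the pool used by TParallel.For when no pool is passed.
  Writeln('MinWorkerThreads: ', TThreadPool.Default.MinWorkerThreads);
  Writeln('MaxWorkerThreads: ', TThreadPool.Default.MaxWorkerThreads);
end.
```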

Solution

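I created two test programs based on yours, to compare System.Threading with OTL. I built with XE7 Update 1 and OTL r1397. The OTL source I used corresponds to release 3.04. I built with the 32 bit Windows compiler, using release build options.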

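My test machine is a dual Intel Xeon E5530 running Windows 7 x64. The system has two quad-core processors. That's 8 processors in total, but the system reports 16 due to hyper-threading. Experience tells me that hyper-threading is just marketing guff, and I have never seen scaling beyond a factor of 8 on this machine.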

Now the two programs, which are almost identical.

System.Threading

program SystemThreadingTest;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics,System.Threading;

const
  maxItems = 5000;
  DataSize = 100000;

procedure DoTest;
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr,DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  TParallel.For(1,maxItems,procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches,found);
    end);
  Writeln('Parallel matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');
end;

begin
  while True do
    DoTest;
end.

OTL

program OTLTest;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,Winapi.Messages,System.Diagnostics,OtlParallel;

const
  maxItems = 5000;
  DataSize = 100000;

procedure ProcessThreadMessages;
var
  msg: TMsg;
begin
  while PeekMessage(Msg,0,0,0,PM_REMOVE) and (Msg.Message <> WM_QUIT) do begin
    TranslateMessage(Msg);
    DispatchMessage(Msg);
  end;
end;

procedure DoTest;
var
  matches: integer;
  i,j: integer;
  sw: TStopWatch;
  referenceStr: string;
begin
  Randomize;
  SetLength(referenceStr,DataSize);
  for i := low(referenceStr) to high(referenceStr) do
    referenceStr[i] := Chr(Ord('a') + Random(26));

  // parallel
  matches := 0;
  sw := TStopWatch.StartNew;
  Parallel.For(1,maxItems).Execute(
    procedure(Value: integer)
    var
      index: integer;
      found: integer;
    begin
      found := 0;
      for index := low(referenceStr) to high(referenceStr) do
        if (((Value mod 26) + Ord('a')) = Ord(referenceStr[index])) then
          inc(found);
      AtomicIncrement(matches,found);
    end);
  Writeln('Parallel matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');

  ProcessThreadMessages;

  // serial
  matches := 0;
  sw := TStopWatch.StartNew;
  for i := 1 to maxItems do
    for j := low(referenceStr) to high(referenceStr) do
      if (((i mod 26) + Ord('a')) = Ord(referenceStr[j])) then
        inc(matches);
  Writeln('Serial matches: ',matches,' in ',sw.ElapsedMilliseconds,'ms');
end;

begin
  while True do
    DoTest;
end.

And now the output.

System.Threading output

Parallel matches: 19230817 in 374ms
Serial matches: 19230817 in 2423ms
Parallel matches: 19230698 in 374ms
Serial matches: 19230698 in 2409ms
Parallel matches: 19230556 in 368ms
Serial matches: 19230556 in 2433ms
Parallel matches: 19230635 in 2412ms
Serial matches: 19230635 in 2430ms
Parallel matches: 19230843 in 2441ms
Serial matches: 19230843 in 2413ms
Parallel matches: 19230905 in 2493ms
Serial matches: 19230905 in 2423ms
Parallel matches: 19231032 in 2430ms
Serial matches: 19231032 in 2443ms
Parallel matches: 19230669 in 2440ms
Serial matches: 19230669 in 2473ms
Parallel matches: 19230811 in 2404ms
Serial matches: 19230811 in 2432ms
....

OTL output

Parallel matches: 19230667 in 422ms
Serial matches: 19230667 in 2475ms
Parallel matches: 19230663 in 335ms
Serial matches: 19230663 in 2438ms
Parallel matches: 19230889 in 395ms
Serial matches: 19230889 in 2461ms
Parallel matches: 19230874 in 391ms
Serial matches: 19230874 in 2441ms
Parallel matches: 19230617 in 385ms
Serial matches: 19230617 in 2524ms
Parallel matches: 19231021 in 368ms
Serial matches: 19231021 in 2455ms
Parallel matches: 19230904 in 357ms
Serial matches: 19230904 in 2537ms
Parallel matches: 19230568 in 373ms
Serial matches: 19230568 in 2456ms
Parallel matches: 19230758 in 333ms
Serial matches: 19230758 in 2710ms
Parallel matches: 19230580 in 371ms
Serial matches: 19230580 in 2532ms
Parallel matches: 19230534 in 336ms
Serial matches: 19230534 in 2436ms
Parallel matches: 19230879 in 368ms
Serial matches: 19230879 in 2419ms
Parallel matches: 19230651 in 409ms
Serial matches: 19230651 in 2598ms
Parallel matches: 19230461 in 357ms
....

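I left the OTL version running for a long time and the pattern never changed. The parallel version was always around 7 times faster than the serial version.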
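The speedup figures can be checked directly from the logged timings as serial time divided by parallel time for the same pass. This small sketch uses a hypothetical `Speedup` helper and a few rows copied from the outputs above.

```pascal
program SpeedupFromLog;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Speedup of one pass: serial time divided by parallel time.
function Speedup(SerialMs, ParallelMs: Int64): Double;
begin
  Result := SerialMs / ParallelMs;
end;

begin
  // Sample rows taken from the logged output.
  Writeln('System.Threading, pass 1: ', FormatFloat('0.00', Speedup(2423, 374)));
  Writeln('System.Threading, pass 4: ', FormatFloat('0.00', Speedup(2430, 2412)));
  Writeln('OTL, pass 1: ', FormatFloat('0.00', Speedup(2475, 422)));
  Writeln('OTL, pass 2: ', FormatFloat('0.00', Speedup(2438, 335)));
end.
```

It prints 6.48 and 1.01 for the first and fourth System.Threading passes, and 5.86 and 7.28 for the first two OTL passes. That is consistent with the roughly 7x OTL figure, and with System.Threading dropping to serial speed after the first few passes.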

Conclusion

The code is very simple. The only reasonable conclusion that can be drawn is that the implementation of System.Threading is defective.

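There have been numerous bug reports relating to the new System.Threading library. All the signs are that its quality is poor. Embarcadero have a long track record of releasing sub-standard library code. I'm thinking of TMonitor, the XE3 string helper, earlier versions of System.IOUtils, FireMonkey. The list goes on.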

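It seems clear that quality is a big problem with Embarcadero. Code is released that quite clearly has not been tested adequately, if at all. This is especially troublesome for a threading library, where bugs can lie dormant and only be exposed in specific hardware/software configurations. The experience with TMonitor leads me to believe that Embarcadero do not have sufficient expertise to produce high quality, correct threading code.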

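My advice is that you should not use System.Threading in its current form. Until such a time as it can be seen to have sufficient quality and correctness, it should be avoided. I suggest that you use OTL.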

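EDIT: The original OTL version of the program had a live memory leak, caused by an ugly implementation detail. Parallel.For creates tasks with the .Unobserved modifier. That causes those tasks to be destroyed only when some internal message window receives a "task has terminated" message. That window is created in the same thread as the Parallel.For caller, i.e. in the main thread in this case. Since the main thread was not processing messages, the tasks were never destroyed and memory consumption (plus other resources) just piled up. Possibly that is why the program hung after a while.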
