List<Integer> list = new ArrayList<>(1000000);
for (int i = 0; i < 1000000; i++) {
    list.add(i);
}
List<String> values = new ArrayList<>(1000000);
list.stream().forEach(i -> values.add(new Date().toString()));
System.out.println(values.size());
Running this, I get the correct output: 1000000.
However, if I change stream() to parallelStream(), like this:
list.parallelStream().forEach( i->values.add(new Date().toString()) );
the output is no longer correct. What is going wrong?
Solution
An ArrayList is not synchronized, and the result of adding elements to it concurrently is not defined. From the forEach documentation:
For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses.
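In your second example, you end up with multiple threads calling add on the same ArrayList at the same time, and the ArrayList documentation says: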
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
If you change the ArrayList to a Vector, you get the correct result, because that list implementation is synchronized. Its Javadoc says:
Unlike the new collection implementations, Vector is synchronized.
However, do not use it! On top of that, the explicit synchronization may well make it slower.
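For completeness, the external synchronization that the ArrayList documentation asks for can be sketched with Collections.synchronizedList instead of Vector. This is only an illustration: the class and method names (SynchronizedListDemo, fill) are made up, and IntStream.range stands in for the list of integers from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

public class SynchronizedListDemo {

    // Hypothetical helper: adds `count` strings from a parallel stream
    // to an ArrayList wrapped in a synchronized view.
    static List<String> fill(int count) {
        // Every add() on the wrapper locks the same monitor, so concurrent
        // calls from the stream's worker threads cannot corrupt the list.
        List<String> values = Collections.synchronizedList(new ArrayList<>(count));
        IntStream.range(0, count)
                .parallel()
                .forEach(i -> values.add("foo"));
        return values;
    }

    public static void main(String[] args) {
        System.out.println(fill(1_000_000).size()); // prints 1000000
    }
}
```

No elements are lost, but every add contends for the same lock, so most of the benefit of running in parallel disappears, and the order of the elements is not guaranteed.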
The correct approach
It is precisely to avoid this situation that the Stream API provides the mutable reduction paradigm, through the collect method. The following
List<String> values = list.stream().map(i -> "foo").collect(Collectors.toList());
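will always give the correct result, whether it runs in parallel or not. The Stream pipeline handles the concurrency internally and guarantees that it is safe to use a non-concurrent collector in a collect operation of a parallel stream. Collectors.toList() is a built-in collector that accumulates the elements of a Stream into a list.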
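Applied to the code from the question, a minimal sketch could look like this. The class and method names (ParallelCollectDemo, collectDates) are invented for the example; the stream calls themselves are the standard java.util.stream API.

```java
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCollectDemo {

    // Hypothetical helper: maps every element to a date string in parallel
    // and lets the collector build the result list.
    static List<String> collectDates(List<Integer> list) {
        return list.parallelStream()
                .map(i -> new Date().toString())
                // No shared list is mutated by the lambda; the pipeline
                // accumulates partial results per thread and merges them.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(collectDates(list).size()); // prints 1000000
    }
}
```

Because the lambda no longer touches a shared collection, there is nothing left to synchronize by hand, and the result keeps the encounter order of the source list.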