实用的Java8集合按指定大小分组的方法

实用的Java8集合按指定大小分组的方法

转自:https://e.printstacktrace.blog/divide-a-list-to-lists-of-n-size-in-Java-8/

Divide a list to lists of n size in Java 8

Every Java developer works with lists daily. There are many popular list (or collection) operations implemented in the standard Java 8 library, but there is one that is useful and commonly used, yet missing - partitioning. In this blog post, I would like to show you how you can split any list into chunks of fixed size without using any 3rd party library. Let’s start!

Introduction

Partitioning (also known as chunking) is an operation that transforms a collection of elements into a collection of chunks of a given size. For instance, let’s say we have a list of 7 elements (incrementing numbers from 1 to 7) and we want to split it into a list of chunks of size 2.

[1,2,3,4,5,6,7] -> [[1,2], [3,4], [5,6], [7]]

Let’s implement this operation using a new type of list, called Partition. We will extend the AbstractList> class, and we will implement two methods - get(int index) and size(). The whole idea here is to hide the original list inside this class and modify its behavior, so instead of returning single elements from the original list, we are going to return chunks of a given size. Let’s take a look at the code and the following example to make this concept much easier to understand.

Listing 1. src/main/java/com/github/wololock/Partition.java

package com.github.wololock;

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public final class Partition<T> extends AbstractList<List<T>> {

    private final List<T> list;
    private final int chunkSize;

    public Partition(List<T> list, int chunkSize) {
        this.list = new ArrayList<>(list);
        this.chunkSize = chunkSize;
    }

    public static <T> Partition<T> ofSize(List<T> list, int chunkSize) {
        return new Partition<>(list, chunkSize);
    }

    @Override
    public List<T> get(int index) {
        int start = index * chunkSize;
        int end = Math.min(start + chunkSize, list.size());

        if (start > end) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of the list range <0," + (size() - 1) + ">");
        }

        return new ArrayList<>(list.subList(start, end));
    }

    @Override
    public int size() {
        return (int) Math.ceil((double) list.size() / (double) chunkSize);
    }
}

All we have to do is to override those two methods, and our list of elements starts behaving like a list of chunks of the input elements. Those overridden methods affect methods like toString(), forEach(), or iterator(). Let’s take a look at a simple usage example to see how this Partition object behaves:

Listing 2. An example of Partition class usage

final List<Integer> numbers = Arrays.asList(1,2,3,4,5,6,7);

System.out.println(Partition.ofSize(numbers, 3));
System.out.println(Partition.ofSize(numbers, 2));

Running this example will produce the following input to the console:

Listing 3. The console output

[[1, 2, 3], [4, 5, 6], [7]]
[[1, 2], [3, 4], [5, 6], [7]]

Why bother with creating a new class?

We can also ask ourselves if creating such a Partition class is mandatory. Maybe there is an alternative way to achieve the same effect without using 3rd party libraries? Well, the good news is - there are at least two different ways to get the same result. Let’s take a look at the first of them.

Imperative for-loop

If we are OK with the imperative programming style, we can use Java’s for-loop and do the partitioning while iterating the input list. In this case, we need to control if the given number belongs to the existing chunk or does it belong to the next one (by its index number). The code could look like this:

Listing 4. Imperative for-loop method

final List<Integer> numbers = Arrays.asList(1,2,3,4,5,6,7);
final int chunkSize = 3;
final AtomicInteger counter = new AtomicInteger();

for (int number : numbers) {
    if (counter.getAndIncrement() % chunkSize == 0) {
        result.add(new ArrayList<>());
    }
    result.get(result.size() - 1).add(number);
}

System.out.println(result);

It produces the following console output:

[[1, 2, 3], [4, 5, 6], [7]]

Using Java 8 Stream API + grouping collector

Alternatively, we could use Java 8 Stream API and its Collectors.groupingBy() collector method. Using this method we produce a map from a stream, but we can invoke values() method on the final map to get a collection of all its values. Then we can decide if we want to stay with the Collection> or if we want to cast it to List> by passing the collection to the, e.g. ArrayList constructor. Below you can find an exemplary code.

final List<Integer> numbers = Arrays.asList(1,2,3,4,5,6,7);
final int chunkSize = 3;
final AtomicInteger counter = new AtomicInteger();

final Collection<List<Integer>> result = numbers.stream()
    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize))
    .values();

System.out.println(result);

Less lines of code, but the same console output:

[[1, 2, 3], [4, 5, 6], [7]]

Which method to chose?

Now you may wonder - which method is the best? Should we use the Partition class, or is it better to use a good old imperative style that is quite easy to read and reason about? It depends. There is one significant difference between the first approach and the remaining two - the efficiency. Partitioning very small lists won’t make a big difference, but it starts making a huge (I mean huuuuge) difference when we start to play with large and larger lists. Let’s not speculate, but look at the numbers instead.

I created benchmark tests using JMH for the following scenarios:

  • partitioning 20 elements list into chunks of size 3,
  • partitioning 10,000 elements list into chunks of size 23,
  • and finally, partitioning 10,000,000 elements list into chunks of size 1024.

We will measure the efficiency of 4 different methods:

  • partitioning with the Partition class,
  • partitioning with the imperative for-loop,
  • partitioning with the Java 8 Stream API and Collectors.groupingBy(),
  • partitioning with the Java 8 Stream API with a custom collector that makes use of Partition class.

All measurements use microsecond unit of time. (1 μs is equal to 0.001 ms and 0.000001 s).

All benchmark tests can be found in the wololock/java-performance-benchmarks Github repository.Feel free to clone the repository and run benchmarks on your own computer with the following command:$ ./gradlew jmhI’ve run tests on a Lenovo ThinkPad T440p laptop with Intel® Core™ i7-4900MQ CPU @ 2.80GHz and 16 GBs RAM. I used JDK 1.8.0_201 (Java HotSpot™ 64-Bit Server VM, 25.201-b09).Below you can find JMH settings used for each benchmark test case:# JMH version: 1.21 # VM version: JDK 1.8.0_201, Java HotSpot(TM) 64-Bit Server VM, 25.201-b09 # VM invoker: /home/wololock/.sdkman/candidates/java/8.0.201-oracle/jre/bin/java # VM options: # Warmup: 1 iterations, 30 s each # Measurement: 42 iterations, 1 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op

Here are the benchmark tests results.

Listing 5. Partitioning list benchmark results

Benchmark                                                 Mode  Cnt       Score      Error  Units
JavaListPartitionBenchmark.A1_smallListImperative         avgt   42       0,501 ±    0,002  us/op
JavaListPartitionBenchmark.A2_smallListStreamGroupingBy   avgt   42       0,637 ±    0,004  us/op
JavaListPartitionBenchmark.A3_smallListStreamPartitioned  avgt   42       0,311 ±    0,004  us/op
JavaListPartitionBenchmark.A4_smallListToPartition        avgt   42       0,132 ±    0,006  us/op
JavaListPartitionBenchmark.B1_largeListImperative         avgt   42     173,364 ±    2,818  us/op
JavaListPartitionBenchmark.B2_largeListStreamGroupingBy   avgt   42     255,984 ±    0,540  us/op
JavaListPartitionBenchmark.B3_largeListStreamPartitioned  avgt   42      65,313 ±    0,387  us/op
JavaListPartitionBenchmark.B4_largeListToPartition        avgt   42       4,686 ±    0,011  us/op
JavaListPartitionBenchmark.C1_hugeListImperative          avgt   42  154043,254 ± 2497,491  us/op
JavaListPartitionBenchmark.C2_hugeListStreamGroupingBy    avgt   42  249341,828 ±  842,087  us/op
JavaListPartitionBenchmark.C3_hugeListStreamPartitioned   avgt   42   91150,418 ± 1079,959  us/op
JavaListPartitionBenchmark.C4_hugeListToPartition         avgt   42    8737,578 ±  138,671  us/op

Let’s look at those results more closely and see what do they look like as graphs.

image-20200606154504348

The first benchmark tests a small list of 20 elements. As you can see, in this case, it doesn’t make a big difference which approach we choose. The difference between the fastest and the slowest approach is equal to 0.369 microseconds.

image-20200606154544083

However, things start to change when we have to deal with lists containing thousands of elements. The slowest method requires 255 microseconds to complete (0.255 ms), while the fastest method is almost 64 times more efficient and it needs only 4 microseconds to do its job (0.004 ms). At this point, you can start thinking about the efficiency of available solutions. The context still matters - if you perform partitioning operation rarely, you might not need the fastest method to do the job efficiently.

image-20200606154532470

The final benchmark has only a single winner - the Partition class method. In this case, the slowest method takes almost 249341 microseconds (~250 ms), while the fastest method does the same job in 8737 microseconds (8.737 ms). However, the gap between both methods decreased to the size of 28 times. It happens mostly because the Partition class creates a copy of the input list. If we use a reference to the existing list instead of creating its copy, we could reduce 8737 microseconds to less than 2 microseconds. It could be quite risky - using a reference instead of a copy means that whenever the input list changes outside the context of Partition class, it changes our partitioned list.

What if I want to use a 3rd party library?

That’s completely fine. There are at least two libraries that provide list partitioning operation, using a similar method to the Partition class one.

When you decide to use any of these two, be aware that they don’t create a copy of the input list. To avoid some nasty side effects, you may always want to pass a copy of a list instead.

Lists.partition(new ArrayList<>(numbers), 3);

Google 和Apache的 方式与建立Partition类的方法相似;

Google Guava Lists.partition(List list, int size) method 源码

public static <T> List<List<T>> partition(List<T> list, int size) {
    checkNotNull(list);
    checkArgument(size > 0);
    return (list instanceof RandomAccess)
        ? new RandomAccessPartition<>(list, size)
        : new Partition<>(list, size);
}

private static class RandomAccessPartition<T> extends Partition<T> implements RandomAccess {
    RandomAccessPartition(List<T> list, int size) {
      super(list, size);
    }
  }

private static class Partition<T> extends AbstractList<List<T>> {
    final List<T> list;
    final int size;

    Partition(List<T> list, int size) {
      this.list = list;
      this.size = size;
    }

    @Override
    public List<T> get(int index) {
      checkElementIndex(index, size());
      int start = index * size;
      int end = Math.min(start + size, list.size());
      return list.subList(start, end);
    }

    @Override
    public int size() {
      return IntMath.divide(list.size(), size, RoundingMode.CEILING);
    }

    @Override
    public boolean isEmpty() {
      return list.isEmpty();
    }
  }
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,658评论 6 496
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,482评论 3 389
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,213评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,395评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,487评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,523评论 1 293
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,525评论 3 414
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,300评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,753评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,048评论 2 330
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,223评论 1 343
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,905评论 5 338
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,541评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,168评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,417评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,094评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,088评论 2 352