Majority Element

Problem

Given an array of size n, find the majority element. The majority element is the element that appears more than ⌊ n/2 ⌋ times.

You may assume that the array is non-empty and the majority element always exists in the array.

Approach #1 Brute Force

Intuition
We can exhaust the search space in quadratic time by checking whether each element is the majority element.

Algorithm
The brute force algorithm iterates over the array, and then iterates again for each number to count its occurrences. As soon as a number is found to appear more than ⌊n/2⌋ times, return it, since no other element can appear that often.

#include <iostream>
#include <vector>

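int majorityElement(std::vector<int>& nums)
{
    int size = (int)nums.size();
    int halfCount = size / 2;    // the majority must appear more than this many times
    
    for (auto num : nums)
    {
        // Count how many times the current value occurs in the whole array.
        int count = 0;
        
        for (auto elem : nums)
        {
            if (elem == num)
            {
                ++count;
            }
        }
        
        if (count > halfCount)
        {
            return num;
        }
    }
    
    return -1;    // unreachable when a majority element is guaranteed to exist
}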

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

  • Time complexity : O(n^2)
    The brute force algorithm contains two nested for loops that each run for n iterations, adding up to quadratic time complexity.
  • Space complexity : O(1)
    The brute force solution does not allocate additional space proportional to the input size.

Approach #2 HashMap

Intuition
We know that the majority element occurs more than ⌊n/2⌋ times, and a HashMap allows us to count element occurrences efficiently.

Algorithm
We can use a HashMap that maps elements to counts in order to count occurrences in linear time by looping over nums. Then, we return the element whose count exceeds ⌊n/2⌋, which is necessarily the key with the maximum value.

#include <iostream>
#include <vector>
#include <unordered_map>

int majorityElement(std::vector<int>& nums)
{
    // Count occurrences; operator[] value-initializes a missing key's count to 0.
    std::unordered_map<int, int> counts;
    for (auto num : nums)
    {
        ++counts[num];
    }
    
    // Return the element whose count exceeds half the array size.
    int size = (int)nums.size();
    int halfCount = size / 2;
    
    for (auto elem : nums)
    {
        if (counts[elem] > halfCount)
        {
            return elem;
        }
    }
    
    return -1;
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

  • Time complexity : O(n)
    We iterate over nums twice (once to build the counts and once to find the majority element), and each HashMap lookup or update takes constant time on average. Therefore, the algorithm runs in O(n) time.
  • Space complexity : O(n)
    At most, the HashMap can contain n − ⌊n/2⌋ associations, so it occupies O(n) space. An arbitrary array of length n can contain n distinct values, but nums is guaranteed to contain a majority element, which occupies at least ⌊n/2⌋ + 1 array indices. Therefore, at most n − (⌊n/2⌋ + 1) indices can hold distinct non-majority elements; adding 1 for the majority element itself leaves us with at most n − ⌊n/2⌋ distinct elements.

Approach #3 Sorting

Intuition
If the elements are sorted in monotonically increasing (or decreasing) order, the majority element can be found at index ⌊n/2⌋ (and, when n is even, also at index n/2 − 1), using 0-based indices.

Algorithm
For this algorithm, we simply do exactly what is described: sort nums, and return the element in question. To see why this will always return the majority element (given that the array has one), consider the two most extreme positions the majority element can occupy after sorting.

If the majority element happens to be the array minimum, its run of at least ⌊n/2⌋ + 1 copies starts at index 0 and therefore covers indices 0 through ⌊n/2⌋. If it happens to be the array maximum, its run ends at index n − 1 and starts no later than index n − ⌊n/2⌋ − 1, which is at most ⌊n/2⌋. In all other cases the run lies somewhere between these two, but notice that even in the two most extreme cases the runs overlap at index ⌊n/2⌋, for both even- and odd-length arrays. Therefore, no matter what value the majority element has in relation to the rest of the array, returning the value at index ⌊n/2⌋ will never be wrong.
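The overlap argument can also be checked mechanically for small arrays. The sketch below is a hypothetical checker (the name middleIndexAlwaysCovered is ours, not part of the original solution): it models a sorted array of length n in which the majority value fills one contiguous run of k > n/2 positions, tries every possible run position, and confirms that index n / 2 is always covered.

```cpp
// Hypothetical checker, not part of the original solution. Models a sorted
// array of length n in which the majority value fills the contiguous run
// [start, start + k - 1] with k > n / 2, and verifies that every such run
// contains index n / 2 (0-based) for all n up to maxN.
bool middleIndexAlwaysCovered(int maxN)
{
    for (int n = 1; n <= maxN; ++n)
    {
        for (int k = n / 2 + 1; k <= n; ++k)             // majority count
        {
            for (int start = 0; start + k <= n; ++start) // where the run begins
            {
                int end = start + k - 1;                 // where the run ends
                if (start > n / 2 || end < n / 2)
                {
                    return false;                        // index n / 2 missed
                }
            }
        }
    }
    return true;
}
```

Calling middleIndexAlwaysCovered(100) returns true, which is exactly the claim that nums[nums.size() / 2] is safe after sorting.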

#include <iostream>
#include <vector>
#include <algorithm>

int majorityElement(std::vector<int>& nums)
{
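    // Sort in place (this modifies the caller's vector); the majority element
    // is guaranteed to occupy index n / 2 after sorting.
    std::sort(nums.begin(), nums.end());
    return nums[nums.size() / 2];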
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

  • Time complexity : O(n lg n)
    std::sort performs O(n lg n) comparisons (guaranteed since C++11), so sorting dominates the overall runtime.
  • Space complexity : O(1) or O(n)
    We sorted nums in place here; if that is not allowed, then we must spend linear additional space on a copy of nums and sort the copy instead.

Approach #4 Randomization

Intuition
Because more than ⌊n/2⌋ array indices are occupied by the majority element, a randomly chosen array index contains the majority element with probability greater than 1/2.

Algorithm
Because a given index is likely to have the majority element, we can just select a random index, check whether its value is the majority element, return if it is, and repeat if it is not. The algorithm is verifiably correct because we ensure that the randomly chosen value is the majority element before ever returning.
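The original post gives no code for this approach. Below is a minimal sketch in the style of the other listings, assuming (as the problem guarantees) that nums is non-empty and contains a majority element; without that guarantee the loop never terminates. The name majorityElementRandom and the choice of std::mt19937 as the random source are our own assumptions.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the randomized approach. Precondition (assumed): nums is
// non-empty and has a majority element, otherwise this loops forever.
int majorityElementRandom(const std::vector<int>& nums)
{
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, nums.size() - 1);
    const std::size_t halfCount = nums.size() / 2;

    while (true)
    {
        int candidate = nums[pick(rng)];   // value at a random index

        // Verify the candidate with one linear counting pass.
        std::size_t count = 0;
        for (int num : nums)
        {
            if (num == candidate)
            {
                ++count;
            }
        }

        if (count > halfCount)
        {
            return candidate;              // verified majority element
        }
        // Otherwise pick another random index and try again.
    }
}
```

Because every returned value has passed the counting check, the answer is always correct; only the number of iterations is random.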

Complexity Analysis

  • Time complexity : O(∞)
    It is technically possible for this algorithm to run indefinitely (if we never manage to randomly select the majority element), so the worst-case runtime is unbounded. However, the expected runtime is far better: linear, in fact. For ease of analysis, note that because the majority element is guaranteed to occupy more than half of the array, the expected number of iterations is smaller than it would be if the element we sought occupied exactly half of the array. Therefore, we can calculate the expected number of iterations for this modified version of the problem and use it as an upper bound for ours. In the modified problem each iteration succeeds with probability 1/2, so the first success happens at iteration i with probability (1/2)^i, and the expected number of iterations is:
    E(iterations) = Σ_{i=1}^{∞} i · (1/2)^i = 2

Because the series converges, the expected number of iterations for the modified problem is constant. Each iteration performs linear work (one counting pass), so the expected runtime is linear for the modified problem. Therefore, the expected runtime for our problem is also linear, as the runtime of the modified problem serves as an upper bound for it.

  • Space complexity : O(1)
    Much like the brute force solution, the randomized approach runs with constant additional space.

Approach #5 Divide and Conquer

Intuition
If we know the majority element in the left and right halves of an array, we can determine which is the global majority element in linear time.

Algorithm
Here, we apply a classical divide & conquer approach that recurses on the left and right halves of an array until an answer can be trivially achieved for a length-1 array. Because actually passing copies of subarrays costs time and space, we instead pass lo and hi indices (hi exclusive) that describe the relevant slice of the overall array. The majority element for a length-1 slice is trivially its only element, so the recursion stops there. If the current slice is longer than 1, we must combine the answers for the slice's left and right halves. If they agree on the majority element, then the majority element for the overall slice is obviously the same. If they disagree, only one of them can be "right", so we count the occurrences of the left and right candidates within the current slice to determine which subslice's answer is correct. The overall answer for the array is thus the majority element between indices 0 and n.

#include <iostream>
#include <vector>
#include <algorithm>

int countInRange(std::vector<int>& nums, int num, int lo, int hi)
{
    int count = 0;
    for (int i = lo; i < hi; ++i)
    {
        if (nums[i] == num)
        {
            ++count;
        }
    }
    
    return count;
}

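// Returns the majority candidate of the slice nums[lo, hi) (hi is exclusive).
int majorityElementRec(std::vector<int>& nums, int lo, int hi)
{
    // Base case: a length-1 slice is its own majority element.
    if (lo == hi - 1)
    {
        return nums[lo];
    }
    
    // Recurse on the left and right halves.
    int mid = lo + (hi - lo) / 2;
    int left = majorityElementRec(nums, lo, mid);
    int right = majorityElementRec(nums, mid, hi);
    
    // If both halves agree, that value is the answer for the whole slice.
    if (left == right)
    {
        return left;
    }
    
    // Otherwise, count each candidate over the whole slice and keep the winner.
    int leftCount = countInRange(nums, left, lo, hi);
    int rightCount = countInRange(nums, right, lo, hi);
    
    return leftCount > rightCount ? left : right;
}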

int majorityElement(std::vector<int>& nums)
{
    return majorityElementRec(nums, 0, (int)nums.size());
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

  • Time complexity : O(n lg n)
    Each recursive call to majorityElementRec performs two recursive calls on subslices of size n/2 and two linear scans of length n. Therefore, the time complexity of the divide & conquer approach can be represented by the following recurrence relation:
    T(n) = 2T(n/2) + 2n

By the master theorem, the recurrence satisfies case 2 (a = 2, b = 2, and f(n) = 2n = Θ(n^(log_b a)) = Θ(n)), so the complexity can be analyzed as such:
    T(n) = Θ(n^(log_b a) · lg n) = Θ(n lg n)

  • Space complexity : O(lg n)
    Although the divide & conquer approach does not explicitly allocate any additional memory, it uses a non-constant amount of additional memory in stack frames due to recursion. Because the algorithm "cuts" the array in half at each level of recursion, there can only be O(lg n) "cuts" before the base case of a length-1 slice is reached. It follows that the resulting recursion tree is balanced, and therefore all paths from the root to a leaf are of length O(lg n).

Because the recursion tree is traversed in a depth-first manner, the space complexity is equal to the length of the longest path, which is O(lg n).
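To make the O(lg n) stack bound concrete, the hypothetical helper below (recursionDepth is our own name, not part of the original solution) mirrors the split used by majorityElementRec and computes how many of its frames are on the call stack at the deepest point for a slice of a given length.

```cpp
#include <algorithm>

// Hypothetical helper, not part of the original solution: computes how many
// majorityElementRec frames are on the call stack at the deepest point for a
// slice of length len, using the same split (mid = lo + (hi - lo) / 2, so
// the left half gets len / 2 elements and the right half gets the rest).
int recursionDepth(int len)
{
    if (len <= 1)
    {
        return 1;                           // base case: length-1 slice
    }
    int leftLen = len / 2;                  // slice [lo, mid)
    int rightLen = len - leftLen;           // slice [mid, hi)
    return 1 + std::max(recursionDepth(leftLen), recursionDepth(rightLen));
}
```

For example, recursionDepth(1000) and recursionDepth(1024) both return 11, i.e. ⌈lg n⌉ + 1 frames, even though the recursion tree has thousands of nodes.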

Approach #6 Boyer-Moore Voting Algorithm

Intuition
If we had some way of counting instances of the majority element as +1 and instances of any other element as -1, summing them would make it obvious that the majority element is indeed the majority element.
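The scoring idea can be written down directly. The helper below is illustrative only (voteSum is our own name) and assumes we already know which value to score: it adds +1 for each occurrence of value and -1 for every other element. When a majority element exists, its total is positive and every other value's total is negative, which is the property Boyer-Moore exploits without knowing the majority in advance.

```cpp
#include <vector>

// Illustrative helper, not from the original: +1 for each element equal to
// value, -1 for each other element. Positive only for the majority element.
int voteSum(const std::vector<int>& nums, int value)
{
    int sum = 0;
    for (int num : nums)
    {
        sum += (num == value) ? 1 : -1;
    }
    return sum;
}
```

For the sample array { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7 } used throughout this post, voteSum gives 1 for the value 2 (six votes for, five against) and -9 for the value 1.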

Algorithm
Essentially, what Boyer-Moore does is look for a suffix suf of nums where suf[0] is the majority element in that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else.

Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]

Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]

Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).

Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.
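The pipe positions in the two example arrays can be reproduced with a small tracing helper. This is a hypothetical sketch (resetPoints is our own name): it runs the same candidate/count update as the solution and records every index after which count returns to 0, i.e. where a prefix is discarded.

```cpp
#include <vector>

// Illustrative helper, not from the original: runs the Boyer-Moore update
// and returns every index after which count drops back to 0 (the "|"
// positions in the examples). The last candidate is stored in finalCandidate.
std::vector<int> resetPoints(const std::vector<int>& nums, int& finalCandidate)
{
    std::vector<int> resets;
    int count = 0;
    int candidate = 0;

    for (int i = 0; i < static_cast<int>(nums.size()); ++i)
    {
        if (count == 0)
        {
            candidate = nums[i];        // start a new run with a fresh candidate
        }
        count += (nums[i] == candidate) ? 1 : -1;
        if (count == 0)
        {
            resets.push_back(i);        // prefix ending at index i is discarded
        }
    }

    finalCandidate = candidate;
    return resets;
}
```

For both example arrays this returns {5, 7, 11}; the final candidate is 7 for the first array and 5 for the second, matching the true majority element in each case.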

#include <iostream>
#include <vector>
#include <algorithm>

int majorityElement(std::vector<int>& nums)
{
    int count = 0;
    int candidate = 0;
    
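    for (auto num : nums)
    {
        // When count drops to 0, discard the prefix and adopt a new candidate.
        if (0 == count)
        {
            candidate = num;
        }
        
        // +1 for a vote matching the candidate, -1 for any other value.
        count += (candidate == num) ? 1 : -1;
    }
    
    // Valid only because a majority element is guaranteed to exist.
    return candidate;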
}

int main()
{
    int arr[] = { 1, 2, 3, 2, 4, 2, 2, 2, 2, 5, 7};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    int result = majorityElement(nums);
    
    std::cout << result << std::endl;
    
    return 0;
}

Complexity Analysis

  • Time complexity : O(n)
    Boyer-Moore performs constant work exactly n times, so the algorithm runs in linear time.
  • Space complexity : O(1)
    Boyer-Moore allocates only constant additional memory.
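The solution above relies on the guarantee that a majority element exists; on other inputs it still returns some candidate. If that guarantee were dropped (an assumption beyond this problem's statement), a second counting pass, like the one in the Majority Element II solution, can confirm the candidate. A minimal sketch, with majorityElementChecked and the found flag as our own hypothetical names:

```cpp
#include <vector>

// Sketch assuming the input may NOT contain a majority element. Pass 1 is the
// Boyer-Moore vote; pass 2 confirms the candidate. found is set to false
// when no element appears more than n / 2 times.
int majorityElementChecked(const std::vector<int>& nums, bool& found)
{
    int count = 0;
    int candidate = 0;
    for (int num : nums)
    {
        if (count == 0)
        {
            candidate = num;
        }
        count += (candidate == num) ? 1 : -1;
    }

    // Second pass: count the candidate's real occurrences.
    int occurrences = 0;
    for (int num : nums)
    {
        if (num == candidate)
        {
            ++occurrences;
        }
    }

    found = occurrences > static_cast<int>(nums.size()) / 2;
    return candidate;
}
```

The extra pass keeps the algorithm at O(n) time and O(1) space.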

Majority Element II

Given an integer array of size n, find all elements that appear more than ⌊ n/3 ⌋ times.
Note: The algorithm should run in linear time and in O(1) space.

Approach #1 Boyer-Moore Voting Algorithm

#include <iostream>
#include <vector>
#include <algorithm>

std::vector<int> majorityElement(std::vector<int>& nums)
{
    std::vector<int> result;
    
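    // Phase 1 (voting): at most two values can appear more than n/3 times,
    // so keep two candidates and their vote counts.
    int candidate1 = 0;
    int candidate2 = 0;
    int count1 = 0;
    int count2 = 0;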
    
    for (auto num : nums)
    {
        if (num == candidate1)
        {
            ++count1;
        }
        else if (num == candidate2)
        {
            ++count2;
        }
        else if (0 == count1)
        {
            candidate1 = num;
            count1 = 1;
        }
        else if (0 == count2)
        {
            candidate2 = num;
            count2 = 1;
        }
        else
        {
            --count1;
            --count2;
        }
    }
    
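    // Phase 2 (verification): the vote only yields candidates, so count
    // their real occurrences before reporting them.
    count1 = 0;
    count2 = 0;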
    
    for (auto elem : nums)
    {
        if (elem == candidate1)
        {
            ++count1;
        }
        else if (elem == candidate2)
        {
            ++count2;
        }
    }
    
    if (count1 > (int)nums.size() / 3)
    {
        result.push_back(candidate1);
    }
    
    if (count2 > (int)nums.size() / 3)
    {
        result.push_back(candidate2);
    }
    
    return result;
}

int main()
{
    int arr[] = { 2, 2, 3, 2, 4, 2, 3, 2, 3, 5, 3};
    std::vector<int> nums(arr, arr + sizeof(arr) / sizeof(arr[0]));
    std::vector<int> result = majorityElement(nums);
    
    for (auto ret : result)
    {
        std::cout << ret << std::endl;
    }
    
    return 0;
}
