In this next series of videos, we'll get some more practice applying the divide and conquer algorithm and design paradigm to various problems.
This will also give us a glimpse of the kinds of application [inaudible] to which it's been successfully applied.
We're gonna start by extending the merge sort algorithm to solve a problem involving counting the number of inversions of an array.
Before we tackle the specific problem of counting the number of inversions in an array, let me say a few words about the divide and conquer paradigm in general.
So again, you've already seen the totally canonical example of divide and conquer, namely merge sort.
So the following three conceptual steps will be familiar to you.
The first step, no prizes for guessing is you divide.
The problem.
Into smaller sub-problems.
Sometimes this division happens only in your mind.
It's really more of a conceptual step than part of your code.
Sometimes you really do copy over parts of the input into say new arrays to pass on to your recursive calls.
The second step, again no prizes here, is you conquer the sub-problems just using recursion.
So for example, in Merge Sort, you conceptually divide the array into two different pieces.
And then you [inaudible] with the conquer or sort to the first half of the array.
And you, you do the same thing with the second half of the array.
Now, of course, it's not quite as easy as just doing these two steps.
Dividing the problem, and then solving the sub problems recursively.
Usually, you have some extra cleanup work after the recursive calls, and to stitch together the solutions to the sub problems into one for the big problem, the problem that you actually care about.
Recall, for example, in Merge Sort, after our recursive calls, the left half of the array was sorted, the right half of the array was sorted.
But we still had to stitch those together.
Merge into a sorted version of the entire array.
So the [inaudible] step is to combine.
The solutions to the subproblem into one problem.
Generally the largest amount of ingenuity happens in the third step.
How do you actually quickly combine solutions to subproblems into one to the original problem? Sometimes you also get some cleverness in the first step with division.
Sometimes it's as simple as just spliting a ray in two.
But there are cases where the division step also has some ingenuity.
Now let's move on to the specific problem of counting inversions and see how to apply this divide and conquer paradygm.
So let begin by defining the problem formally now.
We're given as input an array A with a length N.
And you can define the problem so that the array a contains any ole distinct numbers.
But, let's just keep thing simple and assume that it contains the numbers one through n.
The integers in that range in some order.
That captures the essence of the problem.
And the goal is to compute the number of inversions of this array so what's an inversion you may ask well an inversion is just a pair of array [inaudible] I and J with I smaller than J so that earlier array entry the I entry is bigger than the latter one the Jake one so one thing that should be evident is that if the array contains these numbers in sorted order if the array is simply one two three four all the way up to N then the number of inversions is zero.
The converse you might also want to think through if the array has any other ordering of the numbers between one and N other than the assorted one, then it's going to have a non.
Of zero number of inversions.
Let's look at another example.
So'spose we have an array of six entries.
So the numbers one thru six in the following order.
One, three, five followed by two, four, six.
So how many inversions does this array have? So again what we need to look for are pairs of array entries so that the earlier or left entry is bigger than the later or right entry.
So one example which we see right here would five and two.
Those are right next to each other and out of order, the earlier entry is bigger than the other one.
But there's others, there's three and two for example Those are out of order.
And, five and four are also out of order.
And I'll leave it to you to check that those are the only three pairs that are out of order.
So summarizing the inversions in this array of length six are 3-2, 5-2, and 5-4.
Corresponding to the array entries, 2-4, 3-4, and 3-5.
Pictorially, we can think of it thusly, we can first.
Write down the numbers in order, one up to six.
And then we can write down the numbers again but, ordered in the way that their given in the input array.
So, one three five two four six.
And then we can connect the dots, meaning we connect one to one.
Reconnect two to two, and so on.
It turns out, and I'll leave to for you to, to think this through, that the number of crossing pairs of line segments prescisely correspond to the number of inversions.
So we see that there are one, two, three crossing line segments.
And these are exactly in correspondence with the three inversions, we found earlier.
Five and two, three and two, and five and four.
Now, [inaudible] wanna solve this problem you might ask.
Well there's few reasons that come up.
One would be to have a numerical similarity measure that quantifies how close to [inaudible] lists are to each other.
So for example, suppose I took you and a friend, and I took, identified ten movies that both of you had seen.
And I asked each of you to order, or to rank these movies from your most favorite to your least favorite.
Now I can form an array, and compute inversions.
And it quantifies, in some sense, how dissimilar your two rankings are to each other.
So in more detail, in the first entry of this array, I would put down the ranking that your friend gave to your favorite movie.
So if you had your favorite movie, Star Wars or whatever.
And your friend only thought it was the fifth best out of the ten, then I would write down a five in the first entry of this array.
Generally, I would take your second favorite movie.
I would look at how your friend ranked that.
I would put that in the second entry of the array and so on, all the way up to the tenth entry of the array, where I would put your friend's ranking of your least favorite movie.
Now, if you have exactly identical preferences, if you rank them exactly the same way, the number of inversions of this array would be zero.
And in general, the more inversions this array has, it quantifies that your lists look more and more different from each other.
Now why might you want to do this why might you want to know whether two different people ranked things in the similar way had similar preferences well one reason might be what's called collaborative filtering, probably many of you have had the experience of going to a website and if you've made a few purchases through this website it starts recommending further purchases for you, so one way to solve this problem under the hood, is to look at your purchases look at what you seem to like, find other people who have similar preferences similar history look at things they've bought that you haven't, and then recommend.
New products to you based on what similar customers seemed to have bought.
So this problem captures some of the essence of identifying which customers or which people are similar based on data about what they prefer.
So just to make sure we're all on the same page, let me pause for a brief quiz.
We've already noticed that a given array will have zero inversions, if and only if it's in sorted order.
If it only contains the numbers of one through N in order.
So, on the other side, what is the largest number of inversions an array could possibly have? Let's say, just for an array of size six, like the one in this example here.
在接下来的视频系列中,我们将获得更多实践,将分而治之算法和设计范例应用于各种问题。
这也将使我们了解成功应用了哪些应用程序[听不清]。
我们将从扩展合并排序算法开始,以解决涉及计算数组反转数的问题。
在我们解决对数组中的求逆数进行计数的特定问题之前,让我先谈一些关于分而治之的范式。
同样,您已经看到了分而治之的完全典范示例,即合并排序。
因此,您将熟悉以下三个概念性步骤。
第一步,猜猜是没有奖品的。
问题。
分为较小的子问题。
有时这种分裂只发生在您的脑海中。
这实际上是更多概念性步骤,而不是代码的一部分。
有时,您确实确实将部分输入复制到新数组中,以传递给递归调用。
第二步,这里也没有奖品,是您仅使用递归即可解决子问题。
因此,例如,在“合并排序”中,您从概念上将数组分为两个不同的部分。
然后,您[听不清]用征服或排序到数组的前半部分。
而您,对数组的后半部分执行相同的操作。
现在,当然,这并不像完成这两个步骤那样容易。
划分问题,然后递归解决子问题。
通常,在递归调用之后,您需要进行一些额外的清理工作,并将子问题的解决方案组合为一个大问题,即您真正关心的问题。
回想一下,例如,在“合并排序”中,在我们进行递归调用之后,对数组的左半部分进行了排序,对数组的右半部分进行了排序。
但是我们仍然必须将它们缝合在一起。
合并到整个数组的排序版本中。
因此,[听不清]步骤是合并。
该子问题的解决方案成为一个问题。
通常,最大的创造力发生在第三步。
您实际上如何快速将子问题的解决方案组合为原始问题?有时,在除法的第一步中,您也会变得很聪明。
有时,就像将光线一分为二一样简单。
但是在某些情况下,分割步骤也有一些技巧。
现在,让我们继续研究反转计数的特定问题,并了解如何应用这种分而治之。
因此,让我们从现在开始正式定义问题开始。
我们将输入长度为N的数组A作为输入。
并且您可以定义问题,以便数组a包含任何ole互不相同的数字。
但是,让我们保持简单,并假设它包含数字1到n。
该范围内的整数按一定顺序排列。
这抓住了问题的实质。
而且目标是计算此数组的求逆数,因此您可能会问什么是求逆?求逆只是一对数组[听不清] I和J,且I小于J,因此,较早的数组入口I入口较大比后一个Jake更重要的一点是,如果数组按排序顺序包含这些数字,并且该数组只是一二三四一直到N,则求逆数为零。
相反,您可能还想考虑一下,如果数组在除N之外的其他任何一个介于1和N之间的数字,则它将有一个非整数。
反转数为零。
让我们看另一个例子。
因此,假设我们有一个包含六个条目的数组。
因此,数字按以下顺序从一到六。
一,三,五,然后是二,四,六。
那么这个数组有多少个反转呢?因此,我们再次需要寻找的是成对的数组条目,以便较早或较左的条目大于较晚或较右的条目。
因此,我们在这里看到的一个示例将是五个和两个。
那些是彼此相邻且顺序混乱的,较早的条目要比另一个条目大。
但是还有其他一些,例如三个和两个,它们是乱序的。
并且,五个和四个也出现故障。
我将留给您检查,以确保只有三对故障。
因此,总结此长度为6的数组中的反演为3-2、5-2和5-4。
对应于数组条目2-4、3-4和3-5。
在图形上,我们可以这样思考,我们可以首先。
依次写下数字,最多1个。
然后我们可以再次写下数字,但是按照输入数组中给定的方式排序。
因此,一三五二四六。
然后我们可以连接点,这意味着我们一对一地连接。
重新连接两个到两个,依此类推。
事实证明,我将让您仔细考虑一下,线段的交叉对的数量精确地对应于反转的数量。
因此,我们看到有1、2、3个交叉线段。
这些恰好与我们先前发现的三个反演相对应。
五和二,三和二,五和四。
现在,[听不清]想解决您可能会问的问题。
好吧,有几个原因。
一种方法是采用数值相似性度量,以量化[听不清]列表彼此之间的接近程度。
例如,假设我带你和一个朋友,然后带我确定了你们俩都看过的十部电影。
我请你们每个人订购,或将这些电影从您最喜欢的到最不喜欢的排列。
现在,我可以形成一个数组,并计算反转。
从某种意义上说,它量化了两个排名之间的差异。
因此,更详细地讲,在该数组的第一个条目中,我将放下您的朋友对您最喜欢的电影的排名。
因此,如果您有自己喜欢的电影,《星球大战》或其他电影。
而您的朋友只认为这是十个最佳中的第五个,那么我会在该数组的第一项中写下一个五个。
通常,我会拍第二部您喜欢的电影。
我会看看你的朋友如何评价的。
我会将其放在数组的第二个条目中,依此类推,一直到数组的第十个条目,然后将您的朋友在您最不喜欢的电影中的排名放在此处。
现在,如果您具有完全相同的首选项,如果以完全相同的方式对它们进行排名,则此数组的反转数将为零。
通常,此数组的反转次数更多,它可以量化列表之间的差异越来越大。
现在,为什么要执行此操作,为什么要知道两个不同的人是否以相似的方式对事物进行排名,是否具有相似的偏好?一个原因可能就是所谓的协作过滤,也许你们中的许多人都有访问网站的经验如果您通过该网站进行了几次购买,它就会开始为您推荐进一步的购买,因此,解决这个问题的一种方法是查看您的购买,看看您的喜好,找到其他有购买意愿的人相似的偏好相似的历史记录会查看他们购买的,您没有的东西,然后推荐。
根据相似客户购买的商品为您提供新产品。
因此,此问题捕获了一些本质,这些本质是基于有关他们偏爱的数据来识别哪些客户或哪些人是相似的。
因此,为了确保我们都在同一页面上,让我暂停一下简短的测验。
我们已经注意到,给定数组只有在按排序顺序时才具有零反转。
如果仅按顺序包含1到N的数字。
那么,另一方面,数组可能具有的最大反转数是多少?假设仅是大小为6的数组,例如此处的示例。
O(n log n) Algorithm for Counting Inversions I - Question 1
What is the largest-possible number of inversions that a 6-element array can have?
A 15
B 21
C 36
D 64