Quick start setup
Now that we understand the details of Flink's architecture and its process model, it's time to get started with a quick setup and try things out on our own. Flink works on both Windows and Linux machines.
The very first thing we need to do is to download Flink's binaries. Flink can be downloaded from the Flink download page at http://flink.apache.org/downloads.html.
On the download page, you will see multiple options as shown in the following screenshot:
In order to install Flink, you don't need to have Hadoop installed. But in case you need to connect to Hadoop using Flink, then you need to download the exact binary that is compatible with the Hadoop version you have with you.
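If you are not sure which Hadoop version is installed, a quick way to check it (assuming the hadoop command is already on your path) is:
hadoop version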
As I have the latest version of Hadoop, 2.7.0, installed with me, I am going to download the Flink binary compatible with Hadoop 2.7.0 and built on Scala 2.11. Here is the direct download link: http://www-us.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
Pre-requisite
Flink needs Java to be installed first. So before you start, please make sure Java is installed. I have JDK 1.8 installed on my machine:
D:\>java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Installing on Windows
Flink is very easy to install. Just extract the compressed file and store it in the desired location.
Once extracted, go to the folder and execute start-local.bat:
>cd flink-1.1.4
>bin\start-local.bat
And you will see that the local instance of Flink has started. You can also check the web UI at http://localhost:8081/:
You can stop the Flink process by pressing Ctrl + C.
Installing on Linux
Similar to Windows, installing Flink on Linux machines is very easy. We need to download the binary, place it in a specific folder, extract it, and we are done:
$sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
$cd flink-1.1.4
$bin/start-local.sh
As in Windows, please make sure Java is installed on the machine.
Now we are all set to submit a Flink job. To stop the local Flink instance on Linux, execute the following command:
$bin/stop-local.sh
Cluster setup
Setting up a Flink cluster is very simple as well. Those who have a background of installing a Hadoop cluster will be able to relate to these steps very easily. In order to set up the cluster, let's assume we have four Linux machines with us, each having a moderate configuration. Machines with at least two cores and 4 GB RAM would be a good option to get started. The very first thing we need to do is to choose the cluster design. As we have four machines, we will use one machine as the Job Manager and the other three machines as the Task Managers:
SSH configurations
In order to set up the cluster, we first need password-less SSH connections from the Job Manager machine to the Task Managers. The following steps need to be performed on the Job Manager machine to create an SSH key and copy it to authorized_keys:
ssh-keygen
This will generate the public and private keys in the /home/flinkuser/.ssh folder. Now copy the public key (id_rsa.pub) to each Task Manager machine, for example with scp, and perform the following steps on the Task Manager to allow a password-less connection from the Job Manager:
sudo mkdir -p /home/flinkuser/.ssh
sudo touch /home/flinkuser/.ssh/authorized_keys
sudo sh -c "cat id_rsa.pub >> /home/flinkuser/.ssh/authorized_keys"
Make sure the keys have restricted access by executing the following commands:
sudo chmod 700 /home/flinkuser/.ssh
sudo chmod 600 /home/flinkuser/.ssh/authorized_keys
Now you can test the password-less SSH connection from the Job Manager machine:
sudo ssh <task-manager-1>
sudo ssh <task-manager-2>
sudo ssh <task-manager-3>
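Alternatively, if the ssh-copy-id utility is available on the Job Manager, it performs the copy, append, and permission steps above in a single command. A minimal sketch, assuming a flinkuser account exists on each Task Manager:
ssh-copy-id flinkuser@<task-manager-1>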
If you are using any cloud service instances for the installation, please make sure that ROOT login is enabled for SSH. In order to do this, you need to log in to each machine and open the file /etc/ssh/sshd_config. Then change the value of PermitRootLogin to yes. Once you save the file, restart the SSH service by executing this command:
sudo service sshd restart
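If you would rather script this change than edit the file by hand, a one-liner along these lines should work with GNU sed (a sketch only; the existing PermitRootLogin entry may be commented out, so verify the resulting file before restarting the service):
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config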
Java installation
Next we need to install Java on each machine. The following commands will help you install Java on Red Hat/CentOS-based UNIX machines:
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm
sudo rpm -ivh jdk-8u92-linux-x64.rpm
Next we need to set up the JAVA_HOME environment variable so that Java is available to access from everywhere. Create a java.sh file:
sudo vi /etc/profile.d/java.sh
And add the following content in it and save it:
#!/bin/bash
JAVA_HOME=/usr/java/jdk1.8.0_92
PATH=$JAVA_HOME/bin:$PATH
export PATH JAVA_HOME
export CLASSPATH=.
Make the file executable and source it:
sudo chmod +x /etc/profile.d/java.sh
source /etc/profile.d/java.sh
You can now check if Java is installed properly:
$ java -version
java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)
Repeat these installation steps on all the Job Manager and Task Manager machines.
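Rather than logging in to each node by hand, you could push the RPM out over the password-less SSH connections configured earlier. A rough sketch, reusing the placeholder hostnames from above (it assumes your account can run sudo on each node):
for host in <task-manager-1> <task-manager-2> <task-manager-3>; do
  scp jdk-8u92-linux-x64.rpm "$host":/tmp/
  ssh "$host" "sudo rpm -ivh /tmp/jdk-8u92-linux-x64.rpm"
done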
Flink installation
Once the SSH and Java installations are done, we need to download the Flink binaries and extract them into a specific folder. Please make a note that the installation directory should be the same on all nodes. So let's get started:
cd /usr/local
sudo wget http://www-eu.apache.org/dist/flink/flink-1.1.4/flink-1.1.4-bin-hadoop27-scala_2.11.tgz
sudo tar -xzf flink-1.1.4-bin-hadoop27-scala_2.11.tgz
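Since the installation directory must be identical everywhere, one way to guarantee that is to extract once and mirror the folder to the other nodes. A sketch, assuming rsync is installed and your account can write to /usr/local on the remote machines (you can re-run it after editing the configuration files described next):
for host in <task-manager-1> <task-manager-2> <task-manager-3>; do
  rsync -a /usr/local/flink-1.1.4 "$host":/usr/local/
done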
Now that the Flink binaries are ready, we need to do some related configuration.
Configurations
Flink's configuration is simple. We need to tune a few parameters and we are all set. Most of the configurations are the same for the Job Manager and Task Manager nodes. All configurations are done in the conf/flink-conf.yaml file.
The following is a configuration file for a Job Manager node:
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
You may want to change the memory configurations for the Job Manager and Task Manager based on your node configurations. For the Task Managers, jobmanager.rpc.address should be populated with the correct Job Manager hostname or IP address.
So, for all Task Managers, the configuration file should look like the following:
jobmanager.rpc.address: <jobmanager-ip-or-host>
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 256
taskmanager.heap.mb: 512
taskmanager.numberOfTaskSlots: 1
We need to add the JAVA_HOME details in this file so that Flink knows exactly where to look for the Java binaries:
export JAVA_HOME=/usr/java/jdk1.8.0_92
We also need to add the slave node details in the conf/slaves file, with each node on a separate new line. Here is how a sample conf/slaves file should look:
<task-manager-1>
<task-manager-2>
<task-manager-3>
Starting daemons
Now the only thing left is starting the Flink processes. We can start each process separately on the individual nodes, or we can execute the start-cluster.sh command to start the required processes on each node:
bin/start-cluster.sh
If all the configurations are good, then you will see that the cluster is up and running. You can check the web UI at http://<job-manager-ip>:8081/. The following are some snapshots of the Flink web UI:
You can click on the Job Manager link to get the following view:
Similarly, you can check out the Task Managers view as follows:
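If you prefer a command-line check over the browser, the JobManager also exposes a monitoring REST API on the same port. To the best of my knowledge, in this version the /overview endpoint returns a JSON summary of registered task managers and slots, so a quick sanity test could be:
curl http://<job-manager-ip>:8081/overview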
Adding additional Job/Task Managers
Flink provides you with the facility to add additional instances of Job and Task Managers to the running cluster. Before you start the daemon, please make sure that you have followed the steps given previously.
To add an additional Job Manager to the existing cluster, execute the following command:
sudo bin/jobmanager.sh start cluster
Similarly, we need to execute the following command to add an additional Task Manager:
sudo bin/taskmanager.sh start cluster
Stopping daemons and cluster
Once the job execution is completed, you will want to shut down the cluster. The following commands are used for that.
To stop the complete cluster in one go:
sudo bin/stop-cluster.sh
To stop the individual Job Manager:
sudo bin/jobmanager.sh stop cluster
To stop the individual Task Manager:
sudo bin/taskmanager.sh stop cluster
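To confirm that everything has actually stopped, you can look for leftover JVMs with the JDK's jps tool; the Flink daemons appear under their main-class names, so an empty result means a clean shutdown:
jps | grep -E 'JobManager|TaskManager'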
Running sample application
Flink binaries come with a sample application which can be used as it is. Let's start with a very simple application, word count. Here we are going to try a streaming application which reads data from a netcat server on a specific port.
So let's get started. First start the netcat server on port 9000 by executing the following command:
nc -l 9000
Now the netcat server will start listening on port 9000, so whatever you type on the command prompt will be sent to Flink for processing.
Next we need to start the Flink sample program to listen to the netcat server. The following is the command:
bin/flink run examples/streaming/SocketTextStreamWordCount.jar --hostname localhost --port 9000
08/06/2016 10:32:40 Job execution switched to status RUNNING
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to SCHEDULED
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to DEPLOYING
08/06/2016 10:32:40 Source: Socket Stream -> Flat Map (1/1) switched to RUNNING
08/06/2016 10:32:40 Keyed Aggregation -> Sink: Unnamed (1/1) switched to RUNNING
This will start the Flink job execution. Now you can type something on the netcat console and Flink will process it. For example, type the following on the netcat server:
$nc -l 9000
hi Hello
Hello World
This distribution includes cryptographic software. The country in
which you currently reside may have restrictions on the import,
possession, use, and/or re-export to another country, of
encryption software. BEFORE using any encryption software,
please check your country's laws, regulations and policies
concerning the import, possession, or use, and re-export of
encryption software, to see if this is permitted.
See <http://www.wassenaar.org/> for more information.
You can verify the output in the logs:
$ tail -f flink-*-taskmanager-*-flink-instance-*.out
==> flink-root-taskmanager-0-flink-instance-1.out <==
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
(hello, 1)
(world, 1)
==> flink-root-taskmanager-1-flink-instance-1.out <==
(is, 1)
(permitted, 1)
(see, 2)
(http, 1)
(www, 1)
(wassenaar, 1)
(org, 1)
(for, 1)
(more, 1)
(information, 1)
==> flink-root-taskmanager-2-flink-instance-1.out <==
(hello, 1)
(worlds, 1)
(hi, 1)
(how, 1)
(are, 1)
(you, 1)
(how, 2)
(is, 1)
(it, 1)
(going, 1)
You can also check out the Flink web UI to see how your job is performing. The following screenshot shows the data flow plan for the execution:
Here, for the job execution, Flink has two operators. The first is the source operator, which reads data from the socket stream. The second is the transformation operator, which aggregates the counts of words. We can also look at the timeline of the job execution:
Summary
In this chapter, we talked about how Flink started as a university project and then became a full-fledged, enterprise-ready data processing platform. We looked at the details of Flink's architecture and how its process model works. We also learnt how to run Flink in local and cluster modes.
In the next chapter, we are going to learn about Flink's Streaming API, look at its details, and see how we can use that API to solve our data stream processing problems.