(3) SparkConf & SparkContext

I. Initializing Spark
1. Create a SparkConf, which holds information about your application, such as the application name, cores, and memory, all set as key-value pairs.
See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkConf for details.
2. Create a SparkContext, which tells Spark how to connect to a cluster (local, standalone, YARN, or Mesos).
Note: Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one.

val conf = new SparkConf().setAppName(appName).setMaster(master)
new SparkContext(conf)
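
For example, the core and memory settings mentioned above go in as key-value pairs via set. A minimal sketch (the values are placeholders; spark.executor.memory and spark.executor.cores are standard Spark configuration keys):

val conf = new SparkConf()
  .setAppName(appName)
  .setMaster(master)
  .set("spark.executor.memory", "2g")  // executor memory, set as a key-value pair
  .set("spark.executor.cores", "2")    // executor cores, set as a key-value pair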


When running on YARN, Spark also needs the Hadoop client configuration, typically provided via:
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Best practice: In practice, when running on a cluster, you will not want to hardcode master in the program, but rather launch the application with spark-submit and receive it there. That way the same code can run on YARN, standalone, or Mesos without any modification.
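
A minimal sketch of this pattern, assuming a hypothetical class name and jar (both placeholders, not from the original):

// Do not call setMaster in code; spark-submit supplies the master at launch time
val conf = new SparkConf().setAppName("SparkContextApp")
val sc = new SparkContext(conf)

// The same jar can then be submitted to different cluster managers, e.g.:
//   spark-submit --master yarn              --class com.example.SparkContextApp app.jar
//   spark-submit --master spark://host:7077 --class com.example.SparkContextApp app.jar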
II. Building a Spark Application in IDEA
1. Add the spark-core and scala dependencies

<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.3.1</spark.version>
    <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
  </properties>

<repositories>
    <repository>
      <id>cloudera</id>
      <name>cloudera</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

<dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
</dependencies>

2. Create the SparkContext (two lines to start, one line to stop)
Programming template:

import org.apache.spark.{SparkConf, SparkContext}

object SparkContextApp {
  def main(args: Array[String]): Unit = {
    // Step 1: create the SparkContext
    val sparkConf = new SparkConf().setAppName("SparkContextApp").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)

    // Step 2: read the input and apply the business logic
    // TODO...

    // Step 3: stop the SparkContext
    sc.stop()
  }
}
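
As an illustration of the business-logic step (step 2), a minimal word count filled into the template might look like this (the input path is a placeholder, not from the original):

import org.apache.spark.{SparkConf, SparkContext}

object WordCountApp {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("WordCountApp").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)

    // Read a text file, split lines into words, and count each word
    val counts = sc.textFile("file:///tmp/input.txt")  // placeholder path
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)

    sc.stop()
  }
}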