sparkContext是spark程序的入口,可以创建RDD、accumulators和broadcast
/**
* Main entry point for Spark functionality. A SparkContext represents the connection to a Spark
* cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.
*
* Only one SparkContext may be active per JVM. You must `stop()` the active SparkContext before
* creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.
*
* @param config a Spark Config object describing the application configuration. Any settings in
* this config overrides the default configs as well as system properties.
*/
class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient {
sparkContext的初始化只需要一个sparkConf,sparkConf的实质是一个<k, v>对
/**
* Configuration for a Spark application. Used to set various Spark parameters as key-value pairs.
*
* Most of the time, you would create a SparkConf object with `new SparkConf()`, which will load
* values from any `spark.*` Java system properties set in your application as well. In this case,
* parameters you set directly on the `SparkConf` object take priority over system properties.
*
* For unit tests, you can also call `new SparkConf(false)` to skip loading external settings and
* get the same configuration no matter what the system properties are.
*
* All setter methods in this class support chaining. For example, you can write
* `new SparkConf().setMaster("local").setAppName("My app")`.
*
* Note that once a SparkConf object is passed to Spark, it is cloned and can no longer be modified
* by the user. Spark does not support modifying the configuration at runtime.
*
* @param loadDefaults whether to also load values from Java system properties
*/
class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
sparkConf的主要参数有
- master
- appName
- jars
- executorEnv
- sparkHome
/**
* Creates a modified version of a SparkConf with the parameters that can be passed separately
* to SparkContext, to make it easier to write SparkContext's constructors. This ignores
* parameters that are passed as the default value of null, instead of throwing an exception
* like SparkConf would.
*/
private[spark] def updatedConf(
conf: SparkConf,
master: String,
appName: String,
sparkHome: String = null,
jars: Seq[String] = Nil,
environment: Map[String, String] = Map()): SparkConf =
{
val res = conf.clone()
res.setMaster(master)
res.setAppName(appName)
if (sparkHome != null) {
res.setSparkHome(sparkHome)
}
if (jars != null && !jars.isEmpty) {
res.setJars(jars)
}
res.setExecutorEnv(environment.toSeq)
res
}