Building jars with Maven & IntelliJ IDEA and creating a Scala project

1.Maven

1.1 Maven dependencies:

Look up the dependency coordinates of the jar you need at
http://mvnrepository.com/
and copy the XML snippet into your pom.xml.
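
As a minimal sketch of where the copied snippet goes: each <dependency> element belongs inside the <dependencies> element of pom.xml. The ansj_seg coordinates are the ones used later in this post; the project's own groupId, artifactId and version below are placeholders, not values from the original.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- placeholder coordinates for your own project (assumption) -->
  <groupId>com.example</groupId>
  <artifactId>demo</artifactId>
  <version>0.0.1</version>

  <dependencies>
    <!-- snippet copied from mvnrepository.com -->
    <dependency>
      <groupId>org.ansj</groupId>
      <artifactId>ansj_seg</artifactId>
      <version>5.1.1</version>
    </dependency>
  </dependencies>
</project>
```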

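When you are working with someone else's code and dependencies are missing,
open pom.xml and press "Alt + Insert".
In the popup that appears,

click Dependency,

search for the missing package, and click "Add".
In IDEA's Maven settings you can configure your own Maven (Maven home, settings file, local repository). A newer, self-installed Maven version is sometimes incompatible with IDEA, in which case the Maven bundled with IDEA can be used instead.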

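1.2 Common Maven commands:

The most frequently used commands are:

1.mvn archetype:generate : create a Maven project (the older archetype:create goal is deprecated and was removed in version 3.x of the archetype plugin)
2.mvn compile : compile the source code
3.mvn clean compile : clean, then compile
4.mvn test-compile : compile the test code
5.mvn test : run the application's unit tests
6.mvn site : generate a website with project information
7.mvn clean : delete the build output in the target directory
8.mvn package : build the project's jar file
9.mvn deploy : build the jar, install it into the local repository, and upload it to the remote (private) repository
10.mvn install : install the jar into the local repository (runs compile, test and package first, then copies the artifact into the local repository)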

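Sometimes you need to install your own jar into the local Maven repository so that it can be used as a dependency.

An external jar that is not available from a remote repository also has to be installed into Maven first.
Go to Maven's bin folder (any directory works if the environment variables are configured) and install the jar from the command line. The -Dfile parameter accepts a path to the file, but that failed when I tried it; after cd-ing into the directory that contains the jar and passing only the file name, the installation succeeded.
In the IDEA terminal, run: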
mvn install:install-file -Dfile=xgboost4j-spark-0.7.jar -DgroupId=ml.dmlc -DartifactId=xgboost4j-spark -Dversion=0.7 -Dpackaging=jar
mvn install:install-file -Dfile=scopt_2.10-3.3.0.jar -DgroupId=com.github.scopt -DartifactId=scopt_2.10 -Dversion=3.3.0 -Dpackaging=jar
After that, adding the dependency works as usual:
<dependency>
    <groupId>com.github.scopt</groupId>
    <artifactId>scopt_2.10</artifactId>
    <version>3.3.0</version>
</dependency>

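-Dfile: the actual local path of the jar (just the file name if it is in the current directory)
-DgroupId: the groupId used in pom.xml
-DartifactId: the artifactId used in pom.xml
-Dversion: the version number used in pom.xml
-Dpackaging: jar or war, i.e. the file extension of the package
-Dclassifier: the classifier that distinguishes sibling artifacts of the same project, i.e. the string after the version in the file name, such as core in workspace-1.1.1-SNAPSHOT-core.jar. I did not need it.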
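
For illustration, a jar installed with -Dclassifier would be referenced by adding a <classifier> element to the dependency. This is only a sketch: the groupId below is hypothetical, while the artifactId, version and classifier follow the workspace-1.1.1-SNAPSHOT-core.jar file name mentioned above.

```xml
<dependency>
  <!-- hypothetical groupId, not from the original post -->
  <groupId>com.example</groupId>
  <artifactId>workspace</artifactId>
  <version>1.1.1-SNAPSHOT</version>
  <!-- selects workspace-1.1.1-SNAPSHOT-core.jar instead of the main jar -->
  <classifier>core</classifier>
</dependency>
```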

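Method 1:

Install the jar into the local repository.
-Dfile is the full path of the jar; -DgroupId, -DartifactId and -Dversion must match the corresponding fields of the dependency declared in pom.xml.
It is best to put the file directly under a drive root such as D:\ or C:\. When I placed it on the desktop it was not recognized (a likely cause is spaces or non-ASCII characters in the path, in which case quoting the path may help).

The general form of the command is:
mvn install:install-file -Dfile=new-framework.jar -DgroupId=com.xx.xx -DartifactId=NAME -Dversion=0.0.1 -Dpackaging=jar -DgeneratePom=true

For example:
mvn install:install-file -Dfile=xgboost4j-0.7.jar -DgroupId=ml.dmlc -DartifactId=xgboost4j -Dversion=0.7 -Dpackaging=jar

If BUILD SUCCESS is displayed, the installation succeeded; you can check the local Maven repository to confirm that the files were generated.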

Method 2

Reference a jar on the local file system directly by setting the system scope and a systemPath in the dependency.

    <dependency>
      <groupId>org.jblas</groupId>
      <artifactId>jblas</artifactId>
      <version>1.2.3</version>
      <scope>system</scope>
      <systemPath>${basedir}/*****.jar</systemPath>
    </dependency>

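Differences between install, package and deploy
package compiles the project, runs the unit tests, and packages the result; it is used to produce the jar.
install does everything package does and then copies the built jar into the local Maven repository, so that other local projects can use it by declaring a Maven dependency.
deploy does everything install does and then uploads the built jar to the remote (private) Maven repository.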
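
deploy needs to know where the remote repository is. Below is a minimal sketch of the pom.xml section that usually provides this, assuming a private Nexus-style server. The ids and URLs are placeholders (assumptions, not from the original); if the server requires authentication, each id has to match a <server> entry in settings.xml.

```xml
<distributionManagement>
  <!-- target for release versions; id and url are placeholders -->
  <repository>
    <id>my-releases</id>
    <url>http://nexus.example.com/repository/maven-releases/</url>
  </repository>
  <!-- target for -SNAPSHOT versions; id and url are placeholders -->
  <snapshotRepository>
    <id>my-snapshots</id>
    <url>http://nexus.example.com/repository/maven-snapshots/</url>
  </snapshotRepository>
</distributionManagement>
```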

2.Packaging a jar with IntelliJ IDEA

If a jar has been built before, first delete the MANIFEST.MF file in the META-INF folder; otherwise the jar cannot be regenerated.

2.1.File->Project Structure->Artifacts->Add->Jar->From modules with dependencies

Press Ctrl + Alt + Shift + S, or click the Project Structure icon, to open Project Structure.

Then click the Add button to add a jar artifact.

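2.2.Configuration

Step 1: select the class that contains the main function.

Step 2: select the option for handling library jars (shown in a screenshot in the original post). Its purpose is to apply extra configuration when third-party jars are packaged. It can be left unselected if no extra configuration is needed, but packaging is then not guaranteed to succeed.

Step 3: remove the modules that are not needed.
Hold Shift, click the unneeded modules, then click Remove.

Usually only the last folder is kept and everything else is removed.

Step 4: click Build, then choose Build Artifacts; the jar can then be found in the out folder.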

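2.3.Packaging directly

When creating a Scala project with IDEA + Maven, check Create from archetype, select org.scala-tools.archetypes:scala-archetype-simple, and click Next.
After the project is created, the message "Maven projects need to be imported" appears.
Choose Enable Auto-Import so that changes to pom.xml are imported automatically.
A Maven Projects panel appears on the right; expand it to find actions such as compile, test and run.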

Click Build (Ctrl + F9 on Windows, Cmd + F9 on macOS),

then click package in the Maven Projects panel.

The jar will appear in the target folder.

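Notes:

With this approach, packaging fails unless the test folder is deleted.

If you prefer to keep the test folder, declare the two source directories (src/main/scala and src/test/scala) in the <build> section of pom.xml: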

    <build>
      <sourceDirectory>src/main/scala</sourceDirectory>
      <testSourceDirectory>src/test/scala</testSourceDirectory>
    </build>

In addition, <properties> may specify information such as the Scala version:

    <properties>
        <spark.version>2.2.1</spark.version>
        <scala.version>2.11.8</scala.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
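
Properties like these are normally referenced elsewhere in the pom with ${...}. A small sketch, assuming the scala.version and spark.version properties above are defined, of how dependencies could pick them up so that a version only has to be changed in one place:

```xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <!-- resolved from <scala.version> in <properties> -->
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <!-- resolved from <spark.version> in <properties> -->
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```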

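If packaging fails with a Scala version error, you can change <properties> to the following (provided nothing else in the pom still references ${scala.version} or ${spark.version}):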

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  </properties>

If the Maven Projects panel is hidden, click the search icon,

search for Maven Projects, and click the result.

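3.Loading a Java or Scala project downloaded from GitHub into IDEA

Obtain the code with git clone or by downloading the zip.

3.1 Open

Choose File-->Open... and open the downloaded project.

3.2 Configure the JDK

File-->Project Structure-->Project-->select the JDK (if none can be selected, click New and point it at the JDK installed on your machine)

3.3 Select Modules

In the Project Structure dialog, set up the directories:
mark target or out as Excluded and src as Sources.

3.4 Package junit.framework does not exist

Compilation fails with the error "package junit.framework does not exist".
The pom usually does contain the junit dependency, but with a scope element: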

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>${junit.version}</version>
    <scope>test</scope>
</dependency>

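A dependency with a scope is restricted to the corresponding classpath. Test-scoped dependencies are not available when the main code is compiled, so the dependency has no effect in the compile phase, and the error appears when code outside the test sources imports junit. You can remove the scope, or change it: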

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>${junit.version}</version>
    <scope>compile</scope>
</dependency>

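4.Creating a Scala project

4.1.Create a new project and check Create from archetype

4.2.Press Shift twice and search for scala-archetype-simple

4.3.Edit the pom dependencies

I usually write the dependencies roughly like this (the Spark module versions below are mixed; aligning them on a single Spark version is generally safer):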

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.0</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.2.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.4.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-client</artifactId>
    <version>2.4.1</version>
  </dependency>

  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.specs</groupId>
    <artifactId>specs</artifactId>
    <version>1.2.5</version>
    <scope>test</scope>
  </dependency>
</dependencies>

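Sometimes the following error is reported:
expected START_TAG or END_TAG not TEXT
It is caused by stray whitespace in front of the dependency (typically non-standard spaces picked up when copying from a web page). Delete the whitespace and insert a plain line break instead.

Sometimes, to avoid conflicts, certain jars need to be excluded with <exclusions>.
For example, if hadoop-client has already been declared as a dependency and spark-core also pulls in hadoop-client, it can be excluded with <exclusions>: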

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

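Spark Core provides Spark's most basic and central functionality, mainly the following:
(1) SparkContext:
Generally speaking, the execution and output of a Driver Application go through SparkContext, which must be initialized before the application is formally submitted. SparkContext hides networking, distributed deployment, messaging, storage, computation, caching, the metrics system, file services, web services and so on, so application developers only need the API it exposes. The DAGScheduler built into SparkContext creates Jobs, divides the RDDs in the DAG into Stages, and submits the Stages. The built-in TaskScheduler requests resources, submits tasks, and asks the cluster to schedule them.
(2) Storage system:
Spark prefers to use each node's memory for storage and falls back to disk only when memory is insufficient. This greatly reduces disk I/O and improves task execution efficiency, which makes Spark suitable for real-time and streaming computation. In addition, Spark can work with Tachyon (since renamed Alluxio), a memory-centric, fault-tolerant distributed file system that offers reliable memory-speed file sharing.
(3) Compute engine:
The compute engine consists of the DAGScheduler in SparkContext, RDDs, and the Map and Reduce tasks run by the Executors on the individual nodes. Although the DAGScheduler and RDDs live inside SparkContext, before tasks are formally submitted and executed they organize the RDDs of a Job into a directed acyclic graph (DAG) and divide it into Stages, which determines the number of tasks in the execution phase, iterative computation, shuffle and so on.
(4) Deployment modes:
Because a single node cannot provide enough storage and compute capacity, Spark, as a big-data processing system, implements the Standalone deployment mode in SparkContext's TaskScheduler component and supports distributed resource managers such as Yarn and Mesos. Allocating compute resources to Tasks through Standalone, Yarn, Mesos and similar modes improves the concurrency of task execution.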

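4.4.Plugins (maven-shade-plugin, maven-jar-plugin)

Sometimes the project uses external libraries or files, and those that are not available on the cluster have to be bundled into the jar. For example, shade can be used to bundle the word-segmentation tool ansj into the jar.
maven-jar-plugin
The maven-jar-plugin can be used to leave out files or classes that you do not want in the jar.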

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
            <archive>
                <manifest>
                    <!-- entry point (main class) -->
                    <mainClass>com.main.Jar</mainClass>
                    <!-- whether to add a Class-Path entry listing the dependency jars -->
                    <addClasspath>true</addClasspath>
                    <!-- location of the dependency jars: a lib/ directory next to the generated jar -->
                    <classpathPrefix>lib/</classpathPrefix>
                </manifest>
            </archive>
            <!-- do not package any file or class under com/library -->
            <excludes>
                <exclude>com/library/**</exclude>
            </excludes>
        </configuration>
    </plugin>

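maven-shade-plugin
Dependencies:
To keep the jar from becoming too large, libraries that already exist on the cluster do not need to be bundled by shade; only those missing from the cluster do. Add <scope>provided</scope> to the libraries that are already available so that they are not bundled. Dependencies without it are packed into the jar.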

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.0</version>
  </dependency>

  <dependency>
      <groupId>org.ansj</groupId>
      <artifactId>ansj_seg</artifactId>
      <version>5.1.1</version>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.4.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-yarn-client</artifactId>
    <version>2.4.1</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.specs</groupId>
    <artifactId>specs</artifactId>
    <version>1.2.5</version>
    <scope>test</scope>
  </dependency>
</dependencies>

  <build>
    <!-- plugins -->
    <plugins>
      <!-- the Maven compiler plugin will compile Java source files -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.7.0</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <version>3.0.2</version>
        <configuration>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>

      <!-- the Maven Scala plugin will compile Scala source files -->
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <!-- configure the eclipse plugin to generate eclipse project descriptors for a Scala project -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-eclipse-plugin</artifactId>
        <version>2.10</version>
        <configuration>
          <projectnatures>
            <projectnature>org.scala-ide.sdt.core.scalanature</projectnature>
            <projectnature>org.eclipse.jdt.core.javanature</projectnature>
          </projectnatures>
          <buildcommands>
            <buildcommand>org.scala-ide.sdt.core.scalabuilder</buildcommand>
          </buildcommands>
          <classpathContainers>
            <classpathContainer>org.scala-ide.sdt.launching.SCALA_CONTAINER</classpathContainer>
            <classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
          </classpathContainers>
          <excludes>
            <exclude>org.scala-lang:scala-library</exclude>
            <exclude>org.scala-lang:scala-compiler</exclude>
          </excludes>
          <sourceIncludes>
            <sourceInclude>**/*.scala</sourceInclude>
            <sourceInclude>**/*.java</sourceInclude>
          </sourceIncludes>
        </configuration>
      </plugin>

      <!-- allows the route to be run via 'mvn exec:java' -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>1.6.0</version>
        <configuration>
          <mainClass>UP.MyRouteMain</mainClass>
        </configuration>
      </plugin>

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <!-- Sometimes both the shaded and the unshaded jar need to be deployed. -->
              <!-- The name given in shadedClassifierName is appended as a suffix to the shaded jar's file name. -->
              <shadedClassifierName>wangyao</shadedClassifierName>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
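
The shade configuration above does not set an entry point. If the shaded jar should be runnable with java -jar, one possible variation (a sketch, not part of the original setup) adds a ManifestResourceTransformer and filters out the signature files of signed dependencies. The main class com.main.Jar is reused from the maven-jar-plugin example and is only a placeholder.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- writes Main-Class into the shaded jar's MANIFEST.MF -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <!-- placeholder main class (assumption) -->
            <mainClass>com.main.Jar</mainClass>
          </transformer>
        </transformers>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <!-- signatures of signed dependencies no longer match the merged jar -->
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
```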

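4.5. Updating plugins
IntelliJ IDEA can load Maven dependencies automatically, but on a MacBook I often run into cases where a change to the pom file does not trigger an automatic reload. Open Maven Projects => your project name => Plugins; if some plugins are underlined in red and end with "unknown", the dependencies have to be refreshed manually.
Two steps:
(1) right-click the project;
(2) click Maven => Reimport.

IDEA then downloads the required dependencies over the network and stores them in the local Maven repository. Alternatively, the Maven refresh can be made automatic as follows:

(1) click File | Settings to open the Settings dialog;
(2) expand the Maven node in the tree on the left;
(3) check Import Maven projects automatically;
(4) right-click the project => Maven => Reimport;
(5) after this, the Maven dependencies will have been updated.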

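4.6 Maven mirrors

Create a MyRepository directory inside the Maven folder to serve as the local repository.
I usually name mine maven-dependcies, which gives <localRepository>D:/maven-dependcies</localRepository>
Open conf\settings.xml under the Maven installation. For the Maven bundled with IDEA, look for maven under IDEA's plugins directory; the file is in its conf folder.
Inside the <settings> element, add
<localRepository>path-to-maven/MyRepository</localRepository>
pointing to your own MyRepository path.

Downloading from overseas repositories is slow, so add the following to the mirrors node of settings.xml: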

    <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>*</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>

Downloads are faster through the Aliyun mirror.
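
Putting the two settings together, a minimal settings.xml could look like the sketch below. The local repository path is the example path from above and should be adjusted to your own directory; the mirror entry is the one shown above.

```xml
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
  <!-- where downloaded and installed artifacts are stored; adjust to your own path -->
  <localRepository>D:/maven-dependcies</localRepository>

  <mirrors>
    <!-- route all repository requests through the Aliyun mirror -->
    <mirror>
      <id>nexus-aliyun</id>
      <mirrorOf>*</mirrorOf>
      <name>Nexus aliyun</name>
      <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
  </mirrors>
</settings>
```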

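5.Create a new package

6.Create a new Scala object in the package

7.Package display
Sometimes packages are shown collapsed into one, so multiple sub-packages cannot be created under a single package.
Click the gear icon and uncheck Hide Empty Middle Packages.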
