Setting Up a Local Spark Environment
Setting Up Java
(1) Download the JDK from the official site.
Official link: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
(2) Extract it to a target directory:
sudo mkdir -p /usr/lib/jdk
sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk    # adjust the version number to match your download
(3) Set the path and environment variables:
sudo vim /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
(4) Apply the configuration:
source /etc/profile
(5) Verify the installation:
$ java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
Installing Scala
(1) Download the package from the official site.
Official link: https://www.scala-lang.org/download/
(2) Extract it to a target directory:
sudo mkdir -p /usr/lib/scala
sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala    # adjust the version number to match your download
(3) Set the path and environment variables:
sudo vim /etc/profile
Append the following at the end of the file:
export SCALA_HOME=/usr/lib/scala/scala-2.11.8    # adjust the version number to match your download
export PATH=${SCALA_HOME}/bin:$PATH
(4) Apply the configuration:
source /etc/profile
(5) Verify the installation:
$ scala
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181).
Type in expressions for evaluation. Or try :help.
scala>

(Note: this banner reports Scala 2.12.6 rather than the 2.11.8 unpacked above. If the version shown differs from the one you installed, another Scala earlier on your PATH is being picked up.)
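As a quick smoke test, type a couple of throwaway expressions at the scala> prompt; the REPL prints the type and value of each result:

scala> 1 + 1
res0: Int = 2

scala> "spark".toUpperCase
res1: String = SPARK

Type :quit (or press Ctrl+D) to leave the REPL.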
Installing Spark
(1) Download the package from the official site.
Official link: http://spark.apache.org/downloads.html
(2) Extract it to a target directory:
sudo mkdir -p /usr/lib/spark
sudo tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz -C /usr/lib/spark    # adjust the version number to match your download
(3) Set the path and environment variables:
sudo vim /etc/profile
Append the following at the end of the file:
export SPARK_HOME=/usr/lib/spark/spark-2.2.0-bin-hadoop2.7    # adjust the version number to match your download
export PATH=${SPARK_HOME}/bin:$PATH
(4) Apply the configuration:
source /etc/profile
(5) Verify the installation:
$ cd spark-2.2.0-bin-hadoop2.7
$ ./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/09/30 20:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/30 20:59:32 WARN Utils: Your hostname, pxh resolves to a loopback address: 127.0.1.1; using 10.22.48.4 instead (on interface wlan0)
18/09/30 20:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/09/30 20:59:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.22.48.4:4040
Spark context available as 'sc' (master = local[*], app id = local-1538312374870).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
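Inside spark-shell, sc is already bound to a SparkContext (as the startup log above notes), so a one-liner is enough to confirm that Spark can actually schedule and run a job; a minimal sanity check:

scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050

This distributes the numbers 1 to 100 across the local worker threads and sums them, so a classpath or scheduler problem would surface immediately. Type :quit to exit.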
Installing sbt
(1) Download the package from the official site.
Official link: https://www.scala-sbt.org/download.html
(2) Extract it to a target directory:
sudo mkdir -p /usr/local/sbt
sudo tar -zxvf sbt-0.13.9.tgz -C /usr/local/sbt    # adjust the version number to match your download
(3) Create an sbt script in /usr/local/sbt with the content below:
$ cd /usr/local/sbt
$ vim sbt

Put the following in the sbt text file:

#!/bin/bash
# JVM options for sbt; launch sbt-launch.jar, forwarding any arguments
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"
(4) After saving, make the sbt script executable:
$ chmod u+x sbt
(5) Set the path and environment variables:
sudo vim /etc/profile
Append the following at the end of the file:
export PATH=/usr/local/sbt/:$PATH
(6) Apply the configuration:
source /etc/profile
(7) Verify the installation:
$ sbt sbt-version    # if this command fails, use the newer form below instead
$ sbt sbtVersion
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/pxh/project
[info] Set current project to pxh (in build file:/home/pxh/)
[info] 1.2.1
Writing a Scala Application
(1) In a terminal, create a sparkapp folder to serve as the application root:
cd ~
mkdir ./sparkapp
mkdir -p ./sparkapp/src/main/scala    # create the required directory structure
(2) Create a file named SimpleApp.scala under ./sparkapp/src/main/scala and add the following code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///home/pxh/hello.ts"  // any local text file to scan
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // load the file as an RDD with 2 partitions and keep it cached
    val logData = sc.textFile(logFile, 2).cache()
    // count the lines containing the letter "a"
    val numAs = logData.filter(line => line.contains("a")).count()
    println("Lines with a: %s".format(numAs))
    sc.stop()  // shut the context down cleanly
  }
}
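The code leaves the master URL unset on purpose: spark-submit (step 6 below) supplies it at launch time. If you want to run the class directly instead, e.g. from an IDE while debugging, one option is to hard-code a local master; a minimal sketch, with the thread count 2 chosen arbitrarily:

val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("local[2]")  // run on 2 local threads; settings made here override spark-submit flags

Since values set on SparkConf in code take precedence over spark-submit's --master flag, remove the setMaster call before packaging the JAR for submission.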
(3) Declare the standalone application's metadata and its dependency on Spark:
vim ./sparkapp/simple.sbt
Add the following to the file:
name:= "Simple Project" version:= "1.0" scalaVersion :="2.11.8" libraryDependencies += "org.apache.spark"%% "spark-core" % "2.2.0"
(4) Check the application's overall file structure:
cd ~/sparkapp
find .
The file structure should look like this:
.
./simple.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SimpleApp.scala
(5) Package the application into a JAR (the first run downloads the dependencies, which can take quite a while; please be patient):
sparkapp$ /usr/local/sbt/sbt package
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Loading project definition from /home/pxh/sparkapp/project
[info] Loading settings for project sparkapp from simple.sbt ...
[info] Set current project to Simple Project (in build file:/home/pxh/sparkapp/)
[success] Total time: 2 s, completed 2018-10-1 0:04:59
(6) Submit the generated JAR to Spark with spark-submit:
$ /home/pxh/spark-2.2.0-bin-hadoop2.7/bin/spark-submit --class "SimpleApp" /home/pxh/sparkapp/target/scala-2.11/simple-project_2.11-1.0.jar 2>&1 | grep "Lines with a:"
Lines with a: 3

Here 2>&1 merges stderr into stdout so that grep can filter Spark's verbose logs down to the program's own output; drop the pipe to see everything spark-submit prints.