Spark Introduction
Official site: http://spark.apache.org/
Apache Spark is a fast and general engine for large-scale data processing.
Speed
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing.
Ease of Use
Write applications quickly in Java, Scala, Python, R.
Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, Python and R shells.
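For example, a word count needs only a handful of these operators. The following is a minimal sketch typed into spark-shell (which already provides the SparkContext as sc); the HDFS input path is only a placeholder:
val lines = sc.textFile("hdfs://node1:9000/input.txt")
val counts = lines
  .flatMap(line => line.split(" "))   // split each line into words
  .map(word => (word, 1))             // pair each word with a count of 1
  .reduceByKey(_ + _)                 // sum the counts per word
counts.take(10).foreach(println)      // action: triggers the actual computation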
Generality
Combine SQL, streaming, and complex analytics.
Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.
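As a rough illustration of mixing these libraries in one program, the sketch below builds a DataFrame from an ordinary RDD and queries it with SQL (Spark 1.6 API); it assumes an existing SparkContext sc, such as the one spark-shell provides, and the sample data is made up:
import org.apache.spark.sql.SQLContext
case class Person(name: String, age: Int)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
// turn an ordinary RDD of case-class objects into a DataFrame
val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 25))).toDF()
// register it as a temporary table and query it with SQL
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 26").show()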
Runs Everywhere
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos. Access data in HDFS, Cassandra, HBase, Hive, Tachyon, and any Hadoop data source.
Official documentation: http://spark.apache.org/docs/1.6.0/
Installation
Download:
Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.0 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
Spark version: spark-1.6.0-bin-hadoop2.6.tgz
Scala version: scala-2.10.4.tgz
Spark monitoring port (application web UI): 4040
Cluster installation:
tar -xzvf spark-1.6.0-bin-hadoop2.6.tgz -C /usr/
Edit the configuration file conf/spark-env.sh under the extracted directory:
export SPARK_MASTER_IP=node1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1024m
export SPARK_LOCAL_DIRS=/data/spark/dataDir
export HADOOP_CONF_DIR=/usr/hadoop-2.5.1/etc/hadoop
Edit the slaves file to list the worker nodes:
node2
node3
The available cluster manager types are:
Standalone
Apache Mesos
Hadoop Yarn
This article covers two of them: Standalone and YARN.
The HADOOP_CONF_DIR entry in the configuration file above is there precisely to support running on YARN.
Standalone
Start Spark's own cluster manager:
spark/sbin/start-all.sh
Run the test script:
Standalone client mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 --executor-memory 512m --total-executor-cores 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
Standalone cluster mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 --deploy-mode cluster --supervise --executor-memory 512M --total-executor-cores 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
Check the result at http://node1:4040. In client mode the result cannot be found through port 4040; it is printed to the terminal after the script finishes.
YARN
Start the Hadoop cluster. Since YARN is used, there is no need to start Spark's own daemons:
start-all.sh
Run the test script:
YARN client mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 512M --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
YARN cluster mode
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 512m --num-executors 1 ./lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
Check the execution result at http://node1:8088. In client mode the result cannot be found through port 8088; it is printed to the terminal after the script finishes.
When to use YARN: if the cluster runs both MapReduce and Spark, YARN is recommended so both share the same resource scheduler.
Standalone is aimed at clusters that run only Spark applications.
Cluster Mode
The following table summarizes terms you’ll see used to refer to cluster concepts:
Term | Meaning |
---|---|
Application | User program built on Spark. Consists of a driver program and executors on the cluster. |
Application jar | A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries, however, these will be added at runtime. |
Driver program | The process running the main() function of the application and creating the SparkContext |
Cluster manager | An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN) |
Deploy mode | Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster. |
Worker node | Any node that can run application code in the cluster |
Executor | A process launched for an application on a worker node, that runs tasks and keeps data in memory or disk storage across them. Each application has its own executors. |
Task | A unit of work that will be sent to one executor |
Job | A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save , collect ); you'll see this term used in the driver's logs. |
Stage | Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs. |
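To relate these terms to code, here is a minimal sketch of a Spark 1.6 Scala application (the object name and HDFS paths are placeholders): submitting its jar creates one application; main() runs in the driver program and creates the SparkContext; each action (count, saveAsTextFile) spawns a job, which the scheduler splits into stages and tasks that run on the executors.
import org.apache.spark.{SparkConf, SparkContext}

object TermsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TermsDemo")
    val sc = new SparkContext(conf)                    // the driver creates the SparkContext

    val counts = sc.textFile("hdfs://node1:9000/input.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)                              // the shuffle here starts a new stage

    println(counts.count())                            // action -> job 1
    counts.saveAsTextFile("hdfs://node1:9000/output")  // action -> job 2

    sc.stop()
  }
}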