Monday, May 5, 2014

Strata 2014: Matei Zaharia, "How Companies are Using Spark, and Where th...

Spark - Matei Zaharia

Spark 5 times faster than Hive on disk

Spark 18 times faster than Hive in memory (RAM)

Spark 100 times faster than MapReduce

Spark stack - Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX (graph processing)

Hadoop - Batch Processing
Spark - Iterative Processing
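
The batch versus iterative distinction above can be sketched in a few lines. This is a toy illustration in plain Python, not the Spark API: `load_samples`, `estimate_mean`, and the four sample values are hypothetical names and data made up for the example. It assumes the point of the note is that an iterative job makes many passes over the same data, so keeping that data in memory avoids going back to storage on every pass.

```python
# Toy illustration (plain Python, not the Spark API): an iterative job that
# re-reads its input on every pass versus one that keeps it in memory.

def load_samples(counter):
    """Hypothetical stand-in for an expensive read from disk; counts its calls."""
    counter["loads"] += 1
    return [1.0, 2.0, 3.0, 4.0]

def estimate_mean(get_samples, steps=50, rate=0.1):
    """Gradient descent on squared error; each step is one full pass over the data."""
    guess = 0.0
    for _ in range(steps):
        samples = get_samples()
        gradient = sum(guess - x for x in samples) / len(samples)
        guess -= rate * gradient
    return guess

# Batch style: every iteration goes back to storage.
batch_counter = {"loads": 0}
batch_result = estimate_mean(lambda: load_samples(batch_counter))

# Iterative style: load once, keep the data in memory, reuse it on each pass.
cached_counter = {"loads": 0}
cached = load_samples(cached_counter)
cached_result = estimate_mean(lambda: cached)

print(batch_counter["loads"], cached_counter["loads"])  # prints: 50 1
```

Both runs compute the same estimate; only the number of reads from "storage" differs, which is where an in-memory engine would be expected to save time on iterative workloads.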

YARN - resource manager
HDFS, HBase, etc. - storage

120 lines in Scala, compared to 15K lines in C++
30 minutes to run on 100 million samples

Yahoo Ad Analytics - Hive on Spark (Shark)

Storm - Streaming
Hadoop

MapReduce - batch processing

Impala - SQL processing on big data

Shark - Hive (SQL queries) on top of Spark

 
