Big Data Small Data it is All Data: Strata 2014: Matei Zaharia, "How Companies are Using Spark, and Where th...

Monday, May 5, 2014

Spark - Matei Zaharia

Spark 5 times faster than Hive on disk

Spark 18 times faster than Hive in memory (RAM)

Spark 100 times faster than MapReduce

Spark Stack - Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX

Hadoop - Batch Processing
Spark - Iterative Processing
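
A toy sketch of why the batch vs. iterative split matters. This is plain Python, not the Spark API: `CountingSource` and `mean_by_gradient_descent` are hypothetical names made up for illustration, and a "read" here only stands in for a full pass over storage. Batch style goes back to the source on every iteration; iterative style loads once and reuses the in-memory copy.

```python
# Toy model only: counts simulated full reads of the data, does not use Spark.

class CountingSource:
    """Hypothetical data source that counts how often it is read in full."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call stands in for one full scan of on-disk data.
        self.reads += 1
        return list(self._values)


def mean_by_gradient_descent(get_data, steps=50, lr=0.5):
    """Iteratively estimate the mean by minimizing squared error."""
    estimate = 0.0
    for _ in range(steps):
        data = get_data()  # one pass over the data per iteration
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= lr * gradient
    return estimate


# Batch style: every iteration re-reads the source.
batch_source = CountingSource([1.0, 2.0, 3.0, 4.0])
batch_result = mean_by_gradient_descent(batch_source.load)

# Iterative style: load once, keep the working set in memory, reuse it.
cached_source = CountingSource([1.0, 2.0, 3.0, 4.0])
cached = cached_source.load()
cached_result = mean_by_gradient_descent(lambda: cached)

print(batch_source.reads, cached_source.reads)
```

Both runs do the same arithmetic; only the number of full reads differs (one per iteration vs. one in total).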

YARN - Resource Manager
HDFS, HBase, etc. - Storage

120 lines of Scala, compared to 15K lines of C++
30 minutes to run on 100 million samples

Yahoo Ad Analytics - Hive on Spark - Shark

Storm - Streaming
Hadoop

MapReduce - batch processing

Impala - SQL processing in Big Data

Spark - Hive (SQL query) on top of Spark - Shark
