Big Data technologies are changing very fast. Today's Big Data stack may soon be overshadowed by a newer one, and the Berkeley Data Analytics Stack (BDAS) is one such contender. AMP Camp provides a good tutorial on the Berkeley stack.
Tuesday, May 13, 2014
Monday, May 12, 2014
Is more data good? Interesting papers from Peter Norvig
The Unreasonable Effectiveness of Data
"Scaling to Very Very Large Corpora for Natural Language Disambiguation"
Context Aware Smartphones
Qualcomm Context-Awareness Symposium Sets Research Agenda for Context-Aware Smartphones
Thursday, May 8, 2014
Data Science Books - Python, Data Analysis
Data Science Books
1. Data Analysis with Open Source Tools
A Hands-On Guide for Programmers and Data Scientists
2. Hello Python
Anthony Briggs
3. Mining the Social Web
Matthew Russell
4. Core Python Applications Programming
Wesley Chun
5. Python for Data Analysis - IPython, NumPy
Wes McKinney
6. Social Network Analysis for Startups
Maksim Tsvetovat
Alexander Kouznetsov
Data Mining Books
Data Mining: Concepts and Techniques - Jiawei Han and Micheline Kamber
Principles of Data Mining - by David Hand, Heikki Mannila and Padhraic Smyth
Journal of Statistical Software
Computer Vision using SimpleCV
A very useful book that uses Python libraries to perform image operations.
SimpleCV
Monday, May 5, 2014
Spark Summit 2013 - The State of Spark, and Where We're Going Next - Mat...
How Spark Started
How Spark is Growing
What are the components of Spark?
Strata 2014: Matei Zaharia, "How Companies are Using Spark, and Where th...
Spark - Matei Zaharia
Spark 5 times faster than Hive on disk
Spark 18 times faster than Hive in memory
Spark 100 times faster than MapReduce
Spark Stack - Shark (SQL), Spark Streaming, MLlib (machine learning), GraphX
Hadoop - Batch Processing
Spark - Iterative Processing
YARN - Resource Manager
HDFS, HBase, etc. - Storage
120 lines in Scala, compared to 15K in C++
30 mins to run on 100 million samples
Yahoo Ad Analytics - Hive on Spark - Shark
Storm - Streaming
Hadoop
MapReduce - batch processing
Impala - SQL processing in Big Data
Spark - Hive (SQL query) on top of Spark - Shark
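The "batch" versus "iterative" distinction in the notes above can be sketched in a few lines. This is a toy illustration in plain Python, not Spark or Hadoop code: `load_dataset`, `run_batch_style` and `run_cached_style` are hypothetical names, and the read counter stands in for the cost of going back to storage on every pass. It only shows why an iterative algorithm gains from keeping its input in memory. It makes no claim about the speed-up figures quoted in the talk.

```python
# Toy illustration only: plain Python, no Spark or Hadoop APIs involved.
# All function names here are hypothetical and exist just for this sketch.

reads = {"count": 0}  # how many times the "storage" was read


def load_dataset():
    """Stand-in for an expensive read from distributed storage."""
    reads["count"] += 1
    return [1.0, 2.0, 3.0, 4.0, 5.0]


def step(data, estimate):
    """One gradient-descent step that moves the estimate toward the mean."""
    gradient = sum(estimate - x for x in data) / len(data)
    return estimate - 0.5 * gradient


def run_batch_style(iterations):
    """Re-read the input on every pass, like a chain of independent batch jobs."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(load_dataset(), estimate)
    return estimate


def run_cached_style(iterations):
    """Read the input once and reuse the in-memory copy on every pass."""
    data = load_dataset()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(data, estimate)
    return estimate
```

Both functions return the same estimate. The difference is that the batch-style loop reads the input once per iteration, while the cached loop reads it once in total.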
Friday, May 2, 2014
Thursday, May 1, 2014
Strata Conference 2014 - Table of Contents
Strata Conference 2014
Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 1 47 minutes
Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 2 42 minutes
Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 3 49 minutes
Introduction to Machine Learning with IPython and scikit-learn - Olivier Grisel - Part 4 35 minutes
IPython In Depth - Brian Granger and Fernando Pérez - Part 1 1 hour 3 minutes
IPython In Depth - Brian Granger and Fernando Pérez - Part 2 50 minutes
IPython In Depth - Brian Granger and Fernando Pérez - Part 3 47 minutes
Building a Data Platform - John Akred, Richard Williamson, and Stephen O'Sullivan - Part 1 43 minutes
Building a Data Platform - John Akred, Richard Williamson, and Stephen O'Sullivan - Part 2 46 minutes
Building a Data Platform - John Akred, Richard Williamson, and Stephen O'Sullivan - Part 3 53 minutes
Building a Data Platform - John Akred, Richard Williamson, and Stephen O'Sullivan - Part 4 33 minutes
Design Thinking for Dummies (Data Scientists) - Michael Stringer, Dean Malmgren, and Laurie Skelly - Part 1 21 minutes
Design Thinking for Dummies (Data Scientists) - Michael Stringer, Dean Malmgren, and Laurie Skelly - Part 2 21 minutes
Design Thinking for Dummies (Data Scientists) - Michael Stringer, Dean Malmgren, and Laurie Skelly - Part 3 39 minutes
Dissecting Data Science Algorithms using Spreadsheets - John Foreman - Part 1 47 minutes
Dissecting Data Science Algorithms using Spreadsheets - John Foreman - Part 2 46 minutes
Dissecting Data Science Algorithms using Spreadsheets - John Foreman - Part 3 44 minutes
Dissecting Data Science Algorithms using Spreadsheets - John Foreman - Part 4 29 minutes
Introduction to Hadoop 2.0 - Rich Raposa - Part 1 52 minutes
Introduction to Hadoop 2.0 - Rich Raposa - Part 2 33 minutes
Introduction to Hadoop 2.0 - Rich Raposa - Part 3 57 minutes
Introduction to Hadoop 2.0 - Rich Raposa - Part 4 40 minutes
Large-scale Machine Learning Cookbook using GraphLab - Carlos Guestrin - Part 1 37 minutes
Large-scale Machine Learning Cookbook using GraphLab - Carlos Guestrin - Part 2 40 minutes
Large-scale Machine Learning Cookbook using GraphLab - Carlos Guestrin - Part 3 48 minutes
Large-scale Machine Learning Cookbook using GraphLab - Carlos Guestrin - Part 4 35 minutes
From Scattered to Scatterplots: An Introduction to d3.js - Scott Murray - Part 1 33 minutes
From Scattered to Scatterplots: An Introduction to d3.js - Scott Murray - Part 2 47 minutes
From Scattered to Scatterplots: An Introduction to d3.js - Scott Murray - Part 3 42 minutes
From Scattered to Scatterplots: An Introduction to d3.js - Scott Murray - Part 4 43 minutes
Effective Data Science With Scalding - Vitaly Gordon - Part 1 43 minutes
Effective Data Science With Scalding - Vitaly Gordon - Part 2 48 minutes
Big Data Workflows on Mesos Clusters - Florian Leibert, Paco Nathan, and Benjamin Hindman - Part 1 45 minutes
Big Data Workflows on Mesos Clusters - Florian Leibert, Paco Nathan, and Benjamin Hindman - Part 2 29 minutes
Big Data Workflows on Mesos Clusters - Florian Leibert, Paco Nathan, and Benjamin Hindman - Part 3 35 minutes
Big Data Workflows on Mesos Clusters - Florian Leibert, Paco Nathan, and Benjamin Hindman - Part 4 37 minutes
Adviser: Learning How to get A Second Opinion on Your Analysis when it's Important to get it Right - Leland Wilkinson - Part 1 45 minutes
Adviser: Learning How to get A Second Opinion on Your Analysis when it's Important to get it Right - Leland Wilkinson - Part 2 39 minutes
Adviser: Learning How to get A Second Opinion on Your Analysis when it's Important to get it Right - Leland Wilkinson - Part 3 1 hour 0 minutes
Building Real-Time Apps with Apache HBase - Ronan Stokes - Part 1 28 minutes
Building Real-Time Apps with Apache HBase - Ronan Stokes - Part 2 33 minutes
Building Real-Time Apps with Apache HBase - Ronan Stokes - Part 3 45 minutes
Building Real-Time Apps with Apache HBase - Ronan Stokes - Part 4 43 minutes
Data Transformation: Skills of the Agile Data Wrangler - Joe Hellerstein, and Jeffrey Heer - Part 1 44 minutes
Data Transformation: Skills of the Agile Data Wrangler - Joe Hellerstein, and Jeffrey Heer - Part 2 49 minutes
Hardcore Data Science
Hardcore Data Science Opening Remarks - Ben Lorica 2 minutes
Extreme Machine Learning - Alexander Gray 44 minutes
What the #@)*$ is Big Data? A Holistic View of Data and Algorithms - Alice Zheng 42 minutes
Overcoming the Barriers to Production-Ready Machine-Learning Workflows - Henrik Brink, and Joshua Bloom 25 minutes
Anomaly Detection - Ted Dunning 31 minutes
Neural Networks for Machine Perception - Ilya Sutskever 29 minutes
The Predictive Business - Kira Radinsky 37 minutes
Can We Make Big Data Management Easier? - Magda Balazinska 41 minutes
Design Challenges for Real Predictive Platforms - Max Gasner 31 minutes
Machine Learning Gremlins - Ben Hamner 30 minutes
Algebra for Scalable Analytics - Oscar Boykin 32 minutes
Data-Driven Business Day
Introduction to Data Driven Business Day - Alistair Croll 7 minutes
Those Numbers Won't Measure Themselves - Farrah Bostic 20 minutes
Social Data Intelligence: Integrating Social and Enterprise Data for Competitive Advantage - Susan Etlinger 18 minutes
Open Data: It's Not Just for Governments - Jen van der Meer 19 minutes
The Insight Economy - Krista Schnell 19 minutes
9 Levers for Converting Big Data and Analytics into Results - Christy Maver 11 minutes
Deploying a Data Sciences Team -- The Promise and the Pitfalls - Diane Chang 16 minutes
Sensing Best Practices - Ben Waber 22 minutes
Leveraging Value from Open Data Through Collaboration - Peter Pirnejad 17 minutes
Becoming a Learning Organization: From Data Teams to Corporate Influence - Pamela Peele 15 minutes
Making Big Data Small - Baron Schwartz 19 minutes
Big Data Meets Big Infrastructure: Going Underground in One Major European City - Narendra Mulani 11 minutes
The Era of Data-Powered Government - Beth Blauer 19 minutes
TripIt Uses Data to Organize Itineraries, No Matter Where You Book - Edith Harbaugh 11 minutes
Keynotes
Crossing the Chasm: What's New, What's Not - Geoffrey Moore 13 minutes
Evolution from Apache Hadoop to the Enterprise Data Hub - Amr Awadallah 5 minutes
Collecting Massive Data via Crowdsourcing - John Schitka 5 minutes
Empowering Personalized Learning with Big Data - Ramona Pierson 9 minutes
Hadoop in 5 Minutes or Less - John Schroeder 5 minutes
People are Data Too - Farrah Bostic 5 minutes
Bringing Big Data to One Billion People - Quentin Clark 10 minutes
Small Data in Sports: Little Differences that Mean Big Outcomes - David Epstein 9 minutes
The Art of Good Practice - Rodney Mullen 9 minutes
Big Data Moonshots and Ground Control - Joe Hellerstein and Tutti Taygerly 10 minutes
Data Science and Smart Systems: Creating the Digital Brain - Kaushik Das 10 minutes
How Companies are Using Spark, and Where the Edge in Big Data Will Be - Matei Zaharia 11 minutes
In-Hadoop Analytics: Bringing analytics to big data - Anjul Bhambhri 6 minutes
Record Linkage and Other Statistical Models for Quantifying Conflict Casualties in Syria - Megan Price 10 minutes
Ben Fry Keynote 9 minutes
Survivorship Bias and the Psychology of Luck - David McRaney 18 minutes
Sessions
Apache Hadoop and the Emergence of the Enterprise Data Hub - Eli Collins 39 minutes
Information Visualization for Large-Scale Data Workflows - Michael Conover 36 minutes
Adaptive Adversaries: Building Systems to Fight Fraud and Cyber Intruders - Ari Gesher 42 minutes
Fighting Global Cybercrime and BotNets using Big Data - Bryan Hurd and Herain Oberoi 38 minutes
Navigating the Big Data Vendor Landscape - Edd Dumbill 43 minutes
Best Practices for Hadoop In Production - Panel Discussion Facilitated by Forrester Analyst - Mike Gualtieri 38 minutes
Thorn in the Side of Big Data: Too Few Artists - Chris Ré 39 minutes
10,000: The Most Dangerous Number in Sports - David Epstein 39 minutes
You're Halfway There: Moving from Insight to Action - Bob Filbin 40 minutes
Building the Next Generation Data Architecture with Hadoop, Data Warehouse & Data Discovery Platform - Bill Franks 36 minutes
Minority Report Meets Big Data: Touch and Interactive Big Data is Here - Justin Langseth, and Eva Andreasson 40 minutes
Machine Learning for Social Change - Fernand Pajot 30 minutes
Harness Data in Real-Time with Infinite Storage - Yuvaraj Athur Raghuvir 38 minutes
You Don't Need to Boil the Big Data Ocean with Hadoop - Ben Werther, and Sanjay Mathur 38 minutes
Predictive Modeling in the Cloud with Scikit-learn and IPython - Olivier Grisel 37 minutes
Mining Student Notes in Real Time to Provide Study Guides - Perry Samson 52 minutes
Thinking with Data - Max Shron 35 minutes
Building a Data-centered Data Center for Agile Development - Justin Makeig 43 minutes
Evolving Data Governance for the Big Data Enterprise - Scott Lee and Rachel Haines 41 minutes
Making Big Data Cost Effective in a Bare Metal Cloud - Harold Hannon 41 minutes
How Evernote Does Conversion Using Hadoop Analytics - Damon Cool 30 minutes
Crowdsourcing at Locu: How I Learned to Stop Worrying and Love the Crowd - Adam Marcus 24 minutes
Building a Lightweight Discovery Interface for Chinese Patents - Eric Pugh 40 minutes
Superconductor: Scaling Charts with Design and GPUs - Leo Meyerovich 22 minutes
Break Down Data Silos with Apache Accumulo - Adam Fuchs 21 minutes
Organizing Big Data with the Crowd - Lukas Biewald 14 minutes
Scalable PostgreSQL as your data platform - Ben Redman 33 minutes
Unlocking the Secrets of Gertrude Stein - Ian Timourian 41 minutes
A Different Look at Data and Security - Learning to Live with Fear - Pablos Holman 42 minutes
Stand Back, I'm Going To Try Science! - Rachel Poulsen and John Akred 20 minutes
Collaborative Advanced Analytics For Big Data - Bruno Aziza 39 minutes
Network Science Made Simple: SNA for Pie Chart Makers - Marc Smith 16 minutes
How Twitter Monitors Millions of Time-series - Yann Ramin 34 minutes
Harvard's Clean Energy Project: Big Data Maps To Renewable Energy - Kai Trepte 36 minutes
Working With Time Series Data Using Apache Cassandra - Patrick McFadin 15 minutes
Friending Graph Analytics: Large-Scale Graph Processing Made Easy - Ted Willke 21 minutes
Transforming Search Engine Marketing at Ask.com - Mohit Sati 41 minutes
Music Videos and Gastronomification for Big Data Analysis - Brian Abelson, and Thomas Levine 37 minutes
Soylent Mean: Data Science is Made of People - Cameran Hetrick and Kimberly Stedman 36 minutes
Big Data: Beyond Bare-Metal? - Mike Wendt 32 minutes
Secrets of Apache Hive Queries and UDFs - Shrikanth Shankar 42 minutes
Twitter and HP HAVEn: The Big Data Big Picture - Sanjay Goil 39 minutes
Data Science How to Build and Deploy a Team of Data Scientists - Diane Chang, Steven Hillion, Nick Kolegraff, and Matthew Gee 39 minutes
The Netflix Data Platform - A Recipe for High Business Impact - Kurt Brown 42 minutes
Bedtime Stories: Learning from Sleep Data - Monica Rogati 37 minutes
Tracking a Soccer Game with Big Data - Srinath Perera 36 minutes
Data Transformation: A User-Centric Approach to Accessing and Analyzing Big Data - Joe Hellerstein 38 minutes
Apache Hadoop 2.0: Migration from 1.0 to 2.0 - Vinod Kumar Vavilapalli 53 minutes
Getting a Handle on Hadoop and its Potential to Catalyze a New Information Architecture Model - Milan Vaclavik 42 minutes
The Sidekick Pattern: Using Small Data to Increase the Value of Big Data - Abe Gong 30 minutes
Exascale Data Analytics @ Facebook - Sambavi Muthukrishnan 44 minutes
Sending Millions of Surveys Around the World on Mobile Phones - Max Richman 40 minutes
Business Data Lake: An Evolution in Data Infrastructure - Jeffrey Kelly, Steven Hirsch, Steve Jones, and Sabrina Dahlgren 42 minutes
Expressing Yourself in R - Hadley Wickham 34 minutes
Data Journalism - Organized Crime and Corruption Reporting - Drew Sullivan 38 minutes
The Inflection Point - Hadoop and Big Data Analytics - Anjul Bhambhri 44 minutes
Spreadsheets: The Dark Matter of Big Data - Felienne Hermans 44 minutes
Scale-Invariant Intelligence - Vin Sharma 39 minutes
Probabilistic Programming: What, Why, How, and When - Beau Cronin 38 minutes
Beyond Hadoop MapReduce: Interactive Advertising Insights with Shark @ Yahoo! - Nandu Jayakumar and Tim Tully 41 minutes
Machine Learning for Machine Data - David Andrzejewski - Part 1 44 minutes
Machine Learning for Machine Data - David Andrzejewski - Part 2 44 minutes
Lessons from the Trenches: edo Interactive Leverages Hadoop to Build Customer Loyalty - Rob Rosen, and Tim Garnto 36 minutes
The IPython Notebook: Get Close to Your Data with Python and JavaScript - Brian Granger 45 minutes
Government Data on Both Sides of the Bridge - Moderated by: Jesse Robbins - Panelists: Shannon Spanhake and Eddie Tejeda 42 minutes
Enabling Business Transformation with Analytics over Real-time Streaming Data - Anand Venugopal, and Pranay Tonpay 35 minutes
The Next Wave of SQL-on-Hadoop: Building a Virtual EDW on Native Hadoop Data - Marcel Kornacker 47 minutes
How Comcast Turns Big Data into Real-Time Operational Insights - Patrick Shumate 42 minutes
Chicago Bars, Prisoner's Dilemma, and Practical Models in Search - Chris Harland 38 minutes
Big Industrial Internet Data: Connecting and Optimizing at New Scales - Steven Gustafson and Parag Goradia - Part 1 34 minutes
Big Industrial Internet Data: Connecting and Optimizing at New Scales - Steven Gustafson, and Parag Goradia - Part 2 34 minutes
FAST and FURIOUS Big Data Analytics Meets Hadoop - Wayne Thompson, and Paul Kent 41 minutes
The Urgent Need to Appify Big Data - Ryan Cunningham 30 minutes
Unboxing Data Startups - Michael Abbott 38 minutes
Apache Hive & Stinger: Petabyte Scale SQL, IN Hadoop - Owen O'Malley, and Alan Gates 41 minutes
Querying Petabytes of Data in Seconds - Reynold Xin, and Sameer Agarwal 37 minutes
The Need for Speed & Scale: A Database for Real-Time Analytics - Eric Frenkiel 37 minutes
Graph All The Things! 11: Graph Database Use Cases That Aren't Social - Emil Eifrem 20 minutes
Graph Analysis with One Trillion Edges on Apache Giraph - Avery Ching 34 minutes
Big Data for Big Power: Smart Meters Smart Grid - Brett Sargent 36 minutes
The Last Mile: Challenges and Opportunities in Data Tools - Wes McKinney 18 minutes
Are We Data Scientists or Data Janitors? - Nenshad Bardoliwalla 39 minutes
Session with Ben Fry 36 minutes
Data for Good - Moderated by: Jake Porway - Panelists: Drew Conway, Rayid Ghani, and Elena Eneva 46 minutes
NonStop HBase - Making HBase Continuously Available for Enterprise Deployment - Jagane Sundar 35 minutes
Apache Mesos as an SDK for Building Distributed Frameworks - Paco Nathan 20 minutes
Agile Analytics - Neal Ford 19 minutes
Socializing Search. Professionally. - Sriram Sankar, and Daniel Tunkelang 39 minutes
Big Data for Better Data Centers - Krishna Raj Raja and Balaji Parimi 40 minutes
One Size Does Not Fit All: Analyzing Data at Scale with AWS - Rahul Pathak 19 minutes
Movie Reconstruction from Brain Signals: "Mind-Reading" - Bin Yu 19 minutes
Making Choices: What Kind of Relationship are You Seeking with Your Database? - J.R. Arredondo 35 minutes
StatusWolf: Creating Dashboards That Don't Suck Using Art and Engineering - Mark Troyer 32 minutes
Real-Time Analytics with NewSQL: Why Hadoop is not enough - Raj Bains 30 minutes
MLbase: Distributed Machine Learning Made Easy - Ameet Talwalkar and Evan Sparks 39 minutes
Real-time Analytics with Open Source Technologies - Fangjin Yang, and Gian Merlino 33 minutes