Session Details

A Survey of Machine Learning Techniques Using Spark 1.5.0

Date : 10:45 AM Sunday


Recent releases of the Spark machine learning libraries have shifted focus from the individual-algorithms approach of the spark.mllib package to the data-driven pipelines approach of the spark.ml package. We will look at how to structure the ML process of data loading, modeling, prediction, and results analysis and distribution using the latest APIs.

Note: this year's session will focus only on the Scala APIs.

We will touch on one or more of the algorithms in the following areas:

  • Dimensionality Reduction / Feature Extraction
  • Clustering
  • Classification and Regression

Depending on time available, we may also touch on the following topics:

  • Statistical tools
  • Data generation and randomization
  • Evaluators

The Speaker(s)


Stephen Boesch

I am a developer focusing on scalable apps for data pipelines and machine learning on Hadoop and Spark infrastructures. My background is in Java/Oracle/ETL from 1996 until 2011, at which point I started to focus on Hadoop, Spark, and Scala. I have worked at a mix of familiar large Internet and systems companies and startups.