Speakers: Stephen Boesch
Recent releases of Spark's machine learning libraries have shifted focus from the individual-algorithm approach of the spark.mllib package to the data-driven pipeline approach of spark.ml. We will look at how to structure the ML process of data loading, modeling, prediction, and results analysis and distribution using the latest spark.ml APIs.
Note: this year's session will focus only on the Scala APIs.
We will touch on one or more of the algorithms in the following areas:
- Dimensionality Reduction / Feature extraction
- Classification and Regression
- Statistical tools
- Data generation and randomization