Hien Luu is an engineering manager at LinkedIn and an instructor at UCSC Extension school. He is a machine learning and big data enthusiast with extensive experience in building big data infrastructure and applications. One of his passions is teaching. He has been teaching at UCSC Extension school for more than 10 years, and his most recent course is on Apache Spark.
Apache Spark has become one of the must-know big data technologies due to its speed, ease of use, and flexibility. With each new version, Spark gets faster and adds more powerful features that make it easier to build intelligent, scalable data processing infrastructure and applications. This session will start with a quick introduction to Spark's advanced features and then demonstrate some of them through an analysis of San Francisco restaurant inspection data. The data analysis will help answer an important question: are San Francisco restaurants clean? Attendees will gain a good understanding of some of Spark’s advanced capabilities and see how Spark’s features make it easy to perform exploratory data analysis.