Big Data using Apache Spark with Scala and AWS
About

This course offers you hands-on knowledge to create Apache Spark applications using the Scala programming language, through an entirely case-study-based approach. Apache Spark is a fast, general-purpose distributed computing system. It provides high-level APIs in Scala, Java, Python and R, and an optimized engine that supports general execution graphs (DAGs). It also supports a rich set of higher-level APIs and tools, including DataFrame for structured data processing using object-oriented programming and SQL, Structured Streaming for real-time stream processing, MLlib for machine learning, and GraphX for graph processing.

Note: This is not an introductory or theory-based course, though we'll cover a basic comparison of Hadoop, Spark and Flink. We'll also work through some proofs of concept (POCs) around Hive, Pig, Presto, AWS Athena and Redshift Spectrum.

Learning Outcomes

By the end of this course, you will be able to identify the type of data (structured, semi-structu...