Project (Apache Spark, Matplotlib, PySpark) | Project (Star Schema, Apache Spark, AWS S3, and Python): This solution applies a star schema model by (1) extracting JSON data files from AWS S3 into Apache Spark (PySpark) DataFrames, (2) transforming the raw data into a star schema with Apache Spark, and (3) loading the star schema DataFrames back to AWS S3 as Parquet files (a column-oriented storage format). |
NoSQL Apache Cassandra |  Project (Star Schema, AWS Redshift, AWS S3, and Python) |
Project Neo4j (REST, SparkJava, Neo4j Graph DB, Cypher QL, Java): Spark Java serves as the microservices framework for building the REST APIs. The API supports limit + offset pagination via Cypher queries. Authentication is handled with Auth0 and JWT tokens. The graph data model covers a movie database (titles, actors, directors, etc.). Queries are written in the Cypher Query Language. | NoSQL Apache Cassandra: Problem: Sparkify wants to analyze song-play event data covering songs and user activity. The solution uses Apache Cassandra to support the identified queries. Pipeline: event log data in CSV format => ETL => tables in an Apache Cassandra database. |
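
The limit + offset pagination mentioned for the Neo4j project can be illustrated with a minimal sketch. The project itself is written in Java; Python is used here only for brevity. The `Movie` label, the `title` property, and both function names are assumptions made for illustration and are not taken from the project's code. The sketch only builds a parameterized Cypher query that uses the `SKIP` and `LIMIT` clauses; it does not connect to a database.

```python
def page_to_offset(page: int, page_size: int) -> int:
    """Translate a 1-based page number into a row offset."""
    if page < 1:
        raise ValueError("page must be at least 1")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return (page - 1) * page_size


def build_page_query(limit: int, offset: int = 0):
    """Return a parameterized Cypher query and its parameters for one page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if offset < 0:
        raise ValueError("offset must not be negative")
    # Hypothetical model: (:Movie {title: ...}) nodes; adjust to the real schema.
    # ORDER BY keeps page boundaries stable from one request to the next.
    query = (
        "MATCH (m:Movie) "
        "RETURN m.title AS title "
        "ORDER BY m.title "
        "SKIP $offset LIMIT $limit"
    )
    # Values travel as query parameters instead of being spliced into the text.
    return query, {"offset": offset, "limit": limit}


# Third page with ten movies per page.
query, params = build_page_query(limit=10, offset=page_to_offset(3, 10))
print(query)
print(params)
```

Passing `limit` and `offset` as parameters rather than concatenating them into the query string avoids injection problems. The resulting query and parameter map would then be handed to whichever Neo4j driver the service uses.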