Data Engineering Projects

Projects from the Data Engineering program

https://ozdemirht.github.io/
Project (Apache Spark, Matplotlib, pyspark)
Project (Star Schema, Apache Spark, AWS S3, and Python)
This solution applies a star schema model by
* extracting JSON data files from AWS S3 into Apache Spark DataFrames (PySpark),
* transforming the raw data into a star schema with Apache Spark, and
* loading the resulting star schema DataFrames as Parquet files (a column-oriented storage format) on AWS S3.
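
The extract and transform steps above can be sketched in plain Python. This is only an illustration of the star schema split, not the project's Spark code: the event fields (`user_id`, `user_name`, `song_id`, `song_title`, `ts`) and the table names are assumptions, and the load step (writing Parquet to AWS S3) is left out.

```python
import json

# Hypothetical raw events, one JSON object per line.
# The field names are assumptions, not the project's actual schema.
RAW_JSON_LINES = [
    '{"user_id": 1, "user_name": "Ada", "song_id": "S1", "song_title": "Intro", "ts": 1000}',
    '{"user_id": 2, "user_name": "Lin", "song_id": "S1", "song_title": "Intro", "ts": 1005}',
    '{"user_id": 1, "user_name": "Ada", "song_id": "S2", "song_title": "Outro", "ts": 1010}',
]


def extract(lines):
    """Extract: parse each JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(events):
    """Transform: split flat events into two dimension tables and one fact table."""
    # Dimension tables hold one row per distinct key.
    users = {e["user_id"]: {"user_id": e["user_id"], "user_name": e["user_name"]}
             for e in events}
    songs = {e["song_id"]: {"song_id": e["song_id"], "song_title": e["song_title"]}
             for e in events}
    # The fact table keeps only keys and the measured event (a play at time ts).
    plays = [{"play_id": i, "user_id": e["user_id"], "song_id": e["song_id"], "ts": e["ts"]}
             for i, e in enumerate(events)]
    return {
        "dim_user": list(users.values()),
        "dim_song": list(songs.values()),
        "fact_play": plays,
    }


star = transform(extract(RAW_JSON_LINES))
```

In a Spark version, each of the three tables would be a DataFrame derived from the raw one, which is what makes writing them out as separate Parquet datasets natural.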
NoSQL Apache Cassandra
Project (Star Schema, AWS Redshift, AWS S3, and Python)
Project Neo4J (REST, SparkJava, Neo4J Graph DB, Cypher QL, Java)

* Spark Java serves as the microservices framework for building the REST APIs
* The API supports limit+offset pagination via Cypher queries
* Authentication is handled with Auth0 and JWTs
* Graph data model for a movie database (Title, Actors, Directors, etc.)
* Queries are written in the Cypher Query Language
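
As a rough sketch of the limit+offset pagination mentioned above, the snippet below builds a parameterized Cypher query with `SKIP` and `LIMIT`. It is written in Python for brevity although the project itself uses Java. The `Movie` label, the `title` property, the `MAX_LIMIT` cap, and the function name are all assumptions for illustration.

```python
MAX_LIMIT = 100  # assumed upper bound on page size, not taken from the project


def paginated_movie_query(limit, offset):
    """Return a (query, parameters) pair for one page of movie titles."""
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    limit = min(limit, MAX_LIMIT)
    # ORDER BY makes page boundaries stable between requests;
    # SKIP and LIMIT take the paging values as query parameters.
    query = (
        "MATCH (m:Movie) "
        "RETURN m.title AS title "
        "ORDER BY m.title "
        "SKIP $offset LIMIT $limit"
    )
    return query, {"offset": offset, "limit": limit}


query, params = paginated_movie_query(limit=500, offset=20)
```

Passing the paging values as parameters rather than concatenating them into the query string avoids injection through the request's `limit` and `offset` values.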
NoSQL Apache Cassandra

Problem: Sparkify wants to analyze song-play event data covering songs and user activity.
Solution: Apache Cassandra is used to support the identified queries.
Pipeline: event log data in CSV format => ETL => tables in an Apache Cassandra database.
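
A minimal sketch of the CSV => ETL => table flow, assuming a hypothetical query "list the songs played in a given session". The table name `song_plays_by_session`, the column names, and the sample rows are invented for illustration. The snippet only prepares the CQL text and the typed rows; executing them against a cluster is left to whichever Cassandra driver is in use.

```python
import csv
import io

# Hypothetical event log sample; columns are assumptions, not the real file layout.
EVENT_CSV = """session_id,item_in_session,artist,song,user_id
23,0,Artist A,Song One,7
23,1,Artist B,Song Two,7
41,0,Artist A,Song One,9
"""

# One table per query: partition by session, cluster by position within the session.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS song_plays_by_session (
    session_id int,
    item_in_session int,
    artist text,
    song text,
    PRIMARY KEY ((session_id), item_in_session)
)
"""

INSERT_ROW = (
    "INSERT INTO song_plays_by_session (session_id, item_in_session, artist, song) "
    "VALUES (%s, %s, %s, %s)"
)


def rows_for_insert(csv_text):
    """Parse CSV text and keep only the columns the target table needs, with types cast."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (int(r["session_id"]), int(r["item_in_session"]), r["artist"], r["song"])
        for r in reader
    ]


rows = rows_for_insert(EVENT_CSV)
```

Because Cassandra tables are designed around queries, a different question (for example, plays per user) would typically get its own table with a different primary key, filled from the same CSV rows.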
