Data Science Projects

| Dimension Reduction | Clustering |
| --- | --- |
| Classification methods (Logistic Regression, Decision Tree, Random Forest) after applying PCA. Wine quality classification. Data set: Wine Quality Dataset. Applied PCA for dimension reduction on the input vectors. Algorithms: Logistic Regression, Decision Tree, Random Forest (using Scikit-learn). Compared the performance of the algorithms. | K-Means (using Scikit-learn). Discover classes in a given data set. Data set: Wine Quality Dataset. Algorithm: K-Means clustering (using Scikit-learn). |
| **Regression** | **Classification** |
| Regression methods (Linear, Ridge, Lasso, ElasticNet, Huber). Predict the price of a house. Data set: House Sales in King County, USA. Algorithms: Linear, Ridge, Lasso, ElasticNet, Huber (using Scikit-learn). | DNNClassifier, LinearClassifier (using TensorFlow). Classify wine quality based on given measurements. Data set: Wine Quality Dataset. Classification algorithm: DNNClassifier (multi-layer neural network). Compare the performance of the classification algorithms. |
| **Visualization (Exploratory Data Analysis, EDA)** | |
| **Matplotlib** | **Seaborn** |
| Matplotlib Example | Seaborn Example |
| **Convolutional-NN** | **Recurrent-NN** |
| **Autoencoders** | **Transformers** |
| **Reinforcement Learning** | **GAN** |
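
The dimension reduction project above (PCA followed by Logistic Regression, Decision Tree, and Random Forest) might look roughly like the sketch below. This is not the notebook's code: the data is a synthetic stand-in generated with `make_classification` (assumed here to have 11 numeric features and a binary label), and the choice of 5 principal components, the scaling step, and the train/test split are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Wine Quality data (assumption, not the real data set).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, clf in classifiers.items():
    # Scale, project onto 5 principal components (illustrative choice), then classify.
    model = make_pipeline(StandardScaler(), PCA(n_components=5), clf)
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Wrapping the scaler, PCA, and classifier in one pipeline keeps the projection fitted on the training split only, so the held-out accuracy is not influenced by the test data.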
# Data Science and Machine Learning Notebooks
- Visualization for Exploratory Data Analysis
  - Matplotlib notebook (scatter plot, histogram/kde, box plot)
  - Lightning - http://lightning-viz.org/
  - Bokeh - https://bokeh.pydata.org/
- Data cleaning
  - Notebook
- Dimension reduction
  - Notebook for PCA, LogisticRegression, DecisionTree, RandomForest (from Scikit-learn)
    - Data Set: Wine Quality
    - Problem: Predict quality of wine.
    - Approach: Used classification methods (Logistic Regression, Decision Tree, Random Forest) after applying PCA.
  - Notebook for PCA, LDA, Autoencoder (from Scikit-learn)
    - Data Set: Human Activity Recognition
    - Problem: Predict activity from observations.
    - Approach:
- Clustering
  - Notebook for K-Means (from Scikit-learn)
    - Data Set: Wine Quality
    - Problem: Cluster data for two quality categories.
    - Approach: Used the K-Means algorithm to cluster the data.
- Regression
  - Notebook for LinearRegression, Ridge, Lasso, ElasticNet, ElasticNet CrossValidation, HuberRegressor (from Scikit-learn)
    - Data Set: House Sales in King County, USA
    - Problem: Predict price of a house.
    - Approach: Used regression methods (Linear, Ridge, Lasso, ElasticNet, Huber).
  - TensorFlow.LinearRegressor
  - TensorFlow.DNNRegressor
- Classification
  - Notebook for DNNClassifier, LinearClassifier (from TensorFlow)
    - Data Set: Wine Quality
    - Problem: Predict quality of wine.
    - Approach: Used classification methods (DNNClassifier, LinearClassifier).
  - SVM
- Text Analysis and NLP
  - Word2Vec
- Reinforcement Learning
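
The clustering entry in the list above (K-Means with two quality categories) can be sketched roughly as follows. The data here is a synthetic stand-in from `make_blobs`, not the Wine Quality data set, and the feature count, scaling step, and `n_init` value are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 300 samples, 11 features, two underlying groups (assumption).
X, _ = make_blobs(n_samples=300, n_features=11, centers=2, random_state=0)

# K-Means is distance based, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# Two clusters, matching the "two quality categories" framing of the entry.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

print("cluster sizes:", [int((labels == k).sum()) for k in range(2)])
print("inertia:", round(float(kmeans.inertia_), 2))
```

On real data the cluster labels are arbitrary (cluster 0 is not necessarily the lower quality group), so any comparison against known quality categories needs a label-matching step.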
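
The regression entry in the list above compares several Scikit-learn linear models. A minimal sketch of such a comparison is given below, assuming synthetic data from `make_regression` in place of the House Sales in King County data set; the regularization strengths, the scaling step, and the use of R² as the metric are illustrative choices rather than the notebook's settings.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import (
    ElasticNet,
    ElasticNetCV,
    HuberRegressor,
    Lasso,
    LinearRegression,
    Ridge,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the house price data (assumption, not the real data set).
X, y = make_regression(
    n_samples=500, n_features=10, n_informative=6, noise=15.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

regressors = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    "ElasticNetCV": ElasticNetCV(cv=5, random_state=0),  # picks alpha by cross-validation
    "HuberRegressor": HuberRegressor(max_iter=1000),  # robust to outliers in the target
}

r2_scores = {}
for name, reg in regressors.items():
    model = make_pipeline(StandardScaler(), reg)
    model.fit(X_train, y_train)
    r2_scores[name] = model.score(X_test, y_test)  # R^2 on held-out data

for name, r2 in r2_scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```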