WORKING FILES
Apache Spark Training
Step 1: Local Laptop Cluster Setup
Teach the local laptop cluster setup, then ask students to repeat it on another laptop.
Anaconda3-2022.10-Windows-x86_64
https://www.apache.org/dyn/closer.lua/spark/spark-3.3.1/spark-3.3.1-bin-hadoop2.tgz
https://github.com/steveloughran/winutils/archive/refs/heads/master.zip
Environment Variables (windows)
HADOOP_HOME = C:\SPARK\hadoop
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_202
SCALA_HOME = C:\SPARK\scala
SPARK_HOME = C:\SPARK\spark
PYSPARK_PYTHON = C:\Users\user\anaconda3\python.exe
PATH Variables:
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
%SCALA_HOME%\bin
%JAVA_HOME%\bin
Jupyter Environment Variables
PYSPARK_DRIVER_PYTHON = C:\Users\User\anaconda3\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS = notebook
Step 2: Local Spark Shell
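Before launching the local Spark shell, it can help to confirm that the environment variables listed above are actually visible to Python. The sketch below is one possible check, not part of the course files: the helper names `missing_vars` and `bin_dirs_not_on_path` are hypothetical, and it assumes a Windows-style PATH separated by semicolons.

```python
import os

# Variable names taken from the setup list above; the values are machine-specific.
REQUIRED_VARS = [
    "HADOOP_HOME",
    "JAVA_HOME",
    "SCALA_HOME",
    "SPARK_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
]


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def bin_dirs_not_on_path(env, homes=("SPARK_HOME", "HADOOP_HOME", "SCALA_HOME", "JAVA_HOME")):
    """Return the bin folders of the given *_HOME variables that are absent from PATH."""
    # Assumption: Windows-style PATH, entries separated by ';', compared case-insensitively.
    path_entries = {p.strip().rstrip("\\/").lower() for p in env.get("PATH", "").split(";")}
    missing = []
    for home in homes:
        value = env.get(home)
        if value:
            bin_dir = value.rstrip("\\/") + "\\bin"
            if bin_dir.lower() not in path_entries:
                missing.append(bin_dir)
    return missing


if __name__ == "__main__":
    print("Unset variables:", missing_vars(os.environ))
    print("Missing from PATH:", bin_dirs_not_on_path(os.environ))
```

Both helpers take a plain mapping, so they can be tried against a hand-written dictionary before pointing them at `os.environ`.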
Step 3: Spark cluster in Google Cloud
Step 4: Spark cluster in AWS Databricks
Step 5: Spark on Colab
How To Start A Spark Session & Read in CSV frrom Website.ipynb
Datacamp Pyspark Cheatsheets
Step 6: PySpark DataFrames
Dataframes with PySpark Example 1 by Dr Alvin.ipynb
Dataframes with PySpark Example 2 by Dr Alvin.ipynb
purchases.csv (this dataset has 4 million rows, so it can't be opened in Excel)
DataFrames with PySpark Example 2 (EXERCISE) by Dr Alvin Ang.ipynb
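The purchases.csv file listed above has about 4 million rows, more than the 1,048,576 rows a single Excel worksheet can hold. As a rough illustration of checking a file's size before choosing a tool, the sketch below counts rows by streaming the file with Python's standard `csv` module. This is a stand-in for illustration, not the PySpark approach used in the notebooks, and the helper names, the UTF-8 encoding, and the presence of a header row are all assumptions.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet


def count_data_rows(csv_path, has_header=True):
    """Stream the file and count data rows without loading it all into memory."""
    # Assumption: the file is UTF-8 encoded.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        return sum(1 for _ in reader)


def fits_in_excel(row_count, has_header=True):
    """Return True if the rows (plus an optional header row) fit in one worksheet."""
    return row_count + (1 if has_header else 0) <= EXCEL_MAX_ROWS
```

For example, `fits_in_excel(count_data_rows("purchases.csv"))` would be expected to return `False` for a 4-million-row file, assuming the file sits in the working directory.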
Step 7: Showcase VAEX
VAEX for Crunching Big Data by Dr Alvin Ang.ipynb
Automated_Traffic_Volume_Counts.csv (this dataset is 3 GB)
Others
hierarchical-clustering-with-python-and-scikit-learn-shopping-data.csv
Structured_Streaming_with_PySpark.ipynb
KMEANS_PySpark_by_Dr_Alvin.ipynb
Hierarchical_Clustering_In_Spark_With_Bisecting_K_Means.ipynb
Normalizer,_Scaler,_Bucketizer_and_Binarizer_with_PySpark_by_Dr_Alvin_Ang.ipynb
https://www.linkedin.com/in/nandeshreddy/
Soya Kim - Big Data Healthcare
Spark Commands by Raj Bharat.txt
https://www.youtube.com/watch?v=lisIQ9ohU8g
https://www.datanami.com/2024/03/05/duckdb-walks-to-the-beat-of-its-own-analytics-drum/
Tensorflow Training
https://starttechacademy.com/cnn-for-computer-vision-in-python/
Keras Datasets: https://keras.io/api/datasets/
CHEQUE - Digital Recognition by Deep Learning Techniques.pdf
halal food prediction paper using ANN.pdf
https://thehighestofthemountains.com/brainmaps.html
deep learning interview questions.pdf
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-deep-learning
https://raw.githubusercontent.com/tertiarycourses/datasets/master/video_game_sales_training.csv
https://generated.photos/faces/asian-race
https://quickdraw.withgoogle.com/
https://neuralnetworksanddeeplearning.com/index.html
ANN
Dr. Alvin’s IBF Day 3 - ANN Regression.ipynb
Dr. Alvin’s IBF Day 3 - ANN Classification.ipynb
CNN
Understanding CNN with Dr. Alvin.ipynb
Dr. Alvin's IBF Day 4 CNN on MNIST Digits Dataset.ipynb
(Classifying Handwritten Digits with CNN)
Trained Classification Model for Fashion MNIST Dataset.h5
Predicting Images of Clothes using CNN by Dr Alvin Ang.ipynb
Predicting Images of Common Objects using CNN by Dr Alvin Ang.ipynb
https://setosa.io/ev/image-kernels/
RNN
Understanding LSTM Input and Output (RNN) by Dr. Alvin Ang.ipynb
IBF Day 4: Predicting DBS Stock Price using RNN by Dr Alvin Ang.ipynb
Datacamp DL Cheatsheets
Python Training
List / Tuples / Dictionary / Sets
Intermediate
[Comprehension.i.for.i.in.range] + [Mounting.Google.Drive.File.IO] + [Object.Oriented.Programming.Class.Inheritance.Parents] + [SQLite.Databasing] + [Error.Handling.IndexError.IOError.TypeError.ValueError]
Mounting and Creating Files in Google Drive with Colab.ipynb
Updating HR Records using Object Oriented Programming by Dr. Alvin Ang
https://pandas.pydata.org/docs/user_guide/visualization.html#visualization-barplot
https://python-tricks.com/matplotlib-introduction/
https://python-tricks.com/plotting-in-pandas/
https://www.tutorialspoint.com/how-to-change-the-text-color-of-font-in-the-legend-using-matplotlib
https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html
https://www.digitalocean.com/community/tutorials?q=python&hits_per_page=12
Python PDFS
30 Python Libraries to Boost Your Data Science Productivity.pdf
Python Data Science Tips Full Archive by Avi Chawla (substack.com)
Alvin's Answer for Plano's Assessment.pdf
How to learn Python like a Pro.pdf
https://www.hospitalmanagementasia.com/tech-innovation/harnessing-the-power-of-data-in-healthcare/ (Sean Singhealth)
https://developers.googleblog.com/en/data-science-agent-in-colab-with-gemini/
Python Cheatsheets
Datacamp - Importing_Data_Cheat_Sheet_ Python.pdf
Datacamp - Working_With_Text_Data_in_Python.pdf
Datacamp - Working_with_Dates_and_Times_Cheat_Sheet_ Python.pdf
Datacamp - Seaborn_Cheat_Sheet.pdf
Datacamp - Reshaping_data_with_Python.pdf
Datacamp - Python_Cheat_Sheet.pdf
Datacamp - Python_Basics_Cheat_Sheet.pdf
Datacamp - Pandas_Cheat_Sheet.pdf
Datacamp - Numpy_Cheat_Sheet.pdf
Datacamp - Matplotlib_Cheat_Sheet.pdf
Datacamp - Data Wrangling Cheat Sheet PYthon.pdf
Datacamp - Regular_Expressions_Cheat_Sheet.pdf
why python index start at 0.pdf
Data_Science_With_Python_Workflow by Business Science.io.pdf
PANDAS Training
slides for PANDAS wth Dr. Alvin.pdf
How to Join Append Concat Two Tables with Python by Dr Alvin Ang.ipynb
Restructuring CSV for Data Science.pdf
Statistics
[Descriptive.Stats] + [Seaborn.Visualization] + [Hypothesis.Testing.ANOVA] + [LR.MR.R2]
Day 4 with Dr Alvin.ipynb (Statistics)
Python Visualization
Data Visualization with Python by Dr Alvin Ang.ipynb
Top 7 Python Libraries for Data Visualization.pdf
Practical Guide to Matplotlib.pdf
Python Specials
DataPrep + MissingNo by Dr Alvin Ang.ipynb
Ways to Display Json Formats Neatly in Python by Dr Alvin Ang.ipynb
How to Create Random Data with Python by Dr Alvin Ang.ipynb
https://nubela.co/proxycurl/linkdb.html (scrape linkedin)
https://github.com/xiaohk/stickyland
https://www.kdnuggets.com/2022/12/top-5-nlp-cheat-sheets-beginners-professional.html
https://docs.python.org/3/library/functions.html
https://docs.python.org/3/library/stdtypes.html
https://docs.python.org/3/index.html
https://docs.python.org/3/library/index.html
https://fortune.com/education/articles/using-python-for-data-science/
Data Cleansing and Wrangling
Steps in Data Wrangling and Cleansing
Converting JSON to CSV from Scikit Learn Datasets by Dr. Alvin Ang.ipynb
How to Do Train Test Splits with Python by Dr Alvin Ang.ipynb
Data Wrangling a Population of Countries’ Dataset
Population of Countries in 2000.csv
Data Wrangling a Population of Countries Dataset by Dr Alvin Ang.ipynb
Data Wrangling & Visualizing Healthcare Datasets
Hospital Admissions
hospital-admissions-by-sector-annual.csv
Data Cleansing a Hospital Admissions Dataset by Dr Alvin Ang.ipynb
Health Expenditure
government-health-expenditure.csv
Data Cleansing a Government Health Expenditure Dataset.ipynb
Long Term Care Facilities
number-of-residential-long-term-care-facilities-sector-breakdown.csv
Data Wrangling a Long Term Care Facilities Dataset by Dr Alvin Ang.ipynb
Data Cleansing a Rock Song Dataset
Data Cleansing a Rock Song Dataset with Python by Dr. Alvin Ang.ipynb
Data Wrangling Air Quality Datasets
Data Wrangling Air Quality Datasets with Python by Dr Alvin Ang.ipynb
Searching and Slicing a Video Games Dataset
Searching and Slicing a Video Games Dataset with Python by Dr Alvin Ang.ipynb
Wrangling Automobile Datasets
Slicing & Dicing a Motorcars Dataset (European + Japanese Cars)
Slicing & Dicing a Motorcars Dataset with Python by Dr Alvin Ang.ipynb
Factors Affecting Price of European and Japanese Cars
Feature Selection on Automobile Dataset with Python by Dr. Alvin Ang.ipynb
Dealing with Missing Data in European Cars Dataset
Data Cleansing a European Automobile Dataset with Python by Dr Alvin Ang.ipynb
Cross Validating a European and Japanese Cars Dataset
Cross Validating a European and Japanese Car Dataset by Dr Alvin Ang.ipynb
Python for Finance
https://www.kaggle.com/datasets/wordsforthewise/lending-club
https://drive.google.com/file/d/1VCaoIFxzpYgzCaIerj24EWx-wW87C440/view?usp=drive_link (Alvin's Google Drive: full Lending Club loan dataset)
Feature Selection on Lending Club Loan Dataset by Dr Alvin Ang.ipynb
Data Cleansing the Lending Club Loan Dataset by Dr Alvin Ang.ipynb
Train Test Splitting the Lending Club Loan Dataset by Dr Alvin Ang.ipynb
https://wmi.edu.sg/dmp-online-singapore/ RAY DALIO
Hypothesis Testing and ANOVA with Python
Hypothesis Testing and ANOVA with Python by Dr Alvin Ang.ipynb
Machine Learning Training
Confusion Matrix
https://mlu-explain.github.io/
ML guide with Code by Shivam Modi.pdf
ML Life Cycle by Shivam Modi.pdf
Quick Machine Learning in Python.pdf
ML DL AI Cheat sheet by NIKHIL YADAV.pdf
ML Cheatsheet.pdf
Machine Learning Infographics Cheatsheet.pdf
ML Cheat sheet by Business Science.io.pdf
the little book of deep learning
AI for Everyone notes by Andrew Ng
How to Load the Iris Dataset into Python by Dr. Alvin Ang.ipynb
Various Places to Get Datasets for Machine Learning by Dr Alvin Ang.ipynb
Various Ways of Train Test Splits with Python by Dr Alvin Ang.ipynb
https://machinelearningprojects.net/
https://thecleverprogrammer.com/2020/11/15/machine-learning-projects/
30 Python Libraries to Boost Your Data Science Productivity.pdf
https://terencelucasyap.com/predicting-singapore-pools-4d-lottery-winning-numbers-machine-learning/
Scalable Efficient Big Data Pipeline Architecture – Machine Learning for Developers
https://www.akkio.com/beginners-guide-to-machine-learning
ML Terminology - Chris Albon.zip
MLOps
MLOps for Dummies Databricks.pdf
https://www.dailydoseofds.com/mlops-crash-course-part-1/
Unsupervised Learning
Clustering
Overview of Clustering Methods.pdf
KMeans_using_Python_by_Dr_Alvin.ipynb
Hierarchical_Clustering_using_Python.ipynb
Clustering Cheatsheet by Business Science.io.pdf
PCA
Train Test Split
Supervised Learning
Linear / Multiple / Polynomial Regression
Simple Linear Regression with Statsmodel by Dr Alvin Ang.ipynb
Simple Linear Regression using SKLearn by Dr Alvin Ang.ipynb
Multiple Regression using Scikit Learn with Python by Dr Alvin Ang.ipynb (Advertising.csv)
Multiple Regression using Scikit Learn with Python (Part II) by Dr Alvin Ang.ipynb (AutomobileEDA.csv)
Polynomial Regression with Python by Dr Alvin Ang.ipynb
Support Vector Machine (SVM)
Understanding SVM using Python by Dr Alvin Ang.ipynb
Simple SVM Applied to Iris Dataset with Python by Dr Alvin Ang.ipynb
Grid, Random and Bayes Search - Hyperparameter Tuning on SVM with Python by Dr Alvin Ang.ipynb
Decision Tree / Random Forest
Decision Tree (Classification) on the Iris Flower Dataset using Python by Dr Alvin Ang.ipynb
Random Forest (Classification) on the Iris Flower Dataset using Python by Dr Alvin Ang.ipynb
Metrics, Normalization and Regularizations
Classification Metrics for ML Models by Dr Alvin Ang.ipynb
Bias / Variance
Understanding Bias vs Variance in Python by Dr. Alvin Ang.ipynb
L1 and L2 Regularization
L1 Lasso and L2 Ridge and Elastic Net Regression using Python by Dr Alvin Ang.ipynb
MinMax and Standard Scaler
Decision Tree (Classification) on the Iris Flower Dataset using Python by Dr Alvin Ang.ipynb
Datacamp ML Cheatsheets
ML4Trading
Technical Analysis
Steps to Teach ML for Trading.txt
Steps for ML Trading by Dr. Alvin Ang
Facilitator Guide for Machine Learning 101 for Financial Trading.ipynb
Learning TA-Lib in Python by Dr. Alvin Ang.ipynb
How to Plot Candlestick Chart using Plotly by Dr. Alvin Ang.ipynb
pandas_ta full list of technical indicators as of 2024
MQL5 Programming for Traders.pdf
CFTE
https://algoventure.first-4.com/ai-algo-3267-2099-9273
https://www.ntuclearninghub.com/en-gb/-/course/algorithmic-trading-essentials
https://eoddata.com/stocklist/SGX.htm (ticker symbol)
https://algorithmictrading.substack.com/
https://www.priceactionlab.com/Blog/price-action-lab-software/
https://wire.insiderfinance.io/
https://www.gurufocus.com/guru/warren%2Bbuffett/summary
https://www.benzinga.com/apis/
https://www.youtube.com/@Algovibes
https://greyhoundanalytics.com/
https://www.ssga.com/sg/en/individual
https://singaporeanstocksinvestor.blogspot.com/ (AK)
https://www.dymonasia.com/career/
https://www.tower-research.com/
https://eodhd.medium.com/trading-predictions-using-ai-and-python-cdaad4de3447
https://github.com/suparjotamin/stockie
https://medium.com/trading-data-analysis/metatrader5-python-trading-bot-230bd19285e9
R Training
Files
https://r4ds.had.co.nz/index.html
https://www.tidytextmining.com/
Datacamp R Cheatsheets
Datacamp - Working_With_Text_Data_in_R.pdf
Datacamp - Working_with_Dates_and_Time_in_R.pdf
Datacamp - Reshaping_data_with_tidyR_in_R.pdf
Datacamp - ggplot2_cheat_sheet.pdf
Datacamp - data table cheat sheet_R.pdf
Datacamp - Manipulating_Data_in_dplyr_Cheat_Sheet.pdf
Datacamp - Tidyverse_Cheat_Sheet.pdf
Data_Science_With_R_Workflow by Business Science.io.pdf
R Sites
https://togaware.com/projects/rattle/index.html
https://universeofdatascience.com/
https://www.r-bloggers.com/2022/06/the-most-overlooked-r-package-that-can-get-you-through-a-data-science-job-interview/
https://online.stat.psu.edu/statprogram/tutorials/statistical-software/r
https://biostat.app.vumc.org/wiki/Main/RS
https://tuos-bio-data-skills.github.io/intro-stats-book/
https://cran.r-project.org/web/packages/available_packages_by_name.html
https://posit.co/resources/cheatsheets/
https://yihui.shinyapps.io/formatR/
https://www.kaggle.com/code/rtatman/data-cleaning-challenge-json-txt-and-xls
https://education.rstudio.com/learn/
https://www.business-science.io/finance/2020/02/26/r-for-excel-users.html
https://www.rdocumentation.org/
Text Mining with R
Text Mining with R by Dr. Alvin Ang.R
Data Wrangling with R
Data Wrangling with Tidyverse by Dr Alvin Ang.R
Data Wrangling with Core R by Dr Alvin Ang.R
Data Visualization with R
Data Visualisation with BASIC R by Dr. Alvin Ang.R
Data Visualisation with GGPLOT R by Dr. Alvin Ang.R
Regression with R
Simple Linear Regression using R by Dr. Alvin Ang.R
Multiple Regression using R by Dr Alvin Ang.R
Statistics with R
Statistics with Tidyverse by Dr Alvin Ang.R
ML with R
https://lgatto.github.io/IntroMachineLearningWithR/index.html
https://matthewrenze.com/workshops/practical-machine-learning-with-r/
Tableau Training
1st Project
Extras
Tableau Desktop Specialist
Storytelling
Change Over Time:
https://public.tableau.com/app/profile/ben.jones/viz/WorldPopulationDay/1_ChangeOverTime
https://public.tableau.com/app/profile/andy.kriebel/viz/EPLInjuries/InjuryCrisis
Drill Down:
https://public.tableau.com/views/EarthquakesOnTheRise-Full/Earthquakestory
https://public.tableau.com/app/profile/mac.bryla/viz/TellmeaboutWill/TellmeaboutWill
Zoom Out:
https://public.tableau.com/app/profile/halftimeheroes/viz/OlympicGamesStories-ZoomOut/ZoomOut
Contrast:
https://public.tableau.com/app/profile/robertrouse/viz/Pyramids_1/EgyptianPyramids
Outliers
Datacamp Tableau Cheatsheets
Tableau Data Analyst
SQL Training
https://www.dbta.com/Columns/SQL-Server-Drill-Down/
https://blog.devops.dev/sql-analysis-of-netflix-dataset-808e870e5bd6
https://github.com/pawelsalawa/sqlitestudio/releases
https://www.databasejournal.com/
For those with problems installing:
https://alpha.sqliteviewer.app/
https://www.draxlr.com/tools/sql-formatter/
Oracle SQL
https://www.oracle.com/database/technologies/xe-downloads.html
https://www.oracle.com/database/sqldeveloper/
How to Speed Up SQL Queries.pdf
How to use SQL to Track User Retention.pdf
Power BI Training
Datacamp Power BI Cheatsheets
Resource Allocation with Excel Solver
Statistics Training
Degree of Freedom
Statistics Cheatsheets
15 Data Fallacies
Excel Training
Data Analytics with Excel Course Activities
Data Analytics with Excel Course
Datacamp Excel Cheatsheets
Power Query / Power Pivot
Dashboard with Excel
Design of Experiments (DOE)
Flexsim
WEKA
GOOGLE WORKSPACE
Google Sheets
Looker Studio
Tech with Tim's Facial Recognition Project (on local laptop)
Data Quality Training
Bodies
Open Refine
https://datacarpentry.github.io/OpenRefine-ecology-lesson/aio.html
https://web.archive.org/web/20190105063215/http://enipedia.tudelft.nl/wiki/OpenRefine_Tutorial
Tools
Datasets
Blogs
https://atlan.com/data-governance-framework/?ref=/open-source-data-governance-tools/
https://atlan.com/master-data-management-vs-metadata-management/
https://www.gartner.com/smarterwithgartner/how-to-improve-your-data-quality
https://tdan.com/category/data-topics/data-quality-articles-blogs-education
Singapore Public Service
Slides and Whitepapers
EY - Becoming an Analytics Organization.pdf
Creating an Enterprise Data Strategy - Beye Network.pdf
A Definitive Guide to Data Governance - Trillium Software.pdf
TADA Data Quality Concepts.pdf
Introduction to Data Governance and Stewardship - Salesforce.pdf
5 Levels of Master Data Management Maturity - Baseline Consulting.pdf
Dataprep - Acclerate Data for AI.pdf
ISO Data Quality 8000-1 (partial).pdf
ISO-8000-61-2016 (partial).pdf
A Product Perspective on Total Data Quality Management - Richard Wang.pdf
Datacamp Data Quality Cheatsheets
Datacamp - Cheat Sheet DS For Business Leaders.pdf
Datacamp - Data Quality Dimensions.pdf