Hey, Data Divers!
Welcome to the inaugural issue of Data Dive Diaries. Firstly, I'd like to express my heartfelt gratitude for your support. Your encouragement means the world to me.
If you're unsure about what to expect, who I am, my motivations, or why you should subscribe, please visit the About page.
There, you'll find detailed explanations that cover everything.
With that said, let’s dive in 👇🏼
What I’ve been working on 🛠
I’m currently working my way swiftly through the Udemy course:
“Machine Learning A-Z™: AI, Python & R + ChatGPT Bonus [2023]”
I have found this to be an incredible introduction to data science and building machine learning models using Python in Google Colab.
K-Means Clustering
I wrote a Twitter thread about this method of unsupervised learning (in the form of clustering), so make sure to check that out. I got some really nice visualisations of the clusters!
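For anyone curious what K-Means looks like in practice, here’s a minimal sketch (not the actual code from my thread) that clusters some synthetic 2D points with scikit-learn:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 synthetic 2D points scattered around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with k=3 and assign each point to its nearest centroid
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# kmeans.cluster_centers_ now holds one (x, y) centroid per cluster,
# and labels holds the cluster index (0, 1, or 2) for each point
```

From here, plotting `X` coloured by `labels` (e.g. with Matplotlib’s `scatter`) gives the kind of cluster visualisations I shared in the thread.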
The Kernel Trick (RBF)
One of the more complex topics I’ve been learning recently is applying the Kernel trick to non-linearly separable data.
This allows us to project data points from 2D into 3D space (theoretically, to achieve linear separability) and then project the decision boundary back onto a 2D plot.
I talk about this in more detail in the Data Science Demystified section further down.
“The Complete Python Bootcamp From Zero to Hero in Python”
With how much I’ve been raving about the machine learning course on Udemy, I thought I’d stick with the platform to start my journey learning Python.
I have only completed a couple of hours of setup and familiarisation with Python and some foundational material so far, so I’m going to wait until next week to share my learnings.
What’s Next?
Next time we touch base I’ll be considerably further through both courses in Machine Learning and Python.
I hope to start participating in some coding and data science challenges, and sharing my learning, difficulties, and triumphs with you next week.
In the meantime, follow me @_AdamLowther for daily updates on my journey.
Data Digest: Insights 🚀
As you know, I’m currently working my way through the A-Z Machine Learning course on Udemy led by Data Scientist Kirill Eremenko (@kirill_eremenko).
As part of the Regression and Classification modules I learned about Random Decision Forests.
What are Decision Forests?
This is an ensemble learning method: multiple decision trees are combined to produce a single, more powerful predictive model.
Each decision tree in the random forest is built independently using a random subset of the training data and a random subset of the features.
Random forests have gained popularity due to their ability to handle complex data and high-dimensional feature spaces whilst providing good performance.
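To make this concrete, here’s a hedged sketch of training a random forest with scikit-learn. It uses the built-in Iris dataset rather than anything from the course, but the idea is the same: many trees, each built on a bootstrap sample with random feature subsets, voting together.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small, well-known dataset (150 flowers, 4 features, 3 classes)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 decision trees, each fitted independently on a random bootstrap
# sample of the training data with a random subset of features per split
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# The forest's prediction is a majority vote across all 100 trees
accuracy = forest.score(X_test, y_test)
```

Because each tree sees slightly different data, the ensemble is much less prone to overfitting than any single deep tree.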
Real life application of this method:
“Real-Time Human Pose Recognition in Parts from Single Depth Images”
(PDF at bottom)
The paper introduces a method that accurately estimates the positions of body joints in 3D from single depth images without relying on temporal information (in real time).
The approach uses body part recognition as an intermediate step towards understanding human poses. By training models on a large, varied set of synthetic data, the method handles a wide range of poses and body shapes robustly.
The final 3D joint positions are proposed by finding the most confident predictions using a mode-finding technique.
The results show that the method performs well on both real and synthetic data, and there is a strong connection between the intermediate body part classification and the accuracy of the final joint positions.
Emphasis is placed on the importance of considering different parts of the body separately.
My thoughts 🤔
I found this quite a challenging read, and there was a lot of terminology I wasn’t familiar with, but I persisted and got through it.
I think it’s just so exciting to see how these methods of data science are used for real world applications!
Check out the paper for yourself→
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/BodyPartRecognition.pdf
I’d be interested to hear your thoughts: DM me on Twitter @_AdamLowther or reply to this email.
Python Pro-Tips 🐍
Utilize Python libraries and packages 📦
Python has a vast ecosystem of libraries and packages that can save you a lot of time and effort.
Familiarize yourself with commonly used libraries like NumPy for numerical computations, Pandas for data manipulation, and Matplotlib for data visualization.
I have found the Machine Learning course great for this. I’m already navigating libraries such as scikit-learn using the API menu to find what I’m looking for.
Initially it was intimidating, but when you take the time to read about the different modules and their associated parameters, it starts making more sense.
If in doubt, I’ve found Stack Overflow to be useful.
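As a tiny taste of what these libraries save you from writing by hand, here’s a quick sketch (my own toy data, not from the course):

```python
import numpy as np
import pandas as pd

# NumPy: numerical operations over whole arrays at once
values = np.array([1.0, 2.0, 3.0, 4.0])
mean = values.mean()  # average of the array

# Pandas: spreadsheet-style tabular data manipulation
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
total = df["y"].sum()  # sum of one column
```

One line each for an average and a column sum; the loops, type handling, and edge cases are all taken care of for you.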
Comment your code 🗣
Only today did I discover you can do this. Adding comments to your code can help clarify its purpose and functionality.
At the stage I’m at (complete beginner) I haven’t found the need to comment my code, but I imagine that when writing complex or non-obvious code, it will help you understand the logic and intent behind your implementation.
I imagine this is especially relevant when working on a collaborative project.
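For anyone else just discovering this, Python gives you a few styles of comment. A made-up example showing all three:

```python
# A '#' comment on its own line explains the code below it.
def fahrenheit_to_celsius(temp_f):
    """A docstring (triple-quoted string) describes what a function does."""
    # Non-obvious constants are worth explaining:
    # 32 is the freezing-point offset, 5/9 is the scale factor
    return (temp_f - 32) * 5 / 9

boiling_c = fahrenheit_to_celsius(212)  # inline comment after code
```

A collaborator (or future you) reading this function never has to reverse-engineer where 32 and 5/9 come from.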
Data Science Demystified 🪄
‘The Kernel Trick’ (RBF)
The Kernel Trick is a technique used in machine learning, particularly in support vector machines (SVMs), to efficiently perform calculations in higher-dimensional feature spaces without explicitly transforming the data into that space. It is a way to implicitly capture complex nonlinear relationships between data points.
We do have the option of explicitly mapping the non-linearly separable data points into a higher dimensionality, i.e. from 2D space to 3D, but this is a computationally demanding process.
This is where the Kernel Trick helps us.
A common type is the Gaussian RBF Kernel.
This kernel allows us to model nonlinear decision boundaries by implicitly mapping the data (2D) into a higher-dimensional space (3D) where linear separation is possible.
Essentially, the circumference of the RBF is projected from 3D back onto our 2D plot.
From here we can assign a value above 0 to all data points inside that circumference and a value of 0 (or close to 0) to the data points outside.
I have attached some of my (terrible) doodles below to help you visualise this 👇🏼
Community Spotlight 🌈
I think it’s important to recognise the milestones and accomplishments of others in the data science/coding community.
And so every week, I’ll be giving someone from the community a shoutout.
This week I want to shout out my friend Raj (@RajAyoosh).
Raj and I connected through a Twitter group, and I got off a call with him the other day. He’s very ambitious, and we’re both starting out on our coding journeys at the same time, so go check him out!
Want to be featured in next week’s newsletter?
DM me @_AdamLowther
That’s all for this week,
Thanks for reading, and see you next week!
Adam x