How to ace Machine Learning Interviews: My Personal Playbook

Resources that helped me to prepare for ML interview rounds

Jul 03, 2024

Getting ready for a machine learning interview at a big tech company or startup? It can feel overwhelming with all the different rounds, from testing your broad knowledge to diving deep into specific problems, system design, and coding challenges. When I first started, I felt completely lost and had to dig deep to piece together the right resources.

That’s why I’m writing this article – to share the resources that helped me, so you don’t have to struggle through the similar confusion and can confidently navigate your Machine Learning interviews. 🚀

Note: This article won't cover general software engineering interview rounds like DSA coding or specialized roles like Data Analyst, where the focus is on SQL or data-specific questions.

Let’s get started – but first, a quick disclaimer1.

The Four Types of Rounds:

📚 Machine Learning Breadth: This round tests your broad knowledge across various ML topics. I usually spend about 35% of my prep time here.
🔍 Machine Learning Depth: This round requires about 25% of my prep time, focusing on specialized topics and detailed case studies.
🛠️ Machine Learning System Design: This round evaluates your ability to design scalable ML systems. I allocate around 40% of my prep time here.
💻 Machine Learning Coding: In this round, you'll tackle coding challenges around basic algorithms. Since this round is generally rare, I dedicate about 0-10% of my prep time (shared with Breadth).

Note: This breakdown will vary depending on your experience level and background. Adapt your preparation strategy to best fit your needs.

📚 Machine Learning Breadth

This round assesses an engineer's understanding of ML fundamentals across various topics. It's often a rapid-fire session where interviewers may jump between different topics or might ask general questions within an area.

A common mistake is relying solely on online questionnaires. Interviewers often reframe questions, making it essential to understand concepts deeply. For example, I was once asked how to handle imbalanced classes in a user-click dataset (where one class, like user click, is much less common than the other class). I suggested "oversampling," which can distort the dataset distribution and affect the model's accuracy. Instead, with a large dataset, it's better not to change the data distribution.

Topics vary by company. For instance, self-driving car companies may ask computer vision fundamentals for junior or senior roles, while other companies might focus on general ML fundamentals.

Fundamental Topics (All Levels)

Supervised Learning: Classification (Logistic Regression, SVMs, Decision Trees), Regression (Linear and Ridge regression)
Unsupervised Learning: Clustering (k-means, hierarchical, DBSCAN), Dimensionality Reduction, Latent Semantic Analysis (LSA)
Ensemble Methods: Random Forest (Bagging), Gradient Boosting Machines
Deep Learning: Multi-layer Perceptron, CNNs
Model Evaluation Metrics: Classification vs Regression evaluation metrics, Bias vs Variance Tradeoff
Loss Functions: Different types of loss functions, Regularization, Overfitting vs Underfitting
Feature Selection and Importance: Techniques to identify important features like Correlation analysis, recursive feature elimination, Lasso
Statistics: P-values, r-square, regression analysis, Naive Bayes, distributions, maximum likelihood estimation, A/B testing

💡 Highly Recommended Resources
General ML:
Andrew Ng's Machine Learning Course
Towards Data Science: Machine Learning Basics
Deep Learning:
Udacity's Deep Learning Nanodegree
Coursera's Deep Learning Specialization
Statistics: Statistics for Machine Learning
"The Hundred-Page Machine Learning Book" by Andriy Burkov

Specialized Topics (Senior/Specialized Roles)

Time-Series and Sequential Data:
- RNNs, LSTMs, Seq2Seq models: Understanding architectures designed for handling time-dependent and sequential data.
💡 Resources: Good tutorials for RNNs and LSTMs and Seq2Seq model
Natural Language Processing (NLP):
- Word Embeddings: Techniques like Word2Vec, GloVe.
- Attention Mechanisms and Transformer Models
- LLM Fundamentals: Becoming increasingly more relevant.
💡 Great articles:
- A Dummy’s guide to Word2Vec
- Transformers and Attention Mechanisms

Computer Vision:
- Not an expert on CV. You can consider this course: Udacity Computer Vision Nanodegree.
Reinforcement Learning [Rare]:
- Relevant for robotics company roles
💡 Resources: Reinforcement Learning Overview, Tutorial

🚀 Next Steps

Once you understand the fundamentals, practice with sample interview questions, Some good resources for practicing sample questions:

🔍 Machine Learning Depth

This round is particularly interesting and the most open-ended of all interview types. It is generally aimed for above entry-level jobs with some experience. The goal of this interview is threefold:

Theoretical and practical understanding of a specific problem space
Designing a valid experiment and analyzing results
Communication skills

Expect interviewers to have a significant background knowledge in this area. While it's challenging to prepare specifically for this round, understanding the general directions it can take can be helpful. Here are some common directions this interview can take:

Past Experience: Discussing your academic or industrial projects and their shortcomings. Be prepared to discuss your projects in detail.
Theoretical Background: Delves into the theoretical aspects of a specialized field, especially if the role requires such specialization.
Some examples of specializations and their helpful resources:
- NLP Resource:
  - Advanced Techniques in NLP
  - NLP MetaBlog by
    Pratik Bhavsar
  - LLMs Overview by
    Maxime Labonne
- Computer Vision Resource:
  - Coursera: Computer Vision Basics
- Reinforcement Learning Resource:
  - Github: Reinforcement Learning Overview
- Learning to Rank Resource:
  - Towardsdatascience: Introduction to Ranking
- Time-Series Forecasting Resource:
  - Medium: A complete guide to time-series forecasting

🛠️ Machine Learning System Design

This round resembles a typical software engineering system design interview, applying similar principles. You are given a product space (e.g., designing a YouTube recommendation ML system) and asked to define the problem, outline the design process, and communicate your thoughts, including tradeoffs.

Interviewers look for your problem-solving approach, thought process, and high-level design skills. During the interview, you may be asked to dive deeper into specific components. Generally most ML systems can be broken down into 5 critical components. Here’s main focus of each component:

1️⃣ Problem Definition

Clearly define the problem and formulate your hypothesis.

2️⃣ Evaluation Metrics

Understand the difference between model performance metrics and business metrics.
- Model Metrics: Precision, Recall, MSE, etc.
- Business Metrics: Revenue, number of clicks, number of frauds detected, etc.

3️⃣ Feature and Data

Handle offline and real-time data streaming pipelines and preprocess raw data to create features.

4️⃣ Model Development

Training:
- Select appropriate models and discuss your rationale behind model choices. Provide details on the training process (one-time vs. recurrent vs. online training).
Evaluation:
- Choose evaluation datasets and effectively evaluate model performance.

5️⃣ Inference Service and Deployment

Develop a deployment strategy, including the choice of technologies and ensuring the scalability of the inference service. Discuss your approach for automated deployment, evaluation, and A/B testing.

💡 Highly Recommended Courses
Grokking the Machine Learning Interview
Machine Learning System Design

💻 Machine Learning Coding

I've only encountered a Machine Learning-specific coding round once. This round is uncommon but more frequently seen in startups. Unless it's explicitly mentioned, I don't typically prepare for it. Always confirm with your recruiter if there will be ML coding questions in any of the rounds. If not, focus your preparation on other sections. If there is a specific ML coding round, reallocate some time from ML breadth to practice coding for a few days. The strategy is simple: merge your ML coding preparation with the fundamentals in ML breadth, practicing coding for basic models.

💡 Helpful Resources
Github: 100 Days of ML Code
Kaggle: Great platform where people share their code. For example, House Prices Prediction using TFDF
AlgoExpert: For visual learners, check out AlgoExpert's ML Coding Questions. I tried it once and found it helpful.

Parting note 🌟

Preparing for machine learning interviews can be a challenging journey, but with the right resources and planning, you can navigate it confidently. Remember, every interview is a learning experience, bringing you one step closer to your dream job. Stay curious, keep learning, and don't hesitate to ask for clarifications from your recruiter. Good luck! 🚀

As a parting note, I want to share few more resources for quick references:

ML Cheatsheet for quick revisions: https://sites.google.com/view/datascience-cheat-sheets
ML Interview book: https://huyenchip.com/ml-interviews-book/ by Chip Huyen

🎉 Good Reads for the weekends

Please let me know if I missed anything in comments. If you like the article, please hit ❤️ button. Consider subscribing if you haven’t already.

Disclaimer: This blog is based on personal experiences and publicly available resources. Please note, the opinions expressed are solely my own and do not represent those of my past or current employers. Always refer to official resources and guidelines from hiring companies for the most accurate information.