AI and Machine Learning (ML) Fundamentals
Understanding the fundamental concepts of AI and Machine Learning is essential for the AIF-C01 exam. This section covers the core theory, types of learning, neural network architectures, and the ML project lifecycle.
Machine Learning Types
Supervised Learning
- What: The model learns from labeled data — each training example has an input and the correct output (label)
- Goal: Learn a mapping from inputs to outputs that can generalize to new, unseen data
- Data Required: Labeled dataset (input-output pairs)
- Types:
- Classification: Predict a discrete category (spam/not spam, fraud/not fraud)
- Regression: Predict a continuous numerical value (price, temperature)
- Examples:
- Email spam detection (label: spam/not spam)
- House price prediction (label: price)
- Image classification (label: cat/dog)
- Sentiment analysis (label: positive/negative/neutral)
- Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, Neural Networks
Exam Tip: Supervised learning = labeled data. If the question mentions training data with known correct answers/labels, it's supervised learning.
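The labeled-data idea above can be sketched in a few lines of plain Python: a closed-form simple linear regression fit on made-up (size, price) pairs, then a prediction for an unseen input. All numbers are illustrative, not from any real dataset.

```python
# Supervised regression sketch: fit y = a*x + b by least squares on
# tiny, hand-made labeled pairs (input = house size, label = price).

xs = [50.0, 80.0, 100.0, 120.0]    # house size in m^2 (inputs)
ys = [150.0, 240.0, 300.0, 360.0]  # price in k$ (labels); here exactly 3 * size

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept follows from the means
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    """Generalize the learned mapping to new, unseen inputs."""
    return a * x + b

print(round(a, 2), round(b, 2), round(predict(90.0), 1))  # 3.0 0.0 270.0
```

The key supervised-learning ingredients are all visible: labeled examples, a learned input-to-output mapping, and a prediction on data the model never saw.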
Unsupervised Learning
- What: The model learns from unlabeled data — it discovers hidden patterns and structures without predefined labels
- Goal: Find underlying patterns, groupings, or anomalies in data
- Data Required: Unlabeled dataset (no correct answers provided)
- Types:
- Clustering: Group similar data points together (K-means, DBSCAN)
- Dimensionality Reduction: Reduce the number of features while preserving information (PCA, t-SNE)
- Association: Discover relationships between variables (market basket analysis)
- Anomaly Detection: Identify unusual patterns or outliers
- Examples:
- Customer segmentation (group customers by behavior)
- Anomaly detection (detect unusual network traffic)
- Market basket analysis ("customers who bought X also bought Y")
- Topic modeling (discover topics in document collection)
Exam Tip: Unsupervised learning = no labels. If the question mentions finding patterns in data without predefined categories, it's unsupervised learning.
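Clustering without labels can be sketched as a tiny 1-D k-means (k = 2) in plain Python. The "monthly spend" values and initial centroids are made up; the point is that the grouping emerges from the data alone, with no correct answers provided.

```python
# Unsupervised clustering sketch: 1-D k-means with k = 2.
# No labels anywhere; the algorithm discovers the two spend groups itself.

points = [10.0, 12.0, 11.0, 95.0, 99.0, 102.0]  # monthly spend per customer
c1, c2 = 10.0, 100.0                            # initial centroids (hand-picked)

for _ in range(10):  # a few assign-then-update iterations
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1 = sum(g1) / len(g1)  # move each centroid to its group's mean
    c2 = sum(g2) / len(g2)

print(sorted(g1), sorted(g2))  # low spenders vs. high spenders
```

This is the same idea behind customer segmentation: similar data points end up in the same cluster purely from their distances to the centroids.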
Reinforcement Learning
- What: An agent learns by interacting with an environment, receiving rewards or penalties for its actions
- Goal: Learn a policy (strategy) that maximizes cumulative reward over time
- Key Concepts:
- Agent: The learner/decision-maker
- Environment: The world the agent interacts with
- State: Current situation of the agent
- Action: What the agent can do
- Reward: Feedback signal (positive or negative)
- Policy: Strategy mapping states to actions
- Examples:
- Game playing (AlphaGo, Chess)
- Robotics (learning to walk, grasp objects)
- Autonomous driving
- Recommendation systems
- RLHF (training language models with human feedback)
- AWS Services: Amazon SageMaker supports RL training; AWS DeepRacer is a hands-on service for learning RL by racing autonomous model cars
Exam Tip: Reinforcement learning = agent + rewards + environment. If the question mentions learning from interaction, trial-and-error, or reward optimization, it's reinforcement learning. Remember RLHF uses RL.
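The agent/environment/reward loop above can be sketched as tabular Q-learning on a toy 4-state corridor. The environment, rewards, and hyperparameter values are all invented for illustration.

```python
# RL sketch: Q-learning on a 4-state corridor. The agent starts at state 0;
# stepping right into state 3 (the goal) yields reward 1, everything else 0.
import random

random.seed(0)
n_states, actions = 4, [0, 1]          # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2      # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(200):                   # episodes of trial and error
    s = 0
    while s != n_states - 1:
        if random.random() < eps:
            a = random.choice(actions)                    # explore
        else:
            a = max(actions, key=lambda act: Q[s][act])   # exploit
        s2, r = step(s, a)
        # Move Q(s, a) toward reward + gamma * best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy (best action per non-goal state) maps states to actions
policy = [max(actions, key=lambda act: Q[s][act]) for s in range(n_states - 1)]
print(policy)  # [1, 1, 1] -> always move right, toward the reward
```

All the key terms appear: the loop body is the agent, `step` is the environment, `s` is the state, `r` is the reward signal, and `policy` is the learned strategy that maximizes cumulative reward.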
Deep Learning
- What: A subset of machine learning that uses neural networks with multiple layers (deep neural networks) to learn complex patterns
- Key Characteristics:
- Automatically learns feature representations from raw data (no manual feature engineering)
- Requires large amounts of data and compute power
- Excels at unstructured data (images, text, audio, video)
- Powers most modern AI applications
- Relationship: AI → Machine Learning → Deep Learning → Foundation Models
- Examples: Image recognition, natural language processing, speech recognition, generative AI
- Hardware: Typically requires GPUs or TPUs for training
Neural Networks
Basic Neural Network
- Neurons: Basic units that receive inputs, apply weights, sum them, apply an activation function, and produce an output
- Layers:
- Input Layer: Receives the raw data
- Hidden Layers: Process and transform the data (the "learning" happens here)
- Output Layer: Produces the final prediction
- Training: Uses backpropagation to adjust weights and minimize error (loss function)
- Activation Functions: ReLU, Sigmoid, Tanh, Softmax
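A single neuron from the description above, in plain Python: weighted sum plus bias, passed through a sigmoid activation. The weights are hand-picked for illustration; in a real network, backpropagation would learn them.

```python
# One neuron: weighted sum of inputs + bias, then a sigmoid activation.
import math

def neuron(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias  # weighted sum
    return 1.0 / (1.0 + math.exp(-z))                       # sigmoid squashes to (0, 1)

out = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(round(out, 3))  # z = 0.5 - 0.5 = 0, and sigmoid(0) = 0.5
```

Stacking many such units into input, hidden, and output layers, and adjusting the weights to reduce a loss function, is all a basic neural network is.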
Convolutional Neural Networks (CNN)
- What: Neural networks specialized for processing grid-like data (images, video)
- Key Components:
- Convolutional Layers: Apply filters to detect features (edges, textures, shapes)
- Pooling Layers: Reduce spatial dimensions while preserving important features
- Fully Connected Layers: Combine the extracted features to produce the final classification
- Use Cases:
- Image classification (is this a cat or dog?)
- Object detection (where are the objects in this image?)
- Facial recognition
- Medical image analysis (X-rays, MRIs)
- Autonomous driving (scene understanding)
Exam Tip: CNN = images and spatial data. If the question involves image recognition, object detection, or visual data processing, CNNs are the answer.
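One CNN building block, the pooling layer, is simple enough to sketch directly: 2x2 max pooling over a made-up 4x4 "image" keeps the strongest activation in each window, halving the spatial dimensions while preserving the salient features.

```python
# 2x2 max pooling over a 4x4 grid of activations (values are invented).
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 4, 8],
]

pooled = [
    [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
     for c in range(0, 4, 2)]   # slide the window 2 columns at a time
    for r in range(0, 4, 2)     # and 2 rows at a time
]
print(pooled)  # [[4, 5], [3, 9]] -- a 2x2 summary of the 4x4 input
```

Convolutional layers work on the same grid principle, but slide a learned filter and compute weighted sums instead of taking the maximum.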
Recurrent Neural Networks (RNN)
- What: Neural networks designed for sequential data where order matters
- Key Characteristics:
- Has memory — can remember information from previous time steps
- Processes data sequentially (one element at a time)
- Can handle variable-length input sequences
- Variants:
- LSTM (Long Short-Term Memory): Mitigates the vanishing gradient problem; better at remembering long-term dependencies
- GRU (Gated Recurrent Unit): Simplified version of LSTM
- Use Cases:
- Time series forecasting (stock prices, weather)
- Speech recognition
- Language translation
- Text generation
- Music generation
- Limitation: Sequential processing is slow (cannot parallelize)
Exam Tip: RNN/LSTM = sequential data (text, time series, speech). If the question involves data with temporal ordering, think RNN.
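The "memory" idea is just a hidden state carried across time steps. A minimal sketch, with hand-picked (not learned) weights: each step mixes the current input with the previous hidden state through tanh, so earlier elements influence later outputs.

```python
# RNN sketch: one hidden value updated sequentially over an input sequence.
import math

w_in, w_hidden = 0.8, 0.5       # illustrative weights, not trained
h = 0.0                         # initial hidden state ("empty memory")
for x in [1.0, 0.0, -1.0]:      # the sequence is processed one element at a time
    h = math.tanh(w_in * x + w_hidden * h)
print(round(h, 4))              # final h depends on the whole sequence, in order
```

Note the structural limitation mentioned above: each `h` depends on the previous one, so the loop cannot be parallelized; Transformers remove exactly this bottleneck.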
Transformers
- What: Neural network architecture that uses self-attention to process all elements of a sequence simultaneously (in parallel)
- Key Innovation: Self-attention mechanism that allows every element in a sequence to attend to every other element directly
- Key Characteristics:
- Parallel processing (much faster than RNNs)
- Handles long-range dependencies effectively
- Scales well with data and compute
- Foundation of all modern large language models (LLMs)
- Architecture:
- Encoder: Processes input and creates representations (used in BERT)
- Decoder: Generates output sequences (used in GPT)
- Encoder-Decoder: Full architecture for translation tasks (used in T5)
- Foundation Models Built on Transformers:
- GPT series (decoder-only)
- Claude (decoder-only)
- Llama (decoder-only)
- BERT (encoder-only)
- T5 (encoder-decoder)
Exam Tip: Transformers = self-attention + parallel processing. They are the architecture behind ALL modern LLMs and foundation models. If the question asks what architecture powers generative AI, the answer is Transformers.
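Scaled dot-product self-attention can be sketched in plain Python with tiny hand-picked matrices (2 positions, embedding size 2). Every position scores every other position, a softmax turns scores into weights, and the output is a weighted mix of the value vectors; all numbers are illustrative, not from a trained model.

```python
# Self-attention sketch: softmax(Q K^T / sqrt(d)) applied to values V.
import math

Q = [[1.0, 0.0], [0.0, 1.0]]   # queries (one row per sequence position)
K = [[1.0, 0.0], [0.0, 1.0]]   # keys
V = [[1.0, 2.0], [3.0, 4.0]]   # values
d = 2                          # embedding dimension

def self_attention(Q, K, V):
    out = []
    for q in Q:  # every position attends to every position, directly
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]                      # scaled dot products
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        w = [e / sum(exps) for e in exps]
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])    # weighted mix of values
    return out

out = self_attention(Q, K, V)
print([[round(x, 3) for x in row] for row in out])
```

Because each output row depends only on matrix products over the whole sequence, every position can be computed in parallel, which is the speed advantage over the sequential RNN loop.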
ML Concepts
Labeled vs Unlabeled Data
| Aspect | Labeled Data | Unlabeled Data |
|---|---|---|
| Definition | Data with known correct answers/tags | Raw data without annotations |
| Cost | Expensive to create (requires human annotation) | Cheap and abundant |
| Used For | Supervised learning, fine-tuning | Unsupervised learning, continued pre-training |
| Example | Image tagged "cat," email tagged "spam" | Raw images, raw text, log files |
Structured vs Unstructured Data
| Aspect | Structured Data | Unstructured Data |
|---|---|---|
| Definition | Data organized in tables with rows/columns | Data without a predefined format |
| Storage | Databases (SQL), spreadsheets, CSV | Object storage (S3), data lakes |
| Examples | Customer records, transaction logs, sensor readings | Images, videos, emails, PDFs, social media posts |
| ML Approach | Traditional ML (decision trees, regression) | Deep learning (CNNs, Transformers) |
Model Training
- Training Data: The data used to teach the model (typically 70-80% of dataset)
- Validation Data: Used during training to tune hyperparameters and assess progress (10-15%)
- Test Data: Held-out data used ONLY after training to evaluate final performance (10-15%)
- Epochs: Number of complete passes through the training data
- Batch Size: Number of training examples processed at once
- Loss Function: Measures the error between predictions and actual values
- Optimizer: Algorithm that adjusts model weights to minimize loss (SGD, Adam)
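The train/validation/test split described above is a one-shuffle, three-slice operation. A minimal sketch with a stand-in dataset of 100 examples; the 70/15/15 proportions are typical, not mandated.

```python
# Dataset-split sketch: shuffle once, then carve out 70/15/15 partitions.
import random

data = list(range(100))   # stand-in for 100 labeled examples
random.seed(42)
random.shuffle(data)      # remove any ordering bias before splitting

n = len(data)
i = n * 70 // 100         # end of the training slice (integer math, exact)
j = n * 85 // 100         # end of the validation slice
train, val, test = data[:i], data[i:j], data[j:]

print(len(train), len(val), len(test))  # 70 15 15
```

The discipline matters more than the code: `train` fits the weights, `val` tunes hyperparameters during training, and `test` is touched exactly once, for the final score.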
Model Evaluation
- Accuracy: Percentage of correct predictions (overall)
- Precision: Of all positive predictions, how many were actually positive (minimize false positives)
- Recall (Sensitivity): Of all actual positives, how many were correctly identified (minimize false negatives)
- F1 Score: Harmonic mean of precision and recall (balanced metric)
- AUC-ROC: Area under the Receiver Operating Characteristic curve (how well the model separates classes)
- RMSE (Root Mean Square Error): For regression tasks — the square root of the average squared prediction error (penalizes large errors more heavily)
- Confusion Matrix: Table showing true positives, false positives, true negatives, false negatives
Exam Tip: Know when to prioritize precision vs. recall:
- High precision needed: Spam detection (don't accidentally block important emails)
- High recall needed: Medical diagnosis (don't miss a disease), fraud detection (catch all fraud)
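The classification metrics above fall straight out of confusion-matrix counts. A sketch with made-up counts, showing why accuracy alone can mislead on imbalanced data:

```python
# Metrics from confusion-matrix counts (values invented for illustration).
tp, fp, fn, tn = 80, 10, 20, 890   # true/false positives and negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)          # of predicted positives, how many were right
recall    = tp / (tp + fn)          # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3))  # 0.97 0.889 0.8 0.842
```

Accuracy looks excellent (0.97) mostly because true negatives dominate; recall (0.8) reveals that 20 of the 100 actual positives were missed, which is exactly the failure mode that matters in medical diagnosis or fraud detection.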
Model Deployment
- Real-time Inference: Deploy model as an API endpoint for instant predictions
- Batch Transform: Process large datasets offline
- Edge Deployment: Run models on edge devices (IoT, mobile)
- A/B Testing: Deploy multiple model versions and compare performance
- Blue/Green Deployment: Run the new model version alongside the old one, then switch traffic over with zero downtime (and easy rollback)
Hyperparameters
- What: Settings configured before training that control the learning process (not learned from data)
- Examples:
- Learning Rate: How fast the model adjusts weights
- Number of Epochs: How many times to iterate over training data
- Batch Size: Number of samples per training step
- Number of Layers/Neurons: Model architecture decisions
- Dropout Rate: Regularization parameter
- Temperature (for LLMs): Controls randomness of generated output (set at inference time rather than training time)
Overfitting
- What: Model performs excellently on training data but poorly on new/unseen data
- Cause: Model has memorized the training data (including noise) instead of learning generalizable patterns
- Signs: Very high training accuracy, low test/validation accuracy
- Solutions:
- More training data
- Regularization (L1, L2, Dropout)
- Simpler model architecture
- Data augmentation
- Early stopping
- Cross-validation
Underfitting
- What: Model performs poorly on BOTH training and test data
- Cause: Model is too simple to capture the underlying patterns in the data
- Signs: Low training accuracy AND low test accuracy
- Solutions:
- More complex model
- More features / better feature engineering
- Longer training (more epochs)
- Reduce regularization
Bias and Variance
- Bias: Error from simplifying assumptions — high bias = underfitting (model is too simple)
- Variance: Error from sensitivity to training data — high variance = overfitting (model is too complex)
- Bias-Variance Tradeoff: Increasing model complexity reduces bias but increases variance (and vice versa). The goal is to find the sweet spot.
Regularization
- What: Techniques to prevent overfitting by adding constraints to the model
- Methods:
- L1 (Lasso): Adds absolute value of weights as penalty; can shrink some weights to exactly zero, effectively performing feature selection
- L2 (Ridge): Adds squared weights as penalty; shrinks weights but doesn't zero them
- Dropout: Randomly deactivates neurons during training (used in neural networks)
- Early Stopping: Stop training when validation performance starts declining
Exam Tip: Overfitting = model is too complex (memorizes data). Underfitting = model is too simple. Regularization prevents overfitting. These are fundamental concepts that will appear on the exam.
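The shrinking effect of an L2 penalty can be shown on a toy one-weight loss. Without the penalty, the loss (w - 3)^2 is minimized at w = 3; adding lambda * w^2 pulls the optimum toward zero (closed form: w* = 3 / (1 + lambda)). The loss and values are invented purely to make the effect visible.

```python
# L2 regularization sketch: gradient descent on (w - 3)^2 + lam * w^2.

def best_weight(lam, steps=500, lr=0.1):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3.0) + 2 * lam * w   # d/dw of loss + L2 penalty
        w -= lr * grad                        # gradient descent step
    return w

print(round(best_weight(lam=0.0), 3))  # 3.0 -- no regularization
print(round(best_weight(lam=1.0), 3))  # 1.5 -- the penalty shrinks the weight
```

Smaller weights mean a less flexible model, which is precisely how L2 trades a little bias for less variance and reduced overfitting.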
Phases of a Machine Learning Project
| Phase | Description | Key Activities |
|---|---|---|
| 1. Business Problem Definition | Define the problem in ML terms | Identify objectives, success metrics, constraints |
| 2. Data Collection | Gather relevant data | Identify sources, collect raw data, assess data availability |
| 3. Data Preparation | Clean and transform data | Handle missing values, remove duplicates, normalize, encode categorical variables |
| 4. Feature Engineering | Create informative features | Select features, create new features, transform variables |
| 5. Model Selection | Choose appropriate algorithm | Compare algorithms, consider problem type, evaluate complexity |
| 6. Model Training | Train the model on data | Split data, set hyperparameters, train, validate |
| 7. Model Evaluation | Assess model performance | Use test data, calculate metrics, check for bias |
| 8. Model Deployment | Put model into production | Set up endpoints, integrate with applications |
| 9. Model Monitoring | Track performance over time | Monitor drift, track accuracy, retrain when needed |
Exam Tip: Know the ML lifecycle phases and what happens in each. The exam may ask about the correct order or which activity belongs to which phase. Data preparation typically takes the most time (60-80% of a project).
ML Algorithms
ResNet (Residual Network)
- What: A deep CNN architecture that uses skip connections (residual connections) to train very deep networks
- Key Innovation: Skip connections allow gradients to flow through the network without vanishing, enabling networks with 100+ layers
- Use Cases: Image classification, object detection, feature extraction
- Significance: Won the ImageNet competition in 2015; foundational architecture for modern computer vision
SVM (Support Vector Machine)
- What: A supervised learning algorithm that finds the optimal hyperplane to separate classes with the maximum margin
- Key Characteristics:
- Works well with high-dimensional data
- Effective even with limited training data
- Can handle non-linear boundaries using kernel functions
- Use Cases: Text classification, image classification, bioinformatics
- Kernel Functions: Linear, Polynomial, RBF (Radial Basis Function)
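The RBF kernel mentioned above is easy to sketch: it turns the distance between two points into a similarity in (0, 1], with `gamma` controlling how fast similarity decays. Points and the `gamma` value are illustrative.

```python
# RBF (Radial Basis Function) kernel: similarity = exp(-gamma * ||x - y||^2).
import math

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0 -- identical points
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # near 0 -- distance 5 apart
```

Replacing plain dot products with such a kernel is what lets an SVM separate classes with a non-linear boundary while still solving a linear problem internally.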
GAN (Generative Adversarial Network)
- What: A deep learning architecture with two competing neural networks: a Generator (creates fake data) and a Discriminator (distinguishes real from fake)
- How It Works:
- Generator creates synthetic data (images, text, etc.)
- Discriminator tries to distinguish real data from generated data
- Both networks improve through competition (adversarial training)
- Generator eventually produces data indistinguishable from real data
- Use Cases:
- Image generation and synthesis
- Data augmentation (generate more training data)
- Super-resolution (enhance image quality)
- Style transfer
- Deepfake creation/detection
- Key Concept: GANs are part of the generative AI family, predating transformer-based models
Exam Tip: GAN = Generator vs. Discriminator (two competing networks). If the question mentions adversarial training or two networks competing to generate realistic data, think GAN. GANs are a form of generative AI distinct from transformer-based LLMs.