AI and Machine Learning (ML) Fundamentals
Understanding the fundamental concepts of AI and Machine Learning is essential for the AIF-C01 exam. This section covers the core theory, types of learning, neural network architectures, and the ML project lifecycle.
Machine Learning Types
Supervised Learning
- What: The model learns from labeled data — each training example has an input and the correct output (label)
- Goal: Learn a mapping from inputs to outputs that can generalize to new, unseen data
- Data Required: Labeled dataset (input-output pairs)
- Types:
- Classification: Predict a discrete category (spam/not spam, fraud/not fraud)
- Regression: Predict a continuous numerical value (price, temperature)
- Examples:
- Email spam detection (label: spam/not spam)
- House price prediction (label: price)
- Image classification (label: cat/dog)
- Sentiment analysis (label: positive/negative/neutral)
- Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, Neural Networks
Exam Tip: Supervised learning = labeled data. If the question mentions training data with known correct answers/labels, it's supervised learning.
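The labeled-data idea above can be sketched in a few lines of plain Python: a closed-form simple linear regression fit on made-up (size, price) pairs, then a prediction for an unseen input. All numbers are illustrative, not from any real dataset.

```python
# Supervised regression sketch: fit y = a*x + b by least squares on
# tiny, hand-made labeled pairs (input = house size, label = price).

xs = [50.0, 80.0, 100.0, 120.0]    # house size in m^2 (inputs)
ys = [150.0, 240.0, 300.0, 360.0]  # price in k$ (labels); here exactly 3 * size

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope = covariance(x, y) / variance(x); intercept follows from the means
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    """Generalize the learned mapping to new, unseen inputs."""
    return a * x + b

print(round(a, 2), round(b, 2), round(predict(90.0), 1))  # 3.0 0.0 270.0
```

The key supervised-learning ingredients are all visible: labeled examples, a learned input-to-output mapping, and a prediction on data the model never saw.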
Unsupervised Learning
- What: The model learns from unlabeled data — it discovers hidden patterns and structures without predefined labels
- Goal: Find underlying patterns, groupings, or anomalies in data
- Data Required: Unlabeled dataset (no correct answers provided)
- Types:
- Clustering: Group similar data points together (K-means, DBSCAN)
- Dimensionality Reduction: Reduce the number of features while preserving information (PCA, t-SNE)
- Association: Discover relationships between variables (market basket analysis)
- Anomaly Detection: Identify unusual patterns or outliers
- Examples:
- Customer segmentation (group customers by behavior)
- Anomaly detection (detect unusual network traffic)
- Market basket analysis ("customers who bought X also bought Y")
- Topic modeling (discover topics in document collection)
Exam Tip: Unsupervised learning = no labels. If the question mentions finding patterns in data without predefined categories, it's unsupervised learning.
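Clustering without labels can be sketched as a tiny 1-D k-means (k = 2) in plain Python. The "monthly spend" values and initial centroids are made up; the point is that the grouping emerges from the data alone, with no correct answers provided.

```python
# Unsupervised clustering sketch: 1-D k-means with k = 2.
# No labels anywhere; the algorithm discovers the two spend groups itself.

points = [10.0, 12.0, 11.0, 95.0, 99.0, 102.0]  # monthly spend per customer
c1, c2 = 10.0, 100.0                            # initial centroids (hand-picked)

for _ in range(10):  # a few assign-then-update iterations
    g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
    g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
    c1 = sum(g1) / len(g1)  # move each centroid to its group's mean
    c2 = sum(g2) / len(g2)

print(sorted(g1), sorted(g2))  # low spenders vs. high spenders
```

This is the same idea behind customer segmentation: similar data points end up in the same cluster purely from their distances to the centroids.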
Reinforcement Learning
- What: An agent learns by interacting with an environment, receiving rewards or penalties for its actions
- Goal: Learn a policy (strategy) that maximizes cumulative reward over time
- Key Concepts:
- Agent: The learner/decision-maker
- Environment: The world the agent interacts with
- State: Current situation of the agent
- Action: What the agent can do
- Reward: Feedback signal (positive or negative)
- Policy: Strategy mapping states to actions
- Examples:
- Game playing (AlphaGo, Chess)
- Robotics (learning to walk, grasp objects)
- Autonomous driving
- Recommendation systems
- RLHF (training language models with human feedback)
- AWS Services: Amazon SageMaker supports RL training; AWS DeepRacer is a hands-on service for learning RL by racing autonomous model cars
Exam Tip: Reinforcement learning = agent + rewards + environment. If the question mentions learning from interaction, trial-and-error, or reward optimization, it's reinforcement learning. Remember RLHF uses RL.
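The agent/environment/reward loop above can be sketched as tabular Q-learning on a toy 4-state corridor. The environment, rewards, and hyperparameter values are all invented for illustration.

```python
# RL sketch: Q-learning on a 4-state corridor. The agent starts at state 0;
# stepping right into state 3 (the goal) yields reward 1, everything else 0.
import random

random.seed(0)
n_states, actions = 4, [0, 1]          # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2      # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

for _ in range(200):                   # episodes of trial and error
    s = 0
    while s != n_states - 1:
        if random.random() < eps:
            a = random.choice(actions)                    # explore
        else:
            a = max(actions, key=lambda act: Q[s][act])   # exploit
        s2, r = step(s, a)
        # Move Q(s, a) toward reward + gamma * best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned policy (best action per non-goal state) maps states to actions
policy = [max(actions, key=lambda act: Q[s][act]) for s in range(n_states - 1)]
print(policy)  # [1, 1, 1] -> always move right, toward the reward
```

All the key terms appear: the loop body is the agent, `step` is the environment, `s` is the state, `r` is the reward signal, and `policy` is the learned strategy that maximizes cumulative reward.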
Deep Learning
- What: A subset of machine learning that uses neural networks with multiple layers (deep neural networks) to learn complex patterns
- Key Characteristics:
- Automatically learns feature representations from raw data (no manual feature engineering)
- Requires large amounts of data and compute power
- Excels at unstructured data (images, text, audio, video)
- Powers most modern AI applications
- Relationship: AI → Machine Learning → Deep Learning → Foundation Models
- Examples: Image recognition, natural language processing, speech recognition, generative AI
- Hardware: Typically requires GPUs or TPUs for training
Neural Networks
Basic Neural Network
- Neurons: Basic units that receive inputs, apply weights, sum them, apply an activation function, and produce an output
- Layers:
- Input Layer: Receives the raw data
- Hidden Layers: Process and transform the data (the "learning" happens here)
- Output Layer: Produces the final prediction
- Training: Uses backpropagation to adjust weights and minimize error (loss function)
- Activation Functions: ReLU, Sigmoid, Tanh, Softmax
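A single neuron from the description above, in plain Python: weighted sum plus bias, passed through a sigmoid activation. The weights are hand-picked for illustration; in a real network, backpropagation would learn them.

```python
# One neuron: weighted sum of inputs + bias, then a sigmoid activation.
import math

def neuron(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias  # weighted sum
    return 1.0 / (1.0 + math.exp(-z))                       # sigmoid squashes to (0, 1)

out = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
print(round(out, 3))  # z = 0.5 - 0.5 = 0, and sigmoid(0) = 0.5
```

Stacking many such units into input, hidden, and output layers, and adjusting the weights to reduce a loss function, is all a basic neural network is.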
Convolutional Neural Networks (CNN)
- What: Neural networks specialized for processing grid-like data (images, video)
- Key Components:
- Convolutional Layers: Apply filters to detect features (edges, textures, shapes)
- Pooling Layers: Reduce spatial dimensions while preserving important features
- Fully Connected Layers: Combine the extracted features to produce the final classification
- Use Cases:
- Image classification (is this a cat or dog?)
- Object detection (where are the objects in this image?)
- Facial recognition
- Medical image analysis (X-rays, MRIs)
- Autonomous driving (scene understanding)
Exam Tip: CNN = images and spatial data. If the question involves image recognition, object detection, or visual data processing, CNNs are the answer.
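One CNN building block, the pooling layer, is simple enough to sketch directly: 2x2 max pooling over a made-up 4x4 "image" keeps the strongest activation in each window, halving the spatial dimensions while preserving the salient features.

```python
# 2x2 max pooling over a 4x4 grid of activations (values are invented).
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 9, 2],
    [3, 2, 4, 8],
]

pooled = [
    [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
     for c in range(0, 4, 2)]   # slide the window 2 columns at a time
    for r in range(0, 4, 2)     # and 2 rows at a time
]
print(pooled)  # [[4, 5], [3, 9]] -- a 2x2 summary of the 4x4 input
```

Convolutional layers work on the same grid principle, but slide a learned filter and compute weighted sums instead of taking the maximum.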
Recurrent Neural Networks (RNN)
- What: Neural networks designed for sequential data where order matters
- Key Characteristics:
- Has memory — can remember information from previous time steps
- Processes data sequentially (one element at a time)
- Can handle variable-length input sequences
- Variants:
- LSTM (Long Short-Term Memory): Mitigates the vanishing gradient problem; better at remembering long-term dependencies
- GRU (Gated Recurrent Unit): Simplified version of LSTM
- Use Cases:
- Time series forecasting (stock prices, weather)
- Speech recognition
- Language translation
- Text generation
- Music generation
- Limitation: Sequential processing is slow (cannot parallelize)
Exam Tip: RNN/LSTM = sequential data (text, time series, speech). If the question involves data with temporal ordering, think RNN.
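The "memory" idea is just a hidden state carried across time steps. A minimal sketch, with hand-picked (not learned) weights: each step mixes the current input with the previous hidden state through tanh, so earlier elements influence later outputs.

```python
# RNN sketch: one hidden value updated sequentially over an input sequence.
import math

w_in, w_hidden = 0.8, 0.5       # illustrative weights, not trained
h = 0.0                         # initial hidden state ("empty memory")
for x in [1.0, 0.0, -1.0]:      # the sequence is processed one element at a time
    h = math.tanh(w_in * x + w_hidden * h)
print(round(h, 4))              # final h depends on the whole sequence, in order
```

Note the structural limitation mentioned above: each `h` depends on the previous one, so the loop cannot be parallelized; Transformers remove exactly this bottleneck.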
Transformers
- What: Neural network architecture that uses self-attention to process all elements of a sequence simultaneously (in parallel)
- Key Innovation: Self-attention mechanism that allows every element in a sequence to attend to every other element directly
- Key Characteristics:
- Parallel processing (much faster than RNNs)
- Handles long-range dependencies effectively
- Scales well with data and compute
- Foundation of all modern large language models (LLMs)
- Architecture:
- Encoder: Processes input and creates representations (used in BERT)
- Decoder: Generates output sequences (used in GPT)
- Encoder-Decoder: Full architecture for translation tasks (used in T5)
- Foundation Models Built on Transformers:
- GPT series (decoder-only)
- Claude (decoder-only)
- Llama (decoder-only)
- BERT (encoder-only)
- T5 (encoder-decoder)
Exam Tip: Transformers = self-attention + parallel processing. They are the architecture behind ALL modern LLMs and foundation models. If the question asks what architecture powers generative AI, the answer is Transformers.
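Scaled dot-product self-attention can be sketched in plain Python with tiny hand-picked matrices (2 positions, embedding size 2). Every position scores every other position, a softmax turns scores into weights, and the output is a weighted mix of the value vectors; all numbers are illustrative, not from a trained model.

```python
# Self-attention sketch: softmax(Q K^T / sqrt(d)) applied to values V.
import math

Q = [[1.0, 0.0], [0.0, 1.0]]   # queries (one row per sequence position)
K = [[1.0, 0.0], [0.0, 1.0]]   # keys
V = [[1.0, 2.0], [3.0, 4.0]]   # values
d = 2                          # embedding dimension

def self_attention(Q, K, V):
    out = []
    for q in Q:  # every position attends to every position, directly
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]                      # scaled dot products
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
        w = [e / sum(exps) for e in exps]
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])    # weighted mix of values
    return out

out = self_attention(Q, K, V)
print([[round(x, 3) for x in row] for row in out])
```

Because each output row depends only on matrix products over the whole sequence, every position can be computed in parallel, which is the speed advantage over the sequential RNN loop.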
ML Concepts
Labeled vs Unlabeled Data
| Aspect | Labeled Data | Unlabeled Data |
|---|---|---|
| Definition | Data with known correct answers/tags | Raw data without annotations |
| Cost | Expensive to create (requires human annotation) | Cheap and abundant |
| Used For | Supervised learning, fine-tuning | Unsupervised learning, continued pre-training |
| Example | Image tagged "cat," email tagged "spam" | Raw images, raw text, log files |
Structured vs Unstructured Data
| Aspect | Structured Data | Unstructured Data |
|---|---|---|
| Definition | Data organized in tables with rows/columns | Data without a predefined format |
| Storage | Databases (SQL), spreadsheets, CSV | Object storage (S3), data lakes |
| Examples | Customer records, transaction logs, sensor readings | Images, videos, emails, PDFs, social media posts |
| ML Approach | Traditional ML (decision trees, regression) | Deep learning (CNNs, Transformers) |
Model Training
- Training Data: The data used to teach the model (typically 70-80% of dataset)
- Validation Data: Used during training to tune hyperparameters and assess progress (10-15%)
- Test Data: Held-out data used ONLY after training to evaluate final performance (10-15%)
- Epochs: Number of complete passes through the training data
- Batch Size: Number of training examples processed at once
- Loss Function: Measures the error between predictions and actual values
- Optimizer: Algorithm that adjusts model weights to minimize loss (SGD, Adam)
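The train/validation/test split described above is a one-shuffle, three-slice operation. A minimal sketch with a stand-in dataset of 100 examples; the 70/15/15 proportions are typical, not mandated.

```python
# Dataset-split sketch: shuffle once, then carve out 70/15/15 partitions.
import random

data = list(range(100))   # stand-in for 100 labeled examples
random.seed(42)
random.shuffle(data)      # remove any ordering bias before splitting

n = len(data)
i = n * 70 // 100         # end of the training slice (integer math, exact)
j = n * 85 // 100         # end of the validation slice
train, val, test = data[:i], data[i:j], data[j:]

print(len(train), len(val), len(test))  # 70 15 15
```

The discipline matters more than the code: `train` fits the weights, `val` tunes hyperparameters during training, and `test` is touched exactly once, for the final score.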
Model Evaluation
- Accuracy: Percentage of correct predictions (overall)
- Precision: Of all positive predictions, how many were actually positive (minimize false positives)
- Recall (Sensitivity): Of all actual positives, how many were correctly identified (minimize false negatives)
- F1 Score: Harmonic mean of precision and recall (balanced metric)
- AUC-ROC: Area under the Receiver Operating Characteristic curve (how well the model separates classes)
- RMSE (Root Mean Square Error): For regression tasks — the square root of the average squared prediction error (penalizes large errors more heavily)
- Confusion Matrix: Table showing true positives, false positives, true negatives, false negatives
Exam Tip: Know when to prioritize precision vs. recall:
- High precision needed: Spam detection (don't accidentally block important emails)
- High recall needed: Medical diagnosis (don't miss a disease), fraud detection (catch all fraud)
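The classification metrics above fall straight out of confusion-matrix counts. A sketch with made-up counts, showing why accuracy alone can mislead on imbalanced data:

```python
# Metrics from confusion-matrix counts (values invented for illustration).
tp, fp, fn, tn = 80, 10, 20, 890   # true/false positives and negatives

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)          # of predicted positives, how many were right
recall    = tp / (tp + fn)          # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3))  # 0.97 0.889 0.8 0.842
```

Accuracy looks excellent (0.97) mostly because true negatives dominate; recall (0.8) reveals that 20 of the 100 actual positives were missed, which is exactly the failure mode that matters in medical diagnosis or fraud detection.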
Model Deployment
- Real-time Inference: Deploy model as an API endpoint for instant predictions
- Batch Transform: Process large datasets offline
- Edge Deployment: Run models on edge devices (IoT, mobile)
- A/B Testing: Deploy multiple model versions and compare performance
- Blue/Green Deployment: Run the new model version alongside the old one, then switch traffic over with zero downtime (and easy rollback)
Hyperparameters
- What: Settings configured before training that control the learning process (not learned from data)
- Examples:
- Learning Rate: How fast the model adjusts weights
- Number of Epochs: How many times to iterate over training data
- Batch Size: Number of samples per training step
- Number of Layers/Neurons: Model architecture decisions
- Dropout Rate: Regularization parameter
- Temperature (for LLMs): Controls randomness of generated output (set at inference time rather than training time)
Overfitting
- What: Model performs excellently on training data but poorly on new/unseen data
- Cause: Model has memorized the training data (including noise) instead of learning generalizable patterns
- Signs: Very high training accuracy, low test/validation accuracy
- Solutions:
- More training data
- Regularization (L1, L2, Dropout)
- Simpler model architecture
- Data augmentation
- Early stopping
- Cross-validation
Underfitting
- What: Model performs poorly on BOTH training and test data
- Cause: Model is too simple to capture the underlying patterns in the data
- Signs: Low training accuracy AND low test accuracy
- Solutions:
- More complex model
- More features / better feature engineering
- Longer training (more epochs)
- Reduce regularization
Bias and Variance
- Bias: Error from simplifying assumptions — high bias = underfitting (model is too simple)
- Variance: Error from sensitivity to training data — high variance = overfitting (model is too complex)
- Bias-Variance Tradeoff: Increasing model complexity reduces bias but increases variance (and vice versa). The goal is to find the sweet spot.
Regularization
- What: Techniques to prevent overfitting by adding constraints to the model
- Methods:
- L1 (Lasso): Adds absolute value of weights as penalty; can shrink some weights to exactly zero, effectively performing feature selection
- L2 (Ridge): Adds squared weights as penalty; shrinks weights but doesn't zero them
- Dropout: Randomly deactivates neurons during training (used in neural networks)
- Early Stopping: Stop training when validation performance starts declining
Exam Tip: Overfitting = model is too complex (memorizes data). Underfitting = model is too simple. Regularization prevents overfitting. These are fundamental concepts that will appear on the exam.
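The shrinking effect of an L2 penalty can be shown on a toy one-weight loss. Without the penalty, the loss (w - 3)^2 is minimized at w = 3; adding lambda * w^2 pulls the optimum toward zero (closed form: w* = 3 / (1 + lambda)). The loss and values are invented purely to make the effect visible.

```python
# L2 regularization sketch: gradient descent on (w - 3)^2 + lam * w^2.

def best_weight(lam, steps=500, lr=0.1):
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3.0) + 2 * lam * w   # d/dw of loss + L2 penalty
        w -= lr * grad                        # gradient descent step
    return w

print(round(best_weight(lam=0.0), 3))  # 3.0 -- no regularization
print(round(best_weight(lam=1.0), 3))  # 1.5 -- the penalty shrinks the weight
```

Smaller weights mean a less flexible model, which is precisely how L2 trades a little bias for less variance and reduced overfitting.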
Phases of a Machine Learning Project
| Phase | Description | Key Activities |
|---|---|---|
| 1. Business Problem Definition | Define the problem in ML terms | Identify objectives, success metrics, constraints |
| 2. Data Collection | Gather relevant data | Identify sources, collect raw data, assess data availability |
| 3. Data Preparation | Clean and transform data | Handle missing values, remove duplicates, normalize, encode categorical variables |
| 4. Feature Engineering | Create informative features | Select features, create new features, transform variables |
| 5. Model Selection | Choose appropriate algorithm | Compare algorithms, consider problem type, evaluate complexity |
| 6. Model Training | Train the model on data | Split data, set hyperparameters, train, validate |
| 7. Model Evaluation | Assess model performance | Use test data, calculate metrics, check for bias |
| 8. Model Deployment | Put model into production | Set up endpoints, integrate with applications |
| 9. Model Monitoring | Track performance over time | Monitor drift, track accuracy, retrain when needed |
Exam Tip: Know the ML lifecycle phases and what happens in each. The exam may ask about the correct order or which activity belongs to which phase. Data preparation typically takes the most time (60-80% of a project).
ML Algorithms
ResNet (Residual Network)
- What: A deep CNN architecture that uses skip connections (residual connections) to train very deep networks
- Key Innovation: Skip connections allow gradients to flow through the network without vanishing, enabling networks with 100+ layers
- Use Cases: Image classification, object detection, feature extraction
- Significance: Won the ImageNet competition in 2015; foundational architecture for modern computer vision
SVM (Support Vector Machine)
- What: A supervised learning algorithm that finds the optimal hyperplane to separate classes with the maximum margin
- Key Characteristics:
- Works well with high-dimensional data
- Effective even with limited training data
- Can handle non-linear boundaries using kernel functions
- Use Cases: Text classification, image classification, bioinformatics
- Kernel Functions: Linear, Polynomial, RBF (Radial Basis Function)
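The RBF kernel mentioned above is easy to sketch: it turns the distance between two points into a similarity in (0, 1], with `gamma` controlling how fast similarity decays. Points and the `gamma` value are illustrative.

```python
# RBF (Radial Basis Function) kernel: similarity = exp(-gamma * ||x - y||^2).
import math

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0 -- identical points
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # near 0 -- distance 5 apart
```

Replacing plain dot products with such a kernel is what lets an SVM separate classes with a non-linear boundary while still solving a linear problem internally.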
GAN (Generative Adversarial Network)
- What: A deep learning architecture with two competing neural networks: a Generator (creates fake data) and a Discriminator (distinguishes real from fake)
- How It Works:
- Generator creates synthetic data (images, text, etc.)
- Discriminator tries to distinguish real data from generated data
- Both networks improve through competition (adversarial training)
- Generator eventually produces data indistinguishable from real data
- Use Cases:
- Image generation and synthesis
- Data augmentation (generate more training data)
- Super-resolution (enhance image quality)
- Style transfer
- Deepfake creation/detection
- Key Concept: GANs are part of the generative AI family, predating transformer-based models
Exam Tip: GAN = Generator vs. Discriminator (two competing networks). If the question mentions adversarial training or two networks competing to generate realistic data, think GAN. GANs are a form of generative AI distinct from transformer-based LLMs.