Supervised learning is one of the most prominent paradigms of machine learning. This approach is termed “supervised” because the learning process is guided by labeled data, where each input has a corresponding, correct output. The primary goal is to generalize this mapping so that the model performs well on unseen data.

In supervised learning, the model learns by comparing its predictions with the actual labels provided in the training data. This feedback allows it to minimize errors and improve accuracy.

The key concepts of supervised learning are:

  1. Input Data and Features
    Input data refers to the raw data or observations used to train the model. Each observation contains features, which are specific variables that describe the data. For example, in a housing price prediction task, features might include the size of the house, its location, and the number of bedrooms.

  2. Labeled Data
    Labeled data includes both the input features and the corresponding target output. The labels are often human-annotated or derived from existing datasets. For example, in an email classification task, labeled data might consist of emails (input) and their categories, such as “spam” or “not spam” (output).

  3. Learning Process
    The supervised learning process involves training a model to minimize the difference between its predictions and the actual labels.

    This is achieved by:

    • Defining a loss function that quantifies the error.
    • Using optimization techniques, like gradient descent, to iteratively adjust the model’s parameters to minimize the loss. 
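The loop described above can be sketched from scratch. This is a minimal illustration, not a production implementation: a one-feature linear model fitted by gradient descent on an MSE loss, with made-up data and an assumed learning rate.

```python
# Training loop sketch: linear model y = w*x + b, MSE loss,
# parameters updated by gradient descent.

def train(xs, ys, lr=0.01, epochs=1000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the loss
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Illustrative data generated from y = 2x + 1; the fit should recover
# w close to 2 and b close to 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = train(xs, ys)
```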
  4. Model Generalization
    Generalization refers to a model’s ability to perform accurately on unseen data. A good supervised learning model avoids overfitting (memorizing training data) and underfitting (failing to capture underlying patterns). Techniques like cross-validation, regularization, and hyperparameter tuning help improve generalization.
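One simple way to estimate generalization is a holdout split: train on part of the labeled data and measure error only on the part the model never saw. The sketch below shows the split itself; the 80/20 ratio and the toy dataset are illustrative assumptions.

```python
import random

def train_test_split(pairs, test_ratio=0.2, seed=0):
    # Shuffle a copy of the data with a fixed seed for reproducibility,
    # then cut off the last test_ratio fraction as the held-out set.
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = [(x, 2 * x + 1) for x in range(10)]
train_set, test_set = train_test_split(data)
# 8 examples for training, 2 held out for evaluation
```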

  5. Evaluation Metrics
    The performance of supervised learning models is measured using metrics such as:

    • Accuracy: The proportion of correct predictions out of total predictions.
    • Precision and Recall: Metrics that evaluate the performance of classification models in imbalanced datasets.
    • Mean Squared Error (MSE): A common metric for regression tasks, measuring the average squared difference between predicted and actual values.
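The three metrics above can be computed from scratch as follows; the label vectors are invented for illustration.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN)
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

def mse(y_true, y_pred):
    # Average squared difference between predicted and actual values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
acc = accuracy(y_true, y_pred)               # 4 of 6 correct
prec, rec = precision_recall(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```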

Supervised learning can be broadly categorized into two types:

       1. Classification
  • Classification predicts discrete labels or categories for given input data. The algorithm learns from labeled data, mapping input features to predefined classes. The output is categorical (e.g., binary or multi-class).

            For example: a) email classification (spam vs. not spam); b) image recognition (cat, dog, car, etc.).

            Common algorithms used for classification include:

                  a) Logistic Regression

                  b) Support Vector Machine

                  c) Random Forest

                  d) Decision Tree

                  e) K-Nearest Neighbors (KNN)

                  f) Naïve Bayes

                  g) Ensemble techniques
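One of the listed algorithms, K-Nearest Neighbors, is simple enough to sketch from scratch: predict the majority label among the k training points closest to the query. The toy data and k=3 below are illustrative assumptions.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # train is a list of (features, label) pairs.
    # Euclidean distance between two feature tuples:
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # Take the k training points nearest to the query...
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    # ...and return the most common label among them.
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [((1, 1), "spam"), ((1, 2), "spam"), ((2, 1), "spam"),
         ((8, 8), "not spam"), ((9, 8), "not spam")]
label = knn_predict(train, (1.5, 1.5))  # all 3 nearest points are "spam"
```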

 
       2. Regression
  • Regression predicts continuous numerical values based on input data. It finds the relationship between independent variables (features) and a dependent variable (target) by fitting a curve or line that best describes the data.

            For example: a) predicting house prices based on location and size; b) forecasting stock prices.

            Common algorithms used for regression include:

                  a) Linear Regression

                  b) Polynomial Regression

                  c) Ridge Regression

                  d) Lasso Regression

                  e) Decision Tree

                  f) Random Forest
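The simplest listed algorithm, linear regression with one feature, has a closed-form least-squares solution: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x). The house-price-style numbers below are invented for illustration.

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single feature
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

# e.g. house size (100s of sq ft) vs. price (in $1000s);
# data generated exactly from price = 14 * size + 60
sizes = [10, 15, 20, 25, 30]
prices = [200, 270, 340, 410, 480]
slope, intercept = fit_line(sizes, prices)
```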