Deep Learning
(BCSML0552)
Dr. Kumod kr. Gupta
(Associate Professor)
AI Department
Unit: I
INTRODUCTION
Course Details
(B. Tech. 5th Sem)
Noida Institute of Engineering and Technology
Faculty Introduction
Name: Dr. Kumod Kr. Gupta
Qualification: Ph.D., M. Tech
Designation: Associate Professor
Department: AI
Total Experience: 17 years
NIET Experience: 12 years
Subjects Taught: Python Basics, Advanced Python, ML, DL
Evaluation Scheme
Bachelor of Technology
Computer Science And Engineering (Artificial Intelligence & Machine Learning)
EVALUATION SCHEME, SEMESTER-VI

Sl. No. | Subject Code | Subject Name | Periods (L T P) | Sessional (CT TA Total / PS) | End Semester (TE / PE) | Total | Credit
1 | ACSML0602 | Deep Learning | 3 0 0 | 30 20 50 | 100 | 150 | 3
2 | ACSML0603 | Advanced Database Management Systems | 3 1 0 | 30 20 50 | 100 | 150 | 4
3 | ACSE0603 | Software Engineering | 3 0 0 | 30 20 50 | 100 | 150 | 3
4 | | Departmental Elective-III | 3 0 0 | 30 20 50 | 100 | 150 | 3
5 | | Departmental Elective-IV | 3 0 0 | 30 20 50 | 100 | 150 | 3
6 | | Open Elective-I | 3 0 0 | 30 20 50 | 100 | 150 | 3
7 | ACSML0652 | Deep Learning Lab | 0 0 2 | 25 25 | | 50 | 1
8 | ACSML0653 | Advanced Database Management Systems Lab | 0 0 2 | 25 25 | | 50 | 1
9 | ACSE0653 | Software Engineering Lab | 0 0 2 | 25 25 | | 50 | 1
10 | ACSE0659 | Mini Project | 0 0 2 | 50 | | 50 | 1
11 | ANC0602 / ANC0601 | Essence of Indian Traditional Knowledge / Constitution of India, Law and Engineering (Non-Credit) | 2 0 0 | 30 20 50 | 50 | 100 |
12 | | MOOCs (For B.Tech. Hons. Degree) | | | | |
GRAND TOTAL | | | | | | 1100 | 23
Course Contents / Syllabus
Module 1 Introduction 14 hours
Model Improvement and Performance: Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting, Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value,
Classification - Precision, Recall, F1, Other topics, K-Fold Cross validation, ROC curve, Hyper-Parameter Tuning Introduction – Grid search, random search, Introduction to Deep Learning. Artificial Neural
Network: Neuron, Nerve structure and synapse, Artificial Neuron and its model, activation functions, Neural network architecture: Single layer and Multilayer feed forward networks, recurrent networks.
Various learning techniques; Perceptron and Convergence rule, Hebb Learning. Perceptron, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation
Algorithm
Module 2 CONVOLUTION NEURAL NETWORK 14 hours
What is computer vision? Why Convolutions (CNN)? Introduction to CNN, Train a simple convolutional neural net, Explore the design space for convolutional nets, Pooling layer motivation in CNN, Design a
convolutional layered application, Understanding and visualizing a CNN, Transfer learning and fine-tuning CNN, Image classification, Text classification, Image classification and hyper-parameter tuning,
Emerging NN architectures.
Module 3 DETECTION & RECOGNITION 14 hours
Padding & Edge Detection, Strided Convolutions, Networks in Networks and 1x1 Convolutions, Inception Network Motivation, Object Detection, YOLO Algorithm.
Module 4 RECURRENT NEURAL NETWORKS 15 hours
Why use sequence models? Recurrent Neural Network Model, Notation, Backpropagation through time (BPTT), Different types of RNNs, Language model and sequence generation, Sampling novel sequences,
Vanishing gradients with RNNs, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional RNN, Deep RNNs
Module 5 AUTO ENCODERS IN DEEP LEARNING 15 hours
Auto-encoders and unsupervised learning, Stacked auto-encoders and semi-supervised learning, Regularization - Dropout and Batch normalization.
Syllabus
Syllabus
UNIT-I: Model Improvement and Performance
Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting,
Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value, Classification -
Precision, Recall, F1, Other topics, K-Fold Cross validation, ROC curve, Hyper-Parameter
Tuning Introduction – Grid search, random search, Introduction to Deep Learning.
Artificial Neural Network: Neuron, Nerve structure and synapse, Artificial Neuron and its
model, activation functions, Neural network architecture: Single layer and Multilayer feed
forward networks, recurrent networks. Various learning techniques; Perceptron and
Convergence rule, Hebb Learning. Perceptron, Multilayer perceptron, Gradient descent and
the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm.
Syllabus
UNIT-II: CONVOLUTION NEURAL NETWORK
What is computer vision? Why Convolutions (CNN)?
Introduction to CNN, Train a simple convolutional neural net, Explore the design space for convolutional nets,
Pooling layer motivation in CNN, Design a convolutional layered application, Understanding and visualizing a
CNN, Transfer learning and fine-tuning CNN, Image classification, Text classification, Image classification
and hyper-parameter tuning, Emerging NN architectures
Syllabus
UNIT-III: DETECTION & RECOGNITION
Padding & Edge Detection, Strided Convolutions, Networks in Networks and
1x1 Convolutions, Inception Network Motivation, Object Detection, YOLO Algorithm.
Syllabus
UNIT-IV: RECURRENT NEURAL NETWORKS
Why use sequence models? Recurrent Neural Network Model, Notation, Back-propagation
through time (BPTT), Different types of RNNs, Language model and sequence generation,
Sampling novel sequences, Vanishing gradients with RNNs, Gated Recurrent Unit (GRU),
Long Short-Term Memory (LSTM), Bidirectional RNN, Deep RNNs
Syllabus
UNIT-V: AUTO ENCODERS IN DEEP LEARNING
Auto-encoders and unsupervised learning, Stacked auto-encoders and semi-supervised
learning,
Regularization - Dropout and Batch normalization.
Course Objective
To be able to learn unsupervised techniques and provide continuous
improvement in accuracy and outcomes of various datasets with more reliable
and concise analysis results.
Course Outcome (CO)
At the end of the course, the student will be able to:
CO1: Analyze ANN model and understand the ways of accuracy measurement. (K4)
CO2: Develop a convolutional neural network for multi-class classification in images. (K6)
CO3: Apply Deep Learning algorithms to detect and recognize an object. (K3)
CO4: Apply RNNs to Time Series Forecasting, NLP, Text and Image Classification. (K4)
CO5: Apply lower-dimensional representation over higher-dimensional data for dimensionality reduction and capture the important features of an object. (K3)
(KL = Bloom's Knowledge Level)
Program Outcomes (POs)
Engineering Graduates will be able to:
PO1 : Engineering Knowledge
PO2 : Problem Analysis
PO3 : Design/Development of solutions
PO4 : Conduct Investigations of complex problems
PO5 : Modern tool usage
PO6 : The engineer and society
Program Outcomes (POs)
Engineering Graduates will be able to:
PO7 : Environment and sustainability
PO8 : Ethics
PO9 : Individual and teamwork
PO10 : Communication
PO11 : Project management and finance
PO12 : Life-long learning
CO-PO Mapping
CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 3 3 3 3 2 2 1 - 1 - 2 2
CO2 3 3 3 3 2 2 1 - 1 1 2 2
CO3 3 3 3 3 3 2 2 - 2 1 2 3
CO4 3 3 3 3 3 2 2 1 2 1 2 3
CO5 3 3 3 3 3 2 2 1 2 1 2 2
AVG 3.0 3.0 3.0 3.0 2.6 2.0 1.6 0.4 1.6 0.8 2.0 2.4
Result Analysis 2022-2023 (Even semester )
Institute Result
Pattern of Online External Exam Question Paper (100 marks)
Model Improvement and Performance:
• Curse of Dimensionality,
• Bias and Variance Trade off
• Overfitting and underfitting,
• Regression - MAE, MSE, RMSE,
• R Squared, Adjusted R Squared, p-Value,
• Classification - Precision, Recall, F1,
• Other topics, K-Fold Cross validation,
• ROC curve,
• Hyper-Parameter Tuning Introduction –
Grid search, random search,
• Introduction to Deep Learning.
Artificial Neural Network:
• Neuron, Nerve structure and synapse,
• Artificial Neuron and its model,
• activation functions,
• Neural network architecture: Single
layer and Multilayer feed forward
networks, recurrent networks.
• Various learning techniques; Perceptron
and Convergence rule, Hebb Learning.
Perceptron, Multilayer perceptron,
Gradient descent and the Delta rule,
• Multilayer networks,
• Derivation of Backpropagation
Algorithm.
Unit I Content
Analyze ANN model and understand the ways of accuracy measurement.
Unit I Objective
• Python, Basic Modeling Concepts
Topic Prerequisites
To be able to learn unsupervised techniques and provide continuous improvement in accuracy
and outcomes of various datasets with more reliable and concise analysis results.
Analyze ANN model and understand the ways of accuracy measurement.
Topic Objective
Model Improvement and Performance
Unit 1 Introduction
Curse of Dimensionality,
Bias and Variance Trade off,
Overfitting and underfitting,
Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value,
Classification - Precision, Recall, F1, Other topics, K-Fold Cross validation, RoC curve,
Hyper-Parameter Tuning Introduction – Grid search, random search,
Introduction to Deep Learning.
• Increasing the number of features will not always improve
classification accuracy.
• In practice, the inclusion of more features might actually lead
to worse performance.
• The number of training examples required increases
exponentially with the dimensionality d: with k intervals per
feature there are k^d bins to fill (e.g., for k = 3 there are
3^1 = 3 bins in one dimension, 3^2 = 9 in two, and 3^3 = 27 in three).
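A quick sketch (not from the slides) of how fast the bin count k^d grows; plain Python, purely illustrative:

# Number of bins k^d grows exponentially with dimensionality d
k = 3                              # intervals per feature
for d in range(1, 6):
    print(f"d={d}: {k**d} bins")   # 3, 9, 27, 81, 243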
CURSE OF DIMENSIONALITY
CURSE OF DIMENSIONALITY
Problem | Effect in High Dimensions
Data sparsity | Most of the possible values or combinations of features are empty or have very few data points, so it is hard to find dense regions or clusters; neighborhood methods (k-NN) fail.
Overfitting | Too many features → the model memorizes noise rather than learning patterns.
Distance metrics degrade | Distances between points become similar, reducing discrimination power.
Exponential growth of computation | More features mean heavier calculation and storage requirements.
Increased sample requirement | Exponentially more samples are needed to maintain statistical significance.
• What is the objective?
– Choose an optimum set of features of lower dimensionality to improve classification
accuracy.
• Different methods can be used to reduce dimensionality:
– Feature extraction
– Feature selection
Dimensionality Reduction (CO1)
Dimensionality Reduction (CO1)
There are two components of dimensionality reduction:
•Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a
smaller subset which can be used to model the problem. It usually involves three ways:
• Filter
• Wrapper
• Embedded
•Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional space, i.e. a
space with a smaller number of dimensions.
Methods of Dimensionality Reduction
The various methods used for dimensionality reduction include:
•Principal Component Analysis (PCA)
•Linear Discriminant Analysis (LDA)
•Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be both linear and non-linear, depending upon the method used. The
prime linear method, called Principal Component Analysis, or PCA, is discussed below.
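A minimal PCA sketch with scikit-learn; the Iris dataset is an illustrative choice, not part of the slides:

# Project 4-dimensional Iris data onto its 2 strongest principal components
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)        # 150 samples, 4 features
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)         # shape (150, 2)
print(pca.explained_variance_ratio_)     # variance captured by each component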
Dimensionality Reduction (CO1)
Types of Wrapper Methods (a code sketch of RFE follows the table)
Type | How it Works
Forward Selection | Start with no features → add one at a time → keep if performance improves.
Backward Elimination | Start with all features → remove one at a time → drop if performance improves or stays the same.
Recursive Feature Elimination (RFE) | Train model → remove least important feature(s) → repeat until desired number remains.
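A minimal Recursive Feature Elimination sketch with scikit-learn; the estimator and dataset are illustrative assumptions:

# RFE: repeatedly drop the least important feature until 10 remain
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10)
rfe.fit(X, y)
print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # 1 = selected; higher = eliminated earlier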
Dimensionality Reduction (CO1)
Advantages of Dimensionality Reduction
•It helps in data compression, and hence reduces the required storage space.
•It reduces computation time.
•It also helps remove redundant features, if any.
•Improved Visualization: High dimensional data is difficult to visualize, and dimensionality reduction
techniques can help in visualizing the data in 2D or 3D, which can help in better understanding and analysis.
•Overfitting Prevention: High dimensional data may lead to overfitting in machine learning models, which can
lead to poor generalization performance. Dimensionality reduction can help in reducing the complexity of the
data, and hence prevent overfitting.
•Feature Extraction: Dimensionality reduction can help in extracting important features from high dimensional
data, which can be useful in feature selection for machine learning models.
•Data Preprocessing: Dimensionality reduction can be used as a preprocessing step before applying machine
learning algorithms to reduce the dimensionality of the data and hence improve the performance of the model.
•Improved Performance: Dimensionality reduction can help in improving the performance of machine learning
models by reducing the complexity of the data, and hence reducing the noise and irrelevant information in the
data.
Dimensionality Reduction (CO1)
Disadvantages of Dimensionality Reduction
•It may lead to some amount of data loss.
•PCA tends to find linear correlations between variables, which is sometimes undesirable.
•PCA fails in cases where mean and covariance are not enough to define datasets.
•We may not know how many principal components to keep; in practice, some rules of thumb are applied.
•Interpretability: The reduced dimensions may not be easily interpretable, and it may be difficult to
understand the relationship between the original features and the reduced dimensions.
•Overfitting: In some cases, dimensionality reduction may lead to overfitting, especially when the number
of components is chosen based on the training data.
•Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to outliers, which can result
in a biased representation of the data.
•Computational complexity: Some dimensionality reduction techniques, such as manifold learning, can be
computationally intensive, especially when dealing with large datasets.
Bias-Variance Tradeoff (CO1)
• It is important to understand prediction errors (bias and variance) when it comes to accuracy in any
machine-learning algorithm.
• There is a tradeoff between a model's ability to minimize bias and variance; balancing the two is what
guides choices such as the value of the regularization constant.
• A proper understanding of these errors would help to avoid the overfitting and underfitting of a data set
while training the algorithm.
Bias(CO1)
What is Bias?
• The bias is known as the difference between the prediction of the values by the Machine Learning model
and the correct value.
• Being high in biasing gives a large error in training as well as testing data.
• It is recommended that an algorithm should always be low-biased to avoid the problem of underfitting.
• With high bias, the predictions follow a straight-line form that does not fit the data set accurately. Such
fitting is known as Underfitting of Data. This happens when the hypothesis is too simple or
linear in nature.
High Bias in the Model
Variance(CO1)
What is Variance?
• The variability of model prediction for a given data point which tells us the spread of our data is called the
variance of the model.
• The model with high variance has a very complex fit to the training data and thus is not able to fit accurately
on the data which it hasn’t seen before. As a result, such models perform very well on training data but
have high error rates on test data.
• When a model is high on variance, it is said to overfit the data.
• Overfitting fits the training set accurately via a complex curve and high-order hypothesis, but it is not the
solution as the error on unseen data is high. While training a model, variance should be kept low. The
high-variance case looks as follows.
High Variance in the Model
Variance(CO1)
Bias and Variance Trade-Off
Bias- Variance trade off (CO1)
Bias- Variance Trade-off
Bias and variance should be low
• In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data.
These models usually have high bias and low variance. It happens when we have too little data to
build an accurate model, or when we try to fit a linear model to nonlinear data. Models that are too
simple to capture complex patterns in data, like linear and logistic regression, are prone to it.
Underfitting(CO1)
Reasons for Underfitting
1. High bias and low variance.
2. The size of the training dataset used is not enough.
3. The model is too simple.
4. Training data is not cleaned and contains noise.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or the duration of training to get better results.
• In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern
in data. It happens when we train our model too long over a noisy dataset. These models have low bias and high
variance. Very complex models, like decision trees, are prone to overfitting.
Overfitting(CO1)
Overfitting is a problem where the performance of a machine learning algorithm on training data differs from its
performance on unseen data.
Reasons for Overfitting:
1. High variance and low bias.
2. The model is too complex.
3. The size of the training data is too small.
Techniques to Reduce Overfitting
1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the loss over the training period; as soon as loss
begins to increase, stop training).
4. Ridge Regularization and Lasso Regularization.
5. Use dropout for neural networks to tackle overfitting.
6. Cross-Validation (K-Fold Cross Validation).
7. Batch normalization.
Overfitting(CO1)
Underfitting and Overfitting in Machine Learning
Overfitting(CO1)
Regularization
• The word "regularize" means to make things regular or acceptable.
• That is exactly what we use it for. Regularization is a form of regression used to reduce the error by
fitting a function appropriately on the given training set and avoid overfitting.
• It discourages the fitting of a complex model, thus reducing the variance and chances of overfitting. It
is used in the case of multicollinearity (when independent variables are highly correlated).
Consider the equation of linear regression, and let ŷ = β_0 + β_1 x_1 + ... + β_p x_p be the prediction made.
We also introduced the concept of loss functions. We will use one such loss function here,
the Residual Sum of Squares (RSS), given mathematically as:
RSS = Σ_i (y_i - ŷ_i)^2
Solution of Overfitting(CO1)
Regularization can be of two kinds,
1. Ridge / L2 Regularization
2. Lasso Regression/L1 Regularization
Ridge Regression / L2 Regularization
In this regression, we add a penalty term to the RSS loss function. Our modified loss function now
becomes:
Loss = RSS + λ Σ_j β_j^2
• Here, λ is called the "tuning parameter", which decides how heavily we want to penalize the
flexibility of our model.
• If we look closely, we might observe that if λ = 0, it performs like linear regression.
• As λ → ∞, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates
will approach zero.
• As can be seen, selecting a good value of λ is critical. The penalty used by this
method is based on the "L2 norm" of the coefficients.
Solution of Overfitting(CO1)
Lasso Regression / L1 Regularization
This regression adopts the same idea as Ridge Regression, with a change in
the penalty term. Instead of the squared penalty λ Σ_j β_j^2, we use λ Σ_j |β_j|.
Thus our new loss function becomes:
Loss = RSS + λ Σ_j |β_j|
This penalty is based on the "L1 norm" of the coefficients.
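A minimal sketch of both penalties with scikit-learn, where alpha plays the role of λ; the diabetes dataset is an illustrative assumption:

# Ridge (L2) shrinks coefficients; Lasso (L1) can zero some out entirely
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)   # penalty: alpha * sum(beta_j^2)
lasso = Lasso(alpha=1.0).fit(X, y)   # penalty: alpha * sum(|beta_j|)
print(ridge.coef_)                   # shrunk towards zero
print(lasso.coef_)                   # several coefficients exactly zero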
Solution of Overfitting(CO1)
• Note:
• The tuning parameter λ controls the impact on bias and variance.
• As the value of λ rises, it shrinks the coefficient values and thus reduces the variance.
• Till a point, this increase in λ is beneficial as it is only reducing the variance (hence avoiding overfitting),
without losing any important properties in the data.
• But after a certain value, the model starts losing important properties, giving rise to bias in the model and
thus underfitting. Therefore, the value of λ should be carefully selected.
• λ is optimized using cross-validation (K-Fold Cross Validation)
Solution of Overfitting(CO1)
Regularization:
• The regularization model promotes smoother functions by creating a new criterion function
that relies not only on the training error, but also on algorithmic intricacy.
• Particularly, the new criterion function punishes extremely complex hypotheses; looking for
the minimum in this criterion is to balance error on the training set with complexity.
• Formally, it is possible to write the new criterion as a sum of the error on the training set plus
a regularization term, which depicts constraints or sought-after properties of solutions, e.g.
E_aug = E_train + λ · (complexity penalty).
• The second term penalizes complex hypotheses with large variance.
• When we minimize augmented error function instead of the error on data only, we penalize
complex hypotheses and thus decrease variance.
• When λ is taken too large, only very simple functions are allowed and we risk introducing bias. λ
is optimized using cross-validation
Solution of Overfitting(CO1)
• We consider here the example of the neural network hypotheses class.
• The hypothesis complexity may be expressed through a weight-decay term, e.g. Ω = Σ_i w_i^2.
• The regularizer λΩ encourages smaller weights.
• For small values of weights, the network mapping is approximately linear.
• Relatively large values of weights lead to overfitted mappings with regions of large curvature.
Solution of Overfitting(CO1)
Early Stopping:
• The training of a learning machine corresponds to iterative decrease in the error function defined as
per the training data.
• During a specific training session, this error generally reduces as a function of the number of iterations
in the algorithm.
• Stopping the training before attaining a minimum training error, represents a technique of restricting
the effective hypothesis complexity.
Pruning:
• An alternative solution that sometimes is more successful than early stopping the growth (complexity)
of the hypothesis is pruning the full-grown hypothesis that is likely to be overfitting the training data.
• Pruning is the basis of search in many decision-tree algorithms; weakest branches of large tree
overfitting the training data, which hardly reduce the error rate, are removed.
• Regression analysis is a set of statistical methods used for the estimation of relationships between a
dependent variable and one or more independent variables.
• Regression analysis includes several variations, such as linear, multiple linear, and nonlinear.
• The most common models are simple linear and multiple linear.
• Nonlinear regression analysis is commonly used for more complicated data sets in which the
dependent and independent variables show a nonlinear relationship.
UNIT-1 Regression
• Regression Analysis
– Simple Linear Regression: A model that assesses the relationship
between a dependent variable and an independent variable
Y = mx + c + e
– Where:
• Y – Dependent variable
• x – Independent (explanatory) variable
• c – Intercept
• m – Slope
• e – Residual (error)
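A minimal sketch of estimating m and c by least squares; the data values are illustrative assumptions:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])
m, c = np.polyfit(x, y, deg=1)   # least-squares slope and intercept
e = y - (m * x + c)              # residuals the line does not explain
print(m, c)                      # roughly m ≈ 2, c ≈ 0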
UNIT-1 Regression
• Multiple linear regression analysis is essentially similar to the
simple linear model, with the exception that multiple
independent variables are used in the model.
• The mathematical representation of multiple linear regression
is:
Y = a + bX1 + cX2 + dX3 + ϵ
• Where:
• Y – Dependent variable
• X1, X2, X3 – Independent (explanatory) variables
• a – Intercept
• b, c, d – Slopes
• ϵ – Residual (error)
UNIT-1 Regression
• Loss Function
– A loss function is a way to measure the performance of a model.
– A high loss indicates a badly trained model; a low loss indicates a well trained model.
– The loss function should be as small as possible.
– The loss function is calculated over a single training example:
L = (Actual_Value - Predicted_Value)^2
– The loss function is sometimes also known as the error function.
• Cost Function
– The cost function is calculated over the complete batch of data:
C = (1/n) Σ (Actual_Value - Predicted_Value)^2
UNIT-1 Regression
– Example for Loss and Cost Function
UNIT-1 Regression
Roll No. | CGPA | IQ | Package (Actual_Value) | Predicted_Value | Loss Function
1 | 5.2 | 100 | 6.3 | 6.4 | 0.01
2 | 4.3 | 91 | 4.5 | 5.3 | 0.64
3 | 8.2 | 83 | 6.5 | 5.2 | 1.69
4 | 8.9 | 102 | 5.5 | 8.9 | 11.56
NOTE: The loss function is calculated for each individual record, while the cost
function is calculated for the entire dataset: C = (0.01 + 0.64 + 1.69 + 11.56) / 4 = 3.475.
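A minimal NumPy sketch reproducing the table above:

import numpy as np

actual    = np.array([6.3, 4.5, 6.5, 5.5])   # Package
predicted = np.array([6.4, 5.3, 5.2, 8.9])
loss = (actual - predicted) ** 2             # per-record loss
print(loss)          # [ 0.01  0.64  1.69 11.56]
print(loss.mean())   # cost over the whole batch = 3.475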
• MAE (Mean Absolute Error): MAE is a metric that measures
the average absolute difference between the predicted values
and the actual values. It gives an idea of how far off the
predictions are from the true values, regardless of the direction
of the error.
L = |Actual_Value - Predicted_Value|
C = (1/n) Σ |Actual_Value - Predicted_Value|
UNIT-1 Regression
• Advantages
– Easy to Understand
– Same unit as unit of Actual_Value
– It is robust to outliers: an outlier will not dominate the error, so if the
dataset contains outliers it is better to use MAE instead of MSE
• Disadvantages
– The loss graph is not differentiable (at zero), which makes the Gradient Descent (GD)
algorithm harder to implement.
– To implement GD we need to calculate the sub-gradient.
UNIT-1 Regression
UNIT-1 Regression
Actual values (y): [3, 5, 2, 7]
Predicted values (ŷ): [2.5, 5.5, 2, 8]
MAE = (0.5 + 0.5 + 0 + 1) / 4 = 0.5
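Verifying this value with scikit-learn (a minimal sketch):

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3, 5, 2, 7])
y_pred = np.array([2.5, 5.5, 2, 8])
print(mean_absolute_error(y_true, y_pred))   # (0.5 + 0.5 + 0 + 1) / 4 = 0.5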
• MSE (Mean Squared Error): MSE is a metric that calculates the average squared difference
between the predicted values and the actual values.
• Squaring the errors gives more weight to larger errors, making it useful for penalizing significant
deviations from the true values.
L = (Actual_Value - Predicted_Value)^2
C = (1/n) Σ (Actual_Value - Predicted_Value)^2
UNIT-1 Regression
• Advantages
– Easy to interpret
– The loss function is differentiable, which allows GD to be implemented easily
– One local minimum: the function has a single minimum value that we have to find.
• Disadvantages
– The unit of the error is squared, which makes it confusing to interpret; to express the error in the
original units we have to take the square root of MSE.
– It is not robust to outliers: if the dataset contains outliers, MSE is not useful.
UNIT-1 Regression
• Huber loss
• Huber Loss is useful when a significant share of the data (say, around 25%) are
outliers. With MSE, the fitted graph deviates towards the outliers, distorting the model
for the 75% of the data that is correct; with MAE, the magnitude of that significant 25%
of outlier data is largely ignored. Huber Loss behaves like MSE for small errors and like
MAE for large ones, making it useful in this type of situation.
UNIT-1 Regression
• RMSE
• It quantifies the differences between predicted values and actual values, squaring the errors, taking the
mean, and then finding the square root.
• RMSE provides a clear understanding of the model’s performance, with lower values indicating better
predictive accuracy.
• RMSE is computed by taking the square root of MSE
• An RMSE of zero indicates that the model has a perfect fit
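A minimal sketch computing MSE and RMSE for the same example values used for MAE above:

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3, 5, 2, 7])
y_pred = np.array([2.5, 5.5, 2, 8])
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)                        # ≈ 0.612, in the same units as y
print(mse, rmse)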
UNIT-1 Regression
• RMSE
• The lower the RMSE, the better the model and its predictions.
• A higher RMSE indicates that there is a large deviation from the
residual to the ground truth.
UNIT-1 Regression
• Pros of the RMSE Evaluation Metric:
– RMSE is easy to understand.
– It serves as a heuristic for training models.
– It is computationally simple and easily differentiable which many
optimization algorithms desire.
– RMSE does not penalize the errors as much as MSE does due to the
square root.
• Cons of the RMSE metric:
– Like MSE, RMSE is dependent on the scale of the data. It increases in
magnitude if the scale of the error increases.
– One major drawback of RMSE is its sensitivity to outliers and the
outliers have to be removed for it to function properly.
UNIT-1 Regression
UNIT-1 Regression (Use of MAE, MSE, and RMSE)
• MAE example use: predicting delivery time, demand forecasting,
house prices (when big and small errors should be treated equally).
• MSE example use: medical predictions, credit risk, fault detection
(where a large error is much worse than small ones).
• RMSE example use: weather forecasting, energy load prediction, traffic
prediction (applications where occasional big errors are unacceptable).
Quick Rules:
• MAE: Robust, easy to explain → Good for reporting general accuracy.
• MSE: Sensitive to large errors → Good for training.
• RMSE: Sensitive + interpretable → Good for evaluation.
• R Squared
• R-squared (Coefficient of Determination) is a statistical measure that
quantifies the proportion of the variance in the dependent variable that is
explained by the independent variables in a regression model:
R^2 = 1 - (SSR / SST)
• Where:
– SSR (Sum of Squares Residual) represents the sum of squared differences between
the observed values and the predicted values by the model.
– SST (Total Sum of Squares) represents the sum of squared differences between the
observed values and the mean of the dependent variable.
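A minimal sketch computing R-squared both from its definition and with scikit-learn; the values reuse the earlier example:

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3, 5, 2, 7])
y_pred = np.array([2.5, 5.5, 2, 8])
ssr = np.sum((y_true - y_pred) ** 2)          # 1.5
sst = np.sum((y_true - y_true.mean()) ** 2)   # 14.75
print(1 - ssr / sst)                          # ≈ 0.898
print(r2_score(y_true, y_pred))               # same value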
UNIT-1 Regression
• R-squared ranges between 0 and 1, with the following
interpretations:
– R^2 = 0: The model does not explain any of the variability in the dependent
variable. It's a poor fit.
– 0 < R^2 < 1: The model explains a proportion of the variability. A higher R-squared
indicates a better fit, with 1 indicating a perfect fit where the model
explains all the variability.
– R^2 = 1: The model perfectly predicts the dependent variable based on the
independent variables.
UNIT-1 Regression
• R-squared evaluates regression model fit but has limitations:
• High R-squared doesn't always mean good fit; high value may imply overfitting, lacking
generalization.
• Including more predictors can inflate R-squared, even if they're weak; adjusted R-squared adjusts for
this.
• "Good" R-squared varies by field; lower values acceptable in data-rich areas.
• R-squared may miss fit quality with nonlinearity or outliers.
UNIT-1 Regression
• Adjusted R Squared
Adjusted R^2 = 1 - [(1 - R^2)(n - 1) / (n - k - 1)]
• Where:
– n = the number of points in your data sample.
– k = the number of independent regressors, i.e. the number of variables
in your model, excluding the constant.
UNIT-1 Regression
• Adjusted R Squared
– Adjusted R-squared adjusts the statistic based on the number of independent variables in the
model
– Adjusted R^2 also indicates how well terms fit a curve or line, but adjusts for the number of terms
in a model.
– If you add more and more useless variables to a model, adjusted R-squared will decrease.
– If you add more useful variables, adjusted R-squared will increase.
– Adjusted R^2 will always be less than or equal to R^2.
UNIT-1 Regression
• Adjusted R Squared
– Problem Statement:
• A fund has a sample R-squared value close to 0.5, and it is doubtlessly offering higher risk-
adjusted returns, with a sample size of 50 for 5 predictors. Find the Adjusted R-squared value.
– Sample size n = 50, number of predictors k = 5, sample R-squared = 0.5. Substitute the values
into the equation:
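Working the substitution through:

Adjusted R^2 = 1 - [(1 - R^2)(n - 1) / (n - k - 1)]
             = 1 - [(1 - 0.5)(50 - 1) / (50 - 5 - 1)]
             = 1 - (24.5 / 44) ≈ 0.443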
UNIT-1 Regression
• RMSE (Root Mean Squared Error): RMSE is the square root of the MSE and is
commonly used to express the average magnitude of the prediction errors in the same
units as the dependent variable. It provides a measure of the model's accuracy, and
lower values indicate better performance.
• R Squared (Coefficient of Determination): R-squared is a statistical measure that
represents the proportion of the variance in the dependent variable that is explained by
the independent variables in the regression model. It ranges from 0 to 1, where 1
indicates that the model explains all the variance, and 0 indicates that the model
doesn't explain any of the variance.
UNIT-1 Regression
• Adjusted R Squared: Adjusted R-squared is a modified version
of R-squared that takes into account the number of
independent variables in the model. It penalizes the addition of
irrelevant variables that might artificially inflate the R-squared
value.
• p-Value: The p-value is a measure of the evidence against a null
hypothesis in a statistical hypothesis test. In the context of
regression analysis, p-values are used to determine whether
the coefficients of the independent variables are statistically
significant. A low p-value (typically below a significance level
like 0.05) suggests that the variable has a significant impact on
the dependent variable.
UNIT-1 Regression
• A Fraud Detection Classifier
• Objective: To detect fraud claim
• Assumption:
– The output of your fraud detection model is the probability [0.0–1.0] that a transaction is
fraudulent.
– If this probability is below 0.5, you classify the transaction as non-fraudulent; otherwise, you
classify the transaction as fraudulent.
• Methodology
– Collect 10,000 manually classified transactions, with 300 fraudulent transactions and 9,700 non-
fraudulent transactions.
– You run your classifier on every transaction, predict the class label (fraudulent or non-fraudulent)
and summarise the results in the following confusion matrix:
UNIT-1 Classification
UNIT-1 Classification
What is the Confusion Matrix? A confusion matrix is an n×n matrix that is used for evaluating
the performance of the classification model. For binary classification, the confusion matrix is
a 2×2 matrix. If there are 3 target classes, the confusion matrix is a 3×3 matrix, and so on.
UNIT-1 Classification
Terminologies used in Confusion Matrix
•True Positive → Positive class which is predicted as positive.
•True Negative → Negative class which is predicted as negative.
•False Positive → Negative class which is predicted as positive. [Type I Error]
•False Negative → Positive class which is predicted as negative. [Type II Error]
1. Recall: Recall is a measure of how many positives your model is able to recall
from the data.
Out of all positive records, how many are predicted correctly:
Recall = TP / (TP + FN)
Recall is also known as Sensitivity or TPR (True Positive Rate)
UNIT-1 Classification
2. Precision: Precision is the ratio of correct positive predictions to
the total positive predictions.
Out of all records predicted as positive, how many are actually positive:
Precision = TP / (TP + FP)
UNIT-1 Classification
Example: Cancer Prediction. For this dataset, if the model predicts cancer
records as non-cancer, it is risky: all cancer records should be
predicted correctly.
In this example, the recall metric is more important than precision. The recall
rate should be 100%: all positive records (cancer records) should be predicted
correctly, and False Negatives should be 0.
For this cancer dataset, the recall metric is therefore given more importance while
evaluating the performance of the model.
If non-cancer records are predicted as cancer, it is not as risky.
UNIT-1 Classification
Example: The cancer data set has 100 records, out of which 94 are cancer
records and 6 are non-cancer records. But the model predicts only 90 of the 94
cancer records correctly. Four cancer records are not predicted correctly [4 are
FN], so recall = 90/94 ≈ 95.7%.
UNIT-1 Classification
Precision: Example
Email Spam Filtering: For this dataset, if the model predicts a good email as spam, it is
risky. We don't want any of our good emails to be predicted as spam. So, the precision
metric is given more importance while evaluating this model. False Positives should be 0.
Suppose the spam-filtering dataset has 100 records, out of which 94 are predicted as spam
emails. Only 90 out of those 94 predictions are correct: 4 good emails are classified as
spam. That is risky. The precision rate is 90/94 ≈ 95.7%. It should be 100%: no good emails
should be classified as "Spam", and False Positives should be 0 for this model.
UNIT-1 Classification
F1 Score: The F1 score is the harmonic mean of precision and recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
The F1 score metric is used when you seek a balance between precision and recall.
F1 score vs Accuracy
Accuracy deals with True Positives and True Negatives. It says nothing about
False Positives and False Negatives, so we are not aware of their distribution. If
accuracy is 95%, we don't know how the remaining 5% is distributed between
False Positives and False Negatives.
The F1 Score deals with False Positives and False Negatives. For some models, we want to
know about the distribution of False Negatives and False Positives. For those models,
the F1 Score metric is used for evaluating the performance.
UNIT-1 Classification
UNIT-1 Classification
Accuracy: correctly predicted values out of the total data: Accuracy = (TP + TN) / (TP + TN + FP + FN).
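A minimal sketch tying these metrics together on hypothetical fraud-detection counts (the slides' confusion-matrix figure is not reproduced; TP=270, FN=30, FP=100, TN=9600 are assumed numbers):

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = np.array([1] * 300 + [0] * 9700)   # 300 fraudulent, 9,700 genuine
y_pred = np.array([1] * 270 + [0] * 30 + [1] * 100 + [0] * 9600)
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred))   # 270 / (270 + 100) ≈ 0.73
print(recall_score(y_true, y_pred))      # 270 / (270 + 30) = 0.90
print(f1_score(y_true, y_pred))          # ≈ 0.81, harmonic mean of the two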
• Area Under Curve
• Area Under Curve (AUC) is one of the most widely used metrics for evaluation.
• It is used for binary classification problems.
• The AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen
positive example higher than a randomly chosen negative example.
• Two basic terms used in AUC:
– True Positive Rate (Sensitivity)
– True Negative Rate (Specificity)
UNIT-1 Classification (AUC)
• Area Under Curve
• A few basic terms used in AUC:
– True Positive Rate (Sensitivity): True Positive Rate is defined as TP / (FN + TP). True Positive
Rate corresponds to the proportion of positive data points that are correctly considered as positive,
with respect to all positive data points.
– True Negative Rate (Specificity): True Negative Rate is defined as TN / (FP + TN). True
Negative Rate corresponds to the proportion of negative data points that are correctly considered as
negative, with respect to all negative data points.
UNIT-1 Classification(AUC)
• Area Under Curve
– False Positive Rate : False Positive Rate is defined as FP / (FP+TN). False Positive Rate
corresponds to the proportion of negative data points that are mistakenly considered as
positive, with respect to all negative data points.
• False Positive Rate and True Positive Rate both have values in the range [0, 1].
• FPR and TPR both are computed at varying threshold values such as (0.00, 0.02, 0.04, …., 1.00)
and a graph is drawn.
• AUC is the area under the curve of plot False Positive Rate vs True Positive Rate at different
points in [0, 1].
UNIT-1 Classification (AUC)
• Area Under Curve
• As evident, AUC has a range of [0, 1]. The greater the value, the better is the performance of our
model.
UNIT-1 Classification (AUC)
An ROC curve (receiver operating characteristic curve) is a graph showing the performance
of a classification model at all classification thresholds. This curve plots two parameters:
•True Positive Rate
•False Positive Rate
True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
TPR=TP/(TP+FN)
False Positive Rate (FPR) is defined as follows:
FPR=FP/(FP+TN)
An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the
classification threshold classifies more items as positive, thus increasing both False Positives and
True Positives. The following figure shows a typical ROC curve.
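A minimal sketch computing the ROC curve and AUC with scikit-learn; the labels and scores are illustrative assumptions:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, y_score)     # TPR and FPR per threshold
print(fpr, tpr)
print(roc_auc_score(y_true, y_score))                 # area under the ROC curve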
ROC curve (CO1)
ROC curve (CO1)
AUC-ROC curve
Let’s first understand the meaning of the two
terms ROC and AUC.
•ROC: Receiver Operating Characteristics
•AUC: Area Under Curve
ROC Curve
ROC stands for Receiver Operating Characteristics, and the ROC curve is the graphical representation of the
effectiveness of the binary classification model. It plots the true positive rate (TPR) vs the false positive rate (FPR)
at different classification thresholds.
AUC Curve:
AUC stands for Area Under the Curve, and the AUC curve represents the area under the ROC curve. It measures
the overall performance of the binary classification model. As both TPR and FPR range between 0 to 1, So, the area
will always lie between 0 and 1, and A greater value of AUC denotes better model performance. Our main goal is to
maximize this area in order to have the highest TPR and lowest FPR at the given threshold. The AUC measures the
probability that the model will assign a randomly chosen positive instance a higher predicted probability compared
to a randomly chosen negative instance.
ROC curve (CO1)
TPR and FPR
This is the most common definition that you would have encountered when you would Google AUC-ROC.
Basically, the ROC curve is a graph that shows the performance of a classification model at all possible thresholds(
threshold is a particular value beyond which you say a point belongs to a particular class). The curve is plotted
between two parameters
•TPR – True Positive Rate
•FPR – False Positive Rate
ROC curve (CO1)
• Specificity measures the proportion of actual negative instances that are correctly
identified by the model as negative.
• It represents the ability of the model to correctly identify negative instances. As said
earlier, ROC is nothing but the plot between TPR and FPR across all possible thresholds,
and AUC is the entire area beneath this ROC curve.
[Figure: Sensitivity versus False Positive Rate plot]
ROC curve (CO1)
Lowering the cutoff point increases false positives; raising it increases false negatives.
The ROC curve can be used to determine the cutoff point that optimizes the sensitivity and
specificity of a given test.
ROC curve (CO1)
AUC measures how well a model is able to distinguish between classes.
An AUC of 0.75 means that if we take two data points belonging to separate classes, there is a 75% chance the model
will segregate or rank-order them correctly, i.e. the positive point receives a higher prediction probability than the
negative one (assuming a higher prediction probability means the point would ideally belong to the positive class).
• The concept of the p-value comes from statistics and is widely used in machine learning and data
science.
• The p-value is also used to determine the point of rejection: it provides the
smallest significance level at which the null hypothesis can be rejected.
• It is expressed as a level of significance that lies between 0 and 1. A smaller p-value
means stronger evidence to reject the null hypothesis: if the p-value is very
small, the observed output is feasible but does not lie under the null hypothesis
conditions (H0).
• A p-value of 0.05 is commonly used as the level of significance (α), with the following
two rules of thumb:
– If p-value > 0.05: the large p-value shows that the null hypothesis cannot be rejected.
– If p-value < 0.05: the small p-value shows that the null hypothesis needs to be rejected, and the
result is declared statistically significant.
P-value (CO1)
• Cross-validation is a statistical method used to estimate the skill of machine learning models.
• It is commonly used in applied machine learning to compare and select a model for a given
predictive modeling problem because it is easy to understand, easy to implement, and results in
skill estimates that generally have a lower bias than other methods.
• Cross-validation is a resampling procedure used to evaluate machine learning models on a
limited data sample.
• The procedure has a single parameter called k that refers to the number of groups that a given
data sample is to be split into. As such, the procedure is often called k-fold cross-validation.
• When a specific value for k is chosen, it may be used in place of k in the reference to the
model, such as k=10 becoming 10-fold cross-validation.
k-Fold Cross-Validation
k-Fold Cross-Validation
• Analogy: suppose a student practices 70 algebra questions, but the test has 30 questions,
10 of which are from calculus; a single split like this cannot fairly judge the person's ability.
• Likewise, a single train/test split may be unrepresentative. That is why we go for K-Fold
Cross-Validation, to get reliable results.
k-Fold Cross-Validation
Here K = 5, and the total data is divided into 5 folds. The first time, we use the first fold for
testing and the remaining 80% for training; we repeat this process 5 times, and then take the
average of the 5 results.
https://www.youtube.com/watch?v=gJo0uNL-5Qw
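A minimal 5-fold cross-validation sketch with scikit-learn; the model and dataset are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)          # one accuracy score per fold
print(scores.mean())   # the averaged estimate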
• Hyperparameters in Machine learning/Deep learning are those parameters that are explicitly defined by the
user to control the learning process.
• These hyperparameters are used to improve the learning of the model, and their values are set before
starting the learning process of the model.
• They are usually fixed before the actual training process begins.
• These parameters express important properties of the model such as its complexity or how fast it should
learn.
• Some examples of model hyperparameters include:
• The penalty in Logistic Regression Classifier i.e. L1 or L2 regularization
• The learning rate for training a neural network.
• The C and sigma hyperparameters for support vector machines.
• The k in k-nearest neighbors.
Hyper parameter tuning(CO1)
https://www.geeksforgeeks.org/hyperparameter-tuning/
Hyper parameter tuning(CO1)
Models can have many hyperparameters and finding the best combination of parameters can be
treated as a search problem. The two best strategies for Hyperparameter tuning are:
•GridSearchCV: Grid Search Cross-Validation
•RandomizedSearchCV: Randomized Search Cross-Validation
In general, if the number of combinations is limited enough, we can use the Grid
Search technique. But when the number of combinations increases, we should
try Random Search or Bayes Search, as they are not as computationally expensive.
Grid Search technique (CO1)
GridSearchCV is a brute-force technique for hyperparameter tuning. It trains
the model using all possible combinations of specified hyperparameter values
to find the best-performing setup. It is slow and uses a lot of computing power,
which makes it hard to use with big datasets or many settings.
It works using below steps:
•Create a grid of potential values for each hyperparameter.
•Train the model for every combination in the grid.
•Evaluate each model using cross-validation.
•Select the combination that gives the highest score.
Grid Search technique (CO1)
GridSearchCV
For example,
if we want to set two hyperparameters C and Alpha of the Logistic Regression Classifier model, with
different sets of values. The grid search technique will construct many versions of the model with all
possible combinations of hyperparameters and will return the best one.
As in the image, for C = [0.1, 0.2, 0.3, 0.4, 0.5] and Alpha = [0.1, 0.2, 0.3, 0.4], the combination
of C=0.3 and Alpha=0.2 gives the highest performance score of 0.726, therefore it is selected.
Grid Search technique Code (CO1)
# Necessary imports
import numpy as np
from sklearn.datasets import load_iris   # illustrative dataset so X, y are defined
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Creating the hyperparameter grid
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# Instantiating the logistic regression classifier
logreg = LogisticRegression(max_iter=1000)

# Instantiating the GridSearchCV object
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)
logreg_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Best score is {}".format(logreg_cv.best_score_))
https://www.geeksforgeeks.org/hyperparameter-tuning/
Grid Search technique (CO1)
Drawback: GridSearchCV will go through all the intermediate combinations of hyperparameters
which makes grid search computationally very expensive.
• RandomizedSearchCV
RandomizedSearchCV solves the drawbacks of GridSearchCV, as it goes through only a fixed number
of hyperparameter settings.
• It moves within the grid in a random fashion to find the best set of hyperparameters. This approach
reduces unnecessary computation.
Random Search Code (CO1)
# Necessary imports
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer   # illustrative dataset so X, y are defined
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Creating the hyperparameter distributions
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiating the Decision Tree classifier
tree = DecisionTreeClassifier()

# Instantiating the RandomizedSearchCV object
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))
Random Search Code (CO1)
Output:
Tuned Decision Tree Parameters: {'min_samples_leaf': 5, 'max_depth': 3, 'max_features': 5,
'criterion': 'gini'}
Best score is 0.7265625
• Deep learning is a class of machine learning algorithms that use several layers of nonlinear processing
units for feature extraction and transformation. Each successive layer uses the output from the
previous layer as input.
• Deep neural networks, deep belief networks and recurrent neural networks have been applied to fields
such as computer vision, speech recognition, natural language processing, audio recognition, social
network filtering, machine translation, and bioinformatics, where they have produced results comparable
to, and in some cases better than, human experts.
• Deep Learning Algorithms and Networks:
– are based on the unsupervised learning of multiple levels of features or representations of the data;
higher-level features are derived from lower-level features to form a hierarchical representation;
– use some form of gradient descent for training.
Introduction to Deep Learning(CO1)
Here are just a few examples of deep learning at work:
• A self-driving vehicle slows down as it approaches a
pedestrian crosswalk.
• An ATM rejects a counterfeit bank note.
• A smartphone app gives an instant translation of a
foreign street sign.
• Deep learning is especially well-suited to identification
applications such as face recognition, text translation,
voice recognition, and advanced driver assistance
systems, including lane classification and traffic sign
recognition.
Deep Learning Applications (CO1)
Some other Applications (CO1)
• Speeding up machines • Digital imaging
• Fraud detection • Increasing phone efficiency
In a word, accuracy. Advanced tools and techniques have dramatically improved deep learning
algorithms, to the point where they can outperform humans at classifying images, win against the
world's best Go player, or enable a voice-controlled assistant like Amazon Echo® and Google Home
to find and download that new song you like.
What Makes Deep Learning State-of-the-Art? (CO1)
Three technology enablers make this degree of accuracy possible:
Easy access to massive sets of labeled data: data sets such as
ImageNet and PASCAL VOC are freely available, and are useful for
training on many different types of objects.
What Makes Deep Learning State-of-the-Art? (CO1)
Increased computing power: high-performance GPUs accelerate
the training of the massive amounts of data needed for deep
learning, reducing training time from weeks to hours.
Pretrained models built by experts: models such as AlexNet can be retrained to perform new recognition
tasks using a technique called transfer learning. While AlexNet was trained on 1.3 million high-resolution
images to recognize 1000 different objects, accurate transfer learning can be achieved with much smaller
datasets.
[Figure: nested circles showing DL as a subset of ML, and ML as a subset of AI]
Difference between AI, ML, DL (CO1)
1. Huge amount of data
(Initially we started with ML; its major drawback is that its efficiency degrades with larger data sets.)
(x-axis: number of data points, y-axis: efficiency)
The solution is given by deep learning, which can handle huge amounts of data, whether
structured or unstructured.
2. Complex problems
These basically include real-time data analysis, medical diagnosis systems, etc., which
are handled by deep learning.
Why do we need deep learning? (CO1)
• The term "Artificial neural network" refers to a biologically inspired sub-field of artificial
intelligence modeled after the brain.
• An Artificial neural network is usually a computational network based on biological neural networks
that construct the structure of the human brain.
• Similar to a human brain has neurons interconnected to each other, artificial neural networks also
have neurons that are linked to each other in various layers of the networks. These neurons are known
as nodes.
Artificial Neural Network(CO1)
• The brain is a massively parallel information processing system.
• Our brains are a huge network of processing elements. A typical brain contains a network of 10 billion
neurons.
How do our brains work?(CO1)
Neural Network
To begin understanding deep Learning, We will build up our model
abstractions
• Single Biological Neuron
• Perceptron
• Multi-Layer Perceptron Model
• Deep Learning Neural Network
• A processing element:
Dendrites: Input
Cell body: Processor
Synaptic: Link
Axon: Output
Synapse: Weight
How do our brains work?(CO1)
An artificial neuron is an imitation of a human neuron
How do our brains work?(CO1)
• Dendrites from Biological Neural Network represent inputs in
Artificial Neural Networks, cell nucleus represents Nodes,
synapse represents Weights, and Axon represents Output.
• Relationship between Biological neural network and artificial
neural network:
Biological Neural Network Artificial Neural Network
Dendrites Inputs
Cell nucleus Nodes
Synapse Weights
Axon Output
Biological neural network and artificial neural network(CO1)
Artificial neural network(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 124–134
Neural Network: step-by-step construction of the network (diagrams shown as images)
The activation function is used to introduce non-linearity.
Dr. Kumod Kumar Gupta Deep Learning Unit I 135
Our basic computational element (model neuron) is often called a node or unit. It receives input from some other
units, or perhaps from an external source. Each input has an associated weight w, which can be modified so as to
model synaptic learning. The unit computes some function f of the weighted sum of its inputs:
The typical Artificial Neural Network looks something like the given figure(CO1)
ARCHITECTURE OF AN ARTIFICIAL NEURAL NETWORK
Dr. Kumod Kumar Gupta Deep Learning Unit I
Artificial Neural Network primarily consists of three layers:
Input Layer:
As the name suggests, it accepts inputs in several different formats
provided by the programmer.
Hidden Layer:
• The hidden layer sits between the input and output layers. It
performs all the computations needed to find hidden features and patterns.
Output Layer:
• The input goes through a series of transformations using the hidden
layer, which finally results in output that is conveyed using this
layer.
• The artificial neural network takes input and computes the weighted
sum of the inputs and includes a bias. This computation is
represented in the form of a transfer function.
ARCHITECTURE OF AN ARTIFICIAL NEURAL NETWORK: the architecture diagrams for this topic were shown as images across several slides.
Dr. Kumod Kumar Gupta Deep Learning Unit I 146
• Bipolar binary and unipolar binary activation functions are called hard-limiting activation functions and are used in the discrete
neuron model.
• Unipolar continuous and bipolar continuous activation functions are called soft-limiting activation functions; they have
sigmoidal characteristics.
Activation function (CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 147–156
Activation function (CO1): graphs of the hard-limiting and soft-limiting activation functions (shown as images)
Dr. Kumod Kumar Gupta Deep Learning Unit I 156
Feedforward Network
• It is a non-recurrent network having processing units/nodes in layers and all the nodes in a
layer are connected with the nodes of the previous layers.
• The connection has different weights upon them.
• There is no feedback loop means the signal can only flow in one direction, from input to
output. It may be divided into the following two types −
Neural network architecture
Dr. Kumod Kumar Gupta Deep Learning Unit I 157
Neural network architecture Cont…(CO1)
• Single layer feedforward network − The concept is of a feedforward ANN having only one
weighted layer. In other words, the input layer is directly and fully connected to the output layer.
Dr. Kumod Kumar Gupta Deep Learning Unit I 158
Single layer Feedforward Network
Neural network architecture Cont…(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 159
• Multilayer feedforward network − The concept is of a feedforward ANN having more than
one weighted layer. As this network has one or more layers between the input and the output
layer, these intermediate layers are called hidden layers.
Neural network architecture Cont…(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 160
Can be used to solve complicated problems
Multilayer feed forward network(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 161
• Feedback Network: As the name suggests, a feedback network has feedback
paths, which means the signal can flow in both directions using loops. This
makes it a non-linear dynamic system, which changes continuously until it
reaches a state of equilibrium.
• Recurrent networks − They are feedback networks with closed loops. It is a
closed loop network in which the output will go to the input again as feedback
as shown in the following diagram.
Neural network architecture Cont…(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 162
When outputs are directed back as inputs to nodes of the same
or a preceding layer, a feedback network is formed.
Feedback network(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 163
• Single node with own feedback
• Competitive nets
• Single-layer recurrent network
• Multilayer recurrent networks
Feedback networks with closed loops are called Recurrent Networks. The response at the (k+1)-th instant depends on
the entire history of the network starting at k = 0.
Automaton: a system with discrete-time inputs and a discrete data representation is called an automaton.
Recurrent network(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 164
FEED FORWARD UNSUPERVISED LEARNING
Hebbian Learning Rule(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 165–173
Hebbian Learning Rule (CO1): worked example of the rule (derivation shown as images)
• The learning signal is equal to the neuron’s output
FEED FORWARD UNSUPERVISED LEARNING
Hebbian Learning Rule(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 174
• Feedforward unsupervised learning
• “When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes
part in firing it, some growth process or metabolic change takes place in one or both cells such that
A’s efficiency in firing B is increased.”
• If oi·xj is positive, the result is an increase in the weight; if it is negative, the weight decreases.
Features of Hebbian Learning(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 175
Final answer:
Hebbian Learning Rule(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 176
• For the same inputs, with the bipolar continuous activation function,
the final updated weight is given by:
Hebbian Learning Rule(CO1)
• Learning signal is the difference between the desired and actual neuron’s
response
• Learning is supervised
Perceptron Learning Rule(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 177
Dr. Kumod Kumar Gupta Deep Learning Unit I 178
Perceptron Learning Rule(CO1)
• Valid only for continuous activation functions
• Used in the supervised training mode
• The learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training patterns
Delta Learning Rule(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 179
The learning rule is derived from the condition of least squared error, E = ½(d − o)² with o = f(net).
Calculating the gradient vector with respect to wi gives ∇E = −(d − o)·f′(net)·x.
Minimization of the error requires the weight changes to be in the negative gradient direction: Δw = c·(d − o)·f′(net)·x.
Delta Learning Rule Contd.(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 180
Dr. Kumod Kumar Gupta Deep Learning Unit I 181
A Multi-Layer Perceptron (MLP) neural network trained using the Backpropagation learning algorithm is
one of the most powerful forms of supervised neural network systems.
The training of such a network involves three stages:
• feedforward of the input training pattern,
• calculation and backpropagation of the associated error
• adjustment of the weights
This procedure is repeated for each pattern over several complete passes (epochs) through the training
set.
After training, application of the net only involves the computations of the feedforward phase.
MLP training algorithm(CO1)
Dr. Kumod Kumar Gupta Deep Learning Unit I 182
Feed Forward phase:
• Xi = input[i]
• Yj = f(bj + Σi Xi·Wij)
• Zk = f(bk + Σj Yj·Wjk)
Backpropagation of errors (for a sigmoid f):
• δk = Zk[1 − Zk](dk − Zk)
• δj = Yj[1 − Yj] Σk δk·Wjk
Weight updating (η = learning rate, α = momentum coefficient):
• Wjk(t+1) = Wjk(t) + η·δk·Yj + α[Wjk(t) − Wjk(t−1)]
• bk(t+1) = bk(t) + η·δk + α[bk(t) − bk(t−1)]
• Wij(t+1) = Wij(t) + η·δj·Xi + α[Wij(t) − Wij(t−1)]
• bj(t+1) = bj(t) + η·δj + α[bj(t) − bj(t−1)]
Backpropagation Learning Algorithm(CO1)
183
• 1. https://nptel.ac.in/courses/117/105/117105084/
• 2. https://nptel.ac.in/courses/106/106/106106184/
• 3. https://nptel.ac.in/courses/108/105/108105103/
• 4. https://www.youtube.com/watch?v=DKSZHN7jftI&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi
• 5. https://www.youtube.com/watch?v=aPfkYu_qiF4&list=PLyqSpQzTE6M9gCgajvQbc68Hk_JKGBAYT
Faculty Video Links, Youtube & NPTEL Video Links and Online Courses Details
Dr. Kumod Kumar Gupta Deep Learning Unit I
184
Quiz
Dr. Kumod Kumar Gupta Deep Learning Unit I
1. Which of these approaches to computing emerged first, in the early days?
(a) Machine learning (b) Artificial intelligence (c) Deep learning (d) none of these
2. In which approach does performance keep improving as the data set grows?
(a) Machine learning (b) Artificial intelligence (c) Deep learning (d) none of these
3. What is TensorFlow?
(a) A library that expresses all the mathematics in the form of a flow graph (b) an artificial intelligence algorithm (c)
a deep learning algorithm (d) none of these
185
Quiz
Dr. Kumod Kumar Gupta Deep Learning Unit I
4. What are the benefits of TensorFlow over other libraries?
(a) Scalability (b) Visualization of data (c) Pipelining (d) all of these
5. What do you mean by pipelining?
(a) Doing the whole work at one time (b) dividing the work into small segments and then
executing them in a parallel manner (c) copying work from another processor (d) none of
these
186
QUIZ
Dr. Kumod Kumar Gupta Deep Learning Unit I
6. What is an API?
(a) A programming interface (b) After programming interface (c)
Application Programming Interface (d) none of these.
7. What is the main operation in TensorFlow?
(a) Computing (b) calculation (c) pipelining (d) passing values and
assigning the output to another tensor.
8. TensorFlow is the product of which company?
(a) Google research team (b) Amazon technical team (c) PayPal
(d) none of these
9. What is the execution speed of a brain neuron?
(a) … (b) … (c) … (d) none of these
187
Weekly Assignment
Dr. Kumod Kumar Gupta Deep Learning Unit I
Q1. For which purpose is a Convolutional Neural Network used?
Q2. What is the biggest advantage of utilizing a CNN?
Q3. Discuss the history of deep learning.
Q4. What is the difference between neural networks and deep learning?
Q5. How can a neural network learn by itself?
Q6. Explain the concept of an ANN with the help of an example.
Q7. Define the term gradient descent. Also discuss its importance.
Q8. Explain the Perceptron Convergence Theorem.
Q9. Define the term bias.
Q10. Why is the ReLU function required?
188
MCQ
Dr. Kumod Kumar Gupta Deep Learning Unit I
Q1. Which neural network has only one hidden layer between the input and output?
A. Shallow neural network
B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
Q2. Which of the following is/are Limitations of deep learning?
A. Data labeling
B. Obtain huge training datasets
C. Both A and B
D. None of the above
189
MCQ
Dr. Kumod Kumar Gupta Deep Learning Unit I
Q3. Deep learning algorithms are _______ more accurate than machine learning algorithms in image
classification.
A. 33%
B. 37%
C. 40%
D. 41%
Q4. Which of the following functions can be used as an activation function in the output layer if we wish
to predict the probabilities of n classes (p1, p2, ..., pn) such that the sum of p over all n classes equals 1?
A. Softmax
B. ReLu
C. Sigmoid
D. Tanh
190
MCQ
Dr. Kumod Kumar Gupta Deep Learning Unit I
Q5. Which of the following would have a constant input in each epoch of training a Deep Learning model?
A. Weight between input and hidden layer
B. Weight between hidden and output layer
C. Biases of all hidden layer neurons
D. Activation function of output layer
6. If during training we do not obtain the accurate output, which value does the neural network
change to get the accurate output?
(a) bias (b) perceptron (c) weight (d) all values can change
7. What are the benefits of using graphs in TensorFlow?
(a) parallelism (b) high execution speed (c) less complexity (d) all of these
8. Between the CPU and the GPU, which has the higher execution speed?
(a) GPU (b) CPU (c) both have the same speed (d) cannot be distinguished
191–192
Old Question Papers (scanned question papers shown as images)
Dr. Kumod Kumar Gupta Deep Learning Unit I
193
Expected Questions for University Exam
Dr. Kumod Kumar Gupta Deep Learning Unit I
Q1. Define Batch Normalization. Why does Batch Normalization help in faster
convergence?
Q2. Define deep learning. Also discuss its importance.
Q3. Discuss the history of deep learning.
Q4. What is the difference between neural networks and deep learning?
Q5. How can a neural network learn by itself?
Q6. Explain the concept of an ANN with the help of an example.
Q7. Define the term gradient descent. Also discuss its importance.
Q8. Explain the Perceptron Convergence Theorem.
Q9. Define the term bias.
Q10. Why is the ReLU function required?
194
Summary
Dr. Kumod Kumar Gupta Deep Learning Unit I
 Deep Learning is a subfield of machine learning concerned with algorithms inspired
by the structure and function of the brain called artificial neural networks.
 If you are just starting out in the field of deep learning or you had some experience
with neural networks some time ago, you may be confused. I know I was confused
initially and so were many of my colleagues and friends who learned and used
neural networks in the 1990s and early 2000s.
195
 1. https://www.slideshare.net/lablogga/deep-learning-explained
 2. Qin, T. (2020). Deep Learning Basics. In Dual Learning (pp. 25–46). Springer, Singapore.
 3. http://people.uncw.edu/chenc/STT592_Deep%20Learning/STT592DeepLearning_Index.html
 4. Gulli, Antonio, and Sujit Pal. Deep Learning with Keras. Packt Publishing Ltd, 2017.
References
Dr. Kumod Kumar Gupta Deep Learning Unit I
Thank You
Dr. Kumod Kumar Gupta Deep Learning Unit I 196
THANK YOU

Unit1_Kumod_deeplearning.pptx DEEP LEARNING

  • 1.
    Dr. Kumod KumarGupta Deep Learning Unit I 1 Deep Learning (BCSML0552) Dr. Kumod kr. Gupta (Associate Professor) AI Department Unit: I INTRODUCTION Course Details (B. Tech. 5th Sem) Noida Institute of Engineering and Technology
  • 2.
    Dr. Kumod KumarGupta Deep Learning Unit I 2 Faculty Introduction Name Dr. Kumod Kr. Gupta Qualification Ph.D., M. Tech Designation Associate Professor Department AI Total Experience 17 years NIET Experience 12 years Subject Taught Python Basics, Advanced Python, ML, DL
  • 3.
    Dr. Kumod KumarGupta Deep Learning Unit I 3 Evaluation Scheme Sl. No. Subject Codes Subject Name Periods Evaluation Scheme End Semester Total Credit L T P CT TA TOTAL PS TE PE 1 ACSML0602 Deep Learning 3 0 0 30 20 50 100 150 3 2 ACSML0603 Advanced Database Management Systems 3 1 0 30 20 50 100 150 4 3 ACSE0603 Software Engineering 3 0 0 30 20 50 100 150 3 4 Departmental Elective-III 3 0 0 30 20 50 100 150 3 5 Departmental Elective-IV 3 0 0 30 20 50 100 150 3 6 Open Elective-I 3 0 0 30 20 50 100 150 3 7 ACSML0652 Deep Learning Lab 0 0 2 25 25 50 1 8 ACSML0653 Advanced Database Management Systems Lab 0 0 2 25 25 50 1 9 ACSE0653 Software Engineering Lab 0 0 2 25 25 50 1 10 ACSE0659 Mini Project 0 0 2 50 50 1 11 ANC0602 / ANC0601 Essence of Indian Traditional Knowledge / Constitution of India, Law and Engineering (Non Credit) 2 0 0 30 20 50 50 100 12 MOOCs (For B.Tech. Hons. Degree) GRAND TOTAL 1100 23 Bachelor of Technology Computer Science And Engineering (Artificial Intelligence & Machine Learning) EVALUATION SCHEME SEMESTER-VI
  • 4.
    Dr. Kumod KumarGupta Deep Learning Unit I 4 Course Contents / Syllabus Module 1 Introduction 14 hours Model Improvement and Performance: Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting, Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value, Classification - Precision, Recall, F1, Other topics, K-Fold Cross validation, RoC curve, Hyper-Parameter Tuning Introduction – Grid search, random search, Introduction to Deep Learning. Artificial Neural Network: Neuron, Nerve structure and synapse, Artificial Neuron and its model, activation functions, Neural network architecture: Single layer and Multilayer feed forward networks, recurrent networks. Various learning techniques; Perception and Convergence rule, Hebb Learning. Perceptron, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm Module 2 CONVOLUTION NEURAL NETWORK 14 hours What is computer vision? Why Convolutions (CNN)? Introduction to CNN, Train a simple convolutional neural net, Explore the design space for convolutional nets, Pooling layer motivation in CNN, Design a convolutional layered application, Understanding and visualizing a CNN, Transfer learning and fine-tuning CNN, Image classification, Text classification, Image classification and hyper-parameter tuning, Emerging NN architectures. Module 3 DETECTION & RECOGNITION 14 hours Padding & Edge Detection, Strided Convolutions, Networks in Networks and 1x1 Convolutions, Inception Network Motivation, Object Detection, YOLO Algorithm. Module 4 RECURRENT NEURAL NETWORKS 15 hours Why use sequence models? Recurrent Neural Network Model, Notation, Backpropagation through time (BTT), Different types of RNNs, Language model and sequence generation, Sampling novel sequences, Vanishing gradients with RNNs, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional RNN, Deep RNNs Module 5 AUTO ENCODERS IN DEEP LEARNING 15 hours Auto-encoders and unsupervised learning, Stacked auto-encoders and semi-supervised learning, Regularization - Dropout and Batch normalization. Syllabus
  • 5.
    Dr. Kumod KumarGupta Deep Learning Unit I 5 Syllabus UNIT-I: Model Improvement and Performance Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting, Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value, Classification - Precision, Recall, F1, Other topics, K-Fold Cross validation, RoC curve, Hyper-Parameter Tuning Introduction – Grid search, random search, Introduction to Deep Learning. Artificial Neural Network: Neuron, Nerve structure and synapse, Artificial Neuron and its model, activation functions, Neural network architecture: Single layer and Multilayer feed forward networks, recurrent networks. Various learning techniques; Perception and Convergence rule, Hebb Learning. Perceptron’s, Multilayer perceptron, Gradient descent and the Delta rule, Multilayer networks, Derivation of Backpropagation Algorithm.
  • 6.
    Dr. Kumod KumarGupta Deep Learning Unit I 6 Syllabus UNIT-II: CONVOLUTION NEURAL NETWORK What is computer vision? Why Convolutions (CNN)? Introduction to CNN, Train a simple convolutional neural net, Explore the design space for convolutional nets, Pooling layer motivation in CNN, Design a convolutional layered application, Understanding and visualizing a CNN, Transfer learning and fine-tuning CNN, Image classification, Text classification, Image classification and hyper-parameter tuning, Emerging NN architectures
  • 7.
    Dr. Kumod KumarGupta Deep Learning Unit I 7 Syllabus UNIT-III:DETECTION & RECOGNITION Padding & Edge Detection, Strided Convolutions, Networks in Networks and 1x1Convolutions, Inception Network Motivation, Object Detection, YOLO Algorithm.
  • 8.
    Dr. Kumod KumarGupta Deep Learning Unit I 8 Syllabus UNIT-IV: RECURRENT NEURAL NETWORKS Why use sequence models? Recurrent Neural Network Model, Notation, Back-propagation through time (BTT), Different types of RNNs, Language model and sequence generation, Sampling novel sequences, Vanishing gradients with RNNs, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Bidirectional RNN, Deep RNNs
  • 9.
    Dr. Kumod KumarGupta Deep Learning Unit I 9 Syllabus UNIT-V: AUTO ENCODERS IN DEEP LEARNING Auto-encoders and unsupervised learning, Stacked auto-encoders and semi-supervised learning, Regularization - Dropout and Batch normalization.
  • 10.
    Dr. Kumod KumarGupta Deep Learning Unit I 10 Course Objective To be able to learn unsupervised techniques and provide continuous improvement in accuracy and outcomes of various datasets with more reliable and concise analysis results.
  • 11.
    Dr. Kumod KumarGupta Deep Learning Unit I 11 Course Outcome (CO) Course Outcome ( CO) At the end of course , the student will be able to: Bloom’s Knowledge Level (KL) CO1 Analyze ANN model and understand the ways of accuracy measurement. K4 CO2 Develop a convolutional neural network for multi-class classification in images K6 CO3 Apply Deep Learning algorithm to detect and recognize an object. K3 CO4 Apply RNNs to Time Series Forecasting, NLP, Text and Image Classification K4 CO5 Apply Lower-dimensional representation over higher- dimensional data for dimensionality reduction and capture the important features of an object. K3
  • 12.
    Dr. Kumod KumarGupta Deep Learning Unit I 12 Program Outcomes (POs) Engineering Graduates will be able to: PO1 : Engineering Knowledge PO2 : Problem Analysis PO3 : Design/Development of solutions PO4 : Conduct Investigations of complex problems PO5 : Modern tool usage PO6 : The engineer and society
  • 13.
    Dr. Kumod KumarGupta Deep Learning Unit I 13 Program Outcomes (POs) Engineering Graduates will be able to: PO7 : Environment and sustainability PO8 : Ethics PO9 : Individual and teamwork PO10 : Communication PO11 : Project management and finance PO12 : Life-long learning
  • 14.
    Dr. Kumod KumarGupta Deep Learning Unit I 14 CO-PO Mapping CO.K PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 CO1 3 3 3 3 2 2 1 - 1 - 2 2 CO2 3 3 3 3 2 2 1 - 1 1 2 2 CO3 3 3 3 3 3 2 2 - 2 1 2 3 CO4 3 3 3 3 3 2 2 1 2 1 2 3 CO5 3 3 3 3 3 2 2 1 2 1 2 2 AVG 3.0 3.0 3.0 3.0 2.6 2.0 1.6 0.4 1.6 0.8 2.0 2.4
  • 15.
    Dr. Kumod KumarGupta Deep Learning Unit I 15 Result Analysis 2022-2023 (Even semester ) Institute Result FACULTY NAME BRANCH/SECTION RESULT
  • 16.
    Dr. Kumod KumarGupta Deep Learning Unit I 16 Pattern of Online External Exam Question Paper (100 marks)
  • 17.
    Dr. Kumod KumarGupta Deep Learning Unit I 17 Pattern of Online External Exam Question Paper (100 marks)
  • 18.
    Dr. Kumod KumarGupta Deep Learning Unit I 18 Pattern of Online External Exam Question Paper (100 marks)
  • 19.
    Dr. Kumod KumarGupta Deep Learning Unit I 19 Pattern of Online External Exam Question Paper (100 marks)
  • 20.
    Dr. Kumod KumarGupta Deep Learning Unit I 20 Pattern of Online External Exam Question Paper (100 marks)
  • 21.
    Dr. Kumod KumarGupta Deep Learning Unit I 21 Pattern of Online External Exam Question Paper (100 marks)
  • 22.
    Dr. Kumod KumarGupta Deep Learning Unit I 22 Model Improvement and Performance: • Curse of Dimensionality, • Bias and Variance Trade off • Overfitting and underfitting, • Regression - MAE, MSE, RMSE, • R Squared, Adjusted R Squared, p-Value, • Classification - Precision, Recall, F1, • Other topics, K-Fold Cross validation, • RoC curve, • Hyper-Parameter Tuning Introduction – Grid search, random search, • Introduction to Deep Learning. Artificial Neural Network: • Neuron, Nerve structure and synapse, • Artificial Neuron and its model, • activation functions, • Neural network architecture: Single layer and Multilayer feed forward networks, recurrent networks. • Various learning techniques; Perception and Convergence rule, Hebb Learning. Perceptron’s, Multilayer perceptron, Gradient descent and the Delta rule, • Multilayer networks, • Derivation of Backpropagation Algorithm. Unit I Content
  • 23.
    Dr. Kumod KumarGupta Deep Learning Unit I 23 Analyze ANN model and understand the ways of accuracy measurement. Unit I Objective
  • 24.
    Dr. Kumod KumarGupta Deep Learning Unit I 24 • Python, Basic Modeling Concepts Topis Prerequisite
  • 25.
    Dr. Kumod KumarGupta Deep Learning Unit I 25 To be able to learn unsupervised techniques and provide continuous improvement in accuracy and outcomes of various datasets with more reliable and concise analysis results. Analyze ANN model and understand the ways of accuracy measurement. Topic Objective
  • 26.
    Dr. Kumod KumarGupta Deep Learning Unit I 26 Model Improvement and Performance Unit 1 Introduction Curse of Dimensionality, Bias and Variance Trade off, Overfitting and underfitting, Regression - MAE, MSE, RMSE, R Squared, Adjusted R Squared, p-Value, Classification - Precision, Recall, F1, Other topics, K-Fold Cross validation, RoC curve, Hyper-Parameter Tuning Introduction – Grid search, random search, Introduction to Deep Learning.
  • 27.
    • Increasing thenumber of features will not always improve classification accuracy. • In practice, the inclusion of more features might actually lead to worse performance. • The number of training examples required increases exponentially with dimensionality d (i.e., kd ). 32 bins 33 bins 31 bins k=3 Dr. Kumod Kumar Gupta Deep Learning Unit I 27 CURSE OF DIMENSIONALITY
  • 28.
    Dr. Kumod KumarGupta Deep Learning Unit I 28 CURSE OF DIMENSIONALITY Problem Effect in High Dimensions Data sparsity • Data Sparsity means that in a given dataset, most of the possible values or combinations of features are empty or have very few data points. • Hard to find dense regions or clusters; neighborhood methods (k-NN) fail. Overfitting Too many features → model memorizes noise rather than learning patterns. Distance metrics degrade Distances between points become similar, reducing discrimination power. Exponential growth of computation More features mean heavier calculations and storage requirements. Increased sample requirement Need exponentially more samples to maintain statistical significance.
  • 29.
    29 • What isthe objective? – Choose an optimum set of features of lower dimensionality to improve classification accuracy. • Different methods can be used to reduce dimensionality: – Feature extraction – Feature selection Dimensionality Reduction (CO1) Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 30.
    30 Dimensionality Reduction (CO1) Dr.Kumod Kumar Gupta Deep Learning Unit I There are two components of dimensionality reduction: •Feature selection: In this, we try to find a subset of the original set of variables, or features, to get a smaller subset which can be used to model the problem. It usually involves three ways: • Filter • Wrapper • Embedded •Feature extraction: This reduces the data in a high dimensional space to a lower dimension space, i.e. a space with lesser no. of dimensions. Methods of Dimensionality Reduction The various methods used for dimensionality reduction include: •Principal Component Analysis (PCA) •Linear Discriminant Analysis (LDA) •Generalized Discriminant Analysis (GDA) Dimensionality reduction may be both linear and non-linear, depending upon the method used. The prime linear method, called Principal Component Analysis, or PCA, is discussed below.
  • 31.
    31 Dimensionality Reduction (CO1) Dr.Kumod Kumar Gupta Deep Learning Unit I Type How it Works Forward Selection Start with no features → add one at a time → keep if performance improves. Backward Elimination Start with all features → remove one at a time → drop if performance improves or stays the same. Recursive Feature Elimination (RFE) Train model → remove least important feature(s) → repeat until desired number remains. Types of Wrapper Methods
  • 32.
    32 Dimensionality Reduction (CO1) Dr.Kumod Kumar Gupta Deep Learning Unit I Advantages of Dimensionality Reduction •It helps in data compression, and hence reduced storage space. •It reduces computation time. •It also helps remove redundant features, if any. •Improved Visualization: High dimensional data is difficult to visualize, and dimensionality reduction techniques can help in visualizing the data in 2D or 3D, which can help in better understanding and analysis. •Overfitting Prevention: High dimensional data may lead to overfitting in machine learning models, which can lead to poor generalization performance. Dimensionality reduction can help in reducing the complexity of the data, and hence prevent overfitting. •Feature Extraction: Dimensionality reduction can help in extracting important features from high dimensional data, which can be useful in feature selection for machine learning models. •Data Preprocessing: Dimensionality reduction can be used as a preprocessing step before applying machine learning algorithms to reduce the dimensionality of the data and hence improve the performance of the model. •Improved Performance: Dimensionality reduction can help in improving the performance of machine learning models by reducing the complexity of the data, and hence reducing the noise and irrelevant information in the data.
  • 33.
    33 Dimensionality Reduction (CO1) Dr.Kumod Kumar Gupta Deep Learning Unit I Disadvantages of Dimensionality Reduction •It may lead to some amount of data loss. •PCA tends to find linear correlations between variables, which is sometimes undesirable. •PCA fails in cases where mean and covariance are not enough to define datasets. •We may not know how many principal components to keep- in practice, some thumb rules are applied. •Interpretability: The reduced dimensions may not be easily interpretable, and it may be difficult to understand the relationship between the original features and the reduced dimensions. •Overfitting: In some cases, dimensionality reduction may lead to overfitting, especially when the number of components is chosen based on the training data. •Sensitivity to outliers: Some dimensionality reduction techniques are sensitive to outliers, which can result in a biased representation of the data. •Computational complexity: Some dimensionality reduction techniques, such as manifold learning, can be computationally intensive, especially when dealing with large datasets.
  • 34.
    Dr. Kumod KumarGupta Deep Learning Unit I 34 Bias-Variance Tradeoff (CO1) • It is important to understand prediction errors (bias and variance) when it comes to accuracy in any machine-learning algorithm. • There is a tradeoff between a model’s ability to minimize bias and variance which is referred to as the best solution for selecting a value of Regularization constant. • A proper understanding of these errors would help to avoid the overfitting and underfitting of a data set while training the algorithm.
  • 35.
    Dr. Kumod KumarGupta Deep Learning Unit I 35 Bias(CO1) What is Bias? • The bias is known as the difference between the prediction of the values by the Machine Learning model and the correct value. • Being high in biasing gives a large error in training as well as testing data. • It recommended that an algorithm should always be low-biased to avoid the problem of underfitting. • By high bias, the data predicted is in a straight line format, thus not fitting accurately in the data in the data set. Such fitting is known as the Underfitting of Data. This happens when the hypothesis is too simple or linear in nature. High Bias in the Model
  • 36.
    Dr. Kumod KumarGupta Deep Learning Unit I 36 Variance(CO1) What is Variance? • The variability of model prediction for a given data point which tells us the spread of our data is called the variance of the model. • The model with high variance has a very complex fit to the training data and thus is not able to fit accurately on the data which it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data. • When a model is high on variance, it is then said to as Overfitting of Data. • Overfitting is fitting the training set accurately via complex curve and high order hypothesis but is not the solution as the error with unseen data is high. While training a data model variance should be kept low. The high variance data looks as follows. High Variance in the Model
  • 37.
    Dr. Kumod KumarGupta Deep Learning Unit I 37 Variance(CO1) Bias and Variance Trade-Off
  • 38.
    Dr. Kumod KumarGupta Deep Learning Unit I 38 Bias- Variance trade off (CO1) Bias- Variance Trade-off Bias and variance should be low
  • 39.
    Dr. Kumod KumarGupta Deep Learning Unit I 39 • In supervised learning, underfitting happens when a model unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have very less amount of data to build an accurate model or when we try to build a linear model with a nonlinear data. Also, these kind of models are very simple to capture the complex patterns in data like Linear and logistic regression. Underfitting(CO1) Reasons for Underfitting 1.High bias and low variance. 2.The size of the training dataset used is not enough. 3.The model is too simple. 4.Training data is not cleaned and also contains noise in it. Techniques to Reduce Underfitting 5.Increase model complexity. 6.Increase the number of features, performing feature engineering. 7.Remove noise from the data. 8.Increase the number of epochs or increase the duration of training to get better results.
  • 40.
    Dr. Kumod KumarGupta Deep Learning Unit I 40 • In supervised learning, Overfitting happens when our model captures the noise along with the underlying pattern in data. It happens when we train our model a lot over noisy dataset. These models have low bias and high variance. These models are very complex like Decision trees which are prone to overfitting. Overfitting(CO1) Overfitting is a problem where the evaluation of machine learning algorithms on training data is different from unseen data. Reasons for Overfitting: 1. High variance and low bias. 2.The model is too complex. 3.The size of the training data. Techniques to Reduce Overfitting 4.Increase training data. 5.Reduce model complexity. 6.Early stopping during the training phase (have an eye over the loss over the training period as soon as loss begins to increase stop training). 7.Ridge Regularization and Lasso Regularization. 8.Use dropout for neural networks to tackle overfitting. 9.Cross- Validation (K- Fold Cross Validation) 10.Batch normalization
  • 41.
    Dr. Kumod KumarGupta Deep Learning Unit I 41 Overfitting(CO1) Underfitting and Overfitting in Machine Learning
  • 42.
    Dr. Kumod KumarGupta Deep Learning Unit I 42 Overfitting(CO1) Regularization • The word “regularize” means to make things regular or acceptable. • This is exactly why we use it for. Regularization is a form of regression used to reduce the error by fitting a function appropriately on the given training set and avoid overfitting. • It discourages the fitting of a complex model, thus reducing the variance and chances of overfitting. It is used in the case of multicollinearity (when independent variables are highly correlated). the equation of Linear Regression. Let be the prediction made. We also introduced the concept of loss functions. We will use one such loss function in this post - Residual Sum of Squares (RSS). It can be mathematically given as:
  • 43.
    Dr. Kumod KumarGupta Deep Learning Unit I 43 Solution of Overfitting(CO1) Regularization can be of two kinds, 1. Ridge / L2 Regularization 2. Lasso Regression/L1 Regularization Ridge Regression / L2 Regularization In this regression, we add a penalty term to the RSS loss function. Our modified loss function now becomes: • Here, λ is called the “tuning parameter” which decides how heavily we want to penalize the flexibility of our model. • If we look closely, we might observe that if λ=0, it performs like linear regression • as λ→inf, the impact of the shrinkage penalty grows, and the ridge regression coe cient estimates ffi will approach zero. • As can be seen, selecting a good value of λ is critical. The coefficient estimates produced by this method are sometimes also known as the “L2 norm”.
  • 44.
    Dr. Kumod KumarGupta Deep Learning Unit I 44 Solution of Overfitting(CO1) Lasso Regression / L1 Regularization This regression adopts the same idea as Ridge Regression with a change in the penalty term. Instead of , we use Thus our new loss function becomes: this is sometimes called the “L1 norm”.
  • 45.
    Dr. Kumod KumarGupta Deep Learning Unit I 45 Solution of Overfitting(CO1) • Note: • The tuning parameter λ controls the impact on bias and variance. • As the value of λ rises, it reduces the value of coefficients and thus reducing the variance. • Till a point, this increase in λ is beneficial as it is only reducing the variance (hence avoiding overfitting), without losing any important properties in the data. • But after a certain value, the model starts losing important properties, giving rise to bias in the model and thus underfitting. Therefore, the value of λ should be carefully selected. • λ is optimized using cross-validation(K –Fold Cross Validation)
  • 46.
    Dr. Kumod KumarGupta Deep Learning Unit I 46 Solution of Overfitting(CO1) Regularization: • The regularization model promotes smoother functions by creating a new criterion function that relies not only on the training error, but also on algorithmic intricacy. • Particularly, the new criterion function punishes extremely complex hypotheses; looking for the minimum in this criterion is to balance error on the training set with complexity. • Formally, it is possible to write the new criterion as a sum of the error on the training set plus a regularization term, which depicts constraints or sought after properties of solutions • The second term penalizes complex hypotheses with large variance. • When we minimize augmented error function instead of the error on data only, we penalize complex hypotheses and thus decrease variance. • When λ is taken too large, only very simple functions are allowed and we risk introducing bias. λ is optimized using cross-validation
  • 47.
    Dr. Kumod KumarGupta Deep Learning Unit I 47 Solution of Overfitting(CO1) • We consider here the example of neural network hypotheses class . • The hypothesis complexity may be expressed as, • The regularizer encourages smaller weights . • For small values of weights, the network mapping is approximately linear. • Relatively large values of weights lead to overfitted mapping with regions of large curvature
  • 48.
    Dr. Kumod KumarGupta Deep Learning Unit I 48 Solution of Overfitting(CO1) Early ­ Stopping:­ • The training of a learning machine corresponds to iterative decrease in the error function defined as per the training data. • During a specific training session, this error generally reduces as a function of the number of iterations in the algorithm. • Stopping the training before attaining a minimum training error, represents a technique of restricting the effective hypothesis complexity. Pruning:­ • An alternative solution that sometimes is more successful than early stopping the growth (complexity) of the hypothesis is pruning the full-grown hypothesis that is likely to be overfitting the training data. • Pruning is the basis of search in many decision-tree algorithms; weakest branches of large tree overfitting the training data, which hardly reduce the error rate, are removed.
  • 49.
    • Regression analysisis a set of statistical methods used for the estimation of relationships between a dependent variable and one or more independent variables. • Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. • The most common models are simple linear and multiple linear. • Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship. UNIT-1 Regression
  • 50.
    • Regression Analysis –Simple Linear Regression: A model that assesses the relationship between a dependent variable and an independent variable Y = mx + c + e – Where: • Y – Dependent variable • x – Independent (explanatory) variable • c – Intercept • m – Slope • e – Residual (error) UNIT-1 Regression
  • 51.
    • Multiple linearregression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model. • The mathematical representation of multiple linear regression is: Y = a + bX1 + cX2 + dX3 + ϵ • Where: • Y – Dependent variable • X1, X2, X3 – Independent (explanatory) variables • a – Intercept • b, c, d – Slopes • ϵ – Residual (error) UNIT-1 Regression
  • 52.
    • Loss Function –Loss function is a way to know the performance of a model. – High Loss function leads to bad train model and low loss function leads to good train model. – loss function should be as minimum as possible. – Loss function calculated over a single training data. L = (Actual_Value - Predicted_Value)2 – Loss function Sometime also known as error function. • Cost Function – Cost function calculated for complete batch of data C = 2 UNIT-1 Regression
  • 53.
    – Example forLoss and Cost Function UNIT-1 Regression Roll No. CGPA IQ Actual_Value Predicted_Value Loss Function Cost Function Package Predicted 1 5.2 100 6.3 6.4 0.01 3.475 2 4.3 91 4.5 5.3 0.64 3 8.2 83 6.5 5.2 1.69 4 8.9 102 5.5 8.9 11.56 NOTE: Loss function calculated for Individual Data while Cost Function calculate for Entire Dataset
  • 54.
    • MAE (MeanAbsolute Error): MAE is a metric that measures the average absolute difference between the predicted values and the actual values. It gives an idea of how far off the predictions are from the true values, regardless of the direction of the error. L = |Actual_Value - Predicted_Value| C = UNIT-1 Regression
  • 55.
    • Advantages – Easyto Understand – Same unit as unit of Actual_Value – It is Robust to Outlier: It means outlier will not affect error, so if there is no outliers in dataset then it better to use MAE instead of MSE • Disadvantages – Grap is not differenciable due which Gradient Descent(GD) algorithm not easy to implement. – To implement GD we need to calculate Sub-Gradient. UNIT-1 Regression
  • 56.
    UNIT-1 Regression Actual values(y): [3, 5, 2, 7,] Predicted values (ŷ): [2.5, 5.5, 2, 8]
  • 57.
    • MSE (MeanSquared Error): MSE is a metric that calculates the average squared difference between the predicted values and the actual values. • Squaring the errors gives more weight to larger errors, making it useful for penalizing significant deviations from the true values. L = (Actual_Value - Predicted_Value)2 C = 2 UNIT-1 Regression
  • 58.
    • Advantages – Easyto interpret – Loss function is differenciable that allows to implement GD easily – One Local Minima: It means function has one minimum value that we have to find. • Disadvantage – Unit of error is Square: That creates an confusion to understand it, so to extract accurate error we have to find square root of MSE. – It is not Robust to Outlier: If dataset consists outliers then. MSE is not useful UNIT-1 Regression
  • 59.
    • Huber loss •Huber Loss is applicable when Outlier data is around 25% because 25% is a significant amount of data and if we use MSE then it will ignore the 75% data which is correct, because graph will deviate towards Outliers and if we use MAE, it will ignore 25% outlier data that is also significant. In this type of situation Huber Loss is useful. UNIT-1 Regression
  • 60.
    • RMSE • Itquantifies the differences between predicted values and actual values, squaring the errors, taking the mean, and then finding the square root. • RMSE provides a clear understanding of the model’s performance, with lower values indicating better predictive accuracy. • RMSE is computed by taking the square root of MSE • RMSE value with zero indicates that the model has a perfect fit UNIT-1 Regression
  • 61.
    • RMSE • Thelower the RMSE, the better the model and its predictions. • A higher RMSE indicates that there is a large deviation from the residual to the ground truth. UNIT-1 Regression
  • 62.
    • Pros ofthe RMSE Evaluation Metric: – RMSE is easy to understand. – It serves as a heuristic for training models. – It is computationally simple and easily differentiable which many optimization algorithms desire. – RMSE does not penalize the errors as much as MSE does due to the square root. • Cons of the RMSE metric: – Like MSE, RMSE is dependent on the scale of the data. It increases in magnitude if the scale of the error increases. – One major drawback of RMSE is its sensitivity to outliers and the outliers have to be removed for it to function properly. UNIT-1 Regression
  • 63.
    UNIT-1 Regression(USE ofMAE, MSE, and RMSE) •MAE Example use: Predicting delivery time, demand forecasting, house prices (when big and small errors should be treated equally). MSE Example use: Medical predictions, credit risk, fault detection (where a large error is much worse than small ones). RMSE Example use: Weather forecasting, energy load prediction, traffic prediction (applications where occasional big errors are unacceptable). Quick Rules: •MAE: Robust, easy to explain → Good for reporting general accuracy. •MSE: Sensitive to large errors → Good for training. •RMSE: Sensitive + interpretable → Good for evaluation.
  • 64.
    • R Squared •R-squared (Coefficient of Determination) is a statistical measure that quantifies the proportion of the variance in the dependent variable that is explained by the independent variables in a regression model. • Where: – SSR (Sum of Squares Residual) represents the sum of squared differences between the observed values and the predicted values by the model. – SST (Total Sum of Squares) represents the sum of squared differences between the observed values and the mean of the dependent variable. UNIT-1 Regression
  • 65.
    • R-squared rangesbetween 0 and 1, with the following interpretations: – =0: The model does not explain any of the variability in the dependent variable. It's a poor fit. – : The model explains a proportion of the variability. A higher R-squared indicates a better fit, with 1 indicating a perfect fit where the model explains all the variability. – =1: The model perfectly predicts the dependent variable based on the independent variables. UNIT-1 Regression
  • 66.
    • R-squared evaluatesregression model fit but has limitations: • High R-squared doesn't always mean good fit; high value may imply overfitting, lacking generalization. • Including more predictors can inflate R-squared, even if they're weak; adjusted R-squared adjusts for this. • "Good" R-squared varies by field; lower values acceptable in data-rich areas. • R-squared may miss fit quality with nonlinearity or outliers. UNIT-1 Regression
  • 67.
    • Adjusted RSquared • Where − – n = the number of points in your data sample. – k = the number of independent regressors, i.e. the number of variables in your model, excluding the constant. UNIT-1 Regression
  • 68.
    • Adjusted RSquared – Adjusted R-squared adjusts the statistic based on the number of independent variables in the model – Adjusted R2 also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model. – If you add more and more useless variables to a model, adjusted r-squared will decrease. – If you add more useful variables, adjusted r-squared will increase. – Adjusted R2 will always be less than or equal to R2 UNIT-1 Regression
  • 69.
    • Adjusted RSquared – Problem Statement − • A fund has a sample R-squared value close to 0.5 and it is doubtlessly offering higher risk adjusted returns with the sample size of 50 for 5 predictors. Find Adjusted R square value. – Sample size = 50 Number of predictor = 5 Sample R - square = 0.5.Substitute the qualities in the equation, UNIT-1 Regression
  • 70.
    • RMSE (RootMean Squared Error): RMSE is the square root of the MSE and is commonly used to express the average magnitude of the prediction errors in the same units as the dependent variable. It provides a measure of the model's accuracy, and lower values indicate better performance. • R Squared (Coefficient of Determination): R-squared is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by the independent variables in the regression model. It ranges from 0 to 1, where 1 indicates that the model explains all the variance, and 0 indicates that the model doesn't explain any of the variance. UNIT-1 Regression
  • 71.
    • Adjusted RSquared: Adjusted R-squared is a modified version of R-squared that takes into account the number of independent variables in the model. It penalizes the addition of irrelevant variables that might artificially inflate the R-squared value. • p-Value: The p-value is a measure of the evidence against a null hypothesis in a statistical hypothesis test. In the context of regression analysis, p-values are used to determine whether the coefficients of the independent variables are statistically significant. A low p-value (typically below a significance level like 0.05) suggests that the variable has a significant impact on UNIT-1 Regression
  • 72.
    • A FraudDetection Classifier • Objective: To detect fraud claim • Assumption: – The output of your fraud detection model is the probability [0.0–1.0] that a transaction is fraudulent. – If this probability is below 0.5, you classify the transaction as non-fraudulent; otherwise, you classify the transaction as fraudulent. • Methodology – Collect 10,000 manually classified transactions, with 300 fraudulent transaction and 9,700 non- fraudulent transactions. – You run your classifier on every transaction, predict the class label (fraudulent or non-fraudulent) and summarise the results in the following confusion matrix: UNIT-1 Classification
  • 73.
    UNIT-1 Classification What isthe Confusion Matrix? A confusion matrix is a nn matrix that is used for evaluating the performance of the classification model. For Binary classification — The confusion Matrix is a 22 matrix. If the target class is 3 means Confusion Matrix is 3*3 matrix and so on.
  • 74.
    UNIT-1 Classification Terminologies usedin Confusion Matrix •True Positive → Positive class which is predicted as positive. •*True Negative *→ Negative class which is predicted as negative. •False Positive → Negative class which is predicted as positive.[Type I Error] •False Negative →Positive class which is predicted as negative.[Type II Error] 1. Recall Recall is a measure of how many positives your model is able to recall from the data. Out of all positive records, how many records are predicted correctly. Recall is also known as Sensitivity or TPR (True Positive Rate)
  • 75.
    UNIT-1 Classification 2. PrecisionPrecision is the ratio of correct positive predictions to the total positive predictions. Out of all positives been predicted, how many are actually positive.
  • 76.
    UNIT-1 Classification Example CancerPrediction-For this dataset, if the model predicts cancer records as non-cancer means it’s risky. All our cancer records should be predicted correctly. In this example, recall metrics is more important than precision. The recall rate should be 100%. All positive records( cancer records) should be predicted correctly. False Negative should be 0. For this cancer dataset, recall metrics is given more importance while evaluating the performance of the model. If non-cancer records are predicted as cancer means it’s not that risky.
  • 77.
    UNIT-1 Classification Example. Thecancer data set has 100 records, out of which 94 are cancer records and 6 are non-cancer records. But the model is predicting 90 out of 94 cancer records correctly. Four cancer records are not predicted correctly [ 4 — FN]
  • 78.
    UNIT-1 Classification Precision —Example Email Spam Filtering- For this dataset, if the model predicts good email as spam means it's risky. We don’t want any of our good emails to be predicted as Spam. So, the precision metric is given more importance while evaluating this model. False Positive should be 0. If the spam filtering dataset has 100 records, out of which 94 are predicted as spam emails. Only 90 out of 94 records is predicted correctly. 4 good emails are classified as spam. It’s risky. The precision rate is 95%. It should be 100%. No good emails should be classified as “Spam”. False-positive should be 0 for this model.
  • 79.
    UNIT-1 Classification F1 ScoreF1 score is a harmonic mean of precision and recall. F1 score metric is used when you seek a balance between precision and recall. F1 score vs Accuracy Accuracy deals with True positive and True Negative. It doesn't mention about False-positive and False-negative. So we are not aware of the distribution of False- positive and False-negative. If accuracy is 95% means, we don't know how the remaining 5% is distributed between False-positive and False-negative. F1 Score deals with False-positive and False-negative. For some models, we want to know about the distribution of False-negative and False positive. For those models, the F1 Score metric is used for evaluating the performance.
  • 80.
  • 81.
    UNIT-1 Classification Accuracy: Correctlypredicted values out of total given data.
  • 82.
    • Area UnderCurve • Area Under Curve(AUC) is one of the most widely used metrics for evaluation. • It is used for binary classification problem. • AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example. • Two basic terms used in AUC: – True Positive Rate (Sensitivity) – True Negative Rate (Specificity) UNIT-1 Classification (AUC)
  • 83.
    • Area UnderCurve • Few basic terms used in AUC: – True Positive Rate (Sensitivity) : True Positive Rate is defined as TP/ (FN+TP). True Positive Rate corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. – True Negative Rate (Specificity) : True Negative Rate is defined as TN / (FP+TN). False Positive Rate corresponds to the proportion of negative data points that are correctly considered as negative, with respect to all negative data points. UNIT-1 Classification(AUC)
  • 84.
    • Area UnderCurve – False Positive Rate : False Positive Rate is defined as FP / (FP+TN). False Positive Rate corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. • False Positive Rate and True Positive Rate both have values in the range [0, 1]. • FPR and TPR both are computed at varying threshold values such as (0.00, 0.02, 0.04, …., 1.00) and a graph is drawn. • AUC is the area under the curve of plot False Positive Rate vs True Positive Rate at different points in [0, 1]. UNIT-1 Classification (AUC)
  • 85.
    • Area UnderCurve • As evident, AUC has a range of [0, 1]. The greater the value, the better is the performance of our model. UNIT-1 Classification (AUC)
  • 86.
    Dr. Kumod KumarGupta Deep Learning Unit I 86 An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: •True Positive Rate •False Positive Rate True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows: TPR=TP/(TP+FN) False Positive Rate (FPR) is defined as follows: FPR=FP/(FP+TN) An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve. ROC curve (CO1)
  • 87.
Dr. Kumod Kumar Gupta Deep Learning Unit I 87 ROC curve (CO1) AUC-ROC curve. Let's first understand the meaning of the two terms ROC and AUC. • ROC: Receiver Operating Characteristics • AUC: Area Under Curve. ROC Curve: ROC stands for Receiver Operating Characteristics, and the ROC curve is the graphical representation of the effectiveness of a binary classification model. It plots the true positive rate (TPR) vs the false positive rate (FPR) at different classification thresholds. AUC: AUC stands for Area Under the Curve, and it represents the area under the ROC curve. It measures the overall performance of the binary classification model. As both TPR and FPR range between 0 and 1, the area always lies between 0 and 1, and a greater value of AUC denotes better model performance. Our main goal is to maximize this area in order to have the highest TPR and lowest FPR at the given threshold. The AUC measures the probability that the model will assign a randomly chosen positive instance a higher predicted probability than a randomly chosen negative instance.
  • 88.
88 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1) TPR and FPR. This is the most common definition you will encounter when you Google AUC-ROC. Basically, the ROC curve is a graph that shows the performance of a classification model at all possible thresholds (a threshold is a particular value beyond which you say a point belongs to a particular class). The curve is plotted between two parameters: • TPR – True Positive Rate • FPR – False Positive Rate
  • 89.
89 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1) • Specificity measures the proportion of actual negative instances that are correctly identified by the model as negative. • It represents the ability of the model to correctly identify negative instances. And, as said earlier, ROC is nothing but the plot between TPR and FPR across all possible thresholds, and AUC is the entire area beneath this ROC curve. Sensitivity versus False Positive Rate plot
  • 90.
90 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1)
• 91.
91 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1) Moving the cutoff point in one direction increases false positives; moving it in the other direction increases false negatives. The ROC curve can therefore be used to determine a cutoff point that optimizes the sensitivity and specificity of a given test.
  • 92.
92 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1) AUC measures how well a model is able to distinguish between classes. An AUC of 0.75 means that if we take two data points belonging to separate classes, there is a 75% chance the model will segregate, or rank-order, them correctly, i.e. the positive point gets a higher prediction probability than the negative one (assuming a higher prediction probability means the point would ideally belong to the positive class). Here is a small example to make things clearer.
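A minimal sketch (not from the original slides) computing AUC directly from this ranking definition: over all positive/negative pairs, count how often the positive example is scored higher. The scores are illustrative.

# AUC as the probability that a random positive scores above a random negative.
from itertools import product

pos_scores = [0.9, 0.8, 0.65, 0.35]  # model scores for positive examples
neg_scores = [0.4, 0.3, 0.2, 0.1]    # model scores for negative examples

pairs = list(product(pos_scores, neg_scores))
wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
print("AUC =", wins / len(pairs))   # ties count half, by convention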
  • 93.
93 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1)
• 94.
94 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1)
• 95.
95 Dr. Kumod Kumar Gupta Deep Learning Unit I ROC curve (CO1)
  • 96.
Dr. Kumod Kumar Gupta Deep Learning Unit I 96 P-value (CO1) • The concept of the p-value comes from statistics and is widely used in machine learning and data science. • The p-value is used to determine the point of rejection: it is the smallest significance level at which the null hypothesis can be rejected. • It is expressed as a level of significance that lies between 0 and 1; the smaller the p-value, the stronger the evidence for rejecting the null hypothesis. A very small p-value means the observed output is feasible but does not lie under the null hypothesis conditions (H0). • A p-value of 0.05 is known as the level of significance (α). Usually, it is applied using the two rules below: – if p-value > 0.05: the large p-value shows that the null hypothesis cannot be rejected. – if p-value < 0.05: the small p-value shows that the null hypothesis should be rejected, and the result is declared statistically significant.
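As a minimal sketch (not from the original slides), a p-value can be obtained from a two-sample t-test with SciPy's ttest_ind; the samples are synthetic, and α = 0.05 follows the convention above.

# Two-sample t-test: p-value and the 0.05 decision rule.
from scipy.stats import ttest_ind

group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.0, 5.9, 6.1, 5.7]

stat, p_value = ttest_ind(group_a, group_b)
if p_value < 0.05:
    print(f"p = {p_value:.4f} < 0.05: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= 0.05: fail to reject the null hypothesis")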
  • 97.
Dr. Kumod Kumar Gupta Deep Learning Unit I 97 k-Fold Cross-Validation • Cross-validation is a statistical method used to estimate the skill of machine learning models. • It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem, because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than other methods. • Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. • The procedure has a single parameter called k that refers to the number of groups a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. • When a specific value for k is chosen, it may be used in place of k in the reference to the method, such as k=10 becoming 10-fold cross-validation.
  • 98.
Dr. Kumod Kumar Gupta Deep Learning Unit I 98 k-Fold Cross-Validation • Suppose a person practices on 70 math questions that are all from algebra, but the 30-question test contains 10 questions from calculus; a single train/test split like this cannot fairly judge the person's ability. • That is why we go for k-fold cross-validation: every part of the data is used for both training and testing, which gives more reliable results.
  • 99.
Dr. Kumod Kumar Gupta Deep Learning Unit I 99 k-Fold Cross-Validation Here k = 5: the total data is divided into 5 folds. The first time, we use the first fold (20%) for testing and the remaining 80% for training; we repeat this process 5 times, holding out a different fold each time, and finally take the average of the 5 results. https://www.youtube.com/watch?v=gJo0uNL-5Qw
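A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score; the bundled iris dataset is used purely for illustration.

# 5-fold cross-validation: one accuracy score per fold, then the average.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)   # higher cap avoids convergence warnings

scores = cross_val_score(model, X, y, cv=5)
print("Fold scores:", scores)
print("Mean accuracy:", scores.mean())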
  • 100.
Dr. Kumod Kumar Gupta Deep Learning Unit I 100 Hyperparameter tuning (CO1) • Hyperparameters in machine learning / deep learning are those parameters that are explicitly defined by the user to control the learning process. • These hyperparameters are used to improve the learning of the model, and their values are set before starting the learning process of the model. • They are usually fixed before the actual training process begins. • These parameters express important properties of the model, such as its complexity or how fast it should learn. • Some examples of model hyperparameters include: • the penalty in the Logistic Regression classifier, i.e. L1 or L2 regularization; • the learning rate for training a neural network; • the C and sigma hyperparameters for support vector machines; • the k in k-nearest neighbors. https://www.geeksforgeeks.org/hyperparameter-tuning/
  • 101.
Dr. Kumod Kumar Gupta Deep Learning Unit I 101 Hyperparameter tuning (CO1) Models can have many hyperparameters, and finding the best combination of parameters can be treated as a search problem. The two best-known strategies for hyperparameter tuning are: • GridSearchCV: grid search cross-validation • RandomizedSearchCV: randomized search cross-validation. In general, if the number of combinations is limited enough, we can use the grid search technique. But when the number of combinations increases, we should try random search or Bayes search, as they are less computationally expensive.
  • 102.
Dr. Kumod Kumar Gupta Deep Learning Unit I 102 Grid Search technique (CO1) GridSearchCV is a brute-force technique for hyperparameter tuning. It trains the model using all possible combinations of the specified hyperparameter values to find the best-performing setup. It is slow and uses a lot of computing power, which makes it hard to use with big datasets or many settings. It works using the steps below: • Create a grid of potential values for each hyperparameter. • Train the model for every combination in the grid. • Evaluate each model using cross-validation. • Select the combination that gives the highest score.
  • 103.
Dr. Kumod Kumar Gupta Deep Learning Unit I 103 Grid Search technique (CO1) GridSearchCV. For example, suppose we want to set two hyperparameters, C and Alpha, of the Logistic Regression classifier model with different sets of values. The grid search technique will construct many versions of the model with all possible combinations of hyperparameters and will return the best one. As in the image, for C = [0.1, 0.2, 0.3, 0.4, 0.5] and Alpha = [0.1, 0.2, 0.3, 0.4], the combination C = 0.3 and Alpha = 0.2 gives the highest performance score of 0.726, so it is selected.
  • 104.
Dr. Kumod Kumar Gupta Deep Learning Unit I 104 Grid Search technique Code (CO1)
# Necessary imports
import numpy as np                      # was missing in the original listing
from sklearn.datasets import load_iris  # example data: X, y were undefined originally
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Example data (any classification dataset with features X and labels y works)
X, y = load_iris(return_X_y=True)

# Creating the hyperparameter grid
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# Instantiating the logistic regression classifier
logreg = LogisticRegression(max_iter=1000)  # higher cap avoids convergence warnings

# Instantiating the GridSearchCV object (5-fold cross-validation)
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)
logreg_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Best score is {}".format(logreg_cv.best_score_))
https://www.geeksforgeeks.org/hyperparameter-tuning/
  • 105.
Dr. Kumod Kumar Gupta Deep Learning Unit I 105 Grid Search technique (CO1) Drawback: GridSearchCV goes through all the intermediate combinations of hyperparameters, which makes grid search computationally very expensive. • RandomizedSearchCV solves this drawback of GridSearchCV, as it goes through only a fixed number of hyperparameter settings. • It moves within the grid in a random fashion to find the best set of hyperparameters. This approach reduces unnecessary computation.
  • 106.
Dr. Kumod Kumar Gupta Deep Learning Unit I 106 Random Search Code (CO1)
# Necessary imports
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer  # example data: X, y were undefined originally
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Example data (this set has 30 features, so sampling max_features in [1, 8] is valid)
X, y = load_breast_cancer(return_X_y=True)

# Creating the hyperparameter distributions
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiating the Decision Tree classifier
tree = DecisionTreeClassifier()

# Instantiating the RandomizedSearchCV object (tries only a fixed number of settings)
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))
  • 107.
Dr. Kumod Kumar Gupta Deep Learning Unit I 107 Random Search Code (CO1) Output:
Tuned Decision Tree Parameters: {'min_samples_leaf': 5, 'max_depth': 3, 'max_features': 5, 'criterion': 'gini'}
Best score is 0.7265625
  • 108.
Dr. Kumod Kumar Gupta Deep Learning Unit I 108 Introduction to Deep Learning (CO1) • Deep learning is a class of machine learning algorithms that use several layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer as input. • Deep neural networks, deep belief networks and recurrent neural networks have been applied to fields such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bioinformatics, where they have produced results comparable to, and in some cases better than, human experts. • Deep learning algorithms and networks are based on the unsupervised learning of multiple levels of features or representations of the data: higher-level features are derived from lower-level features to form a hierarchical representation. They use some form of gradient descent for training.
  • 109.
109 Deep Learning Applications (CO1) Here are just a few examples of deep learning at work: • A self-driving vehicle slows down as it approaches a pedestrian crosswalk. • An ATM rejects a counterfeit bank note. • A smartphone app gives an instant translation of a foreign street sign. • Deep learning is especially well suited to identification applications such as face recognition, text translation, voice recognition, and advanced driver assistance systems, including lane classification and traffic sign recognition. Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 110.
110 Some other Applications (CO1) • Machine speed control • Digital imaging • Fraud detection • Increasing phone efficiency. Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 111.
111 What Makes Deep Learning State-of-the-Art? (CO1) In a word: accuracy. Advanced tools and techniques have dramatically improved deep learning algorithms, to the point where they can outperform humans at classifying images, win against the world's best Go player, or enable a voice-controlled assistant like Amazon Echo® and Google Home to find and download that new song you like. Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 112.
112 What Makes Deep Learning State-of-the-Art? (CO1) Three technology enablers make this degree of accuracy possible: • Easy access to massive sets of labeled data: data sets such as ImageNet and PASCAL VOC are freely available and are useful for training on many different types of objects. • Increased computing power: high-performance GPUs accelerate the training of the massive amounts of data needed for deep learning, reducing training time from weeks to hours. • Pretrained models built by experts: models such as AlexNet can be retrained to perform new recognition tasks using a technique called transfer learning. While AlexNet was trained on 1.3 million high-resolution images to recognize 1000 different objects, accurate transfer learning can be achieved with much smaller datasets. Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 113.
Difference between AI, ML, DL (CO1): the three form nested sets, with deep learning (DL) a subset of machine learning (ML), and ML a subset of artificial intelligence (AI). Dr. Kumod Kumar Gupta Deep Learning Unit I 113
  • 114.
Why deep learning is needed (CO1) 1. Huge amounts of data: we initially started with ML, but its major drawback is that its efficiency stops improving as the data set grows very large (x-axis: number of data points, y-axis: efficiency). The solution given by deep learning is that it can handle huge amounts of data, whether structured or unstructured. 2. Complex problems: these basically include real-time data analysis, medical diagnosis systems, etc., which are handled by deep learning. Dr. Kumod Kumar Gupta Deep Learning Unit I 114
  • 115.
Dr. Kumod Kumar Gupta Deep Learning Unit I 115 Artificial Neural Network (CO1) • The term "artificial neural network" refers to a biologically inspired sub-field of artificial intelligence modeled after the brain. • An artificial neural network is a computational network based on the biological neural networks that form the structure of the human brain. • Just as a human brain has neurons interconnected with each other, artificial neural networks also have neurons linked to each other in the various layers of the network. These neurons are known as nodes.
  • 116.
Dr. Kumod Kumar Gupta Deep Learning Unit I 116 How do our brains work? (CO1) • The brain is a massively parallel information processing system. • Our brains are a huge network of processing elements: a typical brain contains a network of about 10 billion neurons.
  • 117.
Dr. Kumod Kumar Gupta Deep Learning Unit I 117 Neural Network To begin understanding deep learning, we will build up our model abstractions: • Single biological neuron • Perceptron • Multi-layer perceptron model • Deep learning neural network
  • 118.
Dr. Kumod Kumar Gupta Deep Learning Unit I 118 How do our brains work? (CO1) A processing element: – Dendrites: input – Cell body: processor – Synaptic: link – Axon: output – Synapse: weight
  • 119.
Dr. Kumod Kumar Gupta Deep Learning Unit I 119 How do our brains work? (CO1) An artificial neuron is an imitation of a human neuron.
  • 120.
Dr. Kumod Kumar Gupta Deep Learning Unit I 120 Biological neural network and artificial neural network (CO1) • Dendrites from the biological neural network represent inputs in artificial neural networks, the cell nucleus represents nodes, synapses represent weights, and the axon represents output. • Relationship between the biological neural network and the artificial neural network:
Biological Neural Network -> Artificial Neural Network
Dendrites -> Inputs
Cell nucleus -> Nodes
Synapse -> Weights
Axon -> Output
  • 121.
Dr. Kumod Kumar Gupta Deep Learning Unit I 121 Artificial neural network (CO1)
• 122.
Dr. Kumod Kumar Gupta Deep Learning Unit I 122 Artificial neural network (CO1)
• 123.
Dr. Kumod Kumar Gupta Deep Learning Unit I 123 Artificial neural network (CO1)
  • 124.
Neural Network (slides 124–133 are figure-only diagrams)
  • 134.
Dr. Kumod Kumar Gupta Deep Learning Unit I 134 Neural Network An activation function is used to introduce non-linearity.
  • 135.
Dr. Kumod Kumar Gupta Deep Learning Unit I 135 The typical artificial neural network looks something like the given figure (CO1) Our basic computational element (model neuron) is often called a node or unit. It receives input from some other units, or perhaps from an external source. Each input has an associated weight w, which can be modified so as to model synaptic learning. The unit computes some function f of the weighted sum of its inputs: y = f(Σ_i w_i x_i).
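A minimal sketch of this model neuron (not from the original slides), assuming a sigmoid for f; the input and weight values are illustrative.

# A single model neuron: y = f(sum_i w_i * x_i + b), with sigmoid f.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum
    return sigmoid(z)                                       # activation f

print(neuron(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.6, -0.2], bias=0.1))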
  • 136.
ARCHITECTURE OF AN ARTIFICIAL NEURAL NETWORK Kumod Kumar Gupta Machine Learning Unit 2 An artificial neural network primarily consists of three layers: Input Layer: as the name suggests, it accepts inputs in several different formats provided by the programmer.
  • 137.
Hidden Layer: • The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns. Output Layer: • The input goes through a series of transformations in the hidden layer, which finally results in the output that is conveyed through this layer. • The artificial neural network takes the inputs, computes their weighted sum, and includes a bias. This computation is represented in the form of a transfer function. ARCHITECTURE OF AN ARTIFICIAL NEURAL NETWORK Kumod Kumar Gupta Machine Learning Unit 2
  • 138.
ARCHITECTURE OF AN ARTIFICIAL NEURAL NETWORK (slides 138–145 are figure-only)
  • 146.
Dr. Kumod Kumar Gupta Deep Learning Unit I 146 Activation function (CO1) • Bipolar binary and unipolar binary are called hard-limiting activation functions and are used in discrete neuron models. • Unipolar continuous and bipolar continuous are called soft-limiting activation functions; they have sigmoidal characteristics.
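A minimal sketch (not from the original slides) of the four activation functions named above, using their standard textbook definitions; lambda_ is the steepness parameter of the continuous (sigmoidal) variants.

# Hard-limiting (binary) and soft-limiting (continuous) activation functions.
import math

def unipolar_binary(z):                      # hard-limiting, outputs {0, 1}
    return 1 if z >= 0 else 0

def bipolar_binary(z):                       # hard-limiting, outputs {-1, +1}
    return 1 if z >= 0 else -1

def unipolar_continuous(z, lambda_=1.0):     # soft-limiting, range (0, 1)
    return 1.0 / (1.0 + math.exp(-lambda_ * z))

def bipolar_continuous(z, lambda_=1.0):      # soft-limiting, range (-1, 1)
    return 2.0 / (1.0 + math.exp(-lambda_ * z)) - 1.0

for f in (unipolar_binary, bipolar_binary, unipolar_continuous, bipolar_continuous):
    print(f.__name__, f(0.5))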
  • 147.
Activation function (CO1) (slides 147–155 are figure-only: activation-function graphs and equations)
  • 156.
Dr. Kumod Kumar Gupta Deep Learning Unit I 156 Neural network architecture Feedforward Network • It is a non-recurrent network having processing units/nodes in layers, where all the nodes in a layer are connected with the nodes of the previous layer. • The connections carry different weights. • There is no feedback loop, meaning the signal can flow in only one direction, from input to output. It may be divided into the following two types:
  • 157.
Dr. Kumod Kumar Gupta Deep Learning Unit I 157 Neural network architecture Cont… (CO1) • Single-layer feedforward network: a feedforward ANN having only one weighted layer. In other words, the input layer is fully connected to the output layer.
• 158.
Dr. Kumod Kumar Gupta Deep Learning Unit I 158 Single-layer Feedforward Network Neural network architecture Cont… (CO1)
  • 159.
Dr. Kumod Kumar Gupta Deep Learning Unit I 159 Neural network architecture Cont… (CO1) • Multilayer feedforward network: a feedforward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, these intermediate layers are called hidden layers.
  • 160.
Dr. Kumod Kumar Gupta Deep Learning Unit I 160 Multilayer feedforward network (CO1) Can be used to solve complicated problems.
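A minimal sketch (not from the original slides) of a forward pass through a multilayer feedforward network with one hidden layer; the shapes, weights, and input are illustrative.

# Forward pass of a small multilayer feedforward network in NumPy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                   # 3 inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # input -> hidden (4 units)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # hidden -> output (2 units)

h = sigmoid(W1 @ x + b1)   # hidden-layer activations
y = sigmoid(W2 @ h + b2)   # network outputs
print(y)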
  • 161.
Dr. Kumod Kumar Gupta Deep Learning Unit I 161 Neural network architecture Cont… (CO1) • Feedback Network: as the name suggests, a feedback network has feedback paths, which means the signal can flow in both directions using loops. This makes it a non-linear dynamic system, which changes continuously until it reaches a state of equilibrium. • Recurrent networks: they are feedback networks with closed loops, in which the output goes back to the input again as feedback, as shown in the following diagram.
  • 162.
Dr. Kumod Kumar Gupta Deep Learning Unit I 162 Feedback network (CO1) When outputs are directed back as inputs to nodes of the same or a preceding layer, the result is a feedback network.
  • 163.
Dr. Kumod Kumar Gupta Deep Learning Unit I 163 Recurrent network (CO1) • Single node with its own feedback • Competitive nets • Single-layer recurrent network • Multilayer recurrent networks. Feedback networks with closed loops are called recurrent networks. The response at the (k+1)-th instant depends on the entire history of the network starting at k = 0. Automaton: a system with discrete-time inputs and a discrete data representation is called an automaton.
  • 164.
Dr. Kumod Kumar Gupta Deep Learning Unit I 164 FEED FORWARD UNSUPERVISED LEARNING Hebbian Learning Rule (CO1) (slides 164–172 are figure-only)
  • 173.
Dr. Kumod Kumar Gupta Deep Learning Unit I 173 FEED FORWARD UNSUPERVISED LEARNING Hebbian Learning Rule (CO1) • The learning signal is equal to the neuron's output.
  • 174.
Dr. Kumod Kumar Gupta Deep Learning Unit I 174 Features of Hebbian Learning (CO1) • Feedforward unsupervised learning. • "When an axon of a cell A is near enough to excite a cell B and repeatedly and persistently takes part in firing it, some growth process or change takes place in one or both cells, increasing the efficiency." • If the product o_i x_j is positive, the result is an increase in the weight; otherwise the weight decreases.
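A minimal sketch (not from the original slides) of the Hebbian update Δw = c · o · x with a bipolar binary neuron; the learning rate c, initial weights, and inputs are illustrative.

# Hebbian learning: the learning signal is the neuron's own output o,
# and each weight changes by c * o * x (unsupervised, feedforward).
import numpy as np

def bipolar_sign(z):
    return 1.0 if z >= 0 else -1.0

c = 1.0                                   # learning rate
w = np.array([1.0, -1.0, 0.0, 0.5])       # initial weights
inputs = [np.array([1.0, -2.0, 1.5, 0.0]),
          np.array([1.0, -0.5, -2.0, -1.5])]

for x in inputs:
    o = bipolar_sign(w @ x)   # learning signal = neuron's output
    w = w + c * o * x         # Hebbian weight update
    print("updated w:", w)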
  • 175.
Dr. Kumod Kumar Gupta Deep Learning Unit I 175 Hebbian Learning Rule (CO1) Final answer: (the final weight vector is shown as a figure on the slide)
  • 176.
Dr. Kumod Kumar Gupta Deep Learning Unit I 176 Hebbian Learning Rule (CO1) • For the same inputs, with a bipolar continuous activation function, the final updated weight is given by the expression shown on the slide.
  • 177.
• The learning signal is the difference between the desired and the actual neuron response, r = d − o, so the weight update is Δw = c (d − o) x. • Learning is supervised. Perceptron Learning Rule (CO1) Dr. Kumod Kumar Gupta Deep Learning Unit I 177
  • 178.
Dr. Kumod Kumar Gupta Deep Learning Unit I 178 Perceptron Learning Rule (CO1)
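A minimal sketch (not from the original slides) of the perceptron rule Δw = c (d − o) x with a hard-limiting bipolar activation; the tiny dataset and learning rate are illustrative.

# Perceptron learning: weights change only when the desired and actual
# responses differ (supervised learning with a hard-limiting neuron).
import numpy as np

def bipolar_sign(z):
    return 1.0 if z >= 0 else -1.0

c = 0.1
w = np.zeros(3)                            # last weight acts as the bias
data = [(np.array([2.0, 1.0, 1.0]), 1.0),  # (input with bias component, desired d)
        (np.array([0.0, -1.0, 1.0]), -1.0),
        (np.array([-1.0, 2.0, 1.0]), 1.0)]

for epoch in range(10):
    for x, d in data:
        o = bipolar_sign(w @ x)    # actual response
        w = w + c * (d - o) * x    # nonzero only when d != o
print("trained weights:", w)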
  • 179.
• Only valid for continuous activation functions. • Used in supervised training mode. • The learning signal for this rule is called delta. • The aim of the delta rule is to minimize the error over all training patterns. Delta Learning Rule (CO1) Dr. Kumod Kumar Gupta Deep Learning Unit I 179
  • 180.
The learning rule is derived from the condition of least squared error, E = (1/2)(d − o)^2 with o = f(net) and net = w^T x. Calculating the gradient vector with respect to w_i gives ∇E = −(d − o) f′(net) x. Minimization of the error requires the weight changes to be in the negative gradient direction: Δw = −η ∇E = η (d − o) f′(net) x. Delta Learning Rule Contd. (CO1) Dr. Kumod Kumar Gupta Deep Learning Unit I 180
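A minimal sketch (not from the original slides) of the delta rule Δw = η (d − o) f′(net) x for a single sigmoid neuron, for which f′(net) = o(1 − o); the data and learning rate are illustrative.

# Delta rule: gradient descent on the squared error of one continuous neuron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.5
w = np.zeros(3)                            # last weight acts as the bias
data = [(np.array([2.0, 1.0, 1.0]), 1.0),  # (input with bias component, desired d)
        (np.array([0.0, -1.0, 1.0]), 0.0)]

for epoch in range(100):
    for x, d in data:
        net = w @ x
        o = sigmoid(net)
        f_prime = o * (1.0 - o)            # sigmoid derivative f'(net)
        w = w + eta * (d - o) * f_prime * x
print("trained weights:", w)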
  • 181.
Dr. Kumod Kumar Gupta Deep Learning Unit I 181 MLP training algorithm (CO1) A Multi-Layer Perceptron (MLP) neural network trained using the backpropagation learning algorithm is one of the most powerful forms of supervised neural network. Training such a network involves three stages: • feedforward of the input training pattern, • calculation and backpropagation of the associated error, • adjustment of the weights. This procedure is repeated for each pattern over several complete passes (epochs) through the training set. After training, applying the net involves only the computations of the feedforward phase.
  • 182.
Dr. Kumod Kumar Gupta Deep Learning Unit I 182 Backpropagation Learning Algorithm (CO1)
Feedforward phase:
• X_i = input[i]
• Y_j = f(b_j + Σ_i X_i W_ij)
• Z_k = f(b_k + Σ_j Y_j W_jk)
Backpropagation of errors:
• δ_k = Z_k (1 − Z_k)(d_k − Z_k)
• δ_j = Y_j (1 − Y_j) Σ_k δ_k W_jk
Weight updating (η = learning rate, μ = momentum):
• W_jk(t+1) = W_jk(t) + η δ_k Y_j + μ [W_jk(t) − W_jk(t − 1)]
• b_k(t+1) = b_k(t) + η δ_k + μ [b_k(t) − b_k(t − 1)]
• W_ij(t+1) = W_ij(t) + η δ_j X_i + μ [W_ij(t) − W_ij(t − 1)]
• b_j(t+1) = b_j(t) + η δ_j + μ [b_j(t) − b_j(t − 1)]
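A minimal sketch (not from the original slides) of the three-stage algorithm above for one hidden layer of sigmoid units; the momentum term is omitted for brevity, and the XOR data is purely illustrative.

# Backpropagation for a 2-4-1 sigmoid network, trained on XOR.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)   # desired outputs

W_ij, b_j = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden
W_jk, b_k = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output
eta = 0.5

for epoch in range(5000):
    # Feedforward phase
    Y = sigmoid(X @ W_ij + b_j)            # hidden activations Y_j
    Z = sigmoid(Y @ W_jk + b_k)            # outputs Z_k
    # Backpropagation of errors
    delta_k = Z * (1 - Z) * (d - Z)
    delta_j = Y * (1 - Y) * (delta_k @ W_jk.T)
    # Weight updating
    W_jk += eta * Y.T @ delta_k
    b_k += eta * delta_k.sum(axis=0)
    W_ij += eta * X.T @ delta_j
    b_j += eta * delta_j.sum(axis=0)

print(np.round(Z, 2))   # should approach [0, 1, 1, 0]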
  • 183.
183 Faculty Video Links, YouTube & NPTEL Video Links and Online Courses Details • 1. https://nptel.ac.in/courses/117/105/117105084/ • 2. https://nptel.ac.in/courses/106/106/106106184/ • 3. https://nptel.ac.in/courses/108/105/108105103/ • 4. https://www.youtube.com/watch?v=DKSZHN7jftI&list=PLZoTAELRMXVPGU70ZGsckrMdr0FteeRUi • 5. https://www.youtube.com/watch?v=aPfkYu_qiF4&list=PLyqSpQzTE6M9gCgajvQbc68Hk_JKGBAYT Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 184.
184 Quiz Dr. Kumod Kumar Gupta Deep Learning Unit I
1. Which method of computing was started in the early days? (a) Machine learning (b) Artificial intelligence (c) Deep learning (d) None of these
2. In which method is efficiency higher with a larger data set? (a) Machine learning (b) Artificial intelligence (c) Deep learning (d) None of these
3. What is TensorFlow? (a) All the mathematics in the form of a flow chart (b) An artificial intelligence algorithm (c) A deep learning algorithm (d) None of these
  • 185.
185 Quiz Dr. Kumod Kumar Gupta Deep Learning Unit I
4. What are the benefits of TensorFlow over other libraries? (a) Scalability (b) Visualization of data (c) Pipelining (d) All of these
5. What do you mean by pipelining? (a) Doing the whole work at one time (b) Dividing the whole work into small segments and then executing them in a parallel manner (c) Copying work from another processor (d) None of these
  • 186.
186 QUIZ Dr. Kumod Kumar Gupta Deep Learning Unit I
6. What is an API? (a) A programming interface (b) After programming interface (c) Application Programming Interface (d) None of these
7. What is the main operation in TensorFlow? (a) Computing (b) Calculation (c) Pipelining (d) Passing values and assigning the output to another tensor
8. TensorFlow is the product of which company? (a) Google research team (b) Amazon technical team (c) PayPal (d) None of these
9. What is the execution speed of a brain neuron? (a) … (b) … (c) … (d) None of these
  • 187.
187 Weekly Assignment Dr. Kumod Kumar Gupta Deep Learning Unit I Q1. For which purpose is a Convolutional Neural Network used? Q2. What is the biggest advantage of utilizing a CNN? Q3. Discuss the history of deep learning. Q4. What is the difference between neural networks and deep learning? Q5. How can a neural network learn by itself? Q6. Explain the concept of an ANN with the help of an example. Q7. Define the term gradient descent. Also discuss its importance. Q8. Explain the Perceptron Convergence Theorem. Q9. Define the term bias. Q10. Why is the ReLU function required?
  • 188.
188 MCQ Dr. Kumod Kumar Gupta Deep Learning Unit I Q1. Which neural network has only one hidden layer between the input and output? A. Shallow neural network B. Deep neural network C. Feed-forward neural networks D. Recurrent neural networks Q2. Which of the following is/are limitations of deep learning? A. Data labeling B. Obtaining huge training datasets C. Both A and B D. None of the above
  • 189.
189 MCQ Dr. Kumod Kumar Gupta Deep Learning Unit I Q3. Deep learning algorithms are _______ more accurate than machine learning algorithms in image classification. A. 33% B. 37% C. 40% D. 41% Q4. Which of the following functions can be used as an activation function in the output layer if we wish to predict the probabilities of n classes (p1, p2, …, pk) such that the sum of p over all n equals 1? A. Softmax B. ReLU C. Sigmoid D. Tanh
  • 190.
190 MCQ Dr. Kumod Kumar Gupta Deep Learning Unit I Q5. Which of the following would have a constant input in each epoch of training a deep learning model? A. Weight between input and hidden layer B. Weight between hidden and output layer C. Biases of all hidden layer neurons D. Activation function of output layer Q6. If in the training method we do not obtain the accurate output, which value does the neural network change to get the accurate output? (a) bias (b) perceptron (c) weight (d) all values can change Q7. What are the benefits of using a graph in TensorFlow? (a) parallelism (b) high execution speed (c) less complexity (d) all of these Q8. Between CPU and GPU, which has the higher execution speed? (a) GPU (b) CPU (c) both have the same speed (d) cannot be distinguished
  • 191.
191 Old Question Papers Dr. Kumod Kumar Gupta Deep Learning Unit I
• 192.
192 Old Question Papers Dr. Kumod Kumar Gupta Deep Learning Unit I
  • 193.
193 Expected Questions for University Exam Dr. Kumod Kumar Gupta Deep Learning Unit I Q1. Define batch normalization. Why does batch normalization help in faster convergence? Q2. Define deep learning. Also discuss its importance. Q3. Discuss the history of deep learning. Q4. What is the difference between neural networks and deep learning? Q5. How can a neural network learn by itself? Q6. Explain the concept of an ANN with the help of an example. Q7. Define the term gradient descent. Also discuss its importance. Q8. Explain the Perceptron Convergence Theorem. Q9. Define the term bias. Q10. Why is the ReLU function required?
  • 194.
194 Summary Dr. Kumod Kumar Gupta Deep Learning Unit I  Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.  If you are just starting out in the field of deep learning, or you had some experience with neural networks some time ago, you may be confused. I know I was confused initially, and so were many of my colleagues and friends who learned and used neural networks in the 1990s and early 2000s.
  • 195.
195 References Dr. Kumod Kumar Gupta Deep Learning Unit I
 1. https://www.slideshare.net/lablogga/deep-learning-explained
 2. Qin, T. (2020). Deep Learning Basics. In Dual Learning (pp. 25–46). Springer, Singapore.
 3. http://people.uncw.edu/chenc/STT592_Deep%20Learning/STT592DeepLearning_Index.html
 4. Gulli, Antonio, and Sujit Pal. Deep Learning with Keras. Packt Publishing Ltd, 2017.
Thank You
  • 196.
Dr. Kumod Kumar Gupta Deep Learning Unit I 196 THANK YOU