Essential Mathematics for Data Science

Mathematics is the basis of any scientific discipline. Almost all the concepts of Data Science and Machine Learning are based on mathematical foundations.

As you learn programming techniques, algorithms, and languages, whether to become a Data Scientist or to deepen your knowledge, do not neglect the math.

It is usually better to understand the basics of the algorithms you use than to simply run them. A solid understanding of mathematics will therefore give you a competitive advantage over your peers.

Consider a developer or an analyst. They can process a lot of data and information, but they are not interested in modeling that data. Often the emphasis is on using the data for an immediate need rather than on deep scientific exploration.

A Data Scientist, on the other hand, should be interested in models and thus follow a scientific process. That process involves the following:

Model a process by probing the underlying dynamics
Build hypotheses
Measure the quality of the data source
Quantify uncertainty
Identify the hidden pattern from the analyzed data
Understand the limitations of the model

Data Science is applicable to almost all fields. It can therefore deal with problems as diverse as cancer diagnosis and the analysis of social behavior.

This breadth brings with it a dizzying array of n-dimensional mathematical objects, statistical distributions, optimization functions, and so on.

In the rest of the article, I will provide you with the concepts you need to master to be among the best Data Scientists.

Functions, variables, equations, and graphs

Logarithm, exponential functions, polynomial functions, rational numbers
Geometry, trigonometric identities
Inequalities
Real and complex numbers, basic properties
Graphing, Cartesian and polar coordinates
Sequences and series

Use case

If you want to understand how a query runs quickly on a database holding a large volume of sorted data, you will come across the concept of “binary search”.

To understand this concept, you need to understand logarithms and recurrence relations.

Or, if you want to analyze a time series, you may come across concepts like “periodic functions”.
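To make the link between binary search and logarithms concrete, here is a minimal sketch in Python. The function name binary_search and the sample data are my own choices for illustration. Each comparison halves the remaining range, so the cost follows the recurrence T(n) = T(n/2) + 1, which gives at most floor(log2(n)) + 1 comparisons.

```python
import math


def binary_search(sorted_values, target):
    """Return (index, steps); index is -1 when target is absent."""
    low, high = 0, len(sorted_values) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid, steps
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1, steps


# Hypothetical data: one million sorted even numbers.
data = list(range(0, 2_000_000, 2))
index, steps = binary_search(data, 1_337_000)

# Upper bound on comparisons predicted by the logarithm.
max_steps = math.floor(math.log2(len(data))) + 1
print(index, steps, max_steps)
```

For one million items, the bound is 20 comparisons, against up to one million for a linear scan.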

Statistics

Many data scientists actually consider machine learning to be statistical learning.

This is a very broad subject, so planning and organization are essential to cover the most important concepts:

Descriptive statistics, variance, covariance, correlation
Basic probability, expectation, probability calculus, Bayes' theorem, conditional probability
Probability distribution functions
Sampling, measurement, error, random number generation
Hypothesis tests, A/B tests, confidence intervals, p-values
ANOVA, t-test
Linear regression, regularization
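As a small illustration of conditional probability from the list above, the sketch below applies Bayes' theorem to a diagnostic test. The function name posterior and all the numbers (1% prevalence, 95% sensitivity, 5% false positive rate) are hypothetical values chosen for the example, not figures from any study.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' theorem."""
    # Total probability of a positive test, over both groups.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, a positive result here implies only about a 16% chance of having the condition, because the condition itself is rare.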

Discrete mathematics

You should know the concepts of discrete mathematics that underpin the algorithms and data structures used in an analysis project:

Sets, sub-sets, power sets
Counting functions, combinatorics, countability
Basic proof techniques: induction, proof by contradiction
Basics of inductive, deductive, and propositional logic
Basic data structures: stacks, queues, graphs, arrays, hash tables, trees
Graph properties: connected components, degree, maximum flow/minimum cut concepts, graph coloring
Recurrence relations and equations
Growth of functions and the concept of O(n) notation
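Several items in this list meet in a single small program. The sketch below finds the connected components of an undirected graph with a breadth-first search, using a queue, a set, and a hash table (a Python dict) as the adjacency list. The graph and the function name connected_components are made up for illustration. The running time grows linearly with the number of vertices plus edges.

```python
from collections import deque


def connected_components(adjacency):
    """Return the components of an undirected graph as sorted vertex lists."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from every vertex not reached so far.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical graph stored as an adjacency list.
graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
components = connected_components(graph)
# The degree of a vertex is the number of its neighbors.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(components)
print(degrees)
```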

Linear algebra

You have probably seen friend suggestions on Facebook or video recommendations on YouTube, or turned your selfie into a Salvador Dali-style portrait using deep transfer learning. All of these examples involve matrices and matrix algebra.

The concepts you need to learn:
Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant
Inner and outer products, the matrix multiplication rule and various algorithms, matrix inverse
Matrix factorization and LU decomposition, Gauss / Gauss-Jordan elimination, solving the linear system Ax = b
Special matrices: square, identity, and triangular matrices, sparse and dense matrices, unit vectors, symmetric, Hermitian, skew-Hermitian, and unitary matrices
Eigenvalues, eigenvectors, diagonalization, singular value decomposition
Vector space, basis, span, orthogonality, orthonormality, linear least squares
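To show what solving Ax = b by Gaussian elimination looks like in practice, here is a minimal pure-Python sketch with partial pivoting. The function name solve_linear_system and the 3x3 example are my own. In real projects you would normally rely on a tested library routine instead.

```python
def solve_linear_system(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [A | b] on copies of the inputs.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
b = [8, -11, -3]
x = solve_linear_system(A, b)
print([round(v, 6) for v in x])  # [2.0, 3.0, -1.0]
```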

Use case

To do a principal component analysis, we use the singular value decomposition to obtain a compact, lower-dimensional representation of the dataset with fewer parameters.
Neural network algorithms use linear algebra techniques to represent and process network structures and learning operations.
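The first use case can be sketched in a few lines, assuming NumPy is installed. The data below is synthetic and generated only for illustration: three features that mostly vary along a single hidden direction. The code centers the data, calls np.linalg.svd, and keeps the first component.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): 200 samples, 3 features,
# driven by one hidden factor plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[2.0, 1.0, 0.5]]) + 0.1 * rng.normal(size=(200, 3))

# PCA works on centered data.
mean = X.mean(axis=0)
X_centered = X - mean

# Rows of Vt are the principal directions; S holds the singular values.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep k components: project down, then map back to the original space.
k = 1
X_reduced = X_centered @ Vt[:k].T
X_approx = X_reduced @ Vt[:k] + mean

print(X_reduced.shape)
print(explained_ratio)
```

Here each sample is stored with one number instead of three, and X_approx shows how much of the original data that single number recovers.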

Calculus

Calculus is behind the seemingly simple analytical solution to an ordinary least squares problem in linear regression, and it is embedded in every backpropagation pass your neural network makes to learn a new pattern.

If you were to focus only on the essential concepts, learn these topics:

Functions of a single variable, limit, continuity, differentiability
Mean value theorems, indeterminate forms, L’Hôpital’s rule
Maxima and minima
Product and chain rules
Taylor series, concepts of summation/integration of infinite series
Fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals
Beta and gamma functions
Functions of multiple variables, limit, continuity, partial derivatives
Basics of ordinary and partial differential equations
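The ordinary least squares solution mentioned at the start of this section comes from setting the partial derivatives of the squared error to zero. The sketch below uses the resulting closed-form formulas for a straight line, then checks with a numerical (central-difference) gradient that both partial derivatives vanish at the solution. The data points and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sse(slope, intercept, xs, ys):
    """Sum of squared errors of the line on the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def numerical_gradient(f, point, h=1e-6):
    """Central-difference estimate of each partial derivative of f at point."""
    grads = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grads.append((f(*up) - f(*down)) / (2 * h))
    return grads


# Hypothetical measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
grad = numerical_gradient(lambda m, c: sse(m, c, xs, ys), [slope, intercept])
print(round(slope, 4), round(intercept, 4))  # 1.99 0.05
print(grad)  # both entries are close to zero at the minimum
```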

Optimization and operations research topics

These topics matter because even a basic understanding of these powerful techniques pays off in the practice of Machine Learning.

Virtually all machine learning algorithms aim to minimize some type of estimation error subject to various constraints, which is an optimization problem.

At a minimum, you should be familiar with these topics:

Basics of optimization
Formulating the optimization problem
Maxima, minima, convex function, global solution
Randomized optimization techniques: hill climbing, simulated annealing, genetic algorithms
Linear programming, integer programming
Constraint programming, knapsack problem
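The last item in the list above, usually called the 0/1 knapsack problem, is a compact example of constrained optimization: maximize the total value of the chosen items without exceeding a weight capacity. Below is a minimal dynamic programming sketch, assuming integer weights. The function name knapsack and the item values are illustrative only.

```python
def knapsack(values, weights, capacity):
    """Best total value for the 0/1 knapsack problem with integer weights."""
    # best[c] is the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Go through capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


# Hypothetical items.
values = [60, 100, 120]
weights = [10, 20, 30]
best_value = knapsack(values, weights, 50)
print(best_value)  # 220
```

The optimum here takes the second and third items (weight 50, value 220), which a greedy pick by value-to-weight ratio would miss.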
