Statistics is a fundamental tool of data scientists, who are expected to gather and analyze large amounts of structured and unstructured data and report on their findings. In this article, I offer you the Best Books to Learn Statistics for Data Science.
As an Amazon Associate, we earn a small commission from qualifying purchases when you click links on Cloudit-eg….. at no added cost to you.
by David Spiegelhalter | Sep 3, 2019
The definitive guide to statistical thinking
Statistics are everywhere, as integral to science as they are to business, and in the popular media hundreds of times a day. In this age of big data, a basic grasp of statistical literacy is more important than ever if we want to separate the fact from the fiction, the ostentatious embellishments from the raw evidence, and even more so if we hope to participate in the future, rather than being simple bystanders.
In The Art of Statistics, world-renowned statistician David Spiegelhalter shows readers how to derive knowledge from raw data by focusing on the concepts and connections behind the math. Drawing on real-world examples to introduce complex issues, he shows us how statistics can help us determine the luckiest passenger on the Titanic, whether a notorious serial killer could have been caught earlier, and if screening for ovarian cancer is beneficial. The Art of Statistics not only shows us how mathematicians have used statistical science to solve these problems, it also teaches us how we too can think like statisticians. We learn how to clarify our questions, assumptions, and expectations when approaching a problem, and, perhaps even more importantly, we learn how to responsibly interpret the answers we receive.
Combining the incomparable insight of an expert with the playful enthusiasm of an aficionado, The Art of Statistics is the definitive guide to stats that every modern person needs.
by Charles Wheelan | Jan 13, 2014
“Brilliant, funny…the best math teacher you never had.” ―San Francisco Chronicle
Once considered tedious, the field of statistics is rapidly evolving into a discipline Hal Varian, the chief economist at Google, has actually called “sexy.” From batting averages and political polls to game shows and medical research, the real-world application of statistics continues to grow by leaps and bounds. How can we catch schools that cheat on standardized tests? How does Netflix know which movies you’ll like? What is causing the rising incidence of autism? As best-selling author Charles Wheelan shows us in Naked Statistics, the right data and a few well-chosen statistical tools can help us answer these questions and more.
For those who slept through Stats 101, this book is a lifesaver. Wheelan strips away the arcane and technical details and focuses on the underlying intuition that drives statistical analysis. He clarifies key concepts such as inference, correlation, and regression analysis, reveals how biased or careless parties can manipulate or misrepresent data, and shows us how brilliant and creative researchers are exploiting the valuable data from natural experiments to tackle thorny questions.
And in Wheelan’s trademark style, there’s not a dull page in sight. You’ll encounter clever Schlitz Beer marketers leveraging basic probability, an International Sausage Festival illuminating the tenets of the central limit theorem, and a head-scratching choice from the famous game show Let’s Make a Deal, and you’ll come away with insights each time. With the wit, accessibility, and sheer fun that turned Naked Economics into a bestseller, Wheelan defies the odds yet again by bringing another essential, formerly unglamorous discipline to life.
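The "head-scratching choice" from Let’s Make a Deal is usually known as the Monty Hall problem: you pick one of three doors, the host opens a different door that hides a goat, and you may stay or switch. The sketch below is not taken from the book. It assumes the standard rules (the host always opens a goat door and always offers the switch) and enumerates every equally likely case to compare the two strategies. The function name `win_probabilities` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probabilities():
    """Enumerate every (car position, first pick) pair; all 9 are equally likely."""
    stay_wins = 0
    switch_wins = 0
    cases = list(product(DOORS, DOORS))
    for car, pick in cases:
        if pick == car:
            # First pick was right, so staying wins and switching loses.
            stay_wins += 1
        else:
            # First pick was a goat. The host must open the other goat door,
            # so the only door left to switch to hides the car.
            switch_wins += 1
    total = len(cases)
    return Fraction(stay_wins, total), Fraction(switch_wins, total)


stay, switch = win_probabilities()
print(f"stay: {stay}, switch: {switch}")
```

Under these assumptions, switching wins twice as often as staying, which is why the choice feels counterintuitive.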
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open-source statistical software platform.
Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you’ll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher-quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that “learn” from data
- Unsupervised learning methods for extracting meaning from unlabeled data
5- The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with the liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book’s coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting—the first comprehensive treatment of this topic in any book.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for “wide” data (p bigger than n), including multiple testing and false discovery rates.
by Alex Reinhart | Mar 1, 2015
Scientific progress depends on good research, and good research needs good statistics. But statistical analysis is tricky to get right, even for the best and brightest of us. You’d be surprised how many scientists are doing it wrong.
Statistics Done Wrong is a pithy, essential guide to statistical blunders in modern science that will show you how to keep your research blunder-free. You’ll examine embarrassing errors and omissions in recent research, learn about the misconceptions and scientific politics that allow these mistakes to happen, and begin your quest to reform the way you and your peers do statistics.
You’ll find advice on:
- Asking the right question, designing the right experiment, choosing the right statistical analysis, and sticking to the plan
- How to think about p values, significance, insignificance, confidence intervals, and regression
- Choosing the right sample size and avoiding false positives
- Reporting your analysis and publishing your data and source code
- Procedures to follow, precautions to take, and analytical software that can help
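To see why avoiding false positives matters, consider running several independent significance tests when no real effect exists. The sketch below is an illustration, not an example from the book. It assumes the tests are independent and each uses the same threshold `alpha`. The function name `familywise_error_rate` is made up for this illustration.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Each test avoids a false positive with probability (1 - alpha),
    so all of them avoid one with probability (1 - alpha) ** num_tests.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


for n in (1, 5, 20):
    print(f"{n:>2} tests: {familywise_error_rate(n):.3f}")
```

With 20 such tests, the chance of at least one spurious "significant" result is well over one half, even though each individual test looks conservative.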
Scientists: Read this concise, powerful guide to help you produce statistically sound research. Statisticians: Give this book to everyone you know.
The first step toward statistics done right is Statistics Done Wrong.
by Allen B. Downey | Oct 4, 2013
If you know how to program with Python and also know a little about probability, you’re ready to tackle Bayesian statistics. With this book, you’ll learn how to solve statistical problems with Python code instead of mathematical notation and use discrete probability distributions instead of continuous mathematics. Once you get the math out of the way, the Bayesian fundamentals will become clearer, and you’ll begin to apply these techniques to real-world problems.
Bayesian statistical methods are becoming more common and more important, but not many resources are available to help beginners. Based on undergraduate classes taught by author Allen Downey, this book’s computational approach helps you get a solid start.
- Use your existing programming skills to learn and understand Bayesian statistics
- Work with problems involving estimation, prediction, decision analysis, evidence, and hypothesis testing
- Get started with simple examples, using coins, M&Ms, Dungeons & Dragons dice, paintball, and hockey
- Learn computational methods for solving real-world problems, such as interpreting SAT scores, simulating kidney tumors, and modeling the human microbiome
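To give a flavor of solving a Bayesian problem with code and a discrete distribution, here is a minimal sketch. It is not the book's own code. It assumes a bag holding one die each with 4, 6, 8, 12, and 20 sides, a die drawn uniformly at random, and a single observed roll. The function name `posterior_over_dice` is made up for this illustration.

```python
from fractions import Fraction


def posterior_over_dice(sides_list, observed_roll):
    """Return P(die | roll) for each die, starting from a uniform prior."""
    prior = Fraction(1, len(sides_list))
    unnormalized = {}
    for sides in sides_list:
        # A fair die with `sides` faces shows each face with probability 1/sides
        # and can never show a number larger than `sides`.
        likelihood = Fraction(1, sides) if observed_roll <= sides else Fraction(0)
        unnormalized[sides] = prior * likelihood
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observed roll is impossible for every die")
    # Dividing by the total makes the posterior probabilities sum to 1.
    return {sides: weight / total for sides, weight in unnormalized.items()}


posterior = posterior_over_dice([4, 6, 8, 12, 20], observed_roll=6)
for sides, prob in posterior.items():
    print(f"d{sides}: {prob} (about {float(prob):.3f})")
```

A roll of 6 rules out the four-sided die and makes the six-sided die the most likely source, because it gives that outcome the highest likelihood.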
by Dawn Griffiths | Sep 16, 2008
Wouldn’t it be great if there were a statistics book that made histograms, probability distributions, and chi-square analysis more enjoyable than going to the dentist? Head First Statistics brings this typically dry subject to life, teaching you everything you want and need to know about statistics through engaging, interactive, and thought-provoking material, full of puzzles, stories, quizzes, visual aids, and real-world examples.
Whether you’re a student, a professional, or just curious about statistical analysis, Head First’s brain-friendly formula helps you get a firm grasp of statistics so you can understand key points and actually use them. Learn to present data visually with charts and plots; discover the difference between taking the average with mean, median, and mode, and why it’s important; learn how to calculate probability and expectation; and much more.
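The difference between mean, median, and mode is easiest to see on skewed data. The sketch below uses Python's standard `statistics` module on a small, made-up list of salaries (in thousands) with one extreme value. The numbers are hypothetical and are not drawn from the book.

```python
import statistics

# Hypothetical salaries in thousands; the last value is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean_salary = statistics.mean(salaries)      # pulled upward by the outlier
median_salary = statistics.median(salaries)  # middle value of the sorted list
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"mean:   {mean_salary:.1f}")
print(f"median: {median_salary}")
print(f"mode:   {mode_salary}")
```

Here the single large salary drags the mean far above what most people in the list earn, while the median and mode stay close to the typical values.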
Head First Statistics is ideal for high school and college students taking statistics and satisfies the requirements for passing the College Board’s Advanced Placement (AP) Statistics Exam. With this book, you’ll:
- Study the full range of topics covered in first-year statistics
- Tackle tough statistical concepts using Head First’s dynamic, visually rich format proven to stimulate learning and help you retain knowledge
- Explore real-world scenarios, ranging from casino gambling to prescription drug testing, to bring statistical principles to life
- Discover how to measure spread, calculate odds through probability, and understand the normal, binomial, geometric, and Poisson distributions
- Conduct sampling, use correlation and regression, do hypothesis testing, perform chi-square analysis, and more
Before you know it, you’ll not only have mastered statistics, you’ll also see how they work in the real world. Head First Statistics will help you pass your statistics course, and give you a firm understanding of the subject so you can apply the knowledge throughout your life.
The R version of Andy Field’s hugely popular Discovering Statistics Using SPSS takes students on a journey of statistical discovery using the freeware R. Like its sister textbook, Discovering Statistics Using R is written in an irreverent style and follows the same ground-breaking structure and pedagogical approach. The core material is enhanced by a cast of characters to help the reader on their way, hundreds of examples, self-assessment tests to consolidate knowledge, and additional website material for those wanting to learn more.
Renowned for its clear prose and no-nonsense emphasis on core concepts, Statistics covers fundamentals using real examples to illustrate the techniques.
The Fourth Edition has been carefully revised and updated to reflect current data.
Written by an author team of accomplished leaders in statistics education, The Basic Practice of Statistics (BPS) reflects the actual practice of statistics, where data analysis and design of data production join with probability-based inference to form a coherent science of data. The authors’ ultimate goal is to equip students to carry out common statistical procedures and to follow statistical reasoning in their fields of study and in their future employment.
For courses in Introductory Business Statistics.
Real Data. Real Decisions. Real Business.
Now in its Thirteenth Edition, Statistics for Business and Economics introduces statistics in the context of contemporary business. Emphasizing statistical literacy in thinking, the text applies its concepts with real data and uses technology to develop a deeper conceptual understanding. Examples, activities, and case studies foster active learning while emphasizing intuitive concepts of probability and teaching readers to make informed business decisions. The Thirteenth Edition continues to highlight the importance of ethical behavior in collecting, interpreting, and reporting on data, while also providing a wealth of new and updated exercises and case studies.
Also available with MyLab Statistics
MyLab™ Statistics is an online homework, tutorial, and assessment program designed to work with this text to engage students and improve results. Within its structured environment, students practice what they learn, test their understanding, and pursue a personalized study plan that helps them absorb course material and understand difficult concepts.