Recommended books on Data Mining and Data Science

At some point Data Mining began to be called Data Science and marketers became data scientists.

Unfortunately, except when it comes to testing, data mining is rarely that scientific… The real revolutions have been in the amount of data that can be collected and in the computing power that can be used to train neural networks to predict variables.

Data Mining Techniques: For Marketing, Sales, and Customer Support

Michael J.A. Berry and Gordon Linoff, Wiley, 1997

Case studies and practical guidance. Good introductory text. A personal favorite.

Data Mining: Concepts and Techniques

Jiawei Han, Micheline Kamber, Jian Pei, Morgan Kaufmann, 2011

Good algorithm descriptions. Covers the major areas in reasonable technical detail, with several alternative algorithms presented for classification, prediction, association rule induction and cluster analysis.

Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms

Jean-Marc Adamo, Springer, 2001

Strictly a specialist book. The title says it all.

Empirical Methods for Artificial Intelligence

Paul Cohen, MIT Press, 1995

If you are working in artificial intelligence or data mining you are often in an unusual position: you are automatically generating one or more hypotheses based upon a sample of data, then testing the resulting hypotheses to see if they are true.

Most statistics books adequately cover hypothesis testing. They cover the basic use of the null hypothesis (is this hypothesis really needed?), tests for normal distributions, and so on. (Basic Business Statistics is definitely one of the better books if you need detailed coverage of these areas.)

They do not, unfortunately, cover the material you really need for assessing the performance of programs which automatically generate their own hypotheses or interact in some way with their environment. This book does.
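To see why automatically generated hypotheses need this extra care, here is a minimal sketch in Python. It is not taken from the book; the function names, sample sizes and seed are all made up for illustration. It builds labels and 200 candidate "rules" that are pure coin flips, picks the rule that scores best on a small sample, then checks that same rule on fresh data.

```python
import random


def accuracy(feature, labels):
    """Fraction of cases where a 0/1 feature agrees with the 0/1 label."""
    return sum(f == y for f, y in zip(feature, labels)) / len(labels)


def best_noise_rule(n_train=50, n_test=1000, n_features=200, seed=0):
    """Pick the best of many pure-noise rules, then re-test it on held-out data.

    All parameters are illustrative assumptions. Labels and features are
    independent coin flips, so no rule has any real predictive power.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    labels = [rng.randint(0, 1) for _ in range(n)]
    features = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_features)]

    # "Generate hypotheses": score every candidate rule on the small sample.
    train_scores = [accuracy(f[:n_train], labels[:n_train]) for f in features]
    best = max(range(n_features), key=train_scores.__getitem__)

    # "Test the hypothesis": score the chosen rule on data it never saw.
    test_score = accuracy(features[best][n_train:], labels[n_train:])
    return train_scores[best], test_score


train_acc, test_acc = best_noise_rule()
print(f"best rule on the sample it was chosen from: {train_acc:.2f}")
print(f"same rule on held-out data:                 {test_acc:.2f}")
```

The selected rule should look clearly better than chance on the sample it was chosen from and fall back to roughly 50% on the held-out data. A standard significance test applied to the first number alone would ignore the fact that 200 rules were tried.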

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations

Ian H. Witten, Eibe Frank, Morgan Kaufmann, 2000

A practical and technical introduction to algorithms for data mining. Includes Java implementations of some of the major algorithms.

Building Data Mining Applications for CRM

Alex Berson, Stephen Smith, Kurt Thearling, McGraw-Hill, 2000

CRM (Customer Relationship Management) is a major application area for data mining. Some interesting chapters on the business applications and cost justifications. Good book if you are trying to figure out how data mining might fit into your business.

Data Mining Your Website

Jesus Mena, Digital Press, 1999

Aimed at the executive (read: non-specialist) level. Good introduction to some of the things you can do with web log data. The key message here is that it helps a lot if you know more about your visitors than just what pages they clicked on.

Data Mining Explained: A Manager's Guide to Customer-Centric Business Intelligence

Rhonda Delmater and Monte Hancock, Digital Press, 2001

Introduction to the methodology, techniques, and applications of data mining from a management perspective. The chapter on why data mining projects fail is well worth reading.

Truth from Trash: How Learning Makes Sense

Chris Thornton, MIT Press, 2000

Good introduction to machine learning, although I found the latter part of the book (from which it gets its title) a bit disappointing. The book's real strength is in placing existing machine learning methods in a good technical (and philosophical) perspective.
