(Undergraduate Topics in Computer Science) 2nd ed. 2019 Edition
by Boris Mirkin (Author)
This text examines the goals of data analysis with respect to
enhancing knowledge, and identifies data summarization and correlation
analysis as the core issues. Data summarization, both quantitative and
categorical, is treated within the encoder-decoder paradigm, bringing
forward a number of mathematically supported insights into the methods
and the relations between them. Two chapters describe methods for
categorical summarization (partitioning, divisive clustering, and
separate cluster finding), and another explains methods for
quantitative summarization (Principal Component Analysis and PageRank).
Features:
· An in-depth presentation of K-means partitioning including a
corresponding Pythagorean decomposition of the data scatter.
· Advice regarding such issues as clustering of categorical and mixed
scale data, similarity and network data, interpretation aids, anomalous
clusters, the number of clusters, etc.
· Thorough attention to data-driven modelling, including a number of
mathematically stated relations between statistical and geometrical
concepts, among them those between goodness-of-fit criteria for
decision trees and data standardization, similarity and consensus
clustering, modularity clustering and uniform partitioning.
New edition highlights:
· Inclusion of ranking issues such as Google PageRank, linear
stratification and tied rankings median, consensus clustering,
semi-average clustering, one-cluster clustering
· Restructured to make the logic more straightforward and the sections self-contained
Core Data Analysis: Summarization, Correlation and Visualization is aimed at those who are eager to participate in developing the field, and it will also appeal to novices and practitioners.