A Picture is Worth a Thousand Words
The value of a data science or analytics project resides in its ability to communicate the underlying patterns in the data effectively to its target audience.
The objective of information visualization is to map data values to visual elements: effective data visualization should lead to an “aha!” moment of understanding. In the age of big data, where hundreds of explanatory variables may be used to model the behavior of a target variable, it is important to capture trends, distributions, and correlations in an intuitive way that the reader can easily understand.
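One concrete way to "map values to visuals" is a linear scale, which converts a data value into a position on the screen. The sketch below assumes a simple linear mapping; `linear_scale` is a hypothetical helper, similar in spirit to the scales found in plotting libraries but not any library's actual API.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values in `domain` linearly onto `output_range`.

    Hypothetical helper for illustration: both arguments are (start, end) pairs.
    """
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must have nonzero width")

    def scale(value):
        # Fraction of the way through the domain, reused as the fraction through the range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


# Map temperatures from 0 to 40 degrees onto a 500-pixel-wide x axis.
x = linear_scale((0, 40), (0, 500))
print(x(10))  # 125.0

# Screen y coordinates usually grow downward, so the range is reversed for a y axis.
y = linear_scale((0, 100), (300, 0))
print(y(25))  # 225.0
```

Reversing the output range is a common way to put larger values higher on the screen without changing the mapping logic.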
Data visualization is both an art and a science, and it is an important component of exploratory data analysis (EDA). Visualization helps researchers detect outliers, clusters, and relationships through plots and charts. Interactive, responsive graphics let users explore and understand the underlying patterns in the data.
Statisticians Edward Tufte and Leland Wilkinson are known for formalizing many visualization design principles. Tufte coined the term ‘chartjunk’ for unnecessary, distracting elements in information graphics. Other key concepts he introduced are the Lie factor, the data-ink ratio, and the data density of a graphic.
The Lie factor measures the integrity of a graphic, that is, how faithfully the graphic represents its underlying data. It is computed by dividing the size of the effect shown in the graphic by the size of the effect in the data. Tufte considers values between 0.95 and 1.05 acceptable; values outside that range indicate substantial distortion.
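Tufte measures the size of an effect as the relative change |second - first| / first. Under that definition, the Lie factor can be sketched in Python as below; the function names and the example chart are hypothetical, chosen only to illustrate the arithmetic.

```python
def effect_size(first, second):
    """Relative change from `first` to `second`: |second - first| / |first|."""
    if first == 0:
        raise ValueError("the first value must be nonzero")
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no change, so the Lie factor is undefined")
    return effect_size(shown_first, shown_second) / data_effect


def within_tufte_tolerance(lf, low=0.95, high=1.05):
    """True if the Lie factor falls in the 0.95 to 1.05 band Tufte treats as acceptable."""
    return low <= lf <= high


# Hypothetical chart: revenue rises from 100 to 110 (a 10% effect in the data),
# but the bars drawn for those values grow from 2 cm to 3 cm (a 50% effect in the graphic).
lf = lie_factor(100, 110, 2.0, 3.0)
print(round(lf, 2), within_tufte_tolerance(lf))  # 5.0 False
```

A Lie factor of 5 means the graphic exaggerates the change fivefold, far outside the acceptable band.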
A Lie factor of exactly 1 is ideal. The data-ink ratio is the proportion of a graphic's total ink devoted to the non-redundant display of data, as opposed to decoration, heavy gridlines, and other non-data elements. Tufte's advice is to maximize the data-ink ratio, within reason.
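As a rough sketch, the data-ink ratio can be computed by tallying how much ink each element of a chart uses and which elements carry data. Measuring ink as a pixel count, and the `Element` class itself, are assumptions for illustration; in practice, deciding what counts as data-ink is a judgment call.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One visual element of a chart (hypothetical class for illustration)."""
    name: str
    ink: float     # ink used, approximated here by a count of non-background pixels
    is_data: bool  # True if erasing the element would lose data information


def data_ink_ratio(elements):
    """Data-ink divided by the total ink used by the graphic."""
    total = sum(e.ink for e in elements)
    if total == 0:
        raise ValueError("the graphic uses no ink")
    data = sum(e.ink for e in elements if e.is_data)
    return data / total


chart = [
    Element("bars", 1500, True),
    Element("gridlines", 500, False),
    Element("background fill", 2000, False),
]
print(data_ink_ratio(chart))  # 0.375

# Erasing the non-data background fill raises the ratio without losing any data.
minimal = [e for e in chart if e.name != "background fill"]
print(data_ink_ratio(minimal))  # 0.75
```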
Leland Wilkinson proposed a framework called the Grammar of Graphics.
This theory is based on two principles regarding the relationship between graphics and their underlying grammar.
The first principle states that graphics are made up of distinct layers of grammatical elements, and the second states that meaningful plots are built around appropriate aesthetic mappings.
Layers are like adjectives and nouns, and aesthetic mappings are the grammatical rules that glue them together.
The essential graphical elements are data, aesthetics, and geometries. Aesthetics are the visual properties, such as x and y position, color, and size, onto which variables in the data are mapped; the geometry is the actual shape the data will take in the plot, such as points, lines, or bars.
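To make these elements concrete, here is a toy Python sketch that models one layer as data plus an aesthetic mapping plus a geometry, and a plot as a stack of layers. `Layer`, `build_marks`, and `build_plot` are hypothetical names, not the API of ggplot2, plotnine, or any other grammar-of-graphics library; a real implementation would also apply scales, statistics, and coordinate systems.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """One layer of a plot (hypothetical toy class, not a real library API)."""
    data: list  # the data element: rows as dicts
    aes: dict   # the aesthetic mapping: visual property -> column name
    geom: str   # the geometry: "point" or "bar" in this sketch


def build_marks(layer):
    """Apply a layer's aesthetic mapping to every row and tag it with the geometry."""
    if layer.geom not in {"point", "bar"}:
        raise ValueError(f"unsupported geometry: {layer.geom}")
    marks = []
    for row in layer.data:
        # Look up the column mapped to each visual property.
        props = {visual: row[column] for visual, column in layer.aes.items()}
        props["shape"] = layer.geom
        marks.append(props)
    return marks


def build_plot(layers):
    """A plot is a stack of layers; its marks are each layer's marks, in order."""
    return [mark for layer in layers for mark in build_marks(layer)]


rows = [
    {"city": "A", "temp": 21, "rain": 3},
    {"city": "B", "temp": 25, "rain": 1},
]
scatter = Layer(rows, {"x": "temp", "y": "rain"}, "point")
bars = Layer(rows, {"x": "city", "height": "rain"}, "bar")
marks = build_plot([scatter, bars])
print(marks[0])  # {'x': 21, 'y': 3, 'shape': 'point'}
print(marks[2])  # {'x': 'A', 'height': 3, 'shape': 'bar'}
```

The same data yields different plots depending only on the mapping and geometry chosen for each layer, which is the point of separating the grammatical elements.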
Let's look at two powerful visualization tools available to data miners.
Figure: 2D histogram contour plot with histogram subplots
D3.js and Grammar of Graphics
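D3.js (Data-Driven Documents) is a JavaScript library whose logical flow follows the Grammar of Graphics concepts. The Grammar of Graphics applies a different transformation at each step, going from source, to variables, to algebra, and then through scales, statistics, geometry, coordinates, and aesthetics, until a renderer produces the final graphic.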
In the case of D3, the renderer is simply the web browser, which displays the final graphic as part of a webpage.
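The stage-by-stage flow can be sketched as a chain of small functions that ends in a renderer emitting SVG, the kind of markup a browser displays. This is a simplified illustration under stated assumptions, not D3's implementation: the function names are hypothetical, `cross` is only a stand-in for Wilkinson's cross operator, and the scales, statistics, coordinates, and aesthetics stages are collapsed into a single positioning step.

```python
import csv
import io

# Source: raw data as it arrives (here, a small inline CSV string).
SOURCE = "year,sales\n2019,120\n2020,90\n2021,150\n"


def to_variables(text):
    """Variables: parse the source into named columns of numbers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: [float(row[name]) for row in rows] for name in rows[0]}


def cross(a, b):
    """Algebra: pair two variables case by case (a stand-in for Wilkinson's cross)."""
    return list(zip(a, b))


def linear(d_min, d_max, r_min, r_max):
    """Map [d_min, d_max] linearly onto [r_min, r_max]; assumes d_min != d_max."""
    return lambda v: r_min + (v - d_min) / (d_max - d_min) * (r_max - r_min)


def to_positions(pairs, width, height, pad=10):
    """Scaling and coordinates, collapsed: turn each (x, y) pair into pixel positions."""
    xs, ys = zip(*pairs)
    sx = linear(min(xs), max(xs), pad, width - pad)
    sy = linear(min(ys), max(ys), height - pad, pad)  # flipped: SVG's y axis points down
    return [(sx(x), sy(y)) for x, y in pairs]


def render_svg(points, width, height):
    """Geometry and renderer: one SVG circle per point, wrapped in an <svg> element."""
    circles = "".join(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="3"/>' for x, y in points)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            f"{circles}</svg>")


variables = to_variables(SOURCE)
pairs = cross(variables["year"], variables["sales"])
svg = render_svg(to_positions(pairs, 200, 100), 200, 100)
print(svg)
```

Saving the printed string as an `.svg` file and opening it in a browser should show three dots. D3 itself works differently in detail (it binds data to DOM elements and updates them), so treat this only as an illustration of the source-to-renderer flow.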
Written by Mithilesh Kumar
Edited by Alexandar Ristic, Kevin Ma, Thomas Braun, Bryan Xiao & Alexander Fleiss