Lecture 3 - Data Manipulation
Data Manipulation
Data Visualization
! Data Visualization Frame Principles
! Data Manipulation Approaches
! Visualization Design Process
Titles
Tableau Public
Multiple Axis Plotting
jpgraph.net
Data Manipulation
! Why do we need to manipulate data?
! Data in the real world is rarely formatted
! Different visualization tools use different data formats
! Reformatting helps us re-purpose data
Methods of Formatting
! Data formatting applications
! Manual formatting
! Applying programming methods
Data Format in Excel
! Excel Format:
Comma Separated Values
! Delimited Text (CSV):
JavaScript Object Notation
! JavaScript Object Notation (JSON):
! Data is in name/value pairs
"firstName":"John"
Source: http://www.w3schools.com/xml/xml_syntax.asp
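As a small illustration of name/value pairs, the sketch below parses a JSON object with Python's standard json module. The "firstName" pair comes from the slide; the "lastName" pair is an assumed addition for the example.

```python
import json

# "firstName" is from the slide; "lastName" is a hypothetical extra pair.
text = '{"firstName": "John", "lastName": "Doe"}'

# json.loads turns JSON text into a Python dict of name/value pairs.
person = json.loads(text)
print(person["firstName"])

# json.dumps serializes the dict back to JSON text.
round_trip = json.dumps(person)
```

Each JSON name becomes a dictionary key, so values are looked up by name rather than by position.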
Extensible Markup Language
! Extensible Markup Language (XML):
! All XML Elements Must Have a Closing Tag
<p>This is a paragraph.</p>
! XML Elements Must be Properly Nested
<b><i>This text is bold and italic</i></b>
! XML Tags are Case Sensitive
<Message>This is different</Message>
<message>than this</message>
Source: http://www.w3schools.com/xml/xml_syntax.asp
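The syntax rules on this slide can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser, which rejects any text that is not well formed. The helper name is_well_formed and the sample strings are illustrative choices, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule: nesting, closing tags, and case sensitivity.
good_nesting = "<b><i>This text is bold and italic</i></b>"
bad_nesting = "<b><i>This text is bold and italic</b></i>"
missing_close = "<p>This is a paragraph."
case_mismatch = "<Message>This is different</message>"

for sample in (good_nesting, bad_nesting, missing_close, case_mismatch):
    print(is_well_formed(sample), sample)
```

Only the first sample parses; the other three each break one of the rules.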
Extensible Markup Language
! Extensible Markup Language (XML):
Image Source: http://json.org/example.html
Hadoop File System
! Hadoop Sequence File Format:
! Consists of binary key/value pairs
Image Source: http://hadooptutorial.info/hadoop-sequence-files-example/
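To make "binary key/value pairs" concrete, here is a toy sketch of length-prefixed records in Python. It is an assumption-laden illustration only: it is not the real Hadoop SequenceFile format, which also has a file header and other details not shown here. It simply writes a record length and a key length in front of each key/value pair.

```python
import io
import struct

def write_record(stream, key: bytes, value: bytes) -> None:
    """Write one record: record length, key length, key bytes, value bytes."""
    # Toy layout (assumed): two big-endian 4-byte integers, then the payload.
    stream.write(struct.pack(">ii", len(key) + len(value), len(key)))
    stream.write(key)
    stream.write(value)

def read_records(stream):
    """Yield (key, value) pairs until the stream is exhausted."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        record_len, key_len = struct.unpack(">ii", header)
        key = stream.read(key_len)
        value = stream.read(record_len - key_len)
        yield key, value

buf = io.BytesIO()
write_record(buf, b"k1", b"first value")
write_record(buf, b"k2", b"second")
buf.seek(0)
records = list(read_records(buf))
print(records)
```

Because every record states its own lengths, a reader can walk the file without any delimiter characters, which is what makes a binary format safe for arbitrary keys and values.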
Hadoop Sequence File Format
[Figure: sequence file layout. Recoverable labels: File Header; Uncompressed Format; Record Length; Key Length; compression notes "only values" and "all except record count".]
Image Source: http://hadooptutorial.info/hadoop-sequence-files-example/
Formatting Tools
! Excel Spreadsheet
! Google Refine
! Mr. Data Converter
! Python Programming Language
Image Source: http://json.org/example.html
Google Refine
! Powerful, easy-to-use tool for cleaning up messy data
! Cleanse data using Transform/Clusters/Filters
! Extensive functionality
Image Source: http://json.org/example.html
Google Refine - Example
! Create Text Facet on Type field:
Image Source: http://mpvp4u.blogspot.com/2012/04/google-refine-tutorial.html
Google Refine - Example
! Total records in Type field: 17828
! Number of distinct values: 18
Image Source: http://mpvp4u.blogspot.com/2012/04/google-refine-tutorial.html
Google Refine - Example
! Redundancies and duplicates in the Type field
! Select Edit and correct the description
Image Source: http://mpvp4u.blogspot.com/2012/04/google-refine-tutorial.html
Google Refine - Example
! After correction, only 15 categories are left
Image Source: http://mpvp4u.blogspot.com/2012/04/google-refine-tutorial.html
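The facet-then-correct workflow in this example can be approximated in a few lines of Python. This is a simplified stand-in, not Google Refine's own clustering: it counts distinct values the way a text facet does, then merges variants that differ only in case or surrounding whitespace. The sample values are hypothetical.

```python
from collections import Counter

# Hypothetical messy values for a "Type" field.
types = ["Retail", "retail ", "RETAIL", "Wholesale", "wholesale", "Online"]

# A text facet is essentially a count of each distinct value.
facet = Counter(types)

def normalize(value):
    """Collapse variants that differ only by case or surrounding spaces."""
    return value.strip().lower()

cleaned = [normalize(v) for v in types]
cleaned_facet = Counter(cleaned)

print(len(facet), "distinct values before cleaning")
print(len(cleaned_facet), "distinct values after cleaning")
```

As in the slides, the number of distinct categories drops once redundant spellings are merged.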
Mr. Data Converter
! Simple, free tool that converts Excel and CSV data into XML, JSON, and other formats
! Very useful for formatting data, especially when you create graphics for the web
Mr. Data Converter - Example
! Paste the CSV data into the input area and select the output format
Image Source: http://webscripts.softpedia.com/script/Text-Management/Text-Tools/Mr--Data-Converter-75802.html
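The same kind of CSV-to-JSON conversion can be sketched with Python's standard library. This is not how Mr. Data Converter is implemented; the function name csv_to_json and the sample data are assumptions made for illustration.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

# Hypothetical input: a header row and two records.
sample = "name,score\nAda,90\nLin,85\n"
converted = csv_to_json(sample)
print(converted)
```

Note that every value comes out as a string, because CSV carries no type information; converting numeric columns is a separate step.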
Spreadsheet Software
! Well suited to making changes to individual data points
! Easy sorting and filtering
! Quick formatting
! Conditional formatting
! Text to number manipulation
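For comparison, the sorting, filtering, and text-to-number steps listed above look roughly like this in plain Python. The rows and the to_number helper are hypothetical and only meant to mirror what a spreadsheet does interactively.

```python
# Hypothetical rows, with numbers stored as text as they often are on import.
rows = [
    {"city": "Austin", "sales": "1,200"},
    {"city": "Boise", "sales": "850"},
    {"city": "Tampa", "sales": "2,430"},
]

def to_number(text):
    """Convert text such as '1,200' to an int by dropping thousands separators."""
    return int(text.replace(",", ""))

# Text-to-number conversion, applied in place to every row.
for row in rows:
    row["sales"] = to_number(row["sales"])

# Sorting: largest sales first.
by_sales = sorted(rows, key=lambda r: r["sales"], reverse=True)

# Filtering: keep only rows at or above a threshold.
large = [r for r in rows if r["sales"] >= 1000]

print([r["city"] for r in by_sales])
print([r["city"] for r in large])
```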
Spreadsheet Software - Example
! Format Table
Image Source: http://www.dummies.com/how-to/content/format-tables-from-the-ribbon-in-excel-2013.html
Spreadsheet Software - Example
! Conditional Formatting
Spreadsheet Software - Example
! Formatting text and numbers
Image Source: http://www.gcflearnfree.org/excel2013/9.5
Programming Language
! Useful when software cannot handle large data files
! Programming gives you the flexibility to tailor scripts specifically to your data
Programming Language - Example
! Requirement:
! Merge Place and State
! Transpose Years & Population
Programmatic Solution
! Python program used to format data
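The program shown on the slide is not reproduced in this text. As a rough sketch of what such a script might look like, the code below merges Place and State into one field and turns year columns into rows. It assumes the input has one column per year, which is one reading of "transpose"; the field names Location, Year, and Population and the sample values are hypothetical.

```python
# Hypothetical wide-format input: one row per place, one column per year.
wide_rows = [
    {"Place": "Springfield", "State": "IL", "2010": 100, "2011": 110},
    {"Place": "Salem", "State": "OR", "2010": 200, "2011": 205},
]

def reshape(rows, id_fields=("Place", "State")):
    """Merge Place and State, and emit one output row per (location, year)."""
    long_rows = []
    for row in rows:
        location = f'{row["Place"]}, {row["State"]}'
        for column, value in row.items():
            if column in id_fields:
                continue  # skip the identifying columns; the rest are years
            long_rows.append(
                {"Location": location, "Year": int(column), "Population": value}
            )
    return long_rows

long_rows = reshape(wide_rows)
for r in long_rows:
    print(r)
```

The long layout, with one observation per row, is what many visualization tools expect as input.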
Programming Language - Example
! Result
Four Nested Levels of Visualization
Tamara Munzner
Data Domain
! Familiarity with relationships in this type of data
! Often used for industry classification
! Sometimes used for classification of field
! Infovis, Scivis, VAST
! Leverage subject matter experts
! Asking good questions (remember Bertin)
! Requirements definition
Task Abstraction (user experience)
! Defining the functionality of the visualization
! Overview
! Browsing
! Comparison
! Summarization
! Drill down
! Filtering
! Synchronization
! Animation
Task Abstraction - Analysis
Tamara Munzner
Task Abstraction - Search
Tamara Munzner
Task Abstraction - Query
Tamara Munzner
Data Abstraction (structure)
Tamara Munzner
Visual Data Encoding
! Automatically adjusts to data selections
! Bar chart provides comparison
! Line chart provides trend
! Area provides trend and comparison
! Shape provides complex comparison
! Maps provide proximity in space
! Pie provides % contribution
! Gantt provides relationship of measures in time
! Polygon creates data areas
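The pairings above can be captured as a simple lookup table. The sketch below is only an illustration of that idea: the dictionary restates the slide's pairings, while the function name suggest_chart and the "no suggestion" fallback are assumptions.

```python
# Analytic goal -> chart type, restating the pairings listed on the slide.
CHART_FOR_GOAL = {
    "comparison": "bar",
    "trend": "line",
    "trend and comparison": "area",
    "complex comparison": "shape",
    "proximity in space": "map",
    "% contribution": "pie",
    "relationship of measures in time": "Gantt",
    "data areas": "polygon",
}

def suggest_chart(goal):
    """Return the chart type paired with a goal, ignoring case and padding."""
    return CHART_FOR_GOAL.get(goal.strip().lower(), "no suggestion")

print(suggest_chart("Trend"))
print(suggest_chart("% contribution"))
```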
Best Practices Recommender
Plotted Points
Encoding with Color
Encoding with Shape
Encoding with Size
Encoding Labels
Encoding Details
Encoding Tooltips
Selection of Business Algorithm
! Internal Descriptive Presentation
! External Descriptive Presentation
! Internal/External Exploratory Application
Interactivity Discussion
www.babynamewizard.com
Martin Wattenberg
Summary
! Define your titles and axes for what you want to say
! Reformat data files for the visualization tools you wish to use
! Clean and format the data into a solid, usable file
! Create knowledge of the data domain
! Define the user experience
! Determine how you want to structure the data
! Visually encode the data
! Create/enhance an algorithm for replication of the experience
Deeper Resources
! Seeing Data Working Group
www.seeingdata.org