DV Unit III


UNIT III

VISUALISING DATA PROCESS
Acquiring Data: The first step in visualizing data is to load it into your
application. Typical data sources might be a file on a disk, a stream from a
network, or a digitized signal (e.g., audio or sensor readings). Unless you own
the data and it’s recorded in a definable, digitizable format, things can get
messy quickly. Thus, the acquisition stage covers several tasks that sometimes
get complicated:

• Unless you are generating your own data, you have to find a good source for
the data you want.
• If you don’t own the data, you have to make sure you have the right to use it.
• You may have to go through contortions to extract the data from a web page
or other source that wasn’t set up to make it easy for your application.
• You have to download the data, which may present difficulties if the volume
is large, especially if it’s fast-changing.
Locating Files for Use with Processing

In most cases, you’ll want to acquire data from within a program rather than copy
it beforehand. Command-line tools like Wget are useful when grabbing very large
data sets or when taking a look at a set of information before incorporating a live
download into your code. Processing supports methods for loading data found at a
range of locations. It’s important to understand this structure before we get
into the specifics of the API functions for acquiring data.
The Data Folder: The most common data source is a file placed in the data
folder of a Processing sketch. When you export an application or applet, all
classes and the contents of the data folder will be bundled into a single .jar file.
If the contents of these files need to change after the application or applet has
been created, remove them from the data folder before exporting, and place them
in another folder named data that is adjacent to the application or applet.
If you need to address this folder from other code, the dataPath() method returns
an absolute path to an item found in the data folder. It takes the name or path of
the file in the data folder as its only parameter and prepends it with the
necessary location information.
Uniform Resource Locator (URL): Files can also be located at specific URLs, for
instance:
loadStrings("http://benfry.com/writing/blah.txt");
Many different protocols can be used in URLs. The most common is HTTP, but
others, such as HTTPS and FTP, are also common. It’s safe to assume that HTTP
will work properly across systems, but implementation of other protocols will
vary from one Java implementation to another.

Loading Text Data
We use a lot of text files during processing mainly because text is easy to read
and simple to edit without special software or complicated parsers. That’s one
reason that XML is so popular: it can be generated and edited by hand just as
easily as by machine. To read a file as lines of text, use the following:

String[] lines = loadStrings("visualisation.tsv");
Because the loadStrings() method also automatically handles loading files from
URLs, the file could be loaded directly online via:
String[] lines = loadStrings("http://benfry.com/writing/series/visualisation.tsv");
The URL method is most useful for data that continually changes. For instance, if
this data were updated nightly, the information could be reloaded easily. In such a
case, the saveStream() method could also be used, which handles downloading the
contents of a URL and saving it to disk. It could be used once a day, and the file
it creates could then be loaded through loadStrings().
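
The once-a-day refresh described above can be sketched in plain Java. This is a minimal illustration rather than Processing’s own saveStream(): the class DailyCache, its methods isStale and loadLines, and the one-day constant are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper: keeps a local copy of a URL and refreshes it at most once a day.
class DailyCache {
    static final long ONE_DAY_MILLIS = 24L * 60 * 60 * 1000;

    // True when the local copy is older than maxAgeMillis.
    static boolean isStale(long lastModifiedMillis, long nowMillis, long maxAgeMillis) {
        return nowMillis - lastModifiedMillis > maxAgeMillis;
    }

    // Downloads the URL only when the local copy is stale, then reads it as lines.
    static List<String> loadLines(String url, File localCopy) throws IOException {
        // File.lastModified() returns 0 for a missing file, so a missing copy counts as stale.
        if (isStale(localCopy.lastModified(), System.currentTimeMillis(), ONE_DAY_MILLIS)) {
            try (InputStream in = URI.create(url).toURL().openStream()) {
                Files.copy(in, localCopy.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return Files.readAllLines(localCopy.toPath());
    }
}
```

Keeping the staleness check separate from the download makes it easy to change the refresh interval without touching the network code.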

Dealing with Files and Folders
It’s often helpful to start with local data before developing a version of your
code that runs over the network or from a database.
For some tasks, the Java File object may be helpful. For instance, the following
code loads a file named bar.txt from a folder called foo and retrieves all the text
from it. Relative paths like this are problematic, however, because you can’t be
sure from which directory the application will run, so it’s best to use an
absolute path with this method:
File foo = new File("foo", "bar.txt");
String[] lines = loadStrings(foo);
Many of the I/O methods in Processing allow a File as a parameter. But for
those that lack such a variant, here is a version that takes an absolute path as a
String, in conjunction with the getAbsolutePath() method from File:
File foo = new File("foo", "bar.txt");
String path = foo.getAbsolutePath();
String[] lines = loadStrings(path);
Listing Files in a Folder

A common use of the File object is to list files in a directory, which is handled
with the list() method of the File class:
File folder = new File("/path/to/folder");
String[] names = folder.list();
if (names != null) {  // will be null if inaccessible
  println(names);
}
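
When only certain files matter, the same list() call can take a filter. The following plain Java sketch assumes a hypothetical helper, FolderListing.namesWithExtension, that keeps names ending in a given extension and treats an inaccessible folder as empty instead of returning null.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper built on File.list(FilenameFilter).
class FolderListing {
    // Returns the sorted names in the folder that end with the extension (case-insensitive).
    // An inaccessible or missing folder yields an empty list rather than null.
    static List<String> namesWithExtension(File folder, String extension) {
        String wanted = extension.toLowerCase();
        String[] names = folder.list((dir, name) -> name.toLowerCase().endsWith(wanted));
        if (names == null) {
            return new ArrayList<>();
        }
        Arrays.sort(names);
        return new ArrayList<>(Arrays.asList(names));
    }
}
```

Note that list() reports subfolders as well as files, so a folder whose name ends in the extension would also be returned.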

In practice, this is convenient for listing the contents of the data folder in your
Processing sketch. The built-in String variable sketchPath provides an absolute
path to the current folder. To create a File object that points to the data folder,
use the following:
File dataFolder = new File(sketchPath, "data");

Asynchronous Image Downloads

Like the other file loading functions, loadImage() halts execution until it has
completed. That is not a problem for smaller sketches with a few images, but when
loading dozens or hundreds of images, it has a significant impact on speed because
it means that multiple images are not downloading at once (the server providing
the data might be just as much of a bottleneck as the network connection itself)
and the interface halts until the images have loaded.
When handling many images at once, we can instead rely on Java methods for
retrieving the image data, and then either download all the files as a batch once
they’ve been queued or simply proceed as normal until the images have completed
downloading.

Parsing Data

Parsing converts a raw stream of data into a structure that can be manipulated
in software. Lots of parsing is detective work, requiring you to spend time
looking at files or data streams to figure out what’s inside.
The data might be available in an easily parsed format (such as an RSS feed in
XML format) or in a proprietary binary format.
Parsing may also seem to be quite disconnected from the actual process of data
visualization. However, it’s part of the process for a reason: chances are, you’ll
have to obtain data from a source that’s not under your control and will spend a
lot of time figuring out how to use the data that you’re given.
Generally, data boils down to lists (one-dimensional sets), matrices
(two-dimensional tables, such as a spreadsheet), or trees and graphs (individual
“nodes” of data and sets of “edges” that describe connections between them).
Strictly speaking, a matrix can be used to represent a list, or graphs can
represent lists and matrices, but such over-abstraction is not useful.
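
To make the three shapes concrete, here is a small Java sketch. It assumes a hypothetical class, DataShapes, that turns a two-column table of edges into a graph stored as an adjacency list; the comments show how a list and a matrix would typically be declared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of lists, matrices, and graphs as Java structures.
class DataShapes {
    // A list is one-dimensional, e.g. float[] values = {1.5f, 2.0f};
    // A matrix is a table of rows and columns, e.g. String[][] table.

    // Builds a graph from a two-column table where each row is one edge {from, to}.
    // Every node maps to the list of nodes it connects to.
    static Map<String, List<String>> graphFromEdges(String[][] edges) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (String[] edge : edges) {
            graph.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            // Make sure the target also appears as a node, even with no outgoing edges.
            graph.computeIfAbsent(edge[1], k -> new ArrayList<>());
        }
        return graph;
    }
}
```

The edge table here is itself a matrix, which shows how one shape can stand in for another; in practice it is usually clearer to pick the structure that matches how the data will be used.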

Levels of Effort

It is important to know when to write generalizable code and when to write a
quick hack. The parsing step is one occasion when this is a common issue.
There are three basic scenarios:
A simple hack: This can get you up and running quickly. It works especially well
if your data is not changing, or the data need not be generalized for other
situations. For a file of drawing commands, for example, we’d ignore everything in
the file except for certain types of shape commands, e.g., only looking for “path”
data and ignoring everything else.
A basic parser: This scenario is often a good solution when you need code that’s
not too large, so it can be deployed over the Web. This exceeds the simple hack,
but doesn’t quite approach a full-blown parser.
A full parsing API: For local applications where code footprint does not matter,
a full parser might be necessary.
Generally speaking, each of these options takes an order of magnitude more time
to implement than the previous one. Their runtime speeds also tend to decrease as
we move down the list, although robustness and maintainability tend to increase in
the same direction. The Processing API and libraries target the first two cases, in
keeping with its focus on sketching, with the assumption that the third step is
always available by attaching to larger Java libraries.

Data isn’t always clean, and sometimes you have to write dirty code to parse it.
Writing parsers is easy, but writing ones that fail gracefully is not, which makes
parsing issues deceptively complex. In short, the perception of “ugliness” can
often lead to nonessential tasks that take you down long, unproductive paths and
lead you astray from the priorities of your project.
Tools for Gathering Clues

It’s important to have a decent text editor and hex viewer available for detective
work on data files. A text editor should be capable of efficiently loading files
that are many megabytes in size. A hex editor is useful when dealing with binary
data because sometimes it’s necessary to take a look at the first few bytes of a
data file to identify its type. Command-line utilities are also very important.
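
When no hex viewer is at hand, a few lines of Java can print the first bytes of a file. The class HexPeek and its methods toHex and peek are hypothetical names for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for inspecting the leading bytes of a file.
class HexPeek {
    // Formats up to count bytes as two-digit uppercase hex values separated by spaces.
    static String toHex(byte[] data, int count) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(count, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    // Reads at most count bytes from the start of the file and formats them as hex.
    static String peek(Path file, int count) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return toHex(in.readNBytes(count), count);
        }
    }
}
```

For instance, a PNG image begins with the bytes 89 50 4E 47, so seeing that prefix identifies the file type even when the extension is missing or wrong.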
Text Is Best

Perhaps the most useful file format is simple delimited text. In this format, each
line of text is one row of a table, and delimiters (usually a tab or a comma)
separate the individual columns within the row.
Tab-Separated Values (TSV): A TSV file contains rows of text data made up of
columns separated by tab characters. The format is useful because it’s easy to
parse and can be loaded and edited with any spreadsheet program.
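
Splitting TSV lines takes very little code. In this sketch, TsvRows.parse is a hypothetical helper that takes lines such as those returned by loadStrings() and splits each one on tab characters.

```java
// Hypothetical helper: turns lines of tab-separated text into a table of strings.
class TsvRows {
    // Splits every line on tabs. The -1 limit keeps trailing empty columns,
    // which String.split would otherwise drop.
    static String[][] parse(String[] lines) {
        String[][] rows = new String[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            rows[i] = lines[i].split("\t", -1);
        }
        return rows;
    }
}
```

Keeping empty trailing columns matters when a row’s last value is blank; without it, rows in the same file could end up with different column counts.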
Comma-Separated Values (CSV): CSV works similarly to TSV, except that the
delimiter is a , (comma) character. Because commas might be part of the data, any
column that includes a comma must be placed inside quotes. Of course, quotes might
be part of the data as well, so a pair of double quotes is used to indicate a
double quote in the data.
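
The quoting rules can be handled with a small state machine. The following is a minimal sketch under the assumption that each record sits on a single line (no line breaks inside quoted fields); CsvLine.parse is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line into fields, honoring quotes.
class CsvLine {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');  // a doubled quote is one literal quote
                        i++;
                    } else {
                        inQuotes = false;   // closing quote
                    }
                } else {
                    field.append(c);        // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;            // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());       // last field has no trailing comma
        return fields;
    }
}
```

Given the line 1,"Fry, Ben","say ""hi""" this yields three fields: 1, then Fry, Ben, then say "hi".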
Text Markup Languages

To allow flexibility in structure (such as including arbitrary numbers of elements
of any size in varying orders), many formats embed structure tags in their content.
Markup languages, such as HTML and XML, are prime examples, where sets of tags
delineate and identify the content found in the document. Such documents are
relatively easy to parse, and they are fortunately becoming more common,
particularly XML. But even though the documents are designed to facilitate
parsing, keep in mind which data you actually need from the file.
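
Pulling only the needed elements out of an XML document can be sketched with Java’s built-in DOM parser. XmlText.textOf is a hypothetical helper that returns the text of every element with a given tag name and ignores the rest of the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts the text of all elements with one tag name.
class XmlText {
    static List<String> textOf(String xml, String tagName) {
        List<String> values = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
        } catch (Exception e) {
            // Malformed input is reported as one unchecked exception to keep callers simple.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return values;
    }
}
```

For an RSS-style feed, asking for "title" would return just the headlines, which is often all a visualization needs.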
