Semantic SearchMonkey


Monkey with the Semantic Web

SearchMonkey

Presentation by:

Paul Tarjan, Chief Technical Monkey


([email protected])

Online at:

http://www.slideshare.net/ptarjan/semantic-searchmonkey
The web was / is fragmented

•  Funny pictures
•  Super secret military site
•  Friend’s website
•  University event page
•  Cool bookmarks
So we added search to find stuff

Google, Yahoo

•  Funny pictures
•  Super secret military site
•  Friend’s website
•  University event page
•  Cool bookmarks
But there are many similar sites

•  Facebook Events, Evite Events, Upcoming Events
•  YouTube, Metacafe, Vimeo
•  Digg, Reddit, Technorati

Let’s treat these as “views” onto “objects”


Wouldn’t it be cool if you could do:

•  object:video creator:"Paul Tarjan" length<=60s
Wouldn’t it be cool if you could do:

•  object:video creator:http://paulisageek.com/ length<=60s
Wouldn’t it be cool if you could do:

•  object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"
Wouldn’t it be cool if you could do:

•  object:video author:"The Escapist" game:"Left 4 Dead"
It gets even cooler
Aggregation:

•  object:review type:camera make:canon model:D40
Aggregation:

•  object:event date:"May 16, 2008" type:party price<$5
Aggregation:

•  object:photo person:"Paul Tarjan"
Aggregation:

•  object:photo person:http://paulisageek.com
The Semantic What?

•  Web pages are views of data for people to read
•  Search engines are a hack
•  They treat pages as a bucket of words
•  Let’s turn the web into a database
•  APIs are good, but there is no “web” of APIs
•  If you figure out a good way of doing that, let me know
Ok, I want to do it.
Now what?
Recommendation: µF

•  If there is a microformat for your data, use it
–  hCard
–  hReview
–  hResume
–  hCalendar
–  rel-tag
–  rel-license
–  XFN
–  hAtom
–  geo
µF in a nutshell

•  Change your @class to something that is known
•  <div>
–  <span class="name">Paul Tarjan</span>
–  <span class="email">[email protected]</span>
•  </div>
•  BECOMES
•  <div class="vcard">
–  <span class="fn">Paul Tarjan</span>
–  <span class="email">[email protected]</span>
•  </div>
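To see why the class change matters, here is a minimal sketch of a consumer that looks for the known hCard class names. It uses only Python's standard-library `html.parser`; the `HCardParser` class, the `extract_hcards` helper, and the `paul@example.org` address are illustrative assumptions, not part of any microformats library. It only handles properties whose text sits directly inside the marked element.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    """Collect the text of fn/email elements found inside a class="vcard" root."""

    PROPERTIES = {"fn", "email"}  # small subset of hCard, for illustration

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per vcard root
        self._stack = []  # per open element: "card", a property name, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if "vcard" in classes:
            self.cards.append({})
            self._stack.append("card")
        elif "card" in self._stack and classes & self.PROPERTIES:
            self._stack.append(sorted(classes & self.PROPERTIES)[0])
        else:
            self._stack.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        prop = self._stack[-1] if self._stack else None
        if prop not in (None, "card") and data.strip():
            card = self.cards[-1]
            card[prop] = card.get(prop, "") + data.strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


before = '<div><span class="name">Paul Tarjan</span></div>'
after = ('<div class="vcard"><span class="fn">Paul Tarjan</span> '
         '<span class="email">paul@example.org</span></div>')
print(extract_hcards(before))
print(extract_hcards(after))
```

The first snippet yields nothing because "name" is not a class a parser knows to look for; the second yields one card with its fn and email.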
Recommendation: RDFa

•  If you have data that doesn’t really fit in a µF
•  Examples:
–  Markup APIs (YUI, javadoc, etc)
–  Media (Audios, Videos, Games, Presentations)
–  Job Postings
RDFa in a nutshell

•  Make a namespace
•  Use @property, @rel and @resource
•  For DATA: @property makes the node
contents into the value
•  For URLs: @rel makes the @resource into
the value
Normal HTML

•  <html>

<div class="private">
private static String
<strong>_createCookieHash </strong>
(hash)

RDFa: example

•  <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#">

<div class="private" rel="yui:method"
resource="#method__createCookieHash">
private static String
<strong property="yui:name">
_createCookieHash </strong> (hash)
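A rough sketch of how a consumer might read those attributes, following the two rules from the nutshell slide: @property turns the node contents into the value, and @rel turns the @resource into the value. This is not a conforming RDFa parser; the `RDFaPairs` class and `extract_pairs` helper are hypothetical names, it ignores subjects and namespaces, and it assumes the markup has no void elements.

```python
from html.parser import HTMLParser


class RDFaPairs(HTMLParser):
    """Collect (predicate, value) pairs from @property and @rel/@resource."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # per open element: [property name or None, text chunks]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # For URLs: @rel makes the @resource into the value.
        if a.get("rel") and a.get("resource"):
            self.pairs.append((a["rel"], a["resource"]))
        self._open.append([a.get("property"), []])

    def handle_data(self, data):
        # For data: text belongs to every enclosing element that has @property.
        for prop, chunks in self._open:
            if prop:
                chunks.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        prop, chunks = self._open.pop()
        if prop:
            self.pairs.append((prop, "".join(chunks).strip()))


def extract_pairs(html):
    parser = RDFaPairs()
    parser.feed(html)
    parser.close()
    return parser.pairs


snippet = """
<div class="private" rel="yui:method" resource="#method__createCookieHash">
private static String
<strong property="yui:name">_createCookieHash </strong> (hash)
</div>
"""
for predicate, value in extract_pairs(snippet):
    print(predicate, "=", value)
```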

That’s it!

•  Automatically picked up by semantic parsers / crawlers
•  Can build a SearchMonkey app on it
•  Can make a mashup way easier than screen
scraping
•  Can get the data from Yahoo! BOSS
What is SearchMonkey?

an open platform for using structured data to build more useful and relevant search results

Before After
Enhanced Result: Zagat

Image, Links, Key/Value Pairs or Abstract
Infobar: Wikipedia Preview

Summary Blob
Part of the puzzle

Semantic vocabularies

Semantic markup on web pages

SearchMonkey
Vocabularies

•  Need to speak the same language


•  I like to see girls of that... caliber.
•  English, French, Spanish, Esperanto?
•  URLs to the rescue
–  Dublin Core (http://purl.org/dc/elements/1.1/)
–  Friend of a Friend (http://xmlns.com/foaf/0.1/)
–  XHTML Friends Network (http://gmpg.org/xfn/11/)
–  … (many more)
Syntax

•  Nouns, Verbs, and Adjectives, oh my!


•  All phrases become lots of triples
•  (Subject, Verb / Adj. / Prep. / etc, Object)
•  Key / Value pairs ++
–  Everything is a URL or String
–  Subject doesn’t have to be the document
Syntax 2

•  Key / Value pair
–  Title = Awesome SearchMonkey Presentation
–  Homepage = http://search.yahoo.com/searchmonkey
•  Triples
–  (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
–  (self, http://vcard#url,
http://search.yahoo.com/searchmonkey)
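The step from key/value pairs to triples can be sketched in a few lines. The `Triple` type, the `PREDICATES` table, and `to_triples` are assumptions made for illustration; the predicate URLs are copied as abbreviated on the slide, not the full vocabulary URLs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical key -> predicate URL table, using the URLs as written on the slide.
PREDICATES = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def to_triples(subject, pairs):
    """Turn each key/value pair into a (subject, predicate URL, value) triple."""
    return [Triple(subject, PREDICATES[key], value) for key, value in pairs.items()]


page = {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
}
for triple in to_triples("self", page):
    print(triple)
```

The only thing a triple adds over a key/value pair is an explicit subject, which is what lets the subject be something other than the document itself.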
Decompose to triples

•  My friend “Bob” is an idiot.
–  (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366)
–  (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
–  (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot)
•  Unnamed nodes are O.K.
Writing URLs takes a lot of work!

•  xmlns:foaf=http://xmlns.com/foaf/0.1/
•  xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
•  xmlns:junk=http://example.org/ptarjan/
•  My friend “Bob” is an idiot.
–  (self, foaf:knows, genid:Ui__152310312_366)
–  (genid:Ui__152310312_366, vcard:fn, “Bob”)
–  (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)
•  Unnamed nodes are O.K.
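The prefixes are pure shorthand: expanding them gives back the full-URL triples from the previous slide. A minimal sketch, where `expand` and `expand_triple` are hypothetical helper names and anything without a declared prefix (the genid node, plain strings, "self") is passed through untouched:

```python
# Prefix table copied from the xmlns declarations on the slide.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Replace a declared 'prefix:' with its namespace URL; leave other terms alone."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def expand_triple(triple, prefixes=PREFIXES):
    return tuple(expand(term, prefixes) for term in triple)


short = [
    ("self", "foaf:knows", "genid:Ui__152310312_366"),
    ("genid:Ui__152310312_366", "vcard:fn", "Bob"),
    ("genid:Ui__152310312_366", "junk:isInstanceOf", "junk:idiot"),
]
for triple in short:
    print(expand_triple(triple))
```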
RDFa

•  <html xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
xmlns:junk="http://example.org/ptarjan/">
<div rel="foaf:knows">
<span property="vcard:fn">Bob</span>
<span rel="junk:isInstanceOf"
resource="junk:idiot" />
</div>
</html>
•  </SemanticWeb>

•  Questions?
Innards of SearchMonkey

•  You build a web-service inside our framework
•  When a search page renders
–  We check which SM apps are enabled
–  We call them
• 50ms for in-page
• Long time for AJAX
–  They return data in our template
–  We render them (and cache)
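The call-with-a-deadline step could look roughly like the sketch below. This is not how SearchMonkey is implemented; it only illustrates the idea of giving every enabled app a shared time budget and dropping the ones that miss it. The `call_apps` function, the app callables, and the 0.05 s default (taken from the 50ms in-page figure) are all assumptions, and caching is left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_apps(apps, url, budget_s=0.05):
    """Call every enabled app in parallel; keep only results that beat the deadline.

    apps: dict mapping a (hypothetical) app name to a callable taking the result URL.
    Returns a dict of app name -> result, or None for apps that were too slow.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(apps) or 1) as pool:
        futures = {name: pool.submit(fn, url) for name, fn in apps.items()}
        deadline = time.monotonic() + budget_s
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = future.result(timeout=remaining)
            except FutureTimeout:
                results[name] = None  # too slow for in-page rendering
    return results


def quick_app(url):
    return {"title": "Enhanced result for " + url}


def slow_app(url):
    time.sleep(0.5)  # stands in for a slow remote data service
    return {"title": "arrived too late"}


print(call_apps({"quick": quick_app, "slow": slow_app}, "http://example.org/"))
```

Note that the thread pool still waits for slow workers before returning in this sketch; a production system would need real cancellation or a fallback such as the AJAX path mentioned above.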
Prototyping with XSLT

•  What if I don’t have structured data?


–  I don’t own the site
–  I do own the site, but I want to prototype first
•  Build an XSLT custom data service first
–  Write some XSLT to extract the data and
transform it into DataRSS
–  Mostly about finding the right XPath (use Firebug or XPather)
–  Quick to implement, but brittle
–  Can’t do a good Enhanced Result
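To give a feel for the "find the right XPath" step and why it is brittle, here is a sketch in Python rather than XSLT, using the limited XPath subset in the standard library's `xml.etree.ElementTree`. The page layout, the `XPATHS` table, and the field names are hypothetical, the input must be well-formed XML, and the output is a plain dict rather than DataRSS.

```python
import xml.etree.ElementTree as ET

# Hypothetical field -> XPath table for a hypothetical page layout.
XPATHS = {
    "name": ".//div[@id='listing']/h1",
    "phone": ".//span[@class='phone']",
}


def extract(page, xpaths=XPATHS):
    """Return the text found at each XPath; fields that do not match are skipped."""
    root = ET.fromstring(page)
    found = {}
    for field, path in xpaths.items():
        node = root.find(path)
        if node is not None and node.text:
            found[field] = node.text.strip()
    return found


page = """<html><body>
<div id="listing"><h1>Cafe Example</h1><span class="phone"> 555-0100 </span></div>
</body></html>"""

redesigned = """<html><body>
<div id="place"><h2>Cafe Example</h2></div>
</body></html>"""

print(extract(page))
print(extract(redesigned))  # the paths no longer match after a redesign
```

The second call shows the brittleness: a small change to the page's ids or tags silently empties the result, which is the main argument for real markup over scraping.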
Do it for real

•  Demo
Examples

•  Rubik’s Cube
•  VTA Bus
•  API Monkey
•  BugMeNot
•  RetailMeNot
•  Amazon
questions?
