SovRen Resume Parser Overview
Contents
Introduction
Key Differentiators
Integration
Parser Component
Converter Component
Features/Scope
Skills Taxonomies
Languages and Regions
Sovren Document Converter
Parser Technology
Parser Workflows
Parser Architecture
Parser Control
Scalability
Parser Source Code
Sample Applications
About the Sovren Group
Introduction
The Sovren Group produces and markets recruitment
intelligence components that provide document
conversion, resume/CV parsing, and semantic profile
matching capabilities that can be used in any
software system.
The Resume Parser outputs to the HR-XML Resume 2.1, 2.4, and 2.5 schemas,
CSV files, and human-readable text.
This document addresses only the Sovren Resume/CV Parser, which includes the
Sovren Document Converter. A separate whitepaper is available for the Sovren
Semantic Matching Engine (which includes the Sovren Job Parser).
Key Differentiators
Superior features. The Sovren Resume Parser offers more coverage of the
HR-XML Resume 2.x schemas than any other product, by a wide margin.
Typically, we pull out 4x as many kinds of data and perform 2x as many kinds
of evaluative analysis as our competitors.
Superior business profile. The Sovren Group is privately held and has never
had VC funding or funded debt. We have been profitable each year for
12 years. Importantly, we are not owned by an ATS company or job board.
Superior technology. We are the only vendor to offer our own Document
Converter as well as our own Parser. We are the only native Microsoft .NET
parsing solution, yet over half of our customers are non-Microsoft shops.
Superior control and security. You run our software on your hardware, not
ours. You never have to worry about where your data is going to end up after
you send it off to a third party's hosted service, because you run our software
on your own servers or your customers' servers.
Integration
The Parser and Converter are components, not applications, and can be
incorporated into your application in several ways:
As a SOAP web service run on a Windows server and accessed from any
platform/language
Conversion and parsing using default configurations require fewer than 10 lines of
code.
Sovren provides free offline integration support, sample applications with sample
integration source code (C#), best practices consulting, and code reviews.
Parser Component
The Sovren Resume/CV Parser is a 100% pure managed code Microsoft .NET
assembly (a single DLL). It requires the Microsoft .NET Framework runtime version
2.0 or higher and works in 32-bit or 64-bit applications.
The Parser consumes plain text and produces an HR-XML Resume 2.1/2.4/2.5
schema-compliant output record (or its properties can be read directly by COM or
.NET code). Raw resumes must be converted to plain text using the Converter or
some other method before they can be processed by the Parser.
As a .NET component, the Parser's results can (optionally) be used directly, by
reading the component's properties, rather than by outputting the results to an XML
string. In addition, the Parser has methods to output the results to CSV files, or to
human-readable text.
Converter Component
The Sovren Document Converter is a Microsoft .NET assembly (a single DLL). It
requires the Microsoft .NET Framework runtime version 2.0 or higher. It can be run
in a 100% Pure Managed mode, with reduced functionality, or it can run in its
default Mixed Mode configuration, with full functionality by utilizing several
embedded native C++ libraries.
Features/Scope
The Sovren Resume Parser provides parsing of resumes with output to the
HR-XML.org Resume 2.1/2.4/2.5 schema. The Parser implements virtually the entire
schema, including these sections:
Note: Items marked with an asterisk ( * ) are Sovren extensions to the
schema, using HR-XML approved extension schemas.
Contact Info
Person Name
o Given Name
o Preferred Name
o Middle Initial
o Family Name
o Suffixes, and suffix types
(educational, generational,
qualification)
o Formatted Name
Postal Addresses
o Use/Location (e.g. home, work, school)
o Street Address lines
o Municipality
o Region(s)
o Country
o Postal Code
Phone Numbers
o Use/Location (e.g. home, work, personal)
o Phone Type: Telephone, Mobile, Fax, Pager, TTYTDD
o Phone Number: Original Format, Normalized Format, or Structured
o When Available
Email Addresses
o Use/Location (e.g. home, work, personal)
Personal URLs
Job Objective
Executive Summary
Qualification Summary
Employment History
Start Date
End Date
Education History
Start Date
End Date
Graduation Date
School Name
Location: Municipality, Region, Country
Degree Type (normalized)
Degree Name
Major
Minor
GPA (actual/scale)
Full Text / Description
* Graduated (true/false) *
* Normalized GPA (compare GPA across different scales) *
* Training History *
Start Date
End Date
Type of training
Name of training
Entity providing the training
Qualifications
Description
Competencies
Skill Name
Date Last Used (calculated by parser)
ID values: Skill Id, Parent Id, Taxonomy Id
* Context (Work History, Education, etc. as well as specific Positions or
Degrees) *
* Cumulative Months (calculated by parser) *
Name
Date
Achievements
Description
Foreign Languages
Read
Write
Speak
Fluent?
Military History
Unit or Division
Rank
Start Date
End Date
Recognition
Disciplinary Action
Discharge Disposition
Security Clearances
Associations
Organization
Role
Speaking Engagements
Date
Title
Publications
Authors
Title
Journal
Volume
Publisher
Publication Date
Publication Type
ISBN
Patents
Patent Name
Inventors
Patent Status
Patent Date
References
* Hobbies *
* Culture *
* Custom Data *
* Other information *
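The "Cumulative Months" and "Date Last Used" values listed under Competencies are described as calculated by the Parser. As a rough illustration only, and not Sovren's actual algorithm, values of this kind could be derived from the date ranges of the positions in which a skill appears, merging overlapping ranges so that no month is counted twice. The function names and the input shape below are hypothetical.

```python
from datetime import date

def month_index(d: date) -> int:
    """Map a date to a running month count so that ranges can be compared."""
    return d.year * 12 + (d.month - 1)

def skill_usage(ranges):
    """Return (cumulative_months, date_last_used) for one skill.

    `ranges` is a list of (start, end) dates for the positions in which the
    skill was found. Overlapping ranges are merged so that a month spent in
    two concurrent positions is counted once. Hypothetical helper.
    """
    if not ranges:
        return 0, None
    spans = sorted((month_index(s), month_index(e)) for s, e in ranges)
    total = 0
    cur_start, cur_end = spans[0]
    for start, end in spans[1:]:
        if start <= cur_end + 1:        # overlaps or touches the current span
            cur_end = max(cur_end, end)
        else:                           # gap: close the current span
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    total += cur_end - cur_start + 1
    last_used = max(end for _, end in ranges)
    return total, last_used
```

For example, a skill used from January 2005 to December 2005 in one position and from July 2005 to June 2006 in another covers 18 distinct months, with June 2006 as the date last used.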
Skills Taxonomies
The Parser ships with the industry's most comprehensive taxonomy, covering:
In addition, the Parser has the most flexible and extensible taxonomy available. You
can define your own custom taxonomies -- and at runtime, on a per-resume basis,
you can specify what combination of taxonomies to use:
The Parser performs Taxonomy Best Fit analysis, weighted by a number of factors
including the type and breadth of experience, length of experience, and recency of
that experience. In addition, the Parser is able to recognize, characterize, and
summarize a candidate's management experience throughout her career.
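To make the idea of a weighted best-fit analysis concrete, here is a minimal sketch. It assumes a simple scoring rule (months of experience discounted by recency, plus a bonus per distinct skill for breadth). The weights, the half-life, and all names are assumptions chosen for illustration; the Parser's actual factors and weighting are not described in this document.

```python
from dataclasses import dataclass

@dataclass
class SkillEvidence:
    taxonomy: str            # taxonomy the skill belongs to
    months_used: int         # length of experience with the skill
    years_since_used: float  # recency: 0 means currently in use

def best_fit(evidence, breadth_weight=6.0, half_life_years=5.0):
    """Rank taxonomies by a score combining length, recency, and breadth.

    Hypothetical scoring: each skill contributes its months of use, halved
    for every `half_life_years` since it was last used, and every distinct
    skill adds a fixed breadth bonus.
    """
    scores = {}
    for ev in evidence:
        recency = 0.5 ** (ev.years_since_used / half_life_years)
        contribution = ev.months_used * recency + breadth_weight
        scores[ev.taxonomy] = scores.get(ev.taxonomy, 0.0) + contribution
    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With this rule, long but stale experience in one taxonomy can rank below shorter, current experience in another, which is the kind of trade-off a recency-weighted analysis is meant to capture.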
Languages and Regions
France
Germany
Greece
Hong Kong
Hungary
India
Ireland
Italy
Liechtenstein
Netherlands
New Zealand
Norway
Russia
Singapore
Spain
South Africa
Sweden
Switzerland
United Kingdom
United States of America
Coming Soon
Region support for all of South America, Mexico, Portugal, Poland, Romania.
Language and region support for Italian, Danish, Polish, Romanian, and
Flemish.
Sovren Document Converter
OpenOffice 2.+
Corel WordPerfect
Excel
The Converter is very fast, with a typical throughput of 50-100 resumes per CPU per
second. The Converter does NOT use Word automation, nor does it require any source
authoring application such as Word or Acrobat to be installed. The documents are
never opened and it is impossible for any viruses, macros, or malicious code to be
executed. Some third-party converters, such as IFilters, may run faster, but they are
designed only to tokenize words for full-text searching, whereas our Converter is
designed to retain as much of the original layout as possible, which is important for
parsing accuracy.
The Converter checks the validity of the incoming resume, identifying problems
such as resumes that are actually images rather than text, and resumes that are
password protected. In addition, the Converter is able to analyze the validity of the
converted text and warn of potential issues.
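As an illustration of what checking converted text might involve, the sketch below applies two simple heuristics: very little text suggests the source was an image, and a high share of unreadable characters suggests a garbled conversion. The thresholds, messages, and function name are assumptions, not the Converter's actual checks.

```python
def check_converted_text(text, min_chars=200, max_junk_ratio=0.10):
    """Return a list of warnings about converted text (hypothetical heuristics)."""
    warnings = []
    stripped = text.strip()
    if len(stripped) < min_chars:
        warnings.append("too little text: source may be a scanned image")
    if stripped:
        # Count replacement characters and non-printable characters,
        # ignoring ordinary line breaks and tabs.
        junk = sum(
            1 for ch in stripped
            if ch == "\ufffd" or (not ch.isprintable() and ch not in "\n\r\t")
        )
        if junk / len(stripped) > max_junk_ratio:
            warnings.append("many unreadable characters: conversion may be garbled")
    return warnings
```

An empty list means neither heuristic fired; it does not prove the text is a usable resume.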
Parser Technology
The Sovren Resume Parser employs a wide array of
sophisticated algorithms for extracting and identifying
data. The Parser is built upon Sovren's own code libraries,
which implement many data structures and search
methods. The Parser uses proprietary
modifications of popular search methodologies.
Although each sub-parser has its own design, in general,
all of the parsers use a voting methodology. Data is
extracted and analyzed by multiple sub-parsers which
then vote as to how the data should be used.
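The voting idea can be sketched as follows. Each hypothetical "voter" looks at a line of text using a single technique and returns weighted votes for how the line should be treated; the votes are summed and the highest total wins. The voters, keyword lists, and weights are invented for illustration and are far simpler than anything a production parser would use.

```python
import re
from collections import Counter

YEAR_RANGE = re.compile(r"\b(?:19|20)\d{2}\s*-\s*(?:(?:19|20)\d{2}|present)\b", re.I)
SINGLE_YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def pattern_voter(line):
    """Pattern matching: a year range suggests an employment entry."""
    if YEAR_RANGE.search(line):
        return {"employment": 2.0}
    if SINGLE_YEAR.search(line):
        return {"employment": 1.0, "education": 1.0}  # ambiguous on its own
    return {}

def list_voter(line):
    """List matching: known degree or job-title words (tiny sample lists)."""
    words = {w.strip(".,").lower() for w in line.split()}
    votes = {}
    if words & {"bsc", "msc", "phd", "mba", "bachelor", "master"}:
        votes["education"] = 2.0
    if words & {"engineer", "manager", "analyst", "developer"}:
        votes["employment"] = 1.5
    return votes

VOTERS = [pattern_voter, list_voter]

def classify(line):
    """Sum the weighted votes of all voters and return the winning label."""
    tally = Counter()
    for voter in VOTERS:
        tally.update(voter(line))   # adds each voter's weights to the tally
    if not tally:
        return "unknown"
    return tally.most_common(1)[0][0]
```

Here a line such as "MSc Computer Science, 2003" is ambiguous to the pattern voter alone, but the list voter's degree match tips the total toward education.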
Some of the techniques include:
Pattern matching
List matching
Fuzzy matching
Depth control
Voting
Contextual analysis
Outlier analysis
Case analysis
Order analysis
Delimiter analysis
Probability testing
Rationality testing
Prequalification
Disqualification
Modified Bayesian classification
Length analysis
Domain analysis
Gap analysis
Density analysis
Semantic analysis
Spatial measurement
Parser Workflows
Parser Architecture
The Parser is logically divided into a master parser and many sub-parsers. The
master parser is responsible for normalizing the text for parsing, extracting the
cover letter, and identifying the relevant resume sections. It then delegates parsing
of each resume section to a section-specific sub-parser. Thus, Employment History
sections are parsed using the Employment History sub-parser, and this sub-parser
will in turn employ the services of other specific sub-parsers such as the Date
Parser.
As the Parser completes the parsing for each section, it outputs data into a top-level
Resume object. After all sections have finished parsing, this Resume object is filled
with all the data that could be (or was configured to be) extracted from the resume.
You can then read the resume data directly from the properties on this Resume
object, or you can request all of the data in an HR-XML Resume schema compliant
format.
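The delegation pattern described above can be sketched in a few lines. This is a simplified illustration, assuming sections are introduced by exact heading lines and that the Resume object is a plain data class; the names used here are hypothetical and are not the Parser's API.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Resume:
    """Top-level object that sub-parsers fill in (hypothetical shape)."""
    employment: list = field(default_factory=list)
    education: list = field(default_factory=list)

def parse_dates(line):
    """Shared date sub-parser: return all four-digit years found in a line."""
    return [int(y) for y in re.findall(r"\b(?:19|20)\d{2}\b", line)]

def parse_employment(lines, resume):
    """Employment sub-parser; delegates date extraction to parse_dates."""
    for line in lines:
        years = parse_dates(line)
        resume.employment.append({
            "text": line,
            "start": min(years) if years else None,
            "end": max(years) if years else None,
        })

def parse_education(lines, resume):
    """Education sub-parser; also reuses the shared date sub-parser."""
    for line in lines:
        resume.education.append({"text": line, "years": parse_dates(line)})

SECTION_PARSERS = {
    "EMPLOYMENT HISTORY": parse_employment,
    "EDUCATION": parse_education,
}

def master_parse(text):
    """Master parser: normalize lines, find sections, delegate each one."""
    resume = Resume()
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.upper() in SECTION_PARSERS:
            current = line.upper()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    for name, lines in sections.items():
        SECTION_PARSERS[name](lines, resume)
    return resume
```

After `master_parse` returns, the caller reads results directly from the `Resume` object's properties, mirroring the direct-access option described above.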
Parser Control
The Parser is designed for efficient control
of resources. You can configure the Parser
to parse only what you need, while
ignoring the rest. Thus, if skills parsing is
not needed, then the skills parser can be
turned off by just setting a parameter.
Similarly, any of the sub-parsers can be
enabled or disabled. This configuration
can be controlled per installation, per
instance, and per transaction.
In addition, parsing can be instructed to adhere to strict time limits. The Parser has
a built-in time-out mechanism which can perform soft timeouts (timeout requests)
or hard timeouts (thread aborts). In all cases, the Parser is able to return valid
results to the point that it stopped.
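A minimal sketch of these two controls, under the assumption that sub-parsers run one after another and that a soft timeout is checked between them: disabled sub-parsers are skipped, and when the time budget is spent the results gathered so far are returned. The function and parameter names are hypothetical, and hard timeouts (thread aborts) are not shown.

```python
import time

def run_sub_parsers(text, sub_parsers, enabled, time_limit_s):
    """Run the enabled sub-parsers in order within a soft time limit.

    `sub_parsers` maps a name to a function taking the resume text.
    Returns (results, timed_out); `results` holds whatever finished
    before the deadline. Hypothetical sketch.
    """
    deadline = time.monotonic() + time_limit_s
    results = {}
    timed_out = False
    for name, func in sub_parsers.items():
        if name not in enabled:
            continue                  # sub-parser switched off by configuration
        if time.monotonic() >= deadline:
            timed_out = True          # soft timeout: stop, keep partial results
            break
        results[name] = func(text)
    return results, timed_out
```

Because `enabled` and `time_limit_s` are ordinary arguments, a caller could vary them per call, which corresponds to the per-transaction level of control mentioned above.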
Scalability
No other Resume Parser handles single-site parsing volumes as high as those
handled by the Sovren Resume/CV Parser. The highest-volume career site on the
Internet uses the Sovren Resume Parser to extract data from over 300 million
resumes per year.
And no other full-featured Resume Parser can scale as small as the Sovren
Resume/CV Parser. Customers can embed the parser directly into their applications
(even desktop applications) by deploying 2 DLL files with a total memory footprint
as low as 100 MB.
Sample Applications
Please note: Sovren licenses only components, not applications. Our components
have no user interface and use no database. The following sample applications are
provided only by way of demonstration of sample code for various obvious
integration scenarios. Supplying sample applications does NOT imply that we are
"authorizing" any customer to violate any third party's intellectual property rights,
nor indemnifying customers who do so. Some uses illustrated may be subject to
third party business method/system patents in some jurisdictions in some time
frames, and it is the sole responsibility of licensees, and not of Sovren, to research,
identify and obtain any applicable third party licenses.
Sample applications are furnished with commented integration code, and may be
modified by customers for their own purposes. These applications are not supported
by Sovren, but rather, are the responsibility of the licensees.
Sample applications include:
Zero-code server applications
1. A File System Watcher application that monitors a user-designated folder for
incoming resumes, converts them, parses them, and outputs the plain text
and HR-XML files to a user-defined destination folder. The source and
destination folders can be local folders or network shares.
2. The Sovren Resume Parser Batch Processor application. This is a GUI
application that can process whole folders full of raw resumes, and output the
converted text, converted HTML, the cover letters, the parsed HR-XML
records, and various reports.
3. The Sovren Bulk Parser application. This is a command-line application that
can process whole folders full of raw resumes or job orders, and output the
converted text, converted HTML, and the parsed XML records. It is a multithreaded application that utilizes all available CPUs to complete the
processing as quickly as possible.
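The folder-processing pattern used by these applications can be sketched generically. The example below walks a folder and hands each file to a worker pool; `convert_and_parse` is a hypothetical stand-in that only counts words, since the real conversion and parsing calls are not shown in this document.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def convert_and_parse(path):
    """Hypothetical stand-in for real conversion and parsing of one file."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return {"file": Path(path).name, "words": len(text.split())}

def bulk_process(folder, workers=None):
    """Process every file in `folder` using a pool of worker threads."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    # Default to one worker per CPU; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(convert_and_parse, files))
```

In CPython, threads mainly help with I/O-bound work; for CPU-bound parsing, `concurrent.futures.ProcessPoolExecutor` would be the closer analogue to using all available CPUs.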
Zero-code web services
A SOAP web service that can be installed in 15 minutes and that provides
easy integration with other systems regardless of platform (Java, Cold Fusion,
PHP, Ruby, etc.). Code samples are provided for several platforms. You can
be parsing resumes within an hour from any operating system or
programming language.
Full source code is included for this web service, so you are able to use it as
is, customize it to meet specific needs, or copy it into your existing
application architecture.
Web Application for Resume Upload and Edit
Applicants can submit their resumes online and then view and edit the parsed
results in a fielded form with the fields pre-populated from the results of the
Parser.
Automatic polling and processing of unlimited email accounts
Applicants can submit their resumes by email to recruiter-specific,
function-specific, and/or job-posting-specific mailboxes, and this application will
automatically poll each mailbox, download the mail, identify the resume
(attachment? in the body?), the cover letter, and the reference letters,
convert the documents to plain text, parse the documents, and then store or
forward the results per your business rules. This application runs as a
Windows Service so it can run continuously in the background and
automatically start after server reboots. A desktop manual editing/approval
application is supplied with this application.
Desktop applications
1. C# WinForms application that processes either a file or pasted text, then
displays the resulting plain text, HTML, XML, XSLT transformation, and
performance timings. This application can perform the work locally (using
.NET components) or remotely (using the SovrenConvertAndParse web
service).
2. Visual Basic 6 sample application showing the Sovren Resume Parser running
as a late-bound COM object.
3. Visual C++ sample application, showing the Sovren Resume Parser running
as an early-bound COM object.
4. Java sample application that uses the SovrenConvertAndParse web service.
Variations are provided for JAX-WS, Axis, Axis2, and JSP/Axis.
5. Sample pages for ColdFusion and PHP that use the SovrenConvertAndParse
web service.
6. Drag-and-drop desktop application to convert and parse resumes from files or
email attachments that are dragged-and-dropped onto the application.
7. C# Console application that demonstrates the use of XSL to transform
Resume XML into several examples of HTML and RTF, suitable for branding
resumes in a common format.
Libraries
Sovren.DataSet: This assembly provides a default implementation of
mapping the Resume data into a SQL Server database.
Utilities
Print Skills: Output the built-in skills taxonomy from the Sovren Resume
Parser. Test your custom SDF-formatted skills taxonomy files to verify that
they do not contain any validation errors.
Skills Editor: Create, view, search and edit skills using a hierarchical editor.
Easily edit your skills hierarchy and view node counts to quickly see areas
that may need to be filled out more completely. Supports loading of the
built-in skills or your custom skills files, and then saves to custom skills files (SDF
format).
Change Assembly: Adds a suffix to the name of any .NET assembly file and
its namespaces. For example, changes "SrpAllInOne.dll" to
"SrpAllInOne_648.dll" and changes the "Sovren" namespace to "Sovren_648".
This makes it easy to reference and use multiple versions of a .NET assembly
within the same application.