Google Resume - Maricia Scott
Google Resume
We organize the world's News. Fast. We haven't yet managed to do it before the news actually happens,
but we are working on it. News impacts ~1 Billion users/week, through news.google.com, News search,
and News Universal.
I transitioned from the Borg frontend to work on backend infrastructure for Google News in 2006,
became Tech Lead of Build Infrastructure in Q1 2009, and Tech Lead/Manager for MTV Infrastructure
in Q3 2010.
Over the years I have been a key contributor to our pre-indexing ("build") side pipeline, and have
worked on many smaller-scale projects. I have also done a lot of production and process work that has
helped both the product and the team scale.
https://maricia.users.x20web.corp.google.com/www/google-resume.html 1/6
11/29/2017 Google Resume - Maricia Scott
Figuring out how to allow work (and launch) of OneTree to progress in parallel with the
indexing pipeline merge
Making sure that nothing slipped through the cracks between build (project above) and
serve
Eval setup and help
Production Responsibilities (2007 - present)
I am one of the "go-to" people on Google News for debugging the "hard production problems"
both within our code, and outside interactions (bigtable issues, borg issues, gfs, etc). I am one of
two people that do most of the production maintenance for our Build datacenters (running 350+
jobs when last I checked).
Prior to SRE onboarding, wrote many and refactored most of our borgcfgs for build side
(10K+ lines of borgcfg); familiarity with all
Work with SRE and News team to make sure the system has 99.99% uptime, realtime
indexing (1-2 min crawl->index), and fewer pages and outages
Have held the pager 2-3 times/quarter since Q1 2007.
Found and initiated current SRE relationship.
News Build Pipeline (implementation 2006-2007; production, maintenance,
improvements and advice through 2012)
The Article Repository Bigtable was a refactoring of the News pipeline; previously all News data
was written out to logs by the crawl. The refactored design makes it easier to add new features,
modify old content, and modularize processing. Over the last 6 years, the team has scaled the
build pipeline to 7 different bigtables containing all of our content (articles, images, hubs,
clustering), ~100 scanlets, 5 mapreduces, and several one-off servers. The entire pipeline is
running 350+ jobs per Build DC, with an end-to-end processing time of ~1-2 minutes. See Life of
an Article for more details.
Designed and implemented the initial repository bigtable, including helper functions that
make it relatively easy to add new columns quickly.
Wrote the first populator, which takes FetchReply's from the crawl log and inserts them
into the repository.
Hooked the original clustering workflow up to the new repository, making it possible to use
in production.
Continued maintenance on this critical component of code - it has held up well for 6 years
Infrastructure and documentation for how to write and test processes that work on the repo.
Multihoming Build Pipeline and Recrawl (implementation 2008-2009; design
advice for build components ongoing)
I led the team (Marisa Bauer, Martin Law and me) working on adding multihoming and recrawl
to the build pipeline. Our production build pipeline had been singly homed -- this is fine for many
products that can tolerate a 10-hour PCR in their preprocessing stages, but News can't be 10 hours
old. We needed a way to make the entire Build pipeline (from Crawl to Index) resilient to a
datacenter outage, without going stale. We also needed a way to be able to make more disruptive
changes (such as backfilling years of data) without disturbing the running system. At the same
time, Marisa was working on recrawl, which required rethinking the way we stored articles, so
we wanted an architecture that could address all three problems. We now have a system that
stores all copies of articles, then picks a "live" version to store in the primary article repository,
and can fail over from one of our build datacenters to the other in 3 hours without going stale.
Led design process and was the go-to person for multihoming-friendly architecture of new
components
Modified article repo populators to handle multihoming, including writing extractor scanlet
Many code reviews for other parts of the project
Productionized the new setup (bringing up two datacenters, borgcfgs, monitoring, etc)
With UI designer Chad Thornton, redesigned the layout of the Borg UI. I was then responsible for
the implementation of the redesign.
Designed and implemented many features beyond the basic layout, including:
Machine page search box
User page regular-expression searching for names, users, cmdlines
Cookie preferences to allow users to customize the UI
Help links for all the major functionality
Many smaller features (user limits, usermaps, recent termination log, overrides, pending
tasks) that needed to be exposed on the UI when added to the borgmaster, often on a short
turnaround.
Machine utilization accounting. Since these numbers needed to be displayed on the UI, I had to
get accurate measurements for all resource usage.
Debugged and kept up with resource accounting
Improved upon and added to the borgmon resource rules, adding user and cell graphs to the
UI.
Helped initial efforts by SpaceJam to get resource accounting information
Worked on the frontend, including implementing the blogger look-and-feel (designed by Ellen
Beldner), adding special search operators for blog title/author/etc, and adding an RSS output format.
Implemented the initial prototype for blogger profile pages, by using Googler data available in
feed format (as is used by Moma).
Enterprise crawl.
(10/2002 - 7/2004) Freshmaker: the Enterprise installation tool. Took over the tool written by an
intern, revamped it and expanded it to handle 15+ possible different installations of 2 different
types of Enterprise systems (deployed and new real-time versions) over both single machines and
clusters, and made it push-button so the testers could re-install with minimal manual intervention.
(1/2003-7/2004) Adminrunner regressiontest, Loadtest, Monitoring test: test infrastructure that
can be used within Enterprise regressiontests. Tests the "adminrunner" (SysAdmin UI for the
GSA), serving loadtest, monitoring uptime/crashing of backend binaries.
Initiated group - I wanted a mentor, many other people seemed to want one too! Let's do it.
We are a "go-to" group to talk to when new career development resources are starting up.
6-month alpha program with 30 (?) pairs in gwe community in Q3 2007
6-month beta program with 60 pairs worldwide over eng in 2008. China-Hr team found program
and replicated it for that office.
MentorsOnCall, a "short-term" mentoring program rolled out in 2009. Piloted in NY/CAM/PIT.
Now worldwide, with major groups in US-West, US-East, EMEA, ramping up JAPAC.
More than 100 mentees paired in MentorsOnCall
Initial survey results show positive impact for mentees
Was a mentee in alpha program, and mentor in beta and MentorsOnCall
(2006, '07, '08, '09) US Anita Borg Scholarship committee. Participated in the scholarship retreats
in Mountain View
(2006, 2007) Australia Anita Borg Scholarship committee. Attended retreat in 2006 in Sydney.
(Spring 2007) With Marisa Bauer, Lucy Zhang and Alice Tull, started a series of 20%
brainstorming sessions for female engineers.
3 teams of female engineers began investigating 20% projects such as Gmail features and
video closed-captioning.
(Winter 2006) Organization committee member for National Engineering Week events for middle
and high school girls
(Winter 2006) Organization committee member for Take Your Child to Work day
(Winter 2005) Femeng diversity engineering interviews
Awards
Peer Bonuses (12 Total): 2004 (1), 2007 (1), 2008 (3), 2009 (1), 2010 (3), 2011 (2), 2012 (1)
Spot Bonuses (8 Total): 2003 (1), 2004 (1), 2008 (1), 2009 (1), 2010 (2), 2011 (2)
Patents submitted (3 Total): 2007 (1), 2011 (2)