TensorFlow Estimators: Managing Simplicity and
Flexibility in High-Level Machine Learning Frameworks
Heng-Tze Cheng† Zakaria Haque† Lichan Hong† Mustafa Ispir† Clemens Mewald†∗
Illia Polosukhin† Georgios Roumpos† D Sculley† Jamie Smith† David Soergel†
Yuan Tang‡ Philipp Tucker† Martin Wicke†∗ Cassandra Xia† Jianwei Xie†
†Google, Inc.   ‡Uptake Technologies, Inc.
Figure 4: Current usage of Estimators at Google.

transformed into embedding columns before being fed into the hidden layers. The FeatureColumn API greatly simplifies how we construct the input layer of our model.
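As a minimal sketch of how this looks in code, assuming the TF 1.x tf.feature_column and tf.estimator APIs (the feature names and sizes here are invented for illustration, not taken from the Watch Next model):

import tensorflow as tf

# A minimal sketch of the FeatureColumn API (TF 1.x). The feature names
# and sizes below are hypothetical, not taken from the Watch Next model.
query_tokens = tf.feature_column.categorical_column_with_hash_bucket(
    "query_tokens", hash_bucket_size=10000)
video_id = tf.feature_column.categorical_column_with_hash_bucket(
    "video_id", hash_bucket_size=100000)

# Each sparse categorical column is wrapped in an embedding column, which
# the model maps to a dense vector before the hidden layers.
feature_columns = [
    tf.feature_column.embedding_column(query_tokens, dimension=32),
    tf.feature_column.embedding_column(video_id, dimension=32),
]

# The Estimator assembles the input layer from the columns automatically.
estimator = tf.estimator.DNNClassifier(
    hidden_units=[256, 128], feature_columns=feature_columns)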
Additionally, the train-to-serve support of TensorFlow Estimators considerably reduced the engineering effort to productionize the Watch Next model. Furthermore, the Estimator framework made it easy to implement new Estimators and experiment with new model architectures, such as multiple-objective learning, to accommodate specific product needs.
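The train-to-serve support mentioned above can be sketched as follows, assuming the TF 1.x export API; the feature spec and export directory are hypothetical:

import tensorflow as tf

# A sketch of the train-to-serve path (TF 1.x): a trained Estimator exports
# a SavedModel that TensorFlow Serving can load.
def serving_input_receiver_fn():
    # At serving time, the model receives serialized tf.Example protos.
    serialized = tf.placeholder(tf.string, shape=[None], name="examples")
    features = tf.parse_example(
        serialized, {"x": tf.FixedLenFeature([1], tf.float32)})
    return tf.estimator.export.ServingInputReceiver(
        features, {"examples": serialized})

# Given a trained tf.estimator.Estimator:
# estimator.export_savedmodel("/tmp/watch_next_export", serving_input_receiver_fn)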
The initial version of the model pipeline was developed using low-level TensorFlow primitives prior to the release of Estimators. While debugging why the model quality failed to match our expectations, we discovered critical bugs related to how the network layers were constructed and how the input data were processed.
As an early adopter, Watch Next prompted the development of missing features such as shared embedding columns. Shared embedding columns allow multiple semantically similar features to share a common embedding space, with the benefits of transfer learning across features and a smaller model size.
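A minimal sketch of shared embedding columns follows, assuming the TF 1.x API (in some releases the function lives under tf.contrib.feature_column); the two video-id features are hypothetical examples in the spirit of Watch Next:

import tensorflow as tf

# Two semantically similar sparse features over the same id space.
watched_video = tf.feature_column.categorical_column_with_hash_bucket(
    "watched_video_id", hash_bucket_size=100000)
impression_video = tf.feature_column.categorical_column_with_hash_bucket(
    "impression_video_id", hash_bucket_size=100000)

# Both features look up the same embedding table, so what the model learns
# about a video from one feature transfers to the other, and only one
# table is stored in the model.
shared_columns = tf.feature_column.shared_embedding_columns(
    [watched_video, impression_video], dimension=64)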
5.2 Adoption within Google
Software engineers at Google have a variety of choices for how to implement their machine learning models. Before we developed the higher-level framework in TensorFlow, engineers were effectively forced to implement one-off versions of the components in our framework.
An internal survey has shown that, since we introduced this framework and Estimators less than a year ago, close to 1,000 Estimators have been checked into the Google codebase and more than 120,000 experiments have been recorded (an experiment in this context is a complete training run; not all runs are recorded, so the true number is significantly higher). Of those, over half (57%) use implementations of canned Estimators (e.g., LinearClassifier, DNNLinearCombinedRegressor).
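Using a canned Estimator typically takes only a few lines. A minimal usage sketch with toy data (the feature name "x" and the values are illustrative only):

import numpy as np
import tensorflow as tf

# Instantiate a canned Estimator over a single numeric feature (TF 1.x).
feature_columns = [tf.feature_column.numeric_column("x")]
estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns)

# numpy_input_fn builds an input pipeline from in-memory arrays.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array([1.0, 2.0, 3.0, 4.0])},
    y=np.array([0, 0, 1, 1]),
    batch_size=2, num_epochs=None, shuffle=True)

estimator.train(input_fn=train_input_fn, steps=100)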
There are now over 20 Estimator classes implementing various standard machine learning algorithms in the TensorFlow code base. Examples include DynamicRnnEstimator (implementing dynamically unrolled RNNs for classification or regression problems) and TensorForestEstimator (implementing random forests). Figure 4 shows the current distribution of Estimator usage. This