Friday, 10 March 2017

Enabling open and reproducible computer systems research: the good, the bad and the ugly

14 March 2017, CNRS webinar, Grenoble, France
Slides are now available here!

A decade ago my research nearly stalled. I was investigating how to crowdsource performance analysis and optimization of realistic workloads across diverse hardware provided by volunteers and combine it with machine learning [1]. Often, it was simply impossible to reproduce crowdsourced empirical results and build predictive models due to the continuously changing software and hardware stack. Worse still, the lack of realistic workloads and representative data sets in our community severely limited the usefulness of such models.

All these problems motivated me to develop an open-source framework and repository (cTuning.org) to share, validate and reuse workloads, data sets, tools, experimental results and predictive models, while involving the community in this effort [2]. This experience, in turn, helped us initiate so-called Artifact Evaluation (AE) at the premier ACM conferences on parallel programming, architecture and code generation (CGO, PPoPP, PACT and SC). AE aims to independently validate experimental results reported in the publications, and to encourage code and data sharing.

I would like to invite you to my webinar “Enabling open and reproducible research at computer systems conferences: the good, the bad and the ugly” at CNRS Grenoble on 14 March 2017, 1:30PM (UTC+1). I will share our practical experience organizing Artifact Evaluation over the past three years, along with encountered problems and possible solutions. You can find further info at this GitHub page including links to the video stream and the pad for notes.

On the one hand, we have received incredible support from the research community, ACM, universities and companies. We have even received a record number of artifact submissions at the CGO/PPoPP'17 AE (27 vs 17 two years ago) sponsored by NVIDIA, dividiti and cTuning foundation. We have also introduced Artifact Appendices and co-authored the new ACM Result and Artifact Review and Badging policy now used at Supercomputing. 

On the other hand, the use of proprietary benchmarks, rare hardware platforms, and totally ad-hoc scripts to set up, run and process experiments places a huge burden on evaluators. It is simply too difficult and time-consuming to customize and rebuild experimental setups, reuse artifacts and eventually build upon others’ efforts - the main pillars of open science!

I will then present Collective Knowledge (CK), our humble attempt to introduce a customizable workflow framework with a unified JSON API and a cross-platform package manager, which can automate experimentation and enable interactive articles, while automatically adapting to the ever-evolving software and hardware [3]. I will also demonstrate a practical CK workflow for collaboratively optimizing deep learning engines (such as Caffe and TensorFlow) and models across different compilers, libraries, data sets and diverse platforms, from constrained mobile devices to data centers (CK-Caffe on GitHub / Android app to crowdsource DNN optimization) [4].

Finally, I will describe our open research initiative to publicly evaluate artifacts and papers, which we successfully validated at CGO-PPoPP’17 and plan to keep building upon in the future [5].

I am looking forward to your participation and feedback! Please feel free to contact me at Grigori.Fursin@cTuning.org or grigori@dividiti.com if you have any questions or comments!

References
[3] “Collective Knowledge: towards R&D sustainability”, Proceedings of the Conference on Design, Automation and Test in Europe (DATE), 2016
[4] “Optimizing Convolutional Neural Networks on Embedded Platforms with OpenCL”, IWOCL'16, Vienna, Austria, 2016
[5] “Community-driven reviewing and validation of publications”, Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering @ PLDI’14, Edinburgh, UK