Monday, 23 July 2018

Enabling virtual environment for multiple LLVM versions on Linux, MacOS, Windows and Android

It's a painful process to continuously switch between different versions of LLVM (including the ones built locally), set up numerous environment variables, and fix cmake when benchmarking and optimizing different AI/ML/math libraries.

Eventually, we decided to automate this process by introducing "virtual environments" for LLVM and related tools in the CK framework, similar to Python virtualenv, while supporting Linux, MacOS, Windows and Android. A CK virtual environment automatically sets all necessary environment variables for different versions of different tools natively installed on a user machine.

The CK virtual environment requires minimal dependencies (just Python, pip and git). You can try it on your machine as follows (check the CK installation guide if you have issues):

 $ (sudo) pip install ck
 $ ck pull repo:ck-env

 $ ck detect soft --tags=compiler,llvm


CK will search for all installed clang instances on your machine (including the Microsoft Visual Studio dependency on Windows) and will ask you which one to register for the CK virtual environment. You can repeat this process to register multiple versions you use for testing, and then see all registered virtual environments as follows:

 $ ck show env
or
 $ ck show env --tags=compiler,llvm

Now you can start a specific virtual environment as follows:

 $ ck virtual env --tags=compiler,llvm

CK will set up PATH, LD_LIBRARY_PATH and other variables to point to a specific clang version, and will start bash on Linux/MacOS or a shell on Windows. In your own scripts, you can then use the environment variables specific to the given Clang instance, all of which start with CK_:

 $ export | grep "CK_"
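
As a rough illustration of how your own scripts might check for these variables, here is a minimal sketch. The helper names `ck_vars` and `require_ck_env` are hypothetical (they are not part of CK), and it assumes that seeing at least one CK_-prefixed variable means the virtual environment is active:

```shell
#!/bin/sh
# Sketch only: ck_vars and require_ck_env are hypothetical helpers, not CK commands.
# Assumption: at least one CK_-prefixed variable means a CK virtual environment is active.

# Print the names (not the values) of all CK_-prefixed environment variables, sorted.
ck_vars() {
    env | grep '^CK_' | cut -d= -f1 | sort
}

# Return 0 if at least one CK_ variable is set; otherwise print a hint and return 1.
require_ck_env() {
    if [ -z "$(ck_vars)" ]; then
        echo "No CK_ variables found; run 'ck virtual env --tags=compiler,llvm' first." >&2
        return 1
    fi
}
```

A script could call `require_ck_env || exit 1` near the top so that it fails early when started outside the virtual environment.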

We also added the possibility to install pre-built versions of LLVM on different platforms using CK packages and automatically register CK virtual environment:

 $ ck search package --tags=compiler,llvm
 $ ck install package --tags=compiler,llvm,v6.0.0
 $ ck show env --tags=compiler,llvm
 $ ck virtual env --tags=compiler,llvm,v6.0.0
 

  > ${CK_CC_FULL_PATH} --version 
or
  > %CK_CC_FULL_PATH% --version
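
If a script also needs to work outside the CK virtual environment, one possible approach is to fall back to another compiler when CK_CC_FULL_PATH is not set. The sketch below is only an illustration: the function name `pick_cc` and the fallback order (CC, then clang) are assumptions, not CK behaviour:

```shell
#!/bin/sh
# Sketch only: pick_cc is a hypothetical helper, not part of CK.
# CK_CC_FULL_PATH is the variable shown above; the fallback order is an assumption.
pick_cc() {
    if [ -n "${CK_CC_FULL_PATH:-}" ]; then
        # Inside a CK virtual environment: use the registered compiler.
        printf '%s\n' "${CK_CC_FULL_PATH}"
    elif [ -n "${CC:-}" ]; then
        # Otherwise honour the conventional CC variable if it is set.
        printf '%s\n' "${CC}"
    else
        # Last resort: rely on whatever clang is found in PATH.
        printf '%s\n' "clang"
    fi
}
```

On Linux or MacOS, a build script could then run `"$(pick_cc)" --version` and get the CK-registered compiler whenever the virtual environment is active.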

Finally, you can also build LLVM from trunk and again automatically register it in the CK virtual environment:

 $ ck install package --tags=compiler,llvm,vtrunk

You can find other software which you can automatically detect and register in the CK virtual environment using CK plugins in this online list.

If you would like to know more about this and other CK functionality, please check the "CK getting started guide" or feel free to get in touch with the CK community!

Thursday, 8 February 2018

ACM ReQuEST: 1st open and reproducible tournament to co-design Pareto-efficient deep learning (speed, accuracy, energy, size, costs)

The first Reproducible Quality-Efficient Systems Tournament (ReQuEST) will debut at ASPLOS’18 (the ACM conference on Architectural Support for Programming Languages and Operating Systems, the premier forum for multidisciplinary systems research spanning computer architecture and hardware, programming languages and compilers, operating systems and networking).

Organized by a consortium of leading universities (Washington, Cornell, Toronto, Cambridge, EPFL) and the cTuning foundation, ReQuEST aims to provide an open-source tournament framework, a common experimental methodology and an open repository for continuous evaluation and multi-objective optimization of the quality vs. efficiency Pareto optimality of a wide range of real-world applications, models and libraries across the whole software/hardware stack.

ReQuEST will use the established artifact evaluation methodology together with the Collective Knowledge framework validated at leading ACM/IEEE conferences to reproduce results, display them on a live dashboard and share artifacts with the community. Distinguished entries will be presented at the associated workshop and published in the ACM Digital Library. To win, the results of an entry do not necessarily have to lie on the Pareto frontier, as an entry can also be praised for its originality, reproducibility, adaptability, scalability, portability, ease of use, etc.

The first ReQuEST competition will focus on deep learning for image recognition with an ambitious long-term goal to build a public repository of portable and customizable “plug&play” AI/ML algorithms optimized across diverse data sets, models and platforms from IoT to supercomputers (see live demo). Future competitions will consider other emerging workloads, as suggested by our Industrial Advisory Board.

For more information, please visit http://cKnowledge.org/request



Monday, 4 September 2017

Video from ARM presenting our collaborative DNN co-design technology

ARM shared a video from Embedded Vision Summit'17 with a brief demonstration of our open-source technology to collaboratively optimize deep learning applications (sw/hw/model co-design) across diverse hardware and software stacks.
It is an ongoing project bringing industry, academia and end-users together to collaboratively co-design more efficient software and hardware for emerging workloads such as deep learning.

Wednesday, 31 May 2017

Difference between MILEPOST GCC (machine learning based self-tuning compiler) and Collective Knowledge Framework

I recently received several questions about the differences between the MILEPOST GCC compiler and the Collective Knowledge Framework. This motivated me to write this slightly nostalgic post with the R&D history behind MILEPOST GCC and our CK framework.

MILEPOST GCC is an extended GCC which includes:

1) Interactive Compilation Interface aka ICI - a plugin-based framework to expose or change various information and optimization decisions inside compilers at a fine-grained level via external plugins. I originally developed it for Open64 and later collaborated with Zbigniew Chamski and colleagues from Google and Mozilla to make it a standard plugin framework for GCC.

2) Feature extractor developed by Mircea Namolaru from IBM as an ICI plugin to expose low-level program features at a function level (see available features here). It was also extended by Jeremy Singer (ft57–65).

However, to keep MILEPOST project complexity under control, I decided to separate MILEPOST GCC from an infrastructure to auto-tune workloads, build models and use them to predict optimizations. Therefore, I developed the first version of the cTuning framework to let users auto-tune GCC flags for shared benchmarks and data sets, use MILEPOST GCC to extract features for these benchmarks, build predictive models (possibly on the fly, i.e. via active learning), and then use them to predict optimizations for previously unseen programs (using ICI to change optimizations).

However, since it was still taking far too long to train models (my PhD students, Yuriy Kashnikov and Abdul Memon, spent 5 months preparing experiments in 2010 for our MILEPOST GCC paper), we decided to crowdsource autotuning via a common repository across diverse hardware provided by volunteers and thus dramatically speed up the training process. Accelerating the training process and improving the diversity of the training set are the main practical reasons why my autotuning frameworks use crowdtuning mode by default nowadays ;) …

The first cTuning framework turned out to be very heavy and difficult to install and port (David Del Vento and his interns from NCAR used it in 2010 to tune their workloads and provided lots of useful feedback - thanks guys!). This motivated me to develop a common research SDK (Collective Knowledge aka CK) to simplify, unify and automate general experiments in computer engineering.

The CK framework lets the community share their artifacts (benchmarks, data sets, tools, models, experimental results) as customizable and reusable Python components with a JSON API. So, you can take advantage of already shared components to quickly prototype your own research workflows such as benchmarking, multi-objective autotuning, machine-learning based optimization, run-time adaptation, etc. That is, rather than re-building numerous ad-hoc in-house tools or scripts for autotuning and machine-learning based optimization, which rarely survive after PhD students are gone, you can now participate in collaborative and open research with the community, reproduce and improve collaborative experiments, and build upon them ;) … That’s why ACM is now considering using CK for unified artifact sharing (see CK on the ACM DL front page).

You can also take advantage of the integrated, cross-platform CK package manager, which can prepare your workflow and install missing dependencies on Linux, Windows, MacOS and Android.

For example, see the highest-ranked artifact from CGO’17, shared as a customizable and portable CK workflow on GitHub.

To conclude my nostalgic overview of the MILEPOST project and CK ;) — MILEPOST GCC is now added to the CK as a unified workflow while taking advantage of a growing number of shared benchmarks, data sets, and optimization statistics (see CK GitHub repo).

I just didn’t have time to provide all the ML gluing, i.e. building models from all optimization statistics and features shared by the community at cKnowledge.org/repo . But it should be quite straightforward, so I hope our community will eventually help implement it. We are now particularly interested to check the prediction accuracy from different models (SVM, KNN, DNN, etc) or to find extra features which improve optimization prediction.

Friday, 10 March 2017

Enabling open and reproducible computer systems research: the good, the bad and the ugly

14 March 2017, CNRS webinar, Grenoble, France
Slides are now available here!

A decade ago my research nearly stalled. I was investigating how to crowdsource performance analysis and optimization of realistic workloads across diverse hardware provided by volunteers and combine it with machine learning [1]. Often, it was simply impossible to reproduce crowdsourced empirical results and build predictive models due to the continuously changing software and hardware stack. Worse still, lack of realistic workloads and representative data sets in our community severely limited the usefulness of such models.

All these problems motivated me to develop an open-source framework and repository (cTuning.org) to share, validate and reuse workloads, data sets, tools, experimental results and predictive models, while involving the community in this effort [2]. This experience, in turn, helped us initiate so-called Artifact Evaluation (AE) at the premier ACM conferences on parallel programming, architecture and code generation (CGO, PPoPP, PACT and SC). AE aims to independently validate experimental results reported in the publications, and to encourage code and data sharing.

I would like to invite you to my webinar “Enabling open and reproducible research at computer systems conferences: the good, the bad and the ugly” at CNRS Grenoble on 14 March 2017, 1:30PM (UTC+1). I will share our practical experience organizing Artifact Evaluation over the past three years, along with encountered problems and possible solutions. You can find further info at this GitHub page including links to the video stream and the pad for notes.

On the one hand, we have received incredible support from the research community, ACM, universities and companies. We have even received a record number of artifact submissions at the CGO/PPoPP'17 AE (27 vs 17 two years ago) sponsored by NVIDIA, dividiti and cTuning foundation. We have also introduced Artifact Appendices and co-authored the new ACM Result and Artifact Review and Badging policy now used at Supercomputing. 

On the other hand, the use of proprietary benchmarks, rare hardware platforms, and totally ad-hoc scripts to set up, run and process experiments all place a huge burden on evaluators. It is simply too difficult and time-consuming to customize and rebuild experimental setups, reuse artifacts and eventually build upon others’ efforts - the main pillars of open science!

I will then present Collective Knowledge (CK), our humble attempt to introduce a customizable workflow framework with a unified JSON API and a cross-platform package manager, which can automate experimentation and enable interactive articles, while automatically adapting to the ever evolving software and hardware [3]. I will also demonstrate a practical CK workflow for collaboratively optimizing deep learning engines (such as Caffe and TensorFlow) and models across different compilers, libraries, data sets and diverse platforms from constrained mobile devices to data centers (CK-Caffe on GitHub / Android app to crowdsource DNN optimization) [4].

Finally, I will describe our open research initiative to publicly evaluate artifacts and papers which we have successfully validated at CGO-PPoPP’17, and plan to keep building upon in the future [5]. 

I am looking forward to your participation and feedback! Please feel free to contact me at Grigori.Fursin@cTuning.org or grigori@dividiti.com if you have any questions or comments!

References
[3] “Collective Knowledge: towards R&D sustainability”, Proceedings of the Conference on Design, Automation and Test in Europe (DATE), 2016
[4] “Optimizing Convolutional Neural Networks on Embedded Platforms with OpenCL”, IWOCL'16, Vienna, Austria, 2016
[5] “Community-driven reviewing and validation of publications”, Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering @ PLDI’14, Edinburgh, UK

Wednesday, 15 February 2017

Our CGO'07 paper on machine learning based workload optimization received the CGO "Test of Time" award!

I had a really nice surprise at the last International Symposium on Code Generation and Optimization (CGO) - our CGO'07 research paper on "rapidly selecting good compiler optimizations using performance counters", co-authored with my colleagues from INRIA and the University of Edinburgh, has won the "test of time" award! This award recognises outstanding papers published at CGO one decade earlier, whose influence is still strong today!

When preparing that paper, I really suffered a lot from the continuously changing software and hardware stack when performing and processing a huge number of experiments to build and train models which could predict optimizations. That experience eventually motivated me to continue my work on machine learning based optimization as a community effort [1,2] while sharing all my benchmarks, data sets, models, tools and scripts as customizable and reusable components. It also motivated me to develop an open-source framework and repository to crowdsource empirical experiments (such as multi-objective optimization of deep learning and other realistic workloads) across diverse hardware and input provided by volunteers, which later became known as Collective Knowledge (CK).

Therefore, I would really like to thank the community for such strong support of our open and reproducible research initiative during the past 10 years and for all the constructive feedback and help to develop a common experimental infrastructure and methodology!

For example, CK now assists various Artifact Evaluation initiatives at the premier ACM conferences on parallel programming, architecture and code generation (CGO, PPoPP, PACT, SC), which aim to encourage sharing of code and data, and independently validate experimental results from published papers.

We also use CK to crowdsource benchmarking and optimizations of realistic workloads across embedded devices such as mobile phones and tablets, while publicly sharing all optimization statistics for further collaborative analysis and mining.

dividiti (a startup based in Cambridge, UK) and the cTuning foundation (a non-profit research organization) also use the above technology to lead interdisciplinary research with ARM, General Motors and other companies to build faster, smaller, more power efficient and more reliable software and hardware.

I hope you will also join our community effort to accelerate computer systems' research and enable cheap and efficient computing from IoT devices to supercomputers!



Tuesday, 17 January 2017

Artifact Evaluation discussion session at CGO/PPoPP'17

News: notes from this joint CGO-PPoPP AE session are now available online.

We would like to invite all researchers to an open CGO-PPoPP'17 Artifact Evaluation discussion on February 6 (Monday) at 17:15-17:45 (room 400/402, Hilton Austin, Texas, USA).

The program is the following:
• Briefly presenting Artifact Evaluation results for CGO'17 and PPoPP'17
• Announcing joint CGO/PPoPP'17 distinguished artifact awards:
  • $500 cheque presented by Grigori Fursin from dividiti for the highest-ranked artifact implemented using Collective Knowledge (open-source framework to share artifacts as customizable and reusable Python components with JSON API, automate software installation/detection and quickly prototype cross-platform experimental workflows).
• Discussing how to improve future AE and make it more scalable, introducing a new option of open reviews, discussing open challenges in computer engineering, and sharing knowledge about tools and techniques to enable collaborative and reproducible computer systems' research.

We had a record number of artifact submissions this time: 27 vs 17 two years ago. It is really great to see that researchers are now taking AE seriously, but it also highlighted new issues:

1) A growing number of diverse artifacts made it somewhat difficult to find AE members with appropriate knowledge, skills and access to rare hardware and software.

2) Ad-hoc experimental setups placed a considerable burden on AE members and the committee when installing, running and processing very complex experiments, particularly when a native environment is required (for example, for performance analysis and tuning) and Docker/VM images are not suitable.

3) It is still not clear whether we are ready to demand full validation of all experiments from a paper or still allow partial validation. However, we do understand that the complexity of experiments and the lack of common experimental frameworks and methodology make full validation of some experiments really challenging, if possible at all.

Note that to solve some of these issues we tried "open reviewing" for the first time this year: for example, we asked the community to help us evaluate several open-source artifacts already publicly available at the time of submission. It turned out very well (see links to public discussions) since we managed to find researchers with access to rare hardware and appropriate skills. Furthermore, public comments helped authors communicate with reviewers directly (note that reviewers can still be anonymous) and fix all encountered issues immediately rather than waiting for the rebuttal.

We really want to know your opinions and suggestions about how to solve these issues and improve AE. Therefore we hope you will be able to join us at this discussion session! Also, do not hesitate to contact the Artifact Evaluation Steering Committee directly! Remember that new AE procedures may affect you at future conferences!

Looking forward to your participation and suggestions!