I recently received several questions about the differences between the MILEPOST GCC compiler and the Collective Knowledge framework. This motivated me to write this slightly nostalgic post about the R&D history behind MILEPOST GCC and our CK framework.
MILEPOST GCC is an extended GCC that includes:
1) The Interactive Compilation Interface aka ICI - a plugin-based framework to expose or change various information and optimization decisions inside compilers at a fine-grained level via external plugins. I originally developed it for Open64 and later collaborated with Zbigniew Chamski and colleagues from Google and Mozilla to make it a standard plugin framework for GCC.
2) A feature extractor developed by Mircea Namolaru from IBM as an ICI plugin to expose low-level program features at a function level (see available features here). It was also extended by Jeremy Singer (ft57–65).
However, to keep the MILEPOST project's complexity under control, I decided to separate MILEPOST GCC from the infrastructure used to auto-tune workloads, build models and use them to predict optimizations. Therefore, I developed the first version of the cTuning framework to let users auto-tune GCC flags for shared benchmarks and data sets, use MILEPOST GCC to extract features for these benchmarks, build predictive models (possibly on the fly, i.e. via active learning), and then use them to predict optimizations for previously unseen programs (using ICI to change optimizations).
However, it was still taking far too long to train models (my PhD students, Yuriy Kashnikov and Abdul Memon, spent 5 months preparing experiments in 2010 for our MILEPOST GCC paper), so we decided to crowdsource autotuning via a common repository across diverse hardware provided by volunteers and thus dramatically speed up the training process. Accelerating training and improving the diversity of the training set is the main practical reason why my autotuning frameworks use crowdtuning mode by default nowadays ;) …
The first cTuning framework turned out to be very heavy and difficult to install and port (David Del Vento and his interns from NCAR used it in 2010 to tune their workloads and provided lots of useful feedback; thanks guys!). This motivated me to develop a common research SDK (Collective Knowledge aka CK) to simplify, unify and automate general experiments in computer engineering.
The CK framework lets the community share their artifacts (benchmarks, data sets, tools, models, experimental results) as customizable and reusable Python components with a JSON API. You can therefore take advantage of already shared components to quickly prototype your own research workflows such as benchmarking, multi-objective autotuning, machine-learning-based optimization, run-time adaptation, etc. That is, rather than re-building numerous ad-hoc in-house tools or scripts for autotuning and machine-learning-based optimization, which rarely survive after PhD students are gone, you can now participate in collaborative and open research with the community, reproduce and improve collaborative experiments, and build upon them ;) … That's why ACM is now considering using CK for unified artifact sharing (see CK on the ACM DL front page).
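For illustration, here is a minimal sketch of how shared CK components can be accessed from Python through the unified JSON API. It assumes the core CK framework is installed (e.g. via pip install ck); the particular action and module used below are only an example, so check the CK documentation for the components available in your repositories.

```python
# Minimal sketch of calling the CK JSON API from Python.
# Assumes the core CK framework is installed (e.g. "pip install ck");
# the module/action names below are just an illustration.

import ck.kernel as ck

# Every CK call goes through ck.access() with a JSON-like dict:
r = ck.access({'action': 'search',
               'module_uoa': 'module',   # search among shared CK modules
               'out': ''})               # suppress console output

if r['return'] > 0:
    # All CK API calls return a dict with 'return' (0 on success)
    # and 'error' (a message on failure).
    print('CK error:', r['error'])
else:
    for entry in r.get('lst', []):
        print(entry.get('data_uoa', ''))
```

The same convention (a dict in, a dict out, with 'return' and 'error' keys) applies to every CK module, which is what makes shared components easy to glue into larger workflows.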
You can also take advantage of the integrated, cross-platform CK package manager, which can prepare your workflow and install missing dependencies on Linux, Windows, MacOS and Android.
For example, see the highest-ranked artifact from CGO’17 shared as a customizable and portable CK workflow on GitHub.
To conclude my nostalgic overview of the MILEPOST project and CK ;) … MILEPOST GCC has now been added to CK as a unified workflow that takes advantage of a growing number of shared benchmarks, data sets, and optimization statistics (see the CK GitHub repo).
I just didn’t have time to provide all the ML glue, i.e. building models from all the optimization statistics and features shared by the community at cKnowledge.org/repo. But it should be quite straightforward, so I hope our community will eventually help implement it. We are now particularly interested in checking the prediction accuracy of different models (SVM, KNN, DNN, etc.) and in finding extra features which improve optimization prediction.
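To give an idea of what this ML glue could look like, here is a minimal sketch (not the actual MILEPOST model) that trains a simple nearest-neighbour classifier on program feature vectors, e.g. the ft1..ft65 features from the MILEPOST feature extractor, to predict the best-found optimization for a previously unseen program. The file name, data layout and scikit-learn model choice are all assumptions for illustration; real features and optimization statistics would come from cKnowledge.org/repo.

```python
# Sketch of the missing "ML glue": predicting optimizations from
# MILEPOST-style program features. The file name and data layout are
# hypothetical; real data would come from the shared CK repository.

import json
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical training data: one feature vector per program/function,
# plus the best-found optimization (e.g. a GCC flag combination) as label.
with open('training_data.json') as f:
    data = json.load(f)

X = np.array([d['features'] for d in data])              # program features
y = np.array([d['best_optimization'] for d in data])     # best-found flags

# A simple nearest-neighbour model: suggest the optimization that worked
# best for the most similar previously seen programs.
model = KNeighborsClassifier(n_neighbors=3)
print('cross-validated accuracy:',
      cross_val_score(model, X, y, cv=3).mean())

model.fit(X, y)

# Predict an optimization for a previously unseen program:
new_features = np.array([data[0]['features']])            # placeholder
print('suggested optimization:', model.predict(new_features)[0])
```

Swapping KNeighborsClassifier for an SVM or a small neural network would be a natural way to compare prediction accuracy across models, as mentioned above.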