Investigating Chainer-compiler (1)

Starting today, I'm going to dig into Chainer-compiler. Deep learning compilers also came up at the recent FPGAX meetup (I watched a bit of it remotely).

Here is the announcement from PFN:

An experimental effort toward faster Chainer models, simpler deployment, and better portability | Preferred Research

This is the source and revision I'm looking at today. Note that it is experimental.

pfnet-research/chainer-compiler at 1c788abbaf5fc74533d332b3141d5a141b6db020 https://github.com/pfnet-research/chainer-compiler/tree/1c788abbaf5fc74533d332b3141d5a141b6db020

Reading the README

This is an experimental toolchain expected to be used with Chainer. This project aims to achieve a bunch of correlated goals such as

- Make Python Chainer model deployable without Python runtime
- Efficiently execute Chainer models with optimization techniques
- Integrate Chainer with other systems or domain-specific chips
- Be a playground to try algorithms for neural network frameworks
  without sacrificing flexibility and coverage of Chainer.

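Thoughts

Performance depends on the backend, but being able to write even a model's pre- and post-processing matrix operations in Python and compile them is very convenient (in terms of porting cost and so on).
- Interpreting the Python syntax tree is also nice from the user's point of view
  - No need to build an AST with dedicated functions; write it in Python and it compiles
- TVM can also compile from Python to an arbitrary backend, but you have to construct the IR through TVM's API ← this is quite tedious
  - TVM's hybrid frontend makes it look as if Python syntax is being compiled, so its approach seems different from chainer-compiler's (probably)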
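
To make the "just write Python" point concrete, here is a minimal sketch using only the standard library's ast module. It is not chainer-compiler's front end; the accumulate function and the control_flow_nodes helper are made up for illustration. It only shows that loops and branches written in plain Python are already visible as nodes in the syntax tree, with no builder API involved.

```python
import ast

# Plain Python source. `accumulate` is a made-up example function.
SOURCE = """
def accumulate(xs, steps):
    total = 0
    for i in range(steps):
        if xs[i] > 0:
            total = total + xs[i]
    return total
"""


def control_flow_nodes(source):
    """Return the class names of control-flow nodes found in the source."""
    tree = ast.parse(source)
    return [
        type(node).__name__
        for node in ast.walk(tree)
        if isinstance(node, (ast.For, ast.While, ast.If))
    ]


kinds = control_flow_nodes(SOURCE)
print(kinds)
```

A compiler that starts from this tree can keep the loop as a loop, whereas an IR built by hand through an API has to spell out the same structure node by node.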

Reading further

To achieve these goals, this toolchain

- Translates Python AST to extended ONNX. As this is a compiler rather than an execution tracer, it can export Python code with control-flows (e.g., LSTM with attention written by Python's loop)
- Modifies the graph for optimization, auto-differentiation, etc. It then generates deployable code.
- Runs the exported code with ChainerX's C++ API. Currently, the only backend it supports is a simple virtual machine implemented by ChainerX.

This project is still in the early stage and is not expected to be used by end-users. Interfaces can change quickly and some features may be abandoned. That said, it will be appreciated if you try this a bit and give us any feedbacks. Also, importantly, we are hiring! If you are interested in working on deep learning frameworks, please consider applying to Preferred Networks.

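I guess this is what was once called ONNX+.
The flow looks like: Python → ONNX+ → optimization etc. → codegen.
The backend currently supported targets the ChainerX VM.
- NVRTC and TVM backends also seem to be under development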
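
As a way to picture the front end → graph passes → codegen flow, here is a toy sketch of my own. It has nothing to do with the real ONNX+ format or the real passes: to_graph, fold_constants, codegen, and the textual instruction format are all hypothetical, and it only handles + and * on names and constants.

```python
import ast


def to_graph(expr_source):
    """Front end: lower a Python arithmetic expression to a flat node list."""
    nodes = []

    def visit(node):
        if isinstance(node, ast.Constant):
            nodes.append(("Const", node.value))
        elif isinstance(node, ast.Name):
            nodes.append(("Input", node.id))
        elif isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
            lhs = visit(node.left)
            rhs = visit(node.right)
            op = "Add" if isinstance(node.op, ast.Add) else "Mul"
            nodes.append((op, lhs, rhs))
        else:
            raise NotImplementedError(ast.dump(node))
        return len(nodes) - 1  # index of the node just emitted

    visit(ast.parse(expr_source, mode="eval").body)
    return nodes


def fold_constants(nodes):
    """Graph pass: replace Add/Mul of two constants with a single Const.

    Node indices are preserved, so folded operands stay behind as dead nodes.
    """
    out = []
    for node in nodes:
        if node[0] in ("Add", "Mul"):
            lhs, rhs = out[node[1]], out[node[2]]
            if lhs[0] == "Const" and rhs[0] == "Const":
                value = lhs[1] + rhs[1] if node[0] == "Add" else lhs[1] * rhs[1]
                out.append(("Const", value))
                continue
        out.append(node)
    return out


def codegen(nodes):
    """Back end: emit one textual instruction per node."""
    lines = []
    for i, node in enumerate(nodes):
        if node[0] in ("Add", "Mul"):
            args = ", ".join(f"%{a}" for a in node[1:])
        else:
            args = repr(node[1])
        lines.append(f"%{i} = {node[0]}({args})")
    return lines


program = codegen(fold_constants(to_graph("x * (2 + 3)")))
print("\n".join(program))
```

The real toolchain presumably does far more at each stage (auto-differentiation is one of the graph passes the README mentions), but the three-stage shape is the same.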

Examples

The example gives a good feel for it

chainer-compiler/train_mnist.py at 1c788abbaf5fc74533d332b3141d5a141b6db020 · pfnet-research/chainer-compiler https://github.com/pfnet-research/chainer-compiler/blob/1c788abbaf5fc74533d332b3141d5a141b6db020/examples/mnist/train_mnist.py

So the compiled graph can be called straight from a Python training loop.

    # Set up a neural network to train
    # Classifier reports softmax cross entropy loss and accuracy at every
    # iteration, which will be used by the PrintReport extension below.
    mlp = MLP(args.unit, 10)
    if args.compile:
        mlp = chainer_compiler.compile(mlp, dump_onnx=args.dump_onnx)  # this is the key line
    model = L.Classifier(mlp)
    model.to_device(device)
    device.use()

Overview of components

Getting to know the components

chainer-compiler/overview.md at 1c788abbaf5fc74533d332b3141d5a141b6db020 · pfnet-research/chainer-compiler https://github.com/pfnet-research/chainer-compiler/blob/1c788abbaf5fc74533d332b3141d5a141b6db020/docs/overview.md

About each directory under the project root

ch2o
- Python-to-ONNX compiler. Probably short for CHainer2Onnx
elichika
- A Python-to-ONNX compiler being developed to replace ch2o (oh?)
- The top-level API does not seem to call elichika yet; it uses ch2o. I suppose it will be swapped in eventually.
common
- C++ functions which are used by other components.
compiler
- Looks like the core
- ONNX graph manipulation
- Automatic differentiation
- Naive code generators which uses NVRTC/TVM
- Generate code for XCVM, a virtual machine based on ChainerX
- More on gen_node.py later
runtime
- The XCVM implementation. I assumed it lived on the ChainerX side, but it is here
python
- chainer_compiler.compile from the example above, among others, is implemented here
tools
scripts

chainer-compiler/python

chainer-compiler/python at 1c788abbaf5fc74533d332b3141d5a141b6db020 · pfnet-research/chainer-compiler https://github.com/pfnet-research/chainer-compiler/tree/1c788abbaf5fc74533d332b3141d5a141b6db020/python

chainer_compiler_core.cc is wrapped with pybind11 and called from chainer_compiler.py
It currently uses ch2o (elichika is not used)
chainer_compiler_tvm.py
- Registers chainer_compiler.tvm.schedule_conv2d and others in TVM's operator registry
- Called from:
  - tools/run_onnx.py
  - tools/train_imagenet.py
The registered ops themselves are called from compiler/tvm/compiler.cc. What is TVMCompiler?
- https://github.com/pfnet-research/chainer-compiler/blob/d6ab1e0612a8db1ed1bb20c792100e81de8b597a/compiler/tvm/compiler.cc#L142-L149
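
The "register on the Python side, look up by name elsewhere" pattern can be sketched in pure Python as below. This is only my mental model, not TVM's actual registry or chainer-compiler's code: register, lookup, the "demo.schedule_conv2d" key, and the returned dict are all hypothetical.

```python
# A toy name-based function registry. All names here are made up.
_REGISTRY = {}


def register(name):
    """Decorator that stores a function under a string name."""
    def decorator(fn):
        if name in _REGISTRY:
            raise KeyError(f"{name} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def lookup(name):
    """Fetch a registered function by name, as a consumer would."""
    return _REGISTRY[name]


@register("demo.schedule_conv2d")
def schedule_conv2d(target):
    # Placeholder body: a real schedule would describe loop tiling etc.
    return {"op": "conv2d", "target": target}


# The consumer knows only the string name, not the Python module.
schedule = lookup("demo.schedule_conv2d")("cuda")
print(schedule)
```

The point of the string key is that the caller does not need to import the Python module that defines the function, which is what lets code on the other side of a language boundary find it.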

What I want to do next

Get it running

Trace the flow that converts Python to XCVM
Look into the XCVM spec
- Apparently "Most operations are/should be simple wrappers of ChainerX's routines", so it should be easy to follow
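
Before reading the real thing, this is roughly what I imagine "a VM whose operations are simple wrappers of library routines" to mean. It is a guess in toy form, not the XCVM design: the instruction format, the OPS table, and run are hypothetical, and Python's operator module stands in for ChainerX's routines.

```python
import operator

# Each VM op is a thin wrapper: the table maps an op name directly to a
# library routine (here the stdlib `operator` module as a stand-in).
OPS = {
    "Add": operator.add,
    "Mul": operator.mul,
    "Neg": operator.neg,
}


def run(program, inputs):
    """Execute (op, dst, *srcs) instructions over a dict of named registers."""
    regs = dict(inputs)
    for op, dst, *srcs in program:
        regs[dst] = OPS[op](*(regs[s] for s in srcs))
    return regs


# y = -(x * w + b)
program = [
    ("Mul", "t0", "x", "w"),
    ("Add", "t1", "t0", "b"),
    ("Neg", "y", "t1"),
]

result = run(program, {"x": 3, "w": 4, "b": 5})["y"]
print(result)
```

If the real VM is close to this shape, most of the interesting logic would live in the compiler that produces the instruction list, not in the VM itself.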

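I looked into what I could in about an hour before my commute. I feel I can keep this up better if I timebox the blog writing. The plan is to keep digging until I'm satisfied and then write a summary.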

This is very much how I feel:

https://twitter.com/_tkato_/status/1088819537619894272