{
"authorEmail": "ntv@google.com",
"authorName": "Nicolas Vasilache",
"bookmark": null,
"branch": "master",
"changes": [
{
"addLines": "5",
"awayPaths": [],
"commitHash": null,
"currentPath": "mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
"delLines": "79",
"fileType": "1",
"hunks": [
{
"addLines": null,
"corpus": " //===- LinalgBase.td - Linalg dialect base support ---------*- tablegen -*-===//\n //\n // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.\n // See https://llvm.org/LICENSE.txt for license information.\n // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception\n //\n //===----------------------------------------------------------------------===//\n //\n // This is the definition file for base linear algebra support.\n //\n //===----------------------------------------------------------------------===//\n \n #ifndef LINALG_BASE\n #define LINALG_BASE\n \n include \"mlir/IR/OpBase.td\"\n \n def Linalg_Dialect : Dialect {\n let name = \"linalg\";\n let description = [{\n The `linalg` dialect groups together a set of types, operations and\n transformations that are useful to implement a structured abstraction on\n buffers and tensors. These abstractions are useful for transformations and\n can lower to scalar load/store and other operations or to more general\n library calls.\n \n- The `linalg` dialect manipulates the following types and operations:\n-\n- ### Core data types and special ops.\n-\n- The following abstractions are used by the `linalg` dialect:\n-\n- #### Views\n- The current implementation uses the strided memref abstraction. In the\n- future other abstractions than strided memref will be used.\n-\n- #### `!linalg.range`\n- This data type is currently just a triple (`min`,`max`, `step`) that does\n- not pass function boundaries.\n-\n- #### `linalg.yield`\n- This op is used as a terminator within the appropriate `linalg` regions.\n-\n- In the future, richer `view` and `range` representations are expected, in\n- particular to represent sparse traversals.\n-\n- ### Metadata Ops\n- A set of ops that manipulate metadata but do not move memory. These ops take\n- `view` operands + extra attributes and return new `view`s. The returned\n- `view`s generally alias the operand `view`. 
At the moment the existing ops\n- are:\n-\n- * `std.view`,\n- * `std.subview`,\n- * `linalg.range`,\n- * `linalg.slice`,\n- * `linalg.transpose`.\n-\n- Future ops are added on a per-need basis but should include:\n-\n- * `linalg.reshape`,\n- * `linalg.tile`,\n- * `linalg.intersection`,\n- * `linalg.convex_union`,\n- * `linalg.difference` (would need to work on a list of views).\n-\n- ### Payload Ops\n- A set of payload carrying operations that implement the [structured ops](\n- https://docs.google.com/presentation/d/1P-j1GrH6Q5gLBjao0afQ-GfvcAeF-QU4GXXeSy0eJ9I/edit#slide=id.p\n- )\n- abstraction on tensors and buffers. `linalg` has `2` generic operations\n- `linalg.generic` and `linalg.indexed_generic` for expressing custom\n- operations.\n- This is subject to further evolution as transformations and analyses\n- continue to be developed.\n-\n- Additionally, `linalg` provides some commonly named operations:\n-\n- * `linalg.copy`,\n- * `linalg.fill`,\n- * `linalg.dot`,\n- * `linalg.matmul`,\n- * `linalg.conv`.\n-\n- Future ops are added on a per-need basis but should include:\n-\n- * `linalg.pad`.\n-\n- In an ideal world, all the named ops would be automatically generated from\n- a description in terms of only the `2` generic ops. Unfortunately we do not\n- have such support yet (contributions are most welcome).\n-\n- ### Convention for external library interop\n- The `linalg` dialect adopts a convention that is similar to `BLAS` when\n- offloading operations to fast library implementations: pass a non-owning\n- pointer to input and output data with additional metadata. This convention\n- is also found in libraries such as `MKL`, `OpenBLAS`, `BLIS`, `cuBLAS`,\n- `cuDNN`, etc.. and more generally at interface points across language\n- boundaries (e.g. C++ / Python).\n-\n- Generally, `linalg` passes non-owning pointers to strided memref data\n- structures to precompiled library calls linked externally. 
The name `view`\n- is used interchangeably in `linalg` to signify strided memref discussed at\n- length in the [strided memref RFC](\n- https://groups.google.com/a/tensorflow.org/g/mlir/c/MaL8m2nXuio/m/a_v07o9yBwAJ).\n+ Additional [Linalg Dialect\n+ Documentation](https://mlir.llvm.org/docs/Dialects/Linalg) and a\n+ [Rationale Document](https://mlir.llvm.org/docs/RationaleLinalgDialect) are\n+ also available and should be read first before going into the details of\n+ the op semantics.\n }];\n }\n \n // Whether a type is a RangeType.\n def LinalgIsRangeTypePred : CPred<\"$_self.isa<RangeType>()\">;\n def Range : Type<LinalgIsRangeTypePred, \"range\">;\n \n #endif // LINALG_BASE\n",
"delLines": null,
"isMissingNewNewline": null,
"isMissingOldNewline": null,
"newLength": "39",
"newOffset": "1",
"oldLength": "113",
"oldOffset": "1"
}
],
"id": "1789842",
"metadata": {
"hash.effect": "BdkRJb0dKElA",
"line:first": 27
},
"newProperties": [],
"oldPath": "mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
"oldProperties": [],
"type": "2"
},
{
"addLines": "624",
"awayPaths": [],
"commitHash": null,
"currentPath": "mlir/docs/RationaleLinalgDialect.md",
"delLines": "0",
"fileType": "1",
"hunks": [
{
"addLines": null,
"corpus": "+# Linalg Dialect Rationale: The Case For Compiler-Friendly Custom Operations\n+\n+[TOC]\n+\n+# Introduction\n+\n+## Positioning\n+\n+\n+\n+This document describes the key design principles \n+that led to the existing implementation of Linalg and aims at exposing\n+the tradeoffs involved when building higher-level Intermediate\n+Representations (IR) and Dialects to facilitate code\n+generation. Consider the simplified schema describing codegen in MLIR.\n+Linalg is designed to solve the High-level Hierarchical Optimization\n+(HHO box) and to interoperate nicely within a\n+*Mixture Of Expert Compilers* environment (i.e. the *CGSel* box). \n+This work is inspired by a wealth of [prior art](#prior_art) in\n+the field, from which it seeks to learn key lessons. This documentation\n+and introspection effort also comes in the context of the proposal for a\n+working group for discussing the [Development of high-level Tensor Compute\n+Primitives dialect(s) and\n+transformations](https://llvm.discourse.group/t/development-of-high-level-tensor-compute-primitives-dialect-s-and-transformations/388/3). \n+We hope that the lessons from prior art, the design principles outlined in\n+this doc and the architecture of Linalg can help inform the community on a \n+path to defining these High-Level Tensor Compute Primitives.\n+\n+\n+## Inception\n+ \n+Linalg started as a pragmatic dialect to bootstrap code generation in MLIR, by\n+*defining away* complex code generation problems like precise dependence\n+analysis or polyhedral code generation and by introducing the ability to call\n+into fast library implementations when available. Linalg **defines ops and\n+transformations declaratively** and was originally restricted to ops with\n+*linear-algebra like* semantics (`pointwise`, `matmul`, `conv`...). 
This\n+approach enables building a high-level productivity-first codegen solution that\n+leverages *both* compiler optimizations *and* efficient library implementations\n+so as not to miss out on simple performance benefits. For example, if\n+one's favorite HPC library or ISA has a `matmul` primitive running at 95% of\n+the achievable peak performance, for operands stored in some memory, one should\n+be able to **use the primitive** when possible *and* generate code otherwise.\n+ \n+However, as the design of Linalg co-evolved with the design of MLIR, it became\n+apparent that it could extend to larger application domains than just machine\n+learning on dense tensors.\n+ \n+The design and evolution of Linalg follows a *codegen-friendly* approach where\n+the IR and the transformations evolve hand-in-hand.\n+The key idea is that op semantics *declare* and transport information that is\n+traditionally obtained by compiler analyses. \n+This information captures the legality and applicability of transformations and\n+is **not lost by lowering prematurely to loop or CFG form**. The key\n+transformations are designed so as to **preserve this information** as long as\n+necessary. 
For example, `linalg.matmul` remains `linalg.matmul` after tiling\n+and fusion.\n+ \n+Furthermore, Linalg decouples transformation validity from profitability\n+considerations and voluntarily leaves the latter aside in the first iteration\n+(see the [suitability for search](#suitability_for_search) guiding principle).\n+ \n+The first incarnation of these ideas was presented as an example at the\n+EuroLLVM 2019 developer's meeting as part of the\n+[Linalg section](https://llvm.org/devmtg/2019-04/slides/Tutorial-AminiVasilacheZinenko-MLIR.pdf)\n+of the first [MLIR Tutorial](https://www.youtube.com/watch?v=cyICUIZ56wQ).\n+ \n+## Evolution\n+Since the initial implementation, the design has evolved with, and partially\n+driven, the evolution of the core MLIR infrastructure to use\n+[Regions](https://mlir.llvm.org/docs/LangRef/#regions),\n+[OpInterfaces](https://mlir.llvm.org/docs/Interfaces/),\n+[ODS](https://mlir.llvm.org/docs/OpDefinitions/) and\n+[Declarative Rewrite Rules](https://mlir.llvm.org/docs/DeclarativeRewrites/)\n+among others. The approach adopted by Linalg was extended to become\n+[StructuredOps abstractions](\n+https://drive.google.com/drive/u/0/folders/1sRAsgsd8Bvpm_IxREmZf2agsGU2KvrK-),\n+with Linalg becoming its incarnation on tensors and buffers.\n+It is complemented by the\n+[Vector dialect](https://mlir.llvm.org/docs/Dialects/Vector/),\n+which defines structured operations on vectors, following the same rationale and\n+design principles as Linalg. (The Vector dialect includes the higher-level\n+operations on multi-dimensional vectors and abstracts away the lowering to\n+single-dimensional vectors.)\n+ \n+The Linalg dialect itself grew beyond linear algebra-like operations to become\n+more expressive, in particular by providing an abstraction of a loop nest\n+supporting parallelism, reductions and sliding windows around arbitrary MLIR\n+[regions](https://mlir.llvm.org/docs/LangRef/#regions). 
It also has the\n+potential of growing beyond *dense* linear-algebra to support richer data\n+types, such as sparse and ragged tensors and buffers.\n+ \n+The Linalg design remains open to evolution and cross-pollination with other\n+dialects and approaches. It has been successfully used as the staging ground\n+for code generation-related abstractions, spinning off the generalization of\n+the following:\n+- the `!linalg.view` type folded into the *\"Strided MemRef\"* type while\n+preserving structure to allow calling into external C++ libraries with\n+unsurprising ABI conventions;\n+- the `linalg.view` and `linalg.subview` ops evolved into the standard dialect;\n+- the `linalg.for`, `linalg.load` and `linalg.store` ops evolved into a prelude\n+to the *structured control flow* dialect (named `LoopOps`).\n+More components can be extracted, redesigned and generalized when new uses or\n+requirements arise.\n+ \n+Several [design questions](#open_issues) remain open in Linalg, which does not\n+claim to be a general solution to all compilation problems.\n+It does aim at driving thinking and implementations of domain-specific\n+abstractions where the programmer's intent can be captured at a very high level,\n+directly in the IR.\n+ \n+Given the evolution of the scope, it becomes apparent that a better name than\n+\"Linalg\" could remove some of the confusion related to the dialect (and the\n+underlying approach), its goals and limitations.\n+\n+# Prior Art\n+Linalg draws inspiration from decades of prior art to design a modern and\n+pragmatic solution. 
The following non-exhaustive list refers to some of the\n+projects that influenced Linalg design:\n+ \n+- [ONNX](https://onnx.ai/),\n+- [LIFT](https://www.lift-project.org/),\n+- [XLA](https://www.tensorflow.org/xla/architecture),\n+- [Halide](https://halide-lang.org/) and [TVM](https://tvm.apache.org/),\n+- [TACO](http://tensor-compiler.org/),\n+- [Darkroom](http://darkroom-lang.org/) and [Terra](http://terralang.org/),\n+- [Sigma-LL](http://spiral.ece.cmu.edu:8080/pub-spiral/pubfile/cgo16-preprint_248.pdf),\n+- [Tensor Comprehensions](https://arxiv.org/abs/1802.04730),\n+- [Polyhedral Compilers](https://en.wikipedia.org/wiki/Polytope_model),\n+- the [Affine dialect](https://mlir.llvm.org/docs/Dialects/Affine/) in MLIR,\n+- Generic Loop Transformations (see Ken Kennedy's\n+[Optimizing Compilers for Modern Architectures](\n+https://www.elsevier.com/books/optimizing-compilers-for-modern-architectures/allen/978-0-08-051324-9)),\n+- Traditional compiler CFGs with SSA forms.\n+ \n+Additionally, experience with the following tools proved very valuable when\n+thinking holistically about how all these components interplay all the way\n+up to the user and down to the hardware:\n+ \n+- the [Torch](http://torch.ch/) machine-learning framework,\n+- the LLVM compiler, specifically in JIT mode,\n+- high-performance libraries (MKL, CUBLAS, FBFFT),\n+- the [PeachPy](https://www.cs.utexas.edu/users/flame/BLISRetreat/BLISRetreatTalks/PeachPy.pdf) assembler,\n+- current and potentially upcoming hardware ISAs.\n+ \n+The novelty of MLIR's code base and its unprecedented support for defining and\n+mixing abstractions enable one to reflect on and integrate the key elements\n+of the prior art's successes as well as to avoid the common pitfalls in the area of\n+code generation. 
Thus, instead of diverging into a discussion about the\n+implications of adopting any of the existing solutions, Linalg was able to\n+build on all of them and learn from their experience while\n+leveraging the benefit of hindsight.\n+ \n+The following reflections on prior art have influenced the design of Linalg.\n+The discussion is by no means exhaustive but should capture the key motivations\n+behind Linalg.\n+ \n+## Lessons from ONNX\n+ONNX is a specification of operations that appear in Machine Learning\n+workloads. As such, it is predominantly driven by the expressiveness requirements\n+of ML, and less by the considerations of IR design for HPC code generation.\n+ \n+Similarly to ONNX, Linalg defines *\"semantically charged\" named ops*.\n+But it also considers *transformations on these ops* as a key component and\n+defines the IR to support the transformations, preferring transformations over\n+expressiveness if necessary.\n+ \n+Linalg hopes to additionally address the following:\n+- facilitate frontend-compiler co-design by taking into account compiler\n+ transformations and lowerings in op definition;\n+- minimize the set of available ops by making them non-overlapping with each\n+ other, thus simplifying the intermediate representation.\n+ \n+## Lessons from LIFT\n+[LIFT](https://www.lift-project.org/) is a system to write computational\n+kernels based on functional abstractions. Transformations are\n+represented by additional nodes in the IR, whose semantics are at the\n+level of the algorithm (e.g. 
`partialReduce`).\n+LIFT applies and composes transformations by using [local rewrite\n+rules](https://www.lift-project.org/presentations/2015/ICFP-2015.pdf) that\n+embed these additional nodes directly in the functional abstraction.\n+ \n+Similarly to LIFT, Linalg uses local rewrite rules implemented with the MLIR\n+[Declarative Rewrite Rules](https://mlir.llvm.org/docs/DeclarativeRewrites/)\n+mechanisms.\n+ \n+Linalg builds on, and helps separate concerns in, the LIFT approach as follows:\n+- transformations are either separated from the representation or expressed as\n+ composable attributes that are independent of the actual computation,\n+ avoiding intricate effects on performance;\n+- abstractions are split into smaller components (e.g., control flow and data\n+ structure abstractions) potentially reusable across different dialects in\n+ MLIR's open ecosystem.\n+ \n+LIFT is expected to further influence the design of Linalg as it evolves. In\n+particular, extending the data structure abstractions to support non-dense\n+tensors can use the experience of LIFT abstractions for\n+[sparse](https://www.lift-project.org/publications/2016/harries16sparse.pdf)\n+and [position-dependent\n+arrays](https://www.lift-project.org/publications/2019/pizzuti19positiondependentarrays.pdf).\n+\n+## Lessons from XLA\n+[XLA](https://www.tensorflow.org/xla/architecture) is one of the first\n+post-Theano ML compilers that was introduced as a pragmatic compilation\n+solution for TensorFlow. It shines on Google's xPU \n+hardware and is an important piece of the puzzle. 
It is particularly good at\n+(1) transforming code back and forth between the scalar and the vector\n+worlds, (2) passing function boundaries for handling both host and device\n+code, and (3) complying with stringent requirements imposed by energy-efficient\n+xPUs.\n+XLA followed a pragmatic design process where the compiler is given perfect\n+knowledge of each op's semantics, all starting from the mighty `conv` and\n+`matmul` ops. XLA transformations consist of writing emitters that compose as C++\n+functions. Perfect op semantics knowledge has 2 big benefits: (1) transformations are\n+correct by construction, and (2) very strong performance on difficult xPU targets.\n+\n+Similarly, Linalg ops *\"know their semantics\"* and *\"know how to transform and\n+lower themselves\"*. The means by which this information is made available and\n+how it is used in MLIR are, however, very different.\n+\n+Linalg hopes to additionally address the following:\n+- HLOs are expressive as a whole, but each op has very limited and fixed\n+semantics: ops are not configurable. As a consequence, HLOs have evolved into\n+too large a set of ops whose semantics intersect.\n+This echoes the ops proliferation problem also exhibited by ONNX.\n+- Reliance on perfect op knowledge leads to situations where transformations and\n+ops end up needing to know about each other's semantics (e.g. during fusion).\n+Since the transformations themselves are not simple local rewrite patterns\n+(unlike LIFT), code complexity grows quickly.\n+- XLA lacks an independent IR that can be inspected, unit tested and used\n+independently. 
This monolithic design makes the system not portable: xPU passes\n+and GPU passes do not share much code.\n+\n+## Lessons from Halide and TVM\n+[Halide](https://halide-lang.org/) is a DSL embedded in C++ that provides a\n+way of metaprogramming the HalideIR and applying transformations declaratively\n+to let the expert user transform and optimize the program in tailored ways.\n+Halide initially targeted the SIGGRAPH community but is now more generally\n+applicable. [TVM](https://tvm.apache.org/) is an evolution of Halide into the\n+machine learning and deep-neural network space, based on HalideIR.\n+\n+The Halide transformation methodology follows similar principles to the\n+[URUK](http://icps.u-strasbg.fr/~bastoul/research/papers/GVBCPST06-IJPP.pdf)\n+and\n+[CHiLL](https://pdfs.semanticscholar.org/6a46/20589f63f3385707d2d590f7b7dc8ee4d74f.pdf)\n+compiler transformation frameworks, but without the strengths (and especially\n+complexity) of the polyhedral model.\n+\n+Halide particularly shines at making the HPC transformation methodology\n+accessible to $\\Omega$(10-100) users, at a time when polyhedral tools are\n+still only accessible to $\\Omega$(1-10) users. Halide makes heavy usage of\n+canonicalization rules that are also very prevalent in MLIR.\n+\n+Linalg hopes to additionally address the following:\n+- Halide scheduling is powerful and explores a large swath of possible\n+transformations. But it's still too hard for newcomers to use or extend. The \n+level of performance you get from Halide is very different depending on\n+whether one is a seasoned veteran or a newcomer. This is especially true as\n+the number of transformations grows.\n+- Halide raises rather than lowers in two ways, going counter-current to the \n+design goals we set for high-level codegen abstractions in MLIR. 
First,\n+canonical Halide front-end code uses explicit indexing and math on scalar \n+values, so to target BLAS/DNN libraries one needs to add pattern matching\n+which is as brittle as in the affine case. While Halide's performance \n+is on par with the libraries on programmable targets (CPU/GPU), that \n+approach doesn't work on mobile accelerators or on xPUs, where the framework\n+ingests whole-tensor operations. \n+Second, reductions and scans are expressed using serial iteration, again \n+requiring pattern matching before they can be transformed (e.g. to do a \n+reduction using atomics, or hierarchically). The lesson to draw is that we \n+should start with higher-level primitives than Halide.\n+\n+## Lessons from Tensor Comprehensions\n+[Tensor Comprehensions](https://arxiv.org/abs/1802.04730) is a\n+high-level language to express tensor computations with a syntax\n+generalizing the Einstein notation, coupled to an end-to-end\n+compilation flow capable of lowering to efficient GPU code. It was\n+integrated with 2 ML frameworks: Caffe2 and PyTorch. \n+\n+\n+\n+The compilation flow combines [Halide](#lessonshalide) and a Polyhedral Compiler\n+derived from [ISL](https://en.wikipedia.org/wiki/Integer_set_library)\n+and uses both HalideIR and the ISL *schedule-tree* IR. \n+The compiler provides a collection of polyhedral compilation\n+algorithms to perform fusion and favor multi-level parallelism and\n+promotion to deeper levels of the memory hierarchy.\n+Tensor Comprehensions showed that fixing a few predefined strategies\n+with parametric transformations and tuning knobs can already provide\n+great results. 
In that previous work, simple \n+genetic search combined with an autotuning framework was sufficient\n+to find good implementations in the ***non-compute-bound regime***.\n+This requires code versions obtainable by the\n+various transformations to encompass versions that get close to the\n+roofline limit.\n+The ultimate goal of Tensor Comprehensions was to concretely mix\n+Halide high-level transformations with polyhedral mid-level\n+transformations and build a pragmatic system that could take advantage\n+of both styles of compilation.\n+\n+Linalg hopes to additionally address the following:\n+- Halide was never properly used in Tensor Comprehensions beyond shape\n+inference. Most of the investment went into simplifying polyhedral\n+transformations and building a usable end-to-end system. MLIR was\n+deemed a better infrastructure to mix these types of compilation.\n+- The early gains provided by reusing established infrastructures\n+(HalideIR and ISL schedule trees) turned into more impedance mismatch\n+problems than could be solved with a small tactical investment.\n+- Tensor Comprehensions emitted CUDA code which was then JIT compiled\n+with NVCC from a textual representation. While this was a pragmatic\n+short-term solution, it made it hard to perform low-level rewrites that\n+would have helped with register reuse in the ***compute-bound regime***.\n+- The same reliance on emitting CUDA code made it difficult to\n+create cost models when the time came. This made it artificially harder than\n+necessary to prune out bad solutions. 
This resulted in excessive\n+runtime evaluation, as reported in the paper [Machine Learning Systems\n+are Stuck in a Rut](https://dl.acm.org/doi/10.1145/3317550.3321441).\n+\n+Many of those issues are naturally addressed by implementing these ideas\n+in the MLIR infrastructure.\n+\n+## Lessons from Polyhedral compilers\n+The polyhedral model has been on the cutting edge of loop-level optimization for\n+decades, with several incarnations in production compilers such as\n+[GRAPHITE](https://gcc.gnu.org/wiki/Graphite) for GCC and\n+[Polly](https://polly.llvm.org) for LLVM. Although it has proved crucial to\n+generate efficient code from domain-specific languages such as\n+[PolyMage](http://mcl.csa.iisc.ac.in/polymage.html) and [Tensor\n+Comprehensions](https://dl.acm.org/doi/abs/10.1145/3355606), it has never been\n+fully included in mainstream general-purpose optimization pipelines. Detailed\n+analysis of the role of polyhedral transformations is provided in the\n+[simplified polyhedral\n+form](https://mlir.llvm.org/docs/RationaleSimplifiedPolyhedralForm/) document\n+dating back to the inception of MLIR.\n+ \n+In particular, polyhedral abstractions have proved challenging to integrate with\n+a more conventional compiler due to the following.\n+- The transformed code (or IR) quickly gets complex and thus hard to analyze and\n+ understand.\n+- Code generation from the mathematical form used in the polyhedral model relies\n+ on non-trivial exponentially complex algorithms.\n+- The mathematical form is rarely composable with the SSA representation and\n+ related algorithms, on which most mainstream compilers are built today.\n+- Expressiveness limitations, although addressed in the scientific literature\n+ through, e.g., summary functions, often remain present in actual\n+ implementations.\n+ \n+The Affine dialect in MLIR was specifically designed to address the integration\n+problems mentioned above. 
In particular, it maintains the IR in the same form\n+(loops with additional constraints on how the bounds are expressed) throughout\n+the transformation, decreasing the need for one-shot conversion between\n+drastically different representations. It also embeds the polyhedral\n+representation into the SSA form by using MLIR regions and thus allows one to\n+combine polyhedral and SSA-based transformations.\n+ \n+## Lessons from the Affine dialect\n+The Affine dialect in MLIR brings the polyhedral abstraction closer to the\n+conventional SSA representation. It addresses several long-standing integration\n+challenges as described above and is likely to be more suitable when compiling\n+from a C language-level abstraction.\n+ \n+MLIR makes it possible to start from a higher-level abstraction than C, for\n+example in machine learning workloads. In such cases, it may be possible to\n+avoid complex analyses (data-flow analysis across loop iterations is\n+exponentially complex) required for polyhedral transformation by leveraging the\n+information available at higher levels of abstraction, similarly to DSL\n+compilers. Linalg intends to use this information when available and ensure\n+*legality of transformations by construction*, by integrating legality\n+preconditions in the op semantics (for example, loop tiling can be applied to\n+the loop nest computing a matrix multiplication, no need to additionally rely on\n+affine dependence analysis to check this). This information is not readily\n+available in the Affine dialect, and can only be derived using potentially\n+expensive pattern-matching algorithms.\n+ \n+Informed by the practical experience in polyhedral compilation and with the\n+Affine dialect in particular, Linalg takes the following decisions.\n+- **Discourage loop skewing**: the loop skewing transformation, which is\n+ sometimes used to enable parallelization, often has surprising (negative)\n+ effects on performance. 
In particular, polyhedral auto-transformation can be\n+ expressed in a simpler way without loop skewing; skewing often leads to\n+ complex control flow hampering performance on accelerators such as GPUs.\n+ Moreover, the problems loop skewing addresses can be better solved by other\n+ approaches, e.g., diamond tiling. In the more restricted case of ML workloads,\n+ multi-for loops with induction variables independent of each other (referred\n+ to as hyper-rectangular iteration domains in the literature) such as the\n+ proposed\n+ [affine.parallel](https://llvm.discourse.group/t/rfc-add-affine-parallel/350)\n+ are sufficient in the majority of cases.\n+- **Declarative Tiling**: the *tiling* transformation is ubiquitous in HPC code\n+ generation. It can be seen as a decomposition of either the iteration space or\n+ the data space into smaller regular parts, referred to as tiles. Polyhedral\n+ approaches, including the Affine dialect, mostly opt for iteration space\n+ tiling, which introduces additional control flow and complex address\n+ expressions. If the tile sizes are not known during the transformation\n+ (so-called parametric tiling), the address expressions and conditions quickly\n+ become non-affine or require exponentially complex algorithms to reason about\n+ them. Linalg focuses tiling on the data space instead, creating views into the\n+ buffers that leverage MLIR's strided `memref` abstraction. These views compose\n+ and the complexity of access expressions remains predictable.\n+- **Preserve high-level information**: Linalg maintains the information provided\n+ by the op semantics as long as necessary for transformations. For example, the\n+ result of tiling a matrix multiplication is loops around a smaller matrix\n+ multiplication. 
Even with pattern-matching on top of the Affine dialect, this\n+ would have required another step of pattern-matching after the transformation.\n+ \n+Given these choices, Linalg intends to be a better fit for **high-level\n+compilation** where significantly more information is readily available in the\n+input representation and should be leveraged before lowering to other\n+abstractions. Affine remains a strong abstraction for mid-level transformation\n+and is used as a lowering target for Linalg, enabling further transformations\n+and combination of semantically-loaded and lower-level inputs. As such, Linalg\n+is intended to complement Affine rather than replace it.\n+\n+# Core Guiding Principles\n+\n+## Transformations and Simplicity First\n+The purpose of the Linalg IR and its operations is primarily to:\n+- develop a set of key transformations,\n+- make them correct by construction by carefully curating the set of\n+generic operation properties that drive applicability, and\n+- make them very simple to implement, apply, verify and especially\n+maintain.\n+\n+The problem at hand is fundamentally driven by compilation of domain-specific\n+workloads for high-performance and parallel hardware architectures: **this is\n+an HPC compilation problem**.\n+\n+The selection of relevant transformations follows a codesign approach and\n+involves considerations related to:\n+- concrete current and future needs of the application domain,\n+- concrete current and future hardware properties and ISAs,\n+- understanding of strengths and limitations of [existing approaches](#prior_art),\n+- taking advantage of the coexistence of multiple levels of IR in MLIR.\n+\n+One needs to be methodical to avoid proliferation and redundancy. A given\n+transformation could exist at multiple levels of abstraction but **just\n+because one can write transformation X at level Y absolutely does not mean\n+one should**. 
This is where evaluation of existing\n+systems and acknowledgement of their strengths and weaknesses is crucial:\n+simplicity and maintainability aspects must be first-order concerns. Without\n+this additional effort of introspection, a design will not stand the test of\n+time. At the same time, complexity is very hard to ward off. It seems one needs\n+to suffer complexity to be prompted to take a step back and rethink\n+abstractions.\n+\n+This is not merely a reimplementation of idea X in system Y: simplicity\n+**must be the outcome** of this introspection effort.\n+\n+## Preservation of Information\n+The last two decades have seen a proliferation of Domain-Specific Languages\n+(DSLs) that have been very successful in limited application domains.\n+The main commonality between these systems is their use of significantly\n+richer structural information than CFGs or loops.\n+Still, another commonality of existing systems is to lower to LLVM very quickly,\n+and cross a wide abstraction gap in a single step. This process often drops\n+semantic information that needs to be reconstructed later,\n+when it is not irremediably lost.\n+\n+These remarks, coupled with MLIR's suitability for defining IR at multiple\n+levels of abstraction, led to the following 2 principles.\n+\n+### Declarative Specification: Avoid Raising\n+\n+Compiler transformations need static structural information (e.g. loop-nests,\n+graphs of basic blocks, pure functions, etc.). When that structural information\n+is lost, it needs to be reconstructed.\n+\n+A good illustration of this phenomenon is the notion of *raising* in polyhedral\n+compilers: multiple polyhedral tools start by raising from a simplified C\n+form or from SSA IR into a higher-level representation that is more amenable\n+to loop transformations.\n+\n+In advanced polyhedral compilers, a second type of raising\n+may typically exist to detect particular patterns (often variations of\n+BLAS). 
Such patterns may be broken by transformations, making their detection\n+very fragile or even just impossible (incorrect).\n+\n+MLIR makes it easy to define op semantics declaratively thanks to the use of\n+regions and attributes. This is an ideal opportunity to define new abstractions\n+to convey user-intent directly into the proper abstraction.\n+\n+### Progressive Lowering: Don't Lose Information too Quickly\n+\n+Lowering too quickly to affine, generic loops or CFG form reduces the\n+amount of structure available to derive transformations from. While\n+manipulating loops is a net gain compared to CFG form for a certain class of\n+transformations, important information is still lost (e.g. parallel loops, or\n+mapping of a loop nest to an external implementation).\n+\n+This creates non-trivial phase ordering issues. For instance, loop fusion may\n+easily destroy the ability to detect a BLAS pattern. One possible alternative\n+is to perform loop fusion, tiling, intra-tile loop distribution and then hope to\n+detect the BLAS pattern. Such a scheme presents difficult phase-ordering\n+constraints that will likely interfere with other decisions and passes.\n+Instead, certain Linalg ops are designed to maintain high-level information\n+across transformations such as tiling and fusion.\n+\n+MLIR is designed as an infrastructure for ***progressive lowering***.\n+Linalg fully embraces this notion and thinks of codegen in terms of\n+*reducing a potential function*. That potential function is loosely\n+defined in terms of the number of low-level instructions in a particular\n+Linalg op (i.e. how heavy or lightweight the Linalg op is). \n+Linalg-based codegen and transformations start from higher-level IR\n+ops and dialects. 
Then each transformation application reduces the\n+potential by introducing lower-level IR ops and *smaller* Linalg ops.\n+This gradually reduces the potential, all the way to Loops + VectorOps\n+and LLVMIR.\n+\n+## Composable and Declarative Transformations\n+Complex and impactful transformations need not be hard to manipulate, write or\n+maintain. Mixing XLA-style high-level op semantics knowledge with generic\n+properties to describe these semantics, directly in MLIR, is a promising way to:\n+- Design transformations that are correct by construction, easy to\n+write, easy to verify and easy to maintain. \n+- Provide a way to specify transformations and the units of IR they manipulate\n+declaratively. In turn, this allows using local pattern rewrite rules in MLIR\n+(i.e. [DRR](https://mlir.llvm.org/docs/DeclarativeRewrites/)).\n+- Allow creating customizable passes declaratively by simply selecting rewrite\n+rules. This allows mixing transformations, canonicalizations, constant folding\n+and other enabling rewrites in a single pass. The result is a system where pass\n+fusion is very simple to obtain and gives hope of solving certain\n+[phase ordering issues](https://dl.acm.org/doi/10.1145/201059.201061).\n+\n+## Suitability for Search and Machine Learning\n+Compiler heuristics are hand-crafted human-engineered features: this is an area\n+ripe for disruption by machine-learning techniques.\n+To enable search, compiler transformations should be fine-grained, \n+[composable](#declarative_transformations) and expose tuning parameters that\n+can modify their behavior, guided by lessons from previous experience\n+with [Tensor Comprehensions](#lessonstc).\n+\n+Of course, we are not advocating for using ML everywhere in the stack\n+immediately: low-level compilation and machine models are still quite performant\n+in LLVM. 
However, for the high-level and mid-level optimization problems,\n+models need to be conditioned (probabilistically) on the low-level\n+compiler, which acts as a black box. For these reasons, we prioritize the\n+design of IR and transformations with search-friendly properties over\n+building cost models.\n+Still, this does not mean Linalg refuses cost models: instead we\n+prefer to invest in infrastructure that will enable [ML-based\n+techniques to automatically build cost\n+models](http://homepages.inf.ed.ac.uk/hleather/publications/2009_autofeatures_cgo.pdf). \n+\n+## Extensibility and Future-Proofness\n+MLIR allows defining IR for structured control flow and structured\n+data types. We choose to take advantage of these properties for the\n+reasons described above.\n+In particular, the `MemRefType` represents dense non-contiguous memory regions.\n+This structure should extend beyond simple dense data types and generalize to\n+ragged, sparse and mixed dense/sparse tensors as well as to trees, hash tables,\n+tables of records and maybe even graphs.\n+\n+For these more advanced data types, the control-flow required to traverse the\n+data structures, termination conditions, etc. are much less simple to analyze and\n+characterize statically. 
As a consequence, we also need to design solutions that\n+stand a chance of evolving into runtime-adaptive computations (e.g.\n+inspector-executor in which an *inspector* runs a cheap runtime\n+analysis on the data to configure how the *executor* should run).\n+While there is no concrete solution\n+today to solve these problems in MLIR, it is pretty clear that perfect\n+static knowledge and analyses will not be serious contenders for these problems.\n+\n+# Key Observations\n+The following key observations have influenced the design of Linalg and helped\n+reconcile [core guiding principles](#guiding_principles) with real-world\n+requirements when producing an implementation based on MLIR.\n+\n+## Algorithms + Data Structures = Programs\n+This is a twist on Niklaus Wirth's formulation but captures the essence of the\n+design of Linalg: control-flow does not exist in a vacuum, independently of\n+data.\n+On the contrary, there is a very strong relationship between control-flow and\n+data structures: one cannot exist without the other. This has multiple\n+implications on the [semantics of Linalg Ops](#linalg_ops) and their\n+transformations. In particular, this observation influences whether\n+certain transformations are better done:\n+- as control flow or data structure manipulation,\n+- on Linalg op attributes or on loops after some partial lowering\n+occurred,\n+- as extensions to the Linalg dialect in terms of new ops or attributes.\n+\n+## The Dialect Need not be Closed Under Transformations\n+This is probably the most surprising and counter-intuitive\n+observation. 
When one designs IR for transformations, closed-ness is\n+often a non-negotiable property.\n+This is a key design principle of polyhedral IRs such as\n+[URUK](http://icps.u-strasbg.fr/~bastoul/research/papers/GVBCPST06-IJPP.pdf)\n+and \n+[ISL-based IRs](https://en.wikipedia.org/wiki/Integer_set_library):\n+they are closed under affine transformations.\n+In MLIR, multiple dialects coexist and form a coherent whole. After \n+experimenting with different alternatives, it became clear that strict\n+dialect closed-ness wasn't necessary and could be relaxed. Previous\n+systems did not have simple and principled means of building new IR\n+and probably suffered from this limitation. We conjecture this is a\n+key reason they required the IR to be closed under transformations. \n+\n+Despite the fact that Linalg ops only allow perfectly nested\n+semantics, once tiling and fusion kick in, imperfectly nested loops\n+are gradually introduced.\n+In other words, imperfectly nested control flow appears as ***the result of\n+applying key transformations***.\n+\n+Considering the *potential* described during the discussion on\n+[Progressive Lowering](#progressive_lowering), closed-ness under\n+transformation would dictate that the potential remains constant.\n+In contrast, Linalg advocates for ***monotonicity*** under\n+transformations.\n+\n+## Summary of Existing Alternatives in a Picture\n+Lastly, we summarize our observations of lessons from [Prior\n+Art](#prior_art)---when viewed under the lens of our [Core Guiding\n+Principles](#guiding_principles)---with the following picture.\n+\n+\n+\n+This figure is not meant to be perfectly accurate but rather a rough map of\n+how we view the distribution of structural information in existing\n+systems, from a codegen-friendly angle. Unsurprisingly, the\n+[Linalg Dialect](https://mlir.llvm.org/docs/Dialects/Linalg) and its\n+future evolutions aspire to a position in the top-right of this map.\n+\n",
"delLines": null,
"isMissingNewNewline": null,
"isMissingOldNewline": null,
"newLength": "624",
"newOffset": "1",
"oldLength": "0",
"oldOffset": "0"
}
],
"id": "1789841",
"metadata": {
"hash.effect": "MQ9S7aAtL7lF",
"line:first": 1
},
"newProperties": {
"unix:filemode": "100644"
},
"oldPath": null,
"oldProperties": [],
"type": "1"
},
{
"addLines": "468",
"awayPaths": [],
"commitHash": null,
"currentPath": "mlir/docs/Dialects/Linalg.md",
"delLines": "4",
"fileType": "1",
"hunks": [
{
"addLines": null,
"corpus": " # Linalg Dialect\n \n-To generate the documentation:\n+[TOC]\n \n-```sh\n-mlir-tblgen --gen-op-doc -I /path/to/mlir/include \\\n-/path/to/mlir/include/mlir/Dialect/Linalg/IR/LinalgDoc.td\n+# Rationale\n+\n+\n+\n+Linalg is designed to solve the High-level Hierarchical Optimization\n+(HHO box) in MLIR and to interoperate nicely within a\n+*Mixture Of Expert Compilers* environment (i.e. the *CGSel* box). \n+\n+The [Rationale Document](https://mlir.llvm.org/docs/RationaleLinalgDialect)\n+goes into significantly more design and architectural decision details.\n+\n+# Set of Key Transformations\n+\n+The following key transformations have been central to driving the design of\n+Linalg. They are all implemented in terms of the properties of the\n+`linalg.generic` OpInterface and avoid the pitfall of relying on hardcoded\n+one-off op knowledge.\n+\n+The textual form description of these transformations is left for future\n+work. Still, it is useful to at least list the key transformations that are\n+performed on the Linalg IR and that have influenced its design:\n+1. Progressive Buffer Allocation.\n+1. Parametric Tiling.\n+1. Promotion to Temporary Buffer in Fast Memory.\n+1. Tiled Producer-Consumer Fusion with Parametric Tile-And-Fuse.\n+1. Map to Parallel and Reduction Loops and Hardware.\n+1. Vectorization: Rewrite in Vector Form.\n+1. Lower to Loops (Affine and/or Generic).\n+1. Lower to Library Calls or Special Instructions, Intrinsics or ISA.\n+1. Partially Lower to Iterations Over a Finer-Grained Linalg Op.\n+\n+# High-Level Description of Linalg Ops\n+Linalg takes at least some inspiration from all previously [listed prior\n+art](#prior_art). 
The design enables the definition of ***CustomOps*** with\n+generic properties that enable [key transformations](#key_transformations),\n+including lowering to scalar load/store and other operations or to external\n+library calls and intrinsics.\n+\n+These ops can have ***either tensor or buffer operands***.\n+\n+## Payload-Carrying Ops\n+Linalg defines two payload-carrying operations that implement the [structured ops](\n+https://docs.google.com/presentation/d/1P-j1GrH6Q5gLBjao0afQ-GfvcAeF-QU4GXXeSy0eJ9I/edit#slide=id.p\n+) abstraction on tensors and buffers. This is architected as two generic operations\n+`linalg.generic` (resp. `linalg.indexed_generic`) that can express custom\n+operations with *index-free semantics* (resp. *indexing semantics*).\n+The properties of these generic ops are the result of applying the\n+[guiding principles](#guiding_principles). They are listed next, with a brief example\n+and discussion for each.\n+\n+### Property 1: Input and Output Operands Define The Iteration Space\n+A `linalg.generic` op fully *derives* the specification of its iteration space\n+from its operands.\n+The property enforces that a localized IR element (the op) *has* all the information\n+needed to synthesize the control-flow required to iterate over its operands,\n+according to their type. 
This notion of IR localization bears some resemblance\n+to [URUK](http://icps.u-strasbg.fr/~bastoul/research/papers/GVBCPST06-IJPP.pdf).\n+\n+Consider the following, partially specified, `linalg.generic` example:\n+```\n+#attrs = {args_in: 1, args_out: 1}\n+func @example(%A: memref<?xf32, layout1>, \n+ %B: memref<?xvector<4xf32>, layout2>) {\n+ linalg.generic #attrs (%A, %B): memref<?xf32, layout1>,\n+ memref<?xvector<4xf32>, layout2>\n+ return\n+}\n+```\n+\n+The property \"*Input and Output Operands Define The Iteration Space*\" is\n+materialized by a lowering into a form that will resemble:\n+```\n+func @example(%A: memref<?xf32, layout1>, \n+ %B: memref<?xvector<4xf32>, layout2>) {\n+ %M = \"dim\" %A, 0: index\n+ %N = \"dim\" %B, 0: index\n+ %eq = eq %M, %N: i1 // iteration space is consistent with data\n+ assert(%eq): (i1) -> ()\n+ for %i = 0 to %M {\n+ %a = load %A[%i]: memref<?xf32, layout1>\n+ %b = load %B[%i]: memref<?xvector<4xf32>, layout2>\n+ // compute arg types match elemental tensor types\n+ %c = \"some_compute\"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)\n+ store %c, %B[%i]: memref<?xvector<4xf32>, layout2>\n+ }\n+ return\n+}\n+```\n+\n+The property participates in simplifying analyses and transformations. For\n+instance, it guarantees no out-of-bounds access can occur by construction\n+(assuming dynamic operand dimensions agree with each other, which is the\n+purpose of the `assert` runtime check).\n+\n+Before lowering to loop form, loop induction variables and iterators are *not yet\n+materialized*. This is a necessary property if we want an abstraction that\n+works on both tensor values and buffers because ***values don\u2019t escape\n+loops/nesting***.\n+\n+The main implications are that:\n+1. The semantics of the ops are *restricted to operate on structured data\n+types*, on which we can define an iterator.\n+2. This does not model arbitrary code with side-effects.\n+\n+We do not think these are serious limitations in practice because MLIR is all\n+about mixing different levels of abstractions in the same IR. 
As long as\n+Linalg can progressively lower to the next level of abstraction, it can also\n+be just bypassed for things that do not fit.\n+\n+At the same time, conditioning op semantics on structured data types is a very\n+promising path towards extensibility to non-dense tensors as experience with\n+LIFT abstractions for\n+[sparse](https://www.lift-project.org/publications/2016/harries16sparse.pdf)\n+and [position-dependent\n+arrays](https://www.lift-project.org/publications/2019/pizzuti19positiondependentarrays.pdf),\n+as well as [TACO](http://tensor-compiler.org/), has shown.\n+\n+### Property 2: Reversible Mappings Between Control and Data Structures\n+A `linalg.generic` *defines* the mapping between the iteration space (i.e. the\n+loops) and the data. \n+\n+Consider the following, partially specified, `linalg.generic` example:\n+```\n+#indexing_maps = { \n+ (i, j) -> (j, i), \n+ (i, j) -> (j) \n+}\n+#attrs = {args_in: 1, args_out: 1, indexings: #indexing_maps}\n+func @example(%A: memref<?x?xf32>, \n+ %B: memref<?xvector<4xf32>>) {\n+ linalg.generic #attrs (%A, %B): memref<?x?xf32>,\n+ memref<?xvector<4xf32>>\n+ return\n+}\n+```\n+\n+The property \"*Reversible Mappings Between Control and Data Structures*\" is\n+materialized by a lowering into a form that will resemble:\n+```\n+#attrs = {args_in: 1, args_out: 1, indexings: #indexing_maps}\n+func @example(%A: memref<?x?xf32>, \n+ %B: memref<?xvector<4xf32>>) {\n+ // loop bounds determined from data sizes by \u201cinverting the map\u201d\n+ %J = \"dim\" %A, 0: index\n+ %I = \"dim\" %A, 1: index\n+ %J2 = \"dim\" %B, 0: index\n+ // iteration space is consistent with data + mapping inference \n+ %eq = \"eq\" %J, %J2: i1\n+ \"assert\" %eq: (i1) -> ()\n+ for %i = 0 to %I { // loop order is fully defined by indexing maps\n+ for %j = 0 to %J { // arbitrary permutations are possible\n+ %a = \"load\" %A, %j, %i: memref<?x?xf32>\n+ %b = \"load\" %B, %j: memref<?xvector<4xf32>>\n+ %c = \"some_compute\"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)\n+ \"store\" %c, %B, %j: memref<?xvector<4xf32>>\n+ }\n+ }\n+ 
return\n+}\n+```\n+\n+This mapping needs to be reversible because we want to be\n+able to go back and forth between the two and answer questions such as:\n+- Given a subset of the iteration space, what subset of data does it read and\n+write?\n+- Given a subset of data read or written, what subset of the iteration space\n+is responsible for this read or write?\n+\n+Answering these two questions is one of the main analyses that Linalg uses to \n+implement transformations such as tiling, tiled producer-consumer fusion, and\n+promotion to temporary buffers in fast memory.\n+\n+In the current implementation, `linalg.generic` uses a list of [AffineMaps]().\n+This is a pragmatic short-term solution, but in the longer term note that\n+this property could even be evaluated dynamically, similarly to\n+inspector-executor algorithms.\n+\n+### Property 3: The Type Of Iterators is Defined Explicitly\n+A `linalg.generic` op fully *declares* the type of its iterators. This\n+information is used in transformations.\n+\n+These properties are derived from established practice in the field and mirror\n+the properties from Ken Kennedy's [Optimizing Compilers for Modern Architectures](\n+https://www.elsevier.com/books/optimizing-compilers-for-modern-architectures/allen/978-0-08-051324-9).\n+The key idea of legality of loop transformations expressed by Kennedy is\n+that ***the lexicographic order of all dependence vectors must be\n+preserved***.\n+\n+This can be better captured directly at the loop level thanks to specific\n+iterator types, among which:\n+*parallel*, *reduction*, *partition*, *permutable/monotonic*, *sequential*, \n+*dependence distance*, ...\n+\n+These types are traditionally the result of complex dependence analyses and\n+have been referred to as \"*bands*\" in the polyhedral community (e.g. *parallel\n+bands*, *permutable bands*, etc., in\n+[ISL](https://en.wikipedia.org/wiki/Integer_set_library) schedule tree\n+parlance). 
\n+\n+Specifying the information declaratively in a `linalg.generic` allows\n+conveying properties that may be hard (or even impossible) to derive from\n+lower-level information. These properties can be brought all the way to the\n+moment when they are useful for transformations, used and then discarded.\n+\n+Additionally, these properties may be viewed as a contract that the \n+frontend/user guarantees and that the compiler may take advantage of. The\n+common example is the use of data-dependent reduction semantics for\n+specifying histogram computations. If the frontend has additional knowledge\n+that proper atomic operations are available, it may be better to specify\n+parallel semantics and use the special atomic operations in the computation region.\n+\n+At this time, Linalg only has an explicit use for *parallel* and *reduction*\n+loops but previous experience shows that the abstraction generalizes.\n+\n+### Property 4: The Compute Payload is Specified With a Region\n+A `linalg.generic` op has a compute payload that is fully generic thanks to \n+the use of\n+[Regions](https://github.com/llvm/llvm-project/blob/58265ad42a90ae8905be6a447cb42e53529a54a0/mlir/docs/LangRef.md#regions).\n+\n+The region takes as arguments the scalar elemental types of the tensor or\n+buffer operands of the `linalg.generic`. For flexibility and ability to match\n+library calls, additional special values may be passed. For instance, a\n+`linalg.fill` operation takes a buffer and an additional scalar value.\n+\n+At this time there are no additional restrictions to the region\n+semantics. This is meant to allow the exploration of various design tradeoffs\n+at the intersection of regions and iterator types.\n+In particular, the frontend is responsible for ensuring that the iterator types\n+correspond to the operations inside the region: the region can capture \n+buffers arbitrarily and write into them. 
If this conflicts with some parallel\n+iterator requirement, this is undefined behavior.\n+\n+Concretely, consider the following, partially specified, `linalg.generic`\n+example:\n+```\n+#indexing_maps = { \n+ (i, j) -> (i, j), \n+ (i, j) -> (i, j) \n+}\n+#attrs = {args_in: 1, args_out: 1, indexings: #indexing_maps}\n+func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {\n+ linalg.generic #attrs (%A, %B, %C) {\n+ ^bb0(%a: f32, %b: f32):\n+ %c = addf %a, %b : f32\n+ return %c : f32\n+ }: memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>\n+ return\n+}\n ```\n+\n+The property \"*The Compute Payload is Specified With a Region*\" is\n+materialized by a lowering into a form that will resemble:\n+```\n+func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {\n+ %M = dim %A, 0: index\n+ %N = dim %B, 1: index\n+ for %i = 0 to %M {\n+ for %j = 0 to %N {\n+ %a = load %A[%i, %j]: memref<?x?xf32>\n+ %b = load %B[%i, %j]: memref<?x?xf32>\n+ %c = addf %a, %b : f32\n+ store %c, %C[%i, %j]: memref<?x?xf32>\n+ }\n+ }\n+ return\n+}\n+```\n+\n+In the process of lowering to loops and lower-level constructs, similar\n+requirements are encountered, as are discussed in the [inlined call op\n+proposal](https://llvm.discourse.group/t/introduce-std-inlined-call-op-proposal/282/2).\n+We expect to be able to reuse the common lower-level infrastructure provided\n+it evolves to support both region arguments and captures.\n+\n+### Property 5: May Map To an External Library Call\n+A `linalg.generic` op may map to an external library call by specifying a\n+`SymbolAttr`. At this level of abstraction, the important glue is the ability \n+to perform transformations that preserve the structure necessary to ***call\n+the external library after different transformations have been applied***.\n+\n+This involves considerations related to preservation of op semantics\n+and integration at the ABI level. 
Regardless of whether one wants to use\n+external library calls or a custom ISA, the problem for codegen is similar: \n+preservation of a fixed granularity.\n+\n+Consider the following, partially specified, `linalg.generic`\n+example:\n+```\n+#fun_attr = \"pointwise_add\"\n+#indexing_maps = { \n+ (i, j) -> (i, j), \n+ (i, j) -> (i, j) \n+}\n+#attrs = {args_in: 1, args_out: 1, indexings: #indexing_maps, fun: #fun_attr}\n+func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {\n+ linalg.generic #attrs (%A, %B, %C) {\n+ ^bb0(%a: f32, %b: f32):\n+ %c = addf %a, %b : f32\n+ return %c : f32\n+ }: memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>\n+ return\n+}\n+```\n+\n+The property \"*Map To an External Library Call*\" is\n+materialized by a lowering into a form that will resemble:\n+\n+```\n+func @pointwise_add_sxsxf32_sxsxf32(memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>) -> ()\n+\n+func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {\n+ call @pointwise_add_sxsxf32_sxsxf32 (%A, %B, %C): \n+ (memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>) -> ()\n+ return\n+}\n+```\n+\n+Which, after lowering to LLVM, resembles:\n+```\n+func @pointwise_add_sxsxf32_sxsxf32(!llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">, \n+ !llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">, \n+ !llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">) -> ()\n+\n+func @example(%A: !llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">, \n+ %B: !llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">, \n+ %C: !llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">) {\n+ llvm.call @pointwise_add_sxsxf32_sxsxf32 (%A, %B, %C): \n+ (!llvm<\"{ float*, i64, [2 x i64], [3 x i64] }*\">...) -> ()\n+ return\n+}\n+```\n+\n+#### Convention For External Library Interoperability\n+The `linalg` dialect adopts a convention that is similar to `BLAS` when\n+offloading operations to fast library implementations: pass a non-owning\n+pointer to input and output data with additional metadata. This convention\n+is also found in libraries such as `MKL`, `OpenBLAS`, `BLIS`, `cuBLAS`,\n+`cuDNN`, etc., 
and more generally at interface points across language\n+boundaries (e.g. C++ / Python).\n+\n+Generally, `linalg` passes non-owning pointers to View data structures\n+to pre-compiled library calls linked externally.\n+\n+There is an [ongoing\n+discussion](https://llvm.discourse.group/t/lowering-optional-attributes-in-linalg-structuredops-to-standard-dialect/333/3)\n+on the topic of extending interoperability in the presence of key attributes.\n+\n+### Property 6: Perfectly Nested Writes To The Whole Output Operands\n+Perfectly nested loops form a particularly important class of structure that\n+enables key loop transformations such as tiling and mapping to library calls.\n+Unfortunately, this type of structure is easily broken by transformations such\n+as partial loop fusion. Tiling and mapping to library calls become more\n+challenging, or even infeasible. Linalg ops adopt perfect-nestedness\n+as a first-class property: the structure cannot be broken and is\n+transported in the IR by construction.\n+\n+A `linalg.generic` op represents a perfectly nested loop nest that writes the\n+entire memory region. This is a structural constraint across regions and\n+loops that has proven to be key in simplifying transformations.\n+\n+One particular point to mention is that converting imperfectly nested code\n+into perfectly nested code can often be done with enough loop distribution \n+and embedding of conditionals down to the innermost loop level.\n+\n+Previous experience with Tensor Comprehensions gave us the intuition that\n+forcing innermost control-flow nesting is a lot like writing data-parallel\n+code with arrays of boolean values and predication. 
\n+This type of trick has also been used before in polyhedral compilers to\n+convert non-affine control into affine compute dependencies.\n+\n+While it may be possible to automate such rewrites from generic IR,\n+`linalg.generic` just forces the semantics for now.\n+\n+The key implication is that this conversion to deep predication needs to be\n+undone once we are done with Linalg transformations. \n+After iterators and induction variables are materialized (i.e. after lowering\n+out of `linalg.generic` occurred), the overall performance will be greatly\n+influenced by the quality of canonicalizations, foldings and *Loop-Invariant\n+Code Motion* (LICM).\n+\n+In the grander scheme, the reliance on late LICM was deemed a necessary risk.\n+\n+### Putting it Together\n+As it stands, the six properties above define the semantics of a\n+`linalg.generic` op. It is an open question whether all of these semantics are\n+strictly necessary in practice and whether some should or could be derived \n+automatically while still maintaining the [core guiding\n+principles](#guiding_principles).\n+\n+For the time being, we have settled on the combination of these properties\n+because of empirical evidence from building and working on multiple high-level\n+compilers. As we lay those down and engage more with the community, we expect\n+multiple rounds of discussions and design changes to the original architecture.\n+\n+## Data Representation: Views\n+The current implementation uses the [Strided MemRef (a.k.a View)](\n+https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio)\n+abstraction. The name *View* is used interchangeably in `linalg` to signify\n+*Strided MemRef*.\n+In the future we expect to use other structured data types and\n+support ragged, mixed-sparse and other types. 
As mentioned\n+[previously](#lessonslift), we expect to draw on the\n+experience from existing LIFT abstractions for\n+[sparse](https://www.lift-project.org/publications/2016/harries16sparse.pdf)\n+and [position-dependent\n+arrays](https://www.lift-project.org/publications/2019/pizzuti19positiondependentarrays.pdf).\n+\n+## Metadata Ops\n+A set of ops that manipulate metadata but do not move memory. These ops take\n+`view` operands + extra attributes and return new `view`s. The returned\n+`view`s generally alias the operand `view`. At the moment the existing ops\n+are:\n+\n+ * `std.view`,\n+ * `std.subview`,\n+ * `linalg.range`,\n+ * `linalg.slice`,\n+ * `linalg.transpose`,\n+ * `linalg.reshape`.\n+\n+Future ops are added on a per-need basis but should include:\n+\n+ * `linalg.tile`,\n+ * `linalg.intersection`,\n+ * `linalg.convex_union`,\n+ * `linalg.difference` (would need to work on a list of views).\n+\n+These additional operations correspond to abstractions that have been known to\n+work in the field of large-scale distributed stencil computations.\n+\n+In the longer term, the abstractions from the [Legion data-centric\n+programming model](https://legion.stanford.edu/overview/) seem generally\n+appealing.\n+\n+## Named Payload-Carrying Ops\n+Additionally, `linalg` provides a small subset of commonly named operations:\n+\n+ * `linalg.copy`,\n+ * `linalg.fill`,\n+ * `linalg.dot`,\n+ * `linalg.matmul`,\n+ * `linalg.conv`.\n+\n+These named operations adhere to the `linalg.generic` op interface. Work is in\n+progress to define declarative mechanisms to automatically generate named ops\n+from a description in terms of only the generic op interface. \n+\n+This is the main reason there are only a small number of ops today: we expect\n+them to be auto-generated from Tablegen soon.\n+\n+# Open Issues and Design Alternatives\n+Multiple open issues and design alternatives are in flight and it is time to\n+lay them out for the community to discuss and pick apart:\n+1. 
Should `linalg.generic` support nesting?\n+1. Should `linalg.generic` regions take views or only scalars?\n+1. Should we try to solve automatic differentiation at this level of\n+abstraction?\n+1. Are all the six properties really necessary?\n+1. Is this relying too much on declarative specification and would we be\n+better off relying more on analyses?\n+1. Is this general enough for the community's needs? If not, how should this be\n+extended, if at all?\n+...\n+\n+These key questions (and much more) should really be thought of in the general\n+context of MLIR in which different levels of IR interoperate seamlessly. In \n+practice, it is not necessary (or beneficial) to try and solve all problems in the \n+same IR.\n",
"delLines": null,
"isMissingNewNewline": null,
"isMissingOldNewline": null,
"newLength": "472",
"newOffset": "1",
"oldLength": "8",
"oldOffset": "1"
}
],
"id": "1789840",
"metadata": {
"copy:lines": {
"338": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
94,
"-"
],
"339": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
95,
"-"
],
"340": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
96,
"-"
],
"341": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
97,
"-"
],
"342": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
98,
"-"
],
"343": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
99,
"-"
],
"344": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
100,
"-"
],
"413": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
48,
"-"
],
"414": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
49,
"-"
],
"415": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
50,
"-"
],
"416": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
51,
"-"
],
"417": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
52,
"-"
],
"418": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
53,
"-"
],
"419": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
54,
"-"
],
"420": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
55,
"-"
],
"421": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
56,
"-"
],
"422": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
57,
"-"
],
"424": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
84,
"-"
],
"425": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
85,
"-"
],
"426": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
86,
"-"
],
"427": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
62,
"-"
],
"428": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
63,
"-"
],
"429": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
64,
"-"
],
"430": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
65,
"-"
],
"431": [
"mlir/include/mlir/Dialect/Linalg/IR/LinalgBase.td",
66,
"-"
]
},
"hash.effect": "pirncXYIhgkH",
"line:first": 3
},
"newProperties": [],
"oldPath": "mlir/docs/Dialects/Linalg.md",
"oldProperties": [],
"type": "2"
}
],
"creationMethod": "arc",
"dateCreated": "1580673142",
"dateModified": "1580673144",
"description": "Split the docs.",
"id": "241940",
"lintStatus": "0",
"properties": {
"arc.staging": {
"refs": [],
"status": "repository.unconfigured"
},
"arc:onto": [
{
"kind": "upstream",
"name": "master",
"type": "branch"
}
],
"local:commits": {
"00c500b45ef2300c124b139cb5d79067094783ac": {
"author": "Nicolas Vasilache",
"authorEmail": "ntv@google.com",
"commit": "00c500b45ef2300c124b139cb5d79067094783ac",
"message": "[mlir][Linalg][doc] Add Design Document for the Linalg Dialect\n\nSummary: This revision adds a Rationale for the Linalg Dialect\n\nReviewers: rriddle, mehdi_amini, ftynse, albertcohen\n\nReviewed By: albertcohen\n\nSubscribers: merge_guards_bot, jfb, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, llvm-commits\n\nTags: #llvm\n\nDifferential Revision: https://reviews.llvm.org/D73595",
"parents": [
"a3e09cb098dd0a089365259f77a2bc02136da217"
],
"summary": "[mlir][Linalg][doc] Add Design Document for the Linalg Dialect",
"time": "1580656894",
"tree": "4d1ee17a660859e08be07ebea89abb234721728d"
}
}
},
"revisionID": "73595",
"sourceControlBaseRevision": "a3e09cb098dd0a089365259f77a2bc02136da217",
"sourceControlPath": null,
"sourceControlSystem": "git",
"unitStatus": "0"
}