The Eclipse OMR compiler technology is a collection of stable, high performance optimization and code generation technologies suitable for integration into dynamic and static language runtime environments.
It is not a standalone compiler that can be linked into another environment. Rather, it provides all the essential building blocks for integrating and adapting this advanced compiler technology for different language environments.
It originates from a mature, feature-rich compiler technology developed within IBM called Testarossa. This technology was developed from the outset to be used in highly dynamic environments such as Java, but has proven its adaptability in static compilers, trace-based compilation, and binary re-translators.
This technology is consumed as-is in some of IBM's high-performance runtimes such has the J9 Java virtual machine and used in production environments.
- high-level optimization technology featuring classic compiler optimizations, loop optimizations, control and data flow analyses, and support data structures
- code generation technology with deep platform exploitation for x86 (i386 and x86-64), Power, System Z, and ARM (32-bit)
- a robust, tree-based intermediate representation (or IL) and support code for producing IL from different method representations
- expressive tracing and logging infrastructure for problem determination
- JitBuilder technology to simplify the effort to integrate a JIT compiler into an existing language interpreter
- a framework for constructing language-agnostic unit tests for compiler technology
While this is a significant initial contribution of compiler technology, more is on the way, including:
- integration of conformance tests with the Eclipse OMR makefiles, also demonstrating a sample hookup of the technology
- lots more documentation, in source and in the
doc/compiler
directory describing the compiler technology architecture and its inner workings - further code refactoring and design consistency enhancements
- additional optimizations and code generation technology upon refactoring from within the IBM Testarossa code
The compiler technology is written largely in C++, but there is a handful of support functions written in C and assembler.
The structure of the codebase and the design of the class hierarchy reflects this technology's heritage and the requirement to adapt to a wide variety of compilation environments (or projects as they are often referred).
Many of the classes in Testarossa use a design pattern we call extensible classes. This is a pattern to achieve extension through composition and static polymorphism. Extensible classes are an efficient and useful means to extend and specialize the core technology provided by Eclipse OMR for a particular project and for a particular processor architecture. The extensible design is the reason for the shape of the class hierarchy, the layout of directories, and file naming.
The core compiler components are provided under the compiler/
top-level directory and are organized as follows:
Directory | Purpose |
---|---|
codegen/ | Code for transforming IL trees into machine instructions. This includes pseudo-instruction generation with virtual registers, local register assignment, binary encoding, and relocation processing. |
compile/ | Logic managing the compilation of a method. |
control/ | Generic logic to decide on when and how to compile a method. |
cs2/ | A legacy collection of utilities providing functionality such as container classes and lexical timers. The functions within this directory are deprecated and are actively being replaced with C++ STL equivalents or new implementations based on STL. |
env/ | Generic interface to the environment that is requesting the compilation. In most cases this is the interface to the VM or compiler frontend that is incorporating the Eclipse OMR compiler technology. For example, it can be used to answer questions about the VM configuration, object model, classes, floating point semantics, etc. |
il/ | Intermediate language definition and utilities. |
ilgen/ | Utilities to help with the generation of intermediate language from some external representation. |
infra/ | Support infrastructure. |
optimizer/ | High-level, IL tree-based optimizations and utilities. |
ras/ | Debug and servicability utilities, including tracing and logging. |
runtime/ | Post-compilation services available to compiled code at runtime. |
aarch64/ | AArch64 processor specializations |
arm/ | ARM processor specializations |
x/ | X86 processor specializations |
p/ | Power processor specializations |
z/ | System Z processor specializations |
Other resources can be found in the Eclipse OMR project as follows:
Directory | Purpose |
---|---|
jitbuilder/ | JitBuilder technology extending Eclipse OMR technology |
fvtest/compilertest | Unit tests for compiler technology |
doc/compiler | Additional documentation |
The OMR
, TestCompiler
, and JitBuilder
namespaces are used to isolate
compiler technology for those particular components. Processor architecture
specialized namespaces (e.g., X86
, Power
, Z
, and ARM
) can be
nested within them, and secondary nesting for sub-architectures (e.g., I386
,
AMD64
) is also permitted.
The TR
namespace (an abbreviation for Testarossa) should be used for all
components of the OMR compiler that are part of the public API for consuming
projects. The OMR compiler public API includes the concrete classes for the
extensible class representation (see extensible classes).
At present, the use of the TR
namespace for the public API is largely aspirational
as much of the code appears as it did when it was first contributed. The epic to
track the work to migrate components of the OMR compiler public API to the TR
namespace is issue #3519.
Throughout the current compiler code, you may encounter references that are in
the global namespace but whose identifiers are prefixed simply with TR_
.
This is inconsistent with the namespace convention just described and they are
being moved to the TR
namespace as part of #3519.
If you extend the Eclipse OMR compiler you should choose a unique namespace for your project that does not conflict with the compiler namespaces.
Throughout the codebase you may find code guarded with #ifdef XXX_PROJECT_SPECIFIC
directives. Project-specific macros are an artifact of the refactoring process that produced this code. The initial code contribution originated from a much larger codebase of compiler technology used in several compiler products and compilation scenarios. In order to contribute this code sooner we eliminated most, but not all, of code that has a tighter coupling to a particular environment. For example, guarded code may require data structures or header files not present in the initial contribution.
Generally there should not be a need to enable these macros. Our expecatation is that either that guarded code will be enabled over time as it is made more general purpose for other language environments, or it will be removed outright as that code is refactored as part of the Eclipse OMR project.
We recognize the presence of these macros is far from ideal, and IBM will be working to eliminate them over time so that the codebase is self-contained and fully testable. These macros should not be imitated in newer code commits except where absolutely necessary.