Paper 27
Paper 27
Paper 27
Ricardo Nobre
1 Introduction
1
Just-in-time compilation is a method that compiles code at run-time in order to
improve the runtime performance of computer programs.
2
Strategic programming is “a generic programming idiom for processing compound
data such as terms or object structures”, relying on the “separation of two concerns:
basic data-processing computations vs. traversal schemes”. Traversals are composed
by passing the former as arguments to the latter [4].
3
Tom is a language extension that provides pattern matching facilities for manipu-
lating tree structures and XML documents [5,6].
2 Compilation Infrastructure
Front end
MATLAB XML IR
MAT2TIR
(JavaCC + Tom)
Type/Shape
Specification
TIR2MAT TIR2C
(Tom) Optimized
MATLAB Code
Table 1. (a) MATLAB code, (b) XML IR for the factorial function and C code gen-
erated from MATLAB with the following type and shape information input to the
compiler: n:int32:1x1.
A tir2x form component then takes the Tom IR as input and, by combining
it with the type/shape specifications written by the user, applies specific IR-
based transformations or code generation steps. We have developed two backend
applications, namely tir2mat and tir2C.
The MATLAB Generator backend (tir2mat ) is used for validation purposes
and generates MATLAB code from the IR. This MATLAB code is helpful as it
gives the user the possibility to compare results obtained by using transformed
and/or specialized code with the original code. This can be a way to evaluate
and analyze accuracies in the case of type and word-length transformations, and
to trace and debug in the case of inserting code for monitoring.
The C Generator backend (tir2C ) allows us to generate C code from the
MATLAB input code with the possibility to use the complementary information
added by the user.
The transformations in the Tom IR use Tom [5,6] capabilities to manipu-
late tree structures. Pattern matching is used to find specific patterns of data-
structures in the Tom IR where transformations are applied. The use of Tom
allows us to achieve a flexible compiler infrastructure as transformations rules
are described using Tom specifications.
Regarding the back-end, and when translating typeless/shapeless languages
such as MATLAB to C, a key step is the definition of specific types and in
particular array shapes. To this extent we have implemented a limited form
of data-flow analysis [9] as well as an external user-provided mechanism for
static determination of types and shapes of variables as described next. The idea
is that just declaring information about the argument variables of a function,
the external knowledge gained by the inference engine is sufficient to statically
determine the shapes and types of all, or at least almost all variables.
Pipe
+_engines: List<Engine>
+execute(program:Program)
1
1..* Program
Engine -_callables: Map<String, Callable>
-_actionSelector: ActionSelector -_callGraph: Map<String, Node>
-_mainCallable: String
+Engine()
+execute(program:Program) +Program(mainCallable:String)
+createCallGraph(): void
1
1..*
Callable
-_iR: IRAbstractType
+_symbolTable: SymbolTable
+_code: String[]
+_name: String
+Callable(iR:IRAbstractType,name:String)
+isRecursiveQ(): boolean
+dimensionsKnownQ(lang:int): boolean
+typesKnownQ(lang:int): boolean
The main class of the tir2c backend, Tir2C, instantiates a single pipeline
(Pipe class) and attaches engines (extend Engine) in the order they should exe-
cute. Engines can manipulate the IRs representing the input MATLAB files, can
change the internal state of the compiler (e.g. symbol tables, generated code), or
both. For instance, the C generator engine, CGeneratorEngine class, is respon-
sible for generating the C code from the imported IR, using the type and shape
information about variables inferred by the inference engine (InferenceEngine-
Filter class) with information directly specified in external specification files, in
case this files exist and contain type or shape declarations. The script inliner
engine (ScriptInlinerEngine) is an example of an engine that inlines IRs for
multiple MATLAB script files into a single IR.
The pipeline is executed after the call graph creation method, which returns
an instance of the Program class holding information about all M-Files, each
represented by an instance of the Callable class. A Callable object holds all
compiler internal variables, symbol tables and IR structure representing an M-
File. When executing (execute() method) the pipeline the Program object is
passed as argument, and then each engine attached to the pipeline is executed
(execute() method) with it passed as argument. Engines have access to all the
internal variables (e.g. IRs and symbol tables) of the compiler.
2.3 C Generation
3 Experimental Results
We carried out a series of experiments to evaluate our MATLAB compiler and the
impact of the type and shape information on the performance of the generated
C code.
In our experimental evaluation we used a set of 11 code kernels originally
written in MATLAB. We used our compiler to automatically derive C codes.
These kernels are drawn from simple arithmetic operations in some cases us-
ing linear algebra operations such as dense matrix-vector multiplications, from
digital signal processing applications, and from an industrial application.
One of the set of kernels used is an industrial code delivered by Honeywell for
the REFLECT project [7,22]. The code is related to a Three-dimensional (3D)
Path Planning algorithm. This kernel plans a 3D path between the current posi-
tion of an autonomous vehicle and a predetermined goal position. In this paper
we consider the most computationally intensive task of the 3D path planning
algorithm: the gridIterate function. This function consists of 4 nested for type
loops and uses 3D arrays. In the experiments we consider a partial map of the
environment of size 32 ∗ 64 ∗ 16.
We then compared the execution time of each sample code using different
type and shape declaration files. To compare the performance of the base types
the programmer can specify, we carried out experiments where the different base
types corresponding to distinct precision requirements were used. These include
the specification of double (default), float, and fixed-point precision.
Preliminary tests comparing the number of cycles needed to run executables
generated with Mathworks MCC tool6 and the C code generated by our com-
pilation infrastructure shown that, as expected, the compiled C code is much
faster than Mathworks JIT approach (between 2 and 4 times faster). These tests
were performed on an x86 microprocessor (in our case the Intel Core 2 Duo),
not at all a processor for embedded systems, as it is one of the few platforms
compatible with executables generated by MCC (execution requires run-time
compiled libraries). Therefore MCC can not be considered as an alternative to
our compiler for use with embedded systems.
3.1 Methodology
Experiments were conducted using the SimpleScalar/ARM simulator [10,11] to
execute the object output by gcc7 with the -O3 flag from the C code generated
by our compiler from MATLAB input files. The simulator models Intel Stron-
gARM SA-11xx microprocessors, has been validated against a large collection of
workloads, and is believed to be within 4% of real hardware for performance es-
timation [11]. The StrongARM family of microprocessors implements the ARM
V4 instruction set architecture. This simulator reports the cycle count with high
accuracy.
Note that we only declared the type and shape of the parameters of each
function in the M-files, thus asserting that for all call-sites the same information
holds. The information about all other variables is inferred at compilation time
by evaluating the expressions where the input parameters are present. Further-
more, for these examples, we relied on type inference to determine the type and
shape of all other local variables. This is clearly not the most flexible situation
and we will continue to explore scenarios where type/shape specialization makes
sense. In addition to the impact of shape information we also tested the perfor-
mance improvement gained by using single precision floating point rather than
the standard double precision.
6
The MATLAB compiler “prepares MATLAB file(s) for deployment outside of the
MATLAB environment, generates wrapper files in C or C++, optionally builds stan-
dalone binary files, and writes any resulting files into the current folder, by default.”
[23]
7
The GNU Compiler Collection [24]
3.2 Results and Discussion
We show in Fig. 4 the results obtained when using the SimpleScalar/ARM sim-
ulator.
The results consider the execution time improvements between float vs. dou-
ble, fixed vs. double, and fixed vs. float implementations. The last group of
columns (labeled “average”) corresponds to the average speedup for our set of
code kernels.
4 Related Work
5 Conclusion
Acknowledgment
This research has been partially funded by the Portuguese Science Founda-
tion Fundao para a Ciência e Tecnologia (FCT) under research grant PT-
DC/EIA/70271/2006.
The author also acknowledges the support of the FP7 EU-funded project
REFLECT for the access to MATLAB codes related to industrial applications.
The author acknowledges support from João M. P. Cardoso, affiliated with
Faculdade de Engenharia da Universidade do Porto (FEUP) and Pedro Diniz,
affiliated with Instituto de Engenharia de Sistemas e Computadores, Investigação
e Desenvolvimento em Lisboa (INESC-ID.
References
1. Ricardo Nobre, Joo M. P. Cardoso, Pedro C. Diniz ’Leveraging Type Knowledge
for Efficient MATLAB to C Translation’, paper presented at CPC2010, Vienna
University of Technology, Vienna, Austria, July 9, 2010.
2. MATLAB the Language of Technical Computing, http://www.mathworks.com/
products/matlab.
3. Richard Goering, ”Matlab edges closer to electronic design automation
world” EE Times, 10/04/2004 http://eetimes.com/electronics-news/4050334/
Matlab-edges-closer-to-electronic-design-automation-world
4. R. Lämmel, E. Visser, and J. Visser, Strategic programming meets adaptive pro-
gramming, In Proc. 2nd Intl Conf. on Aspect-Oriented Software Development
(AOSD’03), Boston, Mass., March 17 - 21, 2003. ACM, New York, NY, USA, pp.
168-177.
5. Jean-Christophe Bach et al.: Tom Manual - Version 2.7, http://tom.loria.fr
6. Emilie Balland, Paul Brauner, Radu Kopetz, Pierre-Etienne Moreau, and Antoine
Reilles, Tom: Piggybacking rewriting on java, In 18th International Conference
on Term Rewriting and Applications (RTA07), Paris, France, June 26-28, 2007,
Springer LNCS 4533, pp. 36-47.
7. João M. P. Cardoso et al., REFLECT: Rendering FPGAs to Multi-Core Embedded
Computing, Book Chapter, Reconfigurable Computing, Springer.
8. The JavaCC Home, https://javacc.dev.java.net/
9. A. Aho, J. Ullman, M. Lam and R. Sethi, Compilers: Principles, Techniques and
Tools, Addison Wesley, 2006.
10. T. Austin, E. Larson, and D. Ernst, SimpleScalar: An infrastructure for computer
system modeling, Computer, 35(2), 2002, pp. 5967.
11. SimpleScalar Version 4.0, Test Releases, http://www.simplescalar.com/v4test.
html
12. De Rose, L. and Padua, D. Techniques for the Translation of MATLAB programs
into Fortran 90. ACM Trans. Program. Lang. Syst. 21, 2 (Mar. 1999), pp. 28623.
13. Joisha, P. G. and Banerjee, P. 2007. A translator system for the MATLAB lan-
guage: Research Articles. Softw. Pract. Exper. 37, 5 (Apr. 2007), pp. 535-578.
14. G. Almsi , D. Padua, MaJIC: Compiling MATLAB for Speed and Responsiveness,
In Proc. of the ACM 2002 Conf. on Programming Language Design and Implemen-
tation (PLDI’02), June 17-19, 2002, Berlin, Germany.
15. Amina Aslam, and Laurie Hendren, McFLAT: A Profile-based Framework for Loop
Analysis and Transformations, in the 23rd International Workshop on Languages
and Compilers for Parallel Computing (LCPC2010), October 7 - 9, 2010, Rice Uni-
versity, Houston, Texas, USA.
16. A. Navak, M. Haldar, A. Choudhary and P. Banerjee, Parallelization of MATLAB
Applications for a Multi-FPGA System, in Proc. of the 9th Annual IEEE Symp.
on Field-Programmable Custom Computing Machines (FCCM’01), Rohnert Park,
Calif., May, 2001.
17. P. Banerjee, D. Bagchi, M. Haldar, A. Nayak, V. Kim, R. Uribe, Automatic Con-
version of Floating Point MATLAB Programs, in proc. of the 11th Annual IEEE
Symp. on Field-Programmable Custom Computing Machines (FCCM03), Napa,
Calif., 2003.
18. K. Olmos, E: Visser, Turning dynamic typing into static typing by program special-
ization in a compiler front-end for Octave, in Proceedings Third IEEE International
Workshop on Source Code Analysis and Manipulation (SCAM03), 26-27 Sept. 2003,
pp. 141- 150.
19. Real-Time Workshop: Generate C code from Simulink models and MATLAB code,
http://www.mathworks.com/products/rtw/
20. Houman Zarrinkoub, and Grant Martin, Best Practices for a MATLAB to C Work-
flow Using Real-Time Workshop, MATLAB Digest - November 2009, http://www.
mathworks.com/company/newsletters/digest/2009/nov/matlab-to-c.html
21. Scilab 2 C - Translate Scilab code into C code, http://forge.scilab.org/index.
php/p/scilab2c/
22. REFLECT, FP7 EU Project: http://www.reflect-project.eu.
23. MATLAB R2011b Documentation, MATLAB Compiler. http://www.mathworks.
com/help/toolbox/compiler/mcc.html
24. GCC, the GNU Compiler Collection. http://gcc.gnu.org/