Computer arithmetic, mostly
Hardware and FPGA Arithmetic
My main current research topic is application-specific arithmetic and its implementation in the FloPoCo project.
FloPoCo is a generator of arithmetic cores for FPGA computing, the subject of Bogdan Pasca's PhD, then Matei Istoan's. Some of its cores are similar to what you find in your processor, but the main purpose of the project is to design radically new ones that exploit FPGA flexibility. These include operators for the evaluation of elementary functions in floating point: see for instance our flagship exponential function. In this context, we also revisited Kulisch's long accumulator and worked on decimal multiplication, polynomial evaluation, and multiplication and division by constants, among others. FloPoCo is also a VHDL generator framework with unique features, such as the frequency-directed construction of correct-by-design pipelines and automatic testbench generation.
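The principle behind Kulisch's long accumulator can be illustrated in software. The following is a minimal Python sketch, not FloPoCo's hardware operator: it assumes finite IEEE double inputs and uses a Python integer as a fixed-point register whose least significant bit weighs 2**-1074 (the smallest subnormal double). Every addition is therefore exact, and rounding happens only once, when the result is read. Python integers grow as needed, whereas a hardware accumulator has a fixed width (in Kulisch's proposal, wide enough for every product of two doubles). The class name `LongAccumulator` is hypothetical.

```python
class LongAccumulator:
    """Exact accumulation of finite doubles in a wide fixed-point register.

    The register is a Python int `acc` representing acc * 2**-1074, i.e. its
    least significant bit has the weight of the smallest subnormal double.
    """

    LSB_EXP = 1074  # represented value = acc / 2**LSB_EXP

    def __init__(self) -> None:
        self.acc = 0

    def add(self, x: float) -> None:
        # as_integer_ratio is exact; for a finite double the denominator is a
        # power of two no larger than 2**1074, so the scaling below is exact.
        num, den = x.as_integer_ratio()
        self.acc += num * ((1 << self.LSB_EXP) // den)

    def result(self) -> float:
        # Python's int / int true division is correctly rounded, so this is
        # the only rounding in the whole sum.
        return self.acc / (1 << self.LSB_EXP)


acc = LongAccumulator()
for v in [1e100, 1.0, -1e100]:
    acc.add(v)
print(acc.result())          # 1.0: exact sum, rounded once
print(1e100 + 1.0 - 1e100)   # 0.0: naive float sum loses the 1.0
```

An application-specific variant could use a much narrower register if the range of the accumulated data is known in advance.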
The PhD of Yohann Uguen attempted to embed this expertise in High-Level Synthesis (HLS) tools. This article shows that this approach enables arithmetic optimizations that are out of reach of the FloPoCo approach.
An example of application-specific arithmetic is the evaluation of arbitrary fixed-point unary functions. I contributed to table-and-addition methods with Arnaud Tisserand, and more recently with Luc Forget and Maxime Christ. Jérémie Detrey developed techniques using small multipliers, which he then used to design operators for the logarithmic number system in the FPLibrary project (now superseded by FloPoCo). We are still working with Orégane Desrentes on new and better ways to evaluate elementary functions.
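The simplest table-and-addition method, the bipartite method, can be sketched in a few lines of Python. The n-bit input x in [0, 1) is split into three bit fields x0, x1, x2, and f(x) is approximated by the sum of two table lookups: a table of initial values indexed by (x0, x1), and a table of offsets indexed by (x0, x2) that holds a first-order Taylor correction. The choice of sin, the 4/4/4 field split and the function names are illustrative assumptions; a real operator would also quantize the table contents to fixed point and exploit the symmetry of the offset table.

```python
import math


def build_bipartite(f, df, a, b, c):
    """Tables for f on [0, 1) with an input of n = a + b + c bits.

    x = x0 + x1 + x2, where x0 holds the top a bits, x1 the next b bits
    and x2 the last c bits.
    """
    n = a + b + c
    # Midpoints of the ranges covered by x1 and x2.
    d1 = (2**b - 1) / 2 * 2.0**-(a + b)
    d2 = (2**c - 1) / 2 * 2.0**-n
    # Table of initial values: f evaluated with x2 replaced by its midpoint.
    tiv = [[f(i0 * 2.0**-a + i1 * 2.0**-(a + b) + d2) for i1 in range(2**b)]
           for i0 in range(2**a)]
    # Table of offsets: slope at the middle of the (x1, x2) sub-range, times
    # the distance of x2 to its midpoint. It does not depend on x1.
    to = [[df(i0 * 2.0**-a + d1 + d2) * (i2 * 2.0**-n - d2) for i2 in range(2**c)]
          for i0 in range(2**a)]
    return tiv, to


def eval_bipartite(tables, k, a, b, c):
    """Approximate f(k / 2**n) with two lookups and one addition."""
    tiv, to = tables
    i2 = k & (2**c - 1)
    i1 = (k >> c) & (2**b - 1)
    i0 = k >> (b + c)
    return tiv[i0][i1] + to[i0][i2]


A, B, C = 4, 4, 4
N = A + B + C
tables = build_bipartite(math.sin, math.cos, A, B, C)
max_err = max(abs(eval_bipartite(tables, k, A, B, C) - math.sin(k / 2**N))
              for k in range(2**N))
entries = sum(len(row) for t in tables for row in t)
print(entries, "table entries instead of", 2**N)  # 512 table entries instead of 4096
print("max error:", max_err)
```

The sizes of the three fields trade total table size against approximation accuracy.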
Since I joined CITI Lab, I have also been working on cores for signal processing, for instance fixed-point arctangent, and FIR and IIR filters that compute just right. With Anastasia Volkova, Silviu Filip, and Martin Kumm, we recently formalized the relationship between arithmetic errors and filter design, which opens a world of well-founded optimizations.
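The core reasoning behind "computing just right" can be illustrated on the final sum of a fixed-point FIR filter. In this simplified Python sketch (an illustration, not the published method), the output has its least significant bit at weight 2**lsb_out, and the total error must stay below one unit of that bit. Each of the n exact products is truncated to an internal precision with g guard bits, where 2**g >= 2n; the truncation errors then add up to less than half an output unit, and the final round-to-nearest contributes at most another half unit. Coefficient quantization and range (MSB) analysis are left out, exact products are simulated with Fraction, and the function names are hypothetical.

```python
from fractions import Fraction


def truncate(v: Fraction, lsb: int) -> Fraction:
    """Round v down to a multiple of 2**lsb (what dropping low bits does)."""
    unit = Fraction(2) ** lsb
    return (v // unit) * unit


def round_nearest(v: Fraction, lsb: int) -> Fraction:
    """Round v to the nearest multiple of 2**lsb (ties to even)."""
    unit = Fraction(2) ** lsb
    return round(v / unit) * unit


def fir_output(coeffs, samples, lsb_out: int) -> Fraction:
    """One FIR output sum(c_i * x_i) with total error below 2**lsb_out."""
    n = len(coeffs)
    guard = (n - 1).bit_length() + 1  # smallest g with 2**g >= 2n
    lsb_internal = lsb_out - guard
    acc = Fraction(0)
    for c, x in zip(coeffs, samples):
        # Exact product, then truncation: error in [0, 2**lsb_internal).
        acc += truncate(Fraction(c) * Fraction(x), lsb_internal)
    # Sum of truncation errors < n * 2**lsb_internal <= 2**(lsb_out - 1);
    # the final rounding adds at most 2**(lsb_out - 1).
    return round_nearest(acc, lsb_out)


coeffs = [0.1, -0.25, 0.3, 0.2, -0.05]
samples = [0.7, -0.3, 0.9, 0.1, -0.6]
y = fir_output(coeffs, samples, lsb_out=-12)
exact = sum(Fraction(c) * Fraction(x) for c, x in zip(coeffs, samples))
print(float(y), float(abs(y - exact) * 2**12))  # error in output units, < 1
```

Using fewer guard bits would no longer guarantee the bound; using more would only cost hardware.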
I am also interested in designing new general-purpose hardware operators, such as the mixed-precision fused multiply-and-add and the correctly rounded sum of products we proposed for the Kalray processor. In the PhD of Nicolas Brunie, we also investigated the benefits of a closely coupled reconfigurable accelerator in a massively multicore design. Along the same lines, the thesis of Andrea Bocco studied the hardware implementation of UNUM arithmetic.
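The semantics of a fused multiply-add and of a correctly rounded sum of products fit in a few lines of Python: the result must be the exact mathematical value, rounded only once. This sketch computes the exact value with Fraction (slow but exact) and relies on the conversion float(Fraction) rounding to nearest. It specifies what such operators return, not how the Kalray hardware computes it, and it uses binary64 throughout instead of mixed precision. The function names are hypothetical.

```python
from fractions import Fraction


def fma_rn(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding to nearest (reference semantics)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def sum_of_products_rn(xs, ys) -> float:
    """sum(x*y) over pairs, computed exactly, then rounded once."""
    exact = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    return float(exact)


a = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)
print(fma_rn(a, a, c) == 2.0**-60)  # True: the 2**-60 term of a*a survives
print(a * a + c)                    # 0.0: rounding a*a first loses it
```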
As FPGAs may be viewed as fine-grained massively parallel computers, I am also interested in programming methodologies inherited from the parallel computing community, which was the subject of my thesis.
Software implementation of elementary functions with correct rounding
This was the subject of the PhDs of David Defour and Christoph Lauter. This work was linked to the development of the Correctly Rounded Math Library (CRLibm), archived here. The techniques used have followed the evolution of processor technology: see this article, this one and this one for older methods, and this one for a recent technique with very low overhead. CRLibm could also be used to build "perfect" elementary functions for interval arithmetic.
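The usual way to achieve correct rounding, Ziv's strategy, can be sketched as follows: evaluate a cheap approximation with a known error bound, check whether the whole error interval rounds to the same double, and only if not, recompute more accurately. In this Python sketch the approximation comes from the standard decimal module at a given number of digits (Decimal.exp is correctly rounded in the current context), a shortcut chosen for brevity. CRLibm instead relies, roughly, on a quick phase in double-double arithmetic and an accurate phase whose precision is chosen from worst-case analysis. The sketch assumes moderate arguments (no overflow handling), and the name `exp_rn` is hypothetical.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def exp_rn(x: float) -> float:
    """exp(x) correctly rounded to the nearest double, Ziv-style.

    Assumes moderate x (no overflow handling).
    """
    digits = 20  # first attempt: a few digits more than a double's ~17
    while True:
        with localcontext() as ctx:
            ctx.prec = digits
            y = Fraction(Decimal(x).exp())  # correctly rounded to `digits` digits
        # Relative error of y is at most half a unit in the last digit,
        # hence (generously) below eps = 10**(1 - digits).
        eps = Fraction(1, 10 ** (digits - 1))
        lo = float(y * (1 - eps))
        hi = float(y * (1 + eps))
        if lo == hi:
            # Rounding is monotonic, so exp(x), which lies in between,
            # rounds to the same double.
            return lo
        digits *= 2  # rounding test failed: retry with more precision


import math
print(exp_rn(1.0) == math.e)  # True
```

In the vast majority of cases the first attempt succeeds, which is why the average cost of correct rounding can be kept low.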
Starting with Ch. Lauter's PhD, the focus has been on the automation of libm code generation and on formal proof generation. Such software is now routinely developed with this "metalibm" approach.