FloPoCo user manual


Installation

Single-line installation on Ubuntu 10.04LTS to 11.10

Copy this in a dash/bash shell (and enter your password when prompted):

yes | sudo apt-get install g++ libgmp3-dev libmpfr-dev libxml2-dev bison libmpfi-dev flex cmake && wget http://perso.ens-lyon.fr/damien.stehle/downloads/libfplll-3.0.12.tar.gz && tar xzf libfplll-3.0.12.tar.gz && cd libfplll-3.0.12/ && ./configure && make -j2 && sudo make install && cd .. && wget https://gforge.inria.fr/frs/download.php/28248/sollya-2.9.tar.gz && tar xzf sollya-2.9.tar.gz && cd sollya-2.9/ && ./configure && make -j2 && sudo make install && cd .. && wget https://gforge.inria.fr/frs/download.php/29991/flopoco-2.3.0.tgz && gunzip < flopoco-2.3.0.tgz | tar xvf - && cd flopoco-2.3.0/ && cmake . && make -j2 && ./flopoco

Generic installation instructions

FloPoCo may be compiled using either CMake or the autotools. CMake is the recommended way: it is included in mainstream Linux/Unix distributions and is available for other operating systems, including Windows. If you prefer to use the autotools, read the README.autotools file.

FloPoCo also depends on the MPFR library, on the C++ interface to GMP (which may or may not be a dependency of MPFR) and on flex++, all of which are probably available in your favourite Linux/Unix distribution.

Optionally, you may want to link FloPoCo against Sollya. This enables more operators (HOTBM, FunctionEvaluator). For this purpose, you must download, compile and install Sollya. FloPoCo is demonstrated to work with version 2.0 of Sollya and should work with future releases.

Compilation is a two-step process:

cmake .

make

The adventurous may get FloPoCo from its subversion repository.


Command-line interface

FloPoCo is a command-line tool. The general syntax is

flopoco <options> <operator specification list>

FloPoCo will generate a single VHDL file (named by default flopoco.vhdl) containing synthesisable descriptions of all the operators listed in <operator specification list>, plus possibly sub-operators instantiated by them. To use these operators in your design, just add this generated file to your project.

FloPoCo will also issue a report with useful information about the generated operators, such as the pipeline depth. In addition, three levels of verbosity are available.

Examples

./flopoco IntConstMult 16 12345

produces a file flopoco.vhdl containing a single operator for the integer multiplication of an input 16-bit number by the constant 12345. The VHDL entity is named after the operator specification, here IntConstMult_16_12345.

./flopoco IntConstMult 16 12345 IntConstMult 16 54321

produces a file flopoco.vhdl containing two VHDL entities and their architectures, for the two given constant multipliers.

./flopoco FPConstMult 8 23 8 23 0 -50 1768559438007110

produces a file flopoco.vhdl containing two VHDL entities, one for the specified floating-point multiplier by the constant 1768559438007110 x 2^-50, and the other one for a needed sub-component (an integer multiplier for the significand multiplication).
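The last three arguments of the FPConstMult example encode the constant as a sign, an exponent and an integer significand. The following Python sketch is not part of FloPoCo: the helper names fpconstmult_args and decode are hypothetical, and it only assumes that the constant equals (-1)^sgn x sig x 2^exp. It decodes such a triple and derives one from a double-precision value.

```python
from fractions import Fraction
import math

def fpconstmult_args(c):
    """Return (sgn, exp, sig) such that c == (-1)**sgn * sig * 2**exp exactly."""
    if c == 0 or math.isinf(c) or math.isnan(c):
        raise ValueError("constant must be finite and non-zero")
    sgn = 1 if c < 0 else 0
    num, den = abs(c).as_integer_ratio()   # den is a power of two for a float
    exp = -(den.bit_length() - 1)          # den == 2**(-exp)
    return sgn, exp, num

def decode(sgn, exp, sig):
    """Exact rational value of a (sign, exponent, significand) triple."""
    return (-1) ** sgn * Fraction(sig) * Fraction(2) ** exp

# The triple used in the example above is the double nearest to pi/2.
manual_value = decode(0, -50, 1768559438007110)
print(float(manual_value))
print(fpconstmult_args(math.pi / 2))
```

The triple is not unique: doubling the significand while decrementing the exponent describes the same constant, which is why the helper may return a different but equivalent triple.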

Options

Several transversal options are available and will typically change the operators occurring after them in the list. For instance, -frequency=300 sets the target frequency. The -name=UserProvidedName option replaces the (ugly and parameter-dependent) name generated by FloPoCo for the next operator with a user-provided one. In particular, this allows changing parameters while keeping the same entity name, so that these changes are transparent to the rest of the project. Options related to pipelining are reviewed below.

The -target option selects the target FPGA family. We try to optimize for the highest speed grade available for this family (see below for pipelining options).

Built-in help

To obtain a concise list of the available operators and options, simply type

./flopoco

In addition, this help may be more in sync with the code than this file, especially if you are using an svn snapshot.

Helper programs

The FloPoCo distribution also includes useful programs for converting the binary string of a floating-point number to human-readable form (bin2fp) and back (fp2bin). The longacc2fp utility converts the fixed-point output of the LongAcc operator (see below) to human-readable form.


Floating-point format

The floating-point format used in FloPoCo is identical to the one used in FPLibrary. It is inspired by the IEEE-754 standard.

An FP number is a bit vector consisting of 4 fields. From left to right:

A 2-bit exception field
00 for zero, 01 for normal numbers, 10 for infinities, and 11 for NaN
A sign bit
0 for positive, 1 for negative
An exponent field on wE bits
It is biased as in IEEE-754. The smallest possible FP numbers have exponent field 00...00, the FP number 1.0 has the exponent field 011...11 and the largest possible FP numbers have exponent 11...11
A fraction field on wF bits
The actual significand has an implicit leading 1, so the fraction field ff...ff represents the significand 1.ff...ff

The format is therefore parameterized by two positive integers wE and wF which define the sizes of the exponent and fraction fields respectively.

The utilities fp2bin and bin2fp will allow you to get familiar with the format and set up test benches.
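As an illustration of the field layout described above, here is a minimal Python sketch that builds the bit string of a number. It is not the fp2bin utility. The function name is hypothetical, the bias 2^(wE-1)-1 is inferred from the statement that 1.0 has the exponent field 011...11, and the contents of the exponent and fraction fields for zero and infinity (all zeros here) are an assumption.

```python
from fractions import Fraction

def flopoco_fp_bits(x, wE, wF):
    """Bit string: 2-bit exception, sign, wE-bit biased exponent, wF-bit fraction."""
    x = Fraction(x)
    sign = "1" if x < 0 else "0"
    if x == 0:
        return "00" + sign + "0" * (wE + wF)
    bias = 2 ** (wE - 1) - 1          # so that 1.0 has exponent field 011...11
    m, e = abs(x), 0
    while m >= 2:                     # normalise the significand into [1, 2)
        m, e = m / 2, e + 1
    while m < 1:
        m, e = m * 2, e - 1
    frac = round((m - 1) * 2 ** wF)   # round to nearest, ties to even
    if frac == 2 ** wF:               # rounding carried into the next binade
        frac, e = 0, e + 1
    field = e + bias
    if field >= 2 ** wE:              # too large: infinity (field contents assumed)
        return "10" + sign + "0" * (wE + wF)
    if field < 0:                     # too small: flushed to zero
        return "00" + sign + "0" * (wE + wF)
    return "01" + sign + format(field, f"0{wE}b") + format(frac, f"0{wF}b")

print(flopoco_fp_bits(1, 8, 23))
print(flopoco_fp_bits(-1.5, 3, 4))
```

The result is always wE+wF+3 bits wide, which is a convenient sanity check when setting up test benches.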

Difference between FloPoCo format and IEEE-754 format

There are two main differences between the format (wE=8, wF=23) and the IEEE-754 single precision format (the same holds for double).

Note that anyway, FloPoCo provides conversion operators from and to IEEE-754 formats (single and double precision).


LNS format

Numbers in the Logarithmic Number System used in FloPoCo have an encoding similar to the floating-point format. It is also the same as the one used in FPLibrary.

Its fields are:

A 2-bit exception field
Same encoding as floating-point: 00 for zero, 01 for the general case, 10 for infinities, and 11 for NaN
A sign bit
0 for positive, 1 for negative
The integral part of the exponent on wE bits
The fractional part of the exponent on wF bits
The fixed-point exponent is encoded in two's-complement.

Reasonable values are 4 to 8 for wE, and 8 to 20 for wF. Other values are still allowed, including negative wE. Use at your own risk.
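The field layout above can be sketched in Python as follows. This is an illustration only, under several assumptions: the function name is hypothetical, the stored exponent is taken to be log2|x| rounded to wF fractional bits, wE is positive, and out-of-range exponents simply raise an error because the hardware behaviour in that case is not described here.

```python
import math

def flopoco_lns_bits(x, wE, wF):
    """2-bit exception, sign, then log2|x| as a (wE+wF)-bit two's-complement fixed-point."""
    sign = "1" if x < 0 else "0"
    width = wE + wF
    if x == 0:
        return "00" + sign + "0" * width
    scaled = round(math.log2(abs(x)) * 2 ** wF)   # exponent in units of 2**-wF
    if not -(2 ** (width - 1)) <= scaled < 2 ** (width - 1):
        raise OverflowError("exponent does not fit in wE+wF bits")
    # Reduce modulo 2**width to obtain the two's-complement bit pattern.
    return "01" + sign + format(scaled % 2 ** width, f"0{width}b")

print(flopoco_lns_bits(2.0, 4, 8))
print(flopoco_lns_bits(-0.5, 4, 8))
```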


Pipelining

An operator may be combinatorial, or pipelined. A combinatorial operator has pipeline depth 0. An operator of pipeline depth 1 is obtained by inserting one and only one register on any path from an input to an output. Hopefully, this divides the critical path delay by almost 2. An operator of pipeline depth 2 is obtained by inserting two register levels, etc.

It should be noted that, according to this definition, pipelined operators usually buffer neither their inputs nor their outputs directly. For instance, connecting the input of a 400MHz operator to the output of another 400MHz operator may well lead to a circuit working at 200MHz only. It is the responsibility of the user or calling program to insert one more level of registers between two FloPoCo operators. This convention may feel like a burden to the user, but it is the most sensible choice. It makes it possible to assemble sub-components without inserting registers in many situations, thus reducing the latency of complex components. Besides, different application contexts may have different policies (registers on output, or registers on input).
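The 400MHz/200MHz figure above follows from simple arithmetic on register-to-register delays. The Python sketch below works it out under an idealised model: the stage delays are illustrative numbers chosen to match 400MHz, and routing and register setup times are ignored.

```python
def max_frequency_mhz(path_delays_ns):
    """Clock frequency allowed by the longest register-to-register path, in MHz."""
    return 1000.0 / max(path_delays_ns)

op_a_last_stage = 2.5    # ns, last stage of a 400 MHz operator (illustrative)
op_b_first_stage = 2.5   # ns, first stage of another 400 MHz operator

# Directly connected: the two unbuffered stages merge into a single 5 ns path.
unbuffered = max_frequency_mhz([op_a_last_stage + op_b_first_stage])

# With one register level between the operators, the two paths stay separate.
buffered = max_frequency_mhz([op_a_last_stage, op_b_first_stage])

print(unbuffered, buffered)
```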

Two command-line options control the pipelining of the FloPoCo operators that follow them.

-pipeline=[yes|no] (default yes)
Requires the operators to be pipelined. If no, the operator will be combinatorial. If yes, registers may be inserted if needed to reach the target frequency.
-frequency=[frequency in MHz] (default 300)
Sets the target frequency. If the -pipeline option is set, then FloPoCo will try to pipeline the operator to the given frequency. It will report a warning if it fails, or if frequency-directed pipelining is not yet implemented for this operator.

The philosophy of FloPoCo's approach to pipelining is the following:

Note that not all operators support pipelining (ultimately they all will). They are mentioned in the command-line help.


Available operators

Here is the list of operators that can be generated by FloPoCo. This list may not be fully up to date: the code is the reference.

Useful building blocks for FP operators

LeftShifter wIn MaxShift
Left barrel shifter. It has two inputs, the data to shift and a shift value. The width of the latter is deduced from MaxShift, which is the maximum shift distance. This operator will be pipelined to match target frequency.
RightShifter wIn MaxShift
Same, but to the right.
LZOC wIn wOut
Leading Zero/One Counter.
LZOCShifterSticky wIn wOut computeSticky countType
Leading Zero/One Counter merged with a shifter. If computeSticky=0 the bits shifted out are discarded, if 1 they are ORed into a sticky bit. If countType=0, a leading zero counter is built, if 1 a leading one counter is built, if -1 the value to count is input from an extra input port.
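To make the behaviour of LZOC and LZOCShifterSticky concrete, here is a small behavioural model in Python operating on bit strings (most significant bit first). It is a sketch only: the function names are hypothetical, the zero fill on the right and the exact output width are assumptions, and the mode where the value to count comes from an extra input port is not modelled.

```python
def lzoc(bits, count_ones=False):
    """Count the leading zeros (or leading ones) of a bit string, MSB first."""
    target = "1" if count_ones else "0"
    n = 0
    for b in bits:
        if b != target:
            break
        n += 1
    return n

def lzoc_shifter_sticky(bits, w_out, count_ones=False):
    """Shift out the leading run, keep w_out bits, OR the dropped bits into a sticky bit."""
    n = lzoc(bits, count_ones)
    shifted = bits[n:] + "0" * n               # left shift by the count, zero fill
    kept = shifted[:w_out].ljust(w_out, "0")
    dropped = shifted[w_out:]
    sticky = "1" if "1" in dropped else "0"
    return n, kept, sticky

print(lzoc_shifter_sticky("00010111", 4))
```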

Pipelined integer standard operators

Adders

IntAdder wIn
Integer adder. In modern VHDL, integer addition is expressed by a + and one usually needn't define an entity for it. However, this operator will be pipelined if the addition is too large to be performed at the target frequency.
MyIntAdder wIn optimizeType srl implementation bufferedInputs
IntAdder in manual mode. The option optimizeType=<0,1,2,3> where 0=LUT 1=REG 2=SLICE 3=LATENCY allows selecting the different optimization criteria. srl=<0,1> allows generating architectures optimized for the use of hardware shift registers. The architecture can also adapt if inputs of the adder are already buffered or not using the option bufferedInputs=<0,1>. Automatic design space exploration is performed by setting implementation=-1. Forcing architecture selection can be done by setting implementation=<0,1,2> where 0=Classical, 1=Alternative, 2=Short-Latency. Please check out this article for more details.
IntDualSub wIn opType
Integer adder/subtracter or dual subtracter, possibly pipelined. The operation type defines the functioning mode: if 1, compute X-Y and X+Y; if 0, compute X-Y and Y-X.

Multi-operand Adders

IntNAdder wIn N
Multi-operand integer adder using the hardware shift-registers (SRLs for Xilinx) available in current FPGAs.
IntCompressorTree wIn N
Multi-operand integer adder using compressor trees. High quality operator at the expense of synthesis time.

Multipliers

IntMultiplier wInX wInY signed ratio
The IntMultiplier operator now transparently groups several possible architectures. The signed parameter controls whether the inputs are treated as 2's complement numbers. The ratio parameter is a user knob for targeting a more DSP-oriented architecture (ratio=1), or an architecture where some smaller multiplications are implemented in logic. A ratio of 0 will generate a DSP-free architecture.
UnsignedIntMultiplier wInX wInY
Generates a DSP-oriented multiplication architecture. Trades possible DSP underutilization for logic and latency.
SignedIntMultiplier wInX wInY
Same as above but signed and no real support for Altera targets.
IntSquarer wInX wInY
Same for squaring. For large multiplications on some FPGAs, it saves DSP blocks (your mileage may vary).
IntKaratsuba wIn
Multiplier that saves multiplications by trading them for additions. The TwoWaySplitting, ThreeWaySplitting and FourWaySplitting Karatsuba-Ofman algorithms are implemented. See this article for more details.
IntTilingMultiplier wInX wInY ratio maxTimeInMinutes
Integer multiplier of two integers X and Y of sizes wInX and wInY using the tiling technique presented in this article. The ratio is a number in [0,1] and selects between DSP-dominant architectures (ratio closer to 1) and logic-dominant architectures (ratio closer to 0). The algorithm uses an optimized backtracking approach to explore the design space and could therefore take a long time to reach an optimal solution. The option maxTimeInMinutes restricts the maximum amount of time that the algorithm can search for a solution. A value of -1 for this parameter will find the best solution.
IntTruncMultiplier wInX wInY ratio error useLimits maxTimeInMinutes
Truncated integer multiplier of two integers X and Y of sizes wInX and wInY using the tiling technique and the specific truncation theory presented in this article. The error parameter gives the maximal allowed error produced by this operator. Two different soft-multiplier expansion techniques can be selected using the useLimits option, but the recommended value is 1.
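The idea behind IntKaratsuba, trading multiplications for additions, can be sketched in Python for the two-way splitting case. This is the textbook Karatsuba-Ofman identity, not FloPoCo's generated architecture, and the function name is hypothetical.

```python
def karatsuba_two_way(x, y, w):
    """Multiply two w-bit unsigned integers with 3 half-size products instead of 4."""
    h = w // 2
    mask = (1 << h) - 1
    x0, x1 = x & mask, x >> h        # low and high halves of x
    y0, y1 = y & mask, y >> h        # low and high halves of y
    low = x0 * y0
    high = x1 * y1
    # One extra product replaces both x1*y0 and x0*y1, at the cost of additions.
    mid = (x0 + x1) * (y0 + y1) - low - high
    return (high << (2 * h)) + (mid << h) + low

print(karatsuba_two_way(12345, 54321, 16))
```

On an FPGA, each saved half-size product typically corresponds to saved DSP blocks, while the extra additions are mapped to logic.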

Classical floating-point operators

These operators are correctly rounded to the nearest, in a way compatible with IEEE-754, with the exception that subnormal numbers are flushed to zero.
FPMultiplier wE wF
A floating-point multiplier. The actual FloPoCo component supports different input and output sizes, but this is not available from the command line.
FPAdder wE wF
A floating-point adder with a new, more compact single-path architecture.
FPAdderDualPath wE wF
A previous FPAdder architecture. Trades a larger circuit size for a smaller latency.
FPAdder3Input wE wF
A brand new 3-operand floating-point adder.
FPDiv wE wF
A floating-point divider.
FPSqrt wE wF
A floating-point square root using the classical digit-recurrence algorithm. This implementation returns a correctly rounded result, uses neither DSP nor RAM blocks and has low LUT usage. However, the latency can be as high as wF for high frequencies, and the frequency is limited for large wF. An alternative is FPSqrtPoly below.
FPSqrtPoly wE wF CorrectlyRounded Degree
A floating-point square root using a polynomial approximation. It consumes DSP and RAM blocks, but can reach higher frequencies than FPSqrt, and has lower latency. CorrectlyRounded (0/1) is a boolean selecting the rounding mode. If set to 0, the operator will only be faithful (last-bit accurate, but not necessarily correctly rounded). This saves a lot of resources. Degree is an integer (typically between 2 and 5) that defines the degree of the polynomial approximation, and allows the user to trade off DSP usage, latency and memory consumption. Lower degree means lower latency and DSP count, but larger consumption of embedded memory.
FPSquarer wE wF
A floating-point squarer, using IntSquarer for the mantissa.

Floating-point pipelined datapath generation

The current FloPoCo release offers, as an early alpha version:
FPPipeline filename wE wF
The operator receives the filename containing an untimed, untyped description (Python-like) of the datapath to be generated, together with the global precision of the operators to be used. Below are some examples of input files:
/* Jacobi1D: */
j = (a0+a1+a2)*0.3333333333333333;
output j;
/* Horner: */
p = a0+x*(a1+x*a2);
output p;
/* 2D Norm: */
r = sqrt(sqr(x0-x1)+sqr(y0-y1));
output r;

The currently supported operators are: +,-,*,/, sqr(), sqrt(), exp(), log().

Long fixed-point accumulator, and derivatives

These operators are described in all the gory details in this article.
LongAcc wE_in wF_in MaxMSB_in LSB_acc MSB_acc
Long fixed-point accumulator. By tuning the MaxMSB_in, LSB_acc and MSB_acc parameters to a given application, it allows one to bring rounding error to a provably arbitrarily small level (and in some case to avoid any rounding), for a very small hardware cost compared to using a floating-point adder for accumulation.
DotProduct wE_in wF_X wF_Y MaxMSB_in LSB_acc MSB_acc
Dot product operator. It feeds a long accumulator with the unrounded result of a floating-point multiplier, thus removing rounding errors from the multiplication as well.
LongAcc2FP MaxMSB_in LSB_acc MSB_acc wE_out wF_out
Post-normalisation unit for LongAcc. It converts the output of a LongAcc or DotProduct (with the same parameters) into a floating-point number.
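The benefit of a long fixed-point accumulator over floating-point accumulation can be illustrated in a few lines of Python. This is a numerical sketch only: the function name is hypothetical, the accumulator is modelled as an unbounded integer whose unit is 2^LSB_acc, and the MaxMSB_in and MSB_acc bounds are not modelled.

```python
from fractions import Fraction

def accumulate_fixed(values, lsb_acc):
    """Sum values exactly in an integer accumulator whose unit is 2**lsb_acc."""
    acc = 0
    for v in values:
        scaled = Fraction(v) / Fraction(2) ** lsb_acc
        if scaled.denominator != 1:
            raise ValueError("input has bits below LSB_acc; they would be truncated")
        acc += scaled.numerator
    return acc * Fraction(2) ** lsb_acc

values = [1.0] + [2.0 ** -60] * 1024

float_sum = 0.0
for v in values:
    float_sum += v          # each tiny term is rounded away in double precision

exact_sum = accumulate_fixed(values, lsb_acc=-60)
print(float_sum, exact_sum)
```

Here the floating-point sum loses all the small terms, whereas the fixed-point accumulator keeps them because its least significant bit is chosen below the smallest input.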

Constant multipliers

We provide two techniques for building a multiplier by a constant. One is the good old KCM technique described by Chapman in 1994. It builds an operator whose size is independent of the constant, but grows with the size of the input. It is very efficient for very small input bit sizes and arbitrary constants. The other one is based on shift-and-add graphs, and is described in all the gory details in this article. It will be more efficient for some constants. Some day we will be able to provide a uniform interface for these two families; in the meantime, you may want to try and synthesize both and pick the best.
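The KCM principle can be sketched in Python as follows: the input is split into small chunks, each chunk addresses a table of precomputed multiples of the constant, and the table outputs are added with the proper shifts. The function name and the chunk size k=4 are assumptions made for illustration, and the generated hardware may be organised differently.

```python
def kcm_multiply(x, w, c, k=4):
    """Multiply the unsigned w-bit x by the constant c using k-bit chunk lookups."""
    table = [c * d for d in range(1 << k)]    # precomputed multiples of c
    result = 0
    for shift in range(0, w, k):              # one lookup per k-bit chunk of x
        chunk = (x >> shift) & ((1 << k) - 1)
        result += table[chunk] << shift       # table lookup, then shifted addition
    return result

print(kcm_multiply(0xBEEF, 16, 12345))
```

The number of lookups depends only on w and k, never on the value of c, which is why the operator size is independent of the constant.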

IntConstMult w c
Integer constant multiplier using the shift-and-add technique: w is input size, c is the constant.
IntIntKCM w c signedInput
Integer constant multiplier using KCM: w is input size, c is the constant.
FixRealKCM lsbIn msbIn signedInput lsbOut constant
Faithful multiplier of a fixed-point input by a real constant. The fixed-point format of the input is provided as two integers lsbIn and msbIn which give the weights of the least significant bit and of the most significant bit. The input may be signed, or not. The constant is provided as a Sollya expression, e.g. "log(2)". This operator is briefly described in III.A. of this article.
FPConstMult wE_in wF_in wE_out wF_out cst_sgn cst_exp cst_int_sig
Floating-point constant multiplier using the shift-and-add approach. The constant is provided as sign, integral exponent and integral significand. Also described in this article.
FPConstMultParser wE_in wF_in wE_out wF_out wF_C constant_expr
Floating-point constant multiplier with a parser for the constant. This is basically the same as the previous, with a nicer interface: the last argument is a Sollya expression between double quotes, e.g. "exp(pi/2)".
FPRealKCM wE wF constantExpression
Same as the previous but using the KCM algorithm.
FPConstMultRational wE_in wF_in wE_out wF_out a b
Floating-point constant multiplier for a rational constant a/b. It uses the periodic representation of the constant to implement an optimal shift-and-add tree for it. This works well mostly for small a and b (scaled by whatever positive or negative power of two). This operator returns a correctly rounded result, therefore using it to multiply by 1/3 is bit-for-bit equivalent to using an IEEE-compliant division by 3.0. This operator is described in this article. For division by b, you should also try FPConstDiv and compare the two obtained operators.
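The shift-and-add idea used by IntConstMult can be illustrated with a plain canonical signed-digit (CSD) recoding of the constant. This Python sketch is only the basic principle: the function names are hypothetical, and the operator described in the article may build different, more optimised shift-and-add graphs.

```python
def csd_digits(c):
    """Canonical signed-digit recoding of c > 0: list of (shift, +1 or -1)."""
    digits, shift = [], 0
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)        # +1 if c ends in ...01, -1 if it ends in ...11
            digits.append((shift, d))
            c -= d
        c >>= 1
        shift += 1
    return digits

def shift_add_multiply(x, c):
    """Compute c*x using only shifts, additions and subtractions."""
    return sum(d * (x << s) for s, d in csd_digits(c))

print(csd_digits(7))                 # 7x computed as 8x - x
print(shift_add_multiply(1000, 12345))
```

The number of non-zero digits, hence the number of adders, depends on the constant, which is why this family is more efficient for some constants than for others.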

Divider by a small integer constant

These operators are described in this article.
IntConstDiv w d alpha
Euclidean division of input of size w by the small odd integer d. Algorithm uses radix 2^alpha, alpha=-1 means a sensible default.
FPConstDiv wE wF d
Correctly rounded floating-point division by the small integer d. This works well for small, odd values of d.
FPConstDivExpert wE wF d e alpha
Correctly rounded floating-point division by d.2^e, where d is a small odd integer. alpha=-1 means a sensible default.
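Division by a small constant d in radix 2^alpha can be sketched as a digit-serial long division where each step is a small table lookup on (remainder, digit). The Python model below is an illustration under assumptions: the function name is hypothetical, alpha must be given explicitly (the alpha=-1 default is not modelled), and the hardware organisation is not represented.

```python
def const_div(x, w, d, alpha):
    """Divide the unsigned w-bit x by d, one radix-2**alpha digit at a time."""
    radix = 1 << alpha
    # Table indexed by (remainder, digit): what a small lookup table would store.
    table = {(r, digit): divmod(r * radix + digit, d)
             for r in range(d) for digit in range(radix)}
    n_digits = -(-w // alpha)               # ceil(w / alpha)
    quotient, r = 0, 0
    for i in reversed(range(n_digits)):     # most significant digit first
        digit = (x >> (i * alpha)) & (radix - 1)
        q_digit, r = table[(r, digit)]
        quotient = quotient * radix + q_digit
    return quotient, r

print(const_div(1000, 16, 3, 4))
```

The table has d x 2^alpha entries, which is why the approach targets small values of d.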

Floating-point elementary functions

FPExp wE wF
An exponential operator, where both inputs and outputs have wE bits exponent and wF bits significand.
FPLog wE wF TableInsize
A natural logarithm operator, where both inputs and outputs have wE bits exponent and wF bits significand. The third parameter allows for performance tuning. When in doubt, set it to 0, which will default to something sensible. Otherwise, it defines the input size of the tables used by the operator, and should be between 6 and 15. See this article (improved version in preparation).
FPPow wE wF
A floating-point pow function as described in C99 and IEEE 754-2008, where both inputs and outputs have wE bits exponent and wF bits significand. This function (including its exceptional case management) is fully compatible with the C99 standard, hence current default libms.
FPPowr wE wF
A floating-point powr function as described in IEEE 754-2008, where both inputs and outputs have wE bits exponent and wF bits significand. This function is a novelty of IEEE 754-2008. The difference with good old pow is that it is purely defined as powr(x,y)=e^(y*ln(x)), in particular in the definition of its exceptional cases. For instance, pow is defined for negative x when y is an integer, while powr returns NaN in such cases.
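The difference between pow and powr for a negative x can be illustrated in Python. This is a sketch, not a model of the FloPoCo operators: powr_sketch is a hypothetical helper that only covers finite, non-zero x and finite y, and all other special cases are deliberately left out.

```python
import math

def powr_sketch(x, y):
    """powr taken literally as exp(y*ln(x)): NaN whenever x is negative."""
    if x < 0:
        return math.nan
    if x == 0 or math.isinf(x) or math.isinf(y) or math.isnan(x) or math.isnan(y):
        raise NotImplementedError("special cases are not modelled in this sketch")
    return math.exp(y * math.log(x))

print(math.pow(-2.0, 3.0))      # pow accepts a negative x with an integer y
print(powr_sketch(-2.0, 3.0))   # powr does not
print(powr_sketch(2.0, 3.0))
```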

Conversion operators

Fix2FP LSB MSB wE wF
Convert a 2's complement fixed-point number in the bit range MSB...LSB (both included) into floating-point. Example: Fix2FP 0 31 8 23 converts an input integer into a single-precision number.
InputIEEE wEI wFI wEO wFO
Conversion from IEEE-754 formats. Use InputIEEE 8 23 wEO wFO to convert from single-precision (or binary32) format, or InputIEEE 11 52 wEO wFO to convert from double-precision (or binary64) format. You may convert to a larger internal format or to a narrower one. Conversions are always correctly rounded.
OutputIEEE wEI wFI wEO wFO
Conversion to IEEE-754 formats. Not implemented yet. Do not hesitate to ask for it.
FP2FP wEI wFI wEO wFO
Conversion from a FloPoCo format to another one. Not implemented yet. Do not hesitate to ask for it.

Applications and examples

Collision wE wF
A collision operator. This is mostly a case study for a compound operator. It is described in this article.

Fixed point function evaluators

These operators need libsollya! They will not be available otherwise.
FunctionEvaluator function wI wO degree
HOTBM function wI wO degree
Fixed-point implementation of a function.

FloPoCo provides two generic operators, HOTBM and FunctionEvaluator, for evaluating an arbitrary function in fixed point. They offer (almost) the same interface: the description of a function between quotes (like "sin(x)^2" or "sqrt(1+x)"), input and output precisions (with a difference of interpretation for outputs), and a polynomial degree. The function is assumed to have its input on [0,1]. If you need a function on a different domain, you need to scale the input, e.g. use "sin(pi/2*x)" for a sine between 0 and pi/2.

Both methods use piecewise polynomial approximation, the polynomial being "computed just right". The differences are the following.

As both methods use a polynomial approximation, they work well for functions which are regular enough. In mathematical terms, they should be defined and n-times continuously differentiable on [0,1]. The code is well tested for monotonic functions only. Do not hesitate to contact us for help on a given function.

For both HOTBM and FunctionEvaluator the input operand is interpreted as a positive fixed-point number, with the point before the leftmost bit. The function to be implemented is assumed to be well defined in [0,1[.

For HOTBM the output is a fixed-point number, where the first bit is the sign and the point is placed right after it. Note that the output is in fact wO+1 bits wide. For FunctionEvaluator, wO defines the weight of the least significant bit of the result, and the actual output size depends on the range of the function on [0,1]. Just try it.
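The input convention above (point before the leftmost bit) can be sketched in Python, together with a reference value to compare a simulated output against. The function names are hypothetical, and wO is taken here simply as a number of fractional bits, which is an assumption that glosses over the difference of interpretation between HOTBM and FunctionEvaluator.

```python
import math

def input_value(X, wI):
    """Interpret the wI-bit unsigned integer X as a fixed-point number in [0, 1)."""
    return X / 2 ** wI

def reference_output(f, X, wI, wO):
    """Ideal value of f at the input, rounded to a multiple of 2**-wO."""
    y = f(input_value(X, wI))
    return round(y * 2 ** wO) / 2 ** wO

def quarter_sine(x):
    """Same function as the Sollya expression sin(pi/2*x): a sine between 0 and pi/2."""
    return math.sin(math.pi / 2 * x)

print(input_value(0x8000, 16))
print(reference_output(quarter_sine, 0x8000, 16, 16))
```

A generated operator is only expected to approximate this reference to within its stated accuracy, so a comparison should allow a small tolerance rather than require equality.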

Example:
flopoco HOTBM "sin(x*Pi/2)" 16 16 3
flopoco FunctionEvaluator "sin(x*Pi/2)" 32 32 4

Interface variations on HOTBM:
HOTBMFX func wE_in wF_in wE_out wF_out degree
HOTBMRange func wI wO degree xmin xmax scale
wE_in, wF_in, wE_out, wF_out are the widths of the integral and fractional parts of the input and the output, respectively. [xmin, xmax] is the input domain of the function, and scale is a scaling factor to be applied to the output. The HOTBMFX version allows selecting arbitrary fixed-point representations for the input and output. Negative values are allowed. HOTBMRange uses HOTBM after mapping [xmin,xmax[ to [0,1[, then multiplies the output by the scaling factor scale.

Example:
flopoco HOTBMFX "log2(1+2^(-x))" 2 8 -1 8 1

Note that the HOTBM* operators perform an exploration process that typically takes a few minutes for 16 bits, and may take hours for 24 bits.

LNS

These operators compute in the Logarithmic Number System. They are mostly useful for low-precision systems performing few additions and many multiplications, divisions or square roots.

LNSAddSub wE wF
LNS addition operator. Both operands and the output have wE integral bits and wF fractional bits in their exponents.
LNSMul wE wF
LNS multiplication operator. Both operands and the output have wE integral bits and wF fractional bits in their exponents.
LNSDiv wE wF
LNS division operator. Both operands and the output have wE integral bits and wF fractional bits in their exponents.
LNSSqrt wE wF
LNS square root operator. Both input and output have wE integral bits and wF fractional bits in their exponents.

The implementation of LNS addition/subtraction is based on HOTBM to compute sums, and uses cotransformation to evaluate differences. It is described in this article.

Test Benches

The TestBench and TestBenchFile operators generate a test bench for the operator which precedes them in the command line. The test vectors are generated from the specification of the operator (see the developer documentation of the Operator::emulate() method).

Test cases include both standard tests and random tests. The single parameter n specifies the number of random tests to generate. The pseudo-random number generator is initialised with n as the seed, so that the test bench will be deterministic for a given n.

We strongly advise that you test operators before using them, and we await your bug reports.

TestBench n
The test vectors are coded in the VHDL of the test bench, and can therefore be easily accessed and modified. In addition, they may include comments. However, this means that the compilation time of the test vectors is large, and this is not convenient beyond a few thousand tests.

Example:
flopoco HOTBM "sin(x*Pi/2)" 16 16 3 TestBench 1000

TestBenchFile n
TestBenchFile is similar to TestBench but moves the test vectors to a separate file called test.input. Thus the VHDL itself is very short, as is the compilation time. The simulation time is proportional to the number of tests. This scales to millions of test vectors, but is slightly less convenient in the debugging phase.

If n=-2, an exhaustive test is generated. If n=-1, no file is generated.

Example:
flopoco FPAdder 8 16 TestBenchFile 20000

Miscellaneous

Wrapper
Produce a wrapper for the preceding operator: this operator simply adds registers before and after the preceding operator. It is useful in some cases, e.g. to get critical path information including the delay of the first and last stages connected to registers, not to I/O.