## Xilinx vs Altera

I have trouble comparing gate counts of an FPGA implementation on Xilinx with one on Altera, because the two vendors use different building blocks. Xilinx uses terms like slice, LUT (Look-Up Table), FF (Flip-Flop) and LC (Logic Cell), while Altera uses LE (Logic Element).

Here it is said that the logic cell to logic element ratio is 1.125:1, despite generally similar functionality. Therefore, divide Xilinx’s stated LC count by 1.125 to get the equivalent Altera LE count. More details here.
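
The conversion is a single division. A minimal Python sketch, assuming the 1.125:1 ratio quoted above holds for the device families being compared (it is a rough rule of thumb, not an exact equivalence); the function name is hypothetical:

```python
# Ratio quoted above: 1.125 Xilinx logic cells (LC) per Altera logic element (LE).
LC_PER_LE = 1.125


def xilinx_lc_to_altera_le(lc_count: float) -> float:
    """Approximate Altera LE count equivalent to a stated Xilinx LC count."""
    if lc_count < 0:
        raise ValueError("LC count must be non-negative")
    return lc_count / LC_PER_LE


if __name__ == "__main__":
    # A device advertised with 9,000 LCs corresponds to roughly 8,000 LEs.
    print(xilinx_lc_to_altera_le(9000))  # 8000.0
```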

This link compares the logic resources of Xilinx and Altera FPGAs, and this is a guide on how to choose between the two.

This site mentions that:

- Xilinx: 1 slice = 2 LUTs + 2 FFs + some more logic
- Altera: 1 LE = 1 LUT + 1 FF + some more logic

So 1 slice (Xilinx) = 2 LEs (Altera).
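
Under this model, a rough lower bound on resource usage follows from the LUT and FF counts of a design. A minimal sketch, assuming perfect packing and ignoring the "some more logic" (carry chains, multiplexers) and routing constraints, so real tool reports will usually be higher; the function names and example counts are hypothetical:

```python
def min_xilinx_slices(luts: int, ffs: int) -> int:
    """Lower bound on slices if each slice holds 2 LUTs and 2 FFs (perfect packing)."""
    return max((luts + 1) // 2, (ffs + 1) // 2)


def min_altera_les(luts: int, ffs: int) -> int:
    """Lower bound on LEs if each LE holds 1 LUT and 1 FF (perfect packing)."""
    return max(luts, ffs)


if __name__ == "__main__":
    luts, ffs = 100, 60  # hypothetical post-synthesis resource counts
    print(min_xilinx_slices(luts, ffs), min_altera_les(luts, ffs))  # 50 100
```

With these bounds the ratio comes out at about 2 LEs per slice, matching the rule of thumb above.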

## PB preference over ONB

> In principle there are no restrictions on the kind of basis that is used (polynomial, normal, …). Although more work has to be done on this, we believe that a polynomial basis is most suited because a number of the advantages of (optimal) normal basis disappear when r > 1.

[Erik De Win, Antoon Bosselaers, Servaas Vandenberghe, Peter De Gersem, Joos Vandewalle, “A Fast Software Implementation for Arithmetic Operations in GF(2^n)”, Katholieke Universiteit Leuven, Belgium]
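
To make "polynomial basis" concrete: an element of GF(2^m) is stored as a bit vector of polynomial coefficients, and multiplication is shift-and-XOR followed by reduction modulo an irreducible polynomial f(x). The Python sketch below is a plain bit-serial method, not the optimized word-level algorithm of the cited paper; it represents elements as Python ints (bit i = coefficient of x^i) and uses the AES field GF(2^8) with f(x) = x^8 + x^4 + x^3 + x + 1 only as a convenient test case:

```python
def pb_multiply(a: int, b: int, f: int) -> int:
    """Multiply a and b in GF(2^m) using a polynomial basis.

    Field elements are ints whose bit i is the coefficient of x^i and
    must have degree < m. f is the irreducible polynomial, with bit m set.
    """
    m = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:            # coefficient b_i is 1: add (XOR) a * x^i
            result ^= a
        b >>= 1
        a <<= 1              # a := a * x
        if (a >> m) & 1:     # degree reached m: reduce by XOR with f
            a ^= f
    return result


if __name__ == "__main__":
    AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1
    print(hex(pb_multiply(0x57, 0x83, AES_POLY)))  # 0xc1
```

`pb_multiply(0x57, 0x83, 0x11B)` returns `0xC1`, the worked example from FIPS-197. The same function works for a NIST binary-field polynomial such as x^163 + x^7 + x^6 + x^3 + 1 by passing it as an int.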

## Operations over GF(2^m): Comments and Conclusion

From this book, page 231 (based on FPGA implementations):

1. For modular multipliers, combinational circuits are too expensive in terms of area for large polynomials, in some cases too large to be implemented in a single device. Sequential implementations need m (the degree of f(x)) cycles to obtain a result and could be too slow. A trade-off can be obtained using a sequential circuit that computes G bits per cycle. Tables 7.5 and 7.6 show results for the 163- and 233-bit NIST-recommended polynomials.
2. Regarding squaring, combinational circuits are simpler and faster than the corresponding sequential circuits.
3. For exponentiation, the computation time depends on the number of ones in the exponent, and the multiplication determines the worst-case time. For faster exponentiation, multiplication such as in Sec. 7.7.5 should be used.
4. For division-inversion, binary division can be used for inversion with good results. In the MAIA inversion, the critical path lies in computing the degree of the polynomials.
5. For multipliers with special irreducible polynomials (AOPs, trinomials, pentanomials), combinational circuits have the same area problems as combinational multipliers with general irreducible polynomials, but with a lower complexity (area, delay).
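
Points 1 and 3 can be modelled in software to see the trade-offs. The Python sketch below, assuming field elements stored as ints (bit i = coefficient of x^i), simulates a sequential multiplier that consumes G bits of the second operand per cycle (so it needs ceil(m/G) cycles) and a left-to-right square-and-multiply exponentiation that counts how many multiplications the 1 bits of the exponent cost. It only counts iterations; it says nothing about area or clock period, and the function names are hypothetical:

```python
def mul_by_x_power(c: int, k: int, f: int, m: int) -> int:
    """Return c * x^k mod f, shifting and reducing one bit at a time."""
    for _ in range(k):
        c <<= 1
        if (c >> m) & 1:      # degree reached m: subtract (XOR) f
            c ^= f
    return c


def digit_serial_multiply(a: int, b: int, f: int, g: int):
    """Model a sequential multiplier that consumes g bits of b per cycle.

    Returns (a * b mod f, cycles). g = 1 is the bit-serial case (m cycles);
    a larger g needs fewer cycles at the cost of more hardware per cycle.
    """
    m = f.bit_length() - 1
    cycles = -(-m // g)                    # ceil(m / g)
    mask = (1 << g) - 1
    result = 0
    for d in range(cycles - 1, -1, -1):    # most significant digit first
        digit = (b >> (d * g)) & mask
        result = mul_by_x_power(result, g, f, m)
        for j in range(g):                 # add a * digit: one term a * x^j per set bit
            if (digit >> j) & 1:
                result ^= mul_by_x_power(a, j, f, m)
    return result, cycles


def gf_power(a: int, e: int, f: int):
    """Left-to-right square-and-multiply; returns (a^e mod f, multiplications).

    Every exponent bit costs one squaring (done here with the general
    multiplier; point 2 suggests a combinational squarer in hardware),
    but only the 1 bits cost a multiplication by a.
    """
    result, mults = 1, 0
    for i in range(e.bit_length() - 1, -1, -1):
        result, _ = digit_serial_multiply(result, result, f, 1)
        if (e >> i) & 1:
            result, _ = digit_serial_multiply(result, a, f, 1)
            mults += 1
    return result, mults


if __name__ == "__main__":
    f163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1
    a, b = (1 << 162) | 0xABCDEF, (1 << 150) | 0x12345
    for g in (1, 4, 8, 16):
        product, cycles = digit_serial_multiply(a, b, f163, g)
        print(g, cycles, hex(product))
```

For the 163-bit NIST polynomial, G = 1 needs 163 cycles while G = 8 needs 21, and `gf_power(0x53, 254, 0x11B)` uses 7 multiplications (254 has seven 1 bits) to compute the inverse 0xCA in GF(2^8).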

## Generating EC parameters

… is not as easy as generating random numbers.

IEEE P1363, Section 1.9.5, mentions that:

> The most difficult part of generating EC parameters is finding a base point of prime order.

So the next thing to do is to find a random point on an elliptic curve (prime case: A.11.1; binary case: A.11.2), using A.2.5 to find a square root modulo p and A.2.1 to compute modular exponentiation.
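
The prime case can be sketched in a few lines of Python. This is a simplified sketch, not the P1363 procedure: it assumes p ≡ 3 (mod 4), where a square root of c is simply c^((p+1)/4) mod p, instead of the general square-root algorithm in A.2.5; Python's built-in three-argument `pow` plays the role of A.2.1's modular exponentiation. Function names are hypothetical:

```python
import secrets


def sqrt_mod_p_3mod4(c: int, p: int) -> int:
    """Square root of c modulo a prime p with p % 4 == 3 (c must be a square)."""
    return pow(c, (p + 1) // 4, p)


def random_point(a: int, b: int, p: int):
    """Return a random affine point (x, y) on y^2 = x^3 + a*x + b over GF(p)."""
    if p % 4 != 3:
        raise ValueError("this sketch only handles p = 3 (mod 4)")
    while True:
        x = secrets.randbelow(p)
        rhs = (pow(x, 3, p) + a * x + b) % p
        if rhs == 0:
            return x, 0
        # Euler's criterion: rhs is a nonzero square iff rhs^((p-1)/2) == 1 mod p
        if pow(rhs, (p - 1) // 2, p) == 1:
            y = sqrt_mod_p_3mod4(rhs, p)
            if secrets.randbelow(2):   # pick either of the two roots at random
                y = p - y
            return x, y


if __name__ == "__main__":
    # Toy curve y^2 = x^3 + x + 1 over GF(23); 23 = 3 (mod 4).
    print(random_point(1, 1, 23))
```

Finding a point is only the first step; checking that it generates a subgroup of large prime order is the hard part quoted above.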

In the textbook, the algorithm for elliptic curve key pair generation is only 5 lines long. But implementing one line requires many hours of understanding P1363.
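
To show how short the textbook key pair generation is once the building blocks exist, here is a Python sketch on a toy curve y^2 = x^3 + x + 1 over GF(23), chosen only because it is small enough to brute-force. It is an illustration under heavy assumptions: the base point of prime order is found by exhaustive search and brute-force order computation, which is impossible at real curve sizes, where the parameter-generation methods of P1363 are needed instead. All names are hypothetical, and `pow(x, -1, p)` requires Python 3.8+:

```python
import math
import secrets

P, A, B = 23, 1, 1   # toy curve y^2 = x^3 + x + 1 over GF(23); far too small for real use
INF = None           # point at infinity


def add(p1, p2):
    """Affine point addition (and doubling) on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF                       # Q + (-Q), including doubling a point with y = 0
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k, pt):
    """Double-and-add: compute k * pt."""
    result = INF
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def order(pt):
    """Brute-force order of pt (only feasible on toy curves)."""
    n, q = 1, pt
    while q is not INF:
        q = add(q, pt)
        n += 1
    return n


def is_prime(n):
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))


def find_base_point():
    """Return (G, n): the affine point of largest prime order n."""
    best = None
    for x in range(P):
        for y in range(P):
            if (y * y - (x ** 3 + A * x + B)) % P == 0:
                n = order((x, y))
                if is_prime(n) and (best is None or n > best[1]):
                    best = ((x, y), n)
    if best is None:
        raise ValueError("no point of prime order found")
    return best


def generate_key_pair(G, n):
    """Textbook EC key pair: private d in [1, n-1], public Q = d * G."""
    d = secrets.randbelow(n - 1) + 1
    return d, scalar_mult(d, G)


if __name__ == "__main__":
    G, n = find_base_point()
    d, Q = generate_key_pair(G, n)
    print("G =", G, "n =", n, "d =", d, "Q =", Q)
```

Once G and n are known, `generate_key_pair` really is only two lines; everything above it is the machinery that takes the hours.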

• #### Budi Rahardjo 5:29 am on October 30, 2009

> implementing one line requires many hours of understanding

No kidding. And after that, many more hours of coding.
I am in the middle of it right now.

• #### CG 10:20 am on October 30, 2009

So true.

That’s why we’re really glad you joined in 😉

## Yes, it is implementable, but how?

Reading the third chapter of this book, I’m astonished that ECC (ECDH) is implementable on the Chipcon CC1010 chip, which consists of an 8-bit 8051 processor core with a built-in radio transceiver and a hardware DES engine. It contains 32 KB of flash memory for storing programs, 2048 bytes of SRAM external to the 8051 core (XRAM), and 128 bytes of internal SRAM (IRAM).

Now the question is: without any additional hardware, how do you write code for those complex ECC operations that fits in such small memories?
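
A back-of-the-envelope memory budget helps frame the question. The Python sketch below assumes a binary field with m = 163 and a hypothetical number of field-element temporaries held at once (the chapter's actual choices may differ); it counts only the RAM for field elements, not the stack or any other variables:

```python
XRAM_BYTES = 2048  # CC1010 SRAM external to the 8051 core (from the text)
IRAM_BYTES = 128   # CC1010 internal SRAM (from the text)


def field_element_bytes(m: int) -> int:
    """Bytes to store one GF(2^m) element as a packed bit vector."""
    return (m + 7) // 8


def working_set_bytes(m: int, temporaries: int) -> int:
    """RAM for `temporaries` field elements held at the same time."""
    return temporaries * field_element_bytes(m)


if __name__ == "__main__":
    m = 163            # assumed field size
    temporaries = 20   # hypothetical count: a few points plus scratch values
    need = working_set_bytes(m, temporaries)
    print(field_element_bytes(m), need, need <= XRAM_BYTES)  # 21 420 True
```

Under these assumptions a 163-bit element takes 21 bytes, an affine point (two coordinates) 42 bytes, and 20 temporaries 420 bytes of the 2048-byte XRAM.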

[screaming in horror…]

• #### Budi Rahardjo 7:40 am on December 30, 2008

Well, if he can do it, you can too.

A few years ago, I had a student port Linux to a constrained device: an 8-bit processor with 8 kB (or was it 16 kB?) of RAM. It worked.

I am not saying that it is easy, but it can be done.

• #### CG 8:34 am on December 30, 2008

@BR: do you have a sample of the source code of Linux ported to a constrained device? Is it in assembly?

• #### Budi Rahardjo 8:57 am on December 30, 2008

The Linux source code is open, so you can look at it. There is indeed a (small) part written in assembly, but most of it is still in C. Compilation is done on a PC with plenty of resources (running Linux), using gcc to cross-compile.

• #### CG 9:04 am on December 30, 2008

@BR: we definitely have to discuss this more! And you’ve got to show me some stuff!

How much RAM do you need to implement ECC (ECDH)? Cramming everything into 2048 + 128 bytes of RAM is not easy if you’re used to 2 GB of RAM.

• #### CG 7:11 am on January 2, 2009

@waskita: as small as possible. And I’m used to 1 GB of RAM :((

## Software vs Hardware

I have just read this, this and this.

And made this simple table.

Now I’m reading this book and trying to figure out which issues in software implementation are worth focusing my research on and promising enough to explore [and defensible enough to convince my academic advisors 😉].
