## Fast inversion in GF(p^N), 4 times down to to 2 times slower than multiplication

### Fast inversion in GF(p^N), 4 times down to to 2 times slower than multiplication

ast inversion in GF(p^N), 4 times down to to 2 times slower than
multiplication

The field GF(p^N) has a reduction polinomial of the form "x^N + P"
where P has exactly N digits modulo a subfield FFT_MOD, x^N is makes
it monic, and x is the p from the definition. "x^N + P" is preferred
to be irreducible.
Well as many try to invert in this field with complicated methods,
there's a simple Euclid-like method that does the job with the cost
similar to 4 multiplications and 1 subfield inversion.

In the Euclidean algorithm we start with A = D = 1 ; B = C = 0 ; X =
_X (the input number to invert) ; Y = x^N + P and we have these
equations:
A*_X + B*_Y = X
C*_X + D*_Y = Y
Note that we could chose W = highest_digit_val(Y) /
highest_digit_val(X) and reformulate :
A = A*W - C ; B = B*W - D ; X = X*W - Y (note X is loosing it's
highest digit)
But this method requires approx 2N subfield inversions and makes it
inefficient in small and practical situations.

If we chose: V = highest_digit_val(Y) ; W= highest_digit_val(X)
We'd use: A = A*V - C*W ; B = B*V - D*W ; X = X*V - Y*W ; (note X is
also loosing it's highest digit)
So we doubled the number of subfield multiplications and we need no
subfield division (quite efficient when N < lg(FFT_MOD) ).
This way we deduced a method to cut one digit at a time from a 2N set
of digits onwards computing the inverse. This would lead to a metod
for inversion using 4N^2 + O(N) simple subfield operations. The catch
is that we can improve it still, by cutting 2 or more digits at a
time, and as we'll see we'll get to a formula that inversion time I(N)
is close to " 2 * M(N) * (1 + 1/s) " where s is the number of digits
we cut at a time, when we deduct a method to perform it, M(N) is the
time of the multiplication, for this problem we're using M(N) = N^2, .

X = x3 x2 x1 x0 ...
Y = y3 y2 y1 y0 ...

step 1 : 1 digit cut (to enter we need x3 != 0 and y3 != 0)
X <-- X*y3 - Y*x3
X = 0 ; x2y3 - y2x3 = k1 ; x1y3 - y1x3 = k2 ; ...
Y = y3 ; y2 ; y1 ; ...
step 2 : 2 digits cut (to enter we need k1 != 0 and y3 != 0)
Y <-- Y*k1 - (X << 1)*y3
expanding:
Y <-- Y * (k1 + (x3y3 << 1)) - X * (y3y3 << 1)
let P1 = k1 + (x3y3 << 1); P2 = y3y3 << 1
P1 has 2 non-null coefficients, P2 has 1 non-null coefficient
If we calculate the operations count of the inversion when using 2
digits cut, we'd get:
I(N) = 2N * 3N / 2 + O(N) (number of coefficient to reduce * cost
per 2 coeff cutoff / number of coeff cut) = 3N^2 + O(N)

Note that after step 2, we're back to the same problem but with X and
Y having different meanings through multiplication by P1 and P2, and 1
digit less for each, and P1 and P2 are polynomials. We could expand to
a method of cutting 4 points and reaching I(N) = 2.5N^2+ O(N).
The order of complexity is as follows:
s = 1 ; I(N) = 4N^2 + O(N)
s = 2 ; I(N) = 3N^2 + O(N)
s = 3 ; I(N) = 2.(6)N^2 + O(N)
s = 4 ; I(N) = 2.5N^2 + O(N)
s = 5 ; I(N) = 2.4N^2 + O(N)
...
s = N ; I(N) = much bigger than when s = 1 , so there must be a
breaking point depending on N and subfield bits, the calculations just
become too inefficient after some point.

Here's some sample C++ code for the s = 1 case:

//---------------------------------------------------------------------------
/* DEFINITIONS
//

Hi,

I have the stored procedure which uses temporary tables
and does some calculations.
It executes in 3 sec when I use Query Analyzer.
When I execute it using ADO.NET client it takes 30 sec.
I looked at execution time using Profiler so this is not some the .NET
Framework bottleneck.
SQL server executed the same stored procedure ten times slower.
I use SQL 2000 SP2.
Another interesting thing that I've found:
When I changed the stored procedure to use table variables (@tablename)
instead of temporary (#) tables, Query Analyzer execution time becomes one
minute(!!!)
and ADO.NET execution time remains the same (30 sec).

Does anyone know how to solve the problem, because my application is written
in C#
and I don't want my SP to execute for 30 seconds when only 3 sec is really
required.

Any help is appreciated,

Oleg