Fast inversion in GF(p^N), 4 times down to to 2 times slower than multiplication

Fast inversion in GF(p^N), 4 times down to to 2 times slower than multiplication

Post by mega.bit » Wed, 08 Oct 2008 02:02:06

ast inversion in GF(p^N), 4 times down to to 2 times slower than

The field GF(p^N) has a reduction polinomial of the form "x^N + P"
where P has exactly N digits modulo a subfield FFT_MOD, x^N is makes
it monic, and x is the p from the definition. "x^N + P" is preferred
to be irreducible.
Well as many try to invert in this field with complicated methods,
there's a simple Euclid-like method that does the job with the cost
similar to 4 multiplications and 1 subfield inversion.

In the Euclidean algorithm we start with A = D = 1 ; B = C = 0 ; X =
_X (the input number to invert) ; Y = x^N + P and we have these
A*_X + B*_Y = X
C*_X + D*_Y = Y
Note that we could chose W = highest_digit_val(Y) /
highest_digit_val(X) and reformulate :
A = A*W - C ; B = B*W - D ; X = X*W - Y (note X is loosing it's
highest digit)
But this method requires approx 2N subfield inversions and makes it
inefficient in small and practical situations.

If we chose: V = highest_digit_val(Y) ; W= highest_digit_val(X)
We'd use: A = A*V - C*W ; B = B*V - D*W ; X = X*V - Y*W ; (note X is
also loosing it's highest digit)
So we doubled the number of subfield multiplications and we need no
subfield division (quite efficient when N < lg(FFT_MOD) ).
This way we deduced a method to cut one digit at a time from a 2N set
of digits onwards computing the inverse. This would lead to a metod
for inversion using 4N^2 + O(N) simple subfield operations. The catch
is that we can improve it still, by cutting 2 or more digits at a
time, and as we'll see we'll get to a formula that inversion time I(N)
is close to " 2 * M(N) * (1 + 1/s) " where s is the number of digits
we cut at a time, when we deduct a method to perform it, M(N) is the
time of the multiplication, for this problem we're using M(N) = N^2, .

Let the polynomials to be reduced start with :
X = x3 x2 x1 x0 ...
Y = y3 y2 y1 y0 ...

step 1 : 1 digit cut (to enter we need x3 != 0 and y3 != 0)
X <-- X*y3 - Y*x3
X = 0 ; x2y3 - y2x3 = k1 ; x1y3 - y1x3 = k2 ; ...
Y = y3 ; y2 ; y1 ; ...
step 2 : 2 digits cut (to enter we need k1 != 0 and y3 != 0)
Y <-- Y*k1 - (X << 1)*y3
Y <-- Y * (k1 + (x3y3 << 1)) - X * (y3y3 << 1)
let P1 = k1 + (x3y3 << 1); P2 = y3y3 << 1
P1 has 2 non-null coefficients, P2 has 1 non-null coefficient
If we calculate the operations count of the inversion when using 2
digits cut, we'd get:
I(N) = 2N * 3N / 2 + O(N) (number of coefficient to reduce * cost
per 2 coeff cutoff / number of coeff cut) = 3N^2 + O(N)

Note that after step 2, we're back to the same problem but with X and
Y having different meanings through multiplication by P1 and P2, and 1
digit less for each, and P1 and P2 are polynomials. We could expand to
a method of cutting 4 points and reaching I(N) = 2.5N^2+ O(N).
The order of complexity is as follows:
s = 1 ; I(N) = 4N^2 + O(N)
s = 2 ; I(N) = 3N^2 + O(N)
s = 3 ; I(N) = 2.(6)N^2 + O(N)
s = 4 ; I(N) = 2.5N^2 + O(N)
s = 5 ; I(N) = 2.4N^2 + O(N)
s = N ; I(N) = much bigger than when s = 1 , so there must be a
breaking point depending on N and subfield bits, the calculations just
become too inefficient after some point.

Here's some sample C++ code for the s = 1 case:


1. Ubuntu Lurid crapware-rot sets in: "My Lucid bootup time has become slower after time."

2. ADO.NET Query Execution time is 10 times slower than via Query Analyzer


I have the stored procedure which uses temporary tables
and does some calculations.
It executes in 3 sec when I use Query Analyzer.
When I execute it using ADO.NET client it takes 30 sec.
I looked at execution time using Profiler so this is not some the .NET
Framework bottleneck.
SQL server executed the same stored procedure ten times slower.
I use SQL 2000 SP2.
Another interesting thing that I've found:
When I changed the stored procedure to use table variables (@tablename)
instead of temporary (#) tables, Query Analyzer execution time becomes one
and ADO.NET execution time remains the same (30 sec).

Does anyone know how to solve the problem, because my application is written
in C#
and I don't want my SP to execute for 30 seconds when only 3 sec is really

Any help is appreciated,


3. TV gets slower and slower over time (10 min) then unwatchable

4. Java is 7.5 times slower... when it's not melting down the kernel.

5. Ubuntu Lurid crapwa "boots slower and slower over time...embarrassed to show prospective Linux converts"

6. str.Template 4 times slower than calling replace multiple times

7. How to change the "N's" in proc report across time

8. windows 7 FAST configuration for HMAC time-based one-time password authentication

9. Windows Time Service not working - time running too fast.

10. [ntp:questions] Windows Time Service not working - time running too fast.

11. Time runs exactly three times too fast

12. Fast (real-time) time stretch code

13. Oracle JDBC Driver ps.setTimestamp() slows down query execution many times

14. Is time.time() < time.time() always true?

15. fts- contains expression: slow the first time fast the other time