Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Factorization cliff-edge? #29

Open
hvds opened this issue Apr 17, 2022 · 5 comments
Open

Factorization cliff-edge? #29

hvds opened this issue Apr 17, 2022 · 5 comments

Comments

@hvds
Copy link

hvds commented Apr 17, 2022

This may well be expected, but reporting it just in case - what's worst case time expected to be for factorizing something just under 100 digits?

I'm running my code [1] on a slightly harder case than before, involving slightly higher numbers (by a few digits). However it appears to have got stuck on one number for over 20h so far, when previous runs tended to stall for at most a few minutes.

I attached to the process to get the current stack trace, below. Assuming mpz_t is storing numbers in an obvious way, 64 bits per limb, I believe the number it is factorizing is: 363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669 (about 3.6e98).

(gdb) where
#0  next_prime_in_segment (p=5793758587, segment_bytes=24560, 
    segment_start=5793529590, 
    sieve=0x55d9ce5a4010 "\377\377\277\377\376_\347\362\376\375{T\177.\373\377\377\375\367\376\377\263\037\333\177\377\333\257\347{\365\311\376\373\377\377\363\337\357\367\371gow\377\337\366\377\377\377\225\277\377\377\333\372\357w\277\376\237j\377\377\334\363\377\377\357\277\373\377\357\376\327\337w}\177\363\357\377\357\377\377\376\337\377\327\375\272\377\376\377\177']\367~\377\367\355\337\177\317\177\365\377\377\377\376w\347\337\351\376}\365\377\377\371\377\337\336\343\376\377\275\377m\377\337\067\377\277\377?\377\376\373\357{\277\376\357\371\372\373\276\277\177\317\377\177\377\177\377\373\337\177\276?\375\343\377\377\377\223\356\337\257{g\345\337\177\377\373\375\377\366\377\374\277\347\376\376\376\377\367\267\301\377\335z\337\355\177\377{"...) at prime_iterator.c:40
#1  next_prime_in_segment (p=5793758587, segment_bytes=24560, 
    segment_start=5793529590, 
    sieve=0x55d9ce5a4010 "\377\377\277\377\376_\347\362\376\375{T\177.\373\377\377\375\367\376\377\263\037\333\177\377\333\257\347{\365\311\376\373\377\377\363\337\357\367\371gow\377\337\366\377\377\377\225\277\377\377\333\372\357w\277\376\237j\377\377\334\363\377\377\357\277\373\377\357\376\327\337w}\177\363\357\377\357\377\377\376\337\377\327\375\272\377\376\377\177']\367~\377\367\355\337\177\317\177\365\377\377\377\376w\347\337\351\376}\365\377\377\371\377\337\336\343\376\377\275\377m\377\337\067\377\277\377?\377\376\373\357{\277\376\357\371\372\373\276\277\177\317\377\177\377\177\377\373\337\177\276?\375\343\377\377\377\223\356\337\257{g\345\337\177\377\373\375\377\366\377\374\277\347\376\376\376\377\367\267\301\377\335z\337\355\177\377{"...) at prime_iterator.c:28
#2  prime_iterator_next (iter=iter@entry=0x7ffe5805e170)
    at prime_iterator.c:437
#3  0x00007fadbd940e9d in ec_stage2 (f=0x7ffe5805e2d0, z=0x7ffe5805e1c0, 
    x=0x7ffe5805e1b0, B2=8192000000, B1=81920000) at ecm.c:535
#4  _GMP_ecm_factor_projective (n=n@entry=0x7ffe5805e2e0, 
    f=f@entry=0x7ffe5805e2d0, B1=B1@entry=81920000, B2=8192000000, B2@entry=0, 
    ncurves=ncurves@entry=100) at ecm.c:667
#5  0x00007fadbd93b3a6 in factor (input_n=input_n@entry=0x7ffe5805efe0, 
    pfactors=pfactors@entry=0x7ffe5805ef70, 
    pexponents=pexponents@entry=0x7ffe5805ef78) at factor.c:303
#6  0x00007fadbd93d1a2 in is_semiprime (n=0x7ffe5805efe0) at factor.c:821
#7  0x00007fadbd96a2b7 in XS_Math__Prime__Util__GMP_liouville (
    cv=<optimized out>) at XS.xs:768
...
(gdb) up 6
(gdb) p n
$1 = (__mpz_struct *) 0x7ffe5805efe0
(gdb) p *n
$2 = {_mp_alloc = 7, _mp_size = 6, _mp_d = 0x55d9cd2976c0}
(gdb) p n->_mp_d[0]@6
$3 = {14963268348004307717, 17357550526725585807, 9582347205594536685, 
  15962625503134170351, 1376293721990814124, 170}
(gdb) p /x n->_mp_d[0]@6
$4 = {0xcfa83548a1a49305, 0xf0e2683d645f7b8f, 0x84fb560397eb5aed, 
  0xdd86a3524ce73cef, 0x131993d925dcd9ac, 0xaa}

[1] Using the injecting branch, I'm running ./oul -x32e104 -g1000000 92 6 with perl-5.34.0, MPU-0.73, MPUG-0.52, configured with:

./Configure -des -Dprefix=/opt/maths-5.34.0 -Dcc=gcc-9 -Doptimize="-g -O6" -Duse64bitall -Uversiononly
@hvds
Copy link
Author

hvds commented Apr 18, 2022

The machine crashed 3 hours later, so I don't get to discover if it would have completed. :(

I'll focus on other inputs that use smaller numbers for now. If there is no bug here - if 24+ hours is a realistic time to expect for these factorizations - I'll need to find a different approach to have a chance at finding those larger values.

@danaj
Copy link
Owner

danaj commented Jun 26, 2023

Unfortunately this module doesn't have state of the art factoring. The ECM is pretty good, but tuned more for smaller numbers than GMP-ECM. ECM really struggles past 60 or so digits unless there is a small factor. The Quadratic Sieve is great to have but certainly isn't close to state of the art (the code is over 15 years old now). An improved version could perhaps work on this size (I mean, either SIMPQS 2.0 or new code that might practically make it into this module -- yafu's QS definitely works at this size).

We only run our QS for numbers between 30 digits and 300 bits (90-ish digits). Running this number I get lots of small attempts then it goes into the loop doing obnoxious amounts of ECM. ECM for a semiprime this size has a miniscule chance of succeeding (this is quite a bit larger than the ECM record, so I would basically give it zero chance), but we're really out of options in this module and we still aren't sure if maybe there is a small-ish factor (20-30 digits) that will let us succeed.

$ tmmpu 'prime_set_config(verbose=>3); say for factor("363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669");'
# p-1 trying 363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669 (B1=20000 B2=200000)
# p-1: no factor
# ecm trying 363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669 (B1=200 B2=20000 ncurves=4)
# ecm: no factor
<.......>
starting large ECM on 363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669
# ecm trying 363277160032028232403370760029415771263823736047870707805583581797268868441456882284309326109905669 (B1=1280000 B2=128000000 ncurves=100)
# ecm: no factor
<......>

I'd recommend YAFU for these sorts of numbers. With a bit of infrastructure it can be called from Perl.

@danaj
Copy link
Owner

danaj commented Jun 26, 2023

I tested with Pari/GP 2.14.0. It successfully factored after 5 hours 48 minutes.

@danaj
Copy link
Owner

danaj commented Jun 26, 2023

SIMPQS 2.0 factored successfully in 9.5 hours.

@hvds
Copy link
Author

hvds commented Apr 4, 2024

FWIW I tried this today with the code from #48, after applying the patch below as a naive extrapolation of appropriate configuration parameters. This was on a loaded machine (load average 6):

...
56545 full, 62305 combined, 796640 partial
11469876 curves, 796640 partials.
118850 relations found in total!
reduced to 103983 x 104047 in 4 passes
lanczos halted after 1646 iterations
63 nullspace vectors found.
  32177148682644876313598264901319792229056727
  11289911471489891284233137775351271575289506866245788547
25785.92user 0.44system 7:09:45elapsed 100%CPU (0avgtext+0avgdata 597528maxresident)k
0inputs+0outputs (0major+183517minor)pagefaults 0swaps

.. which I think would probably beat your time for PARI had the machine been unloaded. So (depending on the difference between our computers) I believe a less naive extension of the config values would likely support extending the range of sizes that factor() targets for QS.

--- a/simpqs.c
+++ b/simpqs.c
@@ -2696,14 +2696,15 @@ int _GMP_simpqs(mpz_t n, mpz_t *farray) {
      *   largeprime >= 1000 * 500000 (not 1/10 of that)
      *   errorbits >= 33
      */
-    numPrimes = 64000;
+    int excess = decdigits - 91;
+    numPrimes = 80000 + 4000 * excess;
     Mdiv2 = 192000 / SIEVEDIV;
-    largeprime = numPrimes * 10 * decdigits;
+    largeprime = 1000 * (500000 + 50000 * excess);
     secondprime = SECONDPRIME;
     midprime = MIDPRIME;
-    firstprime = 30;
-    errorbits = decdigits / 4 + 2;
-    threshold = 43 + (7 * decdigits) / 10;
+    firstprime = 30 + excess / 5;
+    errorbits = 34 + excess / 5;
+    threshold = 102 + excess;
   }
 
   if (verbose > 2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants