# -*- coding: utf-8 -*-
"""identify.ipynb

Automatically generated by Colaboratory.

Original file is located at
    [REDACTED]

# Initialization
Run the code below before you start.
"""

#! pip install git+https://github.com/RamanujanMachine/LIReC.git
from LIReC.db.access import db
def print_all(results): # just for the sake of showcasing
    for r in results:
        print(r)
    if not results:
        print('nothing!')

"""# Numeric Identification
We have our own tool for high-precision numeric identification, `db.identify(...)`. This notebook showcases its functionality.

## Rational testing
"""

print_all(db.identify(['0.66666666666666'])) # Should be obvious...
print_all(db.identify(['0.36028659160696'])) # ...but can you guess this one?

"""The rational approximation obtained comes with a precision, which is the number in parentheses. Denoting this number as `prec`, it tells you that the error between your input and the given rational approximation is less than `10 ** -prec`. In other words, it is the `prec`-th digit after the decimal point where the rational approximation is estimated to deviate from your input.

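To make the meaning of `prec` concrete, here is a minimal sketch using only Python's standard library. It does not call `identify`, and the helper name `agreed_digits` is made up for illustration; the way `identify` estimates its own precision may differ.

```python
from fractions import Fraction

def agreed_digits(value: str, approx: Fraction) -> int:
    # Largest prec (capped at 100) such that |value - approx| < 10 ** -prec.
    error = abs(Fraction(value) - approx)
    prec = 0
    while prec < 100 and error < Fraction(1, 10 ** (prec + 1)):
        prec += 1
    return prec

value = '0.36028659160696'
# Closest fraction with a denominator of at most 1000 (the bound is an arbitrary choice).
approx = Fraction(value).limit_denominator(1000)
prec = agreed_digits(value, approx)
print(approx, prec)
```

The input is parsed from a `str` into an exact `Fraction`, so no rounding happens before the comparison.
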
## Do not use `float`s!
"""

print_all(db.identify(['0.36028659160696008'])) # This will still work...
print_all(db.identify([0.36028659160696008])) # ...but why is the precision lower here?
print_all(db.identify([0.3602865916069600818833162743091095])) # ...and why is it the same here??? these digits are correct!
print_all(db.identify([352/977])) # Hold on... this gives the same precision too

"""Naively typing decimal values in Python, or using the division operator on `int`s, gives you the `float` type. It  cannot handle  more than 15 digits of precision, and as such is unsuitable for high precision applications such as `identify`. This can even "pollute" high-precision calculations, so you should be careful of these "bad floats".

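The loss can be seen without `identify` at all. The sketch below, which uses only the standard `decimal` module, compares the `float` produced by `352 / 977` against the same quotient computed to 50 significant digits:

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # work with 50 significant digits

exact = Decimal(352) / Decimal(977)  # quotient rounded to 50 digits
as_float = Decimal(352 / 977)        # exact value of the nearest binary float

error = abs(as_float - exact)
print(exact)
print(as_float)
print(error)
```
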
There are many ways to deal with this, the easiest being to input numbers as `str`s (as we have so far), which involve no rounding. Similarly, if you're using [`mpmath`](https://mpmath.org/doc/current)'s `mpf` type, or Python's built-in [`decimal.Decimal`](https://docs.python.org/3/library/decimal.html), they will work in `identify` too (assuming they haven't been "polluted", otherwise no promises). Really, any arbitrary-precision numeric type that can faithfully stringify itself should work:
"""

import mpmath as mp
mp.mp.dps = 50 # reminder: anything higher than your number's precision will work here
print_all(db.identify([mp.mpf('0.3602865916069600818833162743091095')])) # This will work now!

from decimal import Decimal, getcontext
getcontext().prec = 50 # same as before
print_all(db.identify([Decimal('0.3602865916069600818833162743091095')])) # This will work too.

"""## Multiple values and invented names"""

print_all(db.identify(['3.35998345818749438788793', '0.60178868910874809225601'])) # Do you think these are related?
print_all(db.identify(['0.60178868910874809225601', '3.35998345818749438788793'])) # switcheroo

"""When connecting multiple values, it is no longer sensible to implicitly refer to any of them. However, explicitly adding them to the printed expression becomes unwieldy fast, so `identify` instead invents names for the constants it is given. In this case it is given 2 values, so it calls the first `c0` and the second `c1`. We will cover giving custom names to values later.

## Manual `isolate`
`identify` always isolates the first value as a function of the others by default. This can be controlled, however:
"""

v = ['3.35998345818749438788793', '0.60178868910874809225601']
print_all(db.identify(v, isolate=0)) # nothing changed
print_all(db.identify(v, isolate=1)) # i'm the captain now

print_all(db.identify(v, isolate=False)) # can be disabled too
print_all(db.identify(v, isolate=2)) # oops (same result)

print_all(db.identify(v, isolate='c0')) # also works
print_all(db.identify(v, isolate='c1')) # also works

from sympy import symbols # accepts sympy.Symbol as well
c0, c1 = symbols('c0, c1')
print_all(db.identify(v, isolate=c0))
print_all(db.identify(v, isolate=c1))

for r in db.identify(v):
    r.isolate = 1 # can be changed post-hoc as well, accepts the same stuff as identify's isolate argument
    print(r)

"""## Named constants
Some mathematical constants are important enough to have been given their own names. We provide a built-in way to test numerical relations with these constants:
"""

print_all(db.identify(['pi', '0.97246132412085677654851153011507784604'])) # first time might take a while...
print_all(db.identify(['Zeta2', '3.3116007335148931031390818333126918558']))

print(db.names) # for a full list of supported names
print(db.describe('alpha_M')) # if you'd like to know a little more about a specific constant, such as...

"""## Wide search
You don't always know which constants are related to your value. In that case you can turn on `wide_search`, either by setting it to `True`, which gradually increases the number of named constants tested, or by setting it to a list of integers, which limits the search to only that many constants at a time.

**Warning**: Be careful when setting `wide_search=True`. If your value is not related to any known constant, the search will not terminate until the entire power set of constants is exhausted, so only use it interactively and be ready to press Ctrl+C.
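
To get a feel for why the warning matters, here is a rough count of how many subsets of constants there are to try. The figure of 100 names is a made-up placeholder (check `db.names` for the real list), and whether `wide_search` walks the subsets exactly like this is an assumption.

```python
from math import comb

n_names = 100  # hypothetical number of named constants, NOT taken from LIReC

def subsets_to_test(n: int, sizes) -> int:
    # Number of ways to pick k constants out of n, summed over the allowed sizes k.
    return sum(comb(n, k) for k in sizes)

limited = subsets_to_test(n_names, [1, 2])                    # like wide_search=[1, 2]
everything = subsets_to_test(n_names, range(1, n_names + 1))  # every non-empty subset
print(limited)
print(everything)
```

Restricting the sizes keeps the count polynomial in the number of names, while the unrestricted count doubles with every name added.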
"""

print_all(db.identify(['0.97246132412085677654851153011507784604'], wide_search=[1, 2])) # maybe related to more? spoiler alert no
print_all(db.identify(['3.3116007335148931031390818333126918558'], wide_search=True)) # not recommended, but it's an option

"""## Return on Investment
That last warning raises an important question: what counts as "not related"? Informally, the wider the range of integers you allow in an integer relation (such as those that `identify` finds), the more values can "correctly" fill the place of a single missing constant in it, so a relation with large integers is weaker evidence of a real connection. We discuss this in greater detail in [this paper](https://arxiv.org/abs/2308.11829) (specifically the section *Identifying over-fitted formulas discovered by PSLQ*).

`identify` answers this question by defining the **Return on Investment** of a relation, which is the result of dividing the precision of the least precise constant in that relation by the total number of digits of the integers in the relation. This value, by default, is required to be at least `2` to be considered significant by `identify`, but it can be changed by setting `min_roi`. That is, increasing `min_roi` causes `identify` to be more aggressive with how it filters relations, and decreasing it makes `identify` more "loose".

**Warning**:
- Do not set `min_roi` to less than `2` when enabling the wide search, as this is the mechanism that allows the wide search to figure out which constants are best related to your values.
- Even without the wide search, do not set `min_roi` to less than `2` unless you are absolutely sure of the relation(s) you are supposed to get.
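
As a rough illustration of the definition above, the sketch below computes the ratio by hand. The function name and the example numbers are made up, and details such as how `identify` counts the digits of a zero coefficient are assumptions.

```python
def return_on_investment(precisions, coefficients):
    # precisions: known decimal digits of each constant in the relation
    # coefficients: the integers appearing in the relation
    # Assumption: every coefficient counts by its number of decimal digits, sign ignored.
    total_digits = sum(len(str(abs(c))) for c in coefficients)
    return min(precisions) / total_digits

# Hypothetical relation between two 24-digit constants with four one-digit integers.
strong = return_on_investment([24, 24], [3, -1, 2, 5])
# Hypothetical relation with fewer known digits and larger integers.
weak = return_on_investment([19, 19], [12, -345, 67, 8])
print(strong, strong >= 2)
print(weak, weak >= 2)
```

Both fewer known digits and larger integers push the ratio down, which is why trimming digits from the inputs can make a previously accepted relation disappear.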
"""

print_all(db.identify(['3.35998345818749438788793', '0.60178868910874809225601'])) # remember me?
print_all(db.identify(['3.359983458187494387', '0.601788689108748092'])) # well turns out you can't reduce the digits here much...
print_all(db.identify(['3.359983458187494387', '0.601788689108748092'], min_roi=1.5)) # ...unless you reduce min_roi

"""## Manual precision
If you aren't fully confident in the digits of the numbers you're feeding into `identify`, you can manually set `min_prec`:
"""

print_all(db.identify(['3.35998345818749438788793123456789', '0.60178868910874809225601987654321'])) # oops i accidentally garbage
print_all(db.identify(['3.35998345818749438788793123456789', '0.60178868910874809225601987654321'], min_prec=24)) # now this will work again!

"""## Subrelations (and disabling them)"""

print('first try')
print_all(db.identify(['0.72057318321392', '0.36028659160696'])) # What happens now?
print('second try')
print_all(db.identify(['0.72057318321392', '0.36028659160696'], strict=True)) # Don't separate!

print('third try')
print_all(db.identify(['0.66666666666666', '0.36028659160696'])) # Sometimes I ignore some of your values...
print('fourth try')
print_all(db.identify(['0.66666666666666', '0.36028659160696'], strict=True)) # ...and strict=True doesn't help

"""In all 4 examples we try to identify an array of 2 rational numbers. In the first example, `identify` successfully recovers both rational numbers individually (and this is the reason we do `print_all` and not just `print`, since `identify` can return multiple relations). However, if we want `identify` to connect our two constants even if there are "smaller" relations involved, we can set `strict=True` to stop this "smaller" search.

However, this is not all-powerful. In the last 2 examples, the rational numbers involve integers with significantly different orders of magnitude, so `identify` latches onto the "easier" of the two. In this case, `strict=True` cannot help.

## General polynomial relations
So far, the basic assumption within `identify` has been that the constants involved, if related, are related via a "Möbius relation". For 2 constants $\alpha,\beta$, this means that there exist integers $a,b,c,d$ ("sufficiently small", see [Return on Investment](#Return-on-Investment)) such that $\alpha=\frac{a\beta+b}{c\beta+d}$, or in other words $c\alpha\beta+d\alpha-a\beta-b=0$. With more constants, one constant is expressed as a ratio whose numerator and denominator are each an integer combination of the remaining constants (possibly plus a free integer), with no products between those constants and no powers of them.
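
The equivalence of the two forms can be checked with exact arithmetic. In this small sketch the integers and the value standing in for the second constant are arbitrary, hypothetical choices:

```python
from fractions import Fraction

a, b, c, d = 2, 1, 3, -4  # hypothetical small integers
beta = Fraction(5, 7)     # hypothetical stand-in for a constant
alpha = (a * beta + b) / (c * beta + d)  # the isolated form

# Multiplying out the denominator gives the polynomial form, which must vanish.
residual = c * alpha * beta + d * alpha - a * beta - b
print(alpha, residual)
```

The polynomial form is the one an integer relation search looks for; the isolated form is just a more readable way of printing it.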

This assumption can be changed via two parameters: `degree` and `order`.
- `degree` is exactly the (maximum expected) [degree](https://en.wikipedia.org/wiki/Degree_of_a_polynomial#Extension_to_polynomials_with_two_or_more_variables) of the polynomial involved (generalized to multivariate polynomials).
- `order` is the maximum expected single exponent that participates in the polynomial. This means that in each term of the polynomial, each variable's exponent is no greater than `order`.
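
One way to read these two definitions is as a filter on which exponent combinations may appear in the polynomial. The helper below is a literal reading of the bullets above for illustration only; its name is made up, and it is not how LIReC enumerates terms internally.

```python
from itertools import product

def allowed_exponents(n_constants, degree, order):
    # Each exponent is at most `order`, and the exponents of a term sum to at most `degree`.
    return [
        exps
        for exps in product(range(order + 1), repeat=n_constants)
        if sum(exps) <= degree
    ]

# Two constants: each tuple is (exponent of the first, exponent of the second).
mobius = allowed_exponents(2, degree=2, order=1)
affine = allowed_exponents(2, degree=1, order=1)
print(mobius)
print(affine)
```

For two constants, degree 2 with order 1 allows the product term but no squares, and degree 1 with order 1 drops the product term as well.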

The default degree and order are $2$ and $1$ respectively, which results in Möbius relations as mentioned. Another example is degree and order both $1$, which results in affine relations, such as:
"""

print_all(db.identify(['phi', '2.23606797749978969640917366873127632'], degree=1)) # Do you recognize this value?

"""## Explicit `PreciseConstant`s
Internally, `identify` uses the `LIReC.lib.pslq_utils.PreciseConstant` type, which attaches to each value its numerical precision and the symbol used when printing it. Importing and using `PreciseConstant` yourself lets you control both:
"""

from LIReC.lib.pslq_utils import PreciseConstant

# value + precision, value part accepts same stuff as the list of values in identify
# symbol will be automatically determined
v1 = PreciseConstant('3.35998345818749438788793', 24)

# value + precision + custom symbol, the symbol accepts same stuff as identify's isolate
v2 = PreciseConstant('0.60178868910874809225601', 24, 'hello')

for r in db.identify([v1, v2]):
    print(r)
    r.constants[0].symbol = 'world' # symbol can be changed post-hoc as well...
    print(r) # ...except now the isolate kinda fails because we did this post-hoc, so...
    r.isolate = 'hello' # ...this should fix it
    print(r)

"""## The end!
You reached the end of this notebook! If you read and understood everything, good job! (if not i don't blame you :P)

If you still have questions or want to give feedback, consider checking [the github](https://github.com/RamanujanMachine/LIReC) or [contacting us](https://www.ramanujanmachine.com/about-us).
"""
