Cinip: big numbers, arbitrary precision, conveniently

Introduction

PHP has an extension called BCMath, which allows the use of numbers of arbitrary size, with a configurable number of digits after the decimal point. However, it can force the programmer to use expressions which are painful to write and difficult to read.

Imagine that you need to compute the following expression:

($a * $b + $c) * ($d * $e + $f) / ($g * $h + $i) * ($j * $k + $l)

You know that the numbers involved are large and that they may overflow PHP's numeric datatypes, so you have to use BCMath. But to express the above, you have to write this:

bcmul(
  bcdiv(
    bcmul(
      bcadd(bcmul($a, $b), $c),
      bcadd(bcmul($d, $e), $f)
    ),
    bcadd(bcmul($g, $h), $i)
  ),
  bcadd(bcmul($j, $k), $l)
)

Wouldn't it be nice if you could use a more readable, more PHP-like notation? Well, now you can, and this page describes how.

The necessary software is called Cinip, and its usage is very simple. (The name is an acronym for “Calculate Immense Numbers in PHP”, and I pronounce it as “SINN-ipp”.) Currently at version 1.1.0, it is effectively in the public domain, and available for download in tgz and zip formats. It requires PHP 5.3.0 or later.

The remainder of this document explains how to use the software; in what follows, a knowledge of BCMath is helpful, but not required.

A simple example

To solve the problem just described, we might use something like the following code. It performs the calculation given above and assigns its result to $x.

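$calc = cinip\parser::get_func(500);
/*
  We want calculations to
  use 500 decimal places;
  Cinip's default is 60.

  The function is stored in
  $calc rather than $f,
  because the expression
  itself uses a variable
  named $f, and eval runs
  in the current scope.
*/
 
$expression = '
  ($a * $b + $c) * ($d * $e + $f) / ($g * $h + $i) * ($j * $k + $l)
';
 
$x = eval($calc($expression));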

A longer example

The following is a complete program (which doesn't show Cinip to best advantage, since the arithmetic expressions are not very complicated). Note that the scale (the first argument to the static method get_func) is always zero, since we are dealing only with integers.

<?php
require_once "cinip.php";
/*
  All of Cinip is defined in
  the file cinip.php
*/
 
use kingfisher\cinip;
/*
  Everything defined in
  cinip.php is in the above
  namespace
*/
 
test_gcd();
 
/*
  test the function
  `gcd', defined below
*/
function test_gcd() {
  $fac100 = factorial(100);
  $fac200 = factorial(200);
 
  $f = cinip\parser::get_func(0);
 
  if(eval($f('gcd($fac100, $fac200) != $fac100'))) {
    print "Something's very wrong...\n";
  }
}
 
function factorial($n) {
  $f = cinip\parser::get_func(0);
 
  $fac = 1;
 
  for($i = 2; $i <= $n; $i++)
    $fac = eval($f('$fac * $i'));
 
  return $fac;
}
 
/*
  calculate the greatest
  common divisor of $a
  and $b, using Euclid's
  algorithm
*/
function gcd($a, $b) {
  $f = cinip\parser::get_func(0);
 
  while(eval($f('$b != 0'))) {
    list($a, $b) = array($b, eval($f('$a % $b')));
  }
 
  return $a;
}
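
For comparison, here is a sketch of how the same program might look when written directly against BCMath, without Cinip. The names bc_factorial and bc_gcd are invented for this illustration; only the bc* functions are part of PHP.

```php
<?php
// Hand-written BCMath version of the program above, for comparison.
// bc_factorial and bc_gcd are illustrative names, not part of Cinip.

function bc_factorial($n) {
  $fac = '1';
  for ($i = 2; $i <= $n; $i++) {
    // scale 0: we are dealing only with integers
    $fac = bcmul($fac, (string) $i, 0);
  }
  return $fac;
}

function bc_gcd($a, $b) {
  // Euclid's algorithm; bccomp returns 0 when its operands are equal
  while (bccomp($b, '0', 0) != 0) {
    list($a, $b) = array($b, bcmod($a, $b));
  }
  return $a;
}

$fac100 = bc_factorial(100);
$fac200 = bc_factorial(200);

if (bccomp(bc_gcd($fac100, $fac200), $fac100, 0) != 0) {
  print "Something's very wrong...\n";
}
```

Even in a program this small, every operator has become a function call, which is the clutter that Cinip is meant to remove.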

API

All of Cinip is contained in a single file (called cinip.php, but you can rename it if you like), which should be loaded with require_once or include_once. Everything in this file is defined in the namespace \kingfisher\cinip.

Among the things defined are two classes called parser and CinipException, and an interface called Exception. These three things constitute the API, and everything else should be considered private.

In a moment we will look at the methods of the class parser, but first we need to consider the static method get_func and its relationship to two other methods of parser.

get_func allows one to eliminate some clutter. Consider the following two pieces of code, which are essentially equivalent.

use kingfisher\cinip;

⋮

$parser = new cinip\parser(x₁, x₂, …, x_m);

⋮

$z = eval($parser->to_bc(y₁, y₂, …, y_n));

use kingfisher\cinip;

⋮

$f = cinip\parser::get_func(x₁, x₂, …, x_m);

⋮

$z = eval($f(y₁, y₂, …, y_n));

Clearly, “eval($f(…))” is more convenient to read and write than “eval($parser->to_bc(…))”.
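
One plausible way to build such a function is with a closure that forwards to to_bc. The sketch below shows only that pattern. StubParser is a stand-in written for this illustration (its to_bc merely wraps the expression in a return statement), and Cinip's actual implementation may differ.

```php
<?php
// StubParser is invented for this sketch; it is NOT Cinip's parser.
// Its to_bc() wraps the expression in a return statement instead of
// translating it into BCMath calls.
class StubParser {
  public function to_bc($expression) {
    return "return $expression;";
  }

  // Build a parser and hand back a closure that forwards to to_bc().
  public static function get_func() {
    $parser = new self();
    return function ($expression) use ($parser) {
      return $parser->to_bc($expression);
    };
  }
}

$parser = new StubParser();
$f = StubParser::get_func();

// Both forms produce the same code string, and so the same result.
$z1 = eval($parser->to_bc('1 + 2'));
$z2 = eval($f('1 + 2'));
```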

Methods of the class `parser`

to_bc($arithmetic_expression)

Parses $arithmetic_expression, transforms it into a new expression which uses BCMath functions, and returns this new expression. (The syntax rules for $arithmetic_expression are given below.)

Here is a small program which illustrates the use of to_bc.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$parser = new cinip\parser;
 
$x = 1;
$y = 2;
 
$expressions = array(
  '1 + 2',
  '$x + $y',
  'pow(PHP_INT_MAX, 20) / log(33, 10)',
  '1e200 / 3e400',
);
 
foreach ($expressions as $e) {
  $new_expr = $parser->to_bc($e);
  $result = eval($new_expr);
  print
    "Original expression: $e\n" .
    "Transformed expression: " .
      "$new_expr\n" .
    "Result: $result\n\n";
}

__construct( [ $scale = 60 [ , $perf_mode = parser::memoize ] ] )

$scale (which is analogous to PHP's bcmath.scale setting) is passed by Cinip to all BCMath functions which take a scale parameter (that is, all functions except bcmod). The scale can't be changed after an object has been created; if you need more precision, just create another parser object.

If you will be working only with integers, then setting $scale to 0 will make your calculations faster.
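
The effect of the scale can be seen with BCMath alone. This short sketch (no Cinip involved) divides the same two numbers at two different scales.

```php
<?php
// The third argument to these BCMath functions is the scale:
// the number of digits kept after the decimal point.
$int_quotient = bcdiv('10', '3', 0);   // integer arithmetic
$dec_quotient = bcdiv('10', '3', 5);   // five decimal places

print "$int_quotient\n";   // 3
print "$dec_quotient\n";   // 3.33333
```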

A feature of BCMath is worth highlighting: it appears that the scale is used, not just for the final result of a calculation, but also for the intermediate results. This can result in truncation, as shown by the following program.

<?php
$result = bcmul('.13', '.13', 3);
/*
  the correct answer,
  mathematically speaking,
  is 0.0169
*/
 
$result = ltrim($result, '0');
 
if($result === '.016') {
  print "Truncation occurred.\n";
} elseif($result === '.017') {
  print "Rounding occurred.\n";
} else {
  print "Something completely " .
    "unexpected occurred.\n";
}
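
The program above truncates within a single operation. The following sketch, again using BCMath directly, shows how truncating an intermediate result carries through to the final answer when calls are chained.

```php
<?php
// (1 / 3) * 3 computed step by step, as nested BCMath calls would do it.
$third = bcdiv('1', '3', 3);        // intermediate result, cut to 3 places
$product = bcmul($third, '3', 3);   // the lost digits cannot be recovered

print "$third\n";     // 0.333
print "$product\n";   // 0.999
```

If the final result must be accurate to a given number of places, it is therefore safer to choose a scale somewhat larger than that number.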

Note that bcmath.scale has no effect on Cinip.

$perf_mode can be one of two class constants, parser::memoize or parser::no_enhancement. The former tells the object to memoize the method to_bc (which is described above); the latter tells it not to.

Normally you should use parser::memoize, but if each of your expressions is evaluated only once, or you have a large number of distinct expressions and are worried about high memory usage, then it may be better to use parser::no_enhancement.
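
To make the trade-off concrete, here is a minimal sketch of memoizing a string-to-string translation. MemoSketch and its methods are hypothetical names; Cinip's internals may be organized quite differently.

```php
<?php
// MemoSketch is a hypothetical class written for this illustration.
class MemoSketch {
  private $cache = array();
  public $translations = 0;   // counts how often real work is done

  // Stand-in for an expensive parse-and-translate step.
  private function translate($expression) {
    $this->translations++;
    return "return $expression;";
  }

  public function to_code($expression) {
    if (!isset($this->cache[$expression])) {
      $this->cache[$expression] = $this->translate($expression);
    }
    return $this->cache[$expression];
  }
}

$m = new MemoSketch();
for ($i = 0; $i < 1000; $i++) {
  $m->to_code('1 + 2');   // translated once, then served from the cache
}
print $m->translations . "\n";   // 1
```

The cache gains one entry per distinct expression and never shrinks, which is why a very large number of distinct expressions can make memoization costly in memory.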

get_func( [ $scale = 60 [ , $perf_mode = parser::memoize ] ] )

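A static method. It takes the same arguments as the constructor and returns a function $f such that $f(…) is equivalent to calling to_bc(…) on a parser constructed with those arguments, as described above.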

get_scale()

Returns the scale being used.

get_perf_mode()

Returns the performance mode being used (one of parser::memoize or parser::no_enhancement).

Syntax rules for expressions

The syntax of expressions is a subset of that of PHP expressions. The following code gives a good overview of what may appear in a Cinip expression; it generates a valid expression and stores it in the variable $expression.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$expression = implode("+",
  array(
    # integers can be used...
    '123',
 
    # as can floating-point
    # numbers...
    '1.2',
 
    # numbers in scientific
    # notation...
    '1e2',
    '3.4e5',
    '-6.7e-8',
 
    /*
      string literals using
      double quotes (with
      some complications,
      described below)...
    */
    '"123"',
 
    /*
      string literals using
      single quotes...
    */
    "'123'",
 
    # variables...
    '$x',
 
    # any constant...
    'PHP_INT_MAX',
    'M_PI',
    'M_E',
 
    # any function.
    'log(100)',
    'rand(1, 10)',
  )
);

The following arithmetic operators are permitted:

+ - * / %

They are implemented using the corresponding BCMath functions. (Note that, just as in PHP, “+” and “-” can be used as binary or as unary operators.)

The following comparison operators are permitted:

< <= == >= > !=

They are implemented using bccomp, which means that they are accurate only as far as the scale permits; see the following code.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$f = cinip\parser::get_func(3);
 
if(eval($f('.111 == .1111'))) {
  print "They are the same.\n";
 
  /*
    This branch will be
    entered, because the
    scale is too low.
  */
}
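
The same behaviour can be reproduced with bccomp itself, which is a quick way to check whether a given scale is sufficient for a comparison.

```php
<?php
// bccomp returns -1, 0 or 1; its third argument is the scale.
$at_scale_3 = bccomp('.111', '.1111', 3);   // digits past the 3rd are ignored
$at_scale_4 = bccomp('.111', '.1111', 4);   // the 4th digit now counts

var_dump($at_scale_3);   // int(0)
var_dump($at_scale_4);   // int(-1)
```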

Nesting to arbitrary depth is, of course, permitted:

a(1, b($x, c(PHP_INT_MAX, d(log(42)))))

Any function can be used, but two functions are treated specially, namely sqrt and pow.

sqrt will be changed to bcsqrt. For pow, a run-time check will be made on the second argument; if it's an integer (that is, an integer in the mathematical sense; it need not have type integer), then bcpow will be used; otherwise pow will be used.
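
The run-time check for pow might look something like the following. This is only a sketch of the idea: pow_sketch is a hypothetical helper, and the test it uses for "is an integer" is an assumption, not Cinip's actual code.

```php
<?php
// pow_sketch is a hypothetical helper, not part of Cinip.
// It uses bcpow when the exponent is a whole number and falls back
// to PHP's floating-point pow otherwise.
function pow_sketch($base, $exponent, $scale) {
  $exponent = (string) $exponent;
  if (preg_match('/^-?\d+(\.0+)?$/', $exponent)) {
    // drop any ".0" tail so that bcpow receives a plain integer
    $whole = bcadd($exponent, '0', 0);
    return bcpow((string) $base, $whole, $scale);
  }
  return pow((float) $base, (float) $exponent);
}

$exact = pow_sketch('2', '10', 0);    // whole exponent: bcpow, a string
$approx = pow_sketch('2', '0.5', 0);  // fractional exponent: pow, a float
$root = bcsqrt('2', 5);               // what sqrt(2) becomes at scale 5
```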

It's your responsibility to ensure that a function's arguments are appropriate. For example, the expression log(pow(2, 2000)) won't work, because 2²⁰⁰⁰ is too large to be represented as a double-precision floating-point number.
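
The size limit is easy to see with plain PHP. The sketch below computes 2²⁰⁰⁰ once as a float and once with BCMath.

```php
<?php
// 2^2000 overflows a double, so floating-point pow yields INF...
$as_float = pow(2, 2000);
var_dump(is_infinite($as_float));   // bool(true)

// ...whereas BCMath keeps every digit of the integer.
$as_string = bcpow('2', '2000', 0);
print strlen($as_string) . " digits\n";   // 603 digits
```

A function such as log works on floats, so it cannot make use of the exact 603-digit value.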

Things that don't work

Each item below gives a description, followed by an example of erroneous code.

Using the `->` operator to reference properties or methods of an object.

$f = cinip\parser::get_func();
 
/*
  Each of these 2 lines
  contains an invalid
  expression
*/
$x = eval($f('$o->a'));
$x = eval($f('$o->f()'));

Referencing array elements.

$a = range(1, 9);
 
$f = cinip\parser::get_func();
 
$x = eval($f('$a[0]'));

Class constants (but you can use PHP's `constant` function).

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
class C {
  const N = 100;
}
 
$f = cinip\parser::get_func();
 
$x = eval($f('C::N'));

Certain kinds of interpolation into double-quoted strings.

$f = cinip\parser::get_func();
 
$a = array();
$a["b"]["c"] = 42;
 
$x = eval($f('100 + "{$a["b"]["c"]}"'));

A name qualified with a namespace.

$f = cinip\parser::get_func();
 
# both the following
# Cinip expressions
# are syntactically
# invalid
$y = eval($f('100 + xyz\\g()'));
$y = eval($f('100 + namespace\\g()'));

Adding these things should not be terribly difficult, but would probably be quite tedious, there being no good parser generator for PHP that I know of.

More on interpolation

Interpolation into double-quoted strings requires some explication. Cinip's idea of a double-quoted string is much simpler than PHP's, and can be explained using the following code.

if(preg_match('@"([^\\\\"]|\\\\.)*"@', $s)) {
  print "The variable \$s contains a " .
    "double-quoted string literal.\n";
}
 
/*
  A note on the above regex:
  a backslash is special in PHP
  single-quoted strings, and
  special in regular expressions,
  so we need a sequence of 4
  backslashes to get a literal
  backslash in the regular
  expression.
*/

Because of this difference, Cinip and PHP sometimes differ on where a double-quoted string ends. The result is that string interpolation will work if you use the simple syntax, but will not work for some uses of the complex syntax (these terms are from the PHP documentation).

The following program (expanded from a code snippet above) illustrates.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$f = cinip\parser::get_func();
 
$a = array();
$a["b"]["c"] = 42;
 
/*
  this works because
  interpolation occurs
  before Cinip sees the
  expression
*/
$x = eval($f("100 + {$a["b"]["c"]}"));
 
/*
  this works because we used
  single-quotes inside the
  double-quoted string
*/
$x = eval($f('100 + "{$a[\'b\'][\'c\']}"'));
 
/*
  as far as Cinip is
  concerned, the first
  double-quoted string here
  is 4 characters long.
  as a result, interpolation
  will not occur.  in fact,
  this Cinip expression is
  syntactically invalid.
*/
$x = eval($f('100 + "{$a["b"]["c"]}"'));
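
To see where Cinip's notion of a string ends in the last case, the regular expression given earlier can be applied to the failing expression. This sketch assumes that Cinip's tokenizer behaves as that pattern suggests.

```php
<?php
// The same pattern as above, applied to the failing expression.
// Single quotes are used so that PHP does not interpolate anything.
$s = '100 + "{$a["b"]["c"]}"';

preg_match('@"([^\\\\"]|\\\\.)*"@', $s, $m);

// $m[0] is the first thing the pattern treats as a string literal:
// it stops at the quote that opens "b", not at the final quote.
print $m[0] . "\n";
```

The match is the opening quote, the four characters {$a[ and the next quote, which is where the "4 characters long" in the comment above comes from.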

Interpolation is always done by PHP, not Cinip, with the time of interpolation being determined by the quoting used. The following program illustrates.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$parser = new cinip\parser(0);
 
$x = str_repeat('2', 60);
 
/*
  here, interpolation occurs
  before to_bc even begins
  to execute.
*/
$new_code = $parser->to_bc("111 + $x");
var_dump(eval($new_code));
 
/*
  here, interpolation occurs
  while eval is running
*/
$new_code = $parser->to_bc('111 + "$x"');
var_dump(eval($new_code));

A small convenience

Numeric literals can have an arbitrary number of digits before or after the decimal point; unlike in PHP, there will be no loss of precision (unless, of course, you've specified a scale which is too low for the number of decimal digits). See the following code for an example.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$f = cinip\parser::get_func();
 
$x = str_repeat('2', 100000);
$y = str_repeat('1', 100000);
 
if(eval($f("$x / 2 != $y"))) {
  print "Something's very wrong...\n";
}
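
The contrast with native PHP literals can be shown without Cinip. In this sketch, the same 20-digit integer is handled once as a PHP literal and once as a string passed to BCMath.

```php
<?php
// As a PHP literal, this integer exceeds PHP_INT_MAX,
// so PHP silently stores it as a float.
$native = 12345678901234567890 + 1;
var_dump(is_float($native));   // bool(true)

// As a string handed to BCMath, every digit is preserved.
$exact = bcadd('12345678901234567890', '1', 0);
print "$exact\n";   // 12345678901234567891
```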

A warning

Be careful of interpolating variables into double-quoted strings, because precision may be lost. See the following code.

<?php
require_once 'cinip.php';
 
use kingfisher\cinip;
 
$precision = 14;
 
ini_set('precision', $precision);
 
$s = str_repeat('1', $precision);
$x = floatval($s . '0');
$y = floatval($s . '1');
 
$f = cinip\parser::get_func();
 
# double quotes here
#
if(eval($f("$x != $y"))) {
  print "That's funny -- we should " .
    "have lost precision.\n";
}
 
# single quotes here
#
if(eval($f('$x == $y'))) {
  print "That's funny -- \$x and \$y " .
    "should be different.\n";
}
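
The loss happens when PHP converts the floats to strings, before Cinip sees anything. The following sketch isolates that step using only built-in functions.

```php
<?php
ini_set('precision', '14');

$x = floatval('111111111111110');
$y = floatval('111111111111111');

// The two floats are different numbers...
var_dump($x == $y);                      // bool(false)

// ...but at 14 significant digits their string forms coincide,
// and the string form is what interpolation produces.
var_dump((string) $x === (string) $y);   // bool(true)
```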

Exceptions

Cinip defines an interface called Exception. It is guaranteed to be implemented by all exception classes defined by Cinip, so to catch any exception that the software may throw, you can use code like the following.

<?php
require_once "cinip.php";
 
use kingfisher\cinip;
 
$f = cinip\parser::get_func();
 
$invalid_expression = '1 +';
 
try {
  eval($f($invalid_expression));
} catch(cinip\Exception $e) {
  /*
    Any exception thrown by
    Cinip will be handled
    here.
  */
}

Currently, Cinip defines only one exception class, called CinipException. It is thrown, for example, when an expression cannot be parsed.
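
The arrangement described here is a common pattern: an empty marker interface that every exception class of a library implements. A minimal sketch follows. SketchException and SketchParseError are invented names that mirror the description above; they are not Cinip's own definitions.

```php
<?php
// Invented names for illustration; not Cinip's classes.
interface SketchException {}

class SketchParseError extends Exception implements SketchException {}

try {
  throw new SketchParseError('unexpected end of expression');
} catch (SketchException $e) {
  // catching by interface works for every class that implements it
  $caught = get_class($e);
  print "Caught $caught: " . $e->getMessage() . "\n";
}
```

With this arrangement, a library can add new exception classes later without breaking callers that catch the interface.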

Development

Cinip is developed in a public GitHub repository.