R project for geometrical statistics
Volume approximation and sampling in high dimensions
You can read the proposal here
Volume computation of convex polytopes and sampling algorithms are very useful in many scientific fields and applications. The VolEsti is a C++ software package for volume approximation and scales to a few hundreds dimensions in contrast to currently available packages. Thus it could be an essential tool for a quite large number of scientific applications that need fast volume approximation or sampling in high dimensions. The goal of this project is to provide a friendly and efficient interface of VolEsti in a high level language as R. Second to propose additions that might be useful for some scientific applications, especially in economics and in biogeography and control, by extending VolEsti. Furthermore this new R package would provide necessary tools for some other future extensions that could be used in applications in biology and optimization problems. We hope this project will be a decisive contribution towards the first complete and efficient tool for geometrical statistics and thus, help educational programs, research or even serve as a building block towards an international, interdisciplinary community in geometrical statistics.
Working together with my mentors on the algorithmic part of the project and especially on Volume computation for Zonotopes. Here are some important papers for our work:
- Zonotopes and the LP-Newton method here
- Goffin’s Algorithm for Zonotopes here
- Methods for order reduction of Zonotopes here
Here you can see an application (not included in the proposal) in biology that needs uniform sampling from zonotopes.
Working on the main algorithm of the project: We study state-of-the-art volume computation algorithms, as Ben Cousin’s and Santosh Vempala’s cooling gaussian algorithm, and reschedule the deliverables in order to implement them as well. The main advantages are (a) better time efficiency, (b) less code and neccesary R and C++ packages and (c) more choices and flexibility for the user depending the problem.
The cooling gaussian algorithm takes some fundamental ideas from Lovasz’s-Vemapla’s LV algorithm. A practical version of a combination of these two algorithms is given here with very encouraging results. Comparing with VolEsti (or an improved version of VolEsti) is an important issue.
The final goal is an application with a variety of choices depending on the problem, the dimension, the sample distribution requested and the type of the convex body.
Setting up the project environment at github. Everything is ready for the coding period!
We made the following changes to the Deliverables - time schedule:
- After we complete the Part I of the project we are going to implement Cousin’s and Vempala’s algorithm for the general convex body case.
- We have to write new classes for the ellipsoid.
- We are not going to need any SOCP solver.
- We have to modify the RCHR function in order to get a new point in each step and sample from the gaussian distribution.
- We are going to need a new function for the ScheduleAnnealing.
Coding period has started. You can see our progress at github.
CGAL exclusion completed. We created a new Cartesian Kernel and a new Point class. We consider points and vectors, starting at the origin, to be the same. So we define the following operators: +, - , (double)*(point).
1) Boost exclusion completed.
2) We implemented a new class for generating random points on a hypersphere, needed for the Random Direction Hit and Run.
3) We have built a R interface using Rcpp. Now our C++ code is able to be excecuted through R.
In folder include we develop the pure C++ code of the project, using a C++ main function.
In folder test we compile the C++ code with g++ compiler in order to test the algorithm when we make changes.
In folder R-proj/src we develop the RCpp interface.
4) We have added new classes for the ellipsoid, for the intersection of a polytope with an ellipsoid and for the intersection of a ball with a polytope and an ellipsoid.
1) We have implemented rounding function, based on SVD decomposition given by Vempala and Cousin. We use RcppEigen library.
2) We have defined a member function in class StdHPolytope (not ready for use) in order to apply linear transformation to an H-polytope. The input is of type Eigen::MatrixXd and the trasformation reduces to a matrix multiplication. We might replace class member stdMatrix with Eigen::MatrixXd in future in order to have a more simple code.
3) We tested cpplex combined with C++ pilal (Pathetically Inadequate Linear Algebra Library). An interface using cpplex and pilal is given here. We develop this project to a header only library (both cpplex and pilal) and wrote a C++ interface that takes a H-polytope as an input and computes the chebychev ball. We reject this solver because it is not enough time efficient for our needs.
1) We consider the Chebychev ball problem as an optimization problem for which optimization algorithms, such as BfgsSolver, LbfgsbSolver or SLSQP_Solver can be used. We used CppOptimizationLibrary by building a C++ interface for the chebychev ball computation, but these methods need a starting point in the interior of the polytope in order to converge. We note these methods because we will try to use them for the V-polytope membership later. We might use the Dlib implementations using the R interface.
2) We are trying to call lpSolve library dynamically. An alternative is to use the linear solver from Google’s Operations Research tools.
3) For the SOCP problem we are going to use ECOSolver as we know that is quite efficient for our application.
first evaluation is open
Comparing to the Deliverables in the proposal:
1) We completed all the parts except of the linear solver replacement because we did not expect it to be so difficult to find an efficient, C++ header only, open source solver, but we discover new intersting techniques that might be used in the next weeks for the V-polytope volume computation.
2) We have completed all 5th week’s work, by replacing the Boost library and completing the Rcpp interface.
3) We have completed 6th week’s work except V-polytope class by implementing the classes for the non-linear bodies of the Part II of the project.
For the linear programming solver we worked on the following implementations:
1) cpplex. We made changes to be a header-only library. We tested and rejected because it is inefficient for our project. It computed the Chebychev center of a 100-dimensional hypercube in 5.2 sec. when CGAL solver took 0.12sec.
2) CppOptimizationLibrary and Dlib. An interior point is needed for this approach.
3) linear solver. This is our main alternative, but “linear_solver_test.cc” file is missing.
4) lpsolve. We are trying to implement a dynamic or a static call.
1) Use lpSolve library using static linking with external library liblpsolve55.so (completed).
2) lpSolve seems to be quite efficient for our application. We did run experiments but seems to be comparable with cgal’s LP solver.
1) We have completed Rcpp package “volesti”. CGAL and BOOST exclusion is completed and C++ and R interfaces are ready for use and experiments.
2) We are ready for the first pull request.
3) We have started implementing Cousins’ - Vempala’a algorithm.
1) Link lpSolve to Rcpp package using .so file.
2) Improve C++ and R code. Write comments, improve implementation of coordinate directions hit and run (V. Fisikopoulos’ older TODO).
3) Implement a script to run test if requested from the user.
4) Fix some bugs. Define the sequence of concentric balls more efficently. We do not use exact computations.
1) Implement Khachiyan algorithm for the Minimum Enclosing Ellipsoid for the rounding subroutine. We use Nikolic’s implementation. We do not use all the bnmin library. We add only the necessary source files to our project and make some adjustments in order to create a header only version of the algorithm.
2) We use BH package which is a Rcpp package for Boost library. We now use Boost for random generators. Function rand() rejected because it does not generates uniform integers from a range [a,b]. Boost is used in Nikolic’s implementation as well.
3) devtools::check() suggests not to include .so files to our package. So we made all the adjustments in our Rcpp package in order to compile lpSolve library from source files. We use lpSolveAPI files and makefile. The makefile builds an .a file and then makevars file links it to our c++ code.
4) Complete schedule_annealing for gaussian distribution in schedule_annealing branch in the gaussian_annealing.h header file. We follow the paper of Cousins-Vempala and the matlab code as well.
5) Complete gaussian sampling (both random and coordinate directions hit and run) and the main function for volume approximation. We have to compare it with matlab implementation.
1) Improve test mechanism and link repos to circleci account. Each time we push a commit to a repository all the tests are running.
2) Fix some bugs in CV algo. We have to use seed using chrono library for the random samplers. This change requires c++11. The new implementation of the CV algo runs 4-5 times faster than the matlab implementation.
3) Add ball walk for the gaussian sampling. Create Pull Request for the CV algo.
4) We add roxygen2 comments above the R functions and create Rd files for the documentation.
1) Parameterize the CV implementation. In addition to matlab implementation we parameterize the radius of the ball for the ball walk, the number of points we sample in each step of the schedule annealing and the parameter “ratio” for the schedule annealing as well.
2) We extend the test mechnism using circleci for the rounding, the chebychev center and the rotate function.
3) Make necessary adjustments to use number type NT as template everywhere in the implmentation.
4) Use iterators everywhere in the implementation.
1) Use constants and better name for some variables that declare tolerance and maximum number of iterations.
2) Create new branch v_polytopes. We implement a new class for V-polytope. We use the pre-GSoC work we have done for the proposal. We fix some bugs in the function that computes an enclosed ball: We pick d+1 random vertices until they define a full dimension simplex and then we compute the largest enclosing ball of the defined simplex.
3) Create a new function for ball walk with uniform target distribution.
4) Implement member functions in VPolytope class for Coordinate direction hit and run and Random Directions hit and run, using lpSolve for the linear progrmams.
5) Create new branch use_eigen_matrices in order to replace stdMatrix with Eigen matrix and vector.
1) Complete replacement with eigen matrix everywhere in the implementation. Merge branc into develop and schedule_annealing and v_polytopes.
2) Add comments and fix some bugs for the radius of ball walk in schedule annealing. Make additions in R package in order to run CV algorithm through R.
3) Create some more member functions to both H and V polytopes’ classes in order to make rounding and rotating able for V-polytopes: a) a function for the linear trnasformation, defined by a squared matrix, of the polytope and a function for the shifting of the polytope by a point. Moreover if rounding is requested for a V-polytope then we give the vertices of the polytope for the computation of the minimum volume enclosing ellipsoid, only if number_of_vertices<=20*dimension.
4) Use functions from previous step and make necessary adjustments in CV implementation in order to use it for the volume approximation of V-polytopes. In get_first_gaussian() function the distances between the internal point and the facets of the polytope is requested in order to obtain an upper bound of a_0. For V-polytopes we compute an upper bound for the number of facets and then for each facet we consider a lower bound for the distance from the internal point, which is the radius of the enclosed ball we have computed in a previous step of the algorithm.
5) We create 3 new header files in a new folder number_types. In each of them we define nummber type NT as float, double or long double. We have to include the right header file in main cpp file before volume.h in order to use the desirable number type. We have created two more volume tests for float and long double. Using double or float seem to need half time than long double.
13th week (last week)
1) We make additions to the R interface in order to use it for the volume approximation of V-polytopes.
2) We update the R documentation by adding new roxygen2 comments, a table of contents and a description of the package.
3) We add C++ sampling_only() function in sample_only.h header file. We use this function in R interface to provide the option of sampling from a convex H or V-polytope with uniform or spherical gaussian target distribution. The R function is sample_points().
4) We created rand_rotate() R function to provide the option of applying a random rotation to a convex H or V-polytope. This function return a R list that contains paraments “matrix” and “vector” for an H polytope or it returns just a numerical matrix that contains vertices row-wise for a V-polytope.
5) We created round_polytope() R function to provide the option of rounding a convex H or V-polytope. This function returns the same arguments as rand_rotate() function for the polytope description and also “round_value” and “minmaxRatio” (the ratio between minimum and maximum axe of the constructed ellipsoid) returning list’s elements.
6) We add tests for volume approximation for both algorithms in R. The test function is demoVolume(), which takes as an input “volesti” or “CV” string.
7) We add tests for sampling for both distributions. The test function is demoSampling(), which takes as an input “uniform” or “gaussian” string.
8) We add tests for rounding in R. The test function is demoRounding(), which takes no inputs.
9) We updated readme file in “improve_R_interface” branch.
10) We created a new branch “improve_rounding” in order to work on the stopping criterion of the rounding based on minimum volume enclosing ellipsoid and to implement a new rounding algorithm based on SVD decomposition.
11) We replace Eigen::MatrixXd with typedef Eigen::Matrix<> MT in order to set the number type of the matrix. Now we can use different number types through the whole implementation.
TODO List here