Home
GenAI Evaluation is a library of methods for evaluating the differences between real and synthetic data.
Functions
- multivariate_ecdf: Computes the joint (multivariate) ECDF, in contrast to the univariate ECDF provided by packages such as statsmodels
- ks_statistic: Calculates the KS Statistic for two multivariate ECDFs
Read more in the API Reference & User Guide pages.
Authors
- Dr. Vincent Granville - Research
- Rajiv Iyer - Development/Maintenance
Installation
The package can be installed with pip:
pip install genai_evaluation
Tests
The tests can be run by cloning the repo and running:
pytest tests
If the tests fail to run, install the package locally first and then run them again:
pip install -e .
Usage
Start by importing the functions:
from genai_evaluation import multivariate_ecdf, ks_statistic
Assuming we have two pandas dataframes (real and synthetic) that contain only numerical columns, we pass them to the multivariate_ecdf function, which returns the computed multivariate ECDFs of both:
query_str, ecdf_real, ecdf_synth = multivariate_ecdf(real_data, synthetic_data, n_nodes = 1000, verbose = True)
We then calculate the multivariate KS distance between the two ECDFs:
ks_stat = ks_statistic(ecdf_real, ecdf_synth)
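To make the two quantities above concrete, here is a minimal conceptual sketch in plain Python. It is not the package's implementation: the helper names joint_ecdf and ks_distance, the toy Gaussian data, and the choice of evaluation nodes (points sampled from the real data) are all assumptions made for illustration. The joint ECDF at a point is taken as the fraction of rows that are less than or equal to that point in every column, and the KS distance as the largest absolute gap between the two ECDFs over the nodes.

```python
import random

def joint_ecdf(rows, point):
    """Fraction of rows that are <= point in every coordinate."""
    hits = sum(all(x <= p for x, p in zip(row, point)) for row in rows)
    return hits / len(rows)

def ks_distance(real, synth, nodes):
    """Largest absolute gap between the two joint ECDFs over the nodes."""
    return max(abs(joint_ecdf(real, z) - joint_ecdf(synth, z)) for z in nodes)

# Toy two-column datasets (hypothetical, for illustration only).
rng = random.Random(0)
real = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
synth_good = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]  # same distribution
synth_bad = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(500)]   # shifted distribution

# Assumed node choice: evaluate both ECDFs at 100 points drawn from the real data.
nodes = rng.sample(real, 100)

d_good = ks_distance(real, synth_good, nodes)
d_bad = ks_distance(real, synth_bad, nodes)
print(f"KS distance, matching synthesizer: {d_good:.3f}")
print(f"KS distance, shifted synthesizer:  {d_bad:.3f}")
```

In this sketch the distance always lies between 0 and 1, and a synthetic dataset that follows the real distribution more closely yields a smaller value. How the package itself selects its nodes (the n_nodes argument) may differ from the sampling used here.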
Motivation
The motivation for this package comes from Dr. Vincent Granville's paper Generative AI Technology Break-through: Spectacular Performance of New Synthesizer.
If you have any tips or suggestions, please contact us by email.