Home

GenAI Evaluation is a library of methods for evaluating the differences between real and synthetic data.

Functions

  • multivariate_ecdf: Computes the joint (multivariate) ECDF, in contrast to the univariate ECDF provided by packages like statsmodels
  • ks_statistic: Calculates the KS statistic between two multivariate ECDFs
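To make the two ideas above concrete, here is a minimal NumPy sketch of a joint ECDF evaluated at a set of query points, and of a KS-style distance taken as the largest absolute gap between two such ECDFs. It is an illustration only, not the package's implementation: the names `joint_ecdf` and `ks_distance`, the random query nodes, and the toy Gaussian data are all assumptions made for this example.

```python
import numpy as np

def joint_ecdf(data, points):
    """Hypothetical helper: fraction of rows in `data` that are <= each
    query point in every coordinate (the joint ECDF at that point)."""
    data = np.asarray(data, dtype=float)      # shape (n_rows, n_dims)
    points = np.asarray(points, dtype=float)  # shape (n_points, n_dims)
    # below[i, j] is True when row j of data is componentwise <= query point i
    below = (data[None, :, :] <= points[:, None, :]).all(axis=2)
    return below.mean(axis=1)

def ks_distance(ecdf_a, ecdf_b):
    """Hypothetical helper: largest absolute gap between two ECDFs
    evaluated at the same query points."""
    return float(np.max(np.abs(np.asarray(ecdf_a) - np.asarray(ecdf_b))))

# Toy data (assumed for illustration): two 2-D Gaussian samples, one shifted.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 2))
synth = rng.normal(loc=0.5, size=(500, 2))

# Evaluate both ECDFs at the same randomly chosen query nodes.
nodes = rng.uniform(-3, 3, size=(200, 2))
ks = ks_distance(joint_ecdf(real, nodes), joint_ecdf(synth, nodes))
```

Both ECDFs must be evaluated at the same query points for the distance to be meaningful. A value near 0 suggests the two samples have similar joint distributions at those points; values closer to 1 indicate a larger discrepancy.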

Read more in the API Reference & User Guide pages.

Authors

Installation

The package can be installed with pip:

pip install genai_evaluation

Tests

The tests can be run by cloning the repo and running:

pytest tests

If you have any issues running the tests, install the package locally and then run them again:

pip install -e .

Usage

Start by importing the functions:

from genai_evaluation import multivariate_ecdf, ks_statistic

Assuming we have two pandas dataframes (real and synthetic) that contain only numerical columns, we pass them to the multivariate_ecdf function, which returns the computed multivariate ECDFs of both.

query_str, ecdf_real, ecdf_synth = multivariate_ecdf(real_data, synthetic_data, n_nodes = 1000, verbose = True)

We then calculate the multivariate KS distance between the two ECDFs:

ks_stat = ks_statistic(ecdf_real, ecdf_synth)
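The usage above assumes both dataframes contain only numerical columns. If your data also has categorical or text columns, one possible way to prepare it is sketched below. The helper `shared_numeric_columns` and the toy dataframes are hypothetical and not part of genai_evaluation; the sketch relies only on pandas' `select_dtypes`.

```python
import pandas as pd

def shared_numeric_columns(real_df, synth_df):
    """Hypothetical helper: keep only the numeric columns present in both frames."""
    real_num = real_df.select_dtypes(include="number")
    synth_num = synth_df.select_dtypes(include="number")
    # Preserve the column order of the real data.
    common = [c for c in real_num.columns if c in synth_num.columns]
    return real_num[common], synth_num[common]

# Toy frames (assumed for illustration) with one non-numeric column.
real_data = pd.DataFrame(
    {"age": [23, 35, 41], "income": [40.0, 52.5, 61.0], "city": ["A", "B", "C"]}
)
synthetic_data = pd.DataFrame(
    {"age": [25, 33, 44], "income": [38.0, 55.0, 60.5], "city": ["B", "B", "A"]}
)

real_num, synth_num = shared_numeric_columns(real_data, synthetic_data)
```

The resulting `real_num` and `synth_num` frames share the same numeric columns in the same order and can then be passed to multivariate_ecdf in place of the original dataframes.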

Motivation

The motivation for this package comes from Dr. Vincent Granville's paper "Generative AI Technology Break-through: Spectacular Performance of New Synthesizer".

If you have any tips or suggestions, please contact us by email.