Arbitra

Testing Methodology

Arbitra's testing methodology combines real identity documents and sophisticated falsifications with statistically valid sample sizes to evaluate algorithm performance. This approach enables assessments that accurately reflect real-world fraud conditions and document behaviors.

Core Methodology Foundation

Arbitra's evaluation framework centers on testing with real identity documents and sophisticated falsifications to evaluate image categorization algorithm performance using confusion matrix analysis. This real-world approach focuses on:

  • True Positive Rate (TPR): accurately detecting authentic identity documents in real-world conditions
  • True Negative Rate (TNR): effectively rejecting sophisticated falsifications that mirror actual fraud patterns

By using real documents and high-quality falsifications, our estimates achieve statistically valid confidence levels while accurately reflecting genuine fraud behaviors and document characteristics found in operational environments.
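
For concreteness, here is a minimal sketch of how these two rates are read off a binary confusion matrix, treating authentic documents as the positive class. The counts are illustrative placeholders, not Arbitra evaluation data.

```python
def tpr_tnr(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """TPR and TNR for a document-authenticity classifier where
    'authentic' is the positive class and 'falsified' the negative."""
    tpr = tp / (tp + fn)   # authentic documents correctly accepted
    tnr = tn / (tn + fp)   # falsifications correctly rejected
    return tpr, tnr

# Illustrative counts: 990 authentic documents accepted, 10 wrongly
# rejected; 485 falsifications rejected, 15 wrongly accepted.
tpr, tnr = tpr_tnr(tp=990, fn=10, tn=485, fp=15)
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}")   # TPR = 0.990, TNR = 0.970
```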

Statistical Framework for Real-World Testing

Our statistical framework ensures that testing with real identity documents and sophisticated falsifications produces reliable, actionable results that reflect actual algorithm performance in operational settings.

  • Confidence level: 95% (α = 0.05)
  • Target precision: ±1% (W = 0.02)
  • Expected accuracy range: 95-99.5%

Sample Size Calculation for Real Document Testing

Single Observation Analysis

When testing with real identity documents and sophisticated falsifications, where each individual contributes only one document sample, we estimate the minimum required number of samples using the following statistical formula:

n ≈ (4 × z² × p × (1 − p)) / W²

Where:

  • n = minimum number of real document samples
  • z = 1.96, the standard normal quantile for 95% confidence
  • p = expected accuracy on real documents
  • W = full width of the target confidence interval (0.02 for ±1% precision; see the derivation below)
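
For the reasoning behind the factor of 4: the standard proportion-estimate formula is written in terms of the margin of error E, the half-width of the interval; substituting E = W / 2 gives the version above.

n = z² × p × (1 − p) / E²        with E = W / 2
  = z² × p × (1 − p) / (W / 2)²
  = (4 × z² × p × (1 − p)) / W²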

Real Document Sample Size Requirements:

  • p = 0.995: ~191 real documents needed
  • p = 0.99: ~380 real documents needed
  • p = 0.95: ~1,825 real documents needed
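
As a minimal sketch, the single-observation formula can be evaluated directly; the loop below reproduces the three figures in the table (z = 1.96 is hard-coded for the 95% confidence level):

```python
def sample_size(p: float, w: float = 0.02, z: float = 1.96) -> float:
    """Single-observation sample size: n ≈ 4·z²·p·(1 − p) / W².

    p: expected accuracy on real documents
    w: full width of the target confidence interval (0.02 for ±1%)
    z: standard normal quantile (1.96 for 95% confidence)
    """
    return 4 * z**2 * p * (1 - p) / w**2

for p in (0.995, 0.99, 0.95):
    print(f"p = {p}: ~{sample_size(p):,.0f} real documents needed")
# p = 0.995: ~191 real documents needed
# p = 0.99: ~380 real documents needed
# p = 0.95: ~1,825 real documents needed
```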

Key Insight: Sample size requirements are highly sensitive to expected algorithm accuracy. Lowering the expected accuracy from 99.5% to 95% increases the required sample nearly tenfold, from ~191 to ~1,825 real documents.

Real-World Data Structure Considerations

When testing with real identity documents and sophisticated falsifications, each user typically provides multiple document images (different angles, devices, lighting conditions, etc.). This real-world clustered data structure requires statistical correction through a design effect:

Design Effect (DE) for Real Document Testing:

DE = 1 + (m − 1)ρ

Adjusted Sample Size for Real Documents:

n_total = n_effective × DE

Where:

  • n_effective = single-observation sample size from the formula above
  • n_total = total number of clustered document samples required
  • m = number of real document images per user (typically 5)
  • ρ = intraclass correlation coefficient between one user's document images (typically 0.5 to 0.95 for real documents)

Real Document Testing Examples (m = 5, with the p = 0.99 baseline of n_effective ≈ 380):

  • ρ = 0.5 (DE = 3.0): ~1,140 samples (~228 users)
  • ρ = 0.7 (DE = 3.8): ~1,445 samples (~289 users)
  • ρ = 0.95 (DE = 4.8): ~1,825 samples (~365 users)
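
A companion sketch applies the design effect to the p = 0.99 baseline. The ~1,445 and ~1,825 figures above carry the unrounded baseline of ≈380.3, so this integer-baseline version lands within one sample of them.

```python
def adjusted_sample_size(n_effective: float, m: int, rho: float) -> tuple[int, int]:
    """Return (total samples, users) after correcting for clustered data.

    n_effective: single-observation sample size from the formula above
    m:           real document images contributed per user
    rho:         intraclass correlation between one user's images
    """
    design_effect = 1 + (m - 1) * rho          # DE = 1 + (m − 1)ρ
    n_total = round(n_effective * design_effect)
    users = -(-n_total // m)                   # ceiling division: users needed
    return n_total, users

for rho in (0.5, 0.7, 0.95):
    n_total, users = adjusted_sample_size(n_effective=380, m=5, rho=rho)
    print(f"rho = {rho}: ~{n_total:,} samples (~{users} users)")
# rho = 0.5: ~1,140 samples (~228 users)
# rho = 0.7: ~1,444 samples (~289 users)
# rho = 0.95: ~1,824 samples (~365 users)
```
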
Real Documents vs. Synthetic Data Approaches

Arbitra's Real Document Approach:

Our methodology is grounded in testing with real identity documents and sophisticated falsifications because this approach provides the most accurate reflection of algorithm performance in operational environments where genuine fraud patterns and document behaviors are present.

Advantages of Real Document Testing

  • Captures authentic document characteristics and security features
  • Sophisticated falsifications mirror actual fraud techniques and patterns
  • Reflects genuine user behavior and document presentation variations
  • Provides reliable performance indicators for real-world deployment

Limitations of Synthetic-Only Approaches

  • Even synthetic datasets built to replicate real-world variability often fall short due to methodological constraints
  • Limited access to rich training data results in distributions that may diverge significantly from real-world conditions
  • May not capture sophisticated fraud techniques used in actual attacks
  • Can produce misleading performance conclusions if not validated against real data

Hybrid Testing Recommendations

When broader coverage is needed or rare fraud scenarios must be simulated, synthetic datasets can serve as a complementary benchmark, but only when validated against results from real identity documents and sophisticated falsifications. This hybrid approach still depends on a foundation of real document testing to be effective and meaningful.
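
As one possible shape for that validation step (the function, metrics, and tolerance below are illustrative assumptions, not an Arbitra API), a synthetic benchmark might be accepted as a complement only when its headline metrics track the real-document baseline:

```python
def synthetic_benchmark_valid(real: dict[str, float],
                              synthetic: dict[str, float],
                              tolerance: float = 0.02) -> bool:
    """True if every metric shared by both runs (e.g. TPR, TNR) measured
    on the synthetic set is within `tolerance` of the real-document value."""
    return all(abs(real[k] - synthetic[k]) <= tolerance
               for k in real.keys() & synthetic.keys())

real_metrics = {"tpr": 0.990, "tnr": 0.970}       # from real-document testing
synthetic_metrics = {"tpr": 0.992, "tnr": 0.955}  # from a synthetic benchmark

if synthetic_benchmark_valid(real_metrics, synthetic_metrics):
    print("Synthetic benchmark tracks real-document results; usable as a complement.")
else:
    print("Synthetic benchmark diverges; recalibrate before relying on it.")
```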

Arbitra's use of real documents and sophisticated falsifications helps close the gaps inherent in synthetic-only approaches, enabling evaluations that accurately reflect real-world fraud scenarios and document behaviors.

Core Principle: Relying on synthetic data alone can produce misleading conclusions. Arbitra chooses to work with real and high-quality falsified documents precisely to overcome these limitations and provide more meaningful, actionable evaluation results.