AI & Machine Learning · machine-learning · fintech · fraud-detection

How to reduce fraud by 73% with Machine Learning: a case study

Practical case study of implementing a financial fraud detection system with ML for a LATAM fintech. Architecture, features, model, and production results.

Numoru Engineering · Published on April 5, 2026 · 10 min read

The problem: rising fraud in digital payments

  • -73% fraud rate reduction in the 90 days post-deploy
  • -74% chargebacks, driving the ROI story
  • +$2.1M annualised net impact (less loss + more conversion)
  • $60-180k typical rollout ticket (fintech, 3-4 months)

A Latin American fintech was processing over 2 million transactions per month. Fraud accounted for 4.2% of total volume — more than double the industry benchmark (1.5–2%). The existing manual rules blocked legitimate transactions (18% false positives) while still letting sophisticated fraud patterns through.

The cost was not only financial: every fraudulent transaction generated chargebacks, eroded user trust and added regulatory risk.

The solution: real-time detection with ML

We designed and deployed a 3-layer fraud detection system:

Layer 1: Feature engineering

80% of an ML model's value lives in its features. We built 147 features in 4 categories:

  • Transactional: amount, frequency, velocity between transactions, deviation from typical behavior
  • Device: device fingerprint, geolocation, IP changes, User-Agent
  • Behavioral: hour of day, day of week, navigation patterns before the transaction
  • Network: relationships across accounts (same device, same IP, shared beneficiaries)

The network features were the most discriminative. A common fraud pattern involves multiple "clean" accounts that converge on a single beneficiary.
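The converging-beneficiary pattern can be captured with a simple fan-in counter. A minimal sketch, assuming an ordered stream of (sender, beneficiary) pairs; the real system derives these features from the full account graph:

```python
from collections import defaultdict

def beneficiary_fan_in(transactions):
    """For each transaction, count how many distinct sender accounts
    have paid the same beneficiary so far -- a proxy for the
    'many clean accounts converging on one beneficiary' pattern.

    `transactions` is an ordered list of (sender, beneficiary) pairs.
    """
    senders_seen = defaultdict(set)  # beneficiary -> distinct senders
    features = []
    for sender, beneficiary in transactions:
        senders_seen[beneficiary].add(sender)
        features.append(len(senders_seen[beneficiary]))
    return features

txs = [("a1", "mule"), ("a2", "mule"), ("a3", "shop"), ("a4", "mule")]
print(beneficiary_fan_in(txs))  # [1, 2, 1, 3]
```

A fan-in of 3+ distinct senders within a short window is exactly the kind of signal an individual fraudster cannot suppress by faking device or behavioral features.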

Layer 2: Classification model

We benchmarked 4 algorithms and picked XGBoost for its balance of performance, interpretability and inference speed:

Model                 AUC-ROC   Precision   Recall   Latency (p99)
Logistic Regression   0.89      0.82        0.71     2 ms
Random Forest         0.93      0.88        0.79     15 ms
XGBoost               0.96      0.92        0.85     8 ms
Neural Network        0.95      0.90        0.86     45 ms

The neural net had similar recall but 5× the latency — unacceptable for real-time decisions.
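The precision and recall figures above are always tied to an operating threshold. A minimal pure-Python sketch (with made-up scores and labels) of how those two numbers fall out of a score cutoff:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of a score-based classifier at a given
    decision threshold (flag as fraud when score >= threshold)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.95, 0.80, 0.60, 0.30, 0.10]
labels = [1, 1, 0, 1, 0]
print(precision_recall(scores, labels, 0.5))  # (0.666..., 0.666...)
```

Sweeping the threshold trades one metric against the other, which is why the decision layer below tunes it per segment instead of fixing it globally.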

Layer 3: Decision system

Not everything is decided by ML. The system combines:

  1. Deterministic rules: instant block for known patterns (confirmed stolen cards, IPs on a blocklist)
  2. Model score: fraud probability (0–1)
  3. Dynamic threshold: tuned per merchant risk segment and transaction amount
  4. Manual review: gray-zone transactions (score 0.4–0.7) routed to an analyst team
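Glued together, the four layers reduce to a small routing function. A sketch with illustrative thresholds and field names, not the client's production values:

```python
def decide(tx, score, blocklist_ips, threshold_by_segment):
    """Combine deterministic rules, the model score, and a per-segment
    dynamic threshold; route the gray zone (0.4-0.7) to manual review."""
    # Layer 1: deterministic rules win outright
    if tx["ip"] in blocklist_ips or tx.get("card_confirmed_stolen"):
        return "reject"
    # Layer 4: gray-zone scores go to an analyst
    if 0.4 <= score <= 0.7:
        return "review"
    # Layers 2+3: compare the score against the segment's threshold
    threshold = threshold_by_segment.get(tx["segment"], 0.7)
    return "reject" if score >= threshold else "approve"

thresholds = {"low_risk": 0.85, "high_risk": 0.75}
tx = {"ip": "203.0.113.9", "segment": "high_risk"}
print(decide(tx, 0.9, {"198.51.100.1"}, thresholds))   # reject
print(decide(tx, 0.2, {"198.51.100.1"}, thresholds))   # approve
print(decide(tx, 0.55, {"198.51.100.1"}, thresholds))  # review
```

Keeping the rules ahead of the model matters: a confirmed-stolen card should never wait on an inference call.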

Production architecture

End-to-end latency under 100 ms:

Transaction → API Gateway → Feature Store (Redis)
                                    ↓
                            Feature Pipeline (Flink)
                                    ↓
                            XGBoost model (served with BentoML)
                                    ↓
                            Decision Engine → Approve / Reject / Review

Key components:

  • Feature Store on Redis: pre-computed features with 24h TTL for time-window features
  • Apache Flink: real-time feature streaming (1h, 24h, 7d windows)
  • BentoML: model serving with automatic batching and health checks
  • PostgreSQL: decision log for audit and retraining
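The windowed features that Flink maintains and Redis serves can be illustrated with a pure-Python rolling counter; the window size and feature names here are stand-ins for the production key scheme:

```python
from collections import deque

class WindowCounter:
    """Rolling count and sum of an account's transactions inside a
    time window -- the kind of feature Flink precomputes and the
    Redis feature store serves with a TTL. Pure-Python stand-in."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, amount)

    def add(self, ts, amount):
        self.events.append((ts, amount))

    def features(self, now):
        # evict events that have fallen out of the window
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return {"count": len(self.events),
                "sum": sum(a for _, a in self.events)}

c = WindowCounter(3600)  # 1h window
c.add(0, 50.0)
c.add(1800, 120.0)
c.add(4000, 30.0)
print(c.features(now=4100))  # only t=1800 and t=4000 remain
```

In production the same idea runs once per window size (1h, 24h, 7d), and the 24h TTL on the Redis keys guarantees stale features expire on their own.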

Production results

After 4 months in production:

Metric                Before             After     Change
Fraud rate            4.2%               1.1%      -73%
False positives       18%                3.2%      -82%
Decision time         2-5 min (manual)   85 ms     ~ -99.9%
Monthly chargebacks   ~8,400             ~2,200    -74%

ROI was reached in month two. The reduction in chargebacks plus the lift in conversion (fewer legitimate transactions blocked) generated a net positive impact of $2.1M USD annualized.

Lessons learned

What worked

  • Network features were the biggest differentiator. Account-relationship patterns are harder to fake than individual features.
  • Per-segment dynamic thresholds avoided a one-size-fits-all model that would have been too aggressive for low-risk merchants.
  • Drift monitoring paid off: automatic alerts fire whenever feature distributions shift meaningfully, so retraining happens before precision degrades.
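A common way to implement that drift alert is the Population Stability Index (PSI). A self-contained sketch; the 0.2 alert cutoff is an industry convention, not necessarily what the client uses:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a training-time feature
    sample and a live sample; > 0.2 is a common 'investigate' signal."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[-1] = float("inf")  # catch values at/above the max

    def frac(sample):
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        return [max(c / len(sample), 1e-6) for c in counts]  # avoid log(0)

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]
print(psi(baseline, baseline) < 0.01)    # identical samples -> ~0
print(psi(baseline, [0.9] * 100) > 0.2)  # shifted sample -> alert
```

Run per feature on a schedule; a handful of features crossing the cutoff at once is usually the first visible sign of an adversary adapting.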

What didn't work at first

  • Aggressive oversampling (SMOTE) in the initial training produced an over-sensitive model. We switched to focal loss with XGBoost to handle class imbalance.
  • Geolocation features had low discrimination in LatAm because of widespread VPN use and dynamic-IP mobile networks.
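For reference, focal loss (Lin et al.) re-weights binary cross-entropy so easy majority-class examples stop dominating training. A sketch of the loss and its analytic gradient with respect to the raw margin, the quantity an XGBoost custom objective returns as its first derivative; the gamma and alpha values are illustrative, not the client's settings:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def focal_loss(z, y, gamma=2.0, alpha=0.25):
    """Binary focal loss on a raw margin z: gamma down-weights easy
    examples, alpha re-balances the classes."""
    p = sigmoid(z)
    if y == 1:
        return -alpha * (1 - p) ** gamma * math.log(p)
    return -(1 - alpha) * p ** gamma * math.log(1 - p)

def focal_grad(z, y, gamma=2.0, alpha=0.25):
    """Analytic d(loss)/dz -- the 'grad' of an XGBoost custom
    objective (the 'hess' is the second derivative, obtainable the
    same way or by finite differences)."""
    p = sigmoid(z)
    if y == 1:
        return alpha * ((1 - p) ** gamma) * (gamma * p * math.log(p) - (1 - p))
    return (1 - alpha) * (p ** gamma) * (p - gamma * (1 - p) * math.log(1 - p))

# sanity-check the gradient against a central finite difference
z, y, eps = 0.3, 1, 1e-6
numeric = (focal_loss(z + eps, y) - focal_loss(z - eps, y)) / (2 * eps)
print(abs(focal_grad(z, y) - numeric) < 1e-5)  # True
```

Unlike SMOTE, this changes how the model weighs examples instead of inventing synthetic minority points, so the score distribution stays calibrated to real traffic.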

What we would do differently

  • Implement explainability from day 1 (SHAP values per transaction). Compliance needed it to justify rejections to regulators and it arrived 6 weeks late.
  • Invest more in synthetic data to train against emerging fraud patterns before they show up in production.
Fraud metrics — baseline vs ML stack (90-day measurement)

Fintech case described above. Same volume, same fraud-actor population; the ML stack replaces rules-only detection.

[Chart: fraud rate %, false positives %, detection recall %, and chargeback count (k), comparing the pre-ML baseline against post-ML at 90 days. Anonymized client metrics.]

Business & commercial impact

How Numoru productizes this

Fraud-detection rollouts are one of the clearest ROI stories in a fintech — loss + chargeback reduction shows up on the P&L in the first quarter. Numoru sells it as a 3-4 month engagement with a guaranteed metric threshold, plus an MLOps retainer that keeps the model fresh against adversarial drift.

Fraud-detection ticket by buyer (Numoru 2026, USD)

Buyer                             Scope                                        One-time ticket      Retainer / terms
Fintech (wallets / remittances)   Real-time scoring of card & transfer flows   $80,000 – 180,000    + $6k / mo MLOps
E-commerce (marketplace)          Seller risk + refund abuse                   $45,000 – 110,000    + $3,500 / mo
Insurance                         Claims fraud scoring + NLP evidence          $95,000 – 220,000    + $7k / mo
Banks (digital channel)           Session-level anomaly + device profiling     $150,000 – 380,000   RFP + 24 mo engagement
Telco                             SIM-swap + subscription fraud                $55,000 – 140,000    + $4k / mo
Public case study · Payments · Global · 2024

Stripe Radar — fraud detection benchmarks

Challenge: publish the impact of ML fraud detection on real merchant cohorts.
Solution: Stripe shares aggregated Radar outcomes on its blog, including ROC curves and merchant-wide fraud reduction.
Results:
  • Fraud reduction, first 90 days: -40 to -80% across the merchant cohort
  • False-positive rate: <1% at Radar's default threshold
  • Latency: <100 ms at checkout
Public case study · Security research · Global · 2024

IBM — Cost of a Data Breach (fintech segment)

Challenge: report the annualised cost of fraud + breach for financial-services firms.
Solution: IBM surveys 600+ breached companies per year, cutting the data by industry.
Results:
  • Avg breach cost (financial services): $5.9M, up YoY
  • Cost of identity theft (US): $43B / yr industry total
  • Share preventable by ML: 30-50% (expert estimate)

Fintech deploying Numoru fraud stack (12 months)

Payback: < 1

Assumptions:
  Monthly transaction volume     2M
  Avg ticket                     $38
  Baseline fraud rate            4.2%
  Post-rollout fraud rate        1.1%
  False-positive lost revenue    $420k / mo
  Post-rollout FP lost revenue   $140k / mo
  Engagement cost                $140,000 one-time
  MLOps retainer                 $6,000 / mo

Year-1 contribution:
  Engagement (one-time)                            −$140,000
  MLOps (12 mo × $6k)                              −$72,000
  Fraud loss avoided (3.1% × 2M × $38 × 12)        +$28,272,000
  FP-driven conversion recovered (12 mo × $280k)   +$3,360,000
  Net year-1 contribution                          +$31,420,000
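The year-1 figure follows directly from the stated assumptions; as an arithmetic sanity check:

```python
# Reproduce the year-1 contribution from the stated assumptions
# (all figures come from the table above; nothing new is assumed).
volume, ticket = 2_000_000, 38           # monthly transactions, avg ticket ($)
fraud_drop = 0.042 - 0.011               # fraud rate: 4.2% -> 1.1%
fraud_avoided = fraud_drop * volume * ticket * 12
fp_recovered = (420_000 - 140_000) * 12  # FP lost revenue, before vs after
costs = 140_000 + 6_000 * 12             # engagement + 12 mo MLOps retainer
net = fraud_avoided + fp_recovered - costs
print(f"${net:,.0f}")  # $31,420,000
```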
Diagnosis: $6,500 one-time
2-week model assessment.
  • Baseline metrics audit
  • Feature-coverage review
  • Cost-of-fraud analysis
  • Roadmap + quick wins

Full rollout: $80,000 – 220,000 one-time
3-4 months to production.
  • Feature pipeline + store
  • XGBoost + GNN models
  • Decision engine + case mgmt
  • Explainability (SHAP)
  • Compliance artifacts

MLOps retainer: $3,500 – 10,000 / mo
Keep it sharp vs adversarial drift.
  • Weekly retraining
  • Drift monitoring
  • Adversarial red-team
  • Regulatory reporting

Conclusion

ML fraud detection isn't magic — it's feature engineering, careful model selection, and a decision system that combines automation with human supervision.

The most valuable component wasn't the model itself but the real-time feature infrastructure that lets us react in milliseconds to patterns no human team could catch at scale.

Want results like these for your company?

Start a conversation