Question 1

What is synthetic data?

Accepted Answer

Synthetic data is artificially generated information that mimics the statistical properties of real-world data without containing any actual personal records. It is created through mathematical models and algorithms rather than collected from real individuals.

Question 2

What does "born-synthetic" mean?

Accepted Answer

Born-synthetic means data that was generated from scratch using mathematical distributions and cultural models — never derived from, trained on, or linked to real individuals. Unlike anonymized data, born-synthetic data has zero lineage to any real person.

Question 3

How is born-synthetic data different from anonymized data?

Accepted Answer

Anonymized data starts with real records and removes identifiers. Born-synthetic data is generated from zero — no real data enters the pipeline. Anonymized data carries re-identification risk; born-synthetic data cannot be re-identified because no real person exists behind any record.

Question 4

What is the difference between synthetic data and fake data?

Accepted Answer

Fake data is randomly generated without statistical coherence, so it can produce records such as a 25-year-old retiree with $50M in assets. Synthetic data preserves realistic correlations between fields: age, wealth, occupation, geography, and asset allocation follow mathematically validated distributions.

Question 5

What is the difference between synthetic data and test data?

Accepted Answer

Test data is any data used in non-production environments. Synthetic data is one method of generating test data. The critical difference: most test data is copied or scrambled from production databases, which carries GDPR and PCI DSS risk. Synthetic data eliminates that risk entirely.

Question 6

Is synthetic data accurate enough for AI training?

Accepted Answer

Born-synthetic data preserves the statistical distributions that matter for AI model training — Pareto wealth distributions, realistic correlation matrices, and cultural patterns. What it does not preserve is individual-level information, which is precisely what regulations prohibit.

Question 7

Who uses synthetic data?

Accepted Answer

Banks, neobanks, payment processors, insurance companies, RegTech firms, and AI teams use synthetic data for compliance testing, model training, fraud detection development, and stress testing. Any organization handling financial PII that needs to reduce regulatory risk is a candidate.

Question 8

What does Sovereign Forger sell?

Accepted Answer

Sovereign Forger sells pre-built datasets of synthetic financial profiles. Two product lines: UHNWI profiles (19 fields, wealth-focused) and KYC/AML Enhanced profiles (29 fields, compliance-focused). Six geographic niches, three volume tiers each.

Question 9

What are the six geographic niches?

Accepted Answer

Silicon Valley (Founders & VC), Old Money Europe (Dynasties & Private Banking), Middle East (Sovereign Families & Merchant Houses), LatAm Barons (Agribusiness & Infrastructure), Pacific Rim (Semiconductor & Shipping Dynasties), and Swiss-Singapore (Offshore Wealth & Multi-Family Offices).

Question 10

How many records are available?

Accepted Answer

Each niche contains 100,000 pre-generated records, for a total of 600,000 UHNWI profiles and 600,000 KYC/AML profiles. Combined: over 1.2 million synthetic financial profiles ready for immediate download.

Question 11

What fields are in UHNWI profiles?

Accepted Answer

19 interlocked fields: full name, age, nationality, country of residence, city, net worth, primary wealth source, occupation, industry, investment style, risk tolerance, portfolio composition (6 asset classes), philanthropy status, political exposure, and family office flag.

Question 12

What fields are in KYC/AML Enhanced profiles?

Accepted Answer

29 fields covering the UHNWI base plus: document type, document number, document country, issue date, expiry date, PEP status, PEP category, sanctions flag, source of funds, source of wealth, and risk score. Designed for CDD and EDD workflows.

Question 13

What does "Math First, AI Second" mean?

Accepted Answer

Every profile starts with a mathematical foundation — statistically validated distributions and constraints that ensure financial realism. Only after the numbers are locked does an AI layer add cultural depth: names, narratives, and contextual details. Math ensures accuracy. AI adds realism. The two phases are strictly sequential and independently auditable.

Question 14

What is FORGE Mode?

Accepted Answer

FORGE Mode is a zero-AI pipeline configuration that generates profiles using only mathematical rules — no language model involved at any stage. Every field is fully deterministic and auditable. Designed for organizations that require complete transparency with no AI involvement in data generation.
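
One common way to make rule-only generation reproducible and auditable is to derive every field from a seeded pseudo-random generator. The sketch below illustrates that general idea only; it is not Sovereign Forger's pipeline, and the function name `generate_profile`, the field names, and all value ranges are assumptions made for this example.

```python
import random


def generate_profile(seed: int) -> dict:
    """Build one illustrative record from a seed using only arithmetic rules."""
    rng = random.Random(seed)  # private generator, no global state involved
    age = rng.randint(25, 90)  # hypothetical age range
    # Hypothetical rule: Pareto-distributed net worth above a 30M USD floor.
    net_worth_usd = round(30_000_000 * rng.paretovariate(1.5))
    # Hypothetical rule: investment history cannot start before age 18.
    years_investing = rng.randint(0, age - 18)
    return {
        "seed": seed,
        "age": age,
        "net_worth_usd": net_worth_usd,
        "years_investing": years_investing,
    }


if __name__ == "__main__":
    first = generate_profile(42)
    second = generate_profile(42)
    print(first == second)  # the same seed reproduces the same record
```

Because each record depends only on its seed and the rules, a reviewer could regenerate any record and compare it field by field.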

Question 15

What LLM do you use?

Accepted Answer

A locally hosted large language model running on dedicated hardware. No record ever touches the internet. No API calls to external providers. The model runs fully offline, ensuring complete data isolation.

Question 16

What is the Certificate of Sovereign Origin?

Accepted Answer

A document included with every dataset that certifies the data was generated from zero using the Sovereign Forger pipeline. It documents the generation methodology, version number, statistical parameters, and confirms zero lineage to real individuals. Useful for audit trails and regulatory documentation.

Question 17

How many archetypes exist?

Accepted Answer

Dozens of distinct wealth archetypes spanning the six geographic niches. Each archetype defines a coherent persona with realistic wealth sources, asset allocations, geographic patterns, and cultural markers — ensuring every profile reflects a plausible individual, not a random combination of fields.

Question 18

Who is behind Sovereign Forger?

Accepted Answer

Sovereign Forger is a product of Signal Flow LLC, registered in New Mexico, USA. The company specializes in synthetic data engineering for financial compliance, with a focus on privacy-by-construction methodologies.

Question 19

Do you have case studies or client references?

Accepted Answer

We treat our clients' data strategies as confidential — because they are. The organizations that use synthetic UHNWI data for AI training and compliance modeling consider their data sources a competitive advantage. We respect that. No case studies. No logo walls. No "as seen in" banners. The best way to evaluate quality is the free 100-record sample — download it, run it through your pipeline, and judge the output yourself.

Question 20

How much do UHNWI datasets cost?

Accepted Answer

Three tiers: Essential (1,000 records) at $499, Warehouse (10,000 records) at $2,499, and Enterprise (100,000 records) at $12,500. Each tier is available for any of the six geographic niches.

Question 21

How much do KYC/AML datasets cost?

Accepted Answer

Three tiers: Compliance Starter (1,000 records) at $999, Compliance Pro (10,000 records) at $4,999, and Enterprise (100,000 records) at $24,999. Each tier includes all 29 fields plus the Certificate of Sovereign Origin.

Question 22

Why is KYC/AML data more expensive than UHNWI?

Accepted Answer

KYC/AML profiles contain 29 fields versus 19 for UHNWI, including sensitive compliance fields: document validation data, PEP status, sanctions flags, source of funds, and risk scores. The additional fields require more complex generation logic and validation.

Question 23

Can I try before buying?

Accepted Answer

Yes. Download 100 free UHNWI records from any niche — no registration, no credit card, no email required. KYC/AML free samples (100 records) require an email address. Both include the Certificate of Sovereign Origin. Start with the free GDPR Risk Assessment to understand your exposure, then validate data quality with the sample.

Question 24

What payment methods do you accept?

Accepted Answer

All major credit cards via Stripe. Enterprise clients can request invoice-based payment for orders above $10,000.

Question 25

Do you offer bulk discounts?

Accepted Answer

Enterprise tier pricing already reflects volume discounts. For orders exceeding 100,000 records or multi-niche packages, contact us for custom pricing.
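
The volume discount can be read off the list prices quoted in the pricing answers. As a rough illustration, the sketch below computes the price per record for each tier. Only the prices and record counts come from this FAQ; the `TIERS` layout and the `price_per_record` helper are assumptions made for the example.

```python
# List prices (USD) and record counts as quoted in the pricing answers.
# The data structure and names are illustrative, not an official price API.
TIERS = {
    "UHNWI": {
        "Essential": (1_000, 499),
        "Warehouse": (10_000, 2_499),
        "Enterprise": (100_000, 12_500),
    },
    "KYC/AML": {
        "Compliance Starter": (1_000, 999),
        "Compliance Pro": (10_000, 4_999),
        "Enterprise": (100_000, 24_999),
    },
}


def price_per_record(line: str, tier: str) -> float:
    """Return the list price in USD divided by the number of records."""
    records, price_usd = TIERS[line][tier]
    return price_usd / records


if __name__ == "__main__":
    for line, tiers in TIERS.items():
        for tier in tiers:
            print(f"{line:8s} {tier:20s} ${price_per_record(line, tier):.4f} per record")
```

In both product lines the per-record price falls at each step up in tier.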

Question 26

Is there a subscription model?

Accepted Answer

Currently, all datasets are one-time purchases. A subscription model for monthly updated datasets is on the roadmap for organizations requiring fresh data for ongoing testing cycles.

Question 27

What is your refund policy?

Accepted Answer

Because digital data products cannot be "returned," we offer the free sample specifically so you can validate quality before purchasing. If a dataset has a demonstrable technical defect, we will replace it at no cost.

Question 28

Can I use the data commercially?

Accepted Answer

Yes. All purchased datasets include a commercial license for internal use: testing, AI training, compliance validation, and software development. Redistribution or resale of the raw data is not permitted.

Question 29

What about enterprise or multi-team licensing?

Accepted Answer

Enterprise clients can purchase organization-wide licenses that allow usage across multiple teams and departments. Volume pricing, custom configurations, and dedicated support are available for orders above $10,000. All enterprise conversations are confidential. Contact us through the website form to discuss your requirements.

Question 30

Can I request custom fields or custom niches?

Accepted Answer

Yes. We build custom configurations regularly — additional fields, new geographic niches, specific distribution profiles. The details of what we customize and for whom remain between us and the client. Contact us through the website form.

Question 31

What is the GDPR Risk Assessment tool?

Accepted Answer

A free interactive tool that scores your organization's GDPR compliance risk across 10 dimensions. You receive a risk score, estimated fine exposure, and a downloadable PDF report. No registration required to use it; email required only for the PDF.

Question 32

Does synthetic data fall under GDPR?

Accepted Answer

Born-synthetic data does not constitute personal data under GDPR because it is not derived from identified or identifiable natural persons. Recital 26 of GDPR states that the principles of data protection do not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person.

Question 33

What is GDPR Article 25 and how does it apply?

Accepted Answer

Article 25 requires "data protection by design and by default." Born-synthetic data satisfies this requirement by construction: there is no personal data to protect because none was used as input. This is the strongest possible implementation of privacy by design.

Question 34

What is the re-identification risk of born-synthetic data?

Accepted Answer

Zero. Born-synthetic data cannot be re-identified because no real person exists behind any record. There is no "original" to match against. By contrast, a 2019 study in Nature Communications estimated that 99.98% of Americans could be correctly re-identified in any dataset using 15 demographic attributes. Born-synthetic data eliminates this risk entirely.

Question 35

What does EU AI Act Article 10 require for training data?

Accepted Answer

Article 10 requires that training, validation, and testing data sets for high-risk AI systems be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete. Born-synthetic data provides complete documentation of origin, full control over statistical properties, and zero privacy risk, which supports Article 10's data governance requirements. Enforcement begins August 2026.

Question 36

How does DORA relate to synthetic data?

Accepted Answer

DORA Articles 24 to 27 require financial entities to run a digital operational resilience testing programme, including threat-led penetration testing (TLPT) under Article 26. Synthetic data enables comprehensive stress testing scenarios (market crashes, mass defaults, liquidity crises) without exposing real customer data. The ECB has explicitly endorsed synthetic data for stress testing.

Question 37

How does synthetic data help with PCI DSS 4.0?

Accepted Answer

PCI DSS 4.0 Requirement 6.5.5 prohibits the use of live PANs in pre-production environments, except where those environments are included in the cardholder data environment and protected accordingly. Synthetic payment data provides realistic transaction patterns and document numbers that pass format validation without corresponding to any real account, which removes real PANs from test environments entirely.
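
For card numbers, format validation typically includes the Luhn checksum. The sketch below shows that check, assuming the Luhn test is the kind of validation meant here (the FAQ does not say which checks apply). The function name `luhn_valid` is chosen for this example, and 4111 1111 1111 1111 is a widely published test number, not a dataset value.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(luhn_valid("4111 1111 1111 1112"))
```

A number that passes this check is only well-formed; passing says nothing about whether an account exists behind it.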

Question 38

Which regulations does synthetic data help with?

Accepted Answer

GDPR (data protection by design), EU AI Act (training data governance), DORA (resilience testing), PCI DSS 4.0 (no real PANs in testing), CCPA (California), LGPD (Brazil), PDPA (Singapore), and any regulation that restricts use of personal data in non-production environments.

Question 39

What is "compliant by construction"?

Accepted Answer

It means compliance is built into the data generation process rather than applied after the fact. Born-synthetic data does not need to be anonymized, masked, or scrubbed because it was never personal data. Compliance is not a feature — it is the architecture.

Question 40

How do I document synthetic data usage for auditors?

Accepted Answer

The Certificate of Sovereign Origin provides generation methodology documentation. Combined with internal records of how the data was used (testing scenarios, model training logs), this creates a complete audit trail from data origin to application.

Question 41

Can I transfer synthetic data across borders?

Accepted Answer

Yes. GDPR cross-border transfer restrictions (Chapter V) apply to personal data. Born-synthetic data is not personal data, so no Standard Contractual Clauses, no adequacy decisions, and no Binding Corporate Rules are needed.

Question 42

What happens if synthetic data is breached?

Accepted Answer

Nothing. A breach of synthetic data exposes no personal information, triggers no notification requirements under GDPR Article 33, and creates no liability. This is one of the fundamental advantages of born-synthetic data in test environments.

Question 43

What is the GDPR fine exposure for using real data in testing?

Accepted Answer

Maximum GDPR fines are 4% of global annual revenue or €20 million, whichever is higher. EU AI Act fines reach €35 million or 7% of revenue. PCI DSS non-compliance can cost $5,000–$100,000 per month plus loss of card processing ability. Born-synthetic data eliminates all of these risk categories.
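
The "whichever is higher" rule is easy to work through numerically. The sketch below computes the upper bounds from the figures quoted above. The function names are made up for this example, and applying "whichever is higher" to the EU AI Act figure is an assumption drawn from the Act's wording for companies, since the answer above does not spell it out.

```python
def gdpr_max_fine_eur(annual_revenue_eur: int) -> float:
    """Upper bound: 4% of global annual revenue or EUR 20 million, whichever is higher."""
    return max(annual_revenue_eur * 4 / 100, 20_000_000.0)


def ai_act_max_fine_eur(annual_revenue_eur: int) -> float:
    """Upper bound: 7% of revenue or EUR 35 million (assumed: whichever is higher)."""
    return max(annual_revenue_eur * 7 / 100, 35_000_000.0)


if __name__ == "__main__":
    for revenue in (100_000_000, 1_000_000_000):
        print(revenue, gdpr_max_fine_eur(revenue), ai_act_max_fine_eur(revenue))
```

For a company with EUR 100 million in revenue the fixed amounts dominate; at EUR 1 billion the percentage caps take over (EUR 40 million and EUR 70 million respectively). These are statutory maximums, not predictions of an actual fine.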

Question 44

What SLA or quality guarantee do you offer?

Accepted Answer

Every dataset passes the DIAMOND Standard audit — our proprietary zero-tolerance quality framework. If a dataset contains a demonstrable defect, we replace it at no cost. The free sample lets you validate quality in your environment before any purchase.

Question 45

What statistical distribution do you use for wealth?

Accepted Answer

We use empirically validated distributions that correctly model extreme wealth concentration — the kind of right-tail behavior where UHNWI wealth actually lives. Most synthetic data tools default to bell curves that produce unrealistic, evenly spread results. Our approach matches real-world wealth patterns.
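
To see why the shape of the distribution matters, the sketch below compares a Pareto sample (the family named elsewhere in this FAQ) with a bell curve by measuring how much of the total the richest 1% of records hold. It uses only Python's standard library. The 30M USD floor, the shape parameter `alpha=1.5`, and the normal-curve parameters are assumptions chosen for illustration, not Sovereign Forger's actual parameters.

```python
import random


def sample_pareto_wealth(n, floor_usd=30_000_000, alpha=1.5, seed=0):
    """Draw n net-worth values from a Pareto distribution above floor_usd."""
    rng = random.Random(seed)
    return [floor_usd * rng.paretovariate(alpha) for _ in range(n)]


def sample_normal_wealth(n, mean_usd=90_000_000, stdev_usd=15_000_000, seed=0):
    """Draw n values from a bell curve, for comparison only."""
    rng = random.Random(seed)
    return [rng.gauss(mean_usd, stdev_usd) for _ in range(n)]


def top_share(values, fraction=0.01):
    """Fraction of the total held by the richest `fraction` of records."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    pareto = sample_pareto_wealth(10_000)
    normal = sample_normal_wealth(10_000)
    print(f"top 1% share, Pareto: {top_share(pareto):.1%}")
    print(f"top 1% share, normal: {top_share(normal):.1%}")
```

In the bell-curve sample the top 1% hold only slightly more than 1% of the total, while the Pareto sample concentrates a much larger share in the right tail.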

Question 46

How do you ensure field correlations are realistic?

Accepted Answer

Every profile is validated against a set of mathematical constraints that enforce realistic relationships between fields. A 28-year-old cannot have 40 years of investment history. A $500M net worth cannot be 90% allocated to savings accounts. Age, wealth, occupation, industry, geography, and asset allocation must be internally consistent — and they are, across every single record.
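
The two examples above can be written as explicit rules. The following is a minimal sketch of such a validator, not the actual constraint set: the field names (`years_investing`, `savings_share`), the adult age of 18, and the 100M USD and 50% thresholds are all assumptions made for illustration.

```python
def find_violations(profile: dict) -> list[str]:
    """Return a description of each consistency rule the profile breaks."""
    violations = []
    # Rule 1 (assumed form): investment history cannot exceed adult life.
    if profile["years_investing"] > profile["age"] - 18:
        violations.append("investment history is longer than adult life")
    # Rule 2 (assumed thresholds): very large fortunes are not held mostly in savings.
    if profile["net_worth_usd"] >= 100_000_000 and profile["savings_share"] > 0.5:
        violations.append("savings allocation is too high for this net worth")
    return violations


if __name__ == "__main__":
    incoherent = {
        "age": 28,
        "years_investing": 40,
        "net_worth_usd": 500_000_000,
        "savings_share": 0.9,
    }
    print(find_violations(incoherent))
```

A record that passes returns an empty list, so the same function can serve as a gate in a generation loop or as a spot check on a downloaded sample.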

Question 47

What is the DIAMOND Standard audit?

Accepted Answer

DIAMOND is our internal quality standard — a multi-dimensional validation framework that every record must pass before it reaches a customer. The current production run passed with zero errors across 666,000 records. The details of what DIAMOND checks are proprietary, but the result is simple: every field in every record is statistically valid and internally consistent.

Question 48

What file format are the datasets?

Accepted Answer

All datasets are delivered as CSV files with UTF-8 encoding. Enterprise clients can request JSON, Parquet, or custom formats.

Question 49

How fast is delivery?

Accepted Answer

Immediate. All datasets are pre-generated and available for instant download after purchase. No generation queue, no waiting period.

Question 50

Can I integrate Sovereign Forger data into my existing pipeline?

Accepted Answer

Yes. CSV files can be imported into any data pipeline, ETL tool, or database. The data structure is documented with full schema definitions, making it compatible with tools like Apache Spark, Pandas, dbt, and any SQL database.
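
As a small illustration of the import path, the sketch below loads CSV text into an in-memory SQLite table using only Python's standard library. The column names and the two placeholder rows are invented for this example and are not taken from the published schema; in practice you would read the downloaded file and follow the documented schema, and `pandas.read_csv` would work just as well.

```python
import csv
import io
import sqlite3

# Placeholder content standing in for a downloaded file (hypothetical columns).
SAMPLE_CSV = """full_name,age,nationality,net_worth_usd
Example Person A,54,CH,120000000
Example Person B,61,SG,85000000
"""


def load_into_sqlite(csv_text: str) -> sqlite3.Connection:
    """Parse CSV text and insert the rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profiles ("
        "full_name TEXT, age INTEGER, nationality TEXT, net_worth_usd INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["full_name"], int(r["age"]), r["nationality"], int(r["net_worth_usd"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO profiles VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = load_into_sqlite(SAMPLE_CSV)
    print(connection.execute("SELECT COUNT(*) FROM profiles").fetchone()[0])
```

To load a real file, open it with `encoding="utf-8"` and pass its contents to the function, or adapt the reader to stream rows for larger tiers.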

Question 51

Is the data compatible with Temenos, Fenergo, or ComplyAdvantage?

Accepted Answer

The CSV format with standardized field names (nationality as ISO 3166, document types as standard codes, risk scores as numeric values) is compatible with any system that accepts structured data imports. Field mapping to platform-specific schemas is straightforward — the 100-record free sample lets you test integration before purchasing.
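
Field mapping usually amounts to a rename table. The sketch below shows the idea with invented names on both sides: neither the source column names nor the target field names come from Sovereign Forger's schema or from any vendor's documentation.

```python
# Hypothetical mapping from dataset column names to a target platform's
# field names. Both sides are placeholders for illustration.
FIELD_MAP = {
    "full_name": "customerName",
    "nationality": "nationalityIso",
    "risk_score": "riskRating",
}


def map_record(record: dict, field_map: dict = FIELD_MAP) -> dict:
    """Rename mapped fields; raise KeyError if an expected field is absent."""
    missing = [src for src in field_map if src not in record]
    if missing:
        raise KeyError(f"record is missing expected fields: {missing}")
    # Fields without an entry in field_map are dropped from the output.
    return {dst: record[src] for src, dst in field_map.items()}


if __name__ == "__main__":
    row = {"full_name": "Example Person", "nationality": "CH", "risk_score": "42"}
    print(map_record(row))
```

Failing loudly on a missing field is a deliberate choice here, since a silent gap in a compliance import is harder to notice later.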

Question 52

How do you handle cultural naming conventions?

Accepted Answer

Each geographic niche uses culturally appropriate naming patterns. A Middle Eastern sovereign family profile will have authentic Arabic naming structures, while a European dynasty profile will carry the correct nobility conventions. This cultural layer is one of the reasons our data passes human review — not just automated validation.

Question 53

Can I combine multiple niches?

Accepted Answer

Yes. Each niche is delivered as a separate CSV with identical column structures. Concatenating them is trivial. Enterprise clients can request pre-merged multi-niche datasets.
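
Concatenation can be sketched with Python's standard `csv` module as follows. The helper name `concat_csv` and the tiny two-column inputs are assumptions for illustration; the header check simply guards the "identical column structures" premise stated above.

```python
import csv
import io


def concat_csv(csv_texts):
    """Merge CSV texts that share one header, writing the header only once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        current_header = next(reader)
        if header is None:
            header = current_header
            writer.writerow(header)
        elif current_header != header:
            raise ValueError("column structures differ between files")
        writer.writerows(reader)  # remaining rows, header already consumed
    return out.getvalue()


if __name__ == "__main__":
    niche_a = "name,age\nExample A,50\n"
    niche_b = "name,age\nExample B,60\n"
    print(concat_csv([niche_a, niche_b]))
```

If you need to trace records back to their niche after merging, add a column with the niche name before concatenating.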

Question 54

How is Sovereign Forger different from Mostly AI?

Accepted Answer

Mostly AI requires your real data as input and learns from it to generate synthetic copies. Sovereign Forger requires no input data — it generates from zero. This means zero data transfer risk, zero re-identification risk, and no need to share sensitive data with a third-party platform.

Question 55

How is Sovereign Forger different from Tonic.ai?

Accepted Answer

Tonic focuses on database subsetting and masking — it takes your production database and creates a reduced, masked copy. Sovereign Forger generates entirely new data with no connection to any existing database. Different approach, different risk profile.

Question 56

How is Sovereign Forger different from Gretel.ai?

Accepted Answer

Gretel uses deep learning models trained on your data to generate synthetic copies. Sovereign Forger uses proprietary mathematical methods with no training data required. Gretel needs your data; Sovereign Forger needs nothing.

Question 57

Is Sovereign Forger a platform or a product?

Accepted Answer

Sovereign Forger is a data product, not a SaaS platform. You buy datasets, not subscriptions to software. This means no vendor lock-in, no ongoing SaaS costs, and no integration complexity. No competitor in the synthetic data space offers this model.

Question 58

Do competitors offer UHNWI-specific data?

Accepted Answer

No major synthetic data vendor specializes in UHNWI wealth profiles. Most generate generic retail banking data. Sovereign Forger is the only provider offering culturally nuanced, Pareto-distributed ultra-high-net-worth profiles across six geographic niches.

Question 59

Where is the data generated?

Accepted Answer

All data is generated offline on dedicated hardware. No cloud services, no API calls to external providers, no data leaves the generation environment. The entire pipeline — including the language model — runs locally.

Question 60

Do you store customer data?

Accepted Answer

The minimum required for order fulfillment: email address and payment confirmation via Stripe. We do not store payment card details — those are handled entirely by Stripe. We do not access, store, or process our customers' production data. Your purchase history, your use case, and your identity stay with us and go nowhere else.

Question 61

Do you have an API?

Accepted Answer

Not currently. Datasets are delivered as downloadable files. An API for on-demand generation is on the product roadmap.

Frequently Asked Questions

What We Build and Why

What It Costs and How to Start

Regulation Without the Risk

Under the Hood

How We Compare. How We Protect.

Ready to Eliminate Your Test Data Risk?
