Data Team

Payrails Data Catalog

A searchable directory of every dataset, table, and field across Payrails — who owns it, what it means, and how it connects to the rest of our data.


Catalog is live and up to date

Updated on every production release

Maintained by the Payrails Data Team

FAQ

Common Questions

Everything you need to navigate and understand our data.

What is the Data Catalog and why does it exist?

The Data Catalog is a searchable directory of every dataset, table, and field that the Payrails Data Team produces and maintains. Think of it as a map of our data warehouse: it tells you what data exists, what each field means, and how datasets relate to each other.

It exists so that anyone — from a data analyst to a product manager — can understand our data without needing to ask an engineer. Instead of guessing what a column called transaction_status means, you can look it up here.

How do I find a specific table or dataset?

Once you open the catalog, use the search bar at the top — you can search by table name, column name, or keyword (e.g. "payment", "merchant", "refund").

In the left sidebar, you can also browse by category or data source (e.g. Checkout, Adyen, Braintree). Tables in the Core and Curated sections are the most common starting points for business reporting.

What is the difference between a Source and a Model?

Sources are raw data that arrive directly from external systems — for example, transaction records sent by a payment processor like Checkout.com or Adyen. This data is unprocessed and may be incomplete or need cleaning.

Models are datasets that the Data Team has built by transforming that raw data — cleaning, joining, calculating, and restructuring it to make it accurate and easy to use. When in doubt, use models (not sources) for analysis.

What are the different data layers (dl_, intermediate, core, curated)?

Our data goes through several stages before it is ready for reporting:

• dl_ (Data Layer / Sources) — Raw ingested data from external providers (Adyen, Checkout, Braintree, etc.). Do not use these directly unless you know what you are doing.
• Intermediate — Partially cleaned and joined data, used as building blocks. Not intended for direct consumption.
• Core — Clean, standardized, production-ready datasets. This is usually the right place to start for reporting and analysis.
• Curated — Purpose-built datasets assembled for specific reporting needs (e.g. financial reconciliation, merchant dashboards).

How do I know if a dataset is reliable and production-ready?

In the catalog, each table shows whether its automated data quality tests are passing or failing. A green status means all of the table's data quality checks are currently passing.

As a general rule:
• Tables in the Core and Curated layers are actively maintained and production-grade.
• Tables in the dl_ (raw sources) and Intermediate layers are internal building blocks, so they may be incomplete or undocumented.

If you are unsure whether a table is safe to use for a report or decision, ask the Data Team.

How do I understand what a column or field means?

Click on any table name in the catalog to open its detail page. You will see a full list of columns with their name, data type, and description.

If a column description is missing, it means the Data Team has not yet documented that field. Please flag it to us in #data-help on Slack — we appreciate the feedback and will prioritize adding it.

How often is the data refreshed? Is the catalog always up to date?

Most production datasets in the Core and Curated layers are refreshed at least once daily. Some high-priority tables are refreshed more frequently.

The Data Catalog itself (this directory) is rebuilt and redeployed automatically each time a new production release goes out. This means the structure and documentation always reflect the latest version of our data models.

If you suspect the data in a dashboard or report is stale, check the updated_at or dbt_updated_at field on the relevant table.
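
As a rough illustration of that freshness check, the sketch below reads the latest updated_at value from a table and compares it with a maximum allowed age. It is only a sketch under assumptions: the table name core_payments, the in-memory SQLite database, the ISO-8601 text timestamps, and the 24-hour threshold are all hypothetical stand-ins, not the actual Payrails warehouse, schema, or refresh guarantee.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def latest_update(conn, table, column="updated_at"):
    """Return the newest timestamp in `column`, or None if the table is empty."""
    # Identifiers cannot be bound as query parameters, so pass trusted names only.
    row = conn.execute(f'SELECT MAX("{column}") FROM "{table}"').fetchone()
    if row[0] is None:
        return None
    return datetime.fromisoformat(row[0])


def is_stale(last_update, now, max_age=timedelta(hours=24)):
    """Treat a table as stale if it is empty or older than `max_age` (assumed threshold)."""
    return last_update is None or now - last_update > max_age


# Hypothetical table and rows, stored in an in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_payments (id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO core_payments VALUES (?, ?)",
    [
        (1, "2024-05-01T06:00:00+00:00"),
        (2, "2024-05-02T06:00:00+00:00"),
    ],
)

last = latest_update(conn, "core_payments")
now = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)    # 6 hours after the last update
later = datetime(2024, 5, 4, 12, 0, tzinfo=timezone.utc)  # 54 hours after the last update

print(is_stale(last, now))    # False
print(is_stale(last, later))  # True
```

Against the real warehouse, the same idea would usually be a single MAX(updated_at) query in whichever SQL client you already use. If the result looks older than you expect, ask in #data-help.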

What do the test statuses (pass / fail / no tests) mean?

Each table in the catalog can have automated data quality tests — checks that verify things like: "no duplicate IDs", "this column is never null", "values are within an expected range."

• Pass — All checks for this table are succeeding. The data meets its quality contract.
• Fail — One or more checks have failed. The Data Team is usually already aware and investigating.
• No tests — This table has not yet been covered by automated tests. Treat with extra caution.

Failing tests do not always mean the data is wrong — sometimes they reflect a known upstream issue or a test threshold that needs adjustment. If you are concerned about a failure, reach out to the Data Team.
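
To make the three example checks above more concrete, here is a minimal sketch of what "no duplicate IDs", "never null", and "within an expected range" could look like, and how their results might roll up into a pass / fail / no tests status. It is not the catalog's actual test implementation: the function names, the sample rows, the payment_id and amount columns, and the 0 to 1000 range are all hypothetical.

```python
def check_unique(rows, column):
    """True if no value in `column` appears more than once."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def check_not_null(rows, column):
    """True if `column` is never None."""
    return all(row[column] is not None for row in rows)


def check_in_range(rows, column, low, high):
    """True if every non-null value lies in [low, high]; nulls are left to check_not_null."""
    return all(low <= row[column] <= high for row in rows if row[column] is not None)


def table_status(results):
    """Roll individual check results up into one table-level status."""
    if not results:
        return "no tests"
    return "pass" if all(results.values()) else "fail"


# Hypothetical sample rows: one duplicate ID and one missing amount.
rows = [
    {"payment_id": "p1", "amount": 10.0},
    {"payment_id": "p2", "amount": 25.5},
    {"payment_id": "p2", "amount": None},
]

results = {
    "unique_payment_id": check_unique(rows, "payment_id"),
    "not_null_amount": check_not_null(rows, "amount"),
    "amount_in_range": check_in_range(rows, "amount", 0, 1000),
}

print(results)
print(table_status(results))  # fail
print(table_status({}))       # no tests
```

In this sketch one failing check is enough to mark the whole table as failing, which mirrors the "one or more checks have failed" wording above.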

How do I request new data, a new dataset, or a change to an existing one?

The best way to request data work is to reach out to the Data Team directly:

• Slack — Post in #data-help with a description of what you need and why.
• Jira / Ticket — If you have access to our project board, create a ticket with your use case.

When making a request, it helps to include: what business question you are trying to answer, which teams or metrics are involved, and how urgently it is needed. The more context you provide, the faster we can scope and prioritize the work.

Who maintains this catalog and who do I contact for help?

The catalog is maintained by the Payrails Data Team. Individual tables are owned by the person or team listed in the Owner field on each table's detail page.

For general questions, data requests, or to report something that looks wrong, reach out on Slack in #data-help. We are happy to help you find and understand the data you need.