
What is Tokenization?

Overview

Tokenization is a process by which primary account numbers (PANs), protected health information (PHI), personally identifiable information (PII), and other sensitive data elements are replaced by surrogate values, or tokens. Tokenization is really a form of encryption, but the two terms are typically used differently. Encryption usually means encoding human-readable data into incomprehensible text that can only be decoded with the right decryption key, while tokenization (or “masking”, or “obfuscation”) means some form of format-preserving data protection: converting sensitive values into non-sensitive replacement values (tokens) that have the same length and format as the original data.

Tokens share some characteristics with the original data elements, such as character set, length, etc.
Each data element is mapped to a unique token.
Tokens are deterministic: repeatedly generating a token for a given value yields the same token.
A tokenized database can be searched by tokenizing the query terms and searching for those.
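The properties above can be illustrated with a minimal in-memory sketch in the style of the database-backed approach described later on this page. The class `ToyTokenVault` is a hypothetical name, and a Python dict stands in for the token database; this is not any product's API. Tokens are random digit strings of the same length as the input, each value receives exactly one token, repeated requests return the stored token, and a search works by tokenizing the query term.

```python
import secrets


class ToyTokenVault:
    """Hypothetical in-memory stand-in for a token database (digit strings only)."""

    MAX_ATTEMPTS = 1000

    def __init__(self) -> None:
        self._token_for: dict[str, str] = {}  # cleartext -> token
        self._value_for: dict[str, str] = {}  # token -> cleartext

    def tokenize(self, value: str) -> str:
        if not value.isdigit():
            raise ValueError("this sketch only handles digit strings")
        # Deterministic: a value seen before gets its existing token back.
        if value in self._token_for:
            return self._token_for[value]
        for _attempt in range(self.MAX_ATTEMPTS):
            # Same character set (digits) and same length as the original value.
            candidate = "".join(secrets.choice("0123456789") for _ in range(len(value)))
            # Unique: never reuse a token already assigned to another value.
            if candidate not in self._value_for:
                self._token_for[value] = candidate
                self._value_for[candidate] = value
                return candidate
        raise RuntimeError("could not find an unused token of this length")


vault = ToyTokenVault()
# The "table" stores only tokens, never the cleartext SSNs.
records = [
    {"ssn": vault.tokenize("123456789"), "customer": "C-1"},
    {"ssn": vault.tokenize("987654321"), "customer": "C-2"},
]
# Search: tokenize the query term, then compare tokens.
query = vault.tokenize("987654321")
matches = [r["customer"] for r in records if r["ssn"] == query]
print(matches)  # ['C-2']
```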

As a form of encryption, tokenization is a key data privacy protection strategy for any business. This page provides a very high-level view of what tokenization is and how it works.

Encryption vs. tokenization vs. whatever: What you need to know

Learn the difference between encryption, tokenization, obfuscation, masking and other terms. Even when the terms are used precisely, people often misunderstand the differences between them. And those differences matter.


Tokenization

Where did tokenization come from?

Digital tokenization was first created by TrustCommerce in 2001 to help a client protect customer credit card information. Merchants were storing cardholder data on their own servers, which meant that anyone who had access to their servers could potentially view or take advantage of those customer credit card numbers.

TrustCommerce developed a system that replaced primary account numbers (PANs) with a randomized number called a token. This allowed merchants to store and reference tokens when accepting payments. TrustCommerce converted the tokens back to PANs and processed the payments using the original PANs. This isolated the risk to TrustCommerce, since merchants no longer had any actual PANs stored in their systems.

As security concerns and regulatory requirements grew, such first-generation tokenization proved the technology’s value, and other vendors offered similar solutions. However, problems with this approach soon became clear, as discussed below.

What types of tokenization are available?

There are two types of tokenization: reversible and irreversible.

Reversible tokens can be detokenized – converted back to their original values. In privacy terminology, this is called pseudonymization. Such tokens may be further subdivided into cryptographic and non-cryptographic, although this distinction is artificial, since any tokenization really is a form of encryption.

Cryptographic tokenization generates tokens using strong cryptography; the cleartext data elements are not stored anywhere, only the cryptographic key. AES in FF1 mode, a format-preserving encryption method standardized by NIST in Special Publication 800-38G, is an example of cryptographic tokenization.
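To make the idea concrete, the sketch below is a toy format-preserving cipher for decimal strings. It borrows the general shape of FF1 (ten rounds of an alternating Feistel network that adds a keyed value to one half of the digits, modulo a power of ten) but substitutes HMAC-SHA256 for FF1's AES-based round function and skips FF1's exact input encoding. It is therefore not FF1, is not interoperable with it, and should not protect real data; `tokenize` and `detokenize` are hypothetical names. What it does show is that only the key is needed: both directions are pure computations, with no stored cleartext.

```python
import hashlib
import hmac

ROUNDS = 10  # FF1 also uses 10 rounds; everything else here is simplified


def _round_value(key: bytes, rnd: int, tweak: bytes, half: str, width: int) -> int:
    """Keyed pseudorandom number in [0, 10**width) derived from one half."""
    msg = bytes([rnd]) + tweak + b"|" + half.encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % (10 ** width)


def _split(digits: str) -> tuple[str, str, int, int]:
    if len(digits) < 2 or not digits.isdigit():
        raise ValueError("expected at least two decimal digits")
    u = len(digits) // 2
    return digits[:u], digits[u:], u, len(digits) - u


def tokenize(key: bytes, digits: str, tweak: bytes = b"") -> str:
    a, b, u, v = _split(digits)
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # a has exactly m digits in round i
        c = (int(a) + _round_value(key, i, tweak, b, m)) % (10 ** m)
        a, b = b, str(c).zfill(m)
    return a + b


def detokenize(key: bytes, token: str, tweak: bytes = b"") -> str:
    a, b, u, v = _split(token)
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v  # b holds the m-digit output of round i
        c = (int(b) - _round_value(key, i, tweak, a, m)) % (10 ** m)
        a, b = str(c).zfill(m), a
    return a + b


key = b"demo only: use a securely generated, managed key"
token = tokenize(key, "4111111111111111")
print(token)                   # 16 digits
print(detokenize(key, token))  # 4111111111111111
```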

Non-cryptographic tokenization originally meant that tokens were created by randomly generating a value and storing the cleartext and corresponding token in a database, like the original TrustCommerce offering. This approach is conceptually simple, but it means that every tokenization or detokenization request must go to the token server, adding overhead, complexity, and risk.

It also does not scale well. Consider a request to tokenize a value: the server must first perform a database lookup to see whether it already has a token for that value. If it does, it returns that token. If not, it must generate a new random value, then do another database lookup to make sure that value has not already been assigned to a different cleartext. If it has, it must generate another random value, check that one, and so forth. As the number of tokens grows, these lookups take longer, and each new random value becomes more likely to collide with one already issued. Such implementations also typically use multiple token servers for load balancing, reliability, and failover. These must synchronize their databases in real time to ensure reliability and consistency, adding further complexity and overhead.
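The cost of the retry loop described above can be estimated with simple arithmetic. Assume tokens are drawn uniformly at random from a space of N possible values (N = 10^4 below, a hypothetical token with four randomized digits) and that k tokens have already been issued. Each fresh candidate then collides with probability k/N, and the number of candidates needed to find an unused one is geometrically distributed with mean N/(N - k), each candidate costing a uniqueness lookup. The helper names are hypothetical.

```python
from fractions import Fraction


def collision_probability(space: int, issued: int) -> Fraction:
    """Probability that one uniformly random candidate is already assigned."""
    if not 0 <= issued <= space:
        raise ValueError("issued must be between 0 and space")
    return Fraction(issued, space)


def expected_draws(space: int, issued: int) -> Fraction:
    """Mean number of candidates (each needing a lookup) per new token.

    Draws until an unused value appears follow a geometric distribution with
    success probability (space - issued) / space, so the mean is its reciprocal.
    """
    if not 0 <= issued < space:
        raise ValueError("issued must be in [0, space)")
    return Fraction(space, space - issued)


SPACE = 10 ** 4  # hypothetical: four randomized digits
for issued in (0, 5_000, 9_000, 9_990):
    print(issued,
          float(collision_probability(SPACE, issued)),
          float(expected_draws(SPACE, issued)))
```

With half of the space used, a new token needs two candidates on average; at 90% it needs ten, and at 99.9% it needs a thousand.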

Modern non-cryptographic tokenization focuses on “stateless” or “vaultless” approaches, which use randomly generated metadata that is securely combined to build tokens. Unlike database-backed tokenization, such systems can operate disconnected from each other and scale essentially without limit, since they require no synchronization beyond copying the original metadata.
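A rough illustration of the stateless idea follows. It is a hypothetical toy, not the algorithm used by any product: the “metadata” is a set of random per-position digit substitution tables, generated once with Python's `secrets` module and then copied to every tokenizer instance. Each token digit depends on the table for its position and on the previous token digit, so the mapping can be reversed without any token database. Because it is only a chained substitution, it is far too weak for real data; it simply shows two instances agreeing with no synchronization beyond sharing the metadata.

```python
import secrets


def generate_tables(max_len: int) -> list[list[int]]:
    """Hypothetical metadata: one random permutation of 0-9 per digit position."""
    rng = secrets.SystemRandom()
    tables = []
    for _ in range(max_len):
        perm = list(range(10))
        rng.shuffle(perm)
        tables.append(perm)
    return tables


class StatelessTokenizer:
    def __init__(self, tables: list[list[int]]) -> None:
        self.tables = tables
        # Inverse permutations, used for detokenization.
        self.inverse = []
        for perm in tables:
            inv = [0] * 10
            for plain, sub in enumerate(perm):
                inv[sub] = plain
            self.inverse.append(inv)

    def _check(self, digits: str) -> None:
        if not digits.isdigit() or len(digits) > len(self.tables):
            raise ValueError("expected digits no longer than the metadata")

    def tokenize(self, digits: str) -> str:
        self._check(digits)
        out, prev = [], 0
        for pos, ch in enumerate(digits):
            t = self.tables[pos][(int(ch) + prev) % 10]  # chain on previous token digit
            out.append(str(t))
            prev = t
        return "".join(out)

    def detokenize(self, token: str) -> str:
        self._check(token)
        out, prev = [], 0
        for pos, ch in enumerate(token):
            t = int(ch)
            out.append(str((self.inverse[pos][t] - prev) % 10))
            prev = t
        return "".join(out)


metadata = generate_tables(16)
# Two independent "servers" share only the copied metadata.
server_a = StatelessTokenizer(metadata)
server_b = StatelessTokenizer(metadata)
token = server_a.tokenize("4111111111111111")
print(server_b.tokenize("4111111111111111") == token)  # True
print(server_b.detokenize(token))                       # 4111111111111111
```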

Irreversible tokens cannot be converted back to their original values. In privacy terminology, this is called anonymization. Such tokens are created through a one-way function, allowing anonymized data elements to be used for third-party analytics, as production data in lower (development and test) environments, and so on.
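One common way to build such a one-way token is a keyed hash. The sketch below, with a hypothetical function name, derives a same-length digit string from HMAC-SHA256. A secret key matters here because a plain hash of a small space, such as nine-digit Social Security numbers, could be reversed by hashing every possible value. Reducing the digest to a few digits also means two different inputs can occasionally share a token, which may or may not be acceptable for a given use.

```python
import hashlib
import hmac


def anonymize_digits(key: bytes, value: str) -> str:
    """One-way, deterministic, same-length digit token (collisions are possible)."""
    if not value.isdigit():
        raise ValueError("this sketch only handles digit strings")
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big") % (10 ** len(value))
    return str(number).zfill(len(value))


key = b"hypothetical secret key"
token = anonymize_digits(key, "123456789")
print(token)  # nine digits; there is no function to turn it back into the input
```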

Tokenization benefits

Tokenization requires minimal changes to add strong data protection to existing applications. Traditional encryption solutions enlarge the data, requiring significant changes to database and program data schemas, as well as additional storage. Encrypted fields also typically fail format validation checks, requiring further code analysis and updates. Tokens use the same data formats, require no additional storage, and can pass validation checks.
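As an illustration of the validation point, many applications check that a card-number field is all digits, has the expected length, and passes the Luhn (mod 10) checksum. The sketch below uses hypothetical helper names and shows such a check, plus one way a token generator could keep its output Luhn-valid: by recomputing the final check digit after the other digits have been replaced.

```python
import secrets


def luhn_valid(number: str) -> bool:
    """Standard Luhn (mod 10) check applied to payment card numbers."""
    if not number.isdigit():
        return False
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def luhn_check_digit(body: str) -> str:
    """The single digit that makes body + digit pass luhn_valid."""
    for c in "0123456789":
        if luhn_valid(body + c):
            return c
    raise RuntimeError("unreachable: exactly one check digit always works")


def random_luhn_token(length: int = 16) -> str:
    """Hypothetical token: random digits ending in a valid Luhn check digit."""
    body = "".join(secrets.choice("0123456789") for _ in range(length - 1))
    return body + luhn_check_digit(body)


token = random_luhn_token()
print(len(token) == 16 and luhn_valid(token))  # True
print(luhn_valid("4111111111111111"))          # True
```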

As applications share data, tokenization is also much easier to add than encryption, since data exchange processes are unchanged. In fact, many intermediate data uses, between ingestion and final disposition, can typically use the token without ever having to detokenize it. This improves security, making it possible to protect the data as soon as it is acquired and keep it protected throughout most of its lifecycle.

Within the limits of security requirements, tokens can retain partial cleartext values, such as the leading and trailing digits of a credit card number. This allows required functions—such as card routing and “last four” verification or printing on customer receipts—to be performed using the token, without having to convert it back to the actual value.
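A minimal sketch of partial retention, assuming a policy that keeps the first six digits (for routing) and the last four in the clear and replaces only the middle digits. The function names and the policy constants are hypothetical. The middle digits are random here only for brevity; a real scheme would produce them with its own deterministic tokenization method.

```python
import secrets

KEEP_LEADING = 6   # assumed policy: leading digits kept for card routing
KEEP_TRAILING = 4  # assumed policy: "last four" kept for receipts and verification


def tokenize_middle(pan: str) -> str:
    """Replace only the middle digits of a card number."""
    if not pan.isdigit() or len(pan) <= KEEP_LEADING + KEEP_TRAILING:
        raise ValueError("expected a card number longer than the retained digits")
    middle_len = len(pan) - KEEP_LEADING - KEEP_TRAILING
    middle = "".join(secrets.choice("0123456789") for _ in range(middle_len))
    return pan[:KEEP_LEADING] + middle + pan[-KEEP_TRAILING:]


def receipt_mask(token: str) -> str:
    """Receipt printing works directly on the token; no detokenization needed."""
    return "*" * (len(token) - KEEP_TRAILING) + token[-KEEP_TRAILING:]


token = tokenize_middle("4111111111111111")
print(token[:KEEP_LEADING])  # 411111 (routing prefix, read from the token)
print(receipt_mask(token))   # ************1111
```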

This ability to use tokens directly improves both performance and security: performance, because there is no overhead when no detokenization is required; and security, because the cleartext is never recovered, leaving less attack surface.

What is tokenization used for?

Tokenization is used to secure many different types of sensitive data, including:

payment card data
U.S. Social Security numbers and other national identification numbers
telephone numbers
passport numbers
driver’s license numbers
email addresses
bank account numbers
names, addresses, birth dates

As data breaches rise and data security becomes increasingly important, organizations find tokenization appealing because it is easier to add to existing applications than traditional encryption.

PCI DSS compliance

Safeguarding payment card data is one of the most common use cases for tokenization, in part because of routing requirements for different card types as well as “last four” validation of card numbers. Tokenization for card data got an early boost due to requirements set by the Payment Card Industry Security Standards Council (PCI SSC). The Payment Card Industry Data Security Standard (PCI DSS) requires businesses that handle payment card data to comply with strict cybersecurity requirements. While securing payment card data with encryption is allowed under PCI DSS, merchants may also use tokenization to meet compliance standards. Since payment data flows are complex, high-performance, and well defined, tokenization is much easier to add than encryption.

Secure sensitive data with tokenization

Tokenization is becoming an increasingly popular way to protect data, and can play a vital role in a data privacy protection solution. OpenText™ Cybersecurity is here to help secure sensitive business data using Voltage SecureData by OpenText™, which provides a variety of tokenization methods to fit every need.

Voltage SecureData and other cyber resilience solutions can augment human intelligence with artificial intelligence to strengthen any enterprise’s data security posture. Not only does this provide intelligent encryption and a smarter authentication process, but it enables easy detection of new and unknown threats through contextual threat insights.

Resources

Related products

Voltage SecureData Enterprise by OpenText™

Voltage encryption delivers data privacy protection, neutralizes data breach, and drives business value through secure data use.

Voltage SecureData Payments by OpenText™

Protect credit card data in retail point-of-sale, web, and mobile ecommerce environments to reduce audit costs, neutralize data breach, and build brand value.


