There are two types of tokenization: reversible and irreversible.
Reversible tokens can be detokenized – converted back to their original values. In privacy terminology, this is called pseudonymization. Such tokens may be further subdivided into cryptographic and non-cryptographic, although this distinction is artificial, since any tokenization really is a form of encryption.
Cryptographic tokenization generates tokens using strong cryptography; the cleartext data elements are not stored anywhere, only the cryptographic key. AES in FF1 mode, a format-preserving encryption method specified in NIST SP 800-38G, is an example of cryptographic tokenization.
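The idea that only a key needs to be stored can be illustrated with a small keyed Feistel network over digit strings. This is a minimal sketch of the general format-preserving construction, not an implementation of FF1: the round function (HMAC-SHA256), the round count, and the function names feistel_tokenize and feistel_detokenize are all assumptions chosen for illustration, and the code is not suitable for protecting real data.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_no: int, half: str, modulus: int) -> int:
    """Keyed pseudo-random value derived from one half of the state (illustrative only)."""
    msg = bytes([round_no]) + half.encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str, rounds: int) -> None:
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("expected a string of at least two ASCII digits")
    if rounds <= 0 or rounds % 2 != 0:
        raise ValueError("rounds must be a positive even number")


def feistel_tokenize(key: bytes, digits: str, rounds: int = 10) -> str:
    """Map a digit string to a digit string of the same length, using only the key."""
    _check(digits, rounds)
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    for r in range(rounds):
        modulus = 10 ** len(left)
        mixed = (int(left) + _round_value(key, r, right, modulus)) % modulus
        # Swap halves; the mixed half keeps the width of the old left half.
        left, right = right, str(mixed).zfill(len(left))
    return left + right


def feistel_detokenize(key: bytes, token: str, rounds: int = 10) -> str:
    """Invert feistel_tokenize by running the rounds backwards with the same key."""
    _check(token, rounds)
    mid = len(token) // 2
    left, right = token[:mid], token[mid:]
    for r in reversed(range(rounds)):
        modulus = 10 ** len(right)
        original_left = (int(right) - _round_value(key, r, left, modulus)) % modulus
        left, right = str(original_left).zfill(len(right)), left
    return left + right
```

No table of cleartext values exists anywhere in this sketch: anyone holding the key can tokenize or detokenize, and anyone without it has nothing to look up.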
Non-cryptographic tokenization originally meant that tokens were created by randomly generating a value and storing the cleartext and its corresponding token in a database, as in the original TrustCommerce offering. This approach is conceptually simple, but every tokenization or detokenization operation requires a request to the token server, adding overhead, complexity, and risk. It also does not scale well. Consider a request to tokenize a value: the server must first perform a database lookup to see whether it already has a token for that value. If it does, it returns that token. If not, it must generate a new random value, then perform another database lookup to make sure that value has not already been assigned to a different cleartext. If it has, the server must generate another random value, check that one, and so forth. As the number of tokens grows, the time required for these lookups increases; worse, collisions become more likely, because the chance that a fresh random value is already taken rises with the fraction of the token space in use. Such implementations also typically use multiple token servers for load balancing, reliability, and failover. These servers must synchronize their databases in real time to stay consistent, adding further complexity and overhead.
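The lookup-then-retry procedure described above can be sketched as follows. This is an illustrative model only: the class name TokenVault is hypothetical, and two in-memory dictionaries stand in for the database tables that a real token server would query over the network.

```python
import secrets


class TokenVault:
    """Toy vault-based tokenizer; dictionaries stand in for database tables."""

    def __init__(self, token_digits: int = 16) -> None:
        self.token_digits = token_digits
        self._token_by_value = {}   # cleartext -> token
        self._value_by_token = {}   # token -> cleartext

    def _random_token(self) -> str:
        return "".join(secrets.choice("0123456789") for _ in range(self.token_digits))

    def tokenize(self, value: str) -> str:
        # First lookup: has this cleartext already been assigned a token?
        existing = self._token_by_value.get(value)
        if existing is not None:
            return existing
        # Second lookup, repeated until the candidate is unused (collision check).
        while True:
            candidate = self._random_token()
            if candidate not in self._value_by_token:
                break
        self._token_by_value[value] = candidate
        self._value_by_token[candidate] = value
        return candidate

    def detokenize(self, token: str) -> str:
        # Raises KeyError for a token this vault never issued.
        return self._value_by_token[token]
```

Every call touches the stored mappings, which is why the state must live on a server and be kept consistent across every replica.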
Modern non-cryptographic tokenization focuses on “stateless” or “vaultless” approaches, which build tokens by securely combining randomly generated metadata. Such systems can operate disconnected from one another and, unlike database-backed tokenization, scale almost without limit, since they require no synchronization beyond copying the original metadata.
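A much-simplified sketch of the vaultless idea follows. It assumes the "metadata" is a set of randomly generated digit-substitution tables that are created once and copied to every node; the names generate_tables and VaultlessNode, and the chaining scheme itself, are hypothetical. Commercial products use considerably more elaborate table constructions, and this toy scheme is weak (for example, the first token digit depends only on the first cleartext digit).

```python
import copy
import secrets


def generate_tables(max_len: int = 19) -> list:
    """Create the shared metadata: for each position, ten random permutations of 0-9."""
    rng = secrets.SystemRandom()
    tables = []
    for _ in range(max_len):
        row = []
        for _ in range(10):
            perm = list(range(10))
            rng.shuffle(perm)
            row.append(perm)
        tables.append(row)
    return tables


class VaultlessNode:
    """A node that tokenizes using only its local copy of the tables."""

    def __init__(self, tables: list) -> None:
        self._tables = tables
        # Precompute inverse permutations for detokenization.
        self._inverse = [
            [[perm.index(d) for d in range(10)] for perm in row] for row in tables
        ]

    def tokenize(self, value: str) -> str:
        out, prev = [], 0
        for pos, ch in enumerate(value):
            # The previous token digit selects which permutation to apply next.
            prev = self._tables[pos][prev][int(ch)]
            out.append(str(prev))
        return "".join(out)

    def detokenize(self, token: str) -> str:
        out, prev = [], 0
        for pos, ch in enumerate(token):
            digit = int(ch)
            out.append(str(self._inverse[pos][prev][digit]))
            prev = digit
        return "".join(out)


# Two nodes that never communicate, each holding its own copy of the metadata.
shared = generate_tables()
node_a = VaultlessNode(copy.deepcopy(shared))
node_b = VaultlessNode(copy.deepcopy(shared))
```

Because both nodes hold the same tables, a token produced by node_a can be detokenized by node_b with no database and no network call between them.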
Irreversible tokens cannot be converted back to their original values. In privacy terminology, this is called anonymization. Such tokens are created through a one-way function, which allows anonymized data elements to be used for third-party analytics, in place of production data in lower environments, and so on.
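One common choice of one-way function is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the function name anonymize and the choice of HMAC are assumptions for illustration, not a method prescribed by the text. A secret key is used because an unkeyed hash of a low-entropy value, such as a card number, could be reversed by simply hashing every candidate.

```python
import hashlib
import hmac


def anonymize(secret_key: bytes, value: str) -> str:
    """Return a fixed-length one-way token; the original value cannot be recovered from it."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

The same input always yields the same token under a given key, so anonymized records can still be joined and counted in analytics without exposing the underlying values.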