Hiya! ๐ It's been a while since I wrote posts on my blog, and here I'm with a new problem that I faced recently. Stay tuned, as I also intend to write about a few other interesting challenges I faced recently, in the upcoming blog posts.
Before going any further with the subject of discussion, I would like to announce ๐ข that I've been recognized as an IBM Champion ๐ฅ for the 4th consecutive time in the IBM Z and LinuxONE expertise. I came to know about this program in the year 2021 after listening to a podcast of Subhasish Sarkar where he talked about the program and have been an advocate ever since. The more you contribute to the community that uses IBM products, such as IBM Z/Mainframe, the more likely you are to become an IBM Champion โญ. My main source of contributions to the IBM Z community is through the articles that I create on this blog and through my acts of advocacy on LinkedIn. Last year, I wrote a series titled "Printing Shapes using COBOL" on a new blog that I created on Hashnode. If you haven't checked that out, click ๐ here. You'll learn to print some interesting shapes using COBOL.
Alright! Let's continue with the topic in question: Fixing SQLCODE = -171 in Db2.
Intro
Database refresh is the process of extracting data from the tables in the production environment and loading it into the corresponding tables in the lower environment. Since Db2 tables in the lower environment are often subject to unintended modifications by developers, a refresh is usually performed to restore a clean state before testing critical changes.
Problem statement
Let's consider a scenario where we need to obfuscate a column storing bank account numbers in a table say, SAMPLE_TABLE.
Remember that there is a difference between masking data and obfuscating data. Masked data is typically used for display purposes (e.g., showing only the last four digits of a bank account number: XXXXXX1234
). It is often reversible, meaning authorized users can see the full data. On the other hand, obfuscated data means permanently transforming where the original value cannot be derived (e.g., replacing the values with randomized numbers). Obfuscated data ensures that even if someone accesses the obfuscated data, it cannot be linked back to the original value.
Here ๐ is a sample SQL statement used for this purpose:
UPDATE SAMPLE_TABLE
SET BANK_ACCOUNT_NUMBER = RIGHT(STRIP(DIGITS(CUSTOMER_ID * 1234),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 2345),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 3456),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 4567),L,'0'),
LENGTH(STRIP(BANK_ACCOUNT_NUMBER)))
WHERE BANK_ACCOUNT_NUMBER <> ' ';
COMMIT;
There are two columns involved in this SQL: BANK_ACCOUNT_NUMBER, a column defined as CHARACTER(35) and CUSTOMER_ID defined as INTEGER.Fact ๐ The maximum length of a bank account number worldwide, when considering the International Bank Account Number (IBAN), can be up to 34 alphanumeric characters.
Logic behind the SQL
- The obfuscated bank account number is generated by concatenating numeric transformations of
CUSTOMER_ID
- The
RIGHT()
function extracts only the required number of digits, ensuring the new obfuscated value matches the original length.
The error: SQLCODE = -171
SQLCODE = -171, SQLSTATE = 42815
THE RIGHT FUNCTION HAS AN ARGUMENT THAT IS NOT VALID
When does this happen?
Letโs assume CUSTOMER_ID = 99.
The first argument of RIGHT()
is computed as:
122166 || 232155 || 342144 || 452133 = 24 digits
Assume the BANK_ACCOUNT_NUMBER
for this customer as 100000000000000000000145678
(27 digits).
Now, RIGHT(โ122166232155342144452133โ, 27)
fails because the string only has 24 characters, but weโre trying to extract 27 characters from the right side.
The fix: Ensuring a valid argument for RIGHT()
To prevent this error, we need to ensure that the second argument of RIGHT() never exceeds the length of the first argument. We can achieve this using the LEAST() function: UPDATE SAMPLE_TABLE
SET BANK_ACCOUNT_NUMBER = RIGHT(
STRIP(DIGITS(CUSTOMER_ID * 1234),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 2345),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 3456),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 4567),L,'0'),
LEAST(LENGTH(STRIP(BANK_ACCOUNT_NUMBER)),
LENGTH(STRIP(DIGITS(CUSTOMER_ID * 1234),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 2345),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 3456),L,'0') ||
STRIP(DIGITS(CUSTOMER_ID * 4567),L,'0'))))
WHERE BANK_ACCOUNT_NUMBER <> ' ';
How does this fix work?
LEAST(LENGTH(STRIP(BANK_ACCOUNT_NUMBER)), โฆ) ensures that we donโt attempt to extract more characters than available. If the length ofBANK_ACCOUNT_NUMBER
is greater than the generated string, we only take whatโs available.