Sunday, January 31, 2021

How to convert rows to columns in a dataset using IBM DFSORT?

Welcome! In this blog post, I am writing about another scenario. Are you ready?

One of the sites I worked for dealt with invoices, approvals and payments. There was a workflow that usually began at the point where the system received invoices in bulk. The system would then review and approve the supplier invoices. Upon approval, the supplier was paid.

The application was geographically spread out in 3 regions of the globe and the workflow was common across the regions.

There was a requirement to create a report capturing the monthly statistics on invoices, approvals and payments across the 3 regions and send it over email to appropriate stakeholders. 

Invoice, approval and payment data were stored in different DB2 tables. Hence, 3 SQL SELECT queries were combined using UNION ALL to pull the necessary details. The SQL output had the following data:

INVOICES|+00099999
APPROVALS|+00012345
PAYMENTS|+00003456

It was a pipe-delimited file, so if you imported the file into Microsoft Excel with the pipe delimiter, you'd get 2 columns and 3 rows as shown below.



Since we had to pull this data for 3 regions, there were 3 output datasets with the same record structure.

The next step was to create a report out of these files. The expected final output was,

The final report. Rows are converted to columns. There is a header and trailer record too.

We used ICETOOL ⛄ to create the report.

There is a lot of stuff we put together with ICETOOL to accomplish this output. Let's look at it one by one.

ICETOOL is a multipurpose DFSORT utility that uses the capabilities of DFSORT to perform multiple operations on one or more datasets in a single step.
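If you haven't wired up an ICETOOL step before, here is a minimal sketch of how one can be put together. The job name, dataset names and space parameters below are placeholders for illustration (not our actual JCL), and the control statements referred to by USING(CTL1) are the ones we'll walk through next.

//RPTSTEP  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//* the 3 regional SQL output files, concatenated into one input
//IN       DD DISP=SHR,DSN=YOUR.SQL.OUT.REGION1
//         DD DISP=SHR,DSN=YOUR.SQL.OUT.REGION2
//         DD DISP=SHR,DSN=YOUR.SQL.OUT.REGION3
//OUT      DD DSN=YOUR.REPORT.WORK,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(5,5)),RECFM=FB,LRECL=80
//TOOLIN   DD *
  COPY FROM(IN) TO(OUT) USING(CTL1)
/*
//CTL1CNTL DD *
* the DFSORT control statements described below go here
/*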

Step 1: 


Each input dataset was a pipe-delimited file. The 3 SQL output datasets were concatenated and passed as one input to the ICETOOL step. There were 2 columns (fields) in each record of the dataset, but the columns didn't start at fixed positions. For example, the count for Invoices started at the 10th position, whereas for Approvals it was at the 11th position. So, we had to make the columns start at fixed positions, and we used the PARSE operand in DFSORT to do that.

As we knew each output dataset from SQL would have only 3 records that all looked similar (except for the variation in the count values), WHEN=GROUP in DFSORT was used to create 3 groups. Each group was assigned a unique ID, and each record in a group was assigned a sequence number.

WHEN=GROUP is used to introduce a group of records to DFSORT.
 
Well, you may ask why we should do this 🤔. With sequence numbers and IDs for the groups, we can manipulate and arrange the records the way we wish.
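To make this concrete, here is the minimal shape of such a statement, pulled out on its own as a toy copy job (Step 1 below does essentially this, plus the reformatting):

* Toy job: every 3 input records form a group. PUSH stamps a 1-digit
* group ID at position 21 and a 1-digit sequence number at position 23
* on each record of the group.
  SORT FIELDS=COPY
  OUTREC IFTHEN=(WHEN=GROUP,RECORDS=3,PUSH=(21:ID=1,23:SEQ=1))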

At this stage, the output looked like this:

 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 INVOICES  +000999991 1                           
 000002 APPROVALS +000123451 2                           
 000003 PAYMENTS  +000034561 3                           
 000004 INVOICES  +000000192 1                           
 000005 APPROVALS +000000162 2                           
 000006 PAYMENTS  +000000152 3                           
 000007 INVOICES  +001000503 1                           
 000008 APPROVALS +000123613 2                           
 000009 PAYMENTS  +000034713 3                           
 ****** **************************** Bottom of Data ****************************  
The following points are to be noted from the output:

1. The fields were in fixed positions now 💪.
2. There were 3 groups: each group was assigned a unique ID in position 21, and each record in a group was assigned a sequence number in position 23.


Let's see how we converted the fields from rows to columns.

Consider the first 3 records (first group). 
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 INVOICES  +000999991 1                           
 000002 APPROVALS +000123451 2                           
 000003 PAYMENTS  +000034561 3   
Note: For the remainder of this blog post, ignore the string literals (the first 10 bytes) and the (+) symbol in the 12th position (the counts can't go below zero in our case).

The count values (in positions 13 thru 20) are placed row-wise, one after another. If we sum these count values, we'll get 115800.

Right! What if we had the count values in the following way? The count values are still placed row-wise, one after the other, but they are in different positions in each record. And there are a lot of zeros 🤔.
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 00099999 00000000 00000000                         
 000002 00000000 00012345 00000000                          
 000003 00000000 00000000 00003456  
Now, if we sum each of the columns (we have got 3 columns here) we would get the following output. 
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 00099999 00012345 00003456   

You just witnessed how we converted the rows to columns 👏. Awesome? All thanks go to those zeros 👏 in each record, which helped us consolidate 3 records into one. The zeros acted as placeholders.

We implemented this idea using DFSORT. Instead of character zeros, we inserted binary zeros (hex value 00) into the input records in the following way:
  • The count from a group's first record was kept in the first 8 bytes; the rest of the first 26 bytes were filled with binary zeros.
  • The count from a group's second record was placed in positions 10 thru 17; the rest of the first 26 bytes were filled with binary zeros.
  • The count from a group's third record was placed in positions 19 thru 26; the rest of the first 26 bytes were filled with binary zeros.
A group's first, second and third records were identified by the sequence numbers we populated for each group (in position 23) with the PUSH parameter.
Binary zeros can be used in DFSORT as placeholders that can be filled with data at a later point in time.

Let's look at the code now. 

INREC PARSE=(%01=(ENDBEFR=C'|',FIXLEN=10),%02=(ENDBEFR=C'|',     
            FIXLEN=9)),BUILD=(%01,12:%02)                        
SORT FIELDS=COPY                                                 
OUTREC IFTHEN=(WHEN=GROUP,RECORDS=3,PUSH=(21:ID=1,23:SEQ=1)),
       IFTHEN=(WHEN=(23,1,CH,EQ,C'1'),BUILD=(1:13,8,18Z,70:21,1, 
              80:X)),                                            
       IFTHEN=(WHEN=(23,1,CH,EQ,C'2'),BUILD=(9Z,10:13,8,18:9Z,
              70:21,1,80:X)),                                    
       IFTHEN=(WHEN=(23,1,CH,EQ,C'3'),BUILD=(18Z,19:13,8,70:21,1,
              80:X))                                             

What does this code do? 

INREC PARSE=(%01=(ENDBEFR=C'|',FIXLEN=10),%02=(ENDBEFR=C'|',     
            FIXLEN=9)),BUILD=(%01,12:%02)

The ENDBEFR and FIXLEN parameters of the PARSE operand define the rules for extracting variable-length data into %nn fixed parsed fields. ENDBEFR stops extracting data at the byte before the specified string. FIXLEN specifies the length of the fixed area that holds the extracted data. %01 is the fixed parsed field that holds the string literal (INVOICES, APPROVALS or PAYMENTS). %02 is the fixed parsed field that holds the count value. The BUILD operand is then used to build the record from the fixed parsed fields.

SORT FIELDS=COPY - SORT statement with COPY option.

OUTREC IFTHEN=(WHEN=GROUP,RECORDS=3,PUSH=(21:ID=1,23:SEQ=1)),
       IFTHEN=(WHEN=(23,1,CH,EQ,C'1'),BUILD=(1:13,8,18Z,70:21,1, 
              80:X)),                                            
       IFTHEN=(WHEN=(23,1,CH,EQ,C'2'),BUILD=(9Z,10:13,8,18:9Z,
              70:21,1,80:X)),                                    
       IFTHEN=(WHEN=(23,1,CH,EQ,C'3'),BUILD=(18Z,19:13,8,70:21,1,
              80:X))  

There are 4 IFTHEN...WHEN conditions coded in OUTREC. The first condition assigns a unique ID to each set of 3 records and a sequence number to each record in the group; the PUSH parameter does that. A group is identified with the WHEN=GROUP and RECORDS=3 parameters.
 
The 2nd, 3rd and 4th IFTHEN...WHEN conditions reformat the input records that have sequence numbers 1, 2 and 3 respectively. Binary zeros are inserted via the BUILD statement; for example, 18Z means insert 18 binary zeros.

After running these SORT statements, we got the following output:
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 00099999                                                             1   
 000002          00012345                                                    1   
 000003                   00003456                                           1   
 000004 00000019                                                             2   
 000005          00000016                                                    2   
 000006                   00000015                                           2   
 000007 00100050                                                             3   
 000008          00012361                                                    3   
 000009                   00003471                                           3   
 ****** **************************** Bottom of Data ****************************  

With HEX mode turned ON, you'll be able to see the binary zeros (X'00') spread over each record.

 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 000001 00099999                                                             1   
        FFFFFFFF0000000000000000004444444444444444444444444444444444444444444F44  
        000999990000000000000000000000000000000000000000000000000000000000000100  
 ------------------------------------------------------------------------------   
 000002          00012345                                                    1   
        000000000FFFFFFFF0000000004444444444444444444444444444444444444444444F44
        000000000000123450000000000000000000000000000000000000000000000000000100  
 ------------------------------------------------------------------------------   
 000003                   00003456                                           1   
        000000000000000000FFFFFFFF4444444444444444444444444444444444444444444F44  
        000000000000000000000034560000000000000000000000000000000000000000000100  
 ------------------------------------------------------------------------------   
 000004 00000019                                                             2   
        FFFFFFFF0000000000000000004444444444444444444444444444444444444444444F44  
        000000190000000000000000000000000000000000000000000000000000000000000200  
 ------------------------------------------------------------------------------   
 000005          00000016                                                    2   
        000000000FFFFFFFF0000000004444444444444444444444444444444444444444444F44
        000000000000000160000000000000000000000000000000000000000000000000000200  
 ------------------------------------------------------------------------------   
 000006                   00000015                                           2   
        000000000000000000FFFFFFFF4444444444444444444444444444444444444444444F44  
        000000000000000000000000150000000000000000000000000000000000000000000200  
 ------------------------------------------------------------------------------   
 000007 00100050                                                             3   
        FFFFFFFF0000000000000000004444444444444444444444444444444444444444444F44  
        001000500000000000000000000000000000000000000000000000000000000000000300  
 ------------------------------------------------------------------------------   
 000008          00012361                                                    3   
        000000000FFFFFFFF0000000004444444444444444444444444444444444444444444F44
        000000000000123610000000000000000000000000000000000000000000000000000300  
 ------------------------------------------------------------------------------   
 000009                   00003471                                           3   
        000000000000000000FFFFFFFF4444444444444444444444444444444444444444444F44  
        000000000000000000000034710000000000000000000000000000000000000000000300  
 ------------------------------------------------------------------------------   
   
It's time 🕒 to sum the records into one, thereby converting the rows to columns.

Using SUM FIELDS with binary zeros has a perk ⭐


The hex representation of the first count value (00099999) is F0F0F0F9F9F9F9F9.

Hexadecimal representation of EBCDIC displayable characters. 

The hex value of a binary zero is 00.

When the count value is added to binary zeros (in BI format),

 F0F0F0F9F9F9F9F9
+0000000000000000
 ----------------
 F0F0F0F9F9F9F9F9 → 00099999

We get the same value 😀.

Let's take another example. The hex value of my name (in upper case) is E2D9C9D5C9E5C1E2C1D5.

Hexadecimal representation of EBCDIC displayable characters.

When my name, a string, is added to binary zeros (in BI format), I get my name back.

 E2D9C9D5C9E5C1E2C1D5
+00000000000000000000
 --------------------
 E2D9C9D5C9E5C1E2C1D5 → SRINIVASAN

The takeaway is: with binary zeros, we can even use SUM FIELDS on strings 😎.
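As a quick illustration of that takeaway, here is a toy sketch (not part of the report job) that merges two different strings into one record using SUM FIELDS and binary-zero padding:

* Toy job: record 1 carries a name in columns 1-8 and binary zeros in
* columns 9-16; record 2 carries a city in columns 9-16 and binary
* zeros in columns 1-8; both carry the same key in column 20. SUM
* collapses the pair into one record with both strings intact, because
* adding X'00' bytes changes nothing.
  SORT FIELDS=(20,1,CH,A)
  SUM FIELDS=(1,8,BI,9,8,BI)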

Step 2:


We summed the count values on their group ID. This means:
  • The count value in the first record (in positions 1 thru 8) of each group was summed with the first 8 bytes of binary zeros in records 2 and 3 of the group.
  • The count value in the second record (in positions 10 thru 17) of each group was summed with the 8 bytes of binary zeros (in positions 10 thru 17) in records 1 and 3 of the group.
  • The count value in the third record (in positions 19 thru 26) of each group was summed with the 8 bytes of binary zeros (in positions 19 thru 26) in records 1 and 2 of the group.

Let's look at the code. 

SORT FIELDS=(70,1,CH,A)                                     
SUM FIELDS=(1,8,BI,10,8,BI,19,8,BI)                         
OUTREC FIELDS=(1,8,X,10,8,X,19,8,X,C'1',71:SEQNUM,1,ZD,80:X) 

For SUM FIELDS, a field in BI format can be 2, 4 or 8 bytes long. Here, each count value and its run of binary-zero placeholders are 8 bytes long, hence we SUM the fields as 8-byte BI.

OUTREC FIELDS is used to format the summed output fields. 
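If you are wondering how the two sets of control statements hang together inside one ICETOOL step, the TOOLIN from the earlier sketch could be extended like this. TEMP is an assumed intermediate DD, CTL1CNTL holds the Step 1 statements (under the COPY operator the SORT FIELDS=COPY line isn't needed) and CTL2CNTL holds the Step 2 statements.

//TOOLIN   DD *
* Step 1: parse the columns, group the records and spread the counts
*         over binary zeros (INREC/OUTREC statements in CTL1CNTL)
  COPY FROM(IN)   TO(TEMP) USING(CTL1)
* Step 2: sort on the group ID and SUM the three 8-byte count fields
*         (SORT/SUM/OUTREC statements in CTL2CNTL)
  SORT FROM(TEMP) TO(OUT)  USING(CTL2)
/*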

We got the following output. 
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 00099999 00012345 00003456 1                     1   
 000002 00000019 00000016 00000015 1                     2   
 000003 00100050 00012361 00003471 1                     3   
 ****** **************************** Bottom of Data ****************************   

If you compare this output with the one we got after using the PARSE operand,
 =COLS> ----+----1----+----2----+----3----+----4----+----5----+----6----+----7--  
 ****** ***************************** Top of Data ******************************  
 000001 INVOICES  +000999991 1                           
 000002 APPROVALS +000123451 2                           
 000003 PAYMENTS  +000034561 3                           
 000004 INVOICES  +000000192 1                           
 000005 APPROVALS +000000162 2                           
 000006 PAYMENTS  +000000152 3                           
 000007 INVOICES  +001000503 1                           
 000008 APPROVALS +000123613 2                           
 000009 PAYMENTS  +000034713 3                           
 ****** **************************** Bottom of Data ****************************  

you'll notice that the rows are converted to columns now. 


That's it. I hope you now understand the usage of binary zeros. In the next post, I will continue with the rest of the code that we prepared to meet the requirements. 

Should you have any questions/suggestions, please use the Comments section below. Thx.  

