Sunday, January 24, 2021

How to exclude a group of records using IBM DFSORT?

One of the site I worked for, dealt with Invoices. We used Electronic Data Interchange (EDI) to exchange trading information in a standard format between suppliers/customers. 

EDI eliminates a lot of manual intervention by the transmission of electronic messages between computer systems. Do not confuse the term electronic message with an e-mail message. An email message 📧 is usually not processed by the receiving system whereas, EDI messages are intended to undergo automatic processing in the receving system. 

There is a specific format in EDI to transmit invoice data and it is EDI 810. It is an electronic version of an invoice that contains the information requied for a usual paper-invoice purchase transaction. 

Use of EDI to transmit data can shorten the lead time between invoice receipt and fulfillment of orders. EDI is often referred as 'paperless trading'.

The EDI 810 documents which were received from the site's customers were translated to a proprietary file format as required by the backend application. The translated file usually had a lot of batches and within each batch there were a lot of invoices. An Invoice would have detail record(s) within it. Refer the following pictures for a better understanding on the file's structure.

Note: Click on the picture to get an enlarged view.

Translated EDI 810 file with lot of batches in it.


A batch from the translated EDI 810 file with many invoices; Each invoice with many detail records.


Each batch would have a batch header and batch trailer record. Each invoice within a batch would have an invoice header and an invoice trailer record. Between an invoice header and trailer records, would be the invoice detail records. 

The format of the file had a record type field in each record to identify whether that record is a batch header, a batch trailer, an Invoice header, an Invoice detail, an Invoice trailer and so on. 

This translated file was fed as input to a COBOL DB2 program. During the program's execution, each Invoice and its detail record(s) were fed to an internal table with maximum limit hardcoded as 99,999. Sometimes, an invoice's detail records would exceed 99,999 thereby making the program to fail with ABEND SOC4. The abend occurred so frequently that we had to implement a permanent fix. 

We decided to remove the entire batch which had invoice(s), whose detail records exceeded 99,999.

Experts suggested to write a COBOL program to exclude the batch, but I had other ideas (DFSORT 😎)

If that title sounds familiar, all thanks goes to Mc Dowell's advertisement which featured MS Dhoni 🏏.

The approach we adopted was to uniquely number each record in the following way:
  • Assign a unique ID, 9 bytes long, to each batch in the file.
  • Assign a unique ID, 9 bytes long, to each Invoice that is part of the batch
  • Assign sequence numbers, 9 bytes long, for each record that is part of the Invoice.
We used the numbered file to check if the invoice sequence number exceeded 99,999. If true, the batch ID of that invoice was written to an intermediate file.

JOINKEYS application in DFSORT was used to match the numbered file and the intermediate file to create 2 output files; BYPASS file that had the records of the batch which had to be bypassed (as the invoice detail records exceeded 99,999) and FINAL file that had the records of all other batches (whose invoice detail records doesn't exceed 99,999).

Note: For better understanding, I've created an input file which is somewhat similar to what I had at my site. I'll be using that input file in an example to show how we can exclude a group of records from the file. The ID's and sequence numbers are only 5 bytes long in the example, as opposed to the 9 bytes that we used at our site. Also, in the example we will exclude the batch which holds an invoice with detail records exceeding 20 as opposed to 99,999. 

1st byte of the input file is the Record type field. 
0 - Batch header record. 
1 - Invoice header record. 
8 -  Invoice trailer record. 
9 - Batch trailer record.
Other record types are out of scope for the problem that we are dealing.


 
The input file with 2 batches. Each batch has got an invoice. The 2 batch's invoice data records exceed 20.


Step 1: 
The first step is to assign,
  • a unique ID for each batch,
  • a unique ID for each invoice within that batch,
  • sequence numbers for all the records between an Invoice header and Invoice trailer record.
To do this, we will use DFSORT's IFTHEN WHEN=GROUP feature, with BEGIN and PUSH parameters. 

First step. Assigning unique IDs. 


INREC in DFSORT is used to format fields before sorting.

First INREC IFTHEN condition assigns a unique ID for each batch:

WHEN=GROUP - Used to introduce a group of records to DFSORT. 

BEGIN=(1,1,CH,EQ,C'0') - Each '0' in the first byte indicates the start of a new batch. 

PUSH(41:ID=5) - Assigns an identifier, 5 bytes long starting from 41st position, for each batch. The identifier is +1'd when a new batch is started. 


Second INREC IFTHEN condition assigns a unique ID for each invoice within a batch and sequence of numbers for each records that are part of the invoice. 

WHEN=GROUP - Used to introduce a group of records to DFSORT.

BEGIN=(1,1,CH,EQ,C'1') - Each '1' in the first byte indicates a new invoice within that batch.

PUSH(46:ID=5,51:SEQ=5) - Assigns an identifier, 5 bytes long starting from 46th position, for each Invoice. The identifier is +1'd when a new invoice is started. Also, assigns sequence number, 5 bytes long starting from 51st position. The sequence number will be restarted from 1 whenever a new invoice header record is encountered.

OUTREC in DFSORT is used to format fields after sorting.

The OUTREC IFTHEN conditions given at the last are used to populate zeros, in the place of Invoice ID and Invoice sequence number, in the Batch header and trailer records. 

Running the first step will give the following output. 
This is how the numbered file looks like.
Col 41 thru 45 contains the Batch ID. 
Col 46 thru 50 contains the Invoice ID. 
Col 51 thru 55 contains the Invoice Sequence number. 


Step 2: 
The second step is to find that batch ID whose invoice sequence number exceeds 20. Output file created from step 1 is passed as input to this step. It's evident from the input file that Batch 2 contains an Invoice which has got more than 20 detail records. 


INCLUDE COND=(51,5,ZD,EQ,21) - Only those records whose sequence number equals 21 is included for the processing. 

SORT FIELDS=(41,5,CH,A) - SORT the file based on the batch ID. 

SUM FIELDS=NONE - Remove duplicates. Sometimes, there can be multiple invoices within a batch with sequence number exceeding 20. In such cases, Batch ID might be written more than once. Hence, we are removing duplicates. 

OUTREC FIELDS=(1:41,5,80:X) - Write the Batch ID to an intermediate output file. 

We get the following output after running Step 2:
Batch ID whose invoice number is exceeding 20, is written to the output file.


Step 3:
The final step is to use JOINKEYS application in DFSORT to write 2 output files. One with the excluded batch and the other one with batch that should be sent as input to COBOL DB2 program. 



Input files to be used:
The output file from Step 2 (the one contains the batch ID which should be excluded).
The output file from Step 1 (the numbered file). 

Note: The order of the input files in JCL should be in such a way that the file with no duplicates should be provided first. 

JOINKEYS FILES=F1,FIELDS=(1,5,A) - Defines the key field from the first file with which the matching should occur. 

JOINKEYS FILES=F1,FIELDS=(41,5,A) - Defines the key field from the second file with which the matching should occur. 

REFORMAT FIELDS=(F2:1,55,F1:1,5,?) - The REFORMAT statement indicates the fields from File 1 and File 2 that we should include in the joined records. '?' symbol is used as an indicator with the following possible values:
"B" - indicates that it is a paired record.
"1" - indicates that it is an unpaired record from File 1.
"2" - indicates that it is an unpaired record from File 2. 

JOIN UNPAIRED,F2 - This statement is similar to RIGHT JOIN in DB2. The joined records will contain both the paired and unpaired records from File 2. 

SORT FIELDS=COPY - Sort statement. 

OUTFIL FNAMES=BYPASS,BUILD=(1,55),INCLUDE=(61,1,CH,EQ,C'B') - The output file with BYPASS as the DD name in the JCL will be written with the paired records i.e, all the records in the input file 2 which have 00002 as the Batch ID. 

OUTFIL FNAMES=FINAL,BUILD=(1,40),INCLUDE=(61,1,CH,EQ,C'2') - The output file with FINAL as the DD name in the JCL will be written with the unpaired records from File 2 i.e., the records whose batch ID is not 00002. 

After running Step 3, we get the following output files. 

The BYPASS file that contains all the records with Batch ID 00002.


The FINAL file which will be fed as input to the COBOL DB2 program. 


At our site, we used the BYPASS file in subsequent steps to extract crucial information such as the Batch number, Invoice number and so on, in order to send an email to business with details about the batches that were omitted for the day. 

That's it. As always, thanks much for reading. Have any questions or suggestions? Please post them in the comments section below. Thx!

P.S. Screen grab of Mainframe screens throughout this post were taken from Master the Mainframe system.  
 


1 comment:

  1. That's brilliant.. keep posting stuff like this

    ReplyDelete