DFSORT - Selecting Duplicate records

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

Hi,

I have to select records from a file which has duplicates. The file layout is below:

Client no ID No Account no
===========================
123344 1 0123344
234455 1 0123344
324521 1 0123344
234531 2 1236789
456879 3 1236789
784322 3 1236789

The file is sorted based on Account no.
For the same Account no and ID no, I have to select the records.

As in the example for Account no '0123344' and ID no '1' there are 3 rows, so this should be written to the file. But in case of Account no '1236789' the ID no changes as '2' and '3' so this should not be written to output.

Only when the ID no is same with an Account no it should be written to output file. If the ID no changes within the same Account no it should not be written to output.

Please let me know if this be achieved with DFSORT. I will be grateful for your ideas.

Thanks and Regards,
Murali

Nic Clouston · Posted: Sun Apr 24, 2011 8:40 pm

I am not clear about what your output is like - is it one example of the duplicates of an account or is it all duplicates of the account or is it ALL records for an account with duplicates? I.E for account 0123344 do you want 1, 2 or 3 records written? If 1, which one? If 2, which one is not to be written?

shrivatsa · Posted: Mon Apr 25, 2011 10:49 am

Hi Murali,

I hope the below SORT output you are expecting

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

Hi Nic/Shrivatsa,

Sorry for not mentioning how my output file should be.

Since for the Account no '1236789' the ID no changes to '2' and '3'. This should not be written to output.

Finally my output file should look like below.

Client no ID No Account no
===========================
123344 1 0123344
234455 1 0123344
324521 1 0123344

Thanks,
Murali.

Frank Yaeger · Posted: Tue Apr 26, 2011 3:13 am

Murali,

So basically are you saying that you only want to output the records for each account number that has only a single ID (e.g. every ID is 1 for that account number, or every ID is 2 for that account number, etc)?

What is the RECFM and LRECL of the input file?

What is the starting position, length and format of each field?

Is the ID always already in ascending order by account number? Or could you have IDs out of order, for example:

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

Hi Frank,

Yes, you are correct. I want to write the records to output for the same Account number that has single ID.

RECFM=FB, LRECL=80.

The fields start position,length and format :

CLIENT NO --> 3,9, Numeric
ID NO --> 29,7,Alphanumeric
ACC NO --> 38,14,Numeric
WIP NO --> 60,9,Numeric

The input file is sorted based on Account number, ID No. So within the same account number, the ID NO will be in sorted order.

In the input, for the Account number '00000000001329' as the ID No changes. This Account number records should not be written to output and remaining records should be written.

The input/output files are attached.

Thanks,
Murali.

Nic Clouston · Posted: Tue Apr 26, 2011 7:49 pm

Well, I am confused - for account 1329 you have duplicate IDs (2 for 7176830 and 2 for 7179998) so why is that account not to be written?

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

My requirement is to write only those Account numbers that have same ID NO.

For account number '1329' there are 2 different ID NO's (7176830 and 7179998) so this should not be written to output.

For example refer to the Account number '30079'. There are 6 records with same ID no '8227896'. So this should be written to output.

Thanks,
Murali.

Skolusu · Posted: Tue Apr 26, 2011 11:00 pm

murali3955,

The following DFSORT JCL will give you the desired results.

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

Hi Skolusu,

Thank you very much:D. The job worked fine.

Almost I was able to understand the job provided by you.But could you please explain me 'SUM FIELDS=(81,3,ZD)'.

I ran the job without SUM FIELDS.For each record in input, there were two records written to the output file.

SUM Fields example :
--------------------
SORT FIELDS=(1,5,CH,A)
SUM FIELDS=(11,4,ZD)

This will sort the fields 1,5 in Ascending order and then If there are any duplicates in 1,5 it will SUM the values in 11,4. Please correct me if I am wrong.

As per the SUM fields in the job, position 81,3 will have '000' since overlayed,when 81,3 and 84,3 are equal.Please let me know how SUM FIELDS=(81,3,ZD) identifies the duplicates and eliminates it without using any SORT fields.

Thanks,
Murali

Skolusu · Posted: Wed Apr 27, 2011 9:55 pm

murali3955 · New User Joined: 25 Nov 2009 Posts: 11 Location: Chennai

Hi Skolusu,

Thank you for your help and explanation.

Now I understood how it works.

Thanks,
Murali.