|
|
| Author |
Message |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
I work in a shop which has several gdg bases that many jobs write to on a daily basis. A nightly batch job copies the gdg base, processes the copy, and deletes the generations at the end of the job.
This works fine except for the following scenario, which results in lost transactions.
1. The nightly batch job, JOBB, which copies the gdg base, processes the copy, and deletes the generations at the end of the job, abends after the copy step.
2. While JOBB is down, JOBX, which creates a new generation that JOBB should process, runs and creates a new generation.
3. The abend for JOBB is resolved and JOBB is restarted from the step that abended. In other words, it doesn't process the new generation JOBX created while JOBB was down.
4. JOBB runs to eoj and thus deletes the unprocessed generation in its delete step at the end of the job.
So far it's up to us to watch the abends and catch these situations and reprocess the missing transactions.
But I think there's got to be a better way. My idea is:
1. Change the copy step to a utility that will cause an abend if something goes wrong, unlike idcams which will just give a bad return code.
2. Code DISP=(OLD,DELETE,KEEP) on the copy step.
I think this should work because if JOBB abends and JOBX makes a new generation while JOBB is down, the new generation won't be processed by the restart of JOBB, but it won't be deleted either. That way the next run of JOBB will pick up the generation JOBX made while JOBB was down.
And if JOBB is rerun from the top instead of restarted, the new generation will be picked up by that rerun of JOBB.
This seems like such an obvious solution to me I'm guessing there must be something I'm missing or they would have coded it this way to begin with. So I'm asking here to see what you guys think.
What's the right way to handle this situation? |
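For illustration, the copy step proposed above might look something like the sketch below. The data set names, space values, and the choice of SORT as the copy utility are all assumptions, not taken from the actual job. Whether the sort product abends or only sets a high return code on an error is typically an installation option, so that would need checking.

```jcl
//* Sketch only: data set names and space values are hypothetical.
//COPYGDG  EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Coding the base name with no relative generation number
//* allocates all cataloged generations as one concatenation.
//* Normal end of step: the generations are deleted.
//* Abend in this step: the generations are kept.
//SORTIN   DD DSN=PROD.TRANS.DAILY,DISP=(OLD,DELETE,KEEP)
//SORTOUT  DD DSN=PROD.TRANS.WORK,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
  OPTION COPY
/*
```

Note that the third DISP subparameter only applies when the step abends. A step that ends with a nonzero return code still counts as a normal end, so DELETE would apply, which is why point 1 of the idea matters.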
 |
References
|
Posted: Thu Aug 23, 2007 5:06 pm Post subject: Re: Multiple jobs writing to same gdg. Best practice? |
 |
|
|
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| I did search for this before posting. But sometimes my search skills aren't the best. If this is covered somewhere else, kindly point me in the right direction. Better yet, post where you searched and the keywords you used. |
 |
bijumon
New User
Joined: 14 Aug 2006 Posts: 21 Location: Pune,India
|
|
|
|
Hi,
How is the GDG base specified in JOBB? Is it the current generation of the GDG created by JOBX, or the entire GDG (all generations)? Please provide more details...
Thanks & Regards,
----------------------
Biju |
 |
IQofaGerbil
Active User
Joined: 05 May 2006 Posts: 190 Location: Scotland
|
|
|
|
| Isn't this a scheduling problem? |
 |
William Thompson
Global Moderator
Joined: 18 Nov 2006 Posts: 2978 Location: Tucson AZ
|
|
|
|
| Quote: |
| Nightly batch job, JOBB, copies the gdg base, processes the copy, and deletes the gdg base at the end of the job, abends after the copy step. |
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity, either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
| Quote: |
| Change the copy step to a utility that will cause an abend if something goes wrong, unlike idcams which will just give a bad return code. |
Set the cond on the process to fail if the IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero. |
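As a rough sketch of both suggestions in this reply (delete moved ahead of the processing, and COND tests on the copy), the steps might be arranged as below. The data set names, DCB attributes, PROCPGM, and ABENDPGM are hypothetical; ABENDPGM stands in for whatever site utility forces an abend.

```jcl
//* Sketch only: program and data set names are hypothetical.
//COPY     EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//IN       DD DSN=PROD.TRANS.DAILY,DISP=OLD
//OUT      DD DSN=PROD.TRANS.WORK,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE),
//            DCB=(RECFM=FB,LRECL=80)
//SYSIN    DD *
  REPRO INFILE(IN) OUTFILE(OUT)
/*
//* Bypassed when COPY ended with a return code other than 0.
//DELGENS  EXEC PGM=IEFBR14,COND=(0,NE,COPY)
//GENS     DD DSN=PROD.TRANS.DAILY,DISP=(OLD,DELETE,KEEP)
//* Bypassed when COPY ended with return code 0, so it runs
//* (and abends) only after a failed copy.
//FORCEAB  EXEC PGM=ABENDPGM,COND=(0,EQ,COPY)
//* Bypassed unless both COPY and DELGENS ended with 0.
//PROCESS  EXEC PGM=PROCPGM,COND=((0,NE,COPY),(0,NE,DELGENS))
//INPUT    DD DSN=PROD.TRANS.WORK,DISP=SHR
```

A COND test reads as "bypass this step if the test is true", so COND=(0,NE,COPY) skips the step when 0 is not equal to the return code from COPY.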
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| bijumon wrote: |
Hi,
How is the GDG base specified in JOBB? Is it the current generation of the GDG created by JOBX, or the entire GDG (all generations)? Please provide more details...
Thanks & Regards,
----------------------
Biju |
The gdg base in JOBB is just that. All generations. |
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| IQofaGerbil wrote: |
| Isn't this a scheduling problem? |
I don't think so. There is no reason that the jobs that create new generations should run while the job that processes them is down. |
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| William Thompson wrote: |
| Quote: |
| Nightly batch job, JOBB, copies the gdg base, processes the copy, and deletes the gdg base at the end of the job, abends after the copy step. |
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity, either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
I figured I'd get this response. We talked about it, but as long as the copy and the delete are in two different steps, the same issue exists, albeit to a lesser degree. That's the reason for my suggesting disp=(old,delete,keep). That way they aren't just 'treated as one'. They actually are one.
| Quote: |
| Change the copy step to a utility that will cause an abend if something goes wrong, unlike idcams which will just give a bad return code. |
Set the cond on the process to fail if the IDCAMS returns anything greater than zero. If your scheduler can't handle that, add a step after the delete that will produce an abend and set it to be skipped if cond = zero. |
If we do that, the delete is still in a different step and the issue still exists as noted above. |
 |
dick scherrer
Global Moderator
Joined: 23 Nov 2006 Posts: 8060 Location: 221 B Baker St
|
|
|
|
Hello,
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation.
The nightly process job was started and was able to run independent of the sales/distribution runs that created new generations as the gdg was never mentioned in the process job. We needed this because there were something like 422 sales/distribution centers and we could not control when they would complete their daily processing and upload the sales and shipping info.
Please let me know if I've not been clear. |
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| Quote: |
Your best bet would be to slide the processing to after the base delete and treat the copy/delete as a single entity, either they both run or they are both rerun together. Once the delete is done, process the copy and don't worry about it......
|
One of the goals is to have it so the job can restart in the same step it abended in. This isn't a requirement proper so it's a possible solution. But so far nobody has addressed whether disp=(old,delete,keep) works here. |
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| dick scherrer wrote: |
Hello,
In a similar situation, we ran a job that copied all of the generations to a "processing" dataset and deleted all of the cataloged versions before the process job ever started. It also cataloged a "new" empty generation. |
But the same question applies, just for the job that does the copy, or no?
Errr, I forgot an important point
There are hundreds of jobs that write to the gdg bases in question. And while one goal is to prevent generations from being overwritten, as I've stated, the main goal is to avoid coding conflicts in ca-7. The reason is that we create and delete jobs rapidly, and updating the scheduler with the conflicts is too burdensome. So, I guess the question of 'Is this a scheduling issue?' has merit.
Since throughput manager handles the dataset conflicts, I figured that coding disp=old,delete,keep would handle the restart situation and we'd not have to code the conflicts. |
 |
dick scherrer
Global Moderator
Joined: 23 Nov 2006 Posts: 8060 Location: 221 B Baker St
|
|
|
|
Hello,
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg. |
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
| dick scherrer wrote: |
Hello,
If the scheduling system is told to manage that particular dataset, it surely does look like a "scheduling issue".
What if that dataset was removed from being "scheduled"? Big-OZ will not let anything "bad" happen if the DISPs are proper for the job that initially copies all of the accumulated generations, deletes them, and catalogs the new "empty" starter generation. The worst thing we had happen was that one or more of the processes that needed to catalog the +1's had to wait until the pre-processor completed. We only let the jobs that created +1's run single thread anyway - they all had the same jobname. They were often a spin-off of a more involved process and existed only to copy the needed data to a new +1 of the common gdg. |
Currently, the scheduling system is just told not to let any of the +1 jobs run while the copy job is running. That's what we want to avoid: defining that to the scheduler. It's tedious, we have lots of jobs, and there is high 'job turnover'. In this case, the jobs that create the +1's aren't the same. They're totally different. And there isn't an issue if there are no abends with the copy job. The issue comes in if there is an abend after the copy step but before the delete step finishes: if a new generation is created during the downtime and the job is restarted from the point of abend, the copy doesn't pick up the new generation but the delete removes it.
I proposed the solution given here of moving the delete step to the one right after the copy step. The response I got was we couldn't do that because if the delete step abended the same situation exists just not as likely to happen. So even if we had the delete step right after the copy step, we'd still need to define the job conflicts to the scheduler.
It's slightly more complicated since there are about a dozen of these gdg bases that get copied in this job and the abend might happen on the 3rd one. And part of the goal is to make this bulletproof in the event of the job being restarted incorrectly.
That's why I figured we could put a sort step in that copied the input to both the temp file to get processed and the backup and code a disp=old,delete,keep on the input. So if the sort completes successfully, we have the backup to revert to if need be in case the job is restarted improperly and at the same time we don't have to worry about missing a new generation created by a +1 job if the copy job abends.
So, is there an issue with coding disp=old,delete,keep? I mean why have the delete step in a different step in the first place? That's what I'm trying to get at: What's the rationale behind creating a separate delete step vs old,delete,keep? |
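To make the sort step described above concrete, here is a minimal sketch using DFSORT's OUTFIL to write the same records to two outputs. All data set names are hypothetical, and the backup is assumed to be a separate, already defined GDG.

```jcl
//* Sketch only: all data set names are hypothetical, and the
//* backup GDG base PROD.TRANS.BACKUP is assumed to exist.
//COPY2    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//* Generations are deleted on a normal end, kept on an abend.
//SORTIN   DD DSN=PROD.TRANS.DAILY,DISP=(OLD,DELETE,KEEP)
//WORKOUT  DD DSN=PROD.TRANS.WORK,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//BACKUP   DD DSN=PROD.TRANS.BACKUP(+1),
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(50,10),RLSE)
//SYSIN    DD *
  OPTION COPY
  OUTFIL FNAMES=(WORKOUT,BACKUP)
/*
```

Each input record is copied to both WORKOUT and BACKUP in a single pass, so the copy and the backup come from exactly the same set of generations.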
 |
MtClimber
New User
Joined: 31 Aug 2007 Posts: 1 Location: Milwaukee WI
|
|
|
|
jasorn;
Try the "switch dataset" technique. Here's how it works.
In your JOBB that currently has 3 steps: copies, processes, and deletes the GDG files, add a step ahead of those that deletes a dummy file. Call it file A. After the copy, process and delete steps add a new last step to catalog that dummy file, file A. So JOBB will now have 5 steps.
Now add a new first step to JOBX that uses file A. The way this works is when JOBB starts, the first thing that happens is file A is deleted. If a JOBX starts, it will try to use file A and immediately abend with a dataset not found. JOBX will only run after the 5th step in JOBB runs to re-create file A. You can take all the time in the world to resolve problems with JOBB because JOBX will not run until that 5th step runs to re-create file A. All JOBX abends can easily be restarted from the top after JOBB finishes.
To get this all working, you will, one time, have to create a file A. You can catalog the dummy file A with IEBGENER with SYSUT1 dummied. For your delete step, I would recommend using IDCAMS with a DELETE A sysin card.
I hope this helps.
MtClimber |
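A minimal sketch of the switch dataset steps described in this post might look like the following. PROD.JOBB.SWITCH stands in for "file A", and every name and attribute here is an assumption for illustration.

```jcl
//* Sketch only: PROD.JOBB.SWITCH stands in for "file A".
//* --- JOBB, new first step: remove the switch data set ---
//DELSW    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE PROD.JOBB.SWITCH
/*
//* ... existing copy, process, and delete steps ...
//* --- JOBB, new last step: catalog the switch again ---
//CATSW    EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DUMMY,DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)
//SYSUT2   DD DSN=PROD.JOBB.SWITCH,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1)),
//            DCB=(RECFM=FB,LRECL=80,BLKSIZE=800)
//* --- JOBX, new first step: fails if the switch is missing ---
//CHKSW    EXEC PGM=IEFBR14
//SWITCH   DD DSN=PROD.JOBB.SWITCH,DISP=SHR
```

One thing to watch: if JOBB is rerun from the top while the switch is already gone, the IDCAMS DELETE ends with condition code 8, so the following steps' COND handling would need to allow for that.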
 |
jasorn
Active User
Joined: 12 Jul 2006 Posts: 109
|
|
|
|
I'm not sure how to read the fact that nobody addressed whether there is an issue with using the disp=old,delete,keep in the copy step and forgoing the delete step altogether. Not sure if that means everyone thinks it's problematic or if nobody's considered it?
All of the suggestions given here are good, appreciated, and ones we're considering, but they are more 'involved' than changing the disp on the copy step and eliminating the delete step. This seems to work in the testing I've done so far, but I wanted to see if anyone here knew if there were issues with that approach. |
 |
|
|
|