Kubernetes Task Scheduling with Job and CronJob, with Source Code Analysis (Part 2)
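In the Job controller source code, we can see this logic:
The Job controller first checks whether the number of retries has exceeded the configured BackoffLimit. If it has not, the controller checks whether ActiveDeadlineSeconds is set. If it is set, the controller checks whether the Job has been running longer than ActiveDeadlineSeconds allows, and if so it marks the Job as failed.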
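The order of these checks can be sketched in isolation. This is a minimal sketch, not the controller's code: jobState and failureReason are hypothetical names, the running time is passed in as plain seconds, and the comparison Retries > BackoffLimit is a simplifying assumption about how the retry count is compared. The two reason strings mirror the reasons Kubernetes reports on a failed Job.

```go
package main

import "fmt"

// jobState is a hypothetical, simplified stand-in for the fields the
// controller consults. It is not the real batch/v1 Job type.
type jobState struct {
	Retries               int32  // retries observed so far (assumed counter)
	BackoffLimit          int32  // spec.backoffLimit
	ActiveDeadlineSeconds *int64 // spec.activeDeadlineSeconds, nil when unset
	RunningSeconds        int64  // how long the Job has been running
}

// failureReason returns a non-empty reason when the Job should be marked
// failed, checking the backoff limit first and the deadline second.
func failureReason(j jobState) string {
	// Assumption: the Job fails once retries exceed the limit.
	if j.Retries > j.BackoffLimit {
		return "BackoffLimitExceeded"
	}
	// The deadline is only checked when it has been set.
	if j.ActiveDeadlineSeconds != nil && j.RunningSeconds >= *j.ActiveDeadlineSeconds {
		return "DeadlineExceeded"
	}
	return ""
}

func main() {
	deadline := int64(60)
	j := jobState{Retries: 1, BackoffLimit: 6, ActiveDeadlineSeconds: &deadline}

	j.RunningSeconds = 30
	fmt.Printf("%q\n", failureReason(j)) // ""

	j.RunningSeconds = 90
	fmt.Printf("%q\n", failureReason(j)) // "DeadlineExceeded"

	j.Retries = 7
	fmt.Printf("%q\n", failureReason(j)) // "BackoffLimitExceeded"
}
```

Because the backoff check comes first, a Job that has exhausted its retries is reported as such even if its deadline has also passed.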
...
jobHaveNewFailure := failed > job.Status.Failed
exceedsBackoffLimit := jobHaveNewFailure ...
...
    ... i < diff; i++ {
        // Delete the surplus active pods concurrently
        go func(ix int32) {
            defer wait.Done()
            if err := jm.podControl.DeletePod(job.Namespace, activePods[ix].Name, job); err != nil {
                // Decrement the expected number of deletes because the informer won't observe this deletion
                jm.expectations.DeletionObserved(jobKey)
                if !apierrors.IsNotFound(err) {
                    klog.V(2).Infof("Failed to delete %v, decremented expectations for job %q/%q", activePods[ix].Name, job.Namespace, job.Name)
                    activeLock.Lock()
                    active++
                    activeLock.Unlock()
                    errCh <- err
                    utilruntime.HandleError(err)
                }
            }
        }(i)
    }
    wait.Wait()
// If fewer pods are active than the parallelism set on the job, new pods must be created
} else if active < parallelism {
    wantActive := int32(0)
    // If Completions is not set, the number of active pods should equal parallelism;
    // once any pod has succeeded, no new pods are created.
    if job.Spec.Completions == nil {
        if succeeded > 0 {
            wantActive = active
        } else {
            wantActive = parallelism
        }
    // If Completions is set, compare Completions with succeeded;
    // if wantActive would exceed parallelism, cap it at parallelism.
    } else {
        // Job specifies a specific number of completions. Therefore, number
        // active should not ever exceed number of remaining completions.
        wantActive = *job.Spec.Completions - succeeded
        if wantActive > parallelism {
            wantActive = parallelism
        }
    }
    // Compute diff, the number of pods still to create
    diff := wantActive - active
    if diff < 0 {
        utilruntime.HandleError(fmt.Errorf("More active than wanted: job %q, want %d, have %d", jobKey, wantActive, active))
        diff = 0
    }
    // There are already enough pods; nothing more to create
    if diff == 0 {
        return active, nil
    }
    jm.expectations.ExpectCreations(jobKey, int(diff))
    errCh = make(chan error, diff)
    klog.V(4).Infof("Too few pods running job %q, need %d, creating %d", jobKey, wantActive, diff)
    active += diff
    wait := sync.WaitGroup{}
    // Pods are created in batches of 1, 2, 4, 8, ..., growing exponentially
    for batchSize := int32(integer.IntMin(int(diff), controller.SlowStartInitialBatchSize)); diff > 0; batchSize = integer.Int32Min(2*batchSize, diff) {
        errorCount := len(errCh)
        wait.Add(int(batchSize))
        for i := int32(0); i < batchSize; i++ {
            // Create the pods of one batch concurrently
            go func() {
                defer wait.Done()
                // Create the pod
                err := jm.podControl.CreatePodsWithControllerRef(job.Namespace, ...
...
... echo Hello from the Kubernetes cluster
restartPolicy: OnFailure
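Going back to the Job controller excerpt, the two calculations at its core (how many pods should be active, and how the missing pods are split into batches) can be isolated as a small sketch. The function names wantActivePods and slowStartBatches are hypothetical. The sketch assumes every creation succeeds and that diff shrinks by the batch size after each round, a step that falls in the truncated part of the excerpt. The initial batch size of 1 is taken from the "1, 2, 4, 8" comment.

```go
package main

import "fmt"

// min32 returns the smaller of two int32 values.
func min32(a, b int32) int32 {
	if a < b {
		return a
	}
	return b
}

// wantActivePods follows the branch in the excerpt. A nil completions
// pointer stands for an unset spec.completions.
func wantActivePods(completions *int32, parallelism, active, succeeded int32) int32 {
	if completions == nil {
		if succeeded > 0 {
			return active // something already succeeded: start nothing new
		}
		return parallelism
	}
	// Never run more pods than there are completions left, capped at parallelism.
	return min32(*completions-succeeded, parallelism)
}

// slowStartBatches lists the batch sizes used to create diff pods,
// starting at initial and doubling each round (hypothetical helper).
func slowStartBatches(diff, initial int32) []int32 {
	var batches []int32
	for batch := min32(diff, initial); diff > 0; batch = min32(2*batch, diff) {
		batches = append(batches, batch)
		diff -= batch // assumed: remaining work shrinks by the batch just created
	}
	return batches
}

func main() {
	completions := int32(5)
	fmt.Println(wantActivePods(nil, 3, 1, 0))          // 3
	fmt.Println(wantActivePods(&completions, 3, 1, 3)) // 2
	fmt.Println(slowStartBatches(10, 1))               // [1 2 4 3]
}
```

Starting small means that if pod creation is going to fail, for example because of a quota, only a handful of requests are wasted before the failure is noticed.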
The hello CronJob from the manifest above creates a Pod every minute:
$ kubectl get pod
NAME                     READY   STATUS              RESTARTS   AGE
hello-1596406740-tqnlb   0/1     ContainerCreating   0          8s
The CronJob records the time of its most recent schedule:
$ kubectl get cronjob hello
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     1        16s             2m33s
spec.concurrencyPolicy
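If the schedule interval is too short, a new Pod may be created before the previous run has finished. We can therefore set spec.concurrencyPolicy to define how such overlapping runs are handled.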
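Kubernetes documents three values for this field: Allow (the default) lets runs overlap, Forbid skips the new run while the previous one is still active, and Replace stops the running Job and starts the new one. The decision can be sketched as follows. The action type and the decide function are hypothetical names for illustration, not the CronJob controller's code.

```go
package main

import "fmt"

// action describes what to do when a schedule time arrives (hypothetical type).
type action string

const (
	startNew        action = "start a new Job"
	skipRun         action = "skip this run"
	replaceAndStart action = "delete the running Job, then start a new one"
)

// decide picks an action from the policy and the number of Jobs still
// active from earlier runs. An empty policy is treated as the default, Allow.
func decide(policy string, activeJobs int) (action, error) {
	switch policy {
	case "", "Allow", "Forbid", "Replace":
		// known value, continue below
	default:
		return "", fmt.Errorf("unknown concurrencyPolicy %q", policy)
	}
	// Without an overlapping run the policy makes no difference.
	if activeJobs == 0 || policy == "" || policy == "Allow" {
		return startNew, nil
	}
	if policy == "Forbid" {
		return skipRun, nil
	}
	return replaceAndStart, nil
}

func main() {
	// One Job from the previous run is still active in each case.
	for _, p := range []string{"Allow", "Forbid", "Replace"} {
		a, err := decide(p, 1)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-8s -> %s\n", p, a)
	}
}
```

For the every-minute hello CronJob, Forbid would be the conservative choice if a single run could take longer than a minute.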