k8s garbage collector analysis (1): startup analysis
Date: 2021-09-12

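Introduction to the garbage collector

The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether its related objects should be deleted as well.

More precisely, when an owner is deleted with the cascading deletion policy, the owner's dependent objects are deleted along with it.

To track these relationships, the garbage collector watches resource object events and uses the ownerReference values in each object to build the dependency relationships between objects, that is, the relationships between owners and dependents.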

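About owners and dependents

Take the creation of a deployment object as an example.

After a deployment object is created, kube-controller-manager creates a replicaset object for it and automatically records the deployment's information in the replicaset's ownerReference. In the example below, the owner of replicaset test-1-59d7f45ffb is deployment test-1, and the dependent of deployment test-1 is replicaset test-1-59d7f45ffb.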

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-1
  namespace: test
  uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
...


apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: test-1-59d7f45ffb
  namespace: test
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: test-1
    uid: 4973d370-3221-46a7-8d86-e145bf9ad0ce
  uid: 386c380b-490e-470b-a33f-7d5b0bf945fb
...

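Likewise, after a replicaset object is created, kube-controller-manager creates pod objects for it, and each pod records the replicaset's information in its own ownerReference. The replicaset is the owner of the pods, and the pods are the dependents of the replicaset.

The ownerReference values in an object specify the relationship between owner and dependent.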
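
The owner check implied by ownerReference can be sketched in a few lines of Go. This is only an illustration: the OwnerReference and Object types below are simplified, hypothetical stand-ins (not the real Kubernetes API types), and isOwnedBy and example are names made up for this sketch. The UIDs are the ones from the YAML above.

```go
package main

import "fmt"

// OwnerReference is a simplified, hypothetical stand-in for the real API
// type; it keeps only some of the fields shown in the YAML above.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
	Controller bool
}

// Object is a hypothetical minimal object: identity plus owner references.
type Object struct {
	Kind            string
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// isOwnedBy reports whether dep lists owner in its ownerReferences.
// The comparison uses the UID, which uniquely identifies an object.
func isOwnedBy(dep, owner Object) bool {
	for _, ref := range dep.OwnerReferences {
		if ref.UID == owner.UID {
			return true
		}
	}
	return false
}

// example rebuilds the deployment/replicaset pair from the YAML above.
func example() (Object, Object) {
	deploy := Object{Kind: "Deployment", Name: "test-1", UID: "4973d370-3221-46a7-8d86-e145bf9ad0ce"}
	rs := Object{
		Kind: "ReplicaSet",
		Name: "test-1-59d7f45ffb",
		UID:  "386c380b-490e-470b-a33f-7d5b0bf945fb",
		OwnerReferences: []OwnerReference{{
			APIVersion: "apps/v1",
			Kind:       "Deployment",
			Name:       deploy.Name,
			UID:        deploy.UID,
			Controller: true,
		}},
	}
	return deploy, rs
}

func main() {
	deploy, rs := example()
	fmt.Println("replicaset owned by deployment:", isOwnedBy(rs, deploy))
	fmt.Println("deployment owned by replicaset:", isOwnedBy(deploy, rs))
}
```

The relationship is stored only on the dependent; the owner carries no list of its dependents, which is why the garbage collector has to build that view itself.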

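garbage collector architecture

The most important code in the garbage collector lives in two files: garbagecollector.go and graph_builder.go.

The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan):

1 graph

(1) uidToNode: the object dependency graph, maintained by GraphBuilder, which records the dependency relationships between all objects. Every k8s object corresponds to one node in the graph, and every node maintains a list of owners and a list of dependents.

Example: given deployment A, replicaset B (owner: deployment A), and pod C (owner: replicaset B), the dependency graph looks like this:

There are 3 nodes: A, B, and C.

A corresponds to a node with no owner and with B in its dependent list;
B corresponds to a node with A in its owner list and C in its dependent list;
C corresponds to a node with B in its owner list and no dependents.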
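
The A, B, C example can be written down as a small Go sketch. It assumes a much simpler node than the real one: the node and graph types and the add, dependentsOf, and buildExample helpers are hypothetical names chosen for illustration, and objects are identified by the letters A, B, C instead of real UIDs.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a simplified, hypothetical graph node: it keeps the UIDs of its
// owners and a set of its direct dependents.
type node struct {
	uid        string
	owners     []string
	dependents map[string]*node
}

// graph maps an object UID to its node, loosely mirroring uidToNode.
type graph map[string]*node

// add inserts an object and links it into the dependents set of every owner
// that is already present in the graph.
func (g graph) add(uid string, owners ...string) {
	n := &node{uid: uid, owners: owners, dependents: map[string]*node{}}
	g[uid] = n
	for _, o := range owners {
		if ownerNode, ok := g[o]; ok {
			ownerNode.dependents[uid] = n
		}
	}
}

// dependentsOf returns the sorted UIDs of the direct dependents of uid.
func (g graph) dependentsOf(uid string) []string {
	n, ok := g[uid]
	if !ok {
		return nil
	}
	out := make([]string, 0, len(n.dependents))
	for d := range n.dependents {
		out = append(out, d)
	}
	sort.Strings(out)
	return out
}

// buildExample builds the deployment A / replicaset B / pod C graph.
func buildExample() graph {
	g := graph{}
	g.add("A")
	g.add("B", "A")
	g.add("C", "B")
	return g
}

func main() {
	g := buildExample()
	for _, uid := range []string{"A", "B", "C"} {
		fmt.Printf("%s owners=%v dependents=%v\n", uid, g[uid].owners, g.dependentsOf(uid))
	}
}
```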

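2 processors

(1) GraphBuilder: maintains the dependency graph of all objects and produces the events that trigger GarbageCollector to reclaim objects. GraphBuilder consumes events from the graphChanges queue and, based on the ownerReference values in each resource object, builds, updates, and deletes entries in the dependency graph, that is, the graph of owner and dependent relationships. It then acts as a producer and puts events into the attemptToDelete and attemptToOrphan queues, which triggers GarbageCollector to check whether related objects need to be reclaimed. GarbageCollector in turn relies on the uidToNode graph when it reclaims objects.

(2) GarbageCollector: reclaims objects. As a consumer, GarbageCollector takes events from the attemptToDelete and attemptToOrphan queues and processes them. If an object has been deleted with the cascading deletion policy, its related objects are reclaimed too: deleting an owner this way also deletes the owner's dependents.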

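3 event queues

(1) graphChanges: events obtained by list/watch against the apiserver; produced by the informers and consumed by GraphBuilder;

(2) attemptToDelete: the cascading deletion event queue; produced by GraphBuilder and consumed by GarbageCollector;

(3) attemptToOrphan: the orphan deletion event queue; produced by GraphBuilder and consumed by GarbageCollector.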
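
The producer/consumer flow through these queues can be sketched with plain Go channels. This is a toy model under several assumptions: the event type and the pipeline function are hypothetical, the queues are buffered channels rather than rate-limited work queues, everything runs sequentially, and the attemptToOrphan path is left out to keep the sketch short.

```go
package main

import "fmt"

// event is a hypothetical, stripped-down graph change event.
type event struct {
	kind  string // "add" or "delete"
	uid   string
	owner string // empty when the object has no owner
}

// pipeline pushes events through a toy GraphBuilder stage and returns the
// UIDs that a toy GarbageCollector stage would attempt to delete.
func pipeline(events []event) []string {
	graphChanges := make(chan event, len(events))
	attemptToDelete := make(chan string, len(events))

	// Producer: stands in for the informers.
	for _, e := range events {
		graphChanges <- e
	}
	close(graphChanges)

	// GraphBuilder: consumes graphChanges, maintains owner -> dependents,
	// and enqueues the dependents of every deleted owner.
	dependents := map[string][]string{}
	for e := range graphChanges {
		switch e.kind {
		case "add":
			if e.owner != "" {
				dependents[e.owner] = append(dependents[e.owner], e.uid)
			}
		case "delete":
			for _, d := range dependents[e.uid] {
				attemptToDelete <- d
			}
			delete(dependents, e.uid)
		}
	}
	close(attemptToDelete)

	// GarbageCollector: consumes attemptToDelete.
	var deleted []string
	for uid := range attemptToDelete {
		deleted = append(deleted, uid)
	}
	return deleted
}

func main() {
	got := pipeline([]event{
		{kind: "add", uid: "A"},
		{kind: "add", uid: "B", owner: "A"},
		{kind: "add", uid: "C", owner: "B"},
		{kind: "delete", uid: "A"},
	})
	fmt.Println("attempt to delete:", got)
}
```

In this model, deleting A only enqueues B. C would follow once the deletion of B comes back as its own event, which is how a cascade moves down the graph one level at a time.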

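Analysis of the garbage collector startup flags

The startup flags of the kcm (kube-controller-manager) component that relate to the garbage collector are defined in the following code: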

// cmd/kube-controller-manager/app/options/garbagecollectorcontroller.go
// AddFlags adds flags related to GarbageCollectorController for controller manager to the specified FlagSet.
func (o *GarbageCollectorControllerOptions) AddFlags(fs *pflag.FlagSet) {
    if o == nil {
        return
    }

    fs.Int32Var(&o.ConcurrentGCSyncs, "concurrent-gc-syncs", o.ConcurrentGCSyncs, "The number of garbage collector workers that are allowed to sync concurrently.")
    fs.BoolVar(&o.EnableGarbageCollector, "enable-garbage-collector", o.EnableGarbageCollector, "Enables the generic garbage collector. MUST be synced with the corresponding flag of the kube-apiserver.")
}

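As the code shows, two kcm startup flags relate to the garbage collector:

(1) enable-garbage-collector: whether to enable the garbage collector; defaults to true;

(2) concurrent-gc-syncs: the number of workers the garbage collector runs for sync operations; defaults to 20.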
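
To see how the defaults and overrides behave, here is a minimal sketch that registers the same two flag names with Go's standard flag package instead of pflag, which the real code uses. The gcOptions struct and the parseGCFlags function are hypothetical names for this illustration; the defaults are the ones stated above.

```go
package main

import (
	"flag"
	"fmt"
)

// gcOptions is a hypothetical, reduced options struct for this sketch.
type gcOptions struct {
	ConcurrentGCSyncs      int
	EnableGarbageCollector bool
}

// parseGCFlags registers the two flags on a fresh FlagSet with the defaults
// stated above (20 workers, collector enabled) and parses args.
func parseGCFlags(args []string) (gcOptions, error) {
	o := gcOptions{ConcurrentGCSyncs: 20, EnableGarbageCollector: true}
	fs := flag.NewFlagSet("gc", flag.ContinueOnError)
	fs.IntVar(&o.ConcurrentGCSyncs, "concurrent-gc-syncs", o.ConcurrentGCSyncs,
		"number of garbage collector workers allowed to sync concurrently")
	fs.BoolVar(&o.EnableGarbageCollector, "enable-garbage-collector", o.EnableGarbageCollector,
		"enables the generic garbage collector")
	err := fs.Parse(args)
	return o, err
}

func main() {
	defaults, _ := parseGCFlags(nil)
	fmt.Printf("defaults: %+v\n", defaults)

	custom, err := parseGCFlags([]string{"--concurrent-gc-syncs=5", "--enable-garbage-collector=false"})
	fmt.Printf("custom: %+v err=%v\n", custom, err)
}
```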

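The source code analysis of the garbage collector is split into two parts:

(1) startup analysis;

(2) analysis of the core processing logic.

This post covers the startup analysis.

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

The startGarbageCollectorController function serves as the entry point of the garbage collector source code analysis.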

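startGarbageCollectorController

The main logic of the startGarbageCollectorController function is:

(1) Decide whether to enable the garbage collector based on the EnableGarbageCollector variable, whose value comes from the kcm startup flag --enable-garbage-collector and defaults to true. If the garbage collector is disabled, the function returns immediately and nothing below is executed;

(2) Initialize discoveryClient, which is mainly used to discover all resources in the cluster;

(3) Call garbagecollector.GetDeletableResources to get all the resources in the cluster that the garbage collector has to handle. A resource that supports the delete, list, and watch verbs is called a deletableResource;

(4) Call garbagecollector.NewGarbageCollector to initialize the garbage collector;

(5) Call garbageCollector.Run to start the garbage collector;

(6) Call garbageCollector.Sync to periodically (every 30 seconds in the code below) rediscover the deletableResources in the cluster. When new deletableResources appear, they are synced into the monitors so that all resources in the cluster stay monitored;

(7) Return an HTTP handler that registers a debug endpoint. The endpoint exposes the relationships between all objects in the cluster as built by GraphBuilder.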

// cmd/kube-controller-manager/app/core.go
func startGarbageCollectorController(ctx ControllerContext) (http.Handler, bool, error) {
    if !ctx.ComponentConfig.GarbageCollectorController.EnableGarbageCollector {
        return nil, false, nil
    }

    gcClientset := ctx.ClientBuilder.ClientOrDie("generic-garbage-collector")
    discoveryClient := cacheddiscovery.NewMemCacheClient(gcClientset.Discovery())

    config := ctx.ClientBuilder.ConfigOrDie("generic-garbage-collector")
    metadataClient, err := metadata.NewForConfig(config)
    if err != nil {
        return nil, true, err
    }

    // Get an initial set of deletable resources to prime the garbage collector.
    deletableResources := garbagecollector.GetDeletableResources(discoveryClient)
    ignoredResources := make(map[schema.GroupResource]struct{})
    for _, r := range ctx.ComponentConfig.GarbageCollectorController.GCIgnoredResources {
        ignoredResources[schema.GroupResource{Group: r.Group, Resource: r.Resource}] = struct{}{}
    }
    garbageCollector, err := garbagecollector.NewGarbageCollector(
        metadataClient,
        ctx.RESTMapper,
        deletableResources,
        ignoredResources,
        ctx.ObjectOrMetadataInformerFactory,
        ctx.InformersStarted,
    )
    if err != nil {
        return nil, true, fmt.Errorf("failed to start the generic garbage collector: %v", err)
    }

    // Start the garbage collector.
    workers := int(ctx.ComponentConfig.GarbageCollectorController.ConcurrentGCSyncs)
    go garbageCollector.Run(workers, ctx.Stop)

    // Periodically refresh the RESTMapper with new discovery information and sync
    // the garbage collector.
    go garbageCollector.Sync(gcClientset.Discovery(), 30*time.Second, ctx.Stop)

    return garbagecollector.NewDebugHandler(garbageCollector), true, nil
}
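
The filter behind step (3), keeping only resources that support delete, list, and watch, can be sketched as follows. This is not the real GetDeletableResources: the resource type and the supportsAll and deletable helpers are hypothetical, and the verb lists in main are illustrative sample data rather than discovery output from a real cluster.

```go
package main

import "fmt"

// resource is a hypothetical, flattened discovery entry.
type resource struct {
	Name  string
	Verbs []string
}

// supportsAll reports whether verbs contains every verb in required.
func supportsAll(verbs []string, required ...string) bool {
	have := map[string]bool{}
	for _, v := range verbs {
		have[v] = true
	}
	for _, r := range required {
		if !have[r] {
			return false
		}
	}
	return true
}

// deletable keeps only the resources that support delete, list, and watch.
func deletable(all []resource) []string {
	var out []string
	for _, r := range all {
		if supportsAll(r.Verbs, "delete", "list", "watch") {
			out = append(out, r.Name)
		}
	}
	return out
}

func main() {
	// Illustrative sample data, not taken from a real cluster.
	all := []resource{
		{Name: "pods", Verbs: []string{"create", "delete", "get", "list", "watch"}},
		{Name: "bindings", Verbs: []string{"create"}},
		{Name: "componentstatuses", Verbs: []string{"get", "list"}},
	}
	fmt.Println(deletable(all))
}
```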

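The following sections expand on parts of the startGarbageCollectorController function.

1.garbagecollector.NewGarbageCollector

The NewGarbageCollector function initializes the garbage collector. Its main logic is:

(1) Initialize the GarbageCollector struct;

(2) Initialize the GraphBuilder struct and assign it to the dependencyGraphBuilder field of the GarbageCollector struct.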

// pkg/controller/garbagecollector/garbagecollector.go
func NewGarbageCollector(
    metadataClient metadata.Interface,
    mapper resettableRESTMapper,
    deletableResources map[schema.GroupVersionResource]struct{},
    ignoredResources map[schema.GroupResource]struct{},
    sharedInformers controller.InformerFactory,
    informersStarted <-chan struct{},
) (*GarbageCollector, error) {
    attemptToDelete := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_delete")
    attemptToOrphan := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_attempt_to_orphan")
    absentOwnerCache := NewUIDCache(500)
    gc := &GarbageCollector{
        metadataClient:   metadataClient,
        restMapper:       mapper,
        attemptToDelete:  attemptToDelete,
        attemptToOrphan:  attemptToOrphan,
        absentOwnerCache: absentOwnerCache,
    }
    gb := &GraphBuilder{
        metadataClient:   metadataClient,
        informersStarted: informersStarted,
        restMapper:       mapper,
        graphChanges:     workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "garbage_collector_graph_changes"),
        uidToNode: &concurrentUIDToNode{
            uidToNode: make(map[types.UID]*node),
        },
        attemptToDelete:  attemptToDelete,
        attemptToOrphan:  attemptToOrphan,
        absentOwnerCache: absentOwnerCache,
        sharedInformers:  sharedInformers,
        ignoredResources: ignoredResources,
    }
    if err := gb.syncMonitors(deletableResources); err != nil {
        utilruntime.HandleError(fmt.Errorf("failed to sync all monitors: %v", err))
    }
    gc.dependencyGraphBuilder = gb

    return gc, nil
}

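1.1 gb.syncMonitors

gb.syncMonitors calls gb.controllerFor to initialize an informer for each of the deletableResources (resources that support the "delete", "list", and "watch" verbs) and to register event handlers for resource changes (AddFunc, UpdateFunc, and DeleteFunc). Add, update, and delete events for a resource are all pushed onto the graphChanges queue, and gb.processGraphChanges then takes the events off the graphChanges queue for processing (this is analyzed in detail together with the garbage collector's processing logic).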

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) syncMonitors(resources map[schema.GroupVersionResource]struct{}) error {
    gb.monitorLock.Lock()
    defer gb.monitorLock.Unlock()

    toRemove := gb.monitors
    if toRemove == nil {
        toRemove = monitors{}
    }
    current := monitors{}
    errs := []error{}
    kept := 0
    added := 0
    for resource := range resources {
        if _, ok := gb.ignoredResources[resource.GroupResource()]; ok {
            continue
        }
        if m, ok := toRemove[resource]; ok {
            current[resource] = m
            delete(toRemove, resource)
            kept++
            continue
        }
        kind, err := gb.restMapper.KindFor(resource)
        if err != nil {
            errs = append(errs, fmt.Errorf("couldn't look up resource %q: %v", resource, err))
            continue
        }
        c, s, err := gb.controllerFor(resource, kind)
        if err != nil {
            errs = append(errs, fmt.Errorf("couldn't start monitor for resource %q: %v", resource, err))
            continue
        }
        current[resource] = &monitor{store: s, controller: c}
        added++
    }
    gb.monitors = current

    for _, monitor := range toRemove {
        if monitor.stopCh != nil {
            close(monitor.stopCh)
        }
    }

    klog.V(4).Infof("synced monitors; added %d, kept %d, removed %d", added, kept, len(toRemove))
    // NewAggregate returns nil if errs is 0-length
    return utilerrors.NewAggregate(errs)
}
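
The bookkeeping in syncMonitors (keep what is still wanted, add what is new, stop what is left over) can be isolated in a small sketch. It assumes plain string keys instead of GroupVersionResource values and booleans instead of real monitors; reconcile is a hypothetical name, and unlike the original it copies the running set rather than reusing it.

```go
package main

import "fmt"

// reconcile compares the running monitors with the desired resources and
// returns the new monitor set plus how many were added, kept, and removed.
func reconcile(running map[string]bool, desired []string, ignored map[string]bool) (current map[string]bool, added, kept, removed int) {
	// Start by assuming every running monitor has to be removed.
	toRemove := map[string]bool{}
	for r := range running {
		toRemove[r] = true
	}

	current = map[string]bool{}
	for _, r := range desired {
		if ignored[r] {
			continue
		}
		if toRemove[r] {
			// Still wanted: carry the existing monitor over.
			current[r] = true
			delete(toRemove, r)
			kept++
			continue
		}
		// New resource: this is where a monitor would be created.
		current[r] = true
		added++
	}
	// Whatever is left in toRemove is no longer wanted and would be stopped.
	return current, added, kept, len(toRemove)
}

func main() {
	running := map[string]bool{"pods": true, "jobs": true}
	desired := []string{"pods", "deployments", "events"}
	ignored := map[string]bool{"events": true}

	current, added, kept, removed := reconcile(running, desired, ignored)
	fmt.Printf("monitors=%d added=%d kept=%d removed=%d\n", len(current), added, kept, removed)
}
```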

gb.controllerFor

gb.controllerFor initializes the informer for a resource and registers event handlers for resource changes (AddFunc, UpdateFunc, and DeleteFunc). Add, update, and delete events for the resource are all pushed onto the graphChanges queue.

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) controllerFor(resource schema.GroupVersionResource, kind schema.GroupVersionKind) (cache.Controller, cache.Store, error) {
    handlers := cache.ResourceEventHandlerFuncs{
        // add the event to the dependencyGraphBuilder's graphChanges.
        AddFunc: func(obj interface{}) {
            event := &event{
                eventType: addEvent,
                obj:       obj,
                gvk:       kind,
            }
            gb.graphChanges.Add(event)
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            // TODO: check if there are differences in the ownerRefs,
            // finalizers, and DeletionTimestamp; if not, ignore the update.
            event := &event{
                eventType: updateEvent,
                obj:       newObj,
                oldObj:    oldObj,
                gvk:       kind,
            }
            gb.graphChanges.Add(event)
        },
        DeleteFunc: func(obj interface{}) {
            // delta fifo may wrap the object in a cache.DeletedFinalStateUnknown, unwrap it
            if deletedFinalStateUnknown, ok := obj.(cache.DeletedFinalStateUnknown); ok {
                obj = deletedFinalStateUnknown.Obj
            }
            event := &event{
                eventType: deleteEvent,
                obj:       obj,
                gvk:       kind,
            }
            gb.graphChanges.Add(event)
        },
    }
    shared, err := gb.sharedInformers.ForResource(resource)
    if err != nil {
        klog.V(4).Infof("unable to use a shared informer for resource %q, kind %q: %v", resource.String(), kind.String(), err)
        return nil, nil, err
    }
    klog.V(4).Infof("using a shared informer for resource %q, kind %q", resource.String(), kind.String())
    // need to clone because it's from a shared cache
    shared.Informer().AddEventHandlerWithResyncPeriod(handlers, ResourceResyncTime)
    return shared.Informer().GetController(), shared.Informer().GetStore(), nil
}

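2.garbageCollector.Run

garbageCollector.Run starts the garbage collector. Its main logic is:

(1) Call gc.dependencyGraphBuilder.Run to start GraphBuilder, then wait until all resource monitors have synced;

(2) Start goroutines according to the worker count configured by the startup flag: for each worker, one goroutine runs gc.runAttemptToDeleteWorker and another runs gc.runAttemptToOrphanWorker. Both belong to the core processing logic of GarbageCollector and handle the objects that need to be reclaimed; they are analyzed in the next post.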

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Run(workers int, stopCh <-chan struct{}) {
    defer utilruntime.HandleCrash()
    defer gc.attemptToDelete.ShutDown()
    defer gc.attemptToOrphan.ShutDown()
    defer gc.dependencyGraphBuilder.graphChanges.ShutDown()

    klog.Infof("Starting garbage collector controller")
    defer klog.Infof("Shutting down garbage collector controller")

    go gc.dependencyGraphBuilder.Run(stopCh)

    if !cache.WaitForNamedCacheSync("garbage collector", stopCh, gc.dependencyGraphBuilder.IsSynced) {
        return
    }

    klog.Infof("Garbage collector: all resource monitors have synced. Proceeding to collect garbage")

    // gc workers
    for i := 0; i < workers; i++ {
        go wait.Until(gc.runAttemptToDeleteWorker, 1*time.Second, stopCh)
        go wait.Until(gc.runAttemptToOrphanWorker, 1*time.Second, stopCh)
    }

    <-stopCh
}
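
The worker start-up pattern in Run can be imitated with the standard library alone. In the sketch below, until is a simplified stand-in written for this example that only mimics the call pattern of wait.Until (run a function, wait one period, repeat until the stop channel closes); it is not the real implementation. runWorkers is likewise hypothetical: it drains a buffered channel instead of a work queue and closes the stop channel itself once every item has been processed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// until calls f, waits for period, and repeats until stopCh is closed.
// It is a simplified stand-in for wait.Until, written for this sketch.
func until(f func(), period time.Duration, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}
		f()
		select {
		case <-stopCh:
			return
		case <-time.After(period):
		}
	}
}

// runWorkers starts the given number of goroutines that drain a shared queue
// and returns how many items were processed.
func runWorkers(workers int, items []string) int {
	if workers <= 0 || len(items) == 0 {
		return 0
	}
	queue := make(chan string, len(items))
	for _, it := range items {
		queue <- it
	}

	stopCh := make(chan struct{})
	var (
		mu        sync.Mutex
		once      sync.Once
		wg        sync.WaitGroup
		processed int
	)

	// worker drains whatever is queued right now and then returns, so that
	// until invokes it again after one period.
	worker := func() {
		for {
			select {
			case <-queue:
				mu.Lock()
				processed++
				done := processed == len(items)
				mu.Unlock()
				if done {
					once.Do(func() { close(stopCh) })
				}
			default:
				return
			}
		}
	}

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			until(worker, time.Millisecond, stopCh)
		}()
	}
	wg.Wait()
	return processed
}

func main() {
	n := runWorkers(4, []string{"a", "b", "c", "d", "e"})
	fmt.Println("processed:", n)
}
```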

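2.1 gc.dependencyGraphBuilder.Run

gc.dependencyGraphBuilder.Run starts GraphBuilder. Its main logic is:

(1) Call gb.startMonitors to start the informers mentioned earlier in 1.1 gb.syncMonitors;

(2) Run gb.runProcessGraphChanges through wait.Until with a 1-second period until the stop channel is closed. This is the core processing logic of GraphBuilder, which is analyzed in the next post.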

// pkg/controller/garbagecollector/graph_builder.go
func (gb *GraphBuilder) Run(stopCh <-chan struct{}) {
    klog.Infof("GraphBuilder running")
    defer klog.Infof("GraphBuilder stopping")

    // Set up the stop channel.
    gb.monitorLock.Lock()
    gb.stopCh = stopCh
    gb.running = true
    gb.monitorLock.Unlock()

    // Start monitors and begin change processing until the stop channel is
    // closed.
    gb.startMonitors()
    wait.Until(gb.runProcessGraphChanges, 1*time.Second, stopCh)

    // Stop any running monitors.
    gb.monitorLock.Lock()
    defer gb.monitorLock.Unlock()
    monitors := gb.monitors
    stopped := 0
    for _, monitor := range monitors {
        if monitor.stopCh != nil {
            stopped++
            close(monitor.stopCh)
        }
    }

    // reset monitors so that the graph builder can be safely re-run/synced.
    gb.monitors = nil
    klog.Infof("stopped %d of %d monitors", stopped, len(monitors))
}

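3.garbageCollector.Sync

garbageCollector.Sync periodically queries all deletableResources in the cluster and calls gc.resyncMonitors to update the monitors of GraphBuilder: it initializes informers and registers event handlers for newly discovered resources, starts those informers, and tears down the monitors of resources that have been removed.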

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) Sync(discoveryClient discovery.ServerResourcesInterface, period time.Duration, stopCh <-chan struct{}) {
    oldResources := make(map[schema.GroupVersionResource]struct{})
    wait.Until(func() {
    // Get the current resource list from discovery.
    newResources := GetDeletableResources(discoveryClient)
    ...
    if err := gc.resyncMonitors(newResources); err != nil {
        utilruntime.HandleError(fmt.Errorf("failed to sync resource monitors (attempt %d): %v", attempt, err))
        return false, nil
    }
    klog.V(4).Infof("resynced monitors")
    ...

3.1 gc.resyncMonitors

Call gc.dependencyGraphBuilder.syncMonitors to initialize the informers and register the event handlers;

Call gc.dependencyGraphBuilder.startMonitors to start the informers.

// pkg/controller/garbagecollector/garbagecollector.go
func (gc *GarbageCollector) resyncMonitors(deletableResources map[schema.GroupVersionResource]struct{}) error {
    if err := gc.dependencyGraphBuilder.syncMonitors(deletableResources); err != nil {
        return err
    }
    gc.dependencyGraphBuilder.startMonitors()
    return nil
}

4.garbagecollector.NewDebugHandler

garbagecollector.NewDebugHandler returns an HTTP handler that serves a debug endpoint. The endpoint exposes the relationships between all objects in the cluster as built by GraphBuilder.

// pkg/controller/garbagecollector/dump.go
func NewDebugHandler(controller *GarbageCollector) http.Handler {
    return &debugHTTPHandler{controller: controller}
}

type debugHTTPHandler struct {
    controller *GarbageCollector
}

func (h *debugHTTPHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
    if req.URL.Path != "/graph" {
        http.Error(w, "", http.StatusNotFound)
        return
    }

    var graph graph.Directed
    if uidStrings := req.URL.Query()["uid"]; len(uidStrings) > 0 {
        uids := []types.UID{}
        for _, uidString := range uidStrings {
            uids = append(uids, types.UID(uidString))
        }
        graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraphForObj(uids...)

    } else {
        graph = h.controller.dependencyGraphBuilder.uidToNode.ToGonumGraph()
    }

    data, err := dot.Marshal(graph, "full", "", "  ")
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    w.Header().Set("Content-Type", "text/vnd.graphviz")
    w.Header().Set("X-Content-Type-Options", "nosniff")
    w.Write(data)
    w.WriteHeader(http.StatusOK)
}

Fetching the object dependency graph

Fetch the full object dependency graph:

curl http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph -o {output_file}

Fetch the dependency graph for a specific uid:

curl "http://{master_ip}:{kcm_port}/debug/controllers/garbagecollector/graph?uid={project_uid}" -o {output_file}

Example:

curl "http://192.168.1.10:10252/debug/controllers/garbagecollector/graph?uid=8727f640-112e-21eb-11dd-626400510df6" -o /home/test
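
To get a feel for what such an endpoint returns, here is a toy handler that serves a tiny graph in Graphviz DOT form and honours the uid query parameter. Everything in it is an assumption made for illustration: the edges type, toDOT, graphHandler, and get are hypothetical names, the DOT text is assembled by hand, edges are drawn from dependent to owner as an arbitrary choice, and the uid filter only keeps edges that touch the requested objects directly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
)

// edges is a hypothetical dependency graph: owner UID -> dependent UIDs.
type edges map[string][]string

// toDOT renders the graph as DOT text. When only is non-empty, it keeps just
// the edges that touch one of the given UIDs.
func toDOT(g edges, only map[string]bool) string {
	owners := make([]string, 0, len(g))
	for o := range g {
		owners = append(owners, o)
	}
	sort.Strings(owners) // stable output

	var b strings.Builder
	b.WriteString("digraph full {\n")
	for _, o := range owners {
		for _, d := range g[o] {
			if len(only) > 0 && !only[o] && !only[d] {
				continue
			}
			fmt.Fprintf(&b, "  %q -> %q;\n", d, o)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

// graphHandler serves the graph on /graph and answers 404 for other paths.
func graphHandler(g edges) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.URL.Path != "/graph" {
			http.Error(w, "", http.StatusNotFound)
			return
		}
		only := map[string]bool{}
		for _, uid := range req.URL.Query()["uid"] {
			only[uid] = true
		}
		w.Header().Set("Content-Type", "text/vnd.graphviz")
		fmt.Fprint(w, toDOT(g, only))
	})
}

// get issues an in-memory GET request and returns the status code and body.
func get(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := graphHandler(edges{"A": {"B"}, "B": {"C"}})

	code, body := get(h, "/graph")
	fmt.Println(code)
	fmt.Print(body)

	code, body = get(h, "/graph?uid=C")
	fmt.Println(code)
	fmt.Print(body)
}
```

The DOT output saved by curl can be rendered with Graphviz to inspect the graph visually.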

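Introduction to the garbage collector

The Kubernetes garbage collector runs inside kube-controller-manager. It reclaims resource objects in Kubernetes: it watches resource object events, keeps the dependency relationships between objects up to date, and decides from an object's deletion policy whether its related objects should be deleted as well.

garbage collector architecture

The garbage collector consists of 1 graph (the object dependency graph), 2 processors (GraphBuilder and GarbageCollector), and 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan).

garbage collector startup analysis

Starting the garbage collector mainly means starting the 2 processors (GraphBuilder and GarbageCollector) and defining the object dependency graph and the 3 event queues (graphChanges, attemptToDelete, and attemptToOrphan).

Events obtained from the apiserver through list/watch are put into the graphChanges queue. GraphBuilder takes events from the graphChanges queue, builds the object dependency graph, and, depending on the deletion policy, puts related objects into the attemptToDelete and attemptToOrphan queues. GarbageCollector then takes events from the attemptToDelete and attemptToOrphan queues, looks up the information it needs in the object dependency graph, and finally reclaims the objects.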