Digital Twin-Empowered Network Planning for Multi-Tier Computing

时间：2024-08-31

Conghao Zhou,Jie Gao,Mushu Li,Xuemin(Sherman)Shen,Weihua Zhuang

Abstract—In this paper,we design a resource management scheme to support stateful applications,which will be prevalent in sixth generation(6G)networks.Different from stateless applications,stateful applications require context data while executing computing tasks from user terminals(UTs).Using a multi-tier computing paradigm with servers deployed at the core network,gateways,and base stations to support stateful applications,we aim to optimize long-term resource reservation by jointly minimizing the usage of computing,storage,and communication resources and the cost of reconfiguring resource reservation.The coupling among different resources and the impact of UT mobility create challenges in resource management.To address the challenges,we develop digital twin(DT)empowered network planning with two elements,i.e.,multi-resource reservation and resource reservation reconfiguration.First,DTs are designed for collecting UT status data,based on which UTs are grouped according to their mobility patterns.Second,an algorithm is proposed to customize resource reservation for different groups to satisfy their different resource demands.Last,a Meta-learning-based approach is developed to reconfigure resource reservation for balancing the network resource usage and the reconfiguration cost.Simulation results demonstrate that the proposed DTempowered network planning outperforms benchmark frameworks by using less resources and incurring lower reconfiguration costs.

Keywords—6G,digital twin,network planning,multitier computing,Meta-learning

I.INTRODUCTION

The sixth generation(6G)networks are expected to support a wide range of computing applications[1].A large portion of these applications are stateful,meaning that context data is required to execute computing tasks[2,3].For example,augmented reality applications require volumetric media objects or holograms,as the context data,to process video segments for user terminals(UTs).The prevalent mobile edge computing(MEC)paradigm provides a solution to supporting computing applications with low offloading delay but has limitations in supporting stateful applications[4].Specifically,edge servers close to UTs generally have limited storage capacity to store all context data of stateful applications.Moreover,even if the context data could be fully stored at an edge server,limited communication coverage and a relatively small number of UTs served by the edge server would degrade storage resource utilization.

To address the above limitations,both the industry and the academia have started looking into the collaboration of servers[5,6].Extending from MEC,multi-tier computing integrates multiple servers deployed at the core network,gateways,access points,and other locations in the network for executing computing tasks from UTs.Servers at different tiers have diverse features in terms of resource capacity and service coverage[7].Specifically,servers deployed at the core network and gateways have larger service coverage and more abundant resources than servers deployed at access points.Through coordinating servers at different tiers,multi-tier computing can exploit different features of servers to support computing applications,especially the stateful ones,in the era of 6G.

Network planning,as an important part of network management,can facilitate the coordination of servers at different tiers.Network planning consists of resource reservation and resource reservation reconfiguration[1].Resource reservation refers to proactively reserving network resources for satisfying the upcoming resource demands from UTs.Resource reservation reconfiguration refers to timely updating resource reservation decisions to adapt to time-varying resource demands and dynamic network environments.Network planning for supporting stateful applications faces four challenges.First,the reservation of computing,storage,and communication resources for stateful applications is tightly coupled,yielding existing resource reservation solutions for supporting stateless applications inapplicable.Second,the requests for context data may vary across a network,rendering both computing task assignment and storage resource reservation dependent on specific servers and UT mobility patterns.Third,information regarding individual UT status,e.g.,UT mobility,is unavailable at the time of network planning,yet such UTlevel information can be useful for accurately calculating the amount of network resources needed for supporting stateful applications[8].Fourth,reconfiguring computing and storage resource reservation for stateful applications in a dynamic network environment yields additional costs due to computing service interruption,which complicates resource reservation reconfiguration[9].Addressing the above challenges is important to accurate and adaptive network planning for supporting stateful applications in 6G.

Recently,the digital twin paradigm has started attracting attention as a potential solution to advancing network management for 6G[10].The concept of digital twins(DTs)originates from product life-cycle management in industry,where a DT is a synchronized virtual replica of a physical object[11,12].For 6G networks,DTs can be introduced to represent individual UTs.Each DT consists of a UT data profile that describes the corresponding UT,including the UT’s mobility,service demands,and quality of service(QoS)satisfaction,and DT functions for data acquisition,processing,and analysis[10].The introduction of DTs brings three benefits to network planning.First,historical data contained in DTs can be used to predict UT status in the upcoming time interval,which can,in turn,facilitate customized resource reservation for highly diversified UTs and enable fine-grained network planning.Second,data indicating the performance of network planning can be collected based on DTs,which can provide a foundation for resource reservation reconfiguration in network planning to adapt to a highly dynamic network environment.Third,DTs should acquire extensive and well-organized data that can be used to explore and exploit hidden network characteristics,thereby facilitating effective data-driven network planning approaches to enhancing network performance.Due to the above benefits,DTs can be exploited and designed to improve the granularity,adaptivity,and intelligence of network planning in 6G.

In this paper,we design a network planning scheme for supporting a stateful application in the scenario of multi-tier computing.Our research objective is to find out the minimum amount of network resources(including computing,storage,and communication)needed for supporting the application and also balance the resource usage and the cost of resource reservation reconfiguration in a dynamic network environment.To achieve this objective,we propose a DTempowered network planning framework with the following two elements:group-based multi-resource reservation and closed-loop resource reservation reconfiguration.First,we design DTs for individual UTs to characterize their status and group them based on their mobility patterns.We propose an algorithm based on matching theory and particle swarm optimization to address the coupling relation among computing,storage,and communication in resource reservation.The proposed method enables customized resource reservation for satisfying different resource demands of UT groups with different mobility patterns.Second,we develop a metalearning-based approach for resource reservation reconfiguration to cope with the dynamic network environment.The main contributions of this paper are the followings.

•We propose a novel network planning framework to facilitate fine-grained resource reservation based on UT data contained in DTs.

•We address a challenging multi-resource reservation problem for supporting stateful applications in multi-tier computing.

•We develop an automated closed-loop approach to reconfigure resource reservation in a dynamic network environment for balancing the network resource usage and the cost of reconfiguring resource reservation.

The remainder of this paper is organized as follows.Section II provides an overview of related studies.Section III describes the considered network scenario,the proposed DTempowered framework,and the system model.Section IV formulates the network planning problem for multi-tier computing.Sections V and VI introduce the proposed solutions for resource reservation and resource reservation reconfiguration,respectively.Section VII presents the simulation results,followed by the conclusion and future work in section VIII.

II.RELATED WORK

Network resource management is conducted in both shortterm and long-term[1].Short-term resource allocation in the operation stage relies on real-time information on individual UTs,such as UT locations,and targets real-time UT satisfaction.By contrast,long-term resource reservation in the planning stage is based on aggregated information of UTs,such as the number of UTs covered by an access point,and focuses on network performance,such as resource utilization over a relatively long time period,ranging from several minutes to hours[13,14].

Most works on real-time resource allocation focus on computing task offloading and service placement,among which many consider one-tier computing such as cloud computing or MEC[15-19].Based on the real-time computing task arrival of each UT,decentralized and centralized communication and computing resource allocation approaches are proposed to minimize the delay of computing task offloading and executing in cloud computing and MEC,respectively[15,16].Service placement is studied in MEC based on the real-time UT location and the type of service required by each UT to maximize the number of UTs that can be severed under each edge server’s resource capacity[17,18].Some works focus on resource allocation for multi-tier computing[20-24].Computing resources on fog nodes and the cloud server are allocated to UTs at different locations to satisfy the delay requirements of their computing tasks[20].Li et al.investigate a service placement approach for cloud and edge computing to satisfy each UT’s computing demands[21].Given that the same type of computing tasks can share computing results,Yu et al.study joint computing task offloading and service placement in multi-tier computing to reduce the delay of executing computing tasks[22].Considering space-ground integrated networks,the authors in Refs.[23,24]investigate how to allocate computing and communication resources available at satellites,terrestrial access points,and UTs to minimize the delay of executing computing tasks from UTs.

There are less studies on long-term resource reservation for computing task offloading.Based on the aggregated computing demands from all access points,a proactive computing resource reservation approach in MEC is designed to minimize the delay of executing computing tasks[25]and maximize resource utilization in computing task execution[26],respectively.Yin et al.study edge server placement to minimize the network resource usage based on statistical computing demands[27].There are limited works on long-term resource reservation in multi-tier computing[8,28].Considering edge and cloud computing,Zhou et al.propose a computing resource reservation approach to minimize network resource usage while satisfying different delay requirements of two applications[28].For servers located at different tiers of space-ground integrated networks,joint communication and computing resource reservation is studied to minimize the long-term cost of delay requirement violation and network reconfiguration[8].

While allocating resources to UTs to satisfy their real-time computing demands is important,proactively reserving resources on servers is also essential.In this work,we focus on network planning in the presence of the coupling relation among computing,storage,and communication resources,and the impact of UT mobility in long-term resource reservation to support stateful applications.We leverage DTs to improve the granularity and effectiveness of network planning as compared to conventional resource reservation.

III.NETWORK SCENARIO AND SYSTEM MODEL

In this section,we first introduce the considered network scenario of multi-tier computing.Then,we propose a DTempowered network planning framework to support stateful applications and present the corresponding system model.

A.Network Scenario

The considered scenario is shown in Fig.1.Computing servers are deployed at different locations,including:(i)base stations(BSs);(ii)network aggregation points(NAPs),such as gateways;and(iii)the core network(CN)[4].Each server can build one dedicated virtual machine(VM)to execute computing tasks from UTs for the stateful application.The service coverage area of a server at a BS(S-BS)is the BS’s communication coverage.The service coverage area of a server at a NAP(S-NAP)is the union of the communication coverage of all the BSs connected to it.For example,S-NAP 1 connects to S-BS 1 and S-BS 2,and the communication coverage area of S-NAP 1 consists of the two green cells as shown in Fig.1.The server at the CN(S-CN)can provide computing service to all UTs in the considered network.The service coverage areas of servers at the same tier of the network do not overlap.UTs generate computing tasks and offload their computing tasks to a server when located in its service coverage area.We assume that the computing task generation at all UTs corresponds to an identical statistical process due to the same stateful application.The VM at the server then executes the offloaded computing tasks and sends computing results back to the UTs.Denote the S-CN byecn,and letEbsandEnapdenote the set of S-BSs and the set of S-NAPs,respectively.LetE=Ebs∪Enap∪{ecn}represent the set of all servers in the network.Denote the set of BSs byB={1,2,···,B},and letEb⊂Edenote the set of servers that include BSb∈Bin their service coverage areas.

Fig.1 The considered scenario of multi-tier computing

We focus on resource reservation to support stateful applications,which requires context data for executing computing tasks.The application uses a fixed set of context data,and the popularity of different data chunks in this set is different.Moreover,the popularity of each data chunk can vary at different BSs across the network,i.e.,the popularity of context data chunks is location-dependent.Therefore,if a UT moves intothe coverage of a different BS,its requests for context data may also change,creating a dependence of its resource demand on its mobility.A UT that connects to more BSs within a time interval is considered to have higher mobility.UTs with similar mobility patterns can be grouped together in resource reservation due to their similar need for context data.

The time duration of interest is partitioned intoKtime intervals of lengthτ(e.g.,5 to 10 minutes per time interval).Denote the set of time intervals byK={1,2,···,K}.A central controller in the CN maintains the DTs for UTs and makes resource reservation decisions proactively for the stateful application at the beginning of each time interval.We define vectorϕd,kto represent the mobility pattern of UTdin time intervalk∈K,where thebth element ofϕd,krepresents the time duration in which UTdis in the coverage of BSbduring the time interval.Tab.1 lists the important symbols for easy reference.

Tab.1 List of symbols

B.DT-empowered Network Planning

In this subsection,we present the idea of DT-empowered network planning with our specific design of DTs,which is a succession and development of the framework in Ref.[10].A DT is created for an individual UT,called UDT,which consists of a UT data profile and UDT functions.As shown in Fig.2,UDTs are located at the CN,and UT data profiles are maintained and updated by the central controller via UDT functions.Specifically,each UT data profile is a wellorganized set of UT data.In this work,the data attributes of each UDT consist of the mobility of the corresponding UT,including the UT’s location and velocity,as well as the information of each computing task from the UT,including context data requirement,input and output data size,computing workloads,and resource demands.There are three UDT functions used to manage and analyze UT data profiles for network planning,as follows.

Fig.2 The designed UDTs used for network planning

•Data collection and storage function:UT data required for network planning is collected from individual UTs via BSs or offered by service providers.Data regarding UT mobility,e.g.,UT locations,can be uploaded by individual UTs periodically,as specified in 5G.Data regarding services,e.g.,computing workloads and required context data,can be obtained from service providers or computing servers[4].The collected UT data is stored in the corresponding UDTs.

•Data processing function:At the beginning of time intervalk,the mobility patterns of each UT in the pastTtime intervals,i.e.,ϕd,k′,∀k′∈[k-T,k-1],are obtained based on the UT data in the corresponding UDT.Then,the historical mobility patterns are used to predict the mobility pattern of each UT,i.e.,ϕd,k,in the subsequent time interval.Other data prediction,e.g.,spatial task distributions and requested context data,based on historical data in UDTs is also conducted via this function.

•UT grouping function:UTs are grouped based on their predicted mobility patterns,i.e.,{ϕd,k,∀d},at the beginning of each time interval.UTs from the same group have similar network resource demands for executing computing tasks due to their similar mobility patterns,allowing for an accurate approximation of resource demands for UTs from each group.Based on UT grouping,network resources can be reserved accurately to achieve fine-grained network planning1Note that the mechanism of UT grouping,as well as the data attributes used for grouping,can be customized based on the data contained in UDTs in different network scenarios.The number of groups depends on the trade-off between granularity and complexity..

The UDT functions are used to manage and analyze UT data profiles to empower network planning.However,the UDT functions do not make network planning decisions but only provide information for network planning.

Based on UDTs,we propose a novel network planning framework as shown in Fig.3,which includes two core elements:group-based resource reservation and closed-loop resource reservation reconfiguration.

Fig.3 An illustration of DT-empowered network planning:(a)Workflow;(b)Time line

•Group-based resource reservation:Based on UT grouping,we reserve storage and computing resources accurately to satisfy the resource demands for user equipments(UEs)in each group in the upcoming time interval.Denote the set of groups byN={1,2,···,N}.Letxnb,kdenote the number of computing tasks generated by the UTs who are associated with groupnand are in the coverage of BSbduring time intervalk.We refer to matrixxnk=[xnb,k]∀b∈Bas spatial task distribution of groupnin time intervalk.Since it is impossible to know the actual value ofxnkat the beginning of time intervalk,xnkis predicted based on historical data contained in UDTs.We use superscript“～”,e.g.,˜x,to represent the predicted values ofx.Given spatial task distributions,we propose an algorithm to address the storage and computing resource reservation problem with highly coupled variables.The detail of group-based resource reservation is discussed in section V.

•Closed-loop resource reservation reconfiguration:Since spatial task distributions may change across time intervals,reconfiguring the reserved resources on servers to adapt to dynamic computing demands is necessary but yields additional costs of reconfiguring resource reservation.We design a closed-loop approach to reconfigure resource reservation reconfiguration for balancing the network resource usage and the cost of reconfiguring resource reservation,and the time line is shown in Fig.3(b).Specifically,at the beginning of each time interval,the central controller identifies whether to reconfigure resource reservation with a proposed algorithm.If the resource reservation needs to be reconfigured for the subsequent time interval,the controller will make a new resource reservation decision;Otherwise,the controller will keep using the resource reservation from the previous time interval.At the end of each time interval,the data regarding network performance is collected based on UDTs.The detail of closedloop resource reservation reconfiguration is discussed in section VI.

C.Computing Model

The overall load of computing tasks assigned to each server during a time interval affects network resource reservation on the server.Denote the load of computing tasks generated by UTs from groupnin the coverage of BSband assigned to servereduring time intervalkbyfnb,e,k.The relation between computing task assignment and spatial task distribution is given by Letmne,kdenote the overall load of computing tasks generated by UTs from groupnand assigned to servereduring time intervalk,whereBedenotes the set of BSs within the service coverage of servere.In(2),the fine granularity of computing task assignment reflected throughfnb,e,khelps determine the computing load at each server accurately.

Since we consider one specific stateful application,the computing tasks for this application are assumed to have(approximately)the same input data size(in bits),computing workload(in CPU cycles per bit),and result data size(in bits)2The proposed framework can be straightforwardly extended to handle the case of computing tasks with different input data sizes,computing workloads,and result data sizes by leveraging DTs to collect such information for each computing task.The problem-solving approach(including the proposed algorithms)remains applicable in such a case..Letα,β,andγdenote the average input data size,the average computing workload,and the average result data size,respectively.Each server can reserve a certain amount of computing resource(in CPU cycles per second)for VMs to execute computing tasks.Letcne,kdenote the amount of computing resource of serverereserved for groupnduring time intervalk,and defineck=[cne,k]∀e∈E,n∈Nas the computing resource reservation decision in time intervalk.For time intervalk,the computing resource reservation should satisfy the following constraint.

whereCedenotes the maximum computing resource at serverethat can be utilized for the stateful application.We assume that the computing resources on the S-CN,i.e.,e=ecn,is sufficient for executing all computing tasks.The time that serveretakes for executing the computing tasks assigned to it from groupnduring time intervalkshould satisfy the following requirement.

whereτpdenotes the maximum tolerable computation time3We do not consider queuing delay in the planning stage because we do not assume a particular computing task arrival pattern.For modeling the queueing delay,the task arrival pattern must be known a priori.However,such a pattern is usually unavailable in practice,while assuming a particular arrival pattern can oversimplify the scenario..Due to(4),computing resource reservation and computing task assignment for each server are mutually dependent.

Letϵce,krepresent the overall computing resource usage for executing all computing tasks assigned to serverein time intervalk,which is computing load-dependent.Based on Refs.[29,30],we adopt a linear model of computing resource usage as follows.

whereεceis the computing resource usage for executing each computing task at servere.

D.Storage Model and Remote Access

Different from stateless applications,the execution of a computing task for stateful applications requires the corresponding context data.If the context data is not in the storage of the server,the server should download the context data from a remote server,thereby yielding additional communication resource usage.We model the storage and the additional communication resource usage for the stateful application in this subsection.

Denote the amount of the reserved storage resource(in bits)of servereduring time intervalkand the storage capacity(in bits)of serverebyge,kandGe,respectively.The value ofge,kshould satisfy the following constraint.

We assume that the storage resources reserved on the S-CN,i.e.,e=ecn,is sufficient for storing all context data.Definegk=[ge,k]∀e∈E{ecn}as the storage resource reservation decision for all S-NAPs and S-BSs in time intervalk.Based on the model of storage resource usage in Ref.[31],the resource usage for reserving storage resource at serverein time intervalk,denoted byϵse,k,is given by

whereεserepresents the per bit resource usage for reserving storage resource at servere.

LetI={1,2,···,I}represent the set of all chunks of context data in the library for the stateful application,whereIdenotes the number of chunks in the setI.All chunks of context data for the stateful application have the same data structure and thus identical data size(in bits),denoted byL.Executing each computing task requires one chunk of context data in the setI[32].Denote the request ratio of chunkiin the coverage of BSbin time intervalkbypib,k,i.e.,the load of computing tasks requesting chunki∈Iover the load of all computing tasks generated in the coverage of BSbin time intervalk.The value ofpib,kmay be different in the coverage of different BSs and may vary across different time intervals.Letpk=[pib,k]∀b∈B,i∈Idenote the chunk request ratio profile in time intervalk,which can be obtained via prediction based on historical data contained in UDTs[33].

The overall data volume of all chunks may be much larger than the storage capacities of S-BSs and S-NAPs.Each SBS and S-NAP can only store some chunks of context data for executing computing tasks prior to the beginning of each time interval.The S-CN stores all chunks of context data.LetIe,k⊆Idenote the set of chunks stored at serverein time intervalk.Given the amount of reserved storage resource on servere,i.e.,gk,the number of chunks in setIe,kis|Ie,k|=whererepresents the floor function.

Since our focus is storage resource reservation,we follow the hierarchical storage policy in Refs.[34,35]to determine the set of stored chunks on each S-BS and S-NAP,i.e.,Ie,k,based on the effective request ratio of chunks.Define the effective request ratio of chunkifor executing the computing tasks assigned to serverein time intervalkasqie,k,i.e.,the load of computing tasks requesting chunki∈Iover the load of all computing tasks generated in the service coverage of serverein time intervalk.In multi-tier computing,the stored chunks on S-NAPs and the S-CN may not be requested for executing computing tasks when the chunks are stored on S-BSs for the sake of reducing communication resource usage.As a result,the effective request ratio of a chunk for an S-NAP depends on not only the chunk request ratio profile,i.e.,pk,but also the set of chunks stored on S-BSs and computing task assignment,which is difficult to determine.

For simplicity,we make two following assumptions to estimate the effective request ratio for each S-NAP.First,S-BSelocated at BSbstores|Ie,k|chunks with largestpib,k.Second,given setIe,kfor any S-BS,the computing tasks requesting a chunk stored on the S-BS are assigned to the S-BS as much as possible when not violating the computing resource capacity of the S-BS[35].Based on the two assumptions,the value ofqie,kfor an S-BS equals to the request ratio of chunkiin the coverage of the corresponding BS in time intervalk,i.e.,pib,k.The value ofqie,k(gk,pk)for an S-NAP can be estimated givengkandpk,which is detailed in Appendix A).Each S-BS and S-NAP sorts the chunks in a non-increasing order based on the effective request ratio,i.e.,qie,k(gk,pk),and stores the most requested|Ie,k|chunks.

When the chunk required for executing a computing task is not found in the storage of an S-BS,the S-BS will first attempt to download the chunk of context data from the S-NAP covering it and resort to the S-CN in the case that the S-NAP does not have the chunk of context data either.When the chunk required for executing a computing task is not found in the storage of an S-NAP,the server will download the context data from the S-CN.For any server,accessing another server remotely and downloading context data from the remote server yields additional communication resource usage,referred to as remote access of context data[36].

Denote the data volume(in bits)of remotely accessing context data for each computing task byLre,the value of which may be different from the value ofLdue to headers used by transmission protocols.Letdenote the number of computing tasks that require chunkiand are generated by groupnin the coverage of BSband assigned to servereduring time intervalk.The relation betweenfnb,e,kdefined in(1)andfn,ib,e,kis.Define the computing task assignment in time intervalkas.For S-BSe,the communication resource usage for downloading context data from S-NAPe′covering S-BSefor executing computing tasks from groupnduring time intervalkis given by

where coeffciientsξee′andξeecnrepresent the communication resource usage for downloading per bit context data to S-BSefrom S-NAPe′and the S-CN,i.e.,ecn,respectively.In(8),the first term represents the communication resource usage for servereto access the context data remotely from servere′at the NAP,and the second term represents the communication resource usage for servereto access the context data remotely from S-CNecn.For S-NAPe,the communication resource usage for downloading context data from the S-CN during time intervalkis as follows.

wheredenotes the communication resource usage for downloading per bit context data to S-NAPefrom the S-CN,i.e.,ecn.

E.Communication Model

Generally,communication resource usage for uploading the input data and downloading the result of a computing task involves two parts:(i)the resource usage for the wireless communication between the UT and the connected BS;and(ii)the resource usage for the wired communication between the BS and a server.In the considered scenario,each UT is associated with only one BS.Regardless of which computer server deployed in the multi-tier network is selected for executing the computing task,the resource consumption for the wireless communication between the associated BS and the UT for uploading input data and downloading computing results is a constant.As a result,the wireless communication does not affect the solution of the resource reservation problem.Since S-BSs and BSs are co-located,there is no additional communication resource usage if any computing task is processed at an S-BS.

Denote the maximum communication resource usage of servere∈Enap∪{ecn}for uploading input data and downloading result data byVupeandVdowne,respectively.Letηb,edenote the coefficient representing the communication resource usage for uploading and downloading per bit data between BSband servere.The communication resource usage for uploading input data and downloading computing results between servereand BSbduring time intervalkare given by

and

respectively.

IV.PROBLEM FORMULATION

We formulate the problem of planning-stage resource reservation for multi-tier computing to support stateful applications in this section.We aim to find out the minimum amount of network resources needed for supporting the application while balancing the resource usage and the cost of reconfiguring resource reservation in the presence of network dynamics.

Letrk=[ck,gk,fk]denote the resource reservation decision,including computing resource reservation,storage resource reservation,and computing task assignment,in time intervalk.Variablerkis determined at the beginning of time intervalk.Givenrk,the overall network resource usage in time intervalk,denoted byΔk(rk),can be obtained based on(5),(7),(10),and(11),as follows.

Letak∈{0,1}indicate whether the resource reservation in time intervalkshould be reconfigured or not,which is determined at the beginning of time intervalk.Ifak=0,the controller makes a new resource reservation decision for time intervalk;Otherwise,the controller keeps using the resource reservation from time intervalk-1.Define=[]∀b∈B,n∈Nas spatial task distributions of groups,referred to as group-based spatial task distribution.Let functionrk=ψ(,pk),representing that the resource reservation decision is made according to the group-based spatial task distribution and the chunk request profile in time intervalk.The value ofrk,resource reservation in time intervalk,evolves as follows.

Given the value ofak,the cost of reconfiguring resource reservation in time intervalk,denoted byovk,is as follows.

The problem of minimizing the long-term network resource usage and the cost of reconfiguring resource reservation overKtime intervals,is formulated as follows.

whereVredenotes the maximum communication resource usage of remote access for one computing task,a=[ak]∀k∈K,R=[rk]∀k∈K,andλis the weight balancing the network resource usage and the cost of reconfiguring resource reservation.R+represents the set of positive real numbers,and N represents the set of natural numbers.Constraint(15d)limits the communication resource usage of remote access averaged overall computing tasks from each group to be less thanVnre.The solution of problem P0 provides a lower bound on the resources needed to support the stateful application,taking into account the resource reservation reconfiguration cost in network planning.If the network resources are sufficient,more resources can be reserved for the application for better service quality.In addition,in a practical network supporting multiple applications,statistical multiplexing among resources reserved for different applications can be implemented for high resource utilization.

Solving problem P0 is challenging due to the following two reasons:(i)For each time interval,determining the value ofrkis a mix-integer optimization problem,and the variables offkandgkare mutually dependent;(ii)determining the value ofakis a sequential decision-making problem,and the decision at any time interval affects the subsequent decisions.To solve problem P0,we decouple it into two problems.We propose algorithms for resource reservation in each time interval,which are presented in section V,and a learning-based approach for reconfiguring resource reservation over multiple time intervals,which is presented in section VI.

V.GROUP-BASED RESOURCE RESERVATION

In this section,we design an algorithm to enable groupbased resource reservation at each time interval.

Whenak=1,the controller keeps using the resource reservation decision from time intervalk-1.For whenak=0,we formulate the group-based multi-resource reservation problem in time intervalk,given the predicted spatial task distributions,as follows.

Problem P1 is a combinatorial optimization problem,and variablesfkandgkare still coupled.We first address computing task assignment,i.e.,fk,and computing resource reservation,i.e.,ck,given a storage resource reservation decision,i.e.,gk.Then,we leverage particle swarm optimization to find the solution of storage resource reservation.The solution of problem P1 corresponds to group-specific resource reservation,and the total amount resources to be reserved for all groups can be calculated accordingly.The reserved resources can be multiplexed among different groups.

A.Computing Task Assignment and Computing Resource Reservation

Since multiple computing tasks can be assigned to the same server,assigning computing tasks to servers is many-to-one matching.We first transform the many-to-one matching into a one-to-one matching.Specifically,severalvirtual serversare created to represent each physical server.Denote the number of virtual servers for S-NAP or S-BSebyNe.The value ofNeis the maximum number of computing tasks that can be assigned to serverewhile not violating the constraints of resource usage,i.e.,constraints(3),(15b),and(15c).The number of virtual servers for physical servereis given by

Given the amount of storage resource reserved on S-BSs and S-NAPs,the sets of stored context data chunks,i.e.,Ie,k,on all S-BSs and S-NAPs are determined based on the hierarchal storage policy described in section III.D.Denote bythe network resource usage,including resource usage from computing,uplink communication,downlink communication,and remote access,for executing a computing task that requests chunkiand is generated by a UT from groupnin the coverage of BSbduring time intervalkat servere.The calculation ofconsists of two parts.The first part is the resource usage from computing,uplink communication,and downlink communication,which is not related to the sets of stored chunks,while the second part is the communication resource usage from remote access,which depends on the sets of stored chunks.Denote byWb,e=wcεce+wo(α+γ)ηb,ethe sum of resource usage from computing,uplink communication,and downlink communication used to execute a computing task generated in the coverage of BSbat servere.According to the communication resource usage for remote access,the calculation ofis categorized into the following four cases.

In(18),if chunkiis not stored on S-BSebut stored on SNAPe′,i.e.,,the communication resource usage for S-BSeto remotely access S-NAPe′for one computing task iswoLre;If chunkiis not stored on S-BSeor S-NAPe′,i.e.,,the communication resource usage for S-BSeto remotely access the S-CN for one computing task iswoLre;If chunkiis not stored on S-NAPe,i.e.,i̸∈Ie,k,the communication resource usage for S-NAPeto remotely access the S-CN for one computing task iswoLre;Otherwise,chunkiis stored on servere,and no communication resource is used for remote access.

Denote the set of virtual servers and the set of computing tasks byUandT,respectively.Given computing taskt∈Tand virtual serveru∈U,we can determine the corresponding values of(n,i,b)of the computing task and physical servere.Letrepresent the sum of computing and communication resource usage for executing computing tasktat virtual serveruin time intervalk,which can be calculated via(18)based on the corresponding values of(n,i,b)and physical servere.We introduce variable0,1}to indicate whether to assign computing tasktto virtual serveruin time intervalkor not and define.If computing tasktis assigned to virtual serveru,=1.Otherwise,=0.Finding the solution offkin problem P1 can be transformed into finding the solution ofzk,as follows.

We propose a matching-based computing task assignment(MCLA)algorithm to select a virtual server for each computing task,as shown in Algorithm 1.

We construct the preference listfor virtual serveruin time intervalk,which is a vector containing the indexes of all computing tasks that can be assigned to virtual serveruin time intervalk,sorted by the value ofin a nondecreasing order.In each iteration,the controller checks the current preference list for each virtual server and selects the first computing task in the preference list as the proposal from the virtual server.After that,the selected computing task is removed from the virtual server’s preference list.The controller selects a proposal for each computing task to minimize the objective function in problem P2 while satisfying constraint(15d),which is a 0-1 Knapsack problem.LetUnot⊂Udenote the set of virtual servers that are not yet matched.A dynamic programming approach in Ref.[37]is adopted to adjust the matching result,i.e.,selecting proposals for virtual servers in the setUnotand adjusting the matched proposals for virtual servers in the setUUnotfor minimizing the objective function in problem P2 while satisfying constraint(15d).Then,setUnotis updated accordingly after each iteration.A computing task should be re-proposed for each virtual server in the setUnotin the next iteration.The matching process terminates when all computing tasks in the setTare successfully matched.Based on the matching result,i.e.,zk,we can determine the computing task assignment decision,i.e.,fk.

Algorithm 1 MCLA algorithm 1 Input:Lserver u,k,∀u∈U 2 Initialization:U,U not=U,T,j=0;3 while|T|=0 do 4 j=j+1;5 for u∈U not do 6 Select the first computing task in preference list Lserver u,k as the proposal from virtual server u;7 Remove the selected computing task from preference list Lserver u,k;8 end 9 Adopt the dynamic programming in Ref.[37]to select proposals from the new proposals in iteration j and adjust the matched proposals in iteration j-1 for minimizing the objective function in(19)while satisfying constraint(15d).10 Remove the matched virtual servers from U not;11 Add the unmatched virtual servers to U not;12 Remove the matched computing tasks from T based on the matching result in iteration j;3 end 4 fk←zk;5 Output:fk

Givenfk,the numbers of computing tasks assigned to the servers,i.e.,mk,are determined based on(2).Then,computing resource reservation in time intervalkis determined according to(3),which is:

The time complexity of Algorithm 1 depends on the number of iterations of the outer loop for matching(Lines 5～13)and the time complexity of solving the knapsack problems in each iteration(Line 9).For the outer loop,the time complexity of solving the matching problem isO(ZXk),whereZ=∑e∈E NeandXk=∑n∈N∑b∈B xnb,kdenote the number of all virtual servers and the number of all computing tasks in time intervalk,respectively[38].In each iteration,we should solve a knapsack problem to satisfy constraint(15d)for every group.The time complexity of the adopted dynamic programming approach for groupnisO(VnreXkn),wheredenotes the number of all computing tasks from groupnin time intervalk[37].Therefore,the time complexity of Algorithm 1 isO(ZXk∑n∈N Vren Xnk).

B.Storage Resource Reservation

Determining the amount of storage resources reserved on S-NAPs and S-BSs is a combinatorial optimization problem.Therefore,finding the globally optimal solution is challenging[39].Evolutionary heuristics,specifically particle swarm optimization(PSO),is leveraged to achieve the local optima of the problem.Based on PSO,we propose a resource reservation(RR)algorithm to solve problem P1.In the proposed RR algorithm,we leverage a number of particles,referred to as the particle swarm,where the position of each particle corresponds to the solution of storage resource reservation,i.e.,gk.In each iteration,each particle moves within the solution space while adjusting its position and speed dynamically based on Ref.[39].After repeating such an iteration multiple times,the positions of all particles can converge to the same position,which is the found solution of storage resource reservation[40].Given the storage resource reservation,the values offkandckcan be determined based on section IV.A.

Algorithm 2 RR algorithm 1 Input:ς,ς1,ς2,φ1,φ2,lmax 2 Initialization:l=0,s(0)y=0,g(0)y∈F,∀y∈Y;3 f(0)y,∀y∈Y←Calculate by Algorithm 1 given g(0)y;4ˆΔy,∀y∈Y←Calculate by(12)given c(0)y,g(0)y,and f(0)y;5ˆgy,∀y∈Y←g(0)y,∀y∈Y;6Δ*←min■ˆΔy,∀y∈Y■;7 g*←ˆgy′where y′=argminy■ˆΔy,∀y∈Y■;8 while l≤lmax do 9 for y∈Y do 10 if constraint(6)is not satisfied then 11 g(l)y←Select a position in F randomly;12 end 13 f(l)y←Calculate by Algorithm 1 given g(l)y;14 Δ(l)y←Calculate by(12)given c(l)y,g(l)y,and f(l)y;15 ifΔ(l)y＜ˆΔy then 16 ˆg,ˆΔy←g(l)y,Δ(l)y;17 end 18 ifΔ(l)y＜Δ*then 19 g*,Δ*←g(l)y,Δ(l)y;20 end 21 s(l+1)y ,g(l+1)y ←Update by(21)and(22),respectively;22 end 23 end 24 f*,c*←Calculate by Algorithm 1 and(20)given g*,respectively;25 Output:g*,f*,c*,Δ*

The detailed procedures of the RR algorithm are introduced in Algorithm 2.Since the RR algorithm applies to any time interval,we omit subscript“k”in the rest of this subsection.We define the solution space of the storage resource reservation problem as F,where the possible solution offk,i.e.,particles’positions,should satisfy constraint(6).Denote the set of particles and the position of particleyin thelth iteration byYandg(l)y,respectively.Letandg*denote the best position of particleyand the best position among all particles’positions,i.e.,the particle swarm’s best position,up to thelth iteration,respectively.Accordingly,letΔ(l)y,,andΔ*denote the value ofΔin problem P1 given positiong(l)y,,andg*,respectively.Based on Ref.[40],the speed of particley∈Yin thelth iteration,denoted bys(l)y,evolves as follows.

where parameterςis the weight for each particle to keep its speed from the previous iteration.Parametersς1andς2are cognitive and social coefficients for learning from each particle’s own best position and the particle swarm’s best position up to the current iteration,respectively,and both are positive random variables for exploring the solution space[40].The position of particley,i.e.,g(l)y,in thelth iteration is given by

If a particle moves out of the solution space,the particle is replaced by a new particle with a random position in the solution space F.In this way,the positions of all particles are guaranteed to satisfy constraint(6).

Algorithm 2 shows the detail of the proposed RR algorithm.Denote the maximum number of iteration bylmax.Line 2 initializes all particles with the positions in the solution space F.Line 3 to Line 4 obtain the value ofΔin problem P1,i.e.,,given the position of particley,i.e.,g(l)y.Line 5 to Line 7 find the best solution among all particles based on the value of.Line 10 to Line 21 update the position of each particle based on(21)and(22)in each iteration.The outputs of the RR algorithm include the solution of problem P1,i.e.,g*,f*,andc*,and the value ofΔ*.

VI.META-LEARNING-BASED RESOURCE RESERVATION RECONFIGURATION

In the preceding section,we solve the network resource reservation problem for one-time interval.In this section,we determine the value ofa=[ak]∀k∈Kto reconfigure resource reservation decisions amongKtime intervals.

We formulate the sub-problem of resource reservation reconfiguration based on problem P0,as follows.

We definek⋆as the time interval when the latest resource reservation was reconfigured up to time intervalk.In time intervalk⋆,ak⋆=0,aj=1 forj∈[k⋆+1,···,k-1]andk⋆＜k.The relation betweenkandk⋆is given by

Problem P3 is a sequential decision-making problem,which can be solved by reinforcement learning(RL)-based methods[23,26].However,RL-based methods cannot be applied directly to solve problem P3.Resource reservation reconfiguration is owing to the difference in network status,i.e.,spatial task distributions and chunk request ratio profiles in this work,in different time intervals.Identifying the difference between network status can potentially improve the learning efficiency of RL-based methods.

Algorithm 3 MetaR3 approach 1 Input:ρ 2 Initialization:θ,ˆθ,ϑ,k⋆,H1,ˆσ1 3 for k=1,···,K do 4 σk←Obtainˆσ(Hk,Hk⋆;ˆθ);5 ak←Determine by(28);6 rk←Determine by(13);7 Δk(rk),ˆσk+1←Implement rk for time interval k;8 θ,ˆθ,ϑ←Train parameters via(26)and(29);9 k⋆←Update by(24);10 end 11 Output:a

We propose a meta-learning-based resource reservation reconfiguration(MetaR3)approach to solve problem P3,as given in Algorithm 3.At the end of each time interval,data regarding spatial task distributions,chunk request ratio profiles,indicatorakand the network performance is collected and stored based on UDTs.Meta-learning is adopted to capture the similarity between network status in two-time intervalskandk⋆,and the policy of resource reservation decision reconfiguration is learned based on the captured similarity by using RL.Based on the learned policy of resource reservation decision reconfiguration,the value ofakcan be determined.Ifak=0,resource reservation decision is reconfigured at the beginning of time intervalk.

Two components are underlying the proposed MetaR3approach:(i)capturing the similarity between network status during two different time intervals,and(ii)reconfiguring resource reservation in a closed-loop manner,which are presented in the following two subsections,respectively.

A.Similarity Capture

Denote the network status in time intervalkbyhk=[xk,pk].The value ofhkis unavailable at the beginning of the time intervalk.Therefore,we use the data regarding spatial task distributions and chunk request ratio profiles in pastT′time intervals contained in UDTs as the features of the network status in time intervalk,denoted byHk=[hk-T′,···,hk-1].Based onHk,the value ofakis determined.Ifak=0,Hkcan be used to predict the network status,i.e.,hk,for making resource reservation decision.Define the similarity between the network status in different time intervalskandk′asσ(Hk,Hk′).

We leverage meta-learning with siamese neural networks to approximate the value ofσ(Hk,Hk′)[41],as illustrated in Fig.4.Letθanddenote the parameters of the whole siamese neural networks and the parameters of the siamese neural networks without the output layer,respectively.The inputs of the siamese neural networks are the features of network status,i.e.,HkandHk′.The siamese neural networks have two outputs,i.e.,the output of the whole siamese neural network and the output of the penultimate layer.The output of the whole siamese neural network represents the value of similarity,denoted byσ(Hk,Hk′;θ),and the output of the penultimate layer represents the latent factors of similarity,denoted by(Hk,Hk′;).

Fig.4 The proposed MetaR3 approach

Training siamese neural networks should be based on labeled data.Defineϱ(Hk,Hk′)∈{0,1}as a label of the features of network status in two-time intervalskandk′,given by

In(25),Δk(rk)denotes the network resource usage in time intervalkif the resource reservation is reconfigured,i.e.,ak=0.Δk(rk′)denotes the network resource usage in time intervalkif the resource reservation is not reconfigured,i.e.,ak=1,and the resource reservation decision from time intervalk′is used,i.e.,rk′.IfΔk(rk)-Δk(rk′)＞λOv,the network status in two-time intervalkandk′are considered to be“similar”;Otherwise,the network status in the two-time intervals are considered to be“not similar”.The features of the network status in any two-time intervals and the corresponding value ofϱ(Hk,Hk′)are referred to as a labeled data entry.The goal of training the siamese neural networks is to letσ(Hk,Hk′;θ)approximate labelϱ(Hk,Hk′)by using extensive labeled data entries.The parameters of the siamese neural networks,i.e.,θ,are obtained by minimizing the following loss function via gradient descent[41].

The approximated value of similarity,i.e.,σ(Hk,Hk′;θ),can indicate whether the features of network status in twotime intervals are similar or not.However,its information on how much a difference between network statuses is insufficient for reconfiguring resource reservation4The output layer in the siamese neural networks is used for training the siamese neural networks..Therefore,we use the latent factors of similarity,i.e.,(Hk,Hk′;),to determine the value ofak.

B.Closed-loop Resource Reservation Reconfiguration

We leverage deep Q learning with deep neural networks,named Q networks,to determine the value ofakgiven the latent factors of similarity.The state and action in time intervalkare(Hk,Hk⋆;)andak,respectively.For simplicity,letdenote(Hk,Hk⋆;)in the rest of this section.Define a Qvalue function to represent the discounted long-term resource usage and cost of making decisionakin state,given by

whereρ∈(0,1)is the discount factor.In state,akcan be determined based on the Q-values as follows.

The Q network with parameterϑis used to approximate the Q-value function for learning the policy of resource reservation reconfiguration.The parametersϑof the Q networks are obtained by minimizing the following loss function via gradient descent[23].

We summarize the workflow of the MetaR3approach in Algorithm 3.Line 4 to Line 5 determineakat the beginning of time intervalkbased on the predicted network status obtained from UDTs.Givenak,Line 6 to Line 7 determine and implement the resource reservation decision.At the end of the time interval,data regarding network performance,actual spatial task distributions,and chunk request ratio profiles is collected and stored in UDTs.Given the stored historical data,the siamese neural networks and Q networks can be trained to adapt to dynamic spatial task distribution and chunk request ratio profile in a closed-loop manner.

VII.PERFORMANCE EVALUATION

A.Simulation Settings

The simulated multi-tier network consists of one S-CN,two S-NAPs,and four to ten S-BSs.There are 600 UTs with different trajectories.Based on the average time within the coverage of each BS for each UT,these UTs are grouped into two to four groups.The input data size,computing workload,and the size of computing results of each computing task are set to 2 MB,4 Megacycles/s,and 15 MB,respectively[24].The network resource usage and the resource capacity of servers at the same tier can be different.The average network resourceusage,average resource capacity of servers,and other parameters are listed in Tab.2.

Tab.2 Simulation parameters

For the siamese neural networks illustrated in Fig.4,we use 3 fully connected layers with(64,64,32)neurons as an embedding layer.The features of network statusHkandHk′are fed to two embedding layers separately,each with the same structure.The merging layer merges the outputs of the two embedding layers based on Euclidean distance,followed by the penultimate layer with 16 neurons and the output layer with 1 neuron.For the Q networks,we adopt 4 fully connected layers with 128,512,128,32 neurons,respectively.We adopt the RMSprop optimizer and the Adam optimizer for training the siamese neural networks and the Q networks,respectively.

B.Performance of Group-based Resource Reservation

The convergence performance of the proposed RR algorithm for group-based resource reservation is shown in Fig.5.Given the same spatial task distribution and configuration of all servers,we conduct the simulation with 2,8,and 32 particles for 30 iterations.The proposed algorithm converges after 10,14,and 16 iterations,respectively.With more particles,the algorithm achieves better performance at the cost of computation complexity.

Fig.5 Convergence performance of the proposed RR algorithm

Fig.6 Network resource usage per computing task under even and uneven spatial task distributions:(a)Uneven spatial task distribution;(b)Even spatial task distribution

Figs.6(a)and 6(b)compare the network resource usage per computing task,versus the load of computing tasks,of the proposed RR algorithm and the benchmark algorithms.We adopt four benchmark algorithms:(1)BS-first,which assigns computing tasks to S-BSs and reserves storage and computing resources on S-BSs first as much as possible,then on SNAPs,and last on the S-CN.(2)NAP-first,which assigns computing tasks to S-NAPs and reserves storage and computing resources on S-NAPs first as much as possible,then on S-BSs,and last on the S-CN.(3)EA,which assigns computing tasks and reserves storage and computing resources on S-BS,S-NAP,and S-CN with equal priority.(4)Brute force(BF),which is an algorithm to find the global optimum.We have the three following observations.First,for both even and uneven task distribution,the network resource usage per computing task of the RR algorithm is close to the global optimum and much lower compared to the benchmark algorithms(except the BF algorithm).Second,the network resource usage per computing task decreases as the load of computing tasks generated in the network increases.This is because the storage resource usage per task decreases as the load of computing tasks requesting the same stored chunk increases.Third,the performance gap between the RR algorithm and the benchmark algorithms(except the BF algorithm)under uneven spatial task distribution is larger than under even spatial task distribution.With uneven spatial task distribution,the optimal resource reservation may be different for servers at the same tier.The RR algorithm can differentiate resource reservation decisions for different servers at the same tier using groupbased spatial task distribution.

Fig.7 Network resource usage per computing task in heterogeneous and homogeneous scenarios:(a)Heterogeneous settings;(b)Homogeneous settings

Figs.7(a)and 7(b)show the network resource usage per computing task versus the number of BSs and the number of groups in different scenarios.Specifically,the heterogeneous scenario means that the network resource usage for executing a computing task and resource capacity of servers at the same tier are different.The homogeneous scenario means that the network resource usage for executing a computing task and resource capacity of servers at the same tier are identical.The scheme labeled as“1 Group(w/o UDTs)”represents resource reservation without UDTs,and the schemes labeled as“nGroups(w/UDTs)”represent the proposed RR algorithm withngroups.We have the following two observations.First,group-based resource reservation with UDTs outperforms resource reservation without UDTs in both homogeneous and heterogeneous scenarios.Without UDTs,group-based spatial task distribution is unknown for resource reservation and computing task assignment.As a result,schemes without UDTs should reserve more resources to satisfy constraint(15d),i.e.,each group’s communication resource usage for remote access.With more groups,the group-based resource reservation can achieve better performance at the cost of computation complexity and data management for UDTs.Second,the curve with diamond markers represents the performance gap between 4-group-based resource reservation with UDTs[“4 Group(w/UDTs)”]and resource reservation without UDTs[“1 Group(w/o UDTs)”].In both homogeneous and heterogeneous scenarios,the effectiveness of group-based resource reservation increases with the number of BSs.

C.Performance of Closed-Loop Resource Reservation Reconfiguration

A training dataset that includes spatial task distributions in 80-time intervals is created.Conducting the simulation for all 80 spatial task distributions in the training dataset are referred to as one episode.We conduct 10 simulations on the dataset,and each simulation includes 220 episodes.In Fig.8(a),the smooth solid line is the average result over 10 simulations,while the spikes in the background represent the corresponding variance.Fig.8(a)shows that the proposed MetaR3algorithm can converge and find a policy of resource reservation reconfiguration given a fixed network environment.

In Fig.8(b),we compare the convergence performance of the MetaR3approach with that of a deep Q-Learning(DQN)-based algorithm,labeled as“DQN”.In DQN,the group-based spatial task distribution is used as the state to determine the value ofak.We create two datasets with different spatial task distributions.One training dataset is used to train the neural networks in advance,and one evaluation dataset is used to evaluate the convergence performance of MetaR3.The evaluation dataset reveals network status from unknown network environments.Note that MetaR3keeps training the siamese neural networks and the Q networks in unknown network environment due to the closed-loop reconfiguration of resource reservation.We observe that MetaR3achieves lower network resource usage and lower cost of reconfiguring resource reservation per computing task,and also converges in fewer episodes in unknown network environment compared with the DQN algorithm.This is because MetaR3captures the similarity of network status,instead of learning the variation of network status,to determineakeven though the current network status is unknown.

Fig.8(c)shows the performance in the weighted sum of the network resource usage and the cost of reconfiguring resource reservation per computing task versus the average difference in the load of computing tasks in adjacent two-time intervals.When the average difference in the load of computing tasks in adjacent two-time intervals increases,spatial task distribution changes faster.The benchmark algorithm,labeled as“Myopic”,determines whether to reconfigure resource reservation or not in each time interval without considering the long-term impact.We observe that the performance gaps between MetaR3and“DQN”and“Myopic”algorithms increase with the average difference in the load of computing tasks in adjacent two-time intervals since the similarity capture features of MetaR3can reduce the state space for finding a good policy of resource reservation reconfiguration in dynamic network environments,which improves learning efficiency.

Fig.8 Performance of MetaR3 in the weighted sum of network resource usage and cost of reconfiguring resource reservation per computing task:(a)Convergence performance of MetaR3;(b)Performance comparison between MetaR3 and DQN;(c)The impact of network dynamics

In Fig.9,we compare the performance of the proposed MetaR3algorithm with that of two popular RL algorithms,i.e.,deep deterministic policy gradient(DDPG)-based(labeled as“DDPG”)and DQN-based algorithms in different network environments[26,42].Specifically,we conduct simulations of the three algorithms with different numbers of BSs and average the resource usage and cost per computing task over three independent simulations.We can observe that the proposed MetaR3algorithm outperforms the DDPG-and DQN-based algorithms.This is because both DDPG-and DQN-based algorithms use network status as states.When the network status has a large dimensionality and network environments are highly dynamic,finding the optimal resource reservation reconfiguration policy is challenging for the DDPG and DQN-based algorithms.In contrast,the proposed MetaR3algorithm can capture the similarity of network status in consecutive time intervals.Since the similarity is low-dimensional,using similarities as states has advantages on finding a proper policy of resource reservation reconfiguration,particularly in complicated network environments.Therefore,the proposed MetaR3algorithm achieves better network performance than DDPG-and DQN-based algorithms.

Fig.9 Resource usage and cost per computing task of the MetaR3,DDPG,and DQN-based algorithms

VIII.CONCLUSION AND FUTURE WORK

In this paper,we have designed DT-empowered network planning for supporting stateful applications in multi-tier computing and proposed two approaches to enable groupbased multi-resource reservation and closed-loop resource reservation reconfiguration.Our study focuses on minimizing the long-term network resource usage and the cost of reconfiguring resource reservation.The results have demonstrated that DT-empowered network planning can support UTs with diverse characteristics and adapt to dynamic network environments.In addition,the meta-learning-based approach can exploit data contained in DTs to facilitate closed-loop network planning.Overall,we have demonstrated the essential role that DTs can play in network planning for 6G.In the future,we will improve the flexibility of DTs by differentiating and optimizing DTs for various applications or UTs.

APPENDIX

A)Effective Request Ratio for S-NAPsFor S-BSelocated at BSb,S-BSesorts|Ie,k|chunks with the largest values ofpib,kin time intervalk.LetJie,kdenote the order of chunkiamong the chunks with the largest values ofpib,kin setIe,k,and denote byI(Jie,k)e,k⊆Ie,kthe set ofJie,kchunks with largest values ofpib,k.We assume that the computing tasks requesting any chunk inIe,kare assigned to the S-BS as much as possible while not violating the communication and computing resource capacities.Given different values ofge,kfor SBSe,the load of computing tasks assigned to S-BSemay be different.For S-BSeco-located with BSb,the overall load of computing tasks requiring any chunk in setI(Jie,k)e,kis given bywhereis the load of computing tasks requiring chunkiin the coverage of BSbin time intervalk.The load of computing tasks that request chunki∈Ie,kin the coverage of BSband are not assigned to S-BSein time intervalk,denoted byPib,e,k,is as follows.

whereis the maximum load of computing tasks that can be assigned to S-BSewith satisfying the computing capacity.In(30),if S-BSehas sufficient computing resource for executing all computing tasks requiring chunki,no computing task requiring chunkineeds to be assigned to an S-NAP or the S-CN.Otherwise,a certain load of computing tasks requiring chunkicannot be processed at S-BSe,which should be assigned to an S-NAP or the S-CN.

The overall load of computing tasks that request chunkiand are not assigned to any S-BSewithin the service coverage of S-NAPe′in time intervalkis given by∑b∈Be′Pib,e,k.The effective request ratio of chunkifor S-NAPe′in time intervalkis as follows.

S-NAPe′stores|Ie′,k|chunks with the largest values ofqie′,k.