
Blog on Selected Ideas in Communications
Written By:

Petar Popovski, Editor in Chief of IEEE JSAC

Published: 22 Mar 2022

The February 2022 issue of IEEE JSAC, just like the previous one, is devoted to the interplay of learning and communications: it is the second part of the special issue "Distributed Learning Over Wireless Edge Networks". The fact that the topic spans two parts indicates its high popularity, reflected in a large number of submissions. From this special issue, the blog will feature the following article, co-authored by academic (UCSD) and industrial (Sony) researchers:

S. Wang, X. Zhang, H. Uchiyama and H. Matsuda, "HiveMind: Towards Cellular Native Machine Learning Model Splitting," in IEEE Journal on Selected Areas in Communications, vol. 40, no. 2, pp. 626-640, Feb. 2022, doi: 10.1109/JSAC.2021.3118403.

Mobile terminals and other User Equipment (UE) often contribute to the training of a Machine Learning (ML) model or, vice versa, rely on the model to carry out inference, such as recognizing an image or providing a recommendation. These tasks can be very demanding for the UE in terms of computation and, thereby, energy consumption. This has previously led to the idea of splitting the ML model between the UE and a powerful cloud server, where the latter can offer faster computation. Yet, offloading the computations to the cloud brings in another latency component, as the communication between the UE and the cloud acts as an overhead that is converted into extra latency.

The contribution of the article is centered on the idea of generalizing splitting into multi-splitting among the UE, one or multiple intermediate Mobile Edge Computing (MEC) nodes, and the cloud. For a linear learning architecture, such as a Deep Neural Network (DNN), the natural candidates for splitting are the cuts between different DNN layers, such that each node executes the model up to a specific layer and sends the intermediate data to the next node. This setup offers the opportunity to trade off computation against communication in a way that adapts to the dynamic network conditions and computational loads; a small sketch of this trade-off is given below. The key insight from the authors is that the 5G network architecture at a single site often contains multiple MEC servers and is thus amenable to multi-split solutions of the learning model. Besides latency optimization, the multi-split problem can be posed with multiple objectives, such as energy efficiency or privacy preservation.
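To make the computation-communication trade-off concrete, the following minimal Python sketch enumerates all contiguous layer-to-node assignments along a UE-MEC-cloud chain and picks the one with the lowest end-to-end latency. All numbers (per-layer compute costs, activation sizes, node speeds, link capacities) are illustrative placeholders and not taken from the paper, whose latency model is more detailed.

from itertools import combinations_with_replacement

flops = [4.0, 8.0, 2.0]        # per-layer compute cost (arbitrary units)
act   = [1.0, 1.0, 3.0, 0.5]   # act[b]: data crossing a cut after layer b (act[0] = raw input)
speed = [1.0, 4.0, 20.0]       # relative node speeds: UE, MEC, cloud
cap   = [2.0, 1.0]             # link capacities: UE->MEC, MEC->cloud
L, K  = len(flops), len(speed)

def latency(bounds):
    # bounds = (b1, ..., b_{K-1}), non-decreasing; node k runs layers b[k]..b[k+1]
    b = (0, *bounds, L)
    comp = sum(sum(flops[b[k]:b[k + 1]]) / speed[k] for k in range(K))
    # every cut before the final layer must cross the next link,
    # even when the next node only forwards the data
    comm = sum(act[b[k + 1]] / cap[k] for k in range(K - 1) if b[k + 1] < L)
    return comp + comm

best = min(combinations_with_replacement(range(L + 1), K - 1), key=latency)
print("optimal cuts:", best, "-> latency:", round(latency(best), 2))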

Using Figure 1, the authors provide a lucid explanation of the benefits of multi-split architectures for a toy example of a 3-layer neural network that is split across the UE, the MEC, and the cloud server. The link capacity between the MEC and the cloud is varied between 1 and 0.1 in order to highlight the impact of link dynamics. When the link capacity is high, the single-split model allocates all layers to the cloud and, as the link capacity drops, it brings layers back to the UE. In the single-split case the MEC just forwards the data to the cloud and does not participate in the computation; thus, the orange line that denotes the single split converges to a flat line that corresponds to the case in which all layers are placed on the UE. In the multi-split case, the deterioration of the MEC-cloud link brings more layers onto the MEC, and when the blue line becomes flat, it corresponds to the case in which all layers are on the MEC. Thus, the gap between the flat orange and flat blue lines shows the difference in computational latency between the UE and the MEC, respectively.

Figure 1. (Copyright IEEE) Comparison in terms of latency, expressed through the total overhead, for four different architectures: (1) UE computing (no split), (2) cloud single split, (3) edge single split, (4) multi-split across UE, edge, and cloud.
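Continuing the sketch above (reusing its latency() function and placeholder numbers), one can mimic the qualitative behavior of Figure 1 by sweeping the MEC-cloud capacity and comparing the best single split, in which the MEC only forwards, against the best multi-split:

# sweep the MEC->cloud capacity, as in Figure 1 (values are illustrative)
for c in (1.0, 0.5, 0.2, 0.1):
    cap[1] = c
    # single split: the MEC computes nothing, i.e., both cuts coincide
    single = min(((b, b) for b in range(L + 1)), key=latency)
    multi = min(combinations_with_replacement(range(L + 1), K - 1), key=latency)
    print(f"cap={c:.1f}  single: cuts={single}, t={latency(single):.2f}  "
          f"multi: cuts={multi}, t={latency(multi):.2f}")

As the capacity drops, the multi-split solution keeps layers on the MEC, while the single-split solution has to choose between the slow UE and the degraded path to the cloud.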

There is an interesting aspect of the paper that is rooted in the practical constraints posed by large learning models. Namely, the formulated problem cannot be solved by the classical Dijkstra algorithm, due to the sheer size of the graph that needs to be treated in the optimization. For example, ResNet50, a convolutional neural network with 50 layers, running on 5 nodes leads to a graph with more than 100,000 edges. The authors solve this by introducing a Split Cost Information (SCI) design, related to the distributed Dijkstra algorithm.
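For intuition on why a shortest-path formulation arises at all, here is a toy version (a sketch only; the paper's actual graph construction behind SCI is different and yields far larger graphs): a state (b, d) means that the first b layers have been computed and the intermediate output resides on device d; computing a layer and transferring a tensor are edges, and Dijkstra finds the fastest way to finish all L layers on any device. The numbers are the same placeholders as above.

import heapq

flops = [4.0, 8.0, 2.0]              # per-layer compute cost (made up)
act   = [1.0, 1.0, 3.0, 0.5]         # act[b]: tensor size at boundary b
speed = [1.0, 4.0, 20.0]             # UE, MEC, cloud
cap   = {(0, 1): 2.0, (1, 2): 1.0}   # directed links with their capacities
L, K  = len(flops), len(speed)

dist, heap = {(0, 0): 0.0}, [(0.0, 0, 0)]   # start: raw input on the UE
while heap:
    t, b, d = heapq.heappop(heap)
    if t > dist.get((b, d), float("inf")):
        continue                          # stale heap entry
    edges = []
    if b < L:                             # compute layer b+1 locally
        edges.append(((b + 1, d), t + flops[b] / speed[d]))
    for (src, dst), c in cap.items():     # or ship the tensor over a link
        if src == d:
            edges.append(((b, dst), t + act[b] / c))
    for state, nt in edges:
        if nt < dist.get(state, float("inf")):
            dist[state] = nt
            heapq.heappush(heap, (nt, *state))

print(min(dist[(L, d)] for d in range(K) if (L, d) in dist))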

Reflections by the Authors

JSAC: What do you think is the weak point of your approach? Do you think there are some parts of the model that, if changed, can drastically change the conclusions?

I think the lack of support for models with multi-branch architectures is a notable weak point of our approach. In the paper, we abstract the neural network model as a chain of layers. However, we have noticed that in recent years, models with multi-branch architectures, e.g., Mask R-CNN and the Transformer, are gaining traction in the computer vision and signal processing communities. Although we have shown in our paper that our split design can be easily extended to some non-chain neural network architectures, such as RNN and Collaborative Learning (Sec. VI), multi-branch models pose non-trivial challenges, since branching introduces dependencies among the split decisions on different branches.

JSAC: What has been criticized by the reviewers during the review process and how did you address that?

The reviews were generally positive. The reviewers were mostly concerned about the efficiency of transmitting the outputs of the middle layers. One reviewer asked: "Generally, the outputs' size of some layers is far bigger than the input features. In this case, why not directly forward the input features to the cloud." To address this comment, we note that this statement does not apply to many popular models, such as AlexNet and VGG. In addition, existing works are able to compress the intermediate data at a ratio as large as 5000:1, which makes it far smaller than the input data. We also point out that, in such a case, our split design will converge to the cloud offloading scheme to achieve the minimal computation overhead. However, the cloud offloading scheme induces a large overhead when the UE-cloud network capacity is poor, whereas our multi-split ML scheme is able to reassign the layers to the MECs or UEs to avoid transmitting to the cloud.

Statements and opinions given in a work published by the IEEE or the IEEE Communications Society are the expressions of the author(s). Responsibility for the content of published articles rests upon the author(s), not the IEEE nor the IEEE Communications Society.
