Approaching Explainable Machine Learning in the Analysis of Communication Network Traffic

Blog on Selected Ideas in Communications

Written By:

Petar Popovski, Editor in Chief of IEEE JSAC

Published: 18 Oct 2022

The August 2022 issue of IEEE JSAC is the fifth issue of the series on Machine Learning in communications and networks, and the article selected from this issue is:

J. Knofczynski, R. Durairajan and W. Willinger, "ARISE: A Multitask Weak Supervision Framework for Network Measurements," in IEEE Journal on Selected Areas in Communications, vol. 40, no. 8, pp. 2456-2473, Aug. 2022.

The paper is a result of collaboration between academia (University of Oregon) and industry (NIKSUN, Inc.).

Data traffic in communication networks has long been one of the prime examples of big data, as it inherently produces vast amounts of data that can either be stored for subsequent analysis or, very often, analyzed as real-time data streams. While this seems to be an exemplary setting for applying Machine Learning (ML) techniques, in practice network operators often prefer to rely on more traditional statistical techniques in conjunction with manual oversight. This is due to a reluctance to use black-box ML models whose decisions and actions cannot be reliably explained or predicted for an untested setup.

This paper represents an important step towards creating explainable ML solutions that can have practical significance for network operators. In doing so, this work observes that the number of diverse network measurement tasks to be automated by ML-based techniques grows much faster than labeled training data is produced for each of those tasks. Hence, it is necessary to resort to multi-task learning techniques to improve the classification accuracy of related tasks. This is the basis for the proposed ML framework, termed ARISE. The framework channels the domain expertise of operators into the creation of labeling functions, which avoids manual or crowdsourced labeling. ARISE is designed in a way that allows network operators to explain and interpret the decisions made by the ML model.

Figure 1: Applying labeling to raw measurement data of network latency, expressed through the round trip time.

To illustrate the type of tasks considered, Figure 1 depicts measurements of latency, expressed through the round trip time. An abrupt increase in the latency measurements, termed network volatility, can indicate noise in the network measurements or, alternatively, network congestion. Here one sub-task can be “identify and remove the network measurement noise”, while the other sub-task is “detect network congestion”. Figure 1 also contains the labeling thresholds for classifying noise and congestion, respectively, which are generated by the proposed labeling functions. In the proposed framework, these labels are refined through a multi-task sharing stage and, finally, a stage in which the network operator can apply domain expertise to increase the confidence of the classification and provide the reasoning behind the result.
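As a rough illustration of what a labeling function for the latency example might look like, the Python sketch below flags round trip time samples that exceed a threshold and separates isolated spikes (labeled as noise) from sustained runs (labeled as congestion). This is not the ARISE implementation: the function name `label_rtt`, the median-based threshold, the `spike_factor` and `min_run` parameters, and the sample data are all assumptions made for illustration.

```python
from statistics import median

# Label values are hypothetical and chosen only for this sketch.
NORMAL, NOISE, CONGESTION = 0, 1, 2


def label_rtt(rtts, spike_factor=2.0, min_run=3):
    """Assign a weak label to each RTT sample.

    Assumption: a sample is "high" if it exceeds spike_factor times the
    median RTT. Short runs of high samples are treated as measurement
    noise, runs of at least min_run samples as congestion.
    """
    threshold = spike_factor * median(rtts)
    high = [r > threshold for r in rtts]
    labels = [NORMAL] * len(rtts)

    i = 0
    while i < len(rtts):
        if not high[i]:
            i += 1
            continue
        # Find the end of the current run of high samples.
        j = i
        while j < len(rtts) and high[j]:
            j += 1
        run_label = CONGESTION if (j - i) >= min_run else NOISE
        for k in range(i, j):
            labels[k] = run_label
        i = j
    return labels


# Made-up RTT samples in milliseconds.
rtts_ms = [20, 21, 19, 95, 20, 22, 80, 85, 90, 88, 21, 20]
labels = label_rtt(rtts_ms)
print(labels)
```

With these made-up samples, the single spike at 95 ms is labeled as noise and the four consecutive high samples are labeled as congestion. In the framework described in the paper, initial labels of this kind would then be refined by the multi-task sharing stage and by operator review.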

JSAC: What do you think could be a major weakness of ARISE compared to single-task learning, and in which scenarios would ARISE be more challenged than other state-of-the-art approaches?

Authors: In designing ARISE, we were motivated by two key observations. First, many networking problems of practical interest are multi-task in nature; that is, solving such a problem requires performing a number of different sub-tasks. Second, while these sub-tasks typically differ in terms of objective and feature engineering, there is often a significant overlap among the features that get selected for the various sub-tasks; we refer to such overlapping features as “composite characteristics.” An illustrative example is the long-standing problem of detecting amplification-style DDoS attacks. This problem is inherently multi-task: there is one sub-task per known protocol (e.g., DNS, NTP, ICMP) that can be exploited to generate a high volume of traffic for the purpose of overwhelming a target (e.g., web server, end host). At the same time, the different sub-tasks exhibit readily identifiable composite characteristics, such as certain aggregate traffic statistics (e.g., number of packets, bytes or flows) or the number of distinct source IPs.
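To make the notion of composite characteristics more concrete, the short Python sketch below treats them as the features that are selected by more than one sub-task. It is only an illustration of the idea of overlapping features, not the procedure used in the paper: the per-protocol feature sets, the protocol-specific feature names, the function `composite_characteristics`, and its `min_tasks` parameter are all hypothetical.

```python
from collections import Counter

# Hypothetical feature selections for three DDoS detection sub-tasks.
# The aggregate statistics follow the examples in the text; the
# protocol-specific feature names are invented for illustration.
selected_features = {
    "dns": {"packets", "bytes", "distinct_src_ips", "dns_response_size"},
    "ntp": {"packets", "bytes", "distinct_src_ips", "ntp_reply_count"},
    "icmp": {"packets", "bytes", "distinct_src_ips", "icmp_type"},
}


def composite_characteristics(features_by_task, min_tasks=2):
    """Return the features selected by at least min_tasks sub-tasks."""
    counts = Counter(
        feature
        for features in features_by_task.values()
        for feature in features
    )
    return sorted(f for f, c in counts.items() if c >= min_tasks)


shared = composite_characteristics(selected_features)
print(shared)
```

In this toy setup, the aggregate traffic statistics and the number of distinct source IPs come out as shared across all three sub-tasks, while the protocol-specific features stay private to their own sub-task.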

ARISE is ill-suited for networking problems for which these two key observations do not hold. An example of such a problem is being asked to perform the basic task of removing noise in a given measurement dataset. This problem is not amenable to multi-task learning and also contains no composite characteristics. As a result, applying single-task learning is a more appropriate approach to tackling this problem. More generally, in cases where both ARISE and state-of-the-art alternatives are applicable, the challenge in using ARISE is that generating suitable multi-task learning models typically requires more domain knowledge than developing the type of black-box learning models that are at the center of most of the currently considered alternative solution approaches. However, instead of being problematic, we view this challenge as an opportunity to overcome the existing distrust that operators have in handing over critical decision making to black-box models they do not understand.

JSAC: What was criticized by the reviewers? Was there a criticism from the reviewers that led you to rethink part of your work?

Authors: One astute reviewer pointed out that ARISE ought to be compared to alternative advanced multi-task learning approaches such as meta-learning. Thanks to this comment, we were able to better position our work within the existing literature on modern meta-learning. For one, we were able to show that ARISE already shares several aspects with meta-learning (e.g., quickly adapting existing models to new tasks based on common representations of underlying information). Importantly, we demonstrate empirically that ARISE advances modern meta-learning by solving some inherent challenges in existing meta-learning algorithms. In particular, we show that ARISE does not require large labeled datasets, is effective (i.e., achieves high accuracy) and computationally efficient, and represents a first attempt at leveraging meta-learning to enhance the application of multi-task learning for solving networking problems in practice.

Statements and opinions given in a work published by the IEEE or the IEEE Communications Society are the expressions of the author(s). Responsibility for the content of published articles rests upon the author(s), not IEEE nor the IEEE Communications Society.
