
# The robustest clusters in the input–output networks: global \(\hbox {CO}_2\) emission clusters

*Journal of Economic Structures*
**volume 6**, Article number: 3 (2017)

## Abstract

Finding environmentally significant clusters in global supply-chain networks of goods and services has been investigated by Kagawa et al. (Soc Netw 35(3):423–438, 2013a; Econ Syst Res 25(3):265–286, 2013b; Glob Environ Chang, 2015), using the popular clustering method of nonnegative matrix factorization, which actually yields sensitive cluster assignments. Due to this sensitivity issue, there is a danger of overfitting of the results. In order to confirm the robustness of the obtained clusters, which in fact have strong implications for international climate change mitigation, especially for the US-induced Chinese clusters, we design a simulation-based experiment. Empirical findings of the proposed approach are compared with those of Kagawa et al. (Glob Environ Chang, 2015). The environmental implications are reported as well.

## 1 Background

Graph partitioning methods or clustering methods in general have been widely used for understanding and visualizing fundamental features of social and economic network complexity, e.g., Newman and Girvan (2004), Kagawa et al. (2013a, b), Liang et al. (2015). A striking environmental study has been provided by Kagawa et al. (2015); the authors identified \(\hbox {CO}_2\) emission clusters within global supply-chain networks formed by the final demand impulse of a specific final product and argued how the identified emission clusters have contributed to increasing \(\hbox {CO}_2\) emission transfers and have grown over time [see also Davis et al. (2011) and Peters et al. (2011) for the analysis of \(\hbox {CO}_2\) emission transfers]. The authors applied the nonnegative matrix factorization (NMF) approach (Lee and Seung 1999) and obtained certain clusters whose normalized cut value (Ncut value) is minimized, which implies that the obtained clusters could best explain the environmentally important supply chains (Kagawa et al. 2015).

Although Kagawa et al. (2015) provided important emission clusters for climate change mitigation, a crucial problem remains: the obtained results depend heavily on the employed algorithms and parameters. To illustrate, suppose that we apply a typical clustering algorithm such as the *K*-means method (MacQueen et al. 1967). Setting the parameter \(K=10\) yields a set of 10 clusters, but setting \(K=11\) instead can yield 11 clusters whose sector compositions differ markedly from those obtained with \(K=10\). In such a situation, which “clusters” really reflect the actual economic structure? Which “clusters” are plausible for the analysis? The same problem arises not only for the value of *K* but also for the many other parameters used in the employed algorithms. It is worth noting that the *K*-means algorithm is indeed used in the NMF method.

The same problem is seen for the quality of the datasets. Economic network data such as input–output tables usually contain errors or are at best approximations, which is a central issue in input–output analysis (e.g., Dietzenbacher 1995, 2006). Due to these errors, the problem described above also appears in cluster analysis. That is, the clustering analysis could be very sensitive to both the employed algorithms and the datasets. In fact, the actual datasets used for constructing the supply-chain networks of Kagawa et al. (2015) are estimates derived from the multi-regional input–output framework (e.g., Lenzen et al. 2012; Dietzenbacher et al. 2013). If the employed clustering technique is quite sensitive to changes in the input–output data, which is our case, we must be careful in claiming that the resulting clusters are plausible.

This paper investigates this problem and proposes a method to obtain clusters that are “stable” with respect to errors or noise in the data and parameters of the algorithm. That is, even if we slightly perturb the values in datasets, clusters obtained by our method still have a good Ncut value; even though the original data may contain errors or noise, the obtained clusters are still reliable if the noise is small enough. The idea of our approach is rather simple and is based on simulations. It can be interpreted as applying a Monte Carlo-type simulation to obtain stable clusters in terms of perturbations by noise in data or choices of parameters. We also propose two criteria and a diagram based on the criteria to guarantee the robustness of the obtained clusters. The details will be described in Sect. 2. It should be noted that, due to its generality, this diagram could provide a new guideline to measure the reliability of analysis results, used in various fields where clustering analyses are applied, such as economic and social networks.

As a case study, we focused on an adjacency matrix obtained by using a multi-regional input–output analysis (Kagawa et al. 2015). The proposed two criteria were applied to obtain robust \(\hbox {CO}_2\) emission clusters within global supply-chain networks. The robustness and performance of our clustering results are compared to those of Kagawa et al. (2015). We particularly evaluate the difference in terms of cluster compositions, which carry strong environmental implications. The remaining sections are as follows: Section 2 describes the methodology in this study, Sect. 3 provides a numerical example, Sect. 4 presents the obtained empirical results and discussions, and Sect. 5 concludes this paper.

## 2 Methods

### 2.1 Constructing an adjacency matrix

An economic transaction between geographically distributed industries is defined as \(\mathbf{Z}= ({Z}_{ij}^{rs})\) \((i, j = 1, \ldots, M; r, s = 1, \ldots, N)\), which represents a product sale from industry *i* in country *r* to industry *j* in country *s*. Here, *M* is the number of industries and *N* is the number of countries. If geographical input coefficients are defined by \(\mathbf{A}=(a_{ij}^{rs})\) with \(a_{ij}^{rs}={Z}_{ij}^{rs}/x_j^s\), where \(x_j^s\) denotes domestic output of industry *j* in country *s*, the widely used interregional input–output (IRIO) model (e.g., Miller and Blair 2009) can be formulated as

\[x_i^r=\sum _{s=1}^{N}\sum _{j=1}^{M} a_{ij}^{rs}\,x_j^s+\sum _{s=1}^{N} f_i^{rs} \tag{1}\]

or \(\mathbf{x}=\mathbf{Ax}+\mathbf{f}\) in matrix notation, where \(\mathbf{x}=(x_i^r)\), \(\mathbf{f}=(\sum _{s=1}^{N} f_i^{rs})\), and \(f_i^{rs} (i = 1, \ldots, M; r, s = 1, \ldots, N)\) represents final demand from industry *i* of country *r* to country *s*.

Solving the IRIO model in Eq. (1) yields \({{\mathbf {x}}} = ({\mathbf {I}}- {\mathbf {A}})^{-1}{\mathbf {f}}={{{\mathbf {B}}}{{\mathbf {f}}}}\). Here \({{\mathbf {I}}}\) is the identity matrix, and \({{\mathbf {B}}} = ({{\mathbf {I}}}-{\mathbf {A}})^{-1}\) is the direct and indirect requirement matrix, in which each element \(b^{rs}_{ij}\) represents how many units of the products of industry *i* in country *r* are needed to produce one unit of the products of industry *j* in country *s*.
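As a small numerical illustration of this solution, the following sketch computes the requirement matrix and the output vector for a toy system. The coefficient matrix `A` and the final demand vector `f` are made-up values (two countries with two industries each) and are not data from this study.

```python
import numpy as np

# Hypothetical input coefficients for 4 sectors (2 countries x 2 industries).
# Column sums are well below one, so (I - A) is invertible.
A = np.array([
    [0.10, 0.20, 0.05, 0.00],
    [0.15, 0.05, 0.00, 0.10],
    [0.05, 0.00, 0.20, 0.15],
    [0.00, 0.10, 0.10, 0.05],
])
# Hypothetical final demand, already summed over destination countries s.
f = np.array([100.0, 50.0, 80.0, 40.0])

I = np.eye(A.shape[0])
B = np.linalg.inv(I - A)   # direct and indirect requirement matrix
x = B @ f                  # output vector solving x = A x + f

# The balance equation should hold up to rounding error.
residual = x - (A @ x + f)
print(np.abs(residual).max())
```

For large tables, solving the linear system with `np.linalg.solve(I - A, f)` avoids forming the inverse when only `x` is needed; the inverse is kept here because its columns are used in the unit structure model.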

Using the above IRIO framework, we can further formulate the unit structure model based on the IRIO model (e.g., Kagawa et al. 2015):

\[{\mathbf X}_j^s=\mathbf{A}\,\mathrm {diag}\left( {\mathbf {b}}_j^s f^s_j\right) \tag{2}\]

where \({\mathbf {b}}_j^s\) can be easily obtained as the \(((s-1) M + j)\)-th column vector in the direct and indirect requirement matrix \({{\mathbf {B}}}\), and \(f^s_j\) is the final demand of products produced by industry *j* in country *s*. The matrix \({\mathbf X}_j^s\) shows the economic transactions between geographically distributed industries that are triggered by the final demand on industry *j* located in country *s*.

If the direct emission coefficient vector showing \(\hbox {CO}_2\) emissions generated per unit of output of industry *i* in country *r* is defined as \({\varvec{\alpha }}=(\alpha _i^r)\), the \(\hbox {CO}_2\) emissions embedded in the economic transactions are obtained as

where \(\mathrm {diag}\) represents the diagonalization. Using this formulation, we define adjacency matrix \({{\mathbf {W}}}=\left( w_{ij}^{rs}\right)\), where
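A rough numerical sketch of this construction is given here under explicit assumptions. The triggered transactions are taken as \(\mathbf{A}\,\mathrm {diag}({\mathbf {b}}_j^s f_j^s)\) and the embedded emissions as \(\mathrm {diag}({\varvec{\alpha }})\) times that matrix. The symmetrization `Q + Q.T` with a zero diagonal is an assumption chosen so that the resulting network is undirected, as required in Sect. 2.2; it is not confirmed by the text. The function name, the symbol `Q`, and all numbers are hypothetical.

```python
import numpy as np

def emission_adjacency(A, alpha, col, final_demand):
    """Sketch of a CO2 adjacency matrix for final demand on one sector.

    col is the zero-based position of sector (j, s), i.e. (s - 1) * M + j - 1.
    """
    n = A.shape[0]
    B = np.linalg.inv(np.eye(n) - A)   # requirement matrix
    x = B[:, col] * final_demand       # output triggered by this final demand
    X = A @ np.diag(x)                 # transactions triggered by this final demand
    Q = np.diag(alpha) @ X             # emissions embedded in each transaction
    W = Q + Q.T                        # ASSUMPTION: symmetrize to get an undirected graph
    np.fill_diagonal(W, 0.0)           # ASSUMPTION: ignore self-loops
    return X, Q, W

# Toy data (hypothetical): 4 sectors, emission coefficients per unit of output.
A = np.array([
    [0.10, 0.20, 0.05, 0.00],
    [0.15, 0.05, 0.00, 0.10],
    [0.05, 0.00, 0.20, 0.15],
    [0.00, 0.10, 0.10, 0.05],
])
alpha = np.array([0.8, 0.3, 1.5, 0.4])
col, final_demand = 1, 100.0

X, Q, W = emission_adjacency(A, alpha, col, final_demand)
print(np.round(W, 2))
```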

### 2.2 Clustering input–output analysis

A graph is a discrete structure that consists of vertices and of edges, each of which connects two vertices. In the context of economic analysis, a vertex and an edge correspond to a sector and a transaction between the corresponding two sectors, respectively. Graph clustering concerns arranging the vertices of a graph into groups such that similar vertices share a group and dissimilar vertices are placed in different groups. This problem has multiple variants, algorithms, and applications; see Schaeffer (2007). Spectral and NMF-based clusterings have become popular in recent years, especially in the field of machine learning, e.g., Shi and Malik (2000), Ng et al. (2002), Kannan et al. (2004), Ding et al. (2005), Von Luxburg (2007), Zhang and Jordan (2008). However, spectral clustering can be traced back to earlier work on the graph partitioning problem in computer science by Donath and Hoffman (1973) and Fiedler (1973).

Suppose an undirected weighted network \(G=(V,E)\) of order \(n=|V|\) with edge weights \(w_{uv}\). In the context of clustering IO analysis (CIOA), a vertex *u* corresponds to a sector of an industry *i* in a country *r*. We denote this by \(u=(i,r)\). Here, *V* and *E*, respectively, represent the set of all the sectors and the set of all the transactions between two sectors, and \(|V|=n=M\times N\). The edge weights represent the amounts of \(\hbox {CO}_2\) emissions associated with the corresponding transactions. It is also possible to consider unweighted graphs with zero-one edge weights. A central instrument of the spectral and NMF clustering framework is the use of *Laplacian matrices*, which are matrix representations of graphs. If \(\mathbf{W}=(w_{uv})_{1\le u,v \le n}\) is the *adjacency matrix* of the network *G* and \(\mathbf{D}=\text {diag}({\mathbf {d}})\) is the diagonal *degree matrix* of *G*, with \({\mathbf {d}}=(d_u)_{1\le u\le n}=(\sum _{v=1}^{n} w_{uv})_{1\le u\le n}\) being the vector of vertices’ degrees, then the Laplacian matrix of *G* can be defined as \(\mathbf{L}=\mathbf{D}-\mathbf{W}\). The normalized version of \(\mathbf{L}\) is given by \(\mathbf{D}^{-\frac{1}{2}}{} \mathbf{L}\mathbf{D}^{-\frac{1}{2}}\) (Shi and Malik 2000; Ng et al. 2002).
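The degree vector, the Laplacian, and its normalized version can be computed in a few lines. The sketch below uses a hypothetical 4-vertex weighted graph and assumes that every vertex has a positive degree, so that \(\mathbf{D}^{-\frac{1}{2}}\) exists.

```python
import numpy as np

def laplacians(W):
    """Return degrees d, Laplacian L = D - W, and D^(-1/2) L D^(-1/2)."""
    d = W.sum(axis=1)                # vertex degrees (row sums of W)
    L = np.diag(d) - W               # unnormalized Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(d)    # assumes all degrees are positive
    L_norm = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return d, L, L_norm

# Hypothetical symmetric edge weights with zero diagonal.
W = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [2.0, 0.0, 0.5, 0.0],
    [1.0, 0.5, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
])
d, L, L_norm = laplacians(W)
print(d)
print(np.round(L_norm, 3))
```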

Such clustering is a form of graph partitioning. A family of subsets \(U_1,\ldots, U_k\) of set *V* is called a (*k*-)*partition* of *V* if \(\bigcup _{p=1}^{k} U_p = V\) and \(U_p \cap U_q = \emptyset\) for \(1\le p,q \le k, p\ne q\). A graph partition is a partition of the vertex set. The objective is to minimize, for each cluster, the total weight of its edges to the rest of the graph, which is called the *cut* in graph theory and expressed as \(\text {cut}(U,{\bar{U}})=\sum _{u\in U, v\in {\bar{U}}}w_{uv}\) for a subset \(U \subset V\) of vertices and its complement \({\bar{U}}\). In this vein, Shi and Malik (2000) introduced the *normalized cut* criterion, abbreviated as Ncut, which, when minimized, produces clusters of reasonable sizes. For a *k*-partition \(U_1, U_2, \ldots, U_k\) of *V*, the \(\text {Ncut}\) is defined as

\[\text {Ncut}(U_1,\ldots ,U_k)=\sum _{p=1}^{k}\frac{\text {cut}(U_p,{\bar{U}}_p)}{\sum _{u\in U_p}d_u}\]

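where the denominators \(\sum _{u\in U_p}d_u=\sum _{u\in U_p, v\in V}w_{uv}\) implicitly implement the objective of increasing the connectivity within clusters. Graph clustering into *k* clusters can be formulated as the following combinatorial problem:

\[\min _{U_1,\ldots ,U_k}\ \text {Ncut}(U_1,\ldots ,U_k)\quad \text {subject to}\quad U_1,\ldots ,U_k \text { being a } k\text {-partition of } V \tag{5}\]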
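For a given partition, the Ncut value can be evaluated directly from the adjacency matrix as the sum over clusters of the cut divided by the total degree of the cluster. The following sketch does this for a hypothetical graph made of two triangles joined by one weak edge; the function name and the data are illustrative only.

```python
import numpy as np

def ncut(W, labels):
    """Normalized cut of the partition encoded by an integer label per vertex."""
    labels = np.asarray(labels)
    d = W.sum(axis=1)
    total = 0.0
    for p in np.unique(labels):
        inside = labels == p
        cut = W[np.ix_(inside, ~inside)].sum()   # weight leaving cluster p
        volume = d[inside].sum()                 # sum of degrees in cluster p
        total += cut / volume
    return total

# Two triangles (unit weights) connected by a weak edge of weight 0.1.
W = np.zeros((6, 6))
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
for a, b, w in edges:
    W[a, b] = W[b, a] = w

natural = ncut(W, [0, 0, 0, 1, 1, 1])   # cuts only the weak edge
skewed = ncut(W, [0, 0, 1, 1, 1, 1])    # cuts through a triangle
print(natural, skewed)
```

The partition that separates the two triangles has the smaller Ncut, as expected, since it removes only the weak edge.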

Unfortunately, this problem has been proven to be NP-hard (Shi and Malik 2000), unlike the abovementioned “min-cut” problem, which can be solved efficiently. However, problem (5) can be converted to matrix form as a minimization of the Rayleigh quotient (Shi and Malik 2000), which can in turn be expressed in terms of a matrix trace (Ding et al. 2005; Von Luxburg 2007; Zhang and Jordan 2008; Ding et al. 2008). For instance, using the notation of Ding et al. (2005), problem (5) becomes

\[\max _{\mathbf{H}^{T}\mathbf{H}=\mathbf{I},\ \mathbf{H}\ge 0}\ \text {Tr}\left( \mathbf{H}^{T}\mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}\mathbf{H}\right) \tag{6}\]

where \(\mathbf{H}=({\mathbf {h}}_1,{\mathbf {h}}_2,\ldots,{\mathbf {h}}_k)\) is the \((n\times k)\) matrix defined by \({\mathbf {h}}_i={{\mathbf {D}}^{\frac{1}{2}}{\mathbf {q}}^{(i)}}/{||{\mathbf {D}}^{\frac{1}{2}}{\mathbf {q}}^{(i)}||}\). Here, \({\mathbf {H}}^{T}\) is the transpose matrix of \({\mathbf {H}}\), and \({\mathbf {q}}^{(i)}=(q^{(i)}_1 q^{(i)}_2 \cdots q^{(i)}_n)^T\) is the *n*-dimensional indicator vector of the cluster \(U_i\); \({q}^{(i)}_u=1\) if a vertex *u* belongs to cluster \(U_i\), 0 otherwise. Note that \({\mathbf {H}}\) is a nonnegative matrix; that is, all the elements of \({\mathbf {H}}\) are nonnegative.
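The link between the Ncut value and the trace form can be checked numerically: for \(\mathbf{H}\) built from indicator vectors as described, \(\text {Ncut}=k-\text {Tr}(\mathbf{H}^{T}\mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}\mathbf{H})\), so minimizing Ncut amounts to maximizing the trace. The sketch below verifies this identity on a hypothetical 6-vertex graph; the helper names are illustrative.

```python
import numpy as np

def ncut(W, labels):
    d = W.sum(axis=1)
    return sum(W[np.ix_(labels == p, labels != p)].sum() / d[labels == p].sum()
               for p in np.unique(labels))

def scaled_indicator(W, labels):
    """Columns h_i = D^(1/2) q_i / ||D^(1/2) q_i|| for each cluster indicator q_i."""
    d = W.sum(axis=1)
    clusters = np.unique(labels)
    H = np.zeros((len(labels), len(clusters)))
    for col, p in enumerate(clusters):
        h = np.sqrt(d) * (labels == p)
        H[:, col] = h / np.linalg.norm(h)
    return H

# Hypothetical graph: two triangles joined by a weak edge.
W = np.zeros((6, 6))
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[a, b] = W[b, a] = w

labels = np.array([0, 0, 1, 1, 1, 1])
d = W.sum(axis=1)
W_tilde = W / np.sqrt(np.outer(d, d))      # D^(-1/2) W D^(-1/2)
H = scaled_indicator(W, labels)
k = H.shape[1]
trace_value = np.trace(H.T @ W_tilde @ H)
print(k - trace_value, ncut(W, labels))    # the two numbers coincide
```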

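Problem (6) can be rewritten in an easily solvable form; specifically, according to Ding et al. (2005), relaxing the orthogonality constraint gives

\[\min _{\mathbf{H}\ge 0}\ \left\| \mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}-\mathbf{H}\mathbf{H}^{T}\right\| _F^{2} \tag{7}\]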

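where \(||.||_F\) is the Frobenius matrix norm. In conformity with the algorithm of Lee and Seung (2001), problem (7) can be solved using update rules, which are iterative improvements converging to locally optimal solutions. In this study, we make use of the following rule given by Ding et al. (2005) and used by Kagawa et al. (2013a, 2015):

\[H_{ik}\leftarrow H_{ik}\left( 1-\beta +\beta \,\frac{\left( \mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}\mathbf{H}\right) _{ik}}{\left( \mathbf{H}\mathbf{H}^{T}\mathbf{H}\right) _{ik}}\right) \tag{8}\]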

where \(\beta\) is a parameter such that \(0<\beta \le 1\). We set \(\beta =0.5\) and initialize the matrix \({\mathbf {H}}\) according to the previous studies (Ding et al. 2008; Kagawa et al. 2013a, 2015). After iteratively applying (8), an approximated real-valued solution matrix \(\hat{{\mathbf {H}}}\) is reached. Although we can expect to obtain \(\hat{{\mathbf {H}}}\) in realistic running time, \(\hat{{\mathbf {H}}}\) itself does not give any clustering due to the non-integer property. To obtain a concrete clustering, we apply the well-known and well-used *K*-means algorithm, which we call the *rounding step* in our algorithm and explain below.
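A minimal sketch of such an iterative scheme is shown below. It assumes the multiplicative rule of Ding et al. (2005) in the form \(H_{ik}\leftarrow H_{ik}\,(1-\beta +\beta \,(\tilde{\mathbf{W}}\mathbf{H})_{ik}/(\mathbf{H}\mathbf{H}^{T}\mathbf{H})_{ik})\) with \(\tilde{\mathbf{W}}=\mathbf{D}^{-\frac{1}{2}}\mathbf{W}\mathbf{D}^{-\frac{1}{2}}\). The random positive initialization, the fixed iteration count, and the small constant `eps` are assumptions made for illustration; the initialization used in the previous studies is not reproduced here.

```python
import numpy as np

def symmetric_nmf(W, k, beta=0.5, n_iter=300, seed=0, eps=1e-12):
    """Approximate D^(-1/2) W D^(-1/2) by H H^T with H >= 0 (illustrative sketch)."""
    d = W.sum(axis=1)
    W_tilde = W / np.sqrt(np.outer(d, d))
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.1, 1.0, size=(W.shape[0], k))   # ASSUMPTION: random positive start
    errors = [np.linalg.norm(W_tilde - H @ H.T, "fro") ** 2]
    for _ in range(n_iter):
        ratio = (W_tilde @ H) / (H @ (H.T @ H) + eps)  # eps guards against division by zero
        H = H * (1.0 - beta + beta * ratio)            # multiplicative update, keeps H >= 0
        errors.append(np.linalg.norm(W_tilde - H @ H.T, "fro") ** 2)
    return H, errors

# Hypothetical graph: two triangles joined by a weak edge.
W = np.zeros((6, 6))
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[a, b] = W[b, a] = w

H_hat, errors = symmetric_nmf(W, k=2)
print(np.round(H_hat, 3))
print(errors[0], errors[-1])
```

Because the update only multiplies nonnegative entries by nonnegative factors, the nonnegativity of the starting matrix is preserved throughout. The result depends on the starting point, which is one reason why a separate rounding step and a robustness check are needed.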

The *K*-means algorithm introduced by MacQueen et al. (1967) is one of the most popular partitional (non-hierarchical) clustering methods. This algorithm starts by selecting *k* initial clusters identified by their cluster centers and then iteratively refines them as follows. Given a target dataset \(\{d_1,d_2,\ldots,d_n\}\) to be clustered, each iteration of the algorithm aims to minimize the within-cluster sum of squared distances, which is expressed as

\[\sum _{p=1}^{k}\sum _{i\in U_p}||d_i-m_p||^{2} \tag{9}\]

where \(m_p\) is the center of the cluster \(U_p\), which is defined as the mean value of the elements belonging to \(U_p\), i.e., \(m_p=\sum _{i\in U_p} d_i/|U_p|\), and ||.|| is a norm function on the dataset space. Minimizing this distance involves assigning each data instance \(d_i\) to its closest cluster, i.e., \(U_p\) such that the distance \(||d_i - m_p||\) is minimum. The \(m_p\) centers are updated thereafter. The algorithm terminates once no instance changes its assignment, and the final clusters describe a partition of the dataset. The *K*-means can actually be proven to converge to a local minimum of expression (9) (Bottou and Bengio 1994). However, this algorithm suffers from several drawbacks, mainly its sensitivity to the initial conditions, which can lead to potentially misleading results (Bradley and Fayyad 1998). This issue is actually common among hill-climbing algorithms, where according to Duda et al. (1995): “different starting points can lead to different solutions and one never knows whether or not the best solution has been found.” Thus, a bad choice of the initial cluster centers can easily lead to convergence to a poor cluster assignment. A second issue concerns the best value of the parameter *k* to be chosen. A bad choice here can also lead to poor results.
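The dependence on the starting centers can be made concrete with a small implementation of the assignment and update steps. The sketch below is a plain Lloyd-type iteration; choosing the initial centers as randomly selected data points, keeping the old center when a cluster becomes empty, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def kmeans(data, k, seed, n_iter=100):
    """Lloyd-type K-means; returns labels, centers, and the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]  # ASSUMPTION: random data points
    for _ in range(n_iter):
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)                 # assign each point to its closest center
        new_centers = np.array([
            data[labels == p].mean(axis=0) if np.any(labels == p) else centers[p]
            for p in range(k)
        ])
        if np.allclose(new_centers, centers):        # centers stopped moving
            break
        centers = new_centers
    sse = ((data - centers[labels]) ** 2).sum()
    return labels, centers, sse

# Hypothetical feature vectors forming three visible groups.
data = np.array([[0, 0], [0, 1], [1, 0],
                 [5, 5], [5, 6], [6, 5],
                 [10, 0], [10, 1], [11, 0]], dtype=float)

runs = [kmeans(data, k=3, seed=s) for s in range(10)]
sses = [sse for _, _, sse in runs]
print(sorted(set(np.round(sses, 3))))   # more than one value means the runs disagree
```

Running the procedure from several seeds and comparing the resulting objective values is the simplest way to see whether the rounding step is stable for a given dataset.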

The basic steps of the NMF method presented in this section are summarized as the following algorithm.

### 2.3 Simulation module

This section analyzes the rounding step of the NMF algorithm described in the previous section. Instead of running a single instance of the *K*-means algorithm on the matrix \(\hat{{\mathbf {H}}}\), as prescribed in the NMF algorithm, we perform a sampling-based simulation procedure in the rounding step. In order to simulate uncertain environments for the input clusterings, which will be introduced in Table 1, small perturbations in \(\mathbf{W}=(w_{uv})_{1\le u,v \le n}\), the adjacency matrix of the network *G*, are generated *N* times. The perturbed adjacency matrices are denoted by \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\); from now on, *N* denotes the number of samples rather than the number of countries. To assess the performance of a clustering \(\text {C}=\{U_1,U_2,\ldots,U_k\}\), obtained from the “initial” adjacency matrix \(\mathbf{W}_0=\mathbf{W}\), we propose the modified Ncut criterion as follows: \(\text {Ncut}(\text {C},\mathbf{W}_I)=\sum ^k_{p=1} \Big (\sum _{u \in U_p}\sum _{v \notin U_p}x_{uv}/\sum _{ u\in U_p}\sum _{v \in V}x_{uv}\Big )\). It should be noted that this modified Ncut value represents the goodness achieved by the given clustering C in a network that includes a small amount of noise in the edge weights, i.e., under the perturbed adjacency matrix \(\mathbf{W}_I=(x_{uv})_{1\le u,v \le n}\). Thus, for the same clustering C, this Ncut criterion generally takes different values for the different perturbed matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\).

Table 1 illustrates the simulation module developed in this study. First, we determine the matrix \(\hat{{\mathbf {H}}}\) by applying NMF to the initial adjacency matrix \(\mathbf{W}_0\) and then apply the *K*-means algorithm to the *n* rows of the matrix \(\hat{{\mathbf {H}}}\), *M* times, where *M* now denotes the number of *K*-means runs. The clusterings \(\text {C}_1, \text {C}_2, \ldots, \text {C}_M\), termed input clusterings, are thus obtained. These clusterings generally differ from one another due to the instability of repeatedly conducting *K*-means rounding. The perturbation scenario \((I_I)\) implies that the matrix \(\mathbf{W}_I\) is used for the Ncut computation. We suppose that the perturbed matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\) take values from the following uncertainty set:

\[U=\left\{ (x_{uv})_{1\le u,v\le n}\ :\ (1-\xi )\,w_{uv}\le x_{uv}\le (1+\xi )\,w_{uv}\ \text { for all } u,v\right\} \tag{10}\]

where \(\xi\) is a noise magnitude parameter, which usually takes small values, e.g., 0.1. Within the set *U*, deviations are symmetric around the values of the initial matrix \(\mathbf{W}_0\), such that each element \(x_{uv}\) of the randomly perturbed adjacency matrices is limited to the interval \(\Big [(1-\xi )w_{uv},\,(1+\xi )w_{uv}\Big ]\). In practice, to obtain the matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\), we draw *N* independent and identically distributed samples from the set *U*.
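Drawing samples from the uncertainty set can be sketched as follows. The sampling distribution is not specified above, so the sketch assumes that each element is scaled by an independent factor drawn uniformly from \([1-\xi ,1+\xi ]\); under this assumption a symmetric \(\mathbf{W}_0\) generally yields non-symmetric samples. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def sample_perturbed(W0, xi, n_samples, seed=0):
    """Draw n_samples matrices with x_uv in [(1 - xi) w_uv, (1 + xi) w_uv]."""
    rng = np.random.default_rng(seed)
    # ASSUMPTION: independent uniform scaling factor for every element.
    factors = rng.uniform(1.0 - xi, 1.0 + xi, size=(n_samples,) + W0.shape)
    return factors * W0   # zero weights stay zero

W0 = np.array([
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 0.5],
    [1.0, 0.5, 0.0],
])
xi = 0.1
samples = sample_perturbed(W0, xi, n_samples=5)
print(np.round(samples[0], 3))
```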

Two measures of robustness are introduced in the simulation module, \(R_1^{\mathcal {X}}\) and \(R_2^{\mathcal {X}}\), where the set \({\mathcal {X}}=\{\text {C}_1,\text {C}_2, \ldots, \text {C}_M\}\) denotes the *M* input clusterings. The first measure, \(R_1^{\mathcal {X}}\), reports for a given clustering \(\text {C}\in {\mathcal {X}}\) the fraction of perturbation scenarios in which it is the best clustering within \({\mathcal {X}}\). For instance, C being the best clustering within \({\mathcal {X}}\) for the perturbation scenario \((I_I)\) means that C yields the smallest Ncut value among the Ncut values computed using the *M* clusterings of \({\mathcal {X}}\) and the perturbed adjacency matrix \(\mathbf{W}_I\), i.e., \(\text {C}\in \arg \underset{\text {D} \in {\mathcal {X}}}{\min }\,\text {Ncut}(\text {D}, \mathbf{W}_I)\). We introduce the indicator function \(I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I)\) that takes value one if C is the best clustering within \({\mathcal {X}}\) for perturbed matrix \(\mathbf{W}_I\), zero otherwise. The indicator function is expressed as follows:

\[I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I)=\begin{cases} 1 & \text {if } \text {C}\in \arg \underset{\text {D} \in {\mathcal {X}}}{\min }\,\text {Ncut}(\text {D}, \mathbf{W}_I),\\ 0 & \text {otherwise.} \end{cases} \tag{11}\]

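We can formulate the probability of being the best clustering, i.e., the first robustness measure, for a given clustering C over the initial matrix and the *N* perturbed matrices as \(\widehat{R_1^{\mathcal {X}}}(\text {C})\) as follows:

\[\widehat{R_1^{\mathcal {X}}}(\text {C})=\frac{1}{N+1}\sum _{I=0}^{N} I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I) \tag{12}\]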

This expression is obviously dependent on the number of samples *N*. Thus, the measure expressed by (12) can be viewed as an estimation of the accurate measure \(R_1^{\mathcal {X}}\). We use the caret symbol \((\;\;\widehat{}\;\;)\) to indicate this approximation. The accurate measure \(R_1^{\mathcal {X}}(\text {C})\) is assumed to be reached for an infinite number of samples: \(\widehat{R_1^{\mathcal {X}}}(\text {C}) \xrightarrow [N \rightarrow \infty ]{} R_1^{\mathcal {X}}(\text {C})\).

The second measure, \(R_2^{\mathcal {X}}(\text {C})\), reports the average value of the *performance ratio*. We define the performance ratio of a clustering \(\text {C}\in {\mathcal {X}}\) under the perturbed adjacency matrix \(\mathbf{W}_I\) as the ratio of the Ncut value of C to the smallest Ncut value computed for the perturbed matrix \(\mathbf{W}_I\). The ratio expression for a given clustering C and a given matrix \(\mathbf{W}_I\) is given by

\[\frac{\text {Ncut}(\text {C},\mathbf{W}_I)}{\underset{\text {D} \in {\mathcal {X}}}{\min }\,\text {Ncut}(\text {D}, \mathbf{W}_I)} \tag{13}\]

It should be noted that if the given clustering C yields the smallest Ncut value under the perturbed adjacency matrix \(\mathbf{W}_I\), the performance ratio takes value one. Using the performance ratios of a given clustering C, we can formulate the average of performance ratios, i.e., robustness measure, for C over the initial and the perturbed *N* matrices as \(\widehat{R_2^{\mathcal {X}}}(\text {C})\) as follows:

Here, \(\widehat{R_2^{\mathcal {X}}}(\text {C})\) ranges within the interval between 0 and 1. If \(\widehat{R_2^{\mathcal {X}}}(\text {C})\) takes value one, the given clustering C always gives the smallest Ncut value under the initial and the perturbed adjacency matrices. Any lower value of \(\widehat{R_2^{\mathcal {X}}}(\text {C})\) implies that the given clustering C does not always yield the best graph partition. The measure \(\widehat{R_1^{\mathcal {X}}}(\text {C})\) is always between 0 and 1. Both measures can then be read as percentages. Note that the caret symbol is also used for measure \(R_2^{\mathcal {X}}\), due to the dependence on the number of samples *N*. The more robust a clustering \(\text {C}\) is within \({\mathcal {X}}\), the higher the estimated values of \(R_1^{\mathcal {X}}(\text {C})\) and \(R_2^{\mathcal {X}}(\text {C})\). To see why both measures are useful, we provide a numerical example using simplified network data in the next section.
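Given a table of Ncut values with one row per input clustering and one column per matrix \(\mathbf{W}_0,\ldots ,\mathbf{W}_N\), both estimates reduce to simple averages. The sketch below makes two assumptions that are not confirmed by the text: ties for the smallest Ncut all count as best, and \(\widehat{R_2^{\mathcal {X}}}\) averages the smallest Ncut divided by the Ncut of the clustering (the reciprocal of the performance ratio), which is one way to obtain a value in \((0,1]\) that equals one only for a clustering that is always best. The numbers are hypothetical.

```python
import numpy as np

def robustness_measures(ncut_table):
    """ncut_table[c, I] = Ncut(C_c, W_I) for I = 0..N; returns (R1_hat, R2_hat)."""
    best = ncut_table.min(axis=0)                  # smallest Ncut per matrix W_I
    indicator = np.isclose(ncut_table, best)       # ASSUMPTION: ties all count as best
    r1 = indicator.mean(axis=1)                    # share of matrices where C is best
    r2 = (best / ncut_table).mean(axis=1)          # ASSUMPTION: average reciprocal ratio
    return r1, r2

# Hypothetical Ncut values: 3 clusterings evaluated on W_0 and one perturbed matrix.
ncut_table = np.array([
    [0.50, 0.60],
    [0.55, 0.52],
    [0.90, 1.00],
])
r1, r2 = robustness_measures(ncut_table)
print(r1)
print(np.round(r2, 3))
```

In this toy table the first two clusterings are each best once, so they share the same first measure, while the second measure still separates them by how close each stays to the best value when it is not the winner.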

## 3 Numerical example

We use the following simplified adjacency matrix:

Using this initial adjacency matrix, \(\mathbf{W}_0\), we can depict a network with 16 vertices. The problem is how to find the robust clusterings from the network data. Following Ding et al. (2005) and Kagawa et al. (2013a, b), the clustering method based on the NMF of the normalized adjacency matrix is useful for achieving this goal. We apply the NMF method to the normalized adjacency matrix \(\mathbf{D}^{-1/2}\,\mathbf{W}_0\,\mathbf{D}^{-1/2}\) as follows:

Here, it should be noted that the matrix size of \(\hat{{\mathbf {H}}}\) is 16 by 3. From this matrix, we obtain the following 16 feature vectors corresponding to \(\hat{{\mathbf {H}}}\) row vectors:

We applied the *K*-means method using these 16 feature vectors ten times and obtained the ten clusterings listed in Table 2.

We notice in the case of Table 2 that the clusterings \(\text {C}_1\) and \(\text {C}_5\) describe the same cluster assignment, and that this is also true for \(\text {C}_6\) and \(\text {C}_{10}\). Our interest is which clustering among the eight different cluster assignments gives the best graph partition. If we examine the Ncut values for the initial adjacency matrix \(\mathbf{W}_0\), the best performance within our input clusterings \({\mathcal {X}}=\{\text {C}_1, \ldots, \text {C}_{10}\}\) is exhibited by clustering \(\text {C}_3\) according to the following table:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\text {Ncut}(\text {C}, \mathbf{W}_0)\) | 1.611 | 0.556 | 0.551 | 1.09 | 1.611 | 1.412 | 0.889 | 1.791 | 1.225 | 1.412 |

In order to draw the perturbed adjacency matrices, we first set the noise magnitude parameter, e.g., \(\xi =0.5\). We have sampled the following perturbed adjacency matrix \(\mathbf{W}_I\) from the set *U* given in expression (10):

For this specific perturbed adjacency matrix, clustering \(\text {C}_2\) is deemed to be the best clustering according to the Ncut results, which are shown in the following table:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\text {Ncut}(\text {C}, \mathbf{W}_I)\) | 1.615 | 0.550 | 0.564 | 1.170 | 1.615 | 1.334 | 0.909 | 1.738 | 1.315 | 1.334 |

We could thus confirm that random noise in the edge weights of this simplified network clearly affects the choice of the best clustering. Next, we consider which of \(\text {C}_2\) and \(\text {C}_3\) is the more reliable clustering. To answer this question, we proceed by examining the values of the indicator function and the performance ratio, which are given, respectively, in expressions (11) and (13). Our proposed robustness measures are entirely based on these values. For the current perturbed matrix \(\mathbf{W}_I\), the values of the indicator function and the performance ratio are shown in the following table:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(I_{{\mathcal {X}}}(\text {C}, \mathbf{W}_I)\) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Performance ratio | 2.933 | 1 | 1.024 | 2.124 | 2.933 | 2.422 | 1.651 | 3.156 | 2.387 | 2.422 |

The values of the indicator function are straightforward to interpret: through a 0–1 binary representation, they indicate which clusterings yield the best performance for the current perturbation scenario. For the current matrix \(\mathbf{W}_I\), clustering \(\text {C}_2\) takes value one, while the other cluster assignments take value zero. For the performance ratio, the emphasis is instead put on the relative distance to the best clustering; a smaller value of the ratio indicates a better clustering. Apart from \(\text {C}_2\) itself, the clusterings \(\text {C}_3\) and \(\text {C}_7\) come closest in this regard for the perturbed matrix \(\mathbf{W}_I\).
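As a worked check, the indicator values and the performance ratios can be recomputed from the Ncut values reported for \(\mathbf{W}_I\). Since the inputs are the rounded three-decimal Ncut values, the recomputed ratios agree with the reported ones only to about two decimal places.

```python
import numpy as np

# Ncut(C, W_I) for C_1..C_10, as reported in the table for the perturbed matrix.
ncut_WI = np.array([1.615, 0.550, 0.564, 1.170, 1.615,
                    1.334, 0.909, 1.738, 1.315, 1.334])
# Performance ratios as reported in the table.
reported = np.array([2.933, 1.0, 1.024, 2.124, 2.933,
                     2.422, 1.651, 3.156, 2.387, 2.422])

best = ncut_WI.min()
indicator = (ncut_WI == best).astype(int)   # 1 for the best clustering, 0 otherwise
performance_ratio = ncut_WI / best          # Ncut of C over the smallest Ncut

max_gap = np.abs(performance_ratio - reported).max()
print(indicator)
print(np.round(performance_ratio, 3))
print(max_gap)   # small; due to rounding of the inputs
```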

Our proposed simulation module iteratively draws a perturbed matrix \(\mathbf{W}_I\) from the set *U* and computes the values of the indicator function and the performance ratio for the input clusterings and the matrix \(\mathbf{W}_I\). The robustness measures \(\widehat{R_1^{\mathcal {X}}}\) and \(\widehat{R_2^{\mathcal {X}}}\) as explained in the previous section are averages of these iteratively computed values. The following table shows the results we obtained for the current simplified network when the sample size is set to \(N=100\) perturbation scenarios:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\widehat{R_1^{\mathcal {X}}}(\text {C})\) | 0 | 0.247 | 0.752 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\(\widehat{R_2^{\mathcal {X}}}(\text {C})\) | 0.333 | 0.985 | 0.996 | 0.492 | 0.333 | 0.384 | 0.611 | 0.304 | 0.443 | 0.384 |

According to both robustness measures, clustering \(\text {C}_3\) performs better throughout the perturbation scenarios. However, \(\text {C}_3\) is closely followed by clustering \(\text {C}_2\) according to measure \(\widehat{R_2^{\mathcal {X}}}\), even if in terms of measure \(\widehat{R_1^{\mathcal {X}}}\) clustering \(\text {C}_2\) appears to be the best for only 24.7% of the randomly perturbed adjacency matrices.

## 4 Empirical results

In this section, we introduce the experiment scheme that we designed in order to obtain clusterings that not only are robust as shown in the simulation module of Sect. 2.3 but also have a good enough normalized cut performance. Subsequently, the empirical results of Kagawa et al. (2015) are discussed and compared to those of the experiment run on the same datasets.

### 4.1 Experiment scheme

Robustness and performance are conflicting objectives. In order not to sacrifice performance while finding robust clusterings, we incorporate the simulation module in the larger experiment scheme shown in Fig. 1. Prior to starting the experiment, we apply the NMF method to the initial adjacency matrix in order to obtain the approximate matrix \(\hat{{\mathbf {H}}}\), which has been described in Sect. 2.2. Our experiment is iterative. It starts by initializing the two clusterings \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\) that are the output of the experiment, which are the overall best clusterings according to respective measures \(R_1^{\mathcal {X}}\) and \(R_2^{\mathcal {X}}\). Clusterings \(\text {C}^\mathrm{simul}_{R_1}\) and \(\text {C}^\mathrm{simul}_{R_2}\) are similarly top clusterings, but only for one simulation run. Clusterings \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\) are initially set by random initializations of the *K*-means algorithm applied to matrix \(\hat{{\mathbf {H}}}\). The experiment continues by generating *M* clusterings, using the same initialization process as for \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\), to construct the set of input clusterings \({\mathcal {X}}=\{\text {C}_{R_1}, \text {C}_{R_2}, \text {C}_1, \text {C}_2,\ldots,\text {C}_M\}\). Afterward, the simulation module of Table 1 is performed on the set \({\mathcal {X}}\). The best resulting clusterings \(\text {C}^\mathrm{simul}_{R_1}\) and \(\text {C}^\mathrm{simul}_{R_2}\) thereafter become \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\), respectively. The set \({\mathcal {X}}\) and the simulation are iteratively recomputed. Convergence is reached when the same \(\text {C}^\mathrm{simul}_{R_1}\) and \(\text {C}^\mathrm{simul}_{R_2}\) are obtained for a number of consecutive iterations. In our study, we choose a minimum of five iterations. Iterative schemes such as ours are widely used for solving optimization problems in practice. The basic idea is to continuously refine the approximate solutions. Examples include meta-heuristic solvers such as genetic algorithms, simulated annealing, and iterated local search (Gendreau and Potvin 2010).
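The control flow of this iterative scheme can be sketched independently of the clustering details. In the sketch below, `new_clustering` and `simulate` are hypothetical callables standing in for one *K*-means rounding and for one run of the simulation module, respectively; the stopping rule counts consecutive iterations in which both incumbent clusterings are returned unchanged. The toy stand-ins use integers in place of clusterings and are for demonstration only.

```python
import random

def robust_search(new_clustering, simulate, m_inputs, patience=5, max_iter=1000):
    """Keep the best clusterings across simulation runs until they stop changing.

    new_clustering() returns one candidate clustering (hypothetical callable).
    simulate(pool) returns the best clusterings of the pool under R1 and R2.
    """
    c_r1 = new_clustering()
    c_r2 = new_clustering()
    stable = 0
    for _ in range(max_iter):
        # Input set: the two incumbents plus M freshly generated clusterings.
        pool = [c_r1, c_r2] + [new_clustering() for _ in range(m_inputs)]
        best_r1, best_r2 = simulate(pool)
        # Count consecutive iterations in which both incumbents survive.
        stable = stable + 1 if (best_r1 == c_r1 and best_r2 == c_r2) else 0
        c_r1, c_r2 = best_r1, best_r2
        if stable >= patience:
            break
    return c_r1, c_r2

# Toy stand-ins: a "clustering" is an integer score and smaller is better.
rng = random.Random(0)
toy_new = lambda: rng.randint(0, 20)
toy_simulate = lambda pool: (min(pool), min(pool))

result_r1, result_r2 = robust_search(toy_new, toy_simulate, m_inputs=10)
print(result_r1, result_r2)
```

With real label vectors, equality between clusterings would need a canonical form, because the same partition can appear with permuted cluster labels.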

### 4.2 Results and discussion

Kagawa et al. (2013a, b) were among the first to apply clustering methods to the environmental analysis of economic systems. Their approach connects input–output models to network partitioning techniques. As mentioned in Sect. 1, highly important clusters in terms of \(\hbox {CO}_2\) emissions have been identified by Kagawa et al. (2015). Since China was in 2009 the largest emitter in terms of production-based \(\hbox {CO}_2\) emissions (Kagawa et al. 2015), the authors’ goal was to identify which country contributed most to these emissions and through which supply chains. To quantify \(\hbox {CO}_2\) intensity for a cluster, the *within-cluster sum* is used. This is defined for a cluster \(U_p\) of a supply-chain network with the adjacency matrix \(\mathbf{W}= (w_{uv})_{1\le u,v\le n}\) as the summation \(\sum _{u \in U_p}\sum _{v \in U_p} w_{uv}\).

In Kagawa et al. (2015), it was found among the 4756 industry clusters induced by the final demand of various goods and services in five developed countries (the USA, the UK, Germany, France, and Japan) that both the US construction industry and the US transport equipment industry generate prominent Chinese clusters that rank among the 15 clusters with the highest within-cluster sums. These two Chinese clusters nevertheless have the highest annual growth rates (also within the top 15), equal to 57.5 and 41.7\(\%\) for US construction and transport equipment demands, respectively. In the top 15 clusters, there is only one other Chinese cluster, the one induced by the Japanese construction demand, but this cluster has a lower growth rate and a smaller within-cluster sum compared to both of the abovementioned US-induced Chinese clusters.

As data, we therefore consider the two adjacency matrices of \(\hbox {CO}_2\) emissions induced by US construction and transport equipment demands for the year 2009, which were also used by Kagawa et al. (2015). We shall refer to them as the US construction and US transport datasets. Each adjacency matrix characterizes a network in which each vertex is a combination of a country and an industry category (country–industry), with a total of 41 countries and 35 industries. The considered categories of countries and industries are listed in the supporting materials.

To determine the appropriate number of clusters *K* to be used in the experiment, we rely on the modularity index, as done by Kagawa et al. (2013a, 2015). This index, developed by Newman and Girvan (2004), is maximized at the appropriate number of clusters. The modularity index can be formulated for a network \(G=(V,E)\) with adjacency matrix \(\mathbf{W}= (w_{uv})_{1\le u,v\le n}\) as

\[Q=\sum _{k=1}^{K}\big (p_{kk}-q_k^2\big ),\]

where \(p_{kk}=\big (\sum _{u\in U_k} \sum _{v\in U_k} w_{uv} /\sum _{u\in V} \sum _{v\in V} w_{uv}\big )\) represents the within-cluster ratio for the *k*-th cluster and \(q_k=\big (\sum _{u\in U_k} \sum _{v\in V} w_{uv} /\sum _{u\in V} \sum _{v\in V} w_{uv}\big )\) represents the betweenness ratio for the *k*-th cluster. For each dataset, we compute the modularity index for \(1 \le K \le 200\). Each value of *K* involves performing NMF clustering to obtain the approximate matrix \(\hat{{\mathbf {H}}}\) and then averaging the modularity index over 10 runs of *K*-means rounding. For the US construction dataset, the best index value, \(Q=0.197\), is reached at \(K = 66\). We nevertheless opted for \(K=64\), as in Kagawa et al. (2015), in order to ensure an adequate comparison; \(Q(K=64)=0.195\) is in any case very close to the best value \(Q(K = 66)\). For the US transport dataset, *K* is set to the value that maximizes the index, \(K=68\), which coincides exactly with the choice of Kagawa et al. (2015). The plots of *Q*(*K*) are available in the supporting materials.
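To make the two ratios concrete, the sketch below computes the modularity index for a given cluster assignment, assuming the standard Newman and Girvan (2004) form \(Q=\sum _k (p_{kk}-q_k^2)\) with \(p_{kk}\) and \(q_k\) as defined above. The function name and the toy network are illustrative assumptions; the NMF and *K*-means steps that produce the labels in the study are not reproduced here.

```python
from typing import Dict, List, Sequence


def modularity(W: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Modularity Q = sum_k (p_kk - q_k^2) for the clustering given by `labels`."""
    total = sum(sum(row) for row in W)  # sum of all edge weights
    clusters: Dict[int, List[int]] = {}
    for u, k in enumerate(labels):
        clusters.setdefault(k, []).append(u)
    q_index = 0.0
    for members in clusters.values():
        # Within-cluster ratio: weight of edges staying inside the cluster.
        p_kk = sum(W[u][v] for u in members for v in members) / total
        # Ratio of all edge weight leaving the vertices of the cluster.
        q_k = sum(sum(W[u]) for u in members) / total
        q_index += p_kk - q_k ** 2
    return q_index


# Two disconnected pairs of vertices (illustrative values only).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(modularity(W, [0, 0, 1, 1]))  # 0.5
print(modularity(W, [0, 0, 0, 0]))  # 0.0
```

In a model-selection loop such as the one described above, this function would be evaluated for each candidate *K* and averaged over several roundings.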

The adjacency matrices considered here represent interindustry \(\hbox {CO}_2\) emissions induced by the final demand for the products of a certain industry in a certain country. At the most basic level, these matrices are built from three elements, all obtained from the World Input–Output Database (Kagawa et al. 2015; Dietzenbacher et al. 2013; Tukker and Dietzenbacher 2013): the quantities of products sold between industries of different countries, the quantities of products sold to final consumers, and the amounts of carbon dioxide (\(\hbox {CO}_2\)) emitted by industries in different countries. All these data are estimates that may suffer from statistical problems such as noise and missing values. Some weaknesses, such as differences between countries in price concepts or in import–export-processing activities, are tackled by Dietzenbacher et al. (2013). However, the risk that the estimates do not match reality always exists. This can be an issue if the employed analysis is sensitive to errors in the estimates used, which is exactly our case.

Specifically, we rely on the update rule (8) for the solving process, which is nothing more than an iterative improvement based on a gradient-descent procedure (Lee and Seung 2001). This latter method is well known to be highly sensitive to the initial starting solution (Avriel 2003), which is set in our case, following Kagawa et al. (2015) and Ding et al. (2008), to the indicator matrix solution obtained by spectral clustering and the application of *K*-means (plus a constant matrix). Furthermore, spectral clustering is sensitive to errors in the input adjacency matrix (Von Luxburg 2007, p. 18). Following this reasoning, the NMF method is also sensitive to errors in the input adjacency matrix. In addition, we have already mentioned the sensitivity of *K*-means to the initial starting points.

Given these different sources of uncertainty in the employed analysis, there is a clear risk that the cluster assignments reported by Kagawa et al. (2015) overfit the particular dataset instances and initial model choices. Moreover, the reported clusters convey significant information about entry points for mitigating global warming, e.g., which industry sectors could be starting points or priority targets when implementing policies of \(\hbox {CO}_2\) emissions reduction involving the USA and China. Due to the relevance of the results’ implications, special care needs to be taken regarding the impact of errors in input adjacency matrices on the output clustering results.

Kagawa et al. (2013a, 2015) reduce the model’s uncertainties heuristically, by generating *M* clusterings corresponding to *M* runs of the *K*-means algorithm and then choosing the one with the optimal Ncut. A more systematic mechanism could be more reliable. By combining the simulation module with an iterative improvement of the normalized cut performance of the clusterings, our method provides a more rigorous way to approach the sensitivity issue. Robustness against noise in the adjacency matrix is evaluated through a very large number of scenarios. Our overall method can be used as a black-box procedure for clustering methods relying on *K*-means rounding, which also include spectral clustering.

The experiment parameters are set as follows. In order to cover a large range of noise magnitudes \(\xi\), six instances are considered, \(\xi =0.0\) to 0.5 in steps of 0.1. The distribution of perturbations used to sample the perturbed adjacency matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\) is taken to be uniform. The number of samples *N* and the number of clusterings *M* are set to 1000 and 100, respectively, which allows the experiments to be run within an acceptable CPU time. All experiment runs were done in a Java environment on a 3-GHz CPU with 8 GB of memory. In the remainder of this section, we discuss the obtained results.
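For readers who wish to experiment with this setting, the following Python sketch draws *N* perturbed adjacency matrices for a given noise magnitude \(\xi\). It rests on two assumptions that are ours and should be checked against the perturbation model defined earlier in the paper: each entry is perturbed multiplicatively as \(w_{uv}(1+\xi \varepsilon )\) with \(\varepsilon\) uniform on \([-1,1]\), and negative values are clipped to zero. The function name `sample_perturbed` is hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def sample_perturbed(
    W: Sequence[Sequence[float]], xi: float, n_samples: int, seed: int = 0
) -> List[Matrix]:
    """Draw n_samples perturbed copies of W under uniform multiplicative noise."""
    rng = random.Random(seed)  # fixed seed so that scenarios are reproducible
    samples: List[Matrix] = []
    for _ in range(n_samples):
        perturbed = [
            # Assumed model: w * (1 + xi * eps), eps ~ U[-1, 1], clipped at zero.
            [max(0.0, w * (1.0 + xi * rng.uniform(-1.0, 1.0))) for w in row]
            for row in W
        ]
        samples.append(perturbed)
    return samples


# The six noise magnitudes used in the experiment: 0.0, 0.1, ..., 0.5.
noise_levels = [round(0.1 * i, 1) for i in range(6)]
W = [[0.0, 2.0], [3.0, 0.0]]  # illustrative values only
scenarios = {xi: sample_perturbed(W, xi, n_samples=1000) for xi in noise_levels}
```

With \(\xi =0\), every sample equals the nominal matrix, which gives a convenient sanity check before running the larger magnitudes.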

After the experiment is run for each instance of \(\xi\), we construct a solution pool \({\mathcal {X}}\) composed of the solutions found plus the Kagawa et al. (2015) clustering. Notice that in all experiment runs, we obtain \(\text {C}_{R_1}=\text {C}_{R_2}\). The results of performing the simulation module with three noise magnitudes, \(\xi =0.1\) (small), 0.5 (large), and 5 (huge), on the constructed set \({\mathcal {X}}\) are shown in Figs. 2, 3, and 4 and the corresponding Tables 3, 4, and 5 for the US construction dataset. The results for the US transport dataset are available in the supporting materials. The first striking observation for both datasets is the poor normalized cut performance of the Kagawa et al. (2015) clustering, both for the nominal adjacency matrix and across the perturbation scenarios. For the US construction case, its Ncut values are about twice those of the top clusterings of the experiment, and for the US transport case the gap is much larger. The Ncut values of the Kagawa et al. (2015) clusterings are centered around approximately 21.56 and 45.7, whereas those of the robustest clusterings of the experiment gravitate around 9.8 and 10.9 for US construction and transport demands, respectively. This means that in uncertain environments the Kagawa et al. (2015) clusterings do not perform well, even though their Ncut values remain close on average to the nominal case for the small and large noise magnitudes. It should be noted that the mean Ncut can be misleading due to the presence of outliers. These observations are made clear by the simulation assessments \(R_1^{\mathcal {X}}\) and \(R_2^{\mathcal {X}}\) of Figs. 2, 3, and 4. While the clustering induced by the \(\xi =0.4\) experiment for the US construction case exhibits the most robust behavior across the various noise magnitudes, the Kagawa et al. (2015) clustering is found in the unfavorable position below and to the left of all other points \((R_1^{\mathcal {X}}, R_2^{\mathcal {X}})\).
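Since the comparison above hinges on Ncut values, where lower is better, a minimal Python sketch of the computation may be helpful. It assumes the multiway normalized cut in the sense of Shi and Malik (2000), i.e., the sum over clusters of the weight leaving the cluster divided by the total weight attached to its vertices; the exact definition used in the study is given earlier in the paper and may differ in detail. The function name and the toy path graph are illustrative.

```python
from typing import Dict, List, Sequence


def ncut(W: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Multiway normalized cut: sum over clusters of cut(U_k, V minus U_k) / vol(U_k)."""
    n = len(W)
    clusters: Dict[int, List[int]] = {}
    for u, k in enumerate(labels):
        clusters.setdefault(k, []).append(u)
    total = 0.0
    for members in clusters.values():
        inside = set(members)
        # Volume: all edge weight attached to the vertices of the cluster.
        volume = sum(sum(W[u]) for u in members)
        # Cut: edge weight going from the cluster to the rest of the network.
        cut = sum(W[u][v] for u in members for v in range(n) if v not in inside)
        if volume > 0:  # skip isolated clusters to avoid division by zero
            total += cut / volume
    return total


# Path graph 0 - 1 - 2 - 3 with unit weights (illustrative values only).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
good = ncut(W, [0, 0, 1, 1])  # cuts a single edge
bad = ncut(W, [0, 1, 0, 1])   # cuts every edge
print(good < bad)  # True
```

Evaluating such a function on the nominal matrix and on each perturbed sample yields the distribution of Ncut values on which the robustness assessments are based.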

After examining the consistency of the cluster order throughout our experiment runs (available in the supporting materials), we compare the cluster components of the Kagawa et al. (2015) clusterings with those of our robustest clusterings for US transport and US construction in Tables 6 and 7, respectively. The US clusters (C1) and (C5), i.e., those induced and generated by US industries, occupy the highest positions in Tables 6 and 7, respectively. Each of these clusters generated more than 162 million tonnes of \(\hbox {CO}_2\) emissions on US territory in 2009. Their compact structure and high within-cluster emissions are the main differences from the Kagawa et al. (2015) US clusters. In fact, there were two US clusters in the Kagawa et al. (2015) clustering for the US construction dataset.

On the other hand, the two Chinese clusters (C2) and (C6) have exactly the same components, except for two additional Korean elements, c1 and c3, in the US construction case. The differences between the (C2) and (C6) components and the Kagawa et al. (2015) Chinese clusters are small: nearly all of the elements coincide. More importantly, the two strong supply chains reported in Kagawa et al. (2015), namely (1) c17 (CHN) \(\Rightarrow\) c12 (CHN) and (2) c17 (CHN) \(\Rightarrow\) c9 (CHN), are still present in our Chinese clusters. Here, c17 refers to Electricity, Gas and Water Supply, c12 to Basic Metals and Fabricated Metal, and c9 to Chemicals and Chemical Products. Both supply chains are major contributors to the \(\hbox {CO}_2\) emissions within the Chinese clusters. Figure 5 illustrates the Chinese emission cluster (C2) of Table 6, which corresponds to the US transport case.

By obtaining similar results for the most important supply chains, we confirm the environmental policy conclusions reached by Kagawa et al. (2015). In our view, mitigating global warming by reducing the \(\hbox {CO}_2\) emissions of supply chains (1) and (2) is a strategy that is well protected against random deviations in the input–output data. Additionally, the superior Ncut performance of our robustest clusterings lends further support to the policy conclusions of Kagawa et al. (2015).

## 5 Conclusion

In this study, we establish a sampling-based procedure to examine the robustness of clusterings found using nonnegative matrix factorization or spectral clustering methods. We apply the procedure by re-examining the analysis of Kagawa et al. (2015) and comparing our results with theirs. In their paper, significant clusters in terms of \(\hbox {CO}_2\) emissions that are rapidly growing over time were found. Here, our procedure is applied to the datasets of Kagawa et al. (2015) that have strong environmental implications, namely the two \(\hbox {CO}_2\) emissions networks induced by the US construction and US transport equipment sectors. In our empirical results, we find clusterings that have much better normalized cut performance and robustness assessments than those of Kagawa et al. (2015). Some differences in the components between the compared clusters are observed. However, the main supply-chain paths on which Kagawa et al. (2015) based their recommendations for mitigating global warming still persist. These recommendations concern the significant Chinese clusters linked to our target US demands. In summary, from a robustness perspective, we concur with the Kagawa et al. (2015) environmental conclusions regarding policies.

## References

Avriel M (2003) Nonlinear programming: analysis and methods. Courier Corporation

Bottou L, Bengio Y (1994) Convergence properties of the k-means algorithms. In: Advances in neural information processing systems 7 [NIPS conference, Denver, Colorado, USA, 1994], pp 585–592

Bradley PS, Fayyad UM (1998) Refining initial points for k-means clustering. In: ICML, vol 98. Citeseer, pp 91–99

Davis SJ, Peters GP, Caldeira K (2011) The supply chain of CO\(_2\) emissions. Proceedings of the National Academy of Sciences, 201107409

Dietzenbacher E (1995) On the bias of multiplier estimates. J Reg Sci 35(3):377–390

Dietzenbacher E (2006) Multiplier estimates: to bias or not to bias? J Reg Sci 46(4):773–786

Dietzenbacher E, Los B, Stehrer R, Timmer M, De Vries G (2013) The construction of world input–output tables in the WIOD project. Econ Syst Res 25(1):71–98

Ding C, Li T, Jordan MI (2008) Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: 8th IEEE International conference on data mining, 2008. ICDM’08, pp 183–192. IEEE

Ding CH, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: SDM, vol 5. SIAM, pp 606–610

Donath WE, Hoffman AJ (1973) Lower bounds for the partitioning of graphs. IBM J Res Dev 17(5):420–425

Duda RO, Hart PE, Stork DG (1995) Pattern classification and scene analysis, 2nd edn. Wiley Interscience, New York

Epskamp S, Cramer AOJ, Waldorp LJ, Schmittmann VD, Borsboom D (2012) qgraph: network visualizations of relationships in psychometric data. J Stat Softw 48(4):1–18

Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305

Gendreau M, Potvin J-Y (2010) Handbook of metaheuristics, vol 2. Springer, New York

Kagawa S, Okamoto S, Suh S, Kondo Y, Nansai K (2013a) Finding environmentally important industry clusters: multiway cut approach using nonnegative matrix factorization. Soc Netw 35(3):423–438

Kagawa S, Suh S, Hubacek K, Wiedmann T, Nansai K, Minx J (2015) \(\text{CO}_2\) emission clusters within global supply chain networks: implications for climate change mitigation. Glob Environ Chang

Kagawa S, Suh S, Kondo Y, Nansai K (2013b) Identifying environmentally important supply chain clusters in the automobile industry. Econ Syst Res 25(3):265–286

Kannan R, Vempala S, Vetta A (2004) On clusterings: good, bad and spectral. J ACM 51(3):497–515

Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791

Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pp 556–562

Lenzen M, Moran D, Kanemoto K, Foran B, Lobefaro L, Geschke A (2012) International trade drives biodiversity threats in developing nations. Nature 486(7401):109–112

Liang S, Feng Y, Xu M (2015) Structure of the global virtual carbon network: revealing important sectors and communities for emission reduction. J Ind Ecol 19(2):307–320

MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1. Oakland, CA, USA, pp 281–297

Miller RE, Blair PD (2009) Input–output analysis: foundations and extensions. Cambridge University Press, Cambridge

Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113

Ng AY, Jordan MI, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856

Peters GP, Minx JC, Weber CL, Edenhofer O (2011) Growth in emission transfers via international trade from 1990 to 2008. Proceedings of the National Academy of Sciences, 201006388

Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64

Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

Tukker A, Dietzenbacher E (2013) Global multiregional input-output frameworks: an introduction and outlook. Econ Syst Res 25(1):1–19

Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416

Zhang Z, Jordan MI et al (2008) Multiway spectral clustering: a margin-based perspective. Stat Sci 23(3):383–403

## Authors' contributions

OR, HO, and SK proposed the methodology and provided discussions. OR was in charge of data collection and conducted the data analysis. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Funding

This research was supported by a Grant-in-Aid for research [Nos. 26241031 and 16H01797] from the Ministry of Education, Culture, Sports, Science and Technology in Japan.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Rifki, O., Ono, H. & Kagawa, S. The robustest clusters in the input–output networks: global \(\hbox {CO}_2\) emission clusters. *Economic Structures* **6**, 3 (2017). https://doi.org/10.1186/s40008-017-0062-2

DOI: https://doi.org/10.1186/s40008-017-0062-2