# The robustest clusters in the input–output networks: global \(\hbox {CO}_2\) emission clusters

- Omar Rifki\(^{1}\)
- Hirotaka Ono\(^{1}\)
- Shigemi Kagawa\(^{1}\)

**6**:3

https://doi.org/10.1186/s40008-017-0062-2

© The Author(s) 2017

**Received:** 21 September 2016

**Accepted:** 26 January 2017

**Published:** 14 February 2017

## Abstract

Kagawa et al. (Soc Netw 35(3):423–438, 2013a; Econ Syst Res 25(3):265–286, 2013b; Glob Environ Chang, 2015) investigated environmentally significant clusters in global supply-chain networks of goods and services using nonnegative matrix factorization, a popular clustering method whose cluster assignments are, however, sensitive to the data and to the algorithm's choices. Because of this sensitivity, the results risk overfitting. To assess the robustness of the obtained clusters, which have strong implications for international climate change mitigation (especially the US-induced Chinese clusters), we design a simulation-based experiment. The empirical findings of the proposed approach are compared with those of Kagawa et al. (Glob Environ Chang, 2015), and their environmental implications are reported as well.

## Keywords

\(\hbox {CO}_2\) emissions, Cluster analysis, Simulation

## 1 Background

Graph partitioning methods, and clustering methods in general, have been widely used to understand and visualize fundamental features of complex social and economic networks, e.g., Newman and Girvan (2004), Kagawa et al. (2013a, b), Liang et al. (2015). A notable environmental study was provided by Kagawa et al. (2015): the authors identified \(\hbox {CO}_2\) emission clusters within global supply-chain networks formed by the final demand impulse of a specific final product, and discussed how the identified emission clusters have grown over time and contributed to increasing \(\hbox {CO}_2\) emission transfers [see also Davis et al. (2011) and Peters et al. (2011) for analyses of \(\hbox {CO}_2\) emission transfers]. The authors applied the nonnegative matrix factorization (NMF) approach (Lee and Seung 1999) and obtained clusters that minimize the normalized cut value (Ncut value), which suggests that the obtained clusters best capture the environmentally important supply chains (Kagawa et al. 2015).

Although Kagawa et al. (2015) provided important emission clusters for climate change mitigation, their results depend strongly on the employed algorithms and parameters. To illustrate the problem, suppose that we apply a typical clustering algorithm such as the *K*-means method (MacQueen et al. 1967). Setting the parameter \(K=10\) yields a set of 10 clusters, but setting \(K=11\) instead could yield 11 clusters containing very different sectors from those obtained with \(K=10\). In such a situation, which “clusters” really reflect the actual economic structure, and which “clusters” are plausible for the analysis? The same problem arises not only for the value of *K* but also for many other parameters of the employed algorithms. Notably, the *K*-means algorithm is itself used within the NMF method.

Data quality raises the same problem. Economic network data such as input–output tables usually contain errors and are at best approximations, which is a central issue in input–output analysis (e.g., Dietzenbacher 1995, 2006). Because of these errors, the problem described above also appears in cluster analysis: the results can be very sensitive to both the employed algorithms and the datasets. In fact, the datasets used to construct the supply-chain networks of Kagawa et al. (2015) are estimates derived from the multi-regional input–output framework (e.g., Lenzen et al. 2012; Dietzenbacher et al. 2013). If the employed clustering technique is highly sensitive to changes in the input–output data, as it is in our case, we must be careful before claiming that the resulting clusters are plausible.

This paper investigates this problem and proposes a method to obtain clusters that are “stable” with respect to errors or noise in the data and to the parameters of the algorithm. That is, even if the values in the datasets are slightly perturbed, the clusters obtained by our method still have a good Ncut value; thus, even though the original data may contain errors or noise, the obtained clusters remain reliable as long as the noise is small enough. The idea of our approach is simple and based on simulations: it can be interpreted as a Monte Carlo-type simulation that seeks clusters that are stable under perturbations of the data or of the choice of parameters. We also propose two criteria, and a diagram based on them, to assess the robustness of the obtained clusters. The details are described in Sect. 2. Owing to its generality, this diagram could provide a new guideline for measuring the reliability of results in the various fields where clustering analyses are applied, such as economic and social networks.

As a case study, we focus on adjacency matrices obtained from a multi-regional input–output analysis (Kagawa et al. 2015). The two proposed criteria are applied to obtain robust \(\hbox {CO}_2\) emission clusters within global supply-chain networks, and the robustness and performance of our clustering results are compared with those of Kagawa et al. (2015). In particular, we evaluate the differences in cluster composition, which carry strong environmental implications. The remainder of this paper is organized as follows: Section 2 describes the methodology, Sect. 3 provides a numerical example, Sect. 4 presents the empirical results and discussion, and Sect. 5 concludes.

## 2 Methods

### 2.1 Constructing an adjacency matrix

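Let \(\mathbf{Z}=(Z_{ij}^{rs})\) denote the \(MN\times MN\) matrix of intermediate transactions, where \(Z_{ij}^{rs}\) is the amount of products sold by industry *i* in country *r* to industry *j* in country *s*. Here, *M* is the number of industries and *N* is the number of countries. If geographical input coefficients are defined by \(\mathbf{A}=(a_{ij}^{rs})\) with \(a_{ij}^{rs}={Z}_{ij}^{rs}/x_j^s\), where \(x_j^s\) denotes domestic output of industry *j* in country *s*, the widely used interregional input–output (IRIO) model (e.g., Miller and Blair 2009) can be formulated as

\[ {\mathbf {x}}={\mathbf {A}}{\mathbf {x}}+{\mathbf {f}}, \qquad (1) \]

where \({\mathbf {x}}=(x_i^r)\) is the vector of domestic outputs and \({\mathbf {f}}=(f_i^r)\) is the vector of final demands, with \(f_i^r=\sum _{s=1}^{N} f_i^{rs}\) and \(f_i^{rs}\) the final demand for the products of industry *i* of country *r* to country *s*.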

Solving the IRIO model in Eq. (1) yields \({{\mathbf {x}}} = ({\mathbf {I}}- {\mathbf {A}})^{-1}{\mathbf {f}}={{{\mathbf {B}}}{{\mathbf {f}}}}\), where \({{\mathbf {I}}}\) is the identity matrix and \({{\mathbf {B}}} = ({{\mathbf {I}}}-{\mathbf {A}})^{-1}\) is the direct and indirect requirement matrix. Each element \(b^{rs}_{ij}\) represents how many units of the products of industry *i* in country *r* are needed, directly and indirectly, to satisfy one unit of final demand for the products of industry *j* in country *s*.
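As a small numerical check of the IRIO solution, the sketch below builds a hypothetical four-sector economy (two countries by two industries; all coefficients and demands are invented for illustration) and verifies that \(\mathbf{x}=\mathbf{B}\mathbf{f}\) satisfies the balance \(\mathbf{x}=\mathbf{A}\mathbf{x}+\mathbf{f}\). It uses NumPy; solving the linear system is generally preferable to forming the inverse, which is computed here only to inspect \(\mathbf{B}\).

```python
import numpy as np

# Hypothetical input coefficients a_ij^rs for 2 countries x 2 industries;
# sectors are ordered (i=1, r=1), (i=2, r=1), (i=1, r=2), (i=2, r=2).
A = np.array([
    [0.10, 0.05, 0.02, 0.01],
    [0.20, 0.10, 0.03, 0.02],
    [0.01, 0.02, 0.15, 0.05],
    [0.02, 0.01, 0.10, 0.20],
])
f = np.array([10.0, 5.0, 8.0, 4.0])  # hypothetical final demand vector

I = np.eye(A.shape[0])
x = np.linalg.solve(I - A, f)  # output vector solving x = A x + f
B = np.linalg.inv(I - A)       # direct and indirect requirement matrix

print(np.round(x, 3))
print(np.round(B, 3))
```

Because every column of this hypothetical \(\mathbf{A}\) sums to less than one, the economy is productive, so \(\mathbf{B}\) equals the convergent series \(\mathbf{I}+\mathbf{A}+\mathbf{A}^2+\cdots\) and is therefore nonnegative with diagonal entries of at least one.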

*j* in country *s*. The matrix \({\mathbf X}_j^s\) shows the economic transactions between geographically distributed industries that are triggered by the final demand on industry *j* located in country *s*.

If the \(\hbox {CO}_2\) emission intensity (emissions per unit of output) of industry *i* in country *r* is defined as \({\varvec{\alpha }}=(\alpha _i^r)\), the \(\hbox {CO}_2\) emissions embedded in the economic transactions are obtained as
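The emission equation for the transaction matrix is not recoverable here, so the following is only an illustration under stated assumptions. The hypothetical helper `emission_adjacency` assumes that the transactions triggered by a final demand on a single sector are \(\mathbf{A}\,\mathrm{diag}(\mathbf{B}\mathbf{e})\), where \(\mathbf{e}\) holds that one demand; that each transaction is weighted by the emission coefficient of the supplying sector; and that the directed result is symmetrized to give the undirected weights used in Sect. 2.2. None of these three steps is taken from the paper's own formulas.

```python
import numpy as np

def emission_adjacency(A, alpha, j, demand):
    """Hypothetical helper: CO2-weighted adjacency matrix for a final demand
    `demand` placed on sector index j only (assumed construction)."""
    n = A.shape[0]
    e = np.zeros(n)
    e[j] = demand                                  # final demand on one sector only
    x_induced = np.linalg.solve(np.eye(n) - A, e)  # induced outputs B e
    X = A * x_induced[None, :]                     # a_uv * x_v, i.e. A diag(B e)
    E = alpha[:, None] * X                         # supplier u's emissions for its sale to v
    return E + E.T                                 # assumed symmetrization (undirected graph)

A = np.array([
    [0.10, 0.05, 0.02, 0.01],
    [0.20, 0.10, 0.03, 0.02],
    [0.01, 0.02, 0.15, 0.05],
    [0.02, 0.01, 0.10, 0.20],
])
alpha = np.array([0.8, 0.3, 1.5, 0.6])  # hypothetical emission intensities
W = emission_adjacency(A, alpha, j=2, demand=100.0)
```

Since both Ncut and modularity are ratios of edge weights, symmetrizing as \(\mathbf{E}+\mathbf{E}^\top\) rather than \((\mathbf{E}+\mathbf{E}^\top)/2\) does not change them.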

### 2.2 Clustering input–output analysis

A graph is a discrete structure consisting of vertices and edges, each edge connecting two vertices. In the context of economic analysis, a vertex and an edge correspond to a sector and a transaction between the corresponding two sectors, respectively. Graph clustering seeks to group similar vertices together so that the resulting groups are dissimilar from one another. This problem has multiple variants, algorithms, and applications; see Schaeffer (2007). Spectral and NMF-based clusterings have become popular in recent years, especially in machine learning, e.g., Shi and Malik (2000), Ng et al. (2002), Kannan et al. (2004), Ding et al. (2005), Von Luxburg (2007), Zhang and Jordan (2008). Spectral methods themselves, however, can be traced back to early work on the graph partitioning problem by Donath and Hoffman (1973) and Fiedler (1973).

Consider an undirected weighted network \(G=(V,E)\) of order \(n=|V|\) with edge weights \(w_{uv}\). In the context of clustering IO analysis (CIOA), a vertex *u* corresponds to a sector, i.e., an industry *i* in a country *r*; we denote this by \(u=(i,r)\). Here, *V* and *E* represent the set of all sectors and the set of all transactions between two sectors, respectively, and \(|V|=n=M\times N\). The edge weights represent the amounts of \(\hbox {CO}_2\) emissions associated with the corresponding transactions; unweighted graphs with zero-one edge weights can also be considered. A central instrument of the spectral and NMF clustering framework is the *Laplacian matrix*, a matrix representation of a graph. If \(\mathbf{W}=(w_{uv})_{1\le u,v \le n}\) is the *adjacency matrix* of the network *G* and \(\mathbf{D}=\text {diag}({\mathbf {d}})\) is the diagonal *degree matrix* of *G*, with \({\mathbf {d}}=(d_u)_{1\le u\le n}=(\sum _{v=1}^{n} w_{uv})_{1\le u\le n}\) being the vector of vertex degrees, then the Laplacian matrix of *G* is defined as \(\mathbf{L}=\mathbf{D}-\mathbf{W}\). Its normalized version is \(\mathbf{D}^{-\frac{1}{2}}{} \mathbf{L}\mathbf{D}^{-\frac{1}{2}}\) (Shi and Malik 2000; Ng et al. 2002).
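The Laplacian and its normalized version follow directly from these definitions. The short NumPy sketch below (hypothetical helper `laplacians`, small invented graph) computes both and assumes that every vertex has a positive degree, since \(\mathbf{D}^{-1/2}\) is undefined otherwise.

```python
import numpy as np

def laplacians(W):
    """Return the Laplacian L = D - W and the normalized D^{-1/2} L D^{-1/2}."""
    d = W.sum(axis=1)                 # degrees d_u = sum_v w_uv
    L = np.diag(d) - W
    s = 1.0 / np.sqrt(d)              # assumes every vertex has positive degree
    L_norm = s[:, None] * L * s[None, :]
    return L, L_norm

# Small hypothetical weighted, undirected graph
W = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)
L, L_norm = laplacians(W)
```

Every row of \(\mathbf{L}\) sums to zero, and for a connected graph the smallest eigenvalue of the normalized Laplacian is zero, with eigenvector \(\mathbf{D}^{1/2}\mathbf{1}\); spectral clustering works with the eigenvectors associated with the next smallest eigenvalues.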

A family \(\{U_1,\ldots,U_k\}\) of subsets of *V* is called a (*k*-)*partition* of *V* if \(\bigcup _{p=1}^{k} U_p = V\) and \(U_p \cap U_q = \emptyset\) for \(1\le p,q \le k, p\ne q\). A graph partition is a partition of the vertices. The objective is to minimize, for each cluster, its total edge weight to the rest of the graph, which is called the *cut* in graph theory and is expressed as \(\text {cut}(U,{\bar{U}})=\sum _{u\in U, v\in {\bar{U}}}w_{uv}\) for a subset \(U \subset V\) of vertices and its complement \({\bar{U}}\). In this vein, Shi and Malik (2000) introduced the *normalized cut* criterion, abbreviated as Ncut, which, when minimized, produces clusters of reasonable sizes. For a *k*-partition \(U_1, U_2, \ldots, U_k\) of *V*, the \(\text {Ncut}\) is defined as

\[ \text {Ncut}(U_1,\ldots,U_k)=\sum _{p=1}^{k}\frac{\text {cut}(U_p,{\bar{U}}_p)}{\sum _{u\in U_p} d_u}. \]

*k* clusters can be formulated as the following combinatorial problem:

*n*-dimensional indicator vector of the cluster \(U_i\); \({q}^{(i)}_u=1\) if a vertex *u* belongs to cluster \(U_i\), and 0 otherwise. Note that \({\mathbf {H}}\) is a nonnegative matrix; that is, all the elements of \({\mathbf {H}}\) are nonnegative.

*K*-means algorithm, which we call the *rounding step* in our algorithm and explain below.

The *K*-means algorithm introduced by MacQueen et al. (1967) is one of the most popular partitional (i.e., non-hierarchical) clustering methods. It starts by selecting *k* initial clusters, identified by their cluster centers, and then iteratively refines them as follows. Given a target dataset \(\{d_1,d_2,\ldots,d_n\}\) to be clustered, each iteration assigns every point to its nearest center and then moves each center to the mean of its assigned points, so as to minimize the within-cluster sum of squared distances

\[ \sum _{p=1}^{k}\sum _{d_i\in U_p}\lVert d_i-\mu _p\rVert ^{2}, \qquad (9) \]

where \(\mu _p\) denotes the center of cluster \(U_p\).

The *K*-means algorithm can be proven to converge to a local minimum of expression (9) (Bottou and Bengio 1994). However, it has several drawbacks, chiefly its sensitivity to the initial conditions, which can lead to misleading results (Bradley and Fayyad 1998). This issue is common to hill-climbing algorithms; as Duda et al. (1995) note, “different starting points can lead to different solutions and one never knows whether or not the best solution has been found.” Thus, a bad choice of the initial cluster centers can easily lead to a poor cluster assignment. A second issue is the choice of the parameter *k*; a bad choice here can also lead to poor results.
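A minimal sketch of an NMF clustering pipeline with *K*-means rounding is given below, under loudly stated assumptions. The multiplicative update is the symmetric-NMF rule in the style of Ding et al. (2005) with \(\beta = 1/2\), standing in for update rule (8), which is not reproduced here; the factor starts from random positive values instead of the spectral initialization used by Kagawa et al. (2015); and the rounding step runs a plain Lloyd-style *K*-means on the rows of the factor several times, keeping the run with the smallest objective (9). The helper names `kmeans` and `nmf_clustering` are hypothetical.

```python
import numpy as np

def kmeans(data, k, rng, n_iter=100):
    """Lloyd-style K-means from random initial centers; returns labels and
    the within-cluster sum of squared distances of expression (9)."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                          # nearest-center assignment
        new = np.array([data[labels == p].mean(axis=0) if np.any(labels == p)
                        else centers[p] for p in range(k)])   # recompute the means
        if np.allclose(new, centers):
            break
        centers = new
    dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    return labels, dist[np.arange(len(data)), labels].sum()

def nmf_clustering(W, k, n_restarts=10, n_updates=500, seed=0):
    """Hypothetical sketch: symmetric NMF of D^{-1/2} W D^{-1/2}, then rounding."""
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(W.sum(axis=1))                 # assumes positive degrees
    Wn = s[:, None] * W * s[None, :]
    H = rng.uniform(0.1, 1.0, size=(W.shape[0], k))  # assumed random positive start
    for _ in range(n_updates):
        # assumed symmetric-NMF multiplicative update with beta = 1/2
        H *= 0.5 + 0.5 * (Wn @ H) / np.maximum(H @ (H.T @ H), 1e-12)
    # rounding step: repeat K-means and keep the run with the smallest objective (9)
    runs = [kmeans(H, k, rng) for _ in range(n_restarts)]
    labels, _ = min(runs, key=lambda run: run[1])
    return H, labels

# Toy network: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
H_hat, labels = nmf_clustering(W, k=2)
```

Restarting *K*-means several times and keeping the lowest objective is one simple response to the initialization sensitivity discussed above; it does not guarantee the global minimum of (9).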

The basic steps of the NMF method presented in this section are summarized as the following algorithm.

### 2.3 Simulation module

*K*-means algorithm on the matrix \(\hat{{\mathbf {H}}}\), as prescribed in the NMF algorithm. To simulate uncertain environments for the input clusterings, which are introduced in Table 1, we generate small perturbations of \(\mathbf{W}=(w_{uv})_{1\le u,v \le n}\), the adjacency matrix of the network *G*, *N* times. We thus obtain the perturbed adjacency matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\), where *N* denotes the number of samples from now on (no longer the number of countries, as in Sect. 2.1). To assess the performance of a clustering \(\text {C}=\{U_1,U_2,\ldots,U_k\}\) obtained from the “initial” adjacency matrix \(\mathbf{W}_0=\mathbf{W}\), we propose the following modified Ncut criterion: \(\text {Ncut}(\text {C},\mathbf{W}_I)=\sum ^k_{p=1} \Big (\sum _{u \in U_p}\sum _{v \notin U_p}x_{uv}/\sum _{ u\in U_p}\sum _{v \in V}x_{uv}\Big )\), where \(\mathbf{W}_I=(x_{uv})_{1\le u,v \le n}\) is a perturbed adjacency matrix. This modified Ncut value measures how good the given clustering C is in a network whose edge weights contain a small amount of noise. Thus, for a fixed clustering C, the criterion takes different values for the different perturbed matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\).
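The modified Ncut criterion can be computed directly from a label vector and a (possibly perturbed) weight matrix; with \(\mathbf{W}_I=\mathbf{W}_0\) it reduces to the ordinary Ncut of Shi and Malik (2000). The NumPy sketch below uses a hypothetical helper `ncut` and a toy graph of two triangles joined by one weak edge; because it uses row sums, it also applies to perturbed matrices that are not exactly symmetric.

```python
import numpy as np

def ncut(labels, W):
    """Modified Ncut of the partition encoded by `labels` on weights W."""
    labels = np.asarray(labels)
    total = 0.0
    for p in np.unique(labels):
        in_p = labels == p
        cut = W[np.ix_(in_p, ~in_p)].sum()  # sum_{u in U_p} sum_{v not in U_p} x_uv
        vol = W[in_p, :].sum()              # sum_{u in U_p} sum_{v in V} x_uv
        total += cut / vol
    return total

# Two triangles joined by one weak edge (weight 0.1).
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1

good = ncut([0, 0, 0, 1, 1, 1], W)  # splits along the weak edge
bad = ncut([0, 0, 1, 1, 1, 1], W)   # cuts through a triangle
```

The split along the weak edge gives \(0.2/6.1 \approx 0.033\), whereas the split through a triangle gives \(2/4 + 2/8.2 \approx 0.744\).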

We then apply the *K*-means algorithm to the *n* rows of the matrix \(\hat{{\mathbf {H}}}\) *M* times, which yields the clusterings \(\text {C}_1, \text {C}_2, \ldots, \text {C}_M\), termed input clusterings. These clusterings differ from one another because repeated *K*-means rounding is unstable with respect to its initial centers. The perturbation scenario \((I_I)\) means that the matrix \(\mathbf{W}_I\) is used for the Ncut computation. We suppose that the perturbed matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\) take values from the following uncertainty set:

\[ U=\Big \{(x_{uv})_{1\le u,v\le n} : (1-\xi )\,w_{uv}\le x_{uv}\le (1+\xi )\,w_{uv},\ 1\le u,v\le n\Big \}, \qquad (10) \]

where \(\xi \ge 0\) is the magnitude of the noise. In the set *U*, deviations are symmetric around the values of the initial matrix \(\mathbf{W}_0\): each element \(x_{uv}\) of a randomly perturbed adjacency matrix lies within the interval \(\Big [(1-\xi )w_{uv},\,(1+\xi )w_{uv}\Big ]\). In practice, to obtain the matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\), we draw *N* independent and identically distributed samples from the set *U*.
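Drawing i.i.d. samples from *U* is straightforward once a distribution is fixed. The sketch below (hypothetical helper `sample_perturbed`) assumes independent uniform noise on each entry, the distribution used in the experiments of Sect. 4.2, and relies on NumPy's `Generator.uniform`, which accepts array-valued bounds.

```python
import numpy as np

def sample_perturbed(W0, xi, n_samples, rng):
    """Draw n_samples i.i.d. matrices from U: each entry x_uv is uniform on
    [(1 - xi) * w_uv, (1 + xi) * w_uv] (assumed independent across entries)."""
    low, high = (1.0 - xi) * W0, (1.0 + xi) * W0
    return [rng.uniform(low, high) for _ in range(n_samples)]

rng = np.random.default_rng(42)
W0 = np.array([[0.0, 2.0, 1.0],
               [2.0, 0.0, 0.5],
               [1.0, 0.5, 0.0]])
samples = sample_perturbed(W0, xi=0.3, n_samples=1000, rng=rng)
```

Zero weights stay zero, so the perturbation never creates new edges. The sampled matrices are not exactly symmetric; if symmetric weights were required, one could perturb the upper triangle and mirror it, which would be a further assumption.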

We define two robustness measures for the set \({\mathcal {X}}\) of the *M* input clusterings. The first measure, \(R_1^{\mathcal {X}}\), reports for a given clustering \(\text {C}\in {\mathcal {X}}\) the fraction of perturbation scenarios in which it is the best clustering within \({\mathcal {X}}\). For instance, C being the best clustering within \({\mathcal {X}}\) for the perturbation scenario \((I_I)\) means that C yields the smallest of the Ncut values computed from the *M* clusterings of \({\mathcal {X}}\) and the perturbed adjacency matrix \(\mathbf{W}_I\), i.e., \(\text {C}\in \arg \underset{\text {D} \in {\mathcal {X}}}{\min }\,\text {Ncut}(\text {D}, \mathbf{W}_I)\). We introduce the indicator function \(I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I)\), which takes the value one if C is the best clustering within \({\mathcal {X}}\) for the perturbed matrix \(\mathbf{W}_I\), and zero otherwise:

\[ I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I)=\begin{cases}1 & \text {if } \text {C}\in \arg \min _{\text {D}\in {\mathcal {X}}}\text {Ncut}(\text {D},\mathbf{W}_I),\\ 0 & \text {otherwise.}\end{cases} \]

We estimate \(R_1^{\mathcal {X}}(\text {C})\) over the *N* matrices as \(\widehat{R_1^{\mathcal {X}}}(\text {C})\):

\[ \widehat{R_1^{\mathcal {X}}}(\text {C})=\frac{1}{N}\sum _{I=1}^{N} I_{{\mathcal {X}}}(\text {C},\mathbf{W}_I). \qquad (12) \]

This estimate relies on a finite number of samples *N*; thus, the measure expressed by (12) can be viewed as an estimate of the exact measure \(R_1^{\mathcal {X}}\). We use the caret symbol \((\;\;\widehat{}\;\;)\) to indicate this approximation. The exact measure \(R_1^{\mathcal {X}}(\text {C})\) is assumed to be reached in the limit of infinitely many samples: \(\widehat{R_1^{\mathcal {X}}}(\text {C}) \xrightarrow [N \rightarrow \infty ]{} R_1^{\mathcal {X}}(\text {C})\).
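Given a table of modified Ncut values, with one row per perturbed matrix \(\mathbf{W}_I\) and one column per input clustering, the estimate (12) is the column-wise mean of the indicator function. The sketch below (hypothetical helper `r1_hat`, invented toy values) counts a tie as "best" for every tied clustering, following the arg min set in the definition; the tie tolerance is an assumption.

```python
import numpy as np

def r1_hat(ncut_values, rtol=1e-12):
    """ncut_values[I, m] = Ncut(C_m, W_I); returns the estimate of R_1 per C_m."""
    ncut_values = np.asarray(ncut_values, dtype=float)
    best = ncut_values.min(axis=1, keepdims=True)  # best Ncut in each scenario
    indicator = np.isclose(ncut_values, best, rtol=rtol, atol=0.0)
    return indicator.mean(axis=0)                  # average over the N scenarios

vals = [[1.0, 0.5, 0.6],
        [1.1, 0.7, 0.6],
        [0.9, 0.55, 0.55]]
print(r1_hat(vals))
```

Here the second and third clusterings are each best in two of the three scenarios (they tie in the last one), so both receive 2/3, and the first receives 0.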

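The second measure, \(R_2^{\mathcal {X}}\), is based on the *performance ratio*. We define the performance ratio of a clustering \(\text {C}\in {\mathcal {X}}\) under the perturbed adjacency matrix \(\mathbf{W}_I\) as the ratio of the Ncut value of C to the smallest Ncut value computed for \(\mathbf{W}_I\):

\[ \frac{\text {Ncut}(\text {C},\mathbf{W}_I)}{\min _{\text {D}\in {\mathcal {X}}}\text {Ncut}(\text {D},\mathbf{W}_I)}. \]

We estimate \(R_2^{\mathcal {X}}(\text {C})\) over the *N* matrices as \(\widehat{R_2^{\mathcal {X}}}(\text {C})\), the average of the reciprocal performance ratio:

\[ \widehat{R_2^{\mathcal {X}}}(\text {C})=\frac{1}{N}\sum _{I=1}^{N}\frac{\min _{\text {D}\in {\mathcal {X}}}\text {Ncut}(\text {D},\mathbf{W}_I)}{\text {Ncut}(\text {C},\mathbf{W}_I)}. \]

As with (12), this is an estimate based on a finite number of samples *N*. The more robust a clustering \(\text {C}\) is within \({\mathcal {X}}\), the higher the estimated values of \(R_1^{\mathcal {X}}(\text {C})\) and \(R_2^{\mathcal {X}}(\text {C})\). To see why both measures are useful, we present a numerical example on a simplified network in the next section.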
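A matching sketch for the performance ratio and the estimate of \(R_2^{\mathcal {X}}\) follows. It assumes that \(\widehat{R_2^{\mathcal {X}}}\) averages the reciprocal of the performance ratio over the scenarios, which is the reading consistent with higher values indicating more robust clusterings and with the values reported in Sect. 3 lying between 0 and 1. The helper names are hypothetical.

```python
import numpy as np

def performance_ratio(ncut_values):
    """Ncut(C, W_I) divided by the smallest Ncut for W_I (>= 1, equal to 1 for the best)."""
    ncut_values = np.asarray(ncut_values, dtype=float)
    return ncut_values / ncut_values.min(axis=1, keepdims=True)

def r2_hat(ncut_values):
    """Assumed estimator: mean over scenarios of the reciprocal performance ratio."""
    return (1.0 / performance_ratio(ncut_values)).mean(axis=0)

vals = [[1.0, 0.5, 0.6],
        [1.1, 0.7, 0.6],
        [0.9, 0.55, 0.55]]
scores = r2_hat(vals)
```

Unlike the 0–1 indicator, this score also rewards being close to the best: on the same toy values, the second clustering scores \(20/21 \approx 0.952\) and the third \(17/18 \approx 0.944\), although they are tied under \(\widehat{R_1^{\mathcal {X}}}\).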

## 3 Numerical example

*K*-means method using these 16 feature vectors ten times and obtained the ten clusterings listed in Table 2.

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\text {Ncut}(\text {C}, \mathbf{W}_0)\) | 1.611 | 0.556 | 0.551 | 1.09 | 1.611 | 1.412 | 0.889 | 1.791 | 1.225 | 1.412 |

Consider a perturbed adjacency matrix \(\mathbf{W}_I\) sampled from the set *U* given in expression (10). For this particular perturbed adjacency matrix, clustering \(\text {C}_2\) is the best clustering according to the Ncut values shown in the following table:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\text {Ncut}(\text {C}, \mathbf{W}_I)\) | 1.615 | 0.550 | 0.564 | 1.170 | 1.615 | 1.334 | 0.909 | 1.738 | 1.315 | 1.334 |

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(I_{{\mathcal {X}}}(\text {C}, \mathbf{W}_I)\) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Performance ratio | 2.933 | 1 | 1.024 | 2.124 | 2.933 | 2.422 | 1.651 | 3.156 | 2.387 | 2.422 |

The values of the indicator function are straightforward to derive: they indicate, through a 0–1 representation, which clustering yields the best performance for the current perturbation scenario. For the current matrix \(\mathbf{W}_I\), clustering \(\text {C}_2\) takes the value one, while the other cluster assignments take the value zero. The performance ratio instead measures how far each clustering is from the best one; a smaller ratio indicates a better clustering. In this respect, clusterings \(\text {C}_3\) (1.024) and \(\text {C}_7\) (1.651) come closest to \(\text {C}_2\) for the perturbed matrix \(\mathbf{W}_I\).
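The indicator values and performance ratios for this single scenario can be recomputed from the Ncut row for \(\mathbf{W}_I\) in the table above; a short NumPy check follows.

```python
import numpy as np

# Ncut(C, W_I) for C_1, ..., C_10 under the single perturbed matrix W_I (table above)
ncut_WI = np.array([1.615, 0.550, 0.564, 1.170, 1.615,
                    1.334, 0.909, 1.738, 1.315, 1.334])

best = ncut_WI.min()
indicator = (ncut_WI == best).astype(int)  # 1 for the best clustering, 0 otherwise
ratio = ncut_WI / best                     # performance ratio, at least 1

print(indicator)
print(np.round(ratio, 3))
```

The recomputed ratios differ from the table in the third decimal (for example, \(1.615/0.550 \approx 2.936\) versus 2.933 for \(\text {C}_1\)), presumably because the table's ratios were computed from unrounded Ncut values.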

The simulation module repeats this procedure: it samples a perturbed matrix \(\mathbf{W}_I\) from the set *U* and computes the values of the indicator function and the performance ratio for the input clusterings and the matrix \(\mathbf{W}_I\). The robustness measures \(\widehat{R_1^{\mathcal {X}}}\) and \(\widehat{R_2^{\mathcal {X}}}\), as explained in the previous section, are averages of these iteratively computed values. The following table shows the results obtained for the simplified network when the sample size is set to \(N=100\) perturbation scenarios:

Clustering | \(\text {C}_1\) | \(\text {C}_2\) | \(\text {C}_3\) | \(\text {C}_4\) | \(\text {C}_5\) | \(\text {C}_6\) | \(\text {C}_7\) | \(\text {C}_8\) | \(\text {C}_9\) | \(\text {C}_{10}\) |
---|---|---|---|---|---|---|---|---|---|---|
\(\widehat{R_1^{\mathcal {X}}}(\text {C})\) | 0 | 0.247 | 0.752 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
\(\widehat{R_2^{\mathcal {X}}}(\text {C})\) | 0.333 | 0.985 | 0.996 | 0.492 | 0.333 | 0.384 | 0.611 | 0.304 | 0.443 | 0.384 |

According to both robustness measures, clustering \(\text {C}_3\) performs best across the perturbation scenarios. However, according to measure \(\widehat{R_2^{\mathcal {X}}}\), \(\text {C}_3\) is closely followed by clustering \(\text {C}_2\) (0.996 versus 0.985), even though, in terms of measure \(\widehat{R_1^{\mathcal {X}}}\), \(\text {C}_2\) is the best clustering for only 24.7% of the randomly perturbed adjacency matrices. Measure \(R_1\) thus identifies the clustering that is most often the best, whereas \(R_2\) reveals that \(\text {C}_2\) remains nearly as good in the scenarios where it is not the best.
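Putting the pieces together, the sketch below runs a simplified version of the simulation module on a toy graph of two triangles joined by one weak edge: it draws *N* scenarios from *U* with uniform noise, evaluates the modified Ncut of every input clustering, and averages the indicator and the reciprocal performance ratio. The function name `simulation_module`, the toy graph, and the reciprocal form of the second measure are assumptions made for illustration.

```python
import numpy as np

def ncut(labels, W):
    """Modified Ncut of the partition `labels` evaluated on weights W."""
    labels = np.asarray(labels)
    total = 0.0
    for p in np.unique(labels):
        in_p = labels == p
        total += W[np.ix_(in_p, ~in_p)].sum() / W[in_p, :].sum()
    return total

def simulation_module(clusterings, W0, xi, n_samples, seed=0):
    """Hypothetical sketch: estimate R_1 and R_2 for each input clustering."""
    rng = np.random.default_rng(seed)
    scores = np.empty((n_samples, len(clusterings)))
    for i in range(n_samples):
        W_i = rng.uniform((1 - xi) * W0, (1 + xi) * W0)  # one scenario drawn from U
        scores[i] = [ncut(C, W_i) for C in clusterings]
    best = scores.min(axis=1, keepdims=True)
    r1 = (scores == best).mean(axis=0)  # share of scenarios in which C is best
    r2 = (best / scores).mean(axis=0)   # mean reciprocal performance ratio (assumed form)
    return r1, r2

# Two triangles joined by one weak edge, as a toy network.
W0 = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W0[a, b] = W0[b, a] = 1.0
W0[2, 3] = W0[3, 2] = 0.1

clusterings = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
r1, r2 = simulation_module(clusterings, W0, xi=0.3, n_samples=200)
```

Because the split along the weak edge stays far better than the alternatives under every perturbation of at most 30%, it obtains both scores equal to one in this toy case.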

## 4 Empirical results

In this section, we introduce the experiment scheme that we designed to obtain clusterings that are not only robust, in the sense of the simulation module of Sect. 2.3, but also achieve a sufficiently good normalized cut value. We then discuss the empirical results of Kagawa et al. (2015) and compare them with those of our experiment run on the same datasets.

### 4.1 Experiment scheme

*K*-means algorithm applied to matrix \(\hat{{\mathbf {H}}}\). The experiment continues by generating *M* clusterings, using the same initialization process as for \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\), to construct the set of input clusterings \({\mathcal {X}}=\{\text {C}_{R_1}, \text {C}_{R_2}, \text {C}_1, \text {C}_2,\ldots,\text {C}_M\}\). The simulation module of Table 1 is then performed on the set \({\mathcal {X}}\), and the best resulting clusterings, \(\text {C}^\mathrm{simul}_{R_1}\) and \(\text {C}^\mathrm{simul}_{R_2}\), become the new \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\), respectively. The construction of \({\mathcal {X}}\) and the simulation are repeated iteratively, and convergence is reached when the same \(\text {C}^\mathrm{simul}_{R_1}\) and \(\text {C}^\mathrm{simul}_{R_2}\) are obtained for a number of consecutive iterations; in our study, we require at least five. Iterative schemes such as ours are widely used to solve optimization problems in practice; the basic idea is to refine approximate solutions step by step. Examples include metaheuristic solvers such as genetic algorithms, simulated annealing, and iterated local search (Gendreau and Potvin 2010).
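The control flow of the iterative scheme can be sketched as follows. The callables `generate` and `simulate` are hypothetical placeholders for the *K*-means rounding of \(\hat{{\mathbf {H}}}\) and for the simulation module, and `canonical` is an added assumption that treats two clusterings as identical when they differ only by a permutation of cluster labels. The stopping rule follows the five consecutive identical iterations described above.

```python
def canonical(labels):
    """Relabel clusters in order of first appearance, so that partitions that
    differ only by a permutation of labels compare equal."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)

def iterate_scheme(c_r1, c_r2, generate, simulate, patience=5, max_iter=100):
    """Hypothetical driver for the iterative experiment scheme.

    generate() -> list of M new clusterings (label sequences)
    simulate(X) -> (best clustering by R_1, best clustering by R_2) within X
    Stops once the same pair is returned `patience` times in a row.
    """
    c_r1, c_r2 = canonical(c_r1), canonical(c_r2)
    stable = 0
    for _ in range(max_iter):
        X = [c_r1, c_r2] + [canonical(c) for c in generate()]
        new_r1, new_r2 = map(canonical, simulate(X))
        stable = stable + 1 if (new_r1, new_r2) == (c_r1, c_r2) else 0
        c_r1, c_r2 = new_r1, new_r2
        if stable >= patience:
            break
    return c_r1, c_r2
```

Keeping \(\text {C}_{R_1}\) and \(\text {C}_{R_2}\) inside every new set \({\mathcal {X}}\) means that a newly selected clustering must beat the current ones under the simulation, which is what lets the loop settle.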

### 4.2 Results and discussion

Kagawa et al. (2013a, b) were among the first to apply clustering methods to the environmental analysis of economic systems; their approach connects input–output models with network partitioning techniques. As mentioned in Sect. 1, Kagawa et al. (2015) identified clusters of high importance in terms of \(\hbox {CO}_2\) emissions. Since China was the largest emitter of production-based \(\hbox {CO}_2\) emissions in 2009 (Kagawa et al. 2015), the authors aimed to identify which country contributed most to these emissions and through which supply chains. To quantify the \(\hbox {CO}_2\) emissions of a cluster, the *within-cluster sum* is used. For a cluster \(U_p\) of a supply-chain network with adjacency matrix \(\mathbf{W}= (w_{uv})_{1\le u,v\le n}\), it is defined as \(\sum _{u \in U_p}\sum _{v \in U_p} w_{uv}\).
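The within-cluster sum is a block sum of the adjacency matrix. The sketch below (hypothetical helper `within_cluster_sums`, invented four-sector matrix) computes it for every cluster and ranks clusters from largest to smallest, as in the tables of this section.

```python
import numpy as np

def within_cluster_sums(labels, W):
    """Return (cluster label, within-cluster sum) pairs, largest sum first."""
    labels = np.asarray(labels)
    sums = [(int(p), float(W[np.ix_(labels == p, labels == p)].sum()))
            for p in np.unique(labels)]
    return sorted(sums, key=lambda item: item[1], reverse=True)

W = np.array([[0.0, 5.0, 1.0, 0.0],
              [5.0, 0.0, 0.0, 0.5],
              [1.0, 0.0, 0.0, 2.0],
              [0.0, 0.5, 2.0, 0.0]])
ranking = within_cluster_sums([0, 0, 1, 1], W)
```

With a symmetric \(\mathbf{W}\), each within-cluster transaction is counted in both directions, so the sum is twice the total weight of the cluster's undirected edges (here 10 and 4 for the two clusters).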

Among the 4756 industry clusters induced by the final demand for various goods and services in five developed countries (the USA, the UK, Germany, France, and Japan), Kagawa et al. (2015) found that both the US construction industry and the US transport equipment industry generate prominent Chinese clusters that rank among the 15 clusters with the highest within-cluster sums. Moreover, these two Chinese clusters have the highest annual growth rates within the top 15, namely 57.5 and 41.7\(\%\) for US construction and transport equipment demands, respectively. The top 15 contain only one other Chinese cluster, induced by Japanese construction demand, but this cluster has a lower growth rate and a smaller within-cluster sum than both of the US-induced Chinese clusters mentioned above.

We therefore use the two adjacency matrices of \(\hbox {CO}_2\) emissions induced by US construction and US transport equipment demands in 2009, which were also used by Kagawa et al. (2015). We refer to them as the US construction and US transport datasets. Each adjacency matrix characterizes a network whose vertices are country–industry combinations, with 41 countries and 35 industries in total, i.e., \(n=41\times 35=1435\) vertices. The country and industry categories are listed in the supporting materials.

To choose the number of clusters *K* to be used in the experiment, we rely on the modularity index, as done by Kagawa et al. (2013a, 2015). This index, developed by Newman and Girvan (2004), takes higher values for partitions with stronger community structure, so we select the value of *K* that maximizes it. The modularity index of a partition \(\{U_1,\ldots,U_K\}\) of a network \(G=(V,E)\) with adjacency matrix \(\mathbf{W}= (w_{uv})_{1\le u,v\le n}\) can be formulated as

\[ Q=\sum _{k=1}^{K}\big (e_k-q_k^{2}\big ), \]

where \(e_k=\sum _{u\in U_k}\sum _{v\in U_k} w_{uv}/\sum _{u\in V}\sum _{v\in V} w_{uv}\) is the fraction of the total edge weight lying within the *k*-th cluster and \(q_k=\big (\sum _{u\in U_k} \sum _{v\in V} w_{uv} /\sum _{u\in V} \sum _{v\in V} w_{uv}\big )\) represents the betweenness ratio for the *k*-th cluster. For each dataset, we compute the modularity index for \(1 \le K \le 200\). For each value of *K*, we perform NMF clustering to obtain the approximate matrix \(\hat{{\mathbf {H}}}\) and then average the modularity index over 10 runs of *K*-means rounding. For the US construction dataset, the best index value, \(Q=0.197\), is reached at \(K = 66\). However, we opted for \(K=64\), as in Kagawa et al. (2015), to ensure a fair comparison; \(Q(K=64)=0.195\) is very close to the best value \(Q(K = 66)\). For the US transport dataset, *K* is set to the maximizer of the index, \(K=68\), which coincides exactly with the choice of Kagawa et al. (2015). The plots of *Q*(*K*) are available in the supporting materials.
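The modularity index can be computed from a label vector in a few lines. The sketch below (hypothetical helper `modularity`) implements \(Q=\sum _k (e_k-q_k^2)\) with NumPy on the two-triangle toy graph; selecting *K* as in the text would mean evaluating *Q* for the clusterings obtained with each candidate *K* and keeping the maximizer.

```python
import numpy as np

def modularity(labels, W):
    """Newman-Girvan modularity Q = sum_k (e_k - q_k^2) of a partition."""
    labels = np.asarray(labels)
    total = W.sum()
    Q = 0.0
    for p in np.unique(labels):
        in_p = labels == p
        e_p = W[np.ix_(in_p, in_p)].sum() / total  # within-cluster share of weight
        q_p = W[in_p, :].sum() / total             # share of weight attached to cluster
        Q += e_p - q_p ** 2
    return Q

# Two triangles joined by one weak edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1

q_two = modularity([0, 0, 0, 1, 1, 1], W)
q_one = modularity([0] * 6, W)
```

The split along the weak edge gives \(Q = 12/12.2 - 0.5 \approx 0.484\), while a single cluster gives \(Q = 0\).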

The adjacency matrices considered here represent interindustry \(\hbox {CO}_2\) emissions induced by the final demand for the products of a given industry in a given country. At the most basic level, these matrices are built from three elements, all obtained from the World Input–Output Database (Kagawa et al. 2015; Dietzenbacher et al. 2013; Tukker and Dietzenbacher 2013): the sales of products between industries of different countries, the sales of products to final consumers, and the carbon dioxide (\(\hbox {CO}_2\)) emissions of industries in different countries. All these data are estimates that may suffer from statistical problems such as noise and missing values. Some weaknesses, such as differences between countries in price concepts or in import–export-processing activities, are addressed by Dietzenbacher et al. (2013). However, the risk that the estimates do not match reality always exists. This becomes an issue when the analysis is highly sensitive to errors in the estimates, which is exactly our case.

Moreover, we rely on the update rule (8) for the solving process, which is an iterative improvement based on a gradient-descent procedure (Lee and Seung 2001). Such procedures are well known to be highly sensitive to the initial solution (Avriel 2003), which in our case is set, following Kagawa et al. (2015) and Ding et al. (2008), to the indicator matrix obtained by spectral clustering and the application of *K*-means (plus a constant matrix). Furthermore, spectral clustering is sensitive to errors in the input adjacency matrix (Von Luxburg 2007, p. 18). Since the NMF solution is initialized from the spectral clustering solution, errors in the input adjacency matrix can propagate to the NMF results as well. In addition, as noted above, *K*-means is sensitive to its initial starting points.

Given these different sources of uncertainty, the cluster assignments reported by Kagawa et al. (2015) may overfit the particular dataset instances and the initial choices of the model. Moreover, the reported clusters convey significant information about entry points for mitigating global warming, e.g., which industry sectors could be starting points or priority targets when implementing \(\hbox {CO}_2\) emission reduction policies involving the USA and China. Given the relevance of these implications, special care is needed regarding the impact of errors in the input adjacency matrices on the output clustering results.

Kagawa et al. (2013a, 2015) reduce the model's uncertainty with a heuristic procedure: they generate *M* clusterings from *M* runs of the *K*-means algorithm and choose the one with the best Ncut. A more systematic mechanism is likely to be more reliable. By combining the simulation module with iterative improvement of the normalized cut performance, our method addresses the sensitivity issue more rigorously, evaluating robustness against noise in the adjacency matrix over a large number of scenarios. The overall method can be used as a black-box procedure for any clustering method that relies on *K*-means rounding, including spectral clustering.

The experiment parameters are set as follows. To cover a wide range of noise magnitudes \(\xi\), six instances are considered, from \(\xi =0.0\) to 0.5 in steps of 0.1. The perturbations used to sample the perturbed adjacency matrices \(\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_N\) are drawn from a uniform distribution. The number of samples *N* and the number of clusterings *M* are set to 1000 and 100, respectively, which keeps the CPU time of the experiments acceptable. All experiments were run in a Java environment on a 3-GHz CPU with 8 GB of memory. In the remainder of this section, we discuss the obtained results.

Ncut values for the nominal scenario (nominal Ncut) and averaged over all scenarios (mean Ncut) for the clusterings of the simulation run corresponding to Fig. 2

Clustering | \(\xi =0\) | \(\xi =0.1\) | \(\xi =0.2\) | \(\xi =0.3\) | \(\xi =0.4\) | \(\xi =0.5\) | Kagawa et al. (2015) |
---|---|---|---|---|---|---|---|
Nominal Ncut | 9.800 | 11.991 | 12.931 | 12.557 | 9.521 | 9.557 | 21.564 |
Mean Ncut | 9.802 | 11.994 | 12.943 | 12.561 | 9.522 | 9.560 | 21.568 |

Ncut values for the nominal scenario (nominal Ncut) and averaged over all scenarios (mean Ncut) for the clusterings of the simulation run corresponding to Fig. 3

Clustering | \(\xi =0\) | \(\xi =0.1\) | \(\xi =0.2\) | \(\xi =0.3\) | \(\xi =0.4\) | \(\xi =0.5\) | Kagawa et al. (2015) |
---|---|---|---|---|---|---|---|
Nominal Ncut | 9.800 | 11.991 | 12.931 | 12.557 | 9.521 | 9.557 | 21.564 |
Mean Ncut | 9.810 | 12.006 | 12.953 | 12.566 | 9.529 | 9.567 | 21.578 |

Ncut values for the nominal scenario (nominal Ncut) and averaged over all scenarios (mean Ncut) for the clusterings of the simulation run corresponding to Fig. 4

Clustering | \(\xi =0\) | \(\xi =0.1\) | \(\xi =0.2\) | \(\xi =0.3\) | \(\xi =0.4\) | \(\xi =0.5\) | Kagawa et al. (2015) |
---|---|---|---|---|---|---|---|
Nominal Ncut | 9.800 | 11.991 | 12.931 | 12.557 | 9.521 | 9.557 | 21.564 |
Mean Ncut | 12.498 | 14.523 | 14.811 | 11.658 | 5.751 | 10.753 | 31.696 |

After examining the consistency of the cluster order across our experiment runs (see the supporting materials), we compare the cluster components of the Kagawa et al. (2015) clusterings with those of our robustest clusterings for US transport and US construction in Tables 6 and 7, respectively. The US clusters (C1) and (C5), i.e., clusters induced and generated by US industries, rank highest in Tables 6 and 7, respectively. Each of them generated more than 162 million tonnes of \(\hbox {CO}_2\) emissions within US territory in 2009. Their compact structure and high within-cluster emissions are the main differences from the US clusters of Kagawa et al. (2015), whose clustering for the US construction dataset in fact contained two US clusters.

In contrast, the two Chinese clusters (C2) and (C6) have exactly the same components, except for two additional Korean elements, c1 and c3, in the US construction case. These clusters differ only slightly from the Chinese clusters of Kagawa et al. (2015): almost all of their elements coincide. More importantly, the two strong supply chains reported in Kagawa et al. (2015), namely (1) c17 (CHN) \(\Rightarrow\) c12 (CHN) and (2) c17 (CHN) \(\Rightarrow\) c9 (CHN), are still present in our Chinese clusters. Here, c17 refers to Electricity, Gas and Water Supply, c12 to Basic Metals and Fabricated Metal, and c9 to Chemicals and Chemical Products. Both supply chains are major contributors to the \(\hbox {CO}_2\) emissions within the Chinese clusters. Figure 5 illustrates the Chinese emission cluster (C2) of Table 6, which corresponds to the US transport case.

Table 6. \(\hbox {CO}_2\) clusters with the highest within-cluster emissions (kt \(\hbox {CO}_2\) eq.) induced by the final demand of the US transport dataset, for the Kagawa et al. (2015) clustering and for the robustest clustering of the experiment (\(\xi =0.3\))

| Rank | Cluster (Kagawa et al. 2015) | Industrial sectors | Within-cluster sum | Cluster (experiment, \(\xi =0.3\)) | Industrial sectors | Within-cluster sum |
|---|---|---|---|---|---|---|
| 1 | American and Polish cluster | USA: Mining and Quarrying | 49,201 | American cluster (C1) | USA: Mining and Quarrying | 164,072 |
| | | USA: Pulp, Paper, Paper, Printing and Publishing | | | USA: Pulp, Paper, Paper, Printing and Publishing | |
| | | USA: Coke, Refined Petroleum and Nuclear Fuel | | | USA: Coke, Refined Petroleum and Nuclear Fuel | |
| | | USA: Chemicals and Chemical Products | | | USA: Chemicals and Chemical Products | |
| | | USA: Rubber and Plastics | | | USA: Rubber and Plastics | |
| | | USA: Other Non-Metallic Mineral | | | USA: Other Non-Metallic Mineral | |
| | | USA: Basic Metals and Fabricated Metal | | | USA: Basic Metals and Fabricated Metal | |
| | | USA: Transport Equipment | | | USA: Wood and Products of Wood and Cork | |
| | | USA: Electricity, Gas and Water Supply | | | USA: Electricity, Gas and Water Supply | |
| | | USA: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | | | USA: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | USA: Inland Transport | | | USA: Inland Transport | |
| | | USA: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | | | USA: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | |
| | | USA: Financial Intermediation | | | USA: Construction | |
| | | USA: Renting of M&Eq and Other Business Activities | | | USA: Air Transport | |
| | | POL: Mining and Quarrying | | | USA: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | |
| | | POL: Coke, Refined Petroleum and Nuclear Fuel | | | USA: Post and Telecommunications | |
| | | POL: Chemicals and Chemical Products | | | USA: Financial Intermediation | |
| | | POL: Rubber and Plastics | | | USA: Renting of M&Eq and Other Business Activities | |
| | | POL: Other Non-Metallic Mineral | | | USA: Other Community, Social and Personal Services | |
| | | POL: Basic Metals and Fabricated Metal | | | | |
| | | POL: Machinery, Nec | | | | |
| | | POL: Electrical and Optical Equipment | | | | |
| | | POL: Transport Equipment | | | | |
| | | POL: Electricity, Gas and Water Supply | | | | |
| | | POL: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | | | | |
| | | POL: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | | | | |
| | | POL: Inland Transport | | | | |
| | | POL: Real Estate Activities | | | | |
| | | POL: Renting of M&Eq and Other Business Activities | | | | |
| | | POL: Other Community, Social and Personal Services | | | | |
| 2 | Big cluster | (664 elements) | 18,488 | Chinese cluster (C2) | CHN: Agriculture, Hunting, Forestry and Fishing | 13,035 |
| | | | | | CHN: Mining and Quarrying | |
| | | | | | CHN: Food, Beverages and Tobacco | |
| | | | | | CHN: Textiles and Textile Products | |
| | | | | | CHN: Wood and Products of Wood and Cork | |
| | | | | | CHN: Pulp, Paper, Paper, Printing and Publishing | |
| | | | | | CHN: Coke, Refined Petroleum and Nuclear Fuel | |
| | | | | | CHN: Chemicals and Chemical Products | |
| | | | | | CHN: Rubber and Plastics | |
| | | | | | CHN: Other Non-Metallic Mineral | |
| | | | | | CHN: Basic Metals and Fabricated Metal | |
| | | | | | CHN: Machinery, Nec | |
| | | | | | CHN: Electrical and Optical Equipment | |
| | | | | | CHN: Electricity, Gas and Water Supply | |
| | | | | | CHN: Hotels and Restaurants | |
| | | | | | CHN: Inland Transport | |
| | | | | | CHN: Renting of M&Eq and Other Business Activities | |
| 3 | Chinese cluster | CHN: Agriculture, Hunting, Forestry and Fishing | 12,805 | Rest of the world cluster (C3) | DNK: Water Transport | 5963 |
| | | CHN: Mining and Quarrying | | | FRA: Agriculture, Hunting, Forestry and Fishing | |
| | | CHN: Food, Beverages and Tobacco | | | FRA: Food, Beverages and Tobacco | |
| | | CHN: Textiles and Textile Products | | | FRA: Wood and Products of Wood and Cork | |
| | | CHN: Leather, Leather and Footwear | | | FRA: Pulp, Paper, Paper, Printing and Publishing | |
| | | CHN: Pulp, Paper, Paper, Printing and Publishing | | | FRA: Coke, Refined Petroleum and Nuclear Fuel | |
| | | CHN: Coke, Refined Petroleum and Nuclear Fuel | | | FRA: Chemicals and Chemical Products | |
| | | CHN: Chemicals and Chemical Products | | | FRA: Rubber and Plastics | |
| | | CHN: Rubber and Plastics | | | FRA: Other Non-Metallic Mineral | |
| | | CHN: Other Non-Metallic Mineral | | | FRA: Basic Metals and Fabricated Metal | |
| | | CHN: Basic Metals and Fabricated Metal | | | FRA: Machinery, Nec | |
| | | CHN: Machinery, Nec | | | FRA: Electrical and Optical Equipment | |
| | | CHN: Electrical and Optical Equipment | | | FRA: Transport Equipment | |
| | | CHN: Transport Equipment | | | FRA: Manufacturing, Nec; Recycling | |
| | | CHN: Electricity, Gas and Water Supply | | | FRA: Electricity, Gas and Water Supply | |
| | | CHN: Inland Transport | | | FRA: Sale, Maintenance and Repair of Motor Vehicles and Motorcycles; Retail Sale of Fuel | |
| | | CHN: Renting of M&Eq and Other Business Activities | | | FRA: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | | | | FRA: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | |
| | | | | | FRA: Hotels and Restaurants | |
| | | | | | FRA: Inland Transport | |
| | | | | | FRA: Financial Intermediation | |
| | | | | | FRA: Renting of M&Eq and Other Business Activities | |
| | | | | | FRA: Other Community, Social and Personal Services | |
| | | | | | KOR: Water Transport | |
| | | | | | ROW: Agriculture, Hunting, Forestry and Fishing | |
| | | | | | ROW: Mining and Quarrying | |
| | | | | | ROW: Food, Beverages and Tobacco | |
| | | | | | ROW: Wood and Products of Wood and Cork | |
| | | | | | ROW: Pulp, Paper, Paper, Printing and Publishing | |
| | | | | | ROW: Chemicals and Chemical Products | |
| | | | | | ROW: Rubber and Plastics | |
| | | | | | ROW: Basic Metals and Fabricated Metal | |
| | | | | | ROW: Machinery, Nec | |
| | | | | | ROW: Electrical and Optical Equipment | |
| | | | | | ROW: Electricity, Gas and Water Supply | |
| | | | | | ROW: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | | | | ROW: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | |
| | | | | | ROW: Hotels and Restaurants | |
| | | | | | ROW: Inland Transport | |
| | | | | | ROW: Water Transport | |
| | | | | | ROW: Air Transport | |
| | | | | | ROW: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | |
| | | | | | ROW: Post and Telecommunications | |
| | | | | | ROW: Financial Intermediation | |
| | | | | | ROW: Renting of M&Eq and Other Business Activities | |
| | | | | | ROW: Other Community, Social and Personal Services | |
| 4 | Rest of the world cluster | ROW: Mining and Quarrying | 4278 | Indian cluster (C4) | IND: Textiles and Textile Products | 1570 |
| | | ROW: Rubber and Plastics | | | IND: Wood and Products of Wood and Cork | |
| | | ROW: Basic Metals and Fabricated Metal | | | IND: Pulp, Paper, Paper, Printing and Publishing | |
| | | ROW: Electricity, Gas and Water Supply | | | IND: Coke, Refined Petroleum and Nuclear Fuel | |
| | | ROW: Inland Transport | | | IND: Chemicals and Chemical Products | |
| | | ROW: Water Transport | | | IND: Rubber and Plastics | |
| | | ROW: Renting of M&Eq and Other Business Activities | | | IND: Other Non-Metallic Mineral | |
| | | | | | IND: Basic Metals and Fabricated Metal | |
| | | | | | IND: Machinery, Nec | |
| | | | | | IND: Electrical and Optical Equipment | |
| | | | | | IND: Transport Equipment | |
| | | | | | IND: Manufacturing, Nec; Recycling | |
| | | | | | IND: Electricity, Gas and Water Supply | |
| | | | | | IND: Construction | |
| | | | | | IND: Inland Transport | |
| | | | | | IND: Post and Telecommunications | |
| | | | | | IND: Financial Intermediation | |
| | | | | | IND: Renting of M&Eq and Other Business Activities | |
| | | | | | ROW: Manufacturing, Nec; Recycling | |
| 5 | Canadian cluster | CAN: Agriculture, Hunting, Forestry and Fishing | 2110 | Mexican cluster | MEX: Agriculture, Hunting, Forestry and Fishing | 1302 |
| | | CAN: Mining and Quarrying | | | MEX: Mining and Quarrying | |
| | | CAN: Food, Beverages and Tobacco | | | MEX: Food, Beverages and Tobacco | |
| | | CAN: Wood and Products of Wood and Cork | | | MEX: Textiles and Textile Products | |
| | | CAN: Pulp, Paper, Paper, Printing and Publishing | | | MEX: Wood and Products of Wood and Cork | |
| | | CAN: Coke, Refined Petroleum and Nuclear Fuel | | | MEX: Pulp, Paper, Paper, Printing and Publishing | |
| | | CAN: Chemicals and Chemical Products | | | MEX: Coke, Refined Petroleum and Nuclear Fuel | |
| | | CAN: Rubber and Plastics | | | MEX: Chemicals and Chemical Products | |
| | | CAN: Other Non-Metallic Mineral | | | MEX: Rubber and Plastics | |
| | | CAN: Basic Metals and Fabricated Metal | | | MEX: Other Non-Metallic Mineral | |
| | | CAN: Machinery, Nec | | | MEX: Basic Metals and Fabricated Metal | |
| | | CAN: Electrical and Optical Equipment | | | MEX: Machinery, Nec | |
| | | CAN: Transport Equipment | | | MEX: Electrical and Optical Equipment | |
| | | CAN: Electricity, Gas and Water Supply | | | MEX: Transport Equipment | |
| | | CAN: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | | | MEX: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | CAN: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | | | MEX: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | |
| | | CAN: Inland Transport | | | MEX: Inland Transport | |
| | | CAN: Water Transport | | | MEX: Manufacturing, Nec; Recycling | |
| | | CAN: Air Transport | | | MEX: Electricity, Gas and Water Supply | |
| | | CAN: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | | | MEX: Hotels and Restaurants | |
| | | CAN: Post and Telecommunications | | | MEX: Sale, Maintenance and Repair of Motor Vehicles and Motorcycles; Retail Sale of Fuel | |
| | | CAN: Financial Intermediation | | | MEX: Financial Intermediation | |
| | | CAN: Real Estate Activities | | | MEX: Renting of M&Eq and Other Business Activities | |
| | | CAN: Renting of M&Eq and Other Business Activities | | | | |
| | | CAN: Other Community, Social and Personal Services | | | | |

Table 7. \(\hbox {CO}_2\) clusters with the highest within-cluster emissions (kt \(\hbox {CO}_2\) eq.) induced by the final demand of the US construction dataset, for the Kagawa et al. (2015) clustering and for the robustest clustering of the experiment (\(\xi =0.4\))

| Rank | Cluster (Kagawa et al. 2015) | Industrial sectors | Within-cluster sum | Cluster (experiment, \(\xi =0.4\)) | Industrial sectors | Within-cluster sum |
|---|---|---|---|---|---|---|
| 1 | American cluster (1) | USA: Mining and Quarrying | 120,033 | American cluster (C5) | USA: Mining and Quarrying | 162,275 |
| | | USA: Coke, Refined Petroleum and Nuclear Fuel | | | USA: Coke, Refined Petroleum and Nuclear Fuel | |
| | | USA: Other Non-Metallic Mineral | | | USA: Other Non-Metallic Mineral | |
| | | USA: Basic Metals and Fabricated Metal | | | USA: Basic Metals and Fabricated Metal | |
| | | USA: Electricity, Gas and Water Supply | | | USA: Electricity, Gas and Water Supply | |
| | | USA: Construction | | | USA: Construction | |
| | | USA: Inland Transport | | | USA: Inland Transport | |
| | | | | | USA: Wood and Products of Wood and Cork | |
| | | | | | USA: Pulp, Paper, Paper, Printing and Publishing | |
| | | | | | USA: Chemicals and Chemical Products | |
| | | | | | USA: Rubber and Plastics | |
| | | | | | USA: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | | | | USA: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | |
| | | | | | USA: Air Transport | |
| | | | | | USA: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | |
| | | | | | USA: Financial Intermediation | |
| | | | | | USA: Renting of M&Eq and Other Business Activities | |
| | | | | | USA: Other Community, Social and Personal Services | |
| 2 | Chinese cluster | CHN: Agriculture, Hunting, Forestry and Fishing | 12,900 | Chinese cluster (C6) | CHN: Agriculture, Hunting, Forestry and Fishing | 13,037 |
| | | CHN: Mining and Quarrying | | | CHN: Mining and Quarrying | |
| | | CHN: Food, Beverages and Tobacco | | | CHN: Food, Beverages and Tobacco | |
| | | CHN: Textiles and Textile Products | | | CHN: Textiles and Textile Products | |
| | | CHN: Wood and Products of Wood and Cork | | | CHN: Wood and Products of Wood and Cork | |
| | | CHN: Pulp, Paper, Paper, Printing and Publishing | | | CHN: Pulp, Paper, Paper, Printing and Publishing | |
| | | CHN: Coke, Refined Petroleum and Nuclear Fuel | | | CHN: Coke, Refined Petroleum and Nuclear Fuel | |
| | | CHN: Chemicals and Chemical Products | | | CHN: Chemicals and Chemical Products | |
| | | CHN: Rubber and Plastics | | | CHN: Rubber and Plastics | |
| | | CHN: Other Non-Metallic Mineral | | | CHN: Other Non-Metallic Mineral | |
| | | CHN: Basic Metals and Fabricated Metal | | | CHN: Basic Metals and Fabricated Metal | |
| | | CHN: Machinery, Nec | | | CHN: Machinery, Nec | |
| | | CHN: Electrical and Optical Equipment | | | CHN: Electrical and Optical Equipment | |
| | | CHN: Electricity, Gas and Water Supply | | | CHN: Electricity, Gas and Water Supply | |
| | | CHN: Inland Transport | | | CHN: Inland Transport | |
| | | CHN: Renting of M&Eq and Other Business Activities | | | CHN: Renting of M&Eq and Other Business Activities | |
| | | | | | CHN: Hotels and Restaurants | |
| | | | | | KOR: Agriculture, Hunting, Forestry and Fishing | |
| | | | | | KOR: Food, Beverages and Tobacco | |
| 3 | American cluster (2) | USA: Agriculture, Hunting, Forestry and Fishing | 7458 | Big cluster (C8) | (570 elements) | 3816 |
| | | USA: Food, Beverages and Tobacco | | | | |
| | | USA: Textiles and Textile Products | | | | |
| | | USA: Wood and Products of Wood and Cork | | | | |
| | | USA: Pulp, Paper, Paper, Printing and Publishing | | | | |
| | | USA: Chemicals and Chemical Products | | | | |
| | | USA: Rubber and Plastics | | | | |
| | | USA: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | | | | |
| | | USA: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | | | | |
| | | USA: Hotels and Restaurants | | | | |
| | | USA: Air Transport | | | | |
| | | USA: Other Supporting and Auxiliary Transport Activities; Activities of Travel Agencies | | | | |
| | | USA: Post and Telecommunications | | | | |
| | | USA: Financial Intermediation | | | | |
| | | USA: Renting of M&Eq and Other Business Activities | | | | |
| | | USA: Other Community, Social and Personal Services | | | | |
| | | ROW: Chemicals and Chemical Products | | | | |
| 4 | Rest of the world cluster | ROW: Mining and Quarrying | 3869 | Rest of the world cluster (C7) | ROW: Mining and Quarrying | 3264 |
| | | ROW: Electricity, Gas and Water Supply | | | ROW: Electricity, Gas and Water Supply | |
| | | ROW: Inland Transport | | | ROW: Inland Transport | |
| | | ROW: Water Transport | | | ROW: Water Transport | |
| | | ROW: Rubber and Plastics | | | | |
| | | ROW: Renting of M&Eq and Other Business Activities | | | | |
| 5 | Canadian cluster | CAN: Mining and Quarrying | 1533 | Russian cluster | RUS: Mining and Quarrying | 1297 |
| | | CAN: Pulp, Paper, Paper, Printing and Publishing | | | RUS: Basic Metals and Fabricated Metal | |
| | | CAN: Coke, Refined Petroleum and Nuclear Fuel | | | RUS: Coke, Refined Petroleum and Nuclear Fuel | |
| | | CAN: Electricity, Gas and Water Supply | | | RUS: Electricity, Gas and Water Supply | |
| | | CAN: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | | | RUS: Wholesale Trade and Commission Trade, Except of Motor Vehicles and Motorcycles | |
| | | CAN: Chemicals and Chemical Products | | | RUS: Chemicals and Chemical Products | |
| | | CAN: Inland Transport | | | RUS: Inland Transport | |
| | | CAN: Agriculture, Hunting, Forestry and Fishing | | | | |
| | | CAN: Wood and Products of Wood and Cork | | | | |
| | | CAN: Basic Metals and Fabricated Metal | | | | |
| | | CAN: Retail Trade, Except of Motor Vehicles and Motorcycles; Repair of Household Goods | | | | |
| | | CAN: Renting of M&Eq and Other Business Activities | | | | |
| | | CAN: Other Community, Social and Personal Services | | | | |
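Tables 6 and 7 rank clusters by their within-cluster emissions. A minimal sketch of such a ranking follows, assuming (for illustration only) that a cluster's within-cluster sum is the total of the directed emission flows `F[i, j]` whose origin and destination both lie in the cluster; the matrix `F` and the helpers `within_cluster_sums`, `rank_clusters`, and `kt_to_mt` are hypothetical. The sketch also spells out the unit conversion behind the statement that (C1) and (C5) each exceed 162 million tonnes: the tables report kt, and 1 Mt = 1000 kt.

```python
import numpy as np


def within_cluster_sums(F, labels):
    """Map each cluster label to the sum of flows F[i, j] with i and j in that cluster.

    F is assumed to be a directed, nonnegative emission-flow matrix, so each flow is
    counted once; with a symmetrized matrix every flow would be counted twice.
    """
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    return {
        int(k): float(F[np.ix_(labels == k, labels == k)].sum())
        for k in np.unique(labels)
    }


def rank_clusters(F, labels, top=5):
    """Return the `top` (label, within-cluster sum) pairs, largest sum first."""
    sums = within_cluster_sums(F, labels)
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)[:top]


def kt_to_mt(kt):
    """Convert kilotonnes to megatonnes (million tonnes)."""
    return kt / 1000.0


# Toy directed flows: the flow 1 -> 2 crosses clusters and is excluded from both sums.
toy_flows = np.array([[0.0, 5.0, 0.0],
                      [1.0, 0.0, 3.0],
                      [0.0, 0.0, 2.0]])
print(rank_clusters(toy_flows, [0, 0, 1]))  # [(0, 6.0), (1, 2.0)]

# Within-cluster sums of (C1) and (C5) as reported in Tables 6 and 7 (kt CO2 eq.).
reported_kt = {"C1": 164_072, "C5": 162_275}
for name, kt in reported_kt.items():
    print(name, kt_to_mt(kt), "Mt")
```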

## 5 Conclusion

In this study, we establish a sampling-based procedure to examine the robustness of clusterings obtained with nonnegative matrix factorization or spectral clustering methods. We illustrate the procedure by re-examining the analysis of Kagawa et al. (2015), who identified significant \(\hbox {CO}_2\) emission clusters that are growing rapidly over time. Specifically, we apply our procedure to the datasets of Kagawa et al. (2015) that have strong environmental implications, namely the two \(\hbox {CO}_2\) emission networks induced by the US construction and US transport equipment sectors. In our empirical results, we find clusterings with much better normalized cut performance (i.e., lower Ncut values) and robustness assessments than those of Kagawa et al. (2015). The components of the compared clusters differ to some extent. However, the main supply-chain paths on which Kagawa et al. (2015) based their recommendations for mitigating global warming persist. These recommendations concern the significant Chinese clusters linked to our target US final demands. In summary, from a robustness perspective, we concur with the policy-relevant environmental conclusions of Kagawa et al. (2015).
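The sampling-based procedure summarized in this conclusion can be illustrated with a minimal sketch. The noise model here is a hypothetical stand-in, not necessarily the one used in our experiments: every weight of a symmetric network `W` is multiplied by a random factor in \([1-\xi, 1+\xi]\), and each candidate clustering is scored by its mean Ncut over the perturbed copies; the candidate with the lowest mean is kept as the "robustest". The names `perturb`, `mean_ncut`, and `robustest`, the uniform noise, and the sample size are all assumptions of this sketch.

```python
import numpy as np


def ncut(W, labels):
    """k-way normalized cut of a symmetric, nonnegative weight matrix (lower is better)."""
    labels = np.asarray(labels)
    degree = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        members = labels == k
        assoc = degree[members].sum()
        if assoc > 0:
            total += (assoc - W[np.ix_(members, members)].sum()) / assoc
    return total


def perturb(W, xi, rng):
    """Hypothetical noise model: scale each weight by a factor in [1 - xi, 1 + xi].

    Averaging the factor matrix with its transpose keeps the perturbed network
    symmetric while every factor stays inside the same interval.
    """
    factors = rng.uniform(1.0 - xi, 1.0 + xi, size=W.shape)
    return W * (factors + factors.T) / 2.0


def mean_ncut(W, labels, xi, n_samples=100, seed=0):
    """Mean Ncut of a fixed clustering over n_samples perturbed copies of W."""
    rng = np.random.default_rng(seed)
    return float(np.mean([ncut(perturb(W, xi, rng), labels) for _ in range(n_samples)]))


def robustest(W, candidates, xi, n_samples=100, seed=0):
    """Index of the candidate clustering with the lowest mean Ncut, plus all scores."""
    scores = [mean_ncut(W, labels, xi, n_samples, seed) for labels in candidates]
    return int(np.argmin(scores)), scores


# Usage on a toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
candidates = [np.array([0, 0, 0, 1, 1, 1]), np.array([0, 1, 0, 1, 0, 1])]
best, scores = robustest(W, candidates, xi=0.3)
print(best, [round(s, 3) for s in scores])
```

Reusing the same seed for every candidate scores all candidates on the same perturbed networks (common random numbers), which makes the comparison between candidates less noisy than drawing fresh perturbations for each one.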

## Declarations

### Authors' contributions

OR, HO, and SK proposed the methodology and contributed to the discussion. OR was in charge of data collection and conducted the data analysis. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Funding

This research was supported by a Grant-in-Aid for research [Nos. 26241031 and 16H01797] from the Ministry of Education, Culture, Sports, Science and Technology in Japan.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

- Avriel M (2003) Nonlinear programming: analysis and methods. Courier Corporation
- Bottou L, Bengio Y (1994) Convergence properties of the k-means algorithms. In: Advances in neural information processing systems 7 [NIPS conference, Denver, Colorado, USA, 1994], pp 585–592
- Bradley PS, Fayyad UM (1998) Refining initial points for k-means clustering. In: ICML, vol 98. Citeseer, pp 91–99
- Davis SJ, Peters GP, Caldeira K (2011) The supply chain of \(\hbox {CO}_2\) emissions. Proceedings of the National Academy of Sciences, 201107409
- Dietzenbacher E (1995) On the bias of multiplier estimates. J Reg Sci 35(3):377–390
- Dietzenbacher E (2006) Multiplier estimates: to bias or not to bias? J Reg Sci 46(4):773–786
- Dietzenbacher E, Los B, Stehrer R, Timmer M, De Vries G (2013) The construction of world input–output tables in the WIOD project. Econ Syst Res 25(1):71–98
- Ding C, Li T, Jordan MI (2008) Nonnegative matrix factorization for combinatorial optimization: spectral clustering, graph matching, and clique finding. In: 8th IEEE International conference on data mining, 2008. ICDM’08, pp 183–192. IEEE
- Ding CH, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: SDM, vol 5. SIAM, pp 606–610
- Donath WE, Hoffman AJ (1973) Lower bounds for the partitioning of graphs. IBM J Res Dev 17(5):420–425
- Duda RO, Hart PE, Stork DG (1995) Pattern classification and scene analysis, 2nd edn. Wiley Interscience, New York
- Epskamp S, Cramer AOJ, Waldorp LJ, Schmittmann VD, Borsboom D (2012) qgraph: network visualizations of relationships in psychometric data. J Stat Softw 48(4):1–18
- Fiedler M (1973) Algebraic connectivity of graphs. Czechoslov Math J 23(2):298–305
- Gendreau M, Potvin J-Y (2010) Handbook of metaheuristics, vol 2. Springer, New York
- Kagawa S, Okamoto S, Suh S, Kondo Y, Nansai K (2013a) Finding environmentally important industry clusters: multiway cut approach using nonnegative matrix factorization. Soc Netw 35(3):423–438
- Kagawa S, Suh S, Hubacek K, Wiedmann T, Nansai K, Minx J (2015) \(\hbox {CO}_2\) emission clusters within global supply chain networks: implications for climate change mitigation. Glob Environ Chang
- Kagawa S, Suh S, Kondo Y, Nansai K (2013b) Identifying environmentally important supply chain clusters in the automobile industry. Econ Syst Res 25(3):265–286
- Kannan R, Vempala S, Vetta A (2004) On clusterings: good, bad and spectral. J ACM 51(3):497–515
- Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791
- Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pp 556–562
- Lenzen M, Moran D, Kanemoto K, Foran B, Lobefaro L, Geschke A (2012) International trade drives biodiversity threats in developing nations. Nature 486(7401):109–112
- Liang S, Feng Y, Xu M (2015) Structure of the global virtual carbon network: revealing important sectors and communities for emission reduction. J Ind Ecol 19(2):307–320
- MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, vol 1. Oakland, CA, USA, pp 281–297
- Miller RE, Blair PD (2009) Input–output analysis: foundations and extensions. Cambridge University Press, Cambridge
- Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113
- Ng AY, Jordan MI, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856
- Peters GP, Minx JC, Weber CL, Edenhofer O (2011) Growth in emission transfers via international trade from 1990 to 2008. Proceedings of the National Academy of Sciences, 201006388
- Schaeffer SE (2007) Graph clustering. Comput Sci Rev 1(1):27–64
- Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
- Tukker A, Dietzenbacher E (2013) Global multiregional input–output frameworks: an introduction and outlook. Econ Syst Res 25(1):1–19
- Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
- Zhang Z, Jordan MI et al (2008) Multiway spectral clustering: a margin-based perspective. Stat Sci 23(3):383–403