Structural decomposition analysis with disaggregate factors within the Leontief inverse

A special case in input–output structural decomposition analysis is a decomposition of a product of variables, or factors, where one factor is an inverse (typically, the Leontief inverse) of a sum of other factors. There may be dozens or even hundreds of such factors that describe the changes in subsets of technical coefficients. The existing literature offers ambiguous guidance in this case. The solution that is consistent with the index number theory may be virtually infeasible. The simplified ad hoc solutions require the researcher to make arbitrary choices, lead to biased estimates and do not ensure the consistency-in-aggregation of factors. This paper reviews the ad hoc solutions to the said problem and describes a numerical test to identify the best-performing solution. It is found that calculating the average of the two polar decomposition forms for each factor is superior to other approximations in terms of minimising the errors.

of the two 'polar decompositions' as their default SDA strategy, though de Haan (2001) indicated that these 'polar decompositions' were not unique, and there was no reason why a particular pair of 'polar decompositions' should be preferred. This is even more obvious if one factor is an inverse of a sum of factors whose order can be freely altered.
The problem of additive factors nested within the Leontief inverse is the biggest complication in SDA. Because of matrix inversion this problem is unknown in index number analysis, and decompositions that are consistent with the index number theory (surveyed by Wang et al 2017; de Boer and Rodrigues 2020) may be infeasible. A 'textbook' solution, discussed by Rose and Casler (1996), uses a factorisation of the change in the Leontief inverse. Whether this solution biases the result, and whether there is another ad hoc solution that minimises the bias, has not been systematically explored.
Bridging this gap in the literature, this paper reviews the ad hoc solutions that simplify the SDA calculations and approximate the results of complete decompositions. There are four solutions, called here 'shortcuts', for the additive SDA and two for the multiplicative SDA. The author runs numerical tests of all 'shortcuts' using the time series of input-output data from Statistics Denmark and Statistics Netherlands. The number of factors within the Leontief inverse varies from 5 to 8. The tests reveal that one of the 'shortcuts' performs consistently better than others. This 'shortcut' (the average of the two 'polar decomposition forms') is not new, but remained largely unnoticed, though reported to be efficient by Dietzenbacher and Los (1998). Indeed, it makes the SDA 'short': instead of computing $2^{N-1}$ decomposition forms and taking their weighted average for each of $N$ factors to obtain the true result, it only requires computing 2 decomposition forms and taking their simple average to obtain an approximate result. Furthermore, a pair of 'polar decomposition forms' is unique to each factor and does not require the researcher to guess which of the forms should be preferred.
The contribution of this paper is the evidence of the best-performing approximate solution in a special case of SDA where the exact and true solution becomes prohibitively complex. The design of the numerical experiment that yields that evidence stands out for a larger data input and number of factors than in many previous surveys of SDA techniques.
The special case addressed here may become a typical case in SDA of inter-country input-output tables where a factor may describe, e.g., a change in trade relationship between two countries, rather than the change in global trade pattern.
In the language of econometric modelling, SDA is designed to construct a 'counterfactual' to answer the question "how would the target variable have changed if only one factor had changed and the others had remained the same?" With country- and industry-specific factors, this 'counterfactual' can be constructed at a very disaggregate level. Then the direct calculation of a disaggregate effect by SDA may be an appealing alternative to the indirect estimation of an aggregate effect by means of statistical inference followed by a possible disaggregation. This is true in cases where both SDA and econometric solutions apply, with due account of their conceptual differences, for example, in cases where the target and explanatory variables are connected via global value chains.
The paper is laid out as follows. Section 2 reminds the reader of the classic solutions to the additive and multiplicative SDA that are consistent with the index number theory, and briefly reviews generic approaches to reduce the intensity of SDA calculations and to obtain reasonable approximations. Following is an exposition of a particular case where one of the factors in the decomposition is an inverse of a sum of other factors, or, more precisely, the Leontief inverse. Then the ad hoc solutions, or 'shortcuts' to the complete decomposition, are formulated. Section 3 explains how these 'shortcuts' are tested and evaluated and how the necessary data are selected and prepared. This section concludes by reporting and discussing the results of the numerical experiment. Final remarks are set out in Sect. 4.

The general case of structural decomposition
For the general case of SDA, we will consider a variable $Y$ that is a product of $N$ variables:

$$Y = Z_1 Z_2 \cdots Z_N \tag{1}$$

Each variable $Z_n$ is called a factor, where the subscript $n$ identifies the $n$th factor in the decomposition. In the general case, $Y$ and $Z_n$ are matrices, and in special cases, $Z_1$ and/or $Z_N$ may be vectors and $Y$ may be a vector or a scalar.

Using superscripts (0) and (1) for the initial period 0 and the terminal period 1, define the change in $Y$. This can be done in two ways. First, define the absolute change in $Y$:

$$\Delta Y = Y^{(1)} - Y^{(0)} \tag{2}$$

where $\Delta$ is the difference operator. $\Delta Y$ may be understood as an increment of $Y$ and has the same units of measurement as $Y$ (usually, monetary units).

Second, define the relative change in $Y$:

$$PY = Y^{(1)} \oslash Y^{(0)} \tag{3}$$

where $P$ is the ratio operator and $\oslash$ signifies the element-by-element division. $PY$ measures the growth of each element in $Y$ and is dimensionless.

The problem of structural decomposition is to decompose $\Delta Y$ and $PY$ into $N$ terms that would attribute the change in $Y$ to the changes in each $n$th factor. For example, the absolute change in $Y$ that is induced by a change in $Z_1$ may be written as:

$$\Delta Y(Z_1, t_2, \ldots, t_N) = (\Delta Z_1)\, Z_2^{(t_2)} \cdots Z_N^{(t_N)} \tag{4}$$

where the superscripts $(t_2), \ldots, (t_N)$ can take values 1 or 0, and factors other than $Z_1$ can therefore be defined at either initial or terminal periods. For brevity, we will denote $\Delta Y(Z_1, t_2, \ldots, t_N)$ by $\Delta Y(Z_1)$. One combination of $\{t_2, \ldots, t_N\}$ corresponds to one decomposition form of $\Delta Y(Z_1)$. From the literature on the index number theory (Siegel 1945) and structural decomposition analysis (Seibel 2003), it is known that the number of possible unique decomposition forms of $\Delta Y(Z_1)$ is $2^{N-1}$.

It is customary to refer to the two forms of $\Delta Y(Z_n)$ where all factors other than $Z_n$ are defined at period 1 or period 0 as polar decomposition forms (Dietzenbacher and Los 1998; de Haan 2001). For example, the polar forms of $\Delta Y(Z_1)$ are:

$$\Delta Y(Z_1) = (\Delta Z_1)\, Z_2^{(1)} \cdots Z_N^{(1)} \tag{5a}$$

$$\Delta Y(Z_1) = (\Delta Z_1)\, Z_2^{(0)} \cdots Z_N^{(0)} \tag{5b}$$

All $2^{N-1}$ forms of $\Delta Y(Z_n)$ can be classified according to the distance from the polar form, denoted by $k \in \{0, \ldots, N-1\}$. One of the polar forms needs to be chosen as the starting point in the decomposition: for convenience, let it be (5a), and the corresponding value of $k$ is 0. This may be understood as follows: none of the remaining factors is defined at period 0, and all of those are at period 1. At $k = 1$, one of the remaining factors is now defined at period 0, while the rest are still at period 1. Obviously, there are $N - 1$ such forms of $\Delta Y(Z_n)$. At $k = 2$, two of the remaining factors are defined at period 0, while the rest are at period 1, which continues until the other polar form is reached at $k = N - 1$.
The number of unique forms of $\Delta Y(Z_n)$ at each $k$ is equal to the number of $k$-combinations of $N - 1$:

$$\binom{N-1}{k} = \frac{(N-1)!}{k!\,(N-1-k)!}$$

We will denote each $m$th unique form of $\Delta Y(Z_n)$ at $k$ steps from the polar form by $\Delta Y(Z_n)_{k,m}$ with the subscript $m$ running from 1 to $\binom{N-1}{k}$. Calculating the weighted average of all unique forms at each $k$ with the respective weights $c_k$ and across all $k$ yields the aggregate form of $\Delta Y(Z_n)$:

$$\Delta Y(Z_n) = \sum_{k=0}^{N-1} c_k \sum_{m=1}^{\binom{N-1}{k}} \Delta Y(Z_n)_{k,m}, \qquad c_k = \frac{k!\,(N-1-k)!}{N!} \tag{6}$$

Expression (6) may be recognised as the Bennet indicator for the $n$th factor (de Boer and Rodrigues 2020) that is the additive counterpart to the well-known Fisher index. Finally, collecting the changes induced by all $N$ factors produces a full additive decomposition of $\Delta Y$:

$$\Delta Y = \sum_{n=1}^{N} \Delta Y(Z_n) \tag{7}$$

In the case of multiplicative decomposition, the relative change in $Y$ that is induced by a change in $Z_1$ should be written as:

$$PY(Z_1, t_2, \ldots, t_N) = \left(Z_1^{(1)} Z_2^{(t_2)} \cdots Z_N^{(t_N)}\right) \oslash \left(Z_1^{(0)} Z_2^{(t_2)} \cdots Z_N^{(t_N)}\right) \tag{8}$$

The relative change $PY(Z_n)$ aggregated within and between all $k$ is as follows:

$$PY(Z_n) = \prod_{k=0}^{N-1} \prod_{m=1}^{\binom{N-1}{k}} \left(PY(Z_n)_{k,m}\right)^{\circ c_k} \tag{9}$$

where $\prod$ denotes the Hadamard product of a sequence, the power $\circ c_k$ applies to the elements of the respective vectors (Hadamard power), and the weights $c_k$ are defined as in Eq. (6). Expression (9) is nothing but the Fisher index for the $n$th factor. The full multiplicative decomposition of $PY$ is given by:

$$PY = \prod_{n=1}^{N} PY(Z_n) \tag{10}$$

Decompositions (7) and (10) are exact in the sense that they do not have any residual terms. These two formulae may be derived by computing the simple average (respectively, arithmetic or geometric) of all elementary decompositions of the change in $Y$. An elementary decomposition is made of a unique sequence of $N$ decomposition forms denoting the change in $Y$ because of the changes in each $n$th factor, that is $\Delta Y(Z_n)$ or $PY(Z_n)$, where one $n$ is consecutively chosen at each $k$. The total number of sequences is equal to the number of permutations of $N$ factors, or $N!$ (Dietzenbacher and Los 1998).
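The weighted-average construction in Eqs. (6) and (7) can be sketched numerically. The following is a minimal illustration, assuming square random matrices as factors; the function names `chain` and `bennet_effects` are hypothetical and do not come from the paper or its supplementary code.

```python
from itertools import combinations
from math import factorial

import numpy as np


def chain(mats):
    """Ordered matrix product Z_1 Z_2 ... Z_N."""
    out = mats[0]
    for m in mats[1:]:
        out = out @ m
    return out


def bennet_effects(Z0, Z1):
    """Weighted average of all 2**(N-1) decomposition forms for each factor."""
    N = len(Z0)
    effects = []
    for n in range(N):
        others = [i for i in range(N) if i != n]
        total = 0.0
        for k in range(N):  # k = number of other factors held at period 0
            c_k = factorial(k) * factorial(N - 1 - k) / factorial(N)
            for at0 in combinations(others, k):
                mats = [
                    Z1[i] - Z0[i] if i == n else (Z0[i] if i in at0 else Z1[i])
                    for i in range(N)
                ]
                total = total + c_k * chain(mats)
        effects.append(total)
    return effects


# Illustrative data only: four 3x3 factors with small random changes.
rng = np.random.default_rng(0)
Z0 = [rng.random((3, 3)) for _ in range(4)]
Z1 = [z + 0.1 * rng.random((3, 3)) for z in Z0]

effects = bennet_effects(Z0, Z1)
delta_Y = chain(Z1) - chain(Z0)
print(np.max(np.abs(sum(effects) - delta_Y)))  # residual of the full decomposition
```

The printed residual should vanish up to floating-point error, which is what "exact" means for decomposition (7). The nested loops make the exponential cost visible: the inner loop runs $2^{N-1}$ times per factor.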
Computing the simple average from the elementary decompositions requires $N!$ forms of $\Delta Y(Z_n)$ or $PY(Z_n)$, some of which are duplicates, while computing the weighted average only involves $2^{N-1}$ unique forms thereof. The sum of all weights, with due account of the number of times they apply to all $m$th forms at step $k$, is unity:

$$\sum_{k=0}^{N-1} \binom{N-1}{k}\, c_k = 1$$

Tables 1 and 2 provide an exemplary calculation of the coefficients (weights) and the number of unique decomposition forms for each factor required to implement decompositions (7) and (10) where the number of factors is up to 10. Thanks to the underlying formula, Table 2 contains Pascal's triangle less the first row. The number of unique decomposition forms for each factor ($2^{N-1}$) and their sum for all factors ($N 2^{N-1}$) are a good indication of the complexity of the decomposition exercise: in the case of 5 factors, 16 decomposition forms need to be computed for each factor and 80 for all factors, and in the case of 10 factors, the respective numbers are 512 and 5120.
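The weights and counts described above are easy to tabulate. The sketch below assumes the weights take the form stated for Table 1 (numerator 1, denominator $N \binom{N-1}{k}$); the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def weights(N):
    """c_k for k = 0..N-1: numerator 1, denominator N * C(N-1, k)."""
    return [Fraction(1, N * comb(N - 1, k)) for k in range(N)]


def forms_per_factor(N):
    """Number of unique decomposition forms for one factor (a row sum of Pascal's triangle)."""
    return sum(comb(N - 1, k) for k in range(N))


for N in (3, 5, 10):
    c = weights(N)
    # Each weight c_k applies to C(N-1, k) forms, so the weighted count must be 1.
    total_weight = sum(comb(N - 1, k) * c[k] for k in range(N))
    print(N, [str(w) for w in c], forms_per_factor(N), N * forms_per_factor(N), total_weight)
```

For $N = 5$ and $N = 10$ the loop reproduces the counts quoted in the text (16 and 80, 512 and 5120).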
Hence, the well-known problem of SDA is an exponential growth of the array of terms to be computed for an exact and full decomposition of Y as the number of factors increases. Several approaches have been proposed in the literature on SDA to address this problem.
It may be reasonable to handle factors in groups. For example, the factors that affect final demand may be delimited from those that affect intermediate demand. This enables hierarchical or nested SDA (see e.g., Koller and Stehrer 2010). In the most typical case of two groups, the array of $N$ factors is divided into two subsets with $R$ and $S$ factors, where $R + S = N$. The decomposition will involve two aggregate factors at the first tier and $R + S$ factors at the second tier. The total number of decomposition forms for all $N$ factors will now be $R2^R + S2^S$, which is necessarily less than $N2^{N-1}$ if at least one of $R$ and $S$ is more than 1.[1]

Dietzenbacher and Los (1998) and Dietzenbacher et al (2000) test the average of the two polar decompositions against the average of all elementary decompositions. One polar decomposition is the elementary decomposition where at each $k$ starting from 0, $n = k + 1$, and another polar decomposition is its 'mirror image' with $k$ running in reverse order, starting from $N - 1$. There are two polar decompositions that contain the polar decomposition forms for the first and the last factors. This only requires computing two decomposition forms for each factor and $2N$ forms for all factors. They show that the result is a good approximation of the full decomposition. However, de Haan (2001) stresses that the selection of the two polar decompositions is arbitrary. There exist $N!/2$ such decomposition pairs among the elementary decompositions, and any factor can be first or last.

Dietzenbacher and Los (1998) also discuss two ad hoc solutions to simplify the SDA calculations that, however, do not provide exact decompositions. One solution is to take the averages of pairs of the polar decomposition forms for each factor.[2] These averages may be treated as special cases of Eqs. (6) and (9) where the coefficients (weights) $c_k$ are set to 1/2 and $k$ are set to 0 and $N - 1$ ($m$ is irrelevant because there is only one polar decomposition form at $k = 0$ and $k = N - 1$).

Table 1 Denominator of the coefficients (weights) in the decomposition
The numerator in all coefficients is 1. $N$ is the number of factors in the decomposition and $k$ is the distance from one of the polar decomposition forms.

[1] Reduction in the total number of decomposition forms may be described by the ratio $\dfrac{R + S}{R/2^{S-1} + S/2^{R-1}}$. Note that, in the case of additive SDA with two subsets of factors, two decomposition forms for each aggregate factor may be conveniently merged into one expression, e.g., $\Delta Y(Z_1) = (\Delta Z_1)\,\tfrac{1}{2}\left(Z_2^{(0)} + Z_2^{(1)}\right)$. Then $R2^R + S2^S$ decomposition forms may be computed as $R2^{R-1} + S2^{S-1}$ expressions, but these expressions must not be confused with the decomposition forms.
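Then the full decompositions may be written as:

$$\Delta Y = \sum_{n=1}^{N} \tfrac{1}{2}\left[\Delta Y(Z_n)_{pol,0} + \Delta Y(Z_n)_{pol,N-1}\right] + e \tag{11}$$

$$PY = \prod_{n=1}^{N} \left[PY(Z_n)_{pol,0} \circ PY(Z_n)_{pol,N-1}\right]^{\circ 1/2} \circ e_p \tag{12}$$

where $e$ and $e_p$ are residual terms, respectively, in the additive and multiplicative cases, the lower index $pol$ denotes a polar decomposition form (at $k = 0$ or $k = N - 1$) and $\circ$ denotes the Hadamard product or, in the superscript, the Hadamard power. This ad hoc solution requires computing $2N$ decomposition forms for all factors.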
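As a rough illustration of the additive case, the sketch below averages the two polar forms of each factor and reports the residual $e$. The data are random and the names `chain` and `polar_average_effects` are hypothetical.

```python
import numpy as np


def chain(mats):
    """Ordered matrix product Z_1 Z_2 ... Z_N."""
    out = mats[0]
    for m in mats[1:]:
        out = out @ m
    return out


def polar_average_effects(Z0, Z1):
    """Simple average of the two polar decomposition forms for each factor."""
    N = len(Z0)
    effects = []
    for n in range(N):
        dZ = Z1[n] - Z0[n]
        form_1 = chain([dZ if i == n else Z1[i] for i in range(N)])  # others at period 1
        form_0 = chain([dZ if i == n else Z0[i] for i in range(N)])  # others at period 0
        effects.append(0.5 * (form_1 + form_0))
    return effects


# Illustrative data only.
rng = np.random.default_rng(1)
Z0 = [rng.random((3, 3)) for _ in range(4)]
Z1 = [z + 0.1 * rng.random((3, 3)) for z in Z0]

effects = polar_average_effects(Z0, Z1)
residual = chain(Z1) - chain(Z0) - sum(effects)  # the term e in the additive case
print(np.max(np.abs(residual)))
```

With only two factors the polar average coincides with the full decomposition, so the residual appears from three factors onwards.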
Another opportunity is to apply the so-called mid-point weights to each factor $Z_n$. In the additive case, this signifies that, for example, $\Delta Y(Z_1)$ is defined as

$$\Delta Y(Z_1) = (\Delta Z_1)\,\tfrac{1}{2}\left(Z_2^{(0)} + Z_2^{(1)}\right) \cdots \tfrac{1}{2}\left(Z_N^{(0)} + Z_N^{(1)}\right)$$

This solution, also known as the Marshall-Edgeworth decomposition, therefore involves only one decomposition form for each factor and $N$ forms for all factors, but also has a residual term. It is less clear how the mid-point weights apply in the multiplicative case. Dietzenbacher and Los (1998) report that both solutions perform rather well, and their results are very close to those from the full and exact additive decomposition of $\Delta Y$.

In addition to Bennet and Fisher decompositions that are combinatorial in nature and build on, respectively, the arithmetic and geometric averages, another family of decompositions employs the logarithmic mean. These include the Montgomery (additive) decomposition, the Montgomery-Vartia (multiplicative) decomposition and the Sato-Vartia (additive and multiplicative) decomposition that are exhaustively reviewed by de Boer (2008, 2009) and de Boer and Rodrigues (2020). These decompositions, particularly favoured for SDA of energy and emissions, are also known as 'Divisia-based', 'Divisia-linked' decomposition approaches or 'Logarithmic Mean Divisia Index' (LMDI) methods (reviewed by Su and Ang (2012), Wang et al (2017)). Montgomery, Montgomery-Vartia and Sato-Vartia decompositions are exact, and the related indicators and indices for each factor $Z_n$ are shown to be ideal, as are the Bennet indicator and the Fisher index, because they satisfy a number of tests from the index number theory (e.g., time reversal test, product test, etc.). Only one decomposition needs to be computed to obtain indicators or indices for all factors, so the methods based on the logarithmic mean are believed to be easier to implement than the combinatorial ones. However, these methods are not reviewed here because they do not apply to the special case of structural decomposition that motivated this paper.

[2] Polar decomposition forms here and elsewhere must not be confused with polar decompositions.

A special case of structural decomposition: factors nested within an inverse matrix
Further complication arises if one of the factors $Z_n$ is an inverse of a sum of other factors. This is a typical case in input-output SDA. To elaborate, we now turn to a simple Leontief model[3] where industry output $x$ is a product of the Leontief inverse $L$ and final demand $f$:

$$x = Lf = (I - A)^{-1} f$$

$A$ is a matrix of technical coefficients with the $j$th column describing the expenditures on intermediate inputs per one unit of output of industry $j \in \{1, \ldots, J\}$. For our illustrative example, we will treat columns of $A$ as factors. Each $j$th factor is then matrix $A_j$ where all columns other than the $j$th column from $A$ are set to zero. Under constant prices, the changes in the $j$th factor may be understood as the changes in the 'production recipe' or technology. The matrix of technical coefficients is now the sum of all $J$ factors, $A = A_1 + A_2 + \cdots + A_J$, and the change in output in the additive case is:

$$\Delta x = \left(I - A_1^{(1)} - \cdots - A_J^{(1)}\right)^{-1} f^{(1)} - \left(I - A_1^{(0)} - \cdots - A_J^{(0)}\right)^{-1} f^{(0)}$$

and in the multiplicative case:

$$Px = \left[\left(I - A_1^{(1)} - \cdots - A_J^{(1)}\right)^{-1} f^{(1)}\right] \oslash \left[\left(I - A_1^{(0)} - \cdots - A_J^{(0)}\right)^{-1} f^{(0)}\right]$$

Invoking the hierarchical approach, we first decompose the change in output into the changes induced by the change in the Leontief inverse and the change in final demand:

$$\Delta x = \tfrac{1}{2}(\Delta L)\left(f^{(0)} + f^{(1)}\right) + \tfrac{1}{2}\left(L^{(0)} + L^{(1)}\right)(\Delta f) \tag{15}$$

$$Px = \left[\frac{L^{(1)} f^{(1)}}{L^{(0)} f^{(1)}} \circ \frac{L^{(1)} f^{(0)}}{L^{(0)} f^{(0)}}\right]^{\circ 1/2} \circ \left[\frac{L^{(1)} f^{(1)}}{L^{(1)} f^{(0)}} \circ \frac{L^{(0)} f^{(1)}}{L^{(0)} f^{(0)}}\right]^{\circ 1/2} \tag{16}$$

For easier exposition, in the multiplicative case (16) the element-by-element division symbol $\oslash$ is replaced by the fraction sign, and the power applies to each element of the vector in brackets. The first term on the right-hand side of Eqs. (15) and (16) describes the changes in output related to the changes in the Leontief inverse. Note that this term is the average of the two underlying decomposition forms of $\Delta x(L)$ or $Px(L)$.
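The first-tier split between the Leontief inverse and final demand can be checked numerically. The sketch below uses small random coefficient matrices as stand-ins for real input-output data; all variable names are illustrative.

```python
import numpy as np

# Illustrative data only: nonnegative coefficients with column sums below 1.
rng = np.random.default_rng(2)
J = 4
A0 = 0.5 * rng.random((J, J)) / J
A1 = 0.5 * rng.random((J, J)) / J
f0 = 1.0 + rng.random(J)
f1 = 1.0 + rng.random(J)

I = np.eye(J)
L0 = np.linalg.inv(I - A0)
L1 = np.linalg.inv(I - A1)
x0, x1 = L0 @ f0, L1 @ f1

# Additive case: average of the two decomposition forms of each aggregate factor.
dx_L = 0.5 * (L1 - L0) @ (f0 + f1)
dx_f = 0.5 * (L0 + L1) @ (f1 - f0)

# Multiplicative case: element-wise geometric average of the two ratio forms.
px_L = np.sqrt((L1 @ f1) / (L0 @ f1) * (L1 @ f0) / (L0 @ f0))
px_f = np.sqrt((L1 @ f1) / (L1 @ f0) * (L0 @ f1) / (L0 @ f0))

print(np.allclose(dx_L + dx_f, x1 - x0), np.allclose(px_L * px_f, x1 / x0))
```

Both checks should hold because a two-factor decomposition by polar averages has no residual.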
There are two basic options (Rose and Casler 1996) to further decompose the changes related to the Leontief inverse. First, similar to Eq. (4), the difference of $x$ that is attributed, for example, to the changes in the outlays of the first industry ($j = 1$) may be written as follows:

$$\Delta x(A_1, t_2, \ldots, t_J) = \tfrac{1}{2}\left[\left(I - A_1^{(1)} - A_2^{(t_2)} - \cdots - A_J^{(t_J)}\right)^{-1} - \left(I - A_1^{(0)} - A_2^{(t_2)} - \cdots - A_J^{(t_J)}\right)^{-1}\right]\left(f^{(0)} + f^{(1)}\right) \tag{17}$$

Then the complete decomposition of $\Delta x(A)$ will be identical to that of $\Delta Y$ in Sect. 2.1, involving a total of $2^J$ decomposition forms[4] of $\Delta x(A_j)$ for all $J$ factors and their weighted averages, leading to $J$ Bennet indicators. Similarly, the decomposition of the ratio $Px(A)$ involves $J$ Fisher indices.
The second option utilises the known property of the Leontief inverse: $\Delta L = L^{(1)}(\Delta A)L^{(0)} = L^{(0)}(\Delta A)L^{(1)}$. Replacing $\Delta A$ with the sum of the changes in $J$ factors yields:

$$\Delta L = \sum_{j=1}^{J} L^{(1)}(\Delta A_j)L^{(0)} \tag{18a}$$

$$\Delta L = \sum_{j=1}^{J} L^{(0)}(\Delta A_j)L^{(1)} \tag{18b}$$

For any $j$, $L^{(1)}(\Delta A_j)L^{(0)}$ is no longer equal to $L^{(0)}(\Delta A_j)L^{(1)}$, and an average of Eqs. (18a) and (18b) can be taken to avoid an arbitrary choice. Then there will be 2 decomposition forms for each $j$th factor and $2J$ forms for all $J$ factors.
There are two issues with this second option. Although mathematically correct, the terms related to each $j$th factor in Eqs. (18a) and (18b) or their average do not exactly capture changes attributable to that factor: the correct terms for each $j$th factor would be $L^{(1)}(\Delta A_j)\left(I - A^{(1)} + \Delta A_j\right)^{-1}$ and $L^{(0)}(\Delta A_j)\left(I - A^{(0)} - \Delta A_j\right)^{-1}$, but these will not add up to $\Delta L$. Furthermore, this simplifying option is not available in the case of multiplicative decomposition.
Note that the decompositions based on the logarithmic mean require that the dependent variable be expressed as a product of factors as in Eq. (1). As an inverse of a sum of factors cannot be expressed as a product of those factors, the Montgomery, Montgomery-Vartia and Sato-Vartia decompositions are irrelevant in this special case. The problem is now to find a reasonable 'shortcut' to the decomposition of $\Delta x(A)$ and $Px(A)$ and to avoid computing all $2^J$ decomposition forms where each form includes an inverse of a unique combination of factors therein.
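Both observations can be verified on toy data. The sketch below checks the factorisation of $\Delta L$ and compares it with the single-factor terms; the matrices are random and the helper `column_factor` is a hypothetical name.

```python
import numpy as np

# Illustrative data only.
rng = np.random.default_rng(3)
J = 4
A0 = 0.4 * rng.random((J, J)) / J
A1 = 0.4 * rng.random((J, J)) / J
I = np.eye(J)
L0 = np.linalg.inv(I - A0)
L1 = np.linalg.inv(I - A1)
dL = L1 - L0


def column_factor(A, j):
    """Matrix that keeps column j of A and sets all other columns to zero."""
    out = np.zeros_like(A)
    out[:, j] = A[:, j]
    return out


dA = [column_factor(A1 - A0, j) for j in range(J)]

# Terms of the two factorisations: each set adds up to the change in L.
terms_a = [L1 @ d @ L0 for d in dA]
terms_b = [L0 @ d @ L1 for d in dA]

# Single-factor changes in L with the other factors held at period 1 or period 0.
true_1 = [L1 @ d @ np.linalg.inv(I - A1 + d) for d in dA]
true_0 = [L0 @ d @ np.linalg.inv(I - A0 - d) for d in dA]

print(np.max(np.abs(sum(terms_a) - dL)), np.max(np.abs(sum(terms_b) - dL)))
print(np.max(np.abs(sum(true_1) - dL)), np.max(np.abs(sum(true_0) - dL)))
```

The first printed line should be zero up to rounding, while the second line shows the gap that arises when the single-factor terms are summed.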

'Shortcuts' to the complete decomposition with factors nested in the Leontief inverse
There is no specific order of factors within the Leontief inverse: any factor can appear first or last in the sum without affecting the result. It is even more obvious in this case that the choice of two polar decompositions is arbitrary.[5] Therefore, we will not consider the average of two polar decompositions as a viable shortcut.
The shortcuts to be considered are as follows:

Shortcut 1 The averages of pairs of the polar decomposition forms for each factor

This shortcut applies Eqs. (11) and (12) to the factors within the Leontief inverse. For each $j$, compute two polar decomposition forms of $\Delta x(A_j)$ according to Eq. (17) where all time periods $t$ other than $t_j$ are set to 1 and 0 (in other words, two decomposition forms at distance $k \in \{0, J - 1\}$ from the polar form), then take the average of the two forms. The change of $x$ because of the change in the $j$th factor then is:

$$\Delta x(A_j)_{pol} = \tfrac{1}{2}\left[\Delta x(A_j)_{k=0} + \Delta x(A_j)_{k=J-1}\right] \tag{19}$$

$$Px(A_j)_{pol} = \left[Px(A_j)_{k=0} \circ Px(A_j)_{k=J-1}\right]^{\circ 1/2} \tag{20}$$

The above requires computing only 2 forms for each factor and $2J$ forms for all factors, and does not provide an exact decomposition.
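To make the comparison concrete, the sketch below computes, for the additive case, both the complete decomposition and shortcut 1 for factors nested in the Leontief inverse. It is an illustration on random data, not the code from Additional file 2; `f_bar` is assumed to stand in for the averaged final demand $\tfrac{1}{2}(f^{(0)} + f^{(1)})$, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def column_factors(A):
    """Split A into J matrices, each keeping one column of A."""
    factors = []
    for j in range(A.shape[1]):
        F = np.zeros_like(A)
        F[:, j] = A[:, j]
        factors.append(F)
    return factors


def form(F0, F1, f_bar, j, at0):
    """Change in x when factor j moves from period 0 to 1; factors in at0 stay at period 0."""
    J = len(F0)
    rest = sum(F0[i] if i in at0 else F1[i] for i in range(J) if i != j)
    I = np.eye(J)
    hi = np.linalg.inv(I - F1[j] - rest)
    lo = np.linalg.inv(I - F0[j] - rest)
    return (hi - lo) @ f_bar


def exact_effect(F0, F1, f_bar, j):
    """Weighted average of all 2**(J-1) forms for factor j."""
    J = len(F0)
    others = [i for i in range(J) if i != j]
    out = 0.0
    for k in range(J):
        c_k = factorial(k) * factorial(J - 1 - k) / factorial(J)
        for at0 in combinations(others, k):
            out = out + c_k * form(F0, F1, f_bar, j, at0)
    return out


def shortcut1_effect(F0, F1, f_bar, j):
    """Simple average of the two polar forms for factor j."""
    others = tuple(i for i in range(len(F0)) if i != j)
    return 0.5 * (form(F0, F1, f_bar, j, ()) + form(F0, F1, f_bar, j, others))


# Illustrative data only.
rng = np.random.default_rng(4)
J = 5
A0 = 0.5 * rng.random((J, J)) / J
A1 = A0 * (1 + 0.2 * (rng.random((J, J)) - 0.5))
f_bar = 1.0 + rng.random(J)
F0, F1 = column_factors(A0), column_factors(A1)

exact = np.array([exact_effect(F0, F1, f_bar, j) for j in range(J)])
short = np.array([shortcut1_effect(F0, F1, f_bar, j) for j in range(J)])
total = (np.linalg.inv(np.eye(J) - A1) - np.linalg.inv(np.eye(J) - A0)) @ f_bar
residual = total - short.sum(axis=0)
print(np.max(np.abs(exact - short)), np.max(np.abs(residual)))
```

Each row of `exact` and `short` holds the estimated change in output attributed to one factor. The complete decomposition inverts $2^{J-1}$ pairs of matrices per factor, while the shortcut inverts two pairs.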
Shortcut 2 The normalised averages of pairs of the polar decomposition forms for each factor

The information from Eqs. (19) and (20) can be utilised to modify the averaged polar forms $\Delta x(A_j)_{pol}$ and $Px(A_j)_{pol}$, so that the decompositions of $\Delta x(A)$ and $Px(A)$ are exact. In this way, the residual term is distributed across the indicators or indices for all $J$ factors. In the additive case, this can be achieved by multiplying each element in the averaged $j$th polar form by a respective coefficient: In the multiplicative case, the coefficients need to be defined as powers that apply to the base on an element-by-element basis: and $i$ denotes the $i$th element in the respective $J \times 1$ vector. The above modification of shortcut 1 requires the computation of $2J$ polar decomposition forms for each factor and for all factors and provides an exact decomposition.
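One way such a normalisation could work is proportional rescaling. The sketch below is an assumption, not necessarily the coefficients used in the paper: in the additive case each element is scaled by the ratio of the true total to the sum of the averaged polar forms, and in the multiplicative case the same idea is applied to logarithms. Function names and sample numbers are hypothetical.

```python
import numpy as np


def normalise_additive(pol, total):
    """Rescale averaged polar forms (J x n array) so that they sum to `total` element by element.

    Assumed proportional rule; undefined where the column sum of `pol` is zero.
    """
    return pol * (total / pol.sum(axis=0))


def normalise_multiplicative(pol, total):
    """Raise averaged polar forms to a common element-wise power so that their product is `total`.

    Assumed proportional rule in logs; undefined where the summed logs are zero.
    """
    exponent = np.log(total) / np.log(pol).sum(axis=0)
    return pol ** exponent


# Made-up numbers for three factors and two industries.
pol_add = np.array([[0.50, -0.20], [0.30, 0.10], [0.15, 0.05]])
total_add = np.array([1.00, -0.06])
norm_add = normalise_additive(pol_add, total_add)

pol_mul = np.array([[1.02, 0.99], [1.01, 1.03], [0.98, 1.01]])
total_mul = np.array([1.012, 1.031])
norm_mul = normalise_multiplicative(pol_mul, total_mul)

print(norm_add.sum(axis=0), norm_mul.prod(axis=0))
```

A proportional rule of this kind becomes unstable when the sum of the estimated effects is close to zero, which is worth checking before applying it to real data.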
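Shortcut 3 Decomposition with mid-point weights (Marshall-Edgeworth decomposition), only for additive decomposition

For the calculation of the change attributed to the $j$th factor, other factors are defined as the arithmetic mean of their values at period 0 and period 1. We modify Eq. (17) to formally describe shortcut 3:

$$\Delta x(A_j)_{mid} = \tfrac{1}{2}\left[\left(I - A_j^{(1)} - \sum_{i \neq j} \bar{A}_i\right)^{-1} - \left(I - A_j^{(0)} - \sum_{i \neq j} \bar{A}_i\right)^{-1}\right]\left(f^{(0)} + f^{(1)}\right), \qquad \bar{A}_i = \tfrac{1}{2}\left(A_i^{(0)} + A_i^{(1)}\right)$$

There are two decomposition forms for each factor and $2J$ forms for all factors. An aggregation across all $J$ factors does not provide an exact decomposition.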
Shortcut 4 Factorisation of the change in the Leontief inverse, only for additive decomposition

We will finally compute the difference of $x$ attributed to the change in each factor that builds on the factorisation of $\Delta L$ as it appears in Eqs. (18a)-(18b):

$$\Delta x(A_j)_{fact} = \tfrac{1}{2}\left[L^{(1)}(\Delta A_j)L^{(0)} + L^{(0)}(\Delta A_j)L^{(1)}\right]\tfrac{1}{2}\left(f^{(0)} + f^{(1)}\right)$$

The above computes two decomposition forms for each factor, merged into one expression, and $2J$ forms for all factors. The decomposition is exact. Although shortcut 4 cannot be interpreted as a correct measure of the effect of the change in $A_j$, it can still provide an approximation of the true result, and we will see whether that approximation is good.
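Shortcuts 3 and 4 can be sketched side by side for the additive case. As before, the data are random, `f_bar` is assumed to stand in for $\tfrac{1}{2}(f^{(0)} + f^{(1)})$, and the variable names are hypothetical.

```python
import numpy as np


def column_factors(A):
    """Split A into J matrices, each keeping one column of A."""
    factors = []
    for j in range(A.shape[1]):
        F = np.zeros_like(A)
        F[:, j] = A[:, j]
        factors.append(F)
    return factors


# Illustrative data only.
rng = np.random.default_rng(5)
J = 5
A0 = 0.5 * rng.random((J, J)) / J
A1 = A0 * (1 + 0.2 * (rng.random((J, J)) - 0.5))
f_bar = 1.0 + rng.random(J)

I = np.eye(J)
L0 = np.linalg.inv(I - A0)
L1 = np.linalg.inv(I - A1)
F0, F1 = column_factors(A0), column_factors(A1)
A_mid = 0.5 * (A0 + A1)
total = (L1 - L0) @ f_bar

shortcut3, shortcut4 = [], []
for j in range(J):
    # Shortcut 3: all other factors at their mid-point values.
    rest_mid = A_mid - 0.5 * (F0[j] + F1[j])
    hi = np.linalg.inv(I - F1[j] - rest_mid)
    lo = np.linalg.inv(I - F0[j] - rest_mid)
    shortcut3.append((hi - lo) @ f_bar)
    # Shortcut 4: average of the two factorisation terms for factor j.
    dA_j = F1[j] - F0[j]
    shortcut4.append(0.5 * (L1 @ dA_j @ L0 + L0 @ dA_j @ L1) @ f_bar)

residual3 = total - sum(shortcut3)
residual4 = total - sum(shortcut4)
print(np.max(np.abs(residual3)), np.max(np.abs(residual4)))
```

The residual of shortcut 4 should be zero up to rounding, in line with the statement that this decomposition is exact, whereas shortcut 3 leaves a residual.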
respectively, Eqs. (15) and (16). This is the consistency-in-aggregation property that shortcuts 1 and 3 do not have.

2. Compute the estimates of the vectors of changes using shortcuts 1-4 for additive decomposition and shortcuts 1-2 for multiplicative decomposition for each $j$th factor and aggregates for all $J$ factors.

3. Use standard matrix comparison methods to evaluate the performance of the tested shortcuts. In related literature, it is customary to use an array of measures of distance that meet the researcher's criteria. A measure of distance is a scalar that weighs, scales and aggregates the differences between each element of an estimated matrix (vector) and the respective element of the true matrix (vector). The best estimate should be the closest, i.e., should have the smallest distance, to the true values. A survey of literature led to a selection of 12 measures of distance listed in Additional file 1. To avoid misleading conclusions, measures of similarity rather than distance were disregarded.[6] Calculate all selected measures of distance for each shortcut and each $j$th factor. Identify the best-performing shortcut by each measure of distance and for each $j$th factor. Then aggregate the information across all 12 measures and select the best-performing shortcut for each $j$th factor. Finally, summarise the results for all $J$ factors, selecting one best-performing shortcut (or, if not possible, two or more shortcuts). In the case of additive decomposition, the measures of distance will also be calculated for the aggregate change representing all $J$ factors to find out whether shortcut 1 or 3 provides estimates that are closer to the exact decomposition.

4. Calculate an additional summary statistic for each shortcut: the share of entries in the estimated vectors of changes for each $j$th factor where this shortcut provides the best estimate, i.e., the closest to the true value. This statistic ignores the average distance to the true results and, rather, indicates a success rate in producing the best estimates. Use this percentage as the score to identify the best-performing shortcut.
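The evaluation steps can be illustrated with two of the distance measures mentioned later in the text and with the success-rate statistic of step 4. The definitions below are the textbook ones and are assumed, not confirmed, to match those in Additional file 1; the function names and the sample numbers are hypothetical.

```python
import numpy as np


def mape(estimate, true):
    """Mean absolute percentage error, in per cent."""
    return 100.0 * np.mean(np.abs(estimate - true) / np.abs(true))


def mad(estimate, true):
    """Mean absolute difference."""
    return np.mean(np.abs(estimate - true))


def best_share(estimates, true):
    """Share of entries where each shortcut is closest to the true value (ties go to the first)."""
    names = list(estimates)
    errors = np.array([np.abs(estimates[name] - true) for name in names])
    winners = errors.argmin(axis=0)
    return {name: float(np.mean(winners == i)) for i, name in enumerate(names)}


# Made-up numbers for one factor and three industries.
true = np.array([1.0, 2.0, 4.0])
estimates = {
    "shortcut_a": np.array([1.1, 2.0, 4.4]),
    "shortcut_b": np.array([1.0, 2.2, 4.2]),
}

for name, est in estimates.items():
    print(name, mape(est, true), mad(est, true))
shares = best_share(estimates, true)
print(shares)
```

In this toy example the second estimate has the lower MAPE and also wins on two of three entries, but the two criteria need not agree in general, which is why the testing strategy reports both.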

Data management strategy
Input-output data for the numerical experiment can be generated from random numbers or can be drawn from recognised sources of statistical information. As the structure of real input-output tables is not entirely random, the latter option is preferred here, which is also in line with previous studies of SDA (e.g., de Boer and Rodrigues, 2020). Data selection and processing steps are as follows:

• Look for time series of input-output tables at current prices and previous year's prices, preferably released by national statistical offices. A survey of official releases of national input-output data available online revealed two sources: Statistics Denmark and Statistics Netherlands. Statistics Denmark maintains perhaps the longest annual series of input-output tables, starting from 1966, in 117-industry and 69-industry classifications, of which the latter is sufficient for the purpose of this paper. Statistics Netherlands offers annual input-output tables starting from 1995 in a 76-industry classification. Both statistical offices provide the tables at current prices and previous year's prices, which ensures the consistency of structural decomposition. For the proposed numerical test, the time coverage of tables was limited to 2006-2015.[7]

• Adapt and aggregate input-output tables. Input-output tables for both countries were aggregated to four alternate classifications with 5 to 8 industries to ensure that the structural decomposition is computationally manageable.[8] The aggregation schemes (available with the supporting dataset, see the data availability statement) are not random and are designed to delineate larger classes of economic activities.

• Run the SDA on the aggregated input-output tables for Denmark and the Netherlands according to the testing strategy with 5 to 8 factors that correspond to the changes in the 'production recipe' or technology of the aggregate industries. Ignore other factors that are not related to the changes in technical coefficients.
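The aggregation and factor-construction steps above can be sketched as follows. The example assumes a square intermediate-use matrix `Z`, a single final demand vector `f` and a made-up mapping of six industries into three groups; it is not the aggregation scheme from the supporting dataset, and the function names are hypothetical.

```python
import numpy as np


def aggregate(Z, f, groups, n_groups):
    """Sum industries into groups; groups[i] is the group index of industry i."""
    S = np.zeros((n_groups, len(groups)))
    S[groups, np.arange(len(groups))] = 1.0
    return S @ Z @ S.T, S @ f


def technical_coefficients(Z, f):
    """A = Z diag(x)^-1, with output x equal to intermediate sales plus final demand."""
    x = Z.sum(axis=1) + f
    return Z / x, x  # broadcasting divides column j by x[j]


def column_factors(A):
    """Split A into J matrices, each keeping one column of A."""
    factors = []
    for j in range(A.shape[1]):
        F = np.zeros_like(A)
        F[:, j] = A[:, j]
        factors.append(F)
    return factors


# Illustrative data only.
rng = np.random.default_rng(6)
Z = rng.random((6, 6))
f = 10.0 + rng.random(6)
groups = [0, 0, 1, 1, 2, 2]

Z_agg, f_agg = aggregate(Z, f, groups, 3)
A_agg, x_agg = technical_coefficients(Z_agg, f_agg)
factors = column_factors(A_agg)
print(A_agg.shape, len(factors))
```

Aggregating flows before computing coefficients keeps the accounting identity $x = (I - A)^{-1} f$ valid for the aggregated table, which is what the decomposition relies on.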
In terms of the number of factors and time coverage, the data input in this numerical experiment is larger than in many previous studies of SDA techniques. For interested readers, executable codes that can be used to replicate the computations are in Additional file 2.

Results of the numerical experiment
The results of the numerical experiment, including the true and estimated vectors of changes by factor and indicators of their performance, are deposited at the figshare repository and are available at the link that can be found in the data availability statement. Tables 3, 4, 5, 6 and 7 summarise the results, indicating which shortcut provides the best estimates for each set of factors and each of the 9 yearly intervals. The last row identifies the single best-performing shortcut for all intervals. Two or three shortcuts appear in one cell in the cases where these shortcuts perform equally well. Tables 3 and 4 reveal that, by and large, shortcut 1 outperforms other shortcuts in additive SDA. The results are not uniform and vary across time and the number of factors. For example, in the additive SDA of industry output of the Netherlands in 2010-2011, shortcut 4 provides the best estimates with five factors corresponding to the five aggregate industries, shortcut 1 ranks the best with six factors, shortcut 3 with seven factors, and both shortcuts 1 and 3 provide the best estimates with the number of factors increasing to eight, as measured by the distance to the true results. For any set of factors, except five-factor SDA for the Netherlands, shortcut 1 provides the estimates at minimal overall distance from the true values in most yearly intervals (see Table 3).
The prevalence of shortcut 1 is somewhat weaker if judged by the share of the best estimates that this shortcut yields. Still, shortcut 1 ensures the largest share of best estimates in most yearly intervals for all sets of factors except five-factor SDA for the Netherlands, and six-factor and eight-factor SDA for Denmark (see Table 4). Table 5 shows that shortcut 1 outperforms shortcut 3 in terms of the distance of the aggregate estimates to the aggregate true results.
A reasonable indicator of the accuracy of estimates by shortcut 1 in additive SDA is the mean absolute percentage error (MAPE, see Additional file 1) that is calculated for each j th factor and each yearly interval. MAPE is rather small: maximum 0.12% for Denmark and 0.21% for the Netherlands.
In multiplicative SDA, shortcut 1 is the best-performing approach in the vast majority of cases, as Tables 6 and 7 show. A possible reason is that shortcuts 3 and 4 do not apply to multiplicative SDA. As the results of multiplicative decomposition are ratios, we can judge the accuracy of estimates by considering the mean absolute difference (MAD, see Additional file 1) transformed into percentage form, i.e., multiplied by 100. The maximum MAD for Denmark is 0.000066 percentage points and for the Netherlands 0.000021 percentage points. Another important observation from Tables 3 and 4 is that shortcut 4, which builds on the factorisation of the change in the Leontief inverse and is largely accepted in the literature (e.g., Oosterhaven and Hoen 1998; Miller and Blair 2009, chap. 13), produces a bias in the results of additive SDA, and this bias tends to be larger than that of other shortcuts.