
The Official Journal of the Pan-Pacific Association of Input-Output Studies (PAPAIOS)

A new approach to modelling the input–output structure of regional economies using non-survey methods

Abstract

This paper proposes a new approach to the regionalization of national input–output tables where suitable regional data are scarce and analysts are considering using location quotients (LQs). We focus on the FLQ formula, which frequently yields the best results of the pure LQ-based methods, and develop an enhanced way of implementing this approach. We use a modified cross-entropy (MCE) method, along with a regression model, to estimate values of the unknown parameter δ in the FLQ formula, specific to both region and country. An analysis of survey-based data for 16 South Korean regions reveals that the proposed FLQ+ approach yields more accurate estimates of both input coefficients and sectoral output multipliers than those from simpler LQ-based methods or the MCE approach alone. Sectoral outputs (or employment) are the only regional data required. The MCE method also clearly outperforms GRAS.

Introduction

Many non-survey methods of regionalizing a national input–output table (NIOT) have been developed, with the aim of avoiding the high costs and lengthy delays associated with constructing regional tables via survey-based methods. Non-survey methods include those based on location quotients (LQs) and constrained matrix-balancing methods.

The LQ-based methods include the classical simple and cross-industry LQs, along with several refinements examined in the next section. These methods hinge on the assumption that regions and nations employ the same technology of production, with the implication that regional input coefficients differ from their national counterparts only so far as each region imports goods and services from other regions. The performance of LQ-based methods deteriorates significantly when this assumption of a common technology is violated.

By contrast, constrained matrix-balancing methods are less sensitive to violations of the assumption of identical technology. These procedures estimate unknown data from limited initial information, subject to a set of linear constraints. The most popular of such techniques are RAS (Bacharach 1970; Stone 1961) and the cross-entropy (CE) method (Golan et al. 1994), both of which are known to work well (Davis et al. 1977; Golan and Vogel 2000; Hosoe 2014; Kapur and Kesavan 1992; Lamonica et al. 2020; Léony et al. 1999; Robinson et al. 2001; Vazquez et al. 2015). The constrained matrix-balancing procedures also include methods based on minimizing squared or absolute differences (Pavia et al. 2009). However, such techniques are more time-consuming to implement than the pure LQ-based methods and normally require the solution of a constrained nonlinear optimization problem, whereas the LQ-based methods are very quick and simple to apply.
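To make the idea of constrained matrix balancing concrete, the following is a minimal sketch of the classic biproportional RAS procedure, written in plain Python so that it has no dependencies. It assumes a non-negative prior matrix whose target row and column totals share the same grand total; the function name `ras`, the tolerance and the iteration cap are illustrative choices of ours, not features of any particular published implementation.

```python
def ras(prior, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Scale a non-negative prior matrix biproportionally (RAS).

    Rows are rescaled to match row_targets, then columns to match
    col_targets, until the row sums also agree to within tol. The targets
    are assumed to share the same grand total; otherwise the loop simply
    stops after max_iter sweeps.
    """
    z = [list(row) for row in prior]  # copy, so the prior is left untouched
    m, k = len(z), len(z[0])
    for _ in range(max_iter):
        # R step: multiply each row by (target / current row sum).
        for i in range(m):
            s = sum(z[i])
            if s > 0:
                z[i] = [v * row_targets[i] / s for v in z[i]]
        # S step: multiply each column by (target / current column sum).
        for j in range(k):
            s = sum(z[i][j] for i in range(m))
            if s > 0:
                for i in range(m):
                    z[i][j] *= col_targets[j] / s
        # Column sums are exact after the S step, so only rows need checking.
        if max(abs(sum(z[i]) - row_targets[i]) for i in range(m)) < tol:
            break
    return z


prior = [[10.0, 5.0], [4.0, 6.0]]
balanced = ras(prior, row_targets=[18.0, 12.0], col_targets=[16.0, 14.0])
```

For a non-negative prior, the RAS solution is known to coincide with the matrix that minimizes the cross-entropy distance to the prior subject to the same row and column totals, which is one way of seeing why RAS and the CE method are closely related. Note that zero cells in the prior stay zero, and that neither property carries over directly to matrices with negative entries, which is the case that GRAS and the MCE method are designed to handle.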

The present study focuses on the FLQ (Flegg’s LQ), which is often found to be one of the most accurate pure LQ-based methods (Bonfiglio and Chelli 2008; Flegg and Tohmo 2016). Applications of the FLQ include Dávila-Flores (2015), Hermannsson (2016), Jahn (2017), Kronenberg and Fuchs (2021), Morrissey (2016), and Singh and Singh (2011). To apply the FLQ, a value for a crucial unknown parameter, δ, must be chosen. Although several empirical studies have sought to find appropriate values of δ, these studies have not been conclusive. Examples include analyses of data for Scotland (Flegg and Webber 2000), Germany (Kowalewski 2015), Argentina (Flegg et al. 2016) and South Korea (Flegg and Tohmo 2019). Given these diverse outcomes, Jahn et al. (2020) recommend that analysts should consider regional characteristics when selecting a value of δ in the interval 0.3 ± 0.1. This range, which is suggested by earlier empirical work, can then be refined in the light of the results. Jahn et al. also formulate an econometric model that should assist in this process.

Nevertheless, it is evident that the choice of a suitable value of δ is still an open question, which has limited the practical use of the FLQ. Our aim is, therefore, to propose a strategy to estimate the value of δ in such a way as to optimize the regionalization of the NIOT. This strategy combines constrained matrix-balancing procedures with the FLQ. The expectation is that this hybrid approach should yield better results than would be attainable by applying each approach alone.

Of the possible matrix-balancing procedures, the CE method was chosen rather than RAS for several reasons. In particular, the standard RAS method can only handle non-negative matrices, which would limit the application proposed in this paper. Although its generalization GRAS (Junius and Oosterhaven 2003) enables matrices with negative elements to be updated, its objective function has been questioned by Lemelin (2009), Huang et al. (2008) and Temurshoev et al. (2013). Moreover, the results obtained by Golan et al. (1994) and Robinson et al. (2001) appear to favour the CE method, while Lamonica et al. (2020) demonstrate that this method performs well when it is applied to real data, especially for small economies. Indeed, we find that the modified CE (MCE) method, as implemented in this study, outperforms GRAS.

The proposed procedure involves three steps. The first applies the MCE method to regionalize the NIOT. This is designed to account for negative or zero input coefficients. The second step uses the derived regional matrix, along with the national table, to estimate the optimal δ for each region via a simple regression model. In the third step, this estimated δ is used to apply the FLQ formula, thereby computing the final estimates of the regional input coefficients. It is worth noting that this hybrid approach can easily be adapted to enhance the performance of other pure LQ-based methods that depend on one or more unknown parameters.

Our approach is tested on the South Korean interregional input–output table (KIRIOT) for the year 2005. It was built by the Bank of Korea for all 16 South Korean regions, with a classification of 78 economic sectors. A smaller version with 28 economic sectors is also available, but we opted to use the larger version, so as to minimize aggregation bias. The KIRIOT is one of the very few survey-based full interregional I–O tables. It has data on the volume of all intersectoral transactions, both within and across regions, so it is ideal for our purposes. Indeed, Zhao and Choi (2015, p. 909) remark:

[In 2009, the Bank of Korea] divided the entire country into 16 regions and issued regional Leontief inverse tables and type I output multipliers for each of these regions. These data can be considered as benchmark tables since they are entirely based on surveys.

Since the KIRIOT contains negative and null flows, the MCE method is employed. The empirical analysis demonstrates that the enhanced version of the FLQ approach developed here outperforms the MCE method, when it is applied separately, and has a similar performance to the ‘optimal’ FLQ approach, where δ is selected by using the observed regional coefficient matrix. Of course, in reality, such a matrix would seldom be available to regional analysts; its use here is merely to generate optimal values of δ that can be used as a benchmark in our analysis. Our aim is to offer a new way of regionalizing a NIOT in situations where an analyst has access only to the total output of each regional sector. Where such data are unavailable, sectoral employment could be used as a proxy. This proposed hybrid method, hereafter referred to as the FLQ+ method, is an improvement on the present state of the art, in which analysts need to select values of δ on the basis of a priori considerations, e.g. values found in earlier studies, or by taking regional characteristics into account, as is suggested by Jahn et al. (2020). In order to demonstrate the practical advantages of using the FLQ+ method, we compare its performance with the results from the MCE method, GRAS and the FLQ with a single assumed value of δ for every region.
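The ‘optimal’ FLQ benchmark described above can be illustrated with a short sketch: given a national coefficient matrix, regional and national sectoral outputs and an observed regional coefficient matrix, it searches a grid of δ values in [0, 1) and keeps the value whose FLQ estimates lie closest to the observed coefficients. The FLQ itself follows Eqs. (7) to (9) in the next section. The closeness criterion (here, the sum of absolute differences between estimated and observed coefficients), the grid step and the function names are our own assumptions, not the exact procedure used in this paper; positive sectoral outputs are also assumed, so that no location quotient involves a division by zero.

```python
import math


def flq_estimate(a_nat, x_reg, x_nat, delta):
    """FLQ-based regional coefficients, capped at the national values."""
    k = len(x_reg)
    tot_r, tot_n = sum(x_reg), sum(x_nat)
    lam = math.log2(1.0 + tot_r / tot_n) ** delta  # regional-size scalar
    slq = [(x_reg[i] / tot_r) / (x_nat[i] / tot_n) for i in range(k)]
    est = []
    for i in range(k):
        row = []
        for j in range(k):
            # SLQ on the diagonal, CILQ = SLQ_i / SLQ_j elsewhere.
            base = slq[i] if i == j else slq[i] / slq[j]
            row.append(a_nat[i][j] * min(base * lam, 1.0))
        est.append(row)
    return est


def optimal_delta(a_nat, x_reg, x_nat, a_obs, step=0.01):
    """Grid-search delta in [0, 1) minimizing the sum of absolute errors."""
    best_delta, best_err = None, math.inf
    for s in range(int(round(1.0 / step))):  # delta = 0, step, ..., 1 - step
        delta = s * step
        est = flq_estimate(a_nat, x_reg, x_nat, delta)
        err = sum(abs(e - o)
                  for est_row, obs_row in zip(est, a_obs)
                  for e, o in zip(est_row, obs_row))
        if err < best_err:
            best_delta, best_err = delta, err
    return best_delta, best_err


a_nat = [[0.1, 0.2, 0.05], [0.3, 0.1, 0.2], [0.05, 0.1, 0.15]]
x_nat = [100.0, 200.0, 300.0]
x_reg = [10.0, 30.0, 20.0]
a_obs = flq_estimate(a_nat, x_reg, x_nat, 0.3)  # synthetic "observed" matrix
delta_hat, err = optimal_delta(a_nat, x_reg, x_nat, a_obs)
```

Because the synthetic ‘observed’ matrix in this example is generated with δ = 0.3, the search recovers δ = 0.3 up to floating-point noise. In a real application, `a_obs` would be the survey-based regional matrix, which is exactly why this benchmark is rarely available to practitioners.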

Review of pure LQ-based methods

Here, we review the pure LQ methods most often employed to construct regional input–output tables (RIOTs). Some alternative approaches are examined thereafter.

Consider a national economic system consisting of k sectors. Let \({\mathbf{X}}^{n} = \left[ {x_{ij}^{n} } \right]\) and \({\mathbf{X}}^{r} = \left[ {x_{ij}^{r} } \right]\) be matrices whose elements are the flows for intermediate use from sector i to sector j at the national and regional levels, respectively, while \({\mathbf{x}}^{n}\) and \({\mathbf{x}}^{r}\) are vectors of national and regional total sectoral output. Also, let \({\mathbf{A}}^{n} = \left[ {a_{ij}^{n} = \frac{{x_{ij}^{n} }}{{x_{j}^{n} }}} \right]\) and \({\mathbf{A}}^{r} = \left[ {a_{ij}^{r} = \frac{{x_{ij}^{r} }}{{x_{j}^{r} }}} \right]\) be the matrices whose elements are the respective national and regional input coefficients.

Now suppose that only \({\mathbf{A}}^{n}\) and the vector of regional total sectoral output, \({\mathbf{x}}^{r}\), are known. The pure LQ methods are used to estimate the matrix of regional input coefficients, \({\mathbf{A}}^{r}\), by adjusting the national input coefficients as follows:

$$ \hat{a}_{ij}^{r} = a_{ij}^{n} q_{ij} , $$
(1)

where \(q_{ij}\) represents a scalar applied to the national coefficient.

With the simple LQ, \(a_{ij}^{r}\) is estimated via the following formula:

$$ \hat{a}_{ij}^{r} = {\text{SLQ}}_{i} a_{ij}^{n} , $$
(2)

where

$$ {\text{SLQ}}_{i} = \frac{{x_{i}^{r} /x^{r} }}{{x_{i}^{n} /x^{n} }} = \frac{{x_{i}^{r} }}{{x_{i}^{n} }} \times \frac{{x^{n} }}{{x^{r} }}. $$
(3)

Here, \(x_{i}^{r}\) and \(x_{i}^{n}\) are the total output (production) of the ith regional and national sector, respectively, while \(x^{r}\) and \(x^{n}\) are the corresponding regional and national aggregates. \({\text{SLQ}}_{i}\) measures the degree of specialization of region r in sector i relative to the nation. The regional input coefficients are derived according to the following rule:

$$ \hat{a}_{ij}^{r} = \left\{ {\begin{array}{*{20}ll} {a_{ij}^{n} {\text{SLQ}}_{i} } & {{\text{if}}\;{\text{SLQ}}_{i} < 1} \\ {a_{ij}^{n} } & {{\text{if}}\;{\text{SLQ}}_{i} \ge 1} \\ \end{array} } \right.. $$
(4)
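Equations (2) to (4) translate directly into code. The sketch below, in plain Python with matrices stored as lists of rows, computes the SLQs from regional and national sectoral outputs and scales row i of the national coefficient matrix by \({\text{SLQ}}_{i}\) only where \({\text{SLQ}}_{i} < 1\). The function names are hypothetical, and all outputs are assumed to be positive.

```python
def slq(x_reg, x_nat):
    """Simple location quotients, Eq. (3): (x_i^r / x^r) / (x_i^n / x^n)."""
    tot_r, tot_n = sum(x_reg), sum(x_nat)
    return [(xr / tot_r) / (xn / tot_n) for xr, xn in zip(x_reg, x_nat)]


def regionalize_slq(a_nat, x_reg, x_nat):
    """Eq. (4): multiply row i of A^n by SLQ_i where SLQ_i < 1, else keep it."""
    q = slq(x_reg, x_nat)
    return [[a * min(q[i], 1.0) for a in row] for i, row in enumerate(a_nat)]


# Two sectors: the region is specialized in sector 0 (SLQ = 1.5) and
# under-represented in sector 1 (SLQ = 0.5), so only row 1 is scaled down.
a_reg = regionalize_slq([[0.1, 0.2], [0.3, 0.4]], x_reg=[30.0, 10.0], x_nat=[50.0, 50.0])
```

Because the same scalar is applied to the whole of row i, every purchasing sector j is assumed to buy the same share of input i locally, which is why the SLQ cannot allow for cross-hauling.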

However, it has long been known that the SLQ tends to underestimate a region’s imports from other regions; this understatement occurs because the SLQ rules out any ‘cross-hauling’ (Stevens et al. 1989). Cross-hauling takes place when a region simultaneously imports and exports a given commodity.

The cross-industry LQ was one of the first refinements of the SLQ, as it considers the relative size of both supplying sector i and purchasing sector j. The formula is as follows:

$$ {\text{CILQ}}_{ij} = \frac{{{\text{SLQ}}_{i} }}{{{\text{SLQ}}_{j} }} = \frac{{x_{i}^{r} /x_{i}^{n} }}{{x_{j}^{r} /x_{j}^{n} }}, $$
(5)

where the constraints are applied as in (4). Unlike the SLQ, however, the CILQ applies a cell-by-cell adjustment. This means that it does, in principle at least, deal with the problem of cross-hauling.Footnote 1 What it does not do is to consider the relative size of a region, \(x^{r}/x^{n}\), which cancels out in formula (5). By contrast, this ratio remains a component of the SLQ formula (3).
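A sketch of Eq. (5), under the same assumptions as before (plain Python, positive outputs, hypothetical function names), also confirms numerically the point just made: multiplying every regional output by a common factor leaves the CILQs unchanged, because the regional total \(x^{r}\) never enters the formula. Note that \({\text{CILQ}}_{ii} = 1\) by construction, so this sketch leaves the diagonal cells unadjusted; the FLQ in Eq. (7) deals with the diagonal by using the SLQ instead.

```python
def cilq(x_reg, x_nat):
    """Cross-industry LQs, Eq. (5): CILQ_ij = (x_i^r / x_i^n) / (x_j^r / x_j^n)."""
    ratio = [xr / xn for xr, xn in zip(x_reg, x_nat)]
    return [[ri / rj for rj in ratio] for ri in ratio]


def regionalize_cilq(a_nat, x_reg, x_nat):
    """Cell-by-cell scaling of a_ij^n, capped at the national value as in Eq. (4)."""
    q = cilq(x_reg, x_nat)
    return [[a * min(q[i][j], 1.0) for j, a in enumerate(row)]
            for i, row in enumerate(a_nat)]


# Scaling all regional outputs by 1/10 leaves every CILQ unchanged.
q_big = cilq([30.0, 10.0], [50.0, 50.0])
q_small = cilq([3.0, 1.0], [50.0, 50.0])
a_reg = regionalize_cilq([[0.1, 0.2], [0.3, 0.4]], [30.0, 10.0], [50.0, 50.0])
```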

Round (1978) argues that any adjustment formula should incorporate three elements: (i) the relative size of the supplying sector i; (ii) the relative size of the purchasing sector j and (iii) the relative size of the region. The CILQ satisfies (i) and (ii) but not (iii), whereas the SLQ satisfies (i) and (iii) but not (ii). Round therefore suggests the following formula, which simultaneously satisfies all three requirements:

$$ {\text{RLQ}}_{ij} = \frac{{{\text{SLQ}}_{i} }}{{\log_{2} \left( {1 + {\text{SLQ}}_{j} } \right)}} . $$
(6)
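Round’s formula can be sketched in the same way. In Eq. (6), \({\text{SLQ}}_{i}\) carries the size of the supplying sector and, through \(x^{r}/x^{n}\), the size of the region, while \(\log_{2}(1 + {\text{SLQ}}_{j})\) carries the size of the purchasing sector. The sketch below (hypothetical names, positive outputs assumed) returns the quotients only, without applying any cap.

```python
import math


def rlq(x_reg, x_nat):
    """Round's LQ, Eq. (6): RLQ_ij = SLQ_i / log2(1 + SLQ_j)."""
    tot_r, tot_n = sum(x_reg), sum(x_nat)
    slq = [(xr / tot_r) / (xn / tot_n) for xr, xn in zip(x_reg, x_nat)]
    return [[si / math.log2(1.0 + sj) for sj in slq] for si in slq]


# The first sector (index 0) has SLQ = 1, so log2(1 + 1) = 1 and
# column 0 of the result simply reproduces the SLQs of the supplying sectors.
q = rlq([10.0, 30.0, 20.0], [100.0, 200.0, 300.0])
```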

Nonetheless, Flegg et al. (1995) criticize the SLQ and RLQ on the grounds that they would both tend to underestimate the imports of relatively small regions owing to the way in which the ratio \(x^{r}/x^{n}\) is implicitly incorporated in each formula. To overcome this drawback, the FLQ was introduced.

The crucial hypothesis underpinning the FLQ is that a region’s propensity to import from other domestic regions is inversely and nonlinearly related to its relative size. By incorporating explicit adjustments for regional size, the FLQ should yield more precise estimates of regional input coefficients and hence multipliers.Footnote 2 Along with other non-survey methods, the FLQ aims to offer regional analysts a means by which they can build regional tables that reflect, as closely as possible, each region’s economic structure. It is defined as follows (cf. Flegg and Webber 1997):

$$ {\text{FLQ}}_{ij} = \left\{ {\begin{array}{*{20}ll} {{\text{CILQ}}_{ij} \lambda } & {{\text{for}}\;i \ne j} \\ {{\text{SLQ}}_{i} \lambda } & {{\text{for}}\; i = j} \\ \end{array} } \right., $$
(7)

where λ captures a region’s relative size. This scalar is defined as follows:

$$ \lambda = \left[ {\log_{2} \left( {1 + \frac{{x^{r} }}{{ x^{n} }}} \right)} \right]^{\delta } . $$
(8)

Here 0 ≤ δ < 1 is a parameter that controls the degree of convexity in Eq. (8). The larger the value of δ, the lower the value of λ, and the greater the allowance for extra regional imports. The FLQ formula is implemented just like other LQ methods, so:

$$ \hat{a}_{ij}^{r} = \left\{ {\begin{array}{*{20}ll} {a_{ij}^{n} {\text{FLQ}}_{ij} } & {{\text{if}}\;{\text{FLQ}}_{ij} < 1} \\ {a_{ij}^{n} } & {{\text{if}}\;{\text{FLQ}}_{ij} \ge 1} \\ \end{array} } \right.. $$
(9)
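The sketch below implements Eqs. (7) to (9) in plain Python, again with hypothetical function names and positive sectoral outputs assumed. Because \(x^{r}/x^{n} < 1\), \(\log_{2}(1 + x^{r}/x^{n})\) lies strictly between 0 and 1, so raising it to a larger power δ yields a smaller λ; with δ = 0, λ = 1 and the off-diagonal FLQs collapse to the CILQs.

```python
import math


def flq_lambda(x_reg_total, x_nat_total, delta):
    """Regional-size scalar of Eq. (8): [log2(1 + x^r / x^n)] ** delta."""
    return math.log2(1.0 + x_reg_total / x_nat_total) ** delta


def regionalize_flq(a_nat, x_reg, x_nat, delta):
    """Regional input coefficients from Eqs. (7) to (9)."""
    k = len(x_reg)
    tot_r, tot_n = sum(x_reg), sum(x_nat)
    lam = flq_lambda(tot_r, tot_n, delta)
    slq = [(x_reg[i] / tot_r) / (x_nat[i] / tot_n) for i in range(k)]
    est = []
    for i in range(k):
        row = []
        for j in range(k):
            # Eq. (7): SLQ_i * lambda on the diagonal, CILQ_ij * lambda elsewhere
            # (CILQ_ij = SLQ_i / SLQ_j, as in Eq. (5)).
            flq = (slq[i] if i == j else slq[i] / slq[j]) * lam
            # Eq. (9): scale down only where FLQ_ij < 1.
            row.append(a_nat[i][j] * min(flq, 1.0))
        est.append(row)
    return est


a_nat = [[0.1, 0.2], [0.3, 0.4]]
# A region producing 40% of national output, specialized in sector 0.
est_0 = regionalize_flq(a_nat, [30.0, 10.0], [50.0, 50.0], 0.0)
est_3 = regionalize_flq(a_nat, [30.0, 10.0], [50.0, 50.0], 0.3)
```

Raising δ from 0 to 0.3 lowers λ and therefore every uncapped coefficient, which is the ‘extra allowance for regional imports’ described above.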

Many case studies, including those mentioned earlier, have demonstrated that the FLQ can yield more accurate results than the SLQ and CILQ. This evidence is corroborated by the Monte Carlo study of Bonfiglio and Chelli (2008). Nonetheless, some conflicting evidence is presented by Lamonica and Chelli (2017), who find initially that the SLQ gives slightly better results than the FLQ.

Lamonica and Chelli’s unusual study employs data from the World Input–Output Database, whereas other studies have analysed data for individual countries or used Monte Carlo methods (Bonfiglio and Chelli 2008). Lamonica and Chelli examined data for the period 1995–2011, classified into 35 economic sectors. Their sample comprised 27 European countries and 13 other major countries, with the rest of the world as a composite ‘country’. However, when this sample was disaggregated by size of economy, an interesting divergence appeared. For the smaller economies, characterized by a high percentage of near-zero input coefficients, the FLQ (with δ = 0.2) was found to be the best method, whereas the SLQ performed the best in the larger economies.

McCann and Dewhurst (1998) criticize the FLQ on the grounds that regional coefficients may surpass national coefficients where there is regional specialization, a possibility that is precluded by the FLQ formula. Flegg and Webber (2000) therefore proposed the following augmented FLQ:

$$ {\text{AFLQ}}_{ij} = \left\{ {\begin{array}{*{20}ll} {{\text{FLQ}}_{ij} \left[ {\log_{2} \left( {1 + {\text{SLQ}}_{j} } \right)} \right] } & {{\text{for}}\;{\text{SLQ}}_{j} > 1} \\ {{\text{FLQ}}_{ij} } & {{\text{for}}\;{\text{SLQ}}_{j} \le 1} \\ \end{array} } \right., $$
(10)

where \(\log_{2} \left( {1 + {\text{SLQ}}_{j} } \right)\) captures the regional specialization of sector j. If \({\text{SLQ}}_{j} > 1\) and \({\text{FLQ}}_{ij} \ge 1\), the national coefficients are scaled upwards. However, to avoid excessive upward adjustments, the restriction \({\text{FLQ}}_{ij} \le 1\) is retained. Hence the regionalization proceeds as follows:

$$ \hat{a}_{ij}^{r} = \left\{ {\begin{array}{*{20}ll} {a_{ij}^{n} {\text{AFLQ}}_{ij} } & {{\text{if}}\;{\text{SLQ}}_{j} > 1} \\ {a_{ij}^{n} {\text{FLQ}}_{ij} } & {{\text{if}}\;{\text{SLQ}}_{j} \le 1} \\ \end{array} } \right., $$
(11)

subject to \({\text{FLQ}}_{ij} \le 1\).
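A sketch of the AFLQ, Eqs. (10) and (11), follows the same pattern (plain Python, hypothetical names, positive outputs assumed). The FLQ is first capped at 1, as the restriction above requires, and is then multiplied by \(\log_{2}(1 + {\text{SLQ}}_{j})\) whenever the purchasing sector j is specialized (\({\text{SLQ}}_{j} > 1\)). Because that multiplier exceeds 1, a capped cell in such a column ends up above its national counterpart, which is precisely the upward adjustment the AFLQ is designed to permit.

```python
import math


def regionalize_aflq(a_nat, x_reg, x_nat, delta):
    """Regional coefficients via Eqs. (10)-(11), retaining the cap FLQ_ij <= 1."""
    k = len(x_reg)
    tot_r, tot_n = sum(x_reg), sum(x_nat)
    lam = math.log2(1.0 + tot_r / tot_n) ** delta
    slq = [(x_reg[i] / tot_r) / (x_nat[i] / tot_n) for i in range(k)]
    est = []
    for i in range(k):
        row = []
        for j in range(k):
            base = slq[i] if i == j else slq[i] / slq[j]
            factor = min(base * lam, 1.0)          # restriction FLQ_ij <= 1
            if slq[j] > 1.0:
                # Specialized purchasing sector: scale by log2(1 + SLQ_j) > 1.
                factor *= math.log2(1.0 + slq[j])
            row.append(a_nat[i][j] * factor)
        est.append(row)
    return est


# Sector 0 is specialized (SLQ = 1.5), so column 0 is scaled upwards.
est = regionalize_aflq([[0.1, 0.2], [0.3, 0.4]], [30.0, 10.0], [50.0, 50.0], 0.0)
```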

While the AFLQ has the theoretical merit of incorporating a measure of regional specialization, it tends to produce similar outcomes to the FLQ (Bonfiglio and Chelli 2008; Flegg and Tohmo 2013; Flegg and Webber 2000; Flegg et al. 2016; Kowalewski 2015). Indeed, the analysis in the Appendix demonstrates that the AFLQ does not yield more accurate estimates of input coefficients than the FLQ, although it does perform better in terms of multipliers.Footnote 3

Another variant of the FLQ is put forward by Kowalewski (2015). Using output rather than employment to measure regional size, her industry-specific FLQ can be defined as:

$$ {\text{SFLQ}}_{ij} \equiv {\text{CILQ}}_{ij} \times \left[ {\log_{2} \left( {1 + x^{r} /x^{n} } \right)} \right]^{\delta_{j} } . $$
(12)

The novel feature of this formula is that δ is allowed to vary across industries. This greater realism is undoubtedly an attractive feature, yet it does introduce much greater complexity into the modelling process (Flegg and Tohmo 2019). For that reason, we do not consider the SFLQ further.

To complete this review of pure LQ-based methods, we should note that the FLQ’s focus is on the output and employment generated within a given region. Consequently, as Flegg and Tohmo (2019) emphasize, it should only be applied to NIOTs where imports are excluded from the inter-industry transactions (type B tables). By contrast, where the focus is on the total supply of commodities, Kronenberg’s Cross-Hauling Adjusted Regionalization Method (CHARM) is an appropriate technique (Többen and Kronenberg 2015; Flegg and Tohmo 2018). Unlike the FLQ, however, CHARM requires type A tables, where the national transactions include imports.Footnote 4

Review of some alternative approaches

Here, we review three recent studies that offer alternatives to the pure LQ-based methods discussed previously.Footnote 5 The first is an innovative study by Fujimoto (2019), who examines official survey-based data for nine Japanese regions in 2005, the most recent year available in a series of official tables published quinquennially since 1960. This study’s focus is on cross-hauling and its primary aim is to determine which of four alternative assumptions is most appropriate. Each assumption is associated with a particular modelling approach, as follows:

  1. There is no cross-hauling (LQ approach);
  2. Cross-hauling depends on regional size (FLQ approach);
  3. Cross-hauling is proportional to its potential, as measured by output or demand (RCHARM);
  4. Cross-hauling is proportional to its potential, as measured by the volume of trade (MCHARM).

To represent the ‘LQ approach’, Fujimoto rejects the SLQ in favour of the scaling formula

$$ {\text{DSLQ}}_{i}^{r} = {{\frac{{X_{i}^{r} }}{{D_{i}^{r} }}} \mathord{\left/ {\vphantom {{\frac{{X_{i}^{r} }}{{D_{i}^{r} }}} { \frac{{X_{i}^{n} }}{{D_{i}^{n} }}}}} \right. \kern-\nulldelimiterspace} { \frac{{X_{i}^{n} }}{{D_{i}^{n} }}}}, $$
(13)

where X denotes output and D denotes demand. The rationale for using this alternative formula is to overcome aggregation bias (Fujimoto 2019, p. 113).
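For comparison with Eq. (3), the following sketch of Eq. (13) (plain Python; the function name and the example figures are hypothetical) makes the difference explicit: the DSLQ compares each sector’s output-to-demand ratio in the region with the corresponding national ratio, rather than comparing output shares.

```python
def dslq(x_reg, d_reg, x_nat, d_nat):
    """Fujimoto's DSLQ, Eq. (13): (X_i^r / D_i^r) / (X_i^n / D_i^n)."""
    return [(xr / dr) / (xn / dn)
            for xr, dr, xn, dn in zip(x_reg, d_reg, x_nat, d_nat)]


# Sector 0: the region's output covers 50% of its own demand against 100%
# nationally, so its DSLQ is 0.5. Sector 1: both ratios equal 1.2, so DSLQ = 1.
q = dslq(x_reg=[40.0, 60.0], d_reg=[80.0, 50.0],
         x_nat=[500.0, 600.0], d_nat=[500.0, 500.0])
```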

For the second approach, Fujimoto employs the formula:

$$ {\text{FLQ}}_{i}^{r} \equiv {\text{DSLQ}}_{i}^{r} \times \left[ {\log_{2} (1 + v^{r} /v^{n} )} \right]^{\delta } , $$
(14)

where vr/vn is the ratio of total regional to total national value-added payments. However, the author does not explain why \({\text{DSLQ}}_{i}^{r}\) is used instead of \({\text{CILQ}}_{ij}^{r}\) nor why value added is used as a proxy for regional size rather than superior measures such as output or employment. It is, therefore, misleading to refer to this approach as the ‘FLQ approach’. The third approach is the refined version of CHARM developed by Többen and Kronenberg (2015), while the fourth is the modified version of CHARM proposed by Fujimoto.

Fujimoto (2019, p. 115) remarks that ‘[t]he FLQ approach has a problem in addition to the difficulty of [specifying] a value for δ: the cross-hauling caused in interregional trade depends not only on regional size.’ To demonstrate this, he derives the following equation for \({\text{DSLQ}}_{i}^{r}\) < 1:

$$ \Delta E_{i}^{r} = \Delta M_{i}^{r} = {\text{DSLQ}}_{i}^{r} (1 - \lambda )\left( {1 - m_{i}^{n} } \right) D_{i}^{r} , $$
(15)

where E and M denote exports and imports, respectively, \(m_{i}^{n}\) is the national propensity to import from abroad and D is demand. Fujimoto adds that there is ‘no reason why cross-hauling [should] depend on \({\text{DSLQ}}_{i}^{r}\) and \(m_{i}^{n}\) [and] this dependence causes serious bias …’ (p. 116).

However, a straightforward interpretation can be given to the role of both \({\text{DSLQ}}_{i}^{r}\) and \(m_{i}^{n}\) in Eq. (15). Since it is assumed that \(m_{i}^{r} = m_{i}^{n}\), the term \((1 - m_{i}^{n})\) captures the proportion of regional demand \(D_{i}^{r}\) that is met by domestic suppliers, some of which are located in region r and the rest in other regions. As expected, there is a negative relationship between \(m_{i}^{n}\) and \(\Delta M_{i}^{r}\), ceteris paribus.

To explain the positive relationship between \({\text{DSLQ}}_{i}^{r}\) and \(\Delta M_{i}^{r}\), we note that \(\Delta M_{i}^{r}\) represents the difference between the extra imports generated by using \({\text{FLQ}}_{i}^{r}\) and those generated by applying \({\text{DSLQ}}_{i}^{r}\). Although this difference is invariably positive, its magnitude increases as \({\text{DSLQ}}_{i}^{r}\) rises. This is due to the inclusion of λ in \({\text{FLQ}}_{i}^{r}\): since \({\text{FLQ}}_{i}^{r} = \lambda \,{\text{DSLQ}}_{i}^{r}\), the gap between the two quotients, \((1 - \lambda ){\text{DSLQ}}_{i}^{r}\), grows in proportion to \({\text{DSLQ}}_{i}^{r}\). The upshot is that there is a positive relationship between \({\text{DSLQ}}_{i}^{r}\) and \(\Delta M_{i}^{r}\).Footnote 6
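Both relationships can be checked directly from Eq. (15). The sketch below (plain Python; hypothetical names and purely illustrative values of λ, \(m_{i}^{n}\) and \(D_{i}^{r}\)) evaluates \(\Delta M_{i}^{r}\) for two values of the DSLQ and two import propensities. Since \(\Delta M_{i}^{r}\) is proportional to \({\text{DSLQ}}_{i}^{r}\) for given λ, \(m_{i}^{n}\) and \(D_{i}^{r}\), the extra imports rise in step with the DSLQ and fall as the national import propensity rises.

```python
def extra_imports(dslq_i, lam, m_nat_i, d_reg_i):
    """Delta E_i^r = Delta M_i^r from Eq. (15); only valid when DSLQ_i^r < 1."""
    if not dslq_i < 1.0:
        raise ValueError("Eq. (15) is derived for DSLQ_i^r < 1")
    return dslq_i * (1.0 - lam) * (1.0 - m_nat_i) * d_reg_i


# Illustrative values: lambda = 0.8, national import propensity 0.2,
# regional demand 100.
small = extra_imports(0.5, 0.8, 0.2, 100.0)         # about 8
large = extra_imports(0.9, 0.8, 0.2, 100.0)         # about 14.4: rises with the DSLQ
more_foreign = extra_imports(0.5, 0.8, 0.4, 100.0)  # about 6: falls as m rises
```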

The above discussion suggests that Fujimoto has failed to identify a genuine problem with the FLQ approach. Nonetheless, after performing various statistical tests, using data for 106 sectors and nine regions, he rejects this approach on the grounds (i) that it yields biased estimates of regional imports and (ii) that the appropriate value of δ is unknown and depends on each case.

However, since Fujimoto does not use the correct FLQ formula, his findings do not constitute a valid test of the FLQ approach. It is also likely that superior estimates could have been obtained by using a different δ for each Japanese region. In particular, the islands of Hokkaido and Okinawa may well require different values of δ from the mainland regions. The approach discussed later in this paper affords a way of generating such region-specific values.

As regards the other approaches, the scatter diagrams of estimated and survey-based import propensities (Fujimoto 2019, figure 2) reveal an almost identical pattern for the DSLQ and RCHARM methods, with evidence of substantial and widespread underestimation. By contrast, the diagram for MCHARM suggests a more random distribution, albeit with a greater variance and some heteroscedasticity. Even so, using the mean absolute error as the criterion, RCHARM invariably outperforms MCHARM (Fujimoto 2019, table 2). The DSLQ is clearly in third place.Footnote 7

In another recent study, Pereira-López et al. (2020) focus on the AFLQ rather than the FLQ. They argue persuasively that regional specialization can have different effects on the columns (cost structure) and rows (selling structure) of a regional coefficient matrix. For instance, a region that is specialized in the extraction of mining products may sell most of its output to processing sectors such as metal industries located in other regions. The AFLQ has the limitation that it presumes that specialization only affects purchasing sectors (columns). The authors thus propose a bidimensional procedure, the 2DLQ, whereby a parameter α is used to adjust the rows, while another parameter β is applied to the columns. The regionalization employs the following relationships:

$$ \hat{a}_{ij}^{r} = \left\{ {\begin{array}{*{20}ll} {({\text{SLQ}}_{i} )^\alpha a_{ij}^{n} (x_{j}^{r} /x_{j}^{n} )\beta } & {{\text{if}}\;{\text{SLQ}}_{i} \le 1} \\ {[0.5 \tanh ({\text{SLQ}}_{i} - 1) + 1]^\alpha a_{ij}^{n} (x_{j}^{r} /x_{j}^{n} )\beta } & {{\text{if}}\;{\text{SLQ}}_{i} > 1} \\ \end{array} } \right., $$
(16)

where \((x_{j}^{r} /x_{j}^{n} )\) measures the relative size of purchasing sector j. The hyperbolic tangent function (tanh) allows the estimated regional coefficients to be ‘slightly higher’ than the corresponding national coefficients if \({\text{SLQ}}_{i} > 1\) (Pereira-López et al. 2020, p. 480). They make the interesting observation that the CILQ, FLQ and AFLQ are all nested within the 2DLQ: \({\text{FLQ}}_{ij} = 2{\text{DLQ}}_{ij}\) if \(\alpha = \ln (\lambda \,{\text{SLQ}}_{i} )/\ln {\text{SLQ}}_{i}\) and β = 1, \({\text{FLQ}}_{ij} = {\text{AFLQ}}_{ij}\) if \({\text{SLQ}}_{j} \le 1\), and \({\text{FLQ}}_{ij} = {\text{CILQ}}_{ij}\) if δ = 0.

Equation (16) has the same data requirements as the AFLQ but the authors claim that it makes more efficient use of such data and consequently yields more accurate results. To test this claim, they use the Eurostat IO database for 2005 to extract symmetric 59 × 59 domestic coefficient matrices for:

  1. Austria, Belgium, France, Germany, Italy and Spain (the ‘observed’ matrices);
  2. the Euro Area 17 (EA17) (the parent table).

Two estimated coefficient matrices are derived for each country by applying the 2DLQ and AFLQ formulae to the parent table. These estimates are then compared by using the weighted absolute percentage error (WAPE) and two other statistics, U and U*, which consider the number of cells (\(n^{2}\)) and the number of non-empty cells (\(n^{2} - z\)), respectively. WAPE is defined as follows:Footnote 8

$$ {\text{WAPE}} = {{100 \sum \limits_{i = 1}^{n} \sum \limits_{j = 1}^{n} \left| {\hat{a}_{ij}^{r} - a_{ij}^{r} } \right|} \mathord{\left/ {\vphantom {{100 \sum \limits_{i = 1}^{n} \sum \limits_{j = 1}^{n} \left| {\hat{a}_{ij}^{r} - a_{ij}^{r} } \right|} { \sum \limits_{i = 1}^{n} \sum \limits_{j = 1}^{n} a_{ij}^{r} }}} \right. \kern-\nulldelimiterspace} { \sum \limits_{i = 1}^{n} \sum \limits_{j = 1}^{n} a_{ij}^{r} }}. $$
(17)
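Equation (17) is straightforward to compute. The sketch below (plain Python; the function name and the example matrices are hypothetical) sums the absolute cell errors and divides by the sum of the observed coefficients. Equivalently, the WAPE is an average of the cell-by-cell absolute percentage errors weighted by each observed coefficient’s share of the total, so very small cells cannot dominate it as they can an unweighted percentage error. The sketch assumes that the observed coefficients have a non-zero sum.

```python
def wape(a_est, a_obs):
    """Weighted absolute percentage error, Eq. (17), in percent."""
    abs_err = sum(abs(e - o)
                  for est_row, obs_row in zip(a_est, a_obs)
                  for e, o in zip(est_row, obs_row))
    total_obs = sum(o for obs_row in a_obs for o in obs_row)
    return 100.0 * abs_err / total_obs


a_obs = [[0.1, 0.2], [0.3, 0.4]]
a_est = [[0.1, 0.25], [0.25, 0.4]]
score = wape(a_est, a_obs)  # about 10: absolute errors of 0.1 over a total of 1.0
```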

The results for coefficients show that using the 2DLQ method reduces the WAPE for all countries, most noticeably for Austria and Italy (Pereira-López et al. 2020, table 2). However, there is little change in the outcomes for Belgium, France and Germany. Spain is an intermediate case. On average, the WAPE is lowered by 4.5%. The U and U* measures yield similar results. The authors also assess the performance of the 2DLQ in terms of multipliers. Here the 2DLQ method gives better results than the AFLQ for all countries apart from Belgium. The average improvement is 3.55%.Footnote 9

Of the approaches examined in the two studies reviewed thus far, the 2DLQ method seems the most promising, when evaluated in terms of its theoretical foundations, empirical performance and ease of application. Even so, some caveats should be borne in mind. The first concerns the need to determine suitable values of α and β. Here the authors provide some reassurance that the range of suitable values of these parameters is relatively small and that analysts would not go far wrong by choosing an α of 0.1 or 0.15 and a β in the range 0.8 to 1.2 (Pereira-López et al. 2020, table 5).

The second caveat concerns the fact that countries rather than regions are used in the testing process, which poses some potential problems. While the authors are right to stress the quality and consistency of the Eurostat data they employ, it is not true to say that using countries instead of regions is the only possible way to proceed. For instance, suitable survey-based official regional and national data are available for Finland (Flegg and Tohmo 2013), South Korea (Jahn et al. 2020) and Japan (Fujimoto 2019). It would be instructive to re-examine the 2DLQ method by using one or more of such data sets to perform a sensitivity analysis.

The study by Lahr et al. (2020) represents a radical departure from those discussed hitherto. The authors examine data for 2014 from the World Input–Output Database, pertaining to 23 manufacturing sectors in 28 EU countries. A novel feature of this study is its use of quasi-binomial econometric models, in which the regional purchase coefficient (RPC) is regressed on various variables.Footnote 10 Each country is treated as a ‘region’ and the aggregate of all 28 countries as the ‘nation’. Accordingly, \({\text{RPC}}_{i}^{c}\) is the proportion of national requirements of industry i supplied by firms located within country c.

The following regressors were found to be statistically significant (p = 0.001, two-tailed test):

  • SLQ

  • ln (land area)

  • ln (hotel room-nights/area)

  • ln (weight/value)

  • supply/demand ratio (SDR).

In addition, six industry-specific binary variables were included (\(R^{2}\) = 0.660). By contrast, a model with the SLQ alone gave \(R^{2}\) = 0.142. This worse fit is unsurprising, since the SLQ rules out the key factor of cross-hauling and cannot allow for the peculiarities of specific industries.

Lahr et al. also carry out a test of what is described as the FLQ method. This is done by regressing \({\text{RPC}}_{i}^{c}\) on \({\text{SLQ}}_{i}^{c}\) and the ‘employment share’ (\(R^{2}\) = 0.195). However, this test is inconsistent in several respects with the FLQ approach. Most importantly, the FLQ has a cross-industry foundation, which cannot be captured in a rows-only estimation. Consequently, no account is taken of the likelihood that purchasing industries would differ in their use of particular inputs. This aspect is captured in \({\text{CILQ}}_{ij}^{c}\) and hence in \({\text{FLQ}}_{ij}^{c}\) but not in \({\text{SLQ}}_{i}^{c}\). We should note too that the FLQ formula is multiplicative, whereas the regression model is additive. It is also unclear how the regressor ‘employment share’ was measured.Footnote 11 Finally, whereas output was used to calculate \({\text{SLQ}}_{i}^{c}\), employment was used to measure regional size. That would affect the results to the extent that productivity differed across EU countries.

The authors compare the performance of their econometric approach with that of the SLQ, SDR and CHARM. The procedures are judged in terms of their ability to estimate RPCs and to replicate each country’s coefficient matrix, Leontief inverse and output multipliers. As expected, the SLQ and SDR yield similar results and, on average, both methods greatly overstate input coefficients (Lahr et al. 2020, table 5). By comparison, the regression-based approach necessarily yields a mean error of zero. The authors find that RCHARM performs ‘somewhat better’ than the SLQ and SDR, yet it still systematically overstates RPCs, with a mean error of 0.240 (Lahr et al. 2020, p. 1594). By contrast, MCHARM yields negative RPCs for 337 of 1568 national industries. Therefore, both variants of CHARM have serious demerits as a means of estimating RPCs.

A crucial consideration when selecting a non-survey method is its ability to yield unbiased estimates of input coefficients. A regression-based approach can be relied upon to perform very well in that respect but it is also possible to obtain unbiased estimates via the FLQ approach, so long as an appropriate value of the parameter δ is used.

A drawback of the RPC approach vis-à-vis the FLQ is its more demanding data requirements.Footnote 12 In the model discussed above, for instance, it would be challenging to find data for some of the regressors, whereas the FLQ only requires figures for output (or employment) in each regional and national industry.

On the other hand, the FLQ has often been criticized on the basis that the results obtained from one country or region are not necessarily transferable elsewhere, since the optimal δ would differ. This problem is addressed in the present paper via a procedure whereby country-specific and region-specific values of δ can be derived. Lahr et al. (2020, p. 1591) note that econometric approaches face a similar challenge in terms of transferability of results.

While the model constructed by Lahr et al. sheds some helpful light on the determinants of RPCs in EU countries, it would be interesting to see how well it would perform when constructing a RIOT for, say, Catalonia from a Spanish NIOT.

The CE approach to regionalizing NIOTs

Here, we explain our MCE approach to regionalizing NIOTs. Various mathematical programming methods based on a constrained optimization framework exist. These typically minimize a penalty function, which measures the deviation of the balanced matrix from the initial matrix, subject to a set of balancing conditions. We limit our attention to the CE method, which is one of the pioneering methods. It is widely used, and Lamonica et al. (2020) have shown that it performs very well when applied to countries with small economies. Moreover, in contrast to other constrained optimization methods, it is stable in the sense that the MAD index, i.e. the mean absolute difference between the observed and estimated input coefficients, does not change abruptly from one economy to the next. Furthermore, unlike the standard RAS method, which can handle only non-negative matrices, the CE method can easily be adapted to deal with matrices with negative entries. Although we could have used GRAS to deal with this issue, we wished to avoid the problem noted in the Introduction regarding this technique.

Referring to Lamonica et al. (2020) for details and treating the primary input vn and the final demand fn as ‘additional’ intermediate input and output, the method starts with the following augmented input coefficients matrix An:

$$ \mathbf{A}^{n} = \left[ \begin{array}{cccc} a_{1,1}^{n} = x_{1,1}^{n}/x_{1}^{n} & \cdots & a_{1,k}^{n} = x_{1,k}^{n}/x_{k}^{n} & a_{1,k+1}^{n} = f_{1}^{n}/x_{k+1}^{n} \\ \vdots & & \vdots & \vdots \\ a_{k,1}^{n} = x_{k,1}^{n}/x_{1}^{n} & \cdots & a_{k,k}^{n} = x_{k,k}^{n}/x_{k}^{n} & a_{k,k+1}^{n} = f_{k}^{n}/x_{k+1}^{n} \\ a_{k+1,1}^{n} = v_{1}^{n}/x_{1}^{n} & \cdots & a_{k+1,k}^{n} = v_{k}^{n}/x_{k}^{n} & a_{k+1,k+1}^{n} = 0 \end{array} \right]. $$
(18)

By construction, this augmented matrix satisfies:

$$ {\mathbf{A}}^{n} {\mathbf{x}}^{n} = {\mathbf{x}}^{n} , $$
(19)

and

$$ \mathbf{u}'\mathbf{A}^{n} = \mathbf{u}', $$
(20)

where u is a vector of ones and \(\mathbf{u}'\) is its transpose.

The task at hand is to generate a new matrix Ar from the existing An, with the same dimension as before, but respecting new row and column totals xr. More specifically, we seek a matrix Ar that satisfies the following consistency and additivity conditions:

$$ {\mathbf{A}}^{r} {\mathbf{x}}^{r} = {\mathbf{x}}^{r} , $$
(21)
$$ \mathbf{u}'\mathbf{A}^{r} = \mathbf{u}'. $$
(22)

Formally, the problem involves minimizing the following function for \(a_{ij}^{n}\) > 0:

$$ \mathop {{\text{Min}}}\limits_{{a_{ij}^{r} }} H = \sum \limits_{i = 1}^{k + 1} \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} \ln \frac{{a_{ij}^{r} }}{{a_{ij}^{n} }}; $$
(23)

subject to

$$ \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} = x_{i}^{r} , $$
(24)

and

$$ \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} = 1. $$
(25)

The solution is obtained from the following Lagrangian function for problem (23):

$$ L = \sum \limits_{i = 1}^{k + 1} \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} \ln \frac{{a_{ij}^{r} }}{{a_{ij}^{n} }} + \sum \limits_{i = 1}^{k + 1} \lambda_{i} \left( {x_{i}^{r} - \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} } \right) + \sum \limits_{j = 1}^{k + 1} \mu_{j} \left( {1 - \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} } \right). $$
(26)

The first-order conditions for an optimum are:

$$ \frac{\partial L}{{\partial a_{ij}^{r} }} = \ln \frac{{a_{ij}^{r} }}{{a_{ij}^{n} }} + 1 - \lambda_{i} x_{j}^{r} - \mu_{j} = 0,\;i,j = \, 1, \, 2, \, \ldots ,k + \, 1; $$
(27)
$$ \frac{\partial L}{{\partial \lambda_{i} }} = x_{i}^{r} - \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} = 0,\;i = 1, \, 2, \ldots ,k + 1; $$
(28)
$$ \frac{\partial L}{{\partial \mu_{j} }} = 1 - \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} = 0,\;j = 1,2, \ldots ,k + 1. $$
(29)

Solving this system of equations yields:

$$ a_{ij}^{r} = a_{ij}^{n} e^{{\left( { - 1 + \lambda_{i} x_{j}^{r} + \mu_{j} } \right)}} ,\;i,j = 1,2, \ldots ,k + 1; $$
(30)
$$ \sum \limits_{j = 1}^{k + 1} x_{j}^{r} a_{ij}^{n} e^{{\left( { - 1 + \lambda_{i} x_{j}^{r} + \mu_{j} } \right)}} = x_{i}^{r} ,\;i = 1,2, \ldots ,k + 1; $$
(31)
$$ \sum \limits_{i = 1}^{k + 1} a_{ij}^{n} e^{{\left( { - 1 + \lambda_{i} x_{j}^{r } + \mu_{j} } \right)}} = 1, \;j = 1,2, \ldots ,k + 1. $$
(32)

Note that the estimates of \(a_{ij}^{r}\) depend on the values of the Lagrange multipliers, which must be determined by solving the nonlinear system (31) and (32) for the unknowns λ1, λ2, …, λk+1 and μ1, μ2, …, μk+1. As no closed-form solution exists, this system must be solved numerically.
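For a small, non-negative augmented matrix, conditions (30) to (32) can be solved without a general-purpose solver. The Python sketch below is an illustration under stated assumptions, not the authors' MATLAB procedure: it swaps in dual block-coordinate ascent, updating each μj in closed form from Eq. (32) and each λi by bisection on Eq. (31), whose left-hand side increases with λi. It also rescales xr to sum to one; conditions (24) and (25) are homogeneous in xr, so the solution Ar is unchanged while the exponentials stay small. The function name, tolerances and toy numbers are hypothetical.

```python
import math


def cross_entropy_update(a_n, x_r, tol=1e-10, max_iter=10_000):
    """Sketch: solve Eqs. (23)-(25) for a non-negative augmented matrix a_n.

    Dual block-coordinate ascent (not the paper's fsolve route): mu from the
    column conditions (32) in closed form, each lambda_i from its row
    condition (31) by bisection.
    """
    n = len(a_n)
    total = sum(x_r)
    s = [v / total for v in x_r]  # rescaled totals; a^r is unchanged
    lam, mu = [0.0] * n, [0.0] * n

    for _ in range(max_iter):
        # Eq. (32): exp(mu_j) * sum_i a_ij^n exp(-1 + lambda_i s_j) = 1
        for j in range(n):
            mu[j] = -math.log(sum(a_n[i][j] * math.exp(-1.0 + lam[i] * s[j])
                                  for i in range(n)))
        # Eq. (31): sum_j s_j a_ij^n exp(-1 + lambda_i s_j + mu_j) = s_i,
        # increasing in lambda_i, so a bracketed bisection is safe
        for i in range(n):
            c = [s[j] * a_n[i][j] * math.exp(mu[j] - 1.0) for j in range(n)]

            def row_gap(t):
                return sum(c[j] * math.exp(t * s[j]) for j in range(n)) - s[i]

            lo, hi = -1.0, 1.0
            while row_gap(lo) > 0:
                lo *= 2.0
            while row_gap(hi) < 0:
                hi *= 2.0
            for _ in range(200):
                mid = 0.5 * (lo + hi)
                if row_gap(mid) > 0:
                    hi = mid
                else:
                    lo = mid
            lam[i] = 0.5 * (lo + hi)
        # Eq. (30) with the rescaled totals; rows now hold, so test the columns
        a_r = [[a_n[i][j] * math.exp(-1.0 + lam[i] * s[j] + mu[j])
                for j in range(n)] for i in range(n)]
        if max(abs(sum(a_r[i][j] for i in range(n)) - 1.0) for j in range(n)) < tol:
            return a_r
    raise RuntimeError("cross-entropy update did not converge")


a_n = [[0.1, 0.2, 70 / 130], [0.3, 0.1, 60 / 130], [0.6, 0.7, 0.0]]
a_r = cross_entropy_update(a_n, [50.0, 80.0, 70.0])
```

Here the toy An comes from a balanced two-sector table (outputs 100 and 100, total final demand 130), and the invented regional totals (50, 80, 70) admit a strictly positive solution with the same zero pattern. Because every coefficient is a multiple of \(a_{ij}^{n}\), the null entry \(a_{k+1,k+1}\) stays zero automatically.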

However, standard solvers typically fail to find a solution to Eqs. (31) and (32) when \(a_{ij}^{n}\) and \(x_{i}^{r}\), i, j = 1, 2, …, k + 1, are values from real IOTs. The solution of these equations requires particular attention for two main reasons: the evaluation of exponential functions with large arguments and the solution of a high-dimensional system of nonlinear equations. These difficulties are addressed by adopting the solution proposed by Lamonica et al. (2020). Since the KIRIOT contains null and negative entries, we adapt the CE method to account for negative entries in the NIOT.

To this end, we revise problem (23) by assuming that the sign of the national matrix is preserved in the regional matrix (i.e. \(a_{ij}^{n} < 0\) implies \(a_{ij}^{r} < 0,\) and \(a_{ij}^{n} > 0\) implies \(a_{ij}^{r} > 0\)), while a null entry in the national matrix implies a null entry in the regional matrix (i.e. \(a_{ij}^{n} = 0\) implies \(a_{ij}^{r} = 0\)). In the following, we refer to this assumption as the ‘sign-preserving assumption’. We therefore formulate the following optimization problem:

$$ \mathop{\text{Min}}\limits_{a_{ij}^{r}} H = \sum_{i=1}^{k+1} \sum_{j=1}^{k+1} |a_{ij}^{r}| \ln \frac{a_{ij}^{r}}{a_{ij}^{n}}; $$
(33)

subject to

$$ \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} = x_{i}^{r} , $$
(34)

and

$$ \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} = 1. $$
(35)

The solution is obtained from the following Lagrangian function for problem (33):

$$ L = \sum \limits_{i = 1}^{k + 1} \sum \limits_{j = 1}^{k + 1} |a_{ij}^{r} | \ln \frac{{a_{ij}^{r} }}{{a_{ij}^{n} }} + \sum \limits_{i = 1}^{k + 1} \lambda_{i} \left( {x_{i}^{r} - \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} } \right) + \sum \limits_{j = 1}^{k + 1} \mu_{j} \left( {1 - \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} } \right). $$
(36)

We consider the set of pairs \((i, j) \in \Gamma = \{1, 2, \ldots, k+1\} \times \{1, 2, \ldots, k+1\}\) and its subsets \(\Gamma^{+} = \{(i, j) \mid a_{ij}^{r} > 0\}\) and \(\Gamma^{-} = \{(i, j) \mid a_{ij}^{r} < 0\}\); under the sign-preserving assumption, the remaining (null) pairs stay at zero and contribute nothing to H. The Lagrangian can be rewritten as

$$ L = \sum_{(i,j) \in \Gamma^{+}} a_{ij}^{r} \ln \frac{a_{ij}^{r}}{a_{ij}^{n}} - \sum_{(i,j) \in \Gamma^{-}} a_{ij}^{r} \ln \frac{a_{ij}^{r}}{a_{ij}^{n}} + \sum_{i=1}^{k+1} \lambda_{i} \left( x_{i}^{r} - \sum_{j=1}^{k+1} a_{ij}^{r} x_{j}^{r} \right) + \sum_{j=1}^{k+1} \mu_{j} \left( 1 - \sum_{i=1}^{k+1} a_{ij}^{r} \right). $$
(37)

The first-order conditions for an optimum are:

$$ \frac{\partial L}{\partial a_{ij}^{r}} = \ln \frac{a_{ij}^{r}}{a_{ij}^{n}} + 1 - \lambda_{i} x_{j}^{r} - \mu_{j} = 0,\; (i,j) \in \Gamma^{+}; $$
(38)
$$ \frac{\partial L}{\partial a_{ij}^{r}} = -\ln \frac{a_{ij}^{r}}{a_{ij}^{n}} + 1 - \lambda_{i} x_{j}^{r} - \mu_{j} = 0,\; (i,j) \in \Gamma^{-}; $$
(39)
$$ \frac{\partial L}{{\partial \lambda_{i} }} = x_{i}^{r} - \sum \limits_{j = 1}^{k + 1} a_{ij}^{r} x_{j}^{r} = 0,\;i = \, 1,2, \ldots ,k + 1; $$
(40)
$$ \frac{\partial L}{{\partial \mu_{j} }} = 1 - \sum \limits_{i = 1}^{k + 1} a_{ij}^{r} = 0,\;j = 1,2, \ldots ,k + 1. $$
(41)

Solving this system of equations yields:

$$ a_{ij}^{r} = a_{ij}^{n} e^{{\left( { - 1 + \lambda_{i} x_{j}^{r } + \mu_{j} } \right)}} ,\;\left( {i,j} \right) \in \Gamma^{ + } ; $$
(42)
$$ a_{ij}^{r} = a_{ij}^{n} e^{{\left( { + 1 - \lambda_{i} x_{j}^{r} - \mu_{j} } \right)}} ,\,\left( {i,j} \right) \in \Gamma^{ - } ; $$
(43)
$$ \sum_{j:(i,j) \in \Gamma^{+}} x_{j}^{r} a_{ij}^{n} e^{-1 + \lambda_{i} x_{j}^{r} + \mu_{j}} + \sum_{j:(i,j) \in \Gamma^{-}} x_{j}^{r} a_{ij}^{n} e^{1 - \lambda_{i} x_{j}^{r} - \mu_{j}} = x_{i}^{r},\; i = 1, 2, \ldots, k+1; $$
(44)
$$ \sum_{i:(i,j) \in \Gamma^{+}} a_{ij}^{n} e^{-1 + \lambda_{i} x_{j}^{r} + \mu_{j}} + \sum_{i:(i,j) \in \Gamma^{-}} a_{ij}^{n} e^{1 - \lambda_{i} x_{j}^{r} - \mu_{j}} = 1,\; j = 1, 2, \ldots, k+1. $$
(45)

We solve Eqs. (44) and (45) with the Matlab solver fsolve, by using the scaling procedure as in Lamonica et al. (2020) to overcome the numerical challenges of these equations.
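With negative entries, Eqs. (42) and (43) map a given set of multipliers to sign-preserving coefficients, and Eqs. (44) and (45) become the residuals that a root finder must drive to zero. The Python sketch below only evaluates those residuals, following the equations as written above. It is the kind of function one might pass to a solver such as `scipy.optimize.fsolve`, with the unknowns stacked as (λ, μ), but it does not reproduce the scaling procedure of Lamonica et al. (2020); the function name and toy table are hypothetical.

```python
import math


def mce_residuals(a_n, x_r, lam, mu):
    """Residuals of Eqs. (44)-(45) for given multipliers lam and mu.

    Coefficients follow the sign-preserving forms (42)-(43); null entries of
    a_n stay null. Returns (residuals, estimated matrix).
    """
    n = len(a_n)

    def coeff(i, j):
        a = a_n[i][j]
        if a > 0:
            return a * math.exp(-1.0 + lam[i] * x_r[j] + mu[j])  # Eq. (42)
        if a < 0:
            return a * math.exp(1.0 - lam[i] * x_r[j] - mu[j])   # Eq. (43)
        return 0.0                                              # null entry kept

    a_r = [[coeff(i, j) for j in range(n)] for i in range(n)]
    row_res = [sum(a_r[i][j] * x_r[j] for j in range(n)) - x_r[i]
               for i in range(n)]                                # Eq. (44)
    col_res = [sum(a_r[i][j] for i in range(n)) - 1.0
               for j in range(n)]                                # Eq. (45)
    return row_res + col_res, a_r


# Toy balanced table with one negative flow (hypothetical numbers)
a_n = [[0.1, -0.05, 95 / 155], [0.3, 0.1, 60 / 155], [0.6, 0.95, 0.0]]
x = [100.0, 100.0, 155.0]
res, a_r = mce_residuals(a_n, x, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

With unchanged totals (xr = xn), the multipliers λ = 0 and μ = 1 reproduce An exactly and give zero residuals. Replacing λi by λi + c and μj by μj − c·xj leaves every coefficient unchanged, so one multiplier can be fixed in advance when setting up a solver.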

Estimation of δ and the FLQ+ approach

We now explain the method employed to estimate δ. We consider the more detailed version of the South Korean interregional input–output table (KIRIOT) for the year 2005, which was constructed by the Bank of Korea for 16 regions and 78 sectors.

Table 1 illustrates a simplified form of the KIRIOT, where:

  • Zr;r (r = 1, 2, …, 16) is a 78 × 78 matrix whose elements \(\left( {z_{ij}^{r;r} } \right)\) are the flows for intermediate use from sector i to sector j of region r;

  • Zr;k (r, k = 1, 2, …, 16 and r ≠ k) is a 78 × 78 matrix whose elements \(\left( {z_{ij}^{r;k} } \right)\) are the exports for intermediate use from sector i of region r to sector j of region k;

  • Cr;r (r = 1, 2, …, 16) is a 78 × 5 matrix of the domestic final demand in region r;

  • Er;k (r, k = 1, 2, …, 16 and r ≠ k) is a 78 × 5 matrix of the exports of region r for final demand in region k;

  • xr (r = 1, 2, …, 16) is a 78 × 1 vector whose elements are the sectoral output of region r;

  • (vr)′ (r = 1, 2, …, 16) is a 1 × 78 vector whose elements are the sectoral value added plus the primary sectoral input of region r, inclusive of imports from abroad.

Table 1 Simplified pattern of the KIRIOT

Using the South Korean NIOT, the national augmented matrix of input coefficients, \({\mathbf{A}}^{n} = [a_{ij}^{n} ]\), was determined as follows. First, let:

  • \({\mathbf{X}}^{n} = \sum_{r=1}^{16} \sum_{k=1}^{16} {\mathbf{Z}}^{r;k} ;\)

  • \({\mathbf{f}}^{n} = \sum_{r=1}^{16} {\mathbf{C}}^{r;r} + \sum_{r=1}^{16} \sum_{k \ne r} {\mathbf{E}}^{r;k} ;\)

  • \({\mathbf{x}}^{n } = \sum \limits_{r = 1}^{16} {\mathbf{x}}^{r} ;\)

  • \(\left( {{\mathbf{v}}^{n} } \right)^{\prime } = \sum \limits_{r = 1}^{16} \left( {{\mathbf{v}}^{r} } \right)^{\prime } .\)

Next, consider the following national table, IOTn, in block-matrix notation:

$$ {\mathbf{IOT}}^{n} = \left[ {\begin{array}{*{20}ll} {{\mathbf{X}}^{n} } & {{\mathbf{f}}^{n} } \\ {\left( {{\mathbf{v}}^{n} } \right)^{\prime } } & 0 \\ \end{array} } \right]. $$
(46)

Hence, we can determine the following national augmented matrix of input coefficients:

$$ {\mathbf{A}}^{n} = {\mathbf{IOT}}^{n} {\mathbf{D}}({\mathbf{x}}^{n} )^{ - 1} , $$
(47)

where D(xn) is the diagonal matrix whose elements are \(d_{ii}(\mathbf{x}^{n}) = x_{i}^{n}\), i = 1, 2, …, k + 1, with \(x_{k+1}^{n}\) equal to total final demand.
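A two-sector toy table shows how Eqs. (46) and (47) produce an augmented matrix that satisfies Eqs. (19) and (20). The Python sketch below is illustrative only (the authors work in MATLAB); the function name and all figures are hypothetical, and the table is assumed to be balanced, so that total final demand equals total primary input and can serve as \(x_{k+1}^{n}\).

```python
def augmented_coefficients(z, f, v):
    """Augmented coefficient matrix of Eq. (18), built as in Eqs. (46)-(47).

    z: k x k intermediate flows; f: final demand (length k);
    v: primary inputs (length k). Returns (A, x) with x of length k + 1.
    """
    k = len(z)
    x = [sum(z[i]) + f[i] for i in range(k)]  # row totals = sectoral outputs
    x.append(sum(f))                          # x_{k+1}: total final demand
    # Eq. (46): the block matrix [[Z, f], [v', 0]]
    iot = [list(z[i]) + [f[i]] for i in range(k)] + [list(v) + [0.0]]
    # Eq. (47): post-multiplying by D(x)^{-1} divides column j by x_j
    a = [[iot[i][j] / x[j] for j in range(k + 1)] for i in range(k + 1)]
    return a, x


z = [[10.0, 20.0], [30.0, 10.0]]
f = [70.0, 60.0]
v = [60.0, 70.0]
a, x = augmented_coefficients(z, f, v)
# Eq. (19): A x = x; Eq. (20): every column of A sums to one
print([sum(a[i][j] * x[j] for j in range(3)) for i in range(3)])
print([sum(a[i][j] for i in range(3)) for j in range(3)])
```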

We are now able to define the three steps of our hybrid method:

  • Step 1: From An and the regional sectoral total output xr, estimate the regional input coefficient matrices \({\hat{\mathbf{A}}}^{r}\) (r = 1, 2, …, 16), using the MCE method described in the previous section.

  • Step 2: Using \({\hat{\mathbf{A}}}^{r}\), estimate the parameter δ in the FLQ formula as follows:

    Let αij = CILQij if i ≠ j and αij = SLQi if i = j. Also, define β ≡ \(\log_{2} \left( {1 + \frac{{x^{r} }}{{x^{n} }}} \right)\). Hence, using Eqs. (7) and (8), we get:

    $$ \hat{a}_{ij}^{r} = a_{ij}^{n} \alpha_{ij} \beta^{\delta } . $$
    (48)

    Thus, for any pair \((\hat{a}_{ij}^{r}, a_{ij}^{n})\) that satisfies the sign-preserving assumption, we set:

    $$ \log \left( \frac{\hat{a}_{ij}^{r}}{a_{ij}^{n} \alpha_{ij}} \right) = \delta \log(\beta) + \varepsilon_{ij}, $$
    (49)

    where \(\varepsilon_{ij}\) is a random error term with zero mean. Taking expectations, we can obtain an estimate of the optimal δ for each region simply by dividing the mean of the regressand by \(\log(\beta)\), which is a given value for each region. See Table 2.

    Table 2 Computation of δ from model (49).
  • Step 3. Using An, estimate the entries of the regional matrix Ar, the \(a_{ij}^{r}\), using the FLQ method with the estimated optimal parameter \(\hat{\delta }\). We refer to this hybrid approach as the FLQ+ method. As before, the estimated input coefficients are denoted by \(\hat{a}_{ij}^{r} .\)
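Step 2 reduces to a short calculation once \(\hat{\mathbf{A}}^{r}\) is available. The Python sketch below computes αij from sectoral outputs (SLQi on the diagonal and the standard cross-industry quotient CILQij = SLQi/SLQj elsewhere) and then δ from Eq. (49), keeping only pairs of like sign and skipping null entries, whose logarithm is undefined. It is a hedged illustration applied to the k × k intermediate block; the function names and numbers are hypothetical.

```python
import math


def location_quotients(x_r, x_n):
    """alpha_ij of Step 2: SLQ_i on the diagonal, CILQ_ij = SLQ_i / SLQ_j elsewhere."""
    tot_r, tot_n = sum(x_r), sum(x_n)
    slq = [(xr / tot_r) / (xn / tot_n) for xr, xn in zip(x_r, x_n)]
    k = len(slq)
    return [[slq[i] if i == j else slq[i] / slq[j] for j in range(k)]
            for i in range(k)]


def estimate_delta(a_hat, a_n, alpha, beta):
    """Eq. (49): mean of log(a_hat / (a_n * alpha)) divided by log(beta).

    Only pairs of like sign enter (sign-preserving assumption); null
    entries are skipped because their logarithm is undefined.
    """
    k = len(a_n)
    logs = [math.log(a_hat[i][j] / (a_n[i][j] * alpha[i][j]))
            for i in range(k) for j in range(k)
            if a_hat[i][j] * a_n[i][j] > 0]
    return (sum(logs) / len(logs)) / math.log(beta)


x_r = [20.0, 30.0, 10.0]     # hypothetical regional sectoral outputs
x_n = [100.0, 100.0, 200.0]  # hypothetical national sectoral outputs
alpha = location_quotients(x_r, x_n)
beta = math.log2(1 + sum(x_r) / sum(x_n))
a_n = [[0.2, -0.05, 0.0], [0.1, 0.3, 0.02], [0.0, 0.04, 0.15]]
# Stand-in for the MCE estimates: Eq. (48) with delta = 0.3
a_hat = [[a * al * beta ** 0.3 for a, al in zip(rn, ra)]
         for rn, ra in zip(a_n, alpha)]
print(estimate_delta(a_hat, a_n, alpha, beta))
```

Because the stand-in estimates are generated from Eq. (48) with δ = 0.3, the procedure recovers 0.3 up to rounding; with genuine MCE output, the regressand would scatter around δ log(β).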

Analysis of input coefficients

To validate the proposed FLQ+ method, the estimated matrices of regional input coefficients are compared with the true ones, \(\mathbf{A}^{r} = [a_{ij}^{r} = z_{ij}^{r;r}/x_{j}^{r}]\). Since the RIOT being considered contains negative and null flows, we assess the performance of each method by using the mean absolute difference (MAD) index, as suggested by Wiebe and Lenzen (2016). It is defined as:

$$ {\text{MAD }} = \sum \limits_{i = 1}^{78} \sum \limits_{j = 1}^{78} \frac{{\left| {a_{ij} - \hat{a}_{ij} } \right|}}{78 \times 78}. $$
(50)

The FLQ, as implemented here, is the optimal FLQ in the sense that δ is chosen so as to minimize the MAD. Specifically, for each of the sixteen regions, we compute the MAD of the FLQ for 99 values of δ in the interval (0, 1), in increments of 0.01. The optimal δ is the value yielding the minimum MAD. Determined in this way, the regional δ values vary from 0.22 to 0.48, with a mean of 0.35 (see Table 3). We refer to these values as the true or optimal values of δ. The characteristics of these regions are illustrated in Table 3 and their locations are shown in Fig. 1. For more details, see Flegg and Tohmo (2019) and Jahn et al. (2020).
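The optimal-δ benchmark is a one-dimensional grid search on Eq. (50). The Python sketch below assumes the usual FLQ convention that the adjustment \(\alpha_{ij}\beta^{\delta}\) is applied only when it is below one (Eqs. (7) and (8) are not reproduced in this section) and uses the grid points 0.01 to 0.99 in steps of 0.01. Function names and data are hypothetical.

```python
def mad(a_true, a_est):
    """Eq. (50): mean absolute difference over all k x k coefficients."""
    k = len(a_true)
    return sum(abs(a_true[i][j] - a_est[i][j])
               for i in range(k) for j in range(k)) / (k * k)


def flq_estimate(a_n, alpha, beta, delta):
    """FLQ coefficients, assuming the usual cap: adjust only where alpha * beta**delta < 1."""
    lam_star = beta ** delta
    return [[a * min(al * lam_star, 1.0) for a, al in zip(row_n, row_al)]
            for row_n, row_al in zip(a_n, alpha)]


def optimal_delta(a_true, a_n, alpha, beta):
    """Grid search over delta = 0.01, 0.02, ..., 0.99 for the smallest MAD."""
    grid = [step / 100 for step in range(1, 100)]
    return min(grid, key=lambda d: mad(a_true, flq_estimate(a_n, alpha, beta, d)))


a_n = [[0.2, 0.1], [0.05, 0.3]]
alpha = [[1.3, 0.6], [1.5, 0.9]]
beta = 0.1375  # roughly log2(1.1): a region producing 10% of national output
a_true = flq_estimate(a_n, alpha, beta, 0.3)  # stand-in for a survey-based matrix
print(optimal_delta(a_true, a_n, alpha, beta))  # prints 0.3
```

In practice a_true would be the survey-based regional matrix, which is exactly why the optimal FLQ is a benchmark rather than a usable non-survey method.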

Table 3 Characteristics of South Korean regions in 2005.
Fig. 1 South Korean regions. Source: South Korea regions map merged.png; author: Peter Fitzgerald, NordNordWest; licensed under the Creative Commons Attribution-Share Alike 3.0 Unported licence; available in Wikimedia Commons

Before examining the results, it may be helpful to explore why the optimal values of δ in Table 3 vary noticeably across regions. The role of δ is to adjust for any differences in regional propensities to import from other regions or from abroad that cannot be explained solely by differences in regional size. Now consider the following regression model fitted using data from Table 3:

$$ \ln \delta = - 3.154 + 0.582\ln R + 1.040\ln F + 2.515\ln V + e, $$
(51)

where R is the share of gross output imported from other regions; F is the share imported from abroad; V is the share of value added in gross output; and e is a residual. All regressors are statistically significant at the 1% level (one-tailed tests), and the model easily passes all \(\chi^{2}\) diagnostic tests; \(R^{2} = 0.677\).

Equation (51) shows that regions with an above-average share of imports from other regions or from abroad require a bigger δ to compensate; and likewise if the share of value added in gross output is above average. Just over two thirds of the interregional variation in ln δ can be explained by these three factors, while almost one third must be ascribed to region-specific factors, differences in regional industrial structure, measurement errors, etc.

Table 4 displays the values of MAD (×100) for the MCE, FLQ with optimal δ, and FLQ+ methods. A key finding is that the MADs for the FLQ (both versions) are typically less than half of those for the MCE method. This result demonstrates the potential gains from employing the FLQ as a regionalization technique rather than applying the MCE method alone.

Table 4 MAD (×100) and δ values by region for the MCE, optimal FLQ and FLQ+ methods.

Another key finding is that the estimates of δ from the proposed FLQ+ method are mostly fairly close to the optimal values; the differences appear only in the second decimal place, ranging from −0.06 to 0.09. It is noticeable that the FLQ+ overestimates δ in the two biggest regions, Gyeonggi and Seoul. In the majority of regions, however, the estimates from this method are a little lower than the optimal values and, on average, δ is understated by 0.02. When the results are weighted by the regional share of national output, though, this outcome changes to a small overstatement on average. The biggest errors occur in the two smallest regions, Daejeon and Jeju. Table 3 reveals that the metropolitan city of Daejeon has an unusually low intraregional share of inputs, which could explain why the estimated δ is noticeably below the optimal value, with a discrepancy of 0.09. As for Jeju, its status as a remote island can probably explain much of the understatement of δ by the FLQ+ method.

Table 5 provides some descriptive statistics of regional input coefficients estimated by the optimal FLQ and FLQ+ methods. Clearly, the two distributions are very close to each other in terms of central tendency and dispersion. Furthermore, they have an identical shape, as measured by the coefficients of kurtosis and skewness (not shown). The uniform behaviour of the two methods lends credence to the FLQ+ method proposed in this work, which is applicable when regional sectoral outputs are the only regional data available. By contrast, the optimal FLQ method requires knowledge of the entire regional matrix, so that an optimal value of δ can be computed for each region.

Table 5 Descriptive indices (×100) for the estimated input coefficients from the optimal FLQ and FLQ+ methods

Table 6 displays descriptive statistics for the absolute differences between the observed input coefficients and those estimated via the FLQ+ and GRAS+ methods. FLQ+ combines the MCE method, our preferred matrix-balancing approach, with the FLQ, whereas GRAS+ combines GRAS with the FLQ. The two methods produce similar results on the whole, although it is interesting that GRAS+ gives larger mean absolute differences than FLQ+ for the eight biggest regions, yet almost identical values for five of the six smallest regions. A possible explanation of this outcome is that the FLQ+ is apt to perform better when the percentage of zero flows is relatively low, as is the case for the larger regions. The two methods concur that Busan and Gangwon have, respectively, the smallest and biggest maximum absolute differences (note 13). Taking a holistic view, using GRAS+ rather than FLQ+ increases the MAD by 7.3% on average and by 12.3% when the results are weighted by regional size. This finding, along with the theoretical criticisms noted earlier regarding the objective function employed in GRAS, supports our use of a modified CE method as the foundation of our proposed FLQ+ approach.

Table 6 Statistics of the absolute differences (×100) between the observed and estimated input coefficients from the FLQ+ and GRAS+ methods

Analysis of the important input coefficients

Here we analyse the ability of the FLQ+ and GRAS+ methods to reproduce what Hewings and Romanos (1981) call ‘inverse important coefficients’, whereby an input coefficient aij is said to be inverse important if an error of α% in that coefficient produces an error of at least β% in one or more entries of the Leontief inverse.

In this analysis, α and β were set equal to 30% and 20%, respectively, as suggested by Hewings and Romanos. This means that an input coefficient is held to be inverse important if a perturbation of 30% generates a change of at least 20% in one or more entries in the Leontief inverse. More formally, we assume that only one coefficient, \(a_{ij}\), of the matrix A is perturbed. Let \(a_{ij}^{p}\) denote this perturbed value of \(a_{ij}\), i.e. \(a_{ij}^{p} = a_{ij}(1 + \alpha/100)\). This shock generates a perturbed matrix \(\mathbf{L}^{p} = [L_{ks}^{p}]\) of the Leontief inverse \(\mathbf{L} = (\mathbf{I} - \mathbf{A})^{-1} = [L_{ks}]\), k, s = 1, 2, …, 78, whose elements are determined as follows:

$$ L_{ks}^{p} = L_{ks} + \frac{L_{ki} L_{js} a_{ij} (\alpha/100)}{1 - L_{ji} a_{ij} (\alpha/100)}, $$
(52)

where the indices i and j relate to the perturbed coefficient in A, while k and s pertain to the Leontief inverse L and the corresponding perturbed matrix Lp.

The relative error of each coefficient of the Leontief inverse is therefore

$$ \frac{L_{ks}^{p} - L_{ks}}{L_{ks}} = \frac{L_{ki} L_{js} a_{ij} (\alpha/100)}{L_{ks} [1 - L_{ji} a_{ij} (\alpha/100)]}. $$
(53)

We then say that the coefficient \(a_{ij}\) is inverse important if there exists at least one pair (k, s) such that the following inequality holds:

$$ \left| \frac{L_{ki} L_{js} a_{ij} (\alpha/100)}{L_{ks} [1 - L_{ji} a_{ij} (\alpha/100)]} \right| \ge \beta/100. $$
(54)
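Eq. (52) is the Sherman–Morrison rank-one update of \((\mathbf{I} - \mathbf{A})^{-1}\), so the importance test needs only one Leontief inverse. The Python sketch below (hypothetical function names, toy 2 × 2 matrix) applies inequality (54) with α = 30 and β = 20 and checks the update against direct re-inversion; its plain Gauss–Jordan inverse is adequate for illustration but is not tuned for 78 × 78 tables.

```python
def leontief_inverse(a):
    """(I - A)^{-1} by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    # Work on the augmented matrix [I - A | I]
    m = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [val / p for val in m[col]]
        for r in range(k):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def perturbed_inverse(linv, a, i, j, alpha):
    """Eq. (52): L^p after a_ij -> a_ij (1 + alpha/100), via Sherman-Morrison."""
    d = a[i][j] * alpha / 100.0
    denom = 1.0 - linv[j][i] * d
    k = len(linv)
    return [[linv[r][s] + linv[r][i] * linv[j][s] * d / denom for s in range(k)]
            for r in range(k)]


def is_inverse_important(linv, a, i, j, alpha=30.0, beta=20.0):
    """Inequality (54): some nonzero entry of L moves by at least beta per cent."""
    lp = perturbed_inverse(linv, a, i, j, alpha)
    k = len(linv)
    return any(abs((lp[r][s] - linv[r][s]) / linv[r][s]) >= beta / 100.0
               for r in range(k) for s in range(k) if linv[r][s] != 0.0)


a = [[0.2, 0.3], [0.4, 0.1]]
linv = leontief_inverse(a)
print(is_inverse_important(linv, a, 0, 1), is_inverse_important(linv, a, 1, 1))  # prints: True False
```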

Table 7 reveals a sharper distinction between the two methods, with the superiority of FLQ+ over GRAS+ now much more apparent, especially when the results are weighted by regional size. The MAD is now raised by 11.5% on average by using GRAS+ rather than FLQ+ and by 18.2% when regional weights are applied. It is clear that GRAS+ yields relatively poor estimates of the most important input coefficients. It is also apparent that the absolute differences increase when the focus is placed on the most important coefficients. This outcome is to be expected, given the increased size of the coefficients being estimated. Of the \(78^{2} = 6084\) coefficients, those deemed to be important ranged from 1651 (27.1%) for Jeju to 2147 (35.3%) for Daejeon, with a median of 1869 (30.7%).

Table 7 Statistics of the absolute differences (×100) between the observed and estimated ‘inverse important’ input coefficients from the FLQ+ and GRAS+ methods

Comparison of alternative methods

The analysis thus far has demonstrated the superiority of the proposed FLQ+ approach over the MCE and GRAS+ methods. However, what has not yet been explored is whether the FLQ can be expected to yield more accurate estimates of input coefficients than more straightforward methods such as the SLQ. For the purposes of this comparison, it is assumed that the analyst does not wish to employ matrix-balancing methods. It is further assumed that the choice is between the SLQ and the FLQ with a fixed value of δ = 0.3 for all regions under consideration.

As noted in Sect. 2, the SLQ has the theoretical drawback that it precludes cross-hauling and hence tends to understate a region’s imports from other regions. It is unsurprising, therefore, to observe in Table 8 that the SLQ generates less accurate results than the FLQ for all regions apart from Seoul and (marginally) South Jeolla. On average, the MAD is raised by 25.8% by using the SLQ rather than the FLQ and by 20.7% if regional weights are applied. In addition, the MAD for the SLQ exhibits greater interregional variation. The superior performance of the SLQ in Seoul is probably due to the relatively high variation in the size of the input coefficients in this region. In such situations, the SLQ is apt to produce more accurate estimates (Lamonica and Chelli 2017).

Table 8 Statistics of the absolute differences (×100) between the observed and estimated input coefficients from the FLQ (with a fixed δ = 0.3) and SLQ

Given the uncertainty regarding the appropriate value of δ, it is reassuring that, as Table 9 shows, the outcomes from the FLQ are little affected by variation in the value of this parameter in the range 0.2 to 0.4. The SLQ still outperforms the FLQ in Seoul and South Jeolla but yields inferior results in the other fourteen regions.

Table 9 Statistics illustrating how the absolute differences (×100) between the observed and estimated input coefficients vary with changes in the assumed value of δ

Analysis of multipliers

One of the most useful features of RIOTs is the fact that they yield estimates of sectoral output multipliers, so Tables 10 and 11 examine the relative performance in this regard of eight alternative procedures. The following formulae are used in this evaluation:

$$ {\text{MAPE}} = \frac{100}{{78}} \sum \limits_{j = 1}^{78} \frac{{\left| {\widehat{{L_{j} }} - L_{j} } \right|}}{{L_{j} }}; $$
(55)
$$ {\text{MPE}} = \frac{100}{{78}} \sum \limits_{j = 1}^{78} \frac{{\widehat{{L_{j} }} - L_{j} }}{{L_{j} }}, $$
(56)

where \(\widehat{{L_{j} }}\) and \(L_{j}\) denote the estimated and observed column totals of the Leontief inverse matrices.
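Eqs. (55) and (56) translate directly into code once the multipliers are taken as column totals of the Leontief inverse. The Python sketch below uses the number of sectors in place of 78 so that it runs on toy data; the names and numbers are hypothetical.

```python
def output_multipliers(linv):
    """Type I output multipliers: column totals of the Leontief inverse."""
    return [sum(row[j] for row in linv) for j in range(len(linv))]


def mape(est, obs):
    """Eq. (55), with len(obs) sectors in place of 78."""
    return 100.0 / len(obs) * sum(abs(e - o) / o for e, o in zip(est, obs))


def mpe(est, obs):
    """Eq. (56): signed, so systematic under- or overstatement shows up."""
    return 100.0 / len(obs) * sum((e - o) / o for e, o in zip(est, obs))


observed = output_multipliers([[1.5, 0.5], [0.6, 1.3]])
estimated = [1.9, 1.7]  # hypothetical estimates, both below the observed values
print(mape(estimated, observed), mpe(estimated, observed))
```

When every multiplier is understated, the MPE is exactly the negative of the MAPE, which is the pattern reported below for FLQ+ and GRAS+.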

Table 10 Statistics of the mean absolute percentage differences (MAPEs) between the estimated and observed type I output multipliers from alternative approaches
Table 11 Statistics of the mean percentage differences (MPEs) between the estimated and observed type I output multipliers from alternative approaches

Table 10 displays the outcomes in terms of MAPEs when estimating type I output multipliers for the sixteen regions. The first two columns illustrate the benefits of pursuing the FLQ+ approach rather than the MCE method alone. In particular, there is a sharp fall in both unweighted and weighted means, along with a large decrease in dispersion. It is evident that the MCE method by itself does not yield satisfactory estimates of multipliers.

Although the FLQ+ and FLQ with δ = 0.3 have similar means, there are marked differences in the MAPEs for several regions. Busan and North Chungcheong are exceptions where the two methods yield identical results; this is because δ = 0.3 happens to be the estimated value from the FLQ+.

By itself, GRAS generates highly inaccurate results, yet combining it with the FLQ in the form of GRAS+ is clearly helpful in producing a more acceptable set of results. This better outcome is due to the fact that, when estimating δ in the GRAS+ procedure, we consider only those pairs \((\hat{a}_{ij}^{r} , a_{ij}^{n} )\) satisfying the ‘sign-preserving assumption’, whereas GRAS accounts for all possible pairs. Nonetheless, GRAS+ still yields substantially worse results than FLQ+.

It is interesting that the AFLQ has the lowest weighted mean of all eight approaches, whereas FLQ+ is the best in terms of the unweighted mean. This outcome is attributable to the superior performance of the AFLQ in several larger regions, most noticeably in Seoul. Nonetheless, it is odd that AFLQ+ often generates worse results than the AFLQ, especially in South Jeolla and Jeju. As regards the remaining approach, the SLQ, Table 10 reveals that its performance is clearly inferior to that of the FLQ and AFLQ.

While the MAPE index is a good way of assessing the relative accuracy of alternative methods, it tells us nothing about the direction or extent of bias. For that reason, Table 11 displays outcomes for actual rather than absolute percentage deviations.

A key finding from Table 11 is that the FLQ+ and GRAS+ procedures invariably generate underestimates of multipliers. Taking regional size into account, the FLQ+ understates these multipliers by 2.6% on average, whereas GRAS+ does so by 7.2%. Apart from the sign, the results from the FLQ+ and GRAS+ methods are identical in the two tables: since every multiplier is understated, each MPE is simply the negative of the corresponding MAPE. By contrast, the SLQ produces a positive bias of 6.8% on average. Seoul is the only region where this bias is negative.

Another important outcome from Table 11 is that the AFLQ shows minimal bias on average. However, the estimated multipliers from the AFLQ display greater dispersion than those from FLQ+, which would tend to boost the MAPEs in Table 10. Even so, the AFLQ still emerges with a weighted mean MAPE that is marginally lower than that for FLQ+. What is more, of the eight procedures examined in Table 10, the AFLQ produces the best results for the problematic Seoul region.

Conclusion

This paper has proposed a new approach to the regionalization of national input–output tables where very limited regional data exist and analysts are considering employing methods based on location quotients. We focus on the FLQ, which often yields the most accurate results of such methods. The FLQ formula involves an unknown parameter, δ, which plays a crucial role in the regionalization process. However, the difficulty of selecting an appropriate value of δ has been an obstacle to the successful application of the FLQ approach. Hitherto, analysts have had to choose a value of δ on the basis of a priori considerations and the results of previous case studies, which often conflict. Our aim has been to develop an enhanced and more objective way of implementing the FLQ approach.

Our proposed procedure, the FLQ+ method, is a hybrid approach that uses the results from a modified cross-entropy (MCE) method and a simple regression model to estimate δ. This value, which is specific to a given country and region, is used to estimate regional input coefficients and hence multipliers via the FLQ method. Our empirical analysis, based on a rare survey-based South Korean interregional input–output table for 2005 with 16 regions and 78 sectors, gave credence to the proposed method, in that the FLQ+ behaved much like the FLQ with an optimal δ.

The FLQ+ approach generated substantially more accurate estimates of both input coefficients and sectoral output multipliers than those from the MCE approach alone. Moreover, the MCE method clearly outperformed GRAS. In further testing of the FLQ+ approach, we considered the simple LQ (SLQ), the augmented FLQ (AFLQ) and the FLQ with an assumed value of δ. The SLQ gave outcomes inferior to those of the FLQ+, so we would not recommend using this method.

The AFLQ and FLQ+ gave almost identical results when estimating input coefficients. By contrast, in terms of multipliers, the AFLQ gave some good outcomes, characterized by minimal bias, whereas the FLQ+ tended to understate the multipliers. Indeed, the AFLQ generated the most accurate estimates of multipliers for the problematic Seoul region.

Using the FLQ with an assumed value of δ has the merit of ease of application, yet it runs the risk of choosing an inappropriate value. To explore this issue, we used values of δ in the interval 0.3 ± 0.1 to estimate input coefficients. This range is suggested by Jahn et al. (2020). We found that the mean absolute differences were not much affected by variation in δ. However, an analysis of multipliers revealed marked differences in the mean absolute percentage errors for several regions, despite the fact that the FLQ+ and FLQ with δ = 0.3 had similar means. Therefore, we would recommend using the FLQ+ approach, which takes region-specific characteristics into account, along with any country-specific differences.

To conclude, we should note some avenues for further research on this topic. The first would be to re-examine the effectiveness of the AFLQ, in the light of its good performance in estimating multipliers. It would also be interesting to see how well the 2DLQ refinement of the AFLQ proposed by Pereira-López et al. (2020) performs when tested using a national data set such as that for South Korea. Secondly, it would be useful to investigate why the FLQ+ tended to underestimate output multipliers. This outcome may well be because the optimal values of δ were derived by minimizing the mean absolute difference. An alternative might be to employ Theil’s inequality coefficient, which captures both bias and variance (Stevens et al. 1989). Thirdly, it would be worthwhile to consider how superior data might be used in practical applications of the FLQ+ approach. Finally, it would be interesting to examine the possible benefits of incorporating the theoretical restrictions regarding the value of δ into the constraint set of the MCE model.
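Theil's inequality coefficient comes in more than one variant. The sketch below assumes the bounded U1 form, which lies between 0 (a perfect fit) and 1, and pairs it with the standard split of the mean squared error (MSE) into bias, variance and covariance terms of the kind associated with Stevens et al. (1989). The function names and multiplier values are hypothetical; population moments (ddof = 0) are used so that the three terms add up exactly to the MSE.

```python
import numpy as np


def theil_u(pred, actual):
    """Theil's inequality coefficient in the bounded U1 form (0 = perfect fit)."""
    p, a = np.asarray(pred, float), np.asarray(actual, float)
    rmse = np.sqrt(np.mean((p - a) ** 2))
    return float(rmse / (np.sqrt(np.mean(p ** 2)) + np.sqrt(np.mean(a ** 2))))


def mse_components(pred, actual):
    """Split the MSE into bias, variance and covariance terms (population moments)."""
    p, a = np.asarray(pred, float), np.asarray(actual, float)
    sp, sa = p.std(), a.std()                  # ddof=0, so the parts sum exactly to the MSE
    r = np.corrcoef(p, a)[0, 1]
    return {
        "bias": (p.mean() - a.mean()) ** 2,
        "variance": (sp - sa) ** 2,
        "covariance": 2.0 * (1.0 - r) * sp * sa,
    }


# Hypothetical multipliers: a uniform understatement shows up purely as bias.
survey = np.array([1.8, 2.1, 1.5, 2.4])
estimated = survey - 0.2
parts = mse_components(estimated, survey)
```

A uniform understatement of multipliers appears entirely in the bias term of this split, something a single MAD figure cannot separate from random scatter.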

Availability of data and materials

The basic data were downloaded from the Bank of Korea’s website, bok.or.kr.

Notes

  1.

    Consider a region where \(\text{SLQ}_{1} = 0.8\), \(\text{SLQ}_{2} = 1.2\), \(\text{SLQ}_{3} = 0.6\) and \(\text{SLQ}_{4} = 1.5\), so that \(\text{CILQ}_{1,1} = 1\), \(\text{CILQ}_{1,2} = 0.\dot{6}\), \(\text{CILQ}_{1,3} = 1.\dot{3}\), \(\text{CILQ}_{1,4} = 0.5\dot{3}\), etc. For the SLQ to be valid, this region would need to be an importer but not an exporter of commodities 1 and 3, and vice versa for commodities 2 and 4. The CILQ would encompass a wider set of possibilities. For instance, industries 2 and 4 could import but not export commodity 1, yet this commodity could be exported but not imported by industry 3; consequently, cross-hauling of commodity 1 could occur. In contrast, only exporting of commodity 4 would be possible because \(\text{CILQ}_{4j} \ge 1\) for all \(j\).

  2.

    Based on an analysis of data for all Chinese provinces, Okamoto (2014) casts some doubt on this hypothesis. However, the sheer size and diversity of China make it difficult to apply non-survey methods successfully. Furthermore, the intermediate inputs in Chinese regional tables include imports, whereas the FLQ requires such imports to be excluded. Okamoto attempted to circumvent this problem by calculating regional self-sufficiency rates, based on the assumption that imports were determined by regional demand. While this assumption is reasonable, its use is bound to introduce some inaccuracy.

  3.

    Lampiris et al. (2020) use Eurostat data for 2010 and 2014 to assess the performance of various pure LQ-based formulae in a sample of 18 countries. For this data set, the AFLQ generally outperforms the FLQ, although the differences in outcomes are typically very small. Values of δ in the range 0.1 to 0.3 give the best results.

  4.

    See Fujimoto (2019, pp. 108–111) for a very clear explanation of how the Japanese type A tables can be converted into type B tables for estimation purposes.

  5.

    Boero et al. (2018) use US county-level data on demand and supply, along with measures of transport costs, to estimate trade flows. They develop a way of estimating regional tables and trade flows simultaneously, thereby making it possible to obtain more precise estimates. Although this interesting new procedure seems to yield reasonably accurate results, the authors note (p. 236) that it is computationally burdensome, especially where the focus is on a single county. For that reason, along with the expectation that it would be more useful in the USA than elsewhere, we opted not to discuss it further.

  6.

    To illustrate, let \(\lambda = 0.8\) and \(m_{i}^{n} = 0.2\). Also, let \(\text{DSLQ}_{i}^{r} = 0.6\) initially (case A) but rise to 0.8 (case B).

    Case A. Using \(\text{FLQ}_{i}^{r}\), \(M_{i}^{r} = [1 - 0.6 \times 0.8 \times 0.8]\,D_{i}^{r} = 0.616\,D_{i}^{r}\). By contrast, with \(\text{DSLQ}_{i}^{r}\), we get \(M_{i}^{r} = [1 - 0.6 \times 0.8]\,D_{i}^{r} = 0.52\,D_{i}^{r}\). Hence \(\Delta M_{i}^{r} = (0.616 - 0.52)\,D_{i}^{r} = 0.096\,D_{i}^{r}\), so the extra regional imports due to using \(\text{FLQ}_{i}^{r}\) rather than \(\text{DSLQ}_{i}^{r}\) are 9.6% of regional demand.

    Case B. Using \(\text{FLQ}_{i}^{r}\), \(M_{i}^{r} = [1 - 0.8 \times 0.8 \times 0.8]\,D_{i}^{r} = 0.488\,D_{i}^{r}\). By contrast, with \(\text{DSLQ}_{i}^{r}\), we get \(M_{i}^{r} = [1 - 0.8 \times 0.8]\,D_{i}^{r} = 0.36\,D_{i}^{r}\). Hence \(\Delta M_{i}^{r} = (0.488 - 0.36)\,D_{i}^{r} = 0.128\,D_{i}^{r}\), so the extra regional imports are now 12.8% of regional demand.

  7.

    This analysis could be enhanced by weighting the import propensities by regional size. In addition, the root mean squared error could be used instead of the mean absolute error; this would capture some very large errors evident in the scatter diagram for MCHARM, and allow the overall error to be decomposed into bias, variance and covariance components (Stevens et al. 1989).

  8.

    The U and U* statistics are wrongly attributed to Zhao and Choi (2015), who do not use them. See Flegg and Tohmo (2013, p. 715) for a range of statistics where \((n^{2} - z)\) is used as a divisor.

  9.

    In a subsequent study, again using Eurostat data but with a larger sample, Pereira-López et al. (2021) obtain encouraging results for the 2DLQ vis-à-vis the FLQ, AFLQ and CILQ. However, it should be noted that the 2DLQ formula involves two unknown parameters, which introduces greater complexity into the analysis compared with the other three formulae.

  10.

    These variables are similar to those used in the pioneering studies of Stevens et al. (1983) and Treyz and Stevens (1985).

  11.

    It is unclear whether this regressor is simply each country’s share of overall EU employment, \(e_{c}/e_{eu}\), or, more appropriately, the nonlinear scalar λ used in the FLQ formula. However, we note that λ is incorrectly defined as \(\delta \log_{2}(1 + e_{c}/e_{eu})\) rather than as \([\log_{2}(1 + e_{c}/e_{eu})]^{\delta}\) (cf. Lahr et al. 2020, p. 1590).

  12.

    Szabó (2015, p. 50) remarks that ‘[d]espite its theoretical advantages, the [RPC] approach did not gain popularity [owing] to its high data requirements, which usually cannot be satisfied at the regional level.’

  13.

    We can confirm that the unusual figure of 124.6 for Gangwon in the second column of Table 6 is correct. It exceeds 100 because the observed intermediate flow from sector 6 to sector 22 is 14,882.38, while the total output of sector 22 is 11,664, which is less than 14,882.38 because the taxes on net production are 8,820.29. Hence the input coefficient is 14,882.38/11,664 = 1.276. The corresponding coefficient estimated via the FLQ+ method is 0.030, so the maximum difference is 100 × (1.276 − 0.030) = 124.6. Sector 6 is Mining of coal, crude petroleum and natural gas, while sector 22 is Coke and hard coal.
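The numerical illustrations in Notes 1, 6, 11 and 13 can be checked with the short Python sketch below. It is an illustrative reconstruction, not the authors' code: CILQ_ij = SLQ_i/SLQ_j is the usual cross-industry quotient; the import relation M/D = 1 − q(1 − m^n), with q equal to λ·DSLQ for the FLQ or to DSLQ alone, is our reading of the arithmetic in Note 6 rather than a formula stated there; and all variable and function names are hypothetical.

```python
import math
import numpy as np

# Note 1: cross-industry LQs, CILQ_ij = SLQ_i / SLQ_j.
slq = np.array([0.8, 1.2, 0.6, 1.5])
cilq = slq[:, None] / slq[None, :]
row_1 = cilq[0]                                # about [1, 0.667, 1.333, 0.533]
only_exports_4 = bool(np.all(cilq[3] >= 1))    # CILQ_4j >= 1 for all j


# Note 6: imports as a share of regional demand, assumed M/D = 1 - q(1 - m_n),
# with q = lambda * DSLQ (FLQ) or q = DSLQ (our reading of the arithmetic).
def import_share(q, m_n):
    return 1.0 - q * (1.0 - m_n)


lam, m_n = 0.8, 0.2
extra_imports = {case: import_share(lam * dslq, m_n) - import_share(dslq, m_n)
                 for case, dslq in (("A", 0.6), ("B", 0.8))}
# extra_imports is about {"A": 0.096, "B": 0.128}: 9.6% and 12.8% of demand.


# Note 11: the FLQ scalar and the misdefined variant, for a size share s.
def lam_flq(s, delta):
    return math.log2(1.0 + s) ** delta         # [log2(1 + s)]^delta


def lam_misdefined(s, delta):
    return delta * math.log2(1.0 + s)          # delta * log2(1 + s)
# With s = 1 (the area equals the whole) the FLQ scalar is 1 for any delta,
# whereas the misdefined version simply returns delta.


# Note 13: observed coefficient for the flow from sector 6 to sector 22 in Gangwon.
a_obs = 14_882.38 / 11_664                     # about 1.276
max_diff = 100 * (round(a_obs, 3) - 0.030)     # about 124.6
```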

Abbreviations

\(a_{ij}^{n}\) :

Observed national input coefficient

\(\hat{a}_{ij}^{r}\) :

Estimated regional input coefficient

NIOT:

National input–output table

RIOT:

Regional input–output table

KIRIOT:

South Korean interregional input–output table

LQ:

Location quotient

SLQ:

Simple LQ

CILQ:

Cross-industry LQ

RLQ:

Round’s LQ

FLQ:

Flegg’s LQ

FLQ+:

A refined version of the FLQ approach proposed in this paper

AFLQ:

Augmented FLQ

SFLQ:

Industry-specific FLQ

DSLQ:

Scaling formula proposed by Fujimoto (2019)

2DLQ:

Bidimensional LQ

CHARM:

Cross-hauling adjusted regionalization method

RCHARM:

Revised CHARM

MCHARM:

Modified CHARM

RPC:

Regional purchase coefficient

SDR:

Supply/demand ratio

CE:

Cross-entropy

MCE:

Modified cross-entropy

RAS:

A biproportional matrix-balancing method

GRAS:

Generalized RAS

MAD:

Mean absolute difference

References

  1. Bacharach M (1970) Biproportional matrices and input–output change. Cambridge University Press, England

  2. Boero R, Edwards BK, Rivera MK (2018) Regional input–output tables and trade flows: an integrated and interregional non-survey approach. Reg Stud 52:225–238. https://doi.org/10.1080/00343404.2017.1286009

  3. Bonfiglio A, Chelli F (2008) Assessing the behaviour of non-survey methods for constructing regional input–output tables through a Monte Carlo simulation. Econ Syst Res 20:243–258. https://doi.org/10.1080/09535310802344315

  4. Dávila-Flores A (2015) Modelos Interregionales de Insumo Producto de la Economía Mexicana. Universidad Autónoma de Coahuila, Saltillo

  5. Davis HC, Lofting EM, Sathaye JA (1977) A comparison of alternative methods of updating input–output coefficients. Technol Forecast Soc Change 10:79–87. https://doi.org/10.1016/0040-1625(80)90004-9

  6. Flegg AT, Tohmo T (2013) Regional input–output tables and the FLQ formula: a case study of Finland. Reg Stud 47:703–721. https://doi.org/10.1080/00343404.2011.592138

  7. Flegg AT, Tohmo T (2016) Estimating regional input coefficients and multipliers: the use of FLQ is not a gamble. Reg Stud 50:310–325. https://doi.org/10.1080/00343404.2014.901499

  8. Flegg AT, Tohmo T (2018) The Regionalization of national input–output tables: a review of the performance of two key non-survey methods. In: Mukhopadhyay K (ed) Applications of the input–output framework. Springer, Singapore, pp 347–386

  9. Flegg AT, Tohmo T (2019) The regionalization of national input–output tables: a study of South Korean regions. Pap Reg Sci 98:601–620. https://doi.org/10.1111/pirs.12364

  10. Flegg AT, Webber CD (1997) On the appropriate use of location quotients in generating regional input–output tables: reply. Reg Stud 31:795–805. https://doi.org/10.1080/713693401

  11. Flegg AT, Webber CD (2000) Regional size, regional specialization and the FLQ formula. Reg Stud 34:563–569. https://doi.org/10.1080/00343400050085675

  12. Flegg AT, Webber CD, Elliott MV (1995) On the appropriate use of location quotients in generating regional input–output tables. Reg Stud 29:547–561. https://doi.org/10.1080/00343409512331349173

  13. Flegg AT, Mastronardi LJ, Romero CA (2016) Evaluating the FLQ and AFLQ formulae for estimating regional input coefficients: empirical evidence for the province of Córdoba, Argentina. Econ Syst Res 28:21–37. https://doi.org/10.1080/09535314.2015.1103703

  14. Fujimoto T (2019) Appropriate assumption on cross-hauling national input–output table regionalization. Spat Econ Anal 14:106–128. https://doi.org/10.1080/17421772.2018.1506151

  15. Golan A, Vogel SJ (2000) Estimation of non-stationary social accounting matrix coefficients with supply-side information. Econ Syst Res 12:447–471. https://doi.org/10.1080/09535310020003775

  16. Golan A, Judge G, Robinson S (1994) Recovering information from incomplete or partial multisectorial economic data. Rev Econ Stat 76:541–549. https://doi.org/10.2307/2109978

  17. Hermannsson K (2016) Beyond intermediates: the role of consumption and commuting in the construction of local input–output tables. Spat Econ Anal 11:315–339. https://doi.org/10.1080/17421772.2016.1177194

  18. Hewings GJD, Romanos MC (1981) Simulating less-developed regional economies under conditions of limited information. Geogr Anal 13:373–390

  19. Hosoe N (2014) Estimation errors in input–output tables and prediction errors in computable general equilibrium analysis. Econ Model 42:277–286. https://doi.org/10.1016/j.econmod.2014.07.012

  20. Huang W, Kobayashi S, Tanji H (2008) Updating an input–output matrix with sign-preservation: some improved objective functions and their solutions. Econ Syst Res 20:111–123. https://doi.org/10.1080/09535310801892082

  21. Jahn M (2017) Extending the FLQ formula: a location quotient-based interregional input–output framework. Reg Stud 51:1518–1529. https://doi.org/10.1080/00343404.2016.1198471

  22. Jahn M, Flegg AT, Tohmo T (2020) Testing and implementing a new approach to estimating interregional output multipliers using input–output data for South Korean regions. Spat Econ Anal 15:165–185. https://doi.org/10.1080/17421772.2020.1720918

  23. Junius T, Oosterhaven J (2003) The solution of updating or regionalizing a matrix with both positive and negative entries. Econ Syst Res 15:87–96. https://doi.org/10.1080/0953531032000056954

  24. Kapur J, Kesavan H (1992) Entropy optimization principles with applications. Academic Press, New York

  25. Kowalewski J (2015) Regionalization of national input–output tables: empirical evidence on the use of the FLQ formula. Reg Stud 49:240–250. https://doi.org/10.1080/00343404.2013.766318

  26. Kronenberg K, Fuchs M (2021) The socio-economic impact of regional tourism: an occupation-based modelling perspective from Sweden. J Sustain Tour. https://doi.org/10.1080/09669582.2021.1924757

  27. Lahr ML, Ferreira JP, Többen JR (2020) Intraregional trade shares for goods-producing industries: RPC estimates using EU data. Pap Reg Sci 99:1583–1605. https://doi.org/10.1111/pirs.12541

  28. Lamonica RG, Chelli FM (2017) The performance of non-survey techniques for constructing sub-territorial input–output tables. Pap Reg Sci 97:1169–1202. https://doi.org/10.1111/pirs.12297

  29. Lamonica RG, Recchioni MC, Chelli FM, Salvati L (2020) The efficiency of the cross-entropy method when estimating the technical coefficients of input–output tables. Spat Econ Anal 15:62–91. https://doi.org/10.1080/17421772.2019.1615634

  30. Lampiris G, Karelakis C, Loizou E (2020) Comparison of non-survey techniques for constructing regional input–output tables. Ann Oper Res 294:225–266. https://doi.org/10.1007/s10479-019-03337-5

  31. Lemelin A (2009) A GRAS variant solving for minimum information loss. Econ Syst Res 21:399–408. https://doi.org/10.1080/09535311003589310

  32. Léony Y, Peeters L, Quinqu M, Surry Y (1999) The use of maximum entropy to estimate input–output coefficients from regional farm accounting data. J Agric Econ 50:425–439. https://doi.org/10.1111/j.1477-9552.1999.tb00891.x

  33. McCann P, Dewhurst JHL (1998) Regional size, industrial location and input–output expenditure coefficients. Reg Stud 32:435–444. https://doi.org/10.1080/00343409850116835

  34. Morrissey K (2016) A location quotient approach to producing regional production multipliers for the Irish economy. Pap Reg Sci 95:491–506. https://doi.org/10.1111/pirs.12143

  35. Okamoto N (2014) Does regional size matter in regionalization of national input–output table by the FLQ formula? A case study of China, Discussion Paper 222, Institute of Economic Research, Chuo University

  36. Pavia JM, Cabrer B, Sala R (2009) Updating input–output matrices: assessing alternatives through simulation. J Stat Comput Simul 79:1467–1482. https://doi.org/10.1080/00949650802415154

  37. Pereira-López X, Carrascal-Incera A, Fernández-Fernández M (2020) A bidimensional reformulation of location quotients for generating input–output tables. Spat Econ Anal 15:476–493. https://doi.org/10.1080/17421772.2020.1729996

  38. Pereira-López X, Sanchez-Choez NG, Fernández-Fernández M (2021) Performance of bidimensional location quotients for constructing input–output tables. J Econ Struc 10:7. https://doi.org/10.1186/s40008-021-00237-5

  39. Robinson S, Cattaneo A, El-Said M (2001) Updating and estimating a social accounting matrix using cross entropy methods. Econ Syst Res 13:47–64. https://doi.org/10.1080/09535310120026247

  40. Round JI (1978) An inter-regional input–output approach to the evaluation of nonsurvey methods. J Reg Sci 18:179–194. https://doi.org/10.1111/j.1467-9787.1978.tb00540.x

  41. Singh I, Singh L (2011) Regional input output table for the state of Punjab. https://mpra.ub.uni-muenchen.de/32344/. Accessed 17 May 2021

  42. Stevens BH, Treyz GI, Ehrlich DJ, Bower JR (1983) A new technique for the construction of non-survey regional input–output models. Int Reg Sci Rev 8:271–286

  43. Stevens BH, Treyz GI, Lahr ML (1989) On the comparative accuracy of RPC estimating techniques. In: Miller RE, Polenske KR, Rose AZ (eds) Frontiers of input–output analysis. Oxford University Press, Oxford, pp 245–257

  44. Stone R (1961) Input–output and national accounts. OEEC, Paris

  45. Szabó N (2015) Methods for regionalizing input–output tables. Reg Stat 5:44–65. https://doi.org/10.15196/RS05103

  46. Temurshoev U, Miller RE, Bouwmeester MC (2013) A note on the GRAS method. Econ Syst Res 25:361–367. https://doi.org/10.1080/09535314.2012.746645

  47. Többen J, Kronenberg T (2015) Construction of multi-regional input–output tables using the CHARM method. Econ Syst Res 27:487–507. https://doi.org/10.1080/09535314.2015.1091765

  48. Treyz GI, Stevens BH (1985) The TFS regional modelling methodology. Reg Stud 19:547–562. https://doi.org/10.1080/09595238500185531

  49. Vazquez EF, Hewings GJD, Carvajal CR (2015) Adjustment of input–output tables from two initial matrices. Econ Syst Res 27:345–361. https://doi.org/10.1080/09535314.2015.1007839

  50. Wiebe KS, Lenzen M (2016) To RAS or not to RAS? What is the difference in outcomes in multi-regional input–output models? Econ Syst Res 28:383–402. https://doi.org/10.1080/09535314.2016.1192528

  51. Zhao X, Choi S-G (2015) On the regionalization of input–output tables with an industry-specific location quotient. Ann Reg Sci 54:901–926. https://doi.org/10.1007/s00168-015-0693-x

Acknowledgements

We thank the Editor and the anonymous referees for their scrutiny of our paper. Chris Webber read several earlier versions of the paper and made numerous helpful and insightful suggestions for refining it.

Funding

None to report.

Author information

Contributions

This paper reports the results of a collaborative study by the authors, all of whom read and approved the final manuscript.

Corresponding author

Correspondence to Anthony T. Flegg.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

This appendix shows how the proposed hybrid approach can be adapted to encompass the AFLQ. All that is required is to modify Eqs. (48) and (49) as follows. First define \(\alpha_{ij} = \text{CILQ}_{ij}\) if \(i \ne j\) and \(\alpha_{ij} = \text{SLQ}_{i}\) if \(i = j\), so that Eq. (48) changes to

$$ \hat{a}_{ij}^{r} = a_{ij}^{n} w_{ij} \beta^{\delta } , $$
(57)

where \(w_{ij}\) reads as

$$ w_{ij} = \left\{ {\begin{array}{*{20}ll} {\alpha_{ij} \left[ {\log_{2} \left( {1 + SLQ_{j} } \right)} \right]} & {{\text{for}}\;{\text{SLQ}}_{j} > 1} \\ {\alpha_{ij} } & {{\text{for}}\;{\text{SLQ}}_{j} \le 1} \\ \end{array} } \right.. $$

Thus, for any pair \((\hat{a}_{ij}^{r} , a_{ij}^{n} )\) that satisfies the sign-preserving assumption, we set:

$$ \log \left( {\frac{{\hat{a}_{ij}^{r} }}{{w_{ij} a_{ij}^{n} }}} \right) = \delta \log \left( \beta \right) + \varepsilon_{ij} , $$
(58)

from which a value of \(\delta\) can be derived as done for Eq. (49). However, it should be noted that the optimal δ for the AFLQ can normally be expected to exceed the corresponding value for the FLQ (e.g. Bonfiglio and Chelli 2008, table 1).
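A minimal Python sketch of this appendix procedure follows, under stated assumptions. The weights implement the definition of \(w_{ij}\) given above. For Eq. (58) we assume an ordinary least-squares fit without an intercept, which may differ from how Eq. (49) is actually estimated; we also assume that β is the regional-size term whose δ-th power plays the role of λ in the FLQ, so that β < 1 and log β ≠ 0. The MCE-based estimates \(\hat{a}_{ij}^{r}\) are hypothetical inputs, generated here from Eq. (57) with δ = 0.3 so that the fit should return that value; the function names are also hypothetical.

```python
import numpy as np


def aflq_weights(slq):
    """w_ij: alpha_ij, times log2(1 + SLQ_j) in columns where SLQ_j > 1."""
    slq = np.asarray(slq, dtype=float)
    alpha = slq[:, None] / slq[None, :]              # CILQ_ij = SLQ_i / SLQ_j
    np.fill_diagonal(alpha, slq)                     # alpha_ii = SLQ_i
    col_term = np.where(slq > 1, np.log2(1.0 + slq), 1.0)
    return alpha * col_term[None, :]


def estimate_delta(a_hat, a_nat, w, beta):
    """Eq. (58) fitted by least squares through the origin (an assumption)."""
    keep = a_hat * a_nat > 0                         # same sign and non-zero, so the log exists
    y = np.log(a_hat[keep] / (w[keep] * a_nat[keep]))
    x = np.full(y.shape, np.log(beta))               # beta is one scalar per region
    return float(x @ y / (x @ x))


# Hypothetical example: four sectors, "MCE" estimates generated from Eq. (57).
rng = np.random.default_rng(0)
a_nat = rng.uniform(0.01, 0.2, size=(4, 4))
slq = np.array([0.8, 1.2, 0.6, 1.5])
beta = np.log2(1.0 + 0.05)                           # hypothetical regional share of 5%
w = aflq_weights(slq)
a_hat = a_nat * w * beta ** 0.3
delta_hat = estimate_delta(a_hat, a_nat, w, beta)    # close to 0.3
```

Because β is a single scalar for the region, the no-intercept estimate reduces to the mean of the log ratios divided by log β; real MCE estimates would scatter around Eq. (57), and the fitted δ would then summarize that scatter.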

Table 12 reveals that the FLQ and AFLQ give almost identical values of the MAD. Therefore, Ockham’s principle suggests that the FLQ should be preferred on the basis of its greater simplicity. However, such a conclusion would not be supported by the outcomes in terms of type I output multipliers presented in Tables 10 and 11, which indicate that the AFLQ can produce superior results to the FLQ. It should be noted, finally, that the values of δ shown in Table 12 for the AFLQ are almost always higher than those for the FLQ but that expected outcome is irrelevant when assessing their relative performance.

Table 12 Comparison of results for the FLQ and AFLQ in terms of input coefficients.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Flegg, A.T., Lamonica, G.R., Chelli, F.M. et al. A new approach to modelling the input–output structure of regional economies using non-survey methods. Economic Structures 10, 12 (2021). https://doi.org/10.1186/s40008-021-00242-8

Keywords

  • FLQ
  • FLQ+
  • Cross-entropy
  • GRAS
  • Regional tables