A new approach to modelling the input–output structure of regional economies using non-survey methods

This paper proposes a new approach to the regionalization of national input–output tables where suitable regional data are scarce and analysts are considering using location quotients (LQs). We focus on the FLQ formula, which frequently yields the best results of the pure LQ-based methods, and develop an enhanced way of implementing this approach. We use a modified cross-entropy (MCE) method, along with a regression model, to estimate values of the unknown parameter δ in the FLQ formula, specific to both region and country. An analysis of survey-based data for 16 South Korean regions reveals that the proposed FLQ+ approach yields more accurate estimates of both input coefficients and sectoral output multipliers than those from simpler LQ-based methods or the MCE approach alone. Sectoral outputs (or employment) are the only regional data required. The MCE method also clearly outperforms GRAS.

popular of such techniques are RAS (Bacharach 1970; Stone 1961) and the cross-entropy (CE) method (Golan et al. 1994), both of which are known to work well (Davis et al. 1977; Golan and Vogel 2000; Hosoe 2014; Kapur and Kesavan 1992; Lamonica et al. 2020; Léony et al. 1999; Robinson et al. 2001; Vazquez et al. 2015). The constrained matrix-balancing procedures include methods based on minimizing squared or absolute differences (Pavia et al. 2009). However, such techniques are more time-consuming to implement than the pure LQ-based methods and normally require the solution of a constrained nonlinear optimization problem, whereas the LQ-based methods are very quick and simple to apply.
The present study focuses on the FLQ (Flegg's LQ), which is often found to be one of the most accurate pure LQ-based methods (Bonfiglio and Chelli 2008; Flegg and Tohmo 2016). Applications of the FLQ include Dávila-Flores (2015), Hermannsson (2016), Jahn (2017), Kronenberg and Fuchs (2021), Morrissey (2016), and Singh and Singh (2011). To apply the FLQ, a value for a crucial unknown parameter, δ, must be chosen. Although several empirical studies have sought to find appropriate values of δ, these studies have not been conclusive. Examples include analyses of data for Scotland (Flegg and Webber 2000), Germany (Kowalewski 2015), Argentina  and South Korea (Flegg and Tohmo 2019). Given these diverse outcomes, Jahn et al. (2020) recommend that analysts should consider regional characteristics when selecting a value of δ in the interval 0.3 ± 0.1. This range, which is suggested by earlier empirical work, can then be refined in the light of the results. Jahn et al. also formulate an econometric model that should assist in this process.
Nevertheless, it is evident that the choice of a suitable value of δ is still an open question, which has limited the practical use of the FLQ. Our aim is, therefore, to propose a strategy to estimate the value of δ in such a way as to optimize the regionalization of the NIOT. This strategy combines constrained matrix-balancing procedures with the FLQ. The expectation is that this hybrid approach should yield better results than would be attainable by applying each approach alone.
Of the possible matrix-balancing procedures, the CE method was chosen rather than RAS for several reasons. In particular, the standard RAS method can only handle non-negative matrices, which would limit the application proposed in this paper. Although its generalization GRAS (Junius and Oosterhaven 2003) enables matrices with negative elements to be updated, its objective function has been questioned by Lemelin (2009), Huang et al. (2008) and Temurshoev et al. (2013). Moreover, the results obtained by Golan et al. (1994) and Robinson et al. (2001) appear to favour the CE method, while Lamonica et al. (2020) demonstrate that this method performs well when it is applied to real data, especially for small economies. Indeed, we find that the modified CE (MCE) method, as implemented in this study, outperforms GRAS.
The proposed procedure involves three steps. The first applies the MCE method to regionalize the NIOT. This is designed to account for negative or zero input coefficients. The second step uses the derived regional matrix, along with the national table, to estimate the optimal δ for each region via a simple regression model. In the third step, this estimated δ is used to apply the FLQ formula, thereby computing the final estimates of the regional input coefficients. It is worth noting that this hybrid approach can easily be adapted to enhance the performance of other pure LQ-based methods that depend on one or more unknown parameters.
Our approach is tested on the South Korean interregional input-output table (KIRIOT) for the year 2005. It was built by the Bank of Korea for all 16 South Korean regions, with a classification of 78 economic sectors. A smaller version with 28 economic sectors is also available, but we opted to use the larger version, so as to minimize aggregation bias. The KIRIOT is one of the very few survey-based full interregional I-O tables. It has data on the volume of all intersectoral transactions, both within and across regions, so it is ideal for our purposes. Indeed, Zhao and Choi (2015, p. 909) remark: '[In 2009, the Bank of Korea] divided the entire country into 16 regions and issued regional Leontief inverse tables and type I output multipliers for each of these regions. These data can be considered as benchmark tables since they are entirely based on surveys.'
Since the KIRIOT contains negative and null flows, the MCE method is employed. The empirical analysis demonstrates that the enhanced version of the FLQ approach developed here outperforms the MCE method, when it is applied separately, and has a similar performance to the 'optimal' FLQ approach, where δ is selected by using the observed regional coefficient matrix. Of course, in reality, such a matrix would seldom be available to regional analysts; its use here is merely to generate optimal values of δ that can be used as a benchmark in our analysis. Our aim is to offer a new way of regionalizing a NIOT in situations where an analyst has access only to the total output of each regional sector. Where such data are unavailable, sectoral employment could be used as a proxy. This proposed hybrid method, hereafter referred to as the FLQ+ method, is an improvement on the present state of the art, in which analysts need to select values of δ on the basis of a priori considerations, e.g. values found in earlier studies, or by taking regional characteristics into account, as is suggested by Jahn et al. (2020). In order to demonstrate the practical advantages of using the FLQ+ method, we compare its performance with the results from the MCE method, GRAS and the FLQ with a single assumed value of δ for every region.

Review of pure LQ-based methods
Here, we review the pure LQ methods most often employed to construct regional input–output tables (RIOTs). Some alternative approaches are examined thereafter.
Consider a national economic system consisting of $k$ sectors. Let $X^n = [x^n_{ij}]$ and $X^r = [x^r_{ij}]$ be matrices whose elements are the flows for intermediate use from sector $i$ to sector $j$ at the national and regional levels, respectively, while $x^n$ and $x^r$ are vectors of national and regional total sectoral output. Also, let $A^n = [a^n_{ij}] = [x^n_{ij}/x^n_j]$ and $A^r = [a^r_{ij}] = [x^r_{ij}/x^r_j]$ be the matrices whose elements are the respective national and regional input coefficients. Now suppose that only $A^n$ and the vector of regional total sectoral output, $x^r$, are known. The pure LQ methods are used to estimate the matrix of regional input coefficients, $A^r$, by adjusting the national input coefficient as follows:

$$a^r_{ij} = a^n_{ij}\, q_{ij}, \tag{1}$$

where $q_{ij}$ represents a scalar applied to the national coefficient.

With the simple LQ (SLQ), $a^r_{ij}$ is estimated via the following formula:

$$a^r_{ij} = SLQ_i\, a^n_{ij}, \tag{2}$$

where

$$SLQ_i = \frac{x^r_i / x^r}{x^n_i / x^n} = \frac{x^r_i}{x^n_i} \times \frac{x^n}{x^r}. \tag{3}$$

Here, $x^r_i$ and $x^n_i$ are the total output (production) of the $i$th regional and national sector, respectively, while $x^r$ and $x^n$ are the corresponding regional and national aggregates. $SLQ_i$ measures the degree of specialization of region $r$ in sector $i$ relative to the nation. The regional input coefficients are derived according to the following rule:

$$a^r_{ij} = \begin{cases} a^n_{ij}\, SLQ_i & \text{if } SLQ_i < 1, \\ a^n_{ij} & \text{if } SLQ_i \geq 1. \end{cases} \tag{4}$$

However, it has long been known that the SLQ tends to underestimate a region's imports from other regions; this understatement occurs as the SLQ rules out any 'cross-hauling'. Cross-hauling takes place when a region simultaneously imports and exports a given commodity.

The cross-industry LQ (CILQ) was one of the first refinements of the SLQ, as it considers the relative size of both supplying sector $i$ and purchasing sector $j$. The formula is as follows:

$$CILQ_{ij} = \frac{SLQ_i}{SLQ_j} = \frac{x^r_i / x^n_i}{x^r_j / x^n_j}, \tag{5}$$

where the constraints are applied as in (4). Unlike the SLQ, however, the CILQ applies a cell-by-cell adjustment. This means that it does, in principle at least, deal with the problem of cross-hauling. 1 What it does not do is to consider the relative size of a region, $x^r/x^n$, which cancels out in formula (5). By contrast, this ratio remains a component of the SLQ formula (3). Round (1978) argues that any adjustment formula should incorporate three elements: (i) the relative size of the supplying sector $i$; (ii) the relative size of the purchasing sector $j$ and (iii) the relative size of the region. The CILQ satisfies (i) and (ii) but not (iii), whereas the SLQ satisfies (i) and (iii) but not (ii). Round therefore suggests the following formula (RLQ), which simultaneously satisfies all three requirements:

$$RLQ_{ij} = \frac{SLQ_i}{\log_2(1 + SLQ_j)}. \tag{6}$$

1 Consider a region where $SLQ_1 = 0.8$, $SLQ_2 = 1.2$, $SLQ_3 = 0.6$ and $SLQ_4 = 1.5$, so that $CILQ_{1,1} = 1$, $CILQ_{1,2} = 0.\overline{6}$, $CILQ_{1,3} = 1.\overline{3}$, $CILQ_{1,4} = 0.5\overline{3}$, etc. For the SLQ to be valid, this region would need to be an importer but not an exporter of commodities 1 and 3, and vice versa for commodities 2 and 4. The CILQ would encompass a wider set of possibilities. For instance, industries 2 and 4 could import but not export commodity 1, yet this commodity could be exported but not imported by industry 3; consequently, cross-hauling of commodity 1 could occur. In contrast, only exporting of commodity 4 would be possible because $CILQ_{4j} \geq 1$ for all $j$.
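The quotients reviewed so far amount to a few lines of arithmetic. The following is a minimal Python sketch, assuming the standard definitions (SLQ as a ratio of regional to national output shares, CILQ as the ratio of two SLQs); the function names are illustrative and do not come from the paper. It reproduces the first row of CILQ values quoted in footnote 1.

```python
def slq(x_r, x_n):
    """Simple LQ per sector: (x_i^r / x^r) / (x_i^n / x^n)."""
    tot_r, tot_n = sum(x_r), sum(x_n)
    return [(r / tot_r) / (n / tot_n) for r, n in zip(x_r, x_n)]


def cilq(slq_values):
    """Cross-industry LQ matrix: CILQ_ij = SLQ_i / SLQ_j."""
    return [[si / sj for sj in slq_values] for si in slq_values]


def regionalize(a_n, q):
    """Scale national coefficients by q_ij, with the scalar capped at 1."""
    return [[a * min(qij, 1.0) for a, qij in zip(row_a, row_q)]
            for row_a, row_q in zip(a_n, q)]


# SLQ values taken from the footnote example
footnote_slq = [0.8, 1.2, 0.6, 1.5]
footnote_cilq = cilq(footnote_slq)
print([round(v, 3) for v in footnote_cilq[0]])  # [1.0, 0.667, 1.333, 0.533]
```

Every entry in the fourth row of `footnote_cilq` is at least 1, which is the condition the footnote gives for commodity 4 being exported only.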
Nonetheless, Flegg et al. (1995) criticize the SLQ and RLQ on the grounds that they would both tend to underestimate the imports of relatively small regions owing to the way in which the ratio x r /x n is implicitly incorporated in each formula. To overcome this drawback, the FLQ was introduced.
The crucial hypothesis underpinning the FLQ is that a region's propensity to import from other domestic regions is inversely and nonlinearly related to its relative size. By incorporating explicit adjustments for regional size, the FLQ should yield more precise estimates of regional input coefficients and hence multipliers. 2 Along with other non-survey methods, the FLQ aims to offer regional analysts a means by which they can build regional tables that reflect, as closely as possible, each region's economic structure. It is defined as follows (cf. Flegg and Webber 1997):

$$FLQ_{ij} = \begin{cases} CILQ_{ij} \times \lambda & \text{for } i \neq j, \\ SLQ_i \times \lambda & \text{for } i = j, \end{cases} \tag{7}$$

where λ captures a region's relative size. This scalar is defined as follows:

$$\lambda = \left[\log_2\left(1 + x^r/x^n\right)\right]^{\delta}. \tag{8}$$

Here 0 ≤ δ < 1 is a parameter that controls the degree of convexity in Eq. (8). The larger the value of δ, the lower the value of λ, and the greater the allowance for extra regional imports. The FLQ formula is implemented just like other LQ methods, so:

$$a^r_{ij} = \begin{cases} a^n_{ij}\, FLQ_{ij} & \text{if } FLQ_{ij} < 1, \\ a^n_{ij} & \text{if } FLQ_{ij} \geq 1. \end{cases} \tag{9}$$

Many case studies, including those mentioned earlier, have demonstrated that the FLQ can yield more accurate results than the SLQ and CILQ. This evidence is corroborated by the Monte Carlo study of Bonfiglio and Chelli (2008). Nonetheless, some conflicting evidence is presented by Lamonica and Chelli (2017), who find initially that the SLQ gives slightly better results than the FLQ.
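To make the role of δ concrete, here is a hedged Python sketch of an FLQ regionalization. It assumes that the FLQ equals the CILQ (off the diagonal) or the SLQ (on the diagonal) multiplied by λ = [log2(1 + x^r/x^n)]^δ, and that the resulting scalar is capped at 1. The function name and the three-sector figures are made up for illustration.

```python
import math


def flq_coefficients(a_n, x_r, x_n, delta):
    """Regionalize national coefficients a_n with the FLQ for a given delta.

    Assumes every sector has positive regional and national output.
    """
    tot_r, tot_n = sum(x_r), sum(x_n)
    lam = math.log2(1 + tot_r / tot_n) ** delta  # regional-size scalar
    slq = [(r / tot_r) / (n / tot_n) for r, n in zip(x_r, x_n)]
    k = len(slq)
    a_r = []
    for i in range(k):
        row = []
        for j in range(k):
            # SLQ on the main diagonal, CILQ elsewhere
            lq = slq[i] if i == j else slq[i] / slq[j]
            row.append(a_n[i][j] * min(lq * lam, 1.0))  # scalar capped at 1
        a_r.append(row)
    return a_r


# Hypothetical three-sector economy; the region produces 10% of national output
a_n = [[0.20, 0.10, 0.05],
       [0.15, 0.25, 0.10],
       [0.05, 0.10, 0.30]]
x_n = [500.0, 300.0, 200.0]
x_r = [30.0, 50.0, 20.0]

for delta in (0.0, 0.3):
    a_r = flq_coefficients(a_n, x_r, x_n, delta)
    print(delta, [[round(v, 4) for v in row] for row in a_r])
```

With δ = 0 the size adjustment vanishes (λ = 1), so the output shows how a positive δ lowers the regional coefficients and thereby allows for more imports from other regions.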
Lamonica and Chelli's unusual study employs data from the World Input-Output Database, whereas other studies have analysed data for individual countries or used Monte Carlo methods (Bonfiglio and Chelli 2008). Lamonica and Chelli examined data for the period 1995-2011, classified into 35 economic sectors. Their sample comprised 27 European countries and 13 other major countries, with the rest of the world as a composite 'country'. However, when this sample was disaggregated by size of economy, an interesting divergence appeared. For the smaller economies, characterized by a high percentage of near-zero input coefficients, the FLQ (with δ = 0.2) was found to be the best method, whereas the SLQ performed the best in the larger economies.

McCann and Dewhurst (1998) criticize the FLQ on the grounds that regional coefficients may surpass national coefficients where there is regional specialization, a possibility that is precluded by the FLQ formula. Flegg and Webber (2000) therefore proposed the following augmented FLQ (AFLQ):

$$AFLQ_{ij} = \begin{cases} FLQ_{ij} \times \log_2(1 + SLQ_j) & \text{if } SLQ_j > 1, \\ FLQ_{ij} & \text{if } SLQ_j \leq 1, \end{cases} \tag{10}$$

where $\log_2(1 + SLQ_j)$ captures the regional specialization of sector $j$. If $SLQ_j > 1$ and $FLQ_{ij} \geq 1$, the national coefficients are scaled upwards. However, to avoid excessive upward adjustments, the restriction $FLQ_{ij} \leq 1$ is retained. Hence the regionalization proceeds as follows:

While the AFLQ has the theoretical merit of incorporating a measure of regional specialization, it tends to produce similar outcomes to the FLQ (Bonfiglio and Chelli 2008; Flegg and Tohmo 2013; Flegg and Webber 2000; Kowalewski 2015). Indeed, the analysis in the Appendix demonstrates that the AFLQ does not yield more accurate estimates of input coefficients than the FLQ, although it does perform better in terms of multipliers. 3

Another variant of the FLQ is put forward by Kowalewski (2015). Using output rather than employment to measure regional size, her industry-specific FLQ (SFLQ) can be defined as: The novel feature of this formula is that δ is allowed to vary across industries.
This greater realism is undoubtedly an attractive feature, yet it does introduce much greater complexity into the modelling process (Flegg and Tohmo 2019). For that reason, we do not consider the SFLQ further.
To complete this review of pure LQ-based methods, we should note that the FLQ's focus is on the output and employment generated within a given region. Consequently, as Flegg and Tohmo (2019) emphasize, it should only be applied to NIOTs where imports are excluded from the inter-industry transactions (type B tables). By contrast, where the focus is on the total supply of commodities, Kronenberg's Cross-Hauling Adjusted Regionalization Method (CHARM) is an appropriate technique (Többen and Kronenberg 2015; Flegg and Tohmo 2018). Unlike the FLQ, however, CHARM requires type A tables, where the national transactions include imports. 4

Review of some alternative approaches
Here, we review three recent studies that offer alternatives to the pure LQ-based methods discussed previously. 5 The first is an innovative study by Fujimoto (2019), who examines official survey-based data for nine Japanese regions in 2005, the most recent year available in a series of official tables published quinquennially since 1960. This study's focus is on cross-hauling and its primary aim is to determine which of four alternative assumptions is most appropriate. Each assumption is associated with a particular modelling approach, as follows:
1. There is no cross-hauling (LQ approach);
2. Cross-hauling depends on regional size (FLQ approach);
3. Cross-hauling is proportional to its potential, as measured by output or demand (RCHARM);
4. Cross-hauling is proportional to its potential, as measured by the volume of trade (MCHARM).
To represent the 'LQ approach', Fujimoto rejects the SLQ in favour of the scaling formula where $X$ denotes output and $D$ denotes demand. The rationale for using this alternative formula is to overcome aggregation bias (Fujimoto 2019, p. 113).
For the second approach, Fujimoto employs the formula: where $v^r/v^n$ is the ratio of total regional to total national value-added payments. However, the author does not explain why $DSLQ^r_i$ is used instead of $CILQ^r_{ij}$ nor why value added is used as a proxy for regional size rather than superior measures such as output or employment. It is, therefore, misleading to refer to this approach as the 'FLQ approach'. The third approach is the refined version of CHARM developed by Többen and Kronenberg (2015), while the fourth is the modified version of CHARM proposed by Fujimoto. Fujimoto (2019, p. 115) remarks that '[t]he FLQ approach has a problem in addition to the difficulty of [specifying] a value for δ: the cross-hauling caused in interregional trade depends not only on regional size.' To demonstrate this, he derives the following equation for $DSLQ^r_i < 1$:

Flegg et al. Economic Structures (2021) 10:12

where $E$ and $M$ denote exports and imports, respectively, $m^n_i$ is the national propensity to import from abroad and $D$ is demand. Fujimoto adds that there is 'no reason why cross-hauling [should] depend on $DSLQ^r_i$ and $m^n_i$ [and] this dependence causes serious bias …' (p. 116).
However, a straightforward interpretation can be given to the role of both $DSLQ^r_i$ and $m^n_i$ in Eq. (15). Since it is assumed that $m^r_i = m^n_i$, the term $(1 - m^n_i)$ captures the proportion of regional demand $D^r_i$ that is met by domestic suppliers, some of which are located in region $r$ and the rest in other regions. As expected, there is a negative relationship between $m^n_i$ and $\Delta M^r_i$, ceteris paribus. To explain the positive relationship between $DSLQ^r_i$ and $\Delta M^r_i$, we note that $\Delta M^r_i$ represents the difference between the extra imports generated by using $FLQ^r_i$ and those generated by applying $DSLQ^r_i$. Although this difference is invariably positive, its magnitude increases as $DSLQ^r_i$ rises. This is due to the inclusion of λ in $FLQ^r_i$. The upshot is that there is a positive relationship between $DSLQ^r_i$ and $\Delta M^r_i$. 6 The above discussion suggests that Fujimoto has failed to identify a genuine problem with the FLQ approach. Nonetheless, after performing various statistical tests, using data for 106 sectors and nine regions, he rejects this approach on the grounds (i) that it yields biased estimates of regional imports and (ii) that the appropriate value of δ is unknown and depends on each case.
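The positive relationship described above can be checked numerically with the values used in the footnoted illustration (λ = 0.8, a national import propensity of 0.2, and a DSLQ of 0.6 or 0.8). The sketch below assumes that the extra imports, as a share of regional demand, equal (1 − m) × DSLQ × (1 − λ). This relationship is inferred here because it reproduces the 9.6% and 12.8% figures quoted for cases A and B; it is not a transcription of Eq. (15), and the function name is hypothetical.

```python
def extra_import_share(dslq, lam, m_n):
    """Extra regional imports from using lam * DSLQ instead of DSLQ,
    expressed as a share of regional demand (assumed relationship)."""
    return (1 - m_n) * dslq * (1 - lam)


case_a = extra_import_share(dslq=0.6, lam=0.8, m_n=0.2)
case_b = extra_import_share(dslq=0.8, lam=0.8, m_n=0.2)
print(round(case_a, 3), round(case_b, 3))  # 0.096 0.128
```

Under this assumption the share rises with DSLQ and falls as the import propensity rises, in line with the signs discussed in the text.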
However, since Fujimoto does not use the correct FLQ formula, his findings do not constitute a valid test of the FLQ approach. It is also likely that superior estimates could have been obtained by using a different δ for each Japanese region. In particular, the islands of Hokkaido and Okinawa may well require different values of δ from the mainland regions. The approach discussed later in this paper affords a way of generating such region-specific values.
As regards the other approaches, the scatter diagrams of estimated and survey-based import propensities (Fujimoto 2019, figure 2) reveal an almost identical pattern for the DSLQ and RCHARM methods, with evidence of substantial and widespread underestimation. By contrast, the diagram for MCHARM suggests a more random distribution, albeit with a greater variance and some heteroscedasticity. Even so, using the mean absolute error as the criterion, RCHARM invariably outperforms MCHARM (Fujimoto 2019, table 2). The DSLQ is clearly in third place. 7

7 This analysis could be enhanced by weighting the import propensities by regional size. In addition, the root mean squared error could be used instead of the mean absolute error; this would capture some very large errors evident in the scatter diagram for MCHARM, and allow the overall error to be decomposed into bias, variance and covariance components.

6 To illustrate, let λ = 0.8 and $m^n_i$ = 0.2. Also, let $DSLQ^r_i$ = 0.6 initially (case A) but rise to 0.8 (case B). In case A, the extra regional imports due to using $FLQ^r_i$ rather than $DSLQ^r_i$ are 9.6% of regional demand. In case B, the extra regional imports are now 12.8% of regional demand.

In another recent study, Pereira-López et al. (2020) focus on the AFLQ rather than the FLQ. They argue persuasively that regional specialization can have different effects on the columns (cost structure) and rows (selling structure) of a regional coefficient matrix. For instance, a region that is specialized in the extraction of mining products may sell most of its output to processing sectors such as metal industries located in other regions.
The AFLQ has the limitation that it presumes that specialization only affects purchasing sectors (columns). The authors thus propose a bidimensional procedure, the 2DLQ, whereby a parameter α is used to adjust the rows, while another parameter β is applied to the columns. The regionalization employs the following relationships: where $(x^r_j/x^n_j)$ measures the relative size of purchasing sector $j$. The hyperbolic tangent function (tanh) allows the estimated regional coefficients to be 'slightly higher' than the corresponding national coefficients if $SLQ_i > 1$ (Pereira-López et al. 2020, p. 480). They make the interesting observation that the CILQ, FLQ and AFLQ are all nested within the 2DLQ: Equation (16) has the same data requirements as the AFLQ but the authors claim that it makes more efficient use of such data and consequently yields more accurate results. To test this claim, they use the Eurostat IO database for 2005 to extract symmetric 59 × 59 domestic coefficient matrices for:
1. Austria, Belgium, France, Germany, Italy and Spain (the 'observed' matrices);
2. the European Area 17 (EA17) (the parent table).
Two estimated coefficient matrices are derived for each country by applying the 2DLQ and AFLQ formulae to the parent table. These estimates are then compared by using the weighted absolute percentage error (WAPE) and two other statistics, U and U*, which consider the number of cells ($n^2$) and the number of non-empty cells ($n^2 - z$), respectively. WAPE is defined as follows: 8 The results for coefficients show that using the 2DLQ method reduces the WAPE for all countries, most noticeably for Austria and Italy (Pereira-López et al. 2020, table 2). However, there is little change in the outcomes for Belgium, France and Germany. Spain is an intermediate case. On average, the WAPE is lowered by 4.5%. The U and U* measures yield similar results. The authors also assess the performance of the 2DLQ in terms of multipliers. Here the 2DLQ method gives better results than the AFLQ for all countries apart from Belgium. The average improvement is 3.55%. 9

Of the two studies reviewed thus far, the 2DLQ method seems the more promising, when evaluated in terms of its theoretical foundations, empirical performance and ease of application. Even so, some caveats should be borne in mind. The first concerns the need to determine suitable values of α and β. Here the authors provide some reassurance that the range of suitable values of these parameters is relatively small and that analysts would not go far wrong by choosing an α of 0.1 or 0.15 and a β in the range 0.8 to 1.2 (Pereira-López et al. 2020, table 5).
The second caveat concerns the fact that countries rather than regions are used in the testing process, which poses some potential problems. While the authors are right to stress the quality and consistency of the Eurostat data they employ, it is not true to say that using countries instead of regions is the only possible way to proceed. For instance, suitable survey-based official regional and national data are available for Finland (Flegg and Tohmo 2013), South Korea (Jahn et al. 2020) and Japan (Fujimoto 2019). It would be instructive to re-examine the 2DLQ method by using one or more of such data sets to perform a sensitivity analysis.
The study by Lahr et al. (2020) represents a radical departure from those discussed hitherto. The authors examine data for 2014 from the World Input-Output Database, pertaining to 23 manufacturing sectors in 28 EU countries. A novel feature of this study is its use of quasi-binomial econometric models, in which the regional purchase coefficient (RPC) is regressed on various variables. 10 Each country is treated as a 'region' and the aggregate of all 28 countries as the 'nation'. Accordingly, $RPC^c_i$ is the proportion of national requirements of industry $i$ supplied by firms located within country $c$.
In addition, six industry-specific binary variables were included. $R^2$ = 0.660. By contrast, a model with the SLQ alone gave $R^2$ = 0.142. This worse fit is unsurprising, since the SLQ rules out the key factor of cross-hauling and cannot allow for the peculiarities of specific industries.
Lahr et al. also carry out a test of what is described as the FLQ method. This is done by regressing $RPC^c_i$ on $SLQ^c_i$ and the 'employment share'. $R^2$ = 0.195. However, this test is inconsistent in several respects with the FLQ approach. Most importantly, the FLQ has a cross-industry foundation, which cannot be captured in a rows-only estimation. Consequently, no account is taken of the likelihood that purchasing industries would differ in their use of particular inputs. This aspect is captured in $CILQ^c_{ij}$ and hence in $FLQ^c_{ij}$ but not in $SLQ^c_i$. We should note too that the FLQ formula is multiplicative, whereas the regression model is additive. It is also unclear how the regressor 'employment share' was measured. 11 Finally, whereas output was used to calculate $SLQ^c_i$, employment was used to measure regional size. That would affect the results to the extent that productivity differed across EU countries.
The authors compare the performance of their econometric approach with that of the SLQ, SDR and CHARM. The procedures are judged in terms of their ability to estimate RPCs and to replicate each country's coefficient matrix, Leontief inverse and output multipliers. As expected, the SLQ and SDR yield similar results and, on average, both methods greatly overstate input coefficients (Lahr et al. 2020, table 5). By comparison, the regression-based approach necessarily yields a mean error of zero. The authors find that RCHARM performs 'somewhat better' than the SLQ and SDR, yet it still systematically overstates RPCs, with a mean error of 0.240 (Lahr et al. 2020, p. 1594). By contrast, MCHARM yields negative RPCs for 337 of 1568 national industries. Therefore, both variants of CHARM have serious demerits as a means of estimating RPCs.
A crucial consideration when selecting a non-survey method is its ability to yield unbiased estimates of input coefficients. A regression-based approach can be relied upon to perform very well in that respect but it is also possible to obtain unbiased estimates via the FLQ approach, so long as an appropriate value of the parameter δ is used.
A drawback of the RPC approach vis-à-vis the FLQ is its more demanding data requirements. 12 In the model discussed above, for instance, it would be challenging to find data for some of the regressors, whereas the FLQ only requires figures for output (or employment) in each regional and national industry.
On the other hand, the FLQ has often been criticized on the basis that the results obtained from one country or region are not necessarily transferable elsewhere, since the optimal δ would differ. This problem is addressed in the present paper via a procedure whereby country-specific and region-specific values of δ can be derived. Lahr et al. (2020, p. 1591) note that econometric approaches face a similar challenge in terms of transferability of results.
While the model constructed by Lahr et al. sheds some helpful light on the determinants of RPCs in EU countries, it would be interesting to see how well it would perform when constructing a RIOT for, say, Catalonia from a Spanish NIOT.

The CE approach to regionalizing NIOTs
Here, we explain our MCE approach to regionalizing NIOTs. Various mathematical programming methods based on a constrained optimization framework exist. These typically minimize a penalty function, which measures the deviation of the balanced matrix from the initial matrix, subject to a set of balancing conditions. We limit our attention to the CE method, which is one of the pioneering methods. It is widely used and Lamonica et al. (2020) have shown that it performs very well when applied to countries with small economies. Moreover, in contrast to other constrained optimization methods, it is stable in the sense that the MAD index, i.e. the sum of the absolute differences between the observed and estimated input coefficients, does not change abruptly from one economy to the next. Furthermore, unlike the standard RAS method, which can only handle non-negative matrices, the CE method can easily be adapted to deal with matrices with negative entries. Although we could have used GRAS to deal with this issue, we wished to avoid the problem noted in the Introduction regarding this technique.
Referring to Lamonica et al. (2020) for details and treating the primary input $v^n$ and the final demand $f^n$ as 'additional' intermediate input and output, the method starts with the following augmented input coefficients matrix $A^n$: It turns out that: and where $u$ is a unitary vector and $u'$ is its transpose.
The task at hand is to generate a new matrix $A^r$ from the existing $A^n$, with the same dimension as before, but respecting new row and column totals $x^r$. More specifically, we seek a matrix $A^r$ that satisfies the following consistency and additivity conditions: Formally, the problem involves minimizing the following function for $a^n_{ij} > 0$: subject to and

$$\sum_{i=1}^{k+1} a^r_{ij} = 1.$$

The solution is obtained by solving the following Lagrangian function for problem (23):
The first-order optimal conditions are: Solving this system of equations yields: We should note that the estimates of $a^r_{ij}$ depend on the values of the Lagrangian multipliers, which must be determined by solving the nonlinear system (31) and (32) for the unknowns $\lambda_1, \lambda_2, \ldots, \lambda_{k+1}$. As no closed-form solution exists, this system is solved by using numerical algorithms.
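For orientation, the following is a hedged sketch of how a solution of this kind arises. It assumes the standard CE objective, a consistency constraint expressed as row totals and an additivity constraint expressed as unit column sums. The multipliers $\mu_j$ on the additivity constraints and the sign conventions are notational choices made here, so the expressions need not match the paper's numbered equations term by term. The $\lambda_i$ below are Lagrangian multipliers, not the regional-size scalar of the FLQ.

```latex
\begin{aligned}
\min_{a^r_{ij}} \quad & \sum_{i=1}^{k+1}\sum_{j=1}^{k+1} a^r_{ij}\,\ln\frac{a^r_{ij}}{a^n_{ij}} \\
\text{subject to} \quad & \sum_{j=1}^{k+1} a^r_{ij}\,x^r_j = x^r_i, \qquad \sum_{i=1}^{k+1} a^r_{ij} = 1, \\[4pt]
\mathcal{L} \;=\; & \sum_{i}\sum_{j} a^r_{ij}\,\ln\frac{a^r_{ij}}{a^n_{ij}}
  \;-\; \sum_{i}\lambda_i\Big(\sum_{j} a^r_{ij}\,x^r_j - x^r_i\Big)
  \;-\; \sum_{j}\mu_j\Big(\sum_{i} a^r_{ij} - 1\Big), \\[4pt]
\frac{\partial \mathcal{L}}{\partial a^r_{ij}} \;=\; & \ln\frac{a^r_{ij}}{a^n_{ij}} + 1 - \lambda_i x^r_j - \mu_j = 0
  \;\;\Longrightarrow\;\; a^r_{ij} = a^n_{ij}\,e^{-1+\lambda_i x^r_j+\mu_j}, \\[4pt]
\sum_{i} a^r_{ij} = 1 \;\;\Longrightarrow\;\; & a^r_{ij} = \frac{a^n_{ij}\,e^{\lambda_i x^r_j}}{\sum_{l=1}^{k+1} a^n_{lj}\,e^{\lambda_l x^r_j}} .
\end{aligned}
```

Substituting the last expression into the consistency constraints leaves $k+1$ nonlinear equations in $\lambda_1, \ldots, \lambda_{k+1}$. The exponentials in these equations are what make the evaluation delicate when the $x^r_j$ are large.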
However, any standard solver will fail to find a solution to Eqs. (31) and (32) when $a^n_{ij}$ and $x^r_i$, $i, j = 1, 2, \ldots, k + 1$, are values from real IOTs. In fact, the solution to these equations requires some attention for two main reasons: the evaluation of the exponential functions for large values of their arguments and the solution to a high-dimensional system of nonlinear equations. These difficulties are addressed by adopting the solution proposed by Lamonica et al. (2020). Since the KIRIOT shows null and negative entries, we adapt the CE method to account for negative entries in the NIOT.
To this end, we revise problem (23) by assuming that the sign of the national matrix is preserved in the regional matrix (i.e. $a^n_{ij} < 0$ implies $a^r_{ij} < 0$, and $a^n_{ij} > 0$ implies $a^r_{ij} > 0$), while a null entry in the national matrix implies a null entry in the regional matrix (i.e. $a^n_{ij} = 0$ implies $a^r_{ij} = 0$). In the following, we refer to this assumption as the 'sign-preserving assumption'. We therefore formulate the following optimization problem: $a^r_{ij} = 0$, $j = 1, 2, \ldots, k + 1$.

$$a^r_{ij} = a^n_{ij}\, e^{-1 + \lambda_i x^r_j + \mu_j}, \quad (i, j) \in \Gamma^{+}; \tag{42}$$

We solve Eqs. (44) and (45) with the Matlab solver fsolve, using the scaling procedure of Lamonica et al. (2020) to overcome the numerical challenges of these equations.

Estimation of δ and the FLQ+ approach
We now explain the method employed to estimate δ. We consider the more detailed version of the South Korean interregional input-output table (KIRIOT) for the year 2005, which was constructed by the Bank of Korea for 16 regions and 78 sectors. Table 1 illustrates a simplified form of the KIRIOT, where:
• $Z^{r;r}$ ($r$ = 1, 2, …, 16) is a 78 × 78 matrix whose elements $z^{r;r}_{ij}$ are the flows for intermediate use from sector $i$ to sector $j$ of region $r$;
• $Z^{r;k}$ ($r, k$ = 1, 2, …, 16 and $r \neq k$) is a 78 × 78 matrix whose elements $z^{r;k}_{ij}$ are the exports for intermediate use from sector $i$ of region $r$ to sector $j$ of region $k$;
• $C^{r;r}$ ($r$ = 1, 2, …, 16) is a 78 × 5 matrix of the domestic final demand in region $r$;
• $E^{r;k}$ ($r, k$ = 1, 2, …, 16 and $r \neq k$) is a 78 × 5 matrix of the exports of region $r$ for final demand in region $k$;
• $x^r$ ($r$ = 1, 2, …, 16) is a 78 × 1 vector whose elements are the sectoral output of region $r$;
• $(v^r)'$ ($r$ = 1, 2, …, 16) is a 1 × 78 vector whose elements are the sectoral value added plus the primary sectoral input of region $r$, inclusive of imports from abroad.
Using the South Korean NIOT, the national augmented matrix of input coefficients, $A^n = [a^n_{ij}]$, was determined as follows. First, let:

(43) $a^r_{ij} = a^n_{ij} e$

Next, consider the following national table, $IOT^n$, in block-matrix notation:

Hence, we can determine the following national augmented matrix of input coefficients:

where $D(x^n)$ is the diagonal matrix whose elements are $d_{ii}(x^n) = x_i$, i = 1, 2, …, k + 1.
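As a minimal sketch of the normalization involved, post-multiplying a flow table by the inverse of D(x) divides each column j by the output $x_j$. The block layout of the national table is not reproduced here, so the function below is illustrative only: the names `input_coefficients`, `Z` and `x` are assumptions, and `Z` stands for whatever array of flows is being normalized.

```python
import numpy as np

def input_coefficients(Z, x):
    """Divide column j of the flow matrix Z by output x_j, i.e. Z @ inv(D(x))."""
    Z = np.asarray(Z, dtype=float)
    x = np.asarray(x, dtype=float)
    if Z.shape[1] != x.shape[0]:
        raise ValueError("Z needs one column per element of x")
    if np.any(x == 0):
        raise ValueError("outputs must be non-zero to form the inverse of D(x)")
    return Z / x[np.newaxis, :]

# Hypothetical two-sector flows and outputs, for illustration only.
Z_demo = np.array([[10.0, 20.0], [30.0, 40.0]])
x_demo = np.array([100.0, 200.0])
A_demo = input_coefficients(Z_demo, x_demo)
```

Dividing by a broadcast row vector avoids building and inverting the diagonal matrix explicitly, which matters little for 78 sectors but keeps the code short.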
We are now able to define the three steps of our hybrid method:

Step 1: From $A^n$ and the regional sectoral total output $x^r$, estimate the regional input coefficient matrices $\hat{A}^r$ (r = 1, 2, …, 16), using the MCE method described in the previous section.
Step 2: Using $\hat{A}^r$, estimate the parameter δ in the FLQ formula as follows. Let $\alpha_{ij} = CILQ_{ij}$ if i ≠ j and $\alpha_{ij} = SLQ_i$ if i = j. Also, define $\beta \equiv \log_2(1 + x^r/x^n)$. Hence, using Eqs. (7) and (8), we get:

(46) $a^r_{ij} = a^n_{ij}\, \alpha_{ij}\, \beta^{\delta}$.

Thus, for any pair $(\hat{a}^r_{ij}, a^n_{ij})$ that satisfies the sign-preserving assumption, we set:

(49) $\log\left(\dfrac{\hat{a}^r_{ij}}{a^n_{ij}\, \alpha_{ij}}\right) = \delta \log(\beta) + \varepsilon_{ij}$,

where $\varepsilon_{ij}$ is a random error with zero mean. Taking expectations, we can obtain an estimate of the optimal δ for each region simply by dividing the mean of the regressand by log(β), which is a given value for each region. See Table 2.
Step 3: Using $A^n$, estimate the entries $a^r_{ij}$ of the regional matrix $A^r$, using the FLQ method with the estimated optimal parameter δ. We refer to this hybrid approach as the FLQ+ method. As before, the estimated input coefficients are denoted by $\hat{a}^r_{ij}$.

Analysis of input coefficients
To validate the proposed FLQ+ method, the estimated matrices of regional input coefficients are compared with the true ones, $A^r = [a^r_{ij}] = [z^{r;r}_{ij}/x^r_j]$. Since the RIOT being considered contains negative and null flows, we assess the method's performance by using the mean absolute difference index (MAD), as suggested by Wiebe and Lenzen (2016). It is defined as the mean of the absolute differences $|\hat{a}^r_{ij} - a^r_{ij}|$ taken over all pairs (i, j).
The FLQ, as implemented here, is the optimal FLQ in the sense that δ is chosen on the basis of minimizing the MAD. Specifically, for each of the sixteen regions, we compute the MAD of the FLQ for 99 different values of δ in the interval [0, 1], in increments of 0.01. The optimal δ is the value yielding the minimum MAD. Determined in this way, the regional δ values vary from 0.22 to 0.48, with a mean of 0.35 (see Table 3). We refer to these values as the true or optimal values of δ. The characteristics of these regions are illustrated in Table 3 and their locations are shown in Fig. 1. For more details, see Flegg and Tohmo (2019) and Jahn et al. (2020).
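The calculations behind Eqs. (46) and (49) and the grid search for the optimal δ can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all function names are hypothetical; the ratio $x^r/x^n$ in β is read as total regional over total national output; the scaling factor is capped at 1, which is the usual FLQ convention but is assumed here because Eqs. (7) and (8) are not reproduced in this section; the MAD is taken as the plain mean of absolute differences; and the 99 grid values are read as 0.01, …, 0.99. The data are synthetic.

```python
import numpy as np

def alpha_matrix(x_r, x_n):
    """alpha_ij: CILQ_ij = SLQ_i / SLQ_j off the diagonal, SLQ_i on it."""
    slq = (x_r / x_r.sum()) / (x_n / x_n.sum())
    alpha = slq[:, None] / slq[None, :]
    np.fill_diagonal(alpha, slq)
    return alpha

def beta(x_r, x_n):
    """beta = log2(1 + total regional output / total national output)."""
    return np.log2(1.0 + x_r.sum() / x_n.sum())

def flq_coefficients(A_n, x_r, x_n, delta, cap=True):
    """a^n_ij * alpha_ij * beta**delta; cap=True limits the scaling to 1 (assumed)."""
    scale = alpha_matrix(x_r, x_n) * beta(x_r, x_n) ** delta
    if cap:
        scale = np.minimum(scale, 1.0)
    return A_n * scale

def estimate_delta(A_hat, A_n, x_r, x_n):
    """Mean of log(a_hat / (a^n * alpha)) over sign-preserving pairs, over log(beta)."""
    keep = (A_hat * A_n) > 0  # both non-zero and of the same sign
    ratio = A_hat[keep] / (A_n[keep] * alpha_matrix(x_r, x_n)[keep])
    return np.log(ratio).mean() / np.log(beta(x_r, x_n))

def mad(A_est, A_true):
    """Mean absolute difference between two coefficient matrices."""
    return np.mean(np.abs(A_est - A_true))

def optimal_delta(A_true, A_n, x_r, x_n, grid=None):
    """Grid value of delta that minimizes the MAD of the FLQ estimates."""
    if grid is None:
        grid = np.arange(1, 100) / 100.0  # 0.01, ..., 0.99 (assumed reading)
    scores = np.array([mad(flq_coefficients(A_n, x_r, x_n, d), A_true) for d in grid])
    return grid[scores.argmin()], scores.min()

# Synthetic six-sector example with a known delta of 0.3.
rng = np.random.default_rng(0)
k = 6
x_n = rng.uniform(50.0, 150.0, k)
x_r = x_n * rng.uniform(0.05, 0.15, k)
A_n = rng.uniform(0.01, 0.10, (k, k))
A_hat = flq_coefficients(A_n, x_r, x_n, delta=0.3, cap=False)
delta_hat = estimate_delta(A_hat, A_n, x_r, x_n)
A_true = flq_coefficients(A_n, x_r, x_n, delta=0.3)
delta_opt, mad_min = optimal_delta(A_true, A_n, x_r, x_n)
```

Because the estimator is a ratio of two logarithms, the base of the logarithm does not affect the resulting δ. In the synthetic example both routes recover the δ used to generate the data; with real tables they need not coincide.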
Before examining the results, it may be helpful to explore why the optimal values of δ in Table 3 vary noticeably across regions. The role of δ is to adjust for any differences in regional propensities to import from other regions or from abroad that cannot be explained solely by differences in regional size. Now consider the following regression model fitted using data from Table 3:

(51) $\ln \delta = -3.154 + 0.582 \ln R + 1.040 \ln F + 2.515 \ln V + e$,

where R is the share of gross output imported from other regions; F is the share imported from abroad; V is the share of value added in gross output; and e is a residual. All regressors are statistically significant at the 1% level (one-tailed tests) and the model easily passes all $\chi^2$ diagnostic tests. $R^2 = 0.677$.

Equation (51) shows that regions with an above-average share of imports from other regions or from abroad require a bigger δ to compensate; and likewise if the share of value added in gross output is above average. Just over two thirds of the interregional variation in ln δ can be explained by these three factors, while almost one third must be ascribed to region-specific factors, differences in regional industrial structure, measurement errors, etc.

Table 4 displays the values of MAD (100×) for the MCE, FLQ with optimal δ, and FLQ+ methods. A key finding is that the MADs for the FLQ (both versions) are typically less than half of those for the MCE method. This result demonstrates the potential gains from employing the FLQ as a regionalization technique rather than applying the MCE method alone.
Another key finding is that the estimates of δ from the proposed FLQ+ method are mostly fairly close to the optimal values; the differences are invariably only in the second decimal place and range from −0.06 to 0.09. It is noticeable that the FLQ+ overestimates δ in the two biggest regions, Gyeonggi and Seoul. In the majority of regions, however, the estimates from this method are a little lower than the optimal values and, on average, δ is understated by 0.02. When the results are weighted by the regional share of national output, this outcome changes to a small overstatement on average. The biggest errors occur in the two smallest regions, Daejeon and Jeju. Table 3 reveals that the metropolitan city of Daejeon has an unusually low intraregional share of inputs, which could explain why the estimated δ is noticeably below the optimal value, with a discrepancy of 0.09. As for Jeju, the fact that it is a remote island can probably explain much of the understatement of δ by the FLQ+ method.

Table 3 notes: Shares are expressed as a proportion of gross output. Seoul is classified as a 'special city'; Busan, Daegu, Daejeon, Gwangju, Incheon and Ulsan as 'metropolitan cities'; Jeju as a 'special self-governing province'; and the rest as 'provinces'. The last column displays the optimal values of δ from

Table 5 provides some descriptive statistics of regional input coefficients estimated by the optimal FLQ and FLQ+ methods. Clearly, the two distributions are very close to each other in terms of central tendency and dispersion. Furthermore, they have an identical shape, as measured by the coefficients of kurtosis and skewness (not shown). The uniform behaviour of the two methods lends credence to the FLQ+ method proposed in this work, which is applicable when regional sectoral outputs are the only regional data available. By contrast, the optimal FLQ method requires knowledge of the entire regional matrix, so that an optimal value of δ can be computed for each region.
Table 6 displays descriptive statistics for the absolute differences between the observed input coefficients and those estimated via the FLQ+ and GRAS+ methods. FLQ+ combines the MCE method, our preferred matrix-balancing approach, with the FLQ, whereas GRAS+ combines GRAS with the FLQ. The two methods produce similar results on the whole, although it is interesting that GRAS+ gives larger mean absolute differences than FLQ+ for the biggest eight regions, yet almost identical values for five of the six smallest regions. A possible explanation of this outcome is that the FLQ+ is apt to perform better when the percentage of zero flows is relatively low, as is the case for the larger regions. The two methods concur that Busan and Gangwon have, respectively, the smallest and biggest maximum absolute differences. Taking a holistic view, the MAD is increased by 7.3% on average by using GRAS+ rather than FLQ+ and by 12.3% when the results are weighted by regional size. This finding, along with the theoretical criticisms noted earlier regarding the objective function employed in GRAS, lends support to our use of a modified CE method as the foundation of our proposed FLQ+ approach.

Analysis of the important input coefficients
Here we analyse the ability of the FLQ+ and GRAS+ methods to reproduce what Hewings and Romanos (1981) call 'inverse important coefficients', whereby an input coefficient $a_{ij}$ is said to be inverse important if an error of α% in that coefficient produces a corresponding error of β% in one or more entries of the Leontief inverse. In this analysis, α and β were set equal to 30% and 20%, respectively, as suggested by Hewings and Romanos. This means that an input coefficient is held to be inverse important if a perturbation of 30% generates a change of at least 20% in one or more entries in the Leontief inverse. More formally, we assume that only one coefficient, $a_{ij}$, of the matrix A is perturbed. Let $a^p_{ij}$ denote this perturbed value of $a_{ij}$, i.e. $a^p_{ij} = a_{ij}(1 + \alpha/100)$. This shock generates a perturbed matrix $L^p = [L^p_{ks}]$ of the Leontief matrix $L = (I - A)^{-1} = [L_{ks}]$, k, s = 1, 2, …, 78, whose elements are determined as follows:

(52) $L^p_{ks} = L_{ks} + \dfrac{L_{ki} L_{js}\, a_{ij}(\alpha/100)}{1 - L_{ji}\, a_{ij}(\alpha/100)}$,

where the indices i and j relate to the perturbed coefficient in A, while k and s pertain to the Leontief inverse L and the corresponding perturbed matrix $L^p$.
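The perturbation test described above can be sketched as follows. The function name `inverse_important` and the demonstration matrix are hypothetical; the update of the inverse uses the standard rank-one (Sherman-Morrison) identity, so no matrix is re-inverted for each perturbed coefficient. The sketch assumes that $1 - L_{ji} a_{ij}(\alpha/100)$ is non-zero for every coefficient tested.

```python
import numpy as np

def inverse_important(A, alpha=30.0, beta=20.0):
    """Flag a_ij when raising it by alpha% changes some L_ks by at least beta%."""
    A = np.asarray(A, dtype=float)
    k = A.shape[0]
    L = np.linalg.inv(np.eye(k) - A)
    flags = np.zeros((k, k), dtype=bool)
    for i in range(k):
        for j in range(k):
            d = A[i, j] * alpha / 100.0  # absolute change in a_ij
            if d == 0.0:
                continue  # a zero coefficient is left unchanged by a relative shock
            # Rank-one update: entry (k, s) of the change is
            # d * L[k, i] * L[j, s] / (1 - d * L[j, i]).
            change = d * np.outer(L[:, i], L[j, :]) / (1.0 - d * L[j, i])
            with np.errstate(divide="ignore", invalid="ignore"):
                rel = np.abs(change) / np.abs(L)
            # 0/0 counts as no change; a change to a zero entry stays infinite.
            rel = np.nan_to_num(rel, nan=0.0, posinf=np.inf)
            flags[i, j] = bool(np.any(rel >= beta / 100.0))
    return flags

# Hypothetical two-sector coefficient matrix, for illustration only.
A_demo = np.array([[0.2, 0.3], [0.1, 0.4]])
flags_demo = inverse_important(A_demo)
```

For a 78 × 78 matrix the double loop covers 6084 coefficients, each requiring one outer product, which is far cheaper than 6084 separate matrix inversions.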
The relative error of each coefficient of the Leontief matrix reads as

(53) $\dfrac{L^p_{ks} - L_{ks}}{L_{ks}} = \dfrac{L_{ki} L_{js}\, a_{ij}(\alpha/100)}{L_{ks}\,[1 - L_{ji}\, a_{ij}(\alpha/100)]}$.

We then say that the coefficient $a_{ij}$ is inverse important if there exists at least one pair (k, s) such that the following inequality holds:

(54) $\dfrac{L_{ki} L_{js}\, a_{ij}(\alpha/100)}{L_{ks}\,[1 - L_{ji}\, a_{ij}(\alpha/100)]} \geq \beta/100$.

Table 7 reveals a sharper distinction between the two methods, with the superiority of FLQ+ over GRAS+ now much more apparent, especially when the results are weighted by regional size. The MAD is now raised by 11.5% on average by using GRAS+ rather than FLQ+ and by 18.2% when regional weights are applied. It is clear that GRAS+ yields relatively poor estimates of the most important input coefficients. It is also apparent that the absolute differences increase when the focus is placed on the most important coefficients. This outcome is to be expected, given the increased size of the coefficients being estimated. Of the $78^2 = 6084$ coefficients, those deemed to be important ranged from 1651 (27.1%) for Jeju to 2147 (35.3%) for Daejeon, with a median of 1869 (30.7%).

Comparison of alternative methods
The analysis thus far has demonstrated the superiority of the proposed FLQ+ approach over the MCE and GRAS+ methods. However, what has not yet been explored is whether the FLQ can be expected to yield more accurate estimates of input coefficients than more straightforward methods such as the SLQ. For the purposes of this comparison, it is assumed that the analyst does not wish to employ matrix-balancing methods. It is further assumed that the choice is between the SLQ and the FLQ with a fixed value of δ = 0.3 for all regions under consideration.
As noted in Sect. 2, the SLQ has the theoretical drawback that it precludes cross-hauling and hence tends to understate a region's imports from other regions. It is unsurprising, therefore, to observe in Table 8 that the SLQ generates less accurate results than the FLQ for all regions apart from Seoul and (marginally) South Jeolla. On average, the MAD is raised by 25.8% by using the SLQ rather than the FLQ and by 20.7% if regional weights are applied. In addition, the MAD for the SLQ exhibits greater interregional variation. The superior performance of the SLQ in Seoul is probably due to the relatively high variation in the size of the input coefficients in this region. In such situations, the SLQ is apt to produce more accurate estimates (Lamonica and Chelli 2017).
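The way in which the SLQ rules out cross-hauling can be seen in a small sketch. It assumes the conventional output-based SLQ rule, in which row i of the national matrix is scaled by $SLQ_i$ when $SLQ_i < 1$ and left unchanged otherwise; the function name and the two-sector data are hypothetical.

```python
import numpy as np

def slq_coefficients(A_n, x_r, x_n):
    """Scale row i of A_n by min(SLQ_i, 1); return the result and the SLQ vector."""
    A_n = np.asarray(A_n, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    x_n = np.asarray(x_n, dtype=float)
    slq = (x_r / x_r.sum()) / (x_n / x_n.sum())
    return A_n * np.minimum(slq, 1.0)[:, np.newaxis], slq

# Hypothetical data: sector 0 is over-represented in the region, sector 1 is not.
A_n_demo = np.array([[0.2, 0.1], [0.3, 0.4]])
x_n_demo = np.array([100.0, 100.0])
x_r_demo = np.array([30.0, 10.0])
A_r_demo, slq_demo = slq_coefficients(A_n_demo, x_r_demo, x_n_demo)
```

In this example the row for sector 0 keeps its national coefficients, which amounts to assuming that the region imports none of that sector's output from other regions, even though it may simultaneously export and import such goods in practice.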
Given the uncertainty regarding the appropriate value of δ, it is reassuring that Table 9 shows that the outcomes from the FLQ are little affected by variation in the value of this parameter in the range 0.2 to 0.4. The SLQ still outperforms the FLQ in Seoul and South Jeolla but yields inferior results in the other fourteen regions.

Analysis of multipliers
One of the most useful features of RIOTs is the fact that they yield estimates of sectoral output multipliers, so Tables 10 and 11 examine the relative performance in this regard of eight alternative procedures. The following formulae are used in this evaluation:

where $\hat{L}_j$ and $L_j$ denote the estimated and observed column totals of the Leontief inverse matrices.

Table 10 displays the outcomes in terms of MAPEs when estimating type I output multipliers for the sixteen regions. The first two columns illustrate the benefits of pursuing the FLQ+ approach rather than the MCE method alone. In particular, there is a sharp fall in both unweighted and weighted means, along with a large decrease in dispersion. It is evident that the MCE method by itself does not yield satisfactory estimates of multipliers.
Although the FLQ+ and FLQ with δ = 0.3 have similar means, there are marked differences in the MAPEs for several regions. Busan and North Chungcheong are exceptions where the two methods yield identical results; this is because δ = 0.3 happens to be the estimated value from the FLQ+.
By itself, GRAS generates highly inaccurate results, yet combining it with the FLQ in the form of GRAS+ is clearly helpful in producing a more acceptable set of results. This better outcome is due to the fact that, when estimating δ in the GRAS+ procedure, we consider only those pairs $(\hat{a}^r_{ij}, a^n_{ij})$ satisfying the 'sign-preserving assumption', whereas GRAS accounts for all possible pairs. Nonetheless, GRAS+ still yields substantially worse results than FLQ+.
It is interesting that the AFLQ has the lowest weighted mean of all eight approaches, whereas FLQ+ is the best in terms of the unweighted mean. This outcome is attributable to the superior performance of the AFLQ in several larger regions, most noticeably in Seoul. Nonetheless, it is odd that AFLQ+ often generates worse results than the AFLQ, especially in South Jeolla and Jeju. As regards the remaining approach, the SLQ, Table 10 reveals that its performance is clearly inferior to that of the FLQ and AFLQ. While the MAPE index is a good way of assessing the relative accuracy of alternative methods, it tells us nothing about the direction or extent of bias. For that reason, Table 11 displays outcomes for actual rather than absolute percentage deviations.
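The two kinds of measure can be sketched as follows. The formulae used in the paper are not reproduced in this section, so the standard definitions are assumed: type I output multipliers are the column totals of the Leontief inverse, the MAPE averages the absolute percentage deviations, and the signed measure averages the percentage deviations themselves. All function names and the example matrices are hypothetical.

```python
import numpy as np

def output_multipliers(A):
    """Type I output multipliers: column totals of the Leontief inverse (I - A)^-1."""
    A = np.asarray(A, dtype=float)
    return np.linalg.inv(np.eye(A.shape[0]) - A).sum(axis=0)

def mape(estimated, observed):
    """Mean absolute percentage error of the estimated multipliers."""
    return 100.0 * np.mean(np.abs(estimated - observed) / observed)

def mean_percentage_error(estimated, observed):
    """Mean signed percentage deviation; negative values indicate understatement."""
    return 100.0 * np.mean((estimated - observed) / observed)

# Hypothetical observed matrix and an estimate that understates every coefficient.
A_obs = np.array([[0.2, 0.3], [0.1, 0.4]])
A_est = 0.9 * A_obs
m_obs = output_multipliers(A_obs)
m_est = output_multipliers(A_est)
abs_error = mape(m_est, m_obs)
signed_error = mean_percentage_error(m_est, m_obs)
```

When every multiplier is understated, as in this example, the two measures differ only in sign; they diverge as soon as over- and understatements offset one another.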
A key finding from Table 11 is that the FLQ+ and GRAS+ procedures invariably generate underestimates of multipliers. Taking regional size into account, the FLQ+ understates these multipliers by 2.6% on average, whereas GRAS+ does so by 7.2%. Apart from the sign, the results from the FLQ+ and GRAS+ methods are identical in the two tables. By contrast, the SLQ produces a positive bias of 6.8% on average. Seoul is the only region where this bias is negative.
Another important outcome from Table 11 is that the AFLQ shows minimal bias on average. However, the estimated multipliers from the AFLQ display greater dispersion than those from FLQ+, which would tend to boost the MAPEs in Table 10. Even so, the AFLQ still emerges with a weighted mean MAPE that is marginally lower than that for FLQ+. What is more, of the eight procedures examined in Table 10, the AFLQ produces the best results for the problematic Seoul region.

Conclusion
This paper has proposed a new approach to the regionalization of national input-output tables where very limited regional data exist and analysts are considering employing methods based on location quotients. We focus on the FLQ, which often yields the most accurate results of such methods. The FLQ formula involves an unknown parameter, δ, which plays a crucial role in the regionalization process. However, the difficulty of selecting an appropriate value of δ has been an obstacle to the successful application of the FLQ approach. Hitherto, analysts have had to choose a value of δ on the basis of a priori considerations and the results of previous case studies, which often conflict. Our aim has been to develop an enhanced and more objective way of implementing the FLQ approach.
Our proposed procedure, the FLQ+ method, is a hybrid approach that uses the results from a modified cross-entropy (MCE) method and a simple regression model to estimate δ. This value, which is specific to a given country and region, is used to estimate regional input coefficients and hence multipliers via the FLQ method. Our empirical analysis, based on a rare survey-based South Korean interregional input-output table for 2005 with 16 regions and 78 sectors, gave credence to the proposed method, in that the FLQ+ behaved much like the FLQ with an optimal δ.
The FLQ+ approach generated substantially more accurate estimates of both input coefficients and sectoral output multipliers than those from the MCE approach alone. Moreover, the MCE method clearly outperformed GRAS. In further testing of the FLQ+ approach, we considered the simple LQ (SLQ), the augmented FLQ (AFLQ) and the FLQ with an assumed value of δ. However, the SLQ gave inferior outcomes to the FLQ+, so we would not recommend using this method.
The AFLQ and FLQ+ gave almost identical results when estimating input coefficients. By contrast, in terms of multipliers, the AFLQ gave some good outcomes, characterized by minimal bias, whereas the FLQ+ tended to understate the multipliers. Indeed, the AFLQ generated the most accurate estimates of multipliers for the problematic Seoul region.
Using the FLQ with an assumed value of δ has the merit of ease of application, yet it runs the risk of choosing an inappropriate value. To explore this issue, we used values of δ in the interval 0.3 ± 0.1 to estimate input coefficients. This range is suggested by Jahn et al. (2020). We found that the mean absolute differences were not much affected by variation in δ. However, an analysis of multipliers revealed marked differences in the mean absolute percentage errors for several regions, despite the fact that the FLQ+ and FLQ with δ = 0.3 had similar means. Therefore, we would recommend using the FLQ+ approach, which takes region-specific characteristics into account, along with any country-specific differences.
In concluding, we should note some avenues for further research on this topic. The first would be to re-examine the effectiveness of the AFLQ, in the light of its good performance in estimating multipliers. It would also be interesting to see how well the 2DLQ refinement of the AFLQ proposed by Pereira-López et al. (2020) performs when tested using a national data set such as that for South Korea. Secondly, it would be useful to investigate why the FLQ+ tended to underestimate output multipliers. This outcome may well be because the optimal values of δ were derived by minimizing the mean absolute

(58) $\log\left(\dfrac{\hat{a}^r_{ij}}{w_{ij}\, a^n_{ij}}\right) = \delta \log(\beta) + \varepsilon_{ij}$,

from which a value of δ can be derived as done for Eq. (49). However, it should be noted that the optimal δ for the AFLQ can normally be expected to exceed the corresponding value for the FLQ (e.g. Bonfiglio and Chelli 2008, table 1). Table 12 reveals that the FLQ and AFLQ give almost identical values of the MAD. Therefore, Ockham's principle suggests that the FLQ should be preferred on the basis of its greater simplicity. However, such a conclusion would not be supported by the outcomes in terms of type I output multipliers presented in Tables 10 and 11, which indicate that the AFLQ can produce superior results to the FLQ. It should be noted, finally, that the values of δ shown in Table 12 for the AFLQ are almost always higher than those for the FLQ, but that expected outcome is irrelevant when assessing their relative performance.

Abbreviations
$a^n_{ij}$: Observed national input coefficient; $\hat{a}^r_{ij}$: Estimated regional input coefficient; NIOT: National input-output table; RIOT: Regional input-output table;