Description of the study area
The study was conducted in North Gondar Zone, Ethiopia. The zone is located in the northwestern part of the country, between 11 and 13 north latitude and 35 and 35 east longitude, 738 km from Addis Ababa. The zonal capital is Gondar city, which lies at 12° 35′ 60.00″ N latitude and 37° 28′ 0.01″ E longitude at an average elevation of 2133 meters above sea level. The total area of the administrative zone is 50,970 square km. The lowlands contain some of the largest tracts of semiarid natural forest remaining in northern Ethiopia. The zone has a total population of 2,921,470 (84.12% rural and 15.88% urban), of which 51% are men (Abate et al. 2019; CSA 2007). According to the zone agriculture department, farmers used irrigation mainly to produce vegetables such as onion, tomato, cabbage, pepper and potato and, very often, cereals such as maize. They also practiced mixed farming (i.e., livestock rearing and crop production). The study was conducted in two districts with high red pepper production potential, namely Takusa and Dembia (Fig. 1).
Data types, sources and methods of data collection
A combination of quantitative and qualitative data was collected from primary and secondary sources. A longitudinal survey requires repeated measurements on a continuous basis, which has cost and time implications. A cross-sectional survey, by contrast, requires one-time data collection and analysis and is therefore time-saving and cost-effective (Kothari 2004). This study was accordingly designed as a cross-sectional survey to collect the primary data. The primary data were collected using a semi-structured questionnaire. Secondary data were drawn from published and unpublished sources to enrich the investigation. The survey was supplemented by focus group discussions and key informant interviews.
Sample size and sampling techniques
A multistage sampling technique was used to select sample producers. In the first stage, Takusa and Dembia districts were selected purposively because of their high red pepper production potential. In the second stage, the eight largest red pepper-producing kebeles/villages, namely Mekonta, Chemera, Banbaro, Deber-zuria, Guramba Michael, Arebia, Achera and Gebaba-salge, were purposively selected in consultation with district agriculture office experts because of their high production potential and the strong experience of their smallholder farmers in red pepper production and marketing. In the third stage, 385 red pepper producers were selected using a systematic random sampling technique. In calculating the sample size, the following assumption was made regarding the value of p. When calculating the sample size for a proportion, there are two situations to consider. First, if some approximation of p is known (for example, from a previous study), that value can be used in the formula. Second, if no approximation of p is known, one should use p = 0.5. This value gives the largest sample size for a given margin of error and is therefore sufficient whatever the true proportion (Ott and Longnecker 2010). Hence, the required sample size was determined using the formula of Cochran (1977):
$$n = \frac{{Z^{2} pq}}{{e^{2} }}$$
(1)
where n = sample size; Z = standard normal value corresponding to the 95% confidence level (α = 0.05), i.e., Z = 1.96; p = 0.5; q = 1 − p; and e = 0.05 (allowable error). Hence, \(n = \frac{{1.96^{2} (0.5 \times 0.5)}}{{0.05^{2} }} = 384.16\), which was rounded up to 385.
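The arithmetic in Eq. 1 can be checked with a few lines of Python. This is a minimal sketch rather than part of the study's procedure; the function name `cochran_sample_size` is an illustrative assumption, and rounding up to the next whole respondent is assumed to be how 384.16 becomes 385.

```python
import math


def cochran_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Sample size for a proportion, n = Z^2 * p * q / e^2, rounded up.

    The function name and the round-up rule are assumptions made for
    illustration; the defaults are the values stated in the text.
    """
    q = 1.0 - p
    n_exact = (z ** 2) * p * q / (e ** 2)  # about 384.16 for the defaults
    return math.ceil(n_exact)              # round up to a whole respondent


if __name__ == "__main__":
    print(cochran_sample_size())  # 385
```

Because p·q is largest at p = 0.5, any other assumed proportion yields a smaller n for the same Z and e.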
The sample traders were selected randomly in proportion to the number of wholesalers, assemblers and retailers participating in red pepper marketing. First, the red pepper market sites were identified; then, out of the total traders identified, 47 sample traders were selected randomly, with each trader type represented in proportion to its number in the identified markets.
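Proportional allocation of the 47 traders across trader types can be sketched as follows. The trader counts in the example are hypothetical, since the text does not report the number of wholesalers, retailers and assemblers identified, and the largest-remainder rounding rule is an assumption made so that the allocations sum exactly to the target.

```python
def proportional_allocation(strata_sizes: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to their sizes.

    Uses largest-remainder rounding (an assumption, not stated in the
    text) so that the allocations add up exactly to total_sample.
    """
    population = sum(strata_sizes.values())
    exact = {name: total_sample * size / population
             for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down first
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical trader counts, for illustration only.
    traders = {"wholesalers": 30, "retailers": 70, "assemblers": 41}
    print(proportional_allocation(traders, 47))
```

Within each trader type, the allocated number of traders would then be drawn at random from the list of identified traders.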
Data analysis and model specification
To handle and analyze the diverse data collected from traders and household heads in the field, a combination of descriptive methods (frequencies, percentages and means) and an econometric model, namely the multiple linear regression model, was used. The multiple linear regression model was used to analyze factors affecting red pepper market supply in northwest Ethiopia. Multiple linear regression analysis is useful both for testing economic theories and for evaluating policy effects with non-experimental data because it can accommodate many explanatory variables that may be correlated (Maddala and Lahiri 1992). Unlike simple regression analysis, multiple linear regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. Greene (2003) also notes the simplicity and practical applicability of the multiple linear regression model. To estimate the effect of these factors on the market supply of red pepper, the functional relationship is specified in Eq. 2.
$$y = f(X_{1} ,X_{2} ,X_{3} ,X_{4} ,X_{5} ,X_{6} , \ldots ,X_{k} ,\varepsilon )$$
(2)
where y = market supply of red pepper (measured in quintals), X1, …, Xk = explanatory variables and ε = stochastic error term.
The general form of the multiple linear regression model for this study is expressed in Eqs. 3 and 4:
$$y = \beta_{0} + \beta_{1} X_{1} + \beta_{2} X_{2} + \beta_{3} X_{3} + \beta_{4} X_{4} + \beta_{5} X_{5} + \beta_{6} X_{6} + \cdots + \beta_{k} X_{k} + \varepsilon$$
(3)
where y = dependent variable explained by the explanatory variables, X1, …, Xk = independent variables used to explain the dependent variable, β0 = intercept of the regression model, β1, …, βk = parameters associated with the explanatory variables and ε = stochastic error term.
This can be written in matrix notation:
$$y = x\beta + \varepsilon ,\quad y = \left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ \vdots \\ {y_{n} } \\ \end{array} } \right],\quad x = \left[ {\begin{array}{*{20}c} 1 & {x_{11} } & {x_{12} } & \cdots & {x_{1k} } \\ 1 & {x_{21} } & {x_{22} } & \cdots & {x_{2k} } \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & {x_{n1} } & {x_{n2} } & \cdots & {x_{nk} } \\ \end{array} } \right],\quad \beta = \left[ {\begin{array}{*{20}c} {\beta_{0} } \\ {\beta_{1} } \\ \vdots \\ {\beta_{k} } \\ \end{array} } \right]\;{\text{and}}\;\varepsilon = \left[ {\begin{array}{*{20}c} {\varepsilon_{1} } \\ {\varepsilon_{2} } \\ \vdots \\ {\varepsilon_{n} } \\ \end{array} } \right]$$
(4)
where the rows index the n sample observations and the leading column of ones in x corresponds to the intercept β0, so that x has the same number of columns as β has elements.
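The matrix form in Eq. 4 can be illustrated with a short NumPy sketch. It assumes the parameters are estimated by ordinary least squares, which the text does not state explicitly; the function name `ols_fit` and the simulated data are hypothetical and are not the survey data.

```python
import numpy as np


def ols_fit(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares estimates [b0, b1, ..., bk] for y = x*beta + eps.

    X holds one row per observation and one column per explanatory
    variable; a column of ones is prepended for the intercept.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])           # n x (k + 1) design matrix
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # minimizes ||y - design @ beta||^2
    return beta


if __name__ == "__main__":
    # Simulated data for illustration only (not the study's observations).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_beta = np.array([2.0, 0.5, -1.0, 3.0])          # intercept followed by three slopes
    y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)
    print(ols_fit(X, y))  # estimates should lie close to true_beta
```

Solving with `np.linalg.lstsq` avoids forming and inverting the cross-product matrix explicitly, which is numerically safer when explanatory variables are correlated.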
Prior to the regression analysis, tests for multicollinearity, heteroscedasticity, linearity, omitted variables and normality were undertaken, and the multicollinearity checks were used to screen out explanatory variables that are highly dependent on one another. According to Gujarati (2004), the variance inflation factor (VIF) is used to check multicollinearity among continuous variables. Before fitting the model, it is necessary to test for multicollinearity among continuous variables and to check associations among discrete variables, because multicollinearity strongly affects the parameter estimates. As a rule of thumb, if the value of VIF is greater than 10, the variables are said to be highly collinear. Mathematically, it can be expressed in Eq. 5:
$${\text{VIF}} = \frac{1}{{1 - R_{j}^{2} }}$$
(5)
where \(R_{j}^{2}\) is the squared multiple correlation coefficient between the explanatory variable Xj and the other explanatory variables; the larger the value of \(R_{j}^{2}\), the higher the value of VIF (Xj), indicating stronger collinearity in the variable Xj. Likewise, the degree of association between discrete variables can be calculated using the contingency coefficient (CC). Its value ranges between 0 and 1, with 0 indicating no association between the variables and a value close to 1 indicating a high degree of association. As a rule of thumb, if the value of CC is greater than 0.75, the variables are said to be collinear. Mathematically, it can be expressed in Eq. 6.
$${\text{CC}} = \sqrt {\frac{{\chi^{2} }}{{N + \chi^{2} }}}$$
(6)
where CC is the contingency coefficient, χ2 is the chi-square statistic and N is the total sample size.
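Eqs. 5 and 6 can be sketched in NumPy as shown below. This is an illustration under stated assumptions, not the authors' implementation: the function names `vif` and `contingency_coefficient` are hypothetical, each \(R_{j}^{2}\) is obtained by regressing Xj on the remaining explanatory variables with an intercept, and χ2 is taken to be the Pearson chi-square statistic without continuity correction.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2) for each column of X (continuous variables).

    R_j^2 comes from regressing column j on all other columns plus an
    intercept (an assumed, standard construction).
    """
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        out[j] = 1.0 / (1.0 - r2)
    return out


def contingency_coefficient(table) -> float:
    """CC = sqrt(chi2 / (N + chi2)) from a contingency table of counts.

    chi2 is the Pearson statistic without continuity correction
    (an assumption; the text does not specify the variant).
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n + chi2)))


if __name__ == "__main__":
    # Simulated variables for illustration only.
    rng = np.random.default_rng(1)
    a = rng.normal(size=500)
    b = rng.normal(size=500)
    c = a + 0.05 * rng.normal(size=500)    # nearly a copy of a
    print(vif(np.column_stack([a, b, c])))  # a and c far above 10, b near 1
    print(contingency_coefficient([[50, 0], [0, 50]]))
```

Under the rules of thumb in the text, a variable would be flagged when its VIF exceeds 10, and a pair of discrete variables when their CC exceeds 0.75.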