What is Assumed in the GTAP Database's Disaggregation of Labor by Skill Level?

The 45 region by 50 commodity by 5 primary factor version of the GTAP database provides us with the splits of total labor payments into two categories, viz. skilled and unskilled labor in each sector. The decomposition of total labor payments in all sectors and all regions according to differentials in the skill content of the labor force presupposes substitution possibilities between these two categories of labor. Our interest is to explore the elasticity of substitution implicit in this disaggregation of occupation types. Given the skilled labor payment shares (as calculated from the GTAP database), we offer an ex post rationalization of them within a production-theoretic framework, thereby deriving estimates of the elasticity of substitution between skilled and unskilled labor. The adoption of a suitable nesting of skilled and unskilled labor in GTAP's production function enables us to find a "reasonable" value for the substitution elasticity that is implicit between the two categories of labor in the GTAP database. This relies on the inter-regional covariation in the GTAP shares and in measures of educational attainment.


LIST OF TABLES
Elasticity of substitution regressions excluding GTAP composite regions
Table A1: Skilled labor payment shares in GTAP sectors and regions
Table A2: Mean years of education per working person and skilled labor payment shares for all GTAP regions
Table A3: Skilled labor payment shares and mean school years of education for selected GTAP regions
Table A4: Skilled labor payment shares and mean years of tertiary education for selected GTAP regions
Table A5: Skilled labor payment shares, average years of higher and total schooling for selected GTAP regions
Table A6: Mean school years of education per working person (DNS database) and skilled labor payment shares for single GTAP regions
Table A7: Mean school years of education per working person (BL database) and skilled labor payment shares for single GTAP regions
Table A8: Elasticities of substitution regressions including GTAP composite regions

LIST OF FIGURES
Scatter-plot of SKL_AC and MTRY_AC and fitted regression
Figure 3: Scatter-plot of SKL_AC and TYR_BL and fitted regression
Figure 4: Scatter-plot of SKL_AC and HYR_BL and fitted regression
Figure 5: Comparison between the regression lines
Figure 6: A production tree for GTAP incorporating human capital

Introduction
The 45 region × 50 commodity × 5 primary factor version of the GTAP database splits total labor payments in each sector into two categories, viz. skilled and unskilled labor (in terms of the ILO one-digit classification of workers by occupation). 1 The aggregate labor force going into the production process is, thus, classified into 'raw' labor and specialised 'skilled' labor. The decomposition of total labor payments in all sectors and all regions according to differentials in the skill content of the labor force presupposes substitution possibilities between these two categories of labor. Our interest is to explore the elasticity of substitution implicit in this disaggregation of occupation types. 2 As will be evident from our analysis, the adoption of a suitable nesting of skilled and unskilled labor in GTAP's production function enables us to find a 'reasonable' value for the substitution elasticity that is implicit between the two categories of labor in the GTAP database.
To do this, in Section 2 we describe the GTAP methodology, while Section 3 reviews available data sources on educational attainment and reconciles them with the GTAP database. Section 4 offers some theoretical underpinning for our empirical analysis. Section 5 reports the results. Concluding remarks are offered in Section 6.

# This short paper is an abridged version of a paper titled "Educational attainment, skilled labor payment and absorption capacity: Empirics and Theory" which forms part of my PhD thesis in progress. I gratefully thank my thesis supervisor Professor Alan Powell for stimulating me to write this paper, and for helpful discussions. The remaining errors are, however, mine.
1 Sectors and commodities map 1:1; that is, each sector produces only one commodity.
2 In the context of my thesis, the primary motivation being to study the role of absorption capacity and human capital formation in facilitating technology spillovers, this exploration is a necessary by-product of a larger project. Typically, higher human capital intensity of the work force leads to higher skill formation and augments workers' capabilities to adapt the current state-of-the-art in their field. With technology transfer, this implies substitution possibilities between skilled and unskilled labor, which are reflected in a higher payment share of skilled labor.

GTAP Methodology
In GTAP, the labor value splits by all sectors and regions rely on regression analysis (Liu et al. 1998). To generate this database, the data on educational attainment of the working age population was used as a measure of skill to predict the skilled labor payment shares for the regions for which no labor force surveys and national censuses were available. Since data on employment by skill categories in each sector were not available for all the regions, inference was based on observations from 15 national censuses and labor surveys. Initially, this was performed for Versions 1-3 of the database. The industry splits for the non-sampled GTAP regions were predicted by fitting a linear regression model to these data.
The data on average lengths of per capita tertiary education for 30 GTAP regions from 1980-1987 were obtained from the World Bank (1993) sources.
Data for per capita GDP measured at 1987 prices were also acquired from the World Bank database.
A mathematical relationship linking the skilled labor payment share with the stage of development (proxied by regional per capita GDP) and educational attainment (measured by average years of tertiary education) was postulated by Liu et al. (1998). An OLS regression model relating the payment share of skilled labor to average years of tertiary education and per capita GDP for 30 GTAP regions was used to predict labor payment shares in the unobserved regions on the basis of these observed linkages. The data on the mean years of tertiary education per capita were extrapolated backward to 1970 and forward to 1992 to generate matched-year data for the observation period.
However, initially regressions were run using average years of secondary education for the workers as another explanatory variable in addition to the two mentioned above. As in the regressions at the sectoral level, this variable was not significant, and was omitted from the regression equation. Thus, in the regression model fitted, skilled labor payment share (MHP) is the dependent variable whilst per capita GDP (GDPC) and mean years of tertiary education (TER) for the region as a whole are explanatory variables. Hence, the equation fitted is the model:

$$\mathrm{MHP} = F(\mathrm{GDPC}, \mathrm{TER})$$

where F is a linear function. The prediction of sectoral splits of labor payments is based on this fitted equation.
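The fit-then-predict step described above can be sketched in a few lines. This is only an illustration under stated assumptions: F is taken to be linear with a constant term, the helper names `fit_linear` and `predict` are hypothetical, and every number is invented for the example. None of it is the data or the estimates of Liu et al. (1998).

```python
def fit_linear(rows, y):
    """OLS via the normal equations. rows: regressor lists without a constant."""
    X = [[1.0] + list(r) for r in rows]          # prepend the constant term
    k = len(X[0])
    # Build X'X and X'y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - s) / A[i][i]
    return coef


def predict(coef, regressors):
    """Predicted share for one region from its (GDPC, TER) pair."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], regressors))


# Invented (GDPC in thousands, TER in years) pairs for five sampled regions.
sampled = [(1.0, 0.1), (5.0, 0.4), (12.0, 0.9), (20.0, 0.6), (3.0, 0.3)]
# Invented shares generated exactly from MHP = 0.1 + 0.02*GDPC + 0.3*TER.
mhp = [0.1 + 0.02 * g + 0.3 * t for g, t in sampled]

coef = fit_linear(sampled, mhp)
predicted_share = predict(coef, (8.0, 0.5))      # a hypothetical unsampled region
```

Because the invented shares lie exactly on a plane, the fit recovers the generating coefficients; with real survey data the residuals would of course be non-zero.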
As education data are unavailable for Hong Kong, Taiwan, New Zealand, the Former Soviet Union and the Central European Associates, the data for Singapore, Korea, Australia and European Free Trade, respectively, were used as proxies. However, it is not clear how the education data used for the prediction of skilled labor payment shares for the composite regions were obtained. We now document our empirical procedures.

Data Reviews and Reconciliation
To start with, we document alternative data sources measuring human capital formation at the aggregate country level. In empirical economics, the most widely cited databases for analysing the interlinkage between human capital, growth and development are: (a) the Barro-Lee (1993, 1996) database (henceforth, BL) and (b) Nehru, Dubey and Swanson's (1995) dataset (henceforth, DNS). Both (a) and (b) make use of data on educational attainment at different levels of education from UNESCO data collected according to its international standard classification of education (ISCED). 3
We give a brief overview of each of these prior to describing our methodology for reconciling them with the published GTAP database.

3.1) Barro-Lee Dataset
BL (1993) estimate the proportion of the total population with primary, secondary and higher levels of schooling for male and female individuals aged 25 years and above. They present educational attainment data quinquennially from 1960 to 1985 for 129 countries. BL (1996) update it to include the figures for 1990 and the population aged 15 and over as well. In addition, BL (1996) have compiled data on the educational quality of each year of schooling at the primary and secondary levels, across countries. They measure educational attainment on the basis of gross/nett enrolment ratios at the primary, secondary and higher schooling levels. Thus, average years of schooling in the total population aged 15 and over is their proxy for human capital.

3.2) DNS Dataset
DNS (1995) provide estimates of education stocks based on mean school years of education per working person for the working age population between the ages of 15 and 64 for 85 countries over 28 continuous years. Theirs is an improvement over BL (1993, 1996) [...]

We now discuss the methodology adopted for checking consistency between each of these alternative definitions and the skilled labor payment share calculated from the GTAP database. The primary motivation is to find any correlation between human capital (proxied by average years of schooling and/or enrolment at different levels of education) and the GTAP data on the payment shares of skilled labor. For this purpose, we consider both the BL (1996) and DNS (1995) data sets and see how these alternative measures of human capital stock are related to the skilled labor payment shares in the GTAP database. Thus, following DNS (1995), we consider mean school years of education per working age person as a potential index of human capital (MEDY_AC), expecting that the higher MEDY_AC is, the higher the skilled payment share will be. Similar considerations apply in the case of BL (1996), where average schooling years in the total population (TYR_BL) proxies human capital formation.
We start with the construction of the skilled labor payment share (SKL_AC) at the sectoral and aggregate levels according to Version 4 of the GTAP database.
The sector-wide skilled payment share (SKL_AC_i) for each traded GTAP sector 'i' is defined as the 'share of skilled labor payment in total labor payment in that sector for any GTAP region r'. All these sectoral indexes are reported in Appendix Table A1. The region-specific aggregate share SKL_AC_r is the ratio of total skilled labor payments to aggregate labor payments across all 50 sectors.
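The two share definitions can be illustrated with a short sketch. The sector labels and payment values below are invented for the example and are not GTAP data; `skilled_share` is a hypothetical helper name.

```python
def skilled_share(skilled, unskilled):
    """Share of skilled labor payments in total labor payments."""
    total = skilled + unskilled
    if total <= 0:
        raise ValueError("total labor payments must be positive")
    return skilled / total


# Invented payments for one region: sector -> (skilled, unskilled).
payments = {
    "agr": (10.0, 90.0),
    "mnf": (40.0, 60.0),
    "svc": (150.0, 150.0),
}

# Sectoral shares (SKL_AC_i): one ratio per sector.
sector_shares = {s: skilled_share(sk, un) for s, (sk, un) in payments.items()}

# Regional share (SKL_AC_r): total skilled payments over total labor payments.
total_skilled = sum(sk for sk, _ in payments.values())
total_unskilled = sum(un for _, un in payments.values())
region_share = skilled_share(total_skilled, total_unskilled)
```

Note that the regional share is a ratio of totals, so it is implicitly weighted by each sector's wage bill and generally differs from the unweighted mean of the sectoral shares.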
Our next step is to match the GTAP regions with those covered in the BL (1996) and DNS (1995) databases for educational attainment and then to plot the scattergrams for the matched observations of the data sets. All countries for which data are available are considered. However, contingent upon which regions have the necessary data, we include a subset of the GTAP regions in each regression. We exclude those regions (composite as well as single) which are not common in all these data sets. Moreover, for some of the GTAP composite regions not covered in either of the data sets for educational attainment, we have calculated simple averages of the data points related to schooling years of education for their component countries. This procedure assigns equal weights to each of the component countries, and hence does not reflect relative size differences of the constituent countries.
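The consequence of equal weighting can be seen in a small sketch. The countries, schooling figures and population weights are invented for illustration; the weighted variant is shown only as a contrast and is not the procedure used in this paper.

```python
from statistics import mean


def composite_simple(values):
    """Simple (equal-weight) average, as used for the composite regions."""
    return mean(values)


def composite_weighted(values, weights):
    """Size-weighted average; a hypothetical alternative, for contrast only."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Invented mean schooling years for two component countries of one region.
schooling_years = [4.0, 8.0]
# Invented working-age populations (millions) for the same two countries.
working_population = [30.0, 10.0]

simple_avg = composite_simple(schooling_years)
weighted_avg = composite_weighted(schooling_years, working_population)
```

With these invented numbers the equal-weight figure exceeds the size-weighted one because the smaller country has more schooling, which is the kind of distortion the caveat above refers to.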
Subsections below document our proposed consistency checks.

3.3) Reconciliation of GTAP Database and DNS (1995) Database
Since GTAP uses average years of tertiary education in the work force as [...] The USA is the outlier in both scatter plots.
Both scatter plots show distinctive upward co-movements between SKL_AC and the relevant measure of human capital (MEDY_AC or MTRY_AC).
Typically, our equation to be estimated is written as:

$$\mathrm{SKL\_AC}_r = A + b\,X_r + \varepsilon_r \qquad (3.1)$$

where $X_r \in \{\mathrm{MEDY\_AC}_r, \mathrm{MTRY\_AC}_r\}$ and $\mathrm{SKL\_AC}_r$ is regressed on each $X_r$ individually to estimate the intercept 'A' and the slope parameter 'b'. 'r' ranges over a cross-section of countries/regions. The $\varepsilon_r$'s are assumed to be identically [...]

Using the t-statistics, we would reject the null hypothesis (that 'b' is zero) at the 1 per cent as well as the 5 per cent level of significance in both cases. As expected, these significant t-statistics on the estimated slope parameter 'b' support the postulated relationship between SKL_AC and both proxies of educational attainment. The data used by the GTAP researchers (MTRY_AC) surprisingly do not fit the GTAP skill shares as well as the alternative (MEDY_AC).
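A minimal sketch of such a bivariate regression, with the slope t-statistic and R-squared, is given here. The function name `simple_ols` and the five data points are assumptions made for illustration; they are not the matched GTAP and DNS observations.

```python
import math


def simple_ols(x, y):
    """Regress y on a constant and x; return intercept, slope, t-stat and R^2."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = ybar - b * xbar                # intercept estimate
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    s2 = rss / (n - 2)                 # residual variance with n - 2 d.o.f.
    se_b = math.sqrt(s2 / sxx)         # standard error of the slope
    return {"a": a, "b": b, "t_b": b / se_b, "r2": 1.0 - rss / tss}


# Invented schooling years (X_r) and skilled payment shares (SKL_AC_r).
schooling = [2.0, 4.0, 6.0, 8.0, 10.0]
shares = [0.15, 0.22, 0.24, 0.33, 0.36]

fit = simple_ols(schooling, shares)
```

The null hypothesis that 'b' is zero would then be assessed by comparing `fit["t_b"]` with the critical value of the t-distribution with n − 2 degrees of freedom.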

3.4) Reconciliation of GTAP Database and BL (1996) Database
In the case of the BL database, the number of matched observations is 35.
Similar considerations to those applying to the DNS data set govern the selection of regions in conformity with the GTAP database. The regions excluded are VNM, MAR, RAS, RME, RSM, REU, CEA, FSU, RSA and ROW. We consider two measures of educational attainment available in the BL (1996-7) data set, viz., 'average years of higher schooling in the total population' (HYR_BL) and 'average schooling years in the total population' (TYR_BL) for the population aged 15 and over. TYR_BL includes average years of primary, secondary and higher schooling for the relevant age group in the total population. This has been motivated by our curiosity to check whether 'total schooling years' is better than 'higher schooling years' as a proxy for human capital. All these figures are reported in [...]
As before, we fit a linear regression model linking SKL_AC with TYR_BL and HYR_BL. The equation we estimated is written below:

$$\mathrm{SKL\_AC}_r = A + b\,X_r + \varepsilon_r \qquad (3.2)$$

where $X_r \in \{\mathrm{TYR\_BL}_r, \mathrm{HYR\_BL}_r\}$ and all other variables are defined as before. The fitted regression line for each $X_r$ is given below. [...]

The higher value of $R^2$ and the t-statistics on the slope coefficient in the case of (3.2a) suggest that mean years of total education (schooling) over all levels of education is a better measure of skill intensity of the workforce than the average schooling years at a specific level of education (such as tertiary or secondary).
Comparison of the fitted regression lines for (3.1a) and (3.2a) as shown in Figure 5 demonstrates that MEDY_AC and TYR_BL exhibit almost exactly the same degree of compatibility with the SKL_AC data calculated from the GTAP database. Nevertheless, the regression involving the DNS data (MEDY_AC) fits the skilled share data slightly better than the BL data (TYR_BL). As seen above in Section 2b, the DNS (1995-96) data is also preferred on other grounds.
Having specified an a priori relationship between various measures of human capital formation and skilled labor payment share and having checked the relationship statistically, it can be inferred (as expected) from the previous analysis that educational attainment explains quite significantly the GTAP shares of aggregate labor payments going to the skilled work force. This prompted us to investigate whether skilled labor embodying human capital and 'raw' or unskilled labor can be combined in a production nest which allows for substitutability between them. A 'reasonable' estimated value of the elasticity of substitution between them would validate our surmise in the sense that the existing GTAP data, including the partition of the wage bill into skilled and unskilled, is consistent with an empirically realistic degree of substitutability between the two classes of labor. The next section documents a formal theory to rationalize the procedures and the results of the skill disaggregation in GTAP.

4.a) Production Nest
In the GTAP production structure, the standard production technology tree is a nested production function where a CES primary-factor composite [...] of labor payment shares, we add a new nest to the production structure so that labor is now split into two components, raw labor ($L_u$) and skilled labor ($L_s$), so that total effective labor ($E$) is a constant elasticity of substitution (CES) combination of $L_s$ and $L_u$. The underlying assumption is that human capital does not enter as an additional independent factor of production in the conventional way; rather, human capital proxied by educational attainment is embodied in the supply of skilled labor. Competition ensures that the payments to labor of differing skill levels are proportionate to their productivities.
Such a production nesting is shown in Figure 6. Thus, the production function for nett output is written as:

$$Y = B\,H(K, T, E) \qquad (4.1)$$

where 'H' is homogeneous of degree one (i.e., constant returns to scale) in the factor inputs and B is a technological coefficient. In intensity form, (4.1) is written as:

$$y = B\,H(k, t, 1) \qquad (4.2)$$

where y = Y/E, k = K/E, and t = T/E.
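As we assume that $L_s$ and $L_u$ are combined in a Constant Elasticity of Substitution (CES) production nest to yield $E$, we write

$$E = \Gamma\left[\delta^E_{L_s} L_s^{-\rho} + \delta^E_{L_u} L_u^{-\rho}\right]^{-1/\rho} \qquad (4.3)$$

Note that $\sigma = 1/(1+\rho)$ is the elasticity of substitution between $L_s$ and $L_u$.
(4.3) may be written:

$$E = \Gamma\left[\sum_j \delta^E_j X_j^{-\rho}\right]^{-1/\rho} \qquad (4.4)$$

The $\delta^E_j$'s are the distribution parameters which can be normalised to sum to unity (provided $\Gamma$ is chosen appropriately). The shares of each factor computed from the quantity side are expressed as:

$$S_j = \frac{\partial E}{\partial X_j}\,\frac{X_j}{E} \qquad (4.5)$$

where j is either of the categories of labor, i.e., $L_s$ or $L_u$. Equations (4.4) and (4.5) can be combined. Therefore,

$$S_j = \delta^E_j\,\Gamma^{-\rho}\left(\frac{X_j}{E}\right)^{-\rho}$$

When $X_j = L_s$, we can write that

$$S_{L_s} = \delta^E_{L_s}\,\Gamma^{-\rho}\left(\frac{L_s}{E}\right)^{-\rho}$$

where $S_{L_s}$ is SKL_AC in our notation for calculation of the skilled labor payment share as described in Sections 2 and 3 above.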
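The share expression can be checked numerically. The sketch assumes the standard two-input CES form $E = \Gamma[\delta_s L_s^{-\rho} + \delta_u L_u^{-\rho}]^{-1/\rho}$ with $\delta_s + \delta_u = 1$ and $\rho \neq 0$, which is the form consistent with $\sigma = 1/(1+\rho)$; all parameter values and function names are invented for illustration.

```python
def ces(ls, lu, delta_s, gamma, rho):
    """Effective labor E from skilled (ls) and unskilled (lu) labor."""
    delta_u = 1.0 - delta_s
    inner = delta_s * ls ** (-rho) + delta_u * lu ** (-rho)
    return gamma * inner ** (-1.0 / rho)


def share_closed_form(x, e, delta, gamma, rho):
    """S_j = delta_j * Gamma^(-rho) * (X_j / E)^(-rho)."""
    return delta * gamma ** (-rho) * (x / e) ** (-rho)


def share_numeric(ls, lu, delta_s, gamma, rho, h=1e-6):
    """Skilled share as (dE/dLs) * Ls / E, using a central difference."""
    e = ces(ls, lu, delta_s, gamma, rho)
    d_e = (ces(ls + h, lu, delta_s, gamma, rho)
           - ces(ls - h, lu, delta_s, gamma, rho)) / (2.0 * h)
    return d_e * ls / e


# Invented inputs and parameters.
ls, lu = 2.0, 5.0
delta_s, gamma, rho = 0.4, 1.5, 0.25

e = ces(ls, lu, delta_s, gamma, rho)
s_skilled = share_closed_form(ls, e, delta_s, gamma, rho)
s_unskilled = share_closed_form(lu, e, 1.0 - delta_s, gamma, rho)
s_skilled_numeric = share_numeric(ls, lu, delta_s, gamma, rho)
```

Under constant returns to scale the two shares exhaust effective labor, so `s_skilled + s_unskilled` should equal one up to rounding error, and the closed form should agree with the numerical derivative.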

4.b) Estimation Procedure
In logarithmic form, the share equation for skilled labor is written as:

$$\ln S_{L_s} = \Lambda - \rho \ln\left(\frac{L_s}{E}\right)$$

where $\Lambda = (\ln \delta^E_j - \rho \ln \Gamma)$ is a constant representing the intercept of the fitted regression line. By applying the OLS estimation procedure, we get the least-squares estimate $\hat{\rho}$ of $\rho$. This is used to calculate the estimated value of the elasticity of substitution between skilled labor ($L_s$) and raw labor ($L_u$), i.e., $\hat{\sigma} = 1/(1+\hat{\rho})$. The estimated standard error and approximate t-value of $\hat{\sigma}$ are also derived. 6 This method has been followed for both data sets, viz., BL and DNS. The estimation results are discussed in the following section.
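The step from the estimate of ρ to the elasticity and its approximate standard error can be sketched with the first-order (delta-method) approximation described in footnote 6. The function name `sigma_from_rho` and the input values are hypothetical; they are not the estimates reported in this paper.

```python
def sigma_from_rho(rho_hat, se_rho):
    """Elasticity sigma = 1/(1+rho) with a delta-method standard error."""
    sigma = 1.0 / (1.0 + rho_hat)
    # d(sigma)/d(rho) = -1/(1+rho)^2, evaluated at the estimate.
    derivative = -1.0 / (1.0 + rho_hat) ** 2
    se_sigma = abs(derivative) * se_rho
    return {
        "sigma": sigma,
        "se_sigma": se_sigma,
        "t_vs_zero": sigma / se_sigma,          # test of sigma = 0
        "t_vs_unity": (sigma - 1.0) / se_sigma, # test of sigma = 1
    }


# Invented inputs: rho_hat = 0.25 with standard error 0.05.
result = sigma_from_rho(0.25, 0.05)
```

Because both ζ and Ψ are scalars here, the quadratic form j′Σj in footnote 6 reduces to the squared derivative times the variance of the estimate of ρ, which is what `se_sigma` computes.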

Estimation Results
As mentioned in Section 3, the educational attainment data for the selected GTAP composite regions are obtained by calculating the simple averages of the education data for the constituent countries. Here, for estimating σ, we present one set of results, i.e., excluding those composite regions. However, values of σ do not differ substantially if we include these composite regions in our sample of observations. 7 The sample regions included in the regressions are listed in Tables A6 and A7 in the appendix.

6 From the asymptotic distribution theory for large sample sizes (N→∞), we can say that for any differentiable scalar function ζ of a random variable Ψ, the asymptotic mean and variance-covariance matrix of ζ respectively are ζ(E(Ψ)) and j′Σj, where E indicates asymptotic expectation, Σ is the asymptotic variance-covariance matrix of Ψ, and j is the matrix of derivatives of the elements of ζ with respect to those of Ψ evaluated at Ψ = E(Ψ). In practice, we estimate E(Ψ) by $\hat{\Psi}$, an unbiased estimate of E(Ψ), in order to evaluate the entities above. In the present application, ζ and Ψ are scalars. In fact, ζ ≡ $\hat{\sigma}$, our estimated inter-skill substitution elasticity, while Ψ ≡ $\hat{\rho}$ (our estimate of ρ).

Summary
We have been concerned here to identify educational data that can be used as a proxy for human capital endowment. The above analysis reveals that the available alternative educational attainment data sets all conform with the share of aggregate labor payments accruing to the skilled labor categories incorporated in Version 4 of the GTAP database. This comes as no surprise, since the GTAP labor split is based on one of these educational data sources.
However, there is room for disagreement on some of the details. Amongst the alternative data sources, DNS (1995) data scores over BL (1996) on some desirable grounds.
The derivation by Liu et al. (1998) of the shares of skilled and unskilled labor in the work force of the 45 GTAP regions from data on educational attainment follows an ad hoc regression approach. In this paper the GTAP data on such shares have been taken as given, although it might have been preferable if the shares had been derived within a production-theoretic framework. Given the shares, we offer an ex post rationalization of them within such a framework, thereby deriving estimates of the elasticity of substitution between skilled and unskilled labor. This relies on the inter-regional covariation in the GTAP shares and in measures of educational attainment. The resulting point estimates are in the range 0.67 (±0.05) to 0.83 (±0.03), depending on the educational data used.
These point estimates differ significantly from zero and from unity at a high level of significance.

Source: Same as mentioned in Table A5.