The number of countries for which data is available is a key metric to understand the geographical coverage of the indicator.
To display only the indicators with poor country coverage (under 50 countries), select the checkbox under the ‘Number of Countries’ column.
Indicators with a score below 2 are highlighted. For a detailed explanation of the methodology, refer to the Quantitative Scoring section.
An indicator should have recent data available and should ideally be updated as frequently as possible.
To display only the indicators with no data in the last 10 years, select the checkbox under the ‘Latest Year’ column.
Indicators with a score below 2 are highlighted. For a detailed explanation of the methodology, refer to the Quantitative Scoring section.
To quantify the quality of a WDI database indicator, we have created metrics that capture the temporal coverage, geographical coverage, completeness, usage, and metadata of the indicator. The metrics are the following:
Geographical coverage:
- Number of economies (n_country): This metric measures the total number of economies for which data is available for the indicator.
- Share of low- and middle-income economies (p_lmic): This metric measures the percent of low- and middle-income economies for which data is available. We use the total number of LMICs as of today as the denominator.
Temporal coverage:
- Absolute latest year (yearlatest): This metric measures the most recent year of data available for an indicator.
- Median latest year (yearleatest_median): This metric takes the most recent year of data available for each country for the indicator and then calculates the median across countries.
- Span of years (span_years): This metric measures the total number of years for which data is available for this indicator. We take the first year and the latest year for which any data is available and calculate the span between these years.
- Non-missing data: This metric measures the share of non-missing data within its availability. The span is restricted to the indicator span and country coverage previously calculated, and not the span and coverage of the WDI.
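The coverage metrics above can be sketched in a few lines of Python. This is a minimal illustration and not the WDI team's implementation: the input layout (tuples of economy, year, value), the function name `coverage_metrics`, and the keys `median_latest_year` and `nonmissing_share` are hypothetical. It also assumes that the span counts both end years, which the text does not specify.

```python
from statistics import median


def coverage_metrics(observations):
    """Compute coverage metrics from (economy, year, value) tuples.

    A value of None marks a missing observation.
    """
    # Keep only the observations that actually carry data.
    present = {(eco, yr) for eco, yr, val in observations if val is not None}
    if not present:
        return None

    economies = {eco for eco, _ in present}
    years = [yr for _, yr in present]

    # Most recent year with data, tracked separately for each economy.
    latest_by_economy = {}
    for eco, yr in present:
        latest_by_economy[eco] = max(yr, latest_by_economy.get(eco, yr))

    first_year, last_year = min(years), max(years)
    span_years = last_year - first_year + 1  # assumption: inclusive count

    # Share of filled cells within the indicator's own span and coverage,
    # not within the span and coverage of the whole WDI.
    possible_cells = len(economies) * span_years

    return {
        "n_country": len(economies),
        "yearlatest": last_year,
        "median_latest_year": median(latest_by_economy.values()),
        "span_years": span_years,
        "nonmissing_share": 100 * len(present) / possible_cells,
    }
```

Restricting the denominator to the indicator's own economies and years mirrors the note that non-missing data is measured "within its availability".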
Usage:
- Unique visitors: This metric measures the number of unique visitors in one year, calculated using the API for the Adobe Analytics platform.
Metadata:
- Key Availability: This metric checks that key metadata fields are available for the indicator, specifically the fields on development relevance, methodology, license, definition, and source. This is done by checking for the availability of at least one word in each of these fields. The metric ranges from 0 to 5, with 1 point awarded for each field that has content available.
- Length: This metric is a word count for an indicator across all metadata fields. This includes development relevance, licenses, methodology, definition, source, comments, notes, and limitations.
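As a rough sketch of the two metadata metrics, assuming the metadata for an indicator is held in a dictionary of free-text fields. The field keys and the function name below are hypothetical; only the list of fields comes from the description above.

```python
# Hypothetical field keys; the five key fields follow the list in the text.
KEY_FIELDS = ("development_relevance", "methodology", "license", "definition", "source")
ALL_FIELDS = KEY_FIELDS + ("comments", "notes", "limitations")


def metadata_metrics(metadata):
    """Return (key availability on a 0-5 scale, total word count)."""

    def word_count(field):
        # Absent, None, or empty fields count as zero words.
        return len((metadata.get(field) or "").split())

    # One point for each key field that contains at least one word.
    availability = sum(1 for field in KEY_FIELDS if word_count(field) >= 1)
    # Word count summed across all metadata fields.
    length = sum(word_count(field) for field in ALL_FIELDS)
    return availability, length
```

Splitting on whitespace is one simple way to count words; a production pipeline might tokenize differently.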
These 9 metrics have been used to calculate scores for each indicator by two different methods. The following describes the two types of scoring methods:
Distance to frontier scoring
This approach takes the 'best case' scenario and 'worst case' scenario for each indicator, and calculates the distance of the actual value between the best and worst case. This is done by creating a percentage for each indicator.
For example, for 'Latest year', if the best case value is 2020, the worst case value is 2008, and the indicator has a value of 2016, then the score for this indicator will be \((2016-2008)/(2020-2008) \times 100 = 66.67\%\).
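The calculation in this example can be written as a small helper. This is a sketch: the function name is hypothetical, and clamping values that fall outside the best and worst cases to the 0-100 range is an assumption that the text does not spell out.

```python
def frontier_score(value, worst, best):
    """Distance to frontier, as a percentage between the worst and best case."""
    if best == worst:
        raise ValueError("best and worst case must differ")
    share = (value - worst) / (best - worst)
    # Assumption: values beyond the worst or best case are clamped to 0 or 100.
    return 100 * min(max(share, 0.0), 1.0)


# 'Latest year' example from the text: worst case 2008, best case 2020.
latest_year_score = frontier_score(2016, worst=2008, best=2020)
print(round(latest_year_score, 2))  # 66.67
```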
Such a percentage is calculated for every indicator except for Unique visitors. For Unique visitors, a percentile is generated as there is no 'best case' scenario that is independent of the distribution.
After a score out of 100 is obtained for each metric, the scores are summed and divided by the number of metrics, i.e. seven, to get a total score out of 100.
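A minimal sketch of the two remaining steps, the percentile for Unique visitors and the averaging across metrics. The function names are hypothetical. Several percentile conventions exist; the one used here, the share of indicators at or below the value, is an assumption.

```python
from bisect import bisect_right


def percentile_rank(value, population):
    """Percent of the population with a value at or below `value`."""
    ordered = sorted(population)
    return 100 * bisect_right(ordered, value) / len(ordered)


def total_score(metric_scores):
    """Unweighted mean of the per-metric scores, each out of 100."""
    return sum(metric_scores.values()) / len(metric_scores)
```

For instance, with illustrative visitor counts of 50, 65, 120, 200, and 2000 across five indicators, `percentile_rank(120, [50, 65, 120, 200, 2000])` places the third indicator at the 60th percentile.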
Threshold scoring
In this approach, a score is produced for each indicator based on whether it passes certain thresholds in each metric. Three thresholds are defined for each metric, named the "Loose", "Median", and "Stringent" scenarios. The "Loose" scenario is the easiest for an indicator to pass: for the "Number of economies" metric, for instance, it requires that the indicator have data for at least 35 economies. The "Median" scenario is more difficult to pass: on the same metric, it requires data for at least 50 economies. The "Stringent" scenario is the most difficult to pass, requiring at least 65 economies with data on the same metric. The thresholds were defined by the WDI criteria team by looking for natural cuts in the data for each metric and through internal team discussion of reasonable standards.
Table. Thresholds for Classifications in Tiers to Flag Poorly Performing Indicators as of 2024.
Metric | Lowest Tier | 4th Tier | 3rd Tier | 2nd Tier | Top Tier |
---|---|---|---|---|---|
Number of economies | 30 | 50 | 80 | 100 | 180 |
Share of low- and middle-income economies | 10 | 30 | 40 | 65 | 90 |
Span of years | 3 | 6 | 10 | 15 | 50 |
Absolute latest year | 2012 | 2013 | 2015 | 2016 | 2019 |
Median latest year | 2010 | 2012 | 2012 | 2015 | 2019 |
Non-missing data | 8 | 10 | 12 | 15 | 60 |
Unique visitors | 50 | 65 | 120 | 200 | 2000 |
Metadata Availability | 1 | 2 | 3 | 4 | 5 |
Metadata Word Length | 50 | 100 | 150 | 200 | 500 |
To produce an overall score for each indicator based on this approach, the following methodology was followed. For each metric, a score of 1-4 was produced based on the following scoring:
- 1 point: metric value falls below the "4th Tier" scenario threshold
- 2 points: metric value falls between the "4th Tier" and "3rd Tier" threshold
- 3 points: metric value falls between the "3rd Tier" and "2nd Tier" threshold
- 4 points: metric value falls above the "2nd Tier" threshold.
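The point rules above amount to counting how many of the three thresholds a metric value has reached. A sketch follows; the function name is hypothetical, and treating a value exactly equal to a threshold as having passed it is an assumption, since the text does not say how ties at a threshold are handled.

```python
from bisect import bisect_right


def threshold_score(value, tier4, tier3, tier2):
    """Discrete 1-4 score from the 4th, 3rd, and 2nd Tier thresholds."""
    # bisect_right counts the thresholds that are less than or equal to value,
    # so a value sitting exactly on a threshold is treated as passing it.
    return 1 + bisect_right([tier4, tier3, tier2], value)


# Number of economies thresholds from the table: 50, 80, 100.
scores = [threshold_score(n, 50, 80, 100) for n in (45, 50, 90, 150)]
print(scores)  # [1, 2, 3, 4]
```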
The result of this scoring is that each indicator has a 1-4 score along all 9 of the scoring metrics (Number of economies, Share of low- and middle-income economies, Absolute latest year, Median latest year, Span of years, Non-missing data, Unique visitors, metadata availability, metadata length).
An overall score is then computed by taking the weighted average of the scores across the 9 metrics, with higher scores meaning the indicator performs better on average across the nine metrics. The scores are produced using the nested structure described above with five categories: Geographic Coverage, Temporal Coverage, Completeness, Usage, and Metadata Quality. Each category receives a weight of 1/5. Within each category, the category score is the unweighted average of the metrics in that category.
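The nested weighting can be sketched as follows. The metric keys and the function name are hypothetical. The assignment of metrics to categories is also an assumption, inferred from the metric list earlier in this section, with Non-missing data treated as the Completeness category.

```python
# Assumed grouping of the nine metrics into the five categories.
CATEGORIES = {
    "geographic_coverage": ["n_country", "p_lmic"],
    "temporal_coverage": ["yearlatest", "median_latest_year", "span_years"],
    "completeness": ["nonmissing"],
    "usage": ["unique_visitors"],
    "metadata_quality": ["metadata_availability", "metadata_length"],
}


def overall_score(metric_scores):
    """Equal-weight average of category scores.

    Each category score is the unweighted mean of its metric scores.
    """
    category_scores = [
        sum(metric_scores[name] for name in names) / len(names)
        for names in CATEGORIES.values()
    ]
    # Five categories, each weighted 1/5.
    return sum(category_scores) / len(category_scores)
```

Because categories hold different numbers of metrics, a single metric such as Unique visitors carries more weight in the overall score than any one of the three temporal metrics.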
Distance to Threshold Scoring Approach
A third approach to scoring combines some aspects of the distance to frontier scoring method and the threshold scoring method. As with the threshold scoring method, a set of loose, median, and stringent scenarios are used to score the indicators. This is the approach displayed in the figure on the main page.
A flaw with the threshold scoring approach is that the discrete scoring on the 1-4 scale resulted in many tied scores for indicators. The Distance to Threshold approach rectifies this flaw by extending the discrete 1-4 scale to a continuous scale between 0 and 4. It does so by incorporating some of the elements of the distance to frontier method, where upper limits and lower limits are set based on each scenario (loose, median, stringent) and an indicator is scored by its distance between the upper and lower limit.
Additionally, two new categories are added: "Low" and "High", which help smooth the distribution of scores below the "Loose" and above the "Stringent" scenarios. Indicators scoring below the "Low" are automatically given the lowest possible score '0', while indicators scoring above the "High" are automatically given the highest possible score '4'. The "Low" and "High" categories help account for outliers in the distribution of the metrics that may impact the scoring. To give a specific example, on unique visitors, suppose there is an indicator that receives over a million unique visitors in a year. In fact, the indicator "GDP growth (annual %)" does receive this total in a year. If a "High" category were not set, this outlier would cause a large amount of bunching of scores for all indicators above the "Stringent" group, as the vast majority of indicators are very far from this indicator with 1 million visits. Therefore, a "High" category is introduced: any indicator scoring above this limit automatically receives 4 points, and indicators under the "High" do not see as much bunching due to the outlier observation.
For each metric, a score of 0-4 was produced based on the following scoring. First, some notation: let \(\alpha_l\) be the lower limit and \(\alpha_u\) be the upper limit, and let \(\gamma\) be the value of a metric (for instance, the number of countries). Then the scores are the following:
- Metric value falls below the "Low" scenario threshold
  - Lower limit: lowest possible value for any indicator
  - Upper limit: value of the "Low" scenario
  - Score: 0
- Metric value falls below the "Loose" scenario threshold and above "Low"
  - Lower limit: value of the "Low" scenario
  - Upper limit: value of the "Loose" scenario
  - Score: \((\gamma-\alpha_l)/(\alpha_u - \alpha_l)\)
- Metric value falls between the "Loose" and "Median" threshold
  - Lower limit: value of the "Loose" scenario
  - Upper limit: value of the "Median" scenario
  - Score: \(1+(\gamma-\alpha_l)/(\alpha_u - \alpha_l)\)
- Metric value falls between the "Median" and "Stringent" threshold
  - Lower limit: value of the "Median" scenario
  - Upper limit: value of the "Stringent" scenario
  - Score: \(2+(\gamma-\alpha_l)/(\alpha_u - \alpha_l)\)
- Metric value falls above the "Stringent" but below the "High" threshold
  - Lower limit: value of the "Stringent" scenario
  - Upper limit: value of the "High" scenario
  - Score: \(3+(\gamma-\alpha_l)/(\alpha_u - \alpha_l)\)
- Metric value falls above the "High" scenario threshold
  - Lower limit: value of the "High" scenario
  - Upper limit: maximum possible value for any indicator
  - Score: 4
The score for an indicator on a specific metric (say, number of countries) is the distance from the lower limit divided by the total distance between the upper and lower limits, plus a constant. The constant distinguishes between the different thresholds. Note that the value of \((\gamma-\alpha_l)/(\alpha_u - \alpha_l)\) is always between 0 and 1, and that indicators with metric values closer to the upper limit are closer to 1 and thus receive higher scores. This scoring system results in a continuous scale between 0 and 4.
To give a specific example based on the number of countries criterion, an indicator with data for 21 countries would receive a score of \((21-20)/(35-20)=0.067\). An indicator with data for 45 countries would receive a score of \(1+(45-35)/(50-35)=1.667\). An indicator with data for 190 countries would receive a score of \(3+(190-65)/(200-65)=3.93\).
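The piecewise rule can be sketched as a single function that finds the band a value falls into and adds the fractional distance within that band. The function name is hypothetical. Assigning a value that sits exactly on a cut-off to the band above it is an assumption. The cut-offs in the usage lines (20, 35, 50, 65, 200) are the ones implied by the worked figures above.

```python
def distance_to_threshold_score(value, low, loose, median, stringent, high):
    """Continuous 0-4 score from five ascending cut-offs."""
    if value < low:
        return 0.0
    if value >= high:
        return 4.0
    cuts = [low, loose, median, stringent, high]
    # Band 0 is Low-Loose, band 1 is Loose-Median, and so on. The band index
    # is the constant added to the fractional distance within the band.
    for band, (lower, upper) in enumerate(zip(cuts, cuts[1:])):
        if value < upper:
            return band + (value - lower) / (upper - lower)
    return 4.0  # not reached, because value < high is guaranteed above


# Number of countries cut-offs implied by the worked figures in the text.
cutoffs = dict(low=20, loose=35, median=50, stringent=65, high=200)
print(round(distance_to_threshold_score(21, **cutoffs), 3))   # 0.067
print(round(distance_to_threshold_score(45, **cutoffs), 3))   # 1.667
print(round(distance_to_threshold_score(190, **cutoffs), 2))  # 3.93
```

Two adjacent cut-offs with the same value do not cause a division by zero: a value below the shared cut-off is caught by the earlier band, and a value at or above it moves on to the next band.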
The result of this scoring is that each indicator has a 0-4 score along all 9 of the scoring metrics (Number of economies, Share of low- and middle-income economies, Absolute latest year, Median latest year, Span of years, Non-missing data, Unique visitors, metadata availability, metadata length).
An overall score is then computed by taking the weighted average of the scores across the 9 metrics, with higher scores meaning the indicator performs better on average across the nine metrics. The scores are produced using the nested structure described above with five categories: Geographic Coverage, Temporal Coverage, Completeness, Usage, and Metadata Quality. Each category receives a weight of 1/5. Within each category, the category score is the unweighted average of the metrics in that category.