Brett Romero

Data Inspired Insights

Tag: open data

Forget SQL or NoSQL – 5 scenarios where you may not need a database at all

A while back, I attended a hackathon in Belgrade as a mentor. This hackathon was the first ‘open data’ hackathon in Serbia and focused on making applications using data that had recently been released by various ministries, government agencies, and independent bodies in Serbia. As we walked around talking to the various teams, one of the things I noticed was that almost all teams were using databases to manage their data. In most cases, the database being used was something very lightweight like SQLite3, but in some cases more serious databases (MySQL, PostgreSQL, MongoDB) were also being used.

What I have come to realize is that in many cases this was probably completely unnecessary, particularly given the tight timeframe the teams were working towards – a functional prototype within 48 hours. However, even if you have more time to build an application, there are several good reasons that you may not need to worry about using a formal database. These are outlined below.

1. The data is small

Firstly, let’s clarify what I mean when I say ‘small data’. For me, small data is any dataset under 10,000 records (assuming a reasonable number of data points for each record). For many non-data people, 10,000 records may seem quite big, but when using programming languages such as Python or JavaScript, this amount of data is usually very quick and easy to work with. In fact, as Josh Zeigler found, even loading 100,000 records or 15MB of data into a page was possible, completing in as little as 463ms (Safari FTW).

Leaving aside the numbers for a second, the key point here is that in many cases, the data being displayed in an application has far fewer than 10,000 records. If your dataset has fewer than 10,000 records, you should probably ask yourself: do you need a database? It is often far simpler, and requires significantly less overhead, to keep your data in a JSON file and load it into the page directly. Alternatively, CSV and Excel files can be converted to JSON and dumped to a file very quickly and easily using a Python/Pandas script.
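As a rough illustration of how little code this conversion takes, here is a minimal sketch using only Python's standard library rather than Pandas (a swap made so the example has no dependencies). The function name `csv_to_json` and the sample rows are made up for illustration.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) into a JSON array of records."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(records, indent=2)


# Hypothetical sample data, for illustration only.
sample = "country,year,score\nA,2013,40\nB,2013,55\n"
json_text = csv_to_json(sample)
print(json_text)
```

Note that the `csv` module leaves every value as a string. With Pandas installed, `pandas.read_csv(...)` followed by `DataFrame.to_json(orient="records")` should give a similar result with numeric columns parsed as numbers.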

ecis visualization

The ECIS Development Tracker uses data from six Worldwide Governance Indicators and two other series over 20 years and 18 countries – a total of almost 3,000 data points and a perfect example of small data.

2. The data is static

Another reason you may not need a database is if you have a reasonable expectation that the data you are using is not going to change. This is often the case where the data is going to be used for read only purposes – for example visualizations, dashboards and other apps where you are presenting information to users. In these cases, again it may make sense to avoid a database, and rely on a flat file instead.

The important point here is that if the data is not changing or being altered, then static files are probably all that is needed. Even if the data is larger, you can use a script to handle any data processing and load the (assumedly) aggregated or filtered results into the page. If your needs are more dynamic (i.e. you want to show different data to different users and do not want to load everything), you may need a backend (something you would need for a database anyway) that extracts the required data from the flat file, but again, a database may be overkill.
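A backend of this kind can be very small. The sketch below assumes the flat file is a JSON array of records; the `filter_records` helper and the sample fields are hypothetical and are not taken from any of the apps mentioned here.

```python
import json


def filter_records(records, **criteria):
    """Return only the records whose fields match every given criterion."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# Hypothetical flat-file contents; in practice this text would be read
# from a JSON file on disk.
flat_file_text = """
[
  {"municipality": "A", "year": 2012, "satisfied": 61},
  {"municipality": "B", "year": 2012, "satisfied": 48},
  {"municipality": "A", "year": 2015, "satisfied": 66}
]
"""
records = json.loads(flat_file_text)

# Send only the rows a given user asked for, instead of the whole file.
subset = filter_records(records, municipality="A")
print(subset)
```

Because the whole file is read into memory on each load, this approach only makes sense while the data stays small and static, as discussed above.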

kosovo mosaic

The Kosovo Mosaic visualizer – based on data from a survey conducted once every three years – is an example of a case where the data is not expected to change any time soon.

3. The data is simple

One of the big advantages of databases is their ability to store and provide access to complex data. For example, think about representing data from a chain of retail stores on the sale of various products by different sales people. In this case, because there are three related concepts (products, sales people and stores), representing this data without using a database becomes very difficult without a large amount of repetition[1]. In this case, even if the data is small and static, it may simply be better to use a relational database to store the data.

However, in cases where the data can be represented in a table, or multiple unrelated tables, subject to points 1 and 2 above, it may make sense to avoid the overhead of a database.

database schema

If you need a schema diagram like this to describe your data, you can probably skip the rest of this article.

4. The data is available from a good API

I have recently been working on a project to develop an application that is making extensive use of the Google API. While still under development, the app is already quite complex, making heavy use of data to generate charts and tables on almost every page. However, despite this complexity, so far, I have not had to use a database.

One of the primary reasons I have not needed to implement a database is that the Google API is flexible enough for me to effectively use that as a database. Every time I need data to generate a chart or table, the app makes a call to the API (using Python), passes the results to the front end where, because the data is small (the Google API returns a maximum of 10,000 rows in a query), most of the data manipulation is handled using JavaScript on the client side. For the cases where more heavy data manipulation is required, I make use of Python libraries like Pandas to handle the data processing before sending the data to the front end. What this boils down to is a data intensive application that, as yet, still does not need a database.

Of course, this isn’t to say I will not need a database in the future. If I plan to store user settings and preferences, track usage of the application, or collect other meta data, I will need to implement a database to store that information. However, if you are developing an application that will make use of a flexible and reliable API, you may not need to implement your own database.

google apis

Google has APIs available for almost all of its products – most of them with a lot of flexibility and quick response times.

5. The app is being built for a short-term need

While it might seem unusual to build a web app with the expectation that it will not be used six months later, this is a surprisingly common use case. In fact, this is often the expectation for visualizations and other informative pages, or pages built for a specific event.

In these particular use cases, keeping down overhead should be a big consideration, in addition to potential hosting options. Developing these short-term applications without a backend and database means free and easy hosting solutions like that provided by GitHub can be used. Adding a backend or database immediately means a more complex hosting setup is required.

Wrapping up, this is not an argument against databases…

… it is simply an argument to use the best and simplest tools for a given job. As someone who has worked with a number of different databases throughout their career, I am actually a big user of databases and find most of them intuitive and easy to use. There are also many advantages that only a database can provide: ensuring data consistency, facilitating large numbers of users simultaneously making updates, and managing large and complex datasets are all very good reasons to use a database (SQL or NoSQL, whichever flavor you happen to prefer).

But, as we have covered above, there may be some cases where you do not need these features and can avoid adding an unnecessary complication to your app.

 

Next week we’ll take a look at a simple app that uses an Excel spreadsheet to generate the data required for the application.

 

[1] With repetition comes an increased risk of data quality issues

Traffic Accidents Involving Cyclists in the ACT

I’ve had a few days off lately and I decided to try something a bit different. Instead of writing an(other) lengthy article, I thought I would go back to my roots and actually look at some data. To that end I recently discovered a website for open data in Australia, data.gov.au. This website has literally thousands of interesting datasets released from all levels of government, covering everything from the tax bills of Australia’s largest companies to the locations of trees in Ballarat.

One of the first datasets that caught my eye was one published by the Australian Capital Territory (ACT) Government on traffic accidents involving cyclists. For those that don’t know, Canberra (the main city in the ACT) is a very bike friendly city and is home to a large number of recreational and more serious cyclists, so seeing where the accidents were/are occurring was something I thought would be interesting.

Using a few new things I have not used before (primarily Mapbox and leaflet.js), I put (slapped?) together an interactive map that uses the data provided and also gives you a few different ways of viewing it. The full version of the map can be accessed by clicking the picture below:

cyclist-map

 

See a bug? Found it particularly useful? Hate it? Leave a comment below!

Women in the Workplace – Where is Everyone?

Cross posted from OpenDataKosovo.org:

Continuing our series on Gender Inequality and Corruption in Kosovo, in Part IV we are going to build on Part III and use our understanding of the participation rate to compare the participation rate in Kosovo across a range of countries, as well as look at the reasons for non-participation (“inactivity”). If you don’t understand what a participation rate is (SPOILER: it is not the same as the unemployment rate), or just want to make sure you get the full picture, please go back and read Part III.

Click on the chart below to interact with the data!

sunburst_pic

Sunburst chart created by Festina Ismali

Comparing Participation Rates

Comparing participation rates across countries provides insight into broad demographic trends and the specific employment situation in a country relative to other countries. For most high income nations, the participation rate tends to be around 60%. That is, 6 out of every 10 people of working age are actively engaged in the employment market (whether they currently have a job or not). While that may sound low, this accounts for parents who stay home to raise children, students, retirees and discouraged workers[1].

Once we leave high income countries, there is a much larger range of participation rates. Many very poor low income nations in Asia and Africa have extremely high participation rates of well over 80%. This is driven by pure necessity as, in many cases, there is simply no option for one partner to stay home, retire, or even for young people to continue studying.

Conversely, we also see many countries with very low participation rates of just over 40%. In some cases, these countries are involved in ongoing conflicts or are post-conflict countries (Syria, Iraq and Afghanistan all had participation rates below 50% in 2013). But in other cases, the cause is harder to identify.

Unfortunately, Kosovo is one of these harder to understand cases. In 2013, Kosovo had the second lowest participation rate of any country in the World Bank database, at 40.5%. In 2014 that number picked up slightly to 41.6%, but that was still low enough to keep Kosovo in the bottom 10, based on 2013 figures. Notably, Kosovo’s low participation rate has actually decreased substantially over the past decade (see Chart 1). In 2002, the participation rate stood at 52.8%. If that participation rate applied today, there would be an extra 134,600 people in the labour force – an increase of 26.9%.
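The 26.9% figure can be checked directly from the two participation rates. The headcount is harder to reproduce because the working-age population is not quoted here. The sketch below infers it from the inactive totals reported in Table 1 further down (229,200 men and 473,000 women), which is an assumption made for this sketch rather than the calculation behind the published number.

```python
rate_2002 = 52.8  # participation rate in 2002, %
rate_2014 = 41.6  # participation rate in 2014, %

# Gap in percentage points and the relative increase it implies.
gap_points = rate_2002 - rate_2014
relative_increase = gap_points / rate_2014 * 100

# Assumption: working-age population = inactive people / inactive share.
inactive_thousands = 229.2 + 473.0
working_age_thousands = inactive_thousands / (1 - rate_2014 / 100)
extra_thousands = working_age_thousands * gap_points / 100

print(f"Relative increase: {relative_increase:.1f}%")
print(f"Extra people in the labour force: about {extra_thousands:.0f} thousand")
```

Under that assumption the extra headcount comes out at roughly 135 thousand, in line with the 134,600 quoted above. The small difference is presumably down to rounding in the published rates.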

Chart 1 – Participation Rate in Kosovo 2002 to 2014

Looking at Chart 1, another data point that immediately stands out is the low participation rate for women. In fact, with a participation rate for women of 21.1% in 2013, Kosovo has one of the lowest participation rates for women in the world. In terms of the rankings, Kosovo places between Saudi Arabia (20.2%) and Lebanon (23.3%). Looking around the region, Kosovo is also a significant outlier (see Chart 2).

Chart 2 – Female Participation Rate for Selected Countries 2002 to 2013

Methodology Matters

Previously, in Part III, we mentioned that there were some more detailed criteria for determining whether a person is considered ‘employed’ in Kosovo. Specifically, there is one particular criterion that may partially explain Kosovo’s notably lower participation compared to its neighbors (and everyone else).

In the 2014 Kosovo Labour Force Survey, a specific methodological difference with Albania is highlighted. In Kosovo, people who work on a family run farm are not considered employed if the produce of the farm is not considered an “important source of consumption” (let’s call these people ‘family farm workers’). In contrast, these same people in Albania are classified as employed. From the 2014 Kosovo Labour Force Survey Results paper (emphasis mine):

“It is important to note that when respondents answer code 5B[2], that they do some agricultural activity but it is not an important contribution, this is not counted as employed. In 2014 69% of this group were categorized as inactive and 31% as unemployed. An important contribution is a subjective term and could depend on overall household income.”

The key takeaway here is that there is a significant population of family farm workers that are currently being classified as inactive, when in fact they are working. This at least partially explains the low participation rate in Kosovo.

Unfortunately, the paper does not provide enough information to be able to determine how many people are family farm workers. As such, we are unable to quantify exactly how much impact adding family farm workers back into the labour force would have on the headline participation rate.

Even if we could though, this would not be fully correct either (welcome to the surprisingly complex world of labour market statistics). Many family farm workers probably do not consider themselves employed – working 1 hour a week[3] on a family farm is a pretty low bar after all. The fact that 31% of them qualified as unemployed, meaning they actively sought other work, reveals that this is not a homogeneous group of full time farm workers being incorrectly classified.

Worrying Trends

Methodological anomalies aside, there is also a concerning trend in the data – the participation rate for women in Kosovo has been declining for much of the past decade[4]. Despite the improving economy and significant international development assistance, the participation rate for women fell from over 34.5% in 2002 to 21.4% in 2014. There is some good news – the fall appears to have bottomed out, with 2013 and 2014 both recording higher participation rates for women than the low point in 2012 (17.8%!).

This slight uptick in recent years could be the impact of numerous initiatives to get women into the workforce in Kosovo. These range from the prioritization of grants for projects that provide jobs for women, to supporting women in registering property in their own names to help provide collateral for loans. There has also been a push by Kosovo’s first and current female President to boost participation among women. Several more years of data will be required to determine whether this is the beginning of a more substantial trend or simply noise in the data.

In the meantime, let’s get a better understanding of the current labour market by looking at a breakdown (see Table 1), provided in the 2014 Kosovo Labour Force Survey, of the inactive population sorted by reason for not participating.

Table 1 – Inactive Persons by Category

Category | (A) Men (1,000s) | (B) Women (1,000s) | (C1) = (B) minus (A) (1,000s) | (C2) % of total difference
Looking after children or incapacitated adults | 0.1 | 14.3 | 14.2 | 5.8%
Own illness or disability | 13.3 | 8.6 | -4.7 | -1.9%
Other personal or family responsibilities | 13.5 | 233.4 | 219.9 | 90.2%
In education or training | 104.7 | 97.3 | -7.4 | -3.0%
Retired | 6.9 | 5.0 | -1.9 | -0.8%
Believes that no work is available | 49.5 | 78.9 | 29.4 | 12.1%
Waiting to go back to work (laid-off people) | 0.8 | 0.5 | -0.3 | -0.1%
Other reasons | 20.7 | 16.2 | -4.5 | -1.8%
No reason given | 1.9 | 3.4 | 1.5 | 0.6%
Total | 229.2 | 473.0 | 243.8 | 100.0%

Looking at the breakdown, there is one category in particular in which there was a large discrepancy between the sexes – ‘Other personal or family responsibilities’. In this category, 233,400 were women, amounting to 38.8% of the total population of working age women. By contrast, only 13,500 were men, amounting to 2.2% of the total population of working age men. The table also shows the calculated difference between the number of inactive women and men (see column C1). Looking at these calculated differences, we see that for the total calculated difference across all categories (243,800 – see ‘Total’ row in column C1), 219,900, or over 90%, arose from this category. This breakdown is also shown in Chart 3 below.
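The derived columns (C1) and (C2) can be recomputed from the men and women columns. The sketch below is one way to do that, with the ‘Total’ row taken as reported rather than recomputed from the individual rows; the variable names are just for illustration.

```python
# (men, women) inactive, in thousands, as reported in Table 1.
inactive = {
    "Looking after children or incapacitated adults": (0.1, 14.3),
    "Own illness or disability": (13.3, 8.6),
    "Other personal or family responsibilities": (13.5, 233.4),
    "In education or training": (104.7, 97.3),
    "Retired": (6.9, 5.0),
    "Believes that no work is available": (49.5, 78.9),
    "Waiting to go back to work (laid-off people)": (0.8, 0.5),
    "Other reasons": (20.7, 16.2),
    "No reason given": (1.9, 3.4),
}
total_men, total_women = 229.2, 473.0  # 'Total' row, taken as reported
total_gap = total_women - total_men

# For each category: (C1) women minus men, (C2) share of the total gap.
gaps = {}
for category, (men, women) in inactive.items():
    gap = women - men
    gaps[category] = (round(gap, 1), round(gap / total_gap * 100, 1))

for category, (gap, share) in gaps.items():
    print(f"{category}: {gap:+.1f} thousand, {share:+.1f}% of total gap")
```

Negative values mark categories where more men than women are inactive, which is why some shares in column C2 are below zero.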

Chart 3 – Inactive People by Category of Inactivity – 2014

Going back to the family farm workers discussed earlier, we expect that those classified as inactive would be included in the ‘Other personal or family responsibilities’ category. However, if a significant number of women in this category were family farm workers and this was a full time role, we would also expect to see large numbers of men in the same category. The fact that we do not see this suggests that many men who are family farm workers also have other more formal jobs, and lends support to the decision to exclude family farm workers from the employed population.

The other category where we see a meaningful gap between the sexes is the ‘Believes that no work is available’ category. As mentioned earlier, these are the people that are considered discouraged workers (i.e. those that would take a job, but are no longer actively looking). Why would significantly more women be discouraged than men? Typically, discouraged workers are the end product of long and unsuccessful searches for employment. At times of high unemployment, it will often be the case that the number of discouraged workers will also increase. Seeing that women are more likely to be discouraged than men suggests they are having a more difficult time finding employment.

To confirm this hypothesis, we need to look at unemployment rates. This will be the focus of the next piece in this series – Part V.

 

[1] People who would like a job but who haven’t actively sought work in the past 4 weeks

[2] Code 5b text: “Worked (at least one hour) on a farm owned or rented by you or a member of your household (even unpaid) whether in cultivating crops or in other farm maintenance tasks, or you have cared for livestock belonging to you or a member of your household (if the whole production is only for own consumption and this production does not constitute an important contribution to the total consumption of the household).”

[3] Employed are considered all the persons who have worked even for one hour with a respective salary or profit during the reference week.

[4] There is no mention of when the current methodology was implemented, but it is possible that the large drop in participation rate between 2009 and 2012 was due to a change.

Corruption in Kosovo: A Comparative Analysis

Cross posted from OpenDataKosovo.org:

Previously in Part I of this series, we looked at corruption in Kosovo from the perspective of Kosovo civil servants, as documented in a United Nations Development Programme (UNDP) report entitled Gender Equality Related Corruption Risks and Vulnerabilities in Civil Service in Kosovo[1].

In Part II we are now going to look at global corruption perception statistics compiled by Transparency International to consider how Kosovo compares internationally.

An International Comparison of Corruption

Transparency International is an organization that works to reduce corruption[2] through increasing the transparency of Governments around the world. Arguably Transparency International’s most well known contribution is the Corruption Perceptions Index (CPI), an index measuring “the perceived levels of public sector corruption worldwide”. In 2014[3] the CPI was calculated by aggregating 12 indices and data sources collected from 11 different independent institutions specializing in governance and business climate analysis over the past 24 months. The 2014 CPI covered 175 countries, including Kosovo.

In addition to the CPI, Transparency International does its own survey and data collection in the form of the Global Corruption Barometer (GCB survey). The GCB survey focuses on the public’s opinion of corruption within their own country, and in 2013 (the latest edition of the GCB available at the time of writing) collected the opinions of over 114,000 people across 107 countries – including Kosovo.

So what did these two reports show?

Results

In the CPI, Kosovo performs poorly, placing 110th out of 175 countries with a score of 33 out of 100 (unchanged from 2013). To give some perspective, Kosovo finished equal 110th with 4 other countries – Albania, Ecuador, Ethiopia, and Malawi. This placed it behind Argentina (107th), Mexico (103rd), China (100th), India (85th) and Greece (69th), countries that are often associated with high levels of corruption. Finally, this was the lowest ranking for any country in the Balkans region (tied with Albania).

Chart 1 – GCB Survey Q6 – Perceptions of Corruption by Institution for 6 Countries


The GCB survey, however, shows that the people in Kosovo have a different perception of corruption in several areas to that reported in the CPI. Based on the responses to question 6[4] (see Chart 1) and question 7[5] (see Chart 2) of the GCB survey, people in Kosovo are somewhat more optimistic about the levels of corruption in their country than the low rating on the CPI might indicate. Kosovo scores well in several areas:

  • Only 16% of people reported having paid a bribe in the last 12 months. This placed Kosovo 35th out of the 95 countries that provided a response to question 7.
  • 46% of Kosovars generally believe their public institutions to be corrupt or extremely corrupt. This sounds high but actually puts Kosovo ahead of the US (47%) and only slightly behind Germany (40%). The results for certain institutions were even better:
    • The Military is believed to be corrupt or extremely corrupt by only 8% of those interviewed – only four countries had a lower percentage than Kosovo on this part of question 6.
    • NGOs and Religious bodies were also seen as uncorrupt by large majorities.
    • 44% of people believed public officials and civil servants were corrupt, placing Kosovo ahead of Germany, France and the US, among others.

Chart 2 – GCB Survey Q7 – Reports of Bribes Paid by Institution for 6 Countries


But not all the results were positive. Questions 1[6], 4[7] and 5[8] in the GCB survey in particular highlight a more pessimistic outlook:

  • In response to question 1, 66% of Kosovars stated that they believed corruption had increased over the past 2 years, while only 8% believed it had decreased.
  • In response to question 4, 74% of Kosovars stated they believed their Government is run by large entities largely or entirely for their own benefit.
  • In response to question 5, only 11% of Kosovars surveyed believed the actions of their Government in the fight against corruption are effective.

What does all this mean? Why does Kosovo perform so poorly on the CPI, and on some GCB survey questions, but on other questions the perceived level of corruption of people in Kosovo is comparable to some developed nations?

Perceptions vs. Reality

One of the issues when looking at the results of the GCB survey is that the responses to most of these questions are subjective. What constitutes corruption or extreme corruption varies by country and culture based on what people are used to living with. What someone in South Asia or sub-Saharan Africa considers standard practice and harmless may be considered unbelievably corrupt by people in other parts of the world.

These different standards are really highlighted when we compare the percentage of people believing an institution is corrupt with the number of people reporting to have paid a bribe to that institution, using questions 6 and 7 of the GCB survey. There are four institutions that appear as options for both questions, allowing us to make a direct comparison:

  1. Education
  2. Judiciary
  3. Medical and Health, and
  4. Police

In the comparison (see Chart 3), we find numerous examples where the percentage of people that reported paying bribes was higher than the percentage of people who believed the institution was corrupt. The implication of this finding is that significant numbers of people in these countries believe that paying a bribe is not a sign of corruption.

Chart 3 – Comparison of Perceived Corruption with Bribes Paid


Kosovo and most developed nations were examples of the opposite case – they generally reported relatively high numbers of people who believed the four comparable institutions were corrupt, and relatively low percentages of people reporting bribes being paid. Bribery, of course, is not the only form of corruption, and this result could simply be an indicator that different forms of corruption are more prevalent in these countries. But it could also be an indicator that people in some countries are particularly cynical about the fidelity of their institutions.

To get a better sense of how concerned people really are about corruption, let’s now take a look at some of the responses to other questions in the survey.

Is a Person’s Willingness to Take Action a Better Indicator?

One of the questions asked on the survey that could potentially reveal some further information was question 10 – “Are you willing to get involved in the fight against corruption?” Respondents were then provided with a range of activities, both active and passive, and were requested to indicate whether they would be willing to participate.

At a high level, the responses to this question appear to show an inverse correlation between the value of the CPI for a country and how willing people in that country were to do something active to fight corruption. In other words, the higher the percentage of people willing to do something active to fight corruption, the lower the CPI index for that country (i.e. a higher level of corruption).

Using a statistical model (such as regression), we can check whether this relationship is real and how strong it is. However, to do this, we need to consider countries with regimes that punish dissent and crack down on protests and/or organizations that might try to combat corruption. In these countries, you would expect to have a low percentage of people willing to take action against corruption despite corruption being high.

To account for this, we need to have some sort of indicator of how worried people are about speaking out in their country. The best piece of information that we have from the GCB survey that can serve this purpose was the question asking if the respondent would be willing to report corruption.

Using these two pieces of information, we can try to test the following hypotheses:

  1. A high percentage of people willing to take action against corruption in a given country is indicative of a high level of corruption.
  2. A low percentage of people willing to take action against corruption in a given country, but a high level willing to report corruption is indicative of a low level of corruption.
  3. A low percentage of people willing to take action against corruption in a given country, and a low level willing to report corruption is indicative of a high level of corruption in a repressive regime.

Based on these hypotheses, we also expect that there would be no (or very few) cases where there is a high percentage of people willing to take action against corruption and a low level of people willing to report corruption.

Building a Model

Using our two pieces of information described above, and with the assumption that the CPI is the most accurate indicator of the true level of corruption within a country[9], we can build a model to predict CPI for each country and test our hypotheses. The formula for this model will be as follows:

Yi = β0 + β1Xi1 + β2Xi2 + εi

Where:

Yi = the actual value of CPI for country i

β0 = a constant

Xi1 = the percentage of people willing to do something active to fight corruption[10] in country i

β1 = a constant applied to Xi1

Xi2 = the percentage of people willing to report an incidence of corruption in country i

β2 = a constant applied to Xi2

εi = the residual or error

Using ordinary least squares (OLS) and the data for the 101 countries for which the CPI and the two variables (X1 and X2) described above are provided, the results of the model are as follows:

Statistic | β0 | β1 | β2
Coefficient | 27.9735 | -0.8417 | 0.8798
Standard Error | 4.8427 | 0.0705 | 0.0738

R2 = 66.7%

The first thing to note is that the coefficients support the three hypotheses we mentioned above:

  1. A strongly negative coefficient β1 indicates that the larger the percentage of people willing to do something active to fight corruption, the lower the predicted CPI.
  2. A strongly positive coefficient β2 indicates that the larger the percentage of people willing to report corruption, the higher the predicted CPI.
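For readers who want to see what the OLS step does mechanically, here is a minimal sketch that solves the normal equations for a two-predictor model in plain Python. The input values are made up (they are not the GCB survey data), and the outcome is generated from the coefficients reported above purely so that the fit has a known answer to recover. The function name `fit_ols` is hypothetical; in practice a statistics library would normally be used.

```python
def fit_ols(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares."""
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Normal equations: (X'X) b = X'y, stored as a 3x4 augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (aug[i][3] - known) / aug[i][i]
    return coef


# Made-up predictor values (percentages), NOT the survey data.
x1 = [20.0, 35.0, 50.0, 65.0, 80.0]  # willing to do something active
x2 = [90.0, 60.0, 85.0, 55.0, 84.0]  # willing to report corruption
# Noise-free outcome built from the reported coefficients.
y = [27.9735 - 0.8417 * a + 0.8798 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_ols(x1, x2, y)
print(f"b0={b0:.4f}, b1={b1:.4f}, b2={b2:.4f}")
```

With real survey data the outcome would not lie exactly on a plane, and the fit would leave residuals, which is what the R2 value above summarizes.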

General Insights

Aside from providing support for our hypotheses, the model also reveals which countries it does not explain well. Chart 4 shows the CPI predicted by the model as compared to the actual CPI value for 2014.

Chart 4 – Predicted CPI vs. Actual CPI by Country


At a high level, we can split the chart into two parts:

  1. Points below and to the right of the line reflect countries where the actual level of the CPI was lower than the predicted level.
  2. Points above and to the left of the line reflect countries where the actual level of the CPI was higher than the predicted level.

Starting with the first group – countries that were more corrupt than the model predicted – these cases appear to fall into two categories:

  • Conflict Affected Countries – In these cases, of which Sudan is the most extreme example, there was typically a low percentage of people willing to do something active to fight corruption, and therefore the CPI was predicted to be significantly higher than it is in reality. This is likely to be due to the citizenry of these countries facing more immediate problems. This pattern was seen across Sudan, Afghanistan, Iraq, Libya and South Sudan.
  • Other – In these cases, of which Russia was the best example, there was generally a high percentage of people willing to report corruption (86% for Russia) and a relatively low percentage of people willing to do something active to fight corruption (47% in Russia). As a result the model predicted a relatively high CPI. The explanation for this is not as clear as above, but the evidence would seem to suggest that the people in these countries are either not aware of the high level of corruption present in their country, or that they have a significantly different opinion as to what constitutes corruption.

Contrasting with the above cases, we can also see there are countries above and to the left of the line in Chart 4. This represents countries that were less corrupt than the model predicted. In these cases the responses to the two questions were indicative of a country with a higher level of corruption than actually existed. The following were two interesting cases:

  • Finland – the model was thrown off by a surprisingly low percentage of people willing to report corruption. Only 65% of Finnish respondents said they would be willing to report corruption – a surprisingly low percentage for a country with a CPI value of 89. In fact, Finland and Japan were the only countries with a CPI above 60 that reported a percentage below 80% for this question.
  • The United States – neither of the data points used for the US in the model was hugely abnormal for countries in the same CPI range. 80% of people said they would be willing to report corruption (a little lower than you would expect) and 50% said they would be willing to do something active to fight corruption (a little higher than you would expect). Both of these potentially show a slightly higher level of mistrust in government than other developed nations, something that does tie in with the politics of large parts of the US.

Unlike the above examples, Kosovo appeared fairly typical for the model. Let’s now take a deeper look into the results of the model for Kosovo.

Insights for Kosovo

For Kosovo, the model was able to fairly accurately predict the CPI using the two variables described. Kosovo has both a high percentage of people willing to do something active to fight corruption (80%) and a high percentage of people willing to report corruption (84%). As a result, the model predicted a high level of corruption in Kosovo, a CPI of 35, which was just above the actual CPI value of 33.
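As a rough illustration of how predicted and actual values are being compared here, the short Python sketch below classifies a country by the gap between its actual and predicted CPI. The function name and the 5-point tolerance are assumptions made purely for this example; they are not part of the model described in this piece.

```python
def classify_residual(actual_cpi, predicted_cpi, tolerance=5.0):
    """Compare a country's actual CPI with the model's predicted CPI.

    A higher CPI means less perceived corruption, so an actual value
    below the prediction means the country is more corrupt than predicted.
    The tolerance is a hypothetical cut-off chosen for illustration only.
    """
    residual = actual_cpi - predicted_cpi
    if abs(residual) <= tolerance:
        return "close to prediction"
    if residual < 0:
        return "more corrupt than predicted"
    return "less corrupt than predicted"


# Kosovo figures quoted in the text: actual CPI 33, predicted CPI 35.
print(classify_residual(actual_cpi=33, predicted_cpi=35))
```

With a gap of only two points, Kosovo falls inside the illustrative tolerance, which matches the description of it as a fairly typical case for the model.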

However, aside from proving the accuracy of the model in this case, these high values reveal important information about the people of Kosovo. They reveal that Kosovars do believe corruption is an issue, and that they are willing to do something about it.

Summary

Overall, there are positives and negatives for Kosovo that can be taken from the Transparency International data. On the negative side, the CPI highlights that corruption is a significant issue in Kosovo. Even in a region with consistently low CPI scores (the best performer is Slovenia with a score of 58), Kosovo is a significant underperformer. The most disappointing aspect of this underperformance is that Kosovo has had the significant advantage of 15 years of assistance from various international agencies in setting up infrastructure for good governance.

That said, there is a big positive that comes from the GCB survey data, and it is also potentially an important clue as to the best way forward for Kosovo and the international organizations involved in the region. That positive is that the people of Kosovo appear to be aware of the issues of corruption in their country, and more importantly, they are very willing to take an active role to fight it. Compared to Albania, a country with the same CPI as Kosovo, almost twice the percentage of survey respondents stated they were willing to do something active to fight corruption in Kosovo (80% vs. 44%), and significantly more people said they were willing to report corruption (84% vs. 51%).
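For readers who want to check the Kosovo and Albania comparison, the arithmetic can be sketched in a few lines of Python. The dictionary and function names are chosen for this example only; the percentages are the ones quoted above.

```python
# Survey percentages quoted in the text for Kosovo and Albania.
willing_to_act = {"Kosovo": 80, "Albania": 44}
willing_to_report = {"Kosovo": 84, "Albania": 51}


def kosovo_to_albania_ratio(percentages):
    """Return how many times larger the Kosovo figure is than the Albania figure."""
    return percentages["Kosovo"] / percentages["Albania"]


print(f"Willing to act: {kosovo_to_albania_ratio(willing_to_act):.2f}x")
print(f"Willing to report: {kosovo_to_albania_ratio(willing_to_report):.2f}x")
```

The first ratio comes out at roughly 1.8, which is where "almost twice" comes from; the second is roughly 1.6.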

What this suggests is that, if harnessed effectively, anti-corruption efforts in Kosovo could be very popular, and therefore powerful. But the right strategies have to be implemented and publicized to garner public support.

Somewhat unsurprisingly, we believe a key strategy has to be raising awareness of how data can be used to reduce corruption and bring about change. This can apply equally to data that is currently collected by government agencies but isn’t publicly released, or new datasets that the public can assist in collecting. With the right data and right analysis, these datasets can help to improve governance in numerous ways including:

  • exposing systematic corruption
  • identifying gaps in anti‑corruption controls, and
  • better targeting of anti-corruption efforts.

Using this open data approach also helps reduce reliance on the bravery of individual whistleblowers. Although whistleblowers are often vital in helping to identify incidents and even patterns of corruption, the fact is that, even in developed nations, they will always risk retaliation and other subtler forms of retribution (reduced career prospects, being ostracized by their peers and generally being perceived as untrustworthy).

Overall, what the results of the Transparency International data show us is that, with better coordination and targeting of anti-corruption efforts, there is the potential to actively involve large numbers of Kosovars. If that can be achieved and funneled into meaningful strategies, the future of Kosovo could be very bright indeed.

Have any suggestions for ways data could be used to fight corruption? Disagree completely? Feel free to leave your thoughts in the comments!

 

[1] Gender Equality Related Corruption Risks and Vulnerabilities in Civil Service Kosovo, United Nations Development Programme. November 2014. Gender Corruption final Eng.pdf

[2] Defined by Transparency International ‘… as “the abuse of entrusted power for private gain”. Corruption can be classified as grand, petty and political, depending on the amounts of money lost and the sector where it occurs.’

[3] The methodology for compiling the CPI is reviewed on a yearly basis with data sources added and removed as needed.

[4] “To what extent do you see the following categories in this country affected by corruption?” – responses of “corrupt” or “extremely corrupt” recorded as a positive response.

[5] “In your contact or contacts with the institutions have you or anyone living in your household paid a bribe in any form in the past 12 months?”

[6] “Over the past 2 years, how has the level of corruption in this country changed?”

[7] “To what extent is this country’s government run by a few big entities acting in their own best interests?”

[8] “How effective do you think your government’s actions are in the fight against corruption?”

[9] By their own admission, Transparency International’s CPI is not a perfect measure of corruption. Corruption by its nature is hidden and so there is no objective measure of the true level of corruption. However, the CPI is currently the most respected measure of corruption available and so we make the assumption that it is also the most accurate for the purposes of constructing this model.

[10] Taken as the average of the percentage of people who said they would take part in a peaceful protest and the percentage of people who said they would join an organization that works to reduce corruption as an active member.

Women and Corruption Issues in Kosovo

For those who don’t know, over the past couple of months I have been spending time working with a tech startup/NGO here in Pristina called Open Data Kosovo. The main aim of the organization is to encourage and facilitate the release of data and other information by the government of Kosovo (and related bodies) in order to increase transparency and reduce corruption. So far they have been fantastically successful, getting both national and international media attention, which is all the more impressive when you consider they are only now coming to the end of their first year of existence.

One of the main things I have been working on since joining is putting together some analysis of the various datasets they have been publishing online to see what conclusions can be provided to the public that might help create a more informed discussion of the issues. The first piece has now been published on the Open Data Kosovo website and we are excited to see what kind of feedback we get. If you want to take a look, please click the link below:

More women in leadership would probably reduce corruption, but is there a more effective way? 

© 2018 Brett Romero
