Brett Romero

Data Inspired Insights

Page 7 of 11

Excel Tips – Array Functions

Excel’s array functions probably rival pivot tables for the title of most misunderstood and underutilized features available to Excel users. That is a shame as they are powerful tools that can be used to simply and elegantly address some of the key problems that users will regularly encounter. Although there is an almost endless list of ways in which array functions can be used, below I discuss three scenarios that I regularly encountered in which array functions can be useful.

Ensuring Formula Consistency

One of the simplest and most useful ways to utilize array functions is to ensure the consistency of formulae in large spreadsheets. There are numerous infamous examples where simple formula errors have undermined the credibility of analyses done in Excel (including a recent popular economics paper) and although this method (or any method for that matter) can’t prevent all errors, it can help to minimize the places where things can go wrong.

This method can be applied to any situation where you have a column or row in which you need to apply the same formula repeatedly (anywhere you would normally drag the formula across or down). To use an array function in this situation, select the full range of cells you need the formula to apply to and then create the formula in the same way you would normally, except replacing the single cell references with references to the range of cells instead. When the formula is ready, press Control + Shift + Enter to confirm it [1]. A simple example summing two columns is shown below:

sample_2_1

Using this method ensures that the formulae are consistent and also that individual formula in the column or row cannot be modified. You can test how this works yourself – once you have created an array formula (or opened the example file at the bottom of this page) try to delete or modify one of the individual cell formulas. You should be presented with an error like the following:

sample_2_2

This error prevents any modifications that users may inadvertently make, including the deletion of a row or column in the dataset.

Transposing Datasets

Occasionally the need will arise to transpose data (convert data that runs across a row into data that runs down a column, or vice versa). In my experience, these cases typically involve time series data running across a spreadsheet being transposed to run down the spreadsheet to make it easier to view.

The method here is to highlight the cells where you want the transposed results to display and enter the function “=TRANSPOSE( “, then highlight the values to be transposed. Note, that if the size of the ranges selected do not match (e.g. the number of columns to be transposed is greater than the number of rows in the range selected to display the results) the extra values will be dropped, or in the reverse case, the extra cells in the result set will be display as errors. An example is provided below:

sample_2_3

The big advantage of doing this (as opposed to simply copying and pasting the set with the transpose option selected) is that it maintains a live connection. Any updates in the original dataset will be reflected in the transposed dataset.

SUMIF with two conditions[2]

One of the primary reasons I started learning about array functions was this exact problem – how do I do a SUMIF with two conditions? This may seem like an esoteric requirement but, as the datasets you are working with become more complex, the need to summarize the data by multiple criteria becomes increasingly useful. Looking at an example, let’s imagine we have a small dataset of campaign contributions that contains the name, gender, city and the amount contributed for 10 individuals. This data is shown below:

sample_2_4

Now let’s imagine you want to identify the total contributions from all females in Boston. Broadly speaking, there are three ways you can do this:

  1. Manually select all the values and add them up
  2. Create a pivot table [3]
  3. Use an array formula

Often, the actual dataset is too big for method 1 (any dataset with more than 10 entries probably falls into this category), and a pivot table can be overkill when you only need to derive one value. In these cases, method 3 can be the simplest option.

To create our SUMIF with two criteria, we create a formula that actually looks like a SUMIF in some ways. Essentially we create two nested IF statements inside a SUM function. Our final formula is going to be as follows (if you are recreating this, don’t forget to use Control + Shift + Enter):

sample_2_5

The formula works by creating an array of TRUEs and FALSEs based on the criteria specified in the IF statements (i.e. that the value in the gender column = “F” and the value in the City column = “Boston”). If both conditions are met, the formula captures the corresponding value in the specified range (the contributions column in this case); otherwise, the FALSE value (zero) is captured. Once there is an array of values and zeroes, the SUM function will sum up all the values to provide the answer.

Finally, this structure can also be used for a range of functions including:

  • MAX – replace SUM with MAX
  • MIN – replace SUM with MIN and the false value (zero) with a value larger than the maximum value in the dataset
  • COUNT – set the TRUE value to 1 instead of a range
  • AVERAGE – use the COUNT and SUM functions described above to create an average

Still confused? Please feel free to download this example file: 

 

[1] This combination confirms to Excel that the formula is an array formula. If you simply press enter, the formula will not work as expected. Completed array formulas will display in the formula bar surrounded by {}, but you do not manually type these characters in.


[2] Since writing this article I have become aware of two new(ish) Excel functions that have effectively replaced array functions for this purpose. SUMIFS and COUNTIFS are built in functions available from the 2007 version of Excel onwards which allow the user to sum or count records and specify multiple criteria.


[3] Pivot tables are very useful for a range of purposes, but will be covered in a separate blog entry

Labor Statistics Part II – The Non-Participants

Previously in Labor Statistics Part I, we looked at data from the Bureau of Labor Statistics that showed, among other things, a falling participation rate since the turn of the century (see Chart 1). We also saw that even though unemployment has fallen over the last 5 years or so, that fall was at least partially a result of people leaving the labor force entirely rather than finding employment. Here we will take a more detailed look at those people classified as not in the labor force (those deemed to be not participating) to see if we can explain why the participation rate is falling.

Chart 1 – Participation Rate vs. Employed Population as a Percentage of Total Population 1947 to 2014

chart_4

Courtesy of Bureau of Labor Statistics (BLS) data we can actually break the population of people not in the labor force into several different subgroups, starting with two main groups: people who want a job (but don’t fall under the unemployed category); and people who don’t want a job. These populations can then be further broken down into subgroups based on: gender; race; and age. For our purposes we are going to focus mainly on the two main groups, people who want work and people who do not, with the latter further split into three age brackets (16-24, 25-54 and 55+). Chart 2 shows the break down of these groups (12 month moving averages have been used to smooth the series).

Chart 2 – Breakdown of Population Not in the Labor Force 1995 to 2015

chart_2_2

Looking at Chart 2, aside from an increase of over 20 million people over the last 20 years, no obvious trends emerge – none of the subgroups really appears to be shrinking or expanding significantly and all seem to be more or less moving in line with population growth. And, in fact, if we look at the percentages, this is more or less what we see:

Chart 3 – Breakdown of Population Not in the Labor Force – Percent of Total – 1995 to 2015

chart_2_3

Keeping in mind that the left hand axis of these charts has been zoomed in to exaggerate any changes, we can see that over 20 years, the proportion of the total population of people not in the labor force in each age bracket has not changed significantly. Summarizing:

  • Don’t Want a Job – 55+ years: fell a couple of percent from 2001 through to 2011, but in the past four years has basically recovered to where it was pre-2001.
  • Don’t Want a Job – 25-54 years: slowly increased around 1.5% from 1995 to 2005, but has now fallen back below 1995 levels.
  • Don’t Want a Job – 16-24 years: Increased around 5% from 1995 to 2011 but has fallen back a couple of percent since.
  • Do want a job – All Ages: Fell 4% from 1995 to 2001, was flat from 2001 to 2009 but has increased around a percentage point since.

Looking at the data this way, there aren’t many conclusions to draw. There is not enough real movement in the numbers to suggest anything is fundamentally changing in the pool of people considered not in the labor force.

However, one of the issues with analyzing data like this can be that the large pool of existing people can tend to obscure more rapid changes happening when you look at new entrants to, and exits from, that pool. So what does the picture look like if we analyze only the changes in the population from year to year?

Chart 4 – Annual Change to the Population Not in the Labor Force 1995 to 2015

chart_2_4

* 2015 figures are only for January, February and March

Looking at Chart 4, we can see several interesting trends and changes.

The first trend that becomes clear is how much of the growth in the population of people not in the labor force since the global recession in 2008-2009 is due to over 55 years olds. In fact, from 2010 to March 2015, the 55+ years age group contributed just over 75% of the growth in the population of people not in the labor force. And this trend has been accelerating. In 2014, the last year for which we have complete data, the 55+ years age group contributed close to 95% of total growth.

The second trend that stands out concerns 16-24 year olds. In this case there are large increases in the number of 16-24 year olds entering the population not in the labor force immediately after the two major downturns, the bursting of the tech bubble in 2000 and the financial crisis in 2008-2009. This suggests that following a downturn, young people are electing to either delay entering the workforce to stay in school longer, or are leaving the workforce and going back to school.

The third and final observation relates to those who indicated that they want a job, but do not meet the formal criteria to be classified as unemployed. In 2008 and 2009 we see large increases in the population of people who wanted a job but are not in the labor force. This is expected and corresponds with the financial crisis when millions of Americans lost their jobs. However, from 2010 onwards, the rate of this increase slowed, and, in some years, even reversed.

Drawing all the threads together, what are the main conclusions we can make from the above observations? My main takeaways are as follows:

  1. The growth in the population of people not in the labor force (and the decline in the participation rate), particularly over the past 5 years, has been mostly driven by over 55 year olds. This actually agrees with one of the more prevalent theories for why the participation rate is falling – the baby boomer generation is reaching retirement age and retiring en masse. Additionally, this trend of decreasing participation seems set to continue, meaning a decreasing percentage of the population will be required to fund the government for an ageing population.
  2. 16-24 year olds tend to drop out of the labor force in the immediate aftermath of downturns. Given the age range, a logical explanation for this is that, when the job market is poor, young people elect to either stay in school longer or return to school to pursue further studies. The positive news is that the spikes in young people leaving the labor force appear to be short term. Additionally, taking a long-term perspective, more young people pursuing higher education will arguably benefit the wider economy in later years.
  3. Since 2013, the number of people not in the labor force who do want a job appears to be on the decline (or has at least leveled off). This in turn suggests the current recovery is a genuine one, even in the face of a falling participation rate.

Finally, I want to leave you with one last chart. Chart 5 shows the changes in the proportion of men and women in the population of people not in the labor force. The positive news is that, although there is still a gender divide, it appears that the gap is continuing to close. In fact, 2014 was the year in which women represented the smallest proportion of the population not in the labor force, and the first time (probably since the US existed) that the percentage has dropped under 60%.

Chart 5 – Gender Split of Not in the Labor Force Population 1947 to 2014

chart_2_5

Excel Tips – Template to Table

There was an interesting Excel problem I encountered a while back which I thought I would share as the solution ended up being quite well received and even got me a nice Swiss dinner.

Problem

The problem arose out of a reasonably common scenario – a friend of mine had begun working in a new workplace where they had been using a MS Word template to collect the results of survey they had been undertaking. Being more of a data focused person, she managed to convince them to migrate the template to MS Excel as this would allow easier aggregation and analysis of the data. However, in convincing them to migrate, she had to design the Excel template in a manner more or less mirroring the Word template, which meant merged cells, data in various columns and rows on the spreadsheet, and generally no easy way to tabulate the data across multiple surveys.

After the surveys starting coming in from various areas and she had spent an inordinate amount of time manually copying and pasting the results into a table, she asked me if there was a better way to do this.

Solution

After a couple of false starts, I came to a point where I thought the only way to do this was going to be a macro. The issues with using a macro are:

  • if it breaks, it is likely that no one else will be able to fix it
  • it isn’t easy for others to determine what exactly the macro is doing, and
  • moving a macro across multiple spreadsheets and workbooks can get messy.

Asides from those issues, non-technical people tend to get a little squeamish around macros.

I ended up stumbling across this solution almost by pure chance as I happened to be doing something at work that resulted in me using an Excel function that I hadn’t used very often: INDIRECT. What INDIRECT does is takes any string (either typed directly into the formula, referenced from another cell, or some combination of both) and tries to read it as a cell reference.

For example – if you type “A3” into a cell, then reference that cell (“=A1” for example), the result will be that both the original cell and the cell with the reference will now read “A3”. But what if you wanted the cell referencing the original cell to actually use the value in that cell as a reference to look up a third value in cell A3? That is where INDIRECT comes in:

sample_1_1

In this simple example we have used it to reference a cell in the same sheet, but we can also use INDIRECT to build a reference to a different sheet or workbook. And that’s where this is going. All the templates were the same – the same cell reference for each data point every time – but the name of the sheet (or workbook if they haven’t been added together yet) was always different. So what if we use an INDIRECT formula to dynamically look up the name of the sheet, which the user inputs? We can then use that value to complete the cell references for all the cells looking up the information on that sheet. Here is an example of what that would look like:

sample_1_2

Once you have the basic formula working there are 3 steps left:

  1. Wrap the formula in an IFERROR clause ( =IFERROR(formula, “”) ) so that if the ‘Sheet Name’ column is blank, you get blank data fields instead of a ton of #REF errors.
  2. Manually ‘program’ the first row. You will have to ensure all the right cells are being looked up in the right columns.
  3. Drag the formula down for as many rows as you like.

Once this is done, users of the workbook will be able to move a completed template into this workbook, enter the sheet name in the table, and all the results should immediately populate in the table. No Macros, no mess, and a convenient and simple way to convert information from an irregularly formatted template into tabular data.

Still confused? Please feel free to download this example file: 

Labor Statistics Part I – Setting the Scene

The unemployment rate: in western countries this tends to be one of the most discussed and politicized of the official statistics produced by Governments. In the US, the unemployment rate has been under a higher than usual level of scrutiny since the 2008 financial meltdown led to historically high levels of unemployment.

However, as of February 2015 the unemployment rate has fallen to 5.5%, meaning it is back within the normal historical range and is expected to keep falling. So everything is all good right? Maybe.

Chart 1 – US Unemployment Rate Jun 1976 to Feb 2015

chart_1_1

Before we get into that, first a bit more detail on how the unemployment rate is calculated. For such a well-known statistic, it appears to be relatively poorly understood outside of the world of policy wonks. Ask the average person to guess how the unemployment rate is calculated and they are likely to guess something along the lines of the following:

equation_1

Someone with a bit more time to think on it may consider the fact that not all of the population are of a working age and factor that into their guess:

equation_2

But even this more refined calculation would result in an unemployment rate well into the 30%-40% range. The reason for that is, as the Bureau of Labor Statistics (BLS) outlines, to qualify as unemployed, a person has to be part of the Civilian non-institutional population [1] and meet one of the following two criteria (emphasis mine):

  • had no employment during the reference week, were available for work, except for temporary illness, and had made specific efforts to find employment some time during the 4 week-period ending with the reference week, or
  • were waiting to be recalled to a job from which they had been laid off.

As I am sure you can imagine, this definition leaves a lot of people that many would consider “unemployed” in some third pool, neither employed nor unemployed as the BLS defines it. These people are actually in a pool labeled “not in the labor force”.

From Bureau of Labor Statistics data, in 2014, people in the labor force made up 62.9% of the Civilian non-institutional population, leaving just over 92 million people outside the labor force. But this percentage of people in the labor force (also called the “Participation Rate”) has changed significantly over time. Chart 2 shows the Participation Rate since 1947:

Chart 2 – US Participation Rate 1947 to 2014

chart_1_2

There are several interesting things to notice here.

Firstly, from the early 60s until the late 90’s, the percentage of people considered in the labor force surged – from 58.7% in 1963 to 67.1% in the year 2000. To give you an idea of how that translates into numbers of people, that increase meant there were an additional 17.8 million people in the labor force in the year 2000 than there would have been if the participation rate from 1963 had remained unchanged.

Who were all these extra people? Most of this increase represents the movement of women into the labor force over time and the rise of the two income household, both of which can be seen in the increasing participation rate for woman (see Chart 3). Although this indirectly led to a lower participation rate for men, the overall result was an increase in the participation rate in general.

Chart 3 – Participation Rate by Gender 1948 to 2014

chart_1_3

The other major trend we can see in the Participation Rate over time is the downwards trend since 2000. By the end of 2014 the US had returned to a Participation Rate not seen since 1977. The result is that even though the headline unemployment rate has dropped back down to 5.5% as of February 2015, once the lower participation rate is factored, the picture isn’t nearly as rosy, as shown in Chart 4:

Chart 4 – Participation Rate vs. Employed Population as a Percentage of Total Population 1947 to 2014

chart_4

This has not gone unnoticed (see ForbesCNN and Bloomberg for example), but what is the cause and what impact does this have for public policy going forward? This will be something we will explore over the next few weeks in a series of articles – watch this space.

In the meantime, keep an eye on the Datasets section of this website for downloads of the various datasets being used in these articles.

 


[1] Persons 16 years of age and older residing in the 50 states and the District of Columbia, who are not inmates of institutions (e.g., penal and mental facilities, homes for the aged), and who are not on active duty in the Armed Forces.

Eurozone Perceptions

It has long been a perception held by many in the western world that the people of southern Europe (Spain, Portugal, Italy and Greece for the purposes of this article) have a particularly easy-going approach to work, life and financial responsibility. Whether this is a good or bad thing depends on who you ask and even what time of year you ask them as Ed Vulliamy describes.

However, with the onset of the European debt crisis, these perceptions have taken on a new prominence as they are now used to justify the harsh austerity being forced on Southern European nations, with special scorn and head shaking reserved for Greece in particular. At the deepest level, the enforcement of austerity is being spun as a moral tale – the people of Southern Europe are suffering for their laziness and financial irresponsibility. The financial irresponsibility aspect of this is a topic for another article, but here we will look at the evidence supporting the proposition that people in Southern European nations are ‘lazier’ than their northern European neighbors.

The first step in analyzing this proposition is defining how we measure ‘laziness’. In general, laziness refers to a lack of willingness to work or expend energy. Given we have no quantitative way of comparing how much energy people are expending, or their willingness to perform work (what a different world it would be if we could!), a good proxy to determine relative energy expenditure, and therefore laziness, is the number of hours worked. Conveniently, the OECD produces statistics on average hours worked per person per a year for most OECD nations, which includes most of the European nations we are interested in.

An argument can be made about the productivity of the respective workers but productivity has its own larger distortions due to the impact of differing levels of capital investment. A German working in a car manufacturing plant controlling a high tech automated assembly line will be much more productive (in terms of the value of his output) than an Italian waiting tables in a coffee shop – but this tells us nothing about the level of effort (or lack thereof) being expended, and also nothing about the time being spent at work.

So looking at the OECD statistics on hours worked, what do we see for the countries we are talking about?

Table 1

prod_table_1

What we see is actually the opposite of the commonly assumed situation. The famously hardworking Germans are averaging less than 1,400 hours a year of work or under 27 hours a week averaged over 52 weeks. This is actually the 2nd lowest of all countries in the OECD in 2012. Meanwhile, the Greeks, often held up as the epitome of laziness (at least in Europe) actually work some of the longest hours in the OECD – the third longest in fact, behind only the Koreans and the Mexicans. In 2012 the average Greek clocked up 2,034 hours of work, or the hours of almost 1.5 Germans. So how do we explain this? How can the perception be so different to what we are seeing here?

When we look at the data, some trends begin to emerge that explain some of the differences in hours worked. The first and most obvious trend that emerges when we expand our dataset to the full OECD and for all years covered is a negative correlation between hours worked and GDP (PPP) per capita (a rough proxy for wealth – see Chart 1). The trend is clear, both across countries and across time, and intuitively this makes sense – as people get wealthier, they feel less need to work long hours.

Chart 1 – GDP per capita (PPP) Vs. Average hours worked per person per year – OECD Countries, 2000-2012

prod_chart_1

Asides from the negative correlation between GDP (PPP) per capita and average hours worked, there are a few other observations we can make looking at this chart. The first observation is that the minimum hours that people work seems to bottom out at around 1400 hours a year – more or less where Germany and the Netherlands sit currently. Again this conclusion checks out logically. Subject to the social expectations and the demands of a given job, people aim to reach a comfortable balance between work and leisure time. Once this is achieved, they generally let any further increases in income accrue to their wealth rather than further reduce their working hours.

The second observation is that at any given level of GDP (PPP) per capita, there is a still a high level of variability between countries as to how many hours the average person will work. More than anything, this shows there is a range of factors that create variances in hours worked between countries. Labor force restrictions, minimum wage, unemployment benefits, education levels, inequality and the general structure of the economy will all affect the hours worked at a given level of GDP (PPP) per capita.

What else can we determine looking at this information? If we believe wealth to be a major factor in how many hours a person will work, what would it look like if we could remove the impact of wealth? In fact we can remove the wealth effect from this data by building a simple linear model that estimates the average amount of hours a person would work given a certain level of GDP per capita. From there we can then see where countries lie relative to the model prediction, effectively telling us which countries are working more hours than we would expect for their relative level of wealth, and which countries are working less. The 2012 data, with a linear model applied is shown in Chart 2.

Chart 2 – Average hours worked per person per year Vs. GDP per Capita (PPP) – OECD Countries, 2012

prod_chart_2

What this model tells us is that for every extra $1,000 of GDP (PPP) per capita, the average person will work 16.4 hours less per year. When we use this model to predict the number of hours the average person will work per year based on the GDP (PPP) per capita of their country, we come up with an estimated hours worked per person per year for each country, which we can then compare to the actual value for each country. The results of this comparison are shown in Chart 3.

Chart 3 – Average hours per person per year – Actual vs. Forecast, 2012

prod_chart_3

What we see is that even if we remove the differences in wealth from average hours worked per person per year, the average citizen in many northern European countries (particularly Germany and Denmark) are still working less hours than we would expect. The verdict for Southern European nations is more mixed. People in Portugal and Spain are also working fewer hours than we would expect, the Italians are more or less in line with expectations, while the Greeks are again well ahead of what would be expected.

So what is the bottom line here? What conclusions can we take away from this? The answer is surprisingly little. There are a huge range of incentives and disincentives that are unique to each country that we are completely ignoring in this analysis. We also have no way of identifying how effective or productive different people are while they are at work, which as I’m sure anyone who has worked with another human being can testify, can vary pretty dramatically. So, despite the above evidence, no one should be prepared to believe the people of Germany or Denmark are ‘lazier’ than people in Mexico, the US or Korea. What we can say though is what evidence is missing from the above analysis – and what is clearly missing is any evidence that the people of southern European nations are ‘lazier’ than their northern European counterparts.

Reds Shake off the Blues

Saturday, 21 August 2010

Narrabundah FC 4

Brindabella Blues 2

Scoring
1 – 0  Dziubinski ‘9
1 – 1  W.Atrudoingtony ’23
2 – 1  Tadic 31′
3 – 1  Quick 36′
4 – 1  Tadic 53′
4 – 2  W. Atrudoingseri 81′

On a day that would have been glorious if not for the gale force winds that are usually associated with the home ground of the Reds opponents, Narrabundah scraped together 11 players to face Brindabella bogan patrol in a game that bore no significance for either team.

Continue reading

The toe poke heard around the world…

Saturday, 14 August 2010

Woden Valley 0

Narrabundah FC 1

Scoring
0 – 1  Romero 35′

It was a windy and cold Saturday as Narrabundah traveled to Woden for a game that meant little to either team. However, despite the low stakes, it was a hard fought encounter with both teams seemingly having the same idea of trying to play themselves into form for the approaching finals series.

Continue reading

Olympic Do Everyone a Favour a Don’t Bother Showing Up

Saturday, 7 August 2010

Narrabundah FC 3

Canberra Olympic 0

Scoring
Olympic Forfeit

On a glorious day in South Canberra, it was a fired up Narrabundah side that arrived at the ground to book their spot in the top 4 with a win over league cellar dwellers, Canberra Olympic. Unfortunately/fortunately, they didn’t bother coming and so Coach Romero put his chargers through their paces instead with a solid training session. Given there was no game to discuss, please enjoy the following explanation of the global financial crisis instead.

Continue reading

Improved Reds Pipped by Students

Saturday, 31 July 2010

ANU 2

Narrabundah FC 1

Scoring
1 – 0  O.G. 48′
1 – 1  Padrotta 57′
2 – 1  I. Canrunfast 81′

On a day that at times was Antarctic in temperature, the Reds took to the ANU synthetic pitch and put in a dramatically improved showing on recent weeks, pushing 1st place chasing ANU all the way, and were perhaps a little unfortunate not to come away with at least a point.

Continue reading

Lionel Richie has Queanbeyan Dancing on the Ceiling

Saturday, 24 July 2010

Narrabundah FC 1

Queanbeyan City 1

Scoring
1 – 0  Padrotta 31′
1 –1  L. Richie 88′

On a sparkling day in south Canberra, a Narrabundah side low on confidence lined up against a side that has proven to be their bogey team this season, the lowly Queanbeyan City. With a combination of hard running and non-stop whinging both on and off the field, they are team that has always seemed to unnerve a usually calm and collected Narrabundah side and so it was to prove today.

Continue reading
« Older posts Newer posts »

© 2024 Brett Romero

Theme by Anders NorenUp ↑