Data Inspired Insights


Excel Tips – Array Functions

Excel’s array functions probably rival pivot tables for the title of most misunderstood and underutilized feature available to Excel users. That is a shame, as they are powerful tools that can simply and elegantly address some of the key problems users regularly encounter. Although there is an almost endless list of ways in which array functions can be used, below I discuss three scenarios I regularly encounter in which they are useful.

Ensuring Formula Consistency

One of the simplest and most useful ways to utilize array functions is to ensure the consistency of formulae in large spreadsheets. There are numerous infamous examples where simple formula errors have undermined the credibility of analyses done in Excel (including a recent popular economics paper) and although this method (or any method for that matter) can’t prevent all errors, it can help to minimize the places where things can go wrong.

This method can be applied to any situation where you have a column or row in which you need to apply the same formula repeatedly (anywhere you would normally drag the formula across or down). To use an array function in this situation, select the full range of cells you need the formula to apply to and then create the formula in the same way you would normally, except replacing the single cell references with references to the range of cells instead. When the formula is ready, press Control + Shift + Enter to confirm it [1]. A simple example summing two columns is shown below:

sample_2_1
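For readers who think more easily in code, the element-wise behaviour of an array formula can be modelled in a few lines of Python. This is only an illustrative sketch, not a description of how Excel works internally; the data and the names `column_a`, `column_b` and `array_sum` are made up for the example. In Excel terms it roughly corresponds to selecting C2:C6 and entering =A2:A6+B2:B6 with Control + Shift + Enter (the cell ranges are assumptions).

```python
# Hypothetical data standing in for two spreadsheet columns.
column_a = [10, 20, 30, 40, 50]
column_b = [1, 2, 3, 4, 5]


def array_sum(left, right):
    """Add two equal-length columns element by element, as one array formula would."""
    if len(left) != len(right):
        raise ValueError("both ranges must be the same size")
    return [a + b for a, b in zip(left, right)]


# One definition produces the whole result column, so no row can drift out of sync.
column_c = array_sum(column_a, column_b)
print(column_c)
```

The point of the sketch is that the result column comes from a single definition, which is why one cell of it cannot be edited in isolation.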

Using this method ensures that the formulae are consistent and that an individual formula in the column or row cannot be modified on its own. You can test this yourself: once you have created an array formula (or opened the example file at the bottom of this page), try to delete or modify one of the individual cell formulas. You should be presented with an error like the following:

sample_2_2

This error blocks modifications that users might otherwise make inadvertently, including deleting a row or column that runs through the array range.

Transposing Datasets

Occasionally the need will arise to transpose data (convert data that runs across a row into data that runs down a column, or vice versa). In my experience, these cases typically involve time series data running across a spreadsheet being transposed to run down the spreadsheet to make it easier to view.

The method here is to highlight the cells where you want the transposed results to display, enter the function “=TRANSPOSE(”, highlight the values to be transposed, close the bracket, and confirm with Control + Shift + Enter. Note that if the sizes of the two ranges do not match (e.g. the number of columns to be transposed is greater than the number of rows in the range selected to display the results), the extra values will be dropped; in the reverse case, the extra cells in the result range will display as errors. An example is provided below:

sample_2_3
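The same idea, including what happens when the selected result range is the wrong size, can be sketched in Python. This is a rough model under assumed data: `wide`, `transpose` and `fit_to_selection` are hypothetical names, and the "#N/A" filler stands in for the errors Excel shows in surplus cells.

```python
# Hypothetical time series laid out across the sheet: a header row and a value row.
wide = [
    ["Jan", "Feb", "Mar"],
    [100, 120, 90],
]


def transpose(rows):
    """Turn rows into columns, as TRANSPOSE does for a rectangular range."""
    return [list(column) for column in zip(*rows)]


def fit_to_selection(grid, n_rows, n_cols, filler="#N/A"):
    """Model a selected result range whose size differs from the transposed data.

    Values that fall outside the selection are dropped; selected cells with no
    matching value are filled with an error marker.
    """
    return [
        [grid[r][c] if r < len(grid) and c < len(grid[r]) else filler
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


tall = transpose(wide)
print(tall)                          # data now runs down the sheet
print(fit_to_selection(tall, 2, 2))  # selection too small: "Mar" is dropped
print(fit_to_selection(tall, 4, 2))  # selection too large: last row is errors
```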

The big advantage of doing this (as opposed to simply copying and pasting the set with the transpose option selected) is that it maintains a live connection. Any updates in the original dataset will be reflected in the transposed dataset.

SUMIF with two conditions[2]

One of the primary reasons I started learning about array functions was this exact problem – how do I do a SUMIF with two conditions? This may seem like an esoteric requirement but, as the datasets you are working with become more complex, the ability to summarize the data by multiple criteria becomes increasingly useful. As an example, let’s imagine we have a small dataset of campaign contributions that contains the name, gender, city and amount contributed for 10 individuals. This data is shown below:

sample_2_4

Now let’s imagine you want to identify the total contributions from all females in Boston. Broadly speaking, there are three ways you can do this:

  1. Manually select all the values and add them up
  2. Create a pivot table [3]
  3. Use an array formula

Often, the actual dataset is too big for method 1 (any dataset with more than 10 entries probably falls into this category), and a pivot table can be overkill when you only need to derive one value. In these cases, method 3 can be the simplest option.

To create our SUMIF with two criteria, we create a formula that actually looks like a SUMIF in some ways. Essentially we create two nested IF statements inside a SUM function. Our final formula is going to be as follows (if you are recreating this, don’t forget to use Control + Shift + Enter):

sample_2_5

The formula works by creating an array of TRUEs and FALSEs based on the criteria specified in the IF statements (i.e. that the value in the gender column = “F” and the value in the City column = “Boston”). If both conditions are met, the formula captures the corresponding value in the specified range (the contributions column in this case); otherwise, the FALSE value (zero) is captured. Once there is an array of values and zeroes, the SUM function will sum up all the values to provide the answer.
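The intermediate array described above can be made visible with a short Python sketch. The rows below are invented for illustration (they are not the article's dataset), and `sum_if_two` is a hypothetical helper. In Excel, with gender, city and amount assumed to sit in columns B, C and D, the equivalent array formula would be along the lines of =SUM(IF(B2:B11="F",IF(C2:C11="Boston",D2:D11,0),0)).

```python
# Hypothetical contributions table: (name, gender, city, amount).
contributions = [
    ("Alice", "F", "Boston", 100),
    ("Bob", "M", "Boston", 250),
    ("Carol", "F", "Chicago", 75),
    ("Dana", "F", "Boston", 40),
    ("Evan", "M", "Chicago", 60),
]


def sum_if_two(rows, gender, city):
    """Mirror SUM(IF(gender test, IF(city test, amount, 0), 0))."""
    # Intermediate array: the amount where both tests pass, otherwise zero.
    captured = [
        (amount if row_city == city else 0) if row_gender == gender else 0
        for _, row_gender, row_city, amount in rows
    ]
    # SUM then adds up the values and zeroes.
    return captured, sum(captured)


captured, total = sum_if_two(contributions, "F", "Boston")
print(captured)  # [100, 0, 0, 40, 0]
print(total)     # 140
```

One difference worth keeping in mind: Python compares text case-sensitively, whereas Excel's = comparison does not.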

Finally, this structure can also be used for a range of functions including:

  • MAX – replace SUM with MAX
  • MIN – replace SUM with MIN and the false value (zero) with a value larger than the maximum value in the dataset
  • COUNT – keep SUM, but set the TRUE value to 1 instead of a range
  • AVERAGE – divide the SUM version by the COUNT version described above
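The variations in this list can be sketched the same way. Again the rows are made up and the names (`matches`, `sentinel` and so on) are hypothetical; the sketch only aims to show why MIN needs a different FALSE value and how COUNT and AVERAGE fall out of the same structure.

```python
# Hypothetical rows: (gender, city, amount).
rows = [
    ("F", "Boston", 100),
    ("M", "Boston", 250),
    ("F", "Chicago", 75),
    ("F", "Boston", 40),
]


def matches(row):
    """Both conditions from the article: female and in Boston."""
    return row[0] == "F" and row[1] == "Boston"


# MAX: the FALSE value stays at zero (fine while amounts are not negative).
max_if = max(row[2] if matches(row) else 0 for row in rows)

# MIN: the FALSE value must exceed every real amount, or zero would always win.
sentinel = max(row[2] for row in rows) + 1
min_if = min(row[2] if matches(row) else sentinel for row in rows)

# COUNT: capture 1 instead of the amount, then sum the ones.
count_if = sum(1 if matches(row) else 0 for row in rows)

# AVERAGE: the conditional sum divided by the conditional count.
sum_if = sum(row[2] if matches(row) else 0 for row in rows)
average_if = sum_if / count_if if count_if else None

print(max_if, min_if, count_if, average_if)
```

Note that when no row matches, the MIN version returns the sentinel rather than a real amount, so that case needs to be checked separately.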

Still confused? Please feel free to download this example file: 

 

[1] This combination confirms to Excel that the formula is an array formula. If you simply press enter, the formula will not work as expected. Completed array formulas will display in the formula bar surrounded by {}, but you do not manually type these characters in.


[2] Since writing this article I have become aware of two new(ish) Excel functions that have effectively replaced array functions for this purpose. SUMIFS and COUNTIFS are built-in functions, available from Excel 2007 onwards, that allow the user to sum or count records against multiple criteria.


[3] Pivot tables are very useful for a range of purposes, but will be covered in a separate blog entry.

Excel Tips – Template to Table

I encountered an interesting Excel problem a while back that I thought I would share, as the solution ended up being quite well received and even earned me a nice Swiss dinner.

Problem

The problem arose out of a reasonably common scenario – a friend of mine had begun working in a new workplace where they had been using an MS Word template to collect the results of a survey they had been undertaking. Being more of a data-focused person, she managed to convince them to migrate the template to MS Excel, as this would allow easier aggregation and analysis of the data. However, in convincing them to migrate, she had to design the Excel template to more or less mirror the Word template, which meant merged cells, data scattered across various columns and rows of the spreadsheet, and generally no easy way to tabulate the data across multiple surveys.

After the surveys started coming in from various areas and she had spent an inordinate amount of time manually copying and pasting the results into a table, she asked me if there was a better way to do this.

Solution

After a couple of false starts, I came to a point where I thought the only way to do this was going to be a macro. The issues with using a macro are:

  • if it breaks, it is likely that no one else will be able to fix it
  • it isn’t easy for others to determine what exactly the macro is doing, and
  • moving a macro across multiple spreadsheets and workbooks can get messy.

Aside from those issues, non-technical people tend to get a little squeamish around macros.

I ended up stumbling across this solution almost by pure chance, as I happened to be doing something at work that involved an Excel function I hadn’t used very often: INDIRECT. INDIRECT takes any string (either typed directly into the formula, referenced from another cell, or some combination of both) and tries to read it as a cell reference.

For example – if you type “A3” into cell A1, then reference that cell from another cell (“=A1”), both the original cell and the referencing cell will read “A3”. But what if you wanted the referencing cell to use the text in A1 as a reference and return the value in cell A3? That is where INDIRECT comes in:

sample_1_1
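A dictionary lookup gives a reasonable mental model of the difference between a plain reference and INDIRECT. The sketch below is illustrative only: the `sheet` contents and the helper names `direct` and `indirect` are assumptions, with `indirect(sheet, direct(sheet, "A1"))` playing the role of =INDIRECT(A1).

```python
# Hypothetical sheet: cell A1 holds the text "A3", and cell A3 holds a value.
sheet = {"A1": "A3", "A3": 42}


def direct(cells, address):
    """Like =A1: return whatever the referenced cell contains."""
    return cells[address]


def indirect(cells, reference_text):
    """Like INDIRECT: read a piece of text as a cell address and look it up."""
    return cells[reference_text]


print(direct(sheet, "A1"))                   # the text stored in A1
print(indirect(sheet, direct(sheet, "A1")))  # the value in the cell A1 points to
```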

In this simple example we have used it to reference a cell in the same sheet, but we can also use INDIRECT to build a reference to a different sheet or workbook. And that’s where this is going. All the templates were the same – the same cell reference for each data point every time – but the name of the sheet (or workbook, if the templates haven’t been combined into one workbook yet) was always different. So what if the user types the sheet name into a cell, and an INDIRECT formula combines that name with a fixed cell address to build the full reference? We can then use the same approach for every cell that needs to be looked up on that sheet. Here is an example of what that would look like:

sample_1_2

Once you have the basic formula working there are 3 steps left:

  1. Wrap the formula in an IFERROR clause ( =IFERROR(formula, “”) ) so that if the ‘Sheet Name’ column is blank, you get blank data fields instead of a ton of #REF errors.
  2. Manually ‘program’ the first row. You will have to ensure all the right cells are being looked up in the right columns.
  3. Drag the formula down for as many rows as you like.
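The steps above can be sketched in Python to show how the pieces fit together. Everything here is hypothetical: the sheet names, the cell addresses in `column_cells` and the helper names are invented for illustration. In Excel, with the sheet name assumed to be in column A, each table cell would hold something like =IFERROR(INDIRECT("'" & $A2 & "'!B2"), ""), with a different cell address per column.

```python
# Hypothetical workbook: sheet name -> {cell address -> value}.
workbook = {
    "Survey_North": {"B2": "North", "D5": 12, "F9": "Yes"},
    "Survey_South": {"B2": "South", "D5": 7, "F9": "No"},
}

# "Programming" the first row: which template cell feeds each table column (assumed).
column_cells = {"Region": "B2", "Responses": "D5", "Follow up": "F9"}


def indirect(book, reference):
    """Resolve text such as 'Survey_North!D5' to a value, like INDIRECT."""
    sheet_name, address = reference.split("!")
    return book[sheet_name][address]


def table_row(book, sheet_name):
    """Build one table row from the sheet name the user typed in."""
    row = {}
    for column, address in column_cells.items():
        try:
            row[column] = indirect(book, f"{sheet_name}!{address}")
        except (KeyError, ValueError):
            row[column] = ""  # plays the role of IFERROR(formula, "")
    return row


# "Dragging the formula down": the same logic applied to every sheet name entered.
for name in ["Survey_North", "Survey_South", ""]:
    print(name or "(blank)", table_row(workbook, name))
```

A blank or mistyped sheet name yields empty fields rather than errors, which is the behaviour step 1 is after.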

Once this is done, users of the workbook will be able to move a completed template into this workbook, enter the sheet name in the table, and all the results should immediately populate in the table. No macros, no mess, and a convenient and simple way to convert information from an irregularly formatted template into tabular data.

Still confused? Please feel free to download this example file: 


© 2024 Brett Romero
