
# Category: Data Science (Page 1 of 2)

Once we have our data in a pandas DataFrame (the basic table structure in pandas), the next step is to assess what we have. If you are coming from Excel or RStudio, you are probably used to being able to look at the data any time you want. In Python/pandas we don’t have a spreadsheet to work with, and we don’t have an exact equivalent of RStudio either (although Jupyter notebooks are a similar concept), but we do have several tools available that can help you get a handle on what your data looks like.

## DataFrame Dimensions

Perhaps the most basic question is how much data do I actually have? Did I successfully load in all the rows and columns I expected or are some missing? These questions can be answered with the `shape` attribute:

``````import pandas as pd

print(df.shape)

(1599, 12)``````

`shape` returns a tuple (think of it as a list that you can’t alter) which tells you the number of rows and columns, 1599 and 12 respectively in this example. You can also use `len` to get the number of rows:

``````print(len(df))

1599``````

Using `len` is also slightly quicker than using `shape`, so if it is just the number of rows you are interested in, go with `len`.

Another dimension we might be interested in is the size of the table in terms of memory. For this we can use the `memory_usage` method:

``````print(df.memory_usage())

Index                     128
fixed acidity           12792
volatile acidity        12792
citric acid             12792
residual sugar          12792
chlorides               12792
free sulfur dioxide     12792
total sulfur dioxide    12792
density                 12792
pH                      12792
sulphates               12792
alcohol                 12792
quality                 12792
dtype: int64``````

This tells us the space, in bytes, each column is taking up in memory. If we want to know the total size of the DataFrame, we can take a sum, and then to get the number into a more readable unit like kilobytes (kB) or megabytes (MB), we can divide by 1024 as many times as needed.

``````print(df.memory_usage().sum() / 1024)  # Size in kB

150.03125``````
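If we want megabytes, we simply divide by 1024 twice. One caveat worth knowing: for `object` (string) columns, the default numbers can understate the true usage, and passing `deep=True` makes pandas inspect the actual Python objects and report a more accurate figure. A minimal sketch, assuming the same DataFrame `df`:

```
# Total size in megabytes (MB): divide by 1024 twice
print(df.memory_usage().sum() / 1024 / 1024)

# deep=True gives a more accurate figure when object (string) columns are present
print(df.memory_usage(deep=True).sum() / 1024)
```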

Lastly, for these basic dimension assessments, we can generate a list of the data types of each column. This can be a very useful early indicator that your data has been read in correctly. If you have a column that you believe should be numeric (i.e. a `float64` or an `int64`) but it is listed as `object` (the dtype pandas uses for strings and other non-numeric data), it may be a sign that something has not been interpreted correctly:

``````print(df.dtypes)

fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
dtype: object``````

## Viewing some sample rows

After we have satisfied ourselves that we have the expected volume of data in our DataFrame, we might want to look at some actual rows of data. Particularly for large datasets, this is where the `head` and `tail` methods come in handy. As the names suggest, `head` will return the first n rows of the DataFrame and `tail` will return the last n rows of the DataFrame (n is set to 5 by default for both).

``print(df.head())``

Aside from this basic use, we can use `head` and `tail` in some very useful ways with some alterations/additions. First off, we can set n to whatever value we want, showing as many or as few rows as desired:

``print(df.head(10))``

We can also combine it with `sort_values` to see the top (or bottom) n rows of data sorted by a column, or selection of columns:

``print(df.sort_values('fixed acidity', ascending=False).head(10))``

Finally, if we have a lot of columns, too many to display all of them in Jupyter or the console, we can combine `head`/`tail` with `transpose` to inspect all the columns for a few rows:

``print(df.head().transpose())``

## Summary Statistics

Moving on, the next step is typically some exploratory data analysis (EDA). EDA is a very open ended process so no one can give you an explicit set of instructions on how to do it. Each dataset is different and to a large extent, you just have to allow your curiosity to run wild. However, there are some tools we can take advantage of in this process.

### Describe

The most basic way to summarize the data in your DataFrame is the `describe` method. This method, by default, gives us a summary of all the numeric fields in the DataFrame, including counts of values (which exclude null values), the mean, standard deviation, min, max and some percentiles.

``df.describe()``

This is nice, but let’s talk about what isn’t being shown. Firstly, by default, any non-numeric and date fields are excluded. In the dataset we are using in this example we don’t have any non-numeric fields, so let’s add a couple of categorical fields (they will just have random letters in this case), and a date field:

``````import string
import random

df['categorical'] = [random.choice(string.ascii_letters) for i in range(len(df))]
df['categorical_2'] = [random.choice(string.ascii_letters) for i in range(len(df))]
df['date_col'] = pd.date_range(start='2020-11-01', periods=len(df))

df.describe()``````

As we can see, the output didn’t change. But we can use some of the parameters for `describe` to address that. First, we can set `include='all'` to include all datatypes in the summary:

``df.describe(include='all')``

Now, for the categorical columns, it tells us some useful numbers such as the number of unique values and which value is the most frequent. However, the date column is still being handled like a categorical value. We can change that so it is treated as a numeric value by setting the `datetime_is_numeric` parameter to `True`:

``df[['date_col']].describe(datetime_is_numeric=True)``

### `pandas_summary` Library

Building on top of the kind of summaries that are produced by `describe`, some very talented people have developed a library called pandas_summary. This library is purely designed to generate informative summaries of pandas DataFrames. First though we need to do some quick setup (you may need to install the library using pip):

``````from pandas_summary import DataFrameSummary

dfs = DataFrameSummary(df)``````

Now let’s take a look at two ways we can use this new `DataFrameSummary` object. The first one is `columns_stats`. This is similar to what we saw previously with `describe`, but with one useful addition: the number and percent of missing values in each column:

``dfs.columns_stats``

Secondly, my personal favorite: by selecting a column from the summary object, we can get some really detailed statistics for that individual column, plus a histogram thrown in for numeric fields:

``dfs['fixed acidity']``

## Seaborn

Seaborn is a statistical data visualization library for Python with a full suite of charts that you should definitely look into if you have time, but for today we are going to look at just one very nice feature – `pairplot`. This function will generate pairwise plots for all the columns in your DataFrame with literally one line:

``````import seaborn as sns

sns.pairplot(df, hue="quality")``````

The colors of the plots are determined by the column you select for the `hue` parameter. This lets you see how the values in that column relate to the two features in each pairwise plot. Along the diagonal, where a feature would otherwise be plotted against itself, we instead get the distribution of that feature for each value of the `hue` column.

Note, if you have a lot of columns, be aware that this type of chart will become less useful, and will also likely take a lot of time to render.
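If you do have a lot of columns, one option is to plot only a subset of them using the `vars` parameter. A minimal sketch, assuming the same wine-quality DataFrame and column names used above:

```
import seaborn as sns

# Restrict the grid to a handful of columns to keep it readable
sns.pairplot(df, vars=['fixed acidity', 'pH', 'alcohol'], hue='quality')
```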

## Wrapping Up

Exploratory data analysis (EDA) should be an open ended and flexible process that never really ends. However, when we are first trying to understand the basic dimensions of a new dataset and what it contains, there are some common methods we can employ such as `shape`, `describe` and `dtypes`, and some very useful third party libraries such as `pandas_summary` and `seaborn`. While this explainer does not provide a comprehensive list of methods and techniques, hopefully it has provided you with somewhere to get started.

When we are working with data in software development or when the data comes from APIs, it is often not provided in a tabular form. Instead it is provided in some combination of key-value stores and arrays broadly denoted as JavaScript Object Notation (JSON). So how do we read this type of non-tabular data into a tabular format like a pandas DataFrame?

## Understanding Structures

The first thing to understand about data stored in this form is that there are effectively infinite ways to represent a single dataset. For example, take a simple dataset, shown here in tabular form:

| id | department | first_name | last_name |
|----|------------|------------|-----------|
| 1  | Sales      | John       | Johnson   |
| 2  | Sales      | Peter      | Peterson  |
| 3  | Sales      | Paula      | Paulson   |
| 4  | HR         | James      | Jameson   |
| 5  | HR         | Jennifer   | Jensen    |
| 6  | Accounting | Susan      | Susanson  |
| 7  | Accounting | Clare      | Clareson  |

Let’s look at some of the more common ways this data might be represented using JSON:

### 1. “Records”

A list of objects, with each object representing a row of data. The column names are the keys of each object.

``````[
{
"department": "Sales",
"first_name": "John",
"id": 1,
"last_name": "Johnson"
},
{
"department": "Sales",
"first_name": "Peter",
"id": 2,
"last_name": "Peterson"
},
{
"department": "Sales",
"first_name": "Paula",
"id": 3,
"last_name": "Paulson"
},
...
]``````

### 2. “List”

An object where each key is a column, with the values for that column stored in a list.

``````{
"id": [
1,
2,
3,
4,
5,
6,
7
],
"department": [
"Sales",
"Sales",
"Sales",
"HR",
"HR",
"Accounting",
"Accounting"
],
"first_name": [
"John",
"Peter",
"Paula",
"James",
"Jennifer",
"Susan",
"Clare"
],
"last_name": [
"Johnson",
"Peterson",
"Paulson",
"Jameson",
"Jensen",
"Susanson",
"Clareson"
]
}``````

### 3. “Split”

An object with two keys, one for the column names, the other for the data which is a list of lists representing rows of data.

``````{
"columns": [
"id",
"department",
"first_name",
"last_name"
],
"data": [
[
1,
"Sales",
"John",
"Johnson"
],
[
2,
"Sales",
"Peter",
"Peterson"
],
...
]
}``````

## Creating a DataFrame

So how do we get this data into a pandas DataFrame given that it could come in different forms? The key is knowing what structures pandas understands. In the first two cases above (#1 Records and #2 List), pandas understands the structure and will automatically convert it to a DataFrame for you. All you have to do is pass the structure to the `DataFrame` constructor:

``````list_data = {
"id": [
1,
2,
3,
4,
5,
6,
7
],
"department": [
"Sales",
"Sales",
"Sales",
"HR",
"HR",
"Accounting",
"Accounting"
],
"first_name": [
"John",
"Peter",
"Paula",
"James",
"Jennifer",
"Susan",
"Clare"
],
"last_name": [
"Johnson",
"Peterson",
"Paulson",
"Jameson",
"Jensen",
"Susanson",
"Clareson"
]
}
df = pd.DataFrame(list_data)
print(df)

id 	department     first_name 	last_name
0 	1 	Sales 	       John 	        Johnson
1 	2 	Sales 	       Peter 	        Peterson
2 	3 	Sales 	       Paula 	        Paulson
3 	4 	HR 	       James 	        Jameson
4 	5 	HR 	       Jennifer 	Jensen
5 	6 	Accounting     Susan 	        Susanson
6 	7 	Accounting     Clare 	        Clareson``````

Initializing a DataFrame this way also gives us a couple of options. We can load in only a selection of columns:

``````df = pd.DataFrame(list_data, columns=['id', 'department'])
print(df)

id 	department
0 	1 	Sales
1 	2 	Sales
2 	3 	Sales
3 	4 	HR
4 	5 	HR
5 	6 	Accounting
6	7 	Accounting``````

We can also define an index. However, if one of the fields in your dataset is what you want to set as the index, it is simpler to do so after you load the data into a DataFrame:

``````df = pd.DataFrame(list_data).set_index('id')
print(df)

department first_name last_name
id
1        Sales       John   Johnson
2        Sales      Peter  Peterson
3        Sales      Paula   Paulson
4           HR      James   Jameson
5           HR   Jennifer    Jensen
6   Accounting      Susan  Susanson
7   Accounting      Clare  Clareson``````

## Acceptable Structures

What are the acceptable structures that pandas recognizes? Here are the ones I have found so far:

### Records

``````[
{
"department": "Sales",
"first_name": "John",
"id": 1,
"last_name": "Johnson"
},
{
"department": "Sales",
"first_name": "Peter",
"id": 2,
"last_name": "Peterson"
},
{
"department": "Sales",
"first_name": "Paula",
"id": 3,
"last_name": "Paulson"
},
...
]``````

### List

``````{
"id": [
1,
2,
3,
4,
5,
6,
7
],
"department": [
"Sales",
"Sales",
"Sales",
"HR",
"HR",
"Accounting",
"Accounting"
],
"first_name": [
"John",
"Peter",
"Paula",
"James",
"Jennifer",
"Susan",
"Clare"
],
"last_name": [
"Johnson",
"Peterson",
"Paulson",
"Jameson",
"Jensen",
"Susanson",
"Clareson"
]
}``````

### Dict

``````{
"id": {
0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7
},
"department": {
0: "Sales",
1: "Sales",
2: "Sales",
3: "HR",
4: "HR",
5: "Accounting",
6: "Accounting"
},
"first_name": {
0: "John",
1: "Peter",
2: "Paula",
3: "James",
4: "Jennifer",
5: "Susan",
6: "Clare"
},
"last_name": {
0: "Johnson",
1: "Peterson",
2: "Paulson",
3: "Jameson",
4: "Jensen",
5: "Susanson",
6: "Clareson"
}
}``````

### Matrix

For this one you will have to pass the column names separately.

``````[
[
1,
"Sales",
"John",
"Johnson"
],
[
2,
"Sales",
"Peter",
"Peterson"
],
[
3,
"Sales",
"Paula",
"Paulson"
],
...
]``````
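As a minimal sketch of how this might look (the variable name `matrix_data` and the truncation to three rows are mine):

```
matrix_data = [
    [1, "Sales", "John", "Johnson"],
    [2, "Sales", "Peter", "Peterson"],
    [3, "Sales", "Paula", "Paulson"],
]

# The rows carry no keys, so the column names are passed separately
df = pd.DataFrame(matrix_data, columns=['id', 'department', 'first_name', 'last_name'])
```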

## Inconsistencies

What happens if the data is in one of these formats, but has some inconsistencies? For example, what if we have something that looks like this?

``````[
{
"department": "Sales",
"first_name": "John",
"id": 1,
"last_name": "Johnson",
"extra_field": "woops!"
},
{
"department": "Sales",
"first_name": "Peter",
"id": 2,
"last_name": "Peterson"
},
{
"department": "Sales",
"first_name": "Paula",
"id": 3,
"last_name": "Paulson"
},
...
]``````

Fortunately, pandas is fairly robust to these types of inconsistencies, in this case creating an extra column and filling the remaining rows with NaN (null values):

``````    id  department  first_name  last_name  extra_field
0   1   Sales 	    John 	Johnson    woops!
1   2   Sales 	    Peter 	Peterson   NaN
2   3   Sales 	    Paula 	Paulson    NaN
3   4   HR 	    James 	Jameson    NaN
4   5   HR 	    Jennifer 	Jensen     NaN
5   6   Accounting  Susan       Susanson   NaN
6   7   Accounting  Clare       Clareson   NaN``````

Something important to note is that, depending on the structure and where the inconsistency occurs in the structure, the inconsistency can be handled differently. It could be an additional column, an additional row, or in some cases it may be ignored completely. The key is, as always, to check your data has loaded as expected.

## Explicit Methods

There are two more methods for reading JSON data into a DataFrame: `DataFrame.from_records` and `DataFrame.from_dict`. `DataFrame.from_records` expects data in the ‘Records’ or ‘Matrix’ formats shown above, while `DataFrame.from_dict` will accept data in either the Dict or List structures. These methods are more explicit in what they do and have several potential advantages:

### Clarity

When working with a team or in a situation where other people are going to review your code, being explicit can help them to understand what you are trying to do. Passing some unknown structure to `DataFrame` and knowing/hoping it will interpret it correctly is using a little too much ‘magic’ for some people. For the sanity of others, and yourself in 6 months when you are trying to work out what you did, you might want to consider the more explicit methods.

### Strictness

When writing code that is going to be reused, maintained and/or run automatically, we want to write that code in a very strict way. That is, it should not keep working if the inputs change. Using `DataFrame` could lead to situations where the input data format changes, but is read in anyway and instead breaks something else further down the line. In a situation like this, someone will likely have the unenviable task of following the trail through the code to work out what changed.

Using the more explicit methods is more likely to cause the error to be raised where the problem actually occurred: reading in data which is no longer in the expected format.

### Options

The more explicit methods give you more options for reading in the data. For example `DataFrame.from_records` gives you options to limit the number of rows to read in. `DataFrame.from_dict` allows you to specify the orientation of the data. That is, are the lists of values representative of columns or rows?
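As a rough sketch of these options (`records_list` is a hypothetical list of row objects like the ‘Records’ example above, and `list_data` is the column-oriented structure defined earlier):

```
# nrows caps the number of rows read when the data is passed as an iterator
df_sample = pd.DataFrame.from_records(iter(records_list), nrows=3)

# orient='index' treats the outer keys as row labels rather than column names
df_by_row = pd.DataFrame.from_dict(list_data, orient='index')
```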

### Coercion

In some cases, your data will not play nice and the generic DataFrame method will not correctly interpret your data. Using the more explicit method can help to resolve this. For example, if your objects are in a column of a DataFrame (i.e. a pandas Series) instead of a list, using DataFrame will give you a DataFrame with one column:

``````records_data = pd.Series([
{
"department": "Sales",
"first_name": "John",
"id": 1,
"last_name": "Johnson",
"test": 0
},
{
"department": "Sales",
"first": "Peter",
"id": 2,
"name": "Peterson"
},
{
"dept": "Sales",
"firstname": "Paula",
"sid": 3,
"lastname": "Paulson"
},
{
"dept": "HR",
"name": "James",
"pid": 4,
"last": "Jameson"
}
])
print(pd.DataFrame(records_data))``````

Using the more explicit `DataFrame.from_records` gives you the expected results:

``````records_data = pd.Series([
{
"department": "Sales",
"first_name": "John",
"id": 1,
"last_name": "Johnson",
"test": 0
},
{
"department": "Sales",
"first": "Peter",
"id": 2,
"name": "Peterson"
},
{
"dept": "Sales",
"firstname": "Paula",
"sid": 3,
"lastname": "Paulson"
},
{
"dept": "HR",
"name": "James",
"pid": 4,
"last": "Jameson"
}
])
print(pd.DataFrame.from_records(records_data))``````

## Wrapping Up

We’ve looked at how we can quickly and easily convert JSON format data into tabular data using the `DataFrame` class and the more explicit `DataFrame.from_records` and `DataFrame.from_dict` methods. The downside is this only works if the data is in one of a few structures. The upside is most of the data you will encounter will be in one of these formats, or something that is easily converted into these formats.

If you want to play around with converting data between delimited format and various JSON formats, I can recommend trying an app I built a while back: JSONifyit.

To get started with pandas, the first thing you are going to need to understand is how to get data into pandas. For this guide we are going to focus on reading in tabular data (i.e. data stored in a table with rows and columns). If you don’t have some data available but want to try some things out, a great place to get some data to play with is the UCI Machine Learning Repository.

## Delimited Files

One of the most common ways you will encounter tabular data, particularly data from an external source or publicly available data, is in the form of a delimited file such as comma separated values (CSV), tab separated values (TSV), or separated by some other character. To import this data so you can start playing with it, pandas gives you the `read_csv` function with a lot of options to help manage different cases. But let’s start with the very basic case:

```
import pandas as pd

# The path can be a local file path or a URL
df = pd.read_csv('path/to/data.csv')

# Show the top 5 rows to make sure it read in correctly
print(df.head())
```
Running this code imports the pandas library (as `pd`), uses the `read_csv` function to read in the data and stores it as a pandas DataFrame called `df`, then prints the top 5 rows using the head method. Note that the path to the file that you want to import can be a path to a file on your computer, or it can be a URL (web address) for a file on the internet. As long as you have internet access (and permission to access the file) it will work like you have the file downloaded and saved already.

When reading the data, unless specified, `read_csv` will attempt to automatically detect what the delimiting character is (e.g. “,” for CSV). In most cases this works fine, but in cases where it doesn’t, you can use the `sep` parameter to specify what char to use. For example, if your file is separated with “;” you might do something like:

```
import pandas as pd

# Specify the separator for a ';' delimited file
df = pd.read_csv('path/to/data.csv', sep=';')

# Show the top 5 rows to make sure it is correct
print(df.head())
```

OK, what if your file has some other junk above and/or below the actual data, for example a few rows of notes or metadata sitting above the header row?

We have two options for working around this, the `header` parameter and the `skiprows` parameter:

```
import pandas as pd

# Both DataFrames produce the same result
df_1 = pd.read_csv('path/to/data.csv', header=7)
df_2 = pd.read_csv('path/to/data.csv', skiprows=7)
```

These are equivalent because setting `header=7` tells `read_csv` to look in row 7 (remember the row numbers are 0 indexed) to find the header row, then assume the data starts from the next row. On the other hand, setting `skiprows=7` tells `read_csv` to ignore the first 7 rows (so rows 0 to 6), then it assumes the header row is the first row after the ignored rows.

### Other Useful `read_csv` Parameters

There are dozens of other parameters to help you read in your data and handle a range of strange cases, but here is a selection of the options I have found most useful to date, sketched in the example below.
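For illustration, a hedged sketch using a few options that tend to come up often (the file path and column names are placeholders, and this selection is mine rather than a definitive list):

```
import pandas as pd

df = pd.read_csv(
    'path/to/data.csv',
    usecols=['col_a', 'col_b', 'col_c'],   # read only the columns you need
    nrows=1000,                            # read only the first 1,000 data rows
    dtype={'col_a': str},                  # force a column to a particular type
    parse_dates=['col_c'],                 # interpret a column as dates
    na_values=['', 'N/A', 'missing'],      # extra values to treat as null
)
```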

For the full list of available parameters, check out the official documentation. One thing to note is that although there are a lot of parameters available for `read_csv`, many are focused on helping correctly format and interpret data as it is being read in – for example, interpretation of dates, interpretation of boolean values, and so on. In many (if not most) cases these are things that can be addressed after the data is in a pandas DataFrame, and in some cases, handling these formatting and standardization steps explicitly after reading in the data can make your code easier to understand for the next person who reads it.

## Excel Data

Pandas also has a handy wrapper for reading in Excel data: `read_excel`. Instead of writing your data to a CSV and then reading it in, you can read directly from the Excel file itself. This function has many of the same parameters as `read_csv`, with options to skip rows, read in a sample of rows and/or columns, specify a header row and so on.
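A minimal sketch (the file name and sheet name are placeholders, and `read_excel` needs an Excel engine such as openpyxl installed):

```
import pandas as pd

# Read a specific sheet, skipping a couple of junk rows at the top
df = pd.read_excel('path/to/workbook.xlsx', sheet_name='Sheet1', skiprows=2)
print(df.head())
```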

## Databases

If your data is in a tabular/SQL database, like PostgreSQL, MySQL, BigQuery or something similar, your job gets a little more complicated to set up. But once that setup is done, it becomes really simple to repeatedly query data (using SQL) from that database directly into a DataFrame where you can do what you want with it.

The first step is to create a connection to the database holding the data. It is beyond the scope of this particular guide, but the library you will almost certainly need is SQLAlchemy, or in some cases a library created by the maker of the database (for example, Google has a BigQuery API library called google-cloud-bigquery).
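As a minimal SQLAlchemy sketch (the connection string is a placeholder for your own database credentials):

```
from sqlalchemy import create_engine

# Connection string format: dialect://user:password@host:port/database
engine = create_engine('postgresql://user:password@localhost:5432/db_name')
```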

Once you have connected to your database, pandas provides three functions for you to extract data into a DataFrame: `read_sql_table`, `read_sql_query` and `read_sql`. The last of these, `read_sql`, is what’s called a ‘convenience wrapper’ around `read_sql_table` and `read_sql_query` – the functionality of both the underlying functions can be accessed from `read_sql`. But let’s look at the two underlying functions individually to see what the differences are and what options we have.

`read_sql_table` is a function we can use to extract data from a table in a SQL database. The function requires two parameters `table_name` – the name of the table you want to get the data from; and `con` – the location of the database the table is in. With these two parameters, all data from the specified table (i.e. `SELECT * FROM table_name`) will be returned as a DataFrame:

``df = pd.read_sql_table(table_name='table_name', con='postgres:///db_name')  ``

`read_sql_table` does also give you the option to specify a list of columns to be extracted using the `columns` parameter.
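For example, a sketch reusing the connection string from above and pulling just two (hypothetical) columns:

```
df = pd.read_sql_table(
    table_name='table_name',
    con='postgres:///db_name',
    columns=['column_1', 'column_2'],  # only extract the columns we need
)
```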

`read_sql_query` on the other hand allows you to specify the query you want to run against the database.

```
query = """
SELECT column_1
, column_2
, column_3
FROM table_name
WHERE column_4 > 10
"""

df = pd.read_sql_query(query, con='postgres:///db_name')
```

Obviously writing your own query gives you a lot more flexibility to extract exactly what you need. However, also consider the potential upside in terms of processing efficiency. Doing aggregations and transformations in a database, in almost all cases, will be much faster than doing it in pandas after it is extracted. As a result, some careful query planning can save a lot of time and effort later on.

## Other Data Sources

Pandas also has functions for reading in data from a range of other sources, including HTML tables, SPSS, Stata, SAS and HDF files. We won’t go into them here, but being aware that these options exist is often all you really need to know. If a case arises where you need to read data from these sources, you can always refer to the documentation.

## Wrapping Up

We’ve looked at how we can use pandas to read in data from various sources of tabular data, from delimited files and Excel, to databases, to some other more uncommon sources. While these functions often have many parameters available, remember most of them will be unnecessary for any given dataset. These functions are designed to work with the minimum parameters provided (e.g. just a file location) in most cases. Also remember that once you have the data in a DataFrame, you will have a tonne of options to fix and change the data as needed – you don’t need to do everything in one function.

For the last 5 years, data science has been one of the world’s hottest professions, but it is also one of the most poorly defined. This can be seen on any career website, where advertisements for ‘Data Scientist’ positions describe everything from what used to be a simple data analyst role, to technical, PhD-only, research positions working on artificial intelligence or autonomous cars.

However, despite the diversity of roles being labelled ‘data scientist’, there is a common thread that runs through any job involving data and building models. And this is that only around 20% of the time will be spent building models, with the other 80% of the time spent understanding, cleaning and transforming data to get it to the point where it can be used for modelling (for an overview of all the steps a Data Scientist goes through, see this series).

For many/most people working in the profession, the time spent cleaning and transforming is seen simply as a price to be paid to get to the interesting part – the modelling. If they could, many people would happily hand off this ‘grunt work’ to someone else. At first glance, it is easy to see why this would be the case – it is the modelling that gets all the headlines. There are very few people that hear about a model predicting cancer in hospital patients and think “they must have had some awesome clean data to build that with”.

However, plaudits aside, I am going to make the case that this is backwards: from a creativity and challenge standpoint, the cleaning and transforming work is often the most interesting part of data science.

## The creativity of cleaning

Over the past 12 years of working with data, one thing that has become painfully obvious is the unbridled creativity of people when it comes to introducing errors and inconsistencies into data. Typos, missing values, numbers in text fields, text in numerical fields, inconsistent spellings of the same item, and changing number formats (e.g. ever notice how most of continental Europe uses “,” as the decimal point instead of “.”?) are just some of the most common issues one will encounter.

To be fair, it is not only the fault of the person doing the data entry (e.g. an end user of an application). Often, the root of the problem is a poorly designed interface and a lack of data validation. For example, why is a user able to submit text in a field that should only ever contain numbers? Why do I have to guess how everyone else types in “the United States” (US, U.S., USA, U.S.A., United States of America, America, Murica) instead of choosing from a standardized list of countries?

However, even with the most carefully validated forms and data entry interface, data quality issues will continue to exist. People fudge their age, lie about their income, enter fake emails, addresses and names, and some, I assume, make honest typos and mistakes.

So why is dealing with these issues a good thing? Because the unlimited creativity on the part of the people creating the data quality issues has to be exceeded by the creativity of the person cleaning the data. For every possible type of error that can be found in the data, the data scientist has to develop a method to address that error. And assuming the dataset is more than a few hundred rows, it will have to be a systematic method, as manually correcting the issues becomes impractical.

As a result, the data scientist has to find a way to address the universe of potential errors, and to do so in an automated, systematic way. How do I go through a column of countries that have all been spelt in different ways in order to standardize the country names? Someone got decimal happy and now I have a column where a lot of the numbers have two decimal points instead of one – how can I systematically work out which decimal point is the correct one, and then remove the other decimal point? A bunch of users put their birthday as 1 January 1900, how can I remove those, should I remove them, and if yes, what values should I put there instead?

All of these scenarios are real examples of interesting, challenging problems to solve, and ones that require a high-level of creativity to address.

## The creativity of transformation/feature extraction

Once cleaning has been undertaken, typically the next step is to perform transformation and/or feature extraction. These steps are necessary because the data is rarely collected in the form required by the model, and/or there is additional information that can be added to and/or extracted from the data to make the model more effective.

If this sounds like a very open ended task, that’s because it is. Often, the ability to enhance a dataset is limited only by time, and the creativity and knowledge of the data scientist doing the work. Of course, there are diminishing returns, and at some point, it becomes uneconomic to invest additional effort to improve a dataset, but in many cases there are a huge range of options.

Due to the open-ended nature of this step, there are actually two types of creativity required. The first is the creativity to come up with potential new features that can be extracted from the existing dataset (and developing the methods to create those features). The second is identifying other data that could be used to enhance the dataset (and then developing the methods to import and combine it). Again, both of these are challenging and interesting problems to solve.

## Making a model is often a mechanical process

Unlike the above, the process of creating the model is relatively mechanical. Of course, there are still challenges to overcome, but in most cases, it boils down to choosing an algorithm (or combination of algorithms), then tuning the parameters to improve the results. The issue is that neither of these steps typically involves a lot of creative thinking; instead, they involve cycling through a lot of options to see what works.

Even the selection of the algorithm, or combination of algorithms, which might seem relatively open ended, is, in the real world, limited by a range of factors. For a given problem, these factors include:

- The task at hand – whether it be two-class or multi-class classification, cluster analysis, prediction of a continuous variable, or something else – will reduce the algorithm options. Some algorithms will typically perform better in certain scenarios, while others may simply not be able to handle the task at all.
- The characteristics of the data often also reduce the options. Larger datasets mean some algorithms will take too long to train to be practical. Datasets with large numbers of features suit some algorithms more than others, while sparse datasets (those with lots of 0 values) will suit other algorithms.
- An often-overlooked factor is the ability to explain to clients and/or bosses how and why a model is making a prediction. Being able to do this typically puts a significant limit on the complexity of the model (particularly ensembles), and makes simpler (and often less accurate) models more appealing.

After all these factors are taken into account, how many algorithms are left to choose from in a given scenario? Probably not too many.

An excellent graphic from SAS summarizing how the algorithm choices in data science are often limited by the problem.

## Wrapping Up

Taking all the above into account, the picture that starts to form is one where significant creativity is required to clean and create a good dataset for modelling, followed by a relatively mechanical process to create and tune a model. But if this is the case, why doesn’t everyone think the same way I do?

One of the primary reasons is that in most real-world data science scenarios, the above steps (cleaning, transformation, feature extraction and modelling) are not typically conducted in a strictly linear fashion. Often, building the model and assessing which features were the most predictive will lead to additional work transforming and extracting features. Feature extraction and testing a model will often reveal data quality issues that were missed earlier and cause the data scientist to revisit that step to address those issues.

In other words, in practice everything is interlinked and many data scientists view the various steps in the process of constructing a model (including cleaning and transforming) as one holistic process that they enjoy completing. However, because the cleaning and transforming aspects are the most time consuming, these aspects (data cleaning in particular) are often seen as being the major impediment to a completed project.

This is true – almost all projects could be completed significantly quicker if the data was of a higher quality at the outset. The quick turnaround for most Kaggle competition entries (where relatively clean and standardized data are provided to everyone) can attest to this. But to my fellow data scientists, I would say the following. Data science will always involve working with dirty and underdeveloped data – no matter how good we get at data validation, how clean and intuitive the interface, or how much planning is done on what data points to collect. Embrace the dirt, celebrate the grind, and take pride in creating creative solutions to often complex and challenging problems. If you don’t, no one else will.

A while back, I attended a hackathon in Belgrade as a mentor. This hackathon was the first ‘open data’ hackathon in Serbia and focused on making applications using data that had recently been released by various ministries, government agencies, and independent bodies in Serbia. As we walked around talking to the various teams, one of the things I noticed was that almost all teams were using databases to manage their data. In most cases, the database being used was something very lightweight like SQLite3, but in some cases more serious databases (MySQL, PostgreSQL, MongoDB) were also being used.

What I have come to realize is that in many cases this was probably completely unnecessary, particularly given the tight timeframe the teams were working towards – a functional prototype within 48 hours. However, even if you have more time to build an application, there are several good reasons that you may not need to worry about using a formal database. These are outlined below.

## 1. The data is small

Firstly, let’s clarify what I mean when I say ‘small data’. For me, small data is any dataset under 10,000 records (assuming a reasonable number of data points for each record). For many non-data people, 10,000 records may seem quite big, but when using programming languages such as Python or JavaScript, this amount of data is usually very quick and easy to work with. In fact, as Josh Zeigler found, even loading 100,000 records or 15MB of data into a page was possible, completing in as little as 463ms (Safari FTW).

Leaving aside the numbers for a second, the key point here is that in many cases, the data being displayed in an application has far fewer than 10,000 records. If your data has fewer than 10,000 records, you should probably ask yourself: do you need a database? It is often far simpler, and requires significantly less overhead, to simply keep your data in a JSON file and load it into the page directly. Alternatively, CSV and Excel files can also be converted to JSON and dumped to a file very quickly and easily using a Python/pandas script.
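As a rough sketch of that kind of conversion (the file names are placeholders):

```
import pandas as pd

# Convert a CSV (or Excel) file into a JSON file the page can load directly
df = pd.read_csv('data.csv')
df.to_json('data.json', orient='records')
```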

The ECIS Development Tracker uses data from six Worldwide Governance Indicators and two other series over 20 years and 18 countries – a total of almost 3,000 data points and a perfect example of small data.

## 2. The data is static

Another reason you may not need a database is if you have a reasonable expectation that the data you are using is not going to change. This is often the case where the data is going to be used for read only purposes – for example visualizations, dashboards and other apps where you are presenting information to users. In these cases, again it may make sense to avoid a database, and rely on a flat file instead.

The important point here is that if the data is not changing or being altered, then static files are probably all that is needed. Even if the data is larger, you can use a script to handle any data processing and load the (assumedly) aggregated or filtered results into the page. If your needs are more dynamic (i.e. you want to show different data to different users and do not want to load everything), you may need a backend (something you would need for a database anyway) that extracts the required data from the flat file, but again, a database may be overkill.

The Kosovo Mosaic visualizer – based on data from a survey conducted once every three years – is an example of a case where the data is not expected to change any time soon.

## 3. The data is simple

One of the big advantages of databases is their ability to store and provide access to complex data. For example, think about representing data from a chain of retail stores on the sale of various products by different sales people. In this case, because there are three related concepts (products, sales people and stores), representing this data without using a database becomes very difficult without a large amount of repetition[1]. In this case, even if the data is small and static, it may simply be better to use a relational database to store the data.

However, in cases where the data can be represented in a table, or multiple unrelated tables, subject to points 1 and 2 above, it may make sense to avoid the overhead of a database.

If you need a schema diagram like this to describe your data, you can probably skip the rest of this article.

## 4. The data is available from a good API

I have recently been working on a project to develop an application that is making extensive use of the Google API. While still under development, the app is already quite complex, making heavy use of data to generate charts and tables on almost every page. However, despite this complexity, so far, I have not had to use a database.

One of the primary reasons I have not needed to implement a database is that the Google API is flexible enough for me to effectively use that as a database. Every time I need data to generate a chart or table, the app makes a call to the API (using Python), passes the results to the front end where, because the data is small (the Google API returns a maximum of 10,000 rows in a query), most of the data manipulation is handled using JavaScript on the client side. For the cases where more heavy data manipulation is required, I make use of Python libraries like Pandas to handle the data processing before sending the data to the front end. What this boils down to is a data intensive application that, as yet, still does not need a database.

Of course, this isn’t to say I will not need a database in the future. If I plan to store user settings and preferences, track usage of the application, or collect other meta data, I will need to implement a database to store that information. However, if you are developing an application that will make use of a flexible and reliable API, you may not need to implement your own database.

Google has APIs available for almost all of its products – most of them with a lot of flexibility and quick response times.

## 5. The app is being built for a short-term need

While it might seem unusual to build a web app with the expectation that it will not be used six months later, this is a surprisingly common use case. In fact, this is often the expectation for visualizations and other informative pages, or pages built for a specific event.

In these particular use cases, keeping down overhead should be a big consideration, in addition to potential hosting options. Developing these short-term applications without a backend and database means free and easy hosting solutions like that provided by GitHub can be used. Adding a backend or database immediately means a more complex hosting setup is required.

## Wrapping up, this is not an argument against databases…

… it is simply an argument to use the best and simplest tools for a given job. As someone who has worked with a number of different databases throughout their career, I am actually a big user of databases and find most of them intuitive and easy to use. There are also a number of advantages that only a database can provide: from ensuring data consistency, to facilitating large numbers of users simultaneously making updates, to managing large and complex datasets, there are a number of very good reasons to use a database (SQL or NoSQL, whichever flavor you happen to prefer).

But, as we have covered above, there may be some cases where you do not need these features and can avoid adding an unnecessary complication to your app.

Next week we’ll take a look at a simple app that uses an Excel spreadsheet to generate the data required for the application.

[1] With repetition comes an increased risk of data quality issues

This article is Part VI in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read the earlier parts – (Part I, Part II, Part III, Part IV and Part V).

Continuing on the walkthrough, in this part we build the model that will predict the first booking destination country for each user based on the dataset created in the earlier parts.

## Choosing an Algorithm

The first step to building a model is to decide what type of algorithm to use. Below we look at some of the options.

### Decision Tree

Arguably the most well known algorithm, and one of the simplest conceptually. A decision tree model works in a similar manner to the decision tree you might sketch by hand when trying to work out which decision to make based on a range of variables.

The goal of the decision tree algorithm used for classification problems (like the one we are looking at) is to create one of these decision trees to classify records into a set number of categories. To do this, it starts with all the records in the training dataset and looks through all the features until it finds the one that allows it to most ‘cleanly’ split the records according to their categories. For example, if you are using daily weather data to try and determine whether it will rain the following day (i.e. there are two categories, ‘it does rain’ and ‘it does not rain’), the algorithm will look for a feature that best splits the records (in this case representing days) into those two categories. When it finds that feature, and the value to split on, it creates one point (‘decision node’) on the decision tree. It then takes each subpopulation and does the same thing again, building up a tree until either all the records are correctly classified, or the number in each subpopulation becomes too small to split. Below is an example decision tree using the described weather data to predict if it will rain tomorrow or not (thanks to Graham Williams’ excellent Rattle package for R):

The way to interpret the above tree is to start at the top. The first criterion the algorithm splits on is the humidity at 3pm. Starting with 100% of the records, if the humidity at 3pm is less than 71, as is the case for 93% of the records, we move to the left and find the next decision node. If the humidity at 3pm is greater than or equal to 71, we move to the right, which takes us to a leaf node where the model predicts that there will be rain tomorrow (‘yes’). We can see from the numbers in the node that this represents 7% of all records, and that 74% of the records that reach this node are correctly classified.

The first thing to note is that the model does not accurately predict whether it will rain tomorrow for all records, and in some leaf nodes, it is only slightly better than a coin toss. This is not necessarily a bad thing. The biggest problem that data scientists have with decision trees is the classic problem of overfitting. In the example above, parameters have been set to stop the model splitting once the population of records at a given node gets too small (‘minimum split’) and once a certain number of splits have occurred (‘maximum depth’). These values have been chosen to prevent the tree from growing too large. The reason for this is that if the tree gets too large, it will start modelling random noise and hence will not work for data not in the training dataset (it will not ‘generalize’ well).

To picture what this means, imagine extending the example decision tree above further until the model starts splitting out single records using criteria like ‘Humidity3pm = 54’ and ‘Humidity3pm = 31’. That type of decision node may work for this particular training data because there is a specific record that meets those criteria, but it is highly unlikely that it represents any predictive ability and so is unlikely to be accurate if applied to other data.

All this discussion of overfitting with decision trees does, however, raise an important question: how do you know how large to grow the tree? How do you set the parameters to avoid overfitting but still have an accurate model? The truth is that it is extremely difficult to know how to set the parameters. Set them too conservatively and the model will lose too much predictive power. Set them too aggressively and the model will start overfitting the data.

### Seeing the Forest for the Trees

Given the limitations of decision trees and the risk of overfitting, it may be tempting to think “why bother?” Fortunately, methods have been found to reduce the risk of overfitting and increase the predictive power of decision trees, and the two most popular methods share the same basic premise – to train multiple trees.

One of the most well known algorithms that utilizes decision trees is the ‘random forest’ algorithm. As the name suggests, the algorithm constructs a large number of different trees (as defined by the user) by randomly selecting the features that can be used to build each tree (as opposed to using all the features for each tree). Typically, the trees in a random forest also have the parameters set to ensure each tree will also be relatively shallow, meaning that the algorithm creates a large number of shallow decision trees (decision bonsai?). Once the trees are constructed, each tree is used to predict the outcome for a new record, with these multiple predictions then serving as votes, with a majority rules approach applied.

Another algorithm which has become almost the default algorithm of choice for Kagglers, and is the type of the model we will use, uses a method called ‘boosting’, which means it builds trees iteratively such that each tree ‘learns’ from earlier trees. To do this the algorithm builds a first tree – again typically a shallower tree than if you were going to use a one tree approach – and makes predictions using that tree. Then the algorithm finds the records that are misclassified by that tree, and assigns a higher weight of importance to those records than the records that were correctly classified. The algorithm then builds a new tree with these new weightings. This whole process is repeated as many times as specified by the user. Once the specified number of trees have been built, all the trees built during this process are used to classify the records, with a majority rules approach used to determine the final prediction.

It should be noted that this methodology (‘boosting’) can actually be applied to many classification algorithms, but has really grown popular with the decision tree based implementation. It should also be noted there are different implementations of this algorithm even just using trees. In this case, we will be using the very popular XGBoost algorithm.

### Alternative Models

So far we have only covered decision trees and decision tree-based algorithms. However, there are a range of different algorithms that can be used for classification problems. Given this is supposed to be a short blog series, I will not go into too much detail on each algorithm here. But if you want more information on these algorithms, or other algorithms that I haven’t covered here, there is a growing amount of information online. I also strongly recommend the Data Science specialization offered by Johns Hopkins University, for free, on Coursera.

#### K-Nearest Neighbors

The K-nearest neighbors algorithm is arguably one of the simplest algorithms in concept. The algorithm classifies a given object by looking at the k most similar records[1] and seeing how those records are classified. This type of algorithm is called a lazy learner because during the training phase, it essentially just stores the data provided. Only when a new object needs to be classified does the algorithm start looking through the data to try to find the closest matches.

#### Neural Networks

As the name suggests, these algorithms simulate biological neural networks by creating a series of nodes and connecting them together. A neural network typically consists of three types of layer: an input layer, a hidden layer (although there can be multiple hidden layers) and an output layer.

A model is trained by passing records through the network, with the weights at each node continually adjusted to ensure that each record ends up at the right ‘output node’.

#### Support Vector Machines

This type of algorithm, commonly used for text classification problems, is arguably the most difficult to visualize. At the simplest level, the algorithm tries to draw straight lines (or planes for classifications with more than 2 features) that best separate the classes provided. Although this sounds like a fairly simplistic approach to classifying objects, it becomes far more powerful due to the transformations (sometimes called a ‘kernel trick’) the algorithm can apply to the data before drawing these lines/planes. The mathematics behind this are far too complex to go into here, but the Wikipedia page has some nice visuals to help picture how this is working. In addition, this video provides a nice example of how a Support Vector Machine can separate classes using this kernel trick:

## Creating the Model

Back to the modelling – now that we know what algorithm we are using (the XGBoost algorithm, for those skipping ahead), let’s talk about the approach.

### Cross Validation

As mentioned in regard to decision trees, one of the key risks when creating models of any type is the risk of overfitting. One of the primary ways data scientists guard against overfitting is to estimate the accuracy of their models on data that was not used to train the model. To do this they typically use a method called cross validation. There are different methods for doing cross validation, but the method we will employ is called k-fold cross validation.

k-fold cross validation involves splitting the training data into k subsets (where k is greater than or equal to 2), training the model using k – 1 of those subsets, then running the model on the subset that was not used in the training process. Because all of the data used in the cross validation process is training data, the correct classification for each record is known and so the predicted category can be compared to the actual category. Once all folds have been completed, the average score across all folds is taken as an estimate of how the model will perform on other data. An example of a 3-fold cross validation is shown below:
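As a code-level sketch of the same idea (using the modern scikit-learn `model_selection` module rather than the older `cross_validation`/`grid_search` modules used in the code later in this post, and assuming the feature matrix `X` and labels `y` prepared in the ‘Training the Model’ section):

```
from sklearn.model_selection import KFold, cross_val_score
import xgboost as xgb

model = xgb.XGBClassifier()

# Each of the 3 folds takes a turn as the held-out set while the
# other two folds are used for training; the scores are then averaged
scores = cross_val_score(model, X, y, cv=KFold(n_splits=3, shuffle=True), scoring='accuracy')
print(scores.mean())
```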

### Parameter Tuning

As you may have realized from the earlier description of the XGBoost algorithm – there are quite a few options (parameters) that we need to define to build the model. How many trees to build? How deep should each tree be? How much extra weight will be attached to each misclassified record? Tuning these parameters to get the best results from the model is often one of the most time consuming things that data scientists do. Fortunately, the process can be automated to a large degree so that we do not have to sit there rerunning the model repeatedly and noting down the results. Even better, using the Scikit-Learn package, we can merge the parameter tuning and cross validation steps into one, allowing us to search for the best combination of parameters while using k-fold cross validation to verify the results.

### Training the Model

In order to train the model (using cross validation and parameter tuning as outlined above), we first need to define our training dataset – remembering that we previously combined the training and test data to simplify the cleaning and transforming process. To feed these into the model, we also need to split the training data into the three main components – the user IDs (we don’t want to use these for training as they are randomly generated), the features to use for training (X), and the categories we are trying to predict (y).

```# Prepare training data for modelling
df_train.set_index('id', inplace=True)
df_train = pd.concat([df_train['country_destination'], df_all], axis=1, join='inner')

id_train = df_train.index.values
labels = df_train['country_destination']
le = LabelEncoder()
y = le.fit_transform(labels)
X = df_train.drop('country_destination', axis=1, inplace=False)
```

Now that we have our training data ready, we can use GridSearchCV to run the algorithm with a range of parameters, then select the model that has the highest cross validated score based on the chosen measure of performance (in this case accuracy, but there are a range of metrics we could use based on our needs).

```# Grid Search - Used to find best combination of parameters
XGB_model = xgb.XGBClassifier(objective='multi:softprob', subsample=0.5, colsample_bytree=0.5, seed=0)
param_grid = {'max_depth': [3, 4, 5], 'learning_rate': [0.1, 0.3], 'n_estimators': [25, 50]}
model = grid_search.GridSearchCV(estimator=XGB_model, param_grid=param_grid, scoring='accuracy', verbose=10, n_jobs=1, iid=True, refit=True, cv=3)

model.fit(X, y)
print("Best score: %0.3f" % model.best_score_)
print("Best parameters set:")
best_parameters = model.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))
```

Please note that running this step can take a significant amount of time. Running the algorithm with 25 trees takes around 2.5 minutes for each cross validation on my Macbook Pro. Running the script above with all the options specified will likely take well over an hour.

## Making the Predictions

Now that we have trained a model based on the best parameters, the next step is to use the model to make predictions for the records in the testing dataset. Again we need to extract the testing data out of the combined dataset we created for the cleaning and transformation steps, and again we need to separate the main components for the model. After these steps, we use the model created in the previous step to make the predictions.

```# Prepare test data for prediction
df_test.set_index('id', inplace=True)
df_test = pd.merge(df_test[['date_first_booking']], df_all, how='left', left_index=True, right_index=True, sort=False)
X_test = df_test.drop('date_first_booking', axis=1, inplace=False)
X_test = X_test.fillna(-1)
id_test = df_test.index.values

# Make predictions
y_pred = model.predict_proba(X_test)```

As you may have noted from the code above, we have used the predict_proba method instead of the usual predict method. This is done because of the way Kaggle will assess the results for this particular competition. Rather than just assessing one prediction for each user, Kaggle will assess up to 5 predictions for each user. In order to maximize the score, we will use the predicted probabilities that predict_proba produces to select the 5 best predictions. Finally, we will write these results to a file that will be created in the same folder as the script.

```#Taking the 5 classes with highest probabilities
ids = [] #list of ids
cts = [] #list of countries
for i in range(len(id_test)):
    idx = id_test[i]
    ids += [idx] * 5
    cts += le.inverse_transform(np.argsort(y_pred[i])[::-1])[:5].tolist()

#Generate submission
print("Outputting final results...")
sub = pd.DataFrame(np.column_stack((ids, cts)), columns=['id', 'country'])
sub.to_csv('./submission.csv', index=False)
```

If you wish to, you should be able to submit the file produced by this script on Kaggle. The competition has now finished, so you will not receive an official position on the leaderboard, but your results will still be processed and you will be told where you would have finished.

## Wrapping Up

Those who are more experienced with data science may realize that this series, as lengthy as it is, does not even scratch the surface of a lot of topics related to data science. Unsupervised learning, association rules mining, text analytics and deep learning are all topics that have not been covered at all. Unfortunately, the full scope of data science and machine learning is not something that can be covered in a blog. That said, I did have two goals for those reading these blog articles.

Firstly, I hope that this series demystifies some aspects of data science for those that currently see it as a black box. Although one can spend their career working in data science and still not master all aspects, even a cursory understanding of how machine learning algorithms work can help provide understanding as to what sort of questions machine learning can help to answer, and what sort of questions are problematic.

Secondly, I hope this series encourages some of you to dig deeper, to learn more about this topic. Machine learning is a rapidly growing field that is expanding into every aspect of life. This includes recommendation engines on websites, astronomy – where it helps to identify stars and planets, the pharmaceutical industry – where it is being used to predict which molecular structures are likely to produce useful drugs, and maybe most famously, training self-driving cars to drive in the real world. Whatever your primary interest, there are likely to be machine learning applications being developed or already in use.

[1] There are a range of metrics that can be used to do this. For available metrics in the Scikit Learn package, see here.

Full script:

```import pandas as pd
import numpy as np
import xgboost as xgb

from sklearn import cross_validation, decomposition, grid_search
from sklearn.preprocessing import LabelEncoder

####################################################
# Functions #
####################################################
# Remove outliers
def remove_outliers(df, column, min_val, max_val):
    col_values = df[column].values
    df[column] = np.where(np.logical_or(col_values <= min_val, col_values >= max_val), np.NaN, col_values)

    return df

# Home made One Hot Encoder
def convert_to_binary(df, column_to_convert):
    categories = list(df[column_to_convert].drop_duplicates())

    for category in categories:
        cat_name = str(category).replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert[:5] + '_' + cat_name[:10]
        df[col_name] = 0
        df.loc[(df[column_to_convert] == category), col_name] = 1

    return df

# Count occurrences of value in a column
def convert_to_counts(df, id_col, column_to_convert):
    id_list = df[id_col].drop_duplicates()

    df_counts = df[[id_col, column_to_convert]]
    df_counts['count'] = 1
    df_counts = df_counts.groupby(by=[id_col, column_to_convert], as_index=False, sort=False).sum()

    new_df = df_counts.pivot(index=id_col, columns=column_to_convert, values='count')
    new_df = new_df.fillna(0)

    # Rename Columns
    categories = list(df[column_to_convert].drop_duplicates())
    for category in categories:
        cat_name = str(category).replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert + '_' + cat_name
        new_df.rename(columns={category: col_name}, inplace=True)

    return new_df

####################################################
# Cleaning #
####################################################
# Import data
tr_filepath = "./train_users_2.csv"
df_train = pd.read_csv(tr_filepath)
te_filepath = "./test_users.csv"
df_test = pd.read_csv(te_filepath)

# Combine into one dataset
df_all = pd.concat((df_train, df_test), axis=0, ignore_index=True)

# Change Dates to consistent format
print("Fixing timestamps...")
df_all['date_account_created'] = pd.to_datetime(df_all['date_account_created'], format='%Y-%m-%d')
df_all['timestamp_first_active'] = pd.to_datetime(df_all['timestamp_first_active'], format='%Y%m%d%H%M%S')
df_all['date_account_created'].fillna(df_all.timestamp_first_active, inplace=True)

# Remove date_first_booking column
df_all.drop('date_first_booking', axis=1, inplace=True)

# Fixing age column
print("Fixing age column...")
df_all = remove_outliers(df=df_all, column='age', min_val=15, max_val=90)
df_all['age'].fillna(-1, inplace=True)

# Fill first_affiliate_tracked column
print("Filling first_affiliate_tracked column...")
df_all['first_affiliate_tracked'].fillna(-1, inplace=True)

####################################################
# Data Transformation #
####################################################
# One Hot Encoding
print("One Hot Encoding categorical data...")
columns_to_convert = ['gender', 'signup_method', 'signup_flow', 'language', 'affiliate_channel', 'affiliate_provider', 'first_affiliate_tracked', 'signup_app', 'first_device_type', 'first_browser']

for column in columns_to_convert:
    df_all = convert_to_binary(df=df_all, column_to_convert=column)
    df_all.drop(column, axis=1, inplace=True)

####################################################
# Feature Extraction #
####################################################
# Add new date related fields
df_all['day_account_created'] = df_all['date_account_created'].dt.weekday
df_all['month_account_created'] = df_all['date_account_created'].dt.month
df_all['quarter_account_created'] = df_all['date_account_created'].dt.quarter
df_all['year_account_created'] = df_all['date_account_created'].dt.year
df_all['hour_first_active'] = df_all['timestamp_first_active'].dt.hour
df_all['day_first_active'] = df_all['timestamp_first_active'].dt.weekday
df_all['month_first_active'] = df_all['timestamp_first_active'].dt.month
df_all['quarter_first_active'] = df_all['timestamp_first_active'].dt.quarter
df_all['year_first_active'] = df_all['timestamp_first_active'].dt.year
df_all['created_less_active'] = (df_all['date_account_created'] - df_all['timestamp_first_active']).dt.days

# Drop unnecessary columns
columns_to_drop = ['date_account_created', 'timestamp_first_active', 'date_first_booking', 'country_destination']
for column in columns_to_drop:
    if column in df_all.columns:
        df_all.drop(column, axis=1, inplace=True)

####################################################
# Add data from sessions.csv #
####################################################
# Import sessions data
s_filepath = "./sessions.csv"
sessions = pd.read_csv(s_filepath)

# Determine primary device
print("Determining primary device...")
sessions_device = sessions[['user_id', 'device_type', 'secs_elapsed']]
aggregated_lvl1 = sessions_device.groupby(['user_id', 'device_type'], as_index=False, sort=False).aggregate(np.sum)
idx = aggregated_lvl1.groupby(['user_id'], sort=False)['secs_elapsed'].transform(max) == aggregated_lvl1['secs_elapsed']
df_primary = aggregated_lvl1.loc[idx, ['user_id', 'device_type', 'secs_elapsed']].copy()
df_primary.rename(columns = {'device_type':'primary_device', 'secs_elapsed':'primary_secs'}, inplace=True)
df_primary = convert_to_binary(df=df_primary, column_to_convert='primary_device')
df_primary.drop('primary_device', axis=1, inplace=True)

# Determine Secondary device
print("Determining secondary device...")
remaining = aggregated_lvl1.drop(aggregated_lvl1.index[idx])
idx = remaining.groupby(['user_id'], sort=False)['secs_elapsed'].transform(max) == remaining['secs_elapsed']
df_secondary = remaining.loc[idx, ['user_id', 'device_type', 'secs_elapsed']].copy()
df_secondary.rename(columns = {'device_type':'secondary_device', 'secs_elapsed':'secondary_secs'}, inplace=True)
df_secondary = convert_to_binary(df=df_secondary, column_to_convert='secondary_device')
df_secondary.drop('secondary_device', axis=1, inplace=True)

# Aggregate and combine actions taken columns
print("Aggregating actions taken...")
session_actions = sessions[['user_id', 'action', 'action_type', 'action_detail']]
columns_to_convert = ['action', 'action_type', 'action_detail']
session_actions = session_actions.fillna('not provided')
first = True

for column in columns_to_convert:
    print("Converting " + column + " column...")
    current_data = convert_to_counts(df=session_actions, id_col='user_id', column_to_convert=column)

    # If first loop, current data becomes existing data, otherwise merge existing and current
    if first:
        first = False
        actions_data = current_data
    else:
        actions_data = pd.concat([actions_data, current_data], axis=1, join='inner')

# Merge device datasets
print("Combining results...")
df_primary.set_index('user_id', inplace=True)
df_secondary.set_index('user_id', inplace=True)
device_data = pd.concat([df_primary, df_secondary], axis=1, join="outer")

# Merge device and actions datasets
combined_results = pd.concat([device_data, actions_data], axis=1, join='outer')
df_sessions = combined_results.fillna(0)

# Merge user and session datasets
df_all.set_index('id', inplace=True)
df_all = pd.concat([df_all, df_sessions], axis=1, join='inner')

####################################################
# Building Model #
####################################################
# Prepare training data for modelling
df_train.set_index('id', inplace=True)
df_train = pd.concat([df_train['country_destination'], df_all], axis=1, join='inner')

id_train = df_train.index.values
labels = df_train['country_destination']
le = LabelEncoder()
y = le.fit_transform(labels)
X = df_train.drop('country_destination', axis=1, inplace=False)

# Training model
print("Training model...")

# Grid Search - Used to find best combination of parameters
XGB_model = xgb.XGBClassifier(objective='multi:softprob', subsample=0.5, colsample_bytree=0.5, seed=0)
param_grid = {'max_depth': [3, 4], 'learning_rate': [0.1, 0.3], 'n_estimators': [25, 50]}
model = grid_search.GridSearchCV(estimator=XGB_model, param_grid=param_grid, scoring='accuracy', verbose=10, n_jobs=1, iid=True, refit=True, cv=3)

model.fit(X, y)
print("Best score: %0.3f" % model.best_score_)
print("Best parameters set:")
best_parameters = model.best_estimator_.get_params()
for param_name in sorted(param_grid.keys()):
    print("\t%s: %r" % (param_name, best_parameters[param_name]))

####################################################
# Make predictions #
####################################################
print("Making predictions...")

# Prepare test data for prediction
df_test.set_index('id', inplace=True)
df_test = pd.merge(df_test[['date_first_booking']], df_all, how='left', left_index=True, right_index=True, sort=False)
X_test = df_test.drop('date_first_booking', axis=1, inplace=False)
X_test = X_test.fillna(-1)
id_test = df_test.index.values

# Make predictions
y_pred = model.predict_proba(X_test)

#Taking the 5 classes with highest probabilities
ids = [] #list of ids
cts = [] #list of countries
for i in range(len(id_test)):
    idx = id_test[i]
    ids += [idx] * 5
    cts += le.inverse_transform(np.argsort(y_pred[i])[::-1])[:5].tolist()

#Generate submission
print("Outputting final results...")
sub = pd.DataFrame(np.column_stack((ids, cts)), columns=['id', 'country'])
sub.to_csv('./submission.csv',index=False)
```

This article is Part V in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read the earlier parts – (Part I, Part II, Part III and Part IV).

Continuing on the walkthrough, in this part we take the data from sessions.csv that we left aside initially and add it to the transformed and expanded data from Part IV.  This part will cover, in brief, all the steps in Parts II – IV.

## Understanding the Data

As we did for the user data in train_users_2.csv, the first step here is to understand what the data in sessions.csv looks like. Although this file, with over 10 million rows, is too large to display in its entirety in Excel[1], we can still open the file using Excel to get an understanding of what columns we have and what at least the first million rows of data look like. Some sample rows are provided below:

| user_id | action | action_type | action_detail | device_type | secs_elapsed |
| --- | --- | --- | --- | --- | --- |
| 4rvqpxoh3h | campaigns | -unknown- | -unknown- | iPhone | 375 |
| 4rvqpxoh3h | active | -unknown- | -unknown- | iPhone | 728 |
| 4rvqpxoh3h | create | -unknown- | -unknown- | iPhone |  |
| 4rvqpxoh3h | listings | -unknown- | -unknown- | iPhone | 154 |
| 4rvqpxoh3h | unavailabilities | -unknown- | -unknown- | iPhone | 204 |
| 4rvqpxoh3h | index | -unknown- | -unknown- | iPhone | 21 |
| 4rvqpxoh3h | index | -unknown- | -unknown- | iPhone | 886 |
| xwxei6hdk4 | dashboard | view | dashboard | iPhone | 1355 |

As can be seen, the dataset contains records of user actions, with each row representing one action a user took. Every time a user reviewed search results, updated a wish list or updated their account information, a new row was created in this dataset. Although this data is likely to be very useful for our goal of predicting which country a user will make their first booking in, it also complicates the process of combining this data with the data from train_users_2.csv, as it will have to be aggregated so that there is one row per user (as opposed to many rows for each user, currently).

Aside from details of the actions taken, there are a couple of interesting fields in this data. The first is device_type – this field contains the type of device used for the specified action. The second is the secs_elapsed field, which shows us how long (in seconds) the user spent on a particular action.

Both of these fields provide us with potentially important information that could help to more accurately predict which country a user will make a first booking in. For example, it is not difficult to imagine that people spending relatively little time to make a booking on a phone are likely to be making bookings in locations closer to home (i.e. the US) than someone spending more time to make a booking on a desktop computer. Of course this is just a theory that needs to be proven, but it is a good reason to ensure we are capturing this information in our final training dataset.

## Cleaning and Transforming the Data

Now that we have a basic understanding of the data, we need to undertake the cleaning and transformation steps. Because of the structure of this data (and for the sake of brevity), we are going to do both of these things at the same time.

The first step is to import the data:

``````# Import sessions data
s_filepath = "./sessions.csv"
sessions = pd.read_csv(s_filepath)``````

### Extract the primary and secondary devices for each user

Remembering that we need to get the final data into a format that can be merged with the data created in Part IV (i.e. a dataset where one row equals one user), the first piece of information we are going to extract is the primary and secondary device for each user. How do we determine what a user’s primary and secondary devices are? We look at how much time they spent on each device. In short, we total the seconds elapsed for each user and device combination, take the device with the most time as that user’s primary device, and take the device with the second most time as their secondary device.

One thing to note as we make these transformations is that by aggregating the data this way, we are also implicitly removing the missing values. The code to do this transformation is shown below:

``````# Determine primary device
print("Determining primary device...")
sessions_device = sessions[['user_id', 'device_type', 'secs_elapsed']]
aggregated_lvl1 = sessions_device.groupby(['user_id', 'device_type'], as_index=False, sort=False).sum()
idx = aggregated_lvl1.groupby(['user_id'], sort=False)['secs_elapsed'].transform(max) == aggregated_lvl1['secs_elapsed']
df_primary = aggregated_lvl1.loc[idx, ['user_id', 'device_type', 'secs_elapsed']].copy()
df_primary.rename(columns = {'device_type':'primary_device', 'secs_elapsed':'primary_secs'}, inplace=True)
df_primary = convert_to_binary(df=df_primary, column_to_convert='primary_device')
df_primary.drop('primary_device', axis=1, inplace=True)

# Determine Secondary device
print("Determining secondary device...")
remaining = aggregated_lvl1.drop(aggregated_lvl1.index[idx])
idx = remaining.groupby(['user_id'], sort=False)['secs_elapsed'].transform(max) == remaining['secs_elapsed']
df_secondary = remaining.loc[idx, ['user_id', 'device_type', 'secs_elapsed']].copy()
df_secondary.rename(columns = {'device_type':'secondary_device', 'secs_elapsed':'secondary_secs'}, inplace=True)
df_secondary = convert_to_binary(df=df_secondary, column_to_convert='secondary_device')
df_secondary.drop('secondary_device', axis=1, inplace=True)``````

### Determine Counts of Actions

The next thing we are going to do is take counts of how many times each action was taken by each user. This is a two-step process.

#### Step 1

The first step is to determine the count of each action type for each user – that is, grouping the data by user and action and counting how many times each combination occurs.
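To make this first step concrete, here is a minimal, hypothetical sketch – the tiny DataFrame and its values are invented for illustration, and in the full code further below this logic sits inside the convert_to_counts function:

``````
import pandas as pd

# Hypothetical sessions data: one row per action taken by a user
sessions_sample = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u1', 'u2'],
    'action': ['search', 'search', 'book', 'search'],
})

# Step 1: group by user and action, then count how often each combination occurs
counts = sessions_sample.groupby(['user_id', 'action']).size().reset_index(name='count')
print(counts)  # one row per (user_id, action) pair with its count
``````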

#### Step 2

For you Excel buffs out there, the second step might strike you as something that could be achieved using a pivot table – and you would be right. In fact, the custom function that we use to make this transformation uses a pandas method called ‘pivot’. This is important to note for a couple of reasons. The first is that, with all the talk about new data, people who have worked with data mostly (or entirely) using ‘old technology’ like Excel and SQL are often given the impression that their skills are redundant or not useful in modern data science. As this example shows, the ways of thinking about data that you develop working with Excel and SQL are not only relevant, but often extremely useful.

The second reason is that for people (like me) who do not know all the methods available for pandas dataframes off by heart, being able to identify techniques you have used in other programs and languages provides you with a way to find corresponding methods in new languages. I discovered this method by searching for “pandas pivot”, knowing that this way of manipulating data was likely to have some equivalent in pandas.

### Looping Through the Actions Columns

Looking at the examples above, you may have realized that the transformation as shown only works for one action column at a time, but in the data we have three action columns: action, action_type and action_detail.

To handle the multiple action columns, we repeat these steps for each column individually, effectively creating three separate tables. Because we have now created tables where each row represents one user, we can now join (another concept SQL users will be very familiar with) these three tables together on the basis of the user id. The full code for these steps is shown below:

``````# Count occurrences of value in a column
def convert_to_counts(df, id_col, column_to_convert):
    id_list = df[id_col].drop_duplicates()

    df_counts = df[[id_col, column_to_convert]]
    df_counts['count'] = 1
    df_counts = df_counts.groupby(by=[id_col, column_to_convert], as_index=False, sort=False).sum()

    new_df = df_counts.pivot(index=id_col, columns=column_to_convert, values='count')
    new_df = new_df.fillna(0)

    # Rename Columns
    categories = list(df[column_to_convert].drop_duplicates())
    for category in categories:
        cat_name = str(category).replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert + '_' + cat_name
        new_df.rename(columns={category: col_name}, inplace=True)

    return new_df

# Aggregate and combine actions taken columns
print("Aggregating actions taken...")
session_actions = sessions[['user_id', 'action', 'action_type', 'action_detail']].copy()
columns_to_convert = ['action', 'action_type', 'action_detail']
session_actions = session_actions.fillna('not provided')
first = True

for column in columns_to_convert:
    print("Converting " + column + " column...")
    current_data = convert_to_counts(df=session_actions, id_col='user_id', column_to_convert=column)

    # If first loop, current data becomes existing data, otherwise merge existing and current
    if first:
        first = False
        actions_data = current_data
    else:
        actions_data = pd.concat([actions_data, current_data], axis=1, join='inner')``````

## Combine Data Sets

The final steps are to combine the various datasets we have created into one large dataset. First we combine the two device dataframes (df_primary and df_secondary) to create a device dataframe. Then we combine the device dataframe with the actions dataframe to create a sessions dataframe with all the features we extracted from sessions.csv. Finally, we combine the sessions dataframe with the user data dataframe from Part IV. The code for the various combinations is shown below:

``````# Merge device datasets
print("Combining results...")
df_primary.set_index('user_id', inplace=True)
df_secondary.set_index('user_id', inplace=True)
device_data = pd.concat([df_primary, df_secondary], axis=1, join="outer")

# Merge device and actions datasets
combined_results = pd.concat([device_data, actions_data], axis=1, join='outer')
df_sessions = combined_results.fillna(0)

# Merge user and session datasets
df_all.set_index('id', inplace=True)
df_all = pd.concat([df_all, df_sessions], axis=1, join='inner')``````

### A Note on Joins

For those that can read a little bit of code and are familiar with joins in SQL, you may be asking why I am using (full) outer joins for the first two combinations, but an inner join for the final step[2].

The first step requires an outer join because not all users have a secondary device. That is, some users only logged onto Airbnb using one device (or at least one type of device). Doing an outer join here ensures that our dataset includes all users regardless of this fact.

The second step could use an inner or an outer join, as both the device and actions datasets should contain all users. In this case we use an outer join just to ensure that if a user is missing from one of the datasets (for whatever reason), we will still capture them. You may also notice that after the second step we fill any missing values with 0s to ensure we do not have any NULL values that may have been generated by these outer joins.

For the third step we use an inner join for a key reason – we want our final training dataset to only include users that also have sessions data. Using an inner join here is an easy way to join the datasets and filter for the users with sessions data in one step.
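To illustrate the difference between these joins, here is a small, hypothetical sketch using pd.concat – the frames and values below are invented purely for illustration:

``````
import pandas as pd

# Two tiny, hypothetical datasets indexed by user id
device_data = pd.DataFrame({'primary_secs': [100, 200]}, index=['u1', 'u2'])
actions_data = pd.DataFrame({'action_search': [3]}, index=['u1'])

# Outer join: keeps every user found in either dataset (gaps become NaN)
outer = pd.concat([device_data, actions_data], axis=1, join='outer')

# Inner join: keeps only users present in both datasets
inner = pd.concat([device_data, actions_data], axis=1, join='inner')

print(outer)  # u1 and u2, with NaN for u2's action_search
print(inner)  # u1 only
``````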

## Wrapping Up

In the first four parts of this series, we looked in detail at some of the various steps in the process of building a model. Although these steps should be distinct thought processes that occur for each model building process, hopefully what this article provides is an insight into how some of these steps can be combined if planned out carefully. In relatively few steps, we have taken a dataset containing 10 million rows of user actions data, cleaned it, extracted a bunch of important information, and added it to our user data, ready for training a model.

The other important thing to take away from this article is how useful ‘old school’ ways of thinking about data still are. For all the talk about unstructured data and NoSQL databases, the fact is that knowing how to work with and manipulate old fashioned columns and rows is still as important as ever. Whether it is joins and aggregation in SQL, pivot tables and VLOOKUPS in Excel, or just the general concept of relational data, not only is that knowledge relevant, but it is often extremely useful.

## Next Time

In the next piece, we will finally get to the good stuff and train the algorithm to make the final predictions.

[1] Nope, still doesn’t qualify as ‘Big Data’…

[2] For those that do not understand what I mean by inner and outer joins (and are interested in knowing) – stackoverflow comes to the rescue again with this great illustrated answer.

This article on data transformation and feature extraction is Part IV in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read Part I, Part II and Part III.

Continuing on the walkthrough, in this part we focus on getting the data we cleaned in Part III ready for use in the classification algorithm. These steps are often referred to as data transformation and feature extraction.

## Data Transformation and Feature Extraction as a Concept

The main purpose of data transformation and feature extraction is to enhance the data in such a way that it increases the likelihood that the classification algorithm will be able to make meaningful predictions. Unlike the steps taken during cleaning, which are designed to address problems with the raw data (missing and erroneous values, formatting issues etc.), these steps change the values and/or structure of the data (data transformation) and add additional features (feature extraction).

As you might imagine, this is quite an open-ended process, and hence a lot of the value that data scientists provide comes from these steps. There is no textbook or walkthrough that can tell you exactly what steps you should take for a given dataset – that knowledge can come only from experience, curiosity and trial and error. However, we can take a look at some common methods to provide a sense of what is possible. Please keep in mind this is not an exhaustive list of options.

### Data Transformation

Covering the steps taken to modify the data, data transformation is undertaken with the intention of enhancing the ability of the classification algorithm to extract information from the data. Below are a few common data transformation methods.

#### Bucketing/Binning

A common method for manipulating numeric data, binning or bucketing is when the numerical values in a particular column are converted from a continuous series into fixed ranges. For example, instead of using the age value of all our users, we could place them into buckets such as 15-20 years old, 21-25 years old and so on.

Typically this technique is used to manage ‘noisy data’. To understand what this means, think of the movements of the stock market over time: it goes up and down on an almost daily basis. However, if you are trying to predict the overall direction of the stock market over the next 6 months, these daily movements become kind of irrelevant – what you really want your model to focus on are the movements over longer periods of time. What is more, the essentially random daily movements in stock prices may actually confuse your prediction model – causing less accurate predictions. In this example, the daily movements are the noise and what you want to extract (the longer term direction of the market) is ‘the signal’.

The same logic can be applied to any numerical field in your dataset. If you are concerned that small changes in a given value may simply be representing random ‘noise’, you may want to consider bucketing/binning to remove that noise.
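As a quick, hypothetical sketch of what bucketing looks like in pandas – the ages and bucket boundaries below are invented for illustration:

``````
import pandas as pd

# Hypothetical ages bucketed into fixed ranges
ages = pd.Series([18, 23, 31, 44, 67])
buckets = pd.cut(ages, bins=[15, 20, 25, 35, 50, 90],
                 labels=['15-20', '21-25', '26-35', '36-50', '51-90'])
print(buckets.tolist())  # ['15-20', '21-25', '26-35', '36-50', '51-90']
``````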

#### Normalization

Although normalization can take on a large number of meanings depending on the context, the type of normalization being referred to here is the statistical type – converting the values of a column into a ‘normalized’ range. This could be translating heights from centimeter values anywhere from 100cm to 220cm to a scale where 0 represents the average (mean) height for your dataset and -1/+1 represent one standard deviation from that average. It could also be translating those heights into a range of values from 0 to 1, where 0 is the lowest value in your dataset and 1 is the maximum value. There are a number of other methods that can be used here as well.

This type of transformation is more important for certain types of algorithms than others. For some algorithms – like the one we will be using – this type of transformation is not typically necessary. But for other algorithms, the magnitude of the values in each column will impact the calculations. In these cases, it is optimal to convert (‘normalize’) the values in each column onto the same scale to ensure each column is treated equally. For a more detailed explanation on this subject, this answer from Quora is a good place to start.
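As a rough sketch of the two variants described above – the heights are invented for illustration:

``````
import pandas as pd

# Hypothetical heights in centimeters
heights = pd.Series([150, 160, 170, 180, 190])

# Z-score normalization: 0 is the mean, +/-1 is one standard deviation from the mean
z_scores = (heights - heights.mean()) / heights.std()

# Min-max scaling: 0 is the smallest value, 1 is the largest value
min_max = (heights - heights.min()) / (heights.max() - heights.min())
``````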

#### Other Mathematical Transformations

In a similar manner to normalization, there is an almost unlimited number of ways that the numerical values of a given column can be transformed such that they are more suitable for the algorithm being used.

To provide one example, arguably the most common transformation (other than normalization) is to use a logarithm function. This transformation is a commonly used method of dealing with exponential data series (i.e. a column where there are a lot of low values and relatively few high values). For those wanting to understand this transformation better, the Wikipedia page on this topic has a great illustrated example.
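A minimal sketch of this transformation using NumPy – the values are invented to mimic a skewed series:

``````
import numpy as np

# Hypothetical skewed data: many small values and a few very large ones
values = np.array([1, 2, 3, 10, 10000])

# log1p (log(1 + x)) compresses the large values while preserving their order
log_values = np.log1p(values)
``````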

As I am hemorrhaging readers at this point, I won’t go into detail on the various other transformations possible – the key point is to be aware that there is a large range of possibilities here depending on your needs.

#### One Hot Encoding

Looking at one more example, and the most relevant one for our Kaggle competition, this transformation is one used for categorical data. What this transformation does is take one column with x categories (x must be greater than 2 for this to make sense) and convert it into x columns where each column represents one category in the original column. An illustrated example is shown below:
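As a quick sketch of the idea, using a hypothetical ‘gender’ column with three categories – the column and values here are just an example:

``````
import pandas as pd

# Hypothetical column with three categories
df = pd.DataFrame({'gender': ['FEMALE', 'MALE', '-unknown-', 'FEMALE']})

# One hot encoding: one new 0/1 column per category
encoded = pd.get_dummies(df, columns=['gender'])
print(encoded.columns.tolist())
# ['gender_-unknown-', 'gender_FEMALE', 'gender_MALE']
``````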

For those familiar with regression modeling, you may recognize this as the same process of creating dummy variables.

Again there are a few reasons for doing this type of transformation. Some algorithms are structured in such a way that they do not handle categorical data very well – particularly when the categories do not have an inherent order (this answer on Stack Overflow does a good job of explaining why). Some other types of algorithms require numerical data to function. The only way to work out whether this transformation will be beneficial is to either read through the documentation for the algorithm you are using or to test it yourself.

### Feature Extraction

Often broken down into sub steps of feature construction and feature selection, here we will focus on feature construction. Below are a couple of ways additional features can be constructed and added to your dataset.

#### Using Hierarchical Information

It will sometimes be the case that data in your dataset represents one level of a particular hierarchy, and that extracting the other implied levels of that hierarchy will provide the model with useful information.

For example, imagine a dataset with a column containing countries. This column allows the algorithm to look for patterns (in combination with all other columns) at the country level. However, by adding a new ‘region’ column based on the country column (Europe, South Asia, North Africa etc.), you may be providing information to the algorithm that allows it to look for patterns across countries.
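A rough sketch of this idea – the country-to-region mapping below is purely illustrative:

``````
import pandas as pd

# Hypothetical country column enriched with a higher level of the hierarchy
df = pd.DataFrame({'country': ['FR', 'DE', 'US', 'CA']})
region_map = {'FR': 'Europe', 'DE': 'Europe', 'US': 'North America', 'CA': 'North America'}
df['region'] = df['country'].map(region_map)
``````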

One of the most common ways to do this is with date fields. Take the date fields in the dataset we are working with as an example. By extracting the day of the week, the month of the year or the hour of the day, we could add important information for the algorithm to use. Maybe people who create their accounts in summer months are more likely to make a booking in a warmer country. Maybe people who were first active late at night are more disorganized travelers and are therefore more likely to make a domestic first booking. Additionally, it could be any combination of these factors that makes the difference (e.g. users first active late at night, in the summer months, on a weekday are more likely to travel to Portugal). The point is not to be able to explain why a factor may be important, but to think of as many factors as possible to test, and allow the algorithm to determine what is important and not important.

#### Adding External Data

One of the aspects of feature extraction that often gets overlooked is how data can be enriched through the addition of new external data. Using techniques such as record linkage, existing datasets can be greatly expanded by adding new data points for a given record. This new data often provides valuable new information that the algorithm can use to make more accurate predictions.

For example, a training dataset that contains a column with countries could be enriched with demographic data about the country such as population, income per capita or land area – all factors that may allow the algorithm to draw conclusions across similar groups of countries on any of those measures.
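As a sketch of how this kind of enrichment might look in pandas – the demographic figures below are invented for illustration:

``````
import pandas as pd

# Hypothetical user records and a country-level demographics lookup table
users = pd.DataFrame({'id': ['u1', 'u2'], 'country': ['FR', 'US']})
demographics = pd.DataFrame({'country': ['FR', 'US'],
                             'population_m': [67, 331],
                             'income_per_capita': [43000, 65000]})

# Enrich each user record with the demographic data for their country
enriched = users.merge(demographics, on='country', how='left')
``````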

Relating this concept to the competition we are working through, consider how much more accurately we could predict a first booking country of a user if we could link the data from their Airbnb profile to data from one of their social media profiles (Facebook, Twitter etc.) or even better, from a Tripadvisor or Expedia account.

The key point here is that it is worth investing time looking for ways to add new and useful data to your existing dataset before moving onto the modeling step. Expanding your dataset in this manner will often produce far bigger improvements in prediction accuracy than the choice of algorithm or the tuning of the algorithm parameters.

## The Importance of Domain Knowledge

One of the things that may have occurred to you as you read through the various ways to modify and expand a dataset is: how are you supposed to know what will help and what will not?

This is where knowledge about the data you are using and what it represents becomes so important. This knowledge – referred to as domain knowledge – helps guide this entire process, including what was covered in Part III, cleaning the data.

Understanding how the data was collected helps to provide insight into potential errors in the data that might need to be addressed or shortcomings in the way the data was sampled (sample selection bias/errors). Understanding the relevant industry or market can also provide a range of insights including:

• what information may help to increase prediction accuracy and what is likely to be irrelevant
• if the model makes intuitive sense (e.g. can you predict the likelihood of waking up with a headache based on whether someone slept with their shoes on?[1]), and
• if the industry or market is changing in such a way that it is likely to make the model redundant in the near future.

In practical terms, where does this leave aspiring data scientists?

The first thing is to realize that, obviously, it is not possible to be a domain expert for every domain. Acknowledging this limitation is important as it forces a second realization – you will almost always need to seek out this expertise. For most of us that means involving and utilizing people who are domain experts when constructing your dataset and model. Having access to that expertise is likely to be the difference between a model that gets thrown out in 6 months and one that fundamentally improves a business and/or fulfills a customer need.

## Step by Step

After all the theory, let’s put some of these techniques into practice.

### Transforming Categorical Data

The first step we are going to undertake is some One Hot Encoding – replacing the categorical fields in the dataset with multiple columns representing one value from each column.

To do this, the Scikit Learn library comes with a One Hot Encoder that we could use for these transformations, but it is often instructive to write your own function, particularly if it is a relatively simple one like this. The code snippet below creates a simple function to do the encoding for a specified column, and then uses that function in a loop to convert all the categorical columns (and then delete the original columns).

``````# Home made One Hot Encoding function
def convert_to_binary(df, column_to_convert):
    categories = list(df[column_to_convert].drop_duplicates())

    for category in categories:
        cat_name = str(category).replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_").replace("-", "").lower()
        col_name = column_to_convert[:5] + '_' + cat_name[:10]
        df[col_name] = 0
        df.loc[(df[column_to_convert] == category), col_name] = 1

    return df

# One Hot Encoding
print("One Hot Encoding categorical data...")
columns_to_convert = ['gender', 'signup_method', 'signup_flow', 'language', 'affiliate_channel', 'affiliate_provider', 'first_affiliate_tracked', 'signup_app', 'first_device_type', 'first_browser']

for column in columns_to_convert:
    df_all = convert_to_binary(df=df_all, column_to_convert=column)
    df_all.drop(column, axis=1, inplace=True)``````

### Creating New Features

From Part II of this series, one of the things we observed about the training (and test) datasets is that there is not a huge number of columns to work with. This limits what new features we can add based on the existing data. However, two fields that can be used to create some new features are the two date fields – date_account_created and timestamp_first_active. We want to extract all the information we can out of these two date fields that could potentially differentiate which country someone will make their first booking in. The code for extracting a range of different data points from these two date columns (and then deleting the original date columns) is shown below:

``````# Add new date related fields
df_all['day_account_created'] = df_all['date_account_created'].dt.weekday
df_all['month_account_created'] = df_all['date_account_created'].dt.month
df_all['quarter_account_created'] = df_all['date_account_created'].dt.quarter
df_all['year_account_created'] = df_all['date_account_created'].dt.year
df_all['hour_first_active'] = df_all['timestamp_first_active'].dt.hour
df_all['day_first_active'] = df_all['timestamp_first_active'].dt.weekday
df_all['month_first_active'] = df_all['timestamp_first_active'].dt.month
df_all['quarter_first_active'] = df_all['timestamp_first_active'].dt.quarter
df_all['year_first_active'] = df_all['timestamp_first_active'].dt.year
df_all['created_less_active'] = (df_all['date_account_created'] - df_all['timestamp_first_active']).dt.days

# Drop unnecessary columns
columns_to_drop = ['date_account_created', 'timestamp_first_active', 'date_first_booking', 'country_destination']
for column in columns_to_drop:
    if column in df_all.columns:
        df_all.drop(column, axis=1, inplace=True)``````

## Wrapping Up

In two relatively simple steps, we have changed our training dataset from 14 columns to 163 columns. Although this seems like a lot more information, most of this expansion was caused by the One Hot Encoding, which is not adding more information, but simply expanding out the existing information. We have not added any external data, and I didn’t even really investigate what information we could have extracted from the other non-date columns.

Again, this process is open ended, so there is an almost unlimited range of possibilities that we have not even really begun to explore. As such, if you see an additional transformation or have an idea for the addition of a new feature, please feel free to let me know in a comment!

## Next Time

In the next piece, we will look at the data in sessions.csv that we left aside initially and see how we can add that data to our training dataset.

[1] This is an example of the existence of a confounding factor. A model predicting whether someone will wakeup with a headache based on whether they slept with their shoes on ignores that there is a more logical explanation for the headaches – in this case that both the headaches and sleeping with shoes on are caused by a third factor – going to bed drunk.

This article on cleaning data is Part III in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, it is recommended that you go back and read Part I and Part II.

In this part we will focus on cleaning the data provided for the Airbnb Kaggle competition.

## Cleaning Data

When we talk about cleaning data, what exactly are we talking about? Generally when people talk about cleaning data, there are a few specific things they are referring to:

1. Fixing up formats – Often when data is saved or translated from one format to another (for example in our case from CSV to Python), some data may not be translated correctly. We saw a good example of this in the last article: the timestamp_first_active column contained numbers like 20090609231247 instead of timestamps in the expected format: 2009-06-09 23:12:47. A typical job when it comes to cleaning data is correcting these types of issues.
2. Filling in missing values – As we also saw in Part II, it is quite common for some values to be missing from datasets. This typically means that a piece of information was simply not collected. There are several options for handling missing data that will be covered below.
3. Correcting erroneous values – For some columns, there are values that can be identified as obviously incorrect. This may be a ‘gender’ column where someone has entered a number, or an ‘age’ column where someone has entered a value well over 100. These values either need to be corrected (if the correct value can be determined) or assumed to be missing.
4. Standardizing categories – More of a subcategory of ‘correcting erroneous values’, this type of data cleansing is so common it is worth special mention. In many (all?) cases where data is collected from users directly – particularly using free text fields – spelling mistakes, language differences or other factors will result in a given answer being provided in multiple ways. For example, when collecting data on country of birth, if users are not provided with a standardized list of countries, the data will inevitably contain multiple spellings of the same country (e.g. USA, United States, U.S. and so on). One of the main cleaning tasks often involves standardizing these values to ensure that there is only one version of each value.
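To illustrate that last point, here is a minimal sketch of standardizing category values in pandas – the spellings and the mapping are just an example:

``````
import pandas as pd

# Hypothetical free text country values with multiple spellings of the same country
df = pd.DataFrame({'country_of_birth': ['USA', 'United States', 'U.S.', 'Canada']})

# Map every variant to a single standardized value
mapping = {'USA': 'US', 'United States': 'US', 'U.S.': 'US', 'Canada': 'CA'}
df['country_of_birth'] = df['country_of_birth'].replace(mapping)
``````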

## Options for Dealing with Missing Data

Missing data in general is one of the trickier issues that is dealt with when cleaning data. Broadly there are two solutions:

### 1. Deleting/Ignoring rows with missing values

The simplest solution available when faced with missing values is to not use the records with missing values when training your model. However, there are some issues to be aware of before you start deleting masses of rows from your dataset.

The first is that this approach only makes sense if the number of rows with missing data is relatively small compared to the dataset. If you are finding that you will be deleting more than around 10% of your dataset due to rows having missing values, you may need to reconsider.

The second issue is that in order to delete the rows containing missing data, you have to be confident that the rows you are deleting do not contain information that is not contained in other rows. For example, in the current Airbnb dataset we have seen that many users have not provided their age. Can we assume that the people who chose not to provide their age are the same as the users who did? Or are they likely to represent a different type of user, perhaps an older and more privacy conscious user, and therefore a user that is likely to make different choices on which countries to visit? If the answer is the latter, we probably do not want to just delete the records.
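If, after considering these issues, deleting is still the right choice, a quick sketch of how it might look in pandas – the columns and values here are hypothetical:

``````
import pandas as pd

# Hypothetical data with a missing age value
df = pd.DataFrame({'age': [34.0, None, 29.0],
                   'country_destination': ['US', 'NDF', 'FR']})

# Keep only the rows where 'age' is present
df_complete = df.dropna(subset=['age'])
``````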

### 2. Filling in the Values

The second broad option for dealing with missing data is to fill the missing values with a value. But what value to use? This depends on a range of factors, including the type of data you are trying to fill.

If the data is categorical (i.e. countries, device types, etc.), it may make sense to simply create a new category that will represent ‘unknown’. Another option may be to fill the values with the most common value for that column (the mode). However, because these are broad methods for filling the missing values, this may oversimplify your data and/or make your final model less accurate.

For numerical values (for example the age column) there are some other options. Given that in this case using the mode to fill values makes less sense, we could instead use the mean or median. We could even take an average based on some other criteria – for example filling the missing age values based on an average age for users that selected the same country_destination.
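A short sketch of some of these simple fill strategies – the values are invented for illustration:

``````
import pandas as pd

# Hypothetical data with missing categorical and numerical values
df = pd.DataFrame({'device': ['iPhone', None, 'Mac Desktop'],
                   'age': [34.0, None, 29.0],
                   'country_destination': ['US', 'FR', 'US']})

# Categorical: fill with a new 'unknown' category (or the mode)
df['device'] = df['device'].fillna('unknown')

# Numerical: fill with the overall median...
df['age'] = df['age'].fillna(df['age'].median())

# ...or fill based on a group average, e.g. the mean age per destination country
# df['age'] = df.groupby('country_destination')['age'].transform(lambda s: s.fillna(s.mean()))
``````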

For both types of data (categorical and numerical), we can also use far more complicated methods to impute the missing values. Effectively, we can use a similar methodology to the one we are planning to use to predict the country_destination to predict the values in any of the other columns, based on the columns that do have data. And just like with modeling in general, there is an almost endless number of ways this can be done, which won’t be detailed here. For more information on this topic, the Orange Python library provides some excellent documentation.

## Step by Step

With that general overview out of the way, let’s start cleaning the Airbnb data. In relation to the datasets provided for the Airbnb Kaggle competition, we will focus our cleaning efforts on two files – train_users_2.csv and test_users.csv – and leave aside sessions.csv.

The first step is to load the data from the CSV files using Python. To do this we will use the Pandas library and load the data from two files train_users_2.csv and test_users.csv. After loading, we will combine them into one dataset so that any cleaning (and later any other changes) will be done to all the data at once[1].

``````import pandas as pd

# Import data
tr_filepath = "./train_users_2.csv"
df_train = pd.read_csv(tr_filepath)
te_filepath = "./test_users.csv"
df_test = pd.read_csv(te_filepath)

# Combine into one dataset
df_all = pd.concat((df_train, df_test), axis=0, ignore_index=True)``````

### Clean the Timestamps

Once the data has been loaded and combined, the first cleaning step we will undertake is fixing the format of the dates – as we saw in Part II, at least one of the date columns looks like it is formatted as one long number. You may be wondering why this is necessary – after all, can’t we all see what the dates are supposed to represent when we look at the data?

The reason we need to convert the values in the date columns is that, if we want to do anything with those dates (e.g. subtract one date from another, extract the month of the year from each date etc.), it will be far easier if Python recognizes the values as dates. This will become much clearer next week when we start adding various new features to the training data based on this date information.

Luckily, fixing date formats is relatively easy. Pandas has a simple function, to_datetime, that will allow us to input a column and get the correctly formatted dates as a result. When using this function we also provide a parameter called ‘format’ that is like a regular expression for dates. In simpler terms, we are providing the function with a generalized form of the date so that it can interpret the data in the column. For example, for the date_account_created column we are telling the function to expect a four-digit year (%Y) followed by a ‘-’, then a two-digit month (%m), then ‘-’, then a two-digit day (%d) – altogether the expression would be ‘%Y-%m-%d’ (for the full list of directives that can be used, see here). For the timestamp_first_active column, the date format provided is different so we adjust our expression accordingly.

Once we have fixed the date formats, we simply replace the existing date columns with the corrected data. Finally, because the date_account_created column is sometimes empty, we replace the empty values with the value in the timestamp_first_active column using the fillna function. The code for this step is provided below:

``````# Change Dates to consistent format
print("Fixing timestamps...")
df_all['date_account_created'] = pd.to_datetime(df_all['date_account_created'], format='%Y-%m-%d')
df_all['timestamp_first_active'] = pd.to_datetime(df_all['timestamp_first_active'], format='%Y%m%d%H%M%S')
df_all['date_account_created'].fillna(df_all.timestamp_first_active, inplace=True)``````

### Remove booking date field

Those following along and/or paying attention may have noticed that in the original dataset, there are three date fields, but we have only covered two above. The remaining date field, date_first_booking, we are going to drop (remove) from the training data altogether. The reason is that this field is only populated for users who have made a booking. For the data in train_users_2.csv, all the users that have a first booking country have a value in the date_first_booking column and for those that have not made a booking (country_destination = NDF) the value is missing. However, for the data in test_users.csv, the date_first_booking column is empty for all the records.

This means that this column is not going to be useful for predicting which country a booking will be made in. What is more, if we leave it in the training dataset when building the model, it will likely increase the chances that the model predicts NDF, as those are the records without dates in the training dataset. The code for removing the column is provided below:

``````# Remove date_first_booking column
df_all.drop('date_first_booking', axis=1, inplace=True)``````

### Clean the Age column

As identified in Part II, there are several age values that are clearly incorrect (unreasonably high or too low). In this step, we replace these incorrect values with ‘NaN’, which literally stands for Not a Number, but implies we do not know the age value. In other words, we are changing the incorrect values into missing values. To do this, we create a simple function that takes a dataframe (table), a column name, a maximum acceptable value (90) and a minimum acceptable value (15). This function will then replace the values in the specified column that are outside the acceptable range with NaN.

Again from Part II we know there were also a significant number of users who did not provide their age at all – so they also show up as NaN in the dataset. After we have converted the incorrect age values to NaN, we then change all the NaN values to -1.

The code for these steps is shown below:

``````import numpy as np

# Remove outliers function
def remove_outliers(df, column, min_val, max_val):
    col_values = df[column].values
    df[column] = np.where(np.logical_or(col_values <= min_val, col_values >= max_val), np.NaN, col_values)
    return df

# Fixing age column
print("Fixing age column...")
df_all = remove_outliers(df=df_all, column='age', min_val=15, max_val=90)
df_all['age'].fillna(-1, inplace=True)``````

As mentioned earlier, there are several more complicated ways to fill in the missing values in the age column. We are selecting this simple method for two main reasons:

1. Clarity – this series of articles is going to be long enough without adding the complication of a complex methodology for imputing missing ages.
2. Questionable results – in my testing during the actual competition, I did test several more complex imputation methodologies. However, none of the methods I tested actually produced a better end result than the methodology outlined above.

### Identify and fill additional columns with missing values

From more detailed analysis of the data, you may have also realized there is one more column that has missing values – the first_affiliate_tracked column. In the same way we have been filling in the missing values in other columns, we now fill in the values in this column.

``````# Fill first_affiliate_tracked column
print("Filling first_affiliate_tracked column...")
df_all['first_affiliate_tracked'].fillna(-1, inplace=True)``````

### Sample Output

So what does the data look like after all these changes? Here is a sample of some rows from our cleaned dataset:

| id | affiliate_channel | affiliate_provider | age | country_destination | date_account_created | first_affiliate_tracked | first_browser | first_device_type | gender | language | signup_app | signup_flow | signup_method | timestamp_first_active |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4ft3gnwmtx | direct | direct | 56.0 | US | 2010-09-28 00:00:00 | untracked | IE | Windows Desktop | FEMALE | en | Web | 3 | basic | 2009-06-09 23:12:47 |
| 87mebub9p4 | direct | direct | 41.0 | US | 2010-09-14 00:00:00 | untracked | Chrome | Mac Desktop | -unknown- | en | Web | 0 | basic | 2009-12-08 06:11:05 |
| osr2jwljor | other | other | -1.0 | US | 2010-01-01 00:00:00 | omg | Chrome | Mac Desktop | -unknown- | en | Web | 0 | basic | 2010-01-01 21:56:19 |
| lsw9q7uk0j | other | craigslist | 46.0 | US | 2010-01-02 00:00:00 | untracked | Safari | Mac Desktop | FEMALE | en | Web | 0 | basic | 2010-01-02 01:25:58 |
| 0d01nltbrs | direct | direct | 47.0 | US | 2010-01-03 00:00:00 | omg | Safari | Mac Desktop | FEMALE | en | Web | 0 | basic | 2010-01-03 19:19:05 |
| a1vcnhxeij | other | craigslist | 50.0 | US | 2010-01-04 00:00:00 | untracked | Safari | Mac Desktop | FEMALE | en | Web | 0 | basic | 2010-01-04 00:42:11 |
| 6uh8zyj2gn | other | craigslist | 46.0 | US | 2010-01-04 00:00:00 | omg | Firefox | Mac Desktop | -unknown- | en | Web | 0 | basic | 2010-01-04 02:37:58 |
| yuuqmid2rp | other | craigslist | 36.0 | US | 2010-01-04 00:00:00 | untracked | Firefox | Mac Desktop | FEMALE | en | Web | 0 | basic | 2010-01-04 19:42:51 |
| om1ss59ys8 | other | craigslist | 47.0 | NDF | 2010-01-05 00:00:00 | untracked | -unknown- | iPhone | FEMALE | en | Web | 0 | basic | 2010-01-05 05:18:12 |
| k6np330cm1 | direct | direct | -1.0 | FR | 2010-01-05 00:00:00 | -1 | -unknown- | Other/Unknown | -unknown- | en | Web | 0 | basic | 2010-01-05 06:08:59 |
| ju3h98ch3w | other | craigslist | 36.0 | NDF | 2010-01-07 00:00:00 | untracked | Mobile Safari | iPhone | FEMALE | en | Web | 0 | basic | 2010-01-07 05:58:20 |
| v4d5rl22px | direct | direct | 33.0 | CA | 2010-01-07 00:00:00 | untracked | Chrome | Windows Desktop | FEMALE | en | Web | 0 | basic | 2010-01-07 20:45:55 |
| 2dwbwkx056 | other | craigslist | -1.0 | NDF | 2010-01-07 00:00:00 | -1 | -unknown- | Other/Unknown | -unknown- | en | Web | 0 | basic | 2010-01-07 21:51:25 |
| frhre329au | other | craigslist | 31.0 | US | 2010-01-07 00:00:00 | -1 | -unknown- | Other/Unknown | -unknown- | en | Web | 0 | basic | 2010-01-07 22:46:25 |
| gdka1q5ktd | direct | direct | 29.0 | FR | 2010-01-10 00:00:00 | untracked | Chrome | Mac Desktop | FEMALE | en | Web | 0 | basic | 2010-01-10 01:08:17 |

### Is that all?

Those more experienced with working with data may be thinking that we have not done all that much cleaning with this data – and you would be right. One of the nice things about Kaggle competitions is that the data provided does not require all that much cleaning as that is not what the providers of the data want participants to focus on. Many of the problems that would be found in real world data (as covered earlier) do not exist in this dataset, saving us significant time.

However, what this relatively easy cleaning process also tells us is that even when datasets are provided with the intention of needing no or minimal cleaning, there is always something that needs to be done.

## Next Time

In the next piece, we will focus on transforming the data and feature extraction, allowing us to create a training dataset that will hopefully allow the model to make better predictions. To make sure you don’t miss out, use the subscription feature below.

[1] For those with more data mining experience you may realize that combining the test and training data at this stage is not best practice. The best practice would be to avoid using the test dataset in any of the data preprocessing or model tuning/validation steps to avoid over fitting. However, in the context of this competition, because we are only trying to create the model to classify one unchanging dataset, simply maximizing the accuracy of the model for that dataset is the primary concern.

This article on understanding the data is Part II in a series looking at data science and machine learning by walking through a Kaggle competition. Part I can be found here.

Continuing on the walkthrough of data science via a Kaggle competition entry, in this part we focus on understanding the data provided for the Airbnb Kaggle competition.

## Reviewing the Data

In any process involving data, the first goal should always be understanding the data. This involves looking at the data and answering a range of questions including (but not limited to):

1. What features (columns) does the dataset contain?
2. How many records (rows) have been provided?
3. What format is the data in (e.g. how are the dates provided, which values are numerical, what do the different categorical values look like)?
4. Are there missing values?
5. How do the different features relate to each other?

For this competition, Airbnb have provided 6 different files. Two of these files provide background information (countries.csv and age_gender_bkts.csv), while sample_submission_NDF.csv provides an example of how the submission file containing our final predictions should be formatted. The three remaining files are the key ones:

1. train_users_2.csv – This dataset contains data on Airbnb users, including the destination countries. Each row represents one user, with the columns containing various information such as the users’ ages and when they signed up. This is the primary dataset that we will use to train the model.
2. test_users.csv – This dataset also contains data on Airbnb users, in the same format as train_users_2.csv, except without the destination country. These are the users for which we will have to make our final predictions.
3. sessions.csv – This file contains supplementary data that can be used to train the model and make the final predictions. It contains information about the actions (e.g. clicked on a listing, updated a wish list, ran a search, etc.) taken by the users in both the testing and training datasets above.

With this information in mind, an easy first step in understanding the data is reviewing the information provided by the data provider – Airbnb. For this competition, the information can be found here. The main points (aside from the descriptions of the columns) are as follows:

• All the users in the data provided are from the USA.
• There are 12 possible outcomes of the destination country: ‘US’, ‘FR’, ‘CA’, ‘GB’, ‘ES’, ‘IT’, ‘PT’, ‘NL’, ‘DE’, ‘AU’, ‘NDF’ (no destination found), and ‘other’.
• ‘other’ means there was a booking, but in a country not included in the list, while ‘NDF’ means there was not a booking.
• The training and test sets are split by dates. In the test set, you will predict the destination country for all the new users with first activities after 7/1/2014.
• In the sessions dataset, the data only dates back to 1/1/2014, while the training dataset dates back to 2010.

After absorbing this information, we can start looking at the actual data. For now we will focus on the train_users_2.csv file only.
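As a rough sketch of how this first look might be done in pandas (assuming the competition files have been downloaded into the working directory, with the file name as provided by Kaggle):

```python
import pandas as pd

# Load the primary training data provided by Kaggle
df = pd.read_csv("train_users_2.csv")

# Check the dimensions and column types first
print(df.shape)
print(df.dtypes)

# Transpose a few rows to get a view like Table 1 below
print(df.head(3).T)
```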

#### Table 1 – Three rows (transposed) from train_users_2.csv

| Column Name | Example 1 | Example 2 | Example 3 |
|---|---|---|---|
| id | 4ft3gnwmtx | v5lq9bj8gv | msucfwmlzc |
| date_account_created | 28/9/10 | 30/6/14 | 30/6/14 |
| timestamp_first_active | 20090609231247 | 20140630234429 | 20140630234729 |
| date_first_booking | 2/8/10 | | 16/3/15 |
| gender | FEMALE | -unknown- | MALE |
| age | 56 | | 43 |
| signup_method | basic | basic | basic |
| signup_flow | 3 | 25 | 0 |
| language | en | en | en |
| affiliate_channel | direct | direct | direct |
| affiliate_provider | direct | direct | direct |
| first_affiliate_tracked | untracked | untracked | untracked |
| signup_app | Web | iOS | Web |
| first_device_type | Windows Desktop | iPhone | Windows Desktop |
| first_browser | IE | -unknown- | Firefox |
| country_destination | US | NDF | US |

Looking at the sample of three records above provides us with a few key pieces of information about this dataset. The first is that at least two columns have missing values – the age column and date_first_booking column. This tells us that before we use this data for training a model, these missing values need to be filled or the rows excluded altogether. These options will be discussed in more detail in the next part of this series.
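To quantify this, a quick check with `isnull` does the job (a sketch, using the `df` loaded above):

```python
# Count missing values in the two columns identified above
print(df[['age', 'date_first_booking']].isnull().sum())

# Or check every column at once
print(df.isnull().sum())
```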

Secondly, most of the columns provided contain categorical data (i.e. the values represent one of a fixed number of categories). In fact, 11 of the 16 columns provided appear to be categorical. Most of the algorithms used for classification do not handle categorical data like this very well, so when it comes to the data transformation step, we will need to find a way to change this data into a form that is better suited to classification.
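One quick way to see which columns pandas has read in as categorical (`object`) data, sketched using the `df` from above:

```python
# Columns pandas has read in as strings/categories (dtype 'object')
categorical_cols = df.select_dtypes(include=['object']).columns
print(len(categorical_cols))
print(list(categorical_cols))
```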

Thirdly, the timestamp_first_active column looks to be a full timestamp, but in the format of a number. For example 20090609231247 looks like it should be 2009-06-09 23:12:47. This formatting will need to be corrected if we are to use the date values.
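One possible fix is to read the values as strings and parse them with an explicit format (a sketch using the `df` from above):

```python
# Parse the numeric timestamp (e.g. 20090609231247) into a proper datetime
df['timestamp_first_active'] = pd.to_datetime(
    df['timestamp_first_active'].astype(str), format='%Y%m%d%H%M%S'
)
print(df['timestamp_first_active'].head())
```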

## Diving Deeper

Now that we have gained a basic understanding of the data by looking at a few example records, the next step is to start looking at the structure of the data.

### Country Destination Values

Arguably, the most important column in the dataset is the one the model will try to predict – country_destination. Looking at the number of records that fall into each category can help provide some insights into how the model should be constructed as well as pitfalls to avoid.
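A breakdown like the one in Table 2 below can be generated with `value_counts` (a sketch using the `df` from above):

```python
# Absolute counts and percentage share of each destination
counts = df['country_destination'].value_counts()
shares = df['country_destination'].value_counts(normalize=True) * 100
print(pd.concat([counts, shares.round(1)], axis=1, keys=['Records', '% of Total']))
```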

#### Table 2 – Users by Destination

| Destination | Records | % of Total |
|---|---|---|
| NDF | 124,543 | 58.3% |
| US | 62,376 | 29.2% |
| other | 10,094 | 4.7% |
| FR | 5,023 | 2.4% |
| IT | 2,835 | 1.3% |
| GB | 2,324 | 1.1% |
| ES | 2,249 | 1.1% |
| CA | 1,428 | 0.7% |
| DE | 1,061 | 0.5% |
| NL | 762 | 0.4% |
| AU | 539 | 0.3% |
| PT | 217 | 0.1% |
| Grand Total | 213,451 | 100.0% |

Looking at the breakdown of the data, one thing that immediately stands out is that almost 90% of users fall into two categories, that is, they are either yet to make a booking (NDF) or they made their first booking in the US. What’s more, breaking down these percentage splits by year reveals that the percentage of users yet to make a booking increases each year and reached over 60% in 2014.
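The year-by-year split shown in Table 3 below can be reproduced with a crosstab against the account creation year (a sketch using the `df` from above):

```python
# Share of each destination within each account creation year
acct_year = pd.to_datetime(df['date_account_created']).dt.year
by_year = pd.crosstab(df['country_destination'], acct_year, normalize='columns') * 100
print(by_year.round(1))
```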

#### Table 3 – Users by Destination and Year

| Destination | 2010 | 2011 | 2012 | 2013 | 2014 | Overall |
|---|---|---|---|---|---|---|
| NDF | 42.5% | 45.4% | 55.0% | 59.2% | 61.8% | 58.3% |
| US | 44.0% | 38.1% | 31.1% | 28.9% | 26.7% | 29.2% |
| other | 2.8% | 4.7% | 4.9% | 4.6% | 4.8% | 4.7% |
| FR | 4.3% | 4.0% | 2.8% | 2.2% | 1.9% | 2.4% |
| IT | 1.1% | 1.7% | 1.5% | 1.2% | 1.3% | 1.3% |
| GB | 1.0% | 1.5% | 1.3% | 1.0% | 1.0% | 1.1% |
| ES | 1.5% | 1.7% | 1.2% | 1.0% | 0.9% | 1.1% |
| CA | 1.5% | 1.1% | 0.7% | 0.6% | 0.6% | 0.7% |
| DE | 0.6% | 0.8% | 0.7% | 0.5% | 0.3% | 0.5% |
| NL | 0.4% | 0.6% | 0.4% | 0.3% | 0.3% | 0.4% |
| AU | 0.3% | 0.3% | 0.3% | 0.3% | 0.2% | 0.3% |
| PT | 0.0% | 0.2% | 0.1% | 0.1% | 0.1% | 0.1% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

For modeling purposes, this type of split means a couple of things. Firstly, the spread of categories has changed over time. Considering that our final predictions will be made against user data from July 2014 onwards, this change provides us with an incentive to focus on more recent data for training purposes, as it is more likely to resemble the test data.

Secondly, because the vast majority of users fall into two categories, there is a risk that if the model is too generalized, or in other words not sensitive enough, it will select one of those two categories for every prediction. A key step will be making sure the training data contains enough information for the model to predict the other categories as well.

### Account Creation Dates

Let’s now move onto the date_account_created column to see how the values have changed over time.

#### Chart 1 – Accounts Created Over Time

Chart 1 provides excellent evidence of the explosive growth of Airbnb, averaging over 10% growth in new accounts created per month. In the year to June 2014, 125,884 new accounts were created – a 132% increase on the year before.

But aside from showing how quickly Airbnb has grown, this data also provides another important insight: the majority of the training data comes from the most recent two years. In fact, if we limited the training data to accounts created from January 2013 onwards, we would still be including over 70% of all the data. This matters because, referring back to the notes provided by Airbnb, if we want to use the data in sessions.csv we are limited to data from January 2014 onwards. Again looking at the numbers, this means that even though the sessions.csv data only covers 11% of the time period (6 out of 54 months), it still covers over 30% of the training data – or 76,466 users.
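These proportions are easy to check directly (a sketch using the `df` from above):

```python
# New accounts created per month (the data behind Chart 1)
created = pd.to_datetime(df['date_account_created'])
monthly = created.dt.to_period('M').value_counts().sort_index()
print(monthly.tail())

# Share of users whose accounts were created from Jan 2013 / Jan 2014 onwards
print((created >= '2013-01-01').mean())
print((created >= '2014-01-01').mean())
```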

### Age Breakdown

Looking at the breakdown by age, we can see a good example of another issue that anyone working with data (whether a Data Scientist or not) faces regularly – data quality issues. As can be seen from Chart 2, a significant number of users have reported ages well over 100, and a significant number have even reported ages over 1000.

#### Chart 2 – Reported Ages of Users

So what is going on here? Firstly, it appears that a number of users have reported their birth year instead of their age. This would help to explain why there are a lot of users with ‘ages’ between 1924 and 1953. Secondly, we also see significant numbers of users reporting their age as 105 and 110. This is harder to explain but it is likely that some users intentionally entered their age incorrectly for privacy reasons. Either way, these values would appear to be errors that will need to be addressed.
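A quick way to inspect these implausible values (a sketch using the `df` from above):

```python
# How many users reported an age over 100, and which values are most common
over_100 = df.loc[df['age'] > 100, 'age']
print(len(over_100))
print(over_100.value_counts().head(10))
```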

Additionally, as we saw in the example data provided above, another issue with the age column is that sometimes age has not been reported at all. In fact, if we look across all the training data provided, we can see a large number of missing values in all years.
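The figures in Table 4 below can be reproduced by grouping the missing-age flag by the year the account was created (a sketch using the `df` from above):

```python
# Missing ages by year of account creation
acct_year = pd.to_datetime(df['date_account_created']).dt.year
missing = df['age'].isnull().groupby(acct_year).agg(['sum', 'size'])
missing.columns = ['Missing Values', 'Total Records']
missing['% Missing'] = (missing['Missing Values'] / missing['Total Records'] * 100).round(1)
print(missing)
```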

#### Table 4 – Missing Ages

| Year | Missing Values | Total Records | % Missing |
|---|---|---|---|
| 2010 | 1,082 | 2,788 | 38.8% |
| 2011 | 4,090 | 11,775 | 34.7% |
| 2012 | 13,740 | 39,462 | 34.8% |
| 2013 | 34,950 | 82,960 | 42.1% |
| 2014 | 34,128 | 76,466 | 44.6% |
| Total | 87,990 | 213,451 | 41.2% |

When we clean the data, we will have to decide what to do with these missing values.

### First Device Type

Finally, the last column that we will look at is the first_device_type column.

#### Table 5 – First Device Used

| Device | 2010 | 2011 | 2012 | 2013 | 2014 | All Years |
|---|---|---|---|---|---|---|
| Mac Desktop | 37.2% | 40.4% | 47.2% | 44.2% | 37.3% | 42.0% |
| Windows Desktop | 21.6% | 25.2% | 37.7% | 36.9% | 31.0% | 34.1% |
| iPhone | 5.8% | 6.3% | 3.8% | 7.5% | 15.9% | 9.7% |
| Other/Unknown | 28.8% | 21.3% | 3.8% | 2.8% | 4.6% | 5.0% |
| Android Phone | 1.1% | 1.2% | 0.7% | 0.4% | 2.6% | 1.3% |
| Android Tablet | 0.4% | 0.4% | 0.3% | 0.5% | 0.9% | 0.6% |
| Desktop (Other) | 0.4% | 0.4% | 0.4% | 0.6% | 0.7% | 0.6% |
| SmartPhone (Other) | 0.0% | 0.1% | 0.1% | 0.0% | 0.0% | 0.0% |
| Total | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

The interesting thing about the data in this column is how the types of devices used have changed over time. Windows Desktop users have grown from around 22% to over 30% of all users, and iPhone users have tripled their share, while the ‘Other/Unknown’ device group has gone from the second largest category to less than 5% of users. Further, the majority of these changes occurred between 2011 and 2012, suggesting that there may have been a change in the way devices were classified.

Like with the other columns we have reviewed above, this change over time reinforces the presumption that recent data is likely to be the most useful for building our model.

### Other Columns

It should be noted that although we have not covered all of them here, having some understanding of all the data provided in a dataset is important for building an accurate classification model. In some cases, this may not be possible due to the presence of a very large number of columns, or due to the fact that the data has been abstracted (that is, the data has been converted into a different form). However, in this particular case, the number of columns is relatively small and the information is easily understandable.

## Next Time

Now that we have taken the first step – understanding the data – in the next piece, we will start cleaning the data to get it into a form that will help to optimize the model’s performance.