Author: Brett Romero (Page 2 of 11)

Pandas: Advanced booleans

December 16, 2020 / Brett Romero

This article is part of a series of practical guides for using the Python data processing library pandas. To see view all the available parts, click here.

In other sections in this series, we’ve looked at how we can use booleans (a value that is either True or False) in pandas. Specifically, we’ve looked at how a list or array of booleans can be used to filter a DataFrame. In those examples we generated lists of booleans using simple comparisons like “are the values in the fixed acidity column > 12?” However, simple comparisons like this are only one of many ways we can create booleans. In this guide we are going to look at a range of methods that allow us to do more complex comparisons, while also making our code more concise and easier to understand.

Pandas: Filtering and segmenting

December 1, 2020 / Brett Romero

This article is part of a series of practical guides for using the Python data processing library pandas. To see view all the available parts, click here.

One of the most common ways you will interact with a pandas DataFrame is by selecting different combinations of columns and rows. This can be done using the numerical positions of columns and rows in the DataFrame, column names and row indices, or by filtering the rows by applying some criteria to the data in the DataFrame. All of these options (and combinations of them) are available, so let’s dig in!

Pandas: Basic data interrogation

November 24, 2020 / Brett Romero

This article is part of a series of practical guides for using the Python data processing library pandas. To see view all the available parts, click here.

Once we have our data in a pandas DataFrame, the basic table structure in pandas, the next step is how do we assess what we have? If you are coming from Excel or R Studio, you are probably used to being able to look at the data any time you want. In python/pandas, we don’t have a spreadsheet to work with, and we don’t even have an equivalent of R Studio (although Jupyter notebooks are a similar concept), but we do have several tools available that can help you get a handle on what your data looks like.

Pandas: Reading in JSON data

November 23, 2020 / Brett Romero

This article is part of a series of practical guides for using the Python data processing library pandas. To see view all the available parts, click here.

When we are working with data in software development or when the data comes from APIs, it is often not provided in a tabular form. Instead it is provided in some combination of key-value stores and arrays broadly denoted as JavaScript Object Notation (JSON). So how do we read this type of non-tabular data into a tabular format like a pandas DataFrame?

Pandas: Reading in tabular data

November 13, 2020 / Brett Romero

This article is part of a series of practical guides for using the Python data processing library pandas. To see view all the available parts, click here.

To get started with pandas, the first thing you are going to need to understand is how to get data into pandas. For this guide we are going to focus on reading in tabular data (i.e. data stored in a table with rows and columns). If you don’t have some data available but want to try some things out, a great place to get some data to play with is the UCI Machine Learning Repository.

Official Release: Visual Analytics

August 25, 2017 / Brett Romero / 0 Comments

I am proud to announce the release of an application I’ve been working on for the last few months – Visual Analytics. This application is designed to give you a new way to view your Google Analytics data using a range of interactive visualizations, allowing you to get a better understanding of who your users are, how they are getting to your site, and what they are doing when they get there.

Why the ‘boring’ part of Data Science is actually the most interesting

May 30, 2017 / Brett Romero / 1 Comment

For the last 5 years, data science has been one of the world’s hottest professions, but it is also one of the most poorly defined. This can be seen on any career website, where advertisements for ‘Data Scientist’ positions describe everything from what used to be a simple data analyst role, to technical, PhD-only, research positions working on artificial intelligence or autonomous cars.

The Surprising Complexity of Randomness

May 20, 2017 / Brett Romero / 0 Comments

Previously, in a walkthrough on building a simple application without a database, I touched on randomness. Randomness and generating random numbers is a surprisingly deep and important area of computer science, and also one that few outside of computer science know much about. As such, for my own benefit as much as yours, I thought I would take a deeper look at the surprising complexity of randomness.

How to create a flashcard app without a database

April 27, 2017 / Brett Romero / 5 Comments

Last week, I covered how setting up a database may not be necessary when creating an app or visualization, even one that relies on data. This week we are going to walk through an example application that runs off data, but does not need a formal database.

Forget SQL or NoSQL – 5 scenarios where you may not need a database at all

April 18, 2017 / Brett Romero / 0 Comments

A while back, I attended a hackathon in Belgrade as a mentor. This hackathon was the first ‘open data’ hackathon in Serbia and focused on making applications using data that had recently been released by various ministries, government agencies, and independent bodies in Serbia. As we walked around talking to the various teams, one of the things I noticed at the time, was that almost all teams were using databases to manage their data . In most cases, the database being used was something very lightweight like SQLite3, but in some cases more serious databases (MySQL, PostgreSQL, MongoDB) were also being used.

Author: Brett Romero (Page 2 of 11)

Pandas: Advanced booleans

Pandas: Filtering and segmenting

Pandas: Basic data interrogation

Pandas: Reading in JSON data

Pandas: Reading in tabular data

Official Release: Visual Analytics

Why the ‘boring’ part of Data Science is actually the most interesting

The Surprising Complexity of Randomness

How to create a flashcard app without a database

Forget SQL or NoSQL – 5 scenarios where you may not need a database at all

Archives

Categories

Archives

Categories

Tags