Brett Romero

Data Inspired Insights


Pandas: Basic data interrogation

This article is part of a series of practical guides for using the Python data processing library pandas. To view all the available parts, click here.

Once we have our data in a pandas DataFrame (the basic table structure in pandas), the next question is: how do we assess what we have? If you are coming from Excel or RStudio, you are probably used to being able to look at your data any time you want. In Python/pandas, we don’t have a spreadsheet to work with, and we don’t even have a direct equivalent of RStudio (although Jupyter notebooks are a similar concept), but we do have several tools that can help you get a handle on what your data looks like.
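As a rough sketch of what those tools look like, the snippet below runs a few standard pandas inspection calls (`shape`, `dtypes`, `head()`, `info()`, `describe()`, `isna()`) on a small DataFrame. The data is made up purely for illustration; in practice `df` would come from something like `read_csv` or `read_json`.

```python
import pandas as pd

# Made-up example data; in practice df would come from read_csv, read_json, etc.
df = pd.DataFrame(
    {
        "product": ["A", "B", "C", "D"],
        "price": [9.99, 14.50, 3.25, 20.00],
        "units_sold": [120, 45, 300, None],  # one missing value
    }
)

print(df.shape)         # (rows, columns)
print(df.dtypes)        # data type of each column
print(df.head(2))       # first two rows
df.info()               # column types, non-null counts, memory usage
print(df.describe())    # summary statistics for the numeric columns
print(df.isna().sum())  # number of missing values per column
```

Note that `info()` prints its report directly rather than returning it, and that `units_sold` ends up as `float64` because the missing value is stored as NaN.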


Pandas: Reading in JSON data

This article is part of a series of practical guides for using the Python data processing library pandas. To view all the available parts, click here.

When we are working with data in software development, or when the data comes from an API, it is often not provided in tabular form. Instead, it is typically provided as some combination of key-value pairs (objects) and arrays, in a format known as JavaScript Object Notation (JSON). So how do we read this kind of non-tabular data into a tabular structure like a pandas DataFrame?
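One way to do this (not necessarily the approach the full article takes) is pandas' `pd.json_normalize`, which flattens nested JSON objects into columns. The JSON below is a made-up stand-in for an API response; the second call shows how `record_path` and `meta` can unpack a nested array into one row per element.

```python
import json

import pandas as pd

# Hypothetical API response: a list of records with a nested "address" object
raw = """
[
  {"id": 1, "name": "Alice", "address": {"city": "Belgrade", "zip": "11000"}},
  {"id": 2, "name": "Bob", "address": {"city": "Novi Sad", "zip": "21000"}}
]
"""

records = json.loads(raw)  # parse the JSON text into Python lists/dicts

# Nested objects become columns whose names are joined with "."
# (here: id, name, address.city, address.zip)
df = pd.json_normalize(records)
print(df)

# For nested arrays, record_path picks the list to expand into rows,
# and meta copies fields from the parent record onto each row
orders = [
    {"customer": "Alice", "items": [{"sku": "X1", "qty": 2}, {"sku": "Y7", "qty": 1}]}
]
items = pd.json_normalize(orders, record_path="items", meta=["customer"])
print(items)
```

If the JSON is already a flat list of records, `pd.read_json` can often read it directly without any normalization step.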


Pandas: Reading in tabular data

This article is part of a series of practical guides for using the Python data processing library pandas. To view all the available parts, click here.

To get started with pandas, the first thing you are going to need to understand is how to get data into pandas. For this guide we are going to focus on reading in tabular data (i.e. data stored in a table with rows and columns). If you don’t have any data on hand but want to try things out, a great place to find data to play with is the UCI Machine Learning Repository.
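A minimal sketch of `pd.read_csv` follows. The CSV text is made up and wrapped in `io.StringIO` so the example runs without a file on disk; with a real file you would pass its path instead. The second call shows the `header=None` and `names` parameters, which are useful when a file has no header row.

```python
import io

import pandas as pd

# Made-up CSV text; with a real file you would write pd.read_csv("path/to/file.csv")
csv_with_header = """height_cm,weight_kg,group
170.5,65.2,A
182.0,80.1,B
165.3,58.7,A
"""
df = pd.read_csv(io.StringIO(csv_with_header))  # first row is used as column names
print(df)

# Some files have no header row, so we supply the column names ourselves
csv_no_header = """170.5,65.2,A
182.0,80.1,B
"""
df2 = pd.read_csv(
    io.StringIO(csv_no_header),
    header=None,  # do not treat the first row as a header
    names=["height_cm", "weight_kg", "group"],
)
print(df2)
```

For other delimiters, such as tab-separated files, the `sep` parameter (e.g. `sep="\t"`) tells `read_csv` what to split on.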


Why the ‘boring’ part of Data Science is actually the most interesting

For the last 5 years, data science has been one of the world’s hottest professions, but it is also one of the most poorly defined. You can see this on any careers website, where advertisements for ‘Data Scientist’ positions describe everything from what used to be a simple data analyst role to technical, PhD-only research positions working on artificial intelligence or autonomous cars.


The Surprising Complexity of Randomness

Previously, in a walkthrough on building a simple application without a database, I touched on randomness. Randomness, and the generation of random numbers, is a surprisingly deep and important area of computer science, and also one that few people outside the field know much about. So, for my own benefit as much as yours, I thought I would take a deeper look at the surprising complexity of randomness.
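One small illustration of that complexity: most "random" numbers in software are pseudo-random, meaning they come from a deterministic algorithm. The sketch below (using only Python's standard library, and not taken from the full post) shows two generators seeded with the same value producing identical sequences, and contrasts this with the `secrets` module, which is intended for security-sensitive values.

```python
import random
import secrets

# Two pseudo-random generators seeded with the same value...
gen_a = random.Random(42)
gen_b = random.Random(42)

seq_a = [gen_a.randint(1, 100) for _ in range(5)]
seq_b = [gen_b.randint(1, 100) for _ in range(5)]

# ...produce exactly the same "random" sequence, because the output is fully
# determined by the seed (Python's random module uses the Mersenne Twister)
print(seq_a == seq_b)  # True

# For security-sensitive values (tokens, passwords), the secrets module draws
# from the operating system's randomness source instead of a seeded generator
token = secrets.token_hex(16)  # 16 random bytes as 32 hexadecimal characters
print(token)
```

Reproducibility from a seed is useful for simulations and tests, but it is exactly what makes such generators unsuitable for anything an attacker might try to predict.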


Forget SQL or NoSQL – 5 scenarios where you may not need a database at all

A while back, I attended a hackathon in Belgrade as a mentor. It was the first ‘open data’ hackathon in Serbia and focused on building applications using data that had recently been released by various ministries, government agencies, and independent bodies in Serbia. As we walked around talking to the teams, one thing I noticed was that almost all of them were using databases to manage their data. In most cases, the database was something very lightweight like SQLite3, but some teams were also using more serious databases (MySQL, PostgreSQL, MongoDB).


Uber Vs Taxi – A Follow-Up

Hi everyone – welcome to 2017! I hope you all had a good Christmas and New Year’s Eve and are geared up for a big 2017.

To kick off the year, I stumbled this week on a series of articles written by Hubert Horan, who has spent the last 40 years working in the transportation industry, particularly in the management and regulation of airlines. In a four-part series published at nakedcapitalism.com (two further pieces were later added to respond to reader comments and look at newer evidence), he takes a critical look at the Uber business model and dispels a number of myths.


JSONify It – CSV to JSON Converter


For those who have some experience creating visualizations, particularly online visualizations using JavaScript and libraries such as D3.js, one thing you will often come across is the need to convert your data. Typically, this need arises because the data you receive or collect is in a human-friendly format such as an Excel spreadsheet, and to use it in your visualization you need it in JSON format. Annoyingly, this is often just a one-time conversion, so writing a stand-alone script to do it often seems like overkill.
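For a sense of what such a conversion involves, here is a minimal standard-library sketch (not a description of how JSONify It works) that turns CSV rows into a JSON array of objects, one object per row. The CSV text is made up for illustration.

```python
import csv
import io
import json

# Made-up CSV input; with a real file you would use open("file.csv", newline="")
csv_text = """name,value
alpha,10
beta,20
"""

# DictReader maps each row to a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Serialize to a JSON array of objects
json_text = json.dumps(rows, indent=2)
print(json_text)
```

One caveat: `csv.DictReader` reads every value as a string, so `"10"` stays `"10"` in the JSON output; numeric fields need an explicit conversion if the visualization expects numbers.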


© 2021 Brett Romero

Theme by Anders Noren