A while back, I attended a hackathon in Belgrade as a mentor. This hackathon was the first ‘open data’ hackathon in Serbia and focused on making applications using data that had recently been released by various ministries, government agencies, and independent bodies in Serbia. As we walked around talking to the various teams, one of the things I noticed was that almost all of them were using databases to manage their data. In most cases, the database was something very lightweight like SQLite3, but in some cases more serious databases (MySQL, PostgreSQL, MongoDB) were also being used.
What I have come to realize is that in many cases this was probably completely unnecessary, particularly given the tight timeframe the teams were working towards – a functional prototype within 48 hours. However, even if you have more time to build an application, there are several good reasons that you may not need to worry about using a formal database. These are outlined below.
1. The data is small
Firstly, let’s clarify what I mean when I say ‘small data’. For me, small data is any dataset under 10,000 records (assuming a reasonable number of data points for each record). For many non-data people, 10,000 records may seem quite big, but when using programming languages such as Python or JavaScript, this amount of data is usually very quick and easy to work with. In fact, as Josh Zeigler found, even loading 100,000 records or 15MB of data into a page was possible, completing in as little as 463ms (Safari FTW).
Leaving aside the numbers for a second, the key point here is that in many cases, the data being displayed in an application has far fewer than 10,000 records. If your data has fewer than 10,000 records, you should probably ask yourself: do you need a database? It is often far simpler, and requires significantly less overhead, to keep your data in a JSON file and load it into the page directly. Alternatively, CSV and Excel files can be converted to JSON and dumped to a file very quickly and easily using a Python/Pandas script.
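As a rough illustration of how little is involved in the CSV-to-JSON step, here is a minimal sketch. It uses only Python's standard library rather than Pandas so that it runs without extra installs, and the sample data and the function name `csv_to_records` are made up for the example.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one dict per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample data standing in for a small open-data file.
SAMPLE_CSV = "region,value\nNorth,120\nSouth,95\n"

records = csv_to_records(SAMPLE_CSV)

# The JSON text is what you would write to a file and load into the page.
json_text = json.dumps(records, indent=2)
```

Note that the `csv` module reads every value as a string, so numeric columns would need converting before use in a chart.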
2. The data is static
Another reason you may not need a database is if you have a reasonable expectation that the data you are using is not going to change. This is often the case where the data is going to be used for read-only purposes, for example visualizations, dashboards, and other apps where you are presenting information to users. In these cases, it may again make sense to avoid a database and rely on a flat file instead.
The important point here is that if the data is not changing or being altered, then static files are probably all that is needed. Even if the data is larger, you can use a script to handle any data processing and load the (presumably) aggregated or filtered results into the page. If your needs are more dynamic (i.e. you want to show different data to different users and do not want to load everything), you may need a backend (something you would need for a database anyway) that extracts the required data from the flat file, but again, a database may be overkill.
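The kind of pre-processing script I have in mind can be very small. The sketch below assumes the raw rows have already been read from a flat file into a list of dicts; the field names (`year`, `amount`), the sample rows, and the helper `totals_by_key` are all hypothetical.

```python
import json
from collections import defaultdict


def totals_by_key(rows, key, value):
    """Sum the `value` field for each distinct `key` in a list of dict rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical raw rows; in practice these would come from the larger flat file.
raw_rows = [
    {"year": "2015", "amount": "10.5"},
    {"year": "2015", "amount": "4.5"},
    {"year": "2016", "amount": "7"},
]

summary = totals_by_key(raw_rows, "year", "amount")

# Only this small aggregated result needs to be shipped to the page.
summary_json = json.dumps(summary, sort_keys=True)
```

Run once whenever the source file is refreshed, a script like this leaves the page loading a handful of totals instead of the full dataset.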
3. The data is simple
One of the big advantages of databases is their ability to store and provide access to complex data. For example, think about representing data from a chain of retail stores on the sale of various products by different sales people. In this case, because there are three related concepts (products, sales people and stores), representing this data without using a database becomes very difficult without a large amount of repetition[1]. In this case, even if the data is small and static, it may simply be better to use a relational database to store the data.
However, in cases where the data can be represented in a table, or multiple unrelated tables, subject to points 1 and 2 above, it may make sense to avoid the overhead of a database.
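To make the repetition risk in the retail example concrete, here is a small sketch. The records, field names, and the helper `conflicting_values` are invented for illustration. Because the store's city is repeated on every row, nothing stops one row from disagreeing with the others.

```python
# Hypothetical flat sales records: store details repeat on every row.
flat_sales = [
    {"store": "Downtown", "store_city": "Belgrade", "seller": "Ana", "product": "Lamp", "qty": 2},
    {"store": "Downtown", "store_city": "Belgrade", "seller": "Ana", "product": "Desk", "qty": 1},
    {"store": "Downtown", "store_city": "Beograd", "seller": "Marko", "product": "Lamp", "qty": 4},
]


def conflicting_values(rows, key, attribute):
    """Return each key whose repeated attribute has more than one distinct value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[attribute])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# The same store now appears with two different spellings of its city.
conflicts = conflicting_values(flat_sales, "store", "store_city")
```

A relational database avoids this by storing each store once and referencing it from the sales records. With a single table, or several unrelated ones, there is nothing to repeat and the problem does not arise.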
4. The data is available from a good API
I have recently been working on a project to develop an application that is making extensive use of the Google API. While still under development, the app is already quite complex, making heavy use of data to generate charts and tables on almost every page. However, despite this complexity, so far, I have not had to use a database.
One of the primary reasons I have not needed to implement a database is that the Google API is flexible enough for me to effectively use it as a database. Every time I need data to generate a chart or table, the app makes a call to the API (using Python) and passes the results to the front end. Because the data is small (the Google API returns a maximum of 10,000 rows in a query), most of the data manipulation is handled using JavaScript on the client side. For the cases where heavier data manipulation is required, I make use of Python libraries like Pandas to handle the data processing before sending the data to the front end. What this boils down to is a data-intensive application that, as yet, still does not need a database.
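The overall pattern looks roughly like the sketch below. This is not the actual code from my project or the real Google API client: `fake_fetch_rows` is a hypothetical stand-in that returns canned rows, and the field names are assumptions. The point is only that the backend fetches, reshapes, and forwards the data without storing anything.

```python
import json


def build_chart_payload(fetch_rows, metric):
    """Fetch rows via `fetch_rows` and shape them as JSON for the front end."""
    rows = fetch_rows()
    return json.dumps({
        "labels": [row["date"] for row in rows],
        "values": [row[metric] for row in rows],
    })


def fake_fetch_rows():
    """Hypothetical stand-in for a real API call; returns canned rows."""
    return [
        {"date": "2017-01-01", "sessions": 120},
        {"date": "2017-01-02", "sessions": 95},
    ]


# In a real app, the API-calling function would be passed in here instead.
payload = build_chart_payload(fake_fetch_rows, "sessions")
```

Passing the fetch function in as an argument keeps the reshaping logic easy to test without touching the network.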
Of course, this isn’t to say I will not need a database in the future. If I plan to store user settings and preferences, track usage of the application, or collect other metadata, I will need to implement a database to store that information. However, if you are developing an application that will make use of a flexible and reliable API, you may not need to implement your own database.
5. The app is being built for a short-term need
While it might seem unusual to build a web app with the expectation that it will not be used six months later, this is a surprisingly common use case. In fact, this is often the expectation for visualizations and other informative pages, or pages built for a specific event.
In these particular use cases, keeping overhead down should be a big consideration, as should the potential hosting options. Developing these short-term applications without a backend and database means free and easy hosting solutions, like the one provided by GitHub, can be used. Adding a backend or database immediately means a more complex hosting setup is required.
Wrapping up
This is not an argument against databases; it is simply an argument for using the best and simplest tools for a given job. Having worked with a number of different databases throughout my career, I am actually a big user of databases and find most of them intuitive and easy to use. There are also many advantages that only a database can provide, from ensuring data consistency, to supporting large numbers of users making updates simultaneously, to managing large and complex datasets. These are all very good reasons to use a database (SQL or NoSQL, whichever flavor you happen to prefer).
But, as we have covered above, there may be some cases where you do not need these features and can avoid adding an unnecessary complication to your app.
Next week we’ll take a look at a simple app that uses an Excel spreadsheet to generate the data required for the application.
[1] With repetition comes an increased risk of data quality issues