Data Inspired Insights


Data Science: A Kaggle Walkthrough – Creating a Model

This article is Part VI in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read the earlier parts – (Part I, Part II, Part III, Part IV and Part V).

Continuing on the walkthrough, in this part we build the model that will predict the first booking destination country for each user based on the dataset created in the earlier parts.


Data Science: A Kaggle Walkthrough – Adding New Data

This article is Part V in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read the earlier parts – (Part I, Part II, Part III and Part IV).

Continuing on the walkthrough, in this part we take the data from sessions.csv that we left aside initially and add it to the transformed and expanded data from Part IV.  This part will cover, in brief, all the steps in Parts II – IV.


Data Science: A Kaggle Walkthrough – Data Transformation and Feature Extraction

This article on data transformation and feature extraction is Part IV in a series looking at data science and machine learning by walking through a Kaggle competition. If you have not done so already, you are strongly encouraged to go back and read Part I, Part II and Part III.

Continuing on the walkthrough, in this part we focus on getting the data we cleaned in Part III ready for use in the classification algorithm. These steps are often referred to as data transformation and feature extraction.


Data Science: A Kaggle Walkthrough – Introduction

This article on understanding the data is Part I in a series looking at data science and machine learning by walking through a Kaggle competition. The other parts in this series can be found here.

In a futile attempt to shed some light on the field of Data Science, I have put together a multi-part series looking at what data science involves and some of the techniques most commonly used. This series is not intended to make everyone experts on data science; rather, it is intended to remove some of the fear and mystery surrounding the field. To be as practical as possible, the series is structured as a walkthrough of the process of entering a Kaggle competition and the steps taken to arrive at the final submission.


The argument for taxing capital gains at the full rate

Politicians, both in Australia and the US, when asked how they will find the money to fund various policy proposals, often resort to the magic pudding of funding sources that is “closing the loopholes in the tax code”. After all, who can argue with stopping tax dodgers rorting the system? But as Megan McArdle recently pointed out, raising any significant revenue from closing loopholes requires denying deductions for things that a lot of middle and lower class people also benefit from. This includes, among other things, deductions for mortgage interest, employer-sponsored health insurance, lower (or no) tax on money set aside for pensions and no tax on capital gains when the family house is sold.[1]

Broadly, I agree with McArdle’s point. The public, in general, are far too easily convinced by simplistic arguments about changes to taxation – as if after decades of tax policy changes there are still simple ways to increase revenues without anyone suffering. Any changes made at this point are going to cause winners and losers, and often, the people intended to be the losers (usually the rich) are less affected than some other group that also happened to be taking advantage of a particular deduction.

That said, there is one point, addressed briefly in McArdle’s article, that I thought deserved greater attention – the concessional taxation of capital gains. In the list provided in the article, it was the second most expensive tax deduction in the US at $85 billion a year[2]. You see, for a while now I have been somewhat of a closet skeptic of the need for lower tax rates on capital income (i.e. capital gains and dividends). The reason for my skepticism is twofold:

  1. Everyone seems to be in agreement that concessional rates for capital income are absolutely necessary, but no one seems to really understand why.
  2. Capital income makes up a much larger percentage of income for the wealthy than for the lower or middle class. When you hear that story about billionaire Warren Buffett paying a lower rate of tax than his secretary, it is because of the low rate of tax on capital income.

So, now that I am finally voicing my skepticism, this article is going to look at what arguments are made for lower tax rates on capital income (focusing on capital gains for individuals) and whether those arguments hold water.

Why are capital gains taxed at a lower rate?

Once you start digging, you quickly find there is a range of arguments (of variable quality) being made for why capital gains should be taxed at a lower rate. These arguments can largely be grouped into the following broad categories:

  1. Inflation
  2. Lock-In
  3. Double Taxation
  4. Capital is Mobile
  5. The Consumption – Savings tradeoff

Inflation

Taxing capital gains implies taxing the asset holder for any increases in the price of that asset. In an economy where inflation exists (i.e. every economy) this means you are taxing increases in the price of the asset due to inflation, as well as any increase in the value of the asset itself. Essentially, even if you had an asset which had only increased in value at the exact same rate as inflation (i.e. the asset was tradable for the same amount of goods as when you bought it), you would still have to pay capital gains tax.

The inflation argument, although legitimate, is relatively easy to legislate around by allowing asset holders to adjust up the cost base of their assets by the inflation rate each year.
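As a rough illustration of how indexing the cost base would work, the sketch below compares the taxable gain with and without an inflation adjustment. The function name, the 3% inflation rate and the dollar figures are invented for the example and are not taken from any actual tax code.

```python
def taxable_gain(purchase_price, sale_price, annual_inflation, years_held, indexed=True):
    """Return the taxable capital gain, optionally indexing the cost base."""
    cost_base = purchase_price
    if indexed:
        # Adjust the cost base up by the inflation rate for each year held.
        cost_base = purchase_price * (1 + annual_inflation) ** years_held
    # Capital losses are ignored to keep the sketch simple.
    return max(sale_price - cost_base, 0.0)


# Hypothetical asset: bought for $100,000, held for 10 years at 3% inflation,
# then sold for exactly its inflation-adjusted purchase price.
sale_price = 100_000 * 1.03 ** 10

nominal_gain = taxable_gain(100_000, sale_price, 0.03, 10, indexed=False)
real_gain = taxable_gain(100_000, sale_price, 0.03, 10, indexed=True)

print(f"Taxable gain without indexing: ${nominal_gain:,.0f}")
print(f"Taxable gain with indexing:    ${real_gain:,.0f}")
```

Without indexing, the whole inflation-driven price rise is taxed; with indexing, an asset that has only kept pace with inflation produces essentially no taxable gain.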

Lock-In

‘Lock-in’ is the idea that investors, to avoid paying capital gains tax, will stop selling their assets. An investor holding onto assets to avoid tax implies they are being incentivized, through the tax system, to invest suboptimally – something economists really dislike. However, insofar as ‘lock-in’ occurs, it cannot be considered anything other than an irrational reaction. Holding onto assets does not avoid tax, it only delays it, and given inflation is factored into the asset price (as discussed above), there is not even the benefit of time reducing the tax burden. The bottom line is this – to pay more capital gains tax, there must be larger capital gains. That is, even if the capital gains tax rate were 99%, an investor would still be better off making larger capital gains than smaller ones.

The other point to remember when it comes to ‘lock-in’ is that in both the US and Australia, the lower rate of capital gains tax only applies to assets held for more than a year. That means if ‘lock-in’ exists, it is already a major problem. Because asset holders can access a lower rate of tax by holding an asset for a year, they are already strongly incentivized to hold onto their underperforming assets longer than is optimal to access the concessional tax rate. In fact, increasing the long-term capital gains tax rate to the same level as the short-term rate should actually reduce lock-in by removing this incentive.

Double Taxation

The double taxation argument is a genuine concern for economists. The double tax situation arises because companies already pay tax on their profits. Taxing those profits in the hands of investors again, either as capital gains (on that company’s stock) or dividends, implies some high marginal tax rates on investment. This is one of the main reasons capital income is taxed at low rates in most countries.

Ideally, to avoid this situation, the tax code would be simplified by removing company tax altogether, as McArdle herself has argued in the past. However, we should probably both accept that, at best, the removal of corporate tax is a long way away. Nevertheless, this idea can form the basis for policies that achieve similar goals without the political issue of trying to sell the removal of corporate tax.

For dividends, for example, double taxation can be avoided by providing companies with a deduction for the value of dividends paid out to investors. Investors would then pay their full marginal tax rate on the dividends, more than replacing the lost company tax revenues.
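To make the arithmetic concrete, here is a small sketch comparing the total tax collected on $100 of company profit paid out as a dividend, with and without a dividend deduction. The 30% company rate and 45% investor marginal rate are hypothetical figures chosen for illustration only; the result depends on the investor's marginal rate being higher than the company rate.

```python
def total_tax_double_taxed(profit, company_rate, investor_rate):
    """Company tax on the profit, then personal tax on the after-tax dividend."""
    company_tax = profit * company_rate
    dividend = profit - company_tax
    investor_tax = dividend * investor_rate
    return company_tax + investor_tax


def total_tax_with_dividend_deduction(profit, investor_rate):
    """The company deducts the full dividend, so only the investor is taxed."""
    return profit * investor_rate


COMPANY_RATE = 0.30   # hypothetical company tax rate
INVESTOR_RATE = 0.45  # hypothetical investor marginal tax rate
PROFIT = 100.0

double_taxed = total_tax_double_taxed(PROFIT, COMPANY_RATE, INVESTOR_RATE)
with_deduction = total_tax_with_dividend_deduction(PROFIT, INVESTOR_RATE)
company_tax_forgone = PROFIT * COMPANY_RATE

print(f"Total tax, taxed twice:        ${double_taxed:.2f}")
print(f"Total tax, dividend deduction: ${with_deduction:.2f}")
print(f"Company tax forgone:           ${company_tax_forgone:.2f}")
```

Under these assumed rates the profit is taxed once rather than twice, and the tax paid by the investor on the full dividend is larger than the company tax given up.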

Preventing double taxation of capital gains is a little more complicated, but the answer may lie in setting up a quarantined investment pool that companies can move profits into. Profits moved into this pool would not be subject to tax and, once in the pool, the money could only be used for certain legitimate investment activities. This would effectively remove taxation on profits going toward genuine reinvestment, as opposed to fattening bonus checks.

The overall point here is not that I have the perfect policy to avoid double taxation of company profits, but that there are other avenues worth exploring that do not simply give huge tax breaks to wealthy investors.

Capital is Mobile

This is one of the two arguments McArdle briefly mentions in her article. The ‘capital is mobile’ argument is the idea that if we tax wealthy investors too much, they will do a John Galt, take their money and go to another country that won’t be so “mean” to them.

When it comes to moving money offshore, obviously, not everyone is in a position to make the move. Pension funds and some investment vehicles cannot simply move country. Companies and some other investment vehicles do not receive a capital gains tax discount currently, meaning raising tax rates for capital gains for individuals would not impact them at all. Finally, even for investors that would be affected and do have the means, a hike in the capital gains rate does not automatically move all their investments below the required rate of return.

This argument also overlooks the vast array of complications in moving money offshore and the risks involved with that action. Moving assets offshore exposes investors to new risks such as exchange rate risk[3] and sovereign risk[4]. It also significantly complicates the administrative, compliance and legal burden the investor has to manage.

However, even if we concede that yes, some money would move offshore as a result of higher taxes on capital gains, let’s look at the long term picture. What is the logical end point for a world where each country employs a policy of attracting wealthy investors by lowering taxes on capital? A world where no country taxes capital!

Of course, there are alternatives. Countries (and developed countries in particular should take the lead on this) can stop chasing the money through tax policy and focus on other ways of competing for investment capital. Education, productivity, infrastructure, network effects, low administrative and compliance costs are all important factors in the assessment of how attractive a location is for investors. California, for example, is not the home of Silicon Valley because it has low taxes on capital. Pulling the ‘lower taxes to attract investment’ lever is essentially the lazy option.

Consumption vs. Savings

The second point raised by McArdle is the argument that if you reduce the returns from investing (by raising tax rates), people will substitute away from saving and investing (future consumption) and instead spend the money now (immediate consumption).

The way to think of this is not of someone cashing in all their assets and going on a spending spree because the capital gains tax rate increased. That is extremely unlikely to happen and would actually make no sense. The change will come on the margin – because the returns on investment have decreased slightly (for certain asset types), there will be slightly less incentive to save and invest. As a result, over time, less money ends up being invested and is instead consumed.

But let’s consider who would be affected. If we think about the vast majority of people, their only exposure to capital gains is through their pension fund and the property they live in, neither of which would be affected by increasing the individual capital gains tax rate. Day traders, high frequency traders and anyone holding stocks for less than a year on average would also be unaffected. Most investors in start-ups do so through investment vehicles that are, again, not subject to individual capital gains tax[5]. That leaves two main groups of investors impacted by an increase in the capital gains tax rate for individuals:

  1. Property investors
  2. High net worth individual investors

Given property investing is not what most people are thinking about when concerns about capital gains tax rates reducing investment are raised, let’s focus on high wealth investors.

The key issue when considering how these investors would be affected by an increase in the capital gains tax rate is identifying what drives them to invest in the first place. Many of them literally have more money than they could ever spend, which means their investment decisions cannot be driven by a desire for future consumption. Many of their kids will never want for anything either, so even ensuring the financial security of their kids is not an issue. The only real motivation that can be left is simply status, power and prestige. Or as the tech industry has helpfully rebadged it – ‘making the world a better place.’

If that is the motivation though, does a rise in the capital gains tax rate change that motivation?

To my mind, the answer to that question is ‘No’. These people are already consuming everything they want, or in economic parlance, their desire for goods and services has been satiated. They will gain no additional pleasure (‘utility’) from diverting savings to consumption, so there is no incentive to do so even when the gains from investing are reduced.

Of course, there are exceptions, and it is quite possible (even likely) that there are high net worth individuals who live somewhat frugally and as a result of this policy change would really start splashing out. The question is how significant is this amount of lost investment, and does the loss of that investment capital outweigh the cost to society more widely of a deduction that flows almost entirely to the wealthy.

The Research

Putting this piece together, I have studiously attempted to avoid confirmation bias.[6] Despite the fact that I would benefit personally from lower tax rates on capital gains (well, at least I would if my portfolio would increase in value for a change), I definitely want to believe that aligning capital gains tax rates with the tax rates on normal income would raise significant amounts of tax, mostly from wealthy individuals, with few negative consequences.

In my attempts to avoid confirmation bias, I have deliberately searched for articles and research papers that provide empirical evidence that lower capital gains tax rates were found to lead to higher rates of savings, investment and/or economic growth. I have not been able to find any. There were some papers that claimed to show that decreasing capital gains tax rates actually increased tax revenue, but reading the Australian section of this paper (about which I have some knowledge), it quickly became clear this conclusion had been reached using a combination of cherry picking dates[7] and leaving out important details.[8]

I did also find some papers that, through theoretical models, concluded higher taxes on capital income would cause a range of negative impacts. But the problem with papers that rely on theoretical models is that for every paper based on a theoretical model that concludes “… a capital income tax… reduces the number of entrepreneurs…” there is another paper based on a theoretical model that concludes “… higher capital income taxes lead to faster growth…”

Leaving research aside, there were a number of articles supporting the lowering or removing of capital income taxes. The problem is they all recite the same old arguments (“it will cause lock-in!”) and tend to come from a very specific type of institution. Without going too much into what type of institution, let me just list where almost all the material I located was coming from (directly or indirectly):

Even when I found an article from a less partisan source (Forbes), it turned out to be written by a senior fellow at the Cato Institute, and was rebutted by another article in the same publication.

Of course we should not ignore what people say because they work for a certain type of institution – just because they have an agenda does not mean they are wrong. In fact, it stands to reason that organizations interested in reducing taxation and limiting government would research this particular topic. The problem is that if there are genuine arguments being made, they are being lost amongst the misleading and the nonsensical.

Take this argument for lower taxes on capital as an example. First there is a chart taken from this textbook:

Capital per Worker vs. Income per Worker

The article then uses this as evidence to suggest more capital equals more income for workers. As straightforward as this seems, what this conclusion misleadingly skips over is:

  • income per worker is not equivalent to income for workers, and
  • almost all the countries towards the top right hand corner of this chart (i.e. the rich ones) got to their highly capital intensive states despite having high taxes on capital.

A Change in Attitude?

The timing of this article seems to have conveniently coincided with the announcement by Hillary Clinton of a new policy proposal – a ‘Fair Share Surcharge’. In short, the surcharge would be a 4% tax on all income above $5 million, regardless of the source. Matt Yglesias has done a good job of outlining the details in this article if you are interested.

The interesting aspect of this policy is, given the lower rate of tax typically applied to dividends and capital gains, it is a larger percentage increase in taxes on capital income than wage income. Of course, unless something major changes, this policy is very unlikely to make it past Congress and so may simply be academic, but at least it shows one side of politics may be starting to question the idea that taxes on capital should always be lower.
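A quick sketch shows why a flat surcharge is a larger relative increase for capital income. The 4% surcharge comes from the proposal described above; the 20% and 39.6% starting rates are assumptions used purely for illustration, not figures from the proposal.

```python
SURCHARGE = 0.04  # the proposed 4% surcharge on income above $5 million


def relative_increase(existing_rate, surcharge=SURCHARGE):
    """Return the surcharge as a fraction of the existing marginal rate."""
    return surcharge / existing_rate


ASSUMED_CAPITAL_GAINS_RATE = 0.20  # assumed top rate on long-term capital gains
ASSUMED_WAGE_RATE = 0.396          # assumed top rate on wage income

capital_increase = relative_increase(ASSUMED_CAPITAL_GAINS_RATE)
wage_increase = relative_increase(ASSUMED_WAGE_RATE)

print(f"Relative increase on capital income: {capital_increase:.1%}")
print(f"Relative increase on wage income:    {wage_increase:.1%}")
```

The lower the starting rate, the bigger the proportional jump from the same four percentage points.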

The Data

Finally, I want to finish up with a few charts. The charts below show how various economic indicators changed as various changes were made to the rate of capital gains tax, historically and across countries. Please note, these charts should not be taken as conclusive evidence one way or the other. The curse of economics is the inability to know (except in rare circumstances) what would have happened if a tax rate had not been raised, or if an interest rate rise had been postponed. The same applies with changes to the capital gains tax rate. Without knowing what would have happened if the capital gains tax rate had not been changed, we cannot draw firm conclusions as to what the result of that change was.

However, what we can see is that the indicators shown below do not seem to be significantly affected by changes in the capital gains tax rate, one way or the other – the effects appear to be drowned out by larger changes in the economy. That could be considered a conclusion in itself.

Chart 1 – Maximum Long Term CGT Rate vs. Personal Savings rate, US 1959 to 2014

Chart 2 – Maximum Long Term CGT Rate vs. Annual GDP Growth, US 1961 to 2014

Chart 3 – Maximum Long Term CGT Rate vs. Gross Savings, Multiple Countries, 2011-2015 Average

Gross savings are calculated as gross national income less total consumption, plus net transfers. This amount is then divided by GDP (the overall size of the economy) to normalize the value across countries.
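The calculation described above can be sketched as follows. The function name and the input figures are hypothetical; only the formula itself comes from the definition.

```python
def gross_savings_pct_of_gdp(gni, total_consumption, net_transfers, gdp):
    """Gross savings (GNI less consumption, plus net transfers) as a % of GDP."""
    gross_savings = gni - total_consumption + net_transfers
    return 100 * gross_savings / gdp


# Hypothetical economy, all figures in billions of the same currency.
example = gross_savings_pct_of_gdp(
    gni=1_020, total_consumption=800, net_transfers=-5, gdp=1_000
)
print(f"Gross savings: {example:.1f}% of GDP")
```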

Chart 4 – Maximum Long Term CGT Rate vs. Gross Fixed Capital Formation, Multiple Countries, 2011-2015 Average

Gross fixed capital formation is money invested in assets such as land, machinery, buildings or infrastructure. For the full definition, please see here. This amount is then divided by GDP (the overall size of the economy) to normalize the value across countries.

Chart 5 – Maximum Long Term CGT Rate vs. Gini Index, 2011-2015 Average

The Gini index is a measure of income inequality within a country. A Gini index of 100 represents a country in which one person receives all of the income (i.e. total inequality). An index of 0 represents total equality.
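For readers who want to see how such an index can be computed, below is a minimal sketch using one common formula based on sorted incomes. It is not necessarily the exact method behind the figures in Chart 5, and the income lists are made up.

```python
def gini_index(incomes):
    """Return the Gini index (0 to 100) for a list of non-negative incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    # Weight each income by its rank (1 = poorest, n = richest).
    rank_weighted_sum = sum(rank * value for rank, value in enumerate(values, start=1))
    gini = (2 * rank_weighted_sum) / (n * total) - (n + 1) / n
    return 100 * gini


equal_incomes = gini_index([5, 5, 5, 5])
one_person_has_everything = gini_index([0, 0, 0, 100])

print(f"Everyone earns the same: {equal_incomes:.1f}")
print(f"One person earns it all: {one_person_has_everything:.1f}")
```

With only four people the most unequal case comes out at 75 rather than 100, because this formula gives (n - 1) / n for a population of size n; the value approaches 100 as the population grows.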

 

[1] Interestingly, two of these four deductions (mortgage interest and employer-sponsored health insurance) will be completely foreign to Australians.

[2] A similar policy (50% tax discount for capital gains) in Australia costs around AUD$6-7 billion per year.

[3] The risk that the exchange rate changes and has an adverse impact on the value of your investments.

[4] The risk that the government of the country you are investing in will change the rules in such a way to hurt your investments.

[5] Capital Gains Tax Policy Toward Entrepreneurship, James M. Poterba, National Tax Journal, Vol. 42, No. 3, Revenue Enhancement and Other Word Games: When is it a Tax? (September, 1989), pp. 375-389

[6] Confirmation bias is the tendency of people, consciously or subconsciously, to disregard or discount evidence that disagrees with their preconceived notions while perceiving evidence that confirms those notions as more authoritative.

[7] “After Australian CGT rates for individuals were cut by 50% in 1999 revenue from individuals grew strongly and the CGT share of tax revenue nearly doubled over the subsequent nine years.” Note the carefully selected time period includes the huge run up in asset prices from 2000 to 2007 and avoids the 2008 financial crisis, which caused huge declines in CGT revenues.

[8] “Individuals enjoyed a larger discount under the 1999 reforms than superannuation funds (50% versus 33%), yet yielded a larger increase in CGT payable.” This neglects to mention that even after the discounts were applied, the rate of capital gains tax for almost all individuals was still higher than for superannuation funds.

Web Analytics – Looking Under the Hood

On occasion I get the sense from bloggers that talking about your traffic statistics is a bit like talking about salary – not something to be done amongst polite company. However, unlike discussing pay, which can generate bad feelings, jealousy, poor morale and a range of other negative side effects, discussing website stats should provide a great learning opportunity for everyone taking part. With that said, in the name of transparency, let me offer a peek under the hood here at BrettRomero.com.

Overall Traffic

For those that have not looked at web traffic statistics, first a quick introduction. When it comes to web traffic, there are two primary measures of volume – sessions and page views. A session is a continuous period of time that one user spends on a website. One session can result in multiple page views – or just the one if the user leaves after reading one article as is often the case. Chart 1 below shows the traffic to BrettRomero.com, as measured in sessions per day.
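To illustrate the difference between the two measures, here is a minimal sketch that groups page views into sessions. It assumes a session ends after 30 minutes of inactivity, which is a simplification of how analytics tools define sessions; the user IDs and timestamps are invented.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cut-off


def count_sessions(page_views):
    """Count sessions in a list of (user_id, timestamp) page views."""
    sessions = 0
    last_seen = {}  # most recent page view time for each user
    for user, timestamp in sorted(page_views, key=lambda view: view[1]):
        previous = last_seen.get(user)
        # A new session starts on a user's first view or after a long gap.
        if previous is None or timestamp - previous > SESSION_TIMEOUT:
            sessions += 1
        last_seen[user] = timestamp
    return sessions


# Hypothetical page views from two users on one day.
views = [
    ("user_a", datetime(2015, 11, 3, 9, 0)),
    ("user_a", datetime(2015, 11, 3, 9, 5)),
    ("user_b", datetime(2015, 11, 3, 9, 2)),
    ("user_a", datetime(2015, 11, 3, 11, 0)),
]
session_count = count_sessions(views)
print(f"{len(views)} page views in {session_count} sessions")
```

Here user_a reads two pages in one sitting and comes back later for a third, so four page views become three sessions.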

Chart 1 – All Traffic – Daily


There are a couple of large peaks worth explaining in this chart. The first peak, on 3 November 2015, was the day I discovered just how much traffic Reddit.com can generate. I posted what, to that point, had been by far my most popular article – 4 Reasons Working Long Hours is Crazy – to the TrueReddit subreddit. The article quickly gained over 100 upvotes and, over the course of the day, generated well over 500 sessions. To put that in perspective, the traffic generated from that one post on Reddit in one day is greater than all traffic from LinkedIn and Twitter combined… for the entire time the blog has been online.

The second big peak on 29 December 2015 was also a Reddit generated spike (in fact, all four spikes post 3 November were from Reddit). In this instance it was the posting of the Traffic Accidents Involving Cyclists visualization to two subreddits – the DataIsBeautiful subreddit and the Canberra subreddit.

Aside from these large peaks though, the data as represented in Chart 1 is a bit difficult to decipher – there is too much noise on a day-to-day basis to really see what is going on. Chart 2 shows the same data at a weekly level.
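Rolling daily figures up to weekly totals is straightforward. The sketch below uses ISO week numbers from Python's standard library; the daily counts are hypothetical and are not the actual data behind the charts.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_sessions):
    """Sum a {date: sessions} mapping into {(ISO year, ISO week): sessions}."""
    totals = defaultdict(int)
    for day, sessions in daily_sessions.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week)] += sessions
    return dict(totals)


# Hypothetical daily session counts, including one large spike.
daily = {
    date(2015, 11, 2): 10,
    date(2015, 11, 3): 500,
    date(2015, 11, 9): 20,
}
weekly = weekly_totals(daily)
for (year, week), sessions in sorted(weekly.items()):
    print(f"{year} week {week}: {sessions} sessions")
```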

Chart 2 – All Traffic – Weekly


Looking at the weekly data, the broader trend seems to show two different periods for the website. The first period, from March to around August, has more consistent traffic, around 200 sessions a week, but with smaller spikes. The second period, from August onwards, shows less consistent traffic, around 50 sessions a week, but with much larger spikes. But how accurate is this data? Let’s break some of the statistics down.

Breakdown by Channel

When looking at web traffic using Google Analytics, there are a couple of breakdowns worth looking at. The first is the breakdown by ‘channel’ – or how users got to your website for a given session. The four channels are:

  1. Direct – the user typed your website URL directly into the address bar
  2. Referral – the user navigated to your site from another (non-social media) website by clicking on a link
  3. Social – the user accessed your website from a social media website (Facebook, Twitter, Reddit, LinkedIn and so on)
  4. Organic Search – a user searched for something in a search engine (primarily Google) and clicked on a search result to access your site.

The breakdown of sessions by channel for BrettRomero.com is shown in Table 1 below:

Table 1 – Breakdown by Channel

Channel Grouping | Sessions
Direct | 2,923
Referral | 2,776
Social | 2,190
Organic Search | 567
Total | 8,456

Referral Traffic

Looking at referral traffic specifically, Google Analytics allows you to view which specific sites you are getting referral traffic from. This is shown in Table 2.

Table 2 – Top Referrers

Rank | Source | Sessions
1 | floating-share-buttons.com | 706
2 | traffic2cash.xyz | 177
3 | adf.ly | 160
4 | free-share-buttons.com | 152
5 | snip.to | 74
6 | get-free-social-traffic.com | 66
7 | www.event-tracking.com | 66
8 | claim60963697.copyrightclaims.org | 63
9 | free-social-buttons.com | 57
10 | sexyali.com | 50
Total All Referral Traffic | 2,776

Looking at the top 10 referrers to BrettRomero.com, the first thing you may notice is that these site addresses look a bit… fake. You would be right. What you are seeing above is a prime example of what is known as ‘referrer spam’. In order to generate traffic to their sites, some unscrupulous people use a hack that tricks Google Analytics into recording visitors to your site coming from a URL they want you to visit. In short, they are counting on you looking at this data, getting curious and trying to work out where all this traffic is coming from. Over time these fake hits can build up to significant levels.

There are ways to customize your analytics to exclude traffic from certain domains, and initially I was doing this. However, I quickly realized that this spam comes from an almost unlimited number of domains and trying to block them all is basically a waste of time.

Looking at the full list of sites that have ‘referred’ traffic to my site, I can actually only find a handful of genuine referrals. These are shown in Table 3.

Table 3 – Genuine Referrers

Rank | Source | Sessions
17 | uberdriverdiaries.com | 35
18 | vladimiriii.github.io | 33
72 | australiancraftbeer.org.au | 3
76 | alexa.com | 2
95 | opendatakosovo.org | 1
Total Genuine Referral Traffic | 74
Total Referrer Spam | 2,702
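Because the spam domains are effectively unlimited, one way to tally the numbers is to count only sources known to be genuine and treat everything else as spam. The sketch below does this for the sources listed in Tables 2 and 3 only; it is an illustration of the idea rather than the method used to produce the totals above, and the function name is hypothetical.

```python
# Sources identified as genuine in Table 3.
GENUINE_SOURCES = {
    "uberdriverdiaries.com",
    "vladimiriii.github.io",
    "australiancraftbeer.org.au",
    "alexa.com",
    "opendatakosovo.org",
}

# Sessions by source: the ten rows of Table 2 plus the five rows of Table 3.
referral_sessions = {
    "floating-share-buttons.com": 706,
    "traffic2cash.xyz": 177,
    "adf.ly": 160,
    "free-share-buttons.com": 152,
    "snip.to": 74,
    "get-free-social-traffic.com": 66,
    "www.event-tracking.com": 66,
    "claim60963697.copyrightclaims.org": 63,
    "free-social-buttons.com": 57,
    "sexyali.com": 50,
    "uberdriverdiaries.com": 35,
    "vladimiriii.github.io": 33,
    "australiancraftbeer.org.au": 3,
    "alexa.com": 2,
    "opendatakosovo.org": 1,
}


def split_referrals(sessions_by_source, genuine_sources):
    """Return (genuine_sessions, spam_sessions) using an allowlist of sources."""
    genuine = sum(
        count for source, count in sessions_by_source.items()
        if source in genuine_sources
    )
    spam = sum(sessions_by_source.values()) - genuine
    return genuine, spam


genuine_total, spam_total = split_referrals(referral_sessions, GENUINE_SOURCES)
print(f"Genuine referral sessions: {genuine_total}")
print(f"Spam sessions (listed sources only): {spam_total}")
```

The spam figure here covers only the ten sources in Table 2, so it is lower than the 2,702 total, which includes every other spam domain in the full list.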

What does the total traffic look like if I exclude all the referrer spam? Chart 3 below shows the updated results.

Chart 3 – All Traffic Excluding Referrals


As can be seen, a lot of the traffic in the period March through August was actually coming from referrer spam. Although May still looks to have been a strong month, April, June and July now appear to be hovering around that baseline of 50 sessions a week.

Search Traffic

Search traffic is generally the key channel for website owners in the long term. Unlike traffic from social media or from referrals, it is traffic that is generated on an ongoing basis without additional effort (posting, promotion and so on) on the part of the website. As you would expect though, to get to the first page of search results for any combination of key words that is searched regularly is very difficult. In fact it is so difficult, an entire industry has developed around trying to achieve this – Search Engine Optimization or SEO.

For BrettRomero.com, search traffic has been difficult to come by for the most part. Below is a chart showing all search traffic since the website started:

Chart 4 – Search Traffic – All


Keeping in mind the y-axis in this chart is on a smaller scale than the previous charts, there doesn’t seem to be much pattern to this data. August again seemed to be a strong month, as well as the weeks in late May and early June. Recent months have been flatter, but more consistent.

Going one step further, Table 4 shows the keywords that were searched by users to access BrettRomero.com.

Table 4 – Top Search Terms

Rank | Keyword | Sessions
1 | (not provided) | 272
2 | beat with a shovel the weak google spots addons.mozilla.org/en-us/firefox/addon/ilovevitaly/ | 47
3 | erot.co | 45
4 | непереводимая.рф | 40
5 | “why you probably don’t need a financial advisor” | 33
6 | howtostopreferralspam.eu | 32
7 | sexyali.com | 16
8 | vitaly rules google ☆*:.。.゚゚・*ヽ(^ᴗ^)丿*・゚゚.。.:*☆ ¯\_(ツ)_/¯(•ิ_•ิ)(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^. .^=)oo | 14
9 | http://w3javascript.com | 13
10 | ghost spam is free from the politics, we dancing like a paralytics | 11

Again, we see something unexpected – most of the keywords are actually URLs or nonsensical phrases (or both). As you might suspect, this is another form of spam. Other website promoters are utilizing another hack – this one tricks Google Analytics into recording a search session, with the keyword being a message or URL the promoter wants to display. Looking at the full list, the only genuine search traffic appears to be the records for which keywords are not provided[1]. Chart 5 shows search traffic with the spam excluded.

Chart 5 – Search Traffic – Spam Removed


With the spam removed, we see something a little bit more positive. After essentially nothing from March through July, we see a spike in activity in August and September, before traffic falls back to a new baseline of around 5-10 sessions per week. Although this is obviously still minuscule, it does suggest that the website is starting to show up regularly in people’s searches.

Referring back to the total sessions over time, Chart 6 shows how removing the spam search impacts our overall number of sessions chart.

Chart 6 – All Traffic Excluding Referrals and Spam Search
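As a back-of-the-envelope check, the figures in Tables 1, 3 and 4 can be combined to estimate how many sessions remain once both kinds of spam are removed. This sketch assumes genuine referrals are kept and that only the "(not provided)" search sessions are genuine, so the result is an estimate and not a number reported by Google Analytics.

```python
TOTAL_SESSIONS = 8456            # Table 1, all channels
REFERRAL_SESSIONS = 2776         # Table 1, Referral channel
GENUINE_REFERRAL_SESSIONS = 74   # Table 3
ORGANIC_SEARCH_SESSIONS = 567    # Table 1, Organic Search channel
NOT_PROVIDED_SESSIONS = 272      # Table 4, keyword "(not provided)"

referrer_spam = REFERRAL_SESSIONS - GENUINE_REFERRAL_SESSIONS
search_spam = ORGANIC_SEARCH_SESSIONS - NOT_PROVIDED_SESSIONS
estimated_real_sessions = TOTAL_SESSIONS - referrer_spam - search_spam

print(f"Referrer spam sessions:  {referrer_spam:,}")
print(f"Search spam sessions:    {search_spam:,}")
print(f"Estimated real sessions: {estimated_real_sessions:,}")
print(f"Share of traffic that is spam: {1 - estimated_real_sessions / TOTAL_SESSIONS:.0%}")
```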

Social Traffic and the Reddit Effect

As was shown in Table 1, one of the two main sources of (real) traffic for the website is social media.

Social media provides a real bonus for people who are starting from zero. Most people now have large social networks they can utilize, allowing them to get their content in front of a lot of people from a very early stage. That said, there is a line and spamming your friends with content continuously is more likely to get you muted than generate additional traffic.

Publicizing content on social media can also be a frustrating experience. Competing against a never-ending flood of viral memes and mindless, auto-generated content designed specifically to generate clicks, can often feel like a lost cause. However, even though it seems like posts simply get lost amongst the tsunami of rubbish, social media is still generally a good indicator of how ‘catchy’ a given article is. Better content will almost always generate more likes/retweets/shares.

In terms of the effectiveness of each social media platform, Reddit and Facebook have proven to be the most effective for generating traffic by some margin. Table 5 shows sessions by social media source.

Table 5 – Sessions by Social Media Source

Rank | Social Network | Sessions
1 | Reddit | 999
2 | Facebook | 868
3 | Twitter | 224
4 | LinkedIn | 69
5 | Blogger | 26
6 | Google+ | 3
7 | Pocket | 1
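To put the figures in Table 5 in proportion, the short Python sketch below works out each network's share of social sessions. The numbers are copied from the table; the variable names are just for illustration.

```python
# Sessions by social network, copied from Table 5.
sessions_by_network = {
    "Reddit": 999,
    "Facebook": 868,
    "Twitter": 224,
    "LinkedIn": 69,
    "Blogger": 26,
    "Google+": 3,
    "Pocket": 1,
}

total_sessions = sum(sessions_by_network.values())

# Share of all social sessions for each network, as a percentage.
share_by_network = {
    network: 100 * sessions / total_sessions
    for network, sessions in sessions_by_network.items()
}

for network, share in share_by_network.items():
    print(f"{network}: {share:.1f}%")
```

On these figures, Reddit and Facebook together account for roughly 85% of the 2,190 social sessions.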

When looking at the above data, also keep in mind that I only started posting to Reddit at the start of November, effectively giving Facebook a 7-month head start. This means Reddit is by far the most effective tool I have found to date to get traffic to the website. However, there is a catch to posting on Reddit – the audience can be brutal.

Generally on Facebook, Twitter and LinkedIn, people who do not agree with your article will just ignore it. On Reddit, if people do not agree with you – or worse still, if they do not like your writing – they will comment and tell you. They will not be delicate. They will downvote your post (meaning they are actively trying to discourage other people from viewing it). Finally, just to be vindictive, they will downvote any comments you make as well. If you are planning to post on Reddit, make sure you read the rules of the subreddit (many explicitly ban people from promoting their own content) and try to contribute in ways that are not just self‑promotional.

Pages Visited

To finish, let’s look at one last breakdown for BrettRomero.com. Table 6 shows the top 10 pages viewed on BrettRomero.com.

Table 6 – 10 Most Viewed Pages

Rank | Page | Pageviews
1 | / | 4,345
2 | /wordpress/ | 1,450
3 | /wordpress/4-reasons-working-long-hours-is-crazy/ | 1,038
4 | /cyclist-accidents-act/ | 773
5 | /wordpress/climbing-mount-delusion-the-path-from-beginner-to-expert/ | 306
6 | /wordpress/the-dark-side-of-meritocracy/ | 205
7 | /wordpress/why-australians-love-fosters-and-other-beer-related-stories/ | 194
8 | /blog.html | 192
9 | /?from=http://www.traffic2cash.xyz/ | 177
10 | /wordpress/visualizations/ | 165

As mentioned earlier, 4 Reasons Working Long Hours is Crazy has been my most popular article by some margin. Although Reddit gave this article a boost traffic-wise, it was also comfortably the best performing article I have posted to Reddit, with over 100 upvotes. The next best performing, the Traffic Accidents Involving Cyclists visualization, only managed 20 upvotes.

Overall

As I mentioned at the outset, web traffic statistics tend to be a subject that is not openly discussed all that often. As a result, I have little idea how good or bad these statistics are. Given I have made minimal effort to promote my blog, generate back links (incoming links from other websites) or get my name out there by guest blogging, I suspect that these numbers are pretty unimpressive in the wider scheme of things. Certainly I am not thinking about putting up a pay wall any time soon anyway.

As unimpressive as the numbers may be though, I hope they have provided an interesting glimpse into the world of web analytics and, for those other bloggers out there, some sort of useful comparison.

 

Spotted something interesting that I missed? Please leave a comment!

 

[1] For further information on why the keywords are often not provided, this article has a good explanation.

5 Things I Learned in 2015

2015 has been an interesting year in many respects. A new country[1], a new language, a new job, and plenty of new experiences – both at work and in life in general. To get into the year-end spirit, I thought I would list out 5 key things I learned this year.

1. I Love Pandas

Yes, those pandas as well, who doesn’t? But I knew that well before 2015. The pandas I learned to love this year is a data analysis library for the programming language Python. “Whoa, slow down egg head” I hear you say. For those that are not regular coders, what that means is that pandas provides a large range of ways for people writing Python code to interact with data that makes life very easy.

Reading from and writing to Excel, CSV files and JSON (see lesson number 2) is super easy and fast. Manipulating large datasets in table-like structures (dataframes) – check. Slicing, dicing, aggregating – check, check and check. In fact, as a result of pandas, I have almost entirely stopped using R[2]. All the (mostly basic) data manipulation for which I used to use R, I now do in Python. Of course R still has an important role to play, particularly when it comes to complex statistical analysis, but that does not tend to come up all that regularly.
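To give a flavor of what that looks like in practice, here is a small sketch, assuming pandas is installed. The CSV content, column names and variable names are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV data, inlined so the example is self-contained.
csv_text = """city,year,visitors
Canberra,2014,120
Canberra,2015,150
Sydney,2014,300
Sydney,2015,340
"""

# Reading: load the CSV into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# Slicing: keep only the 2015 rows.
rows_2015 = df[df["year"] == 2015]

# Aggregating: total visitors per city.
visitors_by_city = df.groupby("city")["visitors"].sum()

# Writing: export the 2015 rows as JSON, one object per row.
json_text = rows_2015.to_json(orient="records")

print(visitors_by_city)
print(json_text)
```

In real use, `pd.read_csv` would normally be pointed at a file path rather than an inline string.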

2. JSON is Everywhere

JSON, JavaScript Object Notation for the uninitiated, is a data interchange format that has become the default way of transferring data online. Much of the data you see displayed on webpages is delivered as JSON, and that includes all the visualizations on this website.

JSON has two big advantages that have led to its current state of dominance. The first is that, as the name suggests, it is native to JavaScript – the key programming language interpreted, alongside HTML, by the browser you are reading this on. The second is that JSON is an extremely flexible way of representing data.

However, as someone who comes from a statistics and data background, as opposed to a technology background, JSON can take a while to get used to. The way data is represented in JSON is very different to the traditional tables of data that most people are used to seeing. Gone are the columns and rows, replaced with key-value pairs and lots of curly brackets – “{” and “}”. If you are interested in seeing what it looks like, there are numerous CSV to JSON converters online. This one even has a sample dataset to play with.
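If you would rather see the conversion in code than use an online tool, here is a minimal sketch using only Python's standard library. The two-row table and its column names are hypothetical.

```python
import csv
import io
import json

# Hypothetical two-row table in CSV form, inlined for illustration.
csv_text = """name,country,age
Alice,Australia,34
Bob,Canada,29
"""

# csv.DictReader turns each row into a dict keyed by the column headers.
table_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Dump the rows as JSON: a list of objects made of key-value pairs.
table_as_json = json.dumps(table_rows, indent=2)
print(table_as_json)
```

Each row of the table becomes its own set of curly brackets, with the column headers repeated as keys. Note that every value comes out as a string, since CSV carries no type information.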

If you do bother to take a look at some JSON, you will note that it is also much more verbose than your standard tabular format. A table containing 10 columns by 30 rows – something that could easily fit into one screen on a spreadsheet – runs to 300+ lines of JSON, depending on how it is structured. That does not make it easy to get an overview of the data for a human reader, but that overlooks what JSON is designed for – to be read by computers. The fact that a human can read it at all is seen as one of JSON’s strengths.
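The line count is easy to check for yourself. The sketch below builds a hypothetical 10 column by 30 row table as a list of row objects and counts the lines when it is pretty-printed with one key-value pair per line, which is one common layout among several.

```python
import json

NUM_COLUMNS = 10
NUM_ROWS = 30

# Build a hypothetical 10-column by 30-row table as a list of row objects.
wide_table = [
    {f"column_{col}": row * NUM_COLUMNS + col for col in range(NUM_COLUMNS)}
    for row in range(NUM_ROWS)
]

# Pretty-print with one key-value pair per line.
pretty_json = json.dumps(wide_table, indent=2)
json_line_count = len(pretty_json.splitlines())

print(f"{NUM_ROWS} rows x {NUM_COLUMNS} columns -> {json_line_count} lines of JSON")
```

With this layout each row takes 12 lines (10 key-value pairs plus an opening and closing curly bracket), so the table runs to 362 lines once the enclosing square brackets are counted. The same data written without indentation fits on a single very long line, which is why the count depends on how the JSON is structured.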

For those interested in working with data (or any web based technology), knowing how to read and manipulate JSON is becoming as important as knowing how to use a spreadsheet.

3. Free Tools are Great

There are some people working for software vendors who will read this and be happy I have a very small audience. Having worked in the public sector, for a large corporate and now for a small NGO, one thing I have been pleasantly surprised by in 2015 is the number and quality of free tools available online.

For general office administration there are office communicator applications (Slack), task management tools (Trello) and Google’s free replacements for Excel, Word and PowerPoint. For version control and code management there is GitHub. For data analysis, the aforementioned Python and R are both free and open source. For data storage, there is a huge range of free database technologies available, in both SQL (PostgreSQL, MySQL, SQLite3) and NoSQL (MongoDB, Redis, Cassandra) variations.

To be fair to my previous larger employers and my software-selling friends, most of these tools/applications do have significant catches. Many operate on a ‘freemium’ model. This means that for individuals and small organizations with relatively few users, the service is free (or next to free), but costs quickly rise when you need larger numbers of users and/or want access to additional features, typically the types of features larger organizations need. Many of the above also provide no tech support or guarantees, meaning that executives have no one to blame if the software blows up. If you are responsible for maintaining the personal data of millions of clients, that may not be a risk you are willing to take.

For small business owners and entrepreneurs however, these tools are great news. They bring down barriers to entry for small businesses and make their survival more dependent on the quality of the product rather than how much money they have. That is surely only a good thing.

4. Blogging is a Full Time Job

Speaking of starting a business, a common dream these days is semi-retiring somewhere warm and writing a blog. My realization this year from running a blog (if only part time) is just how difficult it is to get any traction. Aside from being able to write reasonably well, there are two main hurdles that anyone planning to become a full time blogger needs to overcome – note that I have not come close to accomplishing either of these:

  1. You have to generate large amounts of good quality content – at least 2-3 longer form pieces a week if you want to maintain a consistent audience. That may seem easy, but after you have quickly bashed out the 5-10 article ideas you have been mulling over, the grind begins. You will often be writing things that are not super interesting to you. You will often not be happy with what you have written. You will quickly realize that your favorite time is the time immediately after you have finished an article and your least favorite is when you need to start a new piece.
  2. You will spend more time marketing your blog than writing. Yep, if you want a big audience (big enough to generate cash to live on) you will need to spend an inordinate amount of time:
    • cold emailing other blogs and websites, asking them to link to your blog (‘generating back links’ in blogspeak)
    • ensuring everything on your blog is geared towards your blog showing up in people’s Google search results (Search Engine Optimization or SEO)
    • promoting yourself on Facebook
    • building a following on Twitter
    • contributing to discussions on Reddit and LinkedIn to show people you are someone worth listening to, and
    • writing guest blogs for other sites.

None of this is easy. Begging strangers for links, incorporating ‘focus words’ into your page titles and headings, posting links on Facebook to something you spent days writing, only to find you get one like (thanks Mum!). Meanwhile, some auto-generated, barely readable click-bait trash from ‘viralnova’ or ‘quandly’ (yes, I am deliberately not linking to those sites) is clocking up likes in the 5 figures. It can be downright depressing.

Of course, there are an almost infinite number of people out there offering their services to help with these things (I should know, they regularly comment on my articles telling me how one weird trick can improve my ‘on page SEO’). The problem is, the only real help they can give you is adding more things to the list above. On the other hand, if you are thinking about paid promotion (buying likes or a similar strategy) I’d recommend watching this video:

Still want to be a blogger? You’re welcome.

5. Do not be Afraid to Try New Things

One of the things that struck me in 2015 is how attached people get to doing things a certain way. To a large degree this makes sense: the more often you use/do something, the better you get at it. I am very good at writing SQL and using Excel – I have spent most of the last 10 years using those two things. As a result, I will often try to use those tools to solve problems because I feel most comfortable using them.

Where this becomes a problem is when you start trying to shoehorn problems into tools not just because you are comfortable with the tool, but to avoid using something you are less comfortable with. As you have seen above, two of the best things I learned this year were two concepts that were completely foreign to a SQL/Excel guy like me. But that is part of what made learning them so rewarding. I gained a completely new perspective on how data can be structured and manipulated and, even though I am far from an expert in those new skills, I now know they are available and which sorts of problems they are useful for.

So, do not be afraid to try new things, even if the usefulness of that experience is not immediately apparent. You never know when that skill might come in handy.

 

Happy New Year to everyone, I hope you have a great 2016!

 

[1] Or ‘Autonomous Province’ depending on your political views

[2] R is another programming language designed specifically for statistical analysis, data manipulation and data mining.

Traffic Accidents Involving Cyclists in the ACT

I’ve had a few days off lately and I decided to try something a bit different. Instead of writing an(other) lengthy article, I thought I would go back to my roots and actually look at some data. To that end I recently discovered a website for open data in Australia, data.gov.au. This website has literally thousands of interesting datasets released from all levels of government, covering everything from the tax bills of Australia’s largest companies to the locations of trees in Ballarat.

One of the first datasets that caught my eye was one published by the Australian Capital Territory (ACT) Government on traffic accidents involving cyclists. For those that don’t know, Canberra (the main city in the ACT) is a very bike friendly city and is home to a large number of recreational and more serious cyclists, so seeing where the accidents were/are occurring was something I thought would be interesting.

Using a few tools I have not used before (primarily Mapbox and leaflet.js), I put (slapped?) together an interactive map that uses the data provided and also gives you a few different ways of viewing it. The full version of the map can be accessed by clicking the picture below:

cyclist-map

 

See a bug? Found it particularly useful? Hate it? Leave a comment below!


© 2024 Brett Romero
