The very first time I saw the data in .CSV, I just wanted to put it on the map ASAP, because I already pictured how great it would be to see all of the crashes in the city, where they actually happened. I quickly sorted the data by city district and threw it into the carto.com interface. The result was astonishing even for me: firstly, how vivid crashes become when you make an interactive timeline animation of them, and secondly, how much better other people grasp them that way too. Within 24 hours my map was covered by 5 online newspapers, two of them local and three federal. It looked like, for the first time, people actually saw how bad it was out there, behind the crash statistics.
In the process, I noticed that the data I was working with had major flaws. For example, one third of the points (each representing one crash) fell outside the city boundaries, perhaps because of bad geocoding, manual geocoding, or just a lack of motivation among police officers and poor quality control on this data. To set the scale, one third is around 7 thousand points/rows in the table. Fixing this data by hand would have been inefficient and downright crazy, so I decided to do some programming instead. The problem was that all I had in my bag of knowledge was a slight memory of the HTML I learned in school and the basics of Python I had picked up about a year earlier. By basics I actually mean basics. But Python looked like a good fit for the task (and would help me in the future), so I bought a course on Udemy and improved my skills to the point where I could manage this task.
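To give an idea of what the first step of such a cleanup might look like, here is a minimal sketch that separates points inside a rough bounding box from points outside it. It is not the exact code I used: the column names `lat` and `lon` and the box coordinates are hypothetical, and a real city boundary is a polygon rather than a rectangle, so this is only a coarse first filter.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Rectangular approximation of a city boundary (hypothetical values)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def split_by_boundary(rows, box):
    """Split crash rows into (inside, outside) lists.

    Rows with missing or unparseable coordinates are treated as outside,
    so they end up in the pile that needs fixing.
    """
    inside, outside = [], []
    for row in rows:
        try:
            # "lat" and "lon" are assumed column names, not the real ones.
            lat, lon = float(row["lat"]), float(row["lon"])
        except (KeyError, TypeError, ValueError):
            outside.append(row)
            continue
        (inside if box.contains(lat, lon) else outside).append(row)
    return inside, outside
```

Rows read with `csv.DictReader` can be passed in directly. The `outside` list is then the set of points that needs re-geocoding, instead of 7 thousand rows checked by hand.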
Soon it was obvious that Python was a good choice: not only did I tackle the analysis of the data at hand, but I also parsed the data for 3 whole years (2015-2017) in one go and used it afterwards for the analysis. To put this in context, the website with the original data lets you download only 14 days at a time, for some silly reason. Well, what a shame.
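The 14-day limit mostly comes down to splitting a long period into short windows and requesting them one by one. Here is a sketch of that splitting step only. The function name `date_windows` is made up for illustration, I am assuming the 14 days are counted inclusively, and the actual download request is left out because it depends on the website.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=14):
    """Yield inclusive (window_start, window_end) pairs covering start..end.

    Each window spans at most max_days calendar days; the last one may be
    shorter.
    """
    one_day = timedelta(days=1)
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=max_days) - one_day, end)
        yield current, window_end
        current = window_end + one_day


# Hypothetical usage: one download per window for 2015-2017.
for window_start, window_end in date_windows(date(2015, 1, 1), date(2017, 12, 31)):
    pass  # request and parse the data for this window here
```

For the three years 2015-2017 this yields 79 windows, which is a lot of clicking by hand but nothing for a loop.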
With Python it was quite easy to analyze the data by street, season, and time of day, and to fix any bugs faster. When the service I intend to make goes live, the back end will be implemented in Python.
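As a rough illustration of that kind of breakdown, the sketch below counts crashes per hour of day, per season, and per street using only the standard library. The field names `datetime` and `street`, the timestamp format, and the choice of meteorological seasons for the northern hemisphere are all assumptions for the example, not the layout of the real data.

```python
from collections import Counter
from datetime import datetime

# Month number -> season (meteorological, northern hemisphere; an assumption).
SEASON_BY_MONTH = {
    month: season
    for season, months in {
        "winter": (12, 1, 2),
        "spring": (3, 4, 5),
        "summer": (6, 7, 8),
        "autumn": (9, 10, 11),
    }.items()
    for month in months
}


def summarize(rows, timestamp_field="datetime", fmt="%Y-%m-%d %H:%M"):
    """Return three Counters: crashes by hour, by season, and by street."""
    by_hour, by_season, by_street = Counter(), Counter(), Counter()
    for row in rows:
        ts = datetime.strptime(row[timestamp_field], fmt)
        by_hour[ts.hour] += 1
        by_season[SEASON_BY_MONTH[ts.month]] += 1
        by_street[row["street"]] += 1
    return by_hour, by_season, by_street
```

Calling `by_street.most_common(10)` on the result would then list the ten streets with the most crashes.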
Below you can see 4 pictures, generally illustrating the stages of my workflow to date.