Why Kaggle Is the Best Place to Learn New Hard Skills
Hi! Glad to have you back. While writing my last article, where I mostly grumbled about Hacker Rank (by the way, if you missed it, I recommend following the link and reading it), I had a great idea for a post. Why not tell you what I think about Kaggle and share why I consider it one of the best sandboxes out there right now?
I will try to be as objective as possible, which is very difficult because I am a big fan of what the Kaggle developers are doing. I will also publish this article on my profile, so if by any chance one of the developers reads it: you are awesome, keep up the good work. So, let's start.
This small section is for those who do not know what Kaggle is. Kaggle is a sandbox for data specialists, that is, for Data Scientists and Data Analysts. As in any sandbox, you can find something to your liking: you can compile and publish datasets, visualize data from other people's datasets, and make predictions on other people's datasets or your own, all in public. That means everyone can comment on, criticize, correct, or admire your beautiful charts, and so on.
To be honest, I would not call Kaggle an educational platform. After all, an educational platform implies that the user learns something, receives certificates, and so on. Of course, there are courses on Kaggle, but they are very short and superficial. Learning here happens mainly thanks to the community, thanks to other users. The community is exactly where I want to start.
So, the first advantage of Kaggle is the community. I have never met such kind and, most importantly, helpful people anywhere. I have been using Kaggle since 2019, and no one has ever scoffed or laughed at my data analysis, although they had every reason to, believe me. In 2019, I built charts so bad that it would have been more efficient to draw them with a pencil on paper and attach a photo to the notebook.
On Kaggle, I mostly published datasets. It has always been more interesting for me to write parsers and collect data than to analyze it. Each of my datasets was warmly received, even though the data quality of some of them clearly left much to be desired (at that time I was just practicing and learning to write parsers). Even my last post on Kaggle (here it is, by the way), despite being dry text because images cannot be added to posts (Kaggle, please add this option), was warmly received, and there was a heated discussion in the comments, which I always love.
The most important thing is that the community is always ready to help. In the comments on any dataset, any code, or any discussion, people will give you a hint or a link to a resource where you can read about the topic in more detail. In short, there will always be someone to point you in the right direction.
Speaking of community, HackTheBox immediately comes to mind, a similar sandbox for penetration testers and information security specialists. Forgive me, but the community there is toxic. I used to hang out in that sandbox while I was at university, hoping to learn something, but all the other users did was make dirty jokes and bring up my mother. It's not nice.
I partially wrote about this in this article, but in order not to repeat myself, I will try to give new arguments and examples. I am a big fan of the general concept of Kaggle. Yes, there is a huge number of sandboxes now where you can practice what you want, when you want, but Kaggle is designed so that users do not hunt for answers, and this is a big difference.
Kaggle Trending Datasets
Answers for other people's datasets are simply meaningless here. What would be the point of them, when people come here to practice their skills or gain new ones? Here, as elsewhere, people chase internal rewards, medals, and a high place on the leaderboard, but unlike on Hacker Rank, these places can only be earned with your own work and your own code. It is impossible to copy answers here, and there would not be much sense in doing so anyway.
The competitions go without saying. Competitions on Kaggle are very difficult for a simple user who spends half an hour picking the color of a horizontal bar (that would be me). These are serious events with serious rules, organized, among others, by companies looking to hire. Answers can only be found for the training datasets, and even then they come from the developers as part of onboarding on Kaggle.
The whole concept can be put into one simple sentence: Kaggle took the best of the usual sandboxes (medals, titles, a leaderboard) and devalued any external influence (answers or reviews).
The Learn section has always bugged me. I used to not understand why the courses on Kaggle are so modest. They are very short and superficial, and the material itself feels rushed and incomplete, like black coffee: nothing seems wrong with it, but it is just not the same without milk and sugar. But then I realized that the Learn section was essentially meant to be something like onboarding: several courses that take no more than an hour and help you join the sandbox, understand how it works, learn how to work with datasets, find out what competitions are and how to participate in them, figure out which programming languages can be used in notebooks, and so on.
There are no right or wrong answers. There are certificates, but in fact they are not important and do not affect anything. There are competitions on Kaggle (the analogue of certificates on Hacker Rank), where participants compete in different areas of data work, from forecasting to visualization. And there are no right answers in these competitions. You may know the results of others, you may see the code of others, but you cannot win simply by copying from more experienced participants. There is real competition there, where personal skill matters. The answers to these competitions are impossible to google.
If the point was to show the user how to interact with the platform and present the main features, then the developers have done their job. The course format in this case is much more effective than boring video lectures. However, if the idea was different and the courses were really aimed at teaching someone something, then this is clearly not enough.
I think the Learn section is the weak link. Because of it, Kaggle is ideal for practice and for learning something on your own, but it is clearly inferior to other platforms that have more comprehensive and complete courses.
Each sandbox has a reward system. It is needed to keep up the competition between participants and thus bring the sandbox to life. There is a reward system on Hacker Rank (the terrible reward system I talked about here), on Hack The Box, and on a lot of other sites, but Kaggle is different in that your rank is determined by others. Other users upvote you, other users praise you or point out your shortcomings, and this is... unique. I've come across a lot of sandboxes, but this is the first time I have seen such a reward system (correct me if I'm wrong).
Real democracy. In a sandbox where there are no right or wrong answers, this is clearly the best option: what else would you give rewards for? What I also like about this system is that no matter what dataset you work with, it has exactly the same chance of becoming popular and receiving many awards as any other.
For example, I collected data on the rise in the price of wheat over the past few decades, and I also wrote a parser to collect a dataset with the locations of all Pizza Hut restaurants. When I first published this data, it seemed obvious to me which one would become more popular, but no, I was wrong. Here all datasets are equal, and you will never guess which one will get a gold medal and which one will go unnoticed.
I tried to show you why Kaggle is the best place to learn new hard skills and practice the ones you already have. I tried to be as objective as possible, despite all my sympathy for this sandbox. I like the community that will always help, and I like the concept itself and the reward system, which is different from other sandboxes. Yes, I consider the Learn section weak and not fully developed, but I never said that this is a learning platform. Here users learn from each other, and that's what's important!