Kaggle data science meetup at Futurice

What’s Kaggle? It’s a platform for organizing data science competitions. It provides students and data science enthusiasts with an opportunity to develop their skills and win some money in competitions.

An active community has grown around Kaggle meetups, and the people involved share their knowledge and scripts openly. Kagglers and data scientists in Helsinki were looking for a place to organize an event, and we were more than happy to provide a space for them. So the data science community in Helsinki got together at Futurice’s office on February 24.

We had three speakers with three interesting topics: Antti Rauhala (Futurice) introduced Futurice’s approach to data science, Michael Wallin gave a talk about Kaggle competitions, and Timo Aho (Yle) presented Yle’s data architecture.

The Futuway of doing data science

In the first presentation, Antti Rauhala talked about data-driven services built by Futurice. We believe that data-driven services will only increase in importance in the future and that they go far beyond recommendation services. The applications and services of the future will have to learn to adapt to the user’s needs on a variety of levels - not the other way around, as has been the practice. Antti presented a few case studies where customer satisfaction and engagement increased substantially through data-driven solutions such as content recommendation.

Kaggle data science competitions and beyond

The second speaker of the event was Michael Wallin, an MSc student in Industrial Engineering and Management at Aalto University. He’s had a fair amount of success in Kaggle competitions. The key messages in his presentation were:

  • Getting started is easy. Just take an example script, modify it if you like, and run it
  • Feature engineering is the most important part of a successful data processing pipeline
  • Shortcomings in Kaggle competitions:
    • Competitions are divorced from real-life applications
      • Data is cleaned and given in a nice format. In real life, data is typically stored in many locations and formats, and it contains missing values.
    • Competitors can try to optimize model hyperparameters on the test data (the leaderboard)
      • And hope that they do not overfit to the test data
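The leaderboard point is easy to see in a small simulation. The sketch below is our own illustration, not something taken from the talk: every "model" is a coin flip with a true accuracy of 0.5, and the function name, split sizes and number of submissions are all made-up assumptions. Picking the submission with the best score on a small public split still yields a flattering number, while the same submission scores close to chance on a larger private split.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def leaderboard_selection(n_models=200, n_public=100, n_private=5000, seed=0):
    """Pick the best of many random 'models' by public score.

    Hypothetical setup: labels and predictions are fair coin flips,
    so no model is truly better than 0.5 accuracy.
    Returns (public score, private score) of the selected model.
    """
    rng = random.Random(seed)
    public = [rng.getrandbits(1) for _ in range(n_public)]
    private = [rng.getrandbits(1) for _ in range(n_private)]

    best_public, best_private = -1.0, 0.0
    for _ in range(n_models):
        # One "submission": random predictions for both splits.
        pub_preds = [rng.getrandbits(1) for _ in range(n_public)]
        priv_preds = [rng.getrandbits(1) for _ in range(n_private)]
        pub_score = accuracy(pub_preds, public)
        # Selection only ever looks at the public (leaderboard) score.
        if pub_score > best_public:
            best_public = pub_score
            best_private = accuracy(priv_preds, private)
    return best_public, best_private


if __name__ == "__main__":
    pub, priv = leaderboard_selection()
    print(f"public leaderboard score: {pub:.3f}")
    print(f"private (final) score:    {priv:.3f}")
```

The gap between the two printed scores comes purely from selecting on a small sample. Tuning against a local validation split or cross-validation, and submitting sparingly, is the usual way to keep that gap small.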

Slides for his presentation can be found here: https://docs.google.com/presentation/d/1uo_zBynNWFpgKQdjHMG-giHO8A-IhIQmD0O3_KESSNY/edit#slide=id.g10002cc486_0_70

Real life data science

Timo Aho, PhD, is a data scientist at the Finnish Broadcasting Corporation Yle and an expert on the real-life use of data science and big data as well as the insights they provide. The key messages in his presentation were:

  • An introduction to Yle’s advanced data architecture behind content recommendation, dashboards and strategic planning
  • They record tens of millions of events every day and process and store the data on an AWS stack that includes Kinesis, Lambda, RDS, S3 and Elastic Beanstalk

Timo’s talk also showed how much work is needed before a dataset can be used for a Kaggle competition or for developing content recommendation and other predictive models.

Link to Timo’s presentation: https://docs.google.com/presentation/d/11gvOvalJf7QoL5u6VzNXprHJ_p1vrZnURSm8OQhP_s0/edit?usp=sharing

Summary

Based on the feedback we received, the event was a great success. We had good talks and some great discussion. We hope that it encouraged participants to team up with each other and participate in future Kaggle data science competitions. Since it went so well, we'll definitely organize more data science events in the future, so stay tuned. While waiting for the next event, you can check upcoming Data Science Helsinki meet-ups at http://www.meetup.com/Helsinki-Data-Analytics-Science-Meetup/

Did we already mention that we are hiring? If you are interested in joining the Futurice data science team, please fill in the application: http://futurice.com/open-positions/software-engineer-slash-data-scientist