Are you interested in learning more about data science but don’t know where to start? This course will provide you with the key foundational knowledge any data scientist needs. It will prepare you for your career start in data science or further advanced learning in the field.
In this course, you will learn about the various activities that data scientists do. Do you want to predict whether a bank transaction was fraudulent? Do you want to forecast your company's sales in the upcoming months, so that enough inventory is prepared? Or do you want to assess which chemical compound has the biggest impact on the production of a protein used in your company's newly developed drug?
We will show you how to do all of that! We will walk you through multiple types of real-world data science projects. You will learn how to select suitable methods for each of these tasks, how these methods work, and which use cases they can be applied to.
What will you learn?
- Data Science fundamentals – data science and business thinking, an overview of machine learning problems and related solution methods, principles of data modelling – how data should be structured, train/test data split, model selection, and validation.
- Testing hypotheses, for example, whether a drug is effective in stopping a disease, or whether a company's investment in sustainability has a positive impact on its profits: linear regression, panel data regression – fixed and random effects.
- Solving classification tasks such as classifying bank transactions as fraudulent or not: Logistic regression, Decision Trees, Support Vector Machines methods.
- Solving regression tasks such as predicting train delays: Regression Trees, Support Vector Regressions, Random Forest methods.
- Forecasting time series data such as pollution levels or a country's population growth: stationarity concept, ARIMA, and Exponential Smoothing methods.
- Finding unseen patterns in your data, for example creating segments of your customers: Principal Component Analysis, k-means clustering, Hierarchical clustering.
Who is this course for?
- For people interested in data science and for those who want to get into the data science field.
- For people who still remember high school math a bit – terms like correlation or weighted average should sound familiar.
- Prerequisite: Úvod do programování 1: Python (Introduction to Programming 1: Python)
How do you finish the course?
- You will attend at least 7/9 lectures.
- You will submit at least one of two homework assignments. They will be focused on a simple application of presented methods in Python.
- Aneta Havlínová – is currently working at Workday as a Python developer for the People Analytics application – a tool that provides clients with automated analytics of their employees: ratio of men vs. women in management, hiring and attrition trends, racial diversity trends, and more. Recently, she did an internship at the Council of the European Union, where she trained her team in statistical modeling and Python programming. In previous years, she worked as a data scientist at MSD, where she used her knowledge to help lab scientists with modeling of biological processes and provided oncology marketing teams with insights based on financial data. She has a master's degree from the Institute of Economic Studies at Charles University in Prague.
- Justina Ivanauskaite – is an experienced data scientist with a background in statistics. Her expertise lies primarily in econometric modeling, statistics, and simulation. Currently, she is the data science lead of the Animal Health Advanced Analytics team, which supports research, new product development, manufacturing, and commercial aspects of animal health at MSD. Justina is interested in creating data science solutions that bring value to the client, with an emphasis on reusability and reproducibility.
- Pavel Fišer – is a data scientist in the animal health space with prior experience on various projects in R&D, manufacturing, and commercial areas. He enjoys projects where he can apply new approaches and create new methods that help businesses function more efficiently.
- Andrea Štefancová – is a graduate of the University of Economics in Prague with a master's degree in Econometrics and Operations Research. She works as a data scientist at MSD Animal Health with prior experience in clinical research, quality assurance, and global operations. Andrea mainly enjoys solving optimization problems. In her current project, she works on finding optimal safety stocks of animal medicine products in MSD distribution centers. She also has a lot of experience creating web-based applications in R, giving business users easy access to statistical models developed by data scientists.
- Thomas Browne – is a senior data scientist at Kiwi, where he focuses on using mathematical theory to address travel search problems with machine learning. He holds a PhD in probability and statistics for numerical simulators from Paris Cité University, France. He also has experience applying machine learning in the energy sector (reliability in nuclear plants) and the pharmaceutical industry (identification of key features in cancer drug development). On a much lighter note, he is a huge fan of indie/punk music and loves cooking.
- Michal Hakala – is a data scientist at MSD and a PhD candidate in econometrics at CERGE-EI (a joint workplace of Charles University and the Czech Academy of Sciences). His research is about modeling high-frequency financial time series. Michal has professional experience with modeling time series, developing various statistical models, and asset pricing in a continuous-time framework. He also has more than 4 years of experience teaching PhD courses on statistics, econometrics, and financial econometrics.
- Josef Švec – is currently working as a data scientist at Workday. As part of his work, he contributes to the development of an augmented analytics engine that detects interesting stories and anomalies in human capital data. He also analyses data about customer adoption of and interactions with the product. He has already coached two Czechitas courses, Python 1 and Introduction to Data Analysis. He has a master's degree in Applied Economics from CERGE-EI, a joint workplace of Charles University and the Economics Institute of the Czech Academy of Sciences.
- Martin Koryťák – is a data scientist and Python developer at Workday. He is one of the key contributors to its proprietary engine, which provides enthralling insights into HR data in a narrative form. Prior to joining Workday, Martin was an IBM Great Minds intern at IBM Research in Zurich, working on accelerated inference of tree-based models capable of handling large-scale data sets. He holds an M.Sc. degree in data science with a specialization in artificial intelligence from Czech Technical University in Prague. His interests span algorithms, neural networks, and interpretability of machine learning algorithms. He is also a member of the local AI community and an enthusiastic teacher of the Python programming language.
- Anna Štrobová – is a graduate of the University of St Andrews with a bachelor's degree in Neuroscience and a master's degree in Data Science. She is currently working as a data scientist for MSD Research Labs, specialising in the optimisation of clinical trial enrolment. Previously, she worked in chatbot development in the financial sector. Anna also has experience in combining data science with life science research, having worked in various labs across Europe.