The Machine Learning Hub

ML @ KAUST

Beginner

As an aspiring data scientist or machine learning researcher/practitioner, you will need to develop proficiency in the core data science tools.

We cover these tools (and more!) as part of the Introduction to Data Science Workshop Series, offered by the KAUST Visualization Lab (KVL) in both the Fall and Spring semesters.

The lesson materials for the Introduction to Data Science Workshop Series draw heavily on the Software Carpentry lessons on Python, R, SQL, Git, and Bash. They also use a number of domain-specific data science lessons developed by Data Carpentry that build on this core tool stack.

Intermediate

Once you have a basic understanding of the core data science tool stack, the next step is to gain experience with the main Python machine learning libraries, such as Pandas, Scikit-learn, TensorFlow (Keras), PyTorch, and PySpark.

While you need a grasp of all the core data science tools, you do not need to master every one of these machine learning libraries. Instead, focus your time on the libraries most relevant to the problems you are working on in your research.
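Here is a minimal sketch of how Pandas and Scikit-learn fit together. It loads a small table into a DataFrame, splits it into training and test sets, and fits a classifier. The toy data and the choice of `LogisticRegression` are illustrative assumptions and are not taken from any workshop material. The sketch assumes pandas and scikit-learn are installed.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: two numeric features and a binary label
df = pd.DataFrame({
    "x1": [0.1, 0.4, 0.35, 0.8, 0.9, 0.05, 0.6, 0.7],
    "x2": [1.0, 0.9, 0.8, 0.2, 0.1, 1.1, 0.3, 0.25],
    "label": [0, 0, 0, 1, 1, 0, 1, 1],
})

# Pandas handles column selection; Scikit-learn accepts DataFrames directly
X = df[["x1", "x2"]]
y = df["label"]

# Hold out 25% of the rows for testing, keeping the class balance (stratify)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the model on the training split and report accuracy on the test split
model = LogisticRegression()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Scikit-learn estimators share the same `fit`/`score` interface, so you can swap in a different model without changing the rest of the workflow.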

Books

There are a number of excellent books, all of which should be available from the University Library and have source code available for download from GitHub.

Courses

Python-based online data science training courses are a good way to dig deeper into a particular topic.

Advanced

Once you have experience with Pandas, Scikit-learn, and either PyTorch or TensorFlow (Keras), you can start exploring cutting-edge machine learning libraries such as NVIDIA RAPIDS, Dask, XGBoost, and Numba.

Again, the point is not to master all of these but rather to learn to use the libraries that help you solve your research problems most efficiently.

Courses

The following courses are deep dives into advanced topics.