<h1 id="data-science-tutorials-resources-for-beginners-awesome">Data
Science Tutorials &amp; Resources for Beginners <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p><em>If you want to know more about Data Science but don't know where
to start, this list is for you!</em> 📈</p>
<p>No previous knowledge is required, but basic Python and statistics
will come in handy. Many beginners at my local Data Science student group
<a href="http://ml-ka.de/">ML-KA</a> have used these resources
successfully.</p>
<h2 id="what-is-data-science">What is Data Science?</h2>
<ul>
<li><a href="https://www.quora.com/What-is-data-science">What is Data
Science? on Quora</a></li>
<li><a
href="https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1?share=1">Explanation
of important vocabulary</a> - How Big Data, Machine
Learning, and Data Science differ.</li>
<li><a href="https://amzn.to/2voPJUi">Data Science for Business
(Book)</a> - An introduction to Data Science and its use as a business
asset.</li>
<li><a href="https://www.scaler.com/blog/data-science-process/">Data
Science Process: A Beginner's Comprehensive Guide</a> - Covers the
practical technical skills needed throughout the data science process.</li>
</ul>
<h2 id="common-algorithms-and-procedures">Common Algorithms and
Procedures</h2>
<ul>
<li><a
href="https://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning">Supervised
vs unsupervised learning</a> - The two most common types of Machine
Learning algorithms.</li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.05-Naive-Bayes.ipynb">9
important Data Science algorithms and their implementation</a></li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.03-Hyperparameters-and-Model-Validation.ipynb">Cross
validation</a> - Evaluate the performance of your algorithm/model.</li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.04-Feature-Engineering.ipynb">Feature
engineering</a> - Modify the data so that models can make better predictions.</li>
<li><a
href="http://www.cs.umd.edu/%7Esamir/498/10Algorithms-08.pdf">Scientific
introduction to 10 important Data Science algorithms</a></li>
<li><a
href="https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/">Model
ensemble: Explanation</a> - Combine multiple models into one for better
performance.</li>
</ul>
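<p>To make the cross validation entry above concrete, here is a minimal
sketch of k-fold cross validation using only the Python standard library.
The helper names (<code>k_fold_indices</code>, <code>predict_1nn</code>,
<code>cross_validate</code>), the toy data, and the nearest-neighbour
"model" are assumptions made for illustration; they are not taken from any
of the linked resources.</p>

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def predict_1nn(train_x, train_y, x):
    """Toy model: return the label of the training point closest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def cross_validate(xs, ys, k=5):
    """Return one accuracy score per fold; every sample is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held_out_set]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Score the model only on the samples it did not see during "training".
        correct = sum(predict_1nn(train_x, train_y, xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


# Made-up, well-separated toy data: small values are class 0, large values class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 5.1, 5.2, 5.3, 5.4, 5.5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_validate(xs, ys, k=5)
print("accuracy per fold:", scores)
print("mean accuracy:", mean(scores))
```

<p>Averaging the per-fold scores usually gives a steadier estimate than a
single train/test split, because every sample is used for testing exactly
once.</p>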
<h2 id="data-science-using-python">Data Science using Python</h2>
<p>This list covers only Python, as many people are already familiar with
the language. For R, see <a href="https://github.com/ujjwalkarn/DataScienceR">Data
Science tutorials using R</a>.</p>
<h3 id="general">General</h3>
<ul>
<li><a href="https://amzn.to/2GSjjrK">O'Reilly Data Science from Scratch
(Book)</a> - Data processing, implementation, and visualization with
example code.</li>
<li><a
href="https://www.coursera.org/specializations/data-science-python">Coursera
Applied Data Science</a> - Online Course using Python that covers most
of the relevant toolkits.</li>
</ul>
<h3 id="learning-python">Learning Python</h3>
<ul>
<li><a
href="https://www.youtube.com/watch?v=oVp1vrfL_w4&amp;list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M">YouTube
tutorial series by sentdex</a></li>
<li><a href="http://www.learnpython.org/">Interactive Python tutorial
website</a></li>
</ul>
<h3 id="numpy">numpy</h3>
<p><a href="http://www.numpy.org/">numpy</a> is a Python library which
provides large multidimensional arrays and fast mathematical operations
on them.</p>
<ul>
<li><a
href="https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.h3DvLnk">Numpy
tutorial on DataCamp</a></li>
</ul>
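<p>As a rough illustration of "multidimensional arrays and fast
mathematical operations", the sketch below builds a small 2-D array and
works on whole columns at once. It assumes numpy is installed; the values
and variable names are made up for this example.</p>

```python
import numpy as np

# A 2-D array: 3 rows (samples) and 2 columns (features); values are made up.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Operations apply to whole arrays at once, without explicit Python loops.
column_means = data.mean(axis=0)   # mean of each column
centered = data - column_means     # broadcasting subtracts the means from every row

print(data.shape)
print(column_means)
print(centered)
```

<p>Subtracting a 1-D array of column means from a 2-D array relies on
broadcasting, which is covered in the linked tutorial.</p>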
<h3 id="pandas">pandas</h3>
<p><a href="http://pandas.pydata.org/index.html">pandas</a> provides
efficient data structures and analysis tools for Python. It is built on
top of numpy.</p>
<ul>
<li><a
href="http://www.synesthesiam.com/posts/an-introduction-to-pandas.html">Introduction
to pandas</a></li>
<li><a
href="https://www.datacamp.com/courses/pandas-foundations">DataCamp
pandas foundations</a> - Paid course, but new accounts get 30 free days
(enough to complete the course).</li>
<li><a
href="https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf">Pandas
cheatsheet</a> - Quick overview of the most important functions.</li>
</ul>
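<p>A minimal sketch of what working with pandas can look like, assuming
pandas is installed. The column names and numbers are invented for
illustration. It builds a small <code>DataFrame</code>, aggregates it per
group, and shows that a column can be handed back as a numpy array.</p>

```python
import pandas as pd

# A small table with made-up values: one row per listing.
df = pd.DataFrame({
    "city": ["Karlsruhe", "Berlin", "Karlsruhe", "Berlin"],
    "price": [300, 450, 320, 470],
})

# Group the rows by city and compute the mean price within each group.
mean_price = df.groupby("city")["price"].mean()

# The underlying column data is available as a numpy array.
values = df["price"].to_numpy()

print(mean_price)
print(type(values))
```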
<h3 id="scikit-learn">scikit-learn</h3>
<p><a href="http://scikit-learn.org/stable/">scikit-learn</a> is the
most commonly used library for Machine Learning and Data Science in Python.</p>
<ul>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.02-Introducing-Scikit-Learn.ipynb">Introduction
and first model application</a></li>
<li><a
href="http://scikit-learn.org/stable/tutorial/machine_learning_map/">Rough
guide for choosing estimators</a></li>
<li><a
href="http://scikit-learn.org/stable/user_guide.html">Scikit-learn
complete user guide</a></li>
<li><a
href="http://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/">Model
ensemble: Implementation in Python</a></li>
</ul>
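<p>The following sketch shows the general shape of a first scikit-learn
model, assuming scikit-learn is installed. The bundled iris dataset and the
k-nearest-neighbours classifier are arbitrary choices for illustration; the
estimator guide linked above can help pick a model for real data.</p>

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Small example dataset that ships with scikit-learn: features X, labels y.
X, y = load_iris(return_X_y=True)

# Estimators share the same interface: create, fit, predict.
model = KNeighborsClassifier(n_neighbors=5)

# 5-fold cross validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print("mean accuracy:", scores.mean())

# Fit on all data, then predict the class of the first three samples.
model.fit(X, y)
predictions = model.predict(X[:3])
print(predictions)
```

<p>Most scikit-learn estimators follow the same <code>fit</code> and
<code>predict</code> pattern, so swapping in a different model usually
means changing only the line that creates it.</p>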
<h3 id="jupyter-notebook">Jupyter Notebook</h3>
<p><a href="https://jupyter.org/">Jupyter Notebook</a> is a web
application for easy data visualisation and code presentation.</p>
<ul>
<li><a href="https://jupyter.org/install.html">Downloading and running
first Jupyter notebook</a></li>
<li><a
href="https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-instacart">Example
notebook for data exploration</a></li>
<li><a
href="https://elitedatascience.com/python-seaborn-tutorial">Seaborn data
visualization tutorial</a> - Plot library that works great with
Jupyter.</li>
</ul>
<h3 id="various-other-helpful-tools-and-resources">Various other helpful
tools and resources</h3>
<ul>
<li><a
href="https://github.com/drivendata/cookiecutter-data-science">Template
folder structure for organizing Data Science projects</a></li>
<li><a href="https://www.continuum.io/downloads">Anaconda Python
distribution</a> - Contains most of the important Python packages for
Data Science.</li>
<li><a href="https://spacy.io/">spaCy</a> - Open-source toolkit for
working with text-based data.</li>
<li><a href="https://github.com/Microsoft/LightGBM">LightGBM gradient
boosting framework</a> - Successfully used in many Kaggle
challenges.</li>
<li><a href="https://aws.amazon.com/">Amazon AWS</a> - Rent cloud
servers for more time-consuming calculations (an r4.xlarge instance is a good
place to start).</li>
</ul>
<h2 id="data-science-challenges-for-beginners">Data Science Challenges
for Beginners</h2>
<p>Sorted by increasing complexity.</p>
<ul>
<li><a
href="https://www.dataquest.io/blog/kaggle-getting-started/">Walkthrough:
House prices challenge</a> - Step-by-step guide to a simple challenge on
house prices.</li>
<li><a
href="https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/">Blood
Donation Challenge</a> - Predict if a donor will donate again.</li>
<li><a href="https://www.kaggle.com/c/titanic">Titanic Challenge</a> -
Predict survival on the Titanic.</li>
<li><a
href="https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/">Water
Pump Challenge</a> - Predict the operating condition of water pumps in
Africa.</li>
</ul>
<h2 id="more-advanced-resources-and-lists">More advanced resources and
lists</h2>
<ul>
<li><a
href="https://github.com/bulutyazilim/awesome-datascience">Awesome Data
Science</a></li>
<li><a href="https://github.com/ujjwalkarn/DataSciencePython">Data
Science Python</a></li>
<li><a
href="https://github.com/ujjwalkarn/Machine-Learning-Tutorials">Machine
Learning Tutorials</a></li>
</ul>
<h2 id="contribute">Contribute</h2>
<p>Contributions welcome! Read the <a
href="contributing.md">contribution guidelines</a> first.</p>
<h2 id="license">License</h2>
<p><a href="http://creativecommons.org/publicdomain/zero/1.0"><img
src="http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg"
alt="CC0" /></a></p>
<p>To the extent possible under law, Simon Böhm has waived all copyright
and related or neighboring rights to this work. Disclaimer: Some of the
links are affiliate links.</p>
<p><a
href="https://github.com/siboehm/awesome-learn-datascience">learndatascience.md
on GitHub</a></p>