update
This commit is contained in:
190
html/learndatascience.md2.html
Normal file
@@ -0,0 +1,190 @@
<h1 id="data-science-tutorials-resources-for-beginners-awesome">Data
Science Tutorials & Resources for Beginners <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p><em>If you want to know more about Data Science but don’t know where
to start, this list is for you!</em> :chart_with_upwards_trend:</p>
<p>No previous knowledge is required, but Python and statistics basics
will definitely come in handy. These resources have been used
successfully by many beginners at my local Data Science student group
<a href="http://ml-ka.de/">ML-KA</a>.</p>
<h2 id="what-is-data-science">What is Data Science?</h2>
<ul>
<li><a href="https://www.quora.com/What-is-data-science">‘What is Data
Science?’ on Quora</a></li>
<li><a
href="https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1?share=1">Explanation
of important vocabulary</a> - How Big Data, Machine Learning, and Data
Science differ.</li>
<li><a href="https://amzn.to/2voPJUi">Data Science for Business
(Book)</a> - An introduction to Data Science and its use as a business
asset.</li>
<li><a href="https://www.scaler.com/blog/data-science-process/">Data
Science Process: A Beginner’s Comprehensive Guide</a> - Covers the
practical technical skills needed throughout the data science
process.</li>
</ul>
<h2 id="common-algorithms-and-procedures">Common Algorithms and
Procedures</h2>
<ul>
<li><a
href="https://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning">Supervised
vs unsupervised learning</a> - The two most common types of Machine
Learning algorithms.</li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.05-Naive-Bayes.ipynb">9
important Data Science algorithms and their implementation</a></li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.03-Hyperparameters-and-Model-Validation.ipynb">Cross
validation</a> - Evaluate the performance of your algorithm/model.</li>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.04-Feature-Engineering.ipynb">Feature
engineering</a> - Modify the data to improve model predictions.</li>
<li><a
href="http://www.cs.umd.edu/%7Esamir/498/10Algorithms-08.pdf">Scientific
introduction to 10 important Data Science algorithms</a></li>
<li><a
href="https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/">Model
ensemble: Explanation</a> - Combine multiple models into one for better
performance.</li>
</ul>
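<p>To make the cross validation entry above more concrete, here is a
minimal sketch in plain Python. The helper names
(<code>k_fold_indices</code>, <code>fit_line</code>,
<code>cross_validate</code>) and the toy data are made up for
illustration and are not taken from any of the linked resources.</p>

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cross_validate(xs, ys, k=5):
    """Return the mean absolute error on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        # Train only on the samples that are NOT in the held-out fold.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        # Evaluate on the held-out fold the model has never seen.
        errors.append(
            mean(abs(ys[i] - (slope * xs[i] + intercept)) for i in held_out)
        )
    return errors


# Toy data (hypothetical): a perfectly linear relationship.
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fold_errors = cross_validate(xs, ys, k=5)
print(len(fold_errors), "folds, mean error:", mean(fold_errors))
```

<p>Every sample is used for evaluation exactly once and never by the
model that was trained on it, which is what makes the averaged error a
fairer estimate than a score computed on the training data.</p>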
<h2 id="data-science-using-python">Data Science using Python</h2>
<p>This list covers only Python, as many are already familiar with this
language. For R, see <a href="https://github.com/ujjwalkarn/DataScienceR">Data
Science tutorials using R</a>.</p>
<h3 id="general">General</h3>
<ul>
<li><a href="https://amzn.to/2GSjjrK">O’Reilly Data Science from Scratch
(Book)</a> - Data processing, implementation, and visualization with
example code.</li>
<li><a
href="https://www.coursera.org/specializations/data-science-python">Coursera
Applied Data Science</a> - Online course using Python that covers most
of the relevant toolkits.</li>
</ul>
<h3 id="learning-python">Learning Python</h3>
<ul>
<li><a
href="https://www.youtube.com/watch?v=oVp1vrfL_w4&list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M">YouTube
tutorial series by sentdex</a></li>
<li><a href="http://www.learnpython.org/">Interactive Python tutorial
website</a></li>
</ul>
<h3 id="numpy">numpy</h3>
<p><a href="http://www.numpy.org/">numpy</a> is a Python library which
provides large multidimensional arrays and fast mathematical operations
on them.</p>
<ul>
<li><a
href="https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.h3DvLnk">Numpy
tutorial on DataCamp</a></li>
</ul>
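<p>As a rough illustration of what “multidimensional arrays and fast
mathematical operations” can look like in practice, the sketch below
builds a small 2-D array and applies whole-array operations without
explicit loops. The variable names and values are made up for this
example.</p>

```python
import numpy as np

# A 2x3 array: two rows (samples), three columns (features).
a = np.arange(6, dtype=float).reshape(2, 3)

# Reduce along axis 0 to get one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D row of means is subtracted from every row of a.
centered = a - col_means

# Matrix product of the centered data with itself (3x3 result).
gram = centered.T @ centered

print(col_means)
print(gram.shape)
```

<p>The subtraction works because numpy broadcasts the shape
<code>(3,)</code> array across the rows of the <code>(2, 3)</code>
array, so no Python-level loop is needed.</p>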
<h3 id="pandas">pandas</h3>
<p><a href="http://pandas.pydata.org/index.html">pandas</a> provides
efficient data structures and analysis tools for Python. It is built on
top of numpy.</p>
<ul>
<li><a
href="http://www.synesthesiam.com/posts/an-introduction-to-pandas.html">Introduction
to pandas</a></li>
<li><a
href="https://www.datacamp.com/courses/pandas-foundations">DataCamp
pandas foundations</a> - Paid course, but free for 30 days after account
creation (enough to complete the course).</li>
<li><a
href="https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf">Pandas
cheatsheet</a> - Quick overview of the most important functions.</li>
</ul>
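<p>A small sketch of the kind of data structure and analysis step pandas
is typically used for: a <code>DataFrame</code> with a derived column
and a grouped aggregate. The column names and numbers are invented for
illustration only.</p>

```python
import pandas as pd

# Hypothetical listings: one row per apartment.
df = pd.DataFrame(
    {
        "city": ["Berlin", "Berlin", "Karlsruhe", "Karlsruhe"],
        "price": [300, 340, 250, 270],
        "rooms": [3, 4, 3, 3],
    }
)

# Column arithmetic is vectorised, like the numpy arrays underneath.
df["price_per_room"] = df["price"] / df["rooms"]

# Split the rows by city and average the price within each group.
summary = df.groupby("city")["price"].mean()
print(summary)
```

<p>The result of <code>groupby(...).mean()</code> is a
<code>Series</code> indexed by city, so individual values can be looked
up with <code>summary["Berlin"]</code>.</p>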
<h3 id="scikit-learn">scikit-learn</h3>
<p><a href="http://scikit-learn.org/stable/">scikit-learn</a> is the
most common library for Machine Learning and Data Science in Python.</p>
<ul>
<li><a
href="https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.02-Introducing-Scikit-Learn.ipynb">Introduction
and first model application</a></li>
<li><a
href="http://scikit-learn.org/stable/tutorial/machine_learning_map/">Rough
guide for choosing estimators</a></li>
<li><a
href="http://scikit-learn.org/stable/user_guide.html">Scikit-learn
complete user guide</a></li>
<li><a
href="http://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/">Model
ensemble: Implementation in Python</a></li>
</ul>
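<p>For a first impression of a scikit-learn model application, the
sketch below trains a classifier on the bundled iris dataset and
evaluates it with 5-fold cross validation. The choice of
<code>LogisticRegression</code> and of <code>max_iter=1000</code> is an
assumption made for this example, not a recommendation from the linked
tutorials.</p>

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Any estimator with fit/predict could be swapped in here.
model = LogisticRegression(max_iter=1000)

# Fit and score the model on 5 different train/test splits.
scores = cross_val_score(model, X, y, cv=5)
print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

<p>Because all scikit-learn estimators share the same interface,
trying a different model usually means changing only the line that
constructs <code>model</code>.</p>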
<h3 id="jupyter-notebook">Jupyter Notebook</h3>
<p><a href="https://jupyter.org/">Jupyter Notebook</a> is a web
application for easy data visualisation and code presentation.</p>
<ul>
<li><a href="https://jupyter.org/install.html">Downloading and running
first Jupyter notebook</a></li>
<li><a
href="https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-instacart">Example
notebook for data exploration</a></li>
<li><a
href="https://elitedatascience.com/python-seaborn-tutorial">Seaborn data
visualization tutorial</a> - Plot library that works great with
Jupyter.</li>
</ul>
<h3 id="various-other-helpful-tools-and-resources">Various other helpful
tools and resources</h3>
<ul>
<li><a
href="https://github.com/drivendata/cookiecutter-data-science">Template
folder structure for organizing Data Science projects</a></li>
<li><a href="https://www.continuum.io/downloads">Anaconda Python
distribution</a> - Contains most of the important Python packages for
Data Science.</li>
<li><a href="https://spacy.io/">spaCy</a> - Open source toolkit for
working with text-based data.</li>
<li><a href="https://github.com/Microsoft/LightGBM">LightGBM gradient
boosting framework</a> - Successfully used in many Kaggle
challenges.</li>
<li><a href="https://aws.amazon.com/">Amazon AWS</a> - Rent cloud
servers for more time-consuming calculations (an r4.xlarge server is a
good place to start).</li>
</ul>
<h2 id="data-science-challenges-for-beginners">Data Science Challenges
for Beginners</h2>
<p>Sorted by increasing complexity.</p>
<ul>
<li><a
href="https://www.dataquest.io/blog/kaggle-getting-started/">Walkthrough:
House prices challenge</a> - A walkthrough of a simple challenge on
house prices.</li>
<li><a
href="https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/">Blood
Donation Challenge</a> - Predict if a donor will donate again.</li>
<li><a href="https://www.kaggle.com/c/titanic">Titanic Challenge</a> -
Predict survival on the Titanic.</li>
<li><a
href="https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/">Water
Pump Challenge</a> - Predict the operating condition of water pumps in
Africa.</li>
</ul>
<h2 id="more-advanced-resources-and-lists">More advanced resources and
lists</h2>
<ul>
<li><a
href="https://github.com/bulutyazilim/awesome-datascience">Awesome Data
Science</a></li>
<li><a href="https://github.com/ujjwalkarn/DataSciencePython">Data
Science Python</a></li>
<li><a
href="https://github.com/ujjwalkarn/Machine-Learning-Tutorials">Machine
Learning Tutorials</a></li>
</ul>
<h2 id="contribute">Contribute</h2>
<p>Contributions welcome! Read the <a
href="contributing.md">contribution guidelines</a> first.</p>
<h2 id="license">License</h2>
<p><a href="http://creativecommons.org/publicdomain/zero/1.0"><img
src="http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg"
alt="CC0" /></a></p>
<p>To the extent possible under law, Simon Böhm has waived all copyright
and related or neighboring rights to this work. Disclaimer: Some of the
links are affiliate links.</p>
<p><a
href="https://github.com/siboehm/awesome-learn-datascience">learndatascience.md
GitHub</a></p>