Data Science Tutorials & Resources for Beginners - Awesome (https://github.com/sindresorhus/awesome)

If you want to know more about Data Science but don't know where to start, this list is for you! 📈

No previous knowledge is required, but Python and statistics basics will definitely come in handy. These resources have been used successfully by many beginners at my local Data Science student group ML-KA (http://ml-ka.de/).

What is Data Science?

- 'What is Data Science?' on Quora (https://www.quora.com/What-is-data-science)
- Explanation of important vocabulary (https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1?share=1) - Differentiation of Big Data, Machine Learning, Data Science.
- Data Science for Business (Book) (https://amzn.to/2voPJUi) - An introduction to Data Science and its use as a business asset.

Common Algorithms and Procedures

- Supervised vs unsupervised learning (https://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning) - The two most common types of Machine Learning algorithms.
- 9 important Data Science algorithms and their implementation (https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.05-Naive-Bayes.ipynb)
- Cross validation (https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.03-Hyperparameters-and-Model-Validation.ipynb) - Evaluate the performance of your algorithm / model.
- Feature engineering (https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.04-Feature-Engineering.ipynb) - Modify the data to improve model predictions.
- Scientific introduction to 10 important Data Science algorithms (http://www.cs.umd.edu/%7Esamir/498/10Algorithms-08.pdf)
- Model ensemble: Explanation (https://www.analyticsvidhya.com/blog/2017/02/introduction-to-ensembling-along-with-implementation-in-r/) - Combine multiple models into one for better performance.

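To make the cross validation entry in the list above more concrete, here is a minimal sketch in plain Python. The helpers `k_fold_indices`, `cross_validate`, `fit_majority`, and `predict_majority` are hypothetical names made up for this illustration, and the data is a toy example, not taken from any of the linked resources. In practice you would more likely reach for scikit-learn's `cross_val_score`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs, ys, fit, predict, k=5):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(correct / len(test))
    return scores


# Toy "model" (an assumption for illustration): always predict the most
# frequent label seen during training.
def fit_majority(xs, ys):
    return max(sorted(set(ys)), key=ys.count)


def predict_majority(model, x):
    return model


xs = list(range(10))
ys = [0] * 7 + [1] * 3  # toy labels, sorted on purpose
scores = cross_validate(xs, ys, fit_majority, predict_majority, k=5)
print(scores)
print(sum(scores) / len(scores))
```

Because the toy labels are sorted and the folds are not shuffled, the last folds contain mostly labels the model rarely saw during training, so their scores drop. This is one reason real cross-validation setups usually shuffle or stratify the data first.
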
Data Science using Python
This list covers only Python, as many are already familiar with this language. For R, see Data Science tutorials using R (https://github.com/ujjwalkarn/DataScienceR).

General

- O'Reilly Data Science from Scratch (Book) (https://amzn.to/2GSjjrK) - Data processing, implementation, and visualization with example code.
- Coursera Applied Data Science (https://www.coursera.org/specializations/data-science-python) - Online Course using Python that covers most of the relevant toolkits.

Learning Python

- YouTube tutorial series by sentdex (https://www.youtube.com/watch?v=oVp1vrfL_w4&list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M)
- Interactive Python tutorial website (http://www.learnpython.org/)

numpy
numpy (http://www.numpy.org/) is a Python library which provides large multidimensional arrays and fast mathematical operations on them.

- Numpy tutorial on DataCamp (https://www.datacamp.com/community/tutorials/python-numpy-tutorial#gs.h3DvLnk)

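As a small illustration of what "multidimensional arrays and fast mathematical operations" means in practice, the sketch below builds a 2-D array and applies a few vectorized operations. The values and variable names are made up for this example; it assumes numpy is installed and imported as `np`.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns.
a = np.array([[1, 2, 3], [4, 5, 6]])

doubled = a * 2              # element-wise multiplication, no Python loop
col_sums = a.sum(axis=0)     # sum down each column
row_means = a.mean(axis=1)   # mean across each row

print(a.shape)
print(doubled)
print(col_sums)
print(row_means)
```

Operations like these run in compiled code over the whole array at once, which is typically much faster than looping over nested Python lists.
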
pandas
pandas (http://pandas.pydata.org/index.html) provides efficient data structures and analysis tools for Python. It is built on top of numpy.

- Introduction to pandas (http://www.synesthesiam.com/posts/an-introduction-to-pandas.html)
- DataCamp pandas foundations (https://www.datacamp.com/courses/pandas-foundations) - Paid course, but 30 free days upon account creation (enough to complete course).
- Pandas cheatsheet (https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf) - Quick overview of the most important functions.

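To give a feel for the data structures pandas provides, here is a minimal sketch using a `DataFrame` with invented toy data (the cities and prices are placeholders, not real figures). It assumes pandas is installed and imported as `pd`.

```python
import pandas as pd

# Toy table: one row per listing (made-up values for illustration).
df = pd.DataFrame({
    "city": ["Karlsruhe", "Berlin", "Karlsruhe", "Berlin"],
    "price": [300, 450, 320, 500],
})

# Average price per city, computed with a group-by.
mean_price = df.groupby("city")["price"].mean()

# Boolean filtering: keep only rows with a price above 400.
expensive = df[df["price"] > 400]

print(mean_price)
print(expensive)
```

Selecting, filtering, and grouping columns by name like this covers a large share of everyday data-exploration work.
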
scikit-learn
scikit-learn (http://scikit-learn.org/stable/) is the most common library for Machine Learning and Data Science in Python.

- Introduction and first model application (https://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.02-Introducing-Scikit-Learn.ipynb)
- Rough guide for choosing estimators (http://scikit-learn.org/stable/tutorial/machine_learning_map/)
- Scikit-learn complete user guide (http://scikit-learn.org/stable/user_guide.html)
- Model ensemble: Implementation in Python (http://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/)

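For readers who want to see the basic scikit-learn workflow before opening the tutorials above, here is a minimal sketch of fitting and scoring a first model. The choice of the bundled iris dataset, a k-nearest-neighbors classifier, and the parameter values are assumptions made for this illustration, not recommendations from the linked resources.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn estimator follows the same fit / predict / score pattern.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(accuracy)
```

Swapping in a different estimator usually only requires changing the import and the line that constructs `model`, which is what makes the estimator guide linked above practical to experiment with.
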
Jupyter Notebook
Jupyter Notebook (https://jupyter.org/) is a web application for easy data visualisation and code presentation.

- Downloading and running first Jupyter notebook (https://jupyter.org/install.html)
- Example notebook for data exploration (https://www.kaggle.com/sudalairajkumar/simple-exploration-notebook-instacart)
- Seaborn data visualization tutorial (https://elitedatascience.com/python-seaborn-tutorial) - Plot library that works great with Jupyter.

Various other helpful tools and resources

- Template folder structure for organizing Data Science projects (https://github.com/drivendata/cookiecutter-data-science)
- Anaconda Python distribution (https://www.continuum.io/downloads) - Contains most of the important Python packages for Data Science.
- Spacy (https://spacy.io/) - Open source toolkit for working with text-based data.
- LightGBM gradient boosting framework (https://github.com/Microsoft/LightGBM) - Successfully used in many Kaggle challenges.
- Amazon AWS (https://aws.amazon.com/) - Rent cloud servers for more time-consuming calculations (r4.xlarge server is a good place to start).

Data Science Challenges for Beginners
Sorted by increasing complexity.

- Walkthrough: House prices challenge (https://www.dataquest.io/blog/kaggle-getting-started/) - Walkthrough of a simple challenge on house prices.
- Blood Donation Challenge (https://www.drivendata.org/competitions/2/warm-up-predict-blood-donations/) - Predict if a donor will donate again.
- Titanic Challenge (https://www.kaggle.com/c/titanic) - Predict survival on the Titanic.
- Water Pump Challenge (https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/) - Predict the operating condition of water pumps in Africa.

More advanced resources and lists

- Awesome Data Science (https://github.com/bulutyazilim/awesome-datascience)
- Data Science Python (https://github.com/ujjwalkarn/DataSciencePython)
- Machine Learning Tutorials (https://github.com/ujjwalkarn/Machine-Learning-Tutorials)

Contribute

Contributions welcome! Read the contribution guidelines (contributing.md) first.

License

CC0 (http://creativecommons.org/publicdomain/zero/1.0)

To the extent possible under law, Simon Böhm has waived all copyright and related or neighboring rights to this work. Disclaimer: Some of the links are affiliate links.