From be32a8d4aec3cf12808af5434fea29a9f76a85ee Mon Sep 17 00:00:00 2001
From: Jonas Zeunert
Date: Tue, 23 Apr 2024 18:40:15 +0200
Subject: [PATCH] Update index

---
 html/datascience.html     | 2893 ++++++++++++++++++++++++++++++++++++-
 html/datascience.md2.html | 2893 +------------------------------------
 html/index.html           |   14 +-
 lists/awesome-index       |    2 +-
 readmes/datascience.md    | 1122 +++++++++++++-
 readmes/datascience.md2   | 1122 +-------------
 readmes/index.md          |   14 +-
 terminal/datascience      | 1191 ++++++++++++++-
 terminal/datascience2     | 1191 +--------------
 terminal/index            |   10 +-
 10 files changed, 5229 insertions(+), 5223 deletions(-)

diff --git a/html/datascience.html b/html/datascience.html
index c1c9d2a..ba6d85c 100644
--- a/html/datascience.html
+++ b/html/datascience.html
@@ -1,7 +1,2886 @@
-

awesome-data-science

-

A curated list of amazingly awesome open source data science -resources.

-

Data Visualization

-

A JavaScript visualization library for HTML and SVG - -http://d3js.org

-

Real-time visualization library - https://github.com/fastly/epoch

+
+ +
+

AWESOME DATA SCIENCE

+

+

An open-source Data Science repository to learn and apply +towards solving real world problems.

+

This is a shortcut path to start studying Data +Science. Just follow the steps to answer the questions, “What +is Data Science and what should I study to learn Data Science?”

+

Sponsors

+
Sponsor | Pitch
Be the first to sponsor! github@academic.io
+


+

Table of Contents

+ +

What is Data Science?

+

^ back to top ^

+

Data Science is one of the hottest topics in computing and on the +Internet today. People have long been gathering data from applications +and systems, and now is the time to analyze it. The next +steps are producing suggestions from the data and creating predictions +about the future. Here +you can find the biggest question for Data Science and +hundreds of answers from experts.

+
Link | Preview
What is +Data Science @ O’reilly | Data scientists combine entrepreneurship with patience, the +willingness to build data products incrementally, the ability to +explore, and the ability to iterate over a solution. They are inherently +interdisciplinary. They can tackle all aspects of a problem, from +initial data collection and data conditioning to drawing conclusions. +They can think outside the box to come up with new ways to view the +problem, or to work with very broadly defined problems: “here’s a lot of +data, what can you make from it?”
What is +Data Science @ Quora | Data Science is a combination of a number of aspects of Data such as +Technology, Algorithm development, and data interference to study the +data, analyse it, and find innovative solutions to difficult problems. +Basically Data Science is all about Analysing data and driving for +business growth by finding creative ways.
The +sexiest job of 21st century | Data scientists today are akin to Wall Street “quants” of the +1980s and 1990s. In those days people with backgrounds in physics and +math streamed to investment banks and hedge funds, where they could +devise entirely new algorithms and data strategies. Then a variety of +universities developed master’s programs in financial engineering, which +churned out a second generation of talent that was more accessible to +mainstream firms. The pattern was repeated later in the 1990s with +search engineers, whose rarefied skills soon came to be taught in +computer science programs.
Wikipedia | Data science is an interdisciplinary field that uses scientific +methods, processes, algorithms and systems to extract knowledge and +insights from many structural and unstructured data. Data science is +related to data mining, machine learning and big data.
How +to Become a Data Scientist | Data scientists are big data wranglers, gathering and analyzing +large sets of structured and unstructured data. A data scientist’s role +combines computer science, statistics, and mathematics. They analyze, +process, and model data then interpret the results to create actionable +plans for companies and other organizations.
a +very short history of #datascience | The story of how data scientists became sexy is mostly the story +of the coupling of the mature discipline of statistics with a very young +one–computer science. The term “Data Science” has emerged only recently +to specifically designate a new profession that is expected to make +sense of the vast stores of big data. But making sense of data has a +long history and has been discussed by scientists, statisticians, +librarians, computer scientists and others for years. The following +timeline traces the evolution of the term “Data Science” and its use, +attempts to define it, and related terms.
Software +Development Resources for Data Scientists | Data scientists concentrate on making sense of data through +exploratory analysis, statistics, and models. Software developers apply +a separate set of knowledge with different tools. Although their focus +may seem unrelated, data science teams can benefit from adopting +software development best practices. Version control, automated testing, +and other dev skills help create reproducible, production-ready code and +tools.
+

Where do I Start?

+

^ back to top ^

+

While not strictly necessary, knowing a programming language is a +crucial skill to be effective as a data scientist. Currently, the most +popular language is Python, closely followed by R. +Python is a general-purpose scripting language that sees applications in +a wide variety of fields. R is a domain-specific language for +statistics, which contains a lot of common statistics tools out of the +box.

+

Python is by far the most popular +language in science, due in no small part to the ease with which it can be +used and the vibrant ecosystem of user-generated packages. To install +packages, there are two main methods: Pip (invoked as +pip install), the package manager that comes bundled with +Python, and Anaconda (invoked as +conda install), a powerful package manager that can install +packages for Python and R and can download executables like Git.

+

Unlike R, Python was not built from the ground up with data science +in mind, but there are plenty of third party libraries to make up for +this. A much more exhaustive list of packages can be found later in this +document, but these four packages are a good set of choices to start +your data science journey with: Scikit-Learn is a +general-purpose data science package which implements the most popular +algorithms - it also includes rich documentation, tutorials, and +examples of the models it implements. Even if you prefer to write your +own implementations, Scikit-Learn is a valuable reference to the +nuts-and-bolts behind many of the common algorithms you’ll find. With Pandas, one can collect +data into a convenient table format and analyze it. Numpy provides very fast tooling for +mathematical operations, with a focus on vectors and matrices. Seaborn, itself based on the Matplotlib package, is a quick way to +generate beautiful visualizations of your data, with many good defaults +available out of the box, as well as a gallery showing how to produce +many common visualizations of your data.

+

When embarking on your journey to becoming a data scientist, the +choice of language isn’t particularly important, and both Python and R +have their pros and cons. Pick a language you like, and check out one of +the Free courses we’ve listed below!

+

Real World

+

^ back to top ^

+

Data science is a powerful tool that is utilized in various fields to +solve real-world problems by extracting insights and patterns from +complex data.

+

Disaster

+

^ back to top ^

+ +

Training Resources

+

^ back to top ^

+

How do you learn data science? By doing data science, of course! +Okay, okay - that might not be particularly helpful when you’re first +starting out. In this section, we’ve listed some learning resources, in +rough order from least to greatest commitment - Tutorials, Massive Open Online +Courses (MOOCs), Intensive +Programs, and Colleges.

+

Tutorials

+

^ back to top ^

+ +

Free Courses

+

^ back to top ^

+ +

MOOCs

+

^ back to top ^

+ +

Intensive Programs

+

^ back to top ^

+ +

Colleges

+

^ back to top ^

+ +

The Data Science Toolbox

+

^ back to top ^

+

This section is a collection of packages, tools, algorithms, and +other useful items in the data science world.

+

Algorithms

+

^ back to top ^

+

These are some Machine Learning and Data Mining algorithms and models +that help you to understand your data and derive meaning from it.

+

Three kinds of Machine +Learning Systems

+ +

Supervised Learning

+ +

Unsupervised Learning

+ +

Semi-Supervised Learning

+ +

Reinforcement Learning

+ +

Data Mining Algorithms

+ +

Deep Learning architectures

+ +

General Machine Learning +Packages

+

^ back to top ^

+ +

Deep Learning Packages

+

PyTorch Ecosystem

+ +

TensorFlow Ecosystem

+ +

Keras Ecosystem

+ +

Visualization Tools

+

^ back to top ^

+ +

Miscellaneous Tools

+

^ back to top ^

+
Link | Description
The Data Science Lifecycle +Process | The Data Science Lifecycle Process is a process for taking data +science teams from Idea to Value repeatedly and sustainably. The process +is documented in this repo
Data Science +Lifecycle Template Repo | Template repository for data science lifecycle project
RexMex | A general purpose recommender metrics library for fair +evaluation.
ChemicalX | A PyTorch based deep learning library for drug pair scoring.
PyTorch +Geometric Temporal | Representation learning on dynamic graphs.
Little +Ball of Fur | A graph sampling library for NetworkX with a Scikit-Learn like +API.
Karate +Club | An unsupervised machine learning extension library for NetworkX with +a Scikit-Learn like API.
ML +Workspace | All-in-one web-based IDE for machine learning and data science. The +workspace is deployed as a Docker container and is preloaded with a +variety of popular data science libraries (e.g., Tensorflow, PyTorch) +and dev tools (e.g., Jupyter, VS Code)
Neptune.ai | Community-friendly platform supporting data scientists in creating +and sharing machine learning models. Neptune facilitates teamwork, +infrastructure management, models comparison and reproducibility.
steppy | Lightweight, Python library for fast and reproducible machine +learning experimentation. Introduces very simple interface that enables +clean machine learning pipeline design.
steppy-toolkit | Curated collection of the neural networks, transformers and models +that make your machine learning work faster and more effective.
Datalab from +Google | easily explore, visualize, analyze, and transform data using +familiar languages, such as Python and SQL, interactively.
Hortonworks +Sandbox | is a personal, portable Hadoop environment that comes with a dozen +interactive Hadoop tutorials.
R | is a free software environment for statistical computing and +graphics.
Tidyverse | is an opinionated collection of R packages designed for data +science. All packages share an underlying design philosophy, grammar, +and data structures.
RStudio | IDE – powerful user interface for R. It’s free and open source, and +works on Windows, Mac, and Linux.
Python - Pandas - +Anaconda | Completely free enterprise-ready Python distribution for large-scale +data processing, predictive analytics, and scientific computing
Pandas GUI | Pandas GUI
Scikit-Learn | Machine Learning in Python
NumPy | NumPy is fundamental for scientific computing with Python. It +supports large, multi-dimensional arrays and matrices and includes an +assortment of high-level mathematical functions to operate on these +arrays.
Vaex | Vaex is a Python library that allows you to visualize large datasets +and calculate statistics at high speeds.
SciPy | SciPy works with NumPy arrays and provides efficient routines for +numerical integration and optimization.
Data +Science Toolbox | Coursera Course
Data Science +Toolbox | Blog
Wolfram +Data Science Platform | Take numerical, textual, image, GIS or other data and give it the +Wolfram treatment, carrying out a full spectrum of data science analysis +and visualization and automatically generate rich interactive +reports—all powered by the revolutionary knowledge-based Wolfram +Language.
Datadog | Solutions, code, and devops for high-scale data science.
Variance | Build powerful data visualizations for the web without writing +JavaScript
Kite +Development Kit | The Kite Software Development Kit (Apache License, Version 2.0), or +Kite for short, is a set of libraries, tools, examples, and +documentation focused on making it easier to build systems on top of the +Hadoop ecosystem.
Domino Data Labs | Run, scale, share, and deploy your models — without any +infrastructure or setup.
Apache Flink | A platform for efficient, distributed, general-purpose data +processing.
Apache Hama | Apache Hama is an Apache Top-Level open source project, allowing you +to do advanced analytics beyond MapReduce.
Weka | Weka is a collection of machine learning algorithms for data mining +tasks.
Octave | GNU Octave is a high-level interpreted language, primarily intended +for numerical computations. (Free Matlab)
Apache Spark | Lightning-fast cluster computing
Hydrosphere +Mist | a service for exposing Apache Spark analytics jobs and machine +learning models as realtime, batch or reactive web services.
Data Mechanics | A data science and engineering platform making Apache Spark more +developer-friendly and cost-effective.
Caffe | Deep Learning Framework
Torch | A SCIENTIFIC COMPUTING FRAMEWORK FOR LUAJIT
Nervana’s python +based Deep Learning Framework | Intel® Nervana™ reference deep learning framework committed to best +performance on all hardware.
Skale | High performance distributed data processing in NodeJS
Aerosolve | A machine learning package built for humans.
Intel framework | Intel® Deep Learning Framework
Datawrapper | An open source data visualization platform helping everyone to +create simple, correct and embeddable charts. Also at github.com
Tensor Flow | TensorFlow is an Open Source Software Library for Machine +Intelligence
Natural Language Toolkit | An introductory yet powerful toolkit for natural language processing +and classification
Annotation +Lab | Free End-to-End No-Code platform for text annotation and DL model +training/tuning. Out-of-the-box support for Named Entity Recognition, +Classification, Relation extraction and Assertion Status Spark NLP +models. Unlimited support for users, teams, projects, documents.
nlp-toolkit for +node.js | This module covers some basic nlp principles and implementations. +The main focus is performance. When we deal with sample or training data +in nlp, we quickly run out of memory. Therefore every implementation in +this module is written as stream to only hold that data in memory that +is currently processed at any step.
Julia | high-level, high-performance dynamic programming language for +technical computing
IJulia | a Julia-language backend combined with the Jupyter interactive +environment
Apache Zeppelin | Web-based notebook that enables data-driven, interactive data +analytics and collaborative documents with SQL, Scala and more
Featuretools | An open source framework for automated feature engineering written +in python
Optimus | Cleansing, pre-processing, feature engineering, exploratory data +analysis and easy ML with PySpark backend.
Albumentations | A fast and framework agnostic image augmentation library that +implements a diverse set of augmentation techniques. Supports +classification, segmentation, and detection out of the box. Was used to +win a number of Deep Learning competitions at Kaggle, Topcoder and those +that were a part of the CVPR workshops.
DVC | An open-source data science version control system. It helps track, +organize and make data science projects reproducible. In its very basic +scenario it helps version control and share large data and model +files.
Lambdo | is a workflow engine that significantly simplifies data analysis by +combining in one analysis pipeline (i) feature engineering and machine +learning (ii) model training and prediction (iii) table population and +column evaluation.
Feast | A feature store for the management, discovery, and access of machine +learning features. Feast provides a consistent view of feature data for +both model training and model serving.
Polyaxon | A platform for reproducible and scalable machine learning and deep +learning.
LightTag | Text Annotation Tool for teams
UBIAI | Easy-to-use text annotation tool for teams with most comprehensive +auto-annotation features. Supports NER, relations and document +classification as well as OCR annotation for invoice labeling
Trains | Auto-Magical Experiment Manager, Version Control & DevOps for +AI
Hopsworks | Open-source data-intensive machine learning platform with a feature +store. Ingest and manage features for both online (MySQL Cluster) and +offline (Apache Hive) access, train and serve models at scale.
MindsDB | MindsDB is an Explainable AutoML framework for developers. With +MindsDB you can build, train and use state of the art ML models in as +simple as one line of code.
Lightwood | A Pytorch based framework that breaks down machine learning problems +into smaller blocks that can be glued together seamlessly with an +objective to build predictive models with one line of code.
AWS Data +Wrangler | An open-source Python package that extends the power of Pandas +library to AWS connecting DataFrames and AWS data related services +(Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc).
Amazon +Rekognition | AWS Rekognition is a service that lets developers working with +Amazon Web Services add image analysis to their applications. Catalog +assets, automate workflows, and extract meaning from your media and +applications.
Amazon Textract | Automatically extract printed text, handwriting, and data from any +document.
Amazon Lookout +for Vision | Spot product defects using computer vision to automate quality +inspection. Identify missing product components, vehicle and structure +damage, and irregularities for comprehensive quality control.
Amazon CodeGuru | Automate code reviews and optimize application performance with +ML-powered recommendations.
CML | An open source toolkit for using continuous integration in data +science projects. Automatically train and test models in production-like +environments with GitHub Actions & GitLab CI, and autogenerate +visual reports on pull/merge requests.
Dask | An open source Python library to painlessly transition your +analytics code to distributed computing systems (Big Data)
Statsmodels | A Python-based inferential statistics, hypothesis testing and +regression framework
Gensim | An open-source library for topic modeling of natural language +text
spaCy | A performant natural language processing toolkit
Grid +Studio | Grid studio is a web-based spreadsheet application with full +integration of the Python programming language.
Python Data +Science Handbook | Python Data Science Handbook: full text in Jupyter Notebooks
Shapley | A data-driven framework to quantify the value of classifiers in a +machine learning ensemble.
DAGsHub | A platform built on open source tools for data, model and pipeline +management.
Deepnote | A new kind of data science notebook. Jupyter-compatible, with +real-time collaboration and running in the cloud.
Valohai | An MLOps platform that handles machine orchestration, automatic +reproducibility and deployment.
PyMC3 | A Python Library for Probabilistic Programming (Bayesian Inference +and Machine Learning)
PyStan | Python interface to Stan (Bayesian inference and modeling)
hmmlearn | Unsupervised learning and inference of Hidden Markov Models
Chaos +Genius | ML powered analytics engine for outlier/anomaly detection and root +cause analysis
Nimblebox | A full-stack MLOps platform designed to help data scientists and +machine learning practitioners around the world discover, create, and +launch multi-cloud apps from their web browser.
Towhee | A Python library that helps you encode your unstructured data into +embeddings.
LineaPy | Ever been frustrated with cleaning up long, messy Jupyter notebooks? +With LineaPy, an open source Python library, it takes as little as two +lines of code to transform messy development code into production +pipelines.
envd | 🏕️ machine learning development environment for data science and +AI/ML engineering teams
Explore +Data Science Libraries | A search engine 🔎 tool to discover & find a curated list of +popular & new libraries, top authors, trending project kits, +discussions, tutorials & learning resources
MLEM | 🐶 Version and deploy your ML models following GitOps +principles
MLflow | MLOps framework for managing ML models across their full +lifecycle
cleanlab | Python library for data-centric AI and automatically detecting +various issues in ML datasets
AutoGluon | AutoML to easily produce accurate predictions for image, text, +tabular, time-series, and multi-modal data
Arize AI | Arize AI community tier observability tool for monitoring machine +learning models in production and root-causing issues such as data +quality and performance drift.
Aureo.io | Aureo.io is a low-code platform that focuses on building artificial +intelligence. It provides users with the capability to create pipelines, +automations and integrate them with artificial intelligence models – all +with their basic data.
ERD Lab | Free cloud based entity relationship diagram (ERD) tool made for +developers.
Arize-Phoenix | MLOps in a notebook - uncover insights, surface problems, monitor, +and fine tune your models.
Comet | An MLOps platform with experiment tracking, model production +management, a model registry, and full data lineage to support your ML +workflow from training straight through to production.
CometLLM | Log, track, visualize, and search your LLM prompts and chains in one +easy-to-use, 100% open-source tool.
Synthical | AI-powered collaborative environment for research. Find relevant +papers, create collections to manage bibliography, and summarize content +— all in one place
teeplot | Workflow tool to automatically organize data visualization +output
+

Literature and Media

+

^ back to top ^

+

This section includes some additional reading material, channels to +watch, and talks to listen to.

+

Books

+

^ back to top ^

+ +

Book Deals (Affiliated) 🛍

+ +

Journals, Publications and +Magazines

+

^ back to top ^

+ +

Newsletters

+

^ back to top ^

+ +

Bloggers

+

^ back to top ^

+ +

Presentations

+

^ back to top ^

+ +

Podcasts

+

^ back to top ^

+ +

YouTube Videos & Channels

+

^ back to top ^

+ +

Socialize

+

^ back to top ^

+

Below are some Social Media links. Connect with other data +scientists!

+ +

Facebook Accounts

+

^ back to top ^

+ +

Twitter Accounts

+

^ back to top ^

+
Twitter | Description
Big Data +Combine | Rapid-fire, live tryouts for data scientists seeking to monetize +their models as trading strategies
Big Data Mania | Data Viz Wiz, Data Journalist, Growth Hacker, Author of Data Science +for Dummies (2015)
Big Data +Science | Big Data, Data Science, Predictive Modeling, Business Analytics, +Hadoop, Decision and Operations Research.
Charlie Greenbacker | Director of Data Science at @ExploreAltamira
Chris Said | Data scientist at Twitter
Clare Corthell | Dev, Design, Data Science @mattermark #hackerei
DADI +Charles-Abner | #datascientist @Ekimetrics. , #machinelearning #dataviz +#DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast
Data Science +Central | Data Science Central is the industry’s single resource for Big Data +practitioners.
Data Science London | Data Science. Big Data. Data Hacks. Data Junkies. Data Startups. +Open Data
Data Science +Renee | Documenting my path from SQL Data Analyst pursuing an Engineering +Master’s Degree to Data Scientist
Data Science +Report | Mission is to help guide & advance careers in Data Science & +Analytics
Data Science +Tips | Tips and Tricks for Data Scientists around the world! #datascience +#bigdata
Data Vizzard | DataViz, Security, Military
DataScienceX
deeplearning4j
DJ Patil | White House Data Chief, VP @ RelateIQ.
Domino Data Lab
Drew Conway | Data nerd, hacker, student of conflict.
Emilio Ferrara | #Networks, #MachineLearning and #DataScience. I work on #Social +Media. Postdoc at @IndianaUniv
Erin Bartolo | Running with #BigData–enjoying a love/hate relationship with its +hype. @iSchoolSU +#DataScience Program Mgr.
Greg Reda | Working @ GrubHub about data and pandas
Gregory Piatetsky | KDnuggets President, Analytics/Big Data/Data Mining/Data Science +expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, +part-time philosopher.
Hadley Wickham | Chief Scientist at RStudio, and an Adjunct Professor of Statistics +at the University of Auckland, Stanford University, and Rice +University.
Hakan Kardas | Data Scientist
Hilary Mason | Data Scientist in Residence at @accel.
Jeff Hammerbacher | ReTweeting about data science
John Myles +White | Scientist at Facebook and Julia developer. Author of Machine +Learning for Hackers and Bandit Algorithms for Website Optimization. +Tweets reflect my views only.
Juan Miguel +Lavista | Principal Data Scientist @ Microsoft Data Science Team
Julia Evans | Hacker - Pandas - Data Analyze
Kenneth Cukier | The Economist’s Data Editor and co-author of Big Data +(http://www.big-data-book.com/).
Kevin Davenport | Organizer of +https://www.meetup.com/San-Diego-Data-Science-R-Users-Group/
Kevin Markham | Data science instructor, and founder of Data School
Kim Rees | Interactive data visualization and tools. Data flaneur.
Kirk Borne | DataScientist, PhD Astrophysicist, Top #BigData Influencer.
Linda Regber | Data storyteller, visualizations.
Luis Rei | PhD Student. Programming, Mobile, Web. Artificial Intelligence, +Intelligent Robotics Machine Learning, Data Mining, Natural Language +Processing, Data Science.
Mark Stevenson | Data Analytics Recruitment Specialist at Salt (@SaltJobs) Analytics - +Insight - Big Data - Data science
Matt Harrison | Opinions of full-stack Python guy, author, instructor, currently +playing Data Scientist. Occasional fathering, husbanding, organic +gardening.
Matthew Russell | Mining the Social Web.
Mert Nuhoğlu | Data Scientist at BizQualify, Developer
Monica Rogati | Data @ Jawbone. Turned data into stories & products at LinkedIn. +Text mining, applied machine learning, recommender systems. Ex-gamer, +ex-machine coder; namer.
Noah Iliinsky | Visualization & interaction designer. Practical cyclist. Author +of vis books: https://www.oreilly.com/pub/au/4419
Paul Miller | Cloud Computing/ Big Data/ Open Data Analyst & Consultant. +Writer, Speaker & Moderator. Gigaom Research Analyst.
Peter Skomoroch | Creating intelligent systems to automate tasks & improve +decisions. Entrepreneur, ex-Principal Data Scientist @LinkedIn. Machine +Learning, ProductRei, Networks
Prash Chan | Solution Architect @ IBM, Master Data Management, Data Quality & +Data Governance Blogger. Data Science, Hadoop, Big Data & +Cloud.
Quora Data +Science | Quora’s data science topic
R-Bloggers | Tweet blog posts from the R blogosphere, data science conferences, +and (!) open jobs for data scientists.
Rand Hindi
Randy Olson | Computer scientist researching artificial intelligence. Data +tinkerer. Community leader for @DataIsBeautiful. #OpenScience +advocate.
Recep Erol | Data Science geek @ UALR
Ryan Orban | Data scientist, genetic origamist, hardware aficionado
Sean J. Taylor | Social Scientist. Hacker. Facebook Data Science Team. Keywords: +Experiments, Causal Inference, Statistics, Machine Learning, +Economics.
Silvia K. Spiva | #DataScience at Cisco
Harsh B. Gupta | Data Scientist at BBVA Compass
Spencer Nelson | Data nerd
Talha Oz | Enjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile +Kaggler/data scientist
Tasos Skarlatidis | Complex Event Processing, Big Data, Artificial Intelligence and +Machine Learning. Passionate about programming and open-source.
Terry Timko | InfoGov; Bigdata; Data as a Service; Data Science; Open, Social +& Business Data Convergence
Tony Baer | IT analyst with Ovum covering Big Data & data management with +some systems engineering thrown in.
Tony Ojeda | Data Scientist , Author , Entrepreneur. Co-founder @DataCommunityDC. +Founder @DistrictDataLab. #DataScience +#BigData #DataDC
Vamshi Ambati | Data Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon +alumni (Blog: https://allthingsds.wordpress.com )
Wes McKinney | Pandas (Python Data Analysis library).
WileyEd | Senior Manager - @Seagate Big Data Analytics @McKinsey Alum #BigData + +#Analytics Evangelist #Hadoop, #Cloud, #Digital, & #R +Enthusiast
WNYC Data News Team | The data news crew at @WNYC. Practicing data-driven journalism, +making it visual, and showing our work.
Alexey Grigorev | Data science author
İlker Arslan | Data science author. Shares mostly about Julia programming
INEVITABLE | AI & Data Science Start-up Company based in England, UK
+

Telegram Channels

+

^ back to top ^

+ +

Slack Communities

+

^ back to top ^

+ +

GitHub Groups

+ +

Data Science Competitions

+

Some data mining competition platforms

+ +

Fun

+ +

Infographics

+

^ back to top ^

+
Preview | Description
Key +differences of a data scientist vs. data engineer
A visual guide to Becoming a Data Scientist in 8 Steps by DataCamp (img)
Mindmap on required skills (img)
Swami Chandrasekaran made a Curriculum +via Metro map.
by @kzawadz via twitter
By Data Science +Central
Data Science Wars: R vs Python
How to select statistical or machine learning techniques
Choosing the Right Estimator
The Data Science Industry: Who Does What
Data Science Venn Euler Diagram
Different Data Science Skills and Roles from this +article by Springboard
Data Fallacies To Avoid | A simple and friendly way of teaching your non-data +scientist/non-statistician colleagues how to avoid +mistakes with data. From Geckoboard’s Data Literacy +Lessons.
+

Datasets

+

^ back to top ^

+ +

Comics

+

^ back to top ^

+ +

Other Awesome Lists

+ +

Hobby

+ + + +
diff --git a/html/datascience.md2.html b/html/datascience.md2.html
index ba6d85c..c1c9d2a 100644
--- a/html/datascience.md2.html
+++ b/html/datascience.md2.html
@@ -1,2886 +1,7 @@
-
- -
-

AWESOME DATA SCIENCE

-

-

An open-source Data Science repository to learn and apply -towards solving real world problems.

-

This is a shortcut path to start studying Data -Science. Just follow the steps to answer the questions, “What -is Data Science and what should I study to learn Data Science?”

-

Sponsors

-
Sponsor | Pitch
Be the first to sponsor! github@academic.io
-


-

Table of Contents

- -

What is Data Science?

-

^ back to top ^

-

Data Science is one of the hottest topics in computing and on the -Internet nowadays. People have been gathering data from applications -and systems for years, and now is the time to analyze it. The next -steps are producing suggestions from the data and creating predictions -about the future. Here -you can find the biggest question for Data Science and -hundreds of answers from experts.

- ---- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
LinkPreview
What is -Data Science @ O’ReillyData scientists combine entrepreneurship with patience, the -willingness to build data products incrementally, the ability to -explore, and the ability to iterate over a solution. They are inherently -interdisciplinary. They can tackle all aspects of a problem, from -initial data collection and data conditioning to drawing conclusions. -They can think outside the box to come up with new ways to view the -problem, or to work with very broadly defined problems: “here’s a lot of -data, what can you make from it?”
What is -Data Science @ QuoraData Science is a combination of a number of aspects of Data such as -Technology, Algorithm development, and data interference to study the -data, analyse it, and find innovative solutions to difficult problems. -Basically Data Science is all about Analysing data and driving for -business growth by finding creative ways.
The -sexiest job of 21st centuryData scientists today are akin to Wall Street “quants” of the -1980s and 1990s. In those days people with backgrounds in physics and -math streamed to investment banks and hedge funds, where they could -devise entirely new algorithms and data strategies. Then a variety of -universities developed master’s programs in financial engineering, which -churned out a second generation of talent that was more accessible to -mainstream firms. The pattern was repeated later in the 1990s with -search engineers, whose rarefied skills soon came to be taught in -computer science programs.
WikipediaData science is an interdisciplinary field that uses scientific -methods, processes, algorithms and systems to extract knowledge and -insights from many structural and unstructured data. Data science is -related to data mining, machine learning and big data.
How -to Become a Data ScientistData scientists are big data wranglers, gathering and analyzing -large sets of structured and unstructured data. A data scientist’s role -combines computer science, statistics, and mathematics. They analyze, -process, and model data then interpret the results to create actionable -plans for companies and other organizations.
a -very short history of #datascienceThe story of how data scientists became sexy is mostly the story -of the coupling of the mature discipline of statistics with a very young -one–computer science. The term “Data Science” has emerged only recently -to specifically designate a new profession that is expected to make -sense of the vast stores of big data. But making sense of data has a -long history and has been discussed by scientists, statisticians, -librarians, computer scientists and others for years. The following -timeline traces the evolution of the term “Data Science” and its use, -attempts to define it, and related terms.
Software -Development Resources for Data ScientistsData scientists concentrate on making sense of data through -exploratory analysis, statistics, and models. Software developers apply -a separate set of knowledge with different tools. Although their focus -may seem unrelated, data science teams can benefit from adopting -software development best practices. Version control, automated testing, -and other dev skills help create reproducible, production-ready code and -tools.
-

Where do I Start?

-

^ back to top ^

-

While not strictly necessary, knowing a programming language is a -crucial skill to be effective as a data scientist. Currently, the most -popular language is Python, closely followed by R. -Python is a general-purpose scripting language that sees applications in -a wide variety of fields. R is a domain-specific language for -statistics, which contains a lot of common statistics tools out of the -box.

-

Python is by far the most popular -language in science, due in no small part to the ease with which it can be -used and the vibrant ecosystem of user-generated packages. To install -packages, there are two main methods: Pip (invoked as -pip install), the package manager that comes bundled with -Python, and Anaconda (invoked as -conda install), a powerful package manager that can install -packages for Python and R and can also download executables like Git.

-

Unlike R, Python was not built from the ground up with data science -in mind, but there are plenty of third-party libraries to make up for -this. A much more exhaustive list of packages can be found later in this -document, but these four packages are a good set of choices to start -your data science journey with: Scikit-Learn is a -general-purpose data science package which implements the most popular -algorithms - it also includes rich documentation, tutorials, and -examples of the models it implements. Even if you prefer to write your -own implementations, Scikit-Learn is a valuable reference to the -nuts-and-bolts behind many of the common algorithms you’ll find. With Pandas, one can collect -data into a convenient table format and analyze it. NumPy provides very fast tooling for -mathematical operations, with a focus on vectors and matrices. Seaborn, itself based on the Matplotlib package, is a quick way to -generate beautiful visualizations of your data, with many good defaults -available out of the box, as well as a gallery showing how to produce -many common visualizations of your data.

-

When embarking on your journey to becoming a data scientist, the -choice of language isn’t particularly important, and both Python and R -have their pros and cons. Pick a language you like, and check out one of -the Free courses we’ve listed below!

-

Real World

-

^ back to top ^

-

Data science is a powerful tool that is utilized in various fields to -solve real-world problems by extracting insights and patterns from -complex data.

-

Disaster

-

^ back to top ^

- -

Training Resources

-

^ back to top ^

-

How do you learn data science? By doing data science, of course! -Okay, okay - that might not be particularly helpful when you’re first -starting out. In this section, we’ve listed some learning resources, in -rough order from least to greatest commitment - Tutorials, Massive Open Online -Courses (MOOCs), Intensive -Programs, and Colleges.

-

Tutorials

-

^ back to top ^

- -

Free Courses

-

^ back to top ^

- -

MOOC’s

-

^ back to top ^

- -

Intensive Programs

-

^ back to top ^

- -

Colleges

-

^ back to top ^

- -

The Data Science Toolbox

-

^ back to top ^

-

This section is a collection of packages, tools, algorithms, and -other useful items in the data science world.

-

Algorithms

-

^ back to top ^

-

These are some Machine Learning and Data Mining algorithms and models -that help you understand your data and derive meaning from it.

-

Three kinds of Machine -Learning Systems

- -

Supervised Learning

- -

Unsupervised Learning

- -

Semi-Supervised Learning

- -

Reinforcement Learning

- -

Data Mining Algorithms

- -

Deep Learning architectures

- -

General Machine Learning -Packages

-

^ back to top ^

- -

Deep Learning Packages

-

PyTorch Ecosystem

- -

TensorFlow Ecosystem

- -

Keras Ecosystem

- -

Visualization Tools

-

^ back to top ^

- -

Miscellaneous Tools

-

^ back to top ^

- ---- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
LinkDescription
The Data Science Lifecycle -ProcessThe Data Science Lifecycle Process is a process for taking data -science teams from Idea to Value repeatedly and sustainably. The process -is documented in this repo
Data Science -Lifecycle Template RepoTemplate repository for data science lifecycle project
RexMexA general purpose recommender metrics library for fair -evaluation.
ChemicalXA PyTorch based deep learning library for drug pair scoring.
PyTorch -Geometric TemporalRepresentation learning on dynamic graphs.
Little -Ball of FurA graph sampling library for NetworkX with a Scikit-Learn like -API.
Karate -ClubAn unsupervised machine learning extension library for NetworkX with -a Scikit-Learn like API.
ML -WorkspaceAll-in-one web-based IDE for machine learning and data science. The -workspace is deployed as a Docker container and is preloaded with a -variety of popular data science libraries (e.g., TensorFlow, PyTorch) -and dev tools (e.g., Jupyter, VS Code)
Neptune.aiCommunity-friendly platform supporting data scientists in creating -and sharing machine learning models. Neptune facilitates teamwork, -infrastructure management, models comparison and reproducibility.
steppyLightweight Python library for fast and reproducible machine -learning experimentation. Introduces a very simple interface that enables -clean machine learning pipeline design.
steppy-toolkitCurated collection of the neural networks, transformers and models -that make your machine learning work faster and more effective.
Datalab from -Googleeasily explore, visualize, analyze, and transform data using -familiar languages, such as Python and SQL, interactively.
Hortonworks -Sandboxis a personal, portable Hadoop environment that comes with a dozen -interactive Hadoop tutorials.
Ris a free software environment for statistical computing and -graphics.
Tidyverseis an opinionated collection of R packages designed for data -science. All packages share an underlying design philosophy, grammar, -and data structures.
RStudioIDE – powerful user interface for R. It’s free and open source, and -works on Windows, Mac, and Linux.
Python - Pandas - -AnacondaCompletely free enterprise-ready Python distribution for large-scale -data processing, predictive analytics, and scientific computing
Pandas GUIPandas GUI
Scikit-LearnMachine Learning in Python
NumPyNumPy is fundamental for scientific computing with Python. It -supports large, multi-dimensional arrays and matrices and includes an -assortment of high-level mathematical functions to operate on these -arrays.
VaexVaex is a Python library that allows you to visualize large datasets -and calculate statistics at high speeds.
SciPySciPy works with NumPy arrays and provides efficient routines for -numerical integration and optimization.
Data -Science ToolboxCoursera Course
Data Science -ToolboxBlog
Wolfram -Data Science PlatformTake numerical, textual, image, GIS or other data and give it the -Wolfram treatment, carrying out a full spectrum of data science analysis -and visualization and automatically generate rich interactive -reports—all powered by the revolutionary knowledge-based Wolfram -Language.
DatadogSolutions, code, and devops for high-scale data science.
VarianceBuild powerful data visualizations for the web without writing -JavaScript
Kite -Development KitThe Kite Software Development Kit (Apache License, Version 2.0), or -Kite for short, is a set of libraries, tools, examples, and -documentation focused on making it easier to build systems on top of the -Hadoop ecosystem.
Domino Data LabsRun, scale, share, and deploy your models — without any -infrastructure or setup.
Apache FlinkA platform for efficient, distributed, general-purpose data -processing.
Apache HamaApache Hama is an Apache Top-Level open source project, allowing you -to do advanced analytics beyond MapReduce.
WekaWeka is a collection of machine learning algorithms for data mining -tasks.
OctaveGNU Octave is a high-level interpreted language, primarily intended -for numerical computations. (Free alternative to MATLAB)
Apache SparkLightning-fast cluster computing
Hydrosphere -Mista service for exposing Apache Spark analytics jobs and machine -learning models as realtime, batch or reactive web services.
Data MechanicsA data science and engineering platform making Apache Spark more -developer-friendly and cost-effective.
CaffeDeep Learning Framework
TorchA scientific computing framework for LuaJIT
Nervana’s python -based Deep Learning FrameworkIntel® Nervana™ reference deep learning framework committed to best -performance on all hardware.
SkaleHigh performance distributed data processing in NodeJS
AerosolveA machine learning package built for humans.
Intel frameworkIntel® Deep Learning Framework
DatawrapperAn open source data visualization platform helping everyone to -create simple, correct and embeddable charts. Also at github.com
Tensor FlowTensorFlow is an Open Source Software Library for Machine -Intelligence
Natural Language ToolkitAn introductory yet powerful toolkit for natural language processing -and classification
Annotation -LabFree End-to-End No-Code platform for text annotation and DL model -training/tuning. Out-of-the-box support for Named Entity Recognition, -Classification, Relation extraction and Assertion Status Spark NLP -models. Unlimited support for users, teams, projects, documents.
nlp-toolkit for -node.jsThis module covers some basic nlp principles and implementations. -The main focus is performance. When we deal with sample or training data -in nlp, we quickly run out of memory. Therefore every implementation in -this module is written as stream to only hold that data in memory that -is currently processed at any step.
Juliahigh-level, high-performance dynamic programming language for -technical computing
IJuliaa Julia-language backend combined with the Jupyter interactive -environment
Apache ZeppelinWeb-based notebook that enables data-driven, interactive data -analytics and collaborative documents with SQL, Scala and more
FeaturetoolsAn open source framework for automated feature engineering written -in Python
OptimusCleansing, pre-processing, feature engineering, exploratory data -analysis and easy ML with PySpark backend.
AlbumentationsA fast and framework agnostic image augmentation library that -implements a diverse set of augmentation techniques. Supports -classification, segmentation, and detection out of the box. Was used to -win a number of Deep Learning competitions at Kaggle, Topcoder and those -that were a part of the CVPR workshops.
DVCAn open-source data science version control system. It helps track, -organize and make data science projects reproducible. In its very basic -scenario it helps version control and share large data and model -files.
Lambdois a workflow engine that significantly simplifies data analysis by -combining in one analysis pipeline (i) feature engineering and machine -learning (ii) model training and prediction (iii) table population and -column evaluation.
FeastA feature store for the management, discovery, and access of machine -learning features. Feast provides a consistent view of feature data for -both model training and model serving.
PolyaxonA platform for reproducible and scalable machine learning and deep -learning.
LightTagText Annotation Tool for teams
UBIAIEasy-to-use text annotation tool for teams with most comprehensive -auto-annotation features. Supports NER, relations and document -classification as well as OCR annotation for invoice labeling
TrainsAuto-Magical Experiment Manager, Version Control & DevOps for -AI
HopsworksOpen-source data-intensive machine learning platform with a feature -store. Ingest and manage features for both online (MySQL Cluster) and -offline (Apache Hive) access, train and serve models at scale.
MindsDBMindsDB is an Explainable AutoML framework for developers. With -MindsDB you can build, train and use state of the art ML models in as -simple as one line of code.
LightwoodA PyTorch-based framework that breaks down machine learning problems -into smaller blocks that can be glued together seamlessly with an -objective to build predictive models with one line of code.
AWS Data -WranglerAn open-source Python package that extends the power of Pandas -library to AWS connecting DataFrames and AWS data related services -(Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, etc).
Amazon -RekognitionAWS Rekognition is a service that lets developers working with -Amazon Web Services add image analysis to their applications. Catalog -assets, automate workflows, and extract meaning from your media and -applications.
Amazon TextractAutomatically extract printed text, handwriting, and data from any -document.
Amazon Lookout -for VisionSpot product defects using computer vision to automate quality -inspection. Identify missing product components, vehicle and structure -damage, and irregularities for comprehensive quality control.
Amazon CodeGuruAutomate code reviews and optimize application performance with -ML-powered recommendations.
CMLAn open source toolkit for using continuous integration in data -science projects. Automatically train and test models in production-like -environments with GitHub Actions & GitLab CI, and autogenerate -visual reports on pull/merge requests.
DaskAn open source Python library to painlessly transition your -analytics code to distributed computing systems (Big Data)
StatsmodelsA Python-based inferential statistics, hypothesis testing and -regression framework
GensimAn open-source library for topic modeling of natural language -text
spaCyA performant natural language processing toolkit
Grid -StudioGrid studio is a web-based spreadsheet application with full -integration of the Python programming language.
Python Data -Science HandbookPython Data Science Handbook: full text in Jupyter Notebooks
ShapleyA data-driven framework to quantify the value of classifiers in a -machine learning ensemble.
DAGsHubA platform built on open source tools for data, model and pipeline -management.
DeepnoteA new kind of data science notebook. Jupyter-compatible, with -real-time collaboration and running in the cloud.
ValohaiAn MLOps platform that handles machine orchestration, automatic -reproducibility and deployment.
PyMC3A Python Library for Probabilistic Programming (Bayesian Inference -and Machine Learning)
PyStanPython interface to Stan (Bayesian inference and modeling)
hmmlearnUnsupervised learning and inference of Hidden Markov Models
Chaos -GeniusML powered analytics engine for outlier/anomaly detection and root -cause analysis
NimbleboxA full-stack MLOps platform designed to help data scientists and -machine learning practitioners around the world discover, create, and -launch multi-cloud apps from their web browser.
TowheeA Python library that helps you encode your unstructured data into -embeddings.
LineaPyEver been frustrated with cleaning up long, messy Jupyter notebooks? -With LineaPy, an open source Python library, it takes as little as two -lines of code to transform messy development code into production -pipelines.
envd🏕️ machine learning development environment for data science and -AI/ML engineering teams
Explore -Data Science LibrariesA search engine 🔎 tool to discover & find a curated list of -popular & new libraries, top authors, trending project kits, -discussions, tutorials & learning resources
MLEM🐶 Version and deploy your ML models following GitOps -principles
MLflowMLOps framework for managing ML models across their full -lifecycle
cleanlabPython library for data-centric AI and automatically detecting -various issues in ML datasets
AutoGluonAutoML to easily produce accurate predictions for image, text, -tabular, time-series, and multi-modal data
Arize AIArize AI community tier observability tool for monitoring machine -learning models in production and root-causing issues such as data -quality and performance drift.
Aureo.ioAureo.io is a low-code platform that focuses on building artificial -intelligence. It provides users with the capability to create pipelines, -automations and integrate them with artificial intelligence models – all -with their basic data.
ERD LabFree cloud based entity relationship diagram (ERD) tool made for -developers.
Arize-PhoenixMLOps in a notebook - uncover insights, surface problems, monitor, -and fine tune your models.
CometAn MLOps platform with experiment tracking, model production -management, a model registry, and full data lineage to support your ML -workflow from training straight through to production.
CometLLMLog, track, visualize, and search your LLM prompts and chains in one -easy-to-use, 100% open-source tool.
SynthicalAI-powered collaborative environment for research. Find relevant -papers, create collections to manage bibliography, and summarize content -— all in one place
teeplotWorkflow tool to automatically organize data visualization -output
-

Literature and Media

-

^ back to top ^

-

This section includes some additional reading material, channels to -watch, and talks to listen to.

-

Books

-

^ back to top ^

- -

Book Deals (Affiliated) 🛍

- -

Journals, Publications and -Magazines

-

^ back to top ^

- -

Newsletters

-

^ back to top ^

- -

Bloggers

-

^ back to top ^

- -

Presentations

-

^ back to top ^

- -

Podcasts

-

^ back to top ^

- -

YouTube Videos & Channels

-

^ back to top ^

- -

Socialize

-

^ back to top ^

-

Below are some Social Media links. Connect with other data -scientists!

- -

Facebook Accounts

-

^ back to top ^

- -

Twitter Accounts

-

^ back to top ^

- ---- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TwitterDescription
Big Data -CombineRapid-fire, live tryouts for data scientists seeking to monetize -their models as trading strategies
Big Data ManiaData Viz Wiz, Data Journalist, Growth Hacker, Author of Data Science -for Dummies (2015)
Big Data -ScienceBig Data, Data Science, Predictive Modeling, Business Analytics, -Hadoop, Decision and Operations Research.
Charlie GreenbackerDirector of Data Science at @ExploreAltamira
Chris SaidData scientist at Twitter
Clare CorthellDev, Design, Data Science @mattermark #hackerei
DADI -Charles-Abner#datascientist @Ekimetrics. , #machinelearning #dataviz -#DynamicCharts #Hadoop #R #Python #NLP #Bitcoin #dataenthousiast
Data Science -CentralData Science Central is the industry’s single resource for Big Data -practitioners.
Data Science LondonData Science. Big Data. Data Hacks. Data Junkies. Data Startups. -Open Data
Data Science -ReneeDocumenting my path from SQL Data Analyst pursuing an Engineering -Master’s Degree to Data Scientist
Data Science -ReportMission is to help guide & advance careers in Data Science & -Analytics
Data Science -TipsTips and Tricks for Data Scientists around the world! #datascience -#bigdata
Data VizzardDataViz, Security, Military
DataScienceX
deeplearning4j
DJ PatilWhite House Data Chief, VP @ RelateIQ.
Domino Data Lab
Drew ConwayData nerd, hacker, student of conflict.
Emilio Ferrara#Networks, #MachineLearning and #DataScience. I work on #Social -Media. Postdoc at @IndianaUniv
Erin BartoloRunning with #BigData–enjoying a love/hate relationship with its -hype. @iSchoolSU -#DataScience Program Mgr.
Greg RedaWorking @ GrubHub about data and pandas
Gregory PiatetskyKDnuggets President, Analytics/Big Data/Data Mining/Data Science -expert, KDD & SIGKDD co-founder, was Chief Scientist at 2 startups, -part-time philosopher.
Hadley WickhamChief Scientist at RStudio, and an Adjunct Professor of Statistics -at the University of Auckland, Stanford University, and Rice -University.
Hakan KardasData Scientist
Hilary MasonData Scientist in Residence at @accel.
Jeff HammerbacherReTweeting about data science
John Myles -WhiteScientist at Facebook and Julia developer. Author of Machine -Learning for Hackers and Bandit Algorithms for Website Optimization. -Tweets reflect my views only.
Juan Miguel -LavistaPrincipal Data Scientist @ Microsoft Data Science Team
Julia EvansHacker - Pandas - Data Analyze
Kenneth CukierThe Economist’s Data Editor and co-author of Big Data -(http://www.big-data-book.com/).
Kevin DavenportOrganizer of -https://www.meetup.com/San-Diego-Data-Science-R-Users-Group/
Kevin MarkhamData science instructor, and founder of Data School
Kim ReesInteractive data visualization and tools. Data flaneur.
Kirk BorneDataScientist, PhD Astrophysicist, Top #BigData Influencer.
Linda RegberData storyteller, visualizations.
Luis ReiPhD Student. Programming, Mobile, Web. Artificial Intelligence, -Intelligent Robotics Machine Learning, Data Mining, Natural Language -Processing, Data Science.
Mark StevensonData Analytics Recruitment Specialist at Salt (@SaltJobs) Analytics - -Insight - Big Data - Data science
Matt HarrisonOpinions of full-stack Python guy, author, instructor, currently -playing Data Scientist. Occasional fathering, husbanding, organic -gardening.
Matthew RussellMining the Social Web.
Mert NuhoğluData Scientist at BizQualify, Developer
Monica RogatiData @ Jawbone. Turned data into stories & products at LinkedIn. -Text mining, applied machine learning, recommender systems. Ex-gamer, -ex-machine coder; namer.
Noah IliinskyVisualization & interaction designer. Practical cyclist. Author -of vis books: https://www.oreilly.com/pub/au/4419
Paul MillerCloud Computing/ Big Data/ Open Data Analyst & Consultant. -Writer, Speaker & Moderator. Gigaom Research Analyst.
Peter SkomorochCreating intelligent systems to automate tasks & improve -decisions. Entrepreneur, ex-Principal Data Scientist @LinkedIn. Machine -Learning, ProductRei, Networks
Prash ChanSolution Architect @ IBM, Master Data Management, Data Quality & -Data Governance Blogger. Data Science, Hadoop, Big Data & -Cloud.
Quora Data -ScienceQuora’s data science topic
R-BloggersTweet blog posts from the R blogosphere, data science conferences, -and (!) open jobs for data scientists.
Rand Hindi
Randy OlsonComputer scientist researching artificial intelligence. Data -tinkerer. Community leader for @DataIsBeautiful. #OpenScience -advocate.
Recep ErolData Science geek @ UALR
Ryan OrbanData scientist, genetic origamist, hardware aficionado
Sean J. TaylorSocial Scientist. Hacker. Facebook Data Science Team. Keywords: -Experiments, Causal Inference, Statistics, Machine Learning, -Economics.
Silvia K. Spiva#DataScience at Cisco
Harsh B. GuptaData Scientist at BBVA Compass
Spencer NelsonData nerd
Talha OzEnjoys ABM, SNA, DM, ML, NLP, HI, Python, Java. Top percentile -Kaggler/data scientist
Tasos SkarlatidisComplex Event Processing, Big Data, Artificial Intelligence and -Machine Learning. Passionate about programming and open-source.
Terry TimkoInfoGov; Bigdata; Data as a Service; Data Science; Open, Social -& Business Data Convergence
Tony BaerIT analyst with Ovum covering Big Data & data management with -some systems engineering thrown in.
Tony OjedaData Scientist , Author , Entrepreneur. Co-founder @DataCommunityDC. -Founder @DistrictDataLab. #DataScience -#BigData #DataDC
Vamshi AmbatiData Science @ PayPal. #NLP, #machinelearning; PhD, Carnegie Mellon -alumni (Blog: https://allthingsds.wordpress.com )
Wes McKinneyPandas (Python Data Analysis library).
WileyEdSenior Manager - @Seagate Big Data Analytics @McKinsey Alum #BigData + -#Analytics Evangelist #Hadoop, #Cloud, #Digital, & #R -Enthusiast
WNYC Data News TeamThe data news crew at @WNYC. Practicing data-driven journalism, -making it visual, and showing our work.
Alexey GrigorevData science author
İlker ArslanData science author. Shares mostly about Julia programming
INEVITABLEAI & Data Science Start-up Company based in England, UK
-

Telegram Channels

-

^ back to top ^

- -

Slack Communities

-

top

- -

GitHub Groups

- -

Data Science Competitions

-

Some data mining competition platforms

- -

Fun

- -

Infographics

-

^ back to top ^

- ---- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
PreviewDescription
Key -differences of a data scientist vs. data engineer
A visual guide to Becoming a Data Scientist in 8 Steps by DataCamp (img)
Mindmap on required skills (img)
Swami Chandrasekaran made a Curriculum -via Metro map.
by @kzawadz via twitter
By Data Science -Central
Data Science Wars: R vs Python
How to select statistical or machine learning techniques
Choosing the Right Estimator
The Data Science Industry: Who Does What
Data Science Venn Euler Diagram
Different Data Science Skills and Roles from this -article by Springboard
Data Fallacies To AvoidA simple and friendly way of teaching your non-data -scientist/non-statistician colleagues how to avoid -mistakes with data. From Geckoboard’s Data Literacy -Lessons.
-

Datasets

-

^ back to top ^

- -

Comics

-

^ back to top ^

- -

Other Awesome Lists

- -

Hobby

- - - - +

awesome-data-science

+

A curated list of amazingly awesome open source data science +resources.

+

Data Visualization

+

A JavaScript visualization library for HTML and SVG - +http://d3js.org

+

Real-time visualization library - https://github.com/fastly/epoch

diff --git a/html/index.html b/html/index.html index ab3f8d7..486d76d 100644 --- a/html/index.html +++ b/html/index.html @@ -1,6 +1,6 @@
-