update lists

2025-07-18 22:22:32 +02:00
parent 55bed3b4a1
commit 5916c5c074
3078 changed files with 331679 additions and 357255 deletions

@@ -1,4 +1,4 @@
 Awesome Empirical Software Engineering !Awesome (https://awesome.re/badge.svg) (https://awesome.re)
 Awesome Empirical Software Engineering !Awesome (https://awesome.re/badge.svg) (https://awesome.re)
A curated repository of data sets and tools that can be used for conducting evidence-based, data-driven research on software systems.
This research approach is often termed experimental or empirical software engineering (https://en.wikipedia.org/wiki/Experimental_software_engineering).
Many of the data sets can also be useful in research using search-based software engineering (https://en.wikipedia.org/wiki/Search-based_software_engineering) methods.
@@ -21,9 +21,11 @@
Repositories
- SIR (http://sir.unl.edu/portal/index.php) - Software-artifact infrastructure repository; Java, C, C++, and C# software together with test suites and fault data.
- PROMISE (http://promise.site.uottawa.ca/SERepository/datasets-page.html) - About 20 datasets related to software engineering research.
- ESEUR (https://github.com/Derek-Jones/ESEUR-code-data) - All data used in the openly available book Evidence-based Software Engineering (http://www.knosof.co.uk/ESEUR/index.html).
- Directory of MSR Datasets (https://authecesofteng.github.io/directory-msr-datasets/)
- FLOSSmole (https://flossmole.org/collection_details) - Collaborative collection and analysis of free/libre/open source project data.
- PROMISE (http://promise.site.uottawa.ca/SERepository/datasets-page.html) - About 20 datasets related to software engineering research.
- SIR (http://sir.unl.edu/portal/index.php) - Software-artifact infrastructure repository; Java, C, C++, and C# software together with test suites and fault data.
- Zenodo (http://zenodo.org/) - Software data collections in CERN's open-access repository.
 - Software Engineering Artifacts Can Really Assist Future Tasks (http://zenodo.org/communities/seacraft)
 - Empirical Software Engineering (https://zenodo.org/communities/empirical-software-engineering/)
@@ -35,10 +37,10 @@
- AndroZoo (https://androzoo.uni.lu/) - Collection of Android Applications.
- Bug Prediction Dataset (http://bug.inf.usi.ch/index.php) - Collection of models and metrics from Eclipse JDT Core, PDE UI, Equinox Framework, Lucene, Mylyn, and their histories.
- Code Reviews (http://kin-y.github.io/miningReviewRepo/) - Code reviews of OpenStack, LibreOffice, AOSP, Qt, Eclipse.
- CoREBench (http://www.comp.nus.edu.sg/%7Erelease/corebench/) - Collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: 
Make, Grep, Findutils, and Coreutils.
- Cryptocurrency GitHub Activity and Market Cap Dataset (https://rvantonder.github.io/CryptOSS/) - Activity such as commits, stars, prices, and market cap of over 200 cryptocurrency projects on GitHub over time. Raw, historic data is 
also available (https://zenodo.org/record/2595588#.XRuzuBNKhSM).
- CoREBench (http://www.comp.nus.edu.sg/%7Erelease/corebench/) - Collection of 70 realistically Complex Regression Errors that were systematically extracted from the repositories and bug reports of four open-source software projects: Make, Grep, 
Findutils, and Coreutils.
- Cryptocurrency GitHub Activity and Market Cap Dataset (https://rvantonder.github.io/CryptOSS/) - Activity such as commits, stars, prices, and market cap of over 200 cryptocurrency projects on GitHub over time. Raw, historic data is also 
available (https://zenodo.org/record/2595588#.XRuzuBNKhSM).
- Defects4J (https://github.com/rjust/defects4j) - Collection of 395 reproducible bugs collected with the goal of advancing software testing research.
- Eclipse AERI stacktraces (http://download.eclipse.org/scava/datasets/aeri_stacktraces/aeri_stacktraces.html) - Collection of stacktraces of Exceptions encountered by users of the Eclipse IDE, as retrieved by the AERI reporting system.
- Enron Spreadsheets and Emails (https://figshare.com/articles/Enron_Spreadsheets_and_Emails/1221767) - All the spreadsheets and emails used in the paper 'Enron's Spreadsheets and Related Emails: A Dataset and Analysis'.
@@ -52,7 +54,7 @@
- Maven metrics (https://github.com/bkarak/data_msr2015) - Collection of software complexity & sizing metrics for the Maven Repository (https://maven.apache.org).
- Maven Dependency Graph (https://zenodo.org/record/1489120) - Snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database.
- mzdata (https://github.com/jxshin/mzdata) - Multi-extract and multi-level dataset of Mozilla issue tracking history.
- npm-miner (https://github.com/AuthEceSoftEng/msr-2018-npm-miner) - The dataset contains the analysis results of 5 open source software quality tools eslint, escomplex, nsp, jsinspect and sonarjs for 2000 popular (in terms of stars and
- npm-miner (https://github.com/AuthEceSoftEng/msr-2018-npm-miner) - The dataset contains the analysis results of 5 open source software quality tools eslint, escomplex, nsp, jsinspect and sonarjs for 2000 popular (in terms of stars and 
downloads) npm packages.
- OCL Expressions on GitHub (https://github.com/tue-mdse/ocl-dataset) - Data set of 9188 OCL expressions originating from 504 EMF meta-models in 245 systematically selected GitHub repositories.
- RepoReapers Data Set (https://reporeapers.github.io) - Data set containing a collection of _engineered software projects_ from GHTorrent.
@@ -62,8 +64,8 @@
- Stack Exchange (https://archive.org/details/stackexchange) - Anonymized dump of all user-contributed content on the Stack Exchange network.
- TravisTorrent (http://travistorrent.testroots.org) - Provides free and easy-to-use Travis CI build analyses.
- Ultimate Debian Database (UDD) (https://wiki.debian.org/UltimateDebianDatabase) - Data about various aspects of Debian (e.g. packages, bugs, maintainers) in the same SQL database.
- Unified Bug Dataset (http://www.inf.u-szeged.hu/~ferenc/papers/UnifiedBugDataSet/) - Static source-code-based datasets that include the Bugcatchers Bug Dataset, the Bug Prediction Dataset (http://bug.inf.usi.ch/index.php), the 
Eclipse Bug Dataset (https://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/), the GitHub Bug Dataset (http://www.inf.u-szeged.hu/~ferenc/papers/GitHubBugDataSet/), and some datasets from the PROMISE 
- Unified Bug Dataset (http://www.inf.u-szeged.hu/~ferenc/papers/UnifiedBugDataSet/) - Static source-code-based datasets that include the Bugcatchers Bug Dataset, the Bug Prediction Dataset (http://bug.inf.usi.ch/index.php), the Eclipse Bug 
Dataset (https://www.st.cs.uni-saarland.de/softevo/bug-data/eclipse/), the GitHub Bug Dataset (http://www.inf.u-szeged.hu/~ferenc/papers/GitHubBugDataSet/), and some datasets from the PROMISE 
(http://promise.site.uottawa.ca/SERepository/datasets-page.html) repository.
- Unix history (https://github.com/dspinellis/unix-history-repo) - Git repository with 46 years of Unix history evolution.
@@ -109,3 +111,5 @@
!CC0 (http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg) (https://creativecommons.org/publicdomain/zero/1.0/)
To the extent possible under law, Diomidis Spinellis (http://www.spinellis.gr) has waived all copyright and related or neighboring rights to this work.
msr GitHub: https://github.com/dspinellis/awesome-msr