Here’s my lab’s research philosophy:
As biomedicine becomes more complex and more reliant on large, heterogeneous datasets, the need for computational algorithms, statistical approaches, and software tools increases. Producing these computational methods, tools, and analytical approaches and applying them to datasets is as important as the experimental, hypothesis-driven science that generates the data. We believe in the transparency and reproducibility of scientific research and will strive to do good, careful, experimentally validatable science. We will also adhere to the relevant institutional and funder policies.
Reproducible Research:
My lab has an extensive reproducible research strategy, and we use a variety of data science technologies and practices to implement it. I believe it is important to train the next generation of scientists so that these processes become standard practice. We use data processing pipelines and workflows to increase reproducibility and consistency, which also makes it easier to track data provenance.
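As a rough illustration of how a pipeline step can record data provenance, here is a minimal Python sketch. It is not the lab's actual tooling: the function names (`sha256_of`, `run_step`, `uppercase_step`) and the JSON-lines log format are assumptions made for this example. Each step writes the checksums of its input and output files to a log, so a later reader can verify which data produced which result.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_step(name, func, input_path, output_path, log_path):
    """Run one pipeline step and append a provenance record to a JSON-lines log.

    `func` is any callable taking (input_path, output_path); this interface
    is a hypothetical convention chosen for the sketch.
    """
    func(input_path, output_path)
    record = {
        "step": name,
        "input": str(input_path),
        "input_sha256": sha256_of(input_path),
        "output": str(output_path),
        "output_sha256": sha256_of(output_path),
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record


def uppercase_step(input_path, output_path):
    """Toy transformation standing in for a real analysis step."""
    text = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(text.upper(), encoding="utf-8")
```

Calling `run_step("upper", uppercase_step, "in.txt", "out.txt", "provenance.jsonl")` would run the toy step and append one record to the log. Dedicated workflow managers offer much richer tracking; the sketch only shows the underlying idea.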
However careful someone is, the resulting software, methods, or analyses can be prone to errors and mistakes. It is therefore critical to catch them before they end up in research publications and contaminate the scientific record. Having the right checks and balances in place to do this is important.
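One common form such a check can take is an automated test combined with input validation. The Python sketch below is only an example under assumptions: `counts_per_million` is a hypothetical analysis helper, not a function from the lab's code. It rejects invalid input early, and the accompanying test compares the result against a hand-computed answer and an invariant.

```python
import math


def counts_per_million(counts):
    """Scale raw counts so they sum to one million, refusing invalid input."""
    if not counts:
        raise ValueError("counts must not be empty")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not all be zero")
    return [c * 1_000_000 / total for c in counts]


def test_counts_per_million():
    """Check a known answer and an invariant of the output."""
    result = counts_per_million([1, 1, 2])
    # Hand-computed: 1/4 and 2/4 of one million.
    assert result == [250000.0, 250000.0, 500000.0]
    # Invariant: the scaled values always sum to one million.
    assert math.isclose(sum(result), 1_000_000)
```

A test runner such as pytest would pick up `test_counts_per_million` automatically, so the check runs every time the code changes rather than only when someone remembers to look.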
Good Experimental Design:
Good experimental design is especially important in experiments that generate high-throughput datasets. Lack of good design is one of the primary causes of problematic science. We strive to work with our experimental collaborators to improve the design of our joint experiments.
Data:
We will deposit the datasets used in our published research into acceptable data repositories, and make them completely open and publicly available when a research project is published. This is becoming standard practice for most biomedical journals.
Code:
The lab has expertise in the C++, R, Python, Shell, and JavaScript programming and scripting languages (and maybe Julia or Rust in the future). We develop computational, mathematical, or statistical methods and their software implementations for our own use, to solve scientific problems we encounter in the biomedical domain. Sometimes these will be useful to others or essential for the biomedical question under investigation; in these cases, we make an extra effort to convert these methods into software packages, usable scripts, programs, or web applications and share them with the wider community. Resources and repositories such as PyPI, CRAN, and Bioconductor are great for this.
Software:
Good software engineering is important. When working in a multidisciplinary team, there is a lot to learn. Mathematicians, statisticians, biologists, and bioinformaticians are usually not great software engineers, yet they are the ones who ultimately end up creating biomedical computational methods and software. Learning good software engineering practices is essential to improving their capabilities.
When a software package is published, we will release the code and/or software either through open repositories or on our group software webpage. This is great for open collaboration. Useful open software evolves and gains new functionality … sometimes. We also get involved in collaborative software development efforts, most of which are open source from the start.
Publications:
We will publish in peer-reviewed, open access, reputable (as opposed to predatory) journals.