
Open Code and Software


Dr Heather Turner
RSE Fellow, University of Warwick, UK

24 March 2022

1 / 24

Open Access

New UKRI policy for research articles from 1 April 2022.

Two routes to open access:

  • Gold: final version on publisher's website.
  • Green: accepted version in a repository.

Both routes require:

  • CC BY license.
  • Statement on how the underlying data can be accessed.

The open access logo: an unlocked padlock
The UKRI logo
2 / 24

Open Practices

Recognised as integral to healthy research culture

Working openly means that our work is

  • Accessible, for free
  • Open to scrutiny (verifiable)
  • Reproducible
3 / 24

Open Teaching

A screenshot of the homepage for STA 313 - Advanced Data Visualisation, with menu items Syllabus, Teaching Team, Schedule, Useful Links, Project 1
Example of CC BY-SA course materials, vizdata.org

Range of licenses, e.g. CC BY, CC BY-SA, CC BY-NC, CC BY-NC-ND, or no license.

4 / 24

Code and Software

Script

Analysis/simulation script
  • Customised
  • Reproducible workflow
    • read data
    • analyse
    • summarise
    • report
Software

Package, tool, dashboard
  • Optimised
  • Reusable
  • Shareable
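A minimal Python sketch of the reproducible script workflow above (read data, analyse, summarise, report), using only the standard library. The file names `data.csv` and `report.txt`, the column name `value`, and all function names are hypothetical placeholders, not part of any particular project.

```python
import csv
import statistics
from pathlib import Path


def read_data(path):
    """Read data: load a CSV file into a list of row dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def analyse(rows, column="value"):
    """Analyse: extract a numeric column (hypothetical name), skipping blanks."""
    return [float(row[column]) for row in rows if row[column].strip()]


def summarise(values):
    """Summarise: compute simple summary statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        # statistics.stdev needs at least two values
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def report(summary, out_path):
    """Report: write the summary to a text file that can be regenerated."""
    lines = [f"{key}: {value:.3f}" for key, value in summary.items()]
    Path(out_path).write_text("\n".join(lines) + "\n")


def main(data_path="data.csv", out_path="report.txt"):
    """Run the whole workflow end to end, from raw data to report."""
    report(summarise(analyse(read_data(data_path))), out_path)
```

Calling `main()` in a folder containing a `data.csv` with a `value` column regenerates `report.txt` from scratch, which is what makes the workflow reproducible.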
5 / 24

command line tool

Why Share Code and Software?

Beyond general benefits of open practices

  • Increased impact and reputation
  • Faster translation into practice
  • Additional, citable, outputs

Benefits of coding in the open

  • Encourages good practices
  • Facilitates collaboration
  • Can give access to software engineering tools
6 / 24

Proprietary vs. Open Source

Proprietary

Logos for Stata, SAS and Matlab
  • Code cannot be reviewed
  • No control over when bugs fixed
  • Costly: not accessible to all
Open Source

Logos for Python, R, Julia and gcc
  • Code open to scrutiny
  • Can contribute bug fixes or fork the code
  • Free to use
7 / 24

Sharing code on your website

Pros

  • Simple
  • Low maintenance

Cons

  • Not persistent
  • Not discoverable
  • Not citable
  • Versioning is painful, update history not transparent
8 / 24

Sharing code as supplementary material

Pros

  • Simple
  • Low maintenance
  • Persistent

Cons

  • Need to publish paper!
  • Not separately discoverable/citable from paper
  • No option to update
9 / 24

Sharing code in an open research archive

Pros

  • Persistent DOI
  • Version support (including DOI for "all versions")
  • Range of licensing options
  • Link to ORCID and UKRI grant
  • Extensible to "research compendium"

Cons

  • Only version snapshots ("releases"), not the full development history
10 / 24

Concept DOI. Zenodo: built by researchers for researchers.
https://www.agu.org/-/media/Files/Publications/Generalist-Data-Repository-Grid.pdf
https://the-turing-way.netlify.app/communication/citable.html
https://www.nature.com/articles/s41597-022-01143-6
https://www.researchequals.com/faq
Other benefits: free, support for peer review during embargo, immediate publication (no peer review).

Sharing code in an online repository

Pros

  • Version control (commit history + releases)
  • README.md for quick documentation
  • Facilitates contribution (bug reports, patches)
  • Can use CITATION.cff file for clear citation [GitHub]
  • Links to Zenodo for publishing releases

Cons

  • Learning curve to take full advantage
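To show what a CITATION.cff file contains, the Python function below renders a minimal one. The keys (`cff-version`, `message`, `title`, `authors`, `family-names`, `given-names`, `version`, `doi`, `date-released`) follow the Citation File Format 1.2.0 schema; the function name and all example values are hypothetical, and in practice the file is usually written by hand. This sketch assumes no value contains a double quote.

```python
def render_citation_cff(title, authors, version, date_released, doi=None):
    """Return the text of a minimal CITATION.cff (CFF 1.2.0) file.

    authors: list of (family_names, given_names) tuples.
    date_released: "YYYY-MM-DD" string.
    Assumes no value contains a double quote (no YAML escaping is done).
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'version: "{version}"')
    if doi is not None:
        lines.append(f'doi: "{doi}"')  # e.g. a Zenodo DOI for this release
    lines.append(f"date-released: {date_released}")
    return "\n".join(lines) + "\n"
```

Saving the result as `CITATION.cff` in the repository root is what lets GitHub offer a "Cite this repository" option.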
11 / 24

Citation File Format (CFF)

Further benefits of GitHub

Research compendia

Can use Binder so that people can run your code in the browser

  • For short analysis (< 10 min) on small data (< 10 MB)
  • Zero to Binder tutorial [Julia, Python, R]

Software packages

  • Users can install Julia/Python/R packages from GitHub
  • Can deploy package websites using GitHub pages
  • Can benefit from GitHub Actions, e.g. automatically running tests
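Binder builds its environment from configuration files in the repository, such as a `requirements.txt` for Python projects. A minimal sketch, using the standard library's `importlib.metadata` (Python 3.8+), that pins the installed versions of a list of packages; the function name is hypothetical, and packages that are not installed are reported rather than silently dropped.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return (requirements_text, missing) for the given distribution names.

    Each installed package becomes a 'name==version' line, the pinned
    format pip reads from a requirements.txt file.
    """
    lines, missing = [], []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)  # not installed in this environment
    return "".join(f"{line}\n" for line in lines), missing
```

For example, `pinned_requirements(["numpy", "pandas"])` (hypothetical dependencies) gives text that can be saved as `requirements.txt`, documenting both the dependencies and their versions.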
12 / 24

R-universe

Preparing to Share

  1. Choose a license
    • No license = no permission to use, modify or share!
    • Open source license for scholarly work
    • LGPL or BSD license for scripts in a proprietary language
    • Industrial partnership: check first
  2. Make it public
    • Do not version control secrets (passwords, private keys...)!
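One way to keep secrets out of version control is to read them from the environment at run time instead of writing them into the code. A minimal Python sketch; the variable name `MY_API_TOKEN` and the function name are hypothetical, and the value would be set outside the repository (e.g. in your shell or CI settings; if kept in a local file, list that file in `.gitignore`).

```python
import os


def get_secret(name="MY_API_TOKEN"):
    """Read a secret from an environment variable rather than source code."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly instead of falling back to a hard-coded value
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "set it outside the repository rather than in the code."
        )
    return value
```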

Logo for the Open Source Initiative

13 / 24

Good (Enough) Software Practices

  • Version control
  • Documentation
    • Basic: comments, README
    • Advanced: codebook, function documentation
  • Tests
    • Basic: validation examples with expected output
    • Advanced: unit tests that check expected vs actual
  • Defining the computational environment
    • Basic: document dependencies, versions
    • Advanced: package management systems, containers
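To illustrate the basic/advanced distinction for tests in Python: a validation example with its expected output can sit in a docstring and be checked with the standard `doctest` module, while a `unittest` test case compares expected and actual values explicitly. The function `rescale` is a hypothetical example.

```python
import unittest


def rescale(values):
    """Rescale values linearly to the range [0, 1].

    Basic: a validation example with expected output (checked by doctest).

    >>> rescale([2, 4, 6])
    [0.0, 0.5, 1.0]
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


class TestRescale(unittest.TestCase):
    """Advanced: unit tests that check expected vs actual."""

    def test_endpoints(self):
        result = rescale([10, 15, 20])
        self.assertEqual(result[0], 0.0)
        self.assertEqual(result[-1], 1.0)

    def test_constant_input_raises(self):
        with self.assertRaises(ValueError):
            rescale([3, 3, 3])
```

If this is saved as `rescale.py`, `python -m doctest rescale.py` checks the docstring example and `python -m unittest rescale` runs the test case.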
14 / 24

When writing a piece of code that you are working on for more than a day.

Release the Software

  • GitHub
    • Create a versioned release on GitHub
    • Link to Zenodo to record version with DOI
  • Package repository (CRAN, PyPI, Julia General Registry)
    • Quality standards: interoperability
    • Discoverable
    • Easier to install
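Versioned releases are easier to interpret when version numbers follow a convention. One common choice (an assumption here, not a requirement of every repository above) is semantic versioning, MAJOR.MINOR.PATCH: bump MAJOR for incompatible changes, MINOR for new backward-compatible features, PATCH for bug fixes. A minimal sketch of that rule; the function name is hypothetical.

```python
def bump_version(version, part):
    """Return the next semantic version string.

    version: "MAJOR.MINOR.PATCH", e.g. "1.4.2"
    part: "major", "minor" or "patch"
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part!r}")
```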
15 / 24

Code is ready to be used (not a beta version). Basic standards: documented code, running examples, etc. Works with the current version of R and other packages. Commitment of maintainer.

Julia General registry: minimal standards, package must be loadable. Similar for PyPI?

(but see R-universe)

Prepare for release

  • When (incremental) features are usable
  • Good time to do additional checks
    • Check spelling
    • URL/article citations up-to-date
    • Testing on different platforms
    • Check repository policies, run their checks
    • Start/update NEWS or ChangeLog
16 / 24

Code review

  • External (at time of major release)
    • Software only: Bioconductor, rOpenSci
    • With accompanying article: R Journal, JSS, JOSS
  • Internal (throughout)
    • Code exchange/peer review within Statistics? cf. Oxford Code Review Network
    • RSE: embedded RSEs (R), grant-funded general RSE support via SCRTP (?)
17 / 24

Promoting your package

  • Some promotion automatic on official repositories
    • E.g. CRANberries, R Views (Top New Packages), custom search engines
  • Social media
    • Website, blog
    • Twitter (#RStats, #Python, #JuliaLang, #OpenSource)
  • Suggest for Task View if relevant
18 / 24

CRANberries: RSS feed/Twitter

Talks

  • Meetups: Warwick RUG, Coventry R-Ladies
  • Conferences

    • General computing: useR!, PyCon, JuliaCon, COMPSTAT
    • Specific: R/Finance, BioC, Psychoco
  • Don't forget to share your slides! (Conference/personal website, LinkedIn, RPubs, Slideshare)

19 / 24

Conferences provide greater exposure, particularly to people working in relevant field(s).

  • General domain: JSM, ESA, ...

Paper

  • Traditional journals:
    • Open Source Software: The R Journal, Journal of Statistical Software, Journal of Computational and Graphical Statistics, SoftwareX
    • Science: Bioinformatics, PLOS ONE, Methods in Ecology and Evolution
  • Alternative journals:
    • F1000Research Bioconductor/R Package gateways: publish, then open review
    • Journal of Open Source Software: open code review, short descriptive paper
20 / 24

A paper not only promotes your package but benefits from peer review

  • Paper can also overlap with vignette

Interacting with users

  • Bug reports/help requests
    • Can show where documentation/tests need improving
    • Help you find out who's using your package and what for
    • Can give ideas for new features
    • Can lead to collaborations
  • GitHub issues: better than email!
    • Users can see if bug already reported and what action has been taken
21 / 24

Interacting with developers

  • Add a code of conduct, e.g. Contributor Covenant
  • Add a CONTRIBUTING.md to your GitHub repository
    • Do you have a style guide?
    • Reminders to run checks/tests and add a NEWS item with pull requests
  • Use labels to highlight issues: the following are promoted by GitHub
    • help wanted
    • good first issue
  • Take advantage of events e.g. Hacktoberfest, Closember
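Labelled issues can also be listed programmatically. A minimal sketch that builds a request to GitHub's REST API endpoint for listing repository issues (`GET /repos/{owner}/{repo}/issues`, with `state` and comma-separated `labels` query parameters); the owner/repository names and function names are hypothetical, and unauthenticated requests are rate-limited.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def issues_url(owner, repo, labels=("good first issue",), state="open"):
    """Build the REST API URL listing issues that carry the given labels."""
    query = urlencode({"state": state, "labels": ",".join(labels)})
    return f"https://api.github.com/repos/{quote(owner)}/{quote(repo)}/issues?{query}"


def fetch_issue_titles(owner, repo, labels=("good first issue",)):
    """Fetch matching issues and return their titles (needs network access)."""
    req = Request(
        issues_url(owner, repo, labels),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(req) as resp:
        issues = json.load(resp)
    # This endpoint also returns pull requests; they have a "pull_request" key
    return [issue["title"] for issue in issues if "pull_request" not in issue]
```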
22 / 24

Resources

23 / 24

HetSys PhD students/guests

Summary

  • Working openly encourages good practices
  • Your code/software is an asset!
  • We can make steps to improve our own practice
  • Consider including time for general/specific RSE support on grants
24 / 24
