
Releasing packages



For R User Group Ghana on their 3rd anniversary

Lluís Revilla Sancho
IDIBAPS, CIBEREHD

2022/01/16

(updated: 2022-01-15)

1 / 34

Creating packages based on the previous workshop for this user group.

Release process based on the useR!2021 talk

Maintaining packages based on the archived packages files.

Creating good packages

Packages provide a mechanism for loading optional code, data and documentation as needed.


Code


Data


Documentation

3 / 34

Most common usage

Less used, due to CRAN's size restrictions.

In my opinion, the most important aspect.

Resources

Many resources!

You can also find help online (always check the etiquette)

Asking your local community: R User Group Ghana
Asking on Twitter #rstats
Asking on Stack Overflow
Asking on RStudio forum

4 / 34

Resources used to create this workshop

Maëlle's post on how to write good packages is also recommended.

Structure

Tree view of the dtplyr package repository on 2021/08/06.

  • A .github folder: files specific to GitHub (not necessary; advanced content).
  • An R folder with *.R files: your code.
  • A man folder with *.Rd files: your documentation.
  • A tests folder: tests that check the code of the package.
  • A vignettes folder: long-form documentation, not just examples.
  • A .Rbuildignore file: describes what to omit when building the package.
  • A DESCRIPTION file: summary and description of the package.
  • A LICENSE file: the conditions under which the package is released.
  • A NAMESPACE file: what this package shares and needs.
  • A NEWS file: what has changed since the last release.
  • A README: how to install the package, why it is needed and some basic examples.
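The minimum of this structure can be created by hand. Below is a minimal base-R sketch that writes the R, man and tests folders plus DESCRIPTION, LICENSE, NAMESPACE and .Rbuildignore; the helper `create_skeleton()` and the name `mypackage` are hypothetical, for illustration only. In practice `usethis::create_package()` does this and more.

```r
# Hypothetical helper: create a bare-bones package skeleton in 'path'.
create_skeleton <- function(path, name = basename(path)) {
  for (folder in c("R", "man", "tests")) {
    dir.create(file.path(path, folder), recursive = TRUE, showWarnings = FALSE)
  }
  writeLines(c(
    paste("Package:", name),
    "Title: Short Description in Title Case",
    "Version: 0.0.9000",
    'Authors@R: person("Name", role = c("aut", "cre"), email = "my@email.com")',
    "Description: A long description of the package.",
    "License: MIT + file LICENSE"
  ), file.path(path, "DESCRIPTION"))
  # "MIT + file LICENSE" expects a LICENSE file with the year and copyright holder
  writeLines(c("YEAR: 2022", "COPYRIGHT HOLDER: Name"), file.path(path, "LICENSE"))
  # As far as I know, roxygen2 only overwrites a NAMESPACE starting with this line
  writeLines("# Generated by roxygen2: do not edit by hand", file.path(path, "NAMESPACE"))
  # Regular expressions of files left out when building the package
  writeLines("^.*\\.Rproj$", file.path(path, ".Rbuildignore"))
  invisible(path)
}

pkg <- create_skeleton(file.path(tempdir(), "mypackage"))
list.files(pkg, all.files = TRUE, no.. = TRUE)
```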
5 / 34

DESCRIPTION

Package: mypackage
Title: Short Description in Title Case
Version: 0.0.9000
Authors@R: c(person(given = "Name",
                    role = c("aut", "cre", "cph"),
                    email = "my@email.com"),
             ...)
Description: A long description of the package.
License: MIT + file LICENSE
Depends:
    R (>= 4.1.2)
Imports:
    methods
Suggests:
    covr,
    knitr
VignetteBuilder:
    knitr
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1

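Package: Name (ASCII letters, numbers and dots only; at least two characters, starting with a letter and not ending in a dot).
Title: Short description.
Version: At least two numbers (e.g. 0.0.9000).
Authors@R: Information about the authors, with at least an author (aut) and a maintainer (cre).
Description: Longer description.
License: The license under which the package is released.
Depends: The minimum R version, and packages without which yours doesn't work (they are attached along with yours).
Imports: Packages whose functions your package uses.
Suggests: Packages used only in examples, tests or vignettes.
VignetteBuilder: The package used to build the vignettes.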
Other optional fields:

  • URL: Link to the source code, and webpages.
  • BugReports: Link to where to report issues.
  • Encoding: The encoding of the files, usually UTF-8.
  • ...
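Several of these rules can be checked with base R alone. A minimal sketch, assuming a hypothetical helper `check_description()` that reads the file with `read.dcf()` and applies the naming and version rules from Writing R Extensions; it is no substitute for `R CMD check`.

```r
# Hypothetical helper: a few sanity checks on a DESCRIPTION file.
# Returns a character vector of problems (empty if none were found).
check_description <- function(file) {
  desc <- read.dcf(file)
  fields <- colnames(desc)
  problems <- character()
  required <- c("Package", "Title", "Version", "Description", "License")
  for (field in setdiff(required, fields)) {
    problems <- c(problems, paste("Missing field:", field))
  }
  if (!"Authors@R" %in% fields && !all(c("Author", "Maintainer") %in% fields)) {
    problems <- c(problems, "Missing Authors@R (or Author and Maintainer)")
  }
  # Letters, numbers and dots; starts with a letter; doesn't end in a dot
  if ("Package" %in% fields &&
      !grepl("^[[:alpha:]][[:alnum:].]*[[:alnum:]]$", desc[1, "Package"])) {
    problems <- c(problems, paste("Invalid package name:", desc[1, "Package"]))
  }
  # At least two non-negative integers separated by '.' or '-'
  if ("Version" %in% fields &&
      !grepl("^([[:digit:]]+[.-]){1,}[[:digit:]]+$", desc[1, "Version"])) {
    problems <- c(problems, paste("Invalid version:", desc[1, "Version"]))
  }
  problems
}
```

From the package root, `check_description("DESCRIPTION")` returns `character(0)` when these checks pass. A name such as `my_package` is reported as invalid, because underscores are not allowed.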
6 / 34

Name of the package, description of the package, maintainer, relationships with other packages... The package does not need to be in English: the Language field indicates the language used.

R files

  1. Only valid R code: functions, objects and environments that don't depend on previously executed code.

  2. If a function uses a function from another package, use package::function:

    1. Add that package to the appropriate field of DESCRIPTION (usually Imports).

    2. Add a #' @importFrom package function tag (see next slide).

meow <- function() {
  message("meow")
}
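A minimal sketch of point 2, calling `is()` from the methods package in the `package::function` form. The function `is_meow()` is the same one used on the next slides; the class `"meow"` here is just an S3 class attribute, chosen for illustration.

```r
# Calling a function from another package with package::function.
# 'methods' then has to be listed under Imports in DESCRIPTION.
is_meow <- function(x) {
  methods::is(x, "meow")
}

is_meow("hi")                                # FALSE
is_meow(structure(list(), class = "meow"))   # TRUE
```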
7 / 34

NAMESPACE

Import code of other packages and export code of your package:

import(dplyr)
importFrom(methods, is)
export(meow)

If using roxygen2 you can use:

#' @import dplyr
#' @importFrom methods is
is_meow <- function(x) {
  is(x, "meow")
}

# @export makes the function available to others
#' @export
meow <- function() {
  message("meow")
}

roxygen2 writes these tags to the NAMESPACE file when the documentation is updated.
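NAMESPACE files use R syntax, which is why they can also be written by hand. As an illustration (the helper `namespace_exports()` is hypothetical, not part of any package), `parse()` can read the directives and list what a package exports:

```r
# Hypothetical helper: parse NAMESPACE directives (R syntax) and return
# the names listed in export() calls.
namespace_exports <- function(lines) {
  exprs <- parse(text = lines, keep.source = FALSE)
  exports <- lapply(exprs, function(e) {
    if (identical(as.character(e[[1]]), "export")) {
      # Every argument of export() is a name to export
      vapply(as.list(e)[-1], as.character, character(1))
    } else {
      character()
    }
  })
  unlist(exports)
}

namespace_exports(c("import(dplyr)", "importFrom(methods, is)", "export(meow)"))
# [1] "meow"
```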

8 / 34

This can be done manually; there is no need to use other tools (although they are recommended).

Important file!

Function documentation

Using roxygen2:

#' Check if it is a meow
#'
#' Check if an object is of class meow.
#' @param x An object, for example a character string.
#'
#' @return A logical value: TRUE if the object is a meow,
#' FALSE otherwise.
#' @export
#' @examples
#' is_meow("hi")
#' @importFrom methods is
is_meow <- function(x) {
  is(x, "meow")
}

Convert these special comments into *.Rd files with:

devtools::document()
9 / 34

In RStudio you can insert the skeleton with Ctrl+Alt+Shift+R, and convert the special comments into documentation with Ctrl+Shift+D.

Not covered here:


  • Vignettes and articles

  • Tests

  • Website for the documentation of the package

  • Compiled code

  • Other

10 / 34

Articles are like vignettes but not checked by R CMD check

With testthat, tinytest or other frameworks.

Sites with pkgdown or other tools.

C++, C or Fortran code; linking to other languages (Java, Perl) or to other programs and system libraries (cURL, libxml2).

How to choose a name for the package. How to write a good README and NEWS file. How to set up continuous integration. How to pick a license.

Releasing the package



Sharing with others



A source tarball, such as mypackage_0.0.9000.tar.gz


Built with R CMD build mypackage
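A minimal sketch of the build step run from R itself, assuming the `R` executable lives under `R.home("bin")` and using a hypothetical helper `build_tarball()`. `R CMD build` writes `<Package>_<Version>.tar.gz` into the working directory; the tarball can then be installed with `install.packages(..., repos = NULL, type = "source")`. `devtools::build()` is a common alternative.

```r
# Hypothetical helper: run "R CMD build" on a package source directory and
# return the path of the tarball it creates (<Package>_<Version>.tar.gz).
build_tarball <- function(pkg_dir, dest_dir = tempdir()) {
  pkg_dir <- normalizePath(pkg_dir)
  desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"),
                   fields = c("Package", "Version"))
  # R CMD build writes the tarball into the current working directory
  old_wd <- setwd(dest_dir)
  on.exit(setwd(old_wd), add = TRUE)
  r_bin <- file.path(R.home("bin"), "R")
  status <- system2(r_bin, c("CMD", "build", shQuote(pkg_dir)))
  if (status != 0) stop("R CMD build failed for ", pkg_dir)
  file.path(dest_dir, sprintf("%s_%s.tar.gz", desc[1, "Package"], desc[1, "Version"]))
}

# Usage, from the folder that contains the package source ("mypackage"):
# tarball <- build_tarball("mypackage")
# install.packages(tarball, repos = NULL, type = "source")
```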

11 / 34

What does it mean? Building the package with R CMD build and making it accessible to other people, for example: install.packages("http://biodev.cea.fr/sgcca/gliomaData_0.4.tar.gz", repos = NULL)

What are the expectations?

Choosing the right archive/repository



  • Repository

Git repository, svn, or none

  • Archive

CRAN, Bioconductor, Zenodo

  • All

GitHub + R-universe + (rOpenSci) + CRAN/Bioconductor + Zenodo
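Archives other than CRAN and Bioconductor can still be used through `install.packages()` by adding them to `repos`. A minimal sketch with a hypothetical helper `universe_repos()`, assuming the usual `https://<name>.r-universe.dev` address of an R-universe:

```r
# Hypothetical helper: a 'repos' vector listing an R-universe first and CRAN
# second, so install.packages() can find packages (and dependencies) in both.
universe_repos <- function(universe, cran = "https://cloud.r-project.org") {
  c(universe = sprintf("https://%s.r-universe.dev", universe), CRAN = cran)
}

repos <- universe_repos("ropensci")
repos
# install.packages("somepackage", repos = repos)   # 'somepackage' is a placeholder
```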

12 / 34

Repositories keep versions of code

Archives do not (normally) delete content: they keep the "released" versions. Only CRAN, Bioconductor and additional repositories like R-universe, drat, ... work via install.packages().

They can be combined; they are not mutually exclusive.

Submitting the package

Goals of a submission

  • Sharing something of quality that can be useful to others.
  • Make it easier for others to build upon your package.
  • Other: work, grant, prestige ...
| Archive      | Objectives of the reviews                                                                 |
|--------------|-------------------------------------------------------------------------------------------|
| CRAN         | Non-trivial, publication-quality packages.                                                |
| Bioconductor | Promote high-quality, well-documented and interoperable software.                         |
| rOpenSci     | Drive the adoption of best practices with useful, transparent and constructive feedback. |

Pick the right one for you and your users.

13 / 34

Submissions are tough, especially for people coming from places with little training, or lacking confidence or experience with reviews.

The objectives differ, but all look for quality. CRAN: points out errors and makes comments. Bioconductor: detailed comments on style, classes, dependencies, structure… rOpenSci: guidelines for reviewers (about style, tests, functions, description, documentation, …)

CRAN has ~18000 packages, Bioconductor ~2000 and rOpenSci ~300. To work with these slides use xaringan::infinite_moon_reader().

Project differences

|           | CRAN                            | Bioconductor                       | rOpenSci               |
|-----------|---------------------------------|------------------------------------|------------------------|
| Guides    | R-exts                          | Website                            | Book                   |
| Submit    | tar.gz file                     | Open an issue                      | Open an issue          |
| Review    | Email & FTP                     | GitHub                             | GitHub                 |
| Setup     | None                            | SSH key, subscribe to mailing list | CI tests               |
| Checks    | check --as-cran                 | check; BiocCheck                   | check --as-cran        |
| OS        | Windows, Unix, macOS            | Windows, Unix, macOS               | Windows, Unix, macOS   |
| Versions  | oldrel, release, patched, devel | release, devel                     | oldrel, release, devel |
| Cycle     | Always open                     | 2 releases per year                | Always open            |
| Editors   | 0                               | 0                                  | ~10                    |
| Reviewers | ~5 volunteers                   | ~10 + volunteers                   | Volunteers             |

Different setup, different review.

14 / 34

The different projects/archives have different setups (see the table). In all of them, the package must first pass the automatic checks before a human looks at it. I will use data from the three projects but mostly refer to CRAN.

Submissions

Three bar plots of new submissions, each bar a month: on the left CRAN with 16 months of data, in the middle Bioconductor with 5 years of data, on the right rOpenSci with 6 years of data. CRAN has about 300 monthly submissions, Bioconductor 30 and rOpenSci 10. Some variance can be observed, especially on Bioconductor and rOpenSci.

CRAN data thanks to the incoming dashboard.

15 / 34

There is an order of magnitude of difference between them: CRAN > Bioconductor > rOpenSci. There is a lot of month-to-month variability. Also, very little data has been collected from CRAN so far (and there were some hiccups in the CRAN data collection: near the end of May the cron job stopped working for a week).

Organization

Line plot with the number of packages in CRAN's newbies and pretest folders from September 2020 to January 2022, counted hourly. Pretest is mostly below 10 packages and newbies around 25. There is some increase in newbies packages around October and after the CRAN holidays of December-January (marked in red). There are two spikes of packages in the pretest folder, one after the holidays and another one at the beginning of April.

Packages are moved by reviewers between folders.

16 / 34

There are many folders, but these two are the most important. CRAN doesn't explain how they work. Pretest is for resubmissions (newer versions of packages) and also for newbies.

Submissions patterns

Two plots with a loess estimate of the number of packages in CRAN's newbies and pretest folders. On the left, by day of the month: newbies shows some dips at the beginning of the month and around days 20-29 but stays around 70 packages a day, while pretest is constant around 50 packages each day. On the right, the same data by day of the week: many packages at the beginning of the week and fewer at the weekend. Pretest packages fall from 50 to around 30, while newbies drops from 80 to 70.

Check the dashboard before submitting?

17 / 34

Submit when you are ready: better in the queue than outside it.

Review time

Histogram of the time a submission spends in CRAN's queue. One big histogram from 0 to over 2000 hours, where most submissions are below 500 h and decay in a logarithmic pattern. Above it, a zoom on the first week, in 24 h bins up to 168 h (1 week). Most submissions spend less than 24 h in the queue.

Reviews are short, brief and to the point.

18 / 34

Median time in the queue ~9 hours, mean ~31.4 hours (minimum 1, first quartile 2, median 9, mean 31.4, third quartile 33, maximum 2365 hours).

Review speed

A plot with the loess estimate of hours per submission on CRAN, with one line for new packages and another for updates. Updated packages spend about 5 hours in the queue, while new packages go from 200 hours down to 80 before the CRAN holidays (end of December and beginning of January), increase again after the holidays to around 120, and then slowly decay until they reach 40 hours.

Expect 3-7 days until your new package is on CRAN.

19 / 34

Times vary: they can be shorter or longer, and most of the longer ones needed a resubmission. Resubmit with a different version number (it makes it easier to track how many resubmissions there were).

## # A tibble: 2 × 2
##   new     time
##   <chr>  <dbl>
## 1 New       58
## 2 Update     5

CRAN: ~60 h. Bioconductor: most of them within 1 month. rOpenSci: about 2 months (finding 2 reviewers and posting their reviews).

Users role

Two plots showing the number of actions done by users and in how many submissions they did them; on the left Bioconductor and on the right rOpenSci. Point size reflects how many users did so, and there are two colors and shapes: one for regular users and one for editors (rOpenSci) or reviewers (Bioconductor). The most active people are core members of the projects, but some regular users are also involved in many issues and perform many actions.

Some users are very involved.

20 / 34

Bioconductor reviewers do a lot, and so do rOpenSci editors. Both organizations have a group of users involved in the package review system, even though Bioconductor doesn't explicitly ask for reviewers from the community. Bioconductor is now considering how to improve its review system. The bots bioc-issue-bot and ropensci-review-bot (new in March 2021) are omitted.

Comments

Four plots in 2 rows and 2 columns: the first column shows data from Bioconductor and the second from rOpenSci. The first row shows reviewers' comments against authors' comments (an almost linear relationship). The second row shows other users' comments against authors' comments; the relationship is linear only for rOpenSci, where these users include the reviewers.

A dialog between authors and reviewers & editors.

21 / 34

Users who are not reviewers on Bioconductor still chime in to help.

Bot role

Tile plot with rows showing the different messages from bioc-issue-bot and columns for each Bioconductor issue. Each tile is colored by the number of times the bot posted the message. The plot shows how the bot changed over time and the most common feedback provided (from most to least frequent): build results, valid push, received, accepted, reviewer assigned. And the common errors: missing repository, repost, fix version, closing issue, lacking SSH key, multiple repositories detected...

The bot helps in the process and changes along with it.

22 / 34

The bot provides feedback on many issues and on the actions performed. It can be adapted to changes in requirements or to new errors. rOpenSci is going to have a bot too: ropensci-review-bot.

Labels

Two tile plots showing labels related to the review process on the vertical axis and issues on the horizontal axis; on the left Bioconductor and on the right rOpenSci. Bioconductor shows many accepted packages, few declined and more inactive issues. The rOpenSci plot shows more labels, which make the state of each review easier to follow.

Labels are used to indicate progress on the submission.

23 / 34

On Bioconductor, most problems with submissions are not the package itself but authors not replying or choosing another venue. rOpenSci asks more detailed questions about the scope of a package.

Success submissions

A bar plot with packages submissions to CRAN on the x axis and on the vertical axis the number of packages. The bars are colored by if they are accepted or not. It is also split by new packages and updated packages. More new packages are not accepted on the first try than updates, but on resubmissions they are accepted.

High approval rates!!

24 / 34

Bioconductor and rOpenSci: ~50%; some submissions are abandoned or do not fit the project. New packages face different problems than older ones. A more in-depth review requires about 1 month per reviewer.

## # A tibble: 50 × 7
##    submission_n new    Accepted     n  perc suspended perc_suspended
##    <fct>        <chr>  <lgl>    <int> <dbl>     <dbl>          <dbl>
##  1 1            New    FALSE      652 17.8        652          31.0
##  2 1            New    TRUE      3015 82.2          0           0
##  3 1            Update FALSE      691 13.4        691          32.8
##  4 1            Update TRUE      4456 86.6          0           0
##  5 2            New    FALSE      148 23.4        148           7.03
##  6 2            New    TRUE       485 76.6          0           0
##  7 2            Update FALSE      270  8.27       270          12.8
##  8 2            Update TRUE      2993 91.7          0           0
##  9 3            New    FALSE       41 20.4         41           1.95
## 10 3            New    TRUE       160 79.6          0           0
## # … with 40 more rows
## # A tibble: 2 × 3
##   Approved     n  perc
##   <chr>    <int> <dbl>
## 1 No        1233 0.505
## 2 Yes       1208 0.495
## # A tibble: 1 × 4
##   `1. awaiting moderation` `2. review in progress` `3a. accepted` `3b. declined`
##                      <dbl>                   <dbl>          <dbl>          <dbl>
## 1                0.0000231                   0.299           35.7           19.1

| Stage                            | Median days | Total days |
|----------------------------------|-------------|------------|
| 1/editor-checks                  | 2.8         | 2.8        |
| 2/seeking-reviewer(s)            | 2.4         | 5.2        |
| 3/reviewer(s)-assigned           | 7.1         | 12.3       |
| 4/review(s)-in-awaiting-changes  | 27.0        | 39.3       |
| 5/awaiting-reviewer(s)-response  | 17.1        | 56.3       |
| 6/approved                       | 13.3        | 69.6       |
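The perc column of the first CRAN table appears to be the share of each outcome within its group (new package or update) for a given submission number. A minimal base-R sketch recomputing it for first submissions, with the counts copied from that table:

```r
# Counts of first submissions to CRAN, copied from the first table
first <- data.frame(
  new      = c("New", "New", "Update", "Update"),
  accepted = c(FALSE, TRUE, FALSE, TRUE),
  n        = c(652, 3015, 691, 4456)
)
# Share (%) of each outcome within its group: New or Update
first$perc <- round(100 * first$n / ave(first$n, first$new, FUN = sum), 1)
first
```

For new packages that is 3015 accepted out of 652 + 3015 = 3667 first submissions, about 82.2%.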

Success submissions II

The plot on the right shows CRAN's acceptance rate from 2020/09 to 2022/01, with one line for new submissions, consistently around 80%, and another for package updates, around 90%. Close to the last day of data collection the long reviews haven't finished yet, so the rates fall.

High acceptance rates

25 / 34

Prepare submission

Prepare

Manual to create R packages, R Packages
Follow policies (CRAN) and guidelines (Bioconductor, rOpenSci).


Check

CRAN pre-submission checks: macOS, Windows

Use R-hub and GitHub Actions

26 / 34

Follow the detailed guidelines from Bioconductor and rOpenSci. Fix any problem that you hadn't detected before (double-check the CRAN repository policy). Resubmit.

Submit

Via web or devtools::release().

Nervous

Wait...

Resubmit

Fix and explain on re-submission.
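Explaining the fixes is easier with a `cran-comments.md` file in the package root, which, as far as I know, `devtools::release()` uses for the submission comments. A minimal sketch with a hypothetical helper `write_cran_comments()`; the wording of the template is an assumption, not a CRAN requirement.

```r
# Hypothetical helper: write a cran-comments.md that lists what was fixed
# since the previous submission (template wording is an assumption).
write_cran_comments <- function(fixes, path = "cran-comments.md") {
  lines <- c(
    "## Resubmission",
    "",
    "This is a resubmission. In this version I have:",
    "",
    paste("*", fixes)
  )
  writeLines(lines, path)
  invisible(path)
}

# Example with made-up fixes:
# write_cran_comments(c("Fixed the invalid URL in README.md.",
#                       "Added \\value to the documentation of meow()."))
```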

🎉 Celebrate 🎉

Announce to the community:
📥 R-packages mailing list
🐦 Social media
👪 Family?
Parties interested: users, R-user-groups, ...

27 / 34

Maintaining packages


  • Keep up to date

Changes in R, changes in dependencies, changes in CRAN checks.


  • Change code of the package

Evaluate how to adapt your package


  • New releases

Provide new releases
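Before providing a new release, it helps to know which packages depend on yours, since your changes may break them. `tools::package_dependencies()` with `reverse = TRUE` answers that. By default it uses `available.packages()` (which needs internet access), so this sketch uses a tiny hand-made database with hypothetical package names instead.

```r
# A tiny, hand-made package database with a subset of the columns of
# available.packages(); all package names are hypothetical
db <- cbind(
  Package   = c("meowpkg", "catsay", "purrfect", "dogbark"),
  Depends   = c(NA, "R (>= 4.0.0)", "meowpkg", NA),
  Imports   = c("methods", "meowpkg", NA, "methods"),
  LinkingTo = NA
)

# Reverse dependencies: packages that depend on or import "meowpkg"
revdeps <- tools::package_dependencies("meowpkg", db = db, reverse = TRUE)
revdeps
```

With the real CRAN database the call would be `tools::package_dependencies("mypackage", reverse = TRUE)`, replacing `mypackage` with your package's name.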

28 / 34

Not under your control: reactive work.

Better quality => Less work

Also check your dependencies and be mindful of users.

Changing the package

New features

Whenever you want

On Bioconductor only after each release

Deprecating features/breaking changes

  • Notify users: NEWS, warnings, deprecate via .Deprecated()
  • If other packages depend on yours: notify their maintainers at least 1 month before submitting

Make life easy for users and developers

29 / 34

Be mindful that you now "need" to support this.

Deprecate and defunct are important steps to notify users.
Also be mindful of other developers who depend on your package: give them time, hear their concerns...
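A minimal sketch of the deprecate-then-defunct cycle with base R's `.Deprecated()` and `.Defunct()`; the function names and `package = "mypackage"` are hypothetical.

```r
# New, preferred function
meow_loudly <- function() {
  message("MEOW")
}

# Step 1 (deprecated): the old name still works but warns the user
meow_loud <- function() {
  .Deprecated("meow_loudly", package = "mypackage")
  meow_loudly()
}

# Step 2, in a later release (defunct): the old name now throws an error
meow_old <- function() {
  .Defunct("meow_loudly", package = "mypackage")
}
```

Calling `meow_loud()` still prints the message, plus a warning pointing to `meow_loudly()`; calling `meow_old()` stops with an error. Announcing both steps in NEWS gives users time to adapt.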

Keep checks clean

Plot with flavors on the y axis and the percentage of packages with each status (OK, NOTE, WARNING, ERROR, FAILURE). In most flavors, packages are OK or have notes; less than 10% have warnings or errors.

Most packages are in good shape.

30 / 34

CRAN maintenance

Movement of packages once on CRAN. Barplot with the number of movements on the y axis and the date of said action on the x axis. Many have been archived recently but also many have returned to CRAN.

Packages might be archived from repositories

31 / 34

Common reasons for archiving:

Check email!

32 / 34

Archived?


Understand why it happened.

Fix and resubmit
(Your package will need to pass the new package checks again.)

| Back on CRAN | Packages | Proportion |
|--------------|----------|------------|
| no           | 3091     | 60%        |
| yes          | 2063     | 40%        |

Many archived packages return to CRAN.

33 / 34

Summary


Thanks to the R core team past and current members.

Thanks to the CRAN, Bioconductor, rOpenSci teams.

All the contributors to packages used to make this presentation.



To you!

34 / 34

Thanks also to the package authors (mainly tidyverse, ggplot2, rhub and gh), to Maëlle Salmon and Stephanie Locke for the CRAN dashboard, and to the organizers. rOpenSci review: Video
