Writing Reproducible Research Papers with R Markdown


Resul Umit

24 May 2022

Who am I?

Resul Umit

  • post-doctoral researcher in political science at the University of Oslo

  • teaching and studying representation, elections, and parliaments

How did I use to write?

First, with Stata + Word, I was ...

  • frustrated with Word

    • formatting tables, figures, citations, and equations
    • managing references
  • tired of switching between programmes/screens

    • and, worried about making mistakes in between
  • paying for programme licences


How did I use to write?

Then, with Stata + R + LaTeX, I was ...

  • frustrated with Word

    • formatting tables, figures, citations, and equations
    • managing references
  • tired of switching between programmes/screens

    • and, worried about making mistakes in between
  • paying for the Stata licence

  • converting PDF documents to Word manually

    • coordinating work with co-authors who don't use LaTeX/PDF
    • submitting to journals which don't accept LaTeX/PDF

How do I write now?

Now, with R Markdown, I am ... happy, and no longer

  • frustrated with Word

    • formatting tables, figures, citations, and equations
    • managing references
  • tired of switching between programmes/screens

    • and, worried about making mistakes in between
  • paying for the Stata licence

  • converting PDF documents to Word, manually

    • coordinating work with co-authors who don't use LaTeX/PDF
    • submitting to journals which don't accept LaTeX/PDF

R Markdown

  • Efficient

    • write text, cite sources, tidy data, analyse, table, and plot it in one programme/screen
    • re-do one, more, or all of these with ease
      • decrease the possibility of making mistakes in the process
  • Flexible

    • output to various formats
      • e.g., HTML, LaTeX, PDF, Word
  • Open access/source

    • use for free
    • create documents accessible to anyone with a computer and internet connection
    • benefit from the work of a great community of users/developers

Reproducibility — Before Publication

  • Having written a complete draft

    • with data including re-coded variables, tables, figures, and text with references to specific results (e.g., numbers from summary and/or regression statistics)
  • If you and/or your co-authors decide

    • to reverse a re-coded variable to its previous/original measure
    • and/or, to exclude a subgroup of observations from analysis
  • How resource intensive would this revision be?

    • how long would this revision take?
    • how many programmes would be needed for this revision, and how much would they cost?
    • there is an inverse relationship between this resource intensity and reproducibility

Reproducibility — After Publication

  • After your paper is published, if others, including your future self, would like to test how robust the results are

    • to reversing a re-coded variable to its previous/original measure
    • and/or, to excluding a subgroup of observations from analysis
  • How resource intensive would this test be?

    • how accessible is the data, documentation (how was the variable re-coded in the first place?), and the code?
    • how long would the test take?
    • how many programmes would be needed for this test, and how much would they cost?
    • there is an inverse relationship between this resource intensity and reproducibility

The Workshop — Overview

  • Two days, on how to write reproducible research papers with R Markdown

    • 200+ slides, 40+ exercises, and time for converting a real project
  • Based on converting a mock manuscript written in Word to R Markdown

    • plus, improving its reproducibility and version-controlling it
    • with a PDF output in mind
  • Designed for researchers with basic knowledge of the R programming language

    • does not cover programming with R
      • e.g., writing functions

    • ability to regress, plot, and table in R will be very helpful
      • but not absolutely necessary — these skills can be developed after learning R Markdown as well

The Workshop — Contents

Part 1. Getting the Tools Ready

  • e.g., downloading course material

Part 2. Introducing R Markdown

  • e.g., creating a new document

Part 3. Setting Metadata

  • e.g., defining output format

Part 4. Writing Text

  • e.g., adding emphasis to text

Part 5. Managing References

  • e.g., citing sources

Part 6. Adding Code, Figures, and Tables

  • e.g., plotting data

Part 7. Addressing Functionality Gaps

  • e.g., adjusting line spacing

Part 8. Using Version Control

  • e.g., integrating Git and GitHub

Part 9. Collaborating with Others

  • e.g., working simultaneously with co-authors

Part 10. Working on a Real Project

  • e.g., converting a work-in-progress of yours

The Workshop — Organisation

  • Sit in groups of two

    • participants learn as much from their partner as from instructors
    • switch partners after every second part
  • Type, rather than copy and paste, the code that you will find on these slides

    • typing is a part of the learning process
  • When you have a question

    • ask your partner
    • google together
    • ask me

The Workshop — Organisation — Slides

Slides with this background colour indicate that your action is required, for

  • setting the workshop up

    • e.g., downloading course material
  • completing the exercises

    • e.g., managing references in R Markdown
    • there are 40+ exercises
    • these slides have countdown timers
03:00

The Workshop — Organisation — Slides

  • Codes and texts that go in R Markdown documents appear as such — in a different font, on gray background

    • long codes and texts will have their own line(s)
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield"))
```
  • Results that come out in output files appear as such — in the same font, on green background

    • except very obvious results, such as figures and tables
  • Specific sections are highlighted yellow as such for emphasis

    • these could be for anything — codes and texts in input, results in output, and/or texts on slides
  • The slides are designed for self-study as much as for the workshop

    • accessible, in substance and form, to go through on your own

The Workshop — Aims

  • To make you aware of what is possible with R Markdown

    • we will cover a large breadth of issues, not all of which is meant for long-term memory
      • one reason why the slides are designed for self-study as well

    • awareness of what is possible, Google, and perseverance are all we need
  • To encourage you to convert into R Markdown

    • practice with a mock manuscript (Parts 3–9)
    • start converting a real one (Part 10)

Part 1. Getting the Tools Ready


Course Materials — Download from the Internet

Code -> Download ZIP


  • Unzip and rename the folder

    • unzip to a location that is not synced

      • e.g., perhaps Documents, but not Dropbox
    • rename the folder as YOURNAME-rmd

      • e.g., resul-rmd
      • this will come in handy when we collaborate in Part 9

Course Materials — Overview

Notice that the folder has the following structure

YOURNAME-rmd
|
|- manuscript
| |
| |- reproduce_this.pdf
| |- journals.Rmd
| |- references.bib
| |- apa_7th.csl
|
|- data
| |
| |- journals.csv
|
|- image
| |
| |- google_scholar.png
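
Once R is installed, one way to confirm that the unzipped folder matches this structure is to check for the files from the R console. The sketch below assumes that the working directory is the YOURNAME-rmd folder; the object names `expected` and `found` are illustrative only.

```r
# Files that the course folder is expected to contain, relative to its root
expected <- c(
  file.path("manuscript", "reproduce_this.pdf"),
  file.path("manuscript", "journals.Rmd"),
  file.path("manuscript", "references.bib"),
  file.path("manuscript", "apa_7th.csl"),
  file.path("data", "journals.csv"),
  file.path("image", "google_scholar.png")
)

# TRUE for each file that exists under the current working directory
found <- file.exists(expected)
names(found) <- expected
print(found)
```

Any FALSE in the result suggests that the folder was unzipped elsewhere or that the working directory is not the course folder.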

Course Materials — Contents

  • manuscript\reproduce_this.pdf

    • the document, formatted in Word but saved as PDF, that we will re-create with R Markdown
    • randomly generated sentences, with figures and tables from a randomly generated dataset*
    • key sections in need of attention are highlighted yellow
  • manuscript\journals.Rmd

    • the R Markdown document that we will work on
    • includes unformatted text from reproduce_this.pdf to save time
    • major components, such as paragraphs and tables, are numbered and marked in comments to facilitate navigation
  • manuscript\references.bib

    • a BibTeX document with three fabricated references
  • manuscript\apa_7th.csl

    • a Citation Style Language document, with APA (7th Edition) referencing style (Wiernik, 2020)

* The text, Lorem ipsum, is generated with the stringi package (Gagolewski, Tartanus, Unicode, Inc., and others, 2021) while the dataset is created with the fabricatr package (Blair, Cooper, Coppock, Humphreys, Rudkin, and Fultz, 2022).

Course Materials — Contents

data\journals.csv

  • a dataset created with the fabricatr package (Blair, Cooper, Coppock, et al., 2022), imagined to explore the Google Scholar rankings of fictitious journals

  • includes the following variables

    • name: journals (1090 random titles)
    • origin: geographic origins (five continents)
    • branch: major discipline of journals (four branches)
    • since: time of first publication (years)
    • h5_index: H5 Index (integers)
    • h5_median: H5 Median (integers)
    • english: English (1) vs. other-language (0) journals
    • subfield: subfield (1) vs. generalist (0) journals
    • issues: number of issues published per year (integers)
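
To see these variables for yourself, the dataset can be read and inspected in base R. This is a minimal sketch that assumes the working directory is the course folder; `expected_vars` simply repeats the variable names listed above, and the check is skipped if the file is not found.

```r
# Variable names as listed on this slide
expected_vars <- c("name", "origin", "branch", "since", "h5_index",
                   "h5_median", "english", "subfield", "issues")

# Path relative to the course folder (assumed to be the working directory)
csv_path <- file.path("data", "journals.csv")

if (file.exists(csv_path)) {
  journals <- read.csv(csv_path)
  # Any listed variable that is missing from the file would show up here
  print(setdiff(expected_vars, names(journals)))
  # Compact overview of variable types and first values
  str(journals)
}
```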

Git — Download from the Internet and Install


GitHub — Open an Account

Sign up for GitHub at https://github.com

  • registering an account is free

  • usernames are public

    • either choose an anonymous username (e.g., asdf029348)
    • or choose one carefully — it becomes a part of users' online presence
  • usernames can be changed later


R and RStudio — Download from the Internet and Install


RStudio Project — Create from within RStudio

  • RStudio allows for dividing your work with R into separate projects, each with its own history, etc.

    • this page has more information on why projects are recommended


  • Create a new RStudio project for the existing* workshop directory ...\YOURNAME-rmd from the RStudio menu:

File -> New Project -> Existing Directory -> Browse -> ...\YOURNAME-rmd -> Open

* Recall that we downloaded this earlier from GitHub.

RStudio — R Markdown Options

RStudio offers various functions that facilitate working with .Rmd documents, which can be controlled at two* locations:

  • global settings that apply to all markdown projects, located at:

Tools -> Global Options -> R Markdown

  • project settings that apply to a given markdown project, located at:

Tools -> Project Options -> R Markdown

* Some settings become available on the document toolbar as well, only when an .Rmd document is open. We will cover the document toolbar later on in the workshop. All settings can stay as they are — for now.


R Packages — Install from within RStudio

install.packages(c("rmarkdown", "tinytex", "dplyr", "stargazer", "ggplot2"))
tinytex::install_tinytex()
  • rmarkdown (Allaire, Xie, McPherson, et al., 2022), for automating the process of converting R Markdown documents into other formats

  • tinytex (Xie, 2022c), for PDF outputs

    • requires an additional step to install
    • alternative: a TeX/LaTeX system installed on your computer
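
After installation, it can be worth checking that everything is in place before moving on. The following is a sketch in base R; looking for `pdflatex` on the search path is only one rough indication that a LaTeX system is available, and it is an assumption here that your setup exposes it that way.

```r
# The packages installed on this slide
pkgs <- c("rmarkdown", "tinytex", "dplyr", "stargazer", "ggplot2")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Rough check for a LaTeX engine, needed for PDF outputs
has_latex <- nzchar(Sys.which("pdflatex"))
print(has_latex)
```

If a package shows FALSE, re-running install.packages() for that package alone is usually enough.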

R Markdown Cheat Sheet — Download from the Internet

The download can be started from within RStudio

  • follow from the RStudio menu

Help -> Cheatsheets -> R Markdown Cheat Sheet


Other Resources*



* During the workshop, the R Markdown Cheat Sheet is likely to be more helpful than these resources, which I recommend consulting after the workshop.

Part 2. Introducing R Markdown


R Markdown Document — Create from within RStudio

  • Create a new R Markdown document from the RStudio menu:*

File -> New File -> R Markdown -> OK

  • Save your new document:**

File -> Save

  • Observe that

    • the document has been saved to your working directory, and
    • it has the .Rmd extension

* This is for demonstration purposes only. Otherwise, we will work with journals.Rmd, which you have already downloaded, to save time.

** Alternatively, use the Save button or the keyboard shortcut (e.g., Ctrl + S on Windows). For shortcuts, follow Tools -> Keyboard Shortcuts Help or Tools -> Modify Keyboard Shortcuts....


R Markdown Document — Components

Observe also that the document has three components

  • YAML
  • text
  • code chunks


R Markdown Document — Document Toolbar

Observe also that the document toolbar offers extended tools for .Rmd documents


These include, most importantly,

  • the button to compile .Rmd documents

R Markdown Document — Compile

  • Click the Knit button to compile your .Rmd document, and observe that

    • the output document has the same name as your .Rmd document
  • You may want to delete these newly created files, as we will work with journals.Rmd instead to save time.


R Markdown Document — Compilation Process

  • When you Knit, the following happens:

    .Rmd --knitr--> .md --pandoc--> output

    • knitr* executes the code, if there is any, and converts the resulting document from .Rmd (R Markdown) into .md (Markdown)

    • pandoc** transforms the .md document into your preferred output format(s)

      • e.g., HTML, LaTeX, PDF, Word
  • This process is automated by the rmarkdown package

* If you did not already have the knitr package, it would have been installed together with the rmarkdown package.

** RStudio comes with a copy of pandoc (http://pandoc.org), which is not an R package, so that you do not have to install it separately.
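
The same process can be started from the R console instead of the Knit button. The sketch below assumes that the rmarkdown package and a LaTeX system are installed and that the working directory is the course folder; it does nothing if the package or the file is missing.

```r
# The document to compile, relative to the course folder
input <- file.path("manuscript", "journals.Rmd")

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(input)) {
  # render() runs knitr on the chunks, then hands the result to pandoc
  out <- rmarkdown::render(input, output_format = "pdf_document")
  # Path of the output file that was created
  print(out)
}
```

Leaving out the output_format argument makes render() use the first format listed in the YAML of the document.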


R Markdown Document — Notes

  • Behind the scenes, each .Rmd file is compiled in its own session, and therefore

    • the code needs to stand alone, for reproducibility reasons
    • e.g., if you load a package in the Console, it will not be available to a given .Rmd file — even in the same R session
  • R Markdown can produce more than documents,* including

* Here we will focus on research papers only. In a separate workshop, I teach how to create professional websites with R Blogdown.


Part 3. Setting Metadata


YAML — Overview

.Rmd documents start* with YAML

  • includes the metadata variables
    • e.g., title, output format

  • written between a pair of three hyphens -
---
title:
output:
---

* Technically, we can place YAML anywhere in a .Rmd document. However, it is a good practice to start with YAML so that the metadata is easily accessible.

YAML — Variables

  • title and output are the basic variables of YAML

    • variable names are typed in lower case, followed by a colon :
    • the list of available variables, as well as options and sub-options for these variables, depends on the output format
  • Typical YAML variables for a research paper are as follows:

---
title:
author:
date:
bibliography:
csl:
output:
---

YAML — Variables

Variables can take strings, options, sub-options, and code

---
title: "Journals: Random Words With Random Data"
output:
    pdf_document:
        keep_tex: true
date: "`r format(Sys.Date(), '%d %B %Y')`"
---

YAML — Variables — Output Formats

Documents as output formats include

  • HTML
  • LaTeX
  • PDF
  • Word
---
title: "Journals: Random Words With Random Data"
output: word_document
---


YAML — Variables — Output Formats

  • Documents as output formats

    • html_document
    • latex_document
    • pdf_document*
    • word_document
    • github_document
    • md_document
    • odt_document
    • rtf_document
  • Presentations as output formats

    • beamer_presentation
    • ioslides_presentation
    • powerpoint_presentation
    • slidy_presentation

* For reasons of simplicity, this workshop focuses on LaTeX and/or PDF outputs. Different output formats have slightly different customisations. See the Pandoc User's Guide and/or the R Markdown Cheat Sheet.

YAML — Strings

Strings with special characters, such as colon, require quotation marks — single ' or double "

---
title: "Journals: Random Words With Random Data"
output: pdf_document
---


YAML — Strings

Quotation marks are optional for strings without special characters

---
title: "Journals: Random Words With Random Data"
subtitle: A Mock Paper for an R Markdown Workshop
author: Jane Doe
date: 4 March 2020
output: pdf_document
---


YAML — Strings — Footnotes

The syntax ^[footnotes_go_here] adds footnotes to strings

---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
output: pdf_document
---


YAML — Strings — External Files

The bibliography and csl variables take strings as well

---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---

YAML — Strings — External Files

The strings for external files indicate (a) where the files are located and (b) how they are named

---
...
bibliography: references/ref_library.bib
csl: "../../styles/chicago_manual_17.csl"
...
---


Notice that

  • the locations above are specified as relative to the working directory

    • the former (references) is a sub-directory, or folder, one level down while the latter (styles) is two levels up
  • for reproducibility reasons, hard-coded strings should be avoided

    • e.g., "C:/Users/resulumit/Dropbox/styles/chicago_manual_17.csl"
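
The relative paths above can also be built and inspected from R, which makes it easier to see where they point. This is a sketch in base R that reuses the file names from the slide; the files themselves are not assumed to exist.

```r
# Relative paths, built from the pieces shown on the slide
bib <- file.path("references", "ref_library.bib")
csl <- file.path("..", "..", "styles", "chicago_manual_17.csl")
print(bib)
print(csl)

# Where these paths lead depends on the current working directory
print(getwd())
print(normalizePath(bib, mustWork = FALSE))
```

Because the paths are relative, the same strings keep working when the whole project folder is moved or shared, which a hard-coded path starting with C:/Users/ would not.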

YAML — Options and Sub-Options

Options can have sub-options

---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
bibliography: references.bib
csl: apa_7th.csl
output:
    pdf_document:
        keep_tex: true
---

Notice that

  • this specific setting (keep_tex: true) will create multiple outputs

    • a LaTeX and a PDF document
  • all but the last option (i.e., true) takes a colon

  • options and sub-options (except the last option, again) are stepwise indented

    • with four spaces per level here; YAML needs spaces rather than tabs, and the same indent for options at the same level
    • the alignment between the colons for pdf_document and keep_tex is coincidental
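
The sub-options available under pdf_document correspond to the arguments of the rmarkdown::pdf_document() function, so R can list them. The sketch below assumes that rmarkdown is installed; `option_names` is a hypothetical helper written for this illustration, not a part of any package.

```r
# Names of the arguments that a function accepts
option_names <- function(format_fun) names(formals(format_fun))

# Skipped quietly if rmarkdown is not installed
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  pdf_opts <- option_names(rmarkdown::pdf_document)
  # Is keep_tex among the sub-options of pdf_document?
  print("keep_tex" %in% pdf_opts)
  # A first few of the other sub-options
  print(head(pdf_opts))
}
```

The same helper can be pointed at other formats, such as rmarkdown::word_document, to see how the available sub-options differ by output format.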

YAML — R Code

Variables can take code as well

---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: "`r format(Sys.Date(), '%d %B %Y')`"
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---

Notice that

  • such codes can be particularly useful for variables

    • that need frequent updates
    • and that can be automatically updated
      • e.g., date
  • there are quotation marks around the code

  • we will cover codes in .Rmd documents later on in the workshop


YAML — R Code

Code and text can be combined in a string

---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: "First version: 4 March 2020. This version: `r format(Sys.Date(), '%d %B %Y')`."
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---
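
To preview what the inline code in the date variable produces, the same call can be run in the R console. This is a minimal sketch; `stamp` is a hypothetical helper, month names follow the locale of your computer, and the `%d` code pads single-digit days with a zero.

```r
# The format used in the YAML above, wrapped in a small helper
stamp <- function(date = Sys.Date()) format(date, "%d %B %Y")

# A fixed date, to see the format independently of today's date
fixed <- as.Date("2020-03-04")
print(stamp(fixed))

# Text and code combined, as in the date variable above
print(paste0("First version: 4 March 2020. This version: ", stamp(), "."))
```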


YAML — Some Further Settings for PDF Outputs

  • fontsize

    • the default is 10pt
    • the other options are 11pt and 12pt
  • linkcolor, urlcolor, citecolor

    • the default is the colour of the text
    • the other options are white, red, green, blue, cyan, magenta, yellow
  • link-citations

    • the default is no
    • the other option is yes, in which case a click on a citation will take the screen to the relevant entry in the list of references

Exercises — 1–4

1) Open journals.Rmd and fill in the YAML variables for the mock paper

  • take cues from reproduce_this.pdf and/or the slides


2) Add and set one of the variables mentioned as further settings for PDF outputs above

  • i.e., fontsize, linkcolor, urlcolor, citecolor, link-citations


3) Add and set a completely new variable not covered so far


4) Knit your journals.Rmd

  • observe the outcome
10:00

Part 4. Writing Text


Syntax — Overview

  • There are not one, but several different versions of Markdown

    • e.g., Pandoc, MultiMarkdown, CommonMark
    • each might implement the same things (e.g., citations) slightly differently, and each might offer unique functionalities



Syntax — Lines

Multiple spaces on a given line are reduced to one

This is a sentence followed by four spaces.    This is another sentence on the same line.

This is a sentence followed by four spaces. This is another sentence on the same line.


Line endings with fewer than two spaces are ignored

This is a sentence followed by one space.
This is another sentence on a new line.

This is a sentence followed by one space. This is another sentence on a new line.

74 / 246

Syntax — Hard Breaks

Two or more spaces at the end of lines introduce hard breaks, forcing a new line

This is a sentence followed by two spaces.
This is another sentence on a new line.

This is a sentence followed by two spaces.
This is another sentence on a new line.

75 / 246

Syntax — Line Blocks

Spaces on lines that start with a vertical line | are kept

|  a one-space indent
|      a five-space indent
|           a ten-space indent

 a one-space indent
     a five-space indent
          a ten-space indent

76 / 246

Syntax — Block Quotes

Lines starting with the greater-than sign > introduce block quotes*

> In God, we trust. All others must bring data.
>
> --- Anonymous

        In God, we trust. All others must bring data.
        
         — Anonymous

* Notice that three hyphens grouped together introduce an em-dash. Dashes are covered later on in the workshop.

77 / 246

Syntax — Paragraphs

One or more* blank lines introduce a new paragraph

This is the first sentence of a paragraph as it is preceded by a blank line. This is the second
sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a *new paragraph* as it is preceded by a blank line. This is the
second sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a new paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

* Multiple blank lines between paragraphs reduce to one.

78 / 246

Syntax — Comments

Text with the syntax <!-- comment --> is omitted from the output

<!-- This paragraph needs re-writing -->
This is the first sentence of a paragraph as it is preceded by a blank line. This is the second
sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a new paragraph <!-- I've removed italics --> as it is preceded
by a blank line. This is the second sentence of that paragraph, which is followed by a blank
line.

This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a new paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

79 / 246

Exercises — 5–6

5) Hard Breaks

  • see reproduce_this.pdf, page 1
  • apply in journals.Rmd, paragraph 1


6) Line Blocks / Block Quotes

  • see reproduce_this.pdf: page 1
  • apply in journals.Rmd: block quote, between paragraphs 1 and 2

  • see reproduce_this.pdf: page 5
  • apply in journals.Rmd: hypothesis 1, between paragraphs 14 and 15; hypothesis 2, between paragraphs 16 and 17
05:00
80 / 246

Syntax — Headers

The number sign # introduces headers; lower levels are created with additional signs, up to a total of five levels

# Introduction becomes

Introduction

## 1. Introduction becomes

1. Introduction

### 3.1 Introduction becomes

3.1 Introduction

#### Introduction becomes

Introduction

##### Introduction becomes

Introduction
81 / 246

Syntax — Emphases

A pair of single asterisks * or underscores _ introduces italics

*italics* becomes italics

_italics_ becomes italics as well


A pair of double asterisks ** or underscores __ introduces bold

**bold** becomes bold

__bold__ becomes bold as well


These two rules can be combined

**_bolditalics_** becomes bolditalics

_**bolditalics**_ becomes bolditalics as well

82 / 246

Syntax — Strikethrough

A pair of double tildes ~~ introduces strikethrough

~~strikethrough~~ becomes strikethrough


Strikethrough can be combined with italics or bold

**~~strikebold~~** or __~~strikebold~~__, they both become strikebold

~~**strikebold**~~ or ~~__strikebold__~~, they both become strikebold as well


*~~strikeitalics~~* or _~~strikeitalics~~_, they both become strikeitalics

~~*strikeitalics*~~ or ~~_strikeitalics_~~, they both become strikeitalics as well

83 / 246

Exercises — 7–8

7) Headers

  • see reproduce_this.pdf: pages 1 to 11
    • 10 headers, Abstract to References

  • apply in journals.Rmd


8) Emphases

  • see reproduce_this.pdf: pages 1 and 2
    • bold and italics

  • apply in journals.Rmd: paragraph 2
03:00
84 / 246

You can link text to URLs

[visit my website](https://resulumit.com/) becomes visit my website

[https://resulumit.com](https://resulumit.com/) becomes https://resulumit.com

<https://resulumit.com> becomes https://resulumit.com as well


You can also link text to an email address

[email me](mailto:resuluy@uio.no)* becomes email me

<resuluy@uio.no> becomes resuluy@uio.no

* Notice the prefix mailto: in the syntax.

86 / 246

Exercises — 9–10

9) Links — Internal

  • see reproduce_this.pdf: page 2
    • the link to the Literature Review section

  • apply in journals.Rmd: paragraph 4


10) Links — External

  • see reproduce_this.pdf: page 1
    • email and website links in one of the footnotes

  • apply in journals.Rmd: title page items
03:00
87 / 246

Syntax — Equations

Inline equations go between a pair of single dollar signs $ — with no space between the signs and the equation itself

$E = mc^{2}$ becomes E = mc²


Block equations go in between a pair of double dollar signs — with or without spaces, it works

$$ E = mc^{2} $$ becomes

E = mc²


$$E = mc^{2}$$ becomes

E = mc²
88 / 246

Syntax — Footnotes — Inline Notes

For inline footnotes, use the ^[footnote] syntax

Jane Doe^[Corresponding author.] becomes Jane Doe¹


¹ Corresponding author.

Notice that

  • the caret sign ^ comes before the left square bracket [
  • this syntax works in YAML as well as in text
    • footnotes in YAML get symbols, in text they get numbers
89 / 246

Syntax — Footnotes — Notes with Identifiers

An alternative is to use the [^identifier] syntax, with identifiers defined elsewhere in the same document

Dr Doe holds a PhD in rock science.[^defence_date]

[^defence_date]: She defended her thesis in 2017.

Dr Doe holds a PhD in rock science.¹


¹ She defended her thesis in 2017.

Notice that

  • the caret sign comes after the left square bracket
  • this syntax works in text, but not in YAML
90 / 246

Exercises — 11–12

11) Equations

  • see reproduce_this.pdf: page 7
  • apply in journals.Rmd: paragraph 22; block equation, between paragraphs 22 and 23


12) Footnotes

  • see reproduce_this.pdf: page 2
  • apply in journals.Rmd: paragraph 3
03:00
91 / 246

Syntax — Lists

Lines starting with an asterisk *, a plus sign +, or a minus sign - introduce lists

- books
- articles
- reports
  • books
  • articles
  • reports
92 / 246

Syntax — Lists — Nesting

Lists can be nested within each other, with indentation

+ books
+ articles
    - published
    - under review
        + revised and resubmitted
    - work in progress
  • books
  • articles
    • published
    • under review
      • revised and resubmitted
    • work in progress
93 / 246

Syntax — Lists — Numbering

List items can be numbered

1. books
2. articles
    - published
    - under review
        + revised and resubmitted
    - work in progress
  1. books
  2. articles
    • published
    • under review
      • revised and resubmitted
    • work in progress
94 / 246

Syntax — Dashes

Two hyphens grouped together introduce an en-dash

-- becomes –


Three hyphens grouped together introduce an em-dash

--- becomes —

95 / 246

Syntax — Subscripts and Superscripts

A pair of tildes introduces subscript

CO~2~ becomes CO₂


A pair of carets introduces superscript

R^2^ becomes R²

Notice that

  • the syntax here (Markdown-based) is different from the one for equations (LaTeX-based)
    • e.g., R^2^ versus mc^{2}
97 / 246

Exercises — 13–15

13) Lists

  • see reproduce_this.pdf: page 3
  • apply in journals.Rmd: list, between paragraphs 10 and 11


14) Dashes

  • see reproduce_this.pdf: page 2
  • apply in journals.Rmd: paragraph 6


15) Subscripts and Superscripts

  • see reproduce_this.pdf: page 2
  • apply in journals.Rmd: paragraph 5
03:00
98 / 246

Part 5. Managing References

99 / 246

References — Bibliography Database

  • References are defined in .bib files

    • they follow the BibTeX format


  • pandoc looks for a .bib file, and for the definitions therein, to process citations

    • .bib files are specified with the bibliography variable in YAML


  • pandoc can process a citation only if there is a linked entry in the .bib file

    • but not all entries have to be cited

100 / 246

References — Bibliography Database — Entries

  • A BibTeX entry consists of three elements

    • a type
      • e.g., @article

    • a citation-key
      • e.g., bennett2015

    • a number of tags
      • e.g., title, author


  • Different tags are available for different reference types

    • some tags are required, others are optional
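
The three elements can be seen in a complete entry. The following is a sketch that builds one with `bibentry()` and `toBibtex()` from the utils package, which ships with R. The details come from the Bennett (2015) item in the mock reference list of this workshop; the object names are only illustrative.

```r
# bibentry(), person(), and toBibtex() come from the utils package
entry <- utils::bibentry(
  bibtype = "Article",          # the entry type
  key     = "bennett2015",      # the citation key
  author  = utils::person(given = "S.", family = "Bennett"),
  title   = "Peanut butter and jelly",
  journal = "Journal of Bone",
  year    = 2015,
  volume  = 1,
  number  = 12,
  pages   = "3--35"
)

# Convert the entry to BibTeX lines, ready to be pasted into a .bib file
bibtex_lines <- utils::toBibtex(entry)
print(bibtex_lines)
```

The printed lines start with the type and the citation key, followed by one tag per line. For the article type, author, title, journal, and year are the required tags; the others above are optional.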

101 / 246

References — Bibliography Database — Entries

  • One could create entries by hand

    • requires knowing the BibTeX format, entry types, tags, and related information about references to be cited
    • neither efficient nor necessary


  • A good alternative is to use Google Scholar, which provides BibTeX entries

    • click Cite, then BibTeX, and copy the entry
    • paste into .bib, edit if necessary, and save


  • Some publishers and journals provide BibTeX entries on their website as well
102 / 246

References — Style


  • pandoc looks for a .csl file, and for the styles therein, to style citations and references

    • .csl files are specified with the csl variable in YAML
    • if unspecified, it uses a Chicago author-date format


  • .csl files affect the style only in outputs

    • no matter which style is used, the citation syntax in .Rmd documents remains the same

103 / 246

References — In-text Citation Syntax — Author-Date Styles*

All citation keys take the 'at' sign @ while square brackets and/or minus signs introduce variation

[@bennett2015] becomes (Bennett, 2015)

@bennett2015 becomes Bennett (2015)

[-@bennett2015] becomes (2015)

-@bennett2015 becomes 2015

[@bennett2015 35] becomes (Bennett, 2015, p. 35)

[@bennett2015 33-35] becomes (Bennett, 2015, pp. 33–35)

[@bennett2015, ch. 1] becomes (Bennett, 2015, ch. 1)

[@bennett2015; @gilbert2019] becomes (Bennett, 2015; Gilbert, 2019)

[see @bennett2015, for details] becomes (see Bennett, 2015, for details)

@bennett2015 [33-35] becomes Bennett (2015, pp. 33–35)

* Specifically, the outputs on this slide are formatted according to the APA 7th edition.

104 / 246

References — In-text Citation Syntax — Numerical Styles

All citation keys take the 'at' sign @

A clever sentence.[@bennett2015] becomes A clever sentence.[1] in certain numerical styles

A clever sentence.[@bennett2015; @gilbert2019] becomes A clever sentence.[1,2]


Individual styles may or may not use additional information, such as page numbers

A clever sentence.[@bennett2015 35] might become A clever sentence.[1] as well


Individual styles may or may not be sensitive to variation, such as square brackets

A clever sentence. @bennett2015 might become A clever sentence.[1] as well

105 / 246

Citations — Reference List

The list of references appears after the last line of the output document, with no section header

  • so that you can choose the header yourself, by ending .Rmd documents with a header of your choice

This is the last sentence of an APA style manuscript.

## References

This is the last sentence of an APA style manuscript.

References

Bennett, S. (2015). Peanut butter and jelly. Journal of Bone, 1(12), 3–35.

Gilbert, T. (2019). Turning wine into water. In M. Albert (Ed.), The book of ground (pp. 124–142). Antman.

106 / 246

Exercises — 16–19

16) Add an entry to references.bib for the following book

  • R Markdown: The Definitive Guide by Xie and co-authors


17) Reproduce the citations and reference list in the mock paper

  • see reproduce_this.pdf: pages 3 and 11
  • apply in journals.Rmd: paragraph 7 to 9


18) Change the reference style


19) Link the citations to the reference list

07:30
108 / 246

Part 6. Adding Code, Figures, and Tables

109 / 246

Code, in and outside Chunks

110 / 246

Code — Overview

Most code goes inside code chunks

  • e.g., code that imports and cleans data, and/or produces tables and/or figures
```{r}
df <- read.csv("rmd_workshop_files/images_data/journals.csv") %>%
  mutate(age = 2020 - since,
         english = factor(english),
         subfield = factor(subfield))
```


Code can also go in line with text

  • e.g., code that results in a single statistic
The average H5 Index for the journals in the dataset is `r mean(df$h5_index)`.
111 / 246

Code Chunks — Overview

  • Code chunks are delimited by a pair of three backticks `

    • placed on their own lines in .Rmd documents, separate from text
    • their output, if there is any, appears in the output document
      • at about the same place as the chunk
      • might float around text to avoid breaking across pages


  • On the same line as the first delimiter, and in curly brackets { }, code chunks take

    • a language engine
    • a label
    • one or more options
```{r, setup, echo=FALSE}
```
115 / 246

Code Chunks — Language Engines

The first item in code chunks indicates the engine to run the code

```{r}
```


Note that

  • indicating an engine for each chunk is a must

    • otherwise, any code* in these chunks cannot be executed
  • r is the specified engine, indicating that the code in the chunk above should be run by R

    • it could have been python, which we will not cover in this workshop

* The above chunk has no code — it is for demonstration only.

116 / 246

Code Chunks — Labels

It is recommended, but optional, to label the code chunks

```{r, data_import}
df <- read_csv("data/journals.csv")
```


Note that

  • labels are written after the language engine, separated by a comma

    • in the example above, the chunk is labelled as data_import
  • chunks without labels are otherwise automatically numbered

    • specifying informative labels can be helpful for, e.g., navigating through error messages
  • duplicate labels lead to errors during compilation

117 / 246

Code Chunks — Options

Code chunks can take further options

```{r, setup, include=FALSE}
```


Note that

  • in the example above, the include option is set to FALSE

    • with this option and value, nothing from this chunk will be included in the output document
  • The complete list of options is available at https://yihui.org/knitr/options

  • leaving spaces around the equal sign =, between option tags and values, should be avoided

    • such spaces might lead to errors
118 / 246

Code Chunks — Options — Alternative Syntax

Options can be specified inside code chunks as well, after a number sign and a vertical line #|

  • therefore the following chunks have the same function
```{r, echo=FALSE, eval=TRUE}
```
```{r}
#| echo = FALSE, eval = TRUE
```
```{r}
#| echo = FALSE
#| eval = TRUE
```
119 / 246

Code Chunks — Options — Defaults

Options have default values

  • e.g., for echo, the default is TRUE
    • echo: should the source code be printed in the output?
    • TRUE: yes it should

  • therefore the following two chunks have the same function
```{r}
```
```{r, echo=TRUE}
```
120 / 246

Code Chunks — Options — Defaults

This chunk prints two things in the output document — (a) the code and (b) the head of the data frame

```{r}
head(df)
```

head(df)


##                name   origin   branch h5_index h5_median english subfield
## 1  Journal of Bears Americas Physical       73        97       1        1
## 2   Journal of Moon     Asia   Social       72       106       1        0
## 3 Journal of Lumber Americas Physical       72       100       1        1
## 4 Journal of Houses   Europe   Social       72       102       1        0
## 5  Journal of Water   Europe   Social       70       100       1        0
## 6  Journal of Jeans Americas Physical       69       101       1        1
##   issues age
## 1      7  61
## 2      6  64
## 3      8  30
## 4      8  38
## 5      5  33
## 6      5  64
121 / 246

Code Chunks — Options — Examples

Setting echo=FALSE prevents the code from being displayed in the output document

```{r ... echo=FALSE}
head(df)
```

This chunk therefore prints one thing in the output document — the head of the data frame

##                name   origin   branch h5_index h5_median english subfield
## 1  Journal of Bears Americas Physical       73        97       1        1
## 2   Journal of Moon     Asia   Social       72       106       1        0
## 3 Journal of Lumber Americas Physical       72       100       1        1
## 4 Journal of Houses   Europe   Social       72       102       1        0
## 5  Journal of Water   Europe   Social       70       100       1        0
## 6  Journal of Jeans Americas Physical       69       101       1        1
##   issues age
## 1      7  61
## 2      6  64
## 3      8  30
## 4      8  38
## 5      5  33
## 6      5  64
122 / 246

Code Chunks — Options — Examples

Prevent the result(s) of the source code from being displayed in the output document

```{r ... results="hide"}
head(df)
```

This chunk therefore prints one thing in the output document — the source code

head(df)

Setting results="asis" passes the results as they are produced by the code — pandoc does not transform these. In creating tables for PDF output with the stargazer package, this option is a must.

123 / 246

Code Chunks — Options — Examples

Cache results for future compilations

```{r ... cache=TRUE}
```


Note that caching

  • is useful especially for chunks that take a long time to execute

    • it can speed up the compilation process
  • avoids executing the chunks at every compilation

    • unless the chunk is newly created or edited since the last cached compilation
  • creates a new folder in your working directory

    • an alternative location can be specified with the cache.path option
124 / 246

Code Chunks — Options — Examples

Prevent R from running the code in the chunk altogether

```{r ... eval=FALSE}
```


Prevent messages and/or warnings from being displayed in the output

```{r ... error=FALSE, message=FALSE, warning=FALSE}
```
125 / 246

Code Chunks — Options — Examples

Define the actual dimensions of figures, in inches

```{r ... fig.height=6, fig.width=9}
```


Define the size of figures as they appear in the output document, with out.width and/or out.height

```{r ... out.width="50%"}
```


Define the alignment of figures — left, right, or center

```{r ... fig.align="center"}
```
126 / 246

Code Chunks — Options — Examples

Define captions for figures

```{r ... fig.cap="A Scatter Plot"}
```


Set the resolution for figures

```{r ... dpi=300}
```


Set extra options, such as angle, that output format would accept for figures

```{r ... out.extra="angle=45"}
```
127 / 246

Code Chunks — The Setup Chunk

It is recommended to use the first code chunk for general setup, where you can

  • define your own defaults for chunk options, with knitr::opts_chunk$set()
    • avoids repeating chunk options

  • load the necessary packages

  • import raw data
```{r, setup, include=FALSE}
# chunk option defaults
knitr::opts_chunk$set(echo=FALSE, message=FALSE)
# packages
library(dplyr)
library(ggplot2)
library(stargazer)
# data
df_raw <- read.csv("journals.csv")
```
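
The effect of `knitr::opts_chunk$set()` can be checked in the console. The following is a small sketch, assuming that the knitr package is installed; the object names are only illustrative.

```r
# Assumes the knitr package is installed
library(knitr)

# Set document-wide defaults, as in the setup chunk
opts_chunk$set(echo = FALSE, message = FALSE)

# Query the current defaults
echo_default <- opts_chunk$get("echo")   # changed above
eval_default <- opts_chunk$get("eval")   # untouched, still the knitr default

print(c(echo = echo_default, eval = eval_default))

# Restore the package defaults, to leave the session as it was
opts_chunk$restore()
```

An option set in the header of an individual chunk still takes precedence over these defaults for that chunk.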
128 / 246

Code Chunks — The Data Chunk

I recommend using the second chunk for the main operations* on raw data

  • e.g., for data cleaning and other transformations
  • some minor transformations could be left to lower chunks
    • e.g., capitalizing variable names for figures
```{r, data, ...}
df <- df_raw %>%
  mutate(subfield = as.factor(subfield),
         english = as.factor(english),
         age = 2020 - since) %>%
  select(-since)
```

* I will be using the pipe operator %>% and other functions from the dplyr package for such operations in the following slides.
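
For readers new to the pipe, the same three steps can be sketched in base R. The data frame below is a toy with hypothetical journal names, standing in for the workshop dataset; only the column names follow the data chunk.

```r
# Toy raw data: hypothetical rows with the columns used in the data chunk
df_raw <- data.frame(
  name     = c("Journal A", "Journal B", "Journal C"),
  subfield = c(1, 0, 1),
  english  = c(1, 1, 0),
  since    = c(1959, 1956, 1990)
)

# Convert the two indicators to factors and derive age from since
df <- transform(df_raw,
                subfield = factor(subfield),
                english  = factor(english),
                age      = 2020 - since)

# Drop the original column, as select(-since) does in the dplyr version
df$since <- NULL

str(df)
```

The dplyr version does the same work in one chained expression, which is why it is used throughout the slides.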

129 / 246

Inline Code — Overview

Code can also be incorporated in text, with the `r ` syntax

  • unlike chunks, these do not take options

  • the output document will display the result of the code
    • in the exact place of the source code

  • the result of the code will have the same formatting as the surrounding text
130 / 246

Inline Code — Examples

If we multiply _pi_ by 5, we get `r pi * 5`.

If we multiply pi by 5, we get 15.7079633.


The average H5 Index for the journals in the dataset is `r mean(df$h5_index)`, which would
round to `r round(mean(df$h5_index), digits = 1)`.

The average H5 Index for the journals in the dataset is 26.3611366, which would round to 26.4.


__Only `r nrow(subset(df, english == 0))` journals__ in the dataset are published in a language
other than English.

Only 113 journals in the dataset are published in a language other than English.
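
It can help to try such expressions in the console first, and only then move them into the text. The following is a sketch on a toy data frame with hypothetical values, not the workshop dataset; the object names are only illustrative.

```r
# Toy data: hypothetical values, not the workshop dataset
df <- data.frame(
  h5_index = c(73, 72, 70, 12, 9, 5),
  english  = factor(c(1, 1, 1, 0, 1, 0))
)

# The mean at full precision, as it would appear without rounding
avg_raw <- mean(df$h5_index)

# The mean rounded to one decimal place
avg_rounded <- round(avg_raw, digits = 1)

# The rounded mean as text, always showing one decimal place
avg_fixed <- format(avg_rounded, nsmall = 1)

# The number of journals published in a language other than English
n_other <- nrow(subset(df, english == 0))

print(list(avg_raw, avg_rounded, avg_fixed, n_other))
```

Once an expression returns a single value in the console, it can go between the backticks after the letter r.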

131 / 246

Exercises — 20–22

20) Setup Chunk

  • introduce a setup chunk with one or more default chunk options, with knitr::opts_chunk$set()
  • load the packages that we will need — dplyr, ggplot2, and stargazer
  • import raw data


21) Data Chunk

  • introduce a data chunk to transform subfield and english into factors
  • create a new variable age, based on since
  • drop since from the data frame


22) Inline code

  • see reproduce_this.pdf: page 6
    • i.e., 1091 observations

  • apply in journals.Rmd: paragraph 21
    • hint: use the nrow function
07:30
132 / 246

Figures

133 / 246

Figures — Images — Markdown Syntax

The syntax ![Figure Caption](figure.extension) embeds images, and/or figures produced elsewhere,* into .Rmd documents

  • similar to the link syntax, only this time it is preceded by an exclamation mark !
  • goes outside code chunks, on a new line
  • simple, but not very customisable

* Ideally, reproducible papers should produce their own images with data and code. However, there might be situations where this is not possible.

134 / 246

Figures — Images — Markdown Syntax

Figures are numbered automatically

![A screenshot of the Google Scholar homepage](../image/google_scholar.png)

Figure 1: A screenshot of the Google Scholar homepage.
136 / 246

Figures — Images — Markdown Syntax

The syntax can accept width or height attributes as follows

![A screenshot of the Google Scholar homepage](../image/google_scholar.png){ width=40% }

Figure 1: A screenshot of the Google Scholar homepage.
137 / 246

Figures — Images — knitr

The knitr package offers a capable alternative with the include_graphics() function

  • this goes inside code chunks

    • use the function with the double-colon operator ::
      • e.g., knitr::include_graphics("figure.extension")

  • this is more customisable, through the use of code chunks

    • size is defined with the out.width or out.height options
      • rather than fig.height and/or fig.width
138 / 246

Figures — Images — knitr

The knitr package offers a capable alternative with the include_graphics() function

```{r, screenshot, echo=FALSE, fig.cap="A screenshot of the Google Scholar homepage."}
knitr::include_graphics("../image/google_scholar.png")
```

Figure 1: A screenshot of the Google Scholar homepage.
139 / 246

Figures — Images — knitr

Size is defined with the chunk options out.width or out.height

```{r ... out.width="40%"}
knitr::include_graphics("../image/google_scholar.png")
```

Figure 1: A screenshot of the Google Scholar homepage.
140 / 246

Figures — Images — knitr

Most other chunk options are common with figures plotted within R Markdown, such as fig.align

```{r ... fig.align="center"}
knitr::include_graphics("../image/google_scholar.png")
```

Figure 1: A screenshot of the Google Scholar homepage.
141 / 246

Exercise

23) Images

  • see reproduce_this.pdf: figure 1 on page 10
  • apply in journals.Rmd: figure 1, between paragraphs 19 and 20
03:00
142 / 246

Figures — ggplot2 — Overview

  • A powerful package for visualising data

  • Used widely, not only by academics, but also by large corporations such as the New York Times

  • A huge amount has been written on this package

  • Among its alternatives are the base and plotly packages

143 / 246

Figures — ggplot2 — Basics

1) The ggplot function and the data argument

  • specify a data frame in the main ggplot function
ggplot(data = df)

2) The mapping aesthetics, or aes; most importantly, the variable(s) that we want to plot

  • specify as an additional argument in the same ggplot function
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield))

3) The geometric objects, or geom; the visual representations

  • specify, after a plus sign +, as an additional function
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point()
144 / 246

Figures — ggplot2

Put the code in a chunk, and give it a caption

```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point()
```

Figure 1. A scatterplot of journal metrics.

145 / 246

Figures — ggplot2

Add facets for subgroups, e.g., branch

```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch)
```

Figure 1. A scatterplot of journal metrics.

146 / 246

Figures — ggplot2

Scale the colour to improve the legend

```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield"))
```

Figure 1. A scatterplot of journal metrics.

147 / 246

Figures — ggplot2

Change the theme

```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) +
  theme_bw()
```

Figure 1. A scatterplot of journal metrics.

148 / 246

Figures — ggplot2

Improve the axis labels, e.g., with capital first letters

```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) +
  theme_bw() +
  labs(x = "H5 Median", y = "H5 Index")
```

Figure 1. A scatterplot of journal metrics.

149 / 246

Figures — ggplot2 — Notes

geom_point is one of many geoms available
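
As an illustration of another geom, the following sketch draws a bar plot with geom_bar. It assumes that the ggplot2 package is installed; the data frame is a toy with hypothetical rows, and the object names are only illustrative.

```r
# Assumes the ggplot2 package is installed
library(ggplot2)

# Toy data: hypothetical rows with one categorical variable
toy <- data.frame(
  branch = c("Physical", "Social", "Social", "Physical", "Social")
)

# geom_bar() counts the rows in each category and draws one bar per category
barplot_toy <- ggplot(data = toy, mapping = aes(x = branch)) +
  geom_bar() +
  theme_bw() +
  labs(x = "Branch", y = "Number of Journals")

print(barplot_toy)
```

Only the geom and the aesthetics change here; the theme and label layers work as they do for the scatterplot.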

150 / 246

Exercises — 24–25

24) Barplot

  • see reproduce_this.pdf: figure 2 on page 7
  • apply in journals.Rmd: figure 2, between paragraphs 21 and 22


25) Scatterplot

  • see reproduce_this.pdf: figure 3 on page 9
  • apply in journals.Rmd: figure 3, between paragraphs 27 and 28
10:00
151 / 246

Tables

152 / 246

Tables — Markdown Syntax

The following syntax, outside code chunks, introduces tables that pandoc can recognise


First Column    Second Column
------------    -------------
First cell      First cell
Second cell     Second cell
Third cell      Third cell


First Column    Second Column
First cell      First cell
Second cell     Second cell
Third cell      Third cell
153 / 246

Tables — Markdown Syntax

The position of headers, relative to their line underneath, defines column alignments


Left-Aligned          Centered
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell


Left-Aligned          Centered
First cell           First cell
Second cell         Second cell
Third cell           Third cell
154 / 246

Tables — Markdown Syntax

A line starting with a colon, placed before or after tables, introduces captions


    Centered         Right-Aligned
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell

: A hand-made table with R Markdown

Table 1: A hand-made table with R Markdown

    Centered         Right-Aligned
   First cell           First cell
  Second cell          Second cell
   Third cell           Third cell
155 / 246

Tables — Markdown Syntax

The caption line itself needs to be surrounded by empty lines


    Centered         Right-Aligned
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell

: A hand-made table with R Markdown

Table 1: A hand-made table with R Markdown

    Centered         Right-Aligned
   First cell           First cell
  Second cell          Second cell
   Third cell           Third cell
156 / 246

Tables — Markdown Syntax

Tables are numbered automatically


: A hand-made table with R Markdown

    Centered         Right-Aligned
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell

Table 1: A hand-made table with R Markdown

    Centered         Right-Aligned
   First cell           First cell
  Second cell          Second cell
   Third cell           Third cell
157 / 246

Tables — Markdown Syntax

Grid tables, with the following syntax, can handle complex cells with multiple lines and/or lists

+--------------------+--------------------+
| First Column       | Second Column      |
+====================+====================+
| - First item       | First cell         |
| - Second item      |                    |
| - Third item       |                    |
+--------------------+--------------------+
| Second cell        | Second cell with a |
|                    | long text          |
+--------------------+--------------------+
| Third cell         | Third cell         |
|                    |                    |
+--------------------+--------------------+

: A grid table with multi-line cells


Table 1: A grid table with multi-line cells

First Column      Second Column
- First item      First cell
- Second item
- Third item
Second cell       Second cell with a long text
Third cell        Third cell
158 / 246

Tables — Markdown Syntax

Grid tables can be aligned as well, with colons at the boundaries of the header separator*

+--------------------+--------------------+
| Left-Aligned       | Centered           |
+:===================+:==================:+
| - First item       | First cell         |
| - Second item      |                    |
| - Third item       |                    |
+--------------------+--------------------+
| Second cell        | Second cell with a |
|                    | long text          |
+--------------------+--------------------+
| Third cell         | Third cell         |
|                    |                    |
+--------------------+--------------------+

: A grid table with multi-line cells


Table 1: A grid table with multi-line cells

Left-Aligned                Centered
- First item               First cell
- Second item
- Third item
Second cell        Second cell with a long text
Third cell                 Third cell

* Use := for left-aligned, :=: for centered, =: for right-aligned columns.

159 / 246

Exercise — 26

26) Markdown Tables

  • see reproduce_this.pdf: table 1 on page 4
  • apply in journals.Rmd: table 1, between paragraphs 11 and 12
05:00
160 / 246

Tables — stargazer — Overview

  • A capable package for creating at least three kinds of tables

    • raw data, in columns and rows
    • descriptive/summary statistics
    • regression models
  • Used widely by academics, even though it has not been updated since 2018

  • Creates LaTeX code, HTML/CSS code, and ASCII text to be knitted

  • A lot has been written on this package

  • Among its alternatives are the knitr, kableExtra, and huxtable packages

161 / 246

Tables — stargazer — Notes

  • The stargazer package requires specific settings

    • in the chunk options
    • and, in the type argument of the stargazer() function


  • These settings depend on the desired output format,* as shown below
Output        Chunk Option      Type Argument
LaTeX / PDF   results="asis"    latex
HTML          results="asis"    html
Word          comment=""        text

* The following slides use the setting for LaTeX and PDF outputs.
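
The settings in the table above can be kept in one place, so that switching output formats means changing a single value. The following is a minimal sketch in base R. The function name stargazer_settings() and its output labels ("pdf", "html", "word") are hypothetical and not part of the stargazer package.

```r
# Hypothetical helper: look up the chunk option and the type argument
# that the table above pairs with each output format.
stargazer_settings <- function(output = c("pdf", "html", "word")) {
  output <- match.arg(output)
  switch(output,
    pdf  = list(chunk_option = 'results="asis"', type = "latex"),
    html = list(chunk_option = 'results="asis"', type = "html"),
    word = list(chunk_option = 'comment=""',     type = "text")
  )
}

settings <- stargazer_settings("word")
settings$type          # value to pass to the type argument of stargazer()
settings$chunk_option  # option to write in the chunk header
```

The chunk option still has to be typed into the chunk header by hand; only the type value can be passed on to stargazer() directly.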

162 / 246

Tables — stargazer — Notes

  • stargazer tables look slightly different in different output formats

    • on the following slides, they will have the HTML look
    • even if the slides display the setting for LaTeX and PDF outputs


  • In fact, it is currently not quite possible to knit stargazer code into tables in Word documents

    • though it can knit ASCII text, looking like a table
    • some popular workarounds:
      • knit to HTML as well as Word, copy the tables from HTML to Word
      • knit to PDF, open the PDF in Word
      • use a different package to create tables, such as huxtable
163 / 246

Tables — stargazer — Basics

  • The stargazer() function

    • this is probably the only function you will ever use from this package
      • but it accepts many, many arguments to customise tables


  • The data argument of that function, with two main options

    1. a data frame for data or summary statistics tables
      • e.g., df, here coming from df <- read_csv("journals.csv")

    2. one or more regression models for regression tables
      • e.g., lm1, here coming from lm1 <- lm(h5_index ~ issues, data = df)
164 / 246

Tables — stargazer — Data Tables

Table the first four rows of the dataset

```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4), type = "latex", summary = FALSE)
```


Notice the options of the chunk and the arguments of the function

  • with echo=FALSE, the code will not be displayed in the output document

  • with results="asis", knitr will pass through results without reformatting them

    • these results are produced in LaTeX, due to type = "latex"
    • they should remain LaTeX because our outcome document is PDF, converted from LaTeX
  • with summary = FALSE, the table will present the data, not its descriptive statistics

165 / 246

Tables — stargazer — Data Tables

Table the first four rows of the dataset

```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4), type = "latex", summary = FALSE)
```

% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Fri, Apr 10, 2020 - 12:31:21


Table 1:

   name               origin    branch    h5_index  h5_median  english  subfield  issues  age
1  Journal of Bears   Americas  Physical  73        97         1        1         7       61
2  Journal of Moon    Asia      Social    72        106        1        0         6       64
3  Journal of Lumber  Americas  Physical  72        100        1        1         8       30
4  Journal of Houses  Europe    Social    72        102        1        0         8       38
166 / 246

Tables — stargazer — Data Tables

Set header = FALSE to remove the note preceding tables

```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4), type = "latex", summary = FALSE, header = FALSE)
```

Table 1:

   name               origin    branch    h5_index  h5_median  english  subfield  issues  age
1  Journal of Bears   Americas  Physical  73        97         1        1         7       61
2  Journal of Moon    Asia      Social    72        106        1        0         6       64
3  Journal of Lumber  Americas  Physical  72        100        1        1         8       30
4  Journal of Houses  Europe    Social    72        102        1        0         8       38
167 / 246

Tables — stargazer — Data Tables

Define a caption with the title argument

```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4), type = "latex", summary = FALSE, header = FALSE,
title = "First four rows of the dataset")
```

Table 1: First four rows of the dataset

   name               origin    branch    h5_index  h5_median  english  subfield  issues  age
1  Journal of Bears   Americas  Physical  73        97         1        1         7       61
2  Journal of Moon    Asia      Social    72        106        1        0         6       64
3  Journal of Lumber  Americas  Physical  72        100        1        1         8       30
4  Journal of Houses  Europe    Social    72        102        1        0         8       38
168 / 246

Tables — stargazer — Summary Statistics Tables

Create a table of summary statistics instead, for the complete dataset

```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df, type = "latex", summary = TRUE, header = FALSE,
title = "Descriptive statistics")
```

Table 1: Descriptive statistics

Statistic  N      Mean    St. Dev.  Min  Max
h5_index   1,091  26.361  13.814    1    73
h5_median  1,091  39.400  21.272    3    109
issues     1,091  4.676   1.786     1    12
age        1,091  42.902  26.370    1    158
169 / 246

Tables — stargazer — Summary Statistics Tables

Keep only a selection of statistics

```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df, type = "latex", summary = TRUE, header = FALSE,
title = "Descriptive statistics", summary.stat = c("n", "mean", "sd", "min", "max"))
```

Table 1: Descriptive statistics

Statistic  N      Mean    St. Dev.  Min  Max
h5_index   1,091  26.361  13.814    1    73
h5_median  1,091  39.400  21.272    3    109
issues     1,091  4.676   1.786     1    12
age        1,091  42.902  26.370    1    158
170 / 246

Tables — stargazer — Summary Statistics Tables

Omit a selection of statistics for the same effect

```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df, type = "latex", summary = TRUE, header = FALSE,
title = "Descriptive statistics", omit.summary.stat = c("p25", "p75"))
```

Table 1: Descriptive statistics

Statistic  N      Mean    St. Dev.  Min  Max
h5_index   1,091  26.361  13.814    1    73
h5_median  1,091  39.400  21.272    3    109
issues     1,091  4.676   1.786     1    12
age        1,091  42.902  26.370    1    158
171 / 246

Tables — stargazer — Summary Statistics Tables

Flip the table

```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df, type = "latex", summary = TRUE, header = FALSE, flip = TRUE,
title = "Descriptive statistics", omit.summary.stat = c("p25", "p75"))
```

Table 1: Descriptive statistics

Statistic  h5_index  h5_median  issues  age
N          1,091     1,091      1,091   1,091
Mean       26.361    39.400     4.676   42.902
St. Dev.   13.814    21.272     1.786   26.370
Min        1         3          1       1
Max        73        109        12      158
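
To see what these tables contain and what flipping does to them, the same five statistics can be computed by hand. This is only a sketch in base R, not how stargazer works internally. It uses the built-in mtcars dataset as a stand-in for the workshop data, and the function name five_stats() is hypothetical.

```r
# Stand-in data: three numeric variables from the built-in mtcars dataset.
vars <- mtcars[, c("mpg", "wt", "hp")]

# Hypothetical helper: the five statistics shown in the tables above.
five_stats <- function(x) {
  c(N = sum(!is.na(x)),
    Mean = mean(x, na.rm = TRUE),
    `St. Dev.` = sd(x, na.rm = TRUE),
    Min = min(x, na.rm = TRUE),
    Max = max(x, na.rm = TRUE))
}

# sapply() returns one column per variable: the "flipped" layout.
stats_flipped <- sapply(vars, five_stats)

# Transposing gives one row per variable: the default layout.
stats_default <- t(stats_flipped)

round(stats_default, digits = 3)
round(stats_flipped, digits = 3)
```

Transposing with t() is the base R counterpart of what flip = TRUE does to the layout of the table.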
172 / 246

Exercise — 27

27) Summary Statistics Tables

  • see reproduce_this.pdf: table 2 on page 8
  • apply in journals.Rmd: table 2, between paragraphs 23 and 24
05:00
173 / 246

Tables — stargazer — Regression Tables

Create a table of regression models instead

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = lm(h5_index ~ issues, data = df),
type = "latex", header = FALSE,
title = "Regression Results")
```

Table 1: Regression Results

                     Dependent variable:
                     h5_index
issues               1.913***
                     (0.227)
Constant             17.415***
                     (1.137)
Observations         1,091
R²                   0.061
Adjusted R²          0.060
Residual Std. Error  13.391 (df = 1089)
F Statistic          70.959*** (df = 1; 1089)
Note:                *p<0.1; **p<0.05; ***p<0.01
174 / 246

Tables — stargazer — Regression Tables

Models can also be estimated outside the function first

```{r, regression_table, echo=FALSE, results="asis"}
lm1 <- lm(h5_index ~ issues, data = df)
stargazer(data = lm1, type = "latex", header = FALSE,
title = "Regression Results")
```

Table 1: Regression Results

                     Dependent variable:
                     h5_index
issues               1.913***
                     (0.227)
Constant             17.415***
                     (1.137)
Observations         1,091
R²                   0.061
Adjusted R²          0.060
Residual Std. Error  13.391 (df = 1089)
F Statistic          70.959*** (df = 1; 1089)
Note:                *p<0.1; **p<0.05; ***p<0.01
175 / 246

Tables — stargazer — Regression Tables

Keep only a selection of statistics

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = lm1, type = "latex", header = FALSE,
title = "Regression Results",
keep.stat = c("n", "rsq"))
```

Table 1: Regression Results

              Dependent variable:
              h5_index
issues        1.913***
              (0.227)
Constant      17.415***
              (1.137)
Observations  1,091
R²            0.061
Note:         *p<0.1; **p<0.05; ***p<0.01
176 / 246

Tables — stargazer — Regression Tables

Display multiple models in the same table

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2), type = "latex",
header = FALSE, title = "Regression Results",
keep.stat = c("n", "rsq"))
```

Table 1: Regression Results

              Dependent variable:
              h5_index
              (1)        (2)
issues        1.913***   1.424***
              (0.227)    (0.212)
english1                 17.262***
                         (1.244)
Constant      17.415***  4.226***
              (1.137)    (1.415)
Observations  1,091      1,091
R²            0.061      0.202
Note:         *p<0.1; **p<0.05; ***p<0.01
177 / 246

Tables — stargazer — Regression Tables

Change variable labels

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2), type = "latex",
header = FALSE, title = "Regression Results",
keep.stat = c("n", "rsq"),
dep.var.labels = "H5 Index",
covariate.labels = c("Issues", "English"))
```

Table 1: Regression Results

              Dependent variable:
              H5 Index
              (1)        (2)
Issues        1.913***   1.424***
              (0.227)    (0.212)
English                  17.262***
                         (1.244)
Constant      17.415***  4.226***
              (1.137)    (1.415)
Observations  1,091      1,091
R²            0.061      0.202
Note:         *p<0.1; **p<0.05; ***p<0.01
178 / 246

Tables — stargazer — Regression Tables

Change significance levels

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2), type = "latex",
header = FALSE, title = "Regression Results",
keep.stat = c("n", "rsq"),
dep.var.labels = "H5 Index",
covariate.labels = c("Issues", "English"),
star.cutoffs = c(0.05, 0.01, 0.001))
```

Table 1: Regression Results

              Dependent variable:
              H5 Index
              (1)        (2)
Issues        1.913***   1.424***
              (0.227)    (0.212)
English                  17.262***
                         (1.244)
Constant      17.415***  4.226**
              (1.137)    (1.415)
Observations  1,091      1,091
R²            0.061      0.202
Note:         *p<0.05; **p<0.01; ***p<0.001
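
With the stricter cut-offs, the constant in Model 2 drops from three stars to two. The logic behind this can be sketched in base R as follows. The function stars_for() is hypothetical and only illustrates the idea; it is not stargazer's own code. The p-value of 0.003 is an illustrative assumption, not a number taken from the model output.

```r
# Hypothetical helper: one star for every cut-off that the p-value falls below.
stars_for <- function(p_value, cutoffs = c(0.05, 0.01, 0.001)) {
  strrep("*", sum(p_value < cutoffs))
}

p_constant <- 0.003  # illustrative p-value, assumed for this example

stars_for(p_constant, cutoffs = c(0.1, 0.05, 0.01))  # "***" with the earlier cut-offs
stars_for(p_constant)                                # "**" with the stricter cut-offs
stars_for(0.5)                                       # "" when no cut-off is met
```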
179 / 246

Exercise — 28

28) Regression Tables

  • see reproduce_this.pdf: table 3 on page 10
  • apply in journals.Rmd: table 3, between paragraphs 30 and 31
07:30
180 / 246

Part 7. Addressing Functionality Gaps

181 / 246

Functionality Gaps

  • Not everything is possible to achieve with R Markdown syntax, code chunks, and/or code

    • e.g., centering text, increasing the space between the lines of text


  • Workarounds available through inclusion of other languages and/or syntaxes in .Rmd documents

    • e.g., incorporating HTML or LaTeX code into R Markdown
    • workarounds might be output specific
      • e.g., LaTeX-based workarounds may work only for LaTeX and PDF outputs


  • There is no exhaustive list of gaps or workarounds

    • these are specific to the output you want to achieve, problems you encounter
    • after writing a few manuscripts with R Markdown, you will have addressed most typical gaps in your workflow
182 / 246

Functionality Gaps — Examples

Problem:

How can we cross-reference figures, tables, and equations in R Markdown?

Solution:

Insert a LaTeX label into the targets (figures, tables, and equations), and then use the \autoref{latex_label} syntax in text

183 / 246

Functionality Gaps — Examples — Cross-references

For figures, insert a LaTeX label into the fig.cap option, and use the \autoref{latex_label} syntax in text

\autoref{scatter_plot} visualises the relationship between the two journal metrics.
```{r ... fig.cap = "A Scatter Plot \\label{scatter_plot}"}
ggplot(data = df) +
geom_point(...
```


Figure 1 visualises the relationship between the two journal metrics.

184 / 246

Functionality Gaps — Examples — Cross-references

For Markdown tables, insert a LaTeX label after the table caption, and use the \autoref{latex_label} syntax in text

See \autoref{handmade_table} for further details.
: A hand-made table with R Markdown \label{handmade_table}
+--------------------+--------------------+
| Left-Aligned | Centered |
...


See Table 1 for further details.

185 / 246

Functionality Gaps — Examples — Cross-references — Note

Note that there is a difference in the label syntax for figures and R Markdown tables

  • we use a double backslash \\ to label figures

    • i.e., \\label{scatter_plot}, because the label goes into a string

    • the first backslash is an escape operator for the second, the LaTeX backslash

  • we use a single backslash \ to label R Markdown tables

    • i.e., \label{handmade_table}, because the label is not in any string

    • there is no need for the escape operator
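
The escape rule can be checked directly in the R console. This is a small sketch in base R; the variable name caption is chosen here for illustration only.

```r
# Inside an R string, "\\" stands for a single backslash character.
caption <- "A Scatter Plot \\label{scatter_plot}"

# writeLines() shows the string as LaTeX will receive it:
# A Scatter Plot \label{scatter_plot}
writeLines(caption)

# The two typed characters "\\" are stored as one character.
nchar("\\")
```

A single backslash in the string would instead be read by R as the start of an escape sequence, and "\l" is not a valid one, so the chunk would fail before LaTeX ever sees the label.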

186 / 246

Exercises — 29–30

29) Referring to Figures

  • see reproduce_this.pdf: pages 6 and 9
  • apply in journals.Rmd: paragraphs 19, 21, and 27


30) Referring to Markdown Tables

  • see reproduce_this.pdf: page 4
  • apply in journals.Rmd: paragraph 11
05:00
187 / 246

Functionality Gaps — Examples — Cross-references

For stargazer tables, define a label with the label argument, and use the \autoref{latex_label} syntax in text

```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2), type = "latex",
...
label = "regression_results")
```
\autoref{regression_results} provides results from two OLS models.

Table 1 provides results from two OLS models.

Table 1: Regression Results
Dependent variable:
H5 Index
(1)(2)
Issues1.913***1.424***
(0.227)(0.212)
English17.262***
(1.244)
Constant17.415***4.226**
(1.137)(1.415)
Observations1,0911,091
R20.0610.202
Note:*p<0.05; **p<0.01; ***p<0.001
188 / 246

Functionality Gaps — Examples — Cross-references — Note

Note that we can cross-reference specific results in tables as well

  • there is no gap here; this is possible with inline code
In Model 1, the coefficient for _Issues_ is
`r round(coef(summary(lm1))["issues", "Estimate"], digits = 2)`.

In Model 1, the coefficient for Issues is 1.91.
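
The same approach extends to other columns of the coefficient table, such as the standard error. Below is a minimal sketch in base R. It uses the built-in mtcars dataset as a stand-in for the workshop data, and the object names (fit, coef_table, estimate, std_error) are assumptions made for this example.

```r
# Stand-in model, estimated on the built-in mtcars dataset.
fit <- lm(mpg ~ wt, data = mtcars)

# The coefficient table is a matrix: one row per term, one column per statistic.
coef_table <- coef(summary(fit))
colnames(coef_table)

# Pick a cell by row (term) and column (statistic), then round it for the text.
estimate  <- round(coef_table["wt", "Estimate"], digits = 2)
std_error <- round(coef_table["wt", "Std. Error"], digits = 2)

# The sentence that the inline code would produce in the output document.
sentence <- paste0("The coefficient for wt is ", estimate,
                   " (standard error: ", std_error, ").")
sentence
```

In a manuscript, the values would be inserted with inline code such as `r estimate`, so that the text updates whenever the model or the data change.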

189 / 246

Functionality Gaps — Examples — Cross-references

For equations, insert a LaTeX label in an equation environment, and use the \autoref{latex_label} syntax in text

\begin{equation}
\label{special_relativity}
E = mc^{2}
\end{equation}
According to \autoref{special_relativity}, space and time are linked.


According to Equation 1, space and time are linked.

190 / 246

Exercises — 31–33

31) Referring to Tables

  • see reproduce_this.pdf: pages 7 and 9
  • apply in journals.Rmd: paragraphs 23 and 29


32) Referring to Results in Regression Tables

  • see reproduce_this.pdf: page 9
  • apply in journals.Rmd: paragraph 29
    • hint: to extract the standard error from the model, use the column Std. Error


33) Referring to Equations

  • see reproduce_this.pdf: page 7
  • apply in journals.Rmd: paragraph 22
    • hint: transform the existing equation from R Markdown to LaTeX syntax, to be able to insert the label
07:30
191 / 246

Functionality Gaps — Examples

Problem:

R Markdown adds the list of references to the end of documents. This might be undesirable for some manuscripts, for example those with an appendix. Similarly, some journals require tables and figures to be added after references.

Solution:

Define where exactly the list of references should appear with the HTML code <div id="refs">

# References
<div id = "refs"></div>
# Appendix
192 / 246

Functionality Gaps — Examples

Problem:

R Markdown produces outputs with single-line-spaced text while we might prefer or be required (e.g., by journal submission rules) to double-space our manuscripts.

Solution:

Use the doublespacing command from the LaTeX package setspace (Carlisle, Fairbairns, Harris, and Tobin, 2011)

  • because the command comes from a package, we need to add it to YAML with header-includes
  • including commands in YAML ensures they are applied throughout the output*
---
...
header-includes:
- \usepackage{setspace}\doublespacing
---

* This can be reversed anywhere in text, with the singlespacing command.

193 / 246

Exercise — 34

34) Line Spacing

  • introduce 1.5 spacing to the manuscript
    • hint: the command is called onehalfspacing

  • except for the Abstract, which should be single spaced
02:00
194 / 246

Functionality Gaps — Examples

Problem:

Pages, tables, figures etc. are numbered continuously across an output. We might prefer or be required (e.g., by journal submission rules) to change this behaviour, for example for appendices.

Solution:

Use the setcounter command in combination with the renewcommand command, outside code chunks

\setcounter{page}{1}
\renewcommand*{\thepage}{A\arabic{page}}
\setcounter{table}{0}
\renewcommand*{\thetable}{A\arabic{table}}
\setcounter{figure}{0}
\renewcommand*{\thefigure}{A\arabic{figure}}
195 / 246

Part 8. Using Version Control

196 / 246

Version Control

  • Research papers have many versions before publication

    • typically written over a long period of time, in numerous sittings
    • at the end of every sitting, essentially a different version of the same manuscript is created*

* They are also often written by multiple authors and/or on different computers, increasing the number of versions created. Here I assume projects are single-authored on a single computer, leaving the topic of collaboration (including with oneself) to the next section, Part 9.

197 / 246

Version Control

  • Research papers have many versions before publication

    • typically written over a long period of time, in numerous sittings
    • at the end of every sitting, essentially a different version of the same manuscript is created


  • With many versions created over time, there emerge at least two challenges

    • keeping track of changes and versions
    • reverting to a previous version when necessary


  • We all do version control, in different ways, such as

    • editing, renaming, and saving the files
    • using applications or websites such as Dropbox, Google Docs, Overleaf
    • using a distributed version control system such as Git, with a hosting service such as GitHub
198 / 246

Version Control — Manual Attempts

Typically, hand-made attempts to version control lead to cluttered folders

manuscript
|
|- journals_FINAL_19May.Rmd
|- journals_FINAL.Rmd
|- journals_26APRIL_newliterature.Rmd
...
|- journals.Rproj
|- references.bib
|- apa_7th.csl
199 / 246

Version Control — Git and GitHub — Definitions

  • Git
    • software that keeps track of versions of a set of files
    • it is local to you; the records are kept on your computer


  • GitHub
    • a hosting service, or a website, that can keep the records
    • it is remote to you, like the Dropbox website
    • but unlike Dropbox, GitHub is specifically structured to keep records with Git


  • Repository, or repo
    • a set of files whose records are kept together, by Git and/or on GitHub
    • it is like a folder, which can keep files and other folders containing files
200 / 246

Version Control — Git and GitHub — Definitions

  • To commit
    • to take a snapshot of, or to version, a repository
    • it is like saving a new version of all files and sub-folders in your project folder with a new name
    • it is local, the records are kept on your computer unless you push


  • To push
    • to move a copy of the records from Git to GitHub, from your computer to an online server
    • it is like uploading (the new versions of) your files and sub-folders to a website
    • it also involves merging, if this is not the first push*

* For projects that are single-authored on a single computer, merging is typically automatic. It becomes an issue for collaborated projects, which we will cover in the next section — Part 9.

201 / 246

Version Control — Git and GitHub

Version control with Git and GitHub requires

  1. initial setup, done once*

    • unless for a new computer or, if ever, a new GitHub account
    • a bit technical, but worth the hassle
  2. project setup, repeated for every RStudio project

    • shorter, less complicated

* We have started this process already, in Part 1 of the workshop, by downloading and installing Git and signing up for GitHub. Back to the relevant slide.

202 / 246

Version Control — Git — Initial Setup

1) Enable version control with RStudio

  • from the RStudio menu, follow:

Tools -> Global Options -> Git/SVN -> Enable version control interface for RStudio projects


  • RStudio will likely find Git automatically

    • In case it cannot do so on its own, help RStudio find it by clicking Browse...
    • Git is likely to be at

      • c:/Program Files/Git/bin/git.exe on Windows

      • /usr/local/git/bin/git on Mac

203 / 246

Version Control — Git — Initial Setup

2) If you are using Windows, set Git Bash as your shell

  • from the RStudio menu, follow:

Tools -> Global Options -> Terminal -> New terminals open with: Git Bash

204 / 246

Version Control — Git — Initial Setup

3) Introduce yourself to Git

  • from the RStudio menu, follow:

Tools -> Terminal -> New Terminal

  • enter the following lines in the Terminal, with the email address that you have used to sign up for GitHub
git config --global user.name "YOUR-NAME"
git config --global user.email "YOUR-EMAIL-ADDRESS"
  • enter the following line in the Terminal, to check whether the previous step was successful
git config --global --list
205 / 246

Version Control — Git and Github — Project Setup*

1) Initiate local version control with Git

  • from the RStudio menu, follow:

Tools -> Version Control -> Project Setup... -> Version Control System -> Git


  • after confirming your new repository, and restarting the session, observe that

    • there is now a Git tab in RStudio

      • newly-added and/or edited files, since the last commit, are listed here
    • your project now includes a .gitignore file

      • this is where you can list files and/or folders to be excluded from being tracked

* These instructions presume there is an existing RStudio project to be set up for version control. If not, or to start a new project, follow from this slide first.

206 / 246

Version Control — Git and Github — Project Setup

2) Create a new GitHub repository

  • on GitHub, follow:

Repositories -> New -> Repository name (e.g., "rwd_workshop") -> Public -> Create repository


207 / 246

Version Control — Git and Github — Project Setup

3) Push an existing repository

  • from the RStudio menu, follow:

Tools -> Terminal -> New Terminal

  • enter the following lines in the Terminal, with your username and repository name
git remote add origin https://github.com/USER_NAME/REPOSITORY_NAME.git
git add .
git commit -m "first commit"
git push -u origin master
  • if this is your first time using GitHub with RStudio, you will be prompted to authenticate

    • follow the instructions on your screen and in your email
  • observe that your project files are now online, listed on the GitHub repository

208 / 246

Version Control — Git and Github — Workflow

1) Edit and Save

  • work on one or more files under version control
    • e.g., delete the first sentence of the abstract in journals.Rmd, and save it
    • under the Git tab in RStudio, find the list of files that you edited since the last commit
    • these will have M, for modified, as Status

2) Commit and Push

  • tick Staged* for one or more files that you would like to commit
    • enter a Commit message that summarises the edits
    • click Commit to create a record of the new version locally to your computer
    • click Close -> Push to push the version to GitHub

* To stage is to add files to be committed. It allows us to commit files individually or together with other files.

209 / 246

Version Control — Git and Github — Workflow

1) Edit and Save

  • work on one or more files under version control
    • e.g., delete the first sentence of the abstract in journals.Rmd, and save it
    • under the Git tab in RStudio, find the list of files that you edited since the last commit
    • these will have M, for modified, as Status


2) Commit and Push

  • tick Staged for one or more files that you would like to commit
    • enter a Commit message that summarises the edits
    • click Commit to create a record of the new version locally to your computer
    • click Close -> Push to push the version to GitHub

  • observe the changes in the Git tab in RStudio and on the GitHub repository
210 / 246

Version Control — Git and GitHub — .gitignore

  • .gitignore specifies which file(s) and/or folder(s) should be excluded from version control

    • a set of project-specific files are ignored by default
      • see your .gitignore file


  • .gitignore lists one item per line

    • each line has a pattern, which determines whether one or more files or folders are to be ignored


211 / 246

Version Control — Git and GitHub — .gitignore

There are good reasons to ignore some other files as well, including files

  • that contain information that we do not want others to see
    • e.g., personal API keys

  • that we do not have the right to share with others
    • e.g., secondary data whose user agreements prohibit sharing

  • that we (re-)create automatically as outputs
    • e.g., journals.pdf, as opposed to journals.Rmd
212 / 246

Version Control — Git and GitHub — .gitignore

  • Observe that, by default, .gitignore has a list of project-specific files
    • you can delete, or comment out, any or all to start including them in version control
.Rproj.user
.Rhistory
.RData
.Ruserdata
213 / 246

Version Control — Git and GitHub — .gitignore

  • Observe that, by default, .gitignore has a list of project-specific files

  • In addition, you can ignore, for example,

    • a specific folder, relative to the root directory

    • a specific file in a specific folder, relative to the root directory

    • a specific file in any folder

    • all files with a specific extension, anywhere in the project

.Rproj.user
.Rhistory
.RData
.Ruserdata
/manuscript/
/manuscript/journals.pdf
journals.pdf
*.pdf
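
One way to check what a pattern actually ignores is `git check-ignore`, which reports the .gitignore line responsible for a match. The sketch below is an illustration under assumptions: it runs in a throwaway repository, keeps only three of the patterns above, and the data folder and the files in it are hypothetical.

```shell
set -e
demo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$demo"
cd "$demo"
mkdir -p manuscript data
# three of the example patterns, without the RStudio defaults
printf '%s\n' '/manuscript/' 'journals.pdf' '*.pdf' > .gitignore
touch manuscript/journals.Rmd data/journals.pdf data/figure.pdf data/journals.csv

# -v prints the .gitignore file, line number, and pattern behind each match
git check-ignore -v manuscript/journals.Rmd data/journals.pdf data/figure.pdf

# a non-zero exit status means that no pattern matches, so the file is not ignored
git check-ignore -q data/journals.csv || echo "data/journals.csv is not ignored"
```

Note that ignoring the whole /manuscript/ folder also ignores the source file journals.Rmd inside it, which may or may not be what is intended.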
217 / 246

Version Control — Git and GitHub — .gitignore — Notes

  • Starting to ignore a file or folder that is already being tracked requires clearing the cache

    • after changing and saving .gitignore, enter the following line in the Terminal
    • with your specific /path/to/file
git rm --cached /path/to/file
  • The following command clears the entire cache

    • might be useful after changes to .gitignore that involve several files or folders
    • but should be used with care, on an otherwise up-to-date repository
git rm -r --cached .
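
The full sequence for a single file can be sketched as follows. This is a minimal example in a throwaway repository, with placeholder file contents, author identity, and commit messages; it shows that clearing the cache stops the tracking without deleting the file from the computer.

```shell
set -e
demo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$demo"
cd "$demo"
git config user.name "Demo Author"        # placeholder identity
git config user.email "demo@example.com"

# journals.pdf is tracked to begin with
echo "source" > journals.Rmd
echo "output" > journals.pdf
git add journals.Rmd journals.pdf
git commit -q -m "Track source and output"

# start ignoring the output, then remove it from the cache only
echo "journals.pdf" > .gitignore
git rm -q --cached journals.pdf
git add .gitignore
git commit -q -m "Stop tracking journals.pdf"

git ls-files          # tracked files; journals.pdf is no longer among them
ls journals.pdf       # the file itself is still on disk
```

After clearing the entire cache with `git rm -r --cached .`, the equivalent follow-up is `git add .` and a commit, which re-adds everything that .gitignore does not exclude.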
218 / 246

Exercises — 35–36

35) Reproducibility and Version Control

  • imagine that, after producing all these tables and figures, and writing up your results, you have decided to exclude journals from Oceania from analysis
    • hint: use the filter function in the data chunk
    • create a new version of the manuscript
    • commit and push to GitHub

36) Gitignore

  • stop tracking journals.pdf
    • change .gitignore
    • remove journals.pdf from cache
    • commit and push to GitHub
05:00
219 / 246

Part 9. Collaborating with Others

220 / 246

Collaboration

  • Many research papers are written by multiple authors and/or on multiple computers

    • working by yourself on different computers (e.g., laptop at home, desktop at office) poses challenges similar to those of collaboration
  • With multiple authors and/or computers, at least two additional challenges emerge beyond version control

    • communicating the versions to other authors and/or computers
    • working on the same project with co-authors at the same time
  • We all manage collaboration, in different ways, such as

    • edit, rename, save, e-mail
    • use applications or websites such as Dropbox, Google Docs, Overleaf
    • use distributed version control systems such as Git and GitHub
221 / 246

Collaboration — Git and GitHub — Definitions

  • To pull

    • to move the (presumably) up-to-date records from GitHub to your computer
    • it is like downloading a zipped folder of files
  • To merge

    • to integrate different versions into a single version
      • e.g., the old version on your laptop, with (the changes in) the new version from GitHub
    • except for the first push or pull, pushing and pulling necessitate merging
  • Merge conflict

    • emerges when versions to be merged include edits on the same line of the same file
      • edits on different lines are not a problem as changes are tracked line by line

    • less likely to occur in one-author-multiple-computer setting
      • more likely while collaborating with others

    • requires human intervention, to decide which edit to keep and which one to discard
222 / 246

Collaboration — Git and GitHub — Definitions

  • Branch

    • a line of development in a repository; a copy of the repository, with all its versions, at a given time
    • by default, repositories have one branch, called master (or main, in repositories newly created on GitHub)
  • Pull request

    • a proposal to pull and merge
      • e.g., a proposal from one co-author to another to merge a branch into master
    • it allows a review of changes on GitHub before merge, to deal with potential merge conflicts
223 / 246

Collaboration — Git and GitHub — Project Setup

  • The setup depends on the user's role, on whether they are
    • the owner who creates the GitHub repository, or
    • the collaborator who is then added to that repository
  • Once the project is set up
    • it continues to be associated with the owner's GitHub profile
    • at the same time, it is listed under the collaborator's profile as well
    • both the owner and the collaborator have the same rights, unless otherwise restricted
224 / 246

Collaboration — Git and GitHub — Project Setup — Owner

1) The setup for the owner is largely the same as in any single-author, single-computer scenario

  • following the instructions from the earlier slides on version control
    • to introduce version control to a local project with Git,
    • to create a remote repository for that project on GitHub, and
    • to associate the local project with the remote repository


2) As an additional step, the owner needs to invite their collaborator(s) to the project

  • following, from the relevant GitHub repository,

Settings -> Manage access -> Invite a collaborator

225 / 246

Collaboration — Git and GitHub — Project Setup — Collaborator

1) Notice that the remote part of the setup is done by the owner for the collaborator

  • subject to acceptance of the invitation
    • invitations are available directly at https://github.com/notifications, but also sent via email
    • with an option to "Accept invitation"
    • on acceptance, projects appear among the repositories of the collaborator

2) The local part of the setup still needs to be done

  • by creating a new RStudio project with version control
  • following, from the RStudio menu,

File -> New Project -> Version Control -> Git
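
In the Terminal, the collaborator's local setup comes down to a single `git clone`. The sketch below is an illustration under assumptions: a local bare repository stands in for the owner's GitHub repository, the owner's first commit is created only so that there is something to clone, and all names and contents are placeholders.

```shell
set -e
demo=$(mktemp -d)

# stand-in for the owner's GitHub repository, with one version pushed to it
git -c init.defaultBranch=master init -q --bare "$demo/remote.git"
git -c init.defaultBranch=master init -q "$demo/owner"
git -C "$demo/owner" config user.name "Owner"
git -C "$demo/owner" config user.email "owner@example.com"
echo "## R Markdown" > "$demo/owner/paper.Rmd"
git -C "$demo/owner" add paper.Rmd
git -C "$demo/owner" commit -q -m "First version"
git -C "$demo/owner" remote add origin "$demo/remote.git"
git -C "$demo/owner" push -q -u origin master

# collaborator: one clone creates the folder, copies every version,
# and links the local copy to the remote repository (named origin)
git clone -q "$demo/remote.git" "$demo/collaborator"
git -C "$demo/collaborator" remote -v
git -C "$demo/collaborator" log --oneline
```

With a real project, the path to the stand-in repository would be replaced by the repository URL shown on GitHub.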

226 / 246

Exercises — 37–39

37) Owner Setup

  • create a new version-controlled RStudio project, with Git and GitHub
  • add the default R Markdown template to your project
    • hint: click File -> New File -> R Markdown -> OK to create the template
    • another hint: name the project and the template in a way that they are easily distinguishable from your partner's project and template

38) Invitation to Collaborate

  • invite the partner in your current group as a collaborator to your new project
    • hint: you will need their username, full name, or email address to do so

39) Collaborator Setup

  • accept the invitation from your partner, and make the necessary arrangements so that you can collaborate on your partner's project
10:00
227 / 246

Collaboration — Git and GitHub — Workflow

1) Pull

  • on the Git tab in RStudio, click Pull to move the up-to-date records from GitHub to your computer
    • if your collaborator has not pushed anything since your last pull, you will be notified with the message Already up-to-date.
    • collaborative projects require pulling as well as pushing because your collaborator(s) might have pushed their commits to GitHub
    • pulling frequently minimises the risk of merge conflicts

2) Edit and save; commit and push

  • the same procedure as in any single-author, single-computer scenario
  • pushing frequently minimises the risk of merge conflicts
228 / 246

Exercise — 40

40) Non-simultaneous Collaboration

  • take turns with your partner to work on the same document (of the same project)

    • owner: edit the first header in the document (i.e., "R Markdown"), save, commit, and push

    • owner and collaborator: observe the changes, if any, on your own .Rmd file, and/or on your GitHub repository

      • click on the relevant commit message on GitHub and observe the commit
    • collaborator: pull, revert the header back to original, save, commit, and push

05:00
229 / 246

Exercise — 40 — Notes

Notice that you have not encountered any errors and/or merge conflicts

  • because everyone edited and merged with an up-to-date document

  • this is the default situation in the single-author, multiple-computer scenario

230 / 246

Exercise — 41

41) Simultaneous Collaboration — Different Lines

  • work on the same document at the same time
    • owner: edit the first header in the document (i.e., "R Markdown"), save, commit, and push
    • collaborator: edit the second header in the document (i.e., "Including Plots"), save, commit, and push
      • observe the error message that the last pusher will receive, and follow the instructions in RStudio to solve the problem
10:00
231 / 246

Exercise — 41 — Notes

Notice that you have encountered an error

  • pulling before pushing solves the problem because the edits are not on the same line

    • hence, this is not a merge conflict
  • the merge takes place automatically, on the local repository of the last pusher
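
The situation in this exercise can be reproduced in the Terminal. The sketch below is an illustration under assumptions: a local bare repository stands in for GitHub, two local folders stand in for the owner's and the collaborator's computers, and the document content, header edits, and identities are placeholders.

```shell
set -e
demo=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$demo/remote.git"   # stand-in for GitHub

# owner: first version with two headers, pushed to the remote
git -c init.defaultBranch=master init -q "$demo/owner"
cd "$demo/owner"
git config user.name "Owner"
git config user.email "owner@example.com"
printf '## R Markdown\n\nSome text.\n\n## Including Plots\n' > paper.Rmd
git add paper.Rmd
git commit -q -m "First version"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# collaborator: local copy of the project
git clone -q "$demo/remote.git" "$demo/collaborator"
git -C "$demo/collaborator" config user.name "Collaborator"
git -C "$demo/collaborator" config user.email "collaborator@example.com"

# owner edits the first header and pushes first
printf '## Introduction\n\nSome text.\n\n## Including Plots\n' > paper.Rmd
git commit -q -am "Edit first header"
git push -q

# collaborator edits the second header, without pulling first
cd "$demo/collaborator"
printf '## R Markdown\n\nSome text.\n\n## Figures\n' > paper.Rmd
git commit -q -am "Edit second header"
git push -q || echo "push rejected: the remote has a version that is missing locally"

# pulling merges the two versions automatically, after which the push succeeds
git pull -q --no-rebase --no-edit
git push -q
cat paper.Rmd         # contains both the owner's and the collaborator's edits
```

The merged document exists first on the collaborator's computer, and reaches the remote only with the final push.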

232 / 246

Exercise — 42

42) Simultaneous Collaboration — Same Line

  • work on the same document at the same time

    • owner: edit the first header in the document again, save, commit, and push

    • collaborator: edit the first header in the document as well, save, commit, and push

  • observe the error message that the last pusher will receive

    • follow the instructions in RStudio to solve the problem
    • google if necessary
10:00
233 / 246

Exercise — 42 — Notes

Notice that you have encountered not only an error but also a merge conflict

  • pulling before pushing alone does not solve the problem because the edits are on the same line

    • the conflict cannot be solved automatically — it needs human intervention
  • nevertheless, by pulling first, you can view the conflict directly on the file

    • marked between less-than (<) and greater-than (>) signs, and divided by equals (=) signs
    • one solution is to accept the remote version, by deleting your edit and/or moving that edit to a different line
  • the merge takes place on the local repository of the last pusher
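
To see what the conflict markers look like, a conflict can be provoked in a throwaway repository. The sketch below is an illustration under assumptions: instead of two computers and GitHub, it uses two branches of one local repository, which produces the same kind of conflict; the branch name, header texts, and identity are placeholders.

```shell
set -e
demo=$(mktemp -d)
git -c init.defaultBranch=master init -q "$demo"
cd "$demo"
git config user.name "Demo Author"        # placeholder identity
git config user.email "demo@example.com"

echo "## R Markdown" > paper.Rmd
git add paper.Rmd
git commit -q -m "Original header"

# one edit on a side branch, standing in for the collaborator
git checkout -q -b collaborator
echo "## Collaborator header" > paper.Rmd
git commit -q -am "Collaborator edits the header"

# a competing edit of the same line on master, standing in for the owner
git checkout -q master
echo "## Owner header" > paper.Rmd
git commit -q -am "Owner edits the header"

# the merge stops, and Git writes both edits into the file, between markers
git merge collaborator || true
cat paper.Rmd
```

To resolve the conflict, edit the file so that only the desired line remains, delete the three marker lines, then stage and commit the file as usual.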

234 / 246

Collaboration — Git and GitHub — Workflow — Alternative

  • The workflow above is rather simple, but has some disadvantages, including
    • not easy, albeit still possible, to see the edits of the collaborators
    • not clear who is in charge of the overall progress
    • not possible to discuss edits
    • not possible to compromise on conflicting edits
  • An alternative workflow exists
    • work on different branches of the same project
    • commit versions to your own branch
    • create pull requests with comments
    • merge the branch into master
235 / 246

Collaboration — Git and GitHub — Workflow — Alternative

1) Branch

  • click New Branch on the Git tab
    • name it, and leave everything else as default
    • notice that you are now working on a new branch

2) Edit and save; commit and push

  • the same procedure as in any single-author, single-computer scenario
  • notice, on GitHub, that your commit is in the new branch, while master remains unchanged

3) Pull request

  • On GitHub, click

    Pull requests -> New pull request

  • choose what is to be pulled, and write a note to your collaborator who can accept or reject the merge

    • if there are merge conflicts, the collaborator solves them on GitHub before merging
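
Steps 1 and 2 have Terminal equivalents, sketched below; step 3 takes place on GitHub itself and is not covered. This is an illustration under assumptions: a local bare repository stands in for GitHub, and the branch name revise-header, the file content, and the identity are hypothetical.

```shell
set -e
demo=$(mktemp -d)
git -c init.defaultBranch=master init -q --bare "$demo/remote.git"   # stand-in for GitHub
git -c init.defaultBranch=master init -q "$demo/project"
cd "$demo/project"
git config user.name "Demo Author"        # placeholder identity
git config user.email "demo@example.com"
echo "## R Markdown" > paper.Rmd
git add paper.Rmd
git commit -q -m "First version"
git remote add origin "$demo/remote.git"
git push -q -u origin master

# 1) Branch: create a new branch and switch to it
git checkout -q -b revise-header

# 2) Edit and save; commit and push, this time to the new branch
echo "## Introduction" > paper.Rmd
git commit -q -am "Rename first header"
git push -q -u origin revise-header

git branch -a                     # local and remote branches
git log --oneline master          # master still has only the first version
git log --oneline revise-header   # the new branch has both versions
```

The pull request on GitHub would then propose merging revise-header into master.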
236 / 246

Exercises — 43–44

43) Pull request

  • create a pull request for your collaboration project
    • create a branch for yourself
    • edit any line, save, commit, and push
    • request your branch to be merged

44) Merging

  • view the pull request of your collaborator
  • take the necessary steps to merge it to master
10:00
237 / 246

Collaboration — Git and GitHub — Workflow — Notes

  • It is possible to edit .Rmd documents directly on GitHub

    • click on any editable file, and Edit this file
    • commit changes, either as a direct commit or a pull request
  • A GitHub account is enough for collaboration with co-authors who do not work with Git, R, or RStudio

    • not possible to knit to see the outcome
    • would suit co-authors whose contributions are plain text
238 / 246

Exercises — 45

45) GitHub edit

  • create two edits on the .Rmd document in your collaboration project
  • commit one of the edits as a direct commit
  • commit the other as a pull request
05:00
239 / 246

Part 10. Working on a Real Project

240 / 246

Real Project

  • Consider converting a real project to R Markdown

    • now, in the remainder of the workshop
  • Choose an existing project, preferably

    • single-authored
    • at an early stage
    • but one that you are, or will be, working on
  • Ask me for help

    • with no more slides to go through, I will now focus on helping you start your first project in R Markdown
241 / 246

References

242 / 246

References

Allaire, J., Y. Xie, J. McPherson, et al. (2022). rmarkdown: Dynamic Documents for R. R package version 2.14. <https://CRAN.R-project.org/package=rmarkdown.

Blair, G., J. Cooper, A. Coppock, et al. (2022). fabricatr: Imagine Your Data Before You Collect It. R package version 0.16.0. <https://CRAN.R-project.org/package=fabricatr.

Carlisle, D., R. Fairbairns, E. Harris, et al. (2011). setspace – Set space between lines. LaTeX package, version 6.7a. <https://ctan.org/pkg/setspace.

Dowle, M. and A. Srinivasan (2021). data.table: Extension of data.frame. R package version 1.14.2. <https://CRAN.R-project.org/package=data.table.

Gagolewski, M., B. Tartanus, o. Unicode, et al. (2021). stringi: Character String Processing Facilities. R package version 1.7.6. <https://CRAN.R-project.org/package=stringi.

Hlavac, M. (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.3. <https://CRAN.R-project.org/package=stargazer.

Hugh-Jones, D. (2021). huxtable: Easily Create and Style Tables for LaTeX, HTML and Other Formats. R package version 5.4.0. <https://hughjonesd.github.io/huxtable/.

243 / 246

References

R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. <https://www.R-project.org/.

Sievert, C., C. Parmer, T. Hocking, et al. (2021). plotly: Create Interactive Web Graphics via plotly.js. R package version 4.10.0. <https://CRAN.R-project.org/package=plotly.

Wickham, H., R. François, L. Henry, et al. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.9. <https://CRAN.R-project.org/package=dplyr.

Wickham, H. and G. Grolemund (2021). R for data science. O'Reilly.

Xie, Y. (2022a). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.26. <https://CRAN.R-project.org/package=bookdown.

Xie, Y. (2022b). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39. <https://yihui.org/knitr/.

Xie, Y. (2022c). tinytex: Helper Functions to Install and Maintain TeX Live, and Compile LaTeX Documents. R package version 0.39. <https://github.com/rstudio/tinytex.

244 / 246

References

Xie, Y., J. Allaire, and G. Grolemund (2018). R Markdown: The Definitive Guide. ISBN 9781138359338. Boca Raton, Florida: Chapman and Hall/CRC. <https://bookdown.org/yihui/rmarkdown.

Xie, Y., C. Dervieux, and A. Presmanes Hill (2022). blogdown: Create Blogs and Websites with R Markdown. R package version 1.10. <https://CRAN.R-project.org/package=blogdown.

Zhu, H. (2021). kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.3.4. <https://CRAN.R-project.org/package=kableExtra.

245 / 246

The workshop ends here.

Congratulations on making it this far, and

thank you for joining me!

246 / 246
