Resul Umit
post-doctoral researcher in political science at the University of Oslo
teaching and studying representation, elections, and parliaments
teaching workshops, also on
First, with Stata + Word, I was ...
frustrated with Word
tired of switching between programmes/screens
paying for programme licences
Then, with Stata + R + LaTeX, I was ...
frustrated with Word
tired of switching between programmes/screens
paying for the Stata licence
converting PDF documents to Word manually
Now, with R Markdown, I am ... happy! I am no longer
frustrated with Word
tired of switching between programmes/screens
paying for the Stata licence
converting PDF documents to Word manually
Efficient
Flexible
Open access/source
Having written a complete draft
If you and/or your co-authors decide
How resource intensive would this revision be?
After your paper is published, if others, including your future self, would like to test how robust the results are
How resource intensive would this test be?
Two days, on how to write reproducible research papers with R Markdown
Based on converting a mock manuscript written in Word to R Markdown
Designed for researchers with basic knowledge of R programming language
Part 1. Getting the Tools Ready
Part 2. Introducing R Markdown
Part 6. Adding Code, Figures, and Tables
Part 7. Addressing Functionality Gaps
Part 9. Collaborating with Others
Part 10. Working on a Real Project
Sit in groups of two
Type, rather than copy and paste, the code that you will find on these slides
When you have a question
Slides with this background colour indicate that your action is required, for
setting the workshop up
completing the exercises
03:00
Code and text that go in R Markdown documents appear as such: in a different font, on a gray background
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield"))
```
Results that come out in output files appear as such — in the same font, on green background
Specific sections are highlighted yellow as such for emphasis
The slides are designed for self-study as much as for the workshop
To make you aware of what is possible with R Markdown
Google, and perseverance are all we need
To encourage you to convert to R Markdown
Download the materials from https://github.com/resulumit/rmd_workshop/tree/materials
Code -> Download ZIP
Unzip and rename the folder
unzip to a location that is not synced
rename the folder as YOURNAME-rmd
resul-rmd
Notice that the folder has the following structure
YOURNAME-rmd
|
|- manuscript
|  |
|  |- reproduce_this.pdf
|  |- journals.Rmd
|  |- references.bib
|  |- apa_7th.csl
|
|- data
|  |
|  |- journals.csv
|
|- image
|  |
|  |- google_scholar.png
manuscript\reproduce_this.pdf
* The text, Lorem ipsum, is generated with the stringi package (Gagolewski, Tartanus, Unicode, Inc., and others, 2021) while the dataset is created with the fabricatr package (Blair, Cooper, Coppock, Humphreys, Rudkin, and Fultz, 2022).
manuscript\journals.Rmd
reproduce_this.pdf
to save time
manuscript\references.bib
manuscript\apa_7th.csl
data\journals.csv
a dataset created with the fabricatr package (Blair, Cooper, Coppock, et al., 2022), imagined to explore the Google Scholar rankings of fictitious journals
includes the following variables
image\google_scholar.png
For Windows, install 'Git for Windows', downloading from https://gitforwindows.org
For Mac, install 'Git', downloading from https://git-scm.com/downloads
Sign up for GitHub at https://github.com
registering an account is free
usernames are public
asdf029348
usernames can be changed later
Download R from https://cloud.r-project.org
Download RStudio from https://rstudio.com/products/rstudio/download
RStudio allows you to divide your work with R into separate projects, each with its own history etc.
...\YOURNAME-rmd
from the RStudio menu:
File -> New Project -> Existing Directory -> Browse -> ...\YOURNAME-rmd -> Open
* Recall that we downloaded this folder earlier from GitHub.
RStudio offers various functions that facilitate working with .Rmd documents, which can be controlled at two* locations:
Tools -> Global Options -> R Markdown
Tools -> Project Options -> R Markdown
* Some settings become available on the document toolbar as well, only when an .Rmd document is open. We will cover the document toolbar later on in the workshop. All settings can stay as they are — for now.
install.packages(c("rmarkdown", "tinytex", "dplyr", "stargazer", "ggplot2"))
tinytex::install_tinytex()
rmarkdown (Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2022), for automating the process of converting R Markdown documents into other formats
tinytex (Xie, 2022c), for PDF outputs
dplyr (Wickham, François, Henry, and Müller, 2022), for data manipulation
    base (R Core Team, 2022), data.table (Dowle and Srinivasan, 2021)
stargazer (Hlavac, 2022), for tables
    knitr (Xie, 2022b), kableExtra (Zhu, 2021), huxtable (Hugh-Jones, 2021)
ggplot2, for figures
    base (R Core Team, 2022), plotly (Sievert, Parmer, Hocking, Chamberlain, Ram, Corvellec, and Despouy, 2021)
Downloading process can be initiated from within RStudio
Help -> Cheatsheets -> R Markdown Cheat Sheet
Pandoc User's Guide
R Markdown: The Definitive Guide (Xie, Allaire, and Grolemund, 2018)
R for Data Science (Wickham and Grolemund, 2021)
* During the workshop, the R Markdown Cheat Sheet is likely to be more helpful than these resources, which I recommend consulting after the workshop.
File -> New File -> R Markdown -> OK
File -> Save
Observe that
* This is for demonstration purposes only. Otherwise, we will work with journals.Rmd, which you have already downloaded, to save time.
** Alternatively, use the Save button or the keyboard shortcut (e.g., Ctrl + S on Windows). For shortcuts, follow Tools -> Keyboard Shortcuts Help or Tools -> Modify Keyboard Shortcuts...
Observe also that the document has three components
Observe also that the document toolbar offers extended tools for .Rmd documents
These include, most importantly,
Click the Knit button to compile your .Rmd document, and observe that
You may want to delete these newly created files, as we will work with journals.Rmd instead to save time.
When you Knit, the following happens:
.Rmd --knitr--> .md --pandoc--> output
knitr* executes the code if there is any, and converts the resulting document from .Rmd (R Markdown) into .md (Markdown)
pandoc** transforms the .md document into your preferred output format(s)
This process is automated by the rmarkdown package
* If you did not already have the knitr package, it would have been installed together with the rmarkdown package.
** RStudio comes with a copy of pandoc (http://pandoc.org), which is not an R package, so that you do not have to install it separately.
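The two steps behind Knit can also be triggered from the R console. The sketch below is illustrative only: it assumes the rmarkdown package is installed, and the file name demo.Rmd and its contents are made up for the example. It renders to Markdown rather than PDF so that no LaTeX installation is needed.

```r
# A minimal sketch of what the Knit button does, assuming rmarkdown is installed.
# The file name and its contents are hypothetical, written to a temporary folder.
rmd_file <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "output: md_document",
  "---",
  "",
  "The sum is `r 1 + 1`."
), rmd_file)

if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  # render() runs knitr first (code is executed), then pandoc (format conversion)
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(readLines(out_file))
}
```

Calling rmarkdown::render() on journals.Rmd in the same way should be equivalent to clicking Knit, with the output format taken from the YAML of the document.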
Behind the scenes, each .Rmd file is compiled in its own session, and therefore
R Markdown can produce more than documents,* including
rmarkdown
bookdown (Xie, 2022a)
blogdown (Xie, Dervieux, and Presmanes Hill, 2022)
* Here we will focus on research papers only. In a separate workshop, I teach how to create professional websites with R Blogdown.
.Rmd documents start* with YAML
---
title:
output:
---
* Technically, we can place YAML anywhere in an .Rmd document. However, it is good practice to start with YAML so that the metadata is easily accessible.
title and output are the basic variables of YAML
Typical YAML variables for a research paper are as follows:
---
title:
author:
date:
bibliography:
csl:
output:
---
Variables can take strings, options, sub-options, and code
---
title: "Journals: Random Words With Random Data"
output:
  pdf_document:
    keep_tex: true
date: "`r format(Sys.Date(), '%d %B %Y')`"
---
Documents as output formats
html_document
latex_document
pdf_document*
word_document
github_document
md_document
odt_document
rtf_document
Presentations as output formats
beamer_presentation
ioslides_presentation
powerpoint_presentation
slidy_presentation
* For reasons of simplicity, this workshop focuses on LaTeX and/or PDF outputs. Different output formats have slightly different customisations. See Pandoc User's Guide and/or R Markdown Cheat Sheet.
Strings with special characters, such as a colon, require quotation marks, either single ' or double "
---
title: "Journals: Random Words With Random Data"
output: pdf_document
---
Quotation marks are optional for strings without special characters
---
title: "Journals: Random Words With Random Data"
subtitle: A Mock Paper for an R Markdown Workshop
author: Jane Doe
date: 4 March 2020
output: pdf_document
---
The syntax ^[footnotes_go_here] adds footnotes to strings
---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
output: pdf_document
---
The bibliography and csl variables take strings as well
---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---
The strings for external files indicate (a) where the files are located and (b) how they are named
---
...
bibliography: references/ref_library.bib
csl: "../../styles/chicago_manual_17.csl"
...
---
Notice that
the locations above are specified as relative to the working directory
for reproducibility reasons, hard-coded strings, such as the absolute path below, should be avoided
"C:/Users/resulumit/Dropbox/styles/chicago_manual_17.csl"
Options can have sub-options
---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: 4 March 2020
bibliography: references.bib
csl: apa_7th.csl
output:
  pdf_document:
    keep_tex: true
---
Notice that
this specific setting, highlighted, will create multiple outputs
every option but the last (i.e., true) takes a colon
options and sub-options (except the last option, again) are stepwise indented
pdf_document and keep_tex is coincidental
Variables can take code as well
---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: "`r format(Sys.Date(), '%d %B %Y')`"
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---
Notice that
such code can be particularly useful for variables such as date
there are quotation marks around the code
we will cover code in .Rmd documents later on in the workshop
Code and text can be combined in a string
---
title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]"
subtitle: A Mock Paper for an R Markdown Workshop
author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]"
date: "First version: 4 March 2020. This version: `r format(Sys.Date(), '%d %B %Y')`."
bibliography: references.bib
csl: apa_7th.csl
output: pdf_document
---
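To see what the inline code in the date variable evaluates to, the same expression can be tried in the R console. This is a small sketch in base R; the fixed date is only an example that mirrors the mock paper, and the month name that is printed depends on the locale of your system.

```r
# Sys.Date() returns today's date; format() turns a date into text.
# The fixed date below is an example only, mirroring the mock paper.
first_version <- as.Date("2020-03-04")

format(first_version, "%d %B %Y")  # day, full month name, four-digit year
format(first_version, "%Y-%m-%d")  # ISO 8601 style
format(Sys.Date(), "%d %B %Y")     # what the YAML code evaluates on the day of knitting

# Combining text and code, as in the last YAML example
date_line <- paste0(
  "First version: ", format(first_version, "%d %B %Y"),
  ". This version: ", format(Sys.Date(), "%d %B %Y"), "."
)
date_line
```

Note that %d pads single-digit days with a zero; a date typed by hand in YAML, such as 4 March 2020, is printed exactly as typed.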
fontsize
10pt, 11pt, and 12pt
linkcolor, urlcolor, citecolor
link-citations
no
yes: a click on a citation will take the screen to the relevant entry in the list of references
1) Open journals.Rmd and fill in the YAML variables for the mock paper
reproduce_this.pdf and/or the slides
2) Add and set one of the variables mentioned as further settings for PDF outputs above
fontsize, linkcolor, urlcolor, citecolor, link-citations
3) Add and set a completely new variable not covered so far
4) Knit your journals.Rmd
10:00
There are not one, but several different versions of Markdown
R Markdown follows the syntax in Pandoc's Markdown
Multiple spaces on a given line are reduced to one
This is a sentence followed by four spaces.    This is another sentence on the same line.
This is a sentence followed by four spaces. This is another sentence on the same line.
Line endings with fewer than two spaces are ignored
This is a sentence followed by one space.
This is another sentence on a new line.
This is a sentence followed by one space. This is another sentence on a new line.
Two or more spaces at the end of lines introduce hard breaks, forcing a new line
This is a sentence followed by two spaces.
This is another sentence on a new line.
This is a sentence followed by two spaces.
This is another sentence on a new line.
Spaces on lines that start with a vertical line | are kept
| a one-space indent
|     a five-space indent
|          a ten-space indent
 a one-space indent
     a five-space indent
          a ten-space indent
Lines starting with the greater-than sign > introduce block quotes*
> In God, we trust. All others must bring data.
>
> --- Anonymous
In God, we trust. All others must bring data.
— Anonymous
* Notice that three hyphens grouped together introduce an em-dash. Dashes are covered later on in the workshop.
One or more* blank lines introduce a new paragraph
This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a *new paragraph* as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
This is the first sentence of a new paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
* Multiple blank lines between paragraphs reduce to one.
Text with the syntax <!-- comments --> is omitted from output
<!-- This paragraph needs re-writing -->
This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.

This is the first sentence of a new paragraph <!-- I've removed italics --> as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
This is the first sentence of a new paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line.
5) Hard Breaks
reproduce_this.pdf: page 1
journals.Rmd: paragraph 1
6) Line Blocks / Block Quotes
reproduce_this.pdf: page 1
journals.Rmd: block quote, between paragraphs 1 and 2
reproduce_this.pdf: page 5
journals.Rmd: hypothesis 1, between paragraphs 14 and 15; hypothesis 2, between paragraphs 16 and 17
05:00
The number sign # introduces headers; lower levels are created with additional signs, up to a total of five levels
# Introduction becomes
## 1. Introduction becomes
### 3.1 Introduction becomes
#### Introduction becomes
##### Introduction becomes
A pair of single asterisks * or underscores _ introduces italics
*italics* becomes italics
_italics_ becomes italics as well
A pair of double asterisks or underscores introduces bold
**bold** becomes bold
__bold__ becomes bold as well
These two rules can be combined
**_bolditalics_** becomes bolditalics
_**bolditalics**_ becomes bolditalics as well
A pair of double tildes ~ introduces strikethrough
~~strikethrough~~ becomes strikethrough
Strikethrough can be combined with italics or bold
**~~strikebold~~** or __~~strikebold~~__, they both become strikebold
~~**strikebold**~~ or ~~__strikebold__~~, they both become strikebold as well
*~~strikeitalics~~* or _~~strikeitalics~~_, they both become strikeitalics
~~*strikeitalics*~~ or ~~_strikeitalics_~~, they both become strikeitalics as well
7) Headers
reproduce_this.pdf: pages 1 to 11
journals.Rmd
8) Emphases
reproduce_this.pdf: pages 1 and 2
journals.Rmd: paragraph 2
03:00
You can link text to section headers in the same document
[Conclusion](#conclusion) becomes Conclusion, and a click takes the screen to that section
Multi-word headers need hyphenation
[Literature Review](#literature-review) becomes Literature Review, and it works only if the second part is hyphenated
* The links to references, figures, and tables are covered later on in the workshop.
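The hyphenation rule for internal links can be mimicked with a few lines of base R. The function below is a hypothetical helper written for illustration, not part of any package, and it is a simplified approximation of how Pandoc derives header identifiers: it handles plain ASCII headers only and ignores duplicate headers.

```r
# Hypothetical helper: approximate the identifier Pandoc assigns to a header.
# Simplified sketch: ASCII only, no handling of duplicate headers.
header_anchor <- function(header) {
  id <- tolower(trimws(header))          # identifiers are lower case
  id <- gsub("[[:space:]]+", "-", id)    # spaces become hyphens
  id <- gsub("[^a-z0-9_.-]", "", id)     # drop other punctuation
  id <- sub("^[^a-z]+", "", id)          # drop everything before the first letter
  if (id == "") id <- "section"          # fallback when nothing is left
  paste0("#", id)
}

header_anchor("Conclusion")
header_anchor("Literature Review")
```

The returned string is what goes inside the round brackets of the link, as in the two examples on this slide.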
You can link text to URLs
[visit my website](https://resulumit.com/) becomes visit my website
[https://resulumit.com](https://resulumit.com/) becomes https://resulumit.com
<https://resulumit.com> becomes https://resulumit.com as well
You can also link text to an email address
[email me](mailto:resuluy@uio.no)* becomes email me
<resuluy@uio.no> becomes resuluy@uio.no
* Notice the prefix mailto: in the syntax.
9) Links: Internal
reproduce_this.pdf: page 2
journals.Rmd: paragraph 4
10) Links: External
reproduce_this.pdf: page 1
journals.Rmd: title page items
03:00
Inline equations go between a pair of single dollar signs $, with no space between the signs and the equation itself
$E = mc^{2}$ becomes E = mc²
Block equations go between a pair of double dollar signs, and they work with or without spaces
$$ E = mc^{2}$$ becomes
$$E = mc^{2}$$ becomes
For inline footnotes, use the ^[footnote] syntax
Jane Doe^[Corresponding author.] becomes Jane Doe1
1 Corresponding author.
Notice that
An alternative is to use the [^identifier] syntax, with identifiers defined elsewhere in the same document
Dr Doe holds a PhD in rock science.[^defence_date]

[^defence_date]: She defended her thesis in 2017.
Dr Doe holds a PhD in rock science.1
1 She defended her thesis in 2017.
Notice that
11) Equations
reproduce_this.pdf: page 7
journals.Rmd: paragraph 22; block equation, between paragraphs 22 and 23
12) Footnotes
reproduce_this.pdf: page 2
journals.Rmd: paragraph 3
03:00
Lines starting with an asterisk *, a plus sign +, or a minus sign - introduce lists
- books
- articles
- reports
Lists can be nested within each other, with indentation
+ books
+ articles
    - published
    - under review
        + revised and resubmitted
    - work in progress
List items can be numbered
1. books
2. articles
    - published
    - under review
        + revised and resubmitted
    - work in progress
Two hyphens grouped together introduce an en-dash
‐‐ becomes –
Three hyphens grouped together introduce an em-dash
‐‐‐ becomes —
A pair of tildes introduces subscript
CO~2~ becomes CO2
A pair of carets introduces superscript
R^2^ becomes R2
Notice that
13) Lists
reproduce_this.pdf: page 3
journals.Rmd: list, between paragraphs 10 and 11
14) Dashes
reproduce_this.pdf: page 2
journals.Rmd: paragraph 6
15) Subscripts and Superscripts
reproduce_this.pdf: page 2
journals.Rmd: paragraph 5
03:00
References are defined in .bib files
pandoc looks for a .bib file, and for the definitions therein, to process citations
bibliography variable in YAML
pandoc can process a citation only if there is a linked entry in the .bib file
A BibTeX entry consists of three elements
@article
bennett2015
title, author
Different tags are available for different reference types
One could create entries by hand
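As an illustration of an entry written by hand, the sketch below stores a BibTeX entry for the Bennett (2015) article, as it appears in the reference list example of these slides, in a temporary .bib file. The entry type is @article, the citation key is bennett2015, and the remaining lines are tags. The file name references_demo.bib is made up so that your own references.bib is left untouched.

```r
# A hand-written BibTeX entry: entry type, citation key, and tags.
# Details follow the reference list example in these slides;
# the file name is hypothetical.
bib_entry <- c(
  "@article{bennett2015,",
  "  author  = {Bennett, S.},",
  "  title   = {Peanut butter and jelly},",
  "  journal = {Journal of Bone},",
  "  year    = {2015},",
  "  volume  = {1},",
  "  number  = {12},",
  "  pages   = {3--35}",
  "}"
)

bib_file <- file.path(tempdir(), "references_demo.bib")
writeLines(bib_entry, bib_file)
cat(readLines(bib_file), sep = "\n")
```

In practice, it is usually simpler to paste such an entry directly into the .bib file with a text editor.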
A good alternative is to use Google Scholar, which provides BibTeX entries
cite -> BibTeX and copy
Reference styles are defined in .csl files
pandoc looks for a .csl file, and for the styles therein, to style citations and references
csl variable in YAML
.csl files affect the style only in outputs
All citation keys take the 'at' sign @, while square brackets and/or minus signs introduce variation
[@bennett2015] becomes (Bennett, 2015)
@bennett2015 becomes Bennett (2015)
[-@bennett2015] becomes (2015)
-@bennett2015 becomes 2015
[@bennett2015 35] becomes (Bennett, 2015, p. 35)
[@bennett2015 33-35] becomes (Bennett, 2015, pp. 33–35)
[@bennett2015, ch. 1] becomes (Bennett, 2015, ch. 1)
[@bennett2015; @gilbert2019] becomes (Bennett, 2015; Gilbert, 2019)
[see @bennett2015, for details] becomes (see Bennett, 2015, for details)
@bennett2015 [33-35] becomes Bennett (2015, pp. 33–35)
* Specifically, the outputs on this slide are formatted according to the APA 7th edition.
All citation keys take the 'at' sign @
A clever sentence.[@bennett2015] becomes A clever sentence.[1] in certain numerical styles
A clever sentence.[@bennett2015; @gilbert2019] becomes A clever sentence.[1,2]
Individual styles may or may not use additional information, such as page numbers
A clever sentence.[@bennett2015 35] might become A clever sentence.[1] as well
Individual styles may or may not be sensitive to variation, such as square brackets
A clever sentence. @bennett2015 might become A clever sentence.[1] as well
The list of references appears after the last line of the output document, with no section header
This is the last sentence of an APA style manuscript.

## References
This is the last sentence of an APA style manuscript.
Bennett, S. (2015). Peanut butter and jelly. Journal of Bone, 1(12), 3–35.
Gilbert, T. (2019). Turning wine into water. In M. Albert (Ed.), The book of ground (pp. 124–142). Antman.
For internal links from in-text citations to the reference list, set link-citations: yes in YAML
the linkcolor variable makes these links explicit
---
...
bibliography: references.bib
csl: apa_7th.csl
link-citations: yes
linkcolor: blue
...
---
16) Add an entry to references.bib for the following book
17) Reproduce the citations and reference list in the mock paper
reproduce_this.pdf: pages 3 and 11
journals.Rmd: paragraph 7 to 9
18) Change the reference style
19) Link the citations to the reference list
07:30
Most code goes inside code chunks
```{r}
library(dplyr)  # provides %>% and mutate()

# import the data and derive the variables used in the paper
df <- read.csv("rmd_workshop_files/images_data/journals.csv") %>%
  mutate(age = 2020 - since,
         english = factor(english),
         subfield = factor(subfield))
```
Code can also go in line with text
The average H5 Index for the journals in the dataset is `r mean(df$h5_index)`.
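Inline code prints its result exactly as R returns it, so a mean can appear with many decimals. The sketch below shows one way to control this, wrapping the calculation in round(). The data frame df_demo is a stand-in built from the six h5_index values printed with head(df) in these slides; it is not the full dataset.

```r
# Stand-in data: the six h5_index values shown by head(df) in the slides.
df_demo <- data.frame(h5_index = c(73, 72, 72, 72, 70, 69))

# Unrounded, as the inline code `r mean(df$h5_index)` would print it
mean(df_demo$h5_index)

# Rounded to one decimal, as one might prefer in a manuscript
avg_h5 <- round(mean(df_demo$h5_index), 1)

# The sentence that the knitted document would contain
sentence <- paste0(
  "The average H5 Index for the journals in the dataset is ", avg_h5, "."
)
sentence
```

In the .Rmd document itself, the rounded version would be written in line with the text, with round() placed inside the backticks around the call to mean().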
Code chunks are spaces delimited by a pair of three backticks `
On the same line with the first delimiter, and in curly brackets {, code chunks take
```{r, setup, echo=FALSE}
```
The first item in code chunks indicates the engine to run the code
```{r}
```
Note that
indicating an engine for each chunk is a must
r is the specified engine, indicating that the code in the chunk above should be run by R
python, which we will not cover in this workshop
* The above chunk has no code; it is for demonstration only.
It is recommended, but optional, to label the code chunks
```{r, data_import}
# read.csv() is part of base R, so no extra package is needed
df <- read.csv("data/journals.csv")
```
Note that
labels are written after the language engine, separated by a comma
data_import
chunks without labels are otherwise automatically numbered
duplicate labels lead to errors during compilation
Code chunks can take further options
```{r, setup, include=FALSE}
```
Note that
in the example above, the include option is set to FALSE
The complete list of options is available at https://yihui.org/knitr/options
leaving spaces around the equal sign =, between option tags and values, should be avoided
Options can be specified inside code chunks as well, after a number sign and a vertical line #|
```{r, echo=FALSE, eval=TRUE}
```
```{r}
#| echo = FALSE, eval = TRUE
```
```{r}
#| echo = FALSE
#| eval = TRUE
```
Options have default values
echo, the default is TRUE
echo: should the source code be printed in the output?
TRUE: yes it should
```{r}
```
```{r, echo=TRUE}
```
This chunk prints two things in the output document: (a) the code and (b) the head of the data frame
```{r}
head(df)
```
head(df)
##                name   origin   branch h5_index h5_median english subfield
## 1  Journal of Bears Americas Physical       73        97       1        1
## 2   Journal of Moon     Asia   Social       72       106       1        0
## 3 Journal of Lumber Americas Physical       72       100       1        1
## 4 Journal of Houses   Europe   Social       72       102       1        0
## 5  Journal of Water   Europe   Social       70       100       1        0
## 6  Journal of Jeans Americas Physical       69       101       1        1
##   issues age
## 1      7  61
## 2      6  64
## 3      8  30
## 4      8  38
## 5      5  33
## 6      5  64
Setting echo=FALSE prevents the code from being displayed in the output document
```{r ... echo=FALSE}
head(df)
```
This chunk therefore prints one thing in the output document: the head of the data frame
##                name   origin   branch h5_index h5_median english subfield
## 1  Journal of Bears Americas Physical       73        97       1        1
## 2   Journal of Moon     Asia   Social       72       106       1        0
## 3 Journal of Lumber Americas Physical       72       100       1        1
## 4 Journal of Houses   Europe   Social       72       102       1        0
## 5  Journal of Water   Europe   Social       70       100       1        0
## 6  Journal of Jeans Americas Physical       69       101       1        1
##   issues age
## 1      7  61
## 2      6  64
## 3      8  30
## 4      8  38
## 5      5  33
## 6      5  64
Prevent the result(s) of the source code from being displayed in the output document
```{r ... results="hide"}
head(df)
```
This chunk therefore prints one thing in the output document — the source code
head(df)
Setting results="asis" passes the results through as they are produced by the code: knitr does not wrap or reformat them. In creating tables for PDF output with the stargazer package, this option is a must.
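A minimal sketch of what results="asis" is useful for, assuming a chunk whose code writes Markdown itself. The helper name md_list is hypothetical; the journal names come from the example data.

```r
# Hypothetical helper: turn a character vector into a Markdown bullet list.
md_list <- function(items) {
  paste0("- ", items, collapse = "\n")
}

journals <- c("Journal of Bears", "Journal of Moon", "Journal of Lumber")

# Inside a chunk with results="asis", this text is passed on as raw
# Markdown and becomes a list in the output document. With the default
# setting, the same text would appear as "##"-prefixed console output.
cat(md_list(journals), "\n")
```

The same idea applies to the LaTeX code that stargazer prints: it only becomes a table if it reaches the output document untouched.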
Cache results for future compilations
```{r ... cache=TRUE}```
Note that caching
is useful especially for chunks that take a long time to execute
avoids executing the chunks at every compilation
creates a new folder in your working directory, whose location is set by the cache.path option
Prevent R from running the code in the chunk altogether
```{r ... eval=FALSE}```
Prevent messages and/or warnings from being displayed in the output
```{r ... error=FALSE, message=FALSE, warning=FALSE}```
Define the actual dimensions of figures, in inches
```{r ... fig.height=6, fig.width=9}```
Define the size of figures as they appear in the output document, with out.width and/or out.height
```{r ... out.width="50%"}```
Define the alignment of figures: left, right, or center
```{r ... fig.align="center"}```
Define captions for figures
```{r ... fig.cap="A Scatter Plot"}```
Set the resolution for figures
```{r ... dpi=300}```
Set extra options, such as angle, that the output format would accept for figures
```{r ... out.extra="angle=45"}```
It is recommended to use the first code chunk for general setup, where you can set chunk option defaults with knitr::opts_chunk$set(), load packages, and import data
```{r, setup, include=FALSE}
# chunk option defaults
knitr::opts_chunk$set(echo=FALSE, message=FALSE)

# packages
library(dplyr)
library(ggplot2)
library(stargazer)

# data
df_raw <- read.csv("journals.csv")
```
I recommend using the second chunk for the main operations* on raw data
```{r, data, ...}
df <- df_raw %>%
  mutate(subfield = as.factor(subfield),
         english = as.factor(english),
         age = 2020 - since) %>%
  select(-since)
```
* I will be using the pipe operator %>% and other functions from the dplyr package for such operations in the following slides.
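For readers less familiar with dplyr, here is a sketch of what the data chunk does, written in base R. The toy data frame is an assumption standing in for journals.csv; only the column names subfield, english, and since are taken from the slides.

```r
# Toy stand-in for df_raw; the real data come from journals.csv.
df_raw <- data.frame(
  name     = c("Journal A", "Journal B", "Journal C"),
  subfield = c(1, 0, 1),
  english  = c(1, 1, 0),
  since    = c(1959, 1990, 2010)
)

# Base R counterpart of the mutate() step: recode two variables as
# factors and compute age from the founding year.
df <- transform(
  df_raw,
  subfield = as.factor(subfield),
  english  = as.factor(english),
  age      = 2020 - since
)

# Same effect as select(-since): drop the original variable.
df$since <- NULL

str(df)
```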
Code can also be incorporated in text, with the `r ` syntax
If we multiply _pi_ by 5, we get `r pi * 5`.
If we multiply pi by 5, we get 15.7079633.
The average H5 Index for the journals in the dataset is `r mean(df$h5_index)`, which would round to `r round(mean(df$h5_index), digits = 1)`.
The average H5 Index for the journals in the dataset is 26.3611366, which would round to 26.4.
__Only `r nrow(subset(df, english == 0))` journals__ in the dataset are published in a language other than English.
Only 113 journals in the dataset are published in a language other than English.
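Inline results are printed as R formats them, so it often pays to format them explicitly. A small sketch using only base R functions; the number 1091 is the number of observations reported in the tables later in these slides.

```r
x <- pi * 5

# Rounding, as in the H5 Index example above.
round(x, digits = 1)                       # 15.7

# format() keeps trailing decimals and adds thousands separators,
# which is handy for counts such as the number of observations.
format(round(x, digits = 2), nsmall = 2)   # "15.71"
format(1091, big.mark = ",")               # "1,091"
```

Any of these expressions can go between the backticks of the `r ` syntax.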
20) Setup Chunk
knitr::opts_chunk$set()
dplyr, ggplot2, and stargazer
21) Data Chunk
subfield and english into factors
age, based on since
since from the data frame
22) Inline code
reproduce_this.pdf: page 6
journals.Rmd: paragraph 21
nrow function
The syntax ![caption](path/to/image) embeds images, and/or figures produced elsewhere,* into .Rmd documents
* Ideally, reproducible papers should produce their own images with data and code. However, there might be situations where this is not possible.

Figures are numbered automatically

The syntax can accept width or height attributes as follows
![caption](path/to/image){ width=40% }
knitr
The knitr package offers a capable alternative with the include_graphics() function
this goes inside code chunks
knitr::include_graphics("figure.extension")
this is more customisable, through the use of chunk options
the options that apply are out.width or out.height, rather than fig.height and/or fig.width
```{r, screenshot, echo=FALSE, fig.cap="A screenshot of the Google Scholar homepage."}
knitr::include_graphics("../image/google_scholar.png")
```
Size is defined with the chunk options out.width or out.height
```{r ... out.width="40%"}
knitr::include_graphics("../image/google_scholar.png")
```
Most other chunk options are common with figures plotted within R Markdown, such as fig.align
```{r ... fig.align="center"}
knitr::include_graphics("../image/google_scholar.png")
```
23) Images
reproduce_this.pdf: figure 1 on page 10
journals.Rmd: figure 1, between paragraphs 19 and 20
ggplot2: Overview
A powerful package for visualising data
Used widely, not only by academics, but also by large corporations such as the New York Times
A huge amount is written on this package. See, for example, the ggplot2 community
Among its alternatives are the base and plotly packages
ggplot2: Basics
1) The ggplot function and the data argument
ggplot(data = df)
2) The mapping aesthetics, or aes; most importantly, the variable(s) that we want to plot
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield))
3) The geometric objects, or geom; the visual representations
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point()
ggplot2
Put the code in a chunk, and give it a caption
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point()
```
Figure 1. A scatterplot of journal metrics.
ggplot2
Add facets for subgroups, e.g., branch
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch)
```
Figure 1. A scatterplot of journal metrics.
ggplot2
Scale the colour to improve the legend
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield"))
```
Figure 1. A scatterplot of journal metrics.
ggplot2
Change the theme
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) +
  theme_bw()
```
Figure 1. A scatterplot of journal metrics.
ggplot2
Improve the axis labels, e.g., with capital first letters
```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."}
ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) +
  geom_point() +
  facet_wrap(. ~ branch) +
  scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) +
  theme_bw() +
  labs(x = "H5 Median", y = "H5 Index")
```
Figure 1. A scatterplot of journal metrics.
ggplot2: Notes
geom_point is one of many geoms available
geom_bar for bar charts
geom_boxplot for box and whiskers plots
24) Barplot
reproduce_this.pdf: figure 2 on page 7
journals.Rmd: figure 2, between paragraphs 21 and 22
25) Scatterplot
reproduce_this.pdf: figure 3 on page 9
journals.Rmd: figure 3, between paragraphs 27 and 28
The following syntax, outside code chunks, introduces tables that pandoc can recognise
First Column     Second Column
------------     -------------
First cell       First cell
Second cell      Second cell
Third cell       Third cell
First Column | Second Column |
---|---|
First cell | First cell |
Second cell | Second cell |
Third cell | Third cell |
The position of headers, relative to their line underneath, defines column alignments
Left-Aligned          Centered
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell
Left-Aligned | Centered |
---|---|
First cell | First cell |
Second cell | Second cell |
Third cell | Third cell |
A line starting with a colon, placed before or after tables, introduces captions
    Centered         Right-Aligned
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell

: A hand-made table with R Markdown
Centered | Right-Aligned |
---|---|
First cell | First cell |
Second cell | Second cell |
Third cell | Third cell |
The caption line itself needs to be surrounded by empty lines
Tables are numbered automatically
: A hand-made table with R Markdown

    Centered         Right-Aligned
----------------  ----------------
First cell        First cell
Second cell       Second cell
Third cell        Third cell
Centered | Right-Aligned |
---|---|
First cell | First cell |
Second cell | Second cell |
Third cell | Third cell |
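Typing the dashes by hand becomes error-prone for longer tables. The sketch below generates the same simple-table syntax from a data frame, using base R only. The function name simple_table is hypothetical; in practice, knitr::kable() covers this use case.

```r
# Hypothetical helper: write a data frame as a pandoc simple table.
# Headers sit flush left with the dashes, so columns are left-aligned.
simple_table <- function(df, caption = NULL) {
  cells  <- rbind(names(df), do.call(cbind, lapply(df, as.character)))
  padded <- apply(cells, 2, format)   # pad each column to its widest cell
  rows   <- apply(padded, 1, paste, collapse = "  ")
  rule   <- paste(strrep("-", nchar(padded[1, ])), collapse = "  ")
  out    <- c(rows[1], rule, rows[-1])
  # The caption line starts with a colon and follows an empty line.
  if (!is.null(caption)) out <- c(out, "", paste(":", caption))
  out
}

journals <- data.frame(
  name   = c("Journal of Bears", "Journal of Moon"),
  origin = c("Americas", "Asia")
)

# In a chunk with results="asis", this prints a table pandoc can read.
cat(simple_table(journals, caption = "A generated table"), sep = "\n")
```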
Grid tables, with the following syntax, can handle complex cells with multiple lines and/or lists
+--------------------+--------------------+
| First Column       | Second Column      |
+====================+====================+
| - First item       | First cell         |
| - Second item      |                    |
| - Third item       |                    |
+--------------------+--------------------+
| Second cell        | Second cell with a |
|                    | long text          |
+--------------------+--------------------+
| Third cell         | Third cell         |
|                    |                    |
+--------------------+--------------------+

: A grid table with multi-line cells
First Column | Second Column |
---|---|
- First item - Second item - Third item | First cell |
Second cell | Second cell with a long text |
Third cell | Third cell |
Grid tables can be aligned as well, with colons at the boundaries of the header separator*
+--------------------+--------------------+
| Left-Aligned       | Centered           |
+:===================+:==================:+
| - First item       | First cell         |
| - Second item      |                    |
| - Third item       |                    |
+--------------------+--------------------+
| Second cell        | Second cell with a |
|                    | long text          |
+--------------------+--------------------+
| Third cell         | Third cell         |
|                    |                    |
+--------------------+--------------------+

: A grid table with multi-line cells
Left-Aligned | Centered |
---|---|
- First item - Second item - Third item | First cell |
Second cell | Second cell with a long text |
Third cell | Third cell |
* Use := for left-aligned, :=: for centered, =: for right-aligned columns.
26) Markdown Tables
reproduce_this.pdf: table 1 on page 4
journals.Rmd: table 1, between paragraphs 11 and 12
stargazer: Overview
A capable package for creating at least three kinds of tables
Used widely by academics, even though it has not been updated since 2018
Creates LaTeX code, HTML/CSS code, and ASCII text to be knitted
A lot is written on this package
Among its alternatives are the knitr, kableExtra, and huxtable packages
stargazer: Notes
The stargazer package requires specific settings,* in the chunk options and in the type argument of the stargazer() function
Output | Chunk Option | Type Argument |
---|---|---|
LaTeX / PDF | results="asis" | latex |
HTML | results="asis" | html |
Word | comment="" | text |
* The following slides use the setting for LaTeX and PDF outputs.
stargazer: Notes
stargazer tables look slightly different in different output formats
In fact, it is currently not quite possible to knit stargazer code into tables in Word documents. The options are to
knit ASCII text, looking like a table
knit to HTML as well as Word, copy the tables from HTML to Word
knit to PDF, open the PDF in Word
use huxtable
stargazer: Basics
The stargazer() function
The data argument of that function, with two main options
df, here coming from df <- read_csv("journals.csv")
lm1, here coming from lm1 <- lm(h5_index ~ issues, data = df)
stargazer: Data Tables
Table the first four rows of the dataset
```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4),
          type = "latex",
          summary = FALSE)
```
Notice the options of the chunk and the arguments of the function
with echo=FALSE, the code will not be displayed in the output document
with results="asis", knitr will pass through results without reformatting them
with summary = FALSE, the table will present the data, not its descriptive statistics
% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. E-mail: hlavac at fas.harvard.edu
% Date and time: Fri, Apr 10, 2020 - 12:31:21
name | origin | branch | h5_index | h5_median | english | subfield | issues | age | |
1 | Journal of Bears | Americas | Physical | 73 | 97 | 1 | 1 | 7 | 61 |
2 | Journal of Moon | Asia | Social | 72 | 106 | 1 | 0 | 6 | 64 |
3 | Journal of Lumber | Americas | Physical | 72 | 100 | 1 | 1 | 8 | 30 |
4 | Journal of Houses | Europe | Social | 72 | 102 | 1 | 0 | 8 | 38 |
stargazer: Data Tables
Set header = FALSE to remove the note preceding tables
```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4),
          type = "latex",
          summary = FALSE,
          header = FALSE)
```
name | origin | branch | h5_index | h5_median | english | subfield | issues | age | |
1 | Journal of Bears | Americas | Physical | 73 | 97 | 1 | 1 | 7 | 61 |
2 | Journal of Moon | Asia | Social | 72 | 106 | 1 | 0 | 6 | 64 |
3 | Journal of Lumber | Americas | Physical | 72 | 100 | 1 | 1 | 8 | 30 |
4 | Journal of Houses | Europe | Social | 72 | 102 | 1 | 0 | 8 | 38 |
stargazer: Data Tables
Define a caption with the title argument
```{r, data_table, echo=FALSE, results="asis"}
stargazer(data = head(df, n = 4),
          type = "latex",
          summary = FALSE,
          header = FALSE,
          title = "First four rows of the dataset")
```
name | origin | branch | h5_index | h5_median | english | subfield | issues | age | |
1 | Journal of Bears | Americas | Physical | 73 | 97 | 1 | 1 | 7 | 61 |
2 | Journal of Moon | Asia | Social | 72 | 106 | 1 | 0 | 6 | 64 |
3 | Journal of Lumber | Americas | Physical | 72 | 100 | 1 | 1 | 8 | 30 |
4 | Journal of Houses | Europe | Social | 72 | 102 | 1 | 0 | 8 | 38 |
stargazer: Summary Statistics Tables
Create a table of summary statistics instead, for the complete dataset
```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df,
          type = "latex",
          summary = TRUE,
          header = FALSE,
          title = "Descriptive statistics")
```
Statistic | N | Mean | St. Dev. | Min | Max |
h5_index | 1,091 | 26.361 | 13.814 | 1 | 73 |
h5_median | 1,091 | 39.400 | 21.272 | 3 | 109 |
issues | 1,091 | 4.676 | 1.786 | 1 | 12 |
age | 1,091 | 42.902 | 26.370 | 1 | 158 |
stargazer: Summary Statistics Tables
Keep only a selection of statistics
```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df,
          type = "latex",
          summary = TRUE,
          header = FALSE,
          title = "Descriptive statistics",
          summary.stat = c("n", "mean", "sd", "min", "max"))
```
Statistic | N | Mean | St. Dev. | Min | Max |
h5_index | 1,091 | 26.361 | 13.814 | 1 | 73 |
h5_median | 1,091 | 39.400 | 21.272 | 3 | 109 |
issues | 1,091 | 4.676 | 1.786 | 1 | 12 |
age | 1,091 | 42.902 | 26.370 | 1 | 158 |
stargazer: Summary Statistics Tables
Omit a selection of statistics for the same effect
```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df,
          type = "latex",
          summary = TRUE,
          header = FALSE,
          title = "Descriptive statistics",
          omit.summary.stat = c("p25", "p75"))
```
Statistic | N | Mean | St. Dev. | Min | Max |
h5_index | 1,091 | 26.361 | 13.814 | 1 | 73 |
h5_median | 1,091 | 39.400 | 21.272 | 3 | 109 |
issues | 1,091 | 4.676 | 1.786 | 1 | 12 |
age | 1,091 | 42.902 | 26.370 | 1 | 158 |
stargazer: Summary Statistics Tables
Flip the table
```{r, summary_table, echo=FALSE, results="asis"}
stargazer(data = df,
          type = "latex",
          summary = TRUE,
          header = FALSE,
          flip = TRUE,
          title = "Descriptive statistics",
          omit.summary.stat = c("p25", "p75"))
```
Statistic | h5_index | h5_median | issues | age |
N | 1,091 | 1,091 | 1,091 | 1,091 |
Mean | 26.361 | 39.400 | 4.676 | 42.902 |
St. Dev. | 13.814 | 21.272 | 1.786 | 26.370 |
Min | 1 | 3 | 1 | 1 |
Max | 73 | 109 | 12 | 158 |
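To check what a summary table reports, the same statistics can be computed by hand. This is a sketch in base R, not how stargazer works internally; the toy data are the first six rows of h5_index and issues as printed earlier, so the numbers differ from the full-sample table.

```r
# First six rows of two variables, as shown by head(df) earlier.
df_head <- data.frame(
  h5_index = c(73, 72, 72, 72, 70, 69),
  issues   = c(7, 6, 8, 8, 5, 5)
)

# One row per variable, one column per statistic.
stats <- t(sapply(df_head, function(x) {
  c(N = length(x), Mean = mean(x), `St. Dev.` = sd(x), Min = min(x), Max = max(x))
}))

round(stats, digits = 3)

# Transposing gives the layout that flip = TRUE produces.
round(t(stats), digits = 3)
```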
27) Summary Statistics Tables
reproduce_this.pdf: table 2 on page 8
journals.Rmd: table 2, between paragraphs 23 and 24
stargazer: Regression Tables
Create a table of regression models instead
```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = lm(h5_index ~ issues, data = df),
          type = "latex",
          header = FALSE,
          title = "Regression Results")
```
Dependent variable: | |
h5_index | |
issues | 1.913*** |
(0.227) | |
Constant | 17.415*** |
(1.137) | |
Observations | 1,091 |
R2 | 0.061 |
Adjusted R2 | 0.060 |
Residual Std. Error | 13.391 (df = 1089) |
F Statistic | 70.959*** (df = 1; 1089) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
stargazer: Regression Tables
Models can also be estimated outside the function first
```{r, regression_table, echo=FALSE, results="asis"}
lm1 <- lm(h5_index ~ issues, data = df)

stargazer(data = lm1,
          type = "latex",
          header = FALSE,
          title = "Regression Results")
```
Dependent variable: | |
h5_index | |
issues | 1.913*** |
(0.227) | |
Constant | 17.415*** |
(1.137) | |
Observations | 1,091 |
R2 | 0.061 |
Adjusted R2 | 0.060 |
Residual Std. Error | 13.391 (df = 1089) |
F Statistic | 70.959*** (df = 1; 1089) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
stargazer: Regression Tables
Keep only a selection of statistics
```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = lm1,
          type = "latex",
          header = FALSE,
          title = "Regression Results",
          keep.stat = c("n", "rsq"))
```
Dependent variable: | |
h5_index | |
issues | 1.913*** |
(0.227) | |
Constant | 17.415*** |
(1.137) | |
Observations | 1,091 |
R2 | 0.061 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
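stargazer: Regression Tables
Display multiple models in the same table
```{r, regression_table, echo=FALSE, results="asis"}
# lm2 is not defined on the slides; judging by the output below,
# it presumably adds english to the first model
lm2 <- lm(h5_index ~ issues + english, data = df)

stargazer(data = list(lm1, lm2),
          type = "latex",
          header = FALSE,
          title = "Regression Results",
          keep.stat = c("n", "rsq"))
```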
Dependent variable: | ||
h5_index | ||
(1) | (2) | |
issues | 1.913*** | 1.424*** |
(0.227) | (0.212) | |
english1 | 17.262*** | |
(1.244) | ||
Constant | 17.415*** | 4.226*** |
(1.137) | (1.415) | |
Observations | 1,091 | 1,091 |
R2 | 0.061 | 0.202 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
stargazer: Regression Tables
Change variable labels
```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2),
          type = "latex",
          header = FALSE,
          title = "Regression Results",
          keep.stat = c("n", "rsq"),
          dep.var.labels = "H5 Index",
          covariate.labels = c("Issues", "English"))
```
Dependent variable: | ||
H5 Index | ||
(1) | (2) | |
Issues | 1.913*** | 1.424*** |
(0.227) | (0.212) | |
English | 17.262*** | |
(1.244) | ||
Constant | 17.415*** | 4.226*** |
(1.137) | (1.415) | |
Observations | 1,091 | 1,091 |
R2 | 0.061 | 0.202 |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
stargazer: Regression Tables
Change significance levels
```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2),
          type = "latex",
          header = FALSE,
          title = "Regression Results",
          keep.stat = c("n", "rsq"),
          dep.var.labels = "H5 Index",
          covariate.labels = c("Issues", "English"),
          star.cutoffs = c(0.05, 0.01, 0.001))
```
Dependent variable: | ||
H5 Index | ||
(1) | (2) | |
Issues | 1.913*** | 1.424*** |
(0.227) | (0.212) | |
English | 17.262*** | |
(1.244) | ||
Constant | 17.415*** | 4.226** |
(1.137) | (1.415) | |
Observations | 1,091 | 1,091 |
R2 | 0.061 | 0.202 |
Note: | *p<0.05; **p<0.01; ***p<0.001 |
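The cutoffs explain why the constant in Model 2 drops from three stars to two between the last two tables. A minimal sketch of the rule stated in the table notes; the helper stars() is hypothetical, and the p-value is illustrative rather than taken from the models.

```r
# Hypothetical helper: one star for every cutoff that the p-value is
# below, with cutoffs given in decreasing order as in star.cutoffs.
# Works on a single p-value at a time.
stars <- function(p, cutoffs = c(0.1, 0.05, 0.01)) {
  strrep("*", sum(p < cutoffs))
}

p_example <- 0.003   # illustrative p-value, between 0.001 and 0.01

stars(p_example)                                   # "***"
stars(p_example, cutoffs = c(0.05, 0.01, 0.001))   # "**"
```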
28) Regression Tables
reproduce_this.pdf: table 3 on page 10
journals.Rmd: table 3, between paragraphs 30 and 31
Not everything is possible to achieve with R Markdown syntax, code chunks, and/or code
Workarounds are available through the inclusion of other languages and/or syntaxes in .Rmd documents
There is no exhaustive list of gaps or workarounds
How can we cross-reference figures, tables, and equations in R Markdown?
Insert a LaTeX label into the targets (figures, tables, and equations), and then use the \autoref{latex_label} syntax in text
For figures, insert a LaTeX label into the fig.cap option, and use the \autoref{latex_label} syntax in text
\autoref{scatter_plot} visualises the relationship between the two journal metrics.
```{r ... fig.cap = "A Scatter Plot \\label{scatter_plot}"}
ggplot(data = df) + geom_point(...
```
Figure 1 visualises the relationship between the two journal metrics.
For Markdown tables, insert a LaTeX label after the table caption, and use the \autoref{latex_label} syntax in text
See \autoref{handmade_table} for further details.

: A hand-made table with R Markdown \label{handmade_table}

+--------------------+--------------------+
| Left-Aligned       | Centered           |
...
See Table 1 for further details.
Note that there is a difference in the label syntax for figures and R Markdown tables
we use a double backslash \\ to label figures
i.e., \\label{scatter_plot}, because the label goes into a string
the first is an escape operator for the second, LaTeX backslash
we use a single backslash \ to label R Markdown tables
i.e., \label{handmade_table}, because the label is not in any string
there is no need for the escape operator
29) Referring to Figures
reproduce_this.pdf: pages 6 and 9
journals.Rmd: paragraphs 19, 21, and 27
30) Referring to Markdown Tables
reproduce_this.pdf: page 4
journals.Rmd: paragraph 11
For stargazer tables, define a label with the label argument, and use the \autoref{latex_label} syntax in text
```{r, regression_table, echo=FALSE, results="asis"}
stargazer(data = list(lm1, lm2), type = "latex", ... label = "regression_results")
```
\autoref{regression_results} provides results from two OLS models.
Table 1 provides results from two OLS models.
Dependent variable: | ||
H5 Index | ||
(1) | (2) | |
Issues | 1.913*** | 1.424*** |
(0.227) | (0.212) | |
English | 17.262*** | |
(1.244) | ||
Constant | 17.415*** | 4.226** |
(1.137) | (1.415) | |
Observations | 1,091 | 1,091 |
R2 | 0.061 | 0.202 |
Note: | *p<0.05; **p<0.01; ***p<0.001 |
Note that we can cross-reference specific results in tables as well
In Model 1, the coefficient for _Issues_ is `r round(coef(summary(lm1))["issues", "Estimate"], digits = 2)`.
In Model 1, the coefficient for Issues is 1.91.
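The same indexing reaches the other cells of the coefficient table. A sketch with toy data, assuming a small model named lm_toy in place of lm1; the values are the first six rows printed earlier, so the estimates differ from those in the slides.

```r
# Toy data; on the slides, lm1 is estimated on the full journals dataset.
toy <- data.frame(
  h5_index = c(73, 72, 72, 72, 70, 69),
  issues   = c(7, 6, 8, 8, 5, 5)
)
lm_toy <- lm(h5_index ~ issues, data = toy)

# coef(summary(model)) is a matrix with one row per term and the columns
# "Estimate", "Std. Error", "t value", and "Pr(>|t|)".
results <- coef(summary(lm_toy))

round(results["issues", "Estimate"], digits = 2)
round(results["issues", "Std. Error"], digits = 2)

# Model-level statistics can be reported inline in the same way.
round(summary(lm_toy)$r.squared, digits = 3)
nobs(lm_toy)
```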
For equations, insert a LaTeX label in an equation environment, and use the \autoref{latex_label} syntax in text
\begin{equation}
\label{special_relativity}
E = mc^{2}
\end{equation}
According to \autoref{special_relativity}, space and time are linked.
According to Equation 1, space and time are linked.
31) Referring to Tables
reproduce_this.pdf: pages 7 and 9
journals.Rmd: paragraphs 23 and 29
32) Referring to Results in Regression Tables
reproduce_this.pdf: page 9
journals.Rmd: paragraph 29
Std. Error
33) Referring to Equations
reproduce_this.pdf: page 7
journals.Rmd: paragraph 22
R Markdown adds the list of references to the end of documents. This might be undesirable for some manuscripts, for example those with an appendix. Similarly, some journals require tables and figures to be added after references.
Define where exactly the list of references should appear with the HTML code <div id="refs"></div>
# References
<div id="refs"></div>
# Appendix
R Markdown produces outputs with single-line-spaced text while we might prefer or be required (e.g., by journal submission rules) to double-space our manuscripts.
Use the doublespacing command* from the LaTeX package setspace (Carlisle, Fairbairns, Harris, and Tobin, 2011), under header-includes
---
...
header-includes:
  - \usepackage{setspace}\doublespacing
---
* This can be reversed anywhere in text, with the singlespacing command.
34) Line Spacing
onehalfspacing
Pages, tables, figures etc. are numbered continuously across an output. We might prefer or be required (e.g., by journal submission rules) to change this behaviour, for example for appendices.
Use the setcounter command in combination with the renewcommand command, outside code chunks
\setcounter{page}{1}
\renewcommand*{\thepage}{A\arabic{page}}
\setcounter{table}{0}
\renewcommand*{\thetable}{A\arabic{table}}
\setcounter{figure}{0}
\renewcommand*{\thefigure}{A\arabic{figure}}
Research papers have many versions before publication*
* They are also often written by multiple authors and/or on different computers, increasing the number of versions created. Here I assume projects are single-authored on a single computer, leaving the topic of collaboration (including with oneself) to the next section, Part 9.
With many versions created over time, there emerge at least two challenges
We all version control, in different ways, such as
Typically, hand-made attempts to version control lead to cluttered folders
manuscript
|
|- journals_FINAL_19May.Rmd
|- journals_FINAL.Rmd
|- journals_26APRIL_newliterature.Rmd
...
|- journals.Rproj
|- references.bib
|- apa_7th.csl
* For projects that are single-authored on a single computer, merging is typically automatic. It becomes an issue for collaborated projects, which we will cover in the next section — Part 9.
Version control with Git and GitHub requires
initial setup, done once*
project setup, repeated for every RStudio project
* We have started this process already, in Part 1 of the workshop, by downloading and installing Git and signing up for GitHub.
1) Enable version control with RStudio
Tools -> Global Options -> Git/SVN -> Enable version control interface for RStudio projects
RStudio will likely find Git automatically; if not, locate it with Browse...
Git is likely to be at
c:/Program Files/Git/bin/git.exe on Windows
/usr/local/git/bin/git on Mac
2) If you are using Windows, set Git Bash as your shell
Tools -> Global Options -> Terminal -> New terminals open with: Git Bash
3) Introduce yourself to Git
Tools -> Terminal -> New Terminal
git config --global user.name "YOUR-NAME"
git config --global user.email "YOUR-EMAIL-ADDRESS"
git config --global --list
1) Initiate local version control with Git*
Tools -> Version Control -> Project Setup... -> Version Control System -> Git
after confirming your new repository, and restarting the session, observe that
there is now a Git tab in RStudio
your project now includes a .gitignore file
* These instructions presume there is an existing RStudio project to be set up for version control. If not, create an RStudio project first.
2) Create a new GitHub repository
Repositories -> New -> Repository name (e.g., "rwd_workshop") -> Public -> Create repository
observe the structure of the repository address
in Terminal, the address gets the .git extension
3) Push an existing repository
Tools -> Terminal -> New Terminal
enter the following lines in Terminal, with your username and repository name
git remote add origin https://github.com/USER_NAME/REPOSITORY_NAME.git
git add .
git commit -m "first commit"
git push -u origin master
if this is your first time using GitHub with RStudio, you will be prompted to authenticate
observe that your project files are now online, listed on the GitHub repository
1) Edit and Save
journals.Rmd
, and save itM
, for modified, as Status
1) Edit and Save
journals.Rmd
, and save itM
, for modified, as Status
2) Commit and Push
Staged
* for one or more files that you would like to commitCommit message
that summarises the editsCommit
to create a record of the new version locally to your computerClose -> Push
to push the version to GitHub* To stage is to add files to be committed. It allows us to commit files individually or together with other files.
1) Edit and Save
journals.Rmd
, and save itM
, for modified, as Status
2) Commit and Push
Staged
for one or more files that you would like to commitCommit message
that summarises the editsCommit
to create a record of the new version locally to your computerClose -> Push
to push the version to GitHub.gitignore
.gitignore
specifies which file(s) and/or folder(s) should be excluded from version control
.gitignore
file.gitignore
.gitignore
specifies which file(s) and/or folder(s) should be excluded from version control
.gitignore
file.gitignore
lists one item per line
.gitignore
.gitignore
specifies which file(s) and/or folder(s) should be excluded from version control
.gitignore
file.gitignore
lists one item per line
See the documentation at https://git-scm.com/docs/gitignore
.gitignore
There are good reasons to ignore some others, including files
journals.pdf
, as opposed to journals.Rmd.gitignore
.gitignore
has a list of project-specific files.Rproj.user.Rhistory.RData.Ruserdata
.gitignore
Observe that, by default, .gitignore
has a list of project-specific files
In addition, you can ignore, for example,
.Rproj.user.Rhistory.RData.Ruserdata/manuscript/
.gitignore
Observe that, by default, .gitignore
has a list of project-specific files
In addition, you can ignore, for example,
a specific folder, relative to the root directory
a specific file in a specific folder, relative to the root directory
.Rproj.user.Rhistory.RData.Ruserdata/manuscript//manuscript/journals.pdf
.gitignore
Observe that, by default, .gitignore
has a list of project-specific files
In addition, you can ignore, for example,
a specific folder, relative to the root directory
a specific file in a specific folder, relative to the root directory
a specific file in any folder
.Rproj.user.Rhistory.RData.Ruserdata/manuscript//manuscript/journals.pdfjournals.pdf
.gitignore
Observe that, by default, .gitignore has a list of project-specific files:
.Rproj.user
.Rhistory
.RData
.Ruserdata
In addition, you can ignore, for example,
a specific folder, relative to the root directory: /manuscript/
a specific file in a specific folder, relative to the root directory: /manuscript/journals.pdf
a specific file in any folder: journals.pdf
all files with a specific extension, anywhere in the project: *.pdf
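The four pattern types above can be tried out in the Terminal before they go into a real project. The following is a minimal sketch, assuming Git is installed; it builds a throwaway repository in a temporary folder, and the drafts folder and figure.pdf file are hypothetical names used only for illustration. It relies on git check-ignore, which reports whether a path matches the current .gitignore.

```shell
#!/bin/sh
# Sketch: test .gitignore patterns in a throwaway repository.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/project"
cd "$workdir/project"

# Hypothetical project layout, for illustration only.
mkdir manuscript drafts
touch manuscript/journals.Rmd manuscript/journals.pdf
touch drafts/journals.pdf drafts/figure.pdf

# Print "ignored" or "kept" for a path under the current .gitignore.
check() {
  if git check-ignore -q "$1"; then echo "ignored"; else echo "kept"; fi
}

# A specific file in a specific folder, relative to the root directory.
echo '/manuscript/journals.pdf' > .gitignore
file_in_folder=$(check manuscript/journals.pdf)
same_name_elsewhere=$(check drafts/journals.pdf)

# A specific file in any folder.
echo 'journals.pdf' > .gitignore
file_anywhere=$(check drafts/journals.pdf)
other_pdf=$(check drafts/figure.pdf)

# All files with a specific extension, anywhere in the project.
echo '*.pdf' > .gitignore
any_pdf=$(check drafts/figure.pdf)
source_file=$(check manuscript/journals.Rmd)

# A specific folder, relative to the root directory.
echo '/manuscript/' > .gitignore
folder_content=$(check manuscript/journals.Rmd)

echo "/manuscript/journals.pdf -> manuscript copy: $file_in_folder, drafts copy: $same_name_elsewhere"
echo "journals.pdf -> drafts copy: $file_anywhere, figure.pdf: $other_pdf"
echo "*.pdf -> figure.pdf: $any_pdf, journals.Rmd: $source_file"
echo "/manuscript/ -> journals.Rmd: $folder_content"
```

Each step overwrites .gitignore with a single pattern, so the effect of every pattern can be seen in isolation; in a real project the patterns would sit together, one per line.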
.gitignore: Notes
There are many other pattern formats
Starting to ignore a file or folder that is already being tracked requires clearing the cache
after adding the file or folder to .gitignore, enter the following line in the Terminal, replacing /path/to/file with the actual path
git rm --cached /path/to/file
The following command clears all cache, which helps after a change to .gitignore that involves several files or folders
git rm -r --cached .
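To see why clearing the cache is needed, the sequence can be rehearsed in a throwaway repository. This is a minimal sketch, assuming Git is installed; the user name, e-mail address, and file contents are placeholders. It checks with git ls-files that the file is still tracked after being listed in .gitignore, and that it is no longer tracked once removed from the cache.

```shell
#!/bin/sh
# Sketch: stop tracking a file that was committed before it was ignored.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
git init -q "$workdir/project"
cd "$workdir/project"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Both files are committed, so both are tracked.
echo "source" > journals.Rmd
echo "output" > journals.pdf
git add journals.Rmd journals.pdf
git commit -q -m "Track everything"

# Listing the file in .gitignore alone does not stop the tracking.
echo 'journals.pdf' > .gitignore
tracked_before=$(git ls-files journals.pdf)

# Remove the file from the cache (the index), then commit the change.
git rm -q --cached journals.pdf
git add .gitignore
git commit -q -m "Stop tracking journals.pdf"
tracked_after=$(git ls-files journals.pdf)

echo "Tracked before: ${tracked_before:-none}"
echo "Tracked after: ${tracked_after:-none}"
ls journals.pdf   # the file itself stays on the computer
```

The --cached flag is what keeps the file on disk; only the record in the index is removed. After git rm -r --cached . the same pattern applies, followed by git add . to stage again everything that is not ignored.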
35) Reproducibility and Version Control
filter function in the data chunk
36) Gitignore
Add journals.pdf to .gitignore
Remove journals.pdf from cache
05:00
Many research papers are written by multiple authors and/or on multiple computers
With multiple authors and/or computers, at least two additional challenges emerge beyond version control
We all manage collaboration, in different ways, such as
To pull
To merge
Merge conflict
Branch
Pull request
1) The setup for the owner is largely the same as in any single-author, single-computer scenario
2) As an additional step, the owner needs to invite their collaborator(s) to the project
Settings -> Manage access -> Invite a collaborator
1) Notice that the remote part of the setup is done by the owner for the collaborator
2) The local part of the setup still needs to be done
File -> New Project -> Version Control -> Git
The Repository URL, required for the above process, is the version without the .git extension
37) Owner Setup
File -> New File -> R Markdown -> OK to create the template
38) Invitation to Collaborate
39) Collaborator Setup
10:00
1) Pull
Pull to move the up-to-date records from GitHub to your computer
if there is nothing new to pull, the message reads Already up-to-date.
2) Edit and save; commit and push
40) Non-simultaneous Collaboration
take turns with your partner to work on the same document (of the same project)
owner: edit the first header in the document (i.e., "R Markdown"), save, commit, and push
owner and collaborator: observe the changes, if any, on your own .Rmd file, and/or on your GitHub repository
collaborator: pull, revert the header back to original, save, commit, and push
05:00
Notice that you have not encountered any errors and/or merge conflicts
because everyone edited and merged with an up-to-date document
this is the default in a single-author, multiple-computer scenario
41) Simultaneous Collaboration — Different Lines
10:00
Notice that you have encountered an error
pulling before pushing solves the problem because the edits are not on the same line
the merge takes place automatically, on the local repository of the last pusher
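The same scenario can be rehearsed in the Terminal without a partner. This is a minimal sketch, assuming Git is installed: a local bare repository stands in for GitHub, the owner and collaborator folders stand in for two computers, and the names, e-mail addresses, and file contents are placeholders. The --no-rebase and --no-edit flags ask for a plain merge with the default merge message.

```shell
#!/bin/sh
# Sketch: two clones edit different lines at the same time; pull, then push.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Owner starts the project; the branch name "main" is set explicitly.
git init -q owner
git -C owner symbolic-ref HEAD refs/heads/main
git -C owner config user.name "Owner"
git -C owner config user.email "owner@example.com"
printf '# R Markdown\n\nFirst paragraph\n\nSecond paragraph\n' > owner/journals.Rmd
git -C owner add journals.Rmd
git -C owner commit -q -m "Add manuscript"

# A local bare repository stands in for GitHub.
git clone -q --bare owner remote.git
git -C owner remote add origin "$workdir/remote.git"
git clone -q remote.git collaborator
git -C collaborator config user.name "Collaborator"
git -C collaborator config user.email "collaborator@example.com"

# Owner edits the header and pushes first.
printf '# Owner header\n\nFirst paragraph\n\nSecond paragraph\n' > owner/journals.Rmd
git -C owner commit -q -am "Edit header"
git -C owner push -q origin main

# Collaborator edits the last paragraph without pulling first.
printf '# R Markdown\n\nFirst paragraph\n\nSecond paragraph, revised\n' > collaborator/journals.Rmd
git -C collaborator commit -q -am "Revise paragraph"

# The push is rejected: the remote has a version that this clone lacks.
if git -C collaborator push -q origin main 2>/dev/null; then
  first_push="accepted"
else
  first_push="rejected"
fi

# Pulling merges the two edits automatically; the push then succeeds.
git -C collaborator pull -q --no-rebase --no-edit origin main
git -C collaborator push -q origin main
merged=$(cat collaborator/journals.Rmd)

echo "First push: $first_push"
echo "$merged"
```

The merged file carries the owner's header and the collaborator's paragraph, because the two edits touch lines that are far enough apart for Git to combine them on its own.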
42) Simultaneous Collaboration — Same Line
work on the same document at the same time
owner: edit the first header in the document again, save, commit, and push
collaborator: edit the first header in the document as well, save, commit, and push
observe the error message that the last pusher will receive
10:00
Notice that you have encountered not only an error but also a merge conflict
pulling before pushing alone does not solve the problem because the edits are on the same line
nevertheless, by pulling first, you can view the conflict directly on the file
the merge takes place on the local repository of the last pusher
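What the conflict looks like on the file can be previewed in the Terminal. This is a minimal sketch under the same assumptions as any local rehearsal: Git is installed, a local bare repository stands in for GitHub, and the names, e-mail addresses, and header texts are placeholders. The resolution shown here, writing an agreed header by hand, is only one way to settle a conflict.

```shell
#!/bin/sh
# Sketch: two clones edit the same line; the pull stops with a merge conflict.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

git init -q owner
git -C owner symbolic-ref HEAD refs/heads/main
git -C owner config user.name "Owner"
git -C owner config user.email "owner@example.com"
printf '# R Markdown\n\nFirst paragraph\n' > owner/journals.Rmd
git -C owner add journals.Rmd
git -C owner commit -q -m "Add manuscript"

# A local bare repository stands in for GitHub.
git clone -q --bare owner remote.git
git -C owner remote add origin "$workdir/remote.git"
git clone -q remote.git collaborator
git -C collaborator config user.name "Collaborator"
git -C collaborator config user.email "collaborator@example.com"

# Both edit the first header; the owner pushes first.
printf '# Owner header\n\nFirst paragraph\n' > owner/journals.Rmd
git -C owner commit -q -am "Owner edits header"
git -C owner push -q origin main
printf '# Collaborator header\n\nFirst paragraph\n' > collaborator/journals.Rmd
git -C collaborator commit -q -am "Collaborator edits header"

# Pulling cannot merge automatically this time.
if git -C collaborator pull -q --no-rebase --no-edit origin main >/dev/null 2>&1; then
  pull_result="merged"
else
  pull_result="conflict"
fi

# The conflict is now visible on the file, between the markers.
conflicted=$(cat collaborator/journals.Rmd)
echo "Pull result: $pull_result"
echo "$conflicted"

# Resolve by hand: keep one agreed version, then add, commit, and push.
printf '# Agreed header\n\nFirst paragraph\n' > collaborator/journals.Rmd
git -C collaborator add journals.Rmd
git -C collaborator commit -q -m "Resolve header conflict"
git -C collaborator push -q origin main
```

Between the <<<<<<< and ======= markers is the local version; between ======= and >>>>>>> is the version pulled from the remote. The file must be free of these markers before it is committed.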
1) Branch
New Branch on the Git tab
2) Edit and save; commit and push
3) Pull request
On GitHub, click Pull requests -> New pull request
choose what is to be pulled, and write a note to your collaborator, who can accept or reject the merge
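Steps 1 and 2 have Terminal equivalents, sketched below under stated assumptions: Git is installed, a local bare repository stands in for GitHub, and the branch name revise-header, the identity, and the file contents are hypothetical. Step 3, the pull request itself, happens on the GitHub website and is not reproduced here.

```shell
#!/bin/sh
# Sketch: create a branch, commit on it, and push it next to main.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

git init -q --bare remote.git          # stands in for GitHub
git init -q project
cd project
git symbolic-ref HEAD refs/heads/main
git config user.name "Collaborator"    # placeholder identity
git config user.email "collaborator@example.com"
git remote add origin "$workdir/remote.git"
printf '# R Markdown\n' > journals.Rmd
git add journals.Rmd
git commit -q -m "Add manuscript"
git push -q -u origin main

# 1) Branch: the Terminal counterpart of New Branch on the Git tab.
git checkout -q -b revise-header

# 2) Edit and save; commit and push, this time to the new branch.
printf '# Revised header\n' > journals.Rmd
git commit -q -am "Revise header"
git push -q -u origin revise-header

# The remote now holds both versions, side by side.
remote_branches=$(git -C "$workdir/remote.git" for-each-ref --format='%(refname:short)' refs/heads)
main_header=$(git -C "$workdir/remote.git" show main:journals.Rmd)
branch_header=$(git -C "$workdir/remote.git" show revise-header:journals.Rmd)

echo "$remote_branches"
echo "main: $main_header"
echo "revise-header: $branch_header"
```

The main branch stays untouched until the merge is accepted, which is what gives the collaborator the chance to review the edit first.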
43) Pull request
44) Merging
10:00
It is possible to edit .Rmd documents directly on GitHub
using the Edit this file button
A GitHub account is enough for collaboration with co-authors who do not work with Git, R, or RStudio
45) GitHub edit
Edit the .Rmd document in your collaboration project directly on GitHub
05:00
Consider converting a real project to R Markdown
Choose an existing project, preferably
Ask me for help
Allaire, J., Y. Xie, J. McPherson, et al. (2022). rmarkdown: Dynamic Documents for R. R package version 2.14. <https://CRAN.R-project.org/package=rmarkdown>.
Blair, G., J. Cooper, A. Coppock, et al. (2022). fabricatr: Imagine Your Data Before You Collect It. R package version 0.16.0. <https://CRAN.R-project.org/package=fabricatr>.
Carlisle, D., R. Fairbairns, E. Harris, et al. (2011). setspace: Set space between lines. LaTeX package, version 6.7a. <https://ctan.org/pkg/setspace>.
Dowle, M. and A. Srinivasan (2021). data.table: Extension of data.frame. R package version 1.14.2. <https://CRAN.R-project.org/package=data.table>.
Gagolewski, M., B. Tartanus, et al. (2021). stringi: Character String Processing Facilities. R package version 1.7.6. <https://CRAN.R-project.org/package=stringi>.
Hlavac, M. (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.3. <https://CRAN.R-project.org/package=stargazer>.
Hugh-Jones, D. (2021). huxtable: Easily Create and Style Tables for LaTeX, HTML and Other Formats. R package version 5.4.0. <https://hughjonesd.github.io/huxtable/>.
R Core Team (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria. <https://www.R-project.org/>.
Sievert, C., C. Parmer, T. Hocking, et al. (2021). plotly: Create Interactive Web Graphics via plotly.js. R package version 4.10.0. <https://CRAN.R-project.org/package=plotly>.
Wickham, H., R. François, L. Henry, et al. (2022). dplyr: A Grammar of Data Manipulation. R package version 1.0.9. <https://CRAN.R-project.org/package=dplyr>.
Wickham, H. and G. Grolemund (2021). R for data science. O'Reilly.
Xie, Y. (2022a). bookdown: Authoring Books and Technical Documents with R Markdown. R package version 0.26. <https://CRAN.R-project.org/package=bookdown>.
Xie, Y. (2022b). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39. <https://yihui.org/knitr/>.
Xie, Y. (2022c). tinytex: Helper Functions to Install and Maintain TeX Live, and Compile LaTeX Documents. R package version 0.39. <https://github.com/rstudio/tinytex>.
Xie, Y., J. Allaire, and G. Grolemund (2018). R Markdown: The Definitive Guide. ISBN 9781138359338. Boca Raton, Florida: Chapman and Hall/CRC. <https://bookdown.org/yihui/rmarkdown>.
Xie, Y., C. Dervieux, and A. Presmanes Hill (2022). blogdown: Create Blogs and Websites with R Markdown. R package version 1.10. <https://CRAN.R-project.org/package=blogdown>.
Zhu, H. (2021). kableExtra: Construct Complex Table with kable and Pipe Syntax. R package version 1.3.4. <https://CRAN.R-project.org/package=kableExtra>.