name: title-slide class: inverse, center, middle <style type="text/css"> .hljs-github .hljs { background: #e5e5e5; } .inline-c, remark-inline-code { background: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace; } .yellow-h{ background: #ffff88; } .out-t, remark-inline-code { background: #9fff9f; border-radius: 3px; padding: 4px; } .pull-left-c { float: left; width: 58%; } .pull-right-c { float: right; width: 38%; } .medium { font-size: 75% } .small { font-size: 50% } .action { background-color: #f2eecb; } </style> # Writing Reproducible Research Papers with R Markdown <br> ### Resul Umit ### 24 May 2022 .footnote[ [Skip intro — To the contents slide](#contents-slide). <a href="mailto:resuluy@uio.no?subject=R Markdown workshop">I can teach this workshop at your institution — Email me</a>. ] --- ## Who am I? Resul Umit - post-doctoral researcher in political science at the University of Oslo - teaching and studying representation, elections, and parliaments - [a recent publication](https://doi.org/10.1017/psrm.2021.30): the effects of casualties in terror attacks on elections -- <br> - teaching workshops, also on - [version control and collaboration](https://resulumit.com/teaching/git_workshop.html) - [automated web scraping ](https://resulumit.com/teaching/scrp_workshop.html) - [working with Twitter data](https://resulumit.com/teaching/twtr_workshop.html) - [creating academic websites](https://resulumit.com/teaching/rbd_workshop.html) -- <br> - more information available at [resulumit.com](https://resulumit.com/) --- ## How did I use to write? First, with .yellow-h[Stata + Word], I was ... - frustrated with Word - formatting tables, figures, citations, and equations - managing references - tired of switching between programmes/screens - and, worried about making mistakes in between - paying for programme licences --- ## How did I use to write? Then, with .yellow-h[Stata + R + LaTeX], I was ... 
- ~~frustrated with Word~~ - ~~formatting tables, figures, citations, and equations~~ - ~~managing references~~ - tired of switching between programmes/screens - and, worried about making mistakes in between - paying for the Stata licence - converting PDF documents to Word manually - coordinating work with co-authors who don't use LaTeX/PDF - submitting to journals which don't accept LaTeX/PDF --- ## How do I write now? Now, with .yellow-h[R Markdown], I am ... happy! - ~~frustrated with Word~~ - ~~formatting tables, figures, citations, and equations~~ - ~~managing references~~ - ~~tired of switching between programmes/screens~~ - ~~and, worried about making mistakes in between~~ - ~~paying for the Stata licence~~ - ~~converting PDF documents to Word, manually~~ - ~~coordinating work with co-authors who don't use LaTeX/PDF~~ - ~~submitting to journals which don't accept LaTeX/PDF~~ --- ## R Markdown - Efficient - write text, cite sources, tidy data, analyse, table, and plot it in one programme/screen - re-do one, more, or all of these with ease - decrease the possibility of making mistakes in the process -- - Flexible - output to various formats - e.g., HTML, LaTeX, PDF, Word -- - Open access/source - use for free - create documents accessible to anyone with a computer and internet connection - benefit from the work of a great community of users/developers --- ## Reproducibility — Before Publication - Having written a complete draft - with data including re-coded variables, tables, figures, and text with references to specific results (e.g., numbers from summary and/or regression statistics) -- - If you and/or your co-authors decide - to reverse a re-coded variable to its previous/original measure - and/or, to exclude a subgroup of observations from analysis -- - How resource intensive would this revision be? - how long would this revision take? - how many programmes would be needed for this revision, and how much would they cost? 
- there is an inverse relationship between this resource intensity and reproducibility --- ## Reproducibility — After Publication - After your paper is published, if others, including your future self, would like to test how robust the results are - to reversing a re-coded variable to its previous/original measure - and/or, to excluding a subgroup of observations from analysis -- - How resource intensive would this test be? - how accessible is the data, documentation (how was the variable re-coded in the first place?), and the code? - how long would the test take? - how many programmes would be needed for this test, and how much would they cost? - there is an inverse relationship between this resource intensity and reproducibility --- ## The Workshop — Overview - Two days, on how to write reproducible research papers with R Markdown - 200+ slides, 40+ exercises, and time for converting a real project -- - Based on converting a mock manuscript written in Word to R Markdown - plus, improving its reproducibility and version-controlling it - with a PDF output in mind -- - Designed for researchers with basic knowledge of the R programming language - does not cover programming with R - e.g., writing functions <br> - ability to regress, plot, and table in R will be very helpful - but not absolutely necessary — these skills can be developed after learning R Markdown as well --- ## The Workshop — Contents .pull-left[ [Part 1. Getting the Tools Ready](#part1) - e.g., downloading course material [Part 2. Introducing R Markdown](#part2) - e.g., creating a new document [Part 3. Setting Metadata](#part3) - e.g., defining output format [Part 4. Writing Text](#part4) - e.g., adding emphasis to text [Part 5. Managing References](#part5) - e.g., citing sources ] -- name: contents-slide .pull-right[ [Part 6. Adding Code, Figures, and Tables](#part6) - e.g., plotting data [Part 7. Addressing Functionality Gaps](#part7) - e.g., adjusting line spacing [Part 8. 
Using Version Control](#part8) - e.g., integrating Git and GitHub [Part 9. Collaborating with Others](#part9) - e.g., working simultaneously with co-authors [Part 10. Working on a Real Project](#part10) - e.g., converting a work-in-progress of yours ] --- ## The Workshop — Organisation - Sit in groups of two - participants learn as much from their partner as from instructors - switch partners after every second part - Type, rather than copy and paste, the code that you will find on these slides - typing is a part of the learning process - When you have a question - ask your partner - google together - ask me --- class: action ## The Workshop — Organisation — Slides Slides with this background colour indicate that your action is required, for - setting the workshop up - e.g., downloading course material - completing the exercises - e.g., managing references in R Markdown - there are 40+ exercises - these slides have countdown timers
--- ## The Workshop — Organisation — Slides - Codes and texts that go in R Markdown documents .inline-c[appear as such — in a different font, on gray background] - long codes and texts will have their own line(s) ````md ```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() + facet_wrap(. ~ branch) + scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) ``` ```` --- ## The Workshop — Organisation — Slides - Codes and texts that go in R Markdown documents .inline-c[appear as such — in a different font, on gray background] - long codes and texts will have their own line(s) - Results that come out in output files .out-t[appear as such — in the same font, on green background] - except very obvious results, such as figures and tables -- - Specific sections are .yellow-h[highlighted yellow as such] for emphasis - these could be for anything — codes and texts in input, results in output, and/or texts on slides -- - The slides are designed for self-study as much as for the workshop - *accessible*, in substance and form, to go through on your own --- ## The Workshop — Aims - To make you aware of what is possible with R Markdown - we will cover a large breadth of issues; not all of it is for long-term memory - one reason why the slides are designed for self-study as well <br> - awareness of what is possible, `Google`, and perseverance are all we need -- - To encourage you to convert into R Markdown - practice with a mock manuscript (Parts 3–9) - start converting a real one (Part 10) --- name: part1 class: inverse, center, middle # Part 1. Getting the Tools Ready .footnote[ [Back to the contents slide](#contents-slide). 
] --- name: download-zip class: action ## Course Materials — Download from the Internet - Download the materials from <https://github.com/resulumit/rmd_workshop/tree/materials> - on the webpage, follow > `Code -> Download ZIP` <br> - Unzip and rename the folder - unzip to a location that is not synced - e.g., perhaps Documents, but not Dropbox - rename the folder as `YOURNAME-rmd` - e.g., `resul-rmd` - this will come in handy when we collaborate in [Part 9](#part9) --- ## Course Materials — Overview Notice that the folder has the following structure ``` YOURNAME-rmd | |- manuscript | | | |- reproduce_this.pdf | |- journals.Rmd | |- references.bib | |- apa_7th.csl | |- data | | | |- journals.csv | |- image | | | |- google_scholar.png ``` --- ## Course Materials — Contents - `manuscript\reproduce_this.pdf` - the document, formatted in Word but saved as PDF, that we will re-create with R Markdown - randomly generated sentences, with figures and tables from a randomly generated dataset<sup>*</sup> - key sections in need of attention are highlighted yellow .footnote[ <sup>*</sup> The text, _Lorem ipsum_, is generated with the `stringi` package <a name=cite-R-stringi></a>([Gagolewski, Tartanus, Unicode, Inc., and others, 2021](https://CRAN.R-project.org/package=stringi)) while the dataset is created with the `fabricatr` package <a name=cite-R-fabricatr></a>([Blair, Cooper, Coppock, Humphreys, Rudkin, and Fultz, 2022](https://CRAN.R-project.org/package=fabricatr)). 
] --- ## Course Materials — Contents - `manuscript\reproduce_this.pdf` - the document, formatted in Word but saved as PDF, that we will re-create with R Markdown - randomly generated sentences, with figures and tables from a randomly generated dataset - key sections in need of attention are highlighted - `manuscript\journals.Rmd` - the R Markdown document that we will work on - includes unformatted text from `reproduce_this.pdf` to save time - major components, such as paragraphs and tables, are numbered and marked in comments to facilitate navigation -- - `manuscript\references.bib` - a BibTeX document with three fabricated references -- - `manuscript\apa_7th.csl` - a Citation Style Language document, with APA (7th Edition) referencing style ([Wiernik, 2020](https://www.zotero.org/styles/apa-no-ampersand)) --- ## Course Materials — Contents `data\journals.csv` - a dataset created with the `fabricatr` package ([Blair, Cooper, Coppock, et al., 2022](https://CRAN.R-project.org/package=fabricatr)), imagined to explore the `Google Scholar` rankings of fictitious journals - includes the following variables - **name**: journals (1090 random titles) - **origin**: geographic origins (five continents) - **branch**: major discipline of journals (four branches) - **since**: time of first publication (years) - **h5_index**: H5 Index (integers) - **h5_median**: H5 Median (integers) - **english**: English (1) *vs.* other-language (0) journals - **subfield**: subfield (1) *vs.* generalist (0) journals - **issues**: number of issues published per year (integers) --- ## Course Materials — Contents - `image\google_scholar.png` - a screenshot image of the [Google Scholar homepage](https://scholar.google.ch/) --- class: action name: install-git ## Git — Download from the Internet and Install - For Windows, install 'Git for Windows', downloading from [https://gitforwindows.org](https://gitforwindows.org/) - select 'Git from the command line and also from 3rd-party software' - For Mac, 
install 'Git', downloading from [https://git-scm.com/downloads](https://git-scm.com/downloads) --- class: action ## GitHub — Open an Account Sign up for GitHub at [https://github.com](https://github.com/) - registering an account is free - usernames are public - either choose an anonymous username (e.g., `asdf029348`) - or choose one carefully — it becomes a part of users' online presence - usernames can be changed later --- class: action ## R and RStudio — Download from the Internet and Install - Download R from [https://cloud.r-project.org](https://cloud.r-project.org) - choose the version for your operating system - Download RStudio from [https://rstudio.com/products/rstudio/download](https://rstudio.com/products/rstudio/download) - choose the free version --- class: action name: rstudio-project ## RStudio Project — Create from within RStudio - RStudio allows for dividing your work with R into separate projects, each with its own history etc. - [this page](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects) has more information on why projects are recommended <br> - Create a new RStudio project for the existing<sup>*</sup> workshop directory `...\YOURNAME-rmd` from the RStudio menu: > `File -> New Project -> Existing Directory -> Browse -> ...\YOURNAME-rmd -> Open` .footnote[ <sup>*</sup> Recall that we downloaded this earlier from GitHub. [Back to the relevant slide](#download-zip). 
] --- ## RStudio — R Markdown Options .pull-left-c[ RStudio offers various functions that facilitate working with .Rmd documents, which can be controlled at two locations: - global settings that apply to all markdown projects, located at: > `Tools -> Global Options -> R Markdown` ] .pull-right-c[ ![](rmd_workshop_files/images_data/window_global_options.png) ] --- ## RStudio — R Markdown Options .pull-left-c[ RStudio offers various functions that facilitate working with .Rmd documents, which can be controlled at two<sup>*</sup> locations: - global settings that apply to all markdown projects, located at: > `Tools -> Global Options -> R Markdown` - project settings that apply to a given markdown project, located at: > `Tools -> Project Options -> R Markdown` ] .pull-right-c[ ![](rmd_workshop_files/images_data/window_project_options.png) ] .footnote[ <sup>*</sup> Some settings become available on the document toolbar as well, only when an .Rmd document is open. We will cover the [document toolbar](#toolbar) later on in the workshop. All settings can stay as they are — for now. 
] --- class: action ## R Packages — Install from within RStudio ```r install.packages(c("rmarkdown", "tinytex", "dplyr", "stargazer", "ggplot2")) tinytex::install_tinytex() ``` --- class: action ## R Packages — Install from within RStudio ```r install.packages(c(`"rmarkdown"`, "tinytex", "dplyr", "stargazer", "ggplot2")) tinytex::install_tinytex() ``` - `rmarkdown` <a name=cite-R-rmarkdown></a>([Allaire, Xie, McPherson, Luraschi, Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2022](https://CRAN.R-project.org/package=rmarkdown)), for automating the process of converting R Markdown documents into other formats --- class: action ## R Packages — Install from within RStudio ```r install.packages(c("rmarkdown", `"tinytex"`, "dplyr", "stargazer", "ggplot2")) *tinytex::install_tinytex() ``` - `rmarkdown` ([Allaire, Xie, McPherson, et al., 2022](https://CRAN.R-project.org/package=rmarkdown)), for automating the process of converting R Markdown documents into other formats - `tinytex` <a name=cite-R-tinytex></a>([Xie, 2022c](https://github.com/rstudio/tinytex)), for PDF outputs - requires an additional step to install - alternative: a TeX/LaTeX system installed on your computer --- class: action ## R Packages — Install from within RStudio ```r install.packages(c("rmarkdown", "tinytex", `"dplyr"`, "stargazer", "ggplot2")) tinytex::install_tinytex() ``` - `dplyr` <a name=cite-R-dplyr></a>([Wickham, François, Henry, and Müller, 2022](https://CRAN.R-project.org/package=dplyr)), for data manipulation - popular alternative: e.g., `base` <a name=cite-R-base></a>([R Core Team, 2022](https://www.R-project.org/)), `data.table` <a name=cite-R-datatable></a>([Dowle and Srinivasan, 2021](https://CRAN.R-project.org/package=data.table)) --- class: action ## R Packages — Install from within RStudio ```r install.packages(c("rmarkdown", "tinytex", "dplyr", `"stargazer"`, "ggplot2")) tinytex::install_tinytex() ``` - `dplyr` ([Wickham, François, Henry, et al., 
2022](https://CRAN.R-project.org/package=dplyr)), for data manipulation - popular alternative: e.g., `base` ([R Core Team, 2022](https://www.R-project.org/)), `data.table` ([Dowle and Srinivasan, 2021](https://CRAN.R-project.org/package=data.table)) - `stargazer` <a name=cite-R-stargazer></a>([Hlavac, 2022](https://CRAN.R-project.org/package=stargazer)), for tables - popular alternatives: `knitr` <a name=cite-R-knitr></a>([Xie, 2022b](https://yihui.org/knitr/)), `kableExtra` <a name=cite-R-kableExtra></a>([Zhu, 2021](https://CRAN.R-project.org/package=kableExtra)), `huxtable` <a name=cite-R-huxtable></a>([Hugh-Jones, 2021](https://hughjonesd.github.io/huxtable/)) --- class: action ## R Packages — Install from within RStudio ```r install.packages(c("rmarkdown", "tinytex", "dplyr", "stargazer", `"ggplot2"`)) tinytex::install_tinytex() ``` - `dplyr` ([Wickham, François, Henry, et al., 2022](https://CRAN.R-project.org/package=dplyr)), for data manipulation - popular alternative: e.g., `base` ([R Core Team, 2022](https://www.R-project.org/)), `data.table` ([Dowle and Srinivasan, 2021](https://CRAN.R-project.org/package=data.table)) - `stargazer` ([Hlavac, 2022](https://CRAN.R-project.org/package=stargazer)), for tables - popular alternatives: `knitr` ([Xie, 2022b](https://yihui.org/knitr/)), `kableExtra` ([Zhu, 2021](https://CRAN.R-project.org/package=kableExtra)), `huxtable` ([Hugh-Jones, 2021](https://hughjonesd.github.io/huxtable/)) - `ggplot2` , for figures - popular alternatives: `base` ([R Core Team, 2022](https://www.R-project.org/)), `plotly` <a name=cite-R-plotly></a>([Sievert, Parmer, Hocking, Chamberlain, Ram, Corvellec, and Despouy, 2021](https://CRAN.R-project.org/package=plotly)) --- class: action ## R Markdown Cheat Sheet — Download from the Internet Downloading process can be initiated from within RStudio - follow from the RStudio menu > `Help -> Cheatsheets -> R Markdown Cheat Sheet` --- ## Other Resources<sup>*</sup> - Pandoc User's Guide - available 
at [https://pandoc.org/MANUAL.html](https://pandoc.org/MANUAL.html) <br> - R Markdown: The Definitive Guide <a name=cite-rmarkdown2018></a>([Xie, Allaire, and Grolemund, 2018](https://bookdown.org/yihui/rmarkdown)) - open access at [https://bookdown.org/yihui/rmarkdown](https://bookdown.org/yihui/rmarkdown/) <br> - R for Data Science <a name=cite-rfordatascience></a>([Wickham and Grolemund, 2021](#bib-rfordatascience)) - open access at [https://r4ds.had.co.nz](https://r4ds.had.co.nz/) .footnote[ <sup>*</sup> During the workshop, the [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf) is likely to be more helpful than these resources, which I recommend consulting after the workshop. ] --- name: part2 class: inverse, center, middle # Part 2. Introducing R Markdown .footnote[ [Back to the contents slide](#contents-slide). ] --- class: action ## R Markdown Document — Create from within RStudio - Create a new R Markdown document from the RStudio menu:<sup>*</sup> > `File -> New File -> R Markdown -> OK` - Save your new document:<sup>**</sup> > `File -> Save` - Observe that - the document has been saved to your working directory, and - it has the .Rmd extension .footnote[ <sup>*</sup> This is for demonstration purposes only. Otherwise, we will work with `journals.Rmd`, which you have already downloaded, to save time. <sup>**</sup> Alternatively, use the `Save` button or the keyboard shortcut (e.g., `Ctrl + S` on Windows). For shortcuts, follow `Tools -> Keyboard Shortcuts Help` or `Tools -> Modify Keyboard Shortcuts...`. 
] --- ## R Markdown Document — Components .pull-left-c[ Observe also that the document has three components - .yellow-h[YAML] ] .pull-right-c[ ![](rmd_workshop_files/images_data/yaml.png) ] --- ## R Markdown Document — Components .pull-left-c[ Observe also that the document has three components - YAML - .yellow-h[text] ] .pull-right-c[ ![](rmd_workshop_files/images_data/yaml.png) ![](rmd_workshop_files/images_data/text.png) ] --- ## R Markdown Document — Components .pull-left-c[ Observe also that the document has three components - YAML - text - .yellow-h[code chunks] ] .pull-right-c[ ![](rmd_workshop_files/images_data/yaml.png) ![](rmd_workshop_files/images_data/text.png) ![](rmd_workshop_files/images_data/chunk.png) ] --- name: toolbar ## R Markdown Document — Document Toolbar Observe also that the document toolbar offers extended tools for .Rmd documents ![](rmd_workshop_files/images_data/toolbar.png) <br> These include, most importantly, - the ![](rmd_workshop_files/images_data/button_knit.png) button to compile .Rmd documents --- class: action ## R Markdown Document — Compile .pull-left-c[ - Click the `Knit` button to compile your .Rmd document, and observe that - the output document has the same name as your .Rmd document - You may want to delete these newly created files, as we will work with `journals.Rmd` instead to save time. 
] .pull-right-c[ <img src="rmd_workshop_files/images_data/first_rmd.png" width="95%" /> ] --- ## R Markdown Document — Compilation Process - When you `Knit`, the following happens: > `.Rmd --knitr--> .md --pandoc--> output` - `knitr`<sup>*</sup> executes the code, if there is any, and converts the resulting document from .Rmd (R Markdown) into .md (Markdown) - `pandoc`<sup>**</sup> transforms the .md document into your preferred output format(s) - e.g., HTML, LaTeX, PDF, Word - This process is automated by the `rmarkdown` package .footnote[ <sup>*</sup> If you did not already have the `knitr` package, it was installed together with the `rmarkdown` package. <sup>**</sup> RStudio comes with a copy of `pandoc` ([http://pandoc.org](http://pandoc.org)), which is not an R package, so that you do not have to install it separately. ] --- ## R Markdown Document — Notes - Behind the scenes, each .Rmd file is compiled in its own session, and therefore - the code needs to stand alone, for reproducibility reasons - e.g., if you load a package in the Console, it will not be available to a given .Rmd file — even in the same R session -- - R Markdown can produce more than documents,<sup>*</sup> including - presentations, again with `rmarkdown` - books, with `bookdown` <a name=cite-R-bookdown></a>([Xie, 2022a](https://CRAN.R-project.org/package=bookdown)) - websites, with `blogdown` <a name=cite-R-blogdown></a>([Xie, Dervieux, and Presmanes Hill, 2022](https://CRAN.R-project.org/package=blogdown)) .footnote[ <sup>*</sup> Here we will focus on research papers only. In a separate workshop, I teach how to create professional websites with R Blogdown. ] --- name: part3 class: inverse, center, middle # Part 3. Setting Metadata .footnote[ [Back to the contents slide](#contents-slide). 
] --- ## YAML — Overview .Rmd documents start<sup>*</sup> with YAML - includes the metadata variables - e.g., title, output format <br> - written between a pair of three hyphens .yellow-h[-] ```r --- title: output: --- ``` .footnote[ <sup>*</sup> Technically, we can place YAML anywhere in a .Rmd document. However, it is a good practice to start with YAML so that the metadata is easily accessible. ] --- ## YAML — Variables - `title` and `output` are the basic variables of YAML - variable names are typed in lower case, followed by a colon <span style="background-color: #ffff88;">:</span> <br> - the list of available variables, as well as options and sub-options for these variables, depends on the output format - [Pandoc User's Guide](https://pandoc.org/MANUAL.html) provides comprehensive documentation - [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf) provides a helpful list - Typical YAML variables for a research paper are as follows: ```r --- title: author: date: bibliography: csl: output: --- ``` --- ## YAML — Variables Variables can take .yellow-h[strings] ```r --- *title: "Journals: Random Words With Random Data" output: --- ``` --- ## YAML — Variables Variables can take strings, .yellow-h[options] ```r --- title: "Journals: Random Words With Random Data" *output: pdf_document --- ``` --- ## YAML — Variables Variables can take strings, options, .yellow-h[sub-options] ```r --- title: "Journals: Random Words With Random Data" output: pdf_document: * keep_tex: true --- ``` --- ## YAML — Variables Variables can take strings, options, sub-options, and .yellow-h[code] ```r --- title: "Journals: Random Words With Random Data" output: pdf_document: keep_tex: true *date: "\`r format(Sys.Date(), '%d %B %Y')`" --- ``` --- ## YAML — Variables — Output Formats .pull-left-c[ Documents as output formats include - .yellow-h[HTML] ```r --- title: "Journals: Random Words With Random Data" *output: html_document --- ``` ] .pull-right-c[ 
<img src="rmd_workshop_files/images_data/yaml_html.png" width="95%" /> ] --- ## YAML — Variables — Output Formats .pull-left-c[ Documents as output formats include - HTML - .yellow-h[LaTeX] ```r --- title: "Journals: Random Words With Random Data" *output: latex_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml_latex.png" width="95%" /> ] --- ## YAML — Variables — Output Formats .pull-left[ Documents as output formats include - HTML - LaTeX - .yellow-h[PDF] ```r --- title: "Journals: Random Words With Random Data" *output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml_pdf.png" width="95%" /> ] --- ## YAML — Variables — Output Formats .pull-left-c[ Documents as output formats include - HTML - LaTeX - PDF - .yellow-h[Word] ```r --- title: "Journals: Random Words With Random Data" *output: word_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml_word.png" width="95%" /> ] --- ## YAML — Variables — Output Formats .pull-left[ - Documents as output formats - `html_document` - `latex_document` - `pdf_document`<sup>*</sup> - `word_document` - `github_document` - `md_document` - `odt_document` - `rtf_document` ] .pull-right[ - Presentations as output formats - `beamer_presentation` - `ioslides_presentation` - `powerpoint_presentation` - `slidy_presentation` ] .footnote[ <sup>*</sup> For reasons of simplicity, this workshop focuses on LaTeX and/or PDF outputs. Different output formats have slightly different customisations. See [Pandoc User's Guide](https://pandoc.org/MANUAL.html) and/or [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf). 
] --- ## YAML — Strings .pull-left-c[ Strings with special characters, such as colon, require quotation marks — single .yellow-h['] or double .yellow-h["] ```r --- *title: "Journals: Random Words With Random Data" output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml_pdf.png" width="95%" /> ] --- ## YAML — Strings .pull-left-c[ Quotation marks are optional for strings without special characters ```r --- title: "Journals: Random Words With Random Data" *subtitle: A Mock Paper for an R Markdown Workshop *author: Jane Doe *date: 4 March 2020 output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml2.png" width="95%" /> ] --- ## YAML — Strings — Footnotes .pull-left-c[ The syntax .inline-c[^[footnotes_go_here]] adds footnotes to strings ```r --- *title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop *author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]" date: 4 March 2020 output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml3.png" width="95%" /> ] --- ## YAML — Strings — External Files The `bibliography` and `csl` variables take strings as well ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]" date: 4 March 2020 *bibliography: references.bib *csl: apa_7th.csl output: pdf_document --- ``` --- ## YAML — Strings — External Files The strings for external files indicate (a) .yellow-h[where the files are located] and (b) how they are named ```r --- ... 
bibliography: `references/`ref_library.bib csl: "`../../styles/`chicago_manual_17.csl" ... --- ``` --- ## YAML — Strings — External Files The strings for external files indicate (a) .yellow-h[where the files are located] and (b) how they are named ```r --- ... bibliography: `references/`ref_library.bib csl: "`../../styles/`chicago_manual_17.csl" ... --- ``` <br> Notice that - the locations above are specified as .yellow-h[relative to the working directory] - the former (references) is a sub-directory, or folder, one level down while the latter (styles) is two levels up - for reproducibility reasons, hard-coded strings should be avoided - e.g., `"C:/Users/resulumit/Dropbox/styles/chicago_manual_17.csl"` --- ## YAML — Strings — External Files The strings indicate (a) where the files are located and (b) .yellow-h[how they are named] ```r --- ... bibliography: references/`ref_library.bib` csl: "../../styles/`chicago_manual_17.csl`" ... --- ``` --- ## YAML — Options and Sub-Options .pull-left-c[ Options can have sub-options ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]" date: 4 March 2020 bibliography: references.bib csl: apa_7th.csl *output: * pdf_document: * keep_tex: true --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml4.png" width="100%" /> ] --- ## YAML — Options and Sub-Options .pull-left[ Options can have sub-options ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. 
Website: http://www.janedoe.com.]" date: 4 March 2020 bibliography: references.bib csl: apa_7th.csl *output: * pdf_document: * keep_tex: true --- ``` ] .pull-right[ Notice that - this specific setting, highlighted, will create multiple outputs - a LaTeX and a PDF document - all but the last option (i.e., `true`) takes a colon - options and sub-options (except the last option, again) are stepwise indented - exactly with four spaces - the alignment between the colons for `pdf_document` and `keep_tex` is coincidental ] --- ## YAML — R Code .pull-left-c[ Variables can take code as well ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]" *date: "\`r format(Sys.Date(), '%d %B %Y')`" bibliography: references.bib csl: apa_7th.csl output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml5.png" width="95%" /> ] --- ## YAML — R Code .pull-left[ Variables can take code as well ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. 
Website: http://www.janedoe.com.]" *date: "\`r format(Sys.Date(), '%d %B %Y')`" bibliography: references.bib csl: apa_7th.csl output: pdf_document --- ``` ] .pull-right[ Notice that - such code can be particularly useful for variables - that need frequent updates - and that can be automatically updated - e.g., `date` - there are quotation marks around the code - we will cover code in .Rmd documents later on in the workshop ] --- ## YAML — R Code .pull-left-c[ Code and text can be combined in a string ```r --- title: "Journals: Random Words With Random Data^[Preliminary draft. Please do not cite or circulate without permission from the author.]" subtitle: A Mock Paper for an R Markdown Workshop author: "Jane Doe^[Department of Science, University of Random. Email: jane.doe@random.edu. Website: http://www.janedoe.com.]" *date: "First version: 4 March 2020. This version: \`r format(Sys.Date(), '%d %B %Y')`." bibliography: references.bib csl: apa_7th.csl output: pdf_document --- ``` ] .pull-right-c[ <img src="rmd_workshop_files/images_data/yaml6.png" width="95%" /> ] --- name: yaml-further-settings ## YAML — Some Further Settings for PDF Outputs - `fontsize` - the default is `10pt` - the other options are `11pt` and `12pt` - `linkcolor`, `urlcolor`, `citecolor` - the default is the colour of the text - the other options are white, red, green, blue, cyan, magenta, yellow - `link-citations` - the default is `no` - the other option is `yes` — a click on a citation will take the screen to the relevant entry in the list of references --- class: action ## Exercises — 1–4 1) Open `journals.Rmd` and fill in the YAML variables for the mock paper - take cues from `reproduce_this.pdf` and/or the slides <br> 2) Add and set one of the variables mentioned as [further settings for PDF outputs](#yaml-further-settings) above - i.e., `fontsize`, `linkcolor`, `urlcolor`, `citecolor`, `link-citations` <br> 3) Add and set a completely new variable not covered so far - see, for example,
the [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf) <br> 4) `Knit` your `journals.Rmd` - observe the outcome
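---

## YAML: A Sketch of the Date Code

The `date` variable in the slides above relies on `format()` and `Sys.Date()`. As a minimal sketch, the same call can be tried in the R console before it goes into YAML. Note that `version_date()` is a hypothetical helper name used for illustration only; it is not part of R Markdown.

```r
# Hypothetical helper: format a date with the same format string as the YAML example
version_date <- function(date = Sys.Date()) {
  # %d = day of the month (zero-padded), %B = full month name, %Y = four-digit year
  format(date, "%d %B %Y")
}

version_date()                        # today's date
version_date(as.Date("2020-03-04"))  # a fixed date, for comparison
```

Notice that `%d` pads single-digit days with a zero, and that the month name from `%B` follows the locale of the R session.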
--- name: part4 class: inverse, center, middle # Part 4. Writing Text .footnote[ [Back to the contents slide](#contents-slide). ] --- ## Syntax — Overview - There are not one, but several different versions of Markdown - e.g., [Pandoc](https://pandoc.org/), [MultiMarkdown](https://fletcherpenney.net/multimarkdown/), [CommonMark](https://commonmark.org/) - each might implement the same things (e.g., citations) slightly differently, and each might offer unique functionalities <br> - R Markdown follows the syntax in Pandoc's Markdown - for the complete rules of the syntax, see [Pandoc User's Guide](https://pandoc.org/MANUAL.html) - for a useful summary of the syntax, see the [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf) --- ## Syntax — Lines Multiple spaces on a given line are reduced to one ```r This is a sentence followed by four spaces. This is another sentence on the same line. ``` .out-t[This is a sentence followed by four spaces. This is another sentence on the same line.] <br> Line endings with fewer than two spaces are ignored ```r This is a sentence followed by one space. This is another sentence on a new line. ``` .out-t[This is a sentence followed by one space. This is another sentence on a new line.] --- ## Syntax — Hard Breaks Two or more spaces at the end of lines introduce hard breaks, forcing a new line ```r This is a sentence followed by two spaces. This is another sentence on a new line. ``` .out-t[This is a sentence followed by two spaces. This is another sentence on a new line.] --- ## Syntax — Line Blocks Spaces on lines that start with a vertical line .yellow-h[|] are kept ```r | a one-space indent | a five-space indent | a ten-space indent ``` .out-t[ a one-space indent a five-space indent a ten-space indent ] --- ## Syntax — Block Quotes Lines starting with the greater-than sign .yellow-h[>] introduce block quotes<sup>*</sup> ```r > In God, we trust. All others must bring data. 
> > --- Anonymous ``` .out-t[ In God, we trust. All others must bring data. — Anonymous ] .footnote[ <sup>*</sup> Notice that three hyphens grouped together introduce an em-dash. Dashes are covered later on in the workshop. ] --- ## Syntax — Paragraphs One or more<sup>*</sup> blank lines introduce a new paragraph ```r This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. This is the first sentence of a *new paragraph* as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. ``` .out-t[ This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. This is the first sentence of a *new paragraph* as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. ] .footnote[ <sup>*</sup> Multiple blank lines between paragraphs reduce to one. ] --- ## Syntax — Comments Text with the syntax .inline-c[<!--].inline-c[comments -->] is omitted from output ```md <!-- This paragraph needs re-writing --> This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. This is the first sentence of a new paragraph <!-- I've removed italics --> as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. ``` .out-t[ This is the first sentence of a paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. This is the first sentence of a new paragraph as it is preceded by a blank line. This is the second sentence of that paragraph, which is followed by a blank line. 
] --- class: action ## Exercises — 5–6 5) Hard Breaks - see `reproduce_this.pdf`, page 1 - apply in `journals.Rmd`, paragraph 1 <br> 6) Line Blocks / Block Quotes - see `reproduce_this.pdf`: page 1 - apply in `journals.Rmd`: block quote, between paragraphs 1 and 2 <br> - see `reproduce_this.pdf`: page 5 - apply in `journals.Rmd`: hypothesis 1, between paragraphs 14 and 15; hypothesis 2, between paragraphs 16 and 17
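---

## Syntax: A Sketch for Finding Hard Breaks

Trailing spaces are invisible in most editors, so hard breaks are easy to miss or to introduce by accident. The following is a minimal sketch in base R, assuming the document is available as a character vector of lines (e.g., from `readLines()`). The name `find_hard_breaks()` is hypothetical.

```r
# Hypothetical helper: return the numbers of the lines that end with a hard break
find_hard_breaks <- function(lines) {
  # a hard break is a non-empty line that ends with two or more spaces
  which(grepl("\\S {2,}$", lines))
}

example_lines <- c(
  "This is a sentence followed by one space. ",
  "This is a sentence followed by two spaces.  ",
  "This is another sentence on a new line."
)

find_hard_breaks(example_lines)
```

For a real document, something like `find_hard_breaks(readLines("journals.Rmd"))` would list the lines to inspect.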
--- ## Syntax — Headers The number sign .yellow-h[#] introduces headers; lower levels are created with additional signs — up to five levels in total .pull-left[ .inline-c[# Introduction] becomes .out-t[ # Introduction ] .inline-c[## 1. Introduction] becomes .out-t[ ## 1. Introduction ] ] .pull-right[ .inline-c[### 3.1 Introduction] becomes .out-t[ ### 3.1 Introduction ] .inline-c[#### Introduction] becomes .out-t[ #### Introduction ] .inline-c[##### Introduction] becomes .out-t[ ##### Introduction ] ] --- ## Syntax — Emphases A pair of single asterisks .yellow-h[*] or underscores .yellow-h[_] introduces italics <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">*italics*</span> becomes .out-t[*italics*] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">_italics_</span> becomes .out-t[*italics*] as well <br> A pair of double asterisks or underscores introduces bold <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">**bold**</span> becomes .out-t[**bold**] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">__bold__</span> becomes .out-t[**bold**] as well <br> These two rules can be combined <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">**_bolditalics_**</span> becomes .out-t[**_bolditalics_**] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">_**bolditalics**_</span> becomes .out-t[_**bolditalics**_] as well --- ## Syntax — Strikethrough A pair of double tildes .yellow-h[~] introduces strikethrough <span
style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">~~strikethrough~~</span> becomes .out-t[~~strikethrough~~] <br> Strikethrough can be combined with italics or bold <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">**~~strikebold~~**</span> or <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">__~~strikebold~~__</span>, they both become .out-t[**~~strikebold~~**] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">~~**strikebold**~~</span> or <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">~~__strikebold__~~</span>, they both become .out-t[~~**strikebold**~~] as well <br> <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">*~~strikeitalics~~*</span> or <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">_~~strikeitalics~~_</span>, they both become .out-t[*~~strikeitalics~~*] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">~~*strikeitalics*~~</span> or <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">~~_strikeitalics_~~</span>, they both become .out-t[*~~strikeitalics~~*] as well --- class: action ## Exercises — 7–8 7) Headers - see `reproduce_this.pdf`: pages 1 to 11 - 10 headers, Abstract to References <br> - apply in `journals.Rmd`
<br> 8) Emphases - see `reproduce_this.pdf`: pages 1 and 2 - bold and italics <br> - apply in `journals.Rmd`: paragraph 2
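---

## Syntax: A Sketch for Listing Headers

For a quick check on headers and their levels, the header lines of a document can be listed from R. This is a minimal sketch in base R, assuming that fenced code is delimited by three backticks at the start of a line, so that `#` comments inside code chunks are not mistaken for headers. The name `list_headers()` is hypothetical.

```r
# Hypothetical helper: list the Markdown headers among the lines of a document
list_headers <- function(lines) {
  # lines between an opening and a closing fence are treated as code, not text
  fence <- paste0("^\\s*", strrep("`", 3))
  in_fence <- cumsum(grepl(fence, lines)) %% 2 == 1
  # a header is one to six number signs followed by a space and some text
  is_header <- grepl("^#{1,6}\\s+\\S", lines) & !in_fence
  header_lines <- lines[is_header]
  data.frame(
    level = nchar(sub("^(#+).*$", "\\1", header_lines)),
    title = sub("^#+\\s+", "", header_lines)
  )
}
```

With the mock paper, `list_headers(readLines("journals.Rmd"))` should then return one row per header.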
--- name: internal-links ## Syntax — Links — Internal<sup>*</sup> You can link text to section headers in the same document <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">[Conclusion](#conclusion)</span> becomes .out-t[[Conclusion](#internal-links)], and a click takes the screen to that section <br> Multi-word headers need hyphenation <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">[Literature Review](#literature-review)</span> becomes .out-t[[Literature Review](#internal-links)], and it works only if the second part is hyphenated .footnote[ <sup>*</sup> The [links to references](#reference-links), [figures](#autoref-figures), and [tables](#autoref-tables) are covered later on in the workshop. ] --- ## Syntax — Links — External You can link text to URLs <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">[visit my website](https://resulumit.com/)</span> becomes .out-t[[visit my website](https://resulumit.com/)] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">[https://resulumit.com](https://resulumit.com/)</span> becomes .out-t[[https://resulumit.com](https://resulumit.com/)] <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;"><https://resulumit.com></span> becomes .out-t[<https://resulumit.com>] as well -- <br> You can also link text to an email address <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">[email me](mailto:resuluy@uio.no)</span><sup>*</sup> becomes .out-t[[email me](mailto:resuluy@uio.no)] <span 
style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;"><resuluy@uio.no></span> becomes .out-t[<resuluy@uio.no>] .footnote[ <sup>*</sup> Notice the prefix .yellow-h[mailto:] in the syntax. ] --- class: action ## Exercises — 9–10 9) Links — Internal - see `reproduce_this.pdf`: page 2 - the link to the Literature Review section <br> - apply in `journals.Rmd`: paragraph 4 <br> 10) Links — External - see `reproduce_this.pdf`: page 1 - email and website links in one of the footnotes <br> - apply in `journals.Rmd`: title page items
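---

## Syntax: A Sketch of Header Identifiers

Internal links work because each header gets an identifier: lower case, with spaces turned into hyphens. The following sketch approximates that conversion in base R. It is a simplification of Pandoc's rules, not a re-implementation of them, and `header_to_anchor()` is a hypothetical name.

```r
# Hypothetical helper: approximate the identifier that a header text receives
header_to_anchor <- function(header) {
  id <- tolower(header)
  # drop punctuation, keeping letters, digits, underscores, periods, spaces, hyphens
  id <- gsub("[^[:alnum:]_. -]", "", id)
  # replace spaces with hyphens
  id <- gsub("\\s+", "-", id)
  # drop everything before the first letter, such as section numbers
  sub("^[^[:alpha:]]+", "", id)
}

header_to_anchor("Literature Review")
header_to_anchor("3.1 Introduction")
```

The link target is then the identifier preceded by a number sign, as in `[Literature Review](#literature-review)`.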
--- ## Syntax — Equations Inline equations go between a pair of single dollar signs <span style="background-color: #ffff88;">$</span> — with no space between the signs and the equation itself .inline-c[$E = mc^{2}$] becomes .out-t[*E = mc<sup>2</sup>*] <br> -- Block equations go in between a pair of double dollar signs — with or without spaces, it works .inline-c[$$ E = mc^{2}].inline-c[$$] becomes .out-t[ <center> <i> E = mc<sup>2</sup> </i> </center> ] <br> .inline-c[$$E = mc_{2}].inline-c[$$] becomes .out-t[ <center> <i> E = mc<sub>2</sub> </i> </center> ] --- ## Syntax — Footnotes — Inline Notes For inline footnotes, use the .inline-c[^[footnote]] syntax .inline-c[Jane Doe^[Corresponding author.]] becomes .out-t[Jane Doe<sup>1</sup>] <br> .footnote[ .out-t[<sup>1</sup> Corresponding author.] ] -- Notice that - the caret sign .yellow-h[^] comes .yellow-h[before] the left square bracket .yellow-h[[] - this syntax works in YAML as well as in text - footnotes in YAML get symbols, in text they get numbers --- ## Syntax — Footnotes — Notes with Identifiers An alternative is to use the .inline-c[[^identifier]] syntax, with identifiers defined elsewhere in the same document ```r Dr Doe holds a PhD in rock science.[^defence_date] [^defence_date]: She defended her thesis in 2017. ``` .out-t[Dr Doe holds a PhD in rock science.<sup>1</sup>] <br> .footnote[ .out-t[<sup>1</sup> She defended her thesis in 2017.] ] -- Notice that - the caret sign comes .yellow-h[after] the left square bracket - this syntax works in text, but not in YAML --- class: action ## Exercises — 11–12 11) Equations - see `reproduce_this.pdf`: page 7 - apply in `journals.Rmd`: paragraph 22; block equation, between paragraphs 22 and 23 <br> 12) Footnotes - see `reproduce_this.pdf`: page 2 - apply in `journals.Rmd`: paragraph 3
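---

## Syntax: A Sketch of a Longer Equation

As a further illustration of the equation syntax, here is a hypothetical regression equation, not taken from the mock paper, that combines subscripts `_{}` and superscripts `^{}`. The first line uses the inline form, the second the block form.

```latex
The model is $y_{i} = \beta_{0} + \beta_{1} x_{i}^{2} + \varepsilon_{i}$, for each journal $i$.

$$
y_{i} = \beta_{0} + \beta_{1} x_{i}^{2} + \varepsilon_{i}
$$
```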
--- ## Syntax — Lists Lines starting with an asterisk .yellow-h[*] as well as plus .yellow-h[+] or minus .yellow-h[−] signs introduce lists ```r - books - articles - reports ``` .out-t[ - books - articles - reports ] --- ## Syntax — Lists — Nesting Lists can be nested within each other, with indentation ```r + books + articles - published - under review + revised and resubmitted - work in progress ``` .out-t[ + books + articles - published - under review + revised and resubmitted - work in progress ] --- ## Syntax — Lists — Numbering List items can be numbered ```r 1. books 2. articles - published - under review + revised and resubmitted - work in progress ``` .out-t[ 1. books 2. articles - published - under review + revised and resubmitted - work in progress ] --- ## Syntax — Dashes Two hyphens grouped together introduce an en-dash .inline-c[‐‐] becomes .out-t[–] <br> Three hyphens grouped together introduce an em-dash .inline-c[‐‐‐] becomes .out-t[—] --- ## Syntax — Subscripts and Superscripts .pull-left[ A pair of tildes introduces subscript .inline-c[CO~2~] becomes .out-t[CO<sub>2</sub>] <br> A pair of carets introduces superscript .inline-c[R^2^] becomes .out-t[R<sup>2</sup>] ] --- ## Syntax — Subscripts and Superscripts .pull-left[ A pair of tildes introduces subscript .inline-c[CO~2~] becomes .out-t[CO<sub>2</sub>] <br> A pair of carets introduces superscript .inline-c[R^2^] becomes .out-t[R<sup>2</sup>] ] .pull-right[ Notice that - the syntax here (Markdown-based) is different from the one for equations (LaTeX-based) - e.g., <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">R^2^</span> versus <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;"> mc^{2}</span> ] --- class: action ## Exercises — 13–15 13) Lists - see `reproduce_this.pdf`: page 3 - apply in `journals.Rmd`: list,
between paragraphs 10 and 11 <br> 14) Dashes - see `reproduce_this.pdf`: page 2 - apply in `journals.Rmd`: paragraph 6 <br> 15) Subscripts and Superscripts - see `reproduce_this.pdf`: page 2 - apply in `journals.Rmd`: paragraph 5
--- name: part5 class: inverse, center, middle # Part 5. Managing References .footnote[ [Back to the contents slide](#contents-slide). ] --- ## References — Bibliography Database .pull-left-c[ - References are defined in .bib files - they follow the BibTeX format <br> - `pandoc` looks for a .bib file, and for the definitions therein, to process citations - .bib files are specified with the `bibliography` variable in YAML <br> - `pandoc` can process a citation only if there is a linked entry in the .bib file - but not all entries have to be cited ] .pull-right-c[ <img src="rmd_workshop_files/images_data/bib.png" width="95%" /> ] --- ## References — Bibliography Database — Entries .pull-left-c[ - A BibTeX entry consists of three elements - a type - e.g., `@article` <br> - a citation-key - e.g., `bennett2015` <br> - a number of tags - e.g., `title`, `author` <br> - Different tags are available for different reference types - some tags are required, others are optional ] .pull-right-c[ <img src="rmd_workshop_files/images_data/bib1.png" width="100%" /> ] --- ## References — Bibliography Database — Entries - One could create entries by hand - requires knowing the BibTeX format, entry types, tags, and related information about references to be cited - neither efficient nor necessary <br> - A good alternative is to use `Google Scholar`, which provides BibTeX entries - follow `cite -> BibTex` and copy - paste into .bib, edit if necessary, and save <br> - Some publishers and journals provide BibTeX entries on their website as well --- ## References — Style .pull-left-c[ - Reference styles are defined in .csl files - files for different styles (e.g., APA) are available at [https://www.zotero.org/styles](https://www.zotero.org/styles) <br> - `pandoc` looks for a .csl file, and for the styles therein, to style citations and references - .csl files are specified with the `csl` variable in YAML - if unspecified, it uses a Chicago author-date format <br> - .csl files affect the 
style only in outputs - no matter which style is used, the citation syntax in .Rmd documents remains the same ] .pull-right-c[ <img src="rmd_workshop_files/images_data/csl.png" width="95%" /> ] --- ## References — In-text Citation Syntax — Author-Date Styles<sup>*</sup> All citation keys take the 'at' sign .yellow-h[@] while square brackets and/or minus signs introduce variation .pull-left[ .inline-c[[@bennett2015]] becomes .out-t[(Bennett, 2015)] .inline-c[@bennett2015] becomes .out-t[Bennett (2015)] .inline-c[[-@bennett2015]] becomes .out-t[(2015)] .inline-c[-@bennett2015] becomes .out-t[2015] .inline-c[[@bennett2015 35]] becomes .out-t[(Bennett, 2015, p. 35)] .inline-c[[@bennett2015 33-35]] becomes .out-t[(Bennett, 2015, pp. 33–35)] ] .pull-right[ .inline-c[[@bennett2015, ch. 1]] becomes .out-t[(Bennett, 2015, ch. 1)] .inline-c[[@bennett2015; @gilbert2019]] becomes .out-t[(Bennett, 2015; Gilbert, 2019)] .inline-c[[see @bennett2015, for details]] becomes .out-t[(see Bennett, 2015, for details)] .inline-c[@bennett2015 [33-35]] becomes .out-t[Bennett (2015, pp. 33–35)] ] .footnote[ <sup>*</sup> Specifically, the outputs on this slide are formatted according to the APA 7<sup>th</sup> edition. ] --- ## References — In-text Citation Syntax — Numerical Styles All citation keys take the 'at' sign .yellow-h[@] .inline-c[A clever sentence.[@bennett2015]] becomes .out-t[A clever sentence.<sup>[1]</sup>] in certain numerical styles .inline-c[A clever sentence.[@bennett2015; @gilbert2019]] becomes .out-t[A clever sentence.<sup>[1,2]</sup>] <br> -- Individual styles may or may not use additional information, such as page numbers .inline-c[A clever sentence.[@bennett2015 35]] might become .out-t[A clever sentence.<sup>[1]</sup>] as well <br> -- Individual styles may or may not be sensitive to variation, such as square brackets .inline-c[A clever sentence.
@bennett2015] might become .out-t[A clever sentence.<sup>[1]</sup>] as well --- ## Citations — Reference List The list of references appears after the last line of the output document, with no section header - so that you can choose the header yourself, by ending .Rmd documents with a header of your choice ```r This is the last sentence of an APA style manuscript. ## References ``` .out-t[ This is the last sentence of an APA style manuscript. ### References Bennett, S. (2015). Peanut butter and jelly. *Journal of Bone, 1*(12), 3–35. Gilbert, T. (2019). Turning wine into water. In M. Albert (Ed.), *The book of ground* (pp. 124–142). Antman. ] --- name: reference-links ## References — Internal Links For internal links from in-text citations to the reference list, set .inline-c[link-citations: yes] in YAML - a click on these links takes the screen to the relevant entry in the list - the `linkcolor` variable makes these links explicit - setting this is not necessary for the links to work — the default is black ```r --- ... bibliography: references.bib csl: apa_7th.csl *link-citations: yes *linkcolor: blue ... --- ``` --- class: action ## Exercises — 16–19 16) Add an entry to `references.bib` for the following book - *R Markdown: The Definitive Guide* by Xie and co-authors <br> 17) Reproduce the citations and reference list in the mock paper - see `reproduce_this.pdf`: pages 3 and 11 - apply in `journals.Rmd`: paragraphs 7 to 9 <br> 18) Change the reference style - download the .csl file for your favourite style from [https://www.zotero.org/styles](https://www.zotero.org/styles) - put it into your working directory - update the YAML variable <br> 19) Link the citations to the reference list
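---

## References: A Sketch for Finding Missing Entries

A citation can be processed only if its key has an entry in the .bib file. The following is a minimal sketch in base R for comparing the two, assuming simple keys made of letters, digits, and underscores, and .bib entries that start on their own line. All three function names are hypothetical, and so is the key `xie2018` in the example.

```r
# Hypothetical helper: citation keys used in the lines of an .Rmd document
cited_keys <- function(rmd_lines) {
  # an 'at' sign that does not follow a letter or digit, so that emails are skipped
  pattern <- "(?<![[:alnum:]])@[[:alnum:]_]+"
  hits <- regmatches(rmd_lines, gregexpr(pattern, rmd_lines, perl = TRUE))
  unique(sub("^@", "", unlist(hits)))
}

# Hypothetical helper: citation keys defined in the lines of a .bib file
bib_keys <- function(bib_lines) {
  entry_lines <- grep("^\\s*@\\w+\\s*\\{", bib_lines, value = TRUE)
  trimws(sub("^\\s*@\\w+\\s*\\{\\s*([^,]+),.*$", "\\1", entry_lines))
}

# Keys that are cited in the text but not defined in the .bib file
missing_keys <- function(rmd_lines, bib_lines) {
  setdiff(cited_keys(rmd_lines), bib_keys(bib_lines))
}

rmd <- c("As shown by @bennett2015 and others [@gilbert2019; @xie2018].",
         "Email: jane.doe@random.edu.")
bib <- c("@article{bennett2015,", "  title = {Peanut butter and jelly},", "}",
         "@incollection{gilbert2019,", "  title = {Turning wine into water},", "}")

missing_keys(rmd, bib)
```

For the mock paper, the inputs would be `readLines("journals.Rmd")` and `readLines("references.bib")`.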
--- name: part6 class: inverse, center, middle # Part 6. Adding Code, Figures, and Tables .footnote[ [Back to the contents slide](#contents-slide). ] --- class: center, middle # Code, in and outside Chunks --- ## Code — Overview Most code goes inside code chunks - e.g., code that imports and cleans data, and/or produces tables and/or figures ````md ```{r} df <- read.csv("rmd_workshop_files/images_data/journals.csv") %>% mutate(age = 2020 - since, english = factor(english), subfield = factor(subfield)) ``` ```` <br> Code can also go in line with text - e.g., code that results in a single statistic ```r The average H5 Index for the journals in the dataset is \`r mean(df$h5_index)`. ``` --- ## Code Chunks — Overview .pull-left-c[ - Code chunks are delimited spaces between a pair of three backticks .yellow-h[`] - placed on their own lines in .Rmd documents, separate from text - their output, if there is any, appears in the output document - at about the same place as the chunk - might float around text to avoid breaking across pages ] .pull-right-c[ ````md ``` ``` ```` ] --- ## Code Chunks — Overview .pull-left-c[ - Code chunks are delimited spaces between a pair of three backticks ` - placed on their own lines in .Rmd documents, separate from text - their output, if there is any, appears in the output document - at about the same place as the chunk - might float around text to avoid breaking across pages <br> - On the same line with the first delimiter, and in curly brackets .yellow-h[{], code chunks take - .yellow-h[a language engine] ] .pull-right-c[ ````md ```{`r`} ``` ```` ] --- ## Code Chunks — Overview .pull-left-c[ - Code chunks are delimited spaces between a pair of three backticks ` - placed on their own lines in .Rmd documents, separate from text - their output, if there is any, appears in the output document - at about the same place as the chunk - might float around text to avoid breaking across pages <br> - On the same line with the first delimiter, and in
curly brackets {, code chunks take - a language engine - .yellow-h[a label] ] .pull-right-c[ ````md ```{r, `setup`} ``` ```` ] --- ## Code Chunks — Overview .pull-left-c[ - Code chunks are delimited spaces between a pair of three backticks ` - placed on their own lines in .Rmd documents, separate from text - their output, if there is any, appears in the output document - at about the same place as the chunk - might float around text to avoid breaking across pages <br> - On the same line with the first delimiter, and in curly brackets {, code chunks take - a language engine - a label - .yellow-h[one or more options] ] .pull-right-c[ ````md ```{r, setup, `echo=FALSE`} ``` ```` ] --- ## Code Chunks — Language Engines The first item in code chunks indicates the engine to run the code ````md ```{`r`} ``` ```` <br> Note that - indicating an engine for each chunk is a must - otherwise, any code<sup>*</sup> in these chunks cannot be executed - `r` is the specified engine, indicating that the code in the chunk above should be run by R - it could have been `python`, which we will not cover in this workshop .footnote[ <sup>*</sup> The above chunk has no code — it is for demonstration only.
] --- ## Code Chunks — Labels It is recommended, but optional, to label the code chunks ````md ```{r, `data_import`} df <- read_csv("data/journals.csv") ``` ```` <br> Note that - labels are written after the language engine, separated by a comma - in the example above, the chunk is labelled as `data_import` - chunks without labels are otherwise automatically numbered - specifying informative labels can be helpful for, e.g., navigating through error messages - duplicate labels lead to errors during compilation --- ## Code Chunks — Options Code chunks can take further options ````md ```{r, setup, `include=FALSE`} ``` ```` <br> Note that - in the example above, the `include` option is set to `FALSE` - with this option and value, nothing from this chunk will be included in the output document - the complete list of options is available at <https://yihui.org/knitr/options> - the [R Markdown Cheat Sheet](https://github.com/rstudio/cheatsheets/raw/main/rmarkdown-2.0.pdf) provides a helpful list as well - leaving spaces around the equal sign <span style="background-color: #ffff88;">=</span>, between option tags and values, should be avoided - such spaces might lead to errors --- ## Code Chunks — Options — Alternative Syntax Options can be specified inside code chunks as well, after a number sign and a vertical line .yellow-h[#|] - therefore the following chunks have the same function ````md ```{r, echo=FALSE, eval=TRUE} ``` ```` ````md ```{r} #| echo = FALSE, eval = TRUE ``` ```` ````md ```{r} #| echo = FALSE #| eval = TRUE ``` ```` --- ## Code Chunks — Options — Defaults Options have default values - e.g., for `echo`, the default is `TRUE` - `echo`: should the source code be printed in the output?
- `TRUE`: yes it should <br> - therefore the following two chunks have the same function ````md ```{r} ``` ```` ````md ```{r, `echo=TRUE`} ``` ```` --- ## Code Chunks — Options — Defaults This chunk prints two things in the output document — (a) the code and (b) the head of the data frame ````md ```{r} head(df) ``` ```` .out-t[ `head(df)` ] <hr style="height:10px; visibility:hidden;" /> .out-t[ ``` ## name origin branch h5_index h5_median english subfield ## 1 Journal of Bears Americas Physical 73 97 1 1 ## 2 Journal of Moon Asia Social 72 106 1 0 ## 3 Journal of Lumber Americas Physical 72 100 1 1 ## 4 Journal of Houses Europe Social 72 102 1 0 ## 5 Journal of Water Europe Social 70 100 1 0 ## 6 Journal of Jeans Americas Physical 69 101 1 1 ## issues age ## 1 7 61 ## 2 6 64 ## 3 8 30 ## 4 8 38 ## 5 5 33 ## 6 5 64 ``` ] --- ## Code Chunks — Options — Examples Setting .inline-c[echo=FALSE] prevents the code from being displayed in the output document ````md ```{r ... `echo=FALSE`} head(df) ``` ```` This chunk therefore prints one thing in the output document — the head of the data frame .out-t[ ``` ## name origin branch h5_index h5_median english subfield ## 1 Journal of Bears Americas Physical 73 97 1 1 ## 2 Journal of Moon Asia Social 72 106 1 0 ## 3 Journal of Lumber Americas Physical 72 100 1 1 ## 4 Journal of Houses Europe Social 72 102 1 0 ## 5 Journal of Water Europe Social 70 100 1 0 ## 6 Journal of Jeans Americas Physical 69 101 1 1 ## issues age ## 1 7 61 ## 2 6 64 ## 3 8 30 ## 4 8 38 ## 5 5 33 ## 6 5 64 ``` ] --- ## Code Chunks — Options — Examples Prevent the result(s) of the source code from being displayed in the output document ````md ```{r ... `results="hide"`} head(df) ``` ```` This chunk therefore prints one thing in the output document — the source code .out-t[ `head(df)` ] Setting .inline-c[results="asis"] passes the results as they are produced by the code — `pandoc` does not transform these. 
In creating tables for PDF output with the `stargazer` package, this option is a must. --- ## Code Chunks — Options — Examples Cache results for future compilations ````md ```{r ... `cache=TRUE`} ``` ```` -- <br> Note that caching - is useful especially for chunks that take a long time to execute - it can speed up the compilation process - avoids executing the chunks at every compilation - unless the chunk is newly created or edited since the last cached compilation - creates a new folder in your working directory - an alternative location can be specified with the `cache.path` option --- ## Code Chunks — Options — Examples Prevent R from running the code in the chunk altogether ````md ```{r ... `eval=FALSE`} ``` ```` -- <br> Prevent messages and/or warnings from being displayed in the output ````md ```{r ... `error=FALSE`, `message=FALSE`, `warning=FALSE`} ``` ```` --- ## Code Chunks — Options — Examples Define the .yellow-h[actual dimensions] of figures, in inches ````md ```{r ... `fig.height=6`, `fig.width=9`} ``` ```` -- <br> Define the size of figures .yellow-h[as they appear in the output document], with `out.width` and/or `out.height` ````md ```{r ... `out.width="50%"`} ``` ```` -- <br> Define the alignment of figures — `left`, `right`, or `center` ````md ```{r ... `fig.align="center"`} ``` ```` --- ## Code Chunks — Options — Examples Define captions for figures ````md ```{r ... `fig.cap="A Scatter Plot"`} ``` ```` -- <br> Set the resolution for figures ````md ```{r ... `dpi=300`} ``` ```` -- <br> Set extra options, such as angle, that the output format would accept for figures ````md ```{r ...
`out.extra="angle=45"`} ``` ```` --- ## Code Chunks — The Setup Chunk It is recommended to use the first code chunk for general setup, where you can - define .yellow-h[your own defaults] for chunk options, with `knitr::opts_chunk$set()` - avoids repeating chunk options <br> - load the necessary packages <br> - import raw data ````md ```{r, setup, include=FALSE} # `chunk option defaults` knitr::opts_chunk$set(echo=FALSE, message=FALSE) # `packages` library(dplyr) library(ggplot2) library(stargazer) # `data` df_raw <- read.csv("journals.csv") ``` ```` --- ## Code Chunks — The Data Chunk I recommend using the second chunk for the main operations<sup>*</sup> on raw data - e.g., for data cleaning and other transformations - some minor transformations could be left to lower chunks - e.g., capitalizing variable names for figures ````md ```{r, data, ...} df <- df_raw `%>%` mutate(subfield = as.factor(subfield), english = as.factor(english), age = 2020 - since) `%>%` select(-since) ``` ```` .footnote[ <sup>*</sup> I will be using the pipe operator <span style="background-color: #ffff88; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;"><code>%>%</code></span> and other functions from the `dplyr` package for such operations in the following slides. ] --- ## Inline Code — Overview Code can also be incorporated in text, with the <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;"><code>`r `</code></span> syntax - unlike chunks, these do not take options <br> - the output document will display the result of the code - in the exact place of the source code <br> - the result of the code will have the same formatting as the text --- ## Inline Code — Examples ```r If we multiply _pi_ by 5, we get \`r pi * 5`. ``` .out-t[If we multiply _pi_ by 5, we get 15.7079633.]
<hr style="height:10px; visibility:hidden;" /> ```r The average H5 Index for the journals in the dataset is \`r mean(df$h5_index)`, which would round to \`r round(mean(df$h5_index), digits = 1)`. ``` .out-t[The average H5 Index for the journals in the dataset is 26.3611366, which would round to 26.4.] <hr style="height:10px; visibility:hidden;" /> ```r __Only \`r nrow(subset(df, english == 0))` journals__ in the dataset are published in a language other than English. ``` .out-t[__Only 113 journals__ in the dataset are published in a language other than English.] --- class: action ## Exercises — 20–22 20) Setup Chunk - introduce a setup chunk with one or more default chunk options, with `knitr::opts_chunk$set()` - load the packages that we will need — `dplyr`, `ggplot2`, and `stargazer` - import raw data <br> 21) Data Chunk - introduce a data chunk to transform `subfield` and `english` into factors - create a new variable `age`, based on `since` - drop `since` from the data frame <br> 22) Inline code - see `reproduce_this.pdf`: page 6 - i.e., 1091 observations <br> - apply in `journals.Rmd`: paragraph 21 - hint: use the `nrow` function
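---

## Inline Code Values: A Sketch

The inline code examples on the previous slides take their numbers directly from the data. As a minimal sketch of the same idea, the block below computes such values on a toy data frame. The data frame `df_toy`, its values, and the object names `n_obs`, `n_non_english`, and `mean_h5` are made up for illustration; they are not part of the workshop materials.

```r
# Toy stand-in for the workshop data; the values are made up for illustration
df_toy <- data.frame(
  h5_index = c(10, 25, 40, 31),
  english  = factor(c(1, 0, 1, 1))
)

# Number of observations, as one would report in the text
n_obs <- nrow(df_toy)

# Number of journals published in a language other than English
n_non_english <- nrow(subset(df_toy, english == 0))

# Average H5 Index, rounded to one decimal place
mean_h5 <- round(mean(df_toy$h5_index), digits = 1)
```

Computing values in a chunk first, and then referring to the stored objects in inline code, is one way to keep the text readable; calling the functions directly inline, as on the previous slides, works just as well.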
--- class: center, middle # Figures --- ## Figures — Images — Markdown Syntax The syntax <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">![Figure Caption](figure.extension)</span> embeds images, and/or figures produced elsewhere,<sup>*</sup> into .Rmd documents - similar to the link syntax, only this time it is preceded by an exclamation mark .yellow-h[!] - goes outside code chunks, on a new line - simple, but not very customisable .footnote[ <sup>*</sup> Ideally, reproducible papers should produce their own images with data and code. However, there might be situations where this is not possible. ] --- ## Figures — Images — Markdown Syntax ```r ![A screenshot of the Google Scholar homepage](../image/google_scholar.png) ``` <img src="rmd_workshop_files/images_data/google_scholar.png" width="65%" style="display: block; margin: auto;" /> <center>Figure 1: A screenshot of the Google Scholar homepage.</center> --- ## Figures — Images — Markdown Syntax Figures are numbered automatically ```r ![A screenshot of the Google Scholar homepage](../image/google_scholar.png) ``` <img src="rmd_workshop_files/images_data/google_scholar.png" width="65%" style="display: block; margin: auto;" /> <center>.yellow-h[Figure 1]: A screenshot of the Google Scholar homepage. 
</center> --- ## Figures — Images — Markdown Syntax The syntax can accept `width` or `height` attributes as follows ```r ![A screenshot of the Google Scholar homepage](../image/google_scholar.png)`{ width=40% }` ``` <img src="rmd_workshop_files/images_data/google_scholar.png" width="40%" style="display: block; margin: auto;" /> <center>Figure 1: A screenshot of the Google Scholar homepage.</center> --- ## Figures — Images — `knitr` The `knitr` package offers a capable alternative with the `include_graphics()` function - this goes inside code chunks - use the function with the double-colon operator <span style="background-color: #ffff88">::</span> - e.g., `knitr::include_graphics("figure.extension")` <br> - this is more customisable, through the use of chunk options - size is defined with the `out.width` or `out.height` options - rather than `fig.height` and/or `fig.width` --- ## Figures — Images — `knitr` The `knitr` package offers a capable alternative with the `include_graphics()` function ````md ```{r, screenshot, echo=FALSE, fig.cap="A screenshot of the Google Scholar homepage."} knitr::include_graphics("../image/google_scholar.png") ``` ```` <img src="rmd_workshop_files/images_data/google_scholar.png" width="55%" /> <center>Figure 1: A screenshot of the Google Scholar homepage.</center> --- ## Figures — Images — `knitr` Size is defined with the chunk options `out.width` or `out.height` ````md ```{r ... `out.width="40%"`} knitr::include_graphics("../image/google_scholar.png") ``` ```` <img src="rmd_workshop_files/images_data/google_scholar.png" width="40%" /> <center>Figure 1: A screenshot of the Google Scholar homepage.</center> --- ## Figures — Images — `knitr` Most other chunk options are shared with figures plotted within R Markdown, such as `fig.align` ````md ```{r ... 
`fig.align="center"`} knitr::include_graphics("../image/google_scholar.png") ``` ```` <img src="rmd_workshop_files/images_data/google_scholar.png" width="40%" style="display: block; margin: auto;" /> <center>Figure 1: A screenshot of the Google Scholar homepage.</center> --- class: action ## Exercise 23) Images - see `reproduce_this.pdf`: figure 1 on page 10 - apply in `journals.Rmd`: figure 1, between paragraphs 19 and 20
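---

## Figures: A Self-Made Image Sketch

The image slides assume that a file such as `google_scholar.png` already exists. As a rough sketch of how an image file and `knitr::include_graphics()` fit together, the block below first writes a small placeholder image to a temporary file and then includes it. The placeholder plot and the object names `img` and `fig` are assumptions made for illustration; in a real paper, the path would point to your own image, and the size and alignment would be set with chunk options such as `out.width` and `fig.align`.

```r
# Write a small placeholder image to a temporary file (illustration only)
img <- tempfile(fileext = ".png")
png(img, width = 400, height = 300)
plot(1:10, main = "Placeholder")
dev.off()

# Inside a code chunk, this call places the image in the knitted document
fig <- knitr::include_graphics(img)
fig
```

This sketch assumes that the `knitr` package is installed and that the R session can open a PNG graphics device.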
--- ## Figures — `ggplot2` — Overview - A powerful package for visualising data - Used widely, not only by academics, but also by large corporations such as the New York Times - A huge amount is written on this package. See, for example, - the [package documentation](https://www.rdocumentation.org/packages/ggplot2/versions/3.2.1) - this [book](https://ggplot2-book.org/) by its creator Hadley Wickham - this [reference page](https://ggplot2.tidyverse.org/reference/) - this [webinar](https://www.youtube.com/watch?v=h29g21z0a68) by one of its authors, Thomas Lin Pedersen - these [extensions](https://exts.ggplot2.tidyverse.org/), maintained by the `ggplot2` community - Among its alternatives are the `base` and `plotly` packages --- ## Figures — `ggplot2` — Basics 1) The `ggplot` function and the `data` argument - specify a data frame in the main `ggplot` function ```r ggplot(data = df) ``` -- 2) The mapping aesthetics, or .yellow-h[aes]; most importantly, the variable(s) that we want to plot - specify as an additional argument in the same `ggplot` function ```r ggplot(data = df, `mapping = aes(x = h5_median, y = h5_index, color = subfield)`) ``` -- 3) The geometric objects, or .yellow-h[geom]; the visual representations - specify, after a plus sign .yellow-h[+], as an additional function ```r ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) `+` * geom_point() ``` --- ## Figures — `ggplot2` Put the code in a chunk, and give it a caption .pull-left-c[ ````md ```{r, scatterplot, `fig.cap = "A scatterplot of journal metrics."`} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() ``` ```` ] .pull-right-c[ ![](rmd_workshop_files/figure-html/unnamed-chunk-73-1.png)<!-- --> Figure 1. A scatterplot of journal metrics. 
] --- ## Figures — `ggplot2` Add facets for subgroups, e.g., `branch` .pull-left-c[ ````md ```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() `+` `facet_wrap(. ~ branch)` ``` ```` ] .pull-right-c[ ![](rmd_workshop_files/figure-html/unnamed-chunk-74-1.png)<!-- --> Figure 1. A scatterplot of journal metrics. ] --- ## Figures — `ggplot2` Scale the colour to improve the legend .pull-left-c[ ````md ```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() + facet_wrap(. ~ branch) `+` `scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")`) ``` ```` ] .pull-right-c[ ![](rmd_workshop_files/figure-html/unnamed-chunk-75-1.png)<!-- --> Figure 1. A scatterplot of journal metrics. ] --- ## Figures — `ggplot2` Change the theme .pull-left-c[ ````md ```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() + facet_wrap(. ~ branch) + scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) `+` `theme_bw()` ``` ```` ] .pull-right-c[ ![](rmd_workshop_files/figure-html/unnamed-chunk-76-1.png)<!-- --> Figure 1. A scatterplot of journal metrics. ] --- ## Figures — `ggplot2` Improve the axis labels, e.g., with capital first letters .pull-left-c[ ````md ```{r, scatterplot, fig.cap = "A scatterplot of journal metrics."} ggplot(data = df, mapping = aes(x = h5_median, y = h5_index, color = subfield)) + geom_point() + facet_wrap(. 
~ branch) + scale_colour_discrete(name = "Journal Type", breaks = c(0, 1), labels = c("Generalist", "Subfield")) + theme_bw() `+` `labs(x = "H5 Median", y = "H5 Index")` ``` ```` ] .pull-right-c[ ![](rmd_workshop_files/figure-html/unnamed-chunk-77-1.png)<!-- --> Figure 1. A scatterplot of journal metrics. ] --- ## Figures — `ggplot2` — Notes `geom_point` is one of many geoms available - see the [reference page](https://ggplot2.tidyverse.org/reference) for other options, including - `geom_bar` for bar charts - `geom_boxplot` for box-and-whisker plots --- class: action ## Exercises — 24–25 24) Barplot - see `reproduce_this.pdf`: figure 2 on page 7 - apply in `journals.Rmd`: figure 2, between paragraphs 21 and 22 <br> 25) Scatterplot - see `reproduce_this.pdf`: figure 3 on page 9 - apply in `journals.Rmd`: figure 3, between paragraphs 27 and 28
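---

## Figures: Other Geoms, a Sketch

The notes on geoms mention `geom_bar` and `geom_boxplot` without showing them in use. The block below is a minimal sketch of both, following the same data, aes, and geom steps as the scatterplot slides. The data frame `df_toy` and its values are made up for illustration; they only borrow the variable names `branch` and `h5_index` from the workshop dataset.

```r
library(ggplot2)

# Toy stand-in for the workshop data; the values are made up for illustration
df_toy <- data.frame(
  branch   = c("Social", "Social", "Physical", "Physical", "Formal"),
  h5_index = c(30, 45, 22, 51, 18)
)

# Bar chart: the number of journals in each branch
p_bar <- ggplot(data = df_toy, mapping = aes(x = branch)) +
  geom_bar() +
  labs(x = "Branch", y = "Number of Journals") +
  theme_bw()

# Box plot: the distribution of the H5 Index within each branch
p_box <- ggplot(data = df_toy, mapping = aes(x = branch, y = h5_index)) +
  geom_boxplot() +
  labs(x = "Branch", y = "H5 Index") +
  theme_bw()
```

Printing `p_bar` or `p_box` inside a code chunk draws the figure; a caption can then be added with the `fig.cap` chunk option, as with the scatterplot.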
--- class: center, middle # Tables --- ## Tables — Markdown Syntax The following syntax, outside code chunks, introduces tables that `pandoc` can recognise .pull-left[ <br> ```r First Column Second Column ------------ ------------- First cell First cell Second cell Second cell Third cell Third cell ``` ] .pull-right[ <br> | First Column | Second Column | |:------------- |:--------------| | First cell | First cell | | Second cell | Second cell | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax The position of headers, relative to their line underneath, defines column alignments .pull-left[ <br> ```r Left-Aligned Centered ---------------- ---------------- First cell First cell Second cell Second cell Third cell Third cell ``` ] .pull-right[ <br> | Left-Aligned | Centered | |:------------------------------------- | :-----------------------------------: | | First cell | First cell | | Second cell| Second cell | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax A line starting with a colon, placed before or after tables, introduces captions <br> .pull-left[ ```r Centered Right-Aligned ---------------- ---------------- First cell First cell Second cell Second cell Third cell Third cell *: A hand-made table with R Markdown ``` ] .pull-right[ <center>Table 1: A hand-made table with R Markdown</center> <br> | Centered | Right-Aligned | |:----------------------------------------------: | --------------------------------------------------: | | First cell | First cell | | Second cell| Second cell | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax The caption line itself needs to be surrounded by empty lines <br> .pull-left[ ```r Centered Right-Aligned ---------------- ---------------- First cell First cell Second cell Second cell Third cell Third cell * : A hand-made table with R Markdown * ``` ] .pull-right[ <center>Table 1: A hand-made table with R Markdown</center> <br> | Centered | Right-Aligned | 
|:----------------------------------------------: | --------------------------------------------------: | | First cell | First cell | | Second cell| Second cell | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax Tables are numbered automatically <br> .pull-left[ ```r : A hand-made table with R Markdown Centered Right-Aligned ---------------- ---------------- First cell First cell Second cell Second cell Third cell Third cell ``` ] .pull-right[ <center>.yellow-h[Table 1]: A hand-made table with R Markdown</center> <br> | Centered | Right-Aligned | |:----------------------------------------------: | --------------------------------------------------: | | First cell | First cell | | Second cell| Second cell | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax Grid tables, with the following syntax, can handle complex cells with multiple lines and/or lists .pull-left[ ```r +--------------------+--------------------+ | First Column | Second Column | +====================+====================+ | - First item | First cell | | - Second item | | | - Third item | | +--------------------+--------------------+ |Second cell | Second cell with a | | | long text | +--------------------+--------------------+ | Third cell | Third cell | | | | +--------------------+--------------------+ : A grid table with multi-line cells ``` ] .pull-right[ <br> <center>Table 1: A grid table with multi-line cells</center> <br> | First Column | Second Column | |:------------------------------------------ |:--------------------------------------------------| | - First item <br> - Second item <br> - Third item | First cell | | Second cell | Second cell with a long<br>text | | Third cell | Third cell | ] --- ## Tables — Markdown Syntax Grid tables can be aligned as well, with colons at the boundaries of the header separator<sup>*</sup> .pull-left[ ```r +--------------------+--------------------+ | Left-Aligned | Centered | *+:===================+:==================:+ | - First 
item | First cell | | - Second item | | | - Third item | | +--------------------+--------------------+ |Second cell | Second cell with a | | | long text | +--------------------+--------------------+ | Third cell | Third cell | | | | +--------------------+--------------------+ : A grid table with multi-line cells ``` ] .pull-right[ <br> <center>Table 1: A grid table with multi-line cells</center> <br> | Left-Aligned | Centered | |:--------------------------------------|:---------------------------------:| | - First item <br> - Second item <br> - Third item | First cell | | Second cell | Second cell with a<br>long text | | Third cell | Third cell | ] .footnote[ <sup>*</sup> Use .yellow-h[:=] for left-aligned, .yellow-h[:=:] for centered, .yellow-h[=:] for right-aligned columns. ] --- class: action ## Exercise — 26 26) Markdown Tables - see `reproduce_this.pdf`: table 1 on page 4 - apply in `journals.Rmd`: table 1, between paragraphs 11 and 12
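---

## Tables: Markdown Tables from Code

Hand-made Markdown tables have to be retyped whenever the underlying numbers change. As a hedged sketch of an alternative, the block below lets `knitr::kable()` write the table syntax from a data frame. The data frame `df_toy` and its values are made up for illustration, and the sketch assumes a reasonably recent version of `knitr` in which the `"pipe"` format is available.

```r
# Toy data frame; the values are made up for illustration
df_toy <- data.frame(
  Journal = c("Journal A", "Journal B", "Journal C"),
  Issues  = c(4, 6, 12)
)

# Write a Markdown (pipe) table from the data frame;
# align = "lr" left-aligns the first column and right-aligns the second
tbl <- knitr::kable(df_toy, format = "pipe", align = "lr",
                    caption = "A table generated with code")
tbl
```

This produces a pipe table rather than the simple or grid tables shown on the previous slides, so it is an alternative route to a similar result, not a replacement for that syntax.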
--- ## Tables — `stargazer` — Overview - A capable package for creating at least three kinds of tables - raw data, in columns and rows - descriptive/summary statistics - regression models - Used widely by academics, even though it has not been updated since 2018 - Creates LaTeX code, HTML/CSS code, and ASCII text to be knitted - A lot is written on this package. See, for example, - the [package documentation](https://www.rdocumentation.org/packages/stargazer/versions/5.2.2) - this [vignette](https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf) by its author Marek Hlavac - this [tutorial](https://www.jakeruss.com/cheatsheets/stargazer/) by Jake Russ - Among its alternatives are the `knitr`, `kableExtra`, and `huxtable` packages --- ## Tables — `stargazer` — Notes - The `stargazer` package requires specific settings - in the chunk options - and, in the `type` argument of the `stargazer()` function <br> - These settings depend on the desired output format,<sup>*</sup> as shown below .pull-left[ | Output | Chunk Option | Type Argument | |:------------- |:----------------|:-------------- | | LaTeX / PDF | results="asis" | latex | | HTML | results="asis" | html | | Word | comment="" | text | ] .footnote[ <sup>*</sup> The following slides use the setting for LaTeX and PDF outputs. 
] --- ## Tables — `stargazer` — Notes - `stargazer` tables look slightly different in different output formats - on the following slides, they will have the HTML look - even if the slides display the setting for LaTeX and PDF outputs <br> - In fact, it is currently not quite possible to `knit` `stargazer` code into tables in Word documents - though it can `knit` ASCII text, looking like a table - some popular workarounds: - `knit` to HTML as well as Word, copy the tables from HTML to Word - `knit` to PDF, open the PDF in Word - use a different package to create tables, such as `huxtable` --- ## Tables — `stargazer` — Basics - The `stargazer()` function - this is probably the only function you will ever use from this package - but it accepts many, many arguments to customise tables -- <br> - The `data` argument of that function, with two main options 1. a data frame for data or summary statistics tables - e.g., `df`, here coming from <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">df <- read_csv("journals.csv")</span> <br> 2. 
one or more regression models for regression tables - e.g., `lm1`, here coming from <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">lm1 <- lm(h5_index ~ issues, data = df)</span> --- ## Tables — `stargazer` — Data Tables Table the first four rows of the dataset ````md ```{r, data_table, `echo=FALSE`, `results="asis"`} stargazer(data = head(df, n = 4), `type = "latex"`, `summary = FALSE`) ``` ```` -- <br> Notice the options of the chunk and the arguments of the function - with .inline-c[echo=FALSE], the code will not be displayed in the output document -- - with .inline-c[results="asis"], `knitr` will pass the results through without reformatting them - these results are produced in LaTeX, due to <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">type = "latex"</span> - they should remain LaTeX because our output document is a PDF, converted from LaTeX -- - with .inline-c[summary = FALSE], the table will present the data, not its descriptive statistics --- ## Tables — `stargazer` — Data Tables Table the first four rows of the dataset ````md ```{r, data_table, echo=FALSE, results="asis"} stargazer(data = head(df, n = 4), type = "latex", summary = FALSE) ``` ```` .medium[ .out-t[% Table created by stargazer v.5.2.2 by Marek Hlavac, Harvard University. 
E-mail: hlavac at fas.harvard.edu % Date and time: Fri, Apr 10, 2020 - 12:31:21] <br> <center>Table 1: </center> <table style="text-align:center"><tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>name</td><td>origin</td><td>branch</td><td>h5_index</td><td>h5_median</td><td>english</td><td>subfield</td><td>issues</td><td>age</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">1</td><td>Journal of Bears</td><td>Americas</td><td>Physical</td><td>73</td><td>97</td><td>1</td><td>1</td><td>7</td><td>61</td></tr> <tr><td style="text-align:left">2</td><td>Journal of Moon</td><td>Asia</td><td>Social</td><td>72</td><td>106</td><td>1</td><td>0</td><td>6</td><td>64</td></tr> <tr><td style="text-align:left">3</td><td>Journal of Lumber</td><td>Americas</td><td>Physical</td><td>72</td><td>100</td><td>1</td><td>1</td><td>8</td><td>30</td></tr> <tr><td style="text-align:left">4</td><td>Journal of Houses</td><td>Europe</td><td>Social</td><td>72</td><td>102</td><td>1</td><td>0</td><td>8</td><td>38</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Data Tables Set .inline-c[header = FALSE] to remove the note preceding tables ````md ```{r, data_table, echo=FALSE, results="asis"} stargazer(data = head(df, n = 4), type = "latex", summary = FALSE, `header = FALSE`) ``` ```` .medium[ <center>Table 1: </center> <table style="text-align:center"><tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>name</td><td>origin</td><td>branch</td><td>h5_index</td><td>h5_median</td><td>english</td><td>subfield</td><td>issues</td><td>age</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">1</td><td>Journal of 
Bears</td><td>Americas</td><td>Physical</td><td>73</td><td>97</td><td>1</td><td>1</td><td>7</td><td>61</td></tr> <tr><td style="text-align:left">2</td><td>Journal of Moon</td><td>Asia</td><td>Social</td><td>72</td><td>106</td><td>1</td><td>0</td><td>6</td><td>64</td></tr> <tr><td style="text-align:left">3</td><td>Journal of Lumber</td><td>Americas</td><td>Physical</td><td>72</td><td>100</td><td>1</td><td>1</td><td>8</td><td>30</td></tr> <tr><td style="text-align:left">4</td><td>Journal of Houses</td><td>Europe</td><td>Social</td><td>72</td><td>102</td><td>1</td><td>0</td><td>8</td><td>38</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Data Tables Define a caption with the `title` argument ````md ```{r, data_table, echo=FALSE, results="asis"} stargazer(data = head(df, n = 4), type = "latex", summary = FALSE, `header = FALSE`, `title = "First four rows of the dataset"`) ``` ```` .medium[ <center>Table 1: First four rows of the dataset</center> <table style="text-align:center"><tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td>name</td><td>origin</td><td>branch</td><td>h5_index</td><td>h5_median</td><td>english</td><td>subfield</td><td>issues</td><td>age</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">1</td><td>Journal of Bears</td><td>Americas</td><td>Physical</td><td>73</td><td>97</td><td>1</td><td>1</td><td>7</td><td>61</td></tr> <tr><td style="text-align:left">2</td><td>Journal of Moon</td><td>Asia</td><td>Social</td><td>72</td><td>106</td><td>1</td><td>0</td><td>6</td><td>64</td></tr> <tr><td style="text-align:left">3</td><td>Journal of Lumber</td><td>Americas</td><td>Physical</td><td>72</td><td>100</td><td>1</td><td>1</td><td>8</td><td>30</td></tr> <tr><td style="text-align:left">4</td><td>Journal of 
Houses</td><td>Europe</td><td>Social</td><td>72</td><td>102</td><td>1</td><td>0</td><td>8</td><td>38</td></tr> <tr><td colspan="10" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Summary Statistics Tables Create a table of summary statistics instead, for the complete dataset ````md ```{r, summary_table, echo=FALSE, results="asis"} stargazer(`data = df`, type = "latex", `summary = TRUE`, header = FALSE, title = "Descriptive statistics") ``` ```` .medium[ <center>Table 1: Descriptive statistics</center> <table style="text-align:center"><tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Statistic</td><td>N</td><td>Mean</td><td>St. Dev.</td><td>Min</td><td>Max</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">h5_index</td><td>1,091</td><td>26.361</td><td>13.814</td><td>1</td><td>73</td></tr> <tr><td style="text-align:left">h5_median</td><td>1,091</td><td>39.400</td><td>21.272</td><td>3</td><td>109</td></tr> <tr><td style="text-align:left">issues</td><td>1,091</td><td>4.676</td><td>1.786</td><td>1</td><td>12</td></tr> <tr><td style="text-align:left">age</td><td>1,091</td><td>42.902</td><td>26.370</td><td>1</td><td>158</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Summary Statistics Tables Keep only a selection of statistics ````md ```{r, summary_table, echo=FALSE, results="asis"} stargazer(data = df, type = "latex", summary = TRUE, header = FALSE, title = "Descriptive statistics", `summary.stat = c("n", "mean", "sd", "min", "max")`) ``` ```` .medium[ <center>Table 1: Descriptive statistics</center> <table style="text-align:center"><tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Statistic</td><td>N</td><td>Mean</td><td>St. 
Dev.</td><td>Min</td><td>Max</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">h5_index</td><td>1,091</td><td>26.361</td><td>13.814</td><td>1</td><td>73</td></tr> <tr><td style="text-align:left">h5_median</td><td>1,091</td><td>39.400</td><td>21.272</td><td>3</td><td>109</td></tr> <tr><td style="text-align:left">issues</td><td>1,091</td><td>4.676</td><td>1.786</td><td>1</td><td>12</td></tr> <tr><td style="text-align:left">age</td><td>1,091</td><td>42.902</td><td>26.370</td><td>1</td><td>158</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Summary Statistics Tables Omit a selection of statistics for the same effect ````md ```{r, summary_table, echo=FALSE, results="asis"} stargazer(data = df, type = "latex", summary = TRUE, header = FALSE, title = "Descriptive statistics", `omit.summary.stat = c("p25", "p75")`) ``` ```` .medium[ <center>Table 1: Descriptive statistics</center> <table style="text-align:center"><tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Statistic</td><td>N</td><td>Mean</td><td>St. 
Dev.</td><td>Min</td><td>Max</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">h5_index</td><td>1,091</td><td>26.361</td><td>13.814</td><td>1</td><td>73</td></tr> <tr><td style="text-align:left">h5_median</td><td>1,091</td><td>39.400</td><td>21.272</td><td>3</td><td>109</td></tr> <tr><td style="text-align:left">issues</td><td>1,091</td><td>4.676</td><td>1.786</td><td>1</td><td>12</td></tr> <tr><td style="text-align:left">age</td><td>1,091</td><td>42.902</td><td>26.370</td><td>1</td><td>158</td></tr> <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr></table> ] --- ## Tables — `stargazer` — Summary Statistics Tables Flip the table ````md ```{r, summary_table, echo=FALSE, results="asis"} stargazer(data = df, type = "latex", summary = TRUE, header = FALSE, `flip = TRUE`, title = "Descriptive statistics", omit.summary.stat = c("p25", "p75")) ``` ```` .medium[ <center>Table 1: Descriptive statistics</center> <table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Statistic</td><td>h5_index</td><td>h5_median</td><td>issues</td><td>age</td></tr> <tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">N</td><td>1,091</td><td>1,091</td><td>1,091</td><td>1,091</td></tr> <tr><td style="text-align:left">Mean</td><td>26.361</td><td>39.400</td><td>4.676</td><td>42.902</td></tr> <tr><td style="text-align:left">St. 
Dev.</td><td>13.814</td><td>21.272</td><td>1.786</td><td>26.370</td></tr> <tr><td style="text-align:left">Min</td><td>1</td><td>3</td><td>1</td><td>1</td></tr> <tr><td style="text-align:left">Max</td><td>73</td><td>109</td><td>12</td><td>158</td></tr> <tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr></table> ] --- class: action ## Exercise — 27 27) Summary Statistics Tables - see `reproduce_this.pdf`: table 2 on page 8 - apply in `journals.Rmd`: table 2, between paragraphs 23 and 24
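---

## Tables: Checking `stargazer` Output as Text

The summary statistics slides show the knitted tables only. As a minimal sketch for checking a table before knitting, the block below uses the `"text"` type, which prints an ASCII version of the table in the console. The data frame `df_toy` and its values are made up for illustration, and the data frame is passed as the first argument of the function.

```r
library(stargazer)

# Toy data frame; the values are made up for illustration
df_toy <- data.frame(
  h5_index = c(10, 25, 40, 31),
  issues   = c(4, 6, 12, 2)
)

# type = "text" prints an ASCII table, which can help to check the content
# before switching to "latex" or "html" for the final document
out <- stargazer(df_toy, type = "text", summary = TRUE,
                 summary.stat = c("n", "mean", "sd", "min", "max"),
                 title = "Descriptive statistics")
```

Once the table looks right, the same call can be moved into a chunk with the type and chunk option that match the output format.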
--- ## Tables — `stargazer` — Regression Tables .pull-left-c[ Create a table of regression models instead ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(`data = lm(h5_index ~ issues, data = df)`, type = "latex", header = FALSE, title = "Regression Results") ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td>h5_index</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">issues</td><td>1.913<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td></tr> <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.060</td></tr> <tr><td style="text-align:left">Residual Std. 
Error</td><td>13.391 (df = 1089)</td></tr> <tr><td style="text-align:left">F Statistic</td><td>70.959<sup>***</sup> (df = 1; 1089)</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] ] --- ## Tables — `stargazer` — Regression Tables .pull-left-c[ Models can also be estimated outside the function first ````md ```{r, regression_table, echo=FALSE, results="asis"} `lm1 <- lm(h5_index ~ issues, data = df)` stargazer(`data = lm1`, type = "latex", header = FALSE, title = "Regression Results") ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td>h5_index</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">issues</td><td>1.913<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td></tr> <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.060</td></tr> <tr><td style="text-align:left">Residual Std. 
Error</td><td>13.391 (df = 1089)</td></tr> <tr><td style="text-align:left">F Statistic</td><td>70.959<sup>***</sup> (df = 1; 1089)</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] ] --- ## Tables — `stargazer` — Regression Tables .pull-left-c[ Keep only a selection of statistics ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(data = lm1, type = "latex", header = FALSE, title = "Regression Results", `keep.stat = c("n", "rsq")`) ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td>h5_index</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">issues</td><td>1.913<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td></tr> <tr><td style="text-align:left"></td><td></td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td></tr> <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] ] --- ## Tables — `stargazer` — Regression Tables .pull-left-c[ 
Display multiple models in the same table ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(`data = list(lm1, lm2)`, type = "latex", header = FALSE, title = "Regression Results", keep.stat = c("n", "rsq")) ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td colspan="2">h5_index</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">issues</td><td>1.913<sup>***</sup></td><td>1.424<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td><td>(0.212)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">english1</td><td></td><td>17.262<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td>(1.244)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td><td>4.226<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td><td>(1.415)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td><td>0.202</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] ] --- ## Tables — `stargazer` 
— Regression Tables .pull-left-c[ Change variable labels ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(data = list(lm1, lm2), type = "latex", header = FALSE, title = "Regression Results", keep.stat = c("n", "rsq"), `dep.var.labels = "H5 Index"`, `covariate.labels = c("Issues", "English")`) ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td colspan="2">H5 Index</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Issues</td><td>1.913<sup>***</sup></td><td>1.424<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td><td>(0.212)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">English</td><td></td><td>17.262<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td>(1.244)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td><td>4.226<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td><td>(1.415)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td><td>0.202</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; 
<sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] ] --- ## Tables — `stargazer` — Regression Tables .pull-left-c[ Change significance levels ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(data = list(lm1, lm2), type = "latex", header = FALSE, title = "Regression Results", keep.stat = c("n", "rsq"), dep.var.labels = "H5 Index", covariate.labels = c("Issues", "English"), `star.cutoffs = c(0.05, 0.01, 0.001)`) ``` ```` ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td colspan="2">H5 Index</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Issues</td><td>1.913<sup>***</sup></td><td>1.424<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td><td>(0.212)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">English</td><td></td><td>17.262<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td>(1.244)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td><td>4.226<sup>**</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td><td>(1.415)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td><td>1,091</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.061</td><td>0.202</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid 
black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.05; <sup>**</sup>p<0.01; <sup>***</sup>p<0.001</td></tr> </table> ] ] --- class: action ## Exercise — 28 28) Regression Tables - see `reproduce_this.pdf`: table 3 on page 10 - apply in `journals.Rmd`: table 3, between paragraphs 30 and 31
--- name: part7 class: inverse, center, middle # Part 7. Addressing Functionality Gaps .footnote[ [Back to the contents slide](#contents-slide). ] --- ## Functionality Gaps - Not everything is possible to achieve with R Markdown syntax, code chunks, and/or code - e.g., centering text, increasing the space between the lines of text -- <br> - Workarounds are available through inclusion of other languages and/or syntaxes in .Rmd documents - e.g., incorporating HTML or LaTeX code into R Markdown - workarounds might be output specific - e.g., LaTeX-based workarounds may work only for LaTeX and PDF outputs -- <br> - There is no exhaustive list of gaps or workarounds - these are specific to the output you want to achieve and the problems you encounter - after writing a few manuscripts with R Markdown, you will have addressed most typical gaps in your workflow --- ## Functionality Gaps — Examples #### Problem: How can we cross-reference figures, tables, and equations in R Markdown? #### Solution: Insert a LaTeX label into the targets (figures, tables, and equations), and then use the .inline-c[\autoref{latex_label}] syntax in text --- name: autoref-figures ## Functionality Gaps — Examples — Cross-references For .yellow-h[figures], insert a LaTeX label into the .yellow-h[`fig.cap`] option, and use the .inline-c[\autoref{latex_label}] syntax in text ````md `\autoref{scatter_plot}` visualises the relationship between the two journal metrics. ```{r ... fig.cap = "A Scatter Plot `\\label{scatter_plot}`"} ggplot(data = df) + geom_point(... ``` ```` <br> .out-t[[Figure 1](#autoref-figures) visualises the relationship between the two journal metrics.] --- name: autoref-tables ## Functionality Gaps — Examples — Cross-references For .yellow-h[Markdown tables], insert a LaTeX label after the table caption, and use the .inline-c[\autoref{latex_label}] syntax in text ```md See `\autoref{handmade_table}` for further details. 
: A hand-made table with R Markdown `\label{handmade_table}` +--------------------+--------------------+ | Left-Aligned | Centered | ... ``` <br> .out-t[See [Table 1](#autoref-tables) for further details.] --- ## Functionality Gaps — Examples — Cross-references — Note Note that there is a difference in the label syntax for figures and R Markdown tables - we use a double backslash .yellow-h[\ \] to label figures - i.e., <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">\\\label{scatter_plot}</span> because the label goes into a string - the first backslash escapes the second, which is the LaTeX backslash -- - we use a single backslash .yellow-h[\] to label R Markdown tables - i.e., <span style="background-color: #e5e5e5; border-radius: 3px; padding: 4px; font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;">\label{handmade_table}</span> because the label is not in any string - there is no need for the escape operator --- class: action ## Exercises — 29–30 29) Referring to Figures - see `reproduce_this.pdf`: pages 6 and 9 - apply in `journals.Rmd`: paragraphs 19, 21, and 27 <br> 30) Referring to Markdown Tables - see `reproduce_this.pdf`: page 4 - apply in `journals.Rmd`: paragraph 11
--- name: autoref-tables-2 ## Functionality Gaps — Examples — Cross-references For .yellow-h[`stargazer` tables], define a label with the `label` argument, and use the .inline-c[\autoref{latex_label}] syntax in text .pull-left-c[ ````md ```{r, regression_table, echo=FALSE, results="asis"} stargazer(data = list(lm1, lm2), type = "latex", ... `label = "regression_results"`) ``` `\autoref{regression_results}` provides results from two OLS models. ```` .out-t[[Table 1](#autoref-tables-2) provides results from two OLS models.] ] .pull-right-c[ .medium[ <center>Table 1: Regression Results</center> <table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td colspan="2">H5 Index</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Issues</td><td>1.913<sup>***</sup></td><td>1.424<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td>(0.227)</td><td>(0.212)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">English</td><td></td><td>17.262<sup>***</sup></td></tr> <tr><td style="text-align:left"></td><td></td><td>(1.244)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td style="text-align:left">Constant</td><td>17.415<sup>***</sup></td><td>4.226<sup>**</sup></td></tr> <tr><td style="text-align:left"></td><td>(1.137)</td><td>(1.415)</td></tr> <tr><td style="text-align:left"></td><td></td><td></td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>1,091</td><td>1,091</td></tr> <tr><td 
style="text-align:left">R<sup>2</sup></td><td>0.061</td><td>0.202</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.05; <sup>**</sup>p<0.01; <sup>***</sup>p<0.001</td></tr> </table> ] ] --- ## Functionality Gaps — Examples — Cross-references — Note Note that we can cross-reference specific results in tables as well - there is no gap here; this is possible with inline code ```r In Model 1, the coefficient for _Issues_ is `r round(coef(summary(lm1))["issues", "Estimate"], digits = 2)`. ``` .out-t[In Model 1, the coefficient for _Issues_ is 1.91.] --- name: autoref-equation ## Functionality Gaps — Examples — Cross-references For .yellow-h[equations], insert a LaTeX label in an equation environment, and use the .inline-c[\autoref{latex_label}] syntax in text ```md \begin{equation} \label{special_relativity} E = mc^{2} \end{equation} According to \autoref{special_relativity}, space and time are linked. ``` <br> .out-t[According to [Equation 1](#autoref-equation), space and time are linked.] --- class: action ## Exercises — 31–33 31) Referring to Tables - see `reproduce_this.pdf`: pages 7 and 9 - apply in `journals.Rmd`: paragraphs 23 and 29 <br> 32) Referring to Results in Regression Tables - see `reproduce_this.pdf`: page 9 - apply in `journals.Rmd`: paragraph 29 - hint: to extract the standard error from the model, use the column `Std. Error` <br> 33) Referring to Equations - see `reproduce_this.pdf`: page 7 - apply in `journals.Rmd`: paragraph 22 - hint: transform the existing equation from R Markdown to LaTeX syntax, to be able to insert the label
--- ## Functionality Gaps — Examples #### Problem: R Markdown adds the list of references to the end of documents. This might be undesirable for some manuscripts, for example those with an appendix. Similarly, some journals require tables and figures to be added after references. -- #### Solution: Define where exactly the list of references should appear with the HTML code .inline-c[<div id].inline-c[="refs"></div>] ```r # References <div id = "refs"></div> # Appendix ``` --- ## Functionality Gaps — Examples #### Problem: R Markdown produces outputs with single-line-spaced text while we might prefer or be required (e.g., by journal submission rules) to double-space our manuscripts. #### Solution: Use the `doublespacing` command from the LaTeX package `setspace` <a name=cite-setspace></a>([Carlisle, Fairbairns, Harris, and Tobin, 2011](https://ctan.org/pkg/setspace)) - because the command comes from a package, we need to add it to YAML with `header-includes` - including commands in YAML ensures they are applied throughout the output<sup>*</sup> ```yaml --- ... header-includes: - \usepackage{setspace}\doublespacing --- ``` .footnote[ <sup>*</sup> This can be reversed anywhere in text, with the `singlespacing` command. ] --- class: action ## Exercise — 34 34) Line Spacing - introduce 1.5 spacing to the manuscript - hint: the command is called `onehalfspacing` <br> - except for the Abstract, which should be single spaced
--- ## Functionality Gaps — Examples #### Problem: Pages, tables, figures etc. are numbered continuously across an output. We might prefer or be required (e.g., by journal submission rules) to change this behaviour, for example for appendices. #### Solution: Use the `setcounter` command in combination with the `renewcommand` command, outside code chunks ```r \setcounter{page}{1} \renewcommand*{\thepage}{A\arabic{page}} \setcounter{table}{0} \renewcommand*{\thetable}{A\arabic{table}} \setcounter{figure}{0} \renewcommand*{\thefigure}{A\arabic{figure}} ``` --- name: part8 class: inverse, center, middle # Part 8. Using Version Control .footnote[ [Back to the contents slide](#contents-slide). ] --- ## Version Control - Research papers have many versions before publication - typically written over a long period of time, in numerous sittings - at the end of every sitting, essentially a different version of the same manuscript is created<sup>*</sup> .footnote[ <sup>*</sup> They are also often written by multiple authors and/or on different computers, increasing the number of versions created. Here I assume projects are single-authored on a single computer, leaving the topic of collaboration (including, with oneself) to the next section — Part 9. 
] --- ## Version Control - Research papers have many versions before publication - typically written over a long period of time, in numerous sittings - at the end of every sitting, essentially a different version of the same manuscript is created <br> - With many versions created over time, there emerge at least two challenges - keeping track of changes and versions - reverting to a previous version when necessary -- <br> - We all do version control, in different ways, such as - editing, renaming, and saving files - using applications or websites such as Dropbox, Google Docs, Overleaf - using distributed version control systems such as Git and GitHub --- ## Version Control — Manual Attempts Typically, manual attempts at version control lead to cluttered folders ```r manuscript | |- journals_FINAL_19May.Rmd |- journals_FINAL.Rmd |- journals_26APRIL_newliterature.Rmd ... |- journals.Rproj |- references.bib |- apa_7th.csl ``` --- ## Version Control — Git and GitHub — Definitions - Git - software that keeps track of versions of a set of files - it is *local* to you; the records are kept on your computer -- <br> - GitHub - a hosting service, or a website, that can keep the records - it is *remote* to you, like the Dropbox website - but unlike Dropbox, GitHub is specifically structured to keep records with Git -- <br> - Repository, or repo - a set of files whose records are kept together, by Git and/or on GitHub - it is like a folder, which can keep files and other folders containing files --- ## Version Control — Git and GitHub — Definitions - To commit - to take a snapshot of, or to version, a repository - it is like saving a new version of all files and sub-folders in your project folder with a new name - it is local; the records are kept on your computer unless you push -- <br> - To push - to move a copy of the records from Git to GitHub, from your computer to an online server - it is like uploading (the new versions of) your files and sub-folders to a website - it also 
involves merging, if this is not the first push<sup>*</sup> .footnote[ <sup>*</sup> For projects that are single-authored on a single computer, merging is typically automatic. It becomes an issue for collaborated projects, which we will cover in the next section — Part 9. ] --- ## Version Control — Git and GitHub Version control with Git and GitHub requires 1. .yellow-h[initial setup], done once<sup>*</sup> - unless for a new computer or, if ever, a new GitHub account - a bit technical, but worth the hassle 2. .yellow-h[project setup], repeated for every RStudio project - shorter, less complicated .footnote[ <sup>*</sup> We have started this process already, in Part 1 of the workshop, by downloading and installing Git and signing up for GitHub. [Back to the relevant slide](#install-git). ] --- class: action ## Version Control — Git — Initial Setup 1) Enable version control with RStudio - from the RStudio menu, follow: > `Tools -> Global Options -> Git/SVN -> Enable version control interface for RStudio projects` <br> - RStudio will likely find Git automatically - In case it cannot do so on its own, help RStudio find it by clicking `Browse...` - Git is likely to be at - `c:/Program Files/Git/bin/git.exe` on Windows - `/usr/local/git/bin/git` on Mac --- class: action ## Version Control — Git — Initial Setup 2) If you are using Windows, set Git Bash as your shell - from the RStudio menu, follow: > `Tools -> Global Options -> Terminal -> New terminals open with: Git Bash` --- class: action ## Version Control — Git — Initial Setup 3) Introduce yourself to Git - from the RStudio menu, follow: > `Tools -> Terminal -> New Terminal` - enter the following lines in the Terminal, with the email address that you have used to sign up for GitHub ```yaml git config --global user.name "YOUR-NAME" git config --global user.email "YOUR-EMAIL-ADDRESS" ``` - enter the following line in the Terminal, to observe whether the previous step was successful ```yaml git config --global --list ``` --- class: 
action ## Version Control — Git and GitHub — Project Setup<sup>*</sup> 1) Initiate local version control with Git - from the RStudio menu, follow: > `Tools -> Version Control -> Project Setup... -> Version Control System -> Git` <br> - after confirming your new repository, and restarting the session, observe that - there is now a Git tab in RStudio - newly-added and/or edited files, since the last commit, are listed here - your project now includes a `.gitignore` file - this is where you can list files and/or folders to be excluded from being tracked .footnote[ <sup>*</sup> These instructions presume there is an existing RStudio project to be set up for version control. If not, or to start a new project, follow from [this slide](#rstudio-project) first. ] --- class: action ## Version Control — Git and GitHub — Project Setup 2) Create a new GitHub repository - on GitHub, follow: > `Repositories -> New -> Repository name (e.g., "rwd_workshop") -> Public -> Create repository` <br> - observe the structure of the repository address - i.e., https://github.com/USER_NAME/REPOSITORY_NAME - this is the address to view the repository online - for use in the `Terminal`, the address gets the `.git` extension - i.e., https://github.com/USER_NAME/REPOSITORY_NAME.git --- class: action ## Version Control — Git and GitHub — Project Setup 3) Push an existing repository - from the RStudio menu, follow: > `Tools -> Terminal -> New Terminal` - enter the following lines in the `Terminal`, with .yellow-h[your username and repository name] ```yaml git remote add origin https://github.com/`USER_NAME/REPOSITORY_NAME`.git git add . 
git commit -m "first commit" git push -u origin master ``` -- - if this is your first time using GitHub with RStudio, you will be prompted to authenticate - follow the instructions on your screen and in your email - observe that your project files are now online, listed on the GitHub repository --- class: action name: version-control-workflow ## Version Control — Git and Github — Workflow 1) Edit and Save - work on one or more files under version control - e.g., delete the first sentence of the abstract in `journals.Rmd`, and save it - under the Git tab in RStudio, find the list of files that you edited since the last push - these will have `M`, for modified, as `Status` -- 2) Commit and Push - tick `Staged`<sup>*</sup> for one or more files that you would like to commit - enter a `Commit message` that summarises the edits - click `Commit` to create a record of the new version locally to your computer - click `Close -> Push` to push the version to GitHub .footnote[ <sup>*</sup> To stage is to add files to be committed. It allows us to commit files individually or together with other files. 
] --- class: action name: git-github-basic-workflow ## Version Control — Git and GitHub — Workflow 1) Edit and Save - work on one or more files under version control - e.g., delete the first sentence of the abstract in `journals.Rmd`, and save it - under the Git tab in RStudio, find the list of files that you edited since the last push - these will have `M`, for modified, as `Status` <br> 2) Commit and Push - tick `Staged` for one or more files that you would like to commit - enter a `Commit message` that summarises the edits - click `Commit` to create a record of the new version locally to your computer - click `Close -> Push` to push the version to GitHub <br> - observe the changes in the Git tab in RStudio and on the GitHub repository --- ## Version Control — Git and GitHub — `.gitignore` - `.gitignore` specifies which file(s) and/or folder(s) should be excluded from version control - a set of project-specific files are ignored by default - see your `.gitignore` file -- <br> - `.gitignore` lists one item per line - each line has a pattern, which determines whether one or more files or folders are to be ignored -- <br> - See the documentation at <https://git-scm.com/docs/gitignore> - for pattern formats and further details --- ## Version Control — Git and GitHub — `.gitignore` There are good reasons to ignore some other files as well, including those - that contain information that we do not want others to see - e.g., personal API keys <br> - that we do not have the right to share with others - e.g., secondary data whose user agreements prohibit sharing <br> - that we (re-)create automatically as outputs - e.g., `journals.pdf`, as opposed to `journals.Rmd` --- ## Version Control — Git and GitHub — `.gitignore` .pull-left[ - Observe that, by default, `.gitignore` has a list of project-specific files - you can delete, or comment out, any or all to start including them in version control ] .pull-right[ ```r .Rproj.user .Rhistory .RData .Ruserdata ``` ] --- ## Version Control — Git and 
Github — `.gitignore` .pull-left[ - Observe that, by default, `.gitignore` has a list of project-specific files - In addition, you can ignore, for example, - <span style="background-color: #ffff88;">a specific folder</span>, relative to the root directory ] .pull-right[ ```r .Rproj.user .Rhistory .RData .Ruserdata */manuscript/ ``` ] --- ## Version Control — Git and Github — `.gitignore` .pull-left[ - Observe that, by default, `.gitignore` has a list of project-specific files - In addition, you can ignore, for example, - a specific folder, relative to the root directory - <span style="background-color: #ffff88;">a specific file in a specific folder</span>, relative to the root directory ] .pull-right[ ```r .Rproj.user .Rhistory .RData .Ruserdata /manuscript/ */manuscript/journals.pdf ``` ] --- ## Version Control — Git and Github — `.gitignore` .pull-left[ - Observe that, by default, `.gitignore` has a list of project-specific files - In addition, you can ignore, for example, - a specific folder, relative to the root directory - a specific file in a specific folder, relative to the root directory - <span style="background-color: #ffff88;">a specific file in any folder</span> ] .pull-right[ ```r .Rproj.user .Rhistory .RData .Ruserdata /manuscript/ /manuscript/journals.pdf *journals.pdf ``` ] --- ## Version Control — Git and Github — `.gitignore` .pull-left[ - Observe that, by default, `.gitignore` has a list of project-specific files - In addition, you can ignore, for example, - a specific folder, relative to the root directory - a specific file in a specific folder, relative to the root directory - a specific file in any folder - <span style="background-color: #ffff88;">all files with a specific extension</span>, anywhere in the project ] .pull-right[ ```r .Rproj.user .Rhistory .RData .Ruserdata /manuscript/ /manuscript/journals.pdf journals.pdf **.pdf ``` ] --- ## Version Control — Git and Github — `.gitignore` — Notes - There are many other pattern formats - see 
the documentation at <https://git-scm.com/docs/gitignore> -- - Starting to ignore a file or folder that is already being tracked requires clearing the cache - after changing and saving `.gitignore`, enter the following line in the `Terminal` - with your specific `/path/to/file` ```yaml git rm --cached /path/to/file ``` -- - The following command clears *all* cache - might be useful after changes to `.gitignore` that involve several files or folders - but should be used with care, on an otherwise up-to-date repository ```yaml git rm -r --cached . ``` --- class: action ## Exercises — 35–36 35) Reproducibility and Version Control - imagine that, after producing all these tables and figures, and writing up your results, you have decided to exclude journals from Oceania from analysis - hint: use the `filter` function in the data chunk - create a new version of the manuscript - commit and push to GitHub 36) Gitignore - stop tracking `journals.pdf` - change `.gitignore` - remove `journals.pdf` from cache - commit and push to GitHub
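---

## Version Control: `.gitignore` in the Terminal

The notes on clearing the cache can be tried out safely in a throwaway repository. The following is a minimal sketch, not part of the workshop materials: it assumes Git is installed, and the scratch folder, author name, and email address are hypothetical placeholders. It tracks a PDF, starts ignoring it, removes it from the index with `git rm --cached`, and then asks Git which pattern is responsible.

```shell
set -eu

# Hypothetical scratch repository; nothing here touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo Author"          # placeholder identity
git config user.email "demo@example.com"

# A manuscript folder with a source file and an output file, both tracked.
mkdir manuscript
echo "draft" > manuscript/journals.Rmd
echo "output" > manuscript/journals.pdf
git add .
git commit -q -m "first commit"

# Start ignoring all PDF files, anywhere in the project.
echo "*.pdf" > .gitignore

# The PDF is already tracked, so it has to be removed from the index;
# the copy on disk is left in place.
git rm -q --cached manuscript/journals.pdf
git add .gitignore
git commit -q -m "stop tracking journals.pdf"

# List the tracked files, then ask Git which pattern ignores the PDF.
git ls-files
git check-ignore -v manuscript/journals.pdf
```

`git check-ignore -v` reports the `.gitignore` line that matches a path, which is a quick way to check a pattern before committing it.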
--- name: part9 class: inverse, center, middle # Part 9. Collaborating with Others .footnote[ [Back to the contents slide](#contents-slide). ] --- ## Collaboration - Many research papers are written by multiple authors and/or on multiple computers - working by yourself on different computers (e.g., laptop at home, desktop at office) poses similar challenges to collaboration -- - With multiple authors and/or computers, there emerge at least two additional challenges beyond version control - communicating the versions to other authors and/or computers - working on the same project with co-authors at the same time -- - We all manage collaboration, in different ways, such as - editing, renaming, saving, and e-mailing files - using applications or websites such as Dropbox, Google Docs, Overleaf - using distributed version control systems such as Git and GitHub --- ## Collaboration — Git and GitHub — Definitions - To pull - to move the (presumably) up-to-date records from GitHub to your computer - it is like downloading a zipped folder of files -- - To merge - to integrate different versions into a single version - e.g., the old version on your laptop, with (the changes in) the new version from GitHub - except for the first push or pull, pushing and pulling necessitate merging -- - Merge conflict - emerges when versions to be merged include edits *on the same line of the same file* - edits on different lines are not a problem as changes are tracked line by line <br> - less likely to occur in one-author-multiple-computer setting - more likely while collaborating with others <br> - requires human intervention, to decide which edit to keep and which one to discard --- ## Collaboration — Git and GitHub — Definitions - Branch - a line of development in a repository; a copy of the repository, with all its versions, at a given time - by default, repositories have one branch, called *master* - Pull request - a proposal to pull and merge - e.g., a proposal from one co-author to another, - e.g., to merge a branch 
into master - it allows a review of changes on GitHub before merging, to deal with potential merge conflicts --- ## Collaboration — Git and GitHub — Project Setup - The setup depends on the user's role, on whether they are - the *owner* who creates the GitHub repository, or - the *collaborator* who is then added to that repository -- - Once the project is set up - it continues to be associated with the owner's GitHub profile - at the same time, it is listed under the collaborator's profile as well - both the owner and the collaborator have the same rights, unless otherwise restricted --- ## Collaboration — Git and GitHub — Project Setup — Owner 1) The setup for the owner is largely the same as in any single-author, single-computer scenario - following the instructions on [this slide](#rstudio-project) forward - to introduce version control to a local project with Git, - to create a remote repository for that project on GitHub, and - to associate the local project with the remote repository <br> -- 2) As an additional step, the owner needs to invite their collaborator(s) to the project - following, from the relevant GitHub repository, > `Settings -> Manage access -> Invite a collaborator` --- ## Collaboration — Git and GitHub — Project Setup — Collaborator 1) Notice that the remote part of the setup is done by the owner for the collaborator - subject to acceptance of the invitation - invitations are available directly at <https://github.com/notifications>, but also sent via email - with an option to "Accept invitation" - on acceptance, projects appear among the repositories of the collaborator -- 2) The local part of the setup still needs to be done - by creating a new RStudio project with version control - following, from the RStudio menu,<sup>*</sup> > `File -> New Project -> Version Control -> Git` -- - the `Repository URL`, required for the above process, is the version without the `.git` extension - in the form of https://github.com/OWNER_USER_NAME/REPOSITORY_NAME 
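---

## Collaboration: Owner and Collaborator Setup in the Terminal

The two roles described above can be rehearsed offline before trying them on GitHub. This is a minimal sketch under stated assumptions: a local bare repository stands in for the owner's GitHub repository so that the commands run without a network connection, and the folder names, identities, and the `paper.Rmd` file are hypothetical. With a real project, the path to `remote.git` would be replaced by the repository URL.

```shell
set -eu

work=$(mktemp -d)

# Stand-in for the owner's GitHub repository (an assumption made so that
# the sketch runs offline).
git init -q --bare "$work/remote.git"

# Owner: put a project under version control and push the first commit.
git init -q "$work/owner"
cd "$work/owner"
git config user.name "Owner"                 # placeholder identity
git config user.email "owner@example.com"
echo "# R Markdown" > paper.Rmd
git add paper.Rmd
git commit -q -m "first commit"
git remote add origin "$work/remote.git"
git push -q -u origin HEAD                   # push the current branch

# Collaborator: the local part of the setup is a clone of the remote.
git clone -q "$work/remote.git" "$work/collaborator"
cd "$work/collaborator"
git remote -v                                # shows where pulls and pushes go
git log --oneline                            # the owner's commit is present
```

Creating an RStudio project from version control performs the clone step through the menu; the terminal version only makes the mechanics visible.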
--- class: action ## Exercises — 37–39 37) Owner Setup - create a new version-controlled RStudio project, with Git and GitHub - add the default R Markdown template to your project - hint: click `File -> New File -> R Markdown -> OK` to create the template - another hint: name the project and the template in a way that they are easily distinguishable from your partner's project and template 38) Invitation to Collaborate - invite the partner in your current group as a collaborator to your new project - hint: you will need their username, full name, or email address to do so 39) Collaborator Setup - after accepting the invitation from your partner, make the necessary arrangements so that you can collaborate on your partner's project
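---

## Project Setup: A Command-Line Sketch

The setup above is described through the RStudio and GitHub menus. The sketch below shows a rough command-line equivalent, to illustrate what happens underneath. It rests on assumptions that are not part of the workshop materials: a bare repository on disk stands in for GitHub, and the names `paper.Rmd`, `owner`, and `collaborator` are hypothetical.

```shell
set -eu

# Throwaway identity so that commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Owner: introduce version control to a local project and commit a first file.
mkdir "$work/owner"
cd "$work/owner"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # call the first branch "master", as on the slides
printf '## R Markdown\n' > paper.Rmd
git add paper.Rmd
git commit --quiet -m "Add R Markdown template"

# Stand-in for the GitHub repository: a bare repository on disk (an assumption).
git clone --quiet --bare "$work/owner" "$work/remote.git"

# Owner: associate the local project with the remote repository.
git remote add origin "$work/remote.git"
git push --quiet -u origin master

# Collaborator: clone the repository, which is roughly what
# File -> New Project -> Version Control -> Git does in RStudio.
git clone --quiet "$work/remote.git" "$work/collaborator"

# List the collaborator's copy of the project.
ls "$work/collaborator"
```

With a real GitHub repository, the collaborator would clone the `Repository URL` instead of a local path.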
--- ## Collaboration — Git and GitHub — Workflow 1) Pull - on the Git tab in RStudio, click `Pull` to move the up-to-date records from GitHub to your computer - if your collaborator has not pushed anything since your last pull, you will be notified that your project is `Already up-to-date` - collaborative projects require pulling as well as pushing because your collaborator(s) might have pushed their commits to GitHub - pulling frequently minimises the risk of merge conflicts -- 2) Edit and save; commit and push - the same procedure as in any single-author, single-computer scenario - as described on [this slide](#git-github-basic-workflow) forward - pushing frequently minimises the risk of merge conflicts --- class: action ## Exercise — 40 40) Non-simultaneous Collaboration - take turns with your partner to work on the same document (of the same project) - *owner*: edit the first header in the document (i.e., "R Markdown"), save, commit, and push - *owner and collaborator*: observe the changes, if any, on your own `.Rmd` file, and/or on your GitHub repository - click on the relevant commit message on GitHub and observe the commit - *collaborator*: pull, revert the header to the original, save, commit, and push
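---

## Workflow: A Command-Line Sketch

The pull, edit, commit, and push cycle can also be followed outside RStudio. The sketch below is an illustration only, under the assumption that a bare repository on disk stands in for GitHub. The file name `paper.Rmd` and the directory names are hypothetical.

```shell
set -eu

# Throwaway identity so that commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Scaffold: a bare repository stands in for GitHub; both partners clone it.
mkdir "$work/seed"
cd "$work/seed"
git init --quiet
git symbolic-ref HEAD refs/heads/master
printf '## R Markdown\n\nSome text.\n' > paper.Rmd
git add paper.Rmd
git commit --quiet -m "Add R Markdown template"
git clone --quiet --bare "$work/seed" "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/owner"
git clone --quiet "$work/remote.git" "$work/collaborator"

# Owner: 1) pull, 2) edit and save; commit and push.
cd "$work/owner"
git pull --quiet --no-rebase
printf '## R Markdown, edited\n\nSome text.\n' > paper.Rmd
git commit --quiet -am "Edit first header"
git push --quiet

# Collaborator: pull to receive the owner's commit before editing anything.
cd "$work/collaborator"
git pull --quiet --no-rebase
head -n 1 paper.Rmd
```

Because the collaborator pulls before editing, the local copy is up to date and no merge is needed.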
--- ## Exercise — 40 — Notes Notice that you have not encountered any errors and/or merge conflicts - because everyone edited and merged with an up-to-date document - this is the default in a single-author, multiple-computer scenario --- class: action ## Exercise — 41 41) Simultaneous Collaboration — Different Lines - work on the same document at the same time - *owner*: edit the first header in the document (i.e., "R Markdown"), save, commit, and push - *collaborator*: edit the second header in the document (i.e., "Including Plots"), save, commit, and push - observe the error message that the last pusher will receive, follow the instructions in RStudio to solve the problem
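---

## Simultaneous Edits on Different Lines: A Command-Line Sketch

The scenario in Exercise 41 can be reproduced on a single computer. The sketch below is an illustration under assumptions: a bare repository on disk stands in for GitHub, and the file and directory names are hypothetical. It uses `git pull --no-rebase`, so that the pull merges rather than rebases.

```shell
set -eu

# Throwaway identity so that commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Scaffold: a bare repository stands in for GitHub; both partners clone it.
mkdir "$work/seed"
cd "$work/seed"
git init --quiet
git symbolic-ref HEAD refs/heads/master
printf '## R Markdown\n\nSome text.\n\n## Including Plots\n\nMore text.\n' > paper.Rmd
git add paper.Rmd
git commit --quiet -m "Add R Markdown template"
git clone --quiet --bare "$work/seed" "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/owner"
git clone --quiet "$work/remote.git" "$work/collaborator"

# Owner: edit the first header, commit, and push.
cd "$work/owner"
printf '## R Markdown (owner)\n\nSome text.\n\n## Including Plots\n\nMore text.\n' > paper.Rmd
git commit --quiet -am "Edit first header"
git push --quiet

# Collaborator: edit the second header without pulling first.
cd "$work/collaborator"
printf '## R Markdown\n\nSome text.\n\n## Including Plots (collaborator)\n\nMore text.\n' > paper.Rmd
git commit --quiet -am "Edit second header"

# The push is rejected because the remote has a commit that is missing locally.
if git push --quiet 2>/dev/null; then
  push_result="accepted"
else
  push_result="rejected"
fi
echo "First push: $push_result"

# Pulling merges the two edits automatically, on the local repository
# of the last pusher, after which the push goes through.
git pull --quiet --no-rebase --no-edit
git push --quiet
cat paper.Rmd
```

The two headers are several lines apart, which is why Git can merge the edits without human intervention.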
--- ## Exercise — 41 — Notes Notice that you have encountered an error - pulling before pushing solves the problem because the edits are not on the same line - hence, this is not a merge conflict - the merge takes place automatically, on the *local* repository of the last pusher --- class: action ## Exercise — 42 42) Simultaneous Collaboration — Same Line - work on the same document at the same time - *owner*: edit the first header in the document again, save, commit, and push - *collaborator*: edit the first header in the document as well, save, commit, and push - observe the error message that the last pusher will receive - follow the instructions in RStudio to solve the problem - search online if necessary
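---

## Simultaneous Edits on the Same Line: A Command-Line Sketch

The scenario in Exercise 42 can be reproduced in the same way. The sketch below is an illustration under assumptions: a bare repository on disk stands in for GitHub, and the file and directory names are hypothetical. It resolves the conflict by accepting the remote version with `git checkout --theirs`, which is only one of several possible resolutions.

```shell
set -eu

# Throwaway identity so that commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Scaffold: a bare repository stands in for GitHub; both partners clone it.
mkdir "$work/seed"
cd "$work/seed"
git init --quiet
git symbolic-ref HEAD refs/heads/master
printf '## R Markdown\n\nSome text.\n' > paper.Rmd
git add paper.Rmd
git commit --quiet -m "Add R Markdown template"
git clone --quiet --bare "$work/seed" "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/owner"
git clone --quiet "$work/remote.git" "$work/collaborator"

# Owner: edit the first header, commit, and push.
cd "$work/owner"
printf '## R Markdown (owner)\n\nSome text.\n' > paper.Rmd
git commit --quiet -am "Edit first header"
git push --quiet

# Collaborator: edit the same line without pulling first.
cd "$work/collaborator"
printf '## R Markdown (collaborator)\n\nSome text.\n' > paper.Rmd
git commit --quiet -am "Edit first header as well"
git push --quiet 2>/dev/null || echo "Push rejected: pull first"

# Pulling now stops with a merge conflict, so the command exits with an error.
git pull --quiet --no-rebase --no-edit >/dev/null 2>&1 || echo "Merge conflict in paper.Rmd"

# The conflict is marked in the file between <<<<<<< and >>>>>>>, divided by =======.
cat paper.Rmd
markers=$(grep -c '^<<<<<<<' paper.Rmd || true)

# One way out: accept the remote version, then conclude the merge and push.
git checkout --theirs paper.Rmd
git add paper.Rmd
git commit --quiet --no-edit
git push --quiet
```

Editing the file by hand, to keep parts of both versions, works as well: remove the markers, save, and then add and commit.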
--- ## Exercise — 42 — Notes Notice that you have encountered not only an error but also a merge conflict - pulling before pushing alone does not solve the problem because the edits are on the same line - the conflict cannot be solved automatically — it needs human intervention - nevertheless, by pulling first, you can view the conflict directly on the file - marked between less than <span style="background-color: #ffff88;"><</span> and greater than <span style="background-color: #ffff88;">></span> signs, divided by the equal signs - one solution is to accept the remote version, by deleting your edit and/or moving that edit to a different line - the merge takes place on the *local* repository of the last pusher --- ## Collaboration — Git and GitHub — Workflow — Alternative - The workflow above is rather simple, but has some disadvantages, including - not easy, albeit still possible, to see the edits of the collaborators - not clear who is in charge of the overall progress - not possible to discuss edits - not possible to compromise on conflicting edits -- - An alternative workflow exists - work on different branches of the same project - version control on your own branch - create pull requests with comments - merge the branch into master --- ## Collaboration — Git and GitHub — Workflow — Alternative 1) Branch - click `New Branch` on the Git tab - name it, and leave everything else as default - notice that you are now working on a new branch 2) Edit and save; commit and push - the same procedure as in any single-author, single-computer scenario - as described on [this slide](#git-github-basic-workflow) forward - notice, on GitHub, that your commit is in the new branch, while *master* remains unchanged 3) Pull request - On GitHub, click > `Pull requests -> New pull request` - choose what is to be pulled, and write a note to your collaborator who can accept or reject the merge - if there are merge conflicts, the collaborator solves them on GitHub before merging --- class: 
action ## Exercises — 43–44 43) Pull request - create a pull request for your collaboration project - create a branch for yourself - edit any line, save, commit, and push - request your branch to be merged 44) Merging - view the pull request of your collaborator - take the necessary steps to merge it to *master*
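---

## Branches: A Command-Line Sketch

The branch-based workflow can be followed on the command line up to the pull request, which exists only on GitHub. The sketch below is an illustration under assumptions: a bare repository on disk stands in for GitHub, the owner's local merge stands in for accepting a pull request, and the branch name `header-edit` is hypothetical.

```shell
set -eu

# Throwaway identity so that commits work on any machine (hypothetical values).
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Scaffold: a bare repository stands in for GitHub; both partners clone it.
mkdir "$work/seed"
cd "$work/seed"
git init --quiet
git symbolic-ref HEAD refs/heads/master
printf '## R Markdown\n\nSome text.\n' > paper.Rmd
git add paper.Rmd
git commit --quiet -m "Add R Markdown template"
git clone --quiet --bare "$work/seed" "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/owner"
git clone --quiet "$work/remote.git" "$work/collaborator"

# Collaborator: 1) branch, with a hypothetical branch name.
cd "$work/collaborator"
git checkout --quiet -b header-edit

# 2) Edit and save; commit and push the new branch.
printf '## R Markdown (branch edit)\n\nSome text.\n' > paper.Rmd
git commit --quiet -am "Edit first header on a branch"
git push --quiet -u origin header-edit

# On the remote, master remains unchanged at this point.
master_before=$(git --git-dir="$work/remote.git" rev-parse master)

# 3) Pull request: opened, discussed, and merged on GitHub.
# As a local stand-in, the owner fetches the branch and merges it into master.
cd "$work/owner"
git fetch --quiet origin
git merge --quiet --no-ff --no-edit origin/header-edit
git push --quiet

master_after=$(git --git-dir="$work/remote.git" rev-parse master)
git --git-dir="$work/remote.git" show master:paper.Rmd | head -n 1
```

The local merge leaves out what makes pull requests useful in the first place, namely the review and discussion on GitHub before the merge.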
--- ## Collaboration — Git and GitHub — Workflow — Notes - It is possible to edit `.Rmd` documents directly on GitHub - click on any editable file, and `Edit this file` - commit changes, either as a direct commit or a pull request -- - A GitHub account is enough for collaboration with co-authors who do not work with Git, R, or RStudio - not possible to knit to see the outcome - would suit co-authors whose contributions are plain text --- class: action ## Exercises — 45 45) GitHub edit - create two edits on the `.Rmd` document in your collaboration project - commit one of the edits as a direct commit - commit the other as a pull request
--- name: part10 class: inverse, center, middle # Part 10. Working on a Real Project .footnote[ [Back to the contents slide](#contents-slide). ] --- ## Real Project - Consider converting a real project to R Markdown - now, in the remainder of the workshop - Choose an existing project, preferably - single-authored - at an early stage - but one that you are, or will be, working on - Ask me for help - with no more slides to go through, I will now focus on helping you start your first project in R Markdown --- class: inverse, center, middle # References .footnote[ [Back to the contents slide](#contents-slide). ] --- ## References Allaire, J., Y. Xie, J. McPherson, et al. (2022). _rmarkdown: Dynamic Documents for R_. R package version 2.14. <https://CRAN.R-project.org/package=rmarkdown>. Blair, G., J. Cooper, A. Coppock, et al. (2022). _fabricatr: Imagine Your Data Before You Collect It_. R package version 0.16.0. <https://CRAN.R-project.org/package=fabricatr>. Carlisle, D., R. Fairbairns, E. Harris, et al. (2011). _setspace – Set space between lines_. LaTeX package, version 6.7a. <https://ctan.org/pkg/setspace>. Dowle, M. and A. Srinivasan (2021). _data.table: Extension of `data.frame`_. R package version 1.14.2. <https://CRAN.R-project.org/package=data.table>. Gagolewski, M., B. Tartanus, o. Unicode, et al. (2021). _stringi: Character String Processing Facilities_. R package version 1.7.6. <https://CRAN.R-project.org/package=stringi>. Hlavac, M. (2022). _stargazer: Well-Formatted Regression and Summary Statistics Tables_. R package version 5.2.3. <https://CRAN.R-project.org/package=stargazer>. Hugh-Jones, D. (2021). _huxtable: Easily Create and Style Tables for LaTeX, HTML and Other Formats_. R package version 5.4.0. <https://hughjonesd.github.io/huxtable/>. --- ## References R Core Team (2022). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing. Vienna, Austria. <https://www.R-project.org/>. Sievert, C., C. Parmer, T. Hocking, et al. (2021). _plotly: Create Interactive Web Graphics via plotly.js_. R package version 4.10.0. <https://CRAN.R-project.org/package=plotly>. Wickham, H., R. François, L. Henry, et al. (2022). _dplyr: A Grammar of Data Manipulation_. R package version 1.0.9. <https://CRAN.R-project.org/package=dplyr>. Wickham, H. and G. Grolemund (2021). _R for data science_. O'Reilly. Xie, Y. (2022a). _bookdown: Authoring Books and Technical Documents with R Markdown_. R package version 0.26. <https://CRAN.R-project.org/package=bookdown>. Xie, Y. (2022b). _knitr: A General-Purpose Package for Dynamic Report Generation in R_. R package version 1.39. <https://yihui.org/knitr/>. Xie, Y. (2022c). _tinytex: Helper Functions to Install and Maintain TeX Live, and Compile LaTeX Documents_. R package version 0.39. <https://github.com/rstudio/tinytex>. --- ## References Xie, Y., J. Allaire, and G. Grolemund (2018). _R Markdown: The Definitive Guide_. ISBN 9781138359338. Boca Raton, Florida: Chapman and Hall/CRC. <https://bookdown.org/yihui/rmarkdown>. Xie, Y., C. Dervieux, and A. Presmanes Hill (2022). _blogdown: Create Blogs and Websites with R Markdown_. R package version 1.10. <https://CRAN.R-project.org/package=blogdown>. Zhu, H. (2021). _kableExtra: Construct Complex Table with kable and Pipe Syntax_. R package version 1.3.4. <https://CRAN.R-project.org/package=kableExtra>. --- class: middle, center ## The workshop ends here. ## Congratulations on making it this far, and ## thank you for joining me! .footnote[ [Back to the contents slide](#contents-slide). ]