Fixes for Known Issues
While writing this book, we anticipated issues cropping up in the printed code. The systems our workflows rely on are in constant flux: developers update their packages, website administrators change URLs, and, of course, we may have made errors (we are only human). So, we agreed to maintain this website to offer fixes for any issues. If you run into such an issue, please let us know!
You can use the feedback form, email us directly, or open a new issue on the GitLab repository. We also re-run the code quarterly to detect any problems.
Below are the error or warning messages that might print in the R console, along with a link to a potential solution. We will also update the code on the GitLab repository.
Chapter 3: Computing Basics
unexpected string constant in
Yes, there is a minor but consequential typo in the code to install all the `R` packages: a misplaced comma (p. 28). This results in the error message `unexpected string constant in`. Below is the corrected code block:
cran_pkgs <- c(
"backbone", "caret", "factoextra", "gender", "ggpubr", "ggraph",
"ggrepel", "ggtern", "glmnet", "gmodels", "googleLanguageR",
"guardianapi", "gutenbergr", "hunspell", "igraph", "irr",
"lexicon", "lsa", "marginaleffects","Matrix", "network", "proustr",
"qdapDictionaries", "quanteda", "quanteda.textmodels", "remotes",
"reshape2", "reticulate", "rsample", "rsvd", "rtrek",
"semgram", "sentimentr", "sna", "stm", "stminsights",
"stringi", "tesseract", "text2map", "text2vec", "textclean",
"textstem", "tidygraph", "tidymodels", "tidyquant", "tidytext",
"tidyverse", "tokenizers", "topicdoc", "topicmodels", "udpipe"
)
install.packages(cran_pkgs)
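If you want to confirm that everything installed, a quick check like the one below lists any packages from cran_pkgs that are still missing (a minimal sketch; it assumes the code block above has already been run):
missing_pkgs <- setdiff(cran_pkgs, rownames(installed.packages()))
missing_pkgs  # character(0) means every package installed successfully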
Chapter 7: Wrangling Words
Could not download a book
In Chapter 7 (p. 113), when trying to download Lewis Carroll’s books from Project Gutenberg:
book_ids <- c(11, 12, 13, 620, 651)
my_mirror <- "http://mirrors.xmission.com/gutenberg/"
carroll <- gutenberg_download(book_ids,
meta_fields = "title",
mirror = my_mirror)
We may see the following:
Warning message:
! Could not download a book at http://mirrors.xmission.com/gutenberg//1/11/11.zip.
ℹ The book may have been archived.
ℹ Alternatively, You may need to select a different mirror.
→ See https://www.gutenberg.org/MIRRORS.ALL for options.
The solution is in the message. Go to https://www.gutenberg.org/MIRRORS.ALL and select a different mirror. For example:
my_mirror <- "https://gutenberg.pglaf.org/"
subscript out of bounds
In Chapter 7 (p. 112), when running the following code:
which.max(leven[, "Jabberwock"] )
which.min(leven[, "Jabberwock"] )
We may see the error message:
Error in leven[, "Jabberwock"] : subscript out of bounds
This is a case-sensitivity issue. In a previous step (p. 111), tokenize_words() lowercases the text by default. Thus, there are two solutions.
We can use the lowercased “jabberwock”:
which.max(leven[, "jabberwock"] )
which.min(leven[, "jabberwock"] )
Or, we can set lowercase = FALSE in the tokenize_words() function:
tokenize_words(carroll$text, lowercase = FALSE)
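If we take the second route, the capitalized token is preserved and the original indexing from the book works as printed. Here is a self-contained illustration (not the book’s code; it uses base R’s adist() and a made-up word list to stand in for the leven matrix):
# Toy Levenshtein matrix with case-preserved column names.
words <- c("Jabberwock", "jabber", "wabe", "borogoves")
leven <- adist(words, words)
dimnames(leven) <- list(words, words)

which.max(leven[, "Jabberwock"])  # the most distant word
which.min(leven[, "Jabberwock"])  # the closest word (itself, at distance 0)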
Chapter 8: Tagging Words
"object 'rating' not found"
In Chapter 8 (p. 136 and p. ), a required package is not loaded: `tidyverse`.
When running:
text_lib <- blogs |> filter(rating == "Liberal")
We will see the error:
Error: object 'rating' not found
This is odd, because the blogs dataframe definitely has a column named “rating”! What is happening is that, without loading tidyverse (which loads the dplyr package), R defaults to the filter() function in the base stats package.
Therefore, fix it with:
library(tidyverse)
Furthermore, we won’t be able to run several other code chunks in this section without this package loaded.
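Alternatively, if we ever want to avoid the masking without attaching a package, an explicit namespace call works too (a sketch; blogs stands in for the data frame loaded earlier in the chapter):
# Call dplyr's filter() directly with the double colon operator.
text_lib <- blogs |> dplyr::filter(rating == "Liberal")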
No spaCy environment found
In Chapter 8 (p. 143) after running:
spacy_initialize(model = "en_core_web_sm", condaenv = "myenv")
We may see the error message:
No spaCy environment found. Use `spacy_install()` to get started.
Luckily, running spacy_install(), as the spacyr message states, does appear to resolve the issue!
From inspecting the spacyr package further, it seems that spacy_initialize() no longer takes the condaenv argument (i.e., it is deprecated). This is why it cannot find the Python package spacy that we installed during our setup (in Chapter 3, p. 29).
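Putting the two pieces together, the setup now looks like the sketch below (spacy_install() only needs to be run once per machine):
library(spacyr)

# One-time setup: spacyr creates its own Python environment
# and downloads the default English model into it.
spacy_install()

# Initialize without the deprecated condaenv argument.
spacy_initialize(model = "en_core_web_sm")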
Chapter 11: Extended Inductive
Using `size` aesthetic for lines was deprecated
In Chapter 11 (pp. 200-201), when we run:
df_effs |>
ggplot(aes(x = rank, y = proportion)) +
geom_errorbar(aes(ymin = lower, ymax = upper),
width = 0.1, size = 1) +
geom_point(size = 3) +
facet_grid(~Topics) +
coord_flip() +
labs(x = "Rank", y = "Topic Proportion")
We may see the following warning:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0. Please use `linewidth` instead.
The geom_errorbar() function draws lines, and therefore it is the size argument there that needs to become linewidth:
df_effs |>
ggplot(aes(x = rank, y = proportion)) +
geom_errorbar(aes(ymin = lower, ymax = upper),
width = 0.1, linewidth = 1) +
geom_point(size = 3) +
facet_grid(~Topics) +
coord_flip() +
labs(x = "Rank", y = "Topic Proportion")
A numeric `legend.position` argument in `theme()` was deprecated
In Chapter 11 (p. 233), when we run:
df_plot |>
ggplot(aes(x = as.Date(date), y = value, color = name)) +
geom_smooth(aes(linetype = name)) +
scale_linetype_manual(values = c("twodash", "solid")) +
labs(x = NULL, y = "Average Similarity to Press Releases") +
guides(linetype = guide_legend(nrow = 2)) +
theme(legend.position = c(.65, .1)) +
facet_wrap(~lean)
We may see the following warning:
A numeric `legend.position` argument in `theme()` was deprecated in ggplot2 3.5.0. Please use the `legend.position.inside` argument of `theme()` instead.
The fix is pretty straightforward from the warning message. Note that we also set legend.position = "inside" so that ggplot2 uses the inside coordinates:
df_plot |>
ggplot(aes(x = as.Date(date), y = value, color = name)) +
geom_smooth(aes(linetype = name)) +
scale_linetype_manual(values = c("twodash", "solid")) +
labs(x = NULL, y = "Average Similarity to Press Releases") +
guides(linetype = guide_legend(nrow = 2)) +
theme(legend.position = "inside", legend.position.inside = c(.65, .1)) +
facet_wrap(~lean)
Chapter 12: Extended Deductive
unused arguments
In Chapter 12 (p. 244), when we run:
y_trn <- to_categorical(df_trn$label, num_classes = 3)
We may see the following error:
Error in (function (x, num_classes = NULL) :
unused arguments (y =
This is odd because y is the primary argument in the function. It seems that moving to an upgraded version of the keras package for R, called keras3, does the trick. After installing keras3, we will need to start with a fresh R session and then replace library(keras) with library(keras3). That should do it!
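For reference, here is a sketch of what the switch looks like (it assumes keras3 installs cleanly from CRAN and that df_trn is the training data frame built earlier in the chapter):
install.packages("keras3")

# Restart R, then load the new package in place of library(keras).
library(keras3)

y_trn <- to_categorical(df_trn$label, num_classes = 3)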
could not find function "replace_non_ascii"
In Chapter 12 (p. 253), when we run:
df_shake <- df_shake |>
mutate(gutenberg_id = as.character(gutenberg_id)) |>
rename(line = text) |>
mutate(text = replace_non_ascii(line),
text = replace_curly_quote(text),
text = replace_contraction(text),
text = gsub("[[:punct:]]+", " ", text),
text = gsub("[[:digit:]]+", " ", text),
text = tolower(text),
text = str_squish(text)) |>
filter(text != "")
We see the following error:
In argument: `text = replace_non_ascii(line)`.
Caused by error in `replace_non_ascii()`:
! could not find function "replace_non_ascii"
We need to include library(textclean) when we load libraries for the session.
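As a reminder, these are the packages this chunk leans on (a sketch; the tidyverse supplies the dplyr verbs and stringr’s str_squish()):
library(tidyverse)  # dplyr verbs, stringr::str_squish()
library(textclean)  # replace_non_ascii(), replace_curly_quote(), replace_contraction()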
as.edgelist.sna input must be an adjacency matrix/array
In Chapter 12 (p. 263), when we run:
get_betweenness(doc_proj) |>
slice_max(centrality, n = 4, with_ties = FALSE)
We see the following error:
as.edgelist.sna input must be an adjacency matrix/array, edgelist matrix, network, or sparse matrix, or list thereof.
We will also see a warning:
`graph.adjacency()` was deprecated in igraph 2.0.0. Please use `graph_from_adjacency_matrix()` instead.
The get_betweenness() function is one we defined in the session (also on p. 263), so we’ll need to tweak it a bit. Here is the original:
get_betweenness <- function(x) {
gr <- graph.adjacency(x, mode = "undirected",
weighted = TRUE, diag = FALSE)
E(gr)$weight <- 1 / E(gr)$weight
df <- data.frame(centrality = betweenness(gr))
return(df)
}
The culprit is the order in which we loaded the packages. The igraph package is loaded and then the sna package, but they both have a function called betweenness(). This means that betweenness() from the sna package is masking betweenness() from the igraph package. We can fix it either by reordering how we load the packages, or by using an explicit call with the double colon operator. Here is the updated version of our own function:
get_betweenness <- function(x) {
gr <- graph_from_adjacency_matrix(x, mode = "undirected",
weighted = TRUE, diag = FALSE)
E(gr)$weight <- 1 / E(gr)$weight
df <- data.frame(centrality = igraph::betweenness(gr))
return(df)
}
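To check that the updated function behaves, we can run it on a toy adjacency matrix (illustrative only; this small matrix is not from the chapter, and it assumes the updated get_betweenness() above has been defined):
library(igraph)

# A small symmetric, weighted adjacency matrix.
m <- matrix(c(0, 2, 1,
2, 0, 3,
1, 3, 0),
nrow = 3, byrow = TRUE,
dimnames = list(letters[1:3], letters[1:3]))

get_betweenness(m)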