Use code to generate Unicode-LaTeX character mapping table #223

Merged: 7 commits, May 28, 2024

nanxstats
Collaborator

Fixes #218

This PR creates an internal function in R/utils.R to generate the mapping table into R/unicode_latex.R.

This eliminates the need for using the binary file sysdata.rda and is more friendly for version control.

The new, code-generated data frame is bitwise identical to the version saved in sysdata.rda, except that the int column is of class integer, not numeric.
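
To illustrate the class difference noted above, here is a minimal sketch with made-up toy rows (not the real table). It shows that `identical()` distinguishes an integer column from a numeric one even when the values match, and that coercing the column makes the comparison pass. The object names are hypothetical.

```r
# Toy stand-ins for the old (sysdata.rda) and new (code-generated) tables.
# The rows are illustrative only and are not taken from the real mapping table.
old_tbl <- data.frame(
  unicode = c("U+0023", "U+0025"),
  latex = c("\\#", "\\%"),
  int = c(35, 37), # numeric (double), as in the old version
  stringsAsFactors = FALSE
)

# Same content, but with the int column stored as integer
new_tbl <- old_tbl
new_tbl$int <- as.integer(new_tbl$int)

# Strict comparison is sensitive to the storage type of the column
strict_match <- identical(old_tbl, new_tbl)

# Coerce the column back to numeric before comparing
new_as_numeric <- new_tbl
new_as_numeric$int <- as.numeric(new_as_numeric$int)
coerced_match <- identical(old_tbl, new_as_numeric)
```

Here `strict_match` is `FALSE` and `coerced_match` is `TRUE`, which is the kind of check that could back the "identical except for the class of int" statement.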

Data ingestion issue worth following up

You might want to check the data ingestion logic. I found no record of how the previous version was constructed. I used some ad hoc logic to reproduce an identical version of the table, but it would be good to check whether the data included in the previous version is reasonable and which specific filters were applied. For example, right at the first step, calling read.table() without quote = "" gives:

Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

With the default quoting, only 1740 rows are read, versus 2757 rows when using quote = "", which also avoids the warning.
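
A minimal sketch of the safer read, assuming a tab-separated source file (the real file name and layout are not shown in this PR, so the sample lines below are made up). It disables quote handling so that literal `'` and `"` characters are read as data, then checks the row count against the number of lines in the file. Setting `comment.char = ""` is an extra assumption, added so that a literal `#` would not be treated as a comment.

```r
# Made-up sample rows containing literal quote characters as data
sample_lines <- c(
  "U+0027\t'\t39",
  "U+0022\t\"\t34",
  "U+0041\tA\t65"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

# quote = "" turns off quote processing; comment.char = "" turns off comments
tbl <- read.table(
  path,
  sep = "\t",
  quote = "",
  comment.char = "",
  stringsAsFactors = FALSE
)

# Sanity check: one row per line in the input file
n_lines <- length(readLines(path))
rows_match <- nrow(tbl) == n_lines
```

Comparing `nrow()` with the line count is a cheap guard against rows being silently merged or dropped during ingestion.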

@nanxstats nanxstats requested a review from elong0527 May 28, 2024 03:04
@nanxstats
Collaborator Author

@yihui in case you got a minute to review

Collaborator

@yihui yihui left a comment

First, I'd prefer using a matrix to write the data, which is a little more compact than the data frame.

Second, I wonder if it's worth the effort to make the file R/unicode_latex.R human-readable. If not, we could consider just dump() the data frame in update_unicode_latex().

I don't have a strong opinion on either point. It's fine to merge the current PR as is.
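
For reference, the dump() option could look roughly like the sketch below. The toy rows and the output path are made up, and whether update_unicode_latex() would call dump() exactly this way is an assumption.

```r
# Toy table standing in for the real mapping (rows are illustrative only)
tbl <- data.frame(
  unicode = c("U+0023", "U+0025"),
  latex = c("\\#", "\\%"),
  int = c(35L, 37L),
  stringsAsFactors = FALSE
)

# Write an R source file that recreates `tbl` when sourced
out <- tempfile(fileext = ".R")
dump("tbl", file = out, envir = environment())

# Source the file into a fresh environment and compare with the original
env <- new.env()
sys.source(out, envir = env)
round_trip_ok <- identical(env$tbl, tbl)
```

The dumped file is a single `structure(list(...))` call, which is valid R but harder to read and diff than one row per line.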

@nanxstats
Collaborator Author

> First, I'd prefer using a matrix to write the data, which is a little more compact than the data frame.
>
> Second, I wonder if it's worth the effort to make the file R/unicode_latex.R human-readable. If not, we could consider just dump() the data frame in update_unicode_latex().
>
> I don't have a strong opinion on either point. It's fine to merge the current PR as is.

Great! Thanks. I've applied the changes and updated the table. The matrix version is exactly what we need to make this less tedious. How I wish there were a row-wise data frame constructor in base R. 😂

Making it human-readable seems to be manageable in this case, so let's just keep it that way.
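
A minimal sketch of the matrix-based layout, with made-up rows rather than the generated file's actual contents: values are written one row per line in a character matrix filled with `byrow = TRUE`, then converted to a data frame with an integer `int` column. The column names follow the ones used in the PR code; `m` is a hypothetical name.

```r
# One logical row per line; c() coerces everything to character
m <- matrix(
  c(
    "U+0023", "\\#", 35,
    "U+0025", "\\%", 37,
    "U+0026", "\\&", 38
  ),
  ncol = 3,
  byrow = TRUE
)

# Convert to a data frame and restore the integer type of the last column
unicode_latex <- data.frame(
  unicode = m[, 1],
  latex = m[, 2],
  int = as.integer(m[, 3]),
  stringsAsFactors = FALSE
)
```

This keeps each mapping on its own line, so a change to one character shows up as a one-line diff.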

Comment on lines +56 to +64
rows <- paste(
  sprintf(
    '"%s", "%s", %d',
    tbl$unicode,
    gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
    tbl$int
  ),
  sep = ", "
)
Collaborator

paste() is unnecessary (commas have been added in sprintf()).

Suggested change
- rows <- paste(
-   sprintf(
-     '"%s", "%s", %d',
-     tbl$unicode,
-     gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
-     tbl$int
-   ),
-   sep = ", "
- )
+ rows <- sprintf(
+   '"%s", "%s", %d',
+   tbl$unicode,
+   gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
+   tbl$int
+ )
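
A quick check of why the wrapper can go, using a made-up two-row `tbl`: `sprintf()` is vectorized and already returns one string per row, and `paste()` with a single vector argument leaves it unchanged, because `sep` only matters when several arguments are joined.

```r
# Toy table with the column names used in the PR code (rows are illustrative)
tbl <- data.frame(
  unicode = c("U+0023", "U+0025"),
  latex = c("\\#", "\\%"),
  int = c(35L, 37L),
  stringsAsFactors = FALSE
)

# One formatted string per row; backslashes are doubled so that the
# generated R source parses back to the original LaTeX string
rows <- sprintf(
  '"%s", "%s", %d',
  tbl$unicode,
  gsub("\\", "\\\\", tbl$latex, fixed = TRUE),
  tbl$int
)

# Wrapping in paste() with a single argument changes nothing
wrapped <- paste(rows, sep = ", ")
same_result <- identical(rows, wrapped)
```

`same_result` is `TRUE`, so the two versions in the suggestion produce the same output.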

Collaborator Author

Yes! Patched in another PR: #224

Collaborator

@elong0527 elong0527 left a comment

LGTM, thanks for improving the transparency of the source data!

@nanxstats nanxstats merged commit 1373f0d into master May 28, 2024
8 checks passed
@nanxstats nanxstats deleted the sysdata branch May 28, 2024 20:00
Development

Successfully merging this pull request may close these issues.

Replace Unicode / LaTeX conversion table with R script