A mock-up set of text parsers written in R script to handle real-world tasks efficiently (not limited to programming tasks).

btklab/rlang-mocks


rlang-mocks

A mock-up set of CLI scripts for R (The R Project for Statistical Computing) that filter text-object input from the pipeline (stdin) and return text objects.

  • For use in UTF-8 Japanese environments on Windows.
  • For my personal work and hobby use.
  • Note that the code is spaghetti (due to my technical inexperience).
  • Tests and error handling are insufficient.

script list:

# one-liner to create function list for PowerShell
(cat README.md | sls '^#### \[[^[]+\]').Matches.Value.Replace('#### ','') -join ", " | Set-Clipboard

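A set of filters mainly for pattern matching on irregular, real-world strings. As basic input, they expect string data passed through the pipeline (text objects) that is UTF-8 encoded, delimited by half-width spaces, and line-oriented.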
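
As a rough illustration of that input shape (not code taken from the repository), the base R sketch below splits space-delimited lines into fields and writes one field back out as text. The helper name `split_fields` and the sample lines are hypothetical; a real filter would presumably read its lines with `readLines(file("stdin"))` instead of using an in-memory vector.

```r
# Hypothetical helper: split each line on runs of half-width spaces.
split_fields <- function(lines) {
  strsplit(trimws(lines), " +")
}

# Sample input standing in for lines read from the pipeline (stdin).
lines <- c("2023-01-01 apple 100", "2023-01-02 orange 250")
fields <- split_fields(lines)

# Pick the second field of every line and emit it line by line,
# so the output is again a line-oriented text object.
second <- vapply(fields, function(f) f[2], character(1))
writeLines(second)
```

Keeping both input and output line-oriented is what lets such filters be chained with pipes.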

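Each file under src contains one function. The functions are basically independent of each other, so a single function file can be moved and used on its own (although some functions do depend on other function files).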

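These are mock-ups without sufficient error handling: a command set meant as tools for handling the daily work (mostly string processing) of the author, an office (non-technical) worker, simply, conveniently, and enjoyably.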

Install functions

  1. Put the *.R files found under the src directory anywhere you like.
  2. Set the terminal input/output encoding to UTF-8.
    • The functions expect UTF-8 encoded input, so if you want to run them in PowerShell in a Japanese environment, set the encoding in advance.
    • If you use PowerShell, run the following dot-sourcing command:
      • . path/to/rlang-mocks/operator.ps1

The functions expect UTF-8 encoded input, so set the encoding of the current process to UTF-8 before running them.

# for PowerShell
# install favorite functions for Japanese environment
# set encoding
if ($IsWindows){
    chcp 65001
    [System.Console]::OutputEncoding = [System.Text.Encoding]::GetEncoding("utf-8")
    [System.Console]::InputEncoding  = [System.Text.Encoding]::GetEncoding("utf-8")
    # compatible with multibyte code
    $env:LESSCHARSET = "utf-8"
}
# for PowerShell
# or sourcing dot files
. path/to/rlang-mocks/operator.ps1

Description of each function

The behavior of each function, the motivation for writing it, and a brief description.

Show functions

None

Multipurpose

rcalc.R - CLI Rscript executor

  • Usage
    • man: Rscript rcalc.R [-h]
    • Rscript rcalc.R -f <formula> [-d <delim>] [opts...]
  • Library
    • require: optparse
    • optional: ggplot2, tidyverse

Synopsis:

$ Rscript rcalc.R
Usage: Rscript rcalc.R -f <formula> [-d <delim>] [opts...]
  --delim,      -d :delimiter
  --formula,    -f :formula
  --input,      -i :input data file
  --noheader       :no header input
  --colnames,   -c :data column names separated with comma
  --colclasses     :data column classes (character,integer,numeric,factor,etc...)
  --library,    -l :import libraries separated with comma
  --plot,       -p :show plot
  --output,     -o :output plot image
  --factor         :stringsAsFactors = TRUE
  --debug          :print debug

Examples:

# simple usage
cat iris.csv | Rscript rcalc.R -f 'summary(df)' -d ','
# rename column names
cat iris.csv | Rscript rcalc.R -f 'df |> head()' -d ',' -l 'tidyverse' -c sl,sw,pl,pw,sp
# set colnames and colclasses
Rscript rcalc.R -f 'sapply(df, class)' -i iris.csv -d ',' -c 'sl,sw,pl,pw,sp' --colclasses 'numeric,numeric,numeric,numeric,factor'
         sl          sw          pl          pw          sp
  "numeric"   "numeric"   "numeric"   "numeric" "character"
# use built-in examples
echo 1 | Rscript rcalc.R -f 'letters'
# import libraries
cat iris.csv | Rscript rcalc.R -f 'summary(df)' -d ',' -l ggplot2,optparse
echo 1 | Rscript rcalc.R -f 'iris %>% group_by(Species) %>% summarise(mean = mean(Petal.Length))' -d ',' -l dplyr

# install libraries
Rscript -e 'install.packages("palmerpenguins", repos="https://cran.r-project.org/")'
echo 1 | Rscript rcalc.R -f 'install.packages("palmerpenguins", repos="https://cran.r-project.org/")'
# output package startup message (-m|--message)
echo 1 | Rscript rcalc.R -f 'letters' -l tidyverse -m
# output csv
cat iris.csv | Rscript rcalc.R -f 'summary(df);write.csv(df,"",quote=FALSE)' -d ','
# plot
cat iris.csv | Rscript rcalc.R -f 'plot(df)' -d ',' --plot
# ggplot2 : using print() and --plot
cat iris.csv | Rscript rcalc.R -d ',' -f "p <- ggplot(data=df,mapping=aes(x=sepal_length,y=sepal_width))+layer(geom='point',stat='identity',position='identity');print(p)" -l ggplot2 --plot

The above is equivalent to running the following R script (Rscript a.R). Note that multibyte characters cannot be used in the script.

library(ggplot2)
X11()
p <- ggplot(data=iris,mapping=aes(x=Sepal.Length,y=Sepal.Width))+
    layer(geom='point',stat='identity',position='identity')
print(p)
Sys.sleep(1000)
dev.off()
## ggplot2 another example: histogram
cat iris.csv | Rscript rcalc.R -d ',' -f "p <- ggplot(iris)+geom_histogram(aes(Petal.Length, fill=Species), binwidth=0.5)+facet_wrap(~Species);print(p)" -l ggplot2 --plot

## ggplot2 another example: histogram using gghighlight
cat iris.csv | Rscript rcalc.R -d ',' -f "p <- ggplot(iris)+geom_histogram(aes(Petal.Length, fill = Species), binwidth = 0.5)+gghighlight()+facet_wrap(~ Species);print(p)" -l ggplot2,gghighlight --plot

## ggplot2 another example: geom_point using gghighlight
cat iris.csv | Rscript rcalc.R -d ',' -f "p <- ggplot(data=df,mapping=aes(x=sepal_length,y=sepal_width,colour=species))+geom_point()+gghighlight(grepl('^v',species));print(p)" -l ggplot2,gghighlight --plot
# save plot as image file
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.png
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.png -s 400,300
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.jpg -s 400,300
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.bmp -s 6,4
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.pdf
cat iris.csv | Rscript rcalc.R -f 'hist(df$sepal_length)' -d ',' -o a.eps
# eval external R source file

# install.packages("palmerpenguins")
# from: https://allisonhorst.github.io/palmerpenguins/
echo 1 | Rscript rcalc.R -f 'install.packages("palmerpenguins", repos="https://cran.r-project.org/")'

# eval palmerpenguins
echo 1 | Rscript rcalc.R -f a.R -l 'palmerpenguins,ggplot2' --plot
         Rscript rcalc.R -f a.R -l 'palmerpenguins,ggplot2' --plot -i penguins.csv -d ','
## palmer penguins by allison horst
echo 1 | Rscript rcalc.R -f 'penguins %>% count(species)' -l 'palmerpenguins,tidyverse'
## # A tibble: 3 x 2
##   species       n
##   <fct>     <int>
## 1 Adelie      152
## 2 Chinstrap    68
## 3 Gentoo      124
echo 1 | Rscript rcalc.R -f 'penguins %>% group_by(species) %>% summarize(across(where(is.numeric), mean, na.rm = TRUE))' -l 'palmerpenguins,tidyverse'
## # A tibble: 3 x 6
##   species   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
##   <fct>              <dbl>         <dbl>             <dbl>       <dbl> <dbl>
## 1 Adelie              38.8          18.3              190.       3701. 2008.
## 2 Chinstrap           48.8          18.4              196.       3733. 2008.
## 3 Gentoo              47.5          15.0              217.       5076. 2008.
echo 1 | Rscript rcalc.R -f "flipper_hist <- ggplot(data=penguins, aes(x=flipper_length_mm))+geom_histogram(aes(fill=species), alpha = 0.5, position='identity')+scale_fill_manual(values=c('darkorange','purple','cyan4'))+theme_minimal()+labs(x='Flipper length (mm)', y='Frequency', title='Penguin flipper lengths');print(flipper_hist)" -l 'palmerpenguins,ggplot2' --plot
 `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
Removed 2 rows containing non-finite values (stat_bin).

Math

rmatcalc.R - CLI matrix calculator that can be chained with pipes

Synopsis:

$ Rscript rmatcalc.R
Usage: Rscript rmatcalc.R -f <formula> [-d <delim>] [opts...]
  --formula,    -f :formula
  --delim,      -d :delimiter
  --input,      -i :input data from file
  --dtype          :array data type. default=numeric
  --nowrap         :no wrap
  --debug          :print debug
# frequent usage
  A+B-A%*%B : sum, difference, product
  A * B     : multiply element by element
  A %o% B   : outer product
  A %x% B   : kronecker product

# Functions
  t(A)
  solve(A)
  eigen(A)
  det(A)
  diag(n)
  diag(a1:an)
  sum(x^n)
  crossprod(A)
  crossprod(x,y)
  rowSums(A)
  colSums(A)
  rowMeans(A)
  colMeans(A)
  x[upper.tri(A)] <- n
  x[lower.tri(A)] <- n
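
For reference, the operators and functions listed above behave as follows in plain R. This is a minimal sketch run outside rmatcalc.R; the two sample matrices are illustrative assumptions.

```r
# Illustrative 2 x 2 matrices, filled row by row.
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(4, 3, 2, 1), nrow = 2, byrow = TRUE)

print(A %*% B)       # matrix product: rows (8, 5) and (20, 13)
print(A * B)         # element-wise product: rows (4, 6) and (6, 4)
print(dim(A %o% B))  # outer product: a 2 x 2 x 2 x 2 array
print(dim(A %x% B))  # Kronecker product: a 4 x 4 matrix
print(crossprod(A))  # same result as t(A) %*% A
print(rowSums(A))    # 3 7
print(colSums(A))    # 4 6

# Overwrite the strict upper triangle of a copy of A with zero.
L <- A
L[upper.tri(L)] <- 0
print(L)             # rows (1, 0) and (3, 4)
```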

Examples:

# input example:
$ cat matrix
A 1 2
A 3 4
B 4 3
B 2 1

# calc example:
$ cat matrix | Rscript rmatcalc.R -f 'A%*%B'
A 1 2
A 3 4
B 4 3
B 2 1
C 8 5
C 20 13

$ cat matrix | Rscript rmatcalc.R -f 'A*B'
A 1 2
A 3 4
B 4 3
B 2 1
C 4 6
C 6 4
# transpose:
$ cat matrix | Rscript rmatcalc.R -f 't(A)'
A 1 2
A 3 4
B 1 3
B 2 4

# solve:
$ cat matrix | Rscript rmatcalc.R -f 'A%*%solve(A)'
A 1 1
A 2 4
B 1 0
B 0 1

# add new label to ans:
$ cat matrix | Rscript rmatcalc.R -f 'C=A*B'
A 1 2
A 3 4
B 4 3
B 2 1
C 4 6
C 6 4
# determinant:
#   To output a single (non-matrix) value,
#   multiply by diag(1)
$ cat matrix | Rscript rmatcalc.R -f 'C=det(A)*diag(1)'
A 2 -6 4
A 7 2 3
A 8 5 -1
C -144

# rank:
$ cat matrix | Rscript rmatcalc.R -f 'C=qr(A)$rank*diag(1)'
A 2 -6 4
A 7 2 3
A 8 5 -1
C 3
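
The `diag(1)` trick in the determinant and rank examples can be checked in plain R: `det()` and `qr()$rank` return bare numbers without dimensions, while `diag(1)` is a 1 x 1 identity matrix, so the product becomes a 1 x 1 matrix. The sketch below is run outside rmatcalc.R and only assumes that the script wants a matrix-shaped result, as the comment above suggests.

```r
# The 3 x 3 matrix from the determinant and rank examples.
A <- matrix(c(2, -6,  4,
              7,  2,  3,
              8,  5, -1), nrow = 3, byrow = TRUE)

d <- det(A)                 # a bare number, about -144 (floating point)
print(is.matrix(d))         # FALSE: no dim attribute

D <- d * diag(1)            # multiplying by the 1 x 1 identity keeps dims
print(is.matrix(D))         # TRUE
print(dim(D))               # 1 1

R <- qr(A)$rank * diag(1)   # same trick for the rank
print(R)                    # a 1 x 1 matrix holding 3
```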

# chain calc using pipe:
$ cat matrix | Rscript rmatcalc.R -f 'C=A*B' | Rscript rmatcalc.R -f 'E=C%*%A'
A 1 2
A 3 4
B 4 3
B 2 1
C 4 6
C 6 4
E 22 32
E 18 28
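
One way to picture how the label-prefixed format supports chaining is sketched below. It is not taken from rmatcalc.R: the helpers `parse_labeled` and `format_labeled` are hypothetical, and the sketch only assumes that each row starts with a matrix label and that results are appended to the passed-through input.

```r
# Hypothetical parser: turn label-prefixed rows into a named list of matrices.
parse_labeled <- function(lines) {
  fields <- strsplit(trimws(lines), " +")
  labels <- vapply(fields, function(f) f[1], character(1))
  mats <- list()
  for (lab in unique(labels)) {
    rows <- fields[labels == lab]
    # Drop the label, convert the rest to numbers, stack row by row.
    mats[[lab]] <- do.call(rbind, lapply(rows, function(f) as.numeric(f[-1])))
  }
  mats
}

# Hypothetical formatter: write a matrix back in the same labeled form.
format_labeled <- function(label, m) {
  paste(label, apply(m, 1, paste, collapse = " "))
}

input <- c("A 1 2", "A 3 4", "B 4 3", "B 2 1")

# Expose the parsed matrices as variables and evaluate the formula.
env <- list2env(parse_labeled(input))
C <- eval(parse(text = "A * B"), envir = env)

# Pass the input through and append the labeled result.
out <- c(input, format_labeled("C", C))
writeLines(out)
```

Because the output keeps the input rows and uses the same format, it can be fed to another invocation that refers to `C` in its formula.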

Image processing

sketch.R - A wrapper script for the "sketcher" library

  • Usage
    • man: Rscript sketch.R [-h]
    • Rscript sketch.R -i a.png [--debug] [[param] [param] ... [param]]
      • --input, -i :An input image
      • --style :Sketch style: 1 or 2, default: 1
      • --lineweight :Strength of lines: >=0.3, default: 1
      • --smooth :Smoothness of texture: >=0, default: 1
      • --gain :Gain parameter: between 0 and 1, default: 0.02
      • --contrast :Contrast parameter: >=0, default: 20 (for style 1) or 4 (for style 2)
      • --shadow :Shadow threshold: between 0 and 1, default: 0.0
      • --maxsize :Max resolution of output: >0, default: 2048
      • --output, -o :Output file, default: NA
      • --debug :Print debug, default: FALSE
  • Dependency
    • require: optparse, sketcher

Examples:

# Case1: Outline is missing and texture is lacking
Rscript sketch.R -i a.png --style 2 --shadow 0.4

# Case2: Edges are lacking in the dark region of the face
Rscript sketch.R -i a.png --shadow 0.4

# Case3: Neko (cat). Objects have unclear edges/outlines
Rscript sketch.R -i a.png --smooth 0

CREDITS
