Trying out purrr 0.2.3 - Technically, technophobic.

There turned out to be more changes than I expected, so let's go through the release notes.

Breaking changes

`reduce()`

If reduce() fails with this message: Error: `.x` is empty, and no `.init` supplied, this is because reduce() now returns .init when .x is empty. Fix the problem by supplying an appropriate argument to .init, or by providing special behaviour when .x has length 0.

reduce() now throws an error when .x (the vector being folded) is empty and no .init (initial value) is supplied.

reduce(numeric(0), `*`)
#> Error: `.x` is empty, and no `.init` supplied

This differs from the behavior of base R's built-in Reduce().

Reduce(`*`, numeric(0))
#> NULL

If you don't want an error, supply .init. It is returned as-is, even when it has length 0 (I'm not sure whether this behavior is right…).

reduce(numeric(0), `*`, .init = numeric(0))
#> numeric(0)
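To make the empty-input rule concrete, here is a minimal base-R sketch of a left fold. my_reduce() is a hypothetical helper, not purrr's implementation: without .init it seeds the accumulator from the first element (so it has nothing to start from when .x is empty), while with .init an empty .x just returns .init. That's also why passing the operation's identity element, such as 1 for `*`, is usually the most natural fix.

```r
# Hypothetical sketch of a left fold; not purrr's implementation
my_reduce <- function(.x, .f, .init) {
  if (missing(.init)) {
    # Without .init, the first element seeds the accumulator
    if (length(.x) == 0) stop("`.x` is empty, and no `.init` supplied")
    acc <- .x[[1]]
    rest <- .x[-1]
  } else {
    # With .init, an empty .x simply returns .init unchanged
    acc <- .init
    rest <- .x
  }
  for (i in seq_along(rest)) {
    acc <- .f(acc, rest[[i]])
  }
  acc
}

my_reduce(1:4, `*`)                             # 24
my_reduce(numeric(0), `*`, .init = 1)           # 1, the empty product
my_reduce(numeric(0), `*`, .init = numeric(0))  # numeric(0)
```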

`is_*()` moved to rlang

The type predicates have been migrated to rlang. Consequently the bare-type-predicates documentation topic is no longer in purrr, which might cause a warning if you cross-reference it.

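The type predicates (is_*()) have been moved to rlang. That said, purrr still re-exports them, so there isn't much to worry about. As the note above says, though, if your documentation cross-references that topic, you may need to fix the link or it will break.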

purrr::is_bare_atomic
#> function (x, n = NULL) 
#> {
#>     !is.object(x) && is_atomic(x, n)
#> }
#> <bytecode: 0x0000000002f16b18>
#> <environment: namespace:rlang>

Dependencies

purrr no longer depends on lazyeval or Rcpp (or dplyr, as of the previous version).

purrr's dependencies have been simplified.

That part is fine; the real issue is what comes next. Some functions have been removed and others renamed, so watch out.

order_by(), sort_by() and split_by() have been removed.

contains() has been renamed to has_element()

And I was using split_by() every now and then…
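If, like me, you relied on split_by(), one base-R route is to compute a key per element and hand it to split(). This is only a sketch that assumes each element is a record with a character field (the records and the group field are made up); it isn't a reimplementation of the removed function. The last part shows, with a hypothetical has_elt(), the kind of identical()-based membership check that has_element() performs.

```r
# Made-up records, each with a character field called `group`
records <- list(
  list(name = "a", group = "x"),
  list(name = "b", group = "y"),
  list(name = "c", group = "x")
)

# One grouping key per record
keys <- vapply(records, function(r) r$group, character(1))

# split() works on lists as well as atomic vectors
by_group <- split(records, keys)
lengths(by_group)   # x = 2, y = 1

# Membership test in the spirit of has_element(): is y identical to any element?
has_elt <- function(x, y) any(vapply(x, identical, logical(1), y))
has_elt(list(1, "a"), "a")   # TRUE
```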

`pluck()`

pluck() is a new function for extracting elements from nested objects.

pluck(.x, ..., .default = NULL)

With this signature, the ... part accepts numbers, strings, or functions. Numbers and strings behave exactly as they would if you passed them to [[. Like this:

y <- list(list(a = 1))
pluck(y, 1, "a")
#> [1] 1

This is equivalent to y[[1]]$a.

Lists are simple, but the world is full of other kinds of objects. What about a more complex object like this one?

obj1 <- list("a", list(1, elt = "foobar"))
obj2 <- list("b", list(2, elt = "foobaz"))
x <- list(obj1, obj2)

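If all you care about is the elt element of each object, you might define a function that extracts it. If you pass that function to pluck(), it is used to extract the element instead of [[.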

get_elt <- function(x) x[[2]]$elt

pluck(x, 1, get_elt)
#> [1] "foobar"

The code above is equivalent to get_elt(x[[1]]).
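To see how the accessors chain, here is a simplified base-R sketch. pluck_sketch() is a hypothetical name and only handles lists, positive positions, names and functions; the real pluck() covers more cases (environments, S4 objects and so on), so treat this as an illustration of the idea rather than its implementation. Each accessor is applied in turn, and a missing element short-circuits to .default.

```r
# Hypothetical sketch of how pluck()-style accessors compose; not purrr's code
pluck_sketch <- function(.x, ..., .default = NULL) {
  for (accessor in list(...)) {
    if (is.function(accessor)) {
      # A function accessor replaces [[ for this step
      .x <- accessor(.x)
    } else if (is.numeric(accessor)) {
      if (accessor < 1 || accessor > length(.x)) return(.default)
      .x <- .x[[accessor]]
    } else if (is.character(accessor)) {
      if (!accessor %in% names(.x)) return(.default)
      .x <- .x[[accessor]]
    } else {
      stop("unsupported accessor")
    }
    if (is.null(.x)) return(.default)
  }
  .x
}

y <- list(list(a = 1))
obj1 <- list("a", list(1, elt = "foobar"))
obj2 <- list("b", list(2, elt = "foobaz"))
x <- list(obj1, obj2)
get_elt <- function(x) x[[2]]$elt

pluck_sketch(y, 1, "a")                  # 1, same as y[[1]]$a
pluck_sketch(x, 1, get_elt)              # "foobar", same as get_elt(x[[1]])
pluck_sketch(y, 1, "b", .default = NA)   # NA instead of an error
```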

Map helpers

`as_mapper()`

as_function() is now as_mapper() because it is a tranformation that makes sense primarily for mapping functions, not in general (#298).

as_function() has been renamed to as_mapper(). as_function() still exists, but calling it produces the following warning:

#> Warning message:
#> `as_function()` is deprecated; please use `as_mapper()` or `rlang::as_function()` instead

As for how rlang::as_function() differs: unlike as_mapper(), it cannot create a function from a number or a string.

purrr::as_mapper(1)
#> function (x, ...) 
#> pluck(x, list(1), .default = NULL)
#> <environment: 0x0000000003fd41f0>

rlang::as_function(1)
#> Error: Can't convert a double vector to function
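As the printed body above suggests, as_mapper(1) builds a function that indexes into its argument and falls back to .default (NULL) when there's nothing there. A rough base-R equivalent, using a hypothetical index_mapper() helper that is not part of purrr:

```r
# Hypothetical helper: turn a position or name into an extractor function,
# roughly what as_mapper() does with a number or string
index_mapper <- function(idx, .default = NULL) {
  function(x, ...) {
    # [[ with an out-of-range position errors on lists; map that to .default
    val <- tryCatch(x[[idx]], error = function(e) NULL)
    if (is.null(val)) .default else val
  }
}

first <- index_mapper(1)
lapply(list(list(10, 20), list(30)), first)   # list(10, 30)
index_mapper("a")(list(a = 1, b = 2))         # 1
index_mapper(3)(list(1))                      # NULL
```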

Primitive functions are now handled nicely as well.

purrr::as_mapper(`-`)
#> function (.x, .y) 
#> if (missing(.y)) -.x else .x - .y
#> <bytecode: 0x000000000279aaf8>
#> <environment: 0x0000000002fa03d0>

rlang::as_function(`-`)
#> function (e1, e2)  .Primitive("-")

Handling environments and S4 objects

Recursive indexing can now extract objects out of environments (#213) and S4 objects (#200), as well as lists.

You can now extract elements from environments and S4 objects just as you can from lists. For example, the following pulls as_function() out of both the purrr and rlang namespaces.

map(c("purrr", "rlang"), asNamespace) %>% map("as_function")
#> [[1]]
#> <promise: 0x00000000147cbda8>
#> 
#> [[2]]
#> <promise: 0x00000000094fe7b0>
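For comparison, here is how you might pull a named component out of each kind of container with base R alone. get_from() and the Point class are made up for this illustration: environments are looked up with get0(), S4 objects with methods::slot(), and everything else with [[.

```r
# Hypothetical helper: extract a named component from a list, environment or S4 object
get_from <- function(x, name) {
  if (is.environment(x)) {
    get0(name, envir = x, inherits = FALSE)  # NULL if the binding is absent
  } else if (isS4(x)) {
    methods::slot(x, name)
  } else {
    x[[name]]
  }
}

# An illustrative S4 class
methods::setClass("Point", slots = c(x = "numeric", y = "numeric"))
p <- methods::new("Point", x = 1, y = 2)

e <- new.env()
assign("y", 20, envir = e)

lapply(list(list(y = 200), e, p), get_from, name = "y")   # list(200, 20, 2)
```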

`attr_getter()`

attr_getter() makes it possible to extract from attributes like map(list(iris, mtcars), attr_getter("row.names")).

A helper for extracting an attribute from each element.

map(list(iris, mtcars), attr_getter("row.names")) %>% map(head)
#> [[1]]
#> [1] 1 2 3 4 5 6
#> 
#> [[2]]
#> [1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"        "Hornet 4 Drive"    "Hornet Sportabout" "Valiant"
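attr_getter() is essentially a function factory. A base-R approximation (attr_getter_sketch is a hypothetical name) just closes over the attribute name and calls attr():

```r
# Hypothetical approximation of attr_getter(): returns a function that reads one attribute
attr_getter_sketch <- function(attr_name) {
  force(attr_name)  # capture the name now, not lazily later
  function(x) attr(x, attr_name, exact = TRUE)
}

get_rownames <- attr_getter_sketch("row.names")
rn <- lapply(list(datasets::iris, datasets::mtcars), get_rownames)
lapply(rn, head, 3)
```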

Handling three or more arguments

The argument list for formula-functions has been tweaked so that you can refer to arguments by position with ..1, ..2, and so on. This makes it possible to use the formula shorthand for functions with more than two arguments (#289).

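You use map() to map over one input and map2() for two; for three or more, there's pmap(). The function you pass then takes three or more arguments, and for that case you can use ..1, ..2, and so on as placeholders for the arguments.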

pmap_chr(list("I", "am", letters), ~ paste(..1, ..2, ..3))
#>  [1] "I am a" "I am b" "I am c" "I am d" "I am e" "I am f" "I am g" "I am h" "I am i" "I am j" "I am k" "I am l" "I am m"
#> [14] "I am n" "I am o" "I am p" "I am q" "I am r" "I am s" "I am t" "I am u" "I am v" "I am w" "I am x" "I am y" "I am z"
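Why does ..1 work inside a formula? One way to build such a function, sketched here in base R with a hypothetical formula_to_fn(), is to route every input through ... and define .x, .y and . as lazy defaults pointing at ..1 and ..2. I believe purrr generates a similar argument list, but treat the details as an assumption.

```r
# Hypothetical sketch: convert a one-sided formula into a function
formula_to_fn <- function(f) {
  # All inputs arrive through ...; .x, .y and . are lazy defaults,
  # so ..2 is only looked up if the body actually uses .y
  fn <- function(..., .x = ..1, .y = ..2, . = ..1) NULL
  body(fn) <- f[[length(f)]]          # right-hand side of the formula
  environment(fn) <- environment(f)   # evaluate in the formula's environment
  fn
}

add <- formula_to_fn(~ .x + .y)
add(1, 2)   # 3

say <- formula_to_fn(~ paste(..1, ..2, ..3))
mapply(say, "I", "am", letters[1:3], USE.NAMES = FALSE)
# "I am a" "I am b" "I am c"
```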

Ctrl+C in `possibly()` and `safely()`

possibly(), safely() and friends no longer capture interrupts: this means that you can now terminate a mapper using one of these with Escape or Ctrl + C (#314)

You can now interrupt them properly.
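What matters is which condition classes a wrapper handles. In this base-R sketch (safely_sketch() and raise_interrupt() are hypothetical, and I'm not claiming this is how purrr does it), the handler is registered for error only. A user interrupt is signalled as a condition of class interrupt, which does not inherit from error, so it passes straight through to the caller.

```r
# Hypothetical safely()-like wrapper that only handles errors
safely_sketch <- function(f, otherwise = NULL) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log <- safely_sketch(log)
safe_log(10)$result          # 2.302585
class(safe_log("a")$error)   # the error is captured, not raised

# Simulate Ctrl+C by signalling an "interrupt" condition ourselves
raise_interrupt <- function(...) {
  stop(structure(list(message = "simulated interrupt", call = NULL),
                 class = c("interrupt", "condition")))
}
outcome <- tryCatch(
  safely_sketch(raise_interrupt)(),
  interrupt = function(cond) "interrupt reached the caller"
)
outcome
```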

Map functions

Handling `NULL`

All map functions now treat NULL the same way as an empty vector (#199), and return an empty vector if any input is an empty vector.

Like this:

map(NULL, 1)
#> list()
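Base R's mapply() (and therefore Map()) refuses to mix zero-length and non-zero-length inputs, so this rule is a real difference from base R. A hypothetical map2_sketch() following the rule described above might look like this:

```r
# Hypothetical two-input map that treats NULL like any other empty vector
map2_sketch <- function(.x, .y, .f) {
  # Any empty input (including NULL) gives an empty result
  if (length(.x) == 0 || length(.y) == 0) return(list())
  Map(.f, .x, .y)
}

map2_sketch(1:3, NULL, `+`)   # list()
map2_sketch(1:2, 3:4, `+`)    # list(4L, 6L)

# Base R, by contrast, signals an error for the mixed case
res <- tryCatch(mapply(`+`, 1:3, NULL), error = function(e) conditionMessage(e))
res
```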

`map()` and lazy evaluation

All map() functions now force their arguments in the same way that base R does for lapply() (#191).

This is one of those "if you're curious, go read Advanced R (R言語徹底解説)" topics: the interaction between lapply() and lazy evaluation.
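For anyone who hasn't run into it, here is the classic trap in base R. make_adder() is a made-up function factory that doesn't force its argument, so the promise for n is only evaluated when an adder is first called. In a for loop that happens after i has moved on; lapply() (since R 3.2.0) forces the argument on each call, which is the behavior map() now matches.

```r
# A function factory that does NOT force its argument
make_adder <- function(n) function(x) x + n

# Built in a for loop: every promise refers to the same variable i
loop_adders <- list()
for (i in 1:3) loop_adders[[i]] <- make_adder(i)
loop_adders[[1]](10)    # 13: n was evaluated only after the loop ended (i == 3)

# Built with lapply(), which forces the argument on each call
apply_adders <- lapply(1:3, make_adder)
apply_adders[[1]](10)   # 11

# The usual fix inside the factory itself is force()
make_adder_forced <- function(n) { force(n); function(x) x + n }
fixed <- list()
for (i in 1:3) fixed[[i]] <- make_adder_forced(i)
fixed[[1]](10)          # 11
```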

`imap()`

A new family of “indexed” map functions, imap(), imap_lgl() etc, provide a short-hand for map2(x, names(x)) or map2(x, seq_along(x)) (#240).

This is exactly what I'd been wanting! For a named vector, .y gets the names; for an unnamed vector, it gets the indices.

imap(iris, ~ sprintf("%s is %s", .y, class(.x)))
#> $Sepal.Length
#> [1] "Sepal.Length is numeric"
#> 
#> $Sepal.Width
#> [1] "Sepal.Width is numeric"
#> 
#> $Petal.Length
#> [1] "Petal.Length is numeric"
#> 
#> $Petal.Width
#> [1] "Petal.Width is numeric"
#> 
#> $Species
#> [1] "Species is factor"
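Conceptually, imap(x, f) is map2(x, names(x), f) with a fallback to positions. A base-R sketch with a hypothetical imap_sketch():

```r
# Hypothetical sketch of imap(): the 2nd argument is the name if present, else the position
imap_sketch <- function(.x, .f) {
  idx <- if (is.null(names(.x))) seq_along(.x) else names(.x)
  out <- Map(.f, .x, idx)
  names(out) <- names(.x)
  out
}

imap_sketch(c(a = 1, b = 2), function(value, key) paste(key, value))
# list(a = "a 1", b = "b 2")
imap_sketch(c(10, 20), function(value, i) value * i)
# list(10, 40)
```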

The `_df()` family is deprecated

The data frame suffix _df has been (soft) deprecated in favour of _dfr to more clearly indicate that it’s a row-bind.

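_df refers to functions like map_df(). map_df() row-binds (rbind) its results, but now that there is also map_dfc(), which column-binds (cbind) them, it looks like map_df() has been renamed map_dfr() to tell the two apart, and we're asked to use that instead. And what is "soft deprecated" supposed to mean? Make up your mind…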

map_dfr(1:10, ~ list(a = .))
#> # A tibble: 10 x 1
#>        a
#>    <int>
#>  1     1
#>  2     2
#>  3     3
#>  4     4
#>  5     5
#>  6     6
#>  7     7
#>  8     8
#>  9     9
#> 10    10

map_dfc(1:10, ~ list(a = .))
#> # A tibble: 1 x 10
#>       a    a1    a2    a3    a4    a5    a6    a7    a8    a9
#>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1     1     2     3     4     5     6     7     8     9    10

These will not be terribly useful until dplyr::bind_rows()/dplyr::bind_cols() have better semantics for vectors.

Given this caveat, the behavior may change in the future.
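The r/c distinction is just row-binding versus column-binding. In base R terms (this only illustrates the shapes; map_dfr() and map_dfc() go through dplyr's bind functions and return tibbles with their own column naming, as shown above):

```r
# One single-column data frame per input
pieces <- lapply(1:3, function(i) data.frame(a = i))

rows <- do.call(rbind, pieces)   # stacked: 3 rows x 1 column, like map_dfr()
cols <- do.call(cbind, pieces)   # side by side: 1 row x 3 columns, like map_dfc()
dim(rows)   # 3 1
dim(cols)   # 1 3
```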

Modify functions

`modify()`

A new modify() family returns the same output of the type as the input .x.

modify() is similar to map(), but it preserves the type and structure of the original object. Like map(), it comes in many variants.

As shown below, map_if() turns the data frame into a list, while modify_if() keeps it a data frame.

iris %>%
  map_if(is.factor, as.character) %>%
  str()
#> List of 5
#>  $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...

iris %>%
  modify_if(is.factor, as.character) %>%
  str()
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
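The trick that keeps the data frame intact is assigning into a subset of the original object instead of building a new list. A rough base-R sketch (modify_if_sketch() is a hypothetical name, not purrr's implementation):

```r
# Hypothetical sketch: apply .f where .p is TRUE, keeping the type of .x
modify_if_sketch <- function(.x, .p, .f) {
  sel <- vapply(.x, .p, logical(1))
  # Sub-assignment keeps the container (here a data.frame);
  # lapply(.x, ...) on its own would return a plain list
  .x[sel] <- lapply(.x[sel], .f)
  .x
}

out <- modify_if_sketch(datasets::iris, is.factor, as.character)
class(out)           # "data.frame"
class(out$Species)   # "character"
```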

at_depth() has been renamed to modify_depth().

modify_depth() gains new .ragged argument, and negative depths are now computed relative to the deepest component of the list (#236).

modify_depth() apparently lets you choose how deep to apply the function, which looks handy, but I don't really understand it yet, so I'll skip the explanation. Sorry…
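As a rough guide to the positive-depth case only: depth 0 is the whole object, 1 is each element (like modify()), 2 is each element of each element, and so on. The base-R sketch below uses a hypothetical apply_at_depth() and ignores negative depths and .ragged entirely.

```r
# Hypothetical sketch of applying f at a fixed depth (positive depths only)
apply_at_depth <- function(x, depth, f) {
  if (depth == 0) return(f(x))   # depth 0: the object itself
  # x[] <- ... keeps names and the container type, like modify()
  x[] <- lapply(x, apply_at_depth, depth = depth - 1, f = f)
  x
}

nested <- list(a = list(1, 2), b = list(3))
apply_at_depth(nested, 2, function(v) v * 10)
# list(a = list(10, 20), b = list(30))
apply_at_depth(nested, 1, length)
# list(a = 2L, b = 1L)
```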

New functions

`auto_browse()`

auto_browse(f) returns a new function that automatically calls browser() if f throws an error (#281).

Starts browser() when an error occurs. It looks handy for debugging.
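The important detail is that the handler has to run before the stack unwinds, which is what withCallingHandlers() gives you (tryCatch() would already have left the failing call). Here is a hypothetical auto_debug_sketch() in base R, not purrr's implementation, with the browser() call made swappable so it can be exercised non-interactively:

```r
# Hypothetical sketch in the spirit of auto_browse()
auto_debug_sketch <- function(f, on_error = function(cond) browser()) {
  function(...) {
    withCallingHandlers(
      f(...),
      # Calling handlers run before the stack unwinds, so the failing call's
      # frames are still on the stack; afterwards the error continues as usual
      error = function(cond) on_error(cond)
    )
  }
}

# Non-interactive check: record the error instead of opening the browser
seen <- new.env()
seen$messages <- character(0)
logged_log <- auto_debug_sketch(log, on_error = function(cond) {
  seen$messages <- c(seen$messages, conditionMessage(cond))
})
try(logged_log("a"), silent = TRUE)
seen$messages
```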

`vec_depth()`

vec_depth() computes the depth (i.e. the number of levels of indexing) or a vector (#243).

It returns how many levels deep the deepest element is. Despite the vec_ prefix, it works on lists too. Here's the example from the Examples section:

x <- list(
  list(),
  list(list()),
  list(list(list(1)))
)
vec_depth(x)
#> [1] 5
x %>% map_int(vec_depth)
#> [1] 1 2 4
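The counting rule, as far as the output above shows, is: NULL has depth 0, an atomic vector has depth 1, and a list is one deeper than its deepest element (an empty list counts as 1). A base-R sketch with a hypothetical my_depth() reproduces those numbers:

```r
# Hypothetical re-implementation of the depth rule, for illustration only
my_depth <- function(x) {
  if (is.null(x)) {
    0L                        # NULL: depth 0
  } else if (is.atomic(x)) {
    1L                        # atomic vector: depth 1
  } else if (is.list(x)) {
    # one more than the deepest element; max(..., 0L) makes list() count as 1
    1L + max(vapply(x, my_depth, integer(1)), 0L)
  } else {
    stop("`x` must be a vector")
  }
}

deep <- list(list(), list(list()), list(list(list(1))))
my_depth(deep)                       # 5
vapply(deep, my_depth, integer(1))   # 1 2 4
```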

`reduce2()`、`reduce2_right()`

reduce2() and reduce2_right() make it possible to reduce with a 3 argument function where the first argument is the accumulated value, the second argument is .x, and the third argument is .y (#163).

Confusing… There's no preduce() for now. I have a feeling something like ireduce() would come in handy.

quote2 <- function(x, y, z) glue::glue('{y} {z} said "{x}"')
reduce2(c("Alice", "Bob"), c("Cooper", "Dylan"), quote2, .init = "Yeah!")
#> Bob Dylan said "Alice Cooper said "Yeah!""
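Spelled out, reduce2() walks .x and .y in parallel and calls .f(accumulated, .x[[i]], .y[[i]]) at each step. A base-R sketch with a hypothetical reduce2_sketch(), using sprintf() in place of glue so it needs no extra packages, gives the same result as above:

```r
# Hypothetical sketch of reduce2(): a left fold over two vectors in parallel
reduce2_sketch <- function(.x, .y, .f, .init) {
  if (missing(.init)) {
    # Without .init, .x[[1]] seeds the accumulator, so .y must be one shorter
    acc <- .x[[1]]
    .x <- .x[-1]
  } else {
    acc <- .init
  }
  stopifnot(length(.x) == length(.y))
  for (i in seq_along(.x)) {
    acc <- .f(acc, .x[[i]], .y[[i]])
  }
  acc
}

quote2 <- function(x, y, z) sprintf('%s %s said "%s"', y, z, x)
quote_res <- reduce2_sketch(c("Alice", "Bob"), c("Cooper", "Dylan"), quote2, .init = "Yeah!")
cat(quote_res, "\n")
# Bob Dylan said "Alice Cooper said "Yeah!""
```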

`list_modify()`、`list_merge()`

list_modify() extends stats::modifyList() to replace by position if the list is not named.(#201). list_merge() operates similarly to list_modify() but combines instead of replacing (#322).

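list_modify() is similar to the existing update_list(). You might ask "then why not just keep using update_list()?", but as described in the next section, update_list() is likely to be deprecated in the future.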

list_merge() merges the given elements into the original ones instead of replacing them. Like this:

x <- list(a = 1, b = 2)
list_modify(x, a = 2, c = 3)
#> $a
#> [1] 2
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3
#> 

list_merge(x, a = 2, c = 3)
#> $a
#> [1] 1 2
#> 
#> $b
#> [1] 2
#> 
#> $c
#> [1] 3
#>
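In base R, the replace-style behavior for named lists is essentially stats::modifyList(), which list_modify() extends. The merge-style behavior can be sketched with a hypothetical merge_sketch() (named arguments only) that concatenates with c() where a name already exists and appends otherwise:

```r
base_list <- list(a = 1, b = 2)

# Replace: existing names are overwritten, new names are appended
stats::modifyList(base_list, list(a = 2, c = 3))
# list(a = 2, b = 2, c = 3)

# Hypothetical merge: combine with c() instead of replacing
merge_sketch <- function(x, ...) {
  new <- list(...)
  for (nm in names(new)) {
    x[[nm]] <- if (is.null(x[[nm]])) new[[nm]] else c(x[[nm]], new[[nm]])
  }
  x
}
merge_sketch(base_list, a = 2, c = 3)
# list(a = c(1, 2), b = 2, c = 3)
```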

`update_list()` is likely to be deprecated

The legacy function update_list() is basically a version of list_modify that evaluates formulas within the list. It is likely to be deprecated in the future in favour of a tidyeval interface such as a list method for dplyr::mutate().

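The idea seems to be that dplyr::mutate() and friends will eventually (but when?) handle lists too, so update_list() won't be needed anymore. I didn't even know it accepted formulas…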

Minor improvements and bug fixes

Only the items that caught my eye.

cross_n() has been renamed to cross().

cross_n() is now cross().

is_numeric() and is_scalar_numeric() are deprecated because they don’t test for what you might expect at first sight.

is_numeric() and is_scalar_numeric() are now deprecated.

Deprecated functions flatmap(), map3(), map_n(), walk3(), walk_n(), zip2(), zip3(), zip_n() have been removed.

Functions that had already been deprecated have been removed.

Summary

I thought 0.2.3 was just a minor update, but quite a lot changed. Scary…
