-z noexecstack
for compilation on Linux to avoid the error
version GLIBC_2.39' not found
(#1212).
$describe_plan() and $describe_optimized_plan() are removed. Use $explain(optimized = FALSE) and $explain(), respectively, instead (#1182).
inherit_optimization is removed from all functions that had it (#1183).
In $write_parquet() and $sink_parquet(), the parameter data_pagesize_limit is renamed data_page_size (#1183).
$get_optimization_toggle() is removed, and $set_optimization_toggle() is renamed $optimization_toggle() (#1183).
In $unpivot(), the parameter streamable is removed (#1183).
The parameter future, which determines the compatibility level when exporting Polars' internal data structures, is renamed compat_level. It takes FALSE for the oldest flavor (more compatible) and TRUE for the newest one (less compatible). It can also take an integer determining a specific compatibility level when more are added in the future. For now, future = FALSE can be replaced by compat_level = FALSE (#1183).
In $scan_parquet() and $read_parquet(), the default value of hive_partitioning is now NULL (#1189).
In $dt$epoch(), the argument tu is renamed to time_unit (#1196).
In $fill_nan() for DataFrame, LazyFrame and Expr, the argument is renamed value (#1198).
$shift_and_fill() is removed and replaced by a new argument fill_value in $shift(). $shift_and_fill(fill_value, periods) can be replaced by $shift(n, fill_value) (#1201).
In $shift() for various Expr, the argument periods is renamed n (#1201).
In $clip(), arguments min and max are renamed lower_bound and upper_bound (#1203).
$clip_min() and $clip_max() are removed. Use $clip() with only lower_bound or upper_bound instead (#1203).
In $write_csv() and $sink_csv(), the argument quote is renamed quote_char (#1206).
$str$extract_many()
(#1163).nanoarrow_array
with zero rows to an RPolarsDataFrame
via
as_polars_df()
now keeps the original schema (#1177).$write_parquet()
has two new arguments partition_by
and
partition_chunk_size_bytes
to write a DataFrame
to a hive-partitioned
directory (#1183).$bin$size()
(#1183).$scan_parquet()
and $read_parquet()
, the parallel
argument can take
the new value "prefiltered"
(#1183).$scan_parquet()
, $scan_ipc()
and $read_parquet()
have a new argument
include_file_paths
to automatically add a column containing the path to the
source file(s) (#1183).$scan_ipc()
can read a hive-partitioned directory with its new arguments
hive_partitioning
, hive_schema
, and try_parse_hive_dates
(#1183).$scan_parquet()
and $read_parquet()
gain two new arguments for more control
on importing hive partitions: hive_schema
and try_parse_hive_dates
(#1189).$gather_every()
for LazyFrame
and DataFrame
(#1199).$glimpse()
for DataFrame
has two new arguments max_items_per_column
and
max_colname_length
(#1200).$list$sample()
(#1204).coalesce
in $join_asof()
(#1205).maintain_order
in $list$unique()
(#1207).$unnest()
for DataFrame
and LazyFrame
, the names
argument is removed
and replaced by ...
. This doesn't change the previous behavior, e.g.
df$unnest(names = c("a", "b"))
still works (#1170).$n_chunks()
, the default value of strategy
now is "first"
(#1137).$sample()
for Expr and DataFrame (#1136):
frac
is renamed fraction
;n
must be named;n
(it was already the
case for the DataFrame method);with_replacement
is now
FALSE
(it was already the case for the DataFrame method).$melt()
had several changes (#1147):
melt()
is renamed $unpivot()
.id_vars
is now index
, value_vars
is now
on
.on
is now first, then index
. The
order of the other arguments hasn't changed. Note that on
can be unnamed
but all the other arguments must be named.pivot()
had several changes (#1147):
columns
is renamed on
.on
is now first, then index
and
values
. The order of the other arguments hasn't changed. Note that on
can be unnamed but all the other arguments must be named.$write_parquet()
and $sink_parquet()
, the default value of argument
statistics
is now TRUE
and can take other values than TRUE/FALSE
(#1147).$dt$truncate()
and $dt$round()
, the argument offset
has been removed.
Use $dt$offset_by()
after those functions instead (#1147).$top_k()
and $bottom_k()
for Expr
, the arguments nulls_last
,
maintain_order
and multithreaded
have been removed. If any null
values
are in the top/bottom k
values, they will always be positioned last (#1147).$replace()
has been split in two functions depending on the desired
behaviour (#1147):
$replace()
recodes some values in the column, leaving all other values
unchanged. Compared to the previous version, it doesn't use the arguments
default
and return_dtype
anymore.$replace_strict()
replaces all values by different values. If a value
doesn't have a specific mapping, it is replaced by the default
value.$str$concat()
is deprecated, use $str$join()
(with the same arguments)
instead (#1147).pl$date_range()
and pl$date_ranges()
, the arguments time_unit
and
time_zone
have been removed. They were deprecated in previous versions
(#1147).$join()
, when how = "cross"
, on
, left_on
and right_on
must be
NULL
(#1147).$has_nulls()
(#1133).$list$explode()
(#1139).$over()
gains a new argument order_by
to specify the order of values
within each group. This is useful when the operation depends on the order of
values, such as $shift()
(#1147).$value_counts()
gains an argument normalize
to give relative frequencies
of unique values instead of their count (#1147).$join()
, there is a new argument coalesce
and the how
options now
accept "full"
instead of "outer"
and "outer_coalesce"
.$top_k()
and $bottom_k()
gain three arguments nulls_last
,
maintain_order
and multithreaded
.$rolling_*()
functions lose the arguments by
, closed
and
warn_if_unsorted
. Rolling computations based on by
must be made via the
corresponding rolling_*_by()
, e.g. rolling_mean_by()
instead of
rolling_mean(by =)
(#1115).pl$scan_parquet()
and pl$read_parquet()
gain an argument glob
which
defaults to TRUE
. Set it to FALSE
to avoid considering *
as a globbing
pattern.$is_not_nan()
on a null
value (NA
in R) now returns null
. Previously,
it returned TRUE
.$reshape()
, argument dims
is renamed dimensions
and there is a new
argument nested_type
specifying if the output should be of type List or
Array.$value_counts()
, all arguments must be named and there is a new argument
name
to specify the name of the output.projection_pushdown
), there is a new parameter cluster_with_columns
to
combine sequential independent calls to $with_columns()
.$str$explode()
is removed.check_sorted
argument is removed from $rolling()
and $group_by_dynamic()
.
Sortedness is now verified in a quick manner, so this argument is no longer needed
(pola-rs/polars#16494).$name$map()
stacks on Linux, so this method is deprecated and the document
is removed. Please use other methods like <LazyFrame>$rename(<function>)
instead (#1123).pl$Series
is changed (#1071).
The first argument is now name
, and the second argument is values
.$to_struct()
on an Expr is removed. This method is now only available for
Series
, DataFrame
, and in the $list
and $arr
subnamespaces. For example,
pl$col("a", "b", "c")$to_struct()
should be replaced with
pl$struct(c("a", "b", "c"))
(#1092).pl$Struct()
now only accepts named inputs and objects of class RPolarsField
.
For example, pl$Struct(pl$Boolean)
doesn't work anymore and should be named
like pl$Struct(a = pl$Boolean)
(#1053).$all()
and $any()
, the argument drop_nulls
is renamed ignore_nulls
,
and this argument must be named (#1050).$struct$with_fields()
(#1109) and new function pl$field()
to
be used in expressions in $struct$with_fields()
(#1113).RPolarsDataType
: $is_enum()
, $is_categorical()
,
$is_known()
, $is_string()
, $contains_views()
, $contains_categorical()
(#1112).$dt$combine()
, the arguments tm
and tu
are renamed time
and
time_unit
(#1116).rechunk
argument of pl$concat()
is changed from
TRUE
to FALSE
(#1125).$rename()
for LazyFrame and DataFrame, key-value pairs of names are changed to
old_name = "new_name"
instead of new_name = "old_name"
(#1129).$rename()
for LazyFrame and DataFrame, calling it with no arguments is not allowed (#1129).$rolling_*()
functions, the arguments center
and ddof
must be
named (#1115).$rename()
for LazyFrame and DataFrame.
They are equivalent to polars.LazyFrame.rename(mapping: Callable[[str], str])
or polars.DataFrame.rename(mapping: Callable[[str], str])
in Python Polars (#1122, #1129).pl$read_ipc()
can read a raw vector of Apache Arrow IPC file (#1072).<DataFrame>$to_raw_ipc()
to serialize a DataFrame to a raw vector
of Apache Arrow IPC file format (#1072).<LazyFrame>$serialize()
to serialize a LazyFrame to a character
vector of JSON representation (#1073).pl$deserialize_lf()
to deserialize a LazyFrame from a character
vector of JSON representation (#1073).$str$head()
and $str$tail()
(#1074).nanoarrow::as_nanoarrow_array_stream()
and nanoarrow::infer_nanoarrow_schema()
for RPolarsSeries
(#1076).$dt$is_leap_year()
(#1077).as_polars_df()
and as_polars_series()
support arrow::RecordBatchReader
(#1078).experimental
argument for as_polars_df(<ArrowTabular>)
, as_polars_df(<RecordBatchReader>)
,
as_polars_series(<nanoarrow_array_stream>)
, and as_polars_df(<nanoarrow_array_stream>)
(#1078).
If experimental = TRUE
, these functions switch to use
the Arrow C stream interface internally.
At this point, the performance is degraded under the expected use cases,
so the default is set to experimental = FALSE
.<SQLContext>$register_globals()
(#1064).$sql()
for DataFrame and LazyFrame (#1065).https://rpolars.github.io/
https://pola-rs.github.io/r-polars/
$cut()
and $qcut()
to bin continuous values into discrete categories (#1057).pl$scan_parquet()
and pl$read_parquet()
can read data from the internet by specifying a URL
to the first argument (#1056, @andyquinterom).pl$scan_parquet()
and pl$read_parquet()
gain an argument storage_options
to scan/read data via cloud storage providers (GCP, AWS, Azure). Note that this
support is experimental (#1056, @andyquinterom).Enum
datatype via pl$Enum()
(#1061).
This is a small hot-fix release to update the Rust polars dependency to 0.39.1 (#1042).
It also includes the following updates.
$len()
now correctly includes null
values in the count (#1044).$arr$max()
and $arr$min()
work without the nightly
feature (#1042).
R objects inside an R list are now converted to Polars data types via
as_polars_series()
(#1021, #1022, #1023). For example, up to polars 0.15.1,
a list containing a data.frame with a column of {clock}
naive-time class
was converted to a nested List type of Float64:
data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
pl$select(
  nested_data = pl$lit(list(data))
)
#> shape: (1, 1)
#> ┌──────────────────────────┐
#> │ nested_data              │
#> │ ---                      │
#> │ list[list[list[f64]]]    │
#> ╞══════════════════════════╡
#> │ [[[2.1475e9], [7305.0]]] │
#> └──────────────────────────┘
From 0.16.0, nested types are converted correctly, so the same input becomes a List type of Struct type containing a Datetime type.
data = data.frame(time = clock::naive_time_parse("1990-01-01", precision = "day"))
pl$select(
  nested_data = pl$lit(list(data))
)
#> shape: (1, 1)
#> ┌─────────────────────────┐
#> │ nested_data             │
#> │ ---                     │
#> │ list[struct[1]]         │
#> ╞═════════════════════════╡
#> │ [{1990-01-01 00:00:00}] │
#> └─────────────────────────┘
Several functions have been rewritten to match the behavior of Python Polars. There are four types of changes: a) change in argument names, b) change in the way arguments are passed (named or by position), c) arguments are removed, and d) change in the default and accepted values. Those are addressed separately below.
Change in argument names:
$reshape()
, the dims
argument is renamed to dimensions
(#1019).pl$read_*
and pl$scan_*
functions, the first argument is now
source
(#935).pl$Series()
, the argument x
is renamed values
(#933).<DataFrame>$write_*
functions, the first argument is now file
(#935).<LazyFrame>$sink_*
functions, the first argument is now path
(#935).<LazyFrame>$sink_ipc()
, the argument memmap
is renamed to memory_map
(#1032).<DataFrame>$rolling()
, <LazyFrame>$rolling()
, <DataFrame>$group_by_dynamic()
and <LazyFrame>$group_by_dynamic()
, the by
argument is renamed to
group_by
(#983).$dt$convert_time_zone()
and $dt$replace_time_zone()
, the tz
argument is renamed to time_zone
(#944).$str$strptime()
, the argument datatype
is renamed to dtype
(#939).$str$to_integer()
(renamed from $str$parse_int()
), argument radix
is
renamed to base
(#1038).
Change in the way arguments are passed:
In all input/output functions, all arguments except the first argument must be named arguments (#935).
In <DataFrame>$rolling() and <DataFrame>$group_by_dynamic(), all arguments except index_column must be named arguments (#983).
In $unique() for DataFrame and LazyFrame, arguments keep and maintain_order must be named (#953).
In $bin$decode(), the strict argument must be a named argument (#980).
In $dt$replace_time_zone(), all arguments except time_zone must be named arguments (#944).
In $str$contains(), the arguments literal and strict must be named (#982).
In $str$contains_any(), the ascii_case_insensitive argument must be named (#986).
In $str$count_matches(), $str$replace() and $str$replace_all(), the literal argument must be named (#987).
In $str$strptime(), $str$to_date(), $str$to_datetime(), and $str$to_time(), all arguments (except the first one) must be named (#939).
In $str$to_integer() (renamed from $str$parse_int()), all arguments must be named (#1038).
In pl$date_range(), the arguments closed, time_unit, and time_zone must be named (#950).
In $set_sorted() and $sort_by(), argument descending must be named (#1034).
In pl$Series(), using positional arguments throws a warning, since the argument positions will be changed in the future (#966).
# polars 0.15.1 or earlier
# The first argument is `x`, the second argument is `name`.
pl$Series(1:3, "foo")
# The code above will warn in 0.16.0
# Use named arguments to silence the warning.
pl$Series(values = 1:3, name = "foo")
pl$Series(name = "foo", values = 1:3)
# polars 0.17.0 or later (future version)
# The first argument is `name`, the second argument is `values`.
pl$Series("foo", 1:3)
This warning can also be silenced by replacing pl$Series(<values>, <name>) by as_polars_series(<values>, <name>).
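As a hedged illustration of the named-argument rules listed above, the sketch below passes keep, maintain_order, literal, and descending by name. The data frame and its column names are made up for the example, and the calls assume a polars version from 0.16.0 onwards in which these signatures apply.

```r
library(polars)

# Hypothetical data, used only for illustration.
df = pl$DataFrame(
  id = c(1, 1, 2, 3),
  label = c("a.b", "a.b", "ab", "c.d")
)

# `keep` and `maintain_order` must be passed by name in $unique().
deduped = df$unique(keep = "first", maintain_order = TRUE)

# `literal` must be passed by name in $str$contains().
# With literal = TRUE, "." is matched as a plain dot, not as a regex wildcard.
flagged = deduped$with_columns(
  pl$col("label")$str$contains(".", literal = TRUE)$alias("has_dot")
)

# `descending` must be passed by name in $sort_by().
sorted_ids = flagged$select(
  pl$col("id")$sort_by(pl$col("id"), descending = TRUE)
)

res_flagged = as.data.frame(flagged)
res_sorted = as.data.frame(sorted_ids)
```

Passing any of these arguments by position is expected to fail or be ignored, so naming them is the safer habit when updating older code.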
Arguments removed:
columns
in $drop()
is removed. $drop()
now accepts
several character scalars, such as $drop("a", "b", "c")
(#912).pl$col()
, the name
argument is removed, and the ...
argument no
longer accepts a list of characters and RPolarsSeries
class objects (#923).pl$date_range()
, the unused argument (not working in recent versions)
explode
is removed (#950).
Change in arguments default and accepted values:
pl$Series()
, the argument values
has a new default value NULL
(#966).$unique()
for DataFrame
and LazyFrame
, argument keep
has a new
default value "any"
(#953).$rolling_mean()
), the default
value of argument closed
now is NULL
. Using closed
with a fixed
window_size
now throws an error (#937).pl$date_range()
, the argument end
must be specified and the default
value of interval
is changed to "1d"
. The arguments start
and end
no longer accept numeric values (#950).pl$scan_parquet()
, the default value of the argument rechunk
is
changed from TRUE
to FALSE
(#1033).pl$scan_parquet()
and pl$read_parquet()
, the argument parallel
only accepts "auto"
, "columns"
, "row_groups"
, and "none"
.
Previously, it also accepted upper-case notation of "auto"
, "columns"
,
"none"
, and "RowGroups"
instead of "row_groups"
(#1033).$str$to_integer()
(renamed from $str$parse_int()
), the default
value of base
is changed from 2
to 10
(#1038).
The usage of pl$date_range() to create a range of Datetime data type is deprecated. pl$date_range() will always create a range of Date data type in the future. Use pl$datetime_range() if you want to create a range of Datetime instead (#950).
<DataFrame>$get_columns() now returns an unnamed list instead of a named list (#991).
Removed $argsort(), which was an old alias for $arg_sort() (#930).
Removed pl$expr_to_r(), which was an alias for $to_r() (#938).
<Series>$to_r_list() is renamed <Series>$to_list() (#938).
Removed <Series>$to_r_vector(), which was an old alias for <Series>$to_vector() (#938).
Removed <Expr>$rep_extend(), which was an experimental method created at the early stage of this package and does not exist in other language APIs (#1028).
The following deprecated functions are now removed: pl$threadpool_size(), <DataFrame>$with_row_count(), <LazyFrame>$with_row_count() (#965).
In $group_by_dynamic(), the first datapoint is always preserved (#1034).
$str$parse_int() is renamed to $str$to_integer() (#1038).
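To illustrate the rename from $str$parse_int() to $str$to_integer() and the new default base of 10 described above, here is a minimal sketch. The column name and values are invented for the example, and base is passed by name, as now required.

```r
library(polars)

# Hypothetical column of digit strings.
df = pl$DataFrame(s = c("10", "11"))

out = df$select(
  # The default base is now 10, so "10" parses as ten.
  pl$col("s")$str$to_integer()$alias("base10"),
  # Pass `base` by name to parse the same strings as binary numbers.
  pl$col("s")$str$to_integer(base = 2L)$alias("base2")
)

res = as.data.frame(out)
```

Code that relied on the old default of base 2 needs an explicit base = 2L after the upgrade.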
New functions:
pl$arg_sort_by()
(#929).pl$arg_where()
to get the indices that match a condition (#922).pl$datetime()
, pl$date()
, and pl$time()
to easily create Expr of class
datetime, date, and time via columns and literals (#918).pl$datetime_range()
, pl$date_ranges()
and pl$datetime_ranges()
(#950, #962).pl$int_range()
and pl$int_ranges()
(#968).
pl$mean_horizontal() (#959).
pl$read_ipc() (#1033).
is_polars_dtype() (#927).
New methods:
<LazyFrame>$to_dot()
to print the query plan of a LazyFrame with graphviz
dot syntax (#928).$clear()
for DataFrame
, LazyFrame
, and Series
(#1004).$item()
for DataFrame
and Series
(#992).$select_seq()
and $with_columns_seq()
for DataFrame
and LazyFrame
(#1003).$arr$to_list()
(#1018).$str$extract_groups()
(#979).$str$find()
(#985).<DataFrame>$write_ipc()
(#1032).RPolarsDataType
gains several methods to check the datatype, such as
$is_integer()
, $is_null()
or $is_list()
(#1036).
New arguments or argument values:
ambiguous
can now take the value "null"
to convert ambiguous datetimes to
null values (#937).n
in $str$replace()
(#987).non_existent
in $dt$replace_time_zone()
to specify what should happen
when a datetime doesn't exist.mapping_strategy
in $over()
(#984, #988).raise_if_undetermined
in $meta$output_name()
(#961).null_on_oob
in $arr$get()
and $list$get()
to determine what happens
when the index is out of bounds (#1034).nulls_last
, multithreaded
, and maintain_order
in $sort_by()
(#1034).
Other:
join_nulls
and validate
arguments of <DataFrame>$join()
now work
correctly (#945).row_count_*
args in I/O functions
were renamed row_index_*
, but this change was not made for CSV and IPC
functions. This renaming is now made (#964).Series
methods from Expr
inside functions now works correctly (#973).
Thanks @Yunuuuu for the report.extendr-api
is updated to 2024-03-31 unreleased version (#995).
The issue that the R session crashes when a panic occurs in the Rust side is resolved.
Thanks @CGMossa for the upstream fix.parallel
argument of pl$scan_parquet()
and pl$read_parquet()
now works
correctly (#1033). Previously, any correct value was treated as "auto"
.as_polars_df(<nanoarrow_array>)
is added (#893).DataFrame
with a specific schema
with pl$DataFrame(schema = my_schema)
(#901).dtype
and nan_to_null
for pl$Series()
(#902).<DataFrame>$partition_by()
(#898).format
of $str$strptime()
is now correctly set (#892).as_polars_df(<nanoarrow_array_stream>)
is improved (#896).$pivot()
, arguments aggregate_function
, maintain_order
,
sort_columns
and separator
must be named. Values that are passed
by position are ignored.
In $describe(), the name of the first column changed from "describe" to "statistic".
$mod() methods and %% work correctly to guarantee x == (x %% y) + y * (x %/% y).
Removed as.list() for class RPolarsExpr as it is a simple wrapper around list() (#843).
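The identity x == (x %% y) + y * (x %/% y) mentioned a few entries earlier is the same one base R guarantees for %% and %/%, which makes it easy to check by hand. The sketch below uses plain R vectors only (no polars calls) with arbitrary example values, including negative operands, where truncating and flooring division would differ.

```r
# Arbitrary example values covering every sign combination.
x = c(-7, 7, -7, 7)
y = c(3, 3, -3, -3)

remainder = x %% y   # in R, the result takes the sign of the divisor
quotient = x %/% y   # floored division

# The identity holds element-wise.
identity_holds = x == remainder + y * quotient
```

Running the same comparison on polars expressions is a quick way to confirm that $mod() and %% agree with base R on a given installation.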
Several functions have been rewritten to match the behavior of Python Polars.
pl$col(...)
requires at least one argument. (#852)pl$head()
, pl$tail()
, pl$count()
, pl$first()
, pl$last()
, pl$max()
,
pl$min()
, pl$mean()
, pl$median()
, pl$std()
, pl$sum()
, pl$var()
,
pl$n_unique()
, and pl$approx_n_unique()
are syntactic sugar for
pl$col(...)$<method()>
. The argument ...
now only accepts characters,
that are either column names or regular expressions (#852).pl$len()
. If you want to measure the length of
specific columns, you should use pl$count(...)
(#852).<Expr>$str$concat()
method's delimiter
argument's default value is
changed from "-"
to ""
(#853).<Expr>$str$concat()
method's ignore_nulls
argument must be a
named argument (#853).pl$Datetime()
's arguments are renamed: tu
to time_unit
,
and tz
to time_zone
(#887).pl$Categorical()
has been improved to allow specifying the ordering
type
(either lexical or physical). This also means that calling pl$Categorical
doesn't create a DataType
anymore. All calls to pl$Categorical
must be
replaced by pl$Categorical()
(#860).
<Series>$rem()
is removed. Use <Series>$mod()
instead (#886).
The conversion strategy between the POSIXct type without time zone attribute and Polars datetime has been changed (#878).
POSIXct class vectors without a time zone attribute have UTC time internally and are displayed based on the system's time zone. Previous versions of polars only considered the internal value and interpreted it as UTC time, so the time displayed as POSIXct and in Polars was different.
# polars 0.14.1
Sys.setenv(TZ = "Europe/Paris")
datetime = as.POSIXct("1900-01-01")
datetime
#> [1] "1900-01-01 PMT"
s = polars::as_polars_series(datetime)
s
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 1899-12-31 23:50:39
#> ]
as.vector(s)
#> [1] "1900-01-01 PMT"
Now the internal value is updated to match the displayed value.
# polars 0.15.0
Sys.setenv(TZ = "Europe/Paris")
datetime = as.POSIXct("1900-01-01")
datetime
#> [1] "1900-01-01 PMT"
s = polars::as_polars_series(datetime)
s
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 1900-01-01 00:00:00
#> ]
as.vector(s)
#> [1] "1900-01-01 PMT"
This update may cause errors when converting from Polars to POSIXct for non-existent or ambiguous times. It is recommended to explicitly add a time zone before converting from Polars to R.
Sys.setenv(TZ = "America/New_York")
ambiguous_time = as.POSIXct("2020-11-01 01:00:00")
ambiguous_time
#> [1] "2020-11-01 01:00:00 EDT"
pls = polars::as_polars_series(ambiguous_time)
pls
#> polars Series: shape: (1,)
#> Series: '' [datetime[ms]]
#> [
#> 2020-11-01 01:00:00
#> ]
## This will raise an error!
# pls |> as.vector()
pls$dt$replace_time_zone("UTC") |> as.vector()
#> [1] "2020-11-01 01:00:00 UTC"
Removed argument eager in pl$date_range() and pl$struct() for more consistency of output. It is possible to replace eager = TRUE by calling $to_series() (#882).
$otherwise()
is now optional,
as in Python Polars. If $otherwise()
is not specified, rows that don't respect
the condition set in $when()
will be filled with null
(#836).<DataFrame>$head()
and <DataFrame>$tail()
methods now support negative
row numbers (#840).$group_by()
now works with named expressions (#846).arr
subnamespace: $median()
, $var()
, $std()
,
$shift()
, $to_struct()
(#867).$min()
and $max()
now work on categorical variables (#868).list
subnamespace: $n_unique()
, $gather_every()
(#869).clock_time_point
and clock_zoned_time
objects from
the {clock}
package to Polars datetime type (#861).name
subnamespace: $prefix_fields()
and
$suffix_fields()
(#873).pl$Datetime()
's time_zone
argument now accepts "*"
to match
any time zone (#887).Expr
are now available for Series
, the
experimental <Series>$expr
subnamespace is removed (#831).
Use <Series>$<method>
instead of <Series>$expr$<method>
.$flags
for DataFrame
to show the flags used internally
for each column. The output of $flags
for Series
was also improved and now
contains FAST_EXPLODE
for Series
of type list
and array
(#809).Expr
methods are also available for Series
(#819, #828, #831).as_polars_df()
for data.frame
is more memory-efficient and new arguments
schema
and schema_overrides
are added (#817).polars_code_completion_activate()
to enable code suggestions and
autocompletion after $
on polars objects. This is an experimental feature
that is disabled by default. For now, it is only supported in the native R
terminal and in RStudio (#597).<Series>$list
subnamespace methods return a Series
class object correctly (#819).$with_row_count()
for DataFrame
and LazyFrame
is deprecated and
will be removed in 0.15.0. It is replaced by $with_row_index()
.pl$count()
is deprecated and will be removed in 0.15.0. It is replaced
by pl$len()
.$explode()
for DataFrame
and LazyFrame
doesn't work anymore on
string columns.$list$join()
and pl$concat_str()
gain an argument ignore_nulls
.
The current behavior is to return a null
if the row contains any null
.
Setting ignore_nulls = TRUE
changes that.row_count_*
args in reading/scanning functions are renamed
row_index_*
.$sort()
for Series
gains an argument nulls_last
.$str$extract()
and $str$zfill()
now accept an Expr
and parse
strings as column names. Use pl$lit()
to recover the old behavior.$cum_count()
now starts from 1 instead of 0.simd
feature of the Rust library is removed in favor of
the new nightly
feature (#800).
If you specified simd
via the LIBR_POLARS_FEATURES
environment variable
during source installations, please use nightly
instead;
there is no change if you specified full_features
because
it now contains nightly
instead of simd
.$list$lengths()
-> $list$len()
pl$from_arrow()
-> as_polars_df()
or as_polars_series()
pl$set_options()
and pl$reset_options()
-> polars_options()
$is_between()
had several changes (#788):
start
and end
are renamed lower_bound
and upper_bound
.
Their behaviour doesn't change.include_bounds
is renamed closed
and must be one of "left"
,
"right"
, "both"
, or "none"
.polars_info()
returns a slightly changed list.
$threadpool_size
, which means the number of threads used by Polars,
is changed to $thread_pool_size
(#784)$version
, which indicates the version of this package,
is changed to $versions$r_package
(#791).$rust_polars
, which indicates the version of the dependent Rust Polars,
is changed to $versions$rust_crate
(#791).DataFrame
with a single list-variable.
pl$DataFrame(x = list(1:2, 3:4))
used to create a DataFrame
with two
columns named "new_column" and "new_column_1", which was unexpected. It now
produces a DataFrame
with a single list
variable. This also applies to
list-column created in $with_columns()
and $select()
(#794).pl$threadpool_size()
is deprecated and will be removed in 0.15.0. Use
pl$thread_pool_size()
instead (#784).$arr
for expressions on array
-type
columns. An array
column is similar to a list
column, but is stricter as
each sub-array must have the same number of elements (#790).sql
feature is included in the default feature (#800).
This means that functionality related to the RPolarsSQLContext
class
is now always included in the binary package.$write_parquet()
for DataFrame (#758).as.data.frame()
for RPolarsDataFrame
and RPolarsLazyFrame
accepts more arguments of as_polars_df()
and <DataFrame>$to_data_frame()
(#762).arrow::as_arrow_table()
and arrow::as_record_batch_reader()
for
RPolarsDataFrame
no longer need the {nanoarrow}
package (#754).{nanoarrow}
package are added (#730).
as_polars_df(<nanoarrow_array_stream>)
as_polars_series(<nanoarrow_array>)
as_polars_series(<nanoarrow_array_stream>)
$sort()
no longer panics when descending = NULL
(#748).downlit::autolink()
now recognizes the reference pages of this package (#739).<Expr>$where()
is removed. Use <Expr>$filter()
instead (#718).<Expr>$apply()
and <Expr>$map()
, use $map_elements()
and
$map_batches()
instead.pl$polars_info()
, use polars_info()
instead.RPOLARS_PROFILE
is renamed to LIBR_POLARS_PROFILE
RPOLARS_FULL_FEATURES
is removed and LIBR_POLARS_FEATURES
is added.
To select the full_features
, set LIBR_POLARS_FEATURES="full_features"
.RPOLARS_RUST_SOURCE
, which was used for development, has been removed.
If you want to use library binaries located elsewhere, use LIBR_POLARS_PATH
instead.eager
argument of <SQLContext>$execute()
.
Use the $collect()
method after $execute()
or as_polars_df
to get the
result as a DataFrame
. (#719)name_generator
of $list$to_struct()
is renamed fields
(#724).[
for the $list
subnamespace is removed (#724).polars.df_print
has been renamed polars.df_knitr_print
(#726).$list$lengths()
is deprecated and will be removed in 0.14.0. Use
$list$len()
instead (#724).pl$from_arrow()
is deprecated and will be removed in 0.14.0.
Use as_polars_df()
or as_polars_series()
instead (#728).pl$set_options()
and pl$reset_options()
are deprecated and will be
removed in 0.14.0. See ?polars_options
for details (#726).POLARS_MAX_THREADS
is not set (#720).
To disable this behavior and have the maximum number of threads used automatically,
one of the following ways can be used:
disable_limit_max_threads
feature.polars.limit_max_threads
option to FALSE
with the options()
function
before loading the package.$rolling()
for DataFrame
and LazyFrame
. When this is
applied, it creates an object of class RPolarsRollingGroupBy
(#682, #694).$group_by_dynamic()
for DataFrame
and LazyFrame
. When this
is applied, it creates an object of class RPolarsDynamicGroupBy
(#691).$sink_ndjson()
for LazyFrame (#681).pl$duration()
to create a duration by components (week, day,
hour, etc.), and use them with date(time) variables (#692).$list$any()
and $list$all()
(#709).pl$from_epoch()
to convert a Unix timestamp to a date(time)
variable (#708).list
subnamespace: $set_union()
, $set_intersection()
,
$set_difference()
, $set_symmetric_difference()
(#712).int64_conversion
to specify how Int64 columns (that don't have
equivalent in base R) should be converted. This option can either be set
globally with pl$set_options()
or on a case-by-case basis, e.g. with
$to_data_frame(int64_conversion =)
(#706).$join()
for DataFrame
and LazyFrame
(#716):
<LazyFrame>$join()
now errors if other
is not a LazyFrame
and
<DataFrame>$join()
errors if other
is not a DataFrame
.how
now comes before left_on
).
This can lead to bugs if the user didn't use argument names.how
now accepts "outer_coalesce"
to coalesce the join keys
automatically after joining.validate
to perform some checks on join keys (e.g. ensure
that there is a one-to-one matching between join keys).join_nulls
to consider null
values as a valid key.<DataFrame>$describe()
now works with all datatypes. It also gains an
interpolation
argument that is used for quantiles computation (#717).as_polars_df()
and as_polars_series()
for the arrow
package classes have been
rewritten and work better (#727).options()
. The option names don't change but
they must be prefixed with "polars."
. For example, we can now pass
options(polars.strictly_immutable = FALSE)
.polars_options()
, which returns a named
list (this is the replacement of pl$options
).polars_options_reset()
(this is the
replacement of pl$reset_options()
).polars_envvars()
to print the list of environment variables
related to polars (#735).
This is a small release including a few documentation improvements and internal updates.
This version includes a few additional features and a large number of documentation improvements.
pl$polars_info()
is moved to polars_info()
. pl$polars_info()
is deprecated
and will be removed in 0.13.0 (#662).pl$Utf8
is replaced by pl$String
.
pl$Utf8
is an alias and will keep working, but pl$String
is now preferred
in the documentation and in new code.$str$reverse()
, $str$contains_any()
, and $str$replace_many()
(#641).$rle()
and $rle_id()
(#648).is_polars_df()
, is_polars_lf()
, is_polars_series()
(#658).$gather()
now accepts negative indexing (#659).Makefile
in favor of Taskfile.yml
.
Please use task
instead of make
as a task runner for development (#654).pl$scan_csv()
and pl$read_csv()
's comment_char
argument is renamed comment_prefix
.<DataFrame>$frame_equal()
and <Series>$series_equal()
are renamed
to <DataFrame>$equals()
and <Series>$equals()
.<Expr>$rolling_*
functions gained an argument warn_if_unsorted
.<Expr>$str$json_extract()
is renamed to <Expr>$str$json_decode()
.null
values.count
now ignores null values.NaN
values are now considered equal.$gather_every()
gained an argument offset
.$apply()
on an Expr or a Series is renamed $map_elements()
, and $map()
is renamed $map_batches()
. $map()
and $apply()
will be removed in 0.13.0 (#534).$days()
, $hours()
, $minutes()
, $seconds()
, $milliseconds()
,
$microseconds()
, $nanoseconds()
. Those were deprecated in 0.11.0 (#550).pl$concat_list()
: string elements are now interpreted as column names.
Use pl$lit
to concatenate a literal string.
<RPolarsExpr>$lit_to_s()
is renamed to <RPolarsExpr>$to_series()
(#582).<RPolarsExpr>$lit_to_df()
is removed (#582).DataFrame
, LazyFrame
,
Expr
, Series
, etc.) has changed. They now start with RPolars
, for example
RPolarsDataFrame
. This will only break your code if you directly use those
class names, such as in S3 methods (#554, #585).RPolars
prefix (#584).[
) for DataFrame can use columns not included in the
result for filtering (#547).[
) for LazyFrame can filter rows with Expressions (#547).as_polars_df()
for data.frame
has a new argument rownames
to convert
the row.names attribute to a column.
This option is inspired by the tibble::as_tibble()
function (#561).as_polars_df()
for data.frame
has a new argument make_names_unique
(#561).$str$to_date()
, $str$to_time()
, $str$to_datetime()
as
alternatives to $str$strptime()
(#558).dim()
function for DataFrame and LazyFrame correctly returns integer instead of
double (#577).POSIXct
class to Polars datetime now works correctly with millisecond
precision (#589).<LazyFrame>$filter()
, <DataFrame>$filter()
, and pl$when()
now allow multiple conditions
to be separated by commas, like lf$filter(pl$col("foo") == 1, pl$col("bar") != 2)
(#598).$replace()
for expressions (#601).pl$DataFrame()$select("a",)
(#607).pl$threadpool_size()
to get the number of threads used by Polars (#620).
Thread pool size is also included in the output of pl$polars_info()
.$write_csv()
and $sink_csv()
: has_header
is renamed
include_header
and there's a new argument include_bom
.pl$cov()
gains a ddof
argument.$cumsum()
, $cumprod()
, $cummin()
, $cummax()
, $cumcount()
are
renamed $cum_sum()
, $cum_prod()
, $cum_min()
, $cum_max()
,
$cum_count()
.
$take()
and $take_every()
are renamed $gather()
and $gather_every()
.$shift()
and $shift_and_fill()
now accept Expr as input.reverse = TRUE
, $arg_sort()
now places null values in the first
positions.ambiguous
in $dt$truncate()
and $dt$round()
.$str$concat()
gains an argument ignore_nulls
.pl$min()
, pl$max()
,
and pl$sum()
is deprecated and will be removed in 0.12.0. Passing several
columns to these functions will now compute the min/max/sum in each column
separately. Use pl$min_horizontal()
, pl$max_horizontal()
, and
pl$sum_horizontal()
instead for rowwise computation (#508).$is_not()
is deprecated and will be removed in 0.12.0. Use $not()
instead
(#511, #531).$is_first()
is deprecated and will be removed in 0.12.0. Use $is_first_distinct()
instead (#531).pl$concat()
, the argument to_supertypes
is removed. Use the suffix
"_relaxed"
in the how
argument to cast columns to their shared supertypes
(#523).days()
, hours()
, minutes()
, seconds()
,
milliseconds()
, microseconds()
, nanoseconds()
) are renamed, for example
from $dt$days()
to $dt$total_days()
. The old usage is deprecated and will
be removed in 0.12.0 (#530).$as_data_frame()
is removed in favor of $to_data_frame()
(#533).$as_data_frame()
and $to_data_frame()
which were used to
convert GroupBy objects to R data frames are removed.
Use the $ungroup()
method and the as.data.frame()
function instead (#533).$write_json()
and $write_ndjson()
for DataFrame (#502).name
in pl$date_range()
, which was deprecated for a while
(#503)..pr$DataFrame$drop_all_in_place(df)
to drop DataFrame
in-place, to release memory without invoking gc(). However, if there are other
strong references to any of the underlying Series or arrow arrays, that memory
will specifically not be released. This method is aimed at r-polars extensions,
and will be kept stable as much as possible (#504).pl$min_horizontal()
, pl$max_horizontal()
, pl$sum_horizontal()
,
pl$all_horizontal()
, pl$any_horizontal()
(#508).as_polars_df()
and as_polars_lf()
to create polars
DataFrames and LazyFrames (#519).$ungroup()
for GroupBy
and LazyGroupBy
(#522).$rolling()
to apply an Expr over a rolling window based on
date/datetime/numeric indices (#470).$name$to_lowercase()
and $name$to_uppercase()
to transform
variable names (#529).$is_last_distinct()
(#531).$floor_div()
, $mod()
, $eq_missing()
and $neq_missing()
. The base R operators %/%
and %%
for Expressions are
now translated to $floor_div()
and $mod()
(#523).
Note that $mod()
in Polars differs from the R operator %%
and does not guarantee x == (x %% y) + y * (x %/% y)
.
See the upstream issue pola-rs/polars#10570.
[
) for polars objects now behave more like for base R objects (#543).quote_style
in $write_csv()
and $sink_csv()
can now take
the value "never"
(#483).pl$DataFrame()
now errors if the variables specified in schema
do not exist
in the data (#486).pl$SQLContext()$register()
without loading the package was fixed (#496).
"name"
that contains methods $prefix()
, $suffix(),
keep()
(renamed from keep_name()
) and map()
(renamed from map_alias()
).$dt$round()
gains an argument ambiguous
.Expr
as input: $top_k()
, $bottom_k()
,
$list$join()
, $str$strip_chars()
, $str$strip_chars_start()
,
$str$strip_chars_end()
, $str$split_exact().
$str$n_chars() -> $str$len_chars()
$str$lengths() -> $str$len_bytes()
$str$ljust() -> $str$pad_end()
$str$rjust() -> $str$pad_start()
$concat()
with how = "diagonal"
now accepts an argument to_supertypes
to automatically convert concatenated columns to the same type.pl$enable_string_cache()
doesn't take any argument anymore. The string cache
can now be disabled with pl$disable_string_cache()
.$scan_parquet()
gains an argument hive_partitioning
.$meta$tree_format()
has a better formatted output.$scan_csv()
and $read_csv()
now match more closely the Python-Polars API (#455):
sep is renamed separator, overwrite_dtypes is renamed dtypes, parse_dates is renamed try_parse_dates.
rechunk, eol_char, raise_if_empty, truncate_ragged_lines
path can now be a character vector giving several paths to CSV files.
This only works if all CSV files have the same schema.
RPolarsSQLContext
and its methods to perform SQL queries on DataFrame-like objects.
To use this feature, the Rust library must be built with full features
(#457).
$peak_min()
and $peak_max()
to find local minima and maxima in
an Expr (#462).$read_ndjson()
and $scan_ndjson()
(#471).$with_context()
for LazyFrame
to have access to columns from
other Data/LazyFrames during the computation (#475).use_earliest
is replaced by ambiguous
.$sample()
and $shuffle()
, the argument fixed_seed
is removed.$value_counts()
, the arguments multithreaded
and sort
(sometimes called sorted
) have been swapped and renamed sort
and parallel
.$str$count_match()
gains a literal
argument.$arg_min()
doesn't consider NA
as the minimum anymore (this was already the behavior of $min()
).$is_in()
with NA
on both sides now returns NA
and not TRUE
anymore.pattern
of $str$count_matches()
can now use expressions.nightly-2023-08-26
to build with full features.
pl$options
,
pl$set_options()
and pl$reset_options()
(#384).
Bump supported R version to 4.2 or later (#435).
pl$concat()
now also supports Series
, Expr
and LazyFrame
(#407).
New method $unnest()
for LazyFrame
(#397).
New method $sample()
for DataFrame
(#399).
New method $meta$tree_format()
to display an Expr
as a tree (#401).
New argument schema
in pl$DataFrame()
and pl$LazyFrame()
to override the
automatic type detection (#385).
Fix bug when calling R from polars via e.g. $map()
where query would not
complete in one edge case (#409).
New method $cat$get_categories()
to list unique values of categorical
variables (#412).
New methods $fold()
and $reduce()
to apply an R function rowwise (#403).
New function pl$raw_list
and class rpolars_raw_list
, a list of R raw vectors in which a missing value is
encoded as NULL
, to aid conversion to polars binary Series. Conversion back and forth
between polars binary literals or Series and R raw vectors is supported (#417).
New method $write_csv()
for DataFrame
(#414).
New method $sink_csv()
for LazyFrame
(#432).
New method $dt$time()
to extract the time from a datetime
variable (#428).
Method $profile()
gains optimization arguments and plot-related arguments (#429).
New method pl$read_parquet()
that is a shortcut for pl$scan_parquet()$collect()
(#434).
Rename $str$str_explode()
to $str$explode()
(#436).
New method $transpose()
for DataFrame
(#440).
New argument eager
of LazyFrame$set_optimization_toggle()
(#439).
{polars}
can now be installed with "R source package with Rust library binary",
by a mechanism copied from the prqlr package.
Sys.setenv(NOT_CRAN = "true")
install.packages("polars", repos = "https://rpolars.r-universe.dev")
The URL and SHA256 hash of the available binaries are recorded in tools/lib-sums.tsv
.
(#435, #448, #450, #451)
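As an illustration of the `pl$read_parquet()` entry above (#434), the following is a minimal sketch of the stated equivalence with `pl$scan_parquet()$collect()`. It assumes the polars R package is installed and that `$write_parquet()` is available on DataFrame (true in recent releases, not necessarily in the release this entry belongs to); the temporary file and the column names `x` and `y` are hypothetical and used only for the demonstration.

```r
# Sketch only: assumes a recent r-polars release in which
# DataFrame$write_parquet() exists.
library(polars)

# Hypothetical temporary file, used only for this illustration.
path <- tempfile(fileext = ".parquet")
pl$DataFrame(x = 1:3, y = c("a", "b", "c"))$write_parquet(path)

# Eager read: returns a DataFrame directly.
eager <- pl$read_parquet(path)

# Lazy scan followed by an explicit collect.
lazy <- pl$scan_parquet(path)$collect()

# Compare the two results after conversion to base R data frames.
same <- isTRUE(all.equal(as.data.frame(eager), as.data.frame(lazy)))
same
```

The eager form is convenient for small files; the lazy form leaves room to add filters or column selections before `$collect()`.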
to_titlecase()
(#371).strip = true
was not actually set for the
"release-optimized" compilation profile. Now it is, but the binary sizes seem unchanged (#377).
polars
performance (#188).<Expr>$arr$
& <Series>$arr$
is deprecated
in favor of "list". The subnamespace "arr" will be removed in polars 0.9.0 (#375).
rust-polars was updated to 0.32.0, which comes with many breaking changes and new features. Unrelated breaking changes and new features are put in separate sections (#334):
common_subplan_elimination = TRUE
in <LazyFrame>
methods $collect()
,
$sink_ipc()
and $sink_parquet()
is renamed and split into
comm_subplan_elim = TRUE
and comm_subexpr_elim = TRUE
.when-then-otherwise
classes are renamed to When
, Then
, ChainedWhen
and ChainedThen
. The syntactically illegal methods have been removed, e.g.
chaining $when()
twice.profile=release-optimized
,
which now includes strip=false
, lto=fat
& codegen-units=1
. This should
make the binary a bit smaller and faster. See also FULL_FEATURES=true
env
flag to enable simd with nightly rust. For development or faster compilation,
use instead profile=release
.fmt
arg is renamed format
in pl$Ptimes
and <Expr>$str$strptime
.<Expr>$approx_unique()
changed name to <Expr>$approx_n_unique()
.<Expr>$str$json_extract
arg pat
changed to dtype
and has a new argument
infer_schema_length = 100
.pl$date_range()
have changed: low
-> start
,
high
-> end
, lazy = TRUE
-> eager = FALSE
. Args time_zone
and time_unit
can no longer be used to implicitly cast time types. These two args can only
be used to annotate a naive time unit. Mixing time_zone
and time_unit
for
start
and end
is not allowed anymore.<Expr>$is_in()
operation no longer supported for dtype null
.(pl$lit(NA_real_) == pl$lit(NA_real_))$lit_to_s()
now renders to null, not true.
pl$lit(NA_real_)$is_in(pl$lit(NULL))$lit_to_s()
now renders to false (previously true).
pl$lit(numeric(0))$sum()$lit_to_s()
now yields 0f64
and not null
.<Expr>$all()
and <Expr>$any()
have a new arg drop_nulls = TRUE
.<Expr>$sample()
and <Expr>$shuffle()
have a new arg fix_seed
.<DataFrame>$sort()
and <LazyFrame>$sort()
have a new arg
maintain_order = FALSE
.$rpow()
is removed. It should never have been translated. Use ^
and $pow()
instead (#346).<LazyFrame>$collect_background()
renamed <LazyFrame>$collect_in_background()
and reworked. Likewise PolarsBackgroundHandle
reworked and renamed to
RThreadHandle
(#311).pl$scan_arrow_ipc
is now called pl$scan_ipc
(#343).pl$sink_ipc()
and pl$sink_parquet()
(#343).
$explode()
for DataFrame
and LazyFrame
(#314).$clone()
for LazyFrame
(#347).$fetch()
for LazyFrame
(#319).$optimization_toggle()
and $profile()
for LazyFrame
(#323).$with_column()
is now deprecated (following upstream polars
). It will be
removed in 0.9.0. It should be replaced with $with_columns()
(#313).concat_str()
to concatenate several columns
into one (#349).pl$cov()
, pl$rolling_cov(),
pl$corr()
, pl$rolling_corr()
(#351).pl$set_global_rpool_cap()
, pl$get_global_rpool_cap()
, class RThreadHandle
and
in_background = FALSE
param to <Expr>$map()
and $apply()
. It is now possible to run R code
with <LazyFrame>$collect_in_background()
and/or let polars parallelize R code in a pool of R processes.
See RThreadHandle-class
in the reference docs for more info (#311).
Cargo build --features "full_features"
which is not exactly the same
as Cargo build --all-features
. Some dev features are not included in "full_features" (#311).<LazyFrame>$optimization_toggle()
+ $profile()
and enable rust-polars feature
CSE: "Activate common subplan elimination optimization" (#323).
pl$select(newname = pl$lit(2))
are no longer experimental
and are allowed by default (#357).
pl$enable_string_cache()
, pl$with_string_cache()
and pl$using_string_cache()
for joining/comparing Categorical series/columns (#361).as_polars_series()
where users or developers of extensions
can define a custom way to convert their format to Polars format. This generic
must return a Polars series. See #368 for an example (#369).reverse
by descending
in all sorting functions. This
is for consistency with the upstream Polars (#291, #293).concat_lst
to concat_list
.$str$explode
to $str$str_explode
.tz_aware
and utc
arguments from str_parse
.$date_range
's lazy
argument is now TRUE
by default.scan_csv
and read_csv
for
consistency with the upstream Polars. scan_xxx
and read_xxx
functions are now accessed via pl
,
e.g. pl$scan_csv()
(#305).$rename()
for LazyFrame
and DataFrame
(#239)<DataFrame>$unique()
and <LazyFrame>$unique()
gain a maintain_order
argument (#238).pl$LazyFrame()
to quickly create a LazyFrame
, mostly in examples or
for demonstration purposes (#240).RPolarsErr
both on rust- and R-side. Final error messages should look very similar (#233).$columns()
, $schema()
, $dtypes()
for LazyFrame
implemented (#250).RPolarsErr
. Also RPolarsErr
will now print each context of the error on a separate line (#250).%
bug. Prepare for renaming of polars classes (#252).polars.github.io/reference_home
(#223, #264).simd
feature is now disabled by default. To enable it, set the environment variable
RPOLARS_ALL_FEATURES
to true
when building r-polars (#262).
opt-level
of argminmax
is now set to 1
in the release
profile to support Rust < 1.66.
The profile can be changed by setting the environment variable RPOLARS_PROFILE
(when set to release-optimized
,
opt-level
of argminmax
is set to 3
).pl$polars_info()
will tell which features are enabled (#271, #285, #305).
select()
now accepts lists of expressions. For example, <DataFrame>$select(l_expr)
works with l_expr = list(pl$col("a"))
(#265).[
, dim()
, dimnames()
, length()
, names()
(#301).
<DataFrame>$glimpse()
is a fast str()
-like view of a DataFrame
(#277).$over()
now accepts a vector of column names (#287).<DataFrame>$describe()
(#268).how = "cross"
in $join()
(#310).LICENSE.note
(#309).pl$set_polars_options(debug_polars = TRUE)
to profile/debug method-calls of a polars query (#193).
<DataFrame>$melt(), <DataFrame>$pivot() + <LazyFrame>$melt()
methods (#232)pl$implode
, pl$explode
, pl$unique
, pl$approx_unique
, pl$head
, pl$tail
(#196)pl$list
is deprecated, use pl$implode
instead. (#196)top_k
's reverse
option is removed. Use the new bottom_k
method instead.fmt
argument of some methods (e.g. parse_date
) has been changed to format
.DataFrame
objects can be subsetted using brackets like standard R data frames: pl$DataFrame(mtcars)[2:4, c("mpg", "hp")]
(#140 @vincentarelbundock)knit_print()
method has been added to DataFrame that outputs HTML tables
(similar to py-polars' HTML output) (#125 @eitsupi)Series
gains new methods: $mean
, $median
, $std
, $var
(#170 @vincentarelbundock)use_earliest
of replace_time_zone
. (#183)strict
of parse_int
. (#183)join_asof
. (#172)rpolars
to polars
. (#84)Release date: 2023-04-16. Full changelog: v0.4.6...v0.5.0