HomeMogDBMogDB StackUqbar

Documentation:v2.0

Supported Versions:

Other Versions:

percentile

Function Description

The percentile operator supports time-series/non-time-series table quantile calculations for returning the value at the Nth percentile after sorting the data.

On the basis of sorting percentile_cont provides the function of determining the nearest neighbor elements by quartile and interpolating the output according to the quartile ratio. percentile_disc does not interpolate and outputs the raw data directly.

For example:

CREATE TIMESERIES TABLE weather(time timestamp TSTIME, city text TSTAG,province text TSTAG, temperature float) ;

There are 5 time series data as shown below:

time 00:00:00 00:01:00 00:15:00 00:36:00 00:45:00
value 10.5 20.2 30.7 40.1 50.9

The available values correspond to quartiles of 0, 0.25, 0.5, 0.75 and 1, respectively.

Now calculate the value of each quartile, the results of the calculation are as follows:

When using interpolation, use the aggregation function name: percentile_cont

Percentile_cont 0 0.5 0.8 1
Value 10.5 30.7 42.26 50.9

When interpolation is not used, the aggregation function name is used: percentile_disc

Percentile_disc 0 0.5 0.8 1
Value 10.5 30.7 40.1 50.9

The differences between these two operators are as follows:

  • percentile_disc is a discrete percentile: returns the first input value that equals or exceeds the specified score in the sort position.

  • percentile_cont is a consecutive percentile: returns a value corresponding to the score specified in the sort, inserted between neighboring input items if desired.

In the above example, when the quantile is 0.8, the original value corresponding to it does not exist, and the use of percentile_cont requires interpolation of the result, which should be done in a manner consistent with PostgreSQL.

Interpolation algorithm:

(hi-lo)*((row_num-1)*percentile – floor((row_num-1)*percentile))

In the above example, hi=50.9, lo=40.1, row_num=5, percentile=0.8, so the interpolation result is 10.8*0.2+40.1=42.26.

When using percentile_disc it does not interpolate, but returns the first value after skipping the specified row of data:

skip_row_num = ceil(row_num * percentile) - 1

The sorted rows are skipped only if the value of this calculation is greater than 1, to satisfy the need to return the first value without doing a skip with percentile=0.

Both percentile_disc and percentile_cont support ARRAY as input to process multiple percentiles in a single pass and return array results. This function compared to multiple executions of the percentile operator, only need to sort once to get all the output, both user-friendly and reduce the resource consumption of repeated scanning and sorting.

Syntax

-- interpolation
percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY double precision )double precision
percentile_cont ( fraction double precision ) WITHIN GROUP ( ORDER BY interval )interval

-- Interpolated ARRAY
percentile_cont ( fractions double precision[] ) WITHIN GROUP ( ORDER BY double precision )double precision[]
percentile_cont ( fractions double precision[] ) WITHIN GROUP ( ORDER BY interval )interval[]

-- noninterpolation
percentile_disc ( fraction double precision ) WITHIN GROUP ( ORDER BY anyelement ) → anyelement

-- non-interpolated ARRAY
percentile_disc ( fractions double precision[] ) WITHIN GROUP ( ORDER BY anyelement ) → anyarray

Parameter Description

Parameter Name Type Mandatory or Not Description
fractions FLOAT8OID Yes Numerical representation of quartiles, e.g., 0.5 for the median
sort_expression FLOAT8OID or INTERVALOID Yes Columns that need to be sorted and quartiles obtained, only numeric values and time intervals are supported for column types that require interpolation scenarios.
sort_expression Anyelement Yes Columns that need to be sorted and get quartiles, for scenarios that don't require interpolation, column types supported are timestamp, timestamptz, date, time, timetz, numeric, float4, float8.

Constraints

  1. Percentile_cont that requires interpolation only supports sortable Float and Interval types. 2.

  2. the percentile_disc without interpolation supports sortable types such as: timestamp, timestamptz, date, time, timetz, numeric, float4, float8.

  3. Sequential aggregation functions need to specify WITHIN GROUP (ORDER BY sort_expression).

Examples

Uqbar=# SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY value) AS median
FROM float_data;
 median
--------
   30.7
(1 row)

Uqbar=# SELECT percentile_cont(0.8) WITHIN GROUP (ORDER BY value)
FROM float_data;
 percentile_cont
-----------------
           42.26
(1 row)

Uqbar=# SELECT percentile_disc(1.5) WITHIN GROUP (ORDER BY value) FROM float_data;
ERROR:  percentile value 1.5 is not between 0 and 1

Uqbar=# SELECT percentile_disc(-1) WITHIN GROUP (ORDER BY value) FROM float_data;
ERROR:  percentile value -1 is not between 0 and 1

Uqbar=# SELECT percentile_cont(ARRAY[0,0.5,0.8,1]) WITHIN GROUP (ORDER BY duration) AS median FROM interval_data;
                         median
--------------------------------------------------------
 {"1 day 06:00:00","2 days","2 days 09:36:00","3 days"}
(1 row)

Uqbar=# SELECT percentile_disc(ARRAY[0,0.5,0.8,1]) WITHIN GROUP (ORDER BY duration) AS median FROM interval_data;
                         median
--------------------------------------------------------
 {"1 day 06:00:00","2 days","2 days 06:00:00","3 days"}
(1 row)
Copyright © 2011-2024 www.enmotech.com All rights reserved.