I have a BigQuery table that looks like this and that can't be modified:
|Country|Customer|Number of connections|Number of purchases|Country Metric 1|Country Metric 2
|Brazil|A|10|1|3|1000
|Brazil|B|90|5|3|1000
|Brazil|C|80|2|3|1000
|Namibia|B|20|1|5|2000
|Namibia|C|150|2|5|2000
About this table, please note that:
- Each combination of Country-Customer is unique.
- The country metrics, as their names suggest, only depend on the country.
- For some countries, some metrics are not available (`NULL` in the table).
- For some combinations Country-Customer, the number of connections/purchases are not available
I would like to obtain,
in the same query, the following information:
- The mean of Country Metric 1 only taking into account combinations of Country-Customer having at least a number of purchases higher or equal than 2. In the example table, there are 3 combinations: Brazil-B, Brazil-C and Namibia-C. The mean should take into account Brazil only once so the result is `(3 + 5) / 2 = 4`.
- The mean of Country Metric 2 only taking into account combinations of Country-Customer having at least a number of connections higher than 100. There is only one combination which meets this criterion in the example table: Namibia-C. Thus, the expected result is 2000.
Those are just examples but there can be more metrics and other aggregations (sum, min, max, count...) but they should be very similar.
What I have tried:
SELECT AVG(IF(purchases > 2, country_metric_1, NULL)),
AVG(IF(connections > 100, country_metric_2, NULL))
FROM table
Issue: if the same country appears in multiple combinations, the same metric is taken into account multiple times.
SELECT AVG(IF(purchases > 2, country_metric_1_p, NULL)),
AVG(IF(connections > 100, country_metric_2_p, NULL))
FROM (SELECT purchases,
connections,
IF(ROW_NUMBER() OVER (PARTITION BY country) = 1, country_metric_1, NULL) country_metric_1_p
IF(ROW_NUMBER() OVER (PARTITION BY country) = 1, country_metric_2, NULL) country_metric_2_p
FROM table)
Issue: for each country, only one combination is taken into account giving lower and random results...