许多人体验到了云数据仓库内集中计算的个最速度和效率优势,但同时大家也认识到这种方法存在一些缺点。任热编缺点之一是据缺需要学习和执行不同语言(尤其是SQL)的查询,这增加了很多复杂性。失透视独
为了解决这个问题,个最本文分享一些最难以在SQL中学习和执行的任热编具体转换,并提供实际所需的据缺SQL代码。

1. Datespine
Datespine是失透视独生成日期索引的转换。
想象下,个最您正在分析每日销售数据,任热编表格如下:

因16日和17日没有销售,据缺对应的失透视独行缺失。如果想计算平均每日销售额或构建时间序列预测模型,个最那这种格式是任热编个问题,我们需要插入缺失日期的据缺行。
基本概念:
生成或选择唯一日期生成或选择唯一产品交叉连接(笛卡尔积)1&2的所有组合将第3步的结果与原始数据进行外连接                            复制WITH GLOBAL_SPINE AS (                        SELECT                        ROW_NUMBER() OVER (                        ORDER BY                        NULL                        ) as INTERVAL_ID,                        DATEADD(                        day,                        (INTERVAL_ID - 1),                        2020-01-01T00:00 :: timestamp_ntz                        ) as SPINE_START,                        DATEADD(                        day, INTERVAL_ID, 2020-01-01T00:00 :: timestamp_ntz                        ) as SPINE_END                        FROM                        TABLE (                        GENERATOR(ROWCOUNT => 1097)                        )                        ),                        GROUPS AS (                        SELECT                        product,                        MIN(sales_date) AS LOCAL_START,                        MAX(sales_date) AS LOCAL_END                        FROM                        My_First_Table                        GROUP BY                        product                        ),                        GROUP_SPINE AS (                        SELECT                        product,                        SPINE_START AS GROUP_START,                        SPINE_END AS GROUP_END                        FROM                        GROUPS G CROSS                        JOIN LATERAL (                        SELECT                        SPINE_START,                        SPINE_END                        FROM                        GLOBAL_SPINE S                        WHERE                        S.SPINE_START >= G.LOCAL_START                        )                        )                        SELECT                        G.product AS GROUP_BY_product,                        GROUP_START,                        GROUP_END,                        T.*                        FROM                        GROUP_SPINE G                        LEFT JOIN My_First_Table T ON sales_date >= G.GROUP_START                        AND sales_date < G.GROUP_END                        AND G.product = T.product;1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.54.55.56.                                                                
最终结果如下所示:

2. Pivot
Pivot是一种将行数据转换为列数据的操作,以便更好地进行分析和可视化。
有时,在进行分析时,免费源码下载您希望重新构造表格。例如,有个罗列学生、科目和成绩表格,我们想把具体科目分解为每个列。
之前:

复制SELECT Student, MATHEMATICS, GEOGRAPHY, PHYS_ED                        FROM ( SELECT Student, Grade, Subject FROM skool)                        PIVOT ( AVG ( Grade ) FOR Subject IN ( Mathematics, Geography, Phys Ed ) ) as p                        ( Student, MATHEMATICS, GEOGRAPHY, PHYS_ED );1.2.3.4.                                                                
结果:

3. One-Hot编码(或“虚拟”变量)
One-Hot编码是一种将分类变量转换为数字变量的方法。
这里是一个使用STATE作为列进行独热编码的示例。
之前:

复制SELECT *,                        CASE WHEN State = AL THEN 1 ELSE 0 END as STATE_AL,                        CASE WHEN State = AK THEN 1 ELSE 0 END as STATE_AK,                        CASE WHEN State = AZ THEN 1 ELSE 0 END as STATE_AZ,                        CASE WHEN State = AR THEN 1 ELSE 0 END as STATE_AR,                        CASE WHEN State = AS THEN 1 ELSE 0 END as STATE_AS,                        CASE WHEN State = CA THEN 1 ELSE 0 END as STATE_CA,                        CASE WHEN State = CO THEN 1 ELSE 0 END as STATE_CO,                        CASE WHEN State = CT THEN 1 ELSE 0 END as STATE_CT,                        CASE WHEN State = DC THEN 1 ELSE 0 END as STATE_DC,                        CASE WHEN State = FL THEN 1 ELSE 0 END as STATE_FL,                        CASE WHEN State = GA THEN 1 ELSE 0 END as STATE_GA,                        CASE WHEN State = HI THEN 1 ELSE 0 END as STATE_HI,                        CASE WHEN State = ID THEN 1 ELSE 0 END as STATE_ID,                        CASE WHEN State = IL THEN 1 ELSE 0 END as STATE_IL,                        CASE WHEN State = IN THEN 1 ELSE 0 END as STATE_IN,                        CASE WHEN State = IA THEN 1 ELSE 0 END as STATE_IA,                        CASE WHEN State = KS THEN 1 ELSE 0 END as STATE_KS,                        CASE WHEN State = KY THEN 1 ELSE 0 END as STATE_KY,                        CASE WHEN State = LA THEN 1 ELSE 0 END as STATE_LA,                        CASE WHEN State = ME THEN 1 ELSE 0 END as STATE_ME,                        CASE WHEN State = MD THEN 1 ELSE 0 END as STATE_MD,                        CASE WHEN State = MA THEN 1 ELSE 0 END as STATE_MA,                        CASE WHEN State = MI THEN 1 ELSE 0 END as STATE_MI,                        CASE WHEN State = MN THEN 1 ELSE 0 END as STATE_MN,                        CASE WHEN State = MS THEN 1 ELSE 0 END as STATE_MS,                        CASE WHEN State = MO THEN 1 ELSE 0 END as STATE_MO,                        CASE WHEN State = MT THEN 1 ELSE 0 END as STATE_MT,                        CASE WHEN State = NE THEN 1 ELSE 0 END as STATE_NE,                        CASE WHEN State = NV THEN 1 ELSE 0 END as STATE_NV,                        CASE WHEN State = NH THEN 1 ELSE 0 END as STATE_NH,                        CASE WHEN State = NJ THEN 1 ELSE 0 END as STATE_NJ,                        CASE WHEN State = NM THEN 1 ELSE 0 END as STATE_NM,                        CASE WHEN State = NY THEN 1 ELSE 0 END as STATE_NY,                        CASE WHEN State = NC THEN 1 ELSE 0 END as STATE_NC,                        CASE WHEN State = ND THEN 1 ELSE 0 END as STATE_ND,                        CASE WHEN State = OH THEN 1 ELSE 0 END as STATE_OH,                        CASE WHEN State = OK THEN 1 ELSE 0 END as STATE_OK,                        CASE WHEN State = OR THEN 1 ELSE 0 END as STATE_OR,                        CASE WHEN State = PA THEN 1 ELSE 0 END as STATE_PA,                        CASE WHEN State = RI THEN 1 ELSE 0 END as STATE_RI,                        CASE WHEN State = SC THEN 1 ELSE 0 END as STATE_SC,                        CASE WHEN State = SD THEN 1 ELSE 0 END as STATE_SD,                        CASE WHEN State = TN THEN 1 ELSE 0 END as STATE_TN,                        CASE WHEN State = TX THEN 1 ELSE 0 END as STATE_TX,                        CASE WHEN State = UT THEN 1 ELSE 0 END as STATE_UT,                        CASE WHEN State = VT THEN 1 ELSE 0 END as STATE_VT,                        CASE WHEN State = VA THEN 1 ELSE 0 END as STATE_VA,                        CASE WHEN State = WA THEN 1 ELSE 0 END as STATE_WA,                        CASE WHEN State = WV THEN 1 ELSE 0 END as STATE_WV,                        CASE WHEN State = WI THEN 1 ELSE 0 END as STATE_WI,                        CASE WHEN State = WY THEN 1 ELSE 0 END as STATE_WY                        FROM BABYTABLE;1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.                                                                
结果:

4. 市场篮子分析
市场篮子分析是购物篮分析和挖掘关联规则的一种方法。在这个过程中,首先需要对数据进行格式化,以便将每笔交易聚合到单个记录中。对于个人电脑来说,这一步骤可能很具挑战性,因为它涉及数据处理和转换。然而,数据仓库专为高效处理这些数据而设计,因此它更适合执行市场篮子分析所需的数据格式化任务。数据仓库提供了便捷的功能,使数据格式化更加容易,从而支持购物篮分析和关联规则的挖掘。
复制WITH order_detail as (                        SELECT                        SALESORDERNUMBER,                        listagg(ENGLISHPRODUCTNAME, , ) WITHIN group (                        order by                        ENGLISHPRODUCTNAME                        ) as ENGLISHPRODUCTNAME_listagg,                        COUNT(ENGLISHPRODUCTNAME) as num_products                        FROM                        transactions                        GROUP BY                        SALESORDERNUMBER                        )                        SELECT                        ENGLISHPRODUCTNAME_listagg,                        count(SALESORDERNUMBER) as NumTransactions                        FROM                        order_detail                        where                        num_products > 1                        GROUP BY                        ENGLISHPRODUCTNAME_listagg                        order by                        count(SALESORDERNUMBER) desc;1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.                                                                
结果:

5. 时间序列聚合
时间序列聚合是指将时间序列数据按照一定的时间间隔进行汇总和聚合,广泛用语分析数据。b2b信息网然而,要正确执行时间序列聚合,关键因素之一是需要对数据进行适当的格式化,尤其是在使用窗口函数时。这一步骤的正确处理对于获得准确且有意义的聚合结果至关重要。
例如,假设我们想计算过去14天的平均销售额。使用窗口函数的方法要求我们将所有销售数据转换为每天一行的格式。然而,实际上,大多数销售数据是以交易级别存储的,这让格式转换变得困难。这就是时间序列聚合发挥作用的地方。通过时间序列聚合,我们可以创建历史指标的聚合结果,无需重新格式化整个数据集。如果我们想一次添加多个指标,它也会派上用场:
过去14天内的平均销售额过去6个月中最大的购买统计过去 90 天内的
亿华云计算不同产品类型
如果您想使用窗口函数,则需要通过几个步骤独立构建每个指标。
处理此问题的更好方法是使用公共表表达式 (CTE) 来定义每个预先聚合的历史窗口。
例如:

复制WITH BASIC_OFFSET_14DAY AS (                        SELECT                        A.CustomerID,                        A.TransactionDate,                        AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST14DAY,                        MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST14DAY,                        COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST14DAY                        FROM                        My_First_Table A                        INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID                        AND 1 = 1                        WHERE                        B.TransactionDate >= DATEADD(day, -14, A.TransactionDate)                        AND B.TransactionDate <= A.TransactionDate                        GROUP BY                        A.CustomerID,                        A.TransactionDate                        ),                        BASIC_OFFSET_90DAY AS (                        SELECT                        A.CustomerID,                        A.TransactionDate,                        AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST90DAY,                        MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST90DAY,                        COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST90DAY                        FROM                        My_First_Table A                        INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID                        AND 1 = 1                        WHERE                        B.TransactionDate >= DATEADD(day, -90, A.TransactionDate)                        AND B.TransactionDate <= A.TransactionDate                        GROUP BY                        A.CustomerID,                        A.TransactionDate                        ),                        BASIC_OFFSET_180DAY AS (                        SELECT                        A.CustomerID,                        A.TransactionDate,                        AVG(B.PurchaseAmount) as AVG_PURCHASEAMOUNT_PAST180DAY,                        MAX(B.PurchaseAmount) as MAX_PURCHASEAMOUNT_PAST180DAY,                        COUNT(DISTINCT B.TransactionID) as COUNT_DISTINCT_TRANSACTIONID_PAST180DAY                        FROM                        My_First_Table A                        INNER JOIN My_First_Table B ON A.CustomerID = B.CustomerID                        AND 1 = 1                        WHERE                        B.TransactionDate >= DATEADD(day, -180, A.TransactionDate)                        AND B.TransactionDate <= A.TransactionDate                        GROUP BY                        A.CustomerID,                        A.TransactionDate                        )                        SELECT                        src.*,                        BASIC_OFFSET_14DAY.AVG_PURCHASEAMOUNT_PAST14DAY,                        BASIC_OFFSET_14DAY.MAX_PURCHASEAMOUNT_PAST14DAY,                        BASIC_OFFSET_14DAY.COUNT_DISTINCT_TRANSACTIONID_PAST14DAY,                        BASIC_OFFSET_90DAY.AVG_PURCHASEAMOUNT_PAST90DAY,                        BASIC_OFFSET_90DAY.MAX_PURCHASEAMOUNT_PAST90DAY,                        BASIC_OFFSET_90DAY.COUNT_DISTINCT_TRANSACTIONID_PAST90DAY,                        BASIC_OFFSET_180DAY.AVG_PURCHASEAMOUNT_PAST180DAY,                        BASIC_OFFSET_180DAY.MAX_PURCHASEAMOUNT_PAST180DAY,                        BASIC_OFFSET_180DAY.COUNT_DISTINCT_TRANSACTIONID_PAST180DAY                        FROM                        My_First_Table src                        LEFT OUTER JOIN BASIC_OFFSET_14DAY ON BASIC_OFFSET_14DAY.TransactionDate = src.TransactionDate                        AND BASIC_OFFSET_14DAY.CustomerID = src.CustomerID                        LEFT OUTER JOIN BASIC_OFFSET_90DAY ON BASIC_OFFSET_90DAY.TransactionDate = src.TransactionDate                        AND BASIC_OFFSET_90DAY.CustomerID = src.CustomerID                        LEFT OUTER JOIN BASIC_OFFSET_180DAY ON BASIC_OFFSET_180DAY.TransactionDate = src.TransactionDate                        AND BASIC_OFFSET_180DAY.CustomerID = src.CustomerID;1.2.3.4.5.6.7.8.9.10.11.12.13.14.15.16.17.18.19.20.21.22.23.24.25.26.27.28.29.30.31.32.33.34.35.36.37.38.39.40.41.42.43.44.45.46.47.48.49.50.51.52.53.54.55.56.57.58.59.60.61.62.63.64.65.66.67.68.69.70.71.72.73.                                                                
结果:

结语
希望本篇文章有助于阐明数据从业者在操作现代数据堆栈时会遇到的不同问题。当涉及查询云数据仓库时,SQL是一把双刃剑。虽然将计算集中在云数据仓库中可以提高速度,但有时需要一些额外的SQL技能。
版权声明
本文仅代表作者观点,不代表本站立场。
本文系作者授权发表,未经许可,不得转载。