DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

programing

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

subpage 2023. 4. 29. 09:18

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

저는 다음과 같은 고유한 NumUsers의 총계를 얻기 위해 다음과 같이 작성하려고 합니다.

NumUsers = COUNT(DISTINCT [UserAccountKey]) OVER (PARTITION BY [Mth])

관리 스튜디오는 이것에 대해 별로 만족하지 않는 것 같습니다.제거하면 오류가 사라집니다.DISTINCT키워드입니다, 하지만 그것은 명확한 숫자가 아닐 것입니다.

DISTINCT파티션 함수 내에서는 가능하지 않은 것 같습니다.어떻게 구별되는 카운트를 찾을 수 있습니까?상관된 하위 쿼리와 같은 더 전통적인 방법을 사용합니까?

좀 더 자세히 살펴보면, 아마도 이것들은OVER기능은 오라클과 다르게 작동하며, 사용할 수 없는 방식으로SQL-Server실행 총계를 계산합니다.

여기 SQLfiddle에서 파티션 함수를 사용하여 실행 총계를 계산하는 예제를 추가했습니다.

다음을 사용하는 매우 간단한 솔루션이 있습니다.

dense_rank() over (partition by [Mth] order by [UserAccountKey]) 
+ dense_rank() over (partition by [Mth] order by [UserAccountKey] desc) 
- 1

이를 통해 귀하가 요청한 내용을 정확하게 확인할 수 있습니다.매월 고유한 사용자 계정 키 수입니다.

네크로맨싱:

DENSE_RANK를 통해 파티션 BY를 통해 카운트를 MAX로 에뮬레이트하는 것은 비교적 간단합니다.

;WITH baseTable AS
(
    SELECT 'RM1' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM1' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR2' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR3' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM2' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR1' AS ADR
    UNION ALL SELECT 'RM3' AS RM, 'ADR2' AS ADR
)
,CTE AS
(
    SELECT RM, ADR, DENSE_RANK() OVER(PARTITION BY RM ORDER BY ADR) AS dr 
    FROM baseTable
)
SELECT
     RM
    ,ADR

    ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY ADR) AS cnt1 
    ,COUNT(CTE.ADR) OVER (PARTITION BY CTE.RM) AS cnt2 
    -- Not supported
    --,COUNT(DISTINCT CTE.ADR) OVER (PARTITION BY CTE.RM ORDER BY CTE.ADR) AS cntDist
    ,MAX(CTE.dr) OVER (PARTITION BY CTE.RM ORDER BY CTE.RM) AS cntDistEmu 
FROM CTE

참고:
이는 해당 필드가 null이 아닌 필드라고 가정합니다.
필드에 NULL 항목이 하나 이상 있으면 1을 빼야 합니다.

위의 David와 유사한 솔루션을 사용하지만, 일부 행을 카운트에서 제외해야 할 경우 추가 트위스트를 사용합니다.이 경우 [UserAccountKey]가 null이 아닌 것으로 가정합니다.

-- subtract an extra 1 if null was ranked within the partition,
-- which only happens if there were rows where [Include] <> 'Y'
dense_rank() over (
  partition by [Mth] 
  order by case when [Include] = 'Y' then [UserAccountKey] else null end asc
) 
+ dense_rank() over (
  partition by [Mth] 
  order by case when [Include] = 'Y' then [UserAccountKey] else null end desc
)
- max(case when [Include] = 'Y' then 0 else 1 end) over (partition by [Mth])
- 1

확장 예제가 있는 SQL Fiddle은 여기에서 찾을 수 있습니다.

SQL-Server 2008 R2에서 이 작업을 수행하는 유일한 방법은 상관된 하위 쿼리를 사용하거나 외부 응용 프로그램을 사용하는 것입니다.

SELECT  datekey,
        COALESCE(RunningTotal, 0) AS RunningTotal,
        COALESCE(RunningCount, 0) AS RunningCount,
        COALESCE(RunningDistinctCount, 0) AS RunningDistinctCount
FROM    document
        OUTER APPLY
        (   SELECT  SUM(Amount) AS RunningTotal,
                    COUNT(1) AS RunningCount,
                    COUNT(DISTINCT d2.dateKey) AS RunningDistinctCount
            FROM    Document d2
            WHERE   d2.DateKey <= document.DateKey
        ) rt;

SQL-Server 2012에서는 제안한 구문을 사용하여 이 작업을 수행할 수 있습니다.

SELECT  datekey,
        SUM(Amount) OVER(ORDER BY DateKey) AS RunningTotal
FROM    document

단, 의 사용DISTINCT여전히 허용되지 않기 때문에 DISTINCT가 필요하거나 업그레이드가 옵션이 아닌 경우OUTER APPLY당신의 최선의 선택입니다.

간단한 SQL에는 다음과 같은 솔루션이 있습니다.

SELECT time, COUNT(DISTINCT user) OVER(ORDER BY time) AS users
FROM users

SELECT time, COUNT(*) OVER(ORDER BY time) AS users
FROM (
    SELECT user, MIN(time) AS time
    FROM users
    GROUP BY user
) t

저는 근본적으로 왜 q와 같은 질문을 가지고 이곳을 헤매고 David의 솔루션을 찾았지만, 너무 드물게 사용하기 때문에 DENSE_RANK에 대한 이전 자습서를 검토해야 했습니다. RANK나 ROW_NUMBER 대신 DENSE_RANK가 왜 작동하는지, 그리고 실제로 어떻게 작동하는지요?그 과정에서 저는 이 특정 문제에 대한 David의 솔루션 버전을 포함하도록 튜토리얼을 업데이트했고, SQL 초보자(또는 저와 같은 다른 사람들이 무언가를 잊어버리는 경우)에게 도움이 될 것이라고 생각했습니다.

전체 자습서 텍스트를 쿼리 편집기에 복사/붙여넣은 다음 각 예제 쿼리를 (별도로) 주석을 제거하고 실행하여 각 결과를 확인할 수 있습니다.(기본적으로 이 문제에 대한 솔루션은 맨 아래에 설명되어 있지 않습니다.)또는 각 예제를 자체 쿼리 편집 인스턴스로 개별적으로 복사할 수 있지만 TBLx CTE는 각각 포함되어야 합니다.

--WITH /* DB2 version */
--TBLx (Col_A, Col_B) AS (VALUES 
--     (  7,     7  ),
--     (  7,     7  ),
--     (  7,     7  ),
--     (  7,     8  ))

WITH /* SQL-Server version */
TBLx    (Col_A, Col_B) AS
  (SELECT  7,     7    UNION ALL
   SELECT  7,     7    UNION ALL
   SELECT  7,     7    UNION ALL
   SELECT  7,     8)

/*** Example-A: demonstrates the difference between ROW_NUMBER, RANK and DENSE_RANK ***/

  --SELECT Col_A, Col_B,
  --  ROW_NUMBER() OVER(PARTITION BY Col_A ORDER BY Col_B) AS ROW_NUMBER_,
  --  RANK() OVER(PARTITION BY Col_A ORDER BY Col_B)       AS RANK_,
  --  DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DENSE_RANK_
  --FROM TBLx

  /* RESULTS:
    Col_A  Col_B  ROW_NUMBER_  RANK_  DENSE_RANK_
      7      7        1          1        1
      7      7        2          1        1
      7      7        3          1        1
      7      8        4          4        2

     ROW_NUMBER: Just increments for the three identical rows and increments again for the final unique row.
                 That is, it’s an order-value (based on "sort" order) but makes no other distinction.
                 
           RANK: Assigns the same rank value to the three identical rows, then jumps to 4 for the fourth row,
                 which is *unique* with regard to the others.
                 That is, each identical row is ranked by the rank-order of the first row-instance of that
                 (identical) value-set.
                 
     DENSE_RANK: Also assigns the same rank value to the three identical rows but the fourth *unique* row is
                 assigned a value of 2.
                 That is, DENSE_RANK identifies that there are (only) two *unique* row-types in the row set.
  */

/*** Example-B: to get only the distinct resulting "count-of-each-row-type" rows ***/

--  SELECT DISTINCT -- For unique returned "count-of-each-row-type" rows, the DISTINCT operator is necessary because
--                  -- the calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT,
--                  -- its value for each original-data row-type would just be replicated for each of those rows.
--                  
--    Col_A, Col_B,                
--    DENSE_RANK() OVER(PARTITION BY Col_A ORDER BY Col_B) AS DISTINCT_ROWTYPE_COUNT_
--  FROM TBLx

  /* RESULTS:
    Col_A  Col_B  DISTINCT_ROWTYPE_COUNT_
      7      7            1
      7      8            2
  */

/*** Example-C.1: demonstrates the derivation of the "count-of-all-row-types" (finalized in Example-C.2, below) ***/

--  SELECT
--    Col_A, Col_B,
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC) AS ROW_TYPES_COUNT_DESC_,
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC) AS ROW_TYPES_COUNT_ASC_,
--    
--    -- Adding the above cases together and subtracting one gives the same total count for on each resulting row:
--    
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
--       +
--    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
--      - 1   /* (Because DENSE_RANK values are one-based) */
--      AS ROW_TYPES_COUNT_
--  FROM TBLx

  /* RESULTS:
    COL_A  COL_B  ROW_TYPES_COUNT_DESC_  ROW_TYPES_COUNT_ASC_  ROW_TYPES_COUNT_
      7      7            2                     1                    2
      7      7            2                     1                    2
      7      7            2                     1                    2
      7      8            1                     2                    2
      
  */

/*** Example-C.2: uses the above technique to get a *single* resulting "count-of-all-row-types" row ***/

  SELECT DISTINCT -- For a single returned "count-of-all-row-types" row, the DISTINCT operator is necessary because the
                  -- calculated DENSE_RANK value is appended to *all* rows in the data set.  Without DISTINCT, that
                  -- value would just be replicated for each original-data row.
                  
--    Col_A, Col_B, -- In order to get a *single* returned "count-of-all-row-types" row (and field), all other fields
                    -- must be excluded because their respective differing row-values will defeat the purpose of the
                    -- DISTINCT operator, above.
                   
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B DESC)
       +
    DENSE_RANK() OVER ( PARTITION BY Col_A ORDER BY Col_B ASC)
      - 1   /* (Because DENSE_RANK values are one-based) */
      AS ROW_TYPES_COUNT_
  FROM TBLx
  
  /* RESULTS:

    ROW_TYPES_COUNT_
          2
  */

언급URL : https://stackoverflow.com/questions/11202878/partition-function-count-over-possible-using-distinct

'programing' 카테고리의 다른 글

Postgre에서 PL/pgSQL 출력 저장CSV 파일에 대한 SQL (0)	2023.04.29
하나의 CTE를 여러 번 사용 (0)	2023.04.29
WPF 텍스트 상자에서 포커스에 있는 모든 텍스트를 자동으로 선택하는 방법은 무엇입니까? (0)	2023.04.29
Git의 마스터 브랜치를 다른 브랜치에서 완전히 대체하려면 어떻게 해야 하나요? (0)	2023.04.24
Swift - Int를 열거형으로 캐스트합니다.내부 (0)	2023.04.24

현재글DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

각종 프로그래밍 정보를 다루는 블로그입니다.

Python, Wordpress, JavaScript, AngularJS, MongoDB, C, AJAX, asp.net, CSS, sql-server, MariaDB, powershell, MySQL, oracle, REACTJS, git, jQuery, Excel, json, spring-boot,

Today :
Yesterday :

일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28

subpage

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

'programing' 카테고리의 다른 글

'programing'의 다른글

티스토리툴바

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

DISTINCT를 사용하여 파티션 함수 카운트() OVER 가능

'programing' 카테고리의 다른 글

'programing'의 다른글

관련글

티스토리툴바