SQLAI.ai

I need to join 2 tables to create a dataset for a dashboard. The 2 tables are designed as follows: Table 1 records sales, so every datetime entry is a unique sale for a certain productID, with misc things like price etc Table 2 contains updates to the pricing algorithm, this contains some logic statements and benchmarks that derived the price. The price holds for a productID until it is updated. For example: ProductID 123 gets a price update in Table 2 at 09:00, 12:12 and 15:39 Table 1 records sales at 09:39, 12:00 and 16:00 What I need is the record of the sale from Table 1 with the at that time information from Table2, So: 09:39 -- Pricing info from table 2 at the 09:00 update 12:00 -- Pricing info from table 2 at the 09:00 update 16:00 -- Pricing info from table 2 at the 15:39 update Both tables contain data dating back multiple years, and ideally I want the new table dating back to the most recent origin of the 2 tables. What would the join conditions of this look like?

To achieve the desired result, you will need to perform a non-equi JOIN between the two tables. This type of join is not directly supported in standard SQL, but you can simulate it using a subquery or a LEFT JOIN combined with a WHERE clause to filter the results.

Here's a step-by-step approach to writing the SQL statement for BigQuery:

  1. Identify the latest pricing update before each sale: For each sale in Table 1, find the most recent pricing update from Table 2 that occurred before the sale time.

  2. Join the tables: Perform a LEFT JOIN between Table 1 and the subquery result from step 1 based on the ProductID and the condition that the sale time is greater than or equal to the pricing update time.

  3. Select the required columns: Choose the columns you need for your dashboard dataset from both tables.

Assuming the following schema for your tables:

  • Table 1 (Sales): SaleID, ProductID, SaleDateTime, Price, etc.
  • Table 2 (PricingUpdates): UpdateID, ProductID, UpdateDateTime, PricingLogic, etc.

Here is the SQL statement that would accomplish this:

            
WITH LatestPricingUpdates AS ( SELECT p1.ProductID, p1.UpdateDateTime, p1.PricingLogic, -- Include any other columns from Table 2 that you need MAX(p2.UpdateDateTime) OVER ( PARTITION BY p1.ProductID ORDER BY p2.UpdateDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) AS PrevUpdateDateTime FROM `project.dataset.Table2` AS p1 LEFT JOIN `project.dataset.Table2` AS p2 ON p1.ProductID = p2.ProductID AND p1.UpdateDateTime > p2.UpdateDateTime ) SELECT s.SaleID, s.ProductID, s.SaleDateTime, s.Price AS SalePrice, p.PricingLogic AS PricingInfo -- Include any other columns from Table 1 or the LatestPricingUpdates CTE that you need FROM `project.dataset.Table1` AS s LEFT JOIN LatestPricingUpdates AS p ON s.ProductID = p.ProductID AND s.SaleDateTime >= p.UpdateDateTime AND ( s.SaleDateTime < p.PrevUpdateDateTime OR p.PrevUpdateDateTime IS NULL ) WHERE p.UpdateDateTime IS NOT NULL ORDER BY s.SaleDateTime

In this query:

  • The LatestPricingUpdates Common Table Expression (CTE) is used to find the latest pricing update before each sale for a given ProductID.
  • The MAX function combined with the OVER clause is used to get the previous update's datetime for each pricing update.
  • The main SELECT statement then joins the sales table with this CTE to get the corresponding pricing information for each sale.
  • The WHERE clause ensures that we only include sales that have a corresponding pricing update.

Please note that you may need to adjust the column names and the project and dataset names to match your actual BigQuery setup.

  • Public
  • ·
  • Fri, 24 Nov 2023 13:02:47 GMT