───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───

Executive Summary

  • This SQL implementation consolidates corporate data from the research and personnel divisions into a unified data warehouse that enables integrated risk analysis. The solution uses an ETL (extract, transform, load) process, a star schema for efficient querying, and analysis queries that provide insight into risk distribution and entity comparisons. The design prioritizes data integrity, scalability, and actionable business intelligence.

ETL Implementation

Extract

  • Extracted specimen data from the SPECIMENS table into the Research_Division_Source view, capturing SPECIMEN_ID, COMMON_NAME, and THREAT_LEVEL as Lab_Subject_Id, Subject_Name, and Danger_Rating, respectively.
  • Extracted employee data from the EMPLOYEES table into the Personnel_Division_Source view, capturing EMPLOYEE_ID, NAME, and SECURITY_CLEARANCE as Employee_Id, Employee_Name, and Security_Clearance, respectively; a sketch of both views follows.
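
A minimal sketch of the two extract views, assuming the source tables expose exactly the columns named above (their data types are not shown in this report):

    -- Research division extract: rename specimen columns to the source-view aliases.
    CREATE VIEW Research_Division_Source AS
    SELECT SPECIMEN_ID  AS Lab_Subject_Id,
           COMMON_NAME  AS Subject_Name,
           THREAT_LEVEL AS Danger_Rating
    FROM SPECIMENS;
    GO

    -- Personnel division extract: rename employee columns.
    CREATE VIEW Personnel_Division_Source AS
    SELECT EMPLOYEE_ID        AS Employee_Id,
           NAME               AS Employee_Name,
           SECURITY_CLEARANCE AS Security_Clearance
    FROM EMPLOYEES;
    GO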

Transform

  • Created the Clean_Corporate_Data view to unify specimen data with a standardized structure, mapping Lab_Subject_Id to Entity_ID, Subject_Name to Entity_Name, and setting Entity_Type to 'SPECIMEN'.
  • Derived Risk_Category using a CASE statement based on Danger_Rating thresholds (HIGH: ≥8, MEDIUM: 4-7, LOW: 1-3, UNKNOWN: <1).
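
A sketch of the transformation view under the mappings and thresholds above; carrying Danger_Rating through as Risk_Score is an assumption based on the RISK_SCORE column loaded in the next step:

    CREATE VIEW Clean_Corporate_Data AS
    SELECT Lab_Subject_Id AS Entity_ID,
           Subject_Name   AS Entity_Name,
           'SPECIMEN'     AS Entity_Type,
           Danger_Rating  AS Risk_Score,  -- assumed pass-through to RISK_SCORE
           CASE
               WHEN Danger_Rating >= 8            THEN 'HIGH'
               WHEN Danger_Rating BETWEEN 4 AND 7 THEN 'MEDIUM'
               WHEN Danger_Rating BETWEEN 1 AND 3 THEN 'LOW'
               ELSE 'UNKNOWN'  -- ratings below 1, or NULL
           END AS Risk_Category
    FROM Research_Division_Source;
    GO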

Load

  • Loaded transformed data from the Clean_Corporate_Data view into the Data_Warehouse table, which stores ENTITY_ID, ENTITY_NAME, ENTITY_TYPE, RISK_SCORE, and RISK_CATEGORY.
  • A SELECT COUNT(*) query verifies the number of loaded records, ensuring data integrity during the load phase.
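
A sketch of the load and verification steps, assuming Data_Warehouse already exists with the five columns listed above:

    -- Load the transformed rows into the warehouse table.
    INSERT INTO Data_Warehouse (ENTITY_ID, ENTITY_NAME, ENTITY_TYPE, RISK_SCORE, RISK_CATEGORY)
    SELECT Entity_ID, Entity_Name, Entity_Type, Risk_Score, Risk_Category
    FROM Clean_Corporate_Data;

    -- Verify how many records were loaded.
    SELECT COUNT(*) AS Loaded_Records
    FROM Data_Warehouse;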

Star Schema Design

  • The star schema simplifies queries by centralizing facts in Risk_Facts and linking to descriptive dimensions, reducing join complexity and improving performance for analytical queries. It supports scalable expansion for additional entity types and risk metrics.

Fact Table

  • Stores quantitative metrics for risk analysis, with columns Risk_Fact_Key (primary key), Entity_ID, Entity_Type_Key, Risk_Level_Key, Risk_Score, Entity_Count (default 1), and Load_Date (default GETDATE()).
  • Centralizes measurable data for efficient aggregation and analysis, linking to dimension tables for context. The CHECK constraint on Risk_Score (0-10) ensures valid data, and Entity_Count supports potential future aggregation.
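
A minimal sketch of the fact table; the data types and the IDENTITY surrogate key are assumptions, while the columns, defaults, and CHECK constraint follow the description above:

    CREATE TABLE Risk_Facts (
        Risk_Fact_Key   INT IDENTITY(1,1) PRIMARY KEY,  -- assumed surrogate key
        Entity_ID       INT          NOT NULL,
        Entity_Type_Key INT          NOT NULL,          -- links to Dim_Entity_Type
        Risk_Level_Key  INT          NOT NULL,          -- links to Dim_Risk_Level
        Risk_Score      DECIMAL(4,1) NOT NULL
            CHECK (Risk_Score BETWEEN 0 AND 10),        -- keeps scores in the valid 0-10 range
        Entity_Count    INT          NOT NULL DEFAULT 1,
        Load_Date       DATETIME     NOT NULL DEFAULT GETDATE()
    );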

Dimension Tables

Dim_Entity_Type

  • Entity_Type_Id (primary key), Entity_Type_Code, Entity_Type_Description, Entity_Category.
  • Categorizes entities as 'SPECIMEN' or 'EMPLOYEE', with descriptive metadata for reporting clarity.
  • Enables filtering and grouping by entity type, supporting cross-divisional analysis.
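
A sketch of this dimension table with illustrative rows; the data types, descriptions, and category values are assumptions:

    CREATE TABLE Dim_Entity_Type (
        Entity_Type_Id          INT IDENTITY(1,1) PRIMARY KEY,
        Entity_Type_Code        VARCHAR(20)  NOT NULL,
        Entity_Type_Description VARCHAR(100) NOT NULL,
        Entity_Category         VARCHAR(50)  NOT NULL
    );

    INSERT INTO Dim_Entity_Type (Entity_Type_Code, Entity_Type_Description, Entity_Category)
    VALUES ('SPECIMEN', 'Research division specimen',  'RESEARCH'),   -- assumed wording
           ('EMPLOYEE', 'Personnel division employee', 'PERSONNEL');  -- assumed wording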

Dim_Risk_Level

  • Risk_Level_Id (primary key), Risk_Level_Code, Risk_Level_Description, Min_Score, Max_Score, Alert_Required.
  • Defines risk levels (LOW, MEDIUM, HIGH, UNKNOWN) with score ranges and alert flags for prioritization.
  • Facilitates risk-based filtering and alerting, enhancing decision-making.
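
A sketch of the risk-level dimension; the score ranges follow the transform thresholds, while the descriptions and the UNKNOWN bounds are assumptions:

    CREATE TABLE Dim_Risk_Level (
        Risk_Level_Id          INT IDENTITY(1,1) PRIMARY KEY,
        Risk_Level_Code        VARCHAR(10)  NOT NULL,
        Risk_Level_Description VARCHAR(100) NOT NULL,
        Min_Score              DECIMAL(4,1) NOT NULL,
        Max_Score              DECIMAL(4,1) NOT NULL,
        Alert_Required         BIT          NOT NULL DEFAULT 0
    );

    INSERT INTO Dim_Risk_Level (Risk_Level_Code, Risk_Level_Description, Min_Score, Max_Score, Alert_Required)
    VALUES ('LOW',     'Minimal risk',             1, 3,  0),
           ('MEDIUM',  'Moderate risk',            4, 7,  1),  -- alerts for medium and high
           ('HIGH',    'Severe risk',              8, 10, 1),
           ('UNKNOWN', 'Unscored or below range',  0, 0,  0);  -- assumed bounds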

Business Intelligence Findings

Risk Summary Report

  • Reveals the distribution of entities across risk levels (LOW, MEDIUM, HIGH, UNKNOWN) with counts and percentages. This highlights the proportion of high-risk entities requiring immediate attention, aiding resource allocation.
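
A sketch of the kind of query behind this report; the window-function percentage is one reasonable way to compute each level's share:

    SELECT dr.Risk_Level_Code,
           COUNT(*) AS Entity_Count,
           CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5,1)) AS Pct_Of_Total
    FROM Risk_Facts rf
    JOIN Dim_Risk_Level dr ON rf.Risk_Level_Key = dr.Risk_Level_Id
    GROUP BY dr.Risk_Level_Code
    ORDER BY Entity_Count DESC;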

Entity Comparison Report

  • Compares average risk scores between entity types (e.g., SPECIMEN vs. EMPLOYEE), identifying which division poses greater risk. This informs targeted risk mitigation strategies.
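
A sketch of the comparison query, assuming the join keys shown in the star schema above:

    SELECT det.Entity_Type_Code,
           COUNT(*)           AS Entity_Count,
           AVG(rf.Risk_Score) AS Avg_Risk_Score
    FROM Risk_Facts rf
    JOIN Dim_Entity_Type det ON rf.Entity_Type_Key = det.Entity_Type_Id
    GROUP BY det.Entity_Type_Code
    ORDER BY Avg_Risk_Score DESC;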

High-Risk Specimen Analysis

  • Identifies specimens with Risk_Level_Code = 'HIGH', providing Entity_ID, Risk_Score, and Load_Date. This focuses operational efforts on critical specimens, potentially correlating high risk with containment or research priorities.
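
A sketch of the high-risk specimen query described above:

    SELECT rf.Entity_ID, rf.Risk_Score, rf.Load_Date
    FROM Risk_Facts rf
    JOIN Dim_Risk_Level  dr  ON rf.Risk_Level_Key  = dr.Risk_Level_Id
    JOIN Dim_Entity_Type det ON rf.Entity_Type_Key = det.Entity_Type_Id
    WHERE dr.Risk_Level_Code   = 'HIGH'
      AND det.Entity_Type_Code = 'SPECIMEN'
    ORDER BY rf.Risk_Score DESC;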

Operational Insights

  • The predominance of high-risk specimens (if observed) suggests a need for enhanced containment protocols or additional security measures.
  • Cross-divisional analysis enables unified risk management, identifying patterns (e.g., high-risk employees handling high-risk specimens) that could trigger security reviews.
  • The Alert_Required flag in Dim_Risk_Level supports automated notifications for high- and medium-risk entities, streamlining incident response.
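
A sketch of how the Alert_Required flag could drive notifications; the notification mechanism that consumes this result set is outside the scope of this report:

    -- Entities whose risk level requires an alert (medium and high).
    SELECT rf.Entity_ID, dr.Risk_Level_Code, rf.Risk_Score, rf.Load_Date
    FROM Risk_Facts rf
    JOIN Dim_Risk_Level dr ON rf.Risk_Level_Key = dr.Risk_Level_Id
    WHERE dr.Alert_Required = 1
    ORDER BY rf.Risk_Score DESC;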

Collaboration Reflection

  • I collaborated with Riker on this implementation; we each worked on different trigger components, built our own versions, and then combined them to arrive at the best solution. I also used Claude AI and ChatGPT to clean up my grammar and to help with documentation and code styling.

Lessons Learned

Data Integration

  • Mapping disparate data sources (specimens and employees) required consistent naming (Entity_ID, Entity_Name) and risk scoring, highlighting the importance of a unified schema.
  • Views like Research_Division_Source and Clean_Corporate_Data simplified data extraction and transformation, but careful validation (e.g., CHECK constraints) was critical to prevent data inconsistencies.
  • Designing the Data_Warehouse and star schema with extensible fields (e.g., ENTITY_TYPE) allows future inclusion of additional data sources without major refactoring.

Analysis

  • Performance: Joins in analysis queries (e.g., Risk_Facts with Dim_Risk_Level) performed well due to indexed keys, but complex aggregations required careful optimization to avoid performance bottlenecks (a sketch of the assumed indexes follows this list).
  • Actionable Insights: The star schema enabled straightforward reporting, with percentage calculations and risk filtering providing clear, actionable metrics for corporate decision-making.
  • Best Practices: Using views for intermediate transformations improved maintainability, while dimension tables with descriptive metadata enhanced report interpretability. Early application of filters in queries reduced execution time.
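
As an illustration of the performance point above, the kind of indexes assumed on the fact table's join keys (the index names are hypothetical):

    CREATE INDEX IX_Risk_Facts_Risk_Level  ON Risk_Facts (Risk_Level_Key);
    CREATE INDEX IX_Risk_Facts_Entity_Type ON Risk_Facts (Entity_Type_Key);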

───✱*.。:。✱*.:。✧*.。✰*.:。✧*.。:。*.。✱ ───