Executive Summary
- This SQL implementation consolidates corporate data from the research and personnel divisions into a unified data warehouse that enables integrated risk analysis. The solution uses an ETL (extract, transform, load) process, a star schema for efficient querying, and analysis queries that provide insight into risk distribution and entity comparisons. The design prioritizes data integrity, scalability, and actionable business intelligence.
ETL Implementation
Extract
- Extracted specimen data from the SPECIMENS table into the Research_Division_Source view, capturing SPECIMEN_ID, COMMON_NAME, and THREAT_LEVEL as Lab_Subject_Id, Subject_Name, and Danger_Rating, respectively.
- Extracted employee data from the EMPLOYEES table into the Personnel_Division_Source view, including EMPLOYEE_ID, NAME, and SECURITY_CLEARANCE as Employee_Id, Employee_Name, and Security_Clearance. Both extract views are sketched below.
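The two extract views could look like the following; column data types and any additional source columns are omitted, and the exact definitions in the implementation may differ.

```sql
-- Research division extract: rename specimen columns into the staging vocabulary.
CREATE VIEW Research_Division_Source AS
SELECT
    SPECIMEN_ID  AS Lab_Subject_Id,
    COMMON_NAME  AS Subject_Name,
    THREAT_LEVEL AS Danger_Rating
FROM SPECIMENS;
GO

-- Personnel division extract: rename employee columns the same way.
CREATE VIEW Personnel_Division_Source AS
SELECT
    EMPLOYEE_ID        AS Employee_Id,
    NAME               AS Employee_Name,
    SECURITY_CLEARANCE AS Security_Clearance
FROM EMPLOYEES;
GO
```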
Transform
- Created the Clean_Corporate_Data view to unify specimen data with a standardized structure, mapping Lab_Subject_Id to Entity_ID, Subject_Name to Entity_Name, and setting Entity_Type to 'SPECIMEN'.
- Derived Risk_Category using a CASE statement based on Danger_Rating thresholds (HIGH: ≥8, MEDIUM: 4-7, LOW: 1-3, UNKNOWN: <1), as sketched below.
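A minimal sketch of the transformation view, assuming it draws from the research extract and carries Danger_Rating forward as the numeric Risk_Score (the actual view may also fold in personnel data):

```sql
-- Standardize specimen data into the common entity structure and band the risk score.
CREATE VIEW Clean_Corporate_Data AS
SELECT
    Lab_Subject_Id AS Entity_ID,
    Subject_Name   AS Entity_Name,
    'SPECIMEN'     AS Entity_Type,
    Danger_Rating  AS Risk_Score,
    CASE
        WHEN Danger_Rating >= 8 THEN 'HIGH'
        WHEN Danger_Rating >= 4 THEN 'MEDIUM'
        WHEN Danger_Rating >= 1 THEN 'LOW'
        ELSE 'UNKNOWN'
    END AS Risk_Category
FROM Research_Division_Source;
```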
Load
- Loaded transformed data from the Clean_Corporate_Data view into the Data_Warehouse table, which stores ENTITY_ID, ENTITY_NAME, ENTITY_TYPE, RISK_SCORE, and RISK_CATEGORY.
- A SELECT COUNT(*) query verifies the number of loaded records, ensuring data integrity during the load phase. Both steps are sketched below.
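The load and verification steps might look like this, assuming the warehouse columns map one-to-one from the cleaned view:

```sql
-- Load the transformed rows into the warehouse table.
INSERT INTO Data_Warehouse (ENTITY_ID, ENTITY_NAME, ENTITY_TYPE, RISK_SCORE, RISK_CATEGORY)
SELECT Entity_ID, Entity_Name, Entity_Type, Risk_Score, Risk_Category
FROM Clean_Corporate_Data;

-- Verify how many records landed in the warehouse.
SELECT COUNT(*) AS Loaded_Records FROM Data_Warehouse;
```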
Star Schema Design
- The star schema simplifies queries by centralizing facts in Risk_Facts and linking to descriptive dimensions, reducing join complexity and improving performance for analytical queries. It supports scalable expansion for additional entity types and risk metrics.
Fact Table
- Stores quantitative metrics for risk analysis, with columns Risk_Fact_Key (primary key), Entity_ID, Entity_Type_Key, Risk_Level_Key, Risk_Score, Entity_Count (default 1), and Load_Date (default GETDATE()).
- Centralizes measurable data for efficient aggregation and analysis, linking to dimension tables for context. The CHECK constraint on Risk_Score (0-10) ensures valid data, and Entity_Count supports potential future aggregation. A sample definition appears below.
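A possible definition of the fact table; the data types, IDENTITY surrogate key, and foreign-key clauses are assumptions, and the dimension tables described next must exist before the foreign keys can resolve:

```sql
CREATE TABLE Risk_Facts (
    Risk_Fact_Key   INT IDENTITY(1,1) PRIMARY KEY,    -- surrogate key
    Entity_ID       INT          NOT NULL,
    Entity_Type_Key INT          NOT NULL,            -- links to Dim_Entity_Type
    Risk_Level_Key  INT          NOT NULL,            -- links to Dim_Risk_Level
    Risk_Score      DECIMAL(4,1) NOT NULL
        CHECK (Risk_Score BETWEEN 0 AND 10),          -- valid score range per the design
    Entity_Count    INT          NOT NULL DEFAULT 1,
    Load_Date       DATETIME     NOT NULL DEFAULT GETDATE(),
    FOREIGN KEY (Entity_Type_Key) REFERENCES Dim_Entity_Type (Entity_Type_Id),
    FOREIGN KEY (Risk_Level_Key)  REFERENCES Dim_Risk_Level (Risk_Level_Id)
);
```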
Dimension Tables
Dim_Entity_Type
- Entity_Type_Id (primary key), Entity_Type_Code, Entity_Type_Description, Entity_Category.
- Categorizes entities as 'SPECIMEN' or 'EMPLOYEE', with descriptive metadata for reporting clarity.
- Enables filtering and grouping by entity type, supporting cross-divisional analysis.
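A sketch of this dimension, with assumed data types and lengths:

```sql
CREATE TABLE Dim_Entity_Type (
    Entity_Type_Id          INT IDENTITY(1,1) PRIMARY KEY,
    Entity_Type_Code        VARCHAR(20)  NOT NULL,    -- 'SPECIMEN' or 'EMPLOYEE'
    Entity_Type_Description VARCHAR(100) NULL,
    Entity_Category         VARCHAR(50)  NULL
);
```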
Dim_Risk_Level
- Risk_Level_Id (primary key), Risk_Level_Code, Risk_Level_Description, Min_Score, Max_Score, Alert_Required.
- Defines risk levels (LOW, MEDIUM, HIGH, UNKNOWN) with score ranges and alert flags for prioritization.
- Facilitates risk-based filtering and alerting, enhancing decision-making.
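A sketch of this dimension, including seed rows that mirror the thresholds used in the transform step; the descriptions and exact band boundaries are assumptions:

```sql
CREATE TABLE Dim_Risk_Level (
    Risk_Level_Id          INT IDENTITY(1,1) PRIMARY KEY,
    Risk_Level_Code        VARCHAR(10)  NOT NULL,     -- LOW / MEDIUM / HIGH / UNKNOWN
    Risk_Level_Description VARCHAR(100) NULL,
    Min_Score              DECIMAL(4,1) NULL,
    Max_Score              DECIMAL(4,1) NULL,
    Alert_Required         BIT          NOT NULL DEFAULT 0
);

-- Seed rows (illustrative values consistent with the CASE thresholds above).
INSERT INTO Dim_Risk_Level (Risk_Level_Code, Risk_Level_Description, Min_Score, Max_Score, Alert_Required)
VALUES ('HIGH',    'Immediate attention required', 8, 10,  1),
       ('MEDIUM',  'Monitor closely',              4, 7.9, 1),
       ('LOW',     'Routine handling',             1, 3.9, 0),
       ('UNKNOWN', 'Unscored or missing data',     0, 0.9, 0);
```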
Business Intelligence Findings
Risk Summary Report
- Reveals the distribution of entities across risk levels (LOW, MEDIUM, HIGH, UNKNOWN) with counts and percentages. This highlights the proportion of high-risk entities requiring immediate attention, aiding resource allocation.
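One way to express this report, assuming each warehouse entity has exactly one row in Risk_Facts:

```sql
-- Entity counts and share of total by risk level.
SELECT
    rl.Risk_Level_Code,
    COUNT(*) AS Entity_Count,
    CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5,1)) AS Pct_Of_Total
FROM Risk_Facts rf
JOIN Dim_Risk_Level rl ON rl.Risk_Level_Id = rf.Risk_Level_Key
GROUP BY rl.Risk_Level_Code
ORDER BY Entity_Count DESC;
```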
Entity Comparison Report
- Compares average risk scores between entity types (e.g., SPECIMEN vs. EMPLOYEE), identifying which division poses greater risk. This informs targeted risk mitigation strategies.
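A sketch of the comparison query:

```sql
-- Average risk score per entity type (SPECIMEN vs. EMPLOYEE).
SELECT
    et.Entity_Type_Code,
    COUNT(*)           AS Entity_Count,
    AVG(rf.Risk_Score) AS Avg_Risk_Score
FROM Risk_Facts rf
JOIN Dim_Entity_Type et ON et.Entity_Type_Id = rf.Entity_Type_Key
GROUP BY et.Entity_Type_Code
ORDER BY Avg_Risk_Score DESC;
```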
High-Risk Specimen Analysis
- Identifies specimens with Risk_Level_Code = 'HIGH', providing Entity_ID, Risk_Score, and Load_Date. This focuses operational efforts on critical specimens, potentially correlating high risk with containment or research priorities.
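A sketch of the high-risk specimen query:

```sql
-- Specimens in the HIGH band, highest scores and most recent loads first.
SELECT rf.Entity_ID, rf.Risk_Score, rf.Load_Date
FROM Risk_Facts rf
JOIN Dim_Risk_Level  rl ON rl.Risk_Level_Id  = rf.Risk_Level_Key
JOIN Dim_Entity_Type et ON et.Entity_Type_Id = rf.Entity_Type_Key
WHERE rl.Risk_Level_Code = 'HIGH'
  AND et.Entity_Type_Code = 'SPECIMEN'
ORDER BY rf.Risk_Score DESC, rf.Load_Date DESC;
```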
Operational Insights
- The predominance of high-risk specimens (if observed) suggests a need for enhanced containment protocols or additional security measures.
- Cross-divisional analysis enables unified risk management, identifying patterns (e.g., high-risk employees handling high-risk specimens) that could trigger security reviews.
- The Alert_Required flag in Dim_Risk_Level supports automated notifications for high- and medium-risk entities, streamlining incident response.
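For example, a notification job could select the entities whose risk level carries the alert flag; this query is illustrative rather than part of the delivered implementation:

```sql
-- Entities whose risk level is flagged for alerting (HIGH and MEDIUM).
SELECT rf.Entity_ID, et.Entity_Type_Code, rl.Risk_Level_Code, rf.Risk_Score
FROM Risk_Facts rf
JOIN Dim_Risk_Level  rl ON rl.Risk_Level_Id  = rf.Risk_Level_Key
JOIN Dim_Entity_Type et ON et.Entity_Type_Id = rf.Entity_Type_Key
WHERE rl.Alert_Required = 1
ORDER BY rf.Risk_Score DESC;
```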
Collaboration Reflection
- I collaborated with Riker on this implementation; we each worked on different trigger components, created our own versions, and then combined them to arrive at the best solution. I also used Claude AI and ChatGPT to clean up my grammar and to help with documentation and code styling.
Lessons Learned
Data Integration
- Mapping disparate data sources (specimens and employees) required consistent naming (Entity_ID, Entity_Name) and risk scoring, highlighting the importance of a unified schema.
- Views like Research_Division_Source and Clean_Corporate_Data simplified data extraction and transformation, but careful validation (e.g., CHECK constraints) was critical to prevent data inconsistencies.
- Designing the Data_Warehouse and star schema with extensible fields (e.g., ENTITY_TYPE) allows future inclusion of additional data sources without major refactoring.
Analysis
- Joins in analysis queries (e.g., Risk_Facts with Dim_Risk_Level) performed well due to indexed keys, but complex aggregations required careful optimization to avoid performance bottlenecks.
- Actionable Insights: The star schema enabled straightforward reporting, with percentage calculations and risk filtering providing clear, actionable metrics for corporate decision-making.
- Best Practices: Using views for intermediate transformations improved maintainability, while dimension tables with descriptive metadata enhanced report interpretability. Early application of filters in queries reduced execution time.
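For instance, the fact table's dimension keys could be indexed as follows so the report joins and risk filters avoid full scans; the index names are illustrative, not taken from the implementation:

```sql
CREATE INDEX IX_Risk_Facts_Risk_Level_Key  ON Risk_Facts (Risk_Level_Key);
CREATE INDEX IX_Risk_Facts_Entity_Type_Key ON Risk_Facts (Entity_Type_Key);
```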