M.U.P.P.I.X. purveyors of fine Data Analysis Tools
  • Home
    • Applications
    • Blog
    • About
    • Clients
    • Company
    • Other Links
  • Training
  • Get Started
    • Muppix Keywords
    • Glossary find Keywords
    • Templates >
      • Capture
      • Explore
      • Clean-up
    • Approach to BigData
  • Linux Cheatsheet
    • Linux Cheatsheet 2
    • Essential Terminal Commands
    • Basic Linux Commands
  • SQL & Excel Commands
    • SQL Cookbook
    • SQL Cookbook 2
    • SQL search entire DataBase
    • SQL Import Table Tool
    • Excel OneLiners
  • Download
(Step by step Approach to opening up the  mysteries of a database:

Select All Tables & times of update
           All Fields

Compile all existing / production SQL & StoredProcedures & Scripts
   & dump in SQLanalyse showing:
      Databases & External columns accessed
      JOINS used
      New Columns/fields created

Using SQLAnalysis to Identify key/essential tables & columns.
     Compile all SQL scripts, Views & StoredProcedures & insert in to this sheet to extract :
            key JOINs
            group columns together, based on their values . group columns from the various tables in order of importance of the group
                     using MapFields
                     using DistinctValues  to spot breaks in any assumption ( the distincts should be similar)
Using MapChk to discover hard JOINS to other datasets
Using Compare List for JOINS to spot likely joins
Using the Muppix toolkit to extract all valid database columns from existing code
Using Schema Entity Explorer to navigate across tables and report key fields , for any inconsistencies 

Take all proven JOINS and use SQL Find other JOIN Fields in all Tables & find any other tables that could be JOINED using the same JOIN fields

Find all tables that have been updated in last month/week/day 

Find the lowest-level records , ie a Loan, or a Flightbooking  or Trades, then find all types of records that are more higher / grouped level.
Organise the most important/lowestlevel JOINS 

Using BRCheck (Business Rule Check) generate SQL to identify the data ranges of each column, distinct values & order of date columns and in so doing generate a concentrated overview of the real values if each table
            
Using SchemaIRB EntityJoin, generate SQL code to easily navigate between tables
            dump the output into SQLdata, and see in Mydata the key columns all grouped together, for comparison

     Identify JOINS on lowest level using existing production SQL/Views/StoredProcedures in SQLAnalysis  
                     list the production JOINS first
 
                     Some JOINS need non-standard code , ie select only the latest record
 
                     Identify more general JOINS using  MAPChk . Using MapChk to discover hard joins between multiple tables. List these lower down.
​
After collating all the databases & JOINs , use SchemaIRB to  Auto Join multiple tables Connect Joins accross Autogenerate SQL 

Seeing hard-to-spot for differences in output:
 Using Compare 2 spreadsheets :
      cell by cell  or
     By using a common Index/JOIN column identical column names
                    ( so the lines and the columns of each datadump can be in any /random order
                       Must specify IndexColumnName common to both datadumps)
 
  Compare each line with line above for differences
       Sometimes a line with many columns may have subtle differences in the line above , that are hard to spot

Exploring JOINS to other datasets
Use AnalysisSSIS to extract all dataabse fields used in a SSIS package
Use SQL Analysis
Find similar
Auto generate SQL code to JOIN multiple SQL tables
          

Muppix provides innovative solutions and Training to make sense of large scale data.
Backed by years of industry experience, the Muppix Team have developed a Free Data Science Toolkit to extract and analyse multi-structured information from diverse data sources


Company

Blog

Training

Professional Services

Get Started