A core tenet of the DAS approach is that the latest version of all of the data is held in the DAS – Data at Rest (DAS-DR) service. No external access to any data is allowed without going through the DAS – Data in Motion (DAS-DM) components.
Most of the DAS-DM components are implemented as Messaging Services.
All communication between the trellispark framework and external systems is decoupled by Data in Motion tables in the DAS-Record Storage Service (DAS-RSS):
- Messages are queued for transmission in the DiM-Messages table.
- Successfully transmitted messages are tracked in the DiM-MessageHistory table.
- Failed messages are transferred to the DiM-MessageFailures table, where they can be retried later (if appropriate) or used for failure reporting and diagnostics.
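A minimal sketch of the enqueue side of this pattern is below; the table shape, column names, and types are assumptions for illustration, not the actual DAS-RSS schema:

```sql
-- Hypothetical shape of the queue table (names and types assumed):
CREATE TABLE [DiM-Messages]
(
    MessageGUID  UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWID(),
    Destination  NVARCHAR(256)    NOT NULL,           -- target system/endpoint
    Payload      XML              NOT NULL,           -- message body
    QueuedAt     DATETIME2        NOT NULL DEFAULT SYSUTCDATETIME(),
    AttemptCount INT              NOT NULL DEFAULT 0  -- incremented on each retry
);

-- Queue a message for transmission:
INSERT INTO [DiM-Messages] (Destination, Payload)
VALUES (N'ERP-Orders', N'<Order><Id>42</Id></Order>');
```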
Storage and movement of files are implemented by the DAS-File Storage Service (DAS-FSS).
Why do we need to move data?
- Not all of the data you need will be in your Data at Rest (DAS-DR)
- Need to share data with other information systems
- Optimize operations within your business and across your marketplace
- Exchanging data to drive business value
- Internal Information Systems
- Legacy, custom, or other proprietary applications
- Remote business units
- External Information Systems
- Cloud services or Software as a Service
- Third Party applications: Customers, Suppliers, Government, etc…
Why do we need Data in Motion components?
- Provide information security and privacy
- Only authenticated and authorized applications should have access to data
- Need to validate BOTH senders and recipients of data
- End-to-end data encryption
- Need to control and audit what data is being transferred
- Ordered delivery and automated retry on failure
- Receipt tracking and non-repudiation
- Protect the integrity of the DAS-DR service
- Need to validate the structure of data being received
- Ideally also validate the content of data as it arrives, before it is stored
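For structural validation, one option in T-SQL is an XML schema collection: casting an inbound payload to a typed XML type rejects malformed data before it ever reaches Data at Rest. This is a sketch under assumed names and payload shapes, not the trellispark implementation:

```sql
-- Hypothetical schema for an inbound payload (name and shape assumed).
CREATE XML SCHEMA COLLECTION InboundOrderSchema AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Order">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Id"       type="xs:int"/>
          <xs:element name="Quantity" type="xs:positiveInteger"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

-- The cast throws if the structure is invalid, so bad data never lands.
DECLARE @inbound XML(InboundOrderSchema) =
    N'<Order><Id>42</Id><Quantity>3</Quantity></Order>';
```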
Why do we need to import data?
- There are only two options:
- Retrieve the data as needed from the external system (real-time or point-to-point)
- Copy the data into your DAS-DR (bulk load, batch, or Extract-Transform-Load)
- If we copy the data, then there are two other things to consider:
- Do we allow changes to the data?
- Do we need to synchronize changes to the data with the external system?
What are the different data import scenarios?
- Real-time integrations
- Done on a per-record basis using an API or messaging technology
- Validation as the record data is received and stored
- Bulk data import
- Many records are received at the same time and processed as a batch
- Initial data migrations are common as new applications are deployed
- Batch files can also be sent by external systems on demand or scheduled
- Do this once and every data migration looks unique
- Do this often and you see patterns that can be automated with minor customizations
trellispark Bulk Data Migration
- Use a “migration” table within the security context to receive the bulk data
- Basic field type validation as rows are loaded
- Create a batch TSQL stored procedure to validate and process data
- Check business rules
- Map primary / foreign keys from the external system to the System of Record
- Format XML for loading
- Upload the valid data into the System of Record
- Track correlation and flag any errors
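A compressed sketch of these steps follows, with the table, column, and procedure names invented for illustration; a real migration would carry more columns and more rules:

```sql
-- Hypothetical "migration" staging table; typed columns give basic
-- field-type validation as rows are bulk loaded.
CREATE TABLE MigrationCustomers
(
    ExternalId   INT              NOT NULL,  -- key in the source system
    FullName     NVARCHAR(200)    NOT NULL,
    Email        NVARCHAR(320)    NULL,
    RecordGUID   UNIQUEIDENTIFIER NULL,      -- mapped System of Record key
    ErrorMessage NVARCHAR(400)    NULL       -- flagged validation failures
);
GO

CREATE PROCEDURE ProcessMigrationCustomers
AS
BEGIN
    -- 1. Check business rules; flag failures for correlation/reporting.
    UPDATE MigrationCustomers
    SET ErrorMessage = N'Missing email'
    WHERE Email IS NULL;

    -- 2. Map external keys to System of Record GUIDs.
    UPDATE MigrationCustomers
    SET RecordGUID = NEWID()
    WHERE ErrorMessage IS NULL;

    -- 3. Format each valid row as XML and upload it to the
    --    System of Record (target table name assumed).
    INSERT INTO SystemOfRecord (RecordGUID, RecordData)
    SELECT c.RecordGUID,
           (SELECT c.ExternalId, c.FullName, c.Email
            FOR XML PATH('Customer'), TYPE)
    FROM MigrationCustomers AS c
    WHERE c.ErrorMessage IS NULL;
END;
```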
Why use trellispark bulk data migration?
- You have to build a mechanism for importing bulk data
- Converting from traditional record storage to Data Agnostic Service Record Storage can be hard
- XML/JSON formatting
- Parent-Child Hierarchy (see the sketch after this list)
- Cross record links
- There is good news: you can leverage your UX data model to autogenerate most of the code
- Focus attention on the bits that you need to customize
- Provides consistency to the import process
- Initial migration and subsequent bulk loads
- Across multiple sources
- Across all datasets (concepts/tables)
- Process control and rollback
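As an illustration of the parent-child formatting problem noted above, nested FOR XML queries can turn flat relational rows into the hierarchical record a data-agnostic store expects. The table and column names here are invented:

```sql
-- Hypothetical source tables: Orders (parent) and OrderLines (children).
SELECT o.OrderId,
       o.OrderDate,
       (SELECT l.Sku, l.Quantity
        FROM OrderLines AS l
        WHERE l.OrderId = o.OrderId
        FOR XML PATH('Line'), TYPE)          -- nested child elements
FROM Orders AS o
FOR XML PATH('Order'), ROOT('Orders');
```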
Why decouple information systems?
- Highly coupled information systems are inherently fragile
- If A exchanges data with B in real time and B fails, then A also fails…
- Decoupling A and B minimizes dependencies and increases robustness
- Any information system outside your Data as a Service boundary is a risk
- It doesn't matter whether it is internal or external
- It might be down, or running slowly
- Either can impact the operation, performance, or integrity of your solution
- Within your Data as a Service boundary, the only tight coupling is to Data at Rest
- But if you can’t access your own data, you can’t do anything anyway!
How do we decouple information systems?
- Buffered communications channels
- Events and Notifications
- Messages
- Service Bus
- File Transfer – Extract, Transform and Load (ETL)
- Shared data stores
- Combinations of the above!
- Drop a file and send an event/message
- Place the internal buffer inside the Data at Rest service and make sure it is always available!
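A sketch of the transmit side of such a buffer, reusing the DiM table names introduced earlier; the delivery call itself lives in application code, and the history/failure column names are assumptions:

```sql
-- Take one queued message off the buffer (READPAST lets concurrent
-- workers skip rows another worker has locked).
DECLARE @msg TABLE (MessageGUID UNIQUEIDENTIFIER, Payload XML);

DELETE TOP (1) FROM [DiM-Messages] WITH (READPAST)
OUTPUT deleted.MessageGUID, deleted.Payload INTO @msg;

-- ... application code attempts delivery here ...

-- On success: track the message for receipt tracking / non-repudiation.
INSERT INTO [DiM-MessageHistory] (MessageGUID, SentAt)
SELECT MessageGUID, SYSUTCDATETIME()
FROM @msg;

-- On failure: either re-queue with AttemptCount + 1 for an automated
-- retry, or move the row to [DiM-MessageFailures] for diagnostics.
```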
What’s the alternative to decoupling?
- Tightly coupled data transfer using some form of real-time API calls
- Smaller data scope – typically one record per call
- Advantages:
- Real-time or near real-time
- User can immediately respond to an error in context
- Downsides:
- Increased latency – the user is waiting for a response
- Availability – both systems must be available at the same time
- Fragility – both sides depend upon the interface contract
- Compensating transactions are required across both systems when one side fails
Sustainability Considerations
- Decoupling and buffering communications makes it easier to change components
- Internal buffering enables creation of generic message passing components
- Data Agnostic Services (DAS) reduce the need to move data internally
- DAS approach creates an effective low-cost Entity Synchronization Layer
- Change underlying data storage and transport technology if more effective and efficient solutions become available
- The efficiency of the underlying architecture means you need less infrastructure to host your data storage and transfer