Grails: approaches to importing data from an external schema

Need to periodically read ~20K records from some external database (schema not under my control), and update/create respective instances in the local schema (Grails' main dataSource). The target is a single domain class.
I've mapped the external database as another dataSource. I'm thinking of using groovy.sql.Sql plus raw SQL to bring in all records and generate domain instances as required. Is that a reasonable path? Or should I instead model out the external schema and use GORM end-to-end?
Assuming the first approach, and considering testing: are there any useful tools I should look into for setting up test data (i.e. an equivalent of build-test-data/fixtures for non-domain data)?
Thanks

Yes, I think this is reasonable given the data size and how often you are going to run this. Just don't forget to execute the SQL in batches to save on resources.
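For illustration, a rough Groovy sketch of that approach (a sketch only: it assumes a second dataSource named 'external', a domain class Customer with an extId property, and made-up table/column names):

import groovy.sql.Sql

class ExternalImportService {

    // Grails registers a bean named dataSource_external for a
    // second dataSource called 'external' in DataSource.groovy.
    def dataSource_external

    def importAll() {
        def sql = new Sql(dataSource_external)
        int count = 0
        // Table and column names are invented for the example.
        sql.eachRow('SELECT ext_id, name, email FROM ext_customer') { row ->
            def c = Customer.findByExtId(row.ext_id) ?: new Customer(extId: row.ext_id)
            c.name = row.name
            c.email = row.email
            c.save()
            // Flush and clear the session periodically so ~20K rows
            // don't accumulate in the Hibernate session.
            if (++count % 500 == 0) {
                Customer.withSession { session ->
                    session.flush()
                    session.clear()
                }
            }
        }
        sql.close()
    }
}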

Related

Data masking in Synapse serverless SQL pool

How can I implement data masking in a Synapse serverless SQL pool, given that currently it is only implemented in a Synapse dedicated SQL pool?
I am expecting to achieve masking in a serverless SQL pool.
As per a Microsoft document, it is clearly stated that dynamic data masking is only available for a dedicated SQL pool, not for a serverless SQL pool. As a serverless SQL pool does not support tables, materialized views, DDL statements, or DML statements, that might be the reason.
Also, as Nandan suggested, it's not supported on external tables either.
You can raise a feature request here.
Just because something is not implemented does not mean you cannot implement it yourself.
First, I thought it would be great to create a function. But the dedicated and serverless pools only support inline table-valued functions.
Second, we can also create a view with masked data, then revoke the user's rights to see the base table. Let's implement that for the customer key. The code below shows the view.
--
-- Create view with masked customer number
--
CREATE VIEW saleslt.vw_dim_masked_customer
AS
SELECT
    '***' +
        SUBSTRING(CAST([CustomerKey] AS VARCHAR(5)), LEN([CustomerKey]) - 2, 2) AS MASKED,
    [CustomerKey],
    [FirstName],
    [MiddleName],
    [LastName]
FROM [saleslt].[dim_customer]
GO

-- Test view
SELECT * FROM saleslt.vw_dim_masked_customer
GO
I have a database called mssqltips that contains the Adventure Works data as parquet data files exposed by external tables.
The output from the view shows that our data is masked. I did not get rid of the original column, Customer Key, since I wanted to do a comparison. Also, I would add some error handling for strings that are less than 2 characters long or null.
In short, dynamic data masking as a feature might not be supported. But you can easily mask data using custom logic and views. Just remember to revoke the user access to the base table.
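For instance, a minimal sketch of that permission change, assuming a hypothetical database role named report_reader (adjust to your own users or roles):

-- Hide the base table from the reporting role; expose only the masked view
DENY SELECT ON OBJECT::saleslt.dim_customer TO report_reader;
GRANT SELECT ON OBJECT::saleslt.vw_dim_masked_customer TO report_reader;
GO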
Dynamic data masking is supported on actual physical tables and not supported on external tables. Hence DDM is not supported in serverless pools.

Where to put complex query that returns a custom object?

I have an n-tier solution with these projects in it (simplified for this question):
Domain
Logic
Web
In the "Domain" project I have a "Repositories" namespace and each repository is mapped to a different table in the DB and query its data.
For example - the tables Customers and Orders will have the corresponding repositories - CustomersRepository and OrdersRepository.
In the Logic project I instantiate these repository objects and call their methods which actually query the DB.
Let's say I want to show a report that displays some data from both tables.
This report is constructed by a collection of custom objects - IList<ReportObject>.
Now, this ReportObject object has no corresponding table in the DB and therefore has no repository object.
My question: where should I put the part of the code that actually queries the DB and fetches IList<ReportObject>? Should it just be in some data controller in the Logic layer? Or should I create another repository for the reports? Any other option?
While I think this is mainly a question of opinion, here goes:
You can create a QueryStore<ReportObject> instead of a Repository<ReportObject>. The name QueryStore is just something I came up with, it's not a coined term.
The function of such a query store would be to, well, run queries on data that is not covered by any repository. It would contain only queries and so could, for instance, easily be implemented using LINQ on top of Entity Framework, querying database VIEWs.
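As an illustration, a minimal C# sketch of such a query store (all entity, context, and property names here are invented for the example; this is one possible shape, not a prescribed one):

using System.Collections.Generic;
using System.Data.Entity; // EF6; for EF Core use Microsoft.EntityFrameworkCore
using System.Linq;

// Hypothetical entities and context; names invented for the example.
public class Customer { public int Id { get; set; } public string Name { get; set; } }
public class Order { public int Id { get; set; } public int CustomerId { get; set; } public decimal Total { get; set; } }
public class MyDbContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
    public DbSet<Order> Orders { get; set; }
}

public class ReportObject { public string CustomerName { get; set; } public decimal OrderTotal { get; set; } }

// A read-only "query store": only queries, no CRUD.
public interface IQueryStore<T> { IList<T> GetAll(); }

public class ReportObjectQueryStore : IQueryStore<ReportObject>
{
    private readonly MyDbContext _context;
    public ReportObjectQueryStore(MyDbContext context) { _context = context; }

    public IList<ReportObject> GetAll()
    {
        // Joins Customers and Orders; could just as well select from a database VIEW.
        return (from c in _context.Customers
                join o in _context.Orders on c.Id equals o.CustomerId
                select new ReportObject { CustomerName = c.Name, OrderTotal = o.Total })
               .ToList();
    }
}

The Logic layer would then consume IQueryStore<ReportObject> the same way it consumes the repositories.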
I'd put it in a custom repository (as this is not a CRUD operation). You can extend Repository (if you're working with a generic repository) and create one for the query. I wouldn't put queries anywhere other than repositories, since you'd break the encapsulation of what a repository does. Imagine you change the database in the future; it wouldn't be enough to change just the repository layer. Another reason not to put it elsewhere is that the logic would be spread around the application instead of all being in one place, which simplifies debugging and improvements to the queries.
Hope it helps. Guillermo.
The repository pattern is used to encapsulate CRUD operations, but in your case you do not need any Insert or Update. I would put this into the Logic layer and access the DB directly from there.

Extract query definition from JET database via ADO

I have a program in Delphi 2010 that uses a JET (mdb) database via ADO. I would like to be able to extract the definitions of some of the queries in the database and display them to the user. Is this possible either via SQL, some ADO interface, or by interrogating the database itself (I don't seem to have rights to MSysObjects)?
Some of that information is available via ADOX calls. There is an overview of the API with some examples (unfortunately not in Delphi) on the MSDN website.
Basically what you will want to do is to import the ADOX type library, and then use the wrapper that is generated for you to access the underlying API. From there it's as simple as navigating the hierarchy to get at the data you need.
You will need to access the specific View object, and from there get the command property.
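A rough Delphi sketch of that navigation (this assumes the ADOX type library has been imported as ADOX_TLB; the exact wrapper types and casts can differ slightly between imports, so treat this as a starting point):

uses ADOX_TLB;

procedure ListQueryDefinitions(const DbPath: string);
var
  Cat: _Catalog;
  Cmd: OleVariant;
  I: Integer;
begin
  // Open the ADOX catalog against the Jet database.
  Cat := CoCatalog.Create;
  Cat.ActiveConnection := 'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + DbPath;

  // Saved SELECT queries show up in the Views collection; each View's
  // Command property is an ADO Command whose CommandText holds the SQL.
  for I := 0 to Cat.Views.Count - 1 do
  begin
    Cmd := Cat.Views.Item[I].Command;  // late-bound ADO Command
    Writeln(Cat.Views.Item[I].Name, ': ', string(Cmd.CommandText));
  end;
end;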
Via DAO, it's pretty easy. You just extract the SQL property of each QueryDef. In DAO from within Access, that would be:
' Loop over all saved queries and print their SQL
Dim db As DAO.Database
Dim qdf As DAO.QueryDef

Set db = DBEngine.OpenDatabase("[path/name of database]")
For Each qdf In db.QueryDefs
    Debug.Print qdf.SQL
Next qdf
Set qdf = Nothing
db.Close
Set db = Nothing
I don't know how to translate that, but I think it's the simplest method once you're comfortable with using DAO instead of ADOX.
I don't use ADO at all, but I'm guessing that it has a collection of views and the SQL property would work for SELECT queries. However, if you're interested in getting the SQL for all saved QueryDefs, you'd also need to look at the DML queries, so you'd have to look at the stored procedures. I would have to look up the syntax for that, but I'm pretty certain that's how you'd get to the information via ADO.

DbConnection without Db using in-memory DataSet (or similar) as source

I'm trying to unit test a few .NET classes that (for good design reasons) require DbConnections to do their work. For these tests, I have certain data in memory to give as input to these classes.
That in-memory data could be easily expressed as a DataTable (or a DataSet that contains that DataTable), but if another class were more appropriate I could use it.
If I were somehow magically able to get a DbConnection that represented a connection to the in-memory data, then I could construct my objects, have them execute their queries against the in-memory data, and ensure that their output matched expectations. Is there some way to get a DbConnection to in-memory data? I don't have the freedom to install any additional third-party software to make this happen, and ideally, I don't want to touch the disk during the tests.
Rather than consuming a DbConnection, can you consume IDbConnection and mock it? We do something similar and pass the mock a DataSet. DataSet.CreateDataReader returns a DataTableReader, which inherits from DbDataReader.
We have wrapped DbConnection in our own IDbConnection-like interface to which we've added an ExecuteReader() method which returns a class that implements the same interfaces as DbDataReader. In our mock, ExecuteReader simply returns what DataSet.CreateDataReader serves up.
Sounds kind of roundabout, but it is very convenient to build up a DataSet with possibly many result sets. We name the DataTables after the stored procs whose results they represent, and our IDbConnection mock grabs the right DataTable based on the proc the client is calling. DataTable also implements CreateDataReader, so we're good to go.
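As a small illustration of the reader half of that idea (the table name "GetOrders" and its columns are invented; the IDbConnection/IDbCommand mock would simply return such a reader from its ExecuteReader):

using System;
using System.Data;

class DataReaderMockDemo
{
    static void Main()
    {
        // Build up a DataSet whose DataTables are named after the
        // stored procs whose results they stand in for.
        var ds = new DataSet();
        var orders = new DataTable("GetOrders");
        orders.Columns.Add("OrderId", typeof(int));
        orders.Columns.Add("Amount", typeof(decimal));
        orders.Rows.Add(1, 99.50m);
        ds.Tables.Add(orders);

        // CreateDataReader returns a DataTableReader, which derives from
        // DbDataReader, so it satisfies code written against IDataReader.
        using (var reader = ds.Tables["GetOrders"].CreateDataReader())
        {
            while (reader.Read())
                Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetDecimal(1));
        }
    }
}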
An approach that I've used is to create an in-memory SQLite database. This can be done simply by pulling the System.Data.SQLite.Core NuGet package into your unit test project; you don't need to install any software anywhere else.
Although it sounds like a really obvious idea, it wasn't until I was looking at the Dapper unit tests that I thought to use the technique myself! See the "GetSqliteConnection" method in
https://github.com/StackExchange/dapper-dot-net/blob/bffb0972a076734145d92959dabbe48422d12922/Dapper.Tests/Tests.cs
One thing to be aware of: if you create an in-memory SQLite db and create and populate tables, you need to be careful not to close the connection before performing your test queries, because opening a new in-memory connection will get you a connection to a new, empty in-memory database, not the one that you just carefully prepared for your tests! For some of my tests, I use a custom IDbConnection implementation that keeps the connection open to avoid this pitfall - e.g.
https://github.com/ProductiveRage/SqlProxyAndReplay/blob/master/Tests/StaysOpenSqliteConnection.cs
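A minimal sketch of the technique (assuming the System.Data.SQLite.Core package; note that the single connection stays open for the whole test, per the pitfall above):

using System;
using System.Data.SQLite;

class InMemorySqliteDemo
{
    static void Main()
    {
        // One connection, held open for the lifetime of the test:
        // closing it discards the in-memory database.
        using (var conn = new SQLiteConnection("Data Source=:memory:"))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                cmd.CommandText = "CREATE TABLE People (Id INTEGER PRIMARY KEY, Name TEXT)";
                cmd.ExecuteNonQuery();
                cmd.CommandText = "INSERT INTO People (Name) VALUES ('Alice')";
                cmd.ExecuteNonQuery();
                cmd.CommandText = "SELECT COUNT(*) FROM People";
                Console.WriteLine(cmd.ExecuteScalar()); // prints 1
            }
        }
    }
}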
TypeMock? (You would need to 'install' it though).
Be careful assuming that Data* can give you proper hooks for testing - it's pretty much the worst case in general. But you say Good Design Reasons, so I'm sure that's all covered :D

How to create OData based off RFC with multiple tables in the output?

I am working on a large project at work that requires me to create OData services for a large variety of Remote Function Calls. I was able to work out how to model and create OData services for simple RFCs; however, I am struggling with more complex RFCs that use multiple tables as well as simple exporting and importing parameters.
I want to output these tables as well as the importing and exporting parameters via GetEntity and GetEntitySet with just one call. I have done extensive searching online to find solutions, but the best solutions seem to be redefining the RFCs or calling the OData service multiple times, which is not ideal.
Is there any way to combine multiple tables with several entries in the output? When I say output, I am referring to the resulting XML from GetEntity/GetEntitySet.
For example, take the below fake RFC definition that takes a PERNR, and outputs a list of direct reports and a structure of employee details.
IMPORTING
  PERNR
EXPORTING
  S_EMPLOYEE_DETAILS
TABLES
  T_DIRECT_REPORTS
Is there a way to combine the table, structure, and importing parameters into one output?
The first thing to understand is that the OData protocol is not intended to work solely like classical function calls; it is instead based on an entity/relationship kind of model.
So in your case I'd suggest creating an entity type named 'Employee' with the appropriate properties from your structure S_EMPLOYEE_DETAILS. With this you can, e.g., implement the method GET_EMPLOYEE_ENTITY to retrieve a single instance of an employee via PERNR.
The next thing to do would be to get the direct reports of this employee. Since in your case this is a 1:N relation from Employee to Employee, you can create a navigation property called 'DirectReports' with the appropriate cardinality. Then in your GET_EMPLOYEE_ENTITYSET you can return the instances of table T_DIRECT_REPORTS (note that the navigation property is not empty and you have to read the keys of the parent!).
Once you have got this working, you can move on to the 'best practice' and implement the method GET_EXPANDED_ENTITY, filling the expand clauses. In my opinion this is the preferred way, as you don't need to implement two separate methods, and it is considered faster as well (if many expands happen).
Both methods of implementation can be called via
GET EmployeeSet('12345678')?$expand=DirectReports
