Fluentd + Azure Data Explorer cluster

I'm working on a Fluentd setup in Kubernetes. In Kubernetes I have a number of applications that write logs to stdout. I can filter, parse, and send the logs to Azure Blob Storage. But I want the logs from Blob Storage to be ingested into an Azure Data Explorer cluster. In the Data Explorer cluster I have a database and a table that already has a schema defined. The question is: how do I modify the events from Fluentd so that they meet the table schema? Is it possible at all? Or are there alternative ways of building such a setup?

Take a look at ingestion mappings. You can pick the properties you care about and route them to the applicable columns, and when a new property arrives you can update the mapping and the table schema accordingly.
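For illustration, a minimal sketch of creating such a mapping with the azure-kusto-data Python client (the cluster URL, database, table, mapping name, and column/path names below are all hypothetical):

import json
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Hypothetical cluster and database.
client = KustoClient(
    KustoConnectionStringBuilder.with_aad_device_authentication(
        "https://mycluster.westeurope.kusto.windows.net"
    )
)

# Map JSON properties (paths) onto the columns of the existing table;
# properties that are not mapped are simply ignored at ingestion time.
mapping = json.dumps([
    {"column": "Timestamp", "path": "$.time",   "datatype": "datetime"},
    {"column": "Stream",    "path": "$.stream", "datatype": "string"},
    {"column": "Message",   "path": "$.log",    "datatype": "string"},
])
client.execute_mgmt(
    "MyDatabase",
    f".create-or-alter table AppLogs ingestion json mapping 'AppLogsMapping' '{mapping}'",
)

At ingestion time the mapping is then referenced by name (the ingestionMappingReference property), as in the .ingest example in the next answer.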

Yes, it is possible. You can ingest data stored in your blob into a custom table in Azure Data Explorer. Refer to this link:
https://learn.microsoft.com/en-us/azure/data-explorer/ingest-json-formats?tabs=kusto-query-language#ingest-mapped-json-records
Below is an example where I ingest a JSON document stored in a blob into a table in ADX:
.ingest into table Events ('https://kustosamplefiles.blob.core.windows.net/jsonsamplefiles/simple.json') with '{"format":"json", "ingestionMappingReference":"FlatEventMapping"}'
If the schema is difficult to parse, I would recommend ingesting first into a raw (source) table. You can then use an update policy to move the data into other tables after parsing. See the documentation on update policies to understand how they work.
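As a sketch of that pattern, the update policy can be attached with a management command, shown here again via the Python client; RawEvents, Events, and ExpandEvents() are hypothetical names for the raw table, the target table, and a stored function that parses RawEvents rows into the Events schema:

import json
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

client = KustoClient(
    KustoConnectionStringBuilder.with_aad_device_authentication(
        "https://mycluster.westeurope.kusto.windows.net"  # hypothetical cluster
    )
)

# Every batch ingested into RawEvents is run through ExpandEvents() and the
# result is appended to Events automatically.
policy = json.dumps([{
    "IsEnabled": True,
    "Source": "RawEvents",
    "Query": "ExpandEvents()",
    "IsTransactional": False,
}])
client.execute_mgmt("MyDatabase", f".alter table Events policy update @'{policy}'")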

Consider using the ability to listen for blobs landing in storage via the Event Grid mechanism. Check out https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid-overview

Related

Using a Google Dataflow template, is there a simple way to whitelist all tables in a database instead of passing a comma-separated list of all the tables?

I am using https://github.com/GoogleCloudPlatform/DataflowTemplates for CDC on MySQL, publishing to a Google Pub/Sub topic.
In the properties file there is a provision for whitelistedTables=, where you have to give a comma-separated list of all the tables you want to monitor for changes.
Is there any straightforward way to whitelist an entire database and, in turn, all the tables in it?
Unfortunately, the whitelistedTables parameter does not allow whitelisting all tables of a given database. However, Dataflow templates are customizable. You can download the code from GitHub and then upload your modified version to GCS. Then you can run your new templated job that allows for this feature. See this prior question: How to Customize GCP Dataflow template?. The code for the Dataflow templates lives here: https://github.com/GoogleCloudPlatform/DataflowTemplates.
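If modifying the template is not an option, one workaround is to generate the comma-separated whitelistedTables value from MySQL's information_schema before launching the job. A rough sketch (connection details are placeholders, and whether the template expects bare or database-qualified table names should be checked against its documentation):

import mysql.connector  # assumed MySQL driver

# Hypothetical connection details.
conn = mysql.connector.connect(host="mysql-host", user="cdc", password="secret")
cur = conn.cursor()

# List every table in the database you want to monitor.
cur.execute(
    "SELECT table_name FROM information_schema.tables WHERE table_schema = %s",
    ("mydatabase",),
)
tables = [row[0] for row in cur.fetchall()]

# Paste this value into the properties file (adjust the name format if the
# template expects database-qualified names such as mydatabase.tablename).
print("whitelistedTables=" + ",".join(tables))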

Finding DeleteEntity calls to an Azure storage table

Is there a way to find out if there was any DeleteEntity call to an Azure table in the last N minutes? Basically, my goal is to find all operations that updated the table in the last N minutes.
Update: I am looking for a way to do this with a REST API call for a specific table in the storage account.
If using the Azure Portal is an option, you can find this information via Metrics. For example, you can take a sum of all Transactions against your table storage where the API name was DeleteEntity.
You can find more information about it here: https://learn.microsoft.com/en-us/azure/storage/common/storage-metrics-in-azure-monitor?toc=%2fazure%2fstorage%2fblobs%2ftoc.json.
UPDATE
If you wish to get this information programmatically, I believe you will need to use the Azure Monitor REST API. I looked at the request sent by the Portal, and it calls the /subscriptions/<my-subscription-id>/resourceGroups/<my-resource-group>/providers/Microsoft.Storage/storageAccounts/<my-storage-account>/tableServices/default/providers/Microsoft.Insights/metrics/Transactions endpoint.
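For example, a sketch of such a call against the public Azure Monitor metrics endpoint in Python (subscription, resource group, account name, and the bearer token are placeholders; the ApiName dimension filter and api-version shown here are assumptions to verify against the current metrics API docs):

from datetime import datetime, timedelta
import requests

# Placeholder identifiers.
subscription = "<subscription-id>"
resource_group = "<resource-group>"
account = "<storage-account>"
token = "<aad-bearer-token>"  # e.g. obtained via `az account get-access-token`

resource_id = (
    f"/subscriptions/{subscription}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Storage/storageAccounts/{account}/tableServices/default"
)

end = datetime.utcnow()
start = end - timedelta(minutes=30)  # the "last N minutes" window

resp = requests.get(
    f"https://management.azure.com{resource_id}/providers/Microsoft.Insights/metrics",
    headers={"Authorization": f"Bearer {token}"},
    params={
        "api-version": "2018-01-01",
        "metricnames": "Transactions",
        "aggregation": "Total",
        "timespan": f"{start:%Y-%m-%dT%H:%M:%SZ}/{end:%Y-%m-%dT%H:%M:%SZ}",
        "$filter": "ApiName eq 'DeleteEntity'",  # only count DeleteEntity calls
    },
)
resp.raise_for_status()
print(resp.json())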
UPDATE 2
For a specific table, the only option I can think of is to fetch the data from the Storage Analytics logs, which are stored in the $logs blob container, and then parse the delimited log files manually. You may find these links helpful:
https://learn.microsoft.com/en-us/rest/api/storageservices/storage-analytics-log-format
https://learn.microsoft.com/en-us/rest/api/storageservices/storage-analytics-logged-operations-and-status-messages#logged-operations
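If it helps, a rough sketch of that approach with the azure-storage-blob Python package (the connection string, date prefix, and table name are placeholders; the field layout of each log line is described in the log-format page above):

from azure.storage.blob import BlobServiceClient

# Placeholder connection string; Storage Analytics logging must be enabled on
# the table service for the $logs container to be populated.
service = BlobServiceClient.from_connection_string("<connection-string>")
logs = service.get_container_client("$logs")

# Table-service logs are written under "table/YYYY/MM/DD/hhmm/".
for blob in logs.list_blobs(name_starts_with="table/2023/01/01/"):
    text = logs.download_blob(blob.name).readall().decode("utf-8", errors="replace")
    for line in text.splitlines():
        # Keep only DeleteEntity operations that touched the table of interest.
        if "DeleteEntity" in line and "mytable" in line.lower():
            print(line)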

Handling more than one DB server / different schemas on the same server in a single application

As part of one of the requirements in our project, we need to connect to a database based on input from the UI and then fetch the results accordingly. This database can be on a different DB server, or it can be a different schema on the same server.
We are looking to do this in the most efficient way.
One of the approaches we have figured out is to keep the DB connection information (DB server, schema, etc.) in separate properties files. Based on the input from the UI, we pass the input to a DB factory that reads the corresponding properties file and returns the corresponding DB connection if it already exists; if it doesn't, the factory creates a new connection and returns it.
We are using Spring, and we deploy the application on WebLogic.
The most efficient way would be to let the respective functions "know" where to look for the requested data. But that's a lot of work in advance.
If the schemas describe similar data (e.g. address data), think about merging the data or implementing a frontend/proxy. Both approaches would delegate the "looking for data" to the DB server(s), which should be able to handle each request faster and more efficiently than any program logic.
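For reference, the connection-factory approach described in the question might look roughly like the following (a language-neutral sketch in Python with hypothetical names; in a Spring application the same idea is commonly expressed with one DataSource per target or an AbstractRoutingDataSource, and each cached entry would normally be a pooled DataSource rather than a single connection):

class DbConnectionFactory:
    """Caches one connection per logical target, keyed by the UI input."""

    def __init__(self, load_settings, open_connection):
        self._load_settings = load_settings      # e.g. reads <target>.properties
        self._open_connection = open_connection  # opens a connection from those settings
        self._cache = {}                         # target name -> open connection

    def get_connection(self, target):
        # Reuse an existing connection for this target if one is already open;
        # otherwise read its properties and open (and cache) a new connection.
        if target not in self._cache:
            settings = self._load_settings(target)
            self._cache[target] = self._open_connection(settings)
        return self._cache[target]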

How to handle multiple database accesses?

In my program I have multiple databases. One is fixed and cannot be changed, but there are also some others, the so-called user databases.
I thought I would have to open one connection for every database and connect to each data dictionary separately. Is it possible to connect to more than one database with a single connection by handing over the data dictionary filename? By the way, I am using a local server.
Thank you very much,
André
P.S.: I may have found the answer to my problem. The key word is CreateDDLink. The procedure connects to another data dictionary, but a master dictionary has to be set first.
Links may be what you are looking for as you indicated in the question. You can use the API or SQL to create a permanent link alias, or you can dynamically create links on the fly.
I would recommend reviewing this specific help file page: Using Tables from Multiple Data Dictionaries
For a permanent alias (using SQL), look at sp_createlink. You can either create the link to authenticate the current user or set up the link to authenticate as a specific user. Then use the link name in your SQL statements:
select * from linkname.tablename
Or, dynamically, you can use the following, which will authenticate the current user:
select * from "..\dir\otherdd.add".table1
However, links are only available to SQL. If you want to use the table directly (i.e. via a TAdsTable component) you will need to create views. See KB 080519-2034. The KB mentions you can't post updates if the SQL statement for the view results in a static cursor, but you can get around that by creating triggers on the view.

Querying an external Oracle DB in a Rails application

I have a website which uses a MySQL database for its whole operation. But for a new requirement I need to query an external Oracle database (used by another component), compile a list of items, and display it on a page of the website. How is it possible to connect to an external database just for rendering a single page?
Also, is it possible to cache the queried result for, say, one month before invalidating the cache and fetching the updated list of items? I don't want to query the external Oracle DB on each request.
Why not run a monthly job that just copies the data from the Oracle database into the MySQL database?
As stated by Myers, a simple solution is to accept a data feed. For example, a cron job could pull data from the Oracle database at defined intervals, say daily or weekly, and then insert the data into your web application's local MySQL database. The whole process could be essentially transparent to your web application. The caching interval, or how long you go between feeds, would be up to you.
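A minimal sketch of such a feed job (the cx_Oracle and mysql-connector-python drivers, credentials, and table/column names are all assumptions; in a Rails app the same job could be a rake task or a plain script scheduled via cron):

import cx_Oracle           # assumed Oracle driver
import mysql.connector     # assumed MySQL driver

# Hypothetical credentials and table names.
oracle_conn = cx_Oracle.connect("reader", "secret", "oracle-host/ORCLPDB1")
mysql_conn = mysql.connector.connect(
    host="mysql-host", user="app", password="secret", database="webapp"
)

src = oracle_conn.cursor()
dst = mysql_conn.cursor()

# Pull the current list of items from the external Oracle database.
src.execute("SELECT id, name, price FROM external_items")
rows = src.fetchall()

# Refresh the local copy so the web app only ever reads its own MySQL database.
dst.execute("DELETE FROM cached_items")
dst.executemany(
    "INSERT INTO cached_items (id, name, price) VALUES (%s, %s, %s)", rows
)
mysql_conn.commit()

oracle_conn.close()
mysql_conn.close()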
I'll also point out that this could be an opportunity for an API that would more readily support sharing of data between applications. This would, of course, be more work than a simple data feed, but has the possibility of being more useful to more people.
