I store data as XML files in Data Lake Store, with one folder per source system.
At the end of every day, I would like to run some kind of log analytics to find out how many new XML files were stored in Data Lake Store under each folder. I have enabled Diagnostic Logs and also added the OMS Log Analytics suite.
What is the best way to produce this report?
It is possible to build an aggregate report (and even create an alert/notification). Using Log Analytics, you can create a query that searches for any instances where a file is written to your Azure Data Lake Store, based on either a common root path or a file-naming pattern:
AzureDiagnostics
| where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
| where ( OperationName == "create" )
| where ( Path_s contains "/webhdfs/v1/##YOUR PATH##")
Alternatively, the last line could also be:
| where ( Path_s contains ".xml")
...or a combination of both.
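For example, a combined filter (keeping the ##YOUR PATH## placeholder from above) could look like this:
AzureDiagnostics
| where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
| where ( OperationName == "create" )
| where ( Path_s contains "/webhdfs/v1/##YOUR PATH##" and Path_s contains ".xml" )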
You can then use this query to create an alert that notifies you, at a given interval (e.g. every 24 hours), of the number of files that were created.
Depending on what you need, you can shape the query in these ways:
If you use a common file-naming pattern, you can find a match where the path contains that file name.
If you use a common path, you can find a match where the path matches the common path.
If you want to be notified of all the instances (not just specific ones), you can use an aggregating query, and an alert when a threshold is reached/exceeded (i.e. 1 or more events):
AzureDiagnostics
| where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
| where ( OperationName == "create" )
| where ( Path_s contains ".xml")
| summarize AggregatedValue = count(OperationName) by bin(TimeGenerated, 24h), OperationName
With the query, you can create the alert by following the steps in this blog post: https://azure.microsoft.com/en-gb/blog/control-azure-data-lake-costs-using-log-analytics-to-create-service-alerts/.
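If you also want the count broken down per source-system folder, a possible variant is the sketch below; it assumes paths of the form /webhdfs/v1/<folder>/<file>, so adjust the parse pattern to your actual layout:
AzureDiagnostics
| where ( ResourceProvider == "MICROSOFT.DATALAKESTORE" )
| where ( OperationName == "create" )
| where ( Path_s contains ".xml" )
| parse Path_s with "/webhdfs/v1/" Folder "/" *
| summarize NewFiles = count() by Folder, bin(TimeGenerated, 24h)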
Let us know if you have more questions or need additional details.
I am trying to load a parameter table.
I get error messages when opening the parameter table and trying to load a txt file (created with Excel and saved as a tab-delimited txt) via Treatment -> Import Variable Table -> Group.
I tried using the advice given here: How to use table loader in ztree?
But I cannot import the parameter table generated.
The error messages say, e.g.:
Syntax error: line 1 (or above)
Error in period 0; subject 1
The parameter table in z-Tree is a special table and (if I am not mistaken) it is not meant to be exported or imported.
I just assumed you would like to have a special matching structure. (If you are planning to do something else, my answer might not be relevant.)
If you want to manage the Group variable from a file, you can create a table, say MATCHING, and load an external file the same way as described in the post you linked to. For instance, something like this:
Period Subject Group
1 1 3
1 2 3
1 3 2
...
2 1 2
2 2 1
2 3 3
and you can add a program (subjects.do) as follows under the background stage:
Group = MATCHING.find(Subject == :Subject & Period == :Period, Group);
Just make sure you define the group for each subject and each period, because if the program cannot find a valid entry for a given subject and period, it will cause trouble.
Note: If you are using z-Tree 4, it seems that the variables need to be initialized first. This can be done by adding a program under the table. In z-Tree 3, this is not necessary.
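If you run into that, a purely hypothetical sketch of such an initializing program (added under the MATCHING table, using the variable names from the example above; I have not verified this in z-Tree 4) would be:
Period = 0;
Subject = 0;
Group = 0;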
I'm developing an iOS app using Swift, and I need to be able to get posts based on how close they are to a certain location, and sort them based on when they were posted.
I know how to check how close one post's location is to another location, but my problem is that I need to get all of the posts within x miles of the user.
Would it be better to scan the table, which, as I understand it, reads every single item from the database, and then check whether each item returned is within x miles of the user? This seems resource-intensive, as I'm expecting there to be a lot of posts.
Or would it be better to create another table that has a static hash key, set two local secondary indexes, one for the latitude and one for the longitude, and just query for that one static hash key where the latitude is between x and y and the longitude is between a and b?
The AWS DynamoDB documentation warns against using a hash key that is accessed too heavily.
But is it really as bad as they make it seem to use the same hash key?
This would be in a table with the following attributes, where Post ID is the ID of the actual post:
**static hash key** | **latitude (local secondary index)** | **longitude (local secondary index)** | **dateCreated (local secondary index)** | **Post ID**
and for the scan option:
**ID** | **latitude** | **longitude** | **date created** | **poster** | **comments** | **flags** | **upvotes** | **downvotes** | **popularity**
Would having a static key be better than scanning a table performance-wise? How about cost-wise?
Follow the guidelines. We are suffering from two problems in production because of bad hashes. So my advice to you is to drop the static hash concept. It won't scale at a reasonable price, and will be a pain to monitor.
Building on top of the scan approach, you could instead consider using Global Secondary Indexes on the latitude/longitude attributes.
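As a side note, whichever table design you choose, you will need latitude/longitude bounds for "within x miles". Here is a rough, self-contained Swift sketch of that bounding-box computation (the function name and miles-per-degree constants are illustrative approximations, not from any AWS SDK):
import Foundation

// Given a center point and a radius in miles, compute the latitude/longitude
// bounds to use in a BETWEEN-style range query (spherical approximation).
func boundingBox(latitude: Double, longitude: Double, radiusMiles: Double)
        -> (minLat: Double, maxLat: Double, minLon: Double, maxLon: Double) {
    let milesPerDegreeLat = 69.0               // roughly constant everywhere
    let deltaLat = radiusMiles / milesPerDegreeLat
    // Longitude degrees shrink toward the poles, hence the cosine factor.
    let milesPerDegreeLon = 69.17 * cos(latitude * .pi / 180.0)
    let deltaLon = radiusMiles / milesPerDegreeLon
    return (latitude - deltaLat, latitude + deltaLat,
            longitude - deltaLon, longitude + deltaLon)
}

// Example: bounds for posts within 5 miles of downtown Seattle.
let bounds = boundingBox(latitude: 47.6062, longitude: -122.3321, radiusMiles: 5)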
First of all: I'm an inexperienced coder and just started reading PiL. I only know a thing or two, but I'm learning and understanding quickly. This method is really unnecessary, but I sort of want to give myself a hard time in order to learn more.
Okay, so for testing and for getting to know the language better, I'm trying to grab two different values from two different files and store them in tables:
local gamemap = file.Read("addons/easymap/data/maplist.txt", "GAME")
local mapname = string.Explode( ",", gamemap )
local mapid = file.Read("addons/easymap/data/mapid.txt", "GAME")
local id = string.Explode( ",", mapid )
I'm grabbing two values, which in the end are the tables mapname and id.
Once I have them, I know that using
for k, v in pairs(mapname) do ... end
will assign keys and values to the data taken from the file, or at least let me iterate over it.
But what I need to do with both tables is this: if a certain map is on the server, check for its value in the table (unless the map name is nil), and then, once I have the name, grab the value of that map and match it with the id from the other file.
For example, in the maplist.txt file I have gm_construct as the first entry [1], and its corresponding id in mapid.txt, let's say 54321, is also the first entry [1].
But now I must check the server's current map with the game.GetMap function. I have that solved and all: I grab the current map, match it against the mapname table, and then check for its corresponding value in the id table, which would be gm_construct = 1.
For example, it would be something like:
local mapdl = game.GetMap()
local match = mapname[mapdl]
if( match != nil )then --supposing the match isn't nil and it is in the table
--grab its table value, lets say it is 1 and match it with the one in the id table
It is a more complex version of this: http://pastebin.com/3652J8Pv
I know it is unnecessary, but doing this script will give me more options to expand it further.
TL;DR: I need to find a function that lets me match two values coming from different tables and files, which in the end are in the same order ([1] = [1]) in both files. Or a way to fetch a full table from another file; I don't know if a table can be loaded globally and then grabbed by another file to be used there.
I'm sorry if I'm asking too much, but where I live, if you want to learn to program, you have to do it on your own; no schools have classes or anything similar, at least not until university, and I'm far away from even finishing high school.
Edit: this is intended to be used on Garry's Mod. string.Explode is explained here: http://wiki.garrysmod.com/page/string/Explode
It basically separates phrases by a designated character, in this case, a comma.
Okay. If I understand correctly... you have two files with data.
One with Map Names
gm_construct,
gm_flatgrass,
de_dust2,
ttt_waterworld
And one with IDs, numbers, whatever (related to the entries at the same position in the map names file):
1258,
8592,
1354,
2589
And now you want to find the ID of the current map, right?
Here is your function:
local function GetCurrentMapID()
    -- Get the current map
    local cur_map = game.GetMap()
    -- Read the files and split them on commas
    local mapListRaw = file.Read("addons/easymap/data/maplist.txt", "GAME")
    local mapList = string.Explode(",", mapListRaw)
    local mapIDsRaw = file.Read("addons/easymap/data/mapid.txt", "GAME")
    local mapIDs = string.Explode(",", mapIDsRaw)
    -- Iterate over the whole map list
    for k, v in pairs(mapList) do
        -- string.Trim strips stray whitespace/newlines left over from the file
        if (string.Trim(v) == cur_map) then
            -- Return the value from mapIDs located at the same key (k)
            return string.Trim(mapIDs[k])
        end
    end
    -- Throw a non-breaking error if the current map is not in the map list
    ErrorNoHalt("Current map is not registered in the maplist!\n")
end
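For example, you could then call it like this:
local id = GetCurrentMapID()
if (id != nil) then
    print("Current map ID: " .. id)
end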
The code could have errors because I couldn't test it. Please comment with the error if so.
Source: My Experience and the GMod Wiki
I am new to Splunk and am facing an issue comparing values in two columns from two different queries.
Query 1
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**" | transaction call_id keepevicted=true | search "xyz event:" | table _time, call_id, A_from, A_to | rename call_id as Call_id, A_from as From, A_to as To
Query 2
index="abc_ndx" source="*/ jkdhgsdjk.log" call_id="**" B_to="**" B_from="**" | transaction call_id keepevicted=true | search " xyz event:"| table _time, call_id, B_from, B_to | rename call_id as Call_id, B_from as From, B_to as To
These are my two queries. I want to compare each value in the A_from column with each value in the B_from column, and if a value matches, display those matching values of A_from.
Is it possible?
I have run the two queries separately, exported the results of each to CSV, and used the VLOOKUP function. But the problem is that there is a limit of at most 10,000 rows that can be exported, so I miss out on lots of data, as my search returns more than 10,000 records.
Any help?
I haven't got any data to test this on at the moment; however, the following should point you in the right direction.
When you have the table for the first query sorted out, you should 'pipe' the search string into an appendcols command containing your second search string. This command allows you to run a subsearch and "import" its columns into your base search.
Once you have the two columns in the same table, you can use the eval command to create a new field that compares the two values and assigns a result as you desire.
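For instance, a rough, untested sketch combining your two searches with appendcols and eval (field names taken from your queries; note that appendcols simply pastes rows side by side, so both searches must return rows in the same order):
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**"
| transaction call_id keepevicted=true
| search "xyz event:"
| table _time, call_id, A_from, A_to
| appendcols [ search index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" B_to="**" B_from="**"
    | transaction call_id keepevicted=true
    | search "xyz event:"
    | table B_from, B_to ]
| eval match=if(A_from == B_from, A_from, null())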
Hope this helps.
http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Appendcols
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
I'm not sure why there is a need to keep this as two separate queries. Everything is coming from the same sourcetype, and is using almost identical data. So I would do something like the following:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" (A_to="**" A_from="**") OR (B_to="**" B_from="**")
| transaction call_id keepevicted=true
| search "xyz event:"
| eval to=if(A_from == B_from, A_from, "no_match")
| table _time, call_id, to
This grabs all events from your specified sourcetype and index which have a call_id and either A_to and A_from or B_to and B_from. Then it transactions all of that and lets you filter based on "xyz event:" (whatever that is).
Then it creates a new field called 'to', which shows A_from when A_from == B_from; otherwise it shows "no_match" (a placeholder, since you didn't specify what should be done when they don't match).
There is also a way to potentially tackle this without using transactions. Although without more details into the underlying data, I can't say for sure. The basic idea is that if you have a common field (call_id in this case) you can just use stats to collect values associated with that field instead of an expensive transaction command.
For example:
index="abc_ndx" index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats last(_time) as earliest_time first(A_to) as A_to first(A_from) as A_from first(B_to) as B_to first(B_from) as B_from by call_id
Using first() or last() doesn't actually matter if there is only one value per call_id. (You could even use min(), max(), or avg() and get the same thing.) Perhaps this will help you get to the output you need more easily.
INFORMIX-SQL 7.32 (SE):
I've created an audit trail "a_trx" for my transaction table to know who/when added or updated rows in this table, with a snapshot of the row contents. According to the documentation, an audit table is created with the same schema as the table being audited, plus the following audit info header columns prefixed:
table a_trx
a_type char(2) {record type: aa = added, dd = deleted,
rr = before-update image, ww = after-update image.}
a_time integer {internal time value.}
a_process_id smallint {Process ID that changed record.}
a_usr_id smallint {User ID that changed record.}
a_rowid integer {Original rowid.}
[...] {Same columns as table being audited.}
So then I proceeded to generate a default perform screen for a_trx, but could not locate a_trx in my table selection. I aborted and ls'd the .dbs directory; I did not see a_trx.dat or a_trx.idx, but found a_trx, which appears to be in .dat format according to my disk editor utility. Is there any other method for accessing this .dat clone, or do I have to trick the engine by renaming it to a_trx.dat, creating an .idx companion for it, and tweaking SYSTABLES, SYSCOLUMNS, etc. to be able to access this audit table like any other table? And what is the internal time value of a_time, the number of seconds since 12/31/1899?
The audit logs are not C-ISAM files; they are plain log files. IIRC, they are created with '.aud' as a suffix. If you get to choose the suffix, then you would create it with a '.dat' suffix, making sure the name does not conflict with any table name.
You should be able to access them as if they were a table, but you would have to create a table (data file) and the index file to match the augmented schema, and then arrange for the '.aud' file to refer to the same location as the '.dat' file - presumably via a link or possibly a symbolic link. You can specify where the table is stored in the CREATE TABLE statement in SE.
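For illustration only (the table name and path here are hypothetical, and the trailing columns depend on your transaction table), the matching shadow table in SE might be created like this:
CREATE TABLE a_trx_shadow
(
    a_type       CHAR(2),
    a_time       INTEGER,
    a_process_id SMALLINT,
    a_usr_id     SMALLINT,
    a_rowid      INTEGER
    { followed by the same columns as the audited transaction table }
) IN "/path/to/dbname.dbs/a_trx_shadow";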
The time is a Unix time stamp - the number of seconds since 1970-01-01T00:00:00Z.
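As a quick worked example, an a_time value of 1000000000 decodes to 2001-09-09T01:46:40Z.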